U.S. patent application number 11/221752 was filed with the patent office on 2007-03-15 for method and apparatus for sequencing transactions globally in a distributed database cluster.
Invention is credited to Elaine Wang, Frankie Wong, Xiong Yu.
Application Number | 20070061379 11/221752 |
Document ID | / |
Family ID | 37835340 |
Filed Date | 2007-03-15 |
United States Patent
Application |
20070061379 |
Kind Code |
A1 |
Wong; Frankie ; et
al. |
March 15, 2007 |
Method and apparatus for sequencing transactions globally in a
distributed database cluster
Abstract
A system and method for receiving and tracking a plurality of
transactions and distributing the transactions to at least two
replication queues over a network. The system and method comprise a
global queue for storing a number of the received transactions in a
first predetermined order. The system and method also comprise a
sequencer coupled to the global queue for creating a copy of each
of the transactions for each of said at least two replication
queues and for distributing in a second predetermined order each
said copy to each of said at least two replication queues
respectively, said copy containing one or more of the received
transactions.
Inventors: |
Wong; Frankie; (Pickering,
CA) ; Yu; Xiong; (North York, CA) ; Wang;
Elaine; (Aurora, CA) |
Correspondence
Address: |
Gowling Lafleur Henderson LLP;Suite 1600
1 First Canadain Place
100 King Street West
Toronto
ON
M5X 1G5
CA
|
Family ID: |
37835340 |
Appl. No.: |
11/221752 |
Filed: |
September 9, 2005 |
Current U.S.
Class: |
1/1 ;
707/999.201; 707/E17.007; 707/E17.032 |
Current CPC
Class: |
G06F 11/2097 20130101;
G06F 16/27 20190101 |
Class at
Publication: |
707/201 |
International
Class: |
G06F 17/30 20060101
G06F017/30 |
Claims
1. A system for receiving and tracking a plurality of transactions
and distributing the transactions to at least two replication
queues over a network, the system comprising: a global queue for
storing a number of the received transactions in a first
predetermined order; and a sequencer coupled to the global queue
for creating a copy of each of the transactions for each of said at
least two replication queues and for distributing in a second
predetermined order each said copy to each of said at least two
replication queues respectively, said copy containing one or more
of the received transactions.
2. The system according to claim 1, wherein the predetermined
orders are selected from the group comprising: the first
predetermined order is the same as the second predetermined order;
and the first predetermined order is different from the second
predetermined order.
3. The system according to claim 2 in which the sequencer
distributes each said copy at a predetermined time interval.
4. The system according to claim 2 in which the sequencer
distributes each said copy when the number of the transactions
within the global queue exceeds a predetermined value.
5. The system according to claim 2 in which the sequencer
distributes each said copy upon the earlier of: a predetermined
time interval; and the number of the transactions within the global
queue exceeds a predetermined value.
6. The system according to claim 5 in which each of the
transactions comprises an update transaction and a unique
transaction id assigned by the sequencer.
7. The system according to claim 6 further comprising a global disk
queue in communication with the global queue for receiving and
storing the transactions when the global queue is above a global
threshold.
8. The system according to claim 7 wherein each of said at least
two replication queues have a corresponding replication disk queue
for receiving and storing the transactions from the global queue
when the corresponding replication queue is above a replication
threshold.
9. The system according to claim 8 in which the global queue
receives the transactions from the global disk queue and other than
receives the transactions from said at least one application server
when the global disk queue is other than empty.
10. The system according to claim 5 further comprising an indoubt
transaction queue in communication with the sequencer for storing
the transactions identified as having unknown status by a database
server during system failures.
11. The system according to claim 6 wherein the update transaction
comprises at least one of a read, insert, update or delete request
for at least one database in communication with at least one of
said at least two replication queues.
12. The system according to claim 6 further comprising a resent
transaction queue for storing the transactions when the
transactions repeated the request for the transaction id.
13. The system according to claim 2, wherein the global queue is
configured for receipt of the received transactions from a network
entity selected from the group comprising: an application; and an
application server.
14. The system according to claim 2, wherein the global queue is a
searchable first-in first-out pipe.
15. The system according to claim 14 further comprising the
sequencer configured for assuring the order of transactions in the
global queue remain consistent with their execution order at a
database server coupled to at least one of the replication
queues.
16. The system according to claim 14, wherein the global disk queue
is configured for storing an indexed and randomly accessible data
set.
17. The system according to claim 2, wherein the global queue and
sequencer are hosted on a network entity selected from the group
comprising: a central control server and a peer-to-peer node.
18. A system for receiving a plurality of transactions from at
least one application server, distributing the transactions to at
least two replication queues and applying the transactions to a
plurality of databases comprising: a director coupled to each of
said at least one application server for capturing a plurality of
database calls therefrom as the plurality of transactions; and a
controller for receiving each of the plurality of transactions, the
controller configured for storing the transactions within a global
queue in a predetermined order, for generating a copy of each said
transaction for each of said at least two replication queues, and
for transmitting in the predetermined order each said copy to each
of said at least two replication queues respectively.
19. The system according to claim 18 further comprising at least
two replication servers including said at least two replication
queues wherein each of said at least two replication servers is
coupled to each of the databases; wherein the director routes each
of the transactions to one or more of the databases relative to the
workload and transaction throughput.
20. The system according to claim 19 further comprising a backup
controller for receiving the transactions from said at least one
application server upon failure of the controller, the backup
controller including a backup global queue wherein the backup
global queue is substantially synchronized with the controller and
the backup global queue is a copy of the global queue.
21. A method for receiving and tracking a plurality of transactions
and distributing the transactions to at least two replication
queues over a network, the method comprising: storing a number of
the received transactions in a first predetermined order in a
global queue; creating a copy of each of the transactions for each
of said at least two replication queues; and distributing in a
second predetermined order each said copy to each of said at least
two replication queues respectively, said copy containing one or
more of the received transactions.
22. The method according to claim 21 wherein the step of
distributing each said copy occurs at a predetermined time
interval.
23. The method according to claim 21 wherein the step of
distributing each said copy occurs when the number of the
transactions within the global queue exceeds a predetermined
number.
24. The method according to claim 21 wherein the step of
distributing each said copy occurs upon the earlier of: a
predetermined time interval; and the number of the transactions
within the global queue exceeds a predetermined number.
25. The method according to claim 24, wherein each of the
transactions comprises an update transaction and a unique
transaction id assigned by the sequencer.
26. The method according to claim 24 further comprising the step of
receiving and storing the transactions within a global disk queue
when the global queue storage capacity reaches a global
threshold.
27. The method according to claim 21 further comprising the steps
of: determining whether the global disk queue is other than empty;
and receiving the transaction from the global disk queue rather
than receiving the transactions from said at least one application
server when the global disk queue is other than empty.
28. The method according to claim 21 further comprising the step of
storing the transactions within an indoubt transaction queue during
system failures.
29. The method according to claim 25 wherein the update transaction
comprises at least one of a read, insert, update or delete request
for at least one database in communication with at least one of
said at least two replication queues.
30. The method according to claim 24 further comprising the steps
of: determining when at least one of said at least two replication
queues are above a replication threshold, each of said at least two
replication queues having a corresponding replication disk queue;
storing a number of the transactions within said corresponding
replication disk queue based upon the determination; and sending an
alert to notify when said at least two replication queues and said
corresponding replication disk queue capacity reach a preselected
threshold.
31. The method according to claim 30 further comprising the step
of: redirecting the transactions to at least one of said at least
two replication queues being below said preselected threshold,
based on receiving the alert.
32. A system for receiving and tracking a plurality of transactions
and distributing the transactions to at least two replication
queues over a network, the system comprising: means for storing a
number of the received transactions in a first predetermined order;
and means for creating a copy of each of the transactions for each
of said at least two replication queues and for distributing in a
second predetermined order each said copy to each of said at least
two replication queues respectively, said copy containing one or
more of the received transactions.
Description
FIELD OF THE INVENTION
[0001] This invention relates generally to the sequencing and
processing of transactions within a cluster of replicated
databases.
BACKGROUND OF THE INVENTION
[0002] A database has become the core component of most computer
application software nowadays. Typically application software makes
use of a single or multiple databases as repositories of data
(content) required by the application to function properly. The
application's operational efficiency and availability is greatly
dependent on the performance and availability of these database(s),
which can be measured by two metrics: (1) request response time;
and (2) transaction throughput.
[0003] There are several techniques for improving application
efficiency based on these two metrics: (1) Vertical scale up of
computer hardware supporting the application--this is achieved by
adding to or replacing existing hardware with faster central
processing units (CPUs), random access memory (RAM), disk
adapters/controllers, and network; and (2) Horizontal scale out
(clustering) of computer hardware supporting the application--this
approach refers to connecting additional computing hardware to the
existing configuration by interconnecting them with a fast network.
Although both approaches can address the need of reducing request
response time and increase transaction throughput, the scale out
approach can offer higher efficiency at lower costs, thus driving
most new implementations into clustering architecture.
[0004] The clustering of applications can be achieved readily by
running the application software on multiple, interconnected
application servers that facilitate the execution of the
application software and provide hardware redundancy for high
availability, with the application software actively processing
requests concurrently. However current database clustering
technologies cannot provide the level of availability and
redundancy in a similar active-active configuration. Consequently
database servers are primarily configured as active-standby,
meaning that one of the computer systems in the cluster does not
process application request until a failover occurs. Active-standby
configuration wastes system resources, extends the windows of
unavailability and increases the chance of data loss.
[0005] To cluster multiple database servers in an active-active
configuration, one technical challenge is to resolve update
conflict. An update conflict refers to two or more database servers
updating the same record in the databases that they manage. Since
data in these databases must be consistent among them in order to
scale out for performance and achieve high availability, the
conflict must be resolved. Currently there are two different
schemes of conflict resolution: (1) time based resolution; and (2)
location based resolution. However, neither conflict resolution
schemes can be enforced without some heuristic decision to be made
by human intervention. It is not possible to determine these
heuristic decision rules unless there is a thorough understanding
of the application software business rules and their implications.
Consequently, most clustered database configurations adopt the
active-standby model, and fail to achieve high performance and
availability at the same time. There is a need for providing a
database management system that uses an active-active configuration
and substantially reduces the possibility of update conflicts that
may occur when two or more databases attempt to update a record at
the same time.
[0006] The systems and methods disclosed herein provide a system
for globally managing transaction requests to one or more database
servers and to obviate or mitigate at least some of the above
presented disadvantages.
SUMMARY OF THE INVENTION
[0007] To cluster multiple database servers in an active-active
configuration, one technical challenge is to resolve update
conflict. An update conflict refers to two or more database servers
updating the same record in the databases that they manage. Since
data in these databases must be consistent among them in order to
scale out for performance and achieve high availability, the
conflict must be resolved. Currently there are two different
schemes of conflict resolution: (1) time based resolution; and (2)
location based resolution. However, neither conflict resolution
schemes can be enforced without some heuristic decision to be made
by human intervention. Consequently, most clustered database
configurations adopt the active-standby model, and fail to achieve
high performance and availability at the same time. Contrary to
current database configurations there is provided a system and
method for receiving and tracking a plurality of transactions and
distributing the transactions to at least two replication queues
over a network. The system and method comprise a global queue for
storing a number of the received transactions in a first
predetermined order. The system and method also comprise a
sequencer coupled to the global queue for creating a copy of each
of the transactions for each of said at least two replication
queues and for distributing in a second predetermined order each
said copy to each of said at least two replication queues
respectively, said copy containing one or more of the received
transactions.
[0008] One aspect provided is a system for receiving and tracking a
plurality of transactions and distributing the transactions to at
least two replication queues over a network, the system comprising:
a global queue for storing a number of the received transactions in
a first predetermined order; and a sequencer coupled to the global
queue for creating a copy of each of the transactions for each of
said at least two replication queues and for distributing in a
second predetermined order each said copy to each of said at least
two replication queues respectively, said copy containing one or
more of the received transactions.
[0009] A further aspect provided is a system for receiving a
plurality of transactions from at least one application server,
distributing the transactions to at least two replication queues
and applying the transactions to a plurality of databases
comprising: a director coupled to each of said at least one
application server for capturing a plurality of database calls
therefrom as the plurality of transactions; and a controller for
receiving each of the plurality of transactions, the controller
configured for storing the transactions within a global queue in a
predetermined order, for generating a copy of each said transaction
for each of said at least two replication queues, and for
transmitting in the predetermined order each said copy to each of
said at least two replication queues respectively.
[0010] A still further aspect provided is a method for receiving
and tracking a plurality of transactions and distributing the
transactions to at least two replication queues over a network, the
method comprising: storing a number of the received transactions in
a first predetermined order in a global queue; creating a copy of
each of the transactions for each of said at least two replication
queues; and distributing in a second predetermined order each said
copy to each of said at least two replication queues respectively,
said copy containing one or more of the received transactions.
[0011] A still further aspect provided is a system for receiving
and tracking a plurality of transactions and distributing the
transactions to at least two replication queues over a network, the
system comprising: means for storing a number of the received
transactions in a first predetermined order; and means for creating
a copy of each of the transactions for each of said at least two
replication queues and for distributing in a second predetermined
order each said copy to each of said at least two replication
queues respectively, said copy containing one or more of the
received transactions.
BRIEF DESCRIPTION OF THE DRAWINGS
[0012] Exemplary embodiments of the invention will now be described
in conjunction with the following drawings, by way of example only,
in which:
[0013] FIG. 1A is a block diagram of a system for sequencing
transactions;
[0014] FIG. 1B is a block diagram of a transaction replicator of
the system of FIG. 1A;
[0015] FIGS. 1C, 1D and 1E show an example operation of receiving
and processing transactions for the system of FIG. 1A;
[0016] FIG. 2 is a block diagram of a director of the system of
FIG. 1A;
[0017] FIG. 3 is a block diagram of a monitor of the system of FIG.
1A;
[0018] FIG. 4 is an example operation of the transaction replicator
of FIG. 1B;
[0019] FIG. 5 is an example operation of a global transaction queue
and a replication queue of FIG. 1B;
[0020] FIG. 6 is an example operation of the transaction replicator
of FIG. 1B for resolving gating and indoubt transactions; and
[0021] FIG. 7 is an example operation of a replication server of
FIG. 1B.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
[0022] A method and apparatus for sequencing transactions in a
database cluster is described for use with computer programs or
software applications whose functions are designed primarily to
replicate update transactions to one or more databases such that
data in these databases are approximately synchronized for read and
write access.
[0023] Referring to FIG. 1A, shown is a system 10 comprising a
plurality of application servers 7 for interacting with one or more
database servers 4 and one or more databases 5 via a transaction
replicator 1. It is understood that in two-tier applications, each
of the application 7 instances represents a client computer. For
three-tiered applications, each of the application 7 instances
represents an application server that is coupled to one or more
users (not shown). Accordingly, it is recognized that the
transaction replicator 1 can receive transactions from applications
7, application servers 7, or a combination thereof.
[0024] Referring to FIGS 1A and 1B, the transaction replicator 1 of
the system 10, receives transaction requests from the application
servers 7 and provides sequenced and replicated transactions using
a controller 2 to one or more replication servers 3, which apply
the transactions to the databases 5. By providing sequencing of
transactions in two or more tiered application architectures, the
transaction replicator 1 helps to prevent the transaction requests
from interfering with each other and facilitates the integrity of
the databases 5. For example, a transaction refers to a single
logical operation from a user application 7 and typically include
requests to read, insert, update and delete records within a
predetermined database 5.
[0025] Referring again to FIG. 1A, the controller 2 can be the
central command center of the transaction replicator 1 that can run
for example on the application servers 7, the database servers 4 or
dedicated hardware. The controller 2 may be coupled to a backup
controller 9 that is set up to take over the command when the
primary controller 2 fails. The backup controller 9 is
approximately synchronized with the primary controller such that
there exists transaction integrity. It is recognized that the
controller 2 and associated transaction replicator 1 can also be
configured for use as a node in a peer-to-peer network, as further
described below.
[0026] Referring again to FIG. 1A, when a backup and a primary
controller are utilized, a replica global transaction queue is
utilized. The backup controller 9 takes over control of transaction
replicator 1 upon the failure of the primary controller 2.
Preferably, the primary and backup controllers are installed at
different sites and a redundant WAN is recommended between the two
sites.
[0027] As is shown in FIG. 1B, the controller 2 receives input
transactions 11 from a user application 7 and provides sequenced
transactions 19 via the replication servers 3, the sequenced
transactions 19 are then ready for commitment to the database
servers 4. The controller 2 comprises a resent transaction queue 18
(resent TX queue), an indoubt transaction queue 17 (indoubt TX
queue), a global transaction sequencer 12 (global TX sequencer), a
global TX queue 13 (global TX queue) and at least one global disk
queue 14. The global queue 13 (and other queues if desired) can be
configured as searchable a first-in-first out pipe (FIFO) or as a
first-in-any-out (FIAO), as desired. For example, a FIFO queue 13
could be used when the contents of the replication queues 15 are
intended for databases 5, and a FIAO queue 13 could be used when
the contents of the replication queues 15 are intended for
consumption by unstructured data processing environments (not
shown). Further, it is recognized that the global disk queue 14 can
be configured for an indexed and randomly accessible data set.
[0028] The transaction replicator 1 maintains the globally
sequenced transactions in two different types of queues: the global
TX queue 13 and one or more replication queues 15 equal to that of
the database server 4 instances. These queues are created using
computer memory with spill over area on disks such as the global
disk queue 14 and one or more replication disk queues 16. The disk
queues serve a number of purposes including: persist transactions
to avoid transaction loss during failure of a component in the
cluster; act as a very large transaction storage (from gigabytes to
terabytes) that computer memory cannot reasonably provide
(typically less than 64 gigabytes). Further, the indoubt TX queue
17 is only used when indoubt transactions are detected after a
certain system failures. Transactions found in this queue have an
unknown transaction state and require either human intervention or
pre-programmed resolution methods to resolve.
[0029] For example, in the event of a temporary communication
failure resulting in lost response from the global TX sequencer 12
to a transaction ID request, the application resends the request
which is then placed in the resent TX queue 18. Under this
circumstance, there can be two or more transactions with different
Transaction ID in the global TX queue 13 and duplicated
transactions are removed subsequently.
[0030] In normal operation, the controller 2 uses the global TX
queue 13 to track the status of each of the input transactions and
to send the committed transaction for replication in sequence.
Referring to FIGS. 1C, 1D, and 1E, shown is an example operation of
the system 10 for receiving and processing a new transaction. For
example, upon receiving a new transaction, the sequencer 12 assigns
a new transaction ID to the received transaction. The transaction
ID is a globally unique sequence number for each transaction within
a replication group. In FIG. 1C, the sequence ID for the newly
received transaction is "K". Once the controller 2 receives the
transaction, the transaction and its ID are transferred to the
global TX queue 20 if there is space available. Otherwise, if the
global TX queue 13 is above a predetermined threshold and is full,
for example, as shown in FIG. 1C, the transaction K and its ID are
stored in the global disk queue 14 (FIG. 1D).
[0031] Before accepting any new transactions in the global TX
queue, the sequencer distributes the committed transactions from
the global TX queue 13 to a first replication server 20 and a
second (or more) replication server 23 for execution against the
databases. As will be discussed, the transfer of the transactions
to the replication servers can be triggered when at least one of
the following two criteria occurs: 1) a predetermined transfer time
interval and 2) a predetermined threshold for the total number of
transactions within the global TX queue 13 is met. However, each
replication server 20, 23 has a respective replication queue 21, 24
and applies the sequenced transactions, obtained from the global
queue 13, at its own rate.
[0032] For example, when a slower database server is unable to
process the transactions at the rate the transactions are
distributed by the controller 2, the transactions in the
corresponding replication queue are spilled over to the replication
disk queues. As shown in FIGS. 1C and 1D, transaction F is
transferred from the global TX queue 13 to the first and second
replication servers 20, 23. The first replication server 20 has a
first replication queue 21 and a first replication disk queue 22
and the second replication server 23 has a second replication queue
22 and a second replication disk queue 25. The replication queues
are an ordered repository of update transactions stored in computer
memory for executing transactions on a predetermined database. In
this case, since the second replication queue 24 is above a
predetermined threshold (full, for example) transaction F is
transferred to the second replication disk queue 25. Referring to
FIG. 1D and FIG. 1E, once space opens up in the second replication
queue 24 as transaction J is applied to its database server, the
unprocessed transaction F in the second replication disk queue 25
is moved to the second replication queue 24 for execution of the
transaction request against the data within its respective
database. In the case where both the replication disk queue and the
replication queues are above a preselected threshold (for example,
full), an alert is sent by the sequencer 12 and the database is
marked unusable until the queues become empty.
[0033] The core functions of the controller 2 can be summarized as
registering one or more directors 8 and associating them with their
respective replication groups; controlling the replication servers'
activities; maintaining the global TX queue 13 that holds all the
update transactions sent from the directors 8; synchronizing the
global TX queue 13 with the backup controller 9(where applicable);
managing all replication groups defined; distributing committed
transactions to the replication servers 3; tracking the operational
status of each database server 4 within a replication group;
providing system status to a monitor 6; and recovering from various
system failures.
[0034] The registry function of the controller 2 occurs when
applications are enabled on a new application server 7 to access
databases 5 in a replication group. Here, the director 8 on the new
application server contacts the controller 2 and registers itself
to the replication group. Advantageously, this provides dynamic
provisioning of application servers to scale up system capacity on
demand. The registration is performed on the first database call
made by an application. Subsequently the director 8 communicates
with the controller 2 for transaction and server status
tracking.
[0035] The replication server control function allows the
controller 2 to start the replication servers 3 and monitors their
state. For example, when an administrator requests to pause
replication to a specific database 5, the controller then instructs
the replication server to stop applying transactions until an
administrator or an automated process requests it.
[0036] The replication group management function allows the
controller 2 to manage one or more groups of databases 5 that
require transaction synchronization and data consistency among
them. The number of replication groups that can be managed and
controlled by the controller 2 is dependent upon the processing
power of the computer that the controller is operating on and the
sum of the transaction rates of all the replication groups.
Director
[0037] Referring to FIG. 2, shown is a block diagram of the
director 8 of the system 10 of FIG. 1A. The director can be
installed on the application server 7 or the client computer. The
director 8 is for initiating a sequence of operations to track the
progress of a transaction. The director 8 comprises a first 27, a
second 28, a third 29 and a fourth 30 functional module. According
to an embodiment of the system 10, the director 8 wraps around a
vendor supplied JDBC driver. As discussed earlier, the director 8
is typically installed on the application server 7 in a 3-tier
architecture, and on the client computer in a 2-tier architecture.
As a wrapper, the director 8 can act like an ordinary JDBC driver
to the applications 7, for example. Further, the system 10 can also
support any of the following associated with the transaction
requests, such as but not limited to: [0038] 1. a database access
driver/protocol based on SQL for a relational database 5 (ODBC,
OLE/DB, ADO.NET, RDBMS native clients, etc. . .); [0039] 2.
messages sent over message queues of the network; [0040] 3. XML
(and other structured definition languages) based transactions; and
[0041] 4. other data access drivers as desired.
[0042] As an example, the first module 27 captures all JDBC calls
26, determines transaction type and boundary, and analyzes the SQLs
in the transaction. Once determined to be an update transaction,
the director 8 initiates a sequence of operations to track the
progress of the transaction until it ends with a commit or
rollback. Both DDL and DML are captured for replication to other
databases in the same replication group.
[0043] The second module 28 collects a plurality of different
statistical elements on transactions and SQL statements for
analyzing application execution and performance characteristics.
The statistics can be exported as comma delimited text file for
importing into a spreadsheet.
[0044] In addition to intercepting and analyzing transactions and
SQL statements, the director's third module 29, manages database
connections for the applications 7. In the event that one of the
databases 5 should fail, the director 8 reroutes transactions to
one or more of the remaining databases. Whenever feasible, the
director 8 also attempts to re-execute the transactions to minimize
in flight transaction loss. Accordingly, the director 8 has the
ability to instruct the controller 2 as to which database 5 is the
primary database for satisfying the request of the respective
application 7.
[0045] Depending on a database's workload and the relative power
settings of the database servers 4 in a replication group, the
director 8 routes read transactions to the least busy database
server 4 for processing. This also applies when a database server 4
failure has resulted in transaction redirection.
[0046] Similarly, if the replication of transactions to a database
server 4 becomes too slow for any reason such that the transactions
start to build up and spill over to the replication disk queue 16,
the director 8 redirects all the read transactions to the least
busy database server 4. Once the disk queue becomes empty, the
director 8 subsequently allows read access to that database.
Accordingly, the fill/usage status of the replication disk queues
in the replication group can be obtained or otherwise received by
the director 8 for use in management of through-put rate of
transactions applied to the respective databases 5.
[0047] For example, when the director 8 or replication servers 3
fails to communicate with the database servers 4, they report the
failure to the controller 2 which then may redistribute
transactions or take other appropriate actions to allow continuous
operation of the transaction replicator 1. When one of the database
servers 4 cannot be accessed, the controller 2 instructs the
replication server 3 to stop applying transactions to it and relays
the database lock down status to a monitor 6. The transactions
start to accumulate within the queues until the database server 3
is repaired and the administrator or an automated process instructs
to resume replication via the monitor 6. The monitor 6 may also
provide other predetermined administrative commands (for example:
create database alias, update parameters, changing workload
balancing setting).
Monitor
[0048] Referring again to FIG. 1A, the monitor 6 allows a user to
view and monitor the status of the controllers 2, the replication
servers 3, and the databases 5. Preferably, the monitor 6 is a web
application that is installed on an application or application
server 7 and on the same network as the controllers 2.
[0049] Referring to FIG. 3, shown is a diagrammatic view of the
system monitor 6 for use with the transaction replicator 1. The
system monitor 6 receives input data 32 from both primary and
backup controllers 2, 9 (where applicable), replication servers 3,
the database servers 4 and relevant databases 5 within a
replication group. This information is used to display an overall
system status on a display screen 31.
[0050] For example, depending on whether the controller is
functioning or a failure has occurred, the relevant status of the
controller 2 is shown. Second, the status of each of the
replication servers 3 within a desired replication group is shown.
A detailed description of the transaction rate, the number of
transactions within each replication queue 15, the number
transactions within each replication disk queue 16 is further
shown. The monitor 6 further receives data regarding the databases
5 and displays the status of each database 5 and the number of
committed transactions.
[0051] The administrator can analyze the above information and
choose to manually reroute the transactions. For example, when it
is seen that there exists many transactions within the replication
disk queue 16 of a particular replication server 3 or that the
transaction rate of a replication server 3 is slow, the
administrator may send output data in the form of a request 33 to
distribute the transactions for a specified amount of time to a
different database server within the replication group.
[0052] Referring to FIG. 4, shown is a flow diagram overview of the
method 100 for initializing and processing transactions according
to the invention. The global TX sequencer 12 also referred to as
the sequencer hereafter and as shown in FIG. 1B, is the control
logic of the transaction replicator 1.
[0053] When the controller 2 is started, it initializes itself by
reading from configuration and property files the parameters to be
used in the current session 101. The global TX Queue 13, indoubt TX
queue 17 and resent TX queue 18 shown in FIG. 1B, are created and
emptied in preparation for use. Before accepting any new
transactions, the sequencer 12 examines the global disk queue 14 to
determine if any transactions are left behind from previous
session. For example, if a transaction is found on the global disk
queue 14, it implies at least one database in the cluster is out of
synchronization with the others and the database must be applied
with these transactions before it can be accessed by applications.
Transactions on the global disk queue 14 are read into the global
TX queue 13 in preparation for applying to the database(s) 5. The
sequencer 12 then starts additional servers called replication
servers 3 that create and manage the replication queues 15. After
initialization is complete, the sequencer 12 is ready to accept
transactions from the application servers 7.
[0054] The sequencer 12 examines the incoming transaction to
determine whether it is a new transaction or one that has already
been recorded in the global TX queue 102. For a new transaction,
the sequencer 12 assigns a Transaction ID 103 and records the
transaction together with this ID in the global TX queue 13. If the
new transactions ID is generated as a result of lost ID 104, the
transaction and the ID are stored in the resent TX queue 109 for
use in identifying duplicated transactions. The sequencer 12 checks
the usage of the global TX queue 105 to determine if the maximum
number of transactions in memory has already been exceeded. The
sequencer 12 stores the transaction ID in the global TX queue 13 if
the memory is not full 106. Otherwise, the sequencer 12 stores the
transaction ID in the global disk queue 107. The sequencer 12 then
returns the ID to the application 108 and the sequencer 12 is ready
to process another request from the application.
[0055] When a request from the application or application server 7,
comes in with a transaction that has already obtained a transaction
ID previously and recorded in the global TX queue 13, the sequencer
12 searches and retrieves the entry from either the global TX queue
13 or the disk queue 110. If this transaction has been committed to
the database 111, the entry's transaction status is set to
"committed" 112 by the sequencer 12, indicating that this
transaction is ready for applying to the other databases 200. If
the transaction has been rolled back 113, the entry's transaction
status is marked "for deletion" 114 and as will be described,
subsequent processing 200 deletes the entry from the global TX
queue. If the transaction failed with an indoubt status, the
entry's transaction status is set to "indoubt" 115. An alert
message is sent to indicate that database recovery may be required
116. Database access is suspended immediately 117 until the indoubt
transaction is resolved manually 300 or automatically 400.
[0056] Referring to FIG. 5, shown is a flow diagram of the method
200 for distributing transactions from the global TX queue 13
according to the invention. The global TX queue 13 is used to
maintain the proper sequencing and states of all update
transactions at commit time. To apply the committed transactions to
the other databases, the replication queue 5 is created by the
sequencer 12 for each destination database. The sequencer 12 moves
committed transactions from the global TX queue to the replication
queue based on the following two criteria: (1) a predetermined
transaction queue threshold (Q threshold) and (2) a predetermined
sleep time (transfer interval).
[0057] For a system with sustained workload, the Q Threshold is the
sole determining criteria to move committed transactions to the
replication queue 201. For a system with sporadic activities, both
the Q Threshold and transfer interval are used to make the transfer
decision 201, 213. Transactions are transferred in batches to
reduce communication overhead. When one or both criteria are met,
the sequencer 12 prepares a batch of transactions to be moved from
the global TX queue 13 to the replication queue 202. If the batch
contains transactions, the sequencer 12 removes all the rolled back
transactions from it because they are not to be applied to the
other databases 204. The remaining transactions in the batch are
sent to the replication queue for processing 205. If the batch does
not contain any transaction 203, the sequencer 12 searches the
global TX queue for any unprocessed transactions (status is
committing) 206. Since transactions are executed in a same order of
occurrence, unprocessed transactions typically occur when a
previous transaction has not completed, therefore delaying the
processing of subsequent transactions. A transaction that is being
committed and has not yet returned its completion status is called
gating transaction. A transaction that is being committed and
returns a status of unknown is called indoubt transaction. Both
types of transactions will remain in the state of "committing" and
block processing of subsequent committed transactions, resulting in
the transaction batch being empty. The difference between a gating
transaction and an indoubt transaction is that gating transaction
is transient, meaning that it will eventually become committed,
unless there is a system failure that causes it to remain in the
"gating state" indefinitely. Therefore when the sequencer 12 finds
unprocessed transactions 207 it must differentiate the two types of
"committing" transactions 208. For a gating transaction, the
sequencer 12 sends out an alert 209 and enters the transaction
recovery process 300. Otherwise, the sequencer 12 determines if the
transaction is resent from the application 210, 211, and removes
the resent transaction from the global TX queue 211. A resent
transaction is a duplicated transaction in the global TX queue 13
and has not been moved to the replication queue 15. The sequencer
12 then enters into a sleep because there is no transaction to be
processed at the time 214. The sleep process is executed in its own
thread such that it does not stop 200 from being executed at any
time. It is a second entry point into the global queue size check
at 201. When the sleep time is up, the sequencer 12 creates the
transaction batch 202 for transfer to the replication queue 203,
204, 205.
[0058] Referring to FIG. 6, shown is a flow diagram illustrating
the method 300 for providing manual recovery of transactions 116 as
shown in FIG. 100. There are two scenarios under which the
sequencer 12 is unable to resolve gating transactions and indoubt
transactions caused by certain types of failure and manual recovery
may be needed. First, a gating transaction remains in the global TX
queue 13 for an extended period of time, stopping all subsequent
committed transactions from being applied to the other databases.
Second, a transaction status is unknown after some system component
failure. The sequencer 12 first identifies the transactions causing
need resolution 301 and send out an alert 302. Then the transaction
can be manually analyzed to determine whether the transaction has
been committed or rolled back in the database 304 and whether any
manual action needs to be taken. If the transaction is found to
have been rolled back in the database, the transaction entry is
deleted manually from the global TX queue 305. If the transaction
has been committed to the database, it is manually marked
"committed" 306. In both cases the replication process can resume
without having to recover the database 500. If the transaction is
flagged as indoubt in the database, it must be forced to commit or
roll back at the database before performing 304, 305 and 306.
[0059] Referring again to FIG. 6, the process 400 is entered when
an indoubt transaction is detected 115 and automatic failover and
recovery of a failed database is performed. Unlike gating
transactions that may get resolved in the next moment, an indoubt
transaction is permanent until the transaction is rolled back or
committed by hand or by some heuristic rules supported by the
database. If the resolution is done with heuristic rules, the
indoubt transaction will have been resolved as "committed" or
"rolled back" and will not require database failover or recovery.
Consequently the process 400 is only entered when an indoubt
transaction cannot be heuristically resolved and an immediate
database failover is desirable. Under the automatic recovery
process, the database is marked as "needing recovery" 401, with an
alert sent out 402 by the sequencer 12. To help prevent further
transaction loss, the sequencer 12 stops the generation of new
transaction ID 403 and moves the indoubt transactions to the
indoubt TX queue 404. While the database is marked "needing
recovery" the sequencer 12 replaces it with one of the available
databases in the group 405 and enables the transaction ID
generation 406 such that normal global TX queue processing can
continue 200. The sequencer 12 then executes a user defined
recovery procedure to recover the failed database 407. For example,
if the database recovery fails, the recovery process is reentered
408, 407.
[0060] Referring to FIG. 7, shown is a flow diagram illustrating
the processing of committed transactions by the replication servers
3 and the management of transactions in the replication queue 15
according to the present invention. Replication queues 15 are
managed by the replication servers 3 started by the sequencer 12.
One of the replication servers 3 receives batches of transactions
from the sequencer 12. The process 500 is entered if a new batch of
committed transactions arrives or at any time when queued
transactions are to be applied to the databases.
[0061] If the process is entered because of new transactions 501,
the batch of transactions are stored in the replication queue in
memory 508, 509, or in replication disk queue 511 if the memory
queue is full. Replication disk queue capacity is determined by the
amount of disk space available. If the disk is above a
predetermined threshold or is full for example 510, an alert is
sent 512 by the sequencer 12 and the database is marked unusable
513 because committed transactions cannot be queued up anymore.
[0062] If the process is entered in an attempt to apply
transactions in the replication queue to the databases, the
replication server first determines whether there is any
unprocessed transaction in the replication queue in memory 502. If
the memory queue is empty but unprocessed transactions are found in
the replication disk queue 503, they are moved from the disk queue
to the memory queue in batches for execution 504, 505. Upon
successful execution of all the transactions in the batch they are
removed from the replication queue by the replication server and
another batch of transactions are processed 501. If there are
transactions in the replication disk queue 16, the processing
continues until the disk queue is empty, at which time the
replication server 3 waits for more transactions from the global TX
queue 501. During execution of the transactions in the replication
queue 15, error may occur and the execution must be retried until
the maximum number of retries is exceeded 507, then an alert is
sent 512 with the database marked unusable 513. However, even
though a database is marked unusable, the system continues to serve
the application requests. The marked database is inaccessible until
the error condition is resolved. The replication server 3 stops
when it is instructed by the sequencer during the apparatus
shutdown process 118, 119 and 120 shown in FIG. 4.
[0063] It will be evident to those skilled in the art that the
system 10 and its corresponding components can take many forms, and
that such forms are within the scope of the invention as claimed.
For example, the transaction replicators 1 can be configured as a
plurality of transaction replicators 1 in a replicator peer-to-peer
(P2P) network, in which each database server 4 is assigned or
otherwise coupled to at least one principal transaction replicator
1. The distributed nature of the replicator P2P network can
increase robustness in case of failure by replicating data over
multiple peers (i.e. transaction replicators 1), and by enabling
peers to find/store the data of the transactions without relying on
a centralized index server. In the latter case, there may be no
single point of failure in the system 10 when using the replicator
P2P network. For example, the application or application servers 7
can communicate with a selected one of the database servers 7, such
that the replicator P2P network of transaction replicators 1 would
communicate with one another for load balancing and/or failure mode
purposes. One example would be one application server 7 sending the
transaction request to one of the transaction replicators 1, which
would then send the transaction request to another of the
transaction replicators 1 of the replicator P2P network, which in
turn would replicate and then communicate the replicated copies of
the transactions to the respective database servers 4.
[0064] Further, it is recognized that the applications/application
servers 7 could be configured in an application P2P network such
that two or more application computers could share their resources
such as storage hard drives, CD-ROM drives, and printers. Resources
would then accessible from every computer on the application P2P
network. Because P2P computers have their own hard drives that are
accessible by all computers, each computer can act as both a client
and a server in the application P2P networks (e.g. both as an
application 7 and as a database 4). P2P networks are typically used
for connecting nodes via largely ad hoc connections. Such P2P
networks are useful for many purposes, such as but not limited to
sharing content files, containing audio, video, data or anything in
digital format is very common, and realtime data, such as Telephony
traffic, is also passed using P2P technology. The term "P2P
network" can also mean grid computing. A pure P2P file transfer
network does not have the notion of clients or servers, but only
equal peer nodes that simultaneously function as both "clients" and
"servers" to the other nodes on the network. This model of network
arrangement differs from the client-server model where
communication is usually to and from a central server or
controller. It is recognized that there are three major types of
P2P network, by way of example only, namely: [0065] 1) Pure P2P in
which peers act as clients and server, there is no central server,
and there is no central router; [0066] 2) Hybrid P2P which has a
central server that keeps information on peers and responds to
requests for that information, peers are responsible for hosting
the information as the central server does not store files and for
letting the central server know what files they want to share and
for downloading its shareable resources to peers that request it,
and route terminals are used as addresses which are referenced by a
set of indices to obtain an absolute address; and [0067] 3) Mixed
P2P which has both pure and hybrid characteristics. Accordingly, it
is recognized that in the application and replicator P2P networks
the applications/application servers 7 and the transaction
replicators 1 can operate as both clients and servers, depending
upon whether they are the originator or receiver of the transaction
request respectively. Further, it is recognized that both the
application and replicator P2P networks can be used in the system
10 alone or in combination, as desired.
[0068] in view of the above, the spirit and scope of the appended
claims should: not be limited to the examples or the description of
the preferred versions contained herein.
* * * * *