U.S. patent application number 13/953451 was filed with the patent office on 2013-07-29 and published on 2015-01-29 for a shared data de-duplication method and system.
This patent application is currently assigned to SAP AG. The applicant listed for this patent is Or Igelka. Invention is credited to Or Igelka.
Publication Number | 20150032802 |
Application Number | 13/953451 |
Family ID | 50943027 |
Filed Date | 2013-07-29 |
United States Patent Application | 20150032802 |
Kind Code | A1 |
Igelka; Or | January 29, 2015 |
SHARED DATA DE-DUPLICATION METHOD AND SYSTEM
Abstract
This disclosure relates to synchronizing dictionaries of
acceleration nodes in a computer network. For example, dictionaries
of a plurality of acceleration nodes of a client-server network can
be synchronized to each include one or more identical data items
and data identifier pairs. Synchronization can include transmitting
a particular data item, or a combination of a data item and an
associated data identifier, to another acceleration node which
includes it in its dictionary. A particular acceleration node can,
instead of transmitting a data item, transmit an associated data
identifier to another acceleration node. As all (or a subset) of
the acceleration nodes can have an identical dictionary when
employing the methods described herein, the particular acceleration
node can use the same dictionary to communicate with all (or the
subset of) other acceleration nodes of the computer network.
Inventors: | Igelka; Or (Ramat Gan, IL) |
Applicant: |
Name | City | State | Country | Type |
Igelka; Or | Ramat Gan | | IL | |
Assignee: | SAP AG, Walldorf, DE |
Family ID: | 50943027 |
Appl. No.: | 13/953451 |
Filed: | July 29, 2013 |
Current U.S. Class: | 709/203 |
Current CPC Class: | H04L 47/801 20130101; H04L 47/783 20130101; H04L 67/2828 20130101 |
Class at Publication: | 709/203 |
International Class: | H04L 12/927 20060101 H04L012/927; H04L 12/911 20060101 H04L012/911; H04L 29/06 20060101 H04L029/06 |
Claims
1. A computer-implemented method comprising: identifying a first
acceleration node included in a computer network comprising a
plurality of acceleration nodes, an acceleration node to accelerate
transmission of resources between a client computer system and a
server computer system connected through one or more of the
plurality of acceleration nodes, the first acceleration node
including a first dictionary of data items and data identifiers,
each data identifier identifying a corresponding data item, and
wherein an acceleration node is configured to provide data
identifiers to other acceleration nodes and to identify data items
based on data identifiers received from another acceleration node;
receiving, at the first acceleration node and from a second
acceleration node, a data item; including the data item in the
first dictionary included in the first acceleration node; and
providing the received data item, a data identifier identifying the
received data item, or both to a third acceleration node of the
computer network, wherein the data identifier is either determined
by the first acceleration node or obtained from another
acceleration node.
2. The method of claim 1, further comprising: receiving, at the
first acceleration node and from one or more further acceleration
nodes, further data items; and including the further data items in
the first dictionary.
3. The method of claim 1, further comprising determining the data
identifier for a corresponding data item using a predetermined
algorithm wherein the predetermined algorithm comprises determining
a hash value.
4. The method of claim 1, wherein the first acceleration node
stores a protocol of dictionary entries including data items, data
identifiers, or both in a second dictionary of the third
acceleration node.
5. The method of claim 1 further comprising: determining at the
first acceleration node which dictionary entries of the first
dictionary are missing in the second dictionary; and providing from
the first acceleration node the missing dictionary entries.
6. The method of claim 1, further comprising: determining that a
dictionary of the third acceleration node contains the data item;
providing a data identifier identifying the received data item to
the third acceleration node if it has been determined that the
dictionary of the third acceleration node contains the data item;
and providing the data item to the third acceleration node if it
has been determined that the dictionary of the third acceleration
node does not contain the data item.
7. The method of claim 1, further comprising: estimating an amount
of resources and/or time it takes for the third acceleration node
to determine the data identifier identifying the received data item
and an amount of resources and/or time it takes to transmit the
data identifier from the first acceleration node to the third
acceleration node; comparing the amount of resources and/or time it
takes for the third acceleration node to determine the data
identifier identifying the received data item and the amount of
resources and/or time it takes to transmit the data identifier from
the first acceleration node to the third acceleration node;
transmitting the data identifier identifying the received data item
from the first acceleration node to the third network node if the
amount of resources and/or time it takes for the third acceleration
node to determine the data identifier identifying the received data
item is larger than the amount of resources and/or time it takes to
transmit the data identifier from the first acceleration node to
the third acceleration node; and letting the third network node
calculate the data identifier if the amount of resources and/or
time it takes for the third acceleration node to determine the data
identifier identifying the received data item is smaller than the
amount of resources and/or time it takes to transmit the data
identifier from the first acceleration node to the third
acceleration node.
8. The method of claim 1 wherein a data identifier is provided by
the first acceleration node, further comprising: receiving a data
identifier of the first dictionary at the third acceleration node;
determining that a dictionary of the third acceleration node does
not include the data identifier; requesting, from the first
acceleration node, the data item identified by the data identifier;
transmitting the data item from the first acceleration node to the
third acceleration node; and including the received data identifier
and the received data item in the dictionary of the third
acceleration node.
9. The method of claim 1, wherein the first acceleration node
regularly broadcasts at least a portion of its dictionary to one or
more neighboring acceleration nodes including the third
acceleration node.
10. The method of claim 1, wherein the data item is a resource to
be transmitted via the first acceleration node across the computer
network.
11. The method of claim 1, further comprising: comparing network
traffic at an acceleration node of the plurality of acceleration
nodes with a predetermined threshold; determining that the network
traffic at the acceleration node of the computer network is below
the predetermined threshold; and providing the received data item,
a data identifier identifying the received data item, or both to
the third acceleration node of the computer network in response to
determining that the network traffic at the acceleration node of
the computer network is below the predetermined threshold.
12. The method of claim 1, further comprising: generating at the
first acceleration node a data item to be included in the
dictionary of the first acceleration node; determining a data
identifier identifying the generated data item; and transmitting
the generated data item or a combination of the generated data item
and the calculated data identifier to the third acceleration
node.
13. The method of claim 1, further comprising: determining, by the
first acceleration node, a number of times a predetermined data
identifier is used in communication with other acceleration nodes
in a predetermined period of time; comparing the number of times
with a threshold number of times; and deleting a data item
identified by the data identifier from the dictionary upon
determining that the number of times is less than the threshold
number of times.
14. The method of claim 1, wherein the client-server network
includes at least three different subsets of the plurality of
acceleration nodes each subset including at least one acceleration
node, wherein the first acceleration node is included in the first
subset, the second acceleration node is included in the second
subset, and the third acceleration node is included in a third
subset of acceleration nodes, wherein the first acceleration node
includes at least one additional dictionary, and the method further
comprising: providing a data identifier of the additional
dictionary to another acceleration node of the first subset of
acceleration nodes to identify a data item based on the data
identifier of the additional dictionary.
15. The method of claim 1, further comprising: regularly
synchronizing all dictionaries of all acceleration nodes of the
computer network or a subset of acceleration nodes of the computer
network, wherein after synchronization has been completed all
dictionaries of the acceleration nodes of the computer network at
least partially include identical dictionary entries.
16. The method of claim 1, further comprising: adding the third
acceleration node to the computer network without a populated
dictionary; or in which a second dictionary of the third
acceleration node has been partially or completely lost; and
building or recovering the second dictionary of the third
acceleration node by receiving data from other acceleration nodes
of the computer network.
17. The method of claim 1, wherein the data item is received by the
first network node as part of a communication process between a
server and a client.
18. The method of claim 1, wherein the computer network includes a
cloud computing environment.
19. A system comprising: one or more processors; and a
non-transitory computer-readable medium storing instructions
executable by the one or more processors to perform operations
comprising: identifying a first acceleration node included in a
computer network comprising two or more acceleration nodes, an
acceleration node to accelerate transmission of resources between a
client computer system and a server computer system connected
through the acceleration node, the first acceleration node
including a first dictionary of data items and data identifiers,
each data identifier identifying a corresponding data item, and
wherein an acceleration node is configured to provide data
identifiers to other acceleration nodes and to identify a data item
based on a data identifier received from another acceleration node;
receiving, at the first acceleration node and from a second
acceleration node, a data item; including the data item in the
first dictionary; and providing the received data item, a data
identifier identifying the received data item, or both to a third
acceleration node of the computer network, wherein the data
identifier is determined at the first acceleration node or obtained
from another acceleration node.
20. The system of claim 19, wherein the computer-readable medium
further stores instructions executable by the one or more
processors to perform operations comprising: estimating an amount
of resources and/or time it takes for the third acceleration node
to determine the data identifier identifying the received data item
and an amount of resources and/or time it takes to transmit the
data identifier from the first acceleration node to the third
acceleration node; comparing the amount of resources and/or time it
takes for the third acceleration node to determine the data
identifier identifying the received data item and the amount of
resources and/or time it takes to transmit the data identifier from
the first acceleration node to the third acceleration node;
transmitting the data identifier identifying the received data item
from the first acceleration node to the third network node if the
amount of resources and/or time it takes for the third acceleration
node to determine the data identifier identifying the received data
item is larger than the amount of resources and/or time it takes to
transmit the data identifier from the first acceleration node to
the third acceleration node; and letting the third network node
calculate the data identifier if the amount of resources and/or
time it takes for the third acceleration node to determine the data
identifier identifying the received data item is smaller than the
amount of resources and/or time it takes to transmit the data
identifier from the first acceleration node to the third
acceleration node.
21. A non-transitory computer readable medium storing instructions
thereon which when executed by a processor cause the processor to:
identify a first acceleration node included in a computer network
comprising two or more acceleration nodes, an acceleration node to
accelerate transmission of resources between a client computer
system and a server computer system connected through the
acceleration node, the first acceleration node including a first
dictionary of data items and data identifiers, each data identifier
identifying a corresponding data item, and wherein an acceleration
node is configured to provide data identifiers to other
acceleration nodes and to identify a data item based on a data
identifier received from another acceleration node; receive, at the
first acceleration node and from a second acceleration node, a data
item; include the data item in the first dictionary; and provide
the received data item, a data identifier identifying the received
data item, or both to a third acceleration node of the computer
network, wherein the data identifier is determined at the first
acceleration node or obtained from another acceleration node.
22. The computer readable medium of claim 21 further storing
instructions which when executed by a processor cause the processor
to: estimate an amount of resources and/or time it takes for the
third acceleration node to determine the data identifier
identifying the received data item and an amount of resources
and/or time it takes to transmit the data identifier from the first
acceleration node to the third acceleration node; compare the
amount of resources and/or time it takes for the third acceleration
node to determine the data identifier identifying the received data
item and the amount of resources and/or time it takes to transmit
the data identifier from the first acceleration node to the third
acceleration node; transmit the data identifier identifying the
received data item from the first acceleration node to the third
network node if the amount of resources and/or time it takes for
the third acceleration node to determine the data identifier
identifying the received data item is larger than the amount of
resources and/or time it takes to transmit the data identifier from
the first acceleration node to the third acceleration node; and let
the third network node calculate the data identifier if the amount
of resources and/or time it takes for the third acceleration node
to determine the data identifier identifying the received data item
is smaller than the amount of resources and/or time it takes to
transmit the data identifier from the first acceleration node to
the third acceleration node.
Description
TECHNICAL FIELD
[0001] The present disclosure relates to methods and systems for
sharing data in a computer network including network nodes.
BACKGROUND
[0002] Modern computer network systems can be fairly complex and
span large spatial distances. For instance, a central database
including a server of a client-server network can be located in
Europe. Different systems of a client can be located in, e.g., the
U.S.A., Australia, South Africa, and other geographical locations.
Such distributed systems can be the result of consolidated network
structures of globally operating enterprises. At the same time, the
amount of data transmitted over these networks is steadily
increasing. This can result in considerable delays, in particular
on the wide area network connections. In the example described
above, a client located in Australia can launch a report program,
which may access the central database in Europe. This can lead to
high response times as a result of network limitations such as
bandwidth, latency and congestion. Furthermore, advanced
communication schemes might increase the amount of data that needs
to be stored at the different network nodes and thus increase the
cost and complexity of the network nodes.
SUMMARY
[0003] In a first aspect of the present disclosure, a
computer-implemented method includes identifying a first
acceleration node included in a computer network comprising a
plurality of acceleration nodes, an acceleration node to accelerate
transmission of resources between a client computer system and a
server computer system connected through one or more of the
plurality of acceleration nodes, the first acceleration node
including a first dictionary of data items and data identifiers,
each data identifier identifying a corresponding data item, and
wherein an acceleration node is configured to provide data
identifiers to other acceleration nodes and to identify data items
based on data identifiers received from another acceleration node,
receiving, at the first acceleration node and from a second
acceleration node, a data item, including the data item in the
first dictionary included in the first acceleration node and
providing the received data item, a data identifier identifying the
received data item, or both to a third acceleration node of the
computer network, wherein the data identifier is either determined
by the first acceleration node or obtained from another
acceleration node.
[0004] In a second aspect according to the first aspect, the method
further includes receiving, at the first acceleration node and from
one or more further acceleration nodes, further data items and
including the further data items in the first dictionary.
[0005] In a third aspect according to the first or second aspect,
the method further includes determining the data identifier for a
corresponding data item using a predetermined algorithm.
[0006] In a fourth aspect according to the third aspect, the
predetermined algorithm comprises determining a hash value.
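The hash-based identifier of the third and fourth aspects can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the source only states that the predetermined algorithm comprises determining a hash value, so the choice of SHA-256 and the names `data_identifier` and `dictionary` are assumptions.

```python
import hashlib


def data_identifier(data_item: bytes) -> str:
    """Derive a fixed-length identifier from the content of a data item."""
    # Assumption: SHA-256 stands in for the unspecified "predetermined algorithm".
    return hashlib.sha256(data_item).hexdigest()


# A dictionary maps each data identifier to its data item.
dictionary = {}
item = b"<html>report header</html>"
dictionary[data_identifier(item)] = item
```

Because the identifier depends only on the content, two acceleration nodes that apply the same algorithm to the same data item arrive at the same identifier without exchanging it.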
[0007] In a fifth aspect according to any one of the previous
aspects, the first acceleration node stores a protocol of
dictionary entries including data items, data identifiers, or both
in a second dictionary of the third acceleration node.
[0008] In a sixth aspect according to any one of the previous
aspects, the method further includes determining at the first
acceleration node which dictionary entries of the first dictionary
are missing in the second dictionary and providing from the first
acceleration node the missing dictionary entries.
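One way to picture the sixth aspect is a set difference over identifiers. The sketch below assumes the first acceleration node already knows which identifiers the second dictionary holds, for example from the stored protocol of entries mentioned in the fifth aspect. The names `missing_entries` and `known_remote_ids` are hypothetical.

```python
from typing import Dict, Set


def missing_entries(first_dictionary: Dict[str, bytes],
                    known_remote_ids: Set[str]) -> Dict[str, bytes]:
    """Return the entries of the first dictionary absent from the remote one."""
    # Assumption: known_remote_ids is the first node's record of what the
    # remote dictionary contains; how it is maintained is not shown here.
    return {
        identifier: item
        for identifier, item in first_dictionary.items()
        if identifier not in known_remote_ids
    }


first = {"id-1": b"alpha", "id-2": b"beta", "id-3": b"gamma"}
to_send = missing_entries(first, known_remote_ids={"id-1"})
```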
[0009] In a seventh aspect according to any one of the previous
aspects, the method further includes determining that a dictionary
of the third acceleration node contains the data item, providing a
data identifier identifying the received data item to the third
acceleration node if it has been determined that the dictionary of
the third acceleration node contains the data item and providing
the data item to the third acceleration node if it has been
determined that the dictionary of the third acceleration node does
not contain the data item.
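The decision in the seventh aspect, sending the short identifier when the receiver already holds the data item and the full item otherwise, might look like the sketch below. The message tags `"id"` and `"item"` and the function name `choose_payload` are illustrative assumptions, not a wire format taken from the disclosure.

```python
from typing import Set, Tuple


def choose_payload(identifier: str, item: bytes,
                   remote_ids: Set[str]) -> Tuple[str, bytes]:
    """Pick what to send to the third acceleration node."""
    if identifier in remote_ids:
        # The remote dictionary contains the item: the identifier suffices.
        return ("id", identifier.encode("ascii"))
    # Otherwise the item itself has to travel across the network.
    return ("item", item)


kind_known, _ = choose_payload("id-1", b"large payload", {"id-1"})
kind_unknown, body = choose_payload("id-2", b"large payload", {"id-1"})
```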
[0010] In an eighth aspect according to any one of the previous
aspects, the method further includes estimating an amount of
resources and/or time it takes for the third acceleration node to
determine the data identifier identifying the received data item
and an amount of resources and/or time it takes to transmit the
data identifier from the first acceleration node to the third
acceleration node, comparing the amount of resources and/or time it
takes for the third acceleration node to determine the data
identifier identifying the received data item and the amount of
resources and/or time it takes to transmit the data identifier from
the first acceleration node to the third acceleration node,
transmitting the data identifier identifying the received data item
from the first acceleration node to the third network node if the
amount of resources and/or time it takes for the third acceleration
node to determine the data identifier identifying the received data
item is larger than the amount of resources and/or time it takes to
transmit the data identifier from the first acceleration node to
the third acceleration node and letting the third network node
calculate the data identifier if the amount of resources and/or
time it takes for the third acceleration node to determine the data
identifier identifying the received data item is smaller than the
amount of resources and/or time it takes to transmit the data
identifier from the first acceleration node to the third
acceleration node.
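The comparison in the eighth aspect can be illustrated with a simple time estimate. The cost model below (hashing throughput at the receiver, link latency plus identifier size over bandwidth) is an assumption made for illustration, since the source does not specify how the amounts are estimated. All parameter names are hypothetical.

```python
def should_transmit_identifier(item_size_bytes: int,
                               receiver_hash_rate: float,
                               identifier_size_bytes: int,
                               link_bandwidth: float,
                               link_latency: float) -> bool:
    """Return True if sending the identifier beats recomputing it remotely.

    Rates are in bytes per second and latency is in seconds (assumed units).
    """
    # Estimated time for the third node to compute the identifier itself.
    compute_time = item_size_bytes / receiver_hash_rate
    # Estimated time to ship the identifier from the first to the third node.
    transmit_time = link_latency + identifier_size_bytes / link_bandwidth
    # The source leaves the tie case open; here the receiver computes on a tie.
    return compute_time > transmit_time


# Large item, slow hashing at the receiver: send the identifier.
send_id = should_transmit_identifier(10_000_000, 1_000_000.0, 32, 1_000_000.0, 0.05)
# Small item, fast hashing at the receiver: let the receiver calculate it.
send_id_small = should_transmit_identifier(1_000, 100_000_000.0, 32, 1_000_000.0, 0.05)
```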
[0011] In a ninth aspect according to any one of the previous
aspects, a data identifier is provided by the first acceleration
node and the method further includes receiving a data identifier of
the first dictionary at the third acceleration node, determining
that a dictionary of the third acceleration node does not include
the data identifier, requesting, from the first acceleration node,
the data item identified by the data identifier, transmitting the
data item from the first acceleration node to the third
acceleration node and including the received data identifier and
the received data item in the dictionary of the third
acceleration node.
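The request-on-miss exchange of the ninth aspect can be sketched with two in-memory node objects. The class `AccelerationNode` and its methods `lookup` and `resolve` are hypothetical, and the direct method call stands in for whatever network request an actual deployment would use.

```python
from typing import Dict


class AccelerationNode:
    """Toy acceleration node holding a dictionary of identifier -> data item."""

    def __init__(self) -> None:
        self.dictionary: Dict[str, bytes] = {}

    def lookup(self, identifier: str) -> bytes:
        # Serves a peer's request for the data item behind an identifier.
        return self.dictionary[identifier]

    def resolve(self, identifier: str, sender: "AccelerationNode") -> bytes:
        # Dictionary miss: request the item from the sender and store the
        # received identifier together with the received item.
        if identifier not in self.dictionary:
            self.dictionary[identifier] = sender.lookup(identifier)
        return self.dictionary[identifier]


first_node = AccelerationNode()
first_node.dictionary["id-1"] = b"payload"
third_node = AccelerationNode()
resolved = third_node.resolve("id-1", first_node)
```

After the first miss, later uses of the same identifier are served from the third node's own dictionary without contacting the first node.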
[0012] In a tenth aspect according to any one of the previous
aspects, the first acceleration node regularly broadcasts at least a
portion of its dictionary to one or more neighboring acceleration
nodes including the third acceleration node.
[0013] In an eleventh aspect according to any one of the previous
aspects, the data item is a resource to be transmitted via the first
acceleration node across the computer network.
[0014] In a twelfth aspect according to any one of the previous
aspects, the method further includes comparing network traffic at an
acceleration node of the plurality of acceleration nodes with a
predetermined threshold, determining that the network traffic at
the acceleration node of the computer network is below the
predetermined threshold and providing the received data item, a
data identifier identifying the received data item, or both to the
third acceleration node of the computer network in response to
determining that the network traffic at the acceleration node of
the computer network is below the predetermined threshold.
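A possible reading of the twelfth aspect is that synchronization traffic is held back until the measured network traffic drops below the threshold. The queue in the sketch below is an assumption, since the source only requires that entries be provided in response to the traffic being below the threshold. The class `DeferredSync` and the `send` callback are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Tuple


class DeferredSync:
    """Holds dictionary entries until network traffic is below a threshold."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.pending: Deque[Tuple[str, bytes]] = deque()

    def enqueue(self, identifier: str, item: bytes) -> None:
        self.pending.append((identifier, item))

    def flush_if_idle(self, current_traffic: float,
                      send: Callable[[str, bytes], None]) -> int:
        # Only synchronize while traffic is below the predetermined threshold.
        if current_traffic >= self.threshold:
            return 0
        sent = 0
        while self.pending:
            send(*self.pending.popleft())
            sent += 1
        return sent


received = []
sync = DeferredSync(threshold=100.0)
sync.enqueue("id-1", b"alpha")
busy = sync.flush_if_idle(250.0, lambda i, d: received.append((i, d)))
idle = sync.flush_if_idle(10.0, lambda i, d: received.append((i, d)))
```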
[0015] In a thirteenth aspect according to any one of the previous
aspects, the method further includes generating at the first
acceleration node a data item to be included in the dictionary of
the first acceleration node, determining a data identifier
identifying the generated data item and transmitting the generated
data item or a combination of the generated data item and the
calculated data identifier to the third acceleration node.
[0016] In a fourteenth aspect according to any one of the previous
aspects, the method further includes determining, by the first
acceleration node, a number of times a predetermined data
identifier is used in communication with other acceleration nodes
in a predetermined period of time, comparing the number of times
with a threshold number of times and deleting a data item
identified by the data identifier from the dictionary upon
determining that the number of times is less than the threshold
number of times.
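The usage-based deletion of the fourteenth aspect can be sketched as a pass over per-identifier counters collected during the predetermined period. How the counters are gathered and reset is not shown. The names `evict_rarely_used` and `usage_counts` are hypothetical.

```python
from typing import Dict, List


def evict_rarely_used(dictionary: Dict[str, bytes],
                      usage_counts: Dict[str, int],
                      threshold: int) -> List[str]:
    """Delete entries used fewer than `threshold` times; return their ids."""
    # Assumption: an identifier with no recorded use counts as used zero times.
    evicted = [identifier for identifier in dictionary
               if usage_counts.get(identifier, 0) < threshold]
    for identifier in evicted:
        del dictionary[identifier]
    return evicted


entries = {"a": b"1", "b": b"2", "c": b"3"}
removed = evict_rarely_used(entries, {"a": 5, "b": 1}, threshold=2)
```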
[0017] In a fifteenth aspect according to any one of the previous
aspects, the client-server network includes at least three different
subsets of the plurality of acceleration nodes each subset
including at least one acceleration node, wherein the first
acceleration node is included in the first subset, the second
acceleration node is included in the second subset, and the third
acceleration node is included in a third subset of acceleration
nodes, the first acceleration node including at least one
additional dictionary and the method further includes providing a
data identifier of the additional dictionary to another
acceleration node of the first subset of acceleration nodes to
identify a data item based on the data identifier of the additional
dictionary.
[0018] In a sixteenth aspect according to any one of the previous
aspects, the method further includes regularly synchronizing all
dictionaries of all acceleration nodes of the computer network or a
subset of acceleration nodes of the computer network, wherein, after
synchronization has been completed, all dictionaries of the
acceleration nodes of the computer network at least partially
include identical dictionary entries.
[0019] In a seventeenth aspect according to any one of the previous
aspects, the method further includes adding the third acceleration
node to the computer network without a populated dictionary or in
which a second dictionary of the third acceleration node has been
partially or completely lost and building or recovering the second
dictionary of the third acceleration node by receiving data from
other acceleration nodes of the computer network.
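Building or recovering a dictionary from peers, as in the seventeenth aspect, can be illustrated by merging the entries received from other acceleration nodes. The sketch assumes identifiers are derived from content, so that equal identifiers from different peers refer to the same data item. The function name `rebuild_dictionary` is hypothetical.

```python
from typing import Dict, Iterable


def rebuild_dictionary(peer_dictionaries: Iterable[Dict[str, bytes]]) -> Dict[str, bytes]:
    """Merge entries received from peers into a fresh dictionary."""
    rebuilt: Dict[str, bytes] = {}
    for peer in peer_dictionaries:
        for identifier, item in peer.items():
            # Keep the first copy seen; later duplicates are assumed identical.
            rebuilt.setdefault(identifier, item)
    return rebuilt


peers = [{"a": b"1"}, {"a": b"1", "b": b"2"}]
recovered = rebuild_dictionary(peers)
```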
[0020] In an eighteenth aspect according to any one of the previous
aspects, the data item is received by the first network node as part of
a communication process between a server and a client.
[0021] In a nineteenth aspect a system comprises one or more
processors and a computer-readable medium storing instructions
executable by the one or more processors to perform operations
including identifying a first acceleration node included in a
computer network comprising two or more acceleration nodes, an
acceleration node to accelerate transmission of resources between a
client computer system and a server computer system connected
through the acceleration node, the first acceleration node
including a first dictionary of data items and data identifiers,
each data identifier identifying a corresponding data item, and
wherein an acceleration node is configured to provide data
identifiers to other acceleration nodes and to identify a data item
based on a data identifier received from another acceleration node,
receiving, at the first acceleration node and from a second
acceleration node, a data item, including the data item in the
first dictionary and providing the received data item, a data
identifier identifying the received data item, or both to a third
acceleration node of the computer network, wherein the data
identifier is determined at the first acceleration node or obtained
from another acceleration node.
[0022] In a twentieth aspect according to the nineteenth aspect, the
computer-readable medium further stores instructions executable by
the one or more processors to perform operations including
estimating an amount of resources and/or time it takes for the
third acceleration node to determine the data identifier
identifying the received data item and an amount of resources
and/or time it takes to transmit the data identifier from the first
acceleration node to the third acceleration node, comparing the
amount of resources and/or time it takes for the third acceleration
node to determine the data identifier identifying the received data
item and the amount of resources and/or time it takes to transmit
the data identifier from the first acceleration node to the third
acceleration node, transmitting the data identifier identifying the
received data item from the first acceleration node to the third
network node if the amount of resources and/or time it takes for
the third acceleration node to determine the data identifier
identifying the received data item is larger than the amount of
resources and/or time it takes to transmit the data identifier from
the first acceleration node to the third acceleration node and
letting the third network node calculate the data identifier if the
amount of resources and/or time it takes for the third acceleration
node to determine the data identifier identifying the received data
item is smaller than the amount of resources and/or time it takes
to transmit the data identifier from the first acceleration node to
the third acceleration node.
[0023] In a twenty-first aspect a computer readable medium stores
instructions thereon which when executed by a processor cause the
processor to identify a first acceleration node included in a
computer network comprising two or more acceleration nodes, an
acceleration node to accelerate transmission of resources between a
client computer system and a server computer system connected
through the acceleration node, the first acceleration node
including a first dictionary of data items and data identifiers,
each data identifier identifying a corresponding data item, and
wherein an acceleration node is configured to provide data
identifiers to other acceleration nodes and to identify a data item
based on a data identifier received from another acceleration node,
receive, at the first acceleration node and from a second
acceleration node, a data item, include the data item in the first
dictionary and provide the received data item, a data identifier
identifying the received data item, or both to a third acceleration
node of the computer network, wherein the data identifier is
determined at the first acceleration node or obtained from another
acceleration node.
[0024] In a twenty-second aspect according to the twenty-first
aspect the computer readable medium further stores instructions
which when executed by a processor cause the processor to estimate
an amount of resources and/or time it takes for the third
acceleration node to determine the data identifier identifying the
received data item and an amount of resources and/or time it takes
to transmit the data identifier from the first acceleration node to
the third acceleration node, compare the amount of resources and/or
time it takes for the third acceleration node to determine the data
identifier identifying the received data item and the amount of
resources and/or time it takes to transmit the data identifier from
the first acceleration node to the third acceleration node,
transmit the data identifier identifying the received data item
from the first acceleration node to the third network node if the
amount of resources and/or time it takes for the third acceleration
node to determine the data identifier identifying the received data
item is larger than the amount of resources and/or time it takes to
transmit the data identifier from the first acceleration node to
the third acceleration node and let the third network node
calculate the data identifier if the amount of resources and/or
time it takes for the third acceleration node to determine the data
identifier identifying the received data item is smaller than the
amount of resources and/or time it takes to transmit the data
identifier from the first acceleration node to the third
acceleration node.
[0025] In a twenty-third aspect, a system comprises one or more
processors and a computer-readable medium storing instructions
executable by the one or more processors to perform operations
according to any of aspects 1 to 18.
[0026] In a twenty-fourth aspect, a computer-readable medium stores
instructions executable by one or more processors to perform
operations according to any of aspects 1 to 18.
[0027] In a twenty-fifth aspect according to any of aspects 1 to 18
the computer network includes a cloud computing environment.
BRIEF DESCRIPTION OF THE DRAWINGS
[0028] FIG. 1 illustrates an example client-server network.
[0029] FIG. 2 illustrates an example client-server network
including multiple servers, multiple clients, multiple server
front-end nodes and multiple client front-end nodes at the
beginning of a dictionary synchronization process.
[0030] FIG. 3 illustrates the client-server network of FIG. 2 after
the dictionary synchronization process has been completed.
[0031] FIG. 4 illustrates an example method for synchronizing two
dictionaries of acceleration nodes in a client-server network.
[0032] FIG. 5 illustrates another example method for synchronizing
two dictionaries of acceleration nodes in a client-server
network.
[0033] While generally described as computer-implemented software
embodied on tangible media that processes and transforms the
respective data, some or all of the aspects may be
computer-implemented methods or further included in respective
systems or other devices for performing this described
functionality. The details of these and other aspects and
implementations of the present disclosure are set forth in the
accompanying drawings and the description below. Other features and
advantages of the disclosure will be apparent from the description
and drawings, and from the claims.
DETAILED DESCRIPTION
[0034] This disclosure relates to synchronizing dictionaries of
acceleration nodes in a computer network (e.g., a client-server
network).
[0035] For example, dictionaries of a plurality of acceleration
nodes of a client-server network can be synchronized to each
include one or more identical data items and data identifier pairs.
Synchronization can include transmitting a particular data item, or
a combination of a data item and an associated data identifier, to
another acceleration node which includes it in its dictionary. A
particular acceleration node can, instead of transmitting a data
item, transmit an associated data identifier to another
acceleration node. As all (or a subset) of acceleration nodes can
have an identical dictionary when employing the methods described
herein, the particular acceleration node can use the same
dictionary to communicate with all (or the subset of) other
acceleration nodes of the computer network.
[0036] By implementing the techniques described here, the required
memory at a network node (e.g., an acceleration node) of the
computer network can be reduced by reducing a number of
dictionaries that have to be stored at the network node to
communicate with different other network nodes (e.g., acceleration
nodes) and/or by reducing an amount of duplicated data ("data
de-duplication") in the dictionaries of the network nodes. In some
examples, a single dictionary per network node can be sufficient to
handle communication in the computer network. In this manner,
duplicated data can be removed from the dictionaries of the network
nodes. In some examples, every data item to be transmitted over the
computer network is only represented once in the dictionaries of
the acceleration nodes. In addition, a processor load for a network
node (e.g., an acceleration node) of the computer network for
synchronizing a dictionary can be reduced. Also, network traffic in
the computer network when updating dictionaries of one or more
network nodes can be reduced and/or more evenly distributed in
time. In particular, additional network traffic in peak times,
where the amount of data transported across the computer network is
highest, can be avoided. Moreover, network nodes (e.g.,
acceleration nodes) can "train themselves," i.e., the network nodes
can self-update their dictionaries to contain certain data items
before a predetermined transmission process utilizes the dictionary
to provide a faster transmission between network nodes. Further,
the network can dynamically synchronize dictionaries of the network
nodes to adapt to different requirements during operation.
Furthermore, additional network nodes (e.g., new acceleration
nodes) can be added conveniently and flexibly as the dictionaries
of the added network nodes can be added in a dynamic fashion. In
addition, lost or partially lost dictionaries of particular network
nodes can be restored as neighboring network nodes of the
particular network nodes can have identical dictionaries.
[0037] FIG. 1 shows an example client-server network including
multiple network nodes 102a, 102b, 102c, 104, 108a, 108b, 108c,
110. One or more of the nodes in the network can be acceleration
nodes, each of which can accelerate transmission of resources
between a client computer system and a server computer system. As
described in detail below, an acceleration node can include a
dictionary of data items and data identifiers. Each data identifier
can identify a corresponding data item. In some implementations, an
acceleration node can provide data identifiers to one or more of
the other acceleration nodes. Alternatively, or in addition, an
acceleration node can identify a data item based on data
identifiers received from another acceleration node. In some
implementations, a first acceleration node can receive a data item
from a second acceleration node. The first acceleration node can
include the data item in the first dictionary included in the first
acceleration node. The first acceleration node can provide the
received data item, a data identifier identifying the received data
item, or both to a third acceleration node of the computer network.
The first acceleration node can have determined the data identifier
or obtained the data identifier from another acceleration node.
[0038] In the example of FIG. 1, the client-server network contains
one or more servers 104 and multiple clients 102a, 102b, 102c in
communication with the one or more servers 104. In some
implementations, the server is a database server arranged to
provide database services. The server 104 is connected to a server
front-end node (SFE) 110 which is arranged to receive data from the
server 104 to be transmitted to the clients 102a, 102b, 102c
through client front ends (CFEs) 108a, 108b, 108c respectively and
vice versa. The client front end nodes 108a, 108b, 108c and the
server front-end nodes 110 can be configured to accelerate
communication between the server 104 and the clients 102a, 102b,
102c. Thus, the client and server front end nodes are acceleration
nodes of the client-server network. In some implementations, the
network connection 105d between the server front-end node (SFE) 110
and the server 104 and the connections 105a, 105b, 105c between the
client front-end nodes (CFEs) 108a, 108b, 108c respectively and
their respective clients 102a, 102b, 102c each include a local area
network connection. The server front-end node 110 and the client
front-end nodes 108a, 108b, 108c are connected via wide area
connections 107a, 107b, 107c respectively. The client front-end
nodes 108a, 108b, 108c can also be mutually connected via network
connections (e.g., local area network connections or wide area
network connections).
[0039] In some implementations, a particular client 102b requests a
service from the server 104. In the course of the execution of this
request, data is transmitted between the server 104 and the client
102b. For instance, the request can be a request for the homepage
of a website of a company. Serving a single request can include
multiple data transmission cycles between the server 104 and the
particular client 102b. In the example communication between the
server 104 located in a first geographic location and the client
102b located in a second geographic location remote from the first
geographic location over wide area network connection 107b,
bandwidth limitations, latency and congestion can add up to a
considerable delay in providing the requested services to the
client. For example, it can take up to a minute or more in a
typical client-server network to serve the request for the said
homepage. The bottleneck for this communication can be, among
others, transmitting data over the wide area connection between the
server and the client.
[0040] In the system illustrated in FIG. 1, data transmitted
between the server 104 and one of the clients 102a, 102b, 102c is
routed through server front-end node 110 and at least one of the
client front-end nodes 108a, 108b, 108c. In general, any network
node implementing functions to accelerate communication between
network nodes of a computer network is an acceleration node.
Therefore, the methods and systems of the present disclosure are
not limited to server and client front-end nodes but can also be
applied to other acceleration nodes (and, as described below, also
to general network nodes including dictionaries used for
communication).
[0041] In order to accelerate communication between the server 104
and the clients 102a, 102b, 102c, the client front-end nodes 108a,
108b, 108c and server front-end nodes 110 can compress data before
it is sent over the wide area network connections 107a, 107b, 107c.
Alternatively or in addition, the client front-end nodes 108a,
108b, 108c and server front-end nodes 110 can reduce a number of
communication roundtrips over the wide area network connections
107a, 107b, 107c required to execute a predetermined task. This can
include caching data at the client front-end nodes 108a, 108b, 108c
and server front-end node 110 to serve data from a local cache
instead of from the original server (e.g., from a dictionary of the
server 104). This can also include keeping network connections
(e.g., transmission control protocol connections) open to avoid
latency caused by re-opening network connections. In addition or
alternatively, prioritizing services, caching of redundant traffic
and reducing of packet loss by establishing multiple network
connections in parallel can be employed to reduce delays.
[0042] The measures described above may include using dictionaries
when transmitting data between the client front-end nodes 108a,
108b, 108c and server front-end node 110. A dictionary includes
data items. In addition, a dictionary includes data identifiers
associated with and identifying the data items. The data
identifiers and the data items are referred to as "dictionary
entries." The dictionary can also have additional dictionary
entries. For example, if data identifiers and data items are not
one-to-one matched and thus their relationship may be ambiguous
(e.g., a data identifier is associated with two or more data items)
additional dictionary entries for resolving this ambiguity can be
provided. In other embodiments, an ambiguous data identifier can be
transmitted together with a unique prefix of the data item
associated with the data identifier. In this situation there might
not be the need to store additional dictionary entries. In most
examples, the data identifier uses less memory space than the data
item it is associated with and identifies. However, in some
examples the data identifier can also be longer than the data item
it identifies (e.g., to add additional information or redundancy).
In some implementations, the dictionary includes pairs of data
identifiers and associated data items. In other examples, the
dictionary can include multiple levels of data identifiers. Each
network node of a computer network can have one or more
dictionaries.
[0043] The term "data item" includes any data associated with the
network nodes of a computer network. For example, data to be
transmitted over a network connection can be segmented into data
items and stored (e.g., cached) in a database. In other examples,
the data items can include resources from both sides of a network
connection. The segmentation of data into data items can be
executed according to any convenient segmentation algorithm. For
example, data can be segmented based on its content. For instance,
if a web page is to be segmented into data items, different images
or other objects can be put in one separate data item each.
However, the content can also be fragmented and stored in multiple
data items. In other embodiments, a data item can be a series of
bits that forms part of a resource. For instance, the resource can
be a file. The data item can also be a series of bits that form
part of a data packet or a buffer content. The same data item may
occur in two or more different resources of the computer network
(e.g., on different web pages or in different files). In other
examples, the data is segmented into data items based on an order
in which the data is going to be transmitted. A particular data
item can appear in two or more different resources of the computer
network (e.g., files, web sites). In this situation, the dictionary
of a network node only has to include one dictionary entry
including the data item. When transmitting the different resources
including the identical data item, the network node can use the
same data identifier.
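The de-duplication effect described above can be sketched as follows.
This is a minimal illustration only: the fixed-size segmentation, the
SHA-256 hash and the names segment and encode_resource are assumptions
made for the example and are not prescribed by the disclosure, which
permits any convenient segmentation algorithm.

```python
import hashlib

def segment(resource: bytes, size: int) -> list:
    """Split a resource into fixed-size data items (assumed scheme)."""
    return [resource[i:i + size] for i in range(0, len(resource), size)]

def encode_resource(resource: bytes, dictionary: dict, size: int = 8) -> list:
    """Return the identifiers of a resource's data items.

    Each distinct data item is stored in the dictionary only once,
    even if it occurs in several resources.
    """
    identifiers = []
    for item in segment(resource, size):
        ident = hashlib.sha256(item).hexdigest()
        dictionary.setdefault(ident, item)  # keep a single entry per item
        identifiers.append(ident)
    return identifiers

dictionary = {}
page_a = b"HEADER--body-of-A"
page_b = b"HEADER--body-of-B"
ids_a = encode_resource(page_a, dictionary)
ids_b = encode_resource(page_b, dictionary)
```

In this sketch the two resources share their first two data items, so
the same data identifiers are reused and the dictionary holds four
entries instead of six.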
[0044] A data identifier identifies an associated data item. In
some examples, the data identifier can be determined from the data
item in a deterministic manner. Some implementations of determining
a data identifier include determining a hash value of the data (or
a portion of the data) of the data item. Employing data identifiers
which can be determined in a deterministic manner is advantageous
as for a given data item the associated data identifier can be
determined independently at each network node by only knowing the
data identifier generation function and applying it on the data
item.
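As a minimal sketch of such a deterministic data identifier generation
function, the following example hashes the content of a data item. The
choice of SHA-256 and the function name data_identifier are assumptions
made for illustration; the disclosure only requires a hash value of the
data or of a portion of the data.

```python
import hashlib

def data_identifier(data_item: bytes) -> str:
    """Derive an identifier from the content of the data item alone."""
    return hashlib.sha256(data_item).hexdigest()

item = b"<img src='logo.png'>"
# Two network nodes that apply the same function to the same data item
# arrive at the same identifier without exchanging it.
id_at_node_a = data_identifier(item)
id_at_node_b = data_identifier(item)
```

Because the function depends only on the data item, no coordination
between the network nodes is needed to agree on the identifier.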
[0045] In communication between two network nodes (e.g.,
acceleration nodes), both involved network nodes can have the same
(or at least partially the same) dictionary. When a data item
included in the dictionaries of a transmitting and receiving
network node arrives at the transmitting network node for
transmittal, the transmitting network node looks up the associated
data identifier in its dictionary. If the network node cannot find
the data identifier, it can calculate it and store the newly
calculated data identifier and the data item in its dictionary.
Instead of transmitting the data item, the transmitting network
node transmits the data identifier associated with the data item.
The receiving network node, after having received the data
identifier, can then look up the data item associated with the
received data identifier. In examples where the data identifier is
shorter than the data item it is associated with, this can reduce
the amount of traffic which needs to pass through the network
connection between the transmitting and receiving network node and
thus reduce the response times for clients' requests. Thus, as
described above, this technique can be employed in acceleration
nodes of a computer network. In the example of FIG. 1 each of the
client front-end nodes 108a, 108b, 108c and server front-end nodes
110 can have one or more dictionaries to communicate with one or
more other client front-end nodes 108a, 108b, 108c and server
front-end nodes 110.
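The exchange described above can be sketched as follows. The class
Node, its methods learn, encode and decode, and the use of SHA-256 are
hypothetical names and choices made for this example only; the sketch
assumes that the receiving network node already holds the data item in
its dictionary.

```python
import hashlib

def data_identifier(data_item: bytes) -> bytes:
    """Assumed identifier function: SHA-256 digest of the data item."""
    return hashlib.sha256(data_item).digest()

class Node:
    """A network node holding one dictionary of item/identifier pairs."""

    def __init__(self):
        self.id_by_item = {}
        self.item_by_id = {}

    def learn(self, data_item: bytes) -> bytes:
        """Calculate the identifier and store the pair in the dictionary."""
        ident = data_identifier(data_item)
        self.id_by_item[data_item] = ident
        self.item_by_id[ident] = data_item
        return ident

    def encode(self, data_item: bytes) -> bytes:
        """Look up the identifier to transmit; calculate it if missing."""
        ident = self.id_by_item.get(data_item)
        if ident is None:
            ident = self.learn(data_item)
        return ident

    def decode(self, ident: bytes) -> bytes:
        """Look up the data item for a received identifier."""
        return self.item_by_id[ident]

sender, receiver = Node(), Node()
page_fragment = b"<header>Example Corp</header>" * 50
receiver.learn(page_fragment)          # dictionaries already synchronized
wire = sender.encode(page_fragment)    # only the identifier is transmitted
restored = receiver.decode(wire)
```

Here a 32-byte identifier crosses the network connection in place of a
data item of more than a thousand bytes.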
[0046] The client front-end nodes 108a, 108b, 108c and server
front-end nodes 110 can include a processing unit configured to
receive and transmit data in a bidirectional fashion through the
network node. The processing unit is adapted to employ any of the
techniques for accelerating communication over the network
described here. The dictionary can be stored in a volatile or
persistent memory of any acceleration node in the network. In most
examples, the dictionary will be stored in a volatile memory (e.g.,
a cache) to provide for a fast access to the dictionary.
[0047] A particular network node can be connected with multiple
other network nodes. In addition, multiple other clients and
servers, together with their client and server front-end nodes, can
be part of the network. Furthermore, the server
front-end nodes or the client front-end nodes can also be directly
connected with each other. This can mean that a particular network
node has to maintain different dictionaries to communicate with
different connected network nodes. An example network topology is
schematically illustrated in FIG. 2. In this example, server
front-end network node 110d is connected with three other client or
server front-end network nodes. Thus, server front-end network node
110d might need one separate dictionary for accelerating
communication with each of the three client or server front-end
network nodes it is directly connected to ("neighboring network
nodes"). This can produce a considerable amount of dictionary data
to be stored in the memory of the server front-end network node
110d.
[0048] The present disclosure provides for a computer-implemented
method as illustrated in FIG. 5, which, among other things, can
decrease the amount of dictionary data to be stored in the memory
of the server front-end network node 110d or any other network
node. The method includes, at 501, identifying a first acceleration
node included in a computer network comprising a plurality of
acceleration nodes, an acceleration node to accelerate transmission
of resources between a client computer system and a server computer
system connected through one or more of the plurality of
acceleration nodes, the first acceleration node including a first
dictionary of data items and data identifiers, each data identifier
identifying a corresponding data item, and wherein an acceleration
node is configured to provide data identifiers to other
acceleration nodes and to identify data items based on data
identifiers received from another acceleration node, at 502,
receiving, at the first acceleration node and from a second
acceleration node, a data item, including, at 503, the data item in
the first dictionary included in the first acceleration node and,
at 504, providing the received data item, a data identifier
identifying the received data item, or both to a third acceleration
node of the computer network, the data identifier being either
determined by the first acceleration node or obtained from another
acceleration node.
[0049] FIG. 2 and FIG. 3 illustrate example computer implemented
systems for synchronizing dictionaries of acceleration nodes in a
computer network. The computer network of FIG. 2 and FIG. 3 is a
client-server system having multiple servers 104a to 104f and
multiple clients 102a to 102f. These servers 104a to 104f and
clients 102a to 102f are connected via a network of client
front-end network nodes 108a to 108f and server front-end network
nodes 110a to 110e. Each client front-end network node 108a to 108f
is connected to at least one of the server front-end network nodes 110a to
110e, for instance via a wide area network connection. FIG. 2 shows
the client-server system in a first state where the dictionary of a
particular network node 108f includes a particular pair of a data
item and an associated data identifier (symbolized by "#" in FIG.
2). The remaining network nodes do not have this pair in their
respective dictionaries (the missing or differing pair is
symbolized by "x" in FIG. 2). At a predetermined point in time or
upon a trigger event, the particular network node 108f transmits a
dictionary entry (e.g., a data item or a combination of a data
identifier and a data item) of its dictionary 116f for accelerating
communication to a server front-end network node 110e. This network
node 110e receives the dictionary entry and updates its dictionary
113e. In addition, server front-end network node 110e may transmit
the dictionary entry to server front-end 110d. In some
implementations, a dictionary entry can be transmitted as-is.
Alternatively, or in addition, the network node can store and
retrieve the dictionary entry or encrypt, compress or otherwise
process it before transmission to a further server front-end
network node 110d, which also updates its dictionary 113d. This
network node can again forward the dictionary entry to more network
nodes 108e, 110c and so on. After a predetermined number of
transmission steps, the dictionary entry can have been propagated
by the client and server front-end network nodes 108a to 108f and
110a to 110e throughout the complete network, and all dictionaries
113a to 113e and 116a to 116f can have been synchronized. This
means that the dictionaries of all server and client front-end
network nodes include the dictionary entry. As can be seen in FIG.
3, the dictionaries of all network nodes include the identical pair
# of dictionary entries. Therefore, for accelerating any
transmission between two client or server front-end network nodes
of the data item of the common pair (e.g., from a client front-end
network node to another client front-end network node, from a
server front-end network node to another server front-end network
node, from a client front-end network node to a server front-end
network node, or vice versa), the same data identifier can be used.
This process can be executed for any number of dictionary entries
(e.g., data items). In this manner, a portion of all dictionaries
of all acceleration nodes (e.g., client and server front-end
network nodes) have corresponding (e.g. identical) dictionary
entries. This can supersede the necessity to have multiple
dictionaries or at least reduce the number of dictionaries required
to communicate with multiple other network nodes. In turn, the
memory requirements at the acceleration nodes (e.g., the client and
server front-end network nodes 108a to 108f, 110a to 110e)
decrease. In addition, an amount of duplicated data can be reduced
as a particular data item can only be represented by a single data
item/data identifier pair in the dictionaries of all acceleration
nodes (e.g., the client and server front-end network nodes 108a to
108f and 110a to 110e). This also can reduce the amount of memory
required at each acceleration node.
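The propagation of a dictionary entry through the network can be
sketched as a simple flooding procedure. The adjacency list below is a
hypothetical excerpt that only covers the nodes named in the
propagation path above (108f, 110e, 110d, 108e, 110c); it is not the
full topology of FIG. 2, and the function name propagate is an
assumption made for the example.

```python
from collections import deque

def propagate(neighbors, dictionaries, origin, ident, item):
    """Flood one dictionary entry from origin to every reachable node.

    Returns the number of transmission steps needed to reach each node.
    """
    dictionaries[origin][ident] = item
    hops = {origin: 0}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for peer in neighbors[node]:
            if peer not in hops:            # forward only to nodes not yet updated
                dictionaries[peer][ident] = item
                hops[peer] = hops[node] + 1
                queue.append(peer)
    return hops

# Hypothetical excerpt of the network, keyed by reference numeral.
neighbors = {
    "108f": ["110e"],
    "110e": ["108f", "110d"],
    "110d": ["110e", "108e", "110c"],
    "108e": ["110d"],
    "110c": ["110d"],
}
dictionaries = {name: {} for name in neighbors}
hops = propagate(neighbors, dictionaries, "108f", "#", b"data item")
```

After the call, every dictionary in the excerpt contains the same pair,
so any two of these nodes can use the same data identifier for the data
item.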
[0050] The dictionary synchronization method described in
connection with FIG. 2 and FIG. 3 can be implemented in several
different ways, which can also be used concurrently in the same
computer network. In some implementations, a first acceleration
node (e.g., client front-end network node 108f) transmits a
predetermined data item to a second network node. The receiving
acceleration node can then determine the data identifier from the
received data item (e.g., by determining a hash value of the data
item). Alternatively, the transmitting acceleration node can
transmit the data identifier and the data item to the receiving
acceleration node. Optionally, the acceleration nodes can determine
if it is more resource efficient to transmit the data identifier
and the associated data item or if it is more resource efficient to
let the receiving acceleration node determine the data identifier
associated with the transmitted data item after having received the
data item. In some implementations, the acceleration nodes
determine which option is faster (e.g., in view of the processing
power and processor load of the receiving acceleration node and the
available bandwidth of the network connection between the
acceleration nodes). For example, due to a temporarily high processor
load of a processor of an acceleration node or due to one
acceleration node having a processor with comparatively low
processing power, it can be temporarily or permanently faster to
transmit the data identifier from a neighboring acceleration node
than to determine it locally at the acceleration node. In other
examples, it can also be faster to determine the data identifier
locally at an acceleration node instead of transmitting it over the
network due to temporary or permanent bandwidth restrictions. In
other embodiments, a first acceleration node can also send the data
item to another acceleration node to determine the data identifier
associated with the data item. After having determined the data
identifier, the other acceleration node can transmit it to the
first acceleration node. This can also be more resource efficient
(e.g., faster) in some situations than the options described above.
Instead of using the processing speed to decide if a data
identifier is transmitted or determined locally, the acceleration
nodes can also use other criteria to decide if a data identifier is
to be transmitted or determined locally. For example, where
network traffic should be as low as possible, data identifiers can
be determined locally by the acceleration nodes. The
criteria described above can also be used in combination or
alternatingly (e.g., depending on the state of a computer network).
The criteria described above can be selected by an administrator of
the computer network. For instance, the administrator can decide
that the network should be optimized to secure fast delivery of
data over the network. Then, "resource efficient" means "time
efficient". In other embodiments, the administrator can decide to
minimize the traffic over the network. In this situation, "resource
efficient" means "bandwidth efficient". A combination of different
optimization criteria is also possible.
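One way to express the decision described above is sketched below. The
cost model (hashing time proportional to the size of the data item,
transmission time proportional to the size of the data identifier), the
parameter names and the two criterion values "time" and "bandwidth" are
assumptions made for illustration and are not taken from the
disclosure.

```python
def send_identifier_with_item(item_bytes: int,
                              id_bytes: int,
                              hash_bytes_per_s: float,
                              link_bytes_per_s: float,
                              criterion: str = "time") -> bool:
    """Decide whether the sender should transmit the identifier as well.

    Returns True to transmit the identifier, False to let the receiving
    acceleration node determine it locally.
    """
    if criterion == "bandwidth":
        # Minimizing traffic: never spend bandwidth on the identifier.
        return False
    if criterion != "time":
        raise ValueError("unknown criterion: " + criterion)
    compute_time = item_bytes / hash_bytes_per_s    # at the receiving node
    transmit_time = id_bytes / link_bytes_per_s     # over the connection
    return compute_time > transmit_time

# Heavily loaded receiver on a fast link: transmit the identifier.
busy_receiver = send_identifier_with_item(1_000_000, 32, 1e6, 1e6)
# Fast receiver on a slow link: determine the identifier locally.
slow_link = send_identifier_with_item(1_000, 32, 1e9, 1_000)
# Administrator selected bandwidth efficiency.
low_traffic = send_identifier_with_item(1_000_000, 32, 1e6, 1e6, "bandwidth")
```

The criterion argument stands in for the administrator's choice between
time efficiency and bandwidth efficiency.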
[0051] FIG. 4 illustrates another example dictionary
synchronization method between two acceleration nodes of a computer
network in the context of a communication process. At 401, a first
acceleration node (e.g., the first server front-end node in FIG. 4)
listens passively for transmissions (e.g., transmissions from the
second server front-end network node and from the first client
front-end network node in FIG. 4). At operation 402, the first
acceleration node starts to communicate with a neighboring
acceleration node (e.g., with the second server front-end network
node in FIG. 4) to carry out a communication process across the
network connection between the first network node and the
neighboring network node. The following update operation can take
place in several different ways. At operation 403, the first
acceleration node can determine if the neighboring network node it
is going to communicate with has a predetermined data item in its
dictionary. For instance, the first acceleration node can use the
information about the other acceleration node's dictionary gathered
while passively listening to broadcasts of the neighboring
acceleration nodes. Depending on the outcome of the determination
operation, the first acceleration node can select one of
several operations. Firstly, at operation 408, if the first
acceleration node has determined that the other acceleration node
has the data item to be transmitted in its dictionary, the first
acceleration node only transmits the data identifier associated
with the data item. At 410, the other acceleration node receives
the data identifier and can identify the associated data item in
its dictionary, if the determination of the first acceleration node
regarding its existence in the other network node's dictionary was
correct. The other acceleration node determines at 412 if the data
item associated with the received data identifier is in its
dictionary. If the other acceleration node does not have the
associated data item in its dictionary, it can ask at 411 the first
acceleration node to transmit the data item. In a second
alternative, the first acceleration node cannot determine if the
data item is present in the other network node's dictionary. In
this case, at operation 408, it can send the data identifier to the
other acceleration node. If the other acceleration node does not
have the associated data item in its dictionary, at operation 411,
it can ask the first acceleration node to transmit it. Thus, in the
example of FIG. 4 the dictionary of the second acceleration node is
updated "on the fly," i.e., in connection with a communication
process (e.g., a transmission of the data item from the first
acceleration node to the second acceleration node) between the
first and second acceleration nodes. For example, the data item can
be part of a transmission of data between a server and a client
(e.g., a web page served to the client).
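The "on the fly" update can be sketched as follows, covering the case
in which the first acceleration node sends the data identifier
(operation 408) and the other acceleration node requests the data item
if it is missing (operations 410 to 412 and 411). The class Receiver,
the function send and the log of transmitted messages are hypothetical
constructs for this example, and SHA-256 is an assumed identifier
function.

```python
import hashlib

def data_identifier(data_item: bytes) -> bytes:
    """Assumed identifier function: SHA-256 digest of the data item."""
    return hashlib.sha256(data_item).digest()

class Receiver:
    """The other acceleration node with its dictionary."""

    def __init__(self):
        self.dictionary = {}

    def on_identifier(self, ident: bytes):
        """Look up the data item; None signals a request for the item."""
        return self.dictionary.get(ident)

    def on_item(self, data_item: bytes) -> bytes:
        """Store the received data item under its identifier."""
        self.dictionary[data_identifier(data_item)] = data_item
        return data_item

def send(data_item: bytes, receiver: Receiver, log: list) -> bytes:
    """Send the identifier first and the data item only on request."""
    ident = data_identifier(data_item)
    log.append(("identifier", len(ident)))
    found = receiver.on_identifier(ident)
    if found is None:                       # receiver asked for the item
        log.append(("item", len(data_item)))
        found = receiver.on_item(data_item)
    return found

log = []
receiver = Receiver()
page = b"<html>served web page</html>" * 20
first = send(page, receiver, log)    # dictionary updated on the fly
second = send(page, receiver, log)   # identifier alone is sufficient
```

The first transmission carries both the data identifier and the data
item; every later transmission of the same data item carries only the
data identifier.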
[0052] Alternatively, if the first acceleration node cannot
determine that the data item is present in the other acceleration
node's dictionary, it can send the data item directly and the other
acceleration node can determine the associated data identifier
(e.g., by determining a hash value of the data item), or receive it
from the first acceleration node as well. In a third alternative,
the first acceleration node has determined that the data item is
not present in the other acceleration node's dictionary. At
operation 404, the first acceleration node transmits the data item.
At operation 406, the first and/or the other acceleration node can
determine if it would be more efficient to also send the data
identifier associated with the data item or let the other
acceleration node determine the data identifier associated with the
received data item (as described above). Depending on the outcome
of this determination operation, the first acceleration node can
either, at operation 408, send the data identifier as well or, at
operation 407, the other acceleration node can determine the data
identifier associated with the received data item.
[0053] In other examples, the first acceleration node can first
transmit a data identifier associated with a predetermined data
item to a second acceleration node. The second acceleration node
receives the data identifier and determines if the data item
associated with the data identifier already exists in its
dictionary. If this is the case, the second acceleration node can
signal the transmitting acceleration node that a transmittal of the
data item associated with the transmitted data identifier is not
required. If the receiving acceleration node determines that the
data item associated with the data identifier is not yet in its
dictionary, the second acceleration node can poll the data item
from the first acceleration node. Alternatively, the second
acceleration node can also poll the data item from another
acceleration node which has it in its dictionary. For instance,
transmission from the other acceleration node can be faster than
transmission from the first acceleration node (e.g., because the
other acceleration node is closer to the second acceleration node
than the first acceleration node). After having received the poll,
the first
acceleration node also transmits the data item associated with the
transmitted data identifier. The receiving acceleration node
receives the data item and updates its dictionary to include the
data item and the associated data identifier.
[0054] As described in connection with FIG. 4, a dictionary
synchronization process can be triggered when a particular data
item is to be transmitted through an acceleration node. In
addition, or alternatively, the acceleration nodes can monitor
network traffic to determine convenient times to carry out a
dictionary synchronization process (e.g., the methods described in
connection with FIG. 4 and FIG. 5). In one example, the
acceleration nodes perform the synchronization operations during
"off-peak times," e.g., when the volume of the network traffic
(e.g., between the acceleration nodes involved in the process) is
below a predetermined threshold. Alternatively or in addition, the
acceleration nodes can monitor priorities of transmission processes
that take place at a certain time at the involved acceleration
nodes. In other examples, the synchronization process is scheduled
to take place regularly, in particular periodically (e.g., once
every day) or event-driven (e.g., after the resources stored at a
predetermined acceleration node have been changed in a
predetermined manner). The methods for triggering the update
process can also be combined. For example, a periodic update can be
combined with network traffic monitoring. In this manner, update
processes can take place regularly while still being confined to
off-peak times.
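A combined trigger of this kind can be sketched as a single predicate.
The function name sync_due, its parameters and the units used are
assumptions made for the example; the disclosure does not specify how
the period or the traffic threshold are represented.

```python
def sync_due(now_s: float,
             last_sync_s: float,
             period_s: float,
             traffic_bytes_per_s: float,
             off_peak_threshold: float) -> bool:
    """Return True when a periodic update is due and traffic is off-peak."""
    period_elapsed = (now_s - last_sync_s) >= period_s
    off_peak = traffic_bytes_per_s < off_peak_threshold
    return period_elapsed and off_peak

ONE_DAY = 24 * 60 * 60
# A day has passed but the network is busy: postpone the update.
during_peak = sync_due(ONE_DAY, 0, ONE_DAY, 9e6, 1e6)
# A day has passed and traffic is below the threshold: synchronize.
during_off_peak = sync_due(ONE_DAY, 0, ONE_DAY, 2e5, 1e6)
```

An acceleration node could evaluate such a predicate whenever it samples
its network traffic.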
[0055] The dictionary synchronization methods described herein can
be initiated locally by each acceleration node of the computer
network, or they can be scheduled globally for all acceleration
nodes of the computer network. A combination of both concepts is
also possible. In one embodiment, an acceleration node initiates a
synchronization operation while transmitting data to another
acceleration node (e.g., as described in connection with FIG. 4).
In other embodiments, a schedule for a dictionary synchronization
process can be provided in the computer network. The schedule can
include information that indicates which acceleration node updates
the dictionaries of which neighboring acceleration nodes, whether
periodically or in an event-driven manner. In other examples, the dictionary
synchronization process can be initiated by a particular
acceleration node upon occurrence of a trigger event (e.g., network
traffic below a predetermined threshold at the particular
acceleration node or new/modified data available at the
acceleration node). In this situation, multiple acceleration nodes
in a computer network can initiate dictionary synchronization
processes at the same time or at different times.
[0056] In the methods described herein, an acceleration node can
keep track of the data items, the data identifiers, or both it
transmits to or receives from neighboring acceleration nodes. In
other examples, an acceleration node can listen to broadcasts of
dictionary entries of other acceleration nodes. For example, an
acceleration node can store which data items or which data
identifiers (or both) have been transmitted to a particular
neighboring acceleration node. Likewise, the acceleration node can
store which data items or which data identifiers (or both) have
been received from a particular neighboring acceleration node, be
it via point-to-point communication or via a broadcast. In addition
or alternatively, an acceleration node can communicate (e.g., via
broadcast or point-to-point communication) that it has deleted or
is going to delete a particular data item from its dictionary. In
addition or alternatively, an acceleration node can communicate
(e.g., via broadcast or point-to-point communication) that it has
detected a collision in a function generating the data identifiers
(e.g., two different data items resulting in the same data
identifier). This information can be used by the acceleration nodes
to coordinate data item and data identifier transmission operations
in a dictionary synchronization process. For instance, the first
acceleration node can refrain from transmitting a data identifier
or a data item to a predetermined other acceleration node for a
predetermined time after having sent the data identifier or data
item. Alternatively or in addition, the first acceleration node can
refrain from sending a data identifier or a data item obtained from
a predetermined other acceleration node for a predetermined time
after having obtained the data identifier or data item.
Alternatively or in addition, the first acceleration node can
determine which dictionary entries are missing in the dictionary of
one or more neighboring acceleration nodes.
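One possible way to keep this bookkeeping is sketched below. The class `NeighborTracker`, its method names and the hold-off parameter are hypothetical. The sketch treats an entry as known to a neighbor once its data identifier has been sent to or received from that neighbor, which is a simplifying assumption.

```python
class NeighborTracker:
    """Tracks which data identifiers were exchanged with each neighbor."""

    def __init__(self, hold_off_s: float):
        self.hold_off_s = hold_off_s  # the "predetermined time"
        self.sent = {}      # neighbor -> {identifier: time last sent}
        self.received = {}  # neighbor -> set of identifiers received

    def record_sent(self, neighbor, identifier, now_s):
        self.sent.setdefault(neighbor, {})[identifier] = now_s

    def record_received(self, neighbor, identifier):
        self.received.setdefault(neighbor, set()).add(identifier)

    def may_send(self, neighbor, identifier, now_s):
        # Refrain from re-sending until the hold-off time has elapsed.
        last = self.sent.get(neighbor, {}).get(identifier)
        return last is None or now_s - last >= self.hold_off_s

    def presumed_missing(self, neighbor, own_identifiers):
        # Entries never exchanged with this neighbor are presumed missing there.
        known = set(self.sent.get(neighbor, {}))
        known |= self.received.get(neighbor, set())
        return set(own_identifiers) - known
```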
[0057] In other examples, the acceleration nodes can regularly
broadcast dictionary entries (for example, an acceleration node
can broadcast which dictionary entries it is familiar with) to
neighboring acceleration nodes, whether by broadcasting the data
item or broadcasting the data identifier or both. The acceleration
nodes can identify which dictionary entries their neighboring
acceleration nodes are familiar with and which dictionary entries
of their own dictionaries are unknown to the neighboring
acceleration nodes. Then, an acceleration node having a dictionary
entry not in the dictionary of one or more neighboring acceleration
nodes can broadcast the dictionary entry or transmit it via
point-to-point communication to the acceleration nodes lacking the
dictionary entry. In other examples, an acceleration node can
determine that a dictionary entry broadcast by another acceleration node
is missing in its dictionary and update its dictionary (e.g., by
asking the other acceleration node to transmit a dictionary entry,
or by calculating its data identifier on its own assuming the
broadcast included the data item). In some examples, an
acceleration node broadcasts only the data identifiers of its
dictionary to keep the amount of data transmitted as low as
possible. By employing the methods described in the present
paragraph, the "more knowledgeable" acceleration nodes can "teach"
the "less knowledgeable" acceleration nodes.
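The receiving side of such a broadcast can be sketched as follows. The disclosure does not prescribe a particular function for generating data identifiers. SHA-256 is used here purely as an assumed stand-in, and the function names are hypothetical.

```python
import hashlib


def data_identifier(data_item: bytes) -> str:
    # Assumption: SHA-256 stands in for the identifier-generating function.
    return hashlib.sha256(data_item).hexdigest()


def missing_identifiers(own: dict, announced_ids) -> set:
    """Identifiers a neighbor announced that are absent from the local dictionary."""
    return set(announced_ids) - set(own)


def learn_from_item_broadcast(own: dict, data_item: bytes) -> str:
    """Add a broadcast data item, calculating its identifier locally."""
    ident = data_identifier(data_item)
    # Keep an existing entry untouched; collision handling is out of scope here.
    own.setdefault(ident, data_item)
    return ident
```

When only data identifiers are broadcast, the result of `missing_identifiers` is the set of entries the node would have to request from the broadcasting node. When the data item itself is broadcast, no further request is needed because the identifier can be calculated locally.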
[0058] As described above, every acceleration node in a network (or
a portion of a network) can have only a single dictionary when
using the dictionary synchronization methods described herein.
However, in some examples only selected acceleration nodes of a
network employ the dictionary synchronization methods described
herein and communicate using the synchronized dictionaries between
each other. Additionally, these acceleration nodes can have one or
more additional dictionaries for communication with other
acceleration nodes. For instance, groups of acceleration nodes can
be clustered in regional clusters (e.g. based on their location),
where one or more acceleration nodes of each cluster directly
communicate with corresponding acceleration nodes of other
clusters. The remaining acceleration nodes only communicate
directly with acceleration nodes within their regional cluster. In
this system, the dictionary synchronization processes described
herein can be employed to synchronize only the dictionaries for
inter-cluster communication. For communication within one cluster,
the acceleration nodes can use other dictionaries. Optionally,
acceleration nodes of a particular cluster of acceleration nodes
can have a second dictionary that is also synchronized using the
methods described herein for intra-cluster communication.
[0059] As described above, the data items to be included in the
dictionary can be any data stored at the transmitting acceleration
node or any resource of a network node (e.g., a resource of the
server or of the client). By using the dictionary synchronization
methods described herein, the dictionaries of all or of a sub-set
of acceleration nodes can also be populated with data items before
the actual data items are used in a service request of the
client-server network. For example, a first acceleration node can
modify a particular data item in its dictionary to generate a new
data item not yet in its dictionary. Additionally or alternatively,
a first acceleration node can generate a random data item (e.g., by
concatenation of random bits). In this manner, the acceleration
node can "invent" new data items and "prophylactically" prepare
itself or other acceleration nodes for transmitting these data
items (or their associated data identifiers). The first
acceleration node can transmit a newly generated data item to
neighboring acceleration nodes as described above. For instance,
the first acceleration node can transmit the data item and an
associated data identifier, or only the data item. In this manner,
dictionaries of the first and other acceleration nodes can be
populated with dictionary entries at off-peak times, which can
accelerate communication in times of high network traffic
volume.
[0060] The methods described herein can also be used to populate
the dictionaries of acceleration nodes added to a computer network.
In general, acceleration nodes can be added to a computer network
with a pre-installed dictionary, or with no pre-installed
dictionary. Other acceleration nodes having a populated dictionary
can transmit (e.g., broadcast) dictionary entries to the newly
added acceleration node. In this manner, the dictionary of the
newly added acceleration node is built or dictionary entries of a
pre-installed dictionary can be updated and synchronized with the
existing dictionaries of other acceleration nodes in the computer
network. Thus, the methods described herein can provide for a
dynamic and flexible dictionary synchronization process in which
new nodes can be easily integrated in an existing computer network.
The methods described herein can also be used to back up the
dictionary of one or more acceleration nodes in the computer
network. For example, if a particular acceleration node loses part
of its dictionary or its complete dictionary (e.g., an in-memory
dictionary), the neighboring acceleration nodes can populate the
particular acceleration node's dictionary by the dictionary
synchronization operations described herein.
[0061] The methods described herein can also be used by a
particular acceleration node to validate its dictionary. For
instance, all acceleration nodes in a particular computer network
(or a portion of a network) can be synchronized to have identical
dictionaries. At a certain point in time, a particular acceleration
node can check the validity of its dictionary by comparing data
items and/or data identifiers obtained from or monitored in other
acceleration nodes with its own dictionary. If there is a
discrepancy, the particular acceleration node can determine that
its dictionary is (at least partially) invalid. Optionally, the
acceleration node can request dictionary entries from other
acceleration nodes to replace the invalid dictionary entries. This
provides for a built-in error checking operation.
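A minimal sketch of such a validity check is given below. It assumes that dictionaries are held as mappings from data identifier to data item and that the entries observed at other acceleration nodes are taken as the reference. Both assumptions, as well as the function names, are introduced only for illustration.

```python
def find_discrepancies(own: dict, observed: dict) -> set:
    """Identifiers whose local data item differs from the observed one."""
    return {
        ident
        for ident, item in observed.items()
        if ident in own and own[ident] != item
    }


def repair(own: dict, observed: dict) -> set:
    """Replace invalid local entries with the observed entries."""
    invalid = find_discrepancies(own, observed)
    for ident in invalid:
        own[ident] = observed[ident]
    return invalid
```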
[0062] The methods described herein relate to dictionary
synchronization of network nodes connected in a network to include
identical or corresponding dictionary entries. In some examples, the
dictionaries of all acceleration nodes or a subgroup of three or
more acceleration nodes in a computer network include identical
dictionary entries. In other examples, only a portion of the
dictionary of each acceleration node is synchronized using the
methods as described herein. In other examples, the different
acceleration nodes can dynamically delete or rearrange their
dictionaries (or parts of their dictionaries). For example, a
dictionary of a particular acceleration node can have a
predetermined maximum size. As long as the current dictionary is
smaller than this maximum size, the acceleration node can add
dictionary entries to its dictionary. However, once the
dictionary size reaches the maximum size, the acceleration node can
delete a dictionary entry whenever a new dictionary entry is
received during a synchronization operation. The dictionary entries
to be deleted can be selected based on one or more of multiple
criteria. For instance, the least popular dictionary entry can be
deleted from the dictionary. The popularity of a dictionary entry
can be measured by its use frequency in communication in the
overall network or at the particular acceleration node. In other
examples, a dictionary entry which has not been used for the
longest time (recency) in communication can be deleted from the
dictionary. A combination of popularity and recency can also be
used by the acceleration nodes to select a dictionary entry to be
deleted. In other examples, acceleration nodes can decide not to
include dictionary entries relating to data items that do not meet
one or more predetermined criteria even if their maximum dictionary
size has not yet been reached. In this fashion, the acceleration
nodes can ensure that their dictionaries do not grow excessively
(after all, dictionaries are often stored in-memory to be quickly
available) and that their respective dictionaries are tailored to
the data transmitted over the particular acceleration node. For
instance, a first acceleration node may seldom or never transmit a
first resource that is transmitted frequently by a second
acceleration node in a predetermined period of time. In this
situation, the first acceleration node does not include data items
of this resource in its dictionary. However, as described above, a
data item can be associated with multiple resources. Therefore,
even though the first acceleration node might not transmit the
first resource, it can nevertheless have the data item in its
dictionary. On the other hand, the second acceleration node can
have one or more entries including data items of the resource in
its dictionary.
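The size-limited dictionary with deletion based on popularity and recency can be sketched as follows. The class `BoundedDictionary`, the logical clock and the particular tie-breaking rule (least used first, then least recently used) are assumptions chosen for this example. A maximum size of at least one entry is assumed.

```python
class BoundedDictionary:
    """Dictionary with a maximum number of entries and an eviction rule."""

    def __init__(self, max_entries: int):
        self.max_entries = max_entries
        self.entries = {}    # identifier -> data item
        self.use_count = {}  # identifier -> number of uses (popularity)
        self.last_used = {}  # identifier -> logical time of last use (recency)
        self._clock = 0

    def _tick(self) -> int:
        self._clock += 1
        return self._clock

    def use(self, identifier):
        """Look up an entry and record the use."""
        item = self.entries[identifier]
        self.use_count[identifier] += 1
        self.last_used[identifier] = self._tick()
        return item

    def add(self, identifier, data_item) -> None:
        """Add an entry, deleting one existing entry if the dictionary is full."""
        if identifier in self.entries:
            return
        if len(self.entries) >= self.max_entries:
            # Least popular first; ties broken by least recent use.
            victim = min(
                self.entries,
                key=lambda i: (self.use_count[i], self.last_used[i]),
            )
            for table in (self.entries, self.use_count, self.last_used):
                del table[victim]
        self.entries[identifier] = data_item
        self.use_count[identifier] = 0
        self.last_used[identifier] = self._tick()
```

Other selection criteria, such as a weighted combination of use frequency and recency, could be substituted in the `key` function without changing the rest of the sketch.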
[0063] Two identical data items in two dictionaries can encode the
same content (e.g., a picture in a web page to be served). However,
that does not mean that the dictionary entries have to be stored in
an identical format or manner in the memories of the different
network nodes. Likewise, if herein it is described that dictionary
entries are obtained, transmitted or received, that refers to the
content of the dictionary entry (e.g., a particular data item or
data identifier). The dictionary entries can be processed (e.g.,
encoded, decoded or compressed) for transmission in different ways
and still include identical content. Moreover, if a dictionary
entry is obtained, stored and then transmitted or transmitted
multiple times, this again refers to the content and not to the
particular data piece encoding the content.
[0064] Client-server networks have been described above.
Client-server networks are particular computer networks. A computer
network includes a plurality of network nodes communicating via
network connections. However, the methods and systems described
herein can be equally applied in other computer networks including
multiple network nodes using dictionaries of any form including
data items and data identifiers as described above for
communication between the network nodes. Moreover, communication
between acceleration nodes of a client-server network is described
above. An acceleration node is a particular network node
whose attributes are described above. The methods and systems
described herein can be equally applied to other network nodes
besides acceleration nodes. For example, the methods and systems
described herein can be applied to server or client nodes or to
network nodes having other functions.
[0065] Even though different components of the system 100 of FIG. 1
are symbolized using symbols for physical devices, FIG. 1 depicts a
view of the functional units of the computer network. These
functional units can be embodied in many different hardware
configurations. For instance, each functional unit can be hosted on
a dedicated device. Alternatively, multiple functional units can be
hosted on the same host device, or any mixture of the two (or
more). Further details regarding possible hardware implementations
of the functional units are described below. The same is true for
the computer networks depicted in FIG. 2 and FIG. 3.
[0066] In one embodiment, the computer networks described herein
include a cloud computing environment (e.g., some or all of the
server-side network nodes in the client-server networks of FIG. 1,
FIG. 2 and FIG. 3 can be included in a cloud computing
environment). Then, the functional units can be distributed over
multiple computer systems. For instance, network nodes (e.g.,
acceleration nodes) using the dictionary synchronization methods
described herein can be part of the cloud computing environment
(i.e., an environment for distributed computing over a network
including the network nodes). In one embodiment, a client can
request a service and this service is (at least partially)
processed by network nodes of a cloud computing environment. In
these embodiments, the synchronized dictionaries can be used to
accelerate communication between different network nodes of the
cloud computing environment.
[0067] At a high level, the clients, servers and network nodes
(e.g., acceleration nodes) are associated with a computer or
processor. A computer or processor comprises an electronic
computing unit (e.g., a processor) operable to receive, transmit,
process, store, or manage data and information associated with an
operating environment of the database system. As used in the
present disclosure, the term "computer" or "processor" is intended
to encompass any suitable processing device. The term "processor"
is to be understood as being a single processor that is configured
to perform operations as defined by one or more aspects described
in this disclosure, or the "processor" comprises two or more
processors, that are configured to perform the same operations,
e.g. in a manner that the operations are distributed among the two
or more processors. The processor may comprise multiple organic
field-effect transistors or thin film transistors or a combination
thereof. This may allow processing the operations in parallel by
the two or more processors. The two or more processors may be
arranged within a supercomputer, and the supercomputer may comprise
multiple cores allowing for parallel processing of the operations.
For instance, the computer or processor may be a desktop or a laptop
computer, a cellular phone, a smartphone, a personal digital
assistant, a tablet computer, an e-book reader or a mobile player
of media. Furthermore, the operating environment of the database
system can be implemented using any number of servers, as well as
computers other than servers, including a server pool. Indeed, the
computer or processor and the server may be any computer or
processing device such as, for example, a blade server,
general-purpose personal computer (PC), Macintosh, workstation,
Unix-based workstation, or any other suitable device. In other
words, the present disclosure contemplates computers other than
general purpose computers, as well as computers without
conventional operating systems. Further, the computer, processor
and server may be adapted to execute any operating system,
including Linux, Unix, Windows, Mac OS, iOS, Android or any other
suitable operating system.
[0068] The term "computing device", "server" or "processor"
encompasses all kinds of apparatus, devices, and machines for
processing data, including by way of example a programmable
processor, a computer, a system on a chip, or multiple ones, or
combinations of the foregoing. The apparatus can include special
purpose logic circuitry, e.g., an FPGA (field programmable gate
array), a CUDA (Compute Unified Device Architecture) or an ASIC
(application specific integrated circuit). The apparatus can also
include, in addition to hardware, code that creates an execution
environment for the computer program in question, e.g., code that
constitutes processor firmware, a protocol stack, a database
management system, an operating system, a cross-platform runtime
environment, a virtual machine, or a combination of one or more of
them. The apparatus and operating environment can realize various
different computing model infrastructures. In enterprise systems,
there are OLTP (OnLine Transaction processing) systems used to
carry out business processes of a company where employees and other
stakeholders, such as suppliers or customers, follow a business
process which may result in business documents created in a
database of the OLTP system. The database system can include
in-memory databases in addition to the persistent databases
described in connection with FIG. 1 and FIG. 2 and thereby exploit
recent innovations in hardware to run a database in main memory. In
an implementation of the present disclosure described herein, the
servers may be types of a Java development platform, e.g.,
Enterprise JavaBeans.RTM. (EJB), J2EE Connector Architecture (JCA),
Java Messaging Service (JMS), Java Naming and Directory Interface
(JNDI), and Java Database Connectivity (JDBC), a ByDesign platform,
SuccessFactors Platform, ERP Suite technology or in-memory database
such as High Performance Analytic Appliance (HANA) platform. In an
aspect, the servers may be based on two or more different of the
above mentioned platforms.
[0069] Regardless of the particular implementation, "software" or
"operations" may include computer-readable instructions, firmware,
wired or programmed hardware, or any combination thereof on a
tangible and non-transitory medium operable when executed to
perform at least the processes and operations described herein.
Indeed, each software component may be fully or partially written
or described in any appropriate computer language including C, C++,
Java, Visual Basic, assembler, Python and/or R, Perl, any suitable
version of 4GL, as well as others.
[0070] The figures and accompanying descriptions illustrate example
processes and computer-implementable techniques. However, the
database system operating environment (or its software or hardware
components) contemplates using, implementing, or executing any
suitable technique for performing these and other processes. It
will be understood that these processes are for illustration
purposes only and that the described or similar techniques may be
performed at any appropriate time, including concurrently,
individually, or in combination. In addition, many of the
operations in these processes may take place simultaneously,
concurrently, and/or in different orders or combinations than
shown. Moreover, the operating environment may use processes with
additional operations, fewer operations, and/or different
operations, so long as the methods remain appropriate.
[0071] Aspects of the subject-matter and the operations described
in this specification can be implemented in digital electronic
circuitry, semiconductor circuits, analog circuits, or in computer
software, firmware, or hardware, including the structures disclosed
in this specification and their structural equivalents, or in
combinations of one or more of them. Embodiments of the
subject-matter described in this specification can be implemented
as one or more computer programs, i.e., one or more modules of
computer program instructions, encoded on a computer storage medium
for execution by, or to control the operation of a data processing
apparatus. Alternatively or in addition, the program instructions
can be encoded on an artificially generated propagated signal,
e.g., a machine-generated electrical, optical, or electromagnetic
signal, which is generated to encode information for transmission
to a suitable receiver apparatus for execution by a data processing
apparatus. A computer storage medium can be, or be included in, a
computer-readable storage device, a computer-readable storage
substrate, a random or serial access memory array or device, or a
combination of one or more of them. Moreover, while a computer
storage medium is not a propagated signal, a computer storage
medium can be a source or destination of computer program
instructions encoded in an artificially generated propagated
signal. The computer storage medium can also be, or be included in,
one or more separate physical components or media (e.g., multiple
CDs, disks, USB drives, flash drives, removable storage devices
(e.g. SD cards) or other storage devices). The operations described
in this specification can be implemented as operations performed by
a data processing apparatus on data stored on one or more
computer-readable storage devices or received from other
sources.
[0072] A computer program (also known as a program, software,
software application, script, or code) or "user interface" can be
written in any form of programming language, including compiled or
interpreted languages, declarative or procedural languages, and it
can be deployed in any form, including as a stand-alone program or
as a module, component, subroutine, object, or other unit suitable
for use in a computing environment. A computer program may, but
need not, correspond to a file in a file system. A program can be
stored in a portion of a file that holds other programs or data
(e.g., one or more scripts stored in a markup language document),
in a single file dedicated to the program in question, or in
multiple coordinated files (e.g., files that store one or more
modules, sub programs, or portions of code). A computer program can
be deployed to be executed on one computer or on multiple computers
that are located at one site or distributed across multiple sites
and interconnected by a communication network.
[0073] The term "graphical user interface," or GUI, may be used in
the singular or the plural form to describe one or more graphical
user interfaces and each of the displays of a particular graphical
user interface. Therefore, a GUI may represent any graphical user
interface, including but not limited to, a web browser, a touch
screen, or a command line interface (CLI) that processes
information and efficiently presents the information results to the
user. In general, a GUI may include a plurality of user interface
(UI) "icons", some or all associated with a web browser, such as
interactive fields, pull-down lists, and buttons operable by the
user of the computing device hosting the UI. These and other UI
icons may be related to or represent the functions of the web
browser. The term "browser user interface" refers to a graphical
user interface embedded in a web browser environment on the remote
computing device. The browser user interface may be configured to
initiate a request for a uniform resource locator (URL) and may be
configured to display a retrieved web page such as an HTML coded
web page. The browser user interface may comprise displayed or
hidden icons which, upon activation, initiate an associated
electronic process inside or outside the remote computing device.
For example, the browser user interface may be Internet Explorer,
Chrome or Firefox. "Creating an icon" is to be understood as
generating a new icon on the user interface. "Modifying an icon" is
to be understood as changing a property of an existing icon on the
user interface. "Deleting an icon" is to be understood as removing
an existing icon on the user interface, e.g., for replacement by a
newly created icon. "Updating the user interface" thereby is to be
understood as creating, modifying, or deleting an icon on the user
interface.
[0074] Processors suitable for the execution of a computer program
include, by way of example, both general and special purpose
microprocessors, and any one or more processors of any kind of
digital computer. Generally, a processor will receive instructions
and data from a read only memory or a random access memory or both.
The essential elements of a computer or processor may be a
processor for performing actions in accordance with instructions
and one or more memory devices for storing instructions and data.
Generally, a computer or processor will also include, or be
operatively coupled to receive data from or transfer data to, or
both, one or more mass storage devices for storing data, e.g.,
magnetic, magneto optical disks, or optical disks. However, a
computer or computing device need not have such devices. Moreover,
a computer or computing device can be embedded in another device,
e.g., a mobile telephone, a personal digital assistant (PDA), a
mobile audio or video player, a game console, a Global Positioning
System (GPS) receiver, or a portable storage device (e.g., a
universal serial bus (USB) flash drive), to name just a few.
Devices suitable for storing computer program instructions and data
include all forms of non-volatile memory, media and memory devices,
including by way of example semiconductor memory devices, e.g.,
EPROM, EEPROM, and flash memory devices; magnetic disks, e.g.,
internal hard disks or removable disks; magneto optical disks; and
CD ROM and DVD-ROM disks. The processor and the memory can be
supplemented by, or incorporated in, special purpose logic
circuitry.
[0075] To provide for interaction with a user, implementations of
the user interface described in this specification can be
implemented on a computer having a non-flexible or flexible screen,
e.g., a CRT (cathode ray tube), LCD (liquid crystal display), LED
(light emitting diode) or OLED (organic light emitting diode)
monitor, for displaying information to the user and a keyboard and
a pointer, e.g., a finger, a stylus, a mouse or a trackball, by
which the user can provide input to the computer. Other kinds of
devices can be used to provide for interaction with a user as well;
for example, feedback provided to the user can be any form of
sensory feedback, e.g., touch feedback, visual feedback, auditory
feedback, or tactile feedback; and input from the user can be
received in any form, including acoustic, speech, touch or tactile
input. In addition, a computer or processor can interact with a
user by sending documents to and receiving documents from a device
that is used by the user; for example, by sending web pages to a
web browser on a user device in response to requests received from
the web browser.
[0076] Implementations of the subject-matter described in this
specification can be implemented in a computing system that
includes a back end component, e.g., as a server, or that includes
a middleware component, e.g., an application server, or that
includes a front-end component, e.g., a user computer having a
graphical user interface or a web browser through which a user can
interact with an implementation of the subject-matter described in
this specification, or any combination of one or more such back
end, middleware, or front-end components. The components of the
system can be interconnected by any form or medium of digital data
communication, e.g., a communication network. Examples of
communication networks include a local area network ("LAN") and a
wide area network ("WAN"), an inter-network (e.g., the Internet),
and peer-to-peer networks (e.g., ad hoc peer-to-peer networks).
[0077] The computing system can include users and servers. A user
and server are generally remote from each other and typically
interact through a communication network. The relationship of user
and server arises by virtue of computer programs running on the
respective computers and having a user-server relationship to each
other. In some implementations, a server transmits data (e.g., an
HTML page) to a user device (e.g., for purposes of displaying data
to and receiving user input from a user interacting with the user
device). Data generated at the user device (e.g., a result of the
user interaction) can be received from the user device at the
server.
[0078] While this specification contains many specific
implementation details, these should not be construed as
limitations on the scope of any implementation or on the scope of
what may be claimed, but rather as descriptions of features that
may be specific to particular implementations. Certain features
that are described in this specification in the context of separate
implementations can also be implemented in combination in a single
implementation. Conversely, various features that are described in
the context of a single implementation can also be implemented in
multiple implementations separately or in any suitable
sub-combination. Moreover, although features may be described above
as acting in certain combinations and even initially claimed as
such, one or more features from a claimed combination can in some
cases be excised from the combination, and the claimed combination
may be directed to a sub-combination or variation of a
sub-combination.
[0079] Similarly, while operations are depicted in the drawings in
a particular order, this should not be understood as requiring that
such operations be performed in the particular order shown or in
sequential order, or that all illustrated operations be performed,
to achieve desirable results. In certain circumstances,
multitasking and parallel processing may be advantageous. Moreover,
the separation of various system modules and components in the
implementations described above should not be understood as
requiring such separation in all implementations, and it should be
understood that the described program components and systems can
generally be integrated together in a single software product or
packaged into multiple software products.
[0080] Particular implementations of the subject matter have been
described. Other implementations, alterations, and permutations of
the described implementations are within the scope of the following
claims as will be apparent to those skilled in the art. For
example, the operations recited in the claims can be performed in a
different order and still achieve desirable results.
[0081] Accordingly, the above description of example
implementations does not define or constrain this disclosure. Other
changes, substitutions, and alterations are also possible without
departing from the spirit and scope of this disclosure.
* * * * *