U.S. patent application number 16/119053, for data buffering, was published by the patent office on 2018-12-27.
This patent application is currently assigned to Solarflare Communications, Inc. The applicant listed for this patent is Solarflare Communications, Inc. The invention is credited to Steven Leslie Pope and David James Riddoch.
Application Number | 16/119053 |
Publication Number | 20180375782 |
Family ID | 35911635 |
Filed Date | 2018-08-31 |
United States Patent Application | 20180375782 |
Kind Code | A1 |
Pope; Steven Leslie; et al. | December 27, 2018 |
DATA BUFFERING
Abstract
A method is disclosed for bridging, by means of a bridging device, between a first data link carrying data units of a first data protocol and a second data link for carrying data units of a second protocol. The method may comprise receiving, by means of a first interface entity, data units of the first protocol and storing those data units in a memory; accessing, by means of a protocol processing entity, the protocol data of the data units stored in the memory and thereby performing protocol processing for those data units under the first protocol; and accessing, by means of a second interface entity, the traffic data of the data units stored in the memory and thereby transmitting that traffic data over the second data link in data units of the second data protocol.
Inventors: | Pope; Steven Leslie (Cambridge, GB); Riddoch; David James (Fenstanton, GB) |
Applicant: | Solarflare Communications, Inc. (Irvine, CA, US) |
Assignee: | Solarflare Communications, Inc. (Irvine, CA) |
Family ID: |
35911635 |
Appl. No.: |
16/119053 |
Filed: |
August 31, 2018 |
Related U.S. Patent Documents

Application Number | Filing Date | Patent Number | Continued By |
13644433 | Oct 4, 2012 | 10104005 | 16119053 |
12215437 | Jun 26, 2008 | 8286193 | 13644433 |
PCT/GB2006/004946 | Dec 28, 2006 | | 12215437 |
Current U.S. Class: | 1/1 |
Current CPC Class: | H04L 47/27 20130101; G06F 9/544 20130101; H04L 47/62 20130101; G06F 9/45533 20130101; H04L 29/06 20130101; H04L 69/16 20130101; H04L 47/30 20130101 |
International Class: | H04L 12/863 20060101 H04L012/863; H04L 29/06 20060101 H04L029/06; H04L 12/835 20060101 H04L012/835; H04L 12/807 20060101 H04L012/807; G06F 9/455 20060101 G06F009/455; G06F 9/54 20060101 G06F009/54 |
Foreign Application Data

Date | Code | Application Number |
Dec 28, 2005 | GB | 0526519.4 |
Feb 1, 2006 | GB | 0602033.3 |
Jan 10, 2006 | GB | 0600417.0 |
Claims
1-27. (canceled)
28. A method comprising: receiving, at a first interface entity,
data units in accordance with a first protocol, and storing at
least traffic data of those data units in a memory buffer, wherein
the data units in accordance with each protocol include protocol
data and traffic data; receiving a call from a protocol processing
entity, at the first interface entity, and returning in response
thereto a message comprising a reference to the memory buffer and
at least some protocol data, control of said memory buffer being
handed over to said protocol processing entity; performing, in
response to said message, by the protocol processing entity,
protocol processing for those data units, control of said memory
buffer being handed over to a second interface entity; and
accessing, by the second interface entity, the traffic data of data
units stored in the memory buffer and thereby transmitting that
traffic data through an output in data units in accordance with a
second protocol.
29. A method as claimed in claim 28, wherein the protocol
processing entity is arranged to perform protocol processing for
the data units stored in the memory buffer without it accessing the
traffic data of those units stored in said memory buffer.
30. A method as claimed in claim 29, wherein the first protocol is
such that protocol data of a data unit of the first protocol
includes check data that is a function of the traffic data of the
data unit, and the method comprises: applying the function by the
first interface entity to the content of a data unit of the first
protocol received by the first interface entity to calculate first
check data; transmitting the first check data to the protocol
processing entity; and comparing by the protocol processing entity
the first check data calculated for a data unit with the check data
included in the protocol data of that data unit.
31. A method as claimed in claim 28, wherein: the memory buffer is
at least one memory buffer; the first interface entity, the second
interface entity and the protocol processing entity are each
arranged to access a first memory buffer of the at least one memory
buffer only when they have control of the first memory buffer; and
the method comprises: the first interface entity storing a received
data unit of the first protocol in a given memory buffer, of the at
least one memory buffer, of which it has control and subsequently
passing control of the given memory buffer to the protocol
processing entity; the protocol processing entity passing control
of a given memory buffer to the second interface entity when it has
performed protocol processing of the or each data unit stored in
the given memory buffer; and the second interface entity passing
control of a given memory buffer to the first interface entity when
it has transmitted the traffic data contained in the given memory
buffer through the output in data units in accordance with the
second data protocol.
32. A method as claimed in claim 28, comprising: generating, by the
protocol processing entity, protocol data of the second protocol
for the data units to be transmitted under the second protocol;
communicating that protocol data to the second interface entity;
and the second interface entity including that protocol data in the
said data units in accordance with the second protocol.
33. A method as claimed in claim 32, wherein the second protocol is
such that protocol data of a data unit of the second protocol
includes check data that is a function of the traffic data of the
data unit, and the method comprises: applying the function by the
second interface entity to the content of a data unit of the second
protocol to be transmitted by the second interface entity to
calculate first check data; combining that check data with protocol
data received from the protocol processing entity to form second
protocol data; and the second interface entity including the second
protocol data in the said data units in accordance with the second
protocol.
34. A method as claimed in claim 28, wherein one of the first and
second protocols is TCP.
35. A method as claimed in claim 28, wherein one of the first and
second protocols is Fibrechannel.
36. A method as claimed in claim 28, wherein the first and second
protocols are the same protocol.
37. A method as claimed in claim 28, wherein the first and second
interface entities each communicate with the respective data link
via a respective hardware interface.
38. A method as claimed in claim 28, wherein the protocol
processing comprises terminating a link of the first protocol.
39. A method as claimed in claim 28, wherein the protocol processing
comprises: inspecting the traffic data of the first protocol;
comparing the traffic data of the first protocol with one or more
pre-set rules; and if the traffic data does not satisfy the rules
preventing that traffic data from being transmitted by the second
interface entity.
40. An apparatus comprising: one or more processors configured to
provide a first interface entity for interfacing with an input, a
second interface entity for interfacing with an output, and a
protocol processing entity, wherein the input carries data units in
accordance with a first protocol and the output carries data units
in accordance with a second protocol, the first and second
protocols being such that data units in accordance with each
protocol include protocol data and traffic data; the first
interface entity being arranged to receive data units in accordance
with the first protocol, store those data units in a memory buffer,
and in response to receipt of a call from the protocol processing
entity, return a message thereto comprising a reference to the
memory buffer and at least some protocol data, control of said
memory buffer being handed over to said protocol processing entity;
the protocol processing entity being arranged to perform protocol
processing for those data units, control of said memory buffer
being handed over to said second interface entity; and the second
interface entity being arranged to access the traffic data of the
data units stored in the memory buffer and thereby transmit that
traffic data through the output in data units in accordance with a
second data protocol.
41. An apparatus comprising: an interface for interfacing with an
input, wherein the input carries data units in accordance with a
first protocol, the first protocol being such that data units in
accordance with the first protocol include protocol data and
traffic data; and one or more processors configured to provide a
protocol processing entity, wherein said interface is configured to
perform at least some protocol processing for data units received
at the input, and respond to at least one message from said
protocol processing entity with a reference to a memory buffer in
which one or more data units are stored, control of said memory
buffer being handed over to said protocol processing entity by
removing said memory buffer from an owned buffer list of the
interface and adding said memory buffer to an owned buffer list of
the protocol processing entity, and the protocol processing entity
is arranged to cause said one or more processors to perform
protocol processing for said data units stored in said memory
buffer without accessing the traffic data of those data units
stored in said memory buffer.
42. A method as claimed in claim 28, wherein the first interface
entity comprises a first transport library, and the second
interface entity comprises a second transport library.
43. A method as claimed in claim 28, further comprising: in
response to receiving a message from the protocol processing entity
at the second interface entity, processing protocol data to form
protocol data of a second protocol.
44. A method as claimed in claim 31, wherein: each memory buffer is
identifiable by a virtual reference; the step of the first
interface entity passing control of a buffer to the protocol
processing entity comprises passing to the protocol processing
entity a virtual reference to that buffer; and the step of the
protocol processing entity passing control of a buffer to the
second interface entity comprises passing to the second interface
entity a virtual reference to that buffer.
45. A method as claimed in claim 28, further comprising: the
protocol processing entity periodically issuing calls to the first
interface entity to initiate protocol processing of data units
stored in said memory buffer; the first interface entity, in
response to receiving one or more of said calls, returning a
response message to the protocol processing entity comprising at
least one of: protocol data of data units stored in the memory
buffer; and/or an indication of a location in memory of the traffic
data of data units stored in the memory buffer.
46. A method as claimed in claim 28, wherein the one or more
processors are configured to provide the first interface entity,
second interface entity, and protocol processing entity at user
level.
Description
PRIORITY CLAIM
[0001] This application is a continuation of and claims priority to
U.S. patent application Ser. No. 13/644,433 filed Oct. 4, 2012, which
is a continuation of U.S. patent application Ser. No. 12/215,437
filed Jun. 26, 2008, which claims priority to PCT Application No.
PCT/GB2006/004946 filed on Dec. 28, 2006, which claims priority to
Great Britain Application No. 0602033.3 filed on Feb. 1, 2006.
FIELD OF THE INVENTION
[0002] This invention relates to the buffering of data, for example
in the processing of data units in a device bridging between two
data protocols.
BACKGROUND OF THE INVENTION
[0003] FIG. 1 shows in outline the logical and physical
architecture of a bridge 1 for bridging between data links 2 and 3.
In this example link 2 carries data according to the Fibrechannel
protocol and link 3 carries data according to the ISCSI (Internet
Small Computer System Interface) protocol over the Ethernet
protocol (known as ISCSI-over-Ethernet). The bridge comprises a
Fibrechannel hardware interface 4, an Ethernet hardware interface 5
and a data processing section 6. The interfaces link the data
processing section to the respective data links 2 and 3. The data
processing section implements a series of logical protocol layers:
a Fibrechannel driver 7, a Fibrechannel stack 8, a bridge/buffer
cache 9, an ISCSI stack 10, a TCP (transmission control protocol)
stack 11 and an Ethernet driver 12. These layers convert packets
that have been received in accordance with one of the protocols
into packets for transmission according to the other of the
protocols, and buffer the packets as necessary to accommodate flow
control over the links.
[0004] FIG. 2 shows the physical architecture of the data
processing section 6. The data processing section 6 comprises a
data bus 13, such as a PCI (Peripheral Component Interconnect) bus.
Connected to the data bus 13 are the Ethernet hardware interface 5,
the Fibrechannel hardware interface 4 and the memory bus 14.
Connected to the memory bus 14 are a memory unit 15, such as a RAM
(random access memory) chip, and a CPU (central processing unit) 16
which has an integral cache 17.
[0005] The example of an ISCSI-over-Ethernet packet being received
and translated to Fibrechannel will be discussed, in order to
explain problems of the prior art. The structure of the Ethernet
packet is shown in FIG. 3. The packet 30 comprises an Ethernet
header 31, a TCP header 32, an ISCSI header 33 and ISCSI traffic
data 34.
[0006] Arrows 20 to 22 in FIG. 2 illustrate the conventional manner
of processing an incoming Ethernet packet in this system. The
Ethernet packet is received by Ethernet interface 5 and passed over
the PCI and memory buses 13, 14 to memory 15 (step 20), where it is
stored until it can be processed by the CPU 16. When the CPU is
ready to process the Ethernet packet it is passed over the memory
bus to the cache 17 of the CPU (step 21). The CPU processes the
packet to perform protocol processing and re-encapsulate the data
for transmission over Fibrechannel. The Fibrechannel packet is then
passed over the memory bus and the PCI bus to the Fibrechannel
interface 4 (step 22), from which it is transmitted. It will be
appreciated that this process involves passing the entire Ethernet
packet three times over the memory bus 14. These bus traversals
slow down the bridging process.
[0007] It would be possible to pass the Ethernet packet directly
from the Ethernet interface 5 to the CPU, without it first being
stored in memory. However, this would require the CPU to signal the
Ethernet hardware to tell it to pass the packet, or alternatively
for the CPU and the Ethernet hardware to be synchronised, which
would be inefficient and could also lead to poor cache performance.
In any event, this is not readily possible in current server
chipsets.
[0008] An alternative process is illustrated in FIG. 4. FIG. 4 is
analogous to FIG. 2 but shows different process steps. In step 23
the received Ethernet packet is passed from the Ethernet hardware
to the memory 15. When the CPU is ready to process the packet only
the header data is passed to the CPU (step 24). The CPU processes
the header data, forms a Fibrechannel header and transmits the
Fibrechannel header to the Fibrechannel interface (step 25). Then
the traffic data 34 is passed to the Fibrechannel hardware (step
26), which mates it with the received header to form a Fibrechannel
packet for transmission. This method has the advantage that the
traffic data 34 traverses the memory bus only twice. However, this
method is not straightforward to implement, since the CPU must be
capable of arranging for the traffic data to be passed from the
memory 15 to the Fibrechannel hardware in step 26. This is
problematic because the CPU would conventionally have received only
the headers for that packet, without any indication of where the
packet was located in memory, and so it would have no knowledge of
where the traffic data is located in the memory. As a result, the
CPU would be unable to inform the bridging entity that is to
transmit that data onwards of what data is to be transmitted.
Furthermore, if that transmitting entity is to be implemented in
software then it could be implemented at user level, for example as
an application, or as part of the operating system kernel. If it is
implemented at user level then it would not conventionally be able
to access physical memory addresses, being restricted instead to
accessing memory via virtual memory addresses. As a result, it
could not access the packet data in memory directly via a physical
address. Alternatively, if the transmitting entity is implemented
in the kernel then for software abstraction and engineering reasons
it would be preferable for it to interface with the network at a
high level of abstraction, for instance by way of a sockets API
(application programming interface). As a result, it would be
preferred that it does not access the packet data in memory
directly via a physical address.
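The header-only hand-off described above can be sketched in outline as follows. This is a minimal illustrative model, not the application's implementation: the class and field names (`ReceiveInterface`, `Descriptor`) are invented, and a real interface would write packets into hardware receive buffers rather than Python objects. The receiving interface stores the whole packet, but passes onward only the header together with a (buffer, offset, length) descriptor locating the traffic data.

```python
# Illustrative sketch: pass headers plus a descriptor of where the traffic
# data sits in the receive buffer, so the payload itself is never copied
# to the protocol processor. All names here are invented for illustration.
from collections import namedtuple

Descriptor = namedtuple("Descriptor", "buffer_id offset length")

class ReceiveInterface:
    def __init__(self):
        self.buffers = {}          # buffer_id -> stored packet bytes
        self.next_id = 0

    def receive(self, header: bytes, traffic: bytes):
        """Store the packet; hand out only the header and a descriptor."""
        buf_id, self.next_id = self.next_id, self.next_id + 1
        self.buffers[buf_id] = bytearray(header + traffic)
        desc = Descriptor(buf_id, offset=len(header), length=len(traffic))
        return header, desc        # traffic data itself is NOT copied out

    def read_traffic(self, desc: Descriptor) -> bytes:
        """Transmitting side pulls the payload straight from the buffer."""
        buf = self.buffers[desc.buffer_id]
        return bytes(buf[desc.offset:desc.offset + desc.length])
```

With such a descriptor in hand, a transmitting entity can locate the traffic data in memory even though the protocol processor only ever saw the headers.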
[0009] One way of addressing this problem is to permit the Ethernet
hardware 5 to access the memory 15 by RDMA (remote direct memory
access), and for the Ethernet hardware to be allocated named
buffers in the memory. Then the Ethernet hardware can write the
traffic data of each packet to a specific named buffer and through
the RDMA interface with the bridging application (e.g. uDAPL)
indicate to the application the location/identity of the buffer
which has received data. The CPU can access the data by means of
reading the buffer, for example by means of a post( ) instruction
having as its operand the name of the buffer that is to be read.
The Fibrechannel hardware can then be passed a reference to the
named buffer by the application and so (also by RDMA) read the data
from the named buffer. The buffer remains allocated to the Ethernet
hardware during the reading step(s).
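The named-buffer arrangement above can be modelled roughly as follows. This is a hypothetical sketch only: `NamedBufferPool`, `hw_write` and the `post` method are invented stand-ins, whereas a real deployment would go through an RDMA API such as uDAPL. The point illustrated is that readers address data by buffer name while the buffer remains allocated to the receiving hardware.

```python
# Hypothetical model of the named-buffer scheme: receiving hardware owns a
# set of named buffers, fills one per packet, and advertises its name;
# readers fetch contents by name. Names/methods are illustrative only.
class NamedBufferPool:
    def __init__(self, names):
        self.buffers = {name: b"" for name in names}   # name -> contents
        self.posted = []                               # names advertised to readers

    def hw_write(self, name, data):
        """Receiving hardware fills a named buffer and posts its name."""
        self.buffers[name] = data
        self.posted.append(name)

    def post(self, name):
        """Reader fetches a named buffer's contents (buffer stays allocated)."""
        return self.buffers[name]

pool = NamedBufferPool(["rx0", "rx1"])
pool.hw_write("rx0", b"iscsi-traffic")
```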
[0010] One problem with this approach is that it requires the
Ethernet hardware to be capable of accessing the memory 15 by RDMA,
and to include functionality that can handle the named buffer
protocol. If the Ethernet hardware is not compatible with RDMA or
with the named buffer protocol, or if the remainder of the system
is not configured to communicate with the Ethernet hardware by
RDMA, then this method cannot be used. Also, RDMA typically involves
performance overheads.
[0011] Analogous problems arise when bridging in the opposite
direction: from Fibrechannel to ISCSI, and when using other
protocols.
[0012] There is therefore a need to improve the processing of data
units in bridging situations.
SUMMARY
[0013] According to one aspect of the present invention there is
provided a method for bridging between a first data link carrying
data units of a first data protocol and a second data link for
carrying data units of a second protocol by means of a bridging
device, the first and second protocols being such that data units
of each protocol include protocol data and traffic data and the
bridging device comprising a first interface entity for interfacing
with the first data link, a second interface entity for interfacing
with the second data link, a protocol processing entity and a
memory accessible by the first interface entity, the second
interface entity and the protocol processing entity, the method
comprising: receiving by means of the first interface entity data
units of the first protocol, and storing those data units in the
memory; accessing by means of the protocol processing entity the
protocol data of data units stored in the memory and thereby
performing protocol processing for those data units under the first
protocol; and accessing by means of the second interface entity the
traffic data of data units stored in the memory and thereby
transmitting that traffic data over the second data link in data
units of the second data protocol.
[0014] According to a second aspect of the present invention there
is provided a bridging device for bridging between a first data
link carrying data units of a first data protocol and a second data
link for carrying data units of a second protocol, the first and
second protocols being such that data units of each protocol
include protocol data and traffic data and the bridging device
comprising: a first interface entity for interfacing with the first
data link, a second interface entity for interfacing with the
second data link, a protocol processing entity and a memory
accessible by the first interface entity, the second interface
entity and the protocol processing entity; the first interface
entity being arranged to receive data units of the first protocol,
and storing those data units in the memory; the protocol processing
entity being arranged to access the protocol data of data units
stored in the memory and thereby perform protocol processing for
those data units under the first protocol; and the second interface
entity being arranged to access the traffic data of data units
stored in the memory and thereby transmit that traffic data over
the second data link in data units of the second data protocol.
[0015] According to a third aspect of the present invention there
is provided a data processing system comprising: a memory
comprising a plurality of buffer regions; an operating system for
supporting processing entities running on the data processing
system and for restricting access to the buffer regions to one or
more entities; a first interface entity running on the data
processing system whereby a first hardware device may communicate
with the buffer regions; and an application entity running on the
data processing system; the first interface entity and the
application entity being configured to, in respect of a buffer
region to which the operating system permits access by both the
interface entity and the application entity, communicate ownership
data so as to indicate which of the first interface entity and the
application entity may access the buffer region and to access the
buffer region only in accordance with the ownership data.
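The ownership-data scheme of this aspect can be sketched minimally as follows, under the assumption (names invented here) that each buffer region carries an ownership word and that each entity checks it before touching the region. The operating system grants both entities access to the region; the ownership data, not the OS, arbitrates who may use it at any moment.

```python
# Minimal sketch of ownership data arbitrating access to a buffer region
# shared by an interface entity and an application entity. Entity names
# and methods are illustrative, not from the application.
class SharedBuffer:
    def __init__(self, owner):
        self.owner = owner         # the ownership data
        self.data = b""

    def write(self, entity, data):
        if entity != self.owner:
            raise PermissionError(f"{entity} does not own this buffer")
        self.data = data

    def hand_over(self, entity, new_owner):
        if entity != self.owner:
            raise PermissionError(f"{entity} cannot hand over a buffer it does not own")
        self.owner = new_owner
```

Access by a non-owner fails even though both entities could, as far as the operating system is concerned, reach the memory.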
[0016] According to a fourth aspect of the present invention there
is provided a method for operating a data processing system
comprising: a memory comprising a plurality of buffer regions; an
operating system for supporting processing entities running on the
data processing system and for restricting access to the buffer
regions to one or more entities; a first interface entity running
on the data processing system whereby a first hardware device may
communicate with the buffer regions; and an application entity
running on the data processing system; the method comprising, in
respect of a buffer region to which the operating system permits
access by both the interface entity and the application entity,
communicating ownership data by means of the first interface entity
and the application entity so as to indicate which of the first
interface entity and the application entity may access the buffer
region and to access the buffer region only in accordance with the
ownership data.
[0017] According to a fifth aspect of the present invention there
is provided a protocol processing entity for operation in a
bridging device for bridging between a first data link carrying
data units of a first data protocol and a second data link for
carrying data units of a second protocol by means of a bridging
device, the first and second protocols being such that data units
of each protocol include protocol data and traffic data and the
protocol processing entity being arranged to cause a processor of
the bridging device to perform protocol processing for data units
stored in the memory without it accessing the traffic data of those
units stored in the memory. The protocol processing entity may be
implemented in software. The software may be stored on a data
carrier.
[0018] The protocol processing entity may be arranged to perform
protocol processing for the data units stored in the memory without
it accessing the traffic data of those units stored in the
memory.
[0019] The first protocol may be such that protocol data of a data
unit of the first protocol includes check data that is a function
of the traffic data of the data unit. The method may then comprise:
applying the function by means of the first interface entity to the content
of a data unit of the first protocol received by the first
interface entity to calculate first check data; transmitting the
first check data to the protocol processing entity; and comparing
by means of the protocol processing entity the first check data
calculated for a data unit with the check data included in the
protocol data of that data unit.
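The check-data flow just described can be sketched as below. A trivial additive checksum stands in for the protocol's real function (for TCP this would be the ones'-complement checksum), and the function names are invented: the interface computes the first check data as the payload streams in, and the protocol processor validates by comparing two small values, never re-reading the traffic data.

```python
# Sketch of the check-data flow: the first interface entity computes check
# data over the received content; the protocol processing entity compares
# it with the check data in the protocol header. Checksum and names are
# illustrative stand-ins.
def check_fn(data: bytes) -> int:
    return sum(data) & 0xFFFF          # toy checksum, not the real TCP one

def interface_receive(header_check: int, traffic: bytes) -> dict:
    first_check = check_fn(traffic)    # computed while the data streams in
    return {"check": header_check, "first_check": first_check}

def protocol_process(protocol_data: dict) -> bool:
    # Validation uses only protocol data; traffic data stays in the buffer.
    return protocol_data["first_check"] == protocol_data["check"]
```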
[0020] The memory may comprise a plurality of buffer regions. The
first interface entity, the second interface entity and the
protocol processing entity may each be arranged to access a buffer
region only when they have control of it. The method may then
comprise: the first interface entity storing a received data unit
of the first protocol in a buffer of which it has control and
subsequently passing control of that buffer to the protocol
processing entity; the protocol processing entity passing control
of a buffer to the second interface entity when it has performed
protocol processing of the or each data unit stored in that buffer;
and the second interface entity passing control of a buffer to the
first interface entity when it has transmitted the traffic data
contained in that buffer over the second data link in data units of
the second data protocol.
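The control-passing cycle in the preceding paragraph can be sketched as a simple ring, first interface to protocol processor to second interface and back. This is an illustrative model with invented names; the invariant it demonstrates is that exactly one entity controls a buffer at a time, and control can only move forward around the cycle.

```python
# Sketch of the buffer-control cycle: control of a buffer passes around a
# fixed ring of entities, and only the current controller may pass it on.
# Entity names are illustrative.
CYCLE = ["first_interface", "protocol_processor", "second_interface"]

class ControlledBuffer:
    def __init__(self):
        self.controller = "first_interface"

    def pass_control(self, entity):
        if entity != self.controller:
            raise PermissionError("only the controlling entity may pass control")
        self.controller = CYCLE[(CYCLE.index(entity) + 1) % len(CYCLE)]
        return self.controller
```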
[0021] The method may comprise: generating by means of the protocol
processing entity protocol data of the second protocol for the data
units to be transmitted under the second protocol; communicating
that protocol data to the second interface entity; and the second
interface entity including that protocol data in the said data
units of the second protocol.
[0022] The second protocol may be such that protocol data of a data
unit of the second protocol includes check data that is a function
of the traffic data of the data unit. The method may then comprise:
applying the function by means of the second interface entity to
the content of a data unit of the second protocol to be transmitted
by the second interface entity to calculate first check data;
combining that check data with protocol data received from the
protocol processing entity to form second protocol data; and the
second interface entity including the second protocol data in the
said data units of the second protocol.
[0023] One of the first and second protocols may be TCP. One of the
first and second protocols may be Fibrechannel. The first and
second protocols may be the same.
[0024] The first and second interface entities may each communicate
with the respective data link via a respective hardware
interface.
[0025] The first and second interface entities may each communicate
with the respective data link via the same hardware interface.
[0026] The protocol processing may comprise terminating a link of
the first protocol.
[0027] The protocol processing may comprise: inspecting the traffic
data of the first protocol; comparing the traffic data of the first
protocol with one or more pre-set rules; and if the traffic data
does not satisfy the rules preventing that traffic data from being
transmitted by the second interface entity.
[0028] The data processing system may comprise a second interface
entity running on the data processing system whereby a second
hardware device may communicate with the buffer regions. The first
and second interface entities and the application entity may be
configured to, in respect of a buffer region to which the operating
system permits access by the first and second interface entities
and the application entity, communicate ownership data so as to
indicate which of the first and second interface entities and the
application entity may access each buffer region and to access
each buffer region only in accordance with the ownership data.
[0029] The first interface entity may be arranged to, on receiving
a data unit, store that data unit in a buffer region that it may
access in accordance with the ownership data and to subsequently
modify the ownership data such that the application entity may
access that buffer region in accordance with the ownership data.
The application entity may be arranged to perform protocol
processing on data unit(s) stored in a buffer region that it may
access in accordance with the ownership data and to subsequently
modify the ownership data such that the second interface entity may
access that buffer region in accordance with the ownership data.
The second interface entity may be arranged to transmit at least
some of the content of data unit(s) stored in a buffer region that
it may access in accordance with the ownership data and to
subsequently modify the ownership data such that the application
entity may access that buffer region in accordance with the
ownership data.
BRIEF DESCRIPTION OF THE DRAWINGS
[0030] The present invention will now be described by way of
example with reference to the accompanying drawings. In the
drawings:
[0031] FIG. 1 shows in outline the logical and physical
architecture of a bridge.
[0032] FIG. 2 shows the architecture of the bridge of FIG. 1 in
more detail, illustrating data transfer steps.
[0033] FIG. 3 shows the structure of an ISCSI-over-Ethernet
packet.
[0034] FIG. 4 shows the architecture of the bridge of FIG. 1,
illustrating alternative data transfer steps.
[0035] FIG. 5 illustrates the physical architecture of a bridging
device.
[0036] FIG. 6 illustrates the logical architecture of the bridging
device of FIG. 5.
[0037] FIG. 7 shows the processing of data in the bridging device
of FIG. 5.
DETAILED DESCRIPTION OF THE INVENTION
[0038] In the bridging device described below, data units of a
first protocol are received by interface hardware and written to
one or more receive buffers. In the example described below, those
data units are TCP packets which encapsulate ISCSI packets. The TCP
and ISCSI header data is then passed to the entity that performs
protocol processing. The header data is passed to that entity
without the traffic data of the packets, but with information that
identifies the location of the traffic data within the buffer(s).
The protocol processing entity performs TCP and ISCSI protocol
processing. If protocol processing is successful then it also
passes the data identifying the location of the traffic data in the
buffers to an interface that will be used for transmitting the
outgoing packets. The interface can then read that data, form one
or more headers for transmitting it as data units of a second
protocol, and transmit it. In bridging between the data links that
carry the packets of the respective protocols, the bridging device
receives data units of one protocol and transmits data units of
another protocol which include the traffic data contained in the
received data units.
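The overall flow of paragraph [0038] can be condensed into the following end-to-end sketch. Everything here is a stand-in: real TCP/ISCSI protocol processing is reduced to a header check, and the Fibrechannel header to a prefix. What the sketch preserves is the shape of the pipeline: payloads go into receive buffers, only headers plus location information reach the processing step, and the transmit side reads the buffered payload directly.

```python
# End-to-end sketch of the bridging flow: receive interface buffers the
# payload and forwards header + buffer location; the protocol step works
# on headers only; the transmit side wraps the buffered payload in a
# second-protocol header. All processing here is a toy stand-in.
def bridge(units):
    buffers = []                              # receive buffers
    out = []
    for header, payload in units:
        buffers.append(payload)               # interface writes payload to a buffer
        idx = len(buffers) - 1                # location info passed with the header
        ok = header.startswith(b"TCP")        # stand-in for TCP/ISCSI processing
        if ok:                                # only on successful processing...
            out.append(b"FC" + buffers[idx])  # ...does the transmit side read the
                                              # buffer and add the outgoing header
    return out
```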
[0039] FIG. 5 shows the physical architecture of a device 40 for
bridging between an ISCSI-over-Ethernet data link 41 and a
Fibrechannel data link 42. The device comprises an Ethernet
hardware interface 43, a Fibrechannel hardware interface 44 and a
central processing section 45. The hardware interfaces link the
respective data links to the central processing section 45 via a
bus 46, which could be a PCI bus. The central processing section
comprises a CPU 47, which includes a cache 47a and a processing
section 47b, and random access memory 48 which are linked by a
memory bus 49 to the PCI bus. A non-volatile storage device 50,
such as a hard disc, stores program code for execution by the
CPU.
[0040] FIG. 6 shows the logical architecture provided by the
central processing section 45 of the bridging device 40. The CPU
provides four main logical functions: an Ethernet transport library
51, a bridging application 52, a Fibrechannel transport library 53
and an operating system kernel 54. The transport libraries, the
bridging application and the operating system are implemented in
software which is executed by the CPU. The general principles of
operation of such systems are discussed in WO 2004/025477.
[0041] Areas of the memory 48 are allocated for use as buffers 55,
56. These buffers are configured in such a way that the interface
that receives the incoming data can write to them, the bridging
application can read from them, and the interface that transmits
the outgoing data can read from them. This may be achieved in a
number of ways. In a system that is configured not to police memory
access, any buffer may be accessible in this way. In other operating
systems the buffers may be set up as anonymous memory: i.e. memory
that is not mapped to a specific process, so that they can be freely
accessed by both interfaces. Another approach is to implement a
further process, a set of instruction calls, or an API that is
able to act as an intermediary to access the buffers on behalf of
the interfaces.
[0042] The present example will be described with reference to a
system in which the operating system allocates memory resources to
specific processes and restricts other processes from accessing
those resources. The transport libraries 51, 53 and the bridging
application 52 are implemented in a single process, by virtue of
them occupying a common instruction space. As a result, a buffer
allocated to any of those three entities can be accessible to the
other two. (Under a normal operating system (OS), OS-allocated
buffers are only accessible if the OS chooses for them to be). The
interfaces 43, 44 should be capable of writing to and reading from
the buffers. This can be achieved in a number of ways. For example,
each transport library may implement an API through which the
respective interface can access the buffers. Alternatively, the
interface could interact directly with the operating system to
access the buffers. This may be convenient where, in an alternative
embodiment, one of the transport libraries is implemented as part
of the operating system and derives its ability to access the
buffers through its integration with the operating system rather
than its sharing of an instruction space with the bridging
application.
[0043] Each buffer is identifiable by a handle that acts as a
virtual reference to the buffer. The handle is issued by the
operating system when the buffer is allocated. An entity wishing to
read from the buffer can issue a read call to the operating system
identifying the buffer by the handle, in response to which the
operating system will return the content of the buffer or of the part
of the buffer cited in the read call. An entity wishing to write to the
buffer can issue a write call to the operating system identifying
the buffer by the handle, in response to which the operating system
will write data supplied with the call to the buffer or to the part
of the buffer cited in the write call. As a result, the buffers need
not be referenced by a physical address, and can hence be accessed
by user-level entities under operating systems that limit the
access of user-level entities to physical memory.
[0044] The transport libraries and the bridging application
implement a protocol to allow them to cooperatively access the
buffers that are allocated to the instruction space that they
share. In this protocol each of those entities maintains an "owned
buffer" list of the buffers that it has responsibility for. Each
entity is arranged to access only those buffers currently included
in its owned buffer list. Each entity can pass a "handover" message
to one of the other entities. The handover message includes the
handle of a buffer. On transmitting the handover message (or
alternatively on acknowledgement of the handover message), the
entity that transmitted the handover message deletes the buffer
mentioned in the message from its owned buffer list. On receipt of
a handover message an entity adds the buffer mentioned in the
message to its owned buffer list. This process allows the entities
to cooperatively assign control of each buffer between each other,
independently of the operating system. The entity whose owned
buffer list includes a buffer is also responsible for the
administration of that buffer: for example for returning the buffer
to the operating system when it is no longer required. Buffers that
are subject to this protocol will be termed "anonymous buffers"
since the operating system does not discriminate between the
entities of the common instruction space in policing access to
those buffers.
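The ownership-transfer protocol described in this paragraph can be sketched as follows. This is a minimal illustration only: the class and method names are invented here and are not part of the described system.

```python
class Entity:
    """One participant in the anonymous-buffer protocol, e.g. a transport
    library or the bridging application (names illustrative)."""

    def __init__(self, name):
        self.name = name
        self.owned = set()  # this entity's "owned buffer" list of handles

    def handover(self, handle, recipient):
        # An entity may only hand over a buffer it currently owns.
        assert handle in self.owned
        self.owned.remove(handle)    # deleted on transmitting the message
        recipient.owned.add(handle)  # added on receipt of the message

    def may_access(self, handle):
        # Each entity accesses only buffers on its own owned buffer list.
        return handle in self.owned


# Ownership of buffer handle 7 passes along the bridging path:
# incoming library -> bridging application -> outgoing library.
eth_lib = Entity("ethernet_transport")
app = Entity("bridging_app")
fc_lib = Entity("fibrechannel_transport")
eth_lib.owned.add(7)
eth_lib.handover(7, app)
app.handover(7, fc_lib)
```

Note that ownership moves between the entities without any involvement of the operating system, which is the point of the anonymous-buffer scheme.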
[0045] The operation of the device for bridging packets from the
Ethernet interface to the Fibrechannel interface will now be
explained. The device operates in an analogous way to bridge
packets in the opposite direction.
[0046] At the start of operations the bridging application 52
requests the operating system 54 to allocate blocks of memory for
use by the bridging system as buffers 55. The operating system
allocates a set of buffers accordingly and passes handles to them
to the application. These buffers can then be accessed directly by
the bridging application and the transport libraries, and can be
accessed by the interfaces by means of the anonymous APIs
implemented by the respective transport libraries.
[0047] One or more of the buffers are passed to the incoming
transport library 51 by means of one or more handover messages. The
transport library adds those buffers to its owned buffer list. The
transport library maintains a data structure that permits it to
identify which of those buffers contains unprocessed packets. This
may be done by queuing the buffers or by storing a flag indicating
whether each buffer is in use. On being passed a buffer the
incoming transport library notes that buffer as being free. The
data structure preferably indicates the order in which the packets
were received, in order that that information can be used to help
prioritise their subsequent processing. Multiple packets could be
stored in each buffer, and a data structure maintained by the
Ethernet transport library to indicate the location of each
packet.
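The free/occupied tracking that this paragraph attributes to the incoming transport library might be realised as follows; this sketch uses a simple queue to preserve arrival order, and all names are hypothetical.

```python
from collections import deque


class IncomingBufferPool:
    """Illustrative tracking structure for the incoming transport library:
    which owned buffers are free, and the order in which occupied ones
    received their packets."""

    def __init__(self):
        self.free = []              # handles free to receive a packet
        self.unprocessed = deque()  # occupied handles, earliest first

    def add_buffer(self, handle):
        # A buffer handed over to the library is noted as being free.
        self.free.append(handle)

    def packet_received(self):
        # Mark a free buffer as occupied; queueing preserves receive order.
        handle = self.free.pop()
        self.unprocessed.append(handle)
        return handle

    def next_to_process(self):
        # The earliest-received unprocessed packet, to prioritise processing.
        return self.unprocessed.popleft() if self.unprocessed else None
```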
[0048] Referring to FIG. 7, as Ethernet packets are received
Ethernet protocol processing is performed by the Ethernet interface
hardware 43, and the Ethernet headers are removed from the Ethernet
packets, leaving TCP packets in which ISCSI packets are
encapsulated. Each of these packets is written by the Ethernet
hardware into one of the buffers 55. (Step 60). This is achieved by
the Ethernet hardware issuing a buffer write call to the API of the
Ethernet transport library, with the TCP packet as an operand. In
response to this call the transport library identifies a buffer
that is included in its owned buffer list and that is free to
receive a packet. It stores the received packet in that buffer and
then modifies its data structure to mark the buffer as being
occupied.
[0049] Thus, at least some of the protocol processing that is to be
performed on the packet can be performed by the interface (43, in
this example) that received the incoming packet data. This is
especially efficient if that interface includes dedicated hardware
for performing that function. Such hardware can also be used in
protocol processing for non-bridged packets: for example packets
sent to the bridge and that are to terminate there. One example of
such a situation is when an administrator is transmitting data to
control the bridging device remotely. The interface that receives
the incoming packet data has access to both the header and the
traffic data of the packet. As a result, it can readily perform
protocol processing operations that require knowledge of the
traffic data in addition to the header data. Examples of these
operations include verifying checksum data, CRC (cyclic redundancy
check) data or bit-count data. In addition to Ethernet protocol
processing the hardware could conveniently perform TCP protocol
processing of received packets.
[0050] The application 52 runs continually. Periodically it makes a
call, which may for example be "recv( )" or "complete( )" to the
transport library 51 to initiate the protocol processing of any
Ethernet packet that is waiting in one of the buffers 55. (Step
61). The recv( )/complete( ) call does not specify any buffer. In
response to the recv( )/complete( ) call the transport library 51
checks its data structure to find whether any of the buffers 55
contain unprocessed packets. Preferably the transport library
identifies the buffer that contains the earliest-received packet
that is still unprocessed; or, if the library is capable of
prioritising certain traffic, it may bias its identification of
a packet based on that prioritisation. If an unprocessed packet has
been identified then the transport library responds to the recv(
)/complete( ) call by returning a response message to the
application (step 62), which includes: [0051] the TCP and ISCSI
headers of the identified packet, which may collectively be
considered to constitute a header or header data of the packet;
[0052] the handle of the buffer in which the identified packet is
stored; [0053] the start point within that buffer of the traffic
data block of the packet; and [0054] the length of the traffic data
block of the packet.
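The four items returned in the response message can be gathered into a single structure, sketched below. The field and function names are invented for illustration; the key point is that the headers are copied out while the traffic data is described only by handle, offset and length.

```python
from dataclasses import dataclass


@dataclass
class RecvResponse:
    """Illustrative shape of the response to recv()/complete()."""
    headers: bytes       # the TCP and ISCSI headers of the packet
    buffer_handle: int   # handle of the buffer holding the packet
    traffic_offset: int  # start point of the traffic data block
    traffic_length: int  # length of the traffic data block


def build_response(buffer_handle, stored_packet, header_len):
    """Split a stored packet (headers followed by traffic data) into the
    response fields; the traffic data itself is not copied out."""
    return RecvResponse(
        headers=stored_packet[:header_len],
        buffer_handle=buffer_handle,
        traffic_offset=header_len,
        traffic_length=len(stored_packet) - header_len,
    )
```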
[0055] By means of the headers the application can perform protocol
processing on the received packet. The other data collectively
identifies the location of the traffic data for the packet. The
response message including a buffer handle is treated by the
incoming transport library and the bridging application as handing
that buffer over to the bridging application. The incoming
transport library deletes that buffer handle from its owned buffer
list as one of the buffers 55, and the bridging application adds
the handle to its owned buffer list.
[0056] It will be noted that by this message the application has
received the header of the packet and a handle to the traffic data
of the packet. However, the traffic data itself has not been
transferred. The application can now perform protocol processing on
the header data.
[0057] The protocol processing that is to be performed by the
application may involve functions that are to be performed on the
traffic data of the packet. For example, ISCSI headers include a
CRC field, which needs to be verified over the traffic data. Since
the application does not have access to the traffic data it cannot
straightforwardly perform this processing. Several options are
available. First, the application could assume that that CRC (or
other such error-check data) is correct. This may be a useful
option if the data is delay-critical and need not anyway be
re-transmitted, or if error checking is being performed in a
lower-level protocol. Another option is for the interface to
calculate the error-check data over the relevant portion of the
received packet and to store it in the buffer together with the
packet. The error check data can then be passed to the application
in the response message detailed above, and the application can
simply verify whether that data matches the data that is included
in the header. This requires the interface to be capable of
identifying data of the relevant higher-level protocol (e.g. ISCSI)
embedded in received packets of a lower-level protocol (e.g.
Ethernet or TCP), and to be capable of executing the error-check
algorithm appropriate to that higher-level data. Thus, in this
approach the execution of the error-check algorithm is performed by
a different entity from that which carries out the remainder of the
protocol processing, and by a different entity from that which
verifies the error-check data.
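The second option just described, in which the interface computes the error-check data and the application merely compares it with the header value, can be sketched as below. The sketch uses Python's CRC-32 as a stand-in; the actual ISCSI digest is CRC32C, and the function names are invented.

```python
import zlib


def interface_store_packet(payload):
    """Interface side (sketch): compute error-check data over the traffic
    data on receipt and store it in the buffer together with the packet.
    zlib.crc32 stands in for the ISCSI CRC32C algorithm."""
    return {"payload": payload, "computed_crc": zlib.crc32(payload)}


def application_verify(entry, header_crc):
    """Application side: compare the interface-computed value against the
    CRC carried in the ISCSI header, without reading the traffic data."""
    return entry["computed_crc"] == header_crc
```

This split means the entity executing the error-check algorithm (the interface) differs from the entity verifying the result (the application), as the paragraph above notes.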
[0058] Not all of the headers of the packet as received at the
hardware interface need be passed in the response message that is
sent to the application, or even stored in the buffer. If protocol
processing for one or more protocols is performed at the interface
then the headers for those protocols can be omitted from the
response and not stored in the buffer. However, it may still be
useful for the application to receive the headers of one or more
protocols for which the application does not perform protocol
processing. One reason for this is that it provides a way of allowing
the application to calculate the outgoing route. The outgoing route
could be determined by the Fibrechannel transport library 53 making
use of system-wide route tables that could, for example, be
maintained by the operating system. The Fibrechannel transport
library 53 can look up a destination address in the route tables so
as to resolve it to the appropriate outgoing FC interface.
[0059] The application is configured in advance to perform protocol
processing on one or more protocol levels. The levels that are to
be protocol processed by the application will depend on the
bridging circumstances. The application is configured to be capable
of performing such protocol processing in accordance with the
specifications for the protocol(s) in question. In the present
example the application performs protocol processing on the ISCSI
header. (Step 63).
[0060] Having performed protocol processing on the header as
received from the incoming transport library, the application then
passes a send( ) command to the Fibrechannel transport library
(step 65). The send( ) command includes as an operand the handle of
the buffer that includes the packet in question. It may also
include data that specifies the location of the traffic data in the
buffer, for example the start point and length of the traffic data
block of the packet. The send( ) command is interpreted by the
bridging application and by the outgoing transport library as
handing over that buffer to the outgoing transport library.
Accordingly, the bridging application deletes that buffer handle
from its owned buffer list, and the outgoing transport library adds
the handle to its owned buffer list, as one of the buffers 56.
[0061] The Fibrechannel transport library then reads the header
data from that buffer (step 66) and using the header alone (i.e.
without receiving the traffic data stored in the buffer) it forms a
Fibrechannel header for onward transmission of the corresponding
traffic data (step 67).
[0062] The Fibrechannel transport library then provides that header
and the traffic data to the Fibrechannel interface, which combines
them into a packet for transmission (step 68). The header and the
traffic data could be provided to the Fibrechannel interface in a
number of ways. For example, the header could be written into the
buffer and the start location and length of the header and the
traffic data could be passed to the Fibrechannel interface.
Conveniently the header could be written to the buffer immediately
before the traffic data, so that only one set of start location and
length data needs to be transmitted. If the outgoing header or
header set is longer than the incoming header or header set this
may require the incoming interface to write the data to the buffer
in such a way as to leave sufficient free space before the traffic
data to accommodate the outgoing header. The Fibrechannel interface
could then read the data from the buffer, for example by DMA
(direct memory access). Alternatively, the header could be
transmitted to the Fibrechannel interface together with the start
location and length of the traffic data and the interface could
then read the traffic data, by means of an API call to the
transport library, and combine the two together. Alternatively,
both the header and the traffic data could be transmitted to the
Fibrechannel interface. The header and the start/length data could
be provided to the Fibrechannel interface by being written to a
queue stored in a predefined set of memory locations, which is
polled periodically by the interface.
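The headroom arrangement described above, where the incoming interface leaves free space before the traffic data so that the outgoing header can later be written contiguously in front of it, can be sketched as follows. The headroom size and function names are assumptions for illustration.

```python
OUT_HDR_MAX = 32  # assumed upper bound on the outgoing header size


def store_incoming(buf, traffic):
    """Incoming side: write traffic data leaving headroom before it, so a
    possibly larger outgoing header can later be placed contiguously."""
    start = OUT_HDR_MAX
    buf[start:start + len(traffic)] = traffic
    return start, len(traffic)


def attach_outgoing_header(buf, start, length, header):
    """Outgoing side: write the header immediately before the traffic data,
    so a single (start, length) pair describes the whole outgoing packet."""
    assert len(header) <= start  # headroom must accommodate the header
    hdr_start = start - len(header)
    buf[hdr_start:start] = header
    return hdr_start, len(header) + length
```

The outgoing interface can then read the complete packet from the buffer in one transfer, for example by DMA.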
[0063] The outgoing header might have to include calculated data,
such as CRCs, that is to be calculated as a function of the traffic
data. In this situation the header as formed by the transport
library can include space (e.g. as zero bits) for receiving that
calculated data. The outgoing hardware interface can then calculate
the calculated data and insert it into the appropriate location in
the header. This avoids the outgoing transport library having to
access the traffic data.
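The placeholder scheme of this paragraph, in which the transport library leaves zero bits for calculated data and the outgoing hardware fills them in, might look as follows. The header layout, field sizes and function names are invented for illustration, and CRC-32 stands in for whatever check the outgoing protocol requires.

```python
import struct
import zlib


def form_header_with_placeholder(dest_id):
    """Transport library side: form the outgoing header, leaving zero bits
    in the location reserved for the CRC (invented 4-byte field layout)."""
    return struct.pack(">I4s", 0, dest_id)  # 4 zero bytes reserved for CRC


def hardware_fill_crc(header, traffic):
    """Outgoing interface side: calculate the CRC over the traffic data and
    insert it into the reserved location, so the transport library never
    needs to access the traffic data."""
    crc = zlib.crc32(traffic)
    return struct.pack(">I", crc) + header[4:]
```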
[0064] Once the Fibrechannel packet has been transmitted for a
particular incoming packet the buffer in which the incoming packet
had been stored can be re-used. The Fibrechannel transport library
hands over ownership of the buffer to the Ethernet transport
library. Accordingly, the Fibrechannel transport library deletes
that buffer handle from its owned buffer list, and the Ethernet
transport library adds the handle to its owned buffer list, marking
the buffer as free for storage of an incoming packet.
[0065] As indicated above, the buffers in which the packets are
stored are implemented as anonymous buffers. When a packet is
received the buffer that is to hold that packet is owned by the
incoming hardware and/or the incoming transport library. When the
packet comes to be processed by the bridging application ownership
of the buffer is transferred to the bridging application. Then when
the packet comes to be transmitted ownership of the buffer is
transferred to the outgoing hardware and/or the outgoing transport
library. Once the packet has been transmitted ownership of the
buffer can be returned to the incoming hardware and/or the incoming
transport library. In this way the buffers can be used efficiently,
and without problems of access control. The use of anonymous
buffers avoids the need for the various entities to have to support
named buffers. This is especially significant in the case of the
incoming and outgoing hardware, since it may not be possible to
modify pre-existing hardware to support named buffers. It may also
not be economically viable to use such hardware, since it requires
significant additional complexity: namely the ability to fully
perform complex protocol processing, e.g. to support TCP and RDMA
(iWARP) protocol processing. This would in practice require a
powerful CPU to be embedded in the hardware, which would make the
hardware excessively expensive.
[0066] Once each layer of protocol processing is completed for a
packet the portion of the packet's header that relates to that
protocol is no longer required. As a result, the memory in which
that portion of header was stored can be used to store other data
structures. This will be described in more detail below.
[0067] When a packet is received the incoming hardware and/or
transport library should have one or more buffers in its ownership.
It selects one of those buffers for writing the packet to. That
buffer may include one or more other received packets, in which
case the hardware/library selects suitable free space in the buffer
for accommodating the newly received packet. Preferably it attempts
to pack the available space efficiently. There are various ways of
achieving this: one is to find a space in a buffer that most closely
matches the size of the received packet, whilst not being smaller
than the received packet. The space in the buffer may be managed by
a data structure stored in the buffer itself which provides
pointers to the start and end of the packets stored in the buffer.
If the buffer includes multiple packets then ownership of the
buffer is passed to the application when any of those is to be
protocol processed by the application. When the packet has been
transmitted the remaining packets in the buffer remain unchanged
but the data structure is updated to show the space formerly
occupied by the packet as being vacant.
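The best-fit packing heuristic mentioned above, choosing the free gap that most closely matches the packet size without being smaller than it, can be sketched as follows (the gap representation is an assumption made for illustration).

```python
def best_fit(free_gaps, size):
    """Pick the free gap that most closely matches the packet size whilst
    not being smaller than it.

    free_gaps: list of (offset, length) pairs describing vacant space in a
    buffer; returns the chosen offset, or None if no gap is large enough."""
    candidates = [(length, offset)
                  for offset, length in free_gaps
                  if length >= size]
    if not candidates:
        return None
    _, offset = min(candidates)  # smallest gap that still fits
    return offset
```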
[0068] If the TCP and ISCSI protocol processing is unsuccessful
then the traffic data of the packet may be dropped. The data need
not be deleted from the buffer: instead the anonymous buffer handle
can simply be passed back to the Ethernet transport library for
reuse.
[0069] This mechanism has the consequence that the traffic data
needs to pass only twice over the memory bus: once from the
Ethernet hardware to memory and once from memory to the
Fibrechannel hardware. It does not need to pass through the CPU; in
particular it does not need to pass through the cache of the CPU.
The same approach could be used for other protocols; it is not
limited to bridging between Ethernet and Fibrechannel.
[0070] The transport libraries and the application can run at user
level. This can improve reliability and efficiency over prior
approaches in which protocol processing is performed by the
operating system. Reliability is improved because the machine can
continue in operation even if a user-level process fails.
[0071] The transport libraries and the application are configured
programmatically so that if their ownership list does not include
the identification of a particular buffer they will not access that
buffer.
[0072] If the machine is running other applications in other
address spaces then the named buffers for one application are not
accessible to the others. This feature provides for isolation
between applications and system integrity. This is enforced by the
operating system in the normal manner of protecting applications'
memory spaces.
[0073] The received data can be delivered directly from the
hardware to the ISCSI stack, which is constituted by the Ethernet
transport library and the application operating in cooperation with
each other. This avoids the need for buffering received data on the
hardware, and for transmitting the data via the operating system as
in some prior implementations.
[0074] The trigger for the passing of data from the buffers to the
CPU is the polling of the transport library at step 61. The polling
can be triggered by an event sent by the Ethernet hardware to the
application on receipt of data, a timer controlled by the
application, by a command from a higher level process or from a
user, or in response to a condition in the bridging device such as
the CPU running out of headers to process. This approach means that
there is no need for the protocol processing to be triggered by an
interrupt when data arrives. This economises on the use of
interrupts.
[0075] The bridging device may be implemented on a conventional
personal computer or server. The hardware interfaces could be
provided as network interface cards (NICs) which could each be
peripheral devices or built into the computer. For example, the
NICs could be provided as integrated circuits on the computer's
motherboard.
[0076] When multiple packets have been received the operations in
FIG. 7 can be combined for multiple packets. For example, the
response data (at step 62) for multiple packets can be passed to
the CPU and stored in the CPU's cache awaiting processing.
[0077] There may be limitations on the size of the outgoing packets
that mean that the traffic data of an incoming packet cannot be
contained in a single outgoing packet. In that case the traffic
data can be contained in two or more outgoing packets, each of
whose headers is generated by the transport library of the outgoing
protocol.
[0078] Since the packets are written to contiguous blocks of free
space in the buffers 54, as packets get removed from the buffers 54
gaps can appear in the stream of data in the buffers. If the
received packets are of different lengths then those gaps might not
be completely filled by new received data. As a result the buffers
can become fragmented, and therefore inefficiently utilised. To
mitigate this, as soon as the header of a received packet has been
passed to the CPU for processing the space occupied by that header
can be freed up immediately. That space can be used to allow a
larger packet to be received in a gap in memory preceding that
header. Alternatively that space can be used for various data
constructs. For example, it can be used to store a linked-list data
structure that allows packets to be stored discontiguously in the
buffer. Alternatively, it could be used to store the data structure
that indicates the location of each packet and the order in which
it was received. Fragmentation may also be reduced by performing a
defragmentation operation on the content of a buffer, or by moving
packets whose headers have not been passed to the CPU for
processing from one buffer to another. One preferred defragmentation
algorithm is to check from time to time for buffers that contain
less data than a pre-set threshold level. The data in such a buffer
is moved out to another buffer, and the data structure that
indicates which packet is where is updated accordingly.
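The threshold-based defragmentation pass just described might be sketched as follows. The threshold value, the packet directory representation and the choice of destination buffer are assumptions for illustration.

```python
THRESHOLD = 64  # assumed minimum worthwhile occupancy, in bytes


def compact(buffers):
    """Sketch of the described defragmentation pass: any buffer holding
    less data than a pre-set threshold is emptied into another buffer, and
    the directory recording which packet is where is updated accordingly.

    buffers: dict mapping buffer handle -> list of stored packets.
    Returns a dict recording the new buffer handle of each moved packet."""
    moves = {}
    # Move data into the fullest buffer (one possible destination policy).
    target = max(buffers, key=lambda h: sum(len(p) for p in buffers[h]))
    for handle, packets in list(buffers.items()):
        if handle != target and sum(len(p) for p in packets) < THRESHOLD:
            for p in packets:
                buffers[target].append(p)
                moves[p] = target  # update the packet directory
            buffers[handle] = []   # buffer is now free for reuse
    return moves
```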
[0079] In a typical architecture, when the Ethernet packet headers
are read to the CPU for processing by the bridging application they
will normally be stored in a cache of the CPU. The headers will
then be marked as "dirty" data. Therefore, in normal circumstances
they would be flushed out of the cache and written back to the
buffer so as to preserve the integrity of that memory. However,
once the headers have been processed by the bridging application
they are not needed any more, and so writing them back to the
buffer is wasteful. Therefore, efficiency can be increased by
taking measures to prevent the CPU from writing the headers back to
memory. One way to achieve this is by using an instruction such as
the wbinvd (write-back invalidate) instruction, which is available on
some architectures. This instruction can be used in respect of the
header data stored in the cache to prevent the bridging application
from writing that dirty data back to the memory. The instruction
can conveniently be invoked by the bridging application on header
data that is stored in the cache when it completes processing of
that header data. At the same point, it can arrange for the space
in the buffer(s) that was occupied by that header data to be marked
as free for use, for instance by updating the data structure that
indicates the buffer contents.
[0080] The principles described above can be used for bridging in
the opposite direction: from Fibrechannel to ISCSI, and when using
other protocols. Thus the references herein to Ethernet and
Fibrechannel can be replaced by references to other incoming
and outgoing protocols respectively. They could also be used for
bridging between links that use two identical protocols. In the
apparatus of FIG. 5 the software could be configured to permit
concurrent bridging in both directions. If the protocols are
capable of being operated over a common data link then the same
interface hardware could be used to provide the interface for
incoming and for outgoing packets.
[0081] The anonymous buffer mechanism described above could be used
in applications other than bridging. In general it can be
advantageous wherever multiple devices that have their own
processing capabilities are to process data units in a buffer, and
where one of those devices is to carry out processing on only a
part of each data unit. In such situations the anonymous buffer
mechanism allows the devices, or their interfaces to the buffer, to
cooperate so that the entirety of each data unit need not pass
unnecessarily through the system. One example of such an application
is a firewall in which a network card is to provide data units to
an application that is to inspect the header of each data unit and
in dependence on that header either block or pass the data unit. In
that situation, the processor would not need to terminate a link of
the incoming protocol to perform the required processing: it could
simply inspect incoming packets, compare them with pre-stored rules
and allow them to pass only if they satisfy the rules. Another
example is a tape backup application where data is being received
by a computer over a network, written to a buffer and then passed
to a tape drive interface for storage. Another example is a billing
system for a telecommunications network, in which a network device
inspects the headers of packets in order to update billing records
for subscribers based on the amount or type of traffic passing to
or from them.
[0082] The applicant hereby discloses in isolation each individual
feature described herein and any combination of two or more such
features, to the extent that such features or combinations are
capable of being carried out based on the present specification as
a whole in the light of the common general knowledge of a person
skilled in the art, irrespective of whether such features or
combinations of features solve any problems disclosed herein, and
without limitation to the scope of the claims. The applicant
indicates that aspects of the present invention may consist of any
such individual feature or combination of features. In view of the
foregoing description it will be evident to a person skilled in the
art that various modifications may be made within the scope of the
invention.
* * * * *