U.S. patent application number 11/090734 was filed with the patent office on 2005-03-25 and published on 2005-08-04 for system and method for operating a packet buffer in an intermediate node.
Invention is credited to Kenneth M. Key, Kwok Ken Mak, and Xiaoming Sun.
Application Number | 11/090734 |
Publication Number | 20050169291 |
Family ID | 29399223 |
Filed Date | 2005-03-25 |
United States Patent Application | 20050169291 |
Kind Code | A1 |
Key, Kenneth M.; et al. | August 4, 2005 |
System and method for operating a packet buffer in an intermediate node
Abstract
A technique implements a novel high-speed high-density packet
buffer utilizing a combination of high-speed and low-speed memory
devices. The novel packet buffer is organized as a plurality of
FIFO queues where each FIFO queue is associated with a particular
input or output line. Each queue comprises a high-speed cache
portion that resides in high-speed memory and a low-speed
high-density portion that resides in low-speed high-density memory.
The high-speed cache portion contains FIFO data associated with the
head and/or tail of the novel FIFO queue. The low-speed
high-density portion contains FIFO data that is not contained in
the high-speed cache portion.
Inventors: | Key, Kenneth M. (Raleigh, NC); Mak, Kwok Ken (Chapel Hill, NC); Sun, Xiaoming (Chapel Hill, NC) |
Correspondence Address: | CESARI AND MCKENNA, LLP, 88 BLACK FALCON AVENUE, BOSTON, MA 02210, US |
Family ID: | 29399223 |
Appl. No.: | 11/090734 |
Filed: | March 25, 2005 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
11090734 | Mar 25, 2005 | |
10135603 | Apr 30, 2002 | 6892285 |
Current U.S. Class: | 370/412 |
Current CPC Class: | H04L 69/329 20130101; H04L 49/90 20130101; H04L 29/06 20130101 |
Class at Publication: | 370/412 |
International Class: | H04L 012/28 |
Claims
What is claimed is:
1-17. (canceled)
18. An intermediate node comprising: a packet buffer comprising one
or more queues wherein each queue is configured to hold one or more
packets and comprises a high-speed portion and a low-speed
portion; and a traffic manager connected to the packet buffer and
configured to enqueue and dequeue the one or more packets to and
from the one or more queues.
19. The intermediate node of claim 18 wherein the queues are FIFO
queues.
20. The intermediate node of claim 18 wherein the traffic manager
further comprises: an internal packet memory configured to hold the
one or more packets; a queue descriptor memory configured to hold
information specific to each of the queues; a queue manager
configured to manage the one or more queues; and a scheduler
configured to determine when the one or more packets are dequeued
from the one or more queues.
21-24. (canceled)
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] The present invention is related to the following co-pending
and commonly assigned U.S. patent application Ser. No.
(112025-0502) titled, Queue Cache, which was filed on even
date.
BACKGROUND OF THE INVENTION
[0002] 1. Field of the Invention
[0003] The invention relates generally to networking devices and
more specifically to caching data contained in packet buffers.
[0004] 2. Background Information
[0005] A computer network is a geographically distributed
collection of interconnected communication links for transporting
data between nodes, such as computers. Many types of computer
networks are available, with the types ranging from local area
networks (LANs) to wide area networks (WANs). The nodes typically
communicate by exchanging discrete frames or packets of data
according to pre-defined protocols, such as the Transmission
Control Protocol/Internet Protocol (TCP/IP) or the Internetwork
Packet eXchange (IPX) protocol.
[0006] The topology of a computer network can vary greatly. For
example, the topology may comprise a single LAN containing a single
intermediate node of a type such as, e.g., a hub, with end-nodes
attached to the hub. A more complex network may contain one or more
local area networks interconnected through a complex intermediate
internetwork comprising a plurality of other types of intermediate
nodes, such as switches or routers, to form a WAN. Each of these
latter intermediate nodes typically contains a central processor
that enables the intermediate node to, inter alia, route or switch
the packets of data along the interconnected links from, e.g., a
source end-node that originates the data to a destination end-node
that is designated to receive the data. Often, these intermediate
nodes employ packet buffers to temporarily hold packets that are
processed by the nodes.
[0007] Packet buffers often comprise one or more memory devices
that are arranged to form one or more First-In First-Out (FIFO)
queues where each queue is associated with a particular input or
output line. The size of each FIFO queue often depends on the rate
of the line associated with the queue, as well as the time it takes
for a packet to be processed by the intermediate node. For example,
assume an input line on an intermediate node has a line rate of 1
Gigabit per second (Gb/s) and a packet takes 250 milliseconds (ms)
to be processed by the node. The FIFO queue size can be determined
by multiplying the line rate by the processing time, thus
yielding a queue size of at least 250 megabits (Mb).
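The sizing rule above can be checked with a few lines of arithmetic. The following is a minimal sketch only; the function name fifo_size_bits is hypothetical and is not part of the described node.

```python
def fifo_size_bits(line_rate_bps: float, processing_time_s: float) -> float:
    """Minimum FIFO size: the bits that arrive while one packet is processed."""
    return line_rate_bps * processing_time_s


# Example from the text: a 1 Gb/s line and a 250 ms processing time.
size_bits = fifo_size_bits(1e9, 0.250)
print(f"{size_bits / 1e6:.0f} Mb")
```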
[0008] The line rates associated with the input or output lines
typically define the minimal required memory bandwidth of the
packet buffer needed to support those lines. Memory bandwidth is
often determined by taking the reciprocal of the "random cycle
time" (tRC) associated with the memory devices that comprise the
packet buffer and multiplying this result by the number of bits
that can be transferred to the memory devices at a time. For
example, if a packet buffer can handle 64-bit data transfers
and the memory devices that comprise the buffer have a tRC of 50
nanoseconds (ns), the memory bandwidth for the packet buffer is
1.28 Gb/s.
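As an illustration of the bandwidth calculation described above, the sketch below divides the transfer width by the random cycle time. The function name memory_bandwidth_bps is an assumption introduced for this example.

```python
def memory_bandwidth_bps(transfer_width_bits: int, t_rc_s: float) -> float:
    """Bandwidth = (1 / tRC) * bits moved per random-cycle access."""
    return transfer_width_bits / t_rc_s


# Example from the text: 64-bit transfers and a tRC of 50 ns.
bw = memory_bandwidth_bps(64, 50e-9)
print(f"{bw / 1e9:.2f} Gb/s")
```

With these inputs the result is 64 bits / 50 ns, i.e., about 1.28 Gb/s.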
[0009] A typical intermediate node may comprise many line cards
where each line card contains many ports and each port comprises an
input line and an output line. Moreover, each line may operate at a
rate of 1 Gb/s or greater. Thus, packet buffers for intermediate
nodes are often large and operate at a very-high memory bandwidth.
For example, assume an intermediate node has four ports with two
lines per port and each line operates at a rate of 1 Gb/s. Further
assume the intermediate node can process a packet in 250 ms and
that data is transferred to and from the packet buffer using 64-bit
data transfers. The memory bandwidth for the packet buffer must be
at least 8 Gb/s and the tRC for the memory devices must be 8 ns or
less. Moreover, the size of each FIFO must be at least 250 Mb
yielding an overall packet buffer size of 1 gigabit (Gb).
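The bandwidth and tRC figures in this example follow from the same relationship run in reverse. The sketch below is illustrative; both function names are hypothetical, and it assumes every line must be served at its full rate simultaneously.

```python
def required_bandwidth_bps(ports: int, lines_per_port: int,
                           line_rate_bps: float) -> float:
    """Aggregate bandwidth if every line runs at full rate at once (assumption)."""
    return ports * lines_per_port * line_rate_bps


def max_t_rc_s(transfer_width_bits: int, bandwidth_bps: float) -> float:
    """Largest random cycle time that still delivers the given bandwidth."""
    return transfer_width_bits / bandwidth_bps


# Example from the text: four ports, two lines per port, 1 Gb/s per line,
# 64-bit transfers.
bw = required_bandwidth_bps(4, 2, 1e9)
t_rc = max_t_rc_s(64, bw)
print(f"{bw / 1e9:.0f} Gb/s, tRC <= {t_rc * 1e9:.0f} ns")
```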
[0010] In order to meet the high-bandwidth requirements associated
with high-speed data communication lines, conventional packet
buffer design mandates use of solely high-speed memory devices,
such as Static Random Access Memory (SRAM), because their bandwidth
and tRC are often sufficient to meet the rigorous requirements
demanded by high speed input/output lines. However, high-speed
memory devices are often very costly and not sufficiently dense to
make them practical to be used for such an implementation.
[0011] An alternative technique for implementing a high-speed
high-density packet buffer has been described in Analysis of a
Memory Architecture for Fast Packet Buffers by S. Iyer et al. This
technique employs a combination of high speed devices arranged as a
head and tail cache and low-speed high-density devices, such as
Dynamic Random Access Memory (DRAM), arranged to hold the FIFO
queues. Moreover, the technique employs a memory management
algorithm that utilizes a look-ahead arrangement to determine the
order data is read from the low-speed devices to replenish the head
cache. However, the technique does not scale well with respect to
the number of FIFO queues and consequently may be inapplicable in
systems that contain a large number of queues, e.g., greater than
512. It would be desirable to have a technique for implementing a
high-speed high-density packet buffer that scales well to systems
that employ a large number of queues.
SUMMARY OF THE INVENTION
[0012] The present invention incorporates a technique that enables
implementation of a high-speed, high-density packet buffer
utilizing a combination of high-speed and low-speed memory devices
in addition to a cache replenishment technique that enables support
of a large number of queues. The novel packet buffer is organized
as a plurality of FIFO queues where each FIFO queue is associated
with a particular input or output line. Each queue comprises a
high-speed cache portion that resides in high-speed memory and a
low-speed, high-density portion that resides in low-speed,
high-density memory. The high-speed cache portion holds FIFO data
associated with the head and/or tail of the novel FIFO queue. The
low-speed, high-density portion holds FIFO data that is not held in
the high-speed cache portion.
[0013] Each FIFO queue is associated with a directory entry that
holds information specific to the queue. This information includes
head and tail information associated with the queue as well as
information relating to the amount of data contained in the
high-speed portion of the FIFO. The information contained in the
directory is used to determine, inter alia, when and how to
replenish the high-speed portion of the FIFO queue.
[0014] In one embodiment of the invention, the high-speed portion
of the FIFO queue holds only data associated with the head of the
queue. Data written to the tail of the queue is written to the
low-speed portion of the queue. Data is read from either the head
cache, if data is available there, or the low-speed portion if the
head cache is depleted. The head cache is refilled, as necessary,
whenever data is written to or read from the FIFO queue.
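The head-cache embodiment can be pictured with a small software model. The sketch below is a toy illustration under stated assumptions, not the hardware design: the class name HeadCachedFifo, the use of whole items rather than bytes, and the eager refill after every operation are all choices made for this example.

```python
from collections import deque


class HeadCachedFifo:
    """Toy model: a small fast head cache in front of a large slow store."""

    def __init__(self, cache_capacity: int):
        self.cache_capacity = cache_capacity
        self.head_cache = deque()  # stands in for the high-speed portion
        self.slow = deque()        # stands in for the low-speed portion

    def _refill(self) -> None:
        # Move the oldest slow-store items into the cache until it is full.
        while self.slow and len(self.head_cache) < self.cache_capacity:
            self.head_cache.append(self.slow.popleft())

    def enqueue(self, item) -> None:
        self.slow.append(item)  # tail writes always go to the slow store
        self._refill()

    def dequeue(self):
        if self.head_cache:
            item = self.head_cache.popleft()  # fast path: head cache hit
        elif self.slow:
            item = self.slow.popleft()        # fallback: cache depleted
        else:
            raise IndexError("dequeue from empty FIFO")
        self._refill()
        return item


q = HeadCachedFifo(cache_capacity=2)
for i in range(5):
    q.enqueue(i)
print([q.dequeue() for _ in range(5)])
```

Because this model refills eagerly, the fallback path is reached only if a refill is deferred; it is kept to mirror the read behavior described in the text.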
[0015] Advantageously, the inventive technique enables high-speed,
high-density packet buffers to be implemented without relying
wholly on high-speed memory devices. Rather, according to the
invention, a portion of the high-speed, high-density packet buffer
can be implemented using inexpensive low-speed, high-density
devices, such as commodity DRAMs.
BRIEF DESCRIPTION OF THE DRAWINGS
[0016] The above and further advantages of the invention may be
better understood by referring to the following description in
conjunction with the accompanying drawings in which like reference
numbers indicate identical or functionally similar elements:
[0017] FIG. 1 is a schematic block diagram of a network that can be
advantageously used with the present invention;
[0018] FIG. 2 is a partial schematic block diagram of an
intermediate node that can be advantageously used with the present
invention;
[0019] FIG. 3 is a partial schematic block diagram of a network
services engine (NSE) that can be used to implement the present
invention;
[0020] FIG. 4 is a schematic block diagram of a packet buffer that
can be used to implement the present invention;
[0021] FIG. 5 is a schematic diagram of a QID directory entry that
can be used with the present invention;
[0022] FIG. 6 is a schematic diagram of a request that can be used
with the present invention;
[0023] FIG. 7 is a high-level flow diagram of a method that can be
used to enqueue and dequeue packets to and from a FIFO queue in
accordance with the present invention;
[0024] FIGS. 8A-B are flow diagrams of a method that can be used to
implement the present invention; and
[0025] FIG. 9 is an illustration of an embodiment of the invention
that employs a head cache and a tail cache.
DETAILED DESCRIPTION OF AN ILLUSTRATIVE EMBODIMENT
[0026] FIG. 1 is a schematic block diagram of a computer network
100 that can be advantageously used with the present invention. The
computer network 100 comprises a collection of communication links
and segments connected to a plurality of nodes, such as end nodes
110 and intermediate nodes 200. The network links and segments may
comprise local area networks (LANs) 120, wide area networks (WANs)
such as Internet 270 and WAN links 130 interconnected by
intermediate nodes 200, such as network switches or routers, to
form an internetwork of computer nodes. These internetworked nodes
communicate by exchanging data packets according to a predefined
set of protocols, such as the Transmission Control
Protocol/Internet Protocol (TCP/IP) and the Internetwork Packet
eXchange (IPX) protocol.
[0027] FIG. 2 is a partial block diagram of an intermediate node
(switch) 200 that can be advantageously used with the present
invention. An illustrative example of intermediate node 200 is the
Cisco 7300 Router available from Cisco Systems, Incorporated, San
Jose, Calif. The illustrative intermediate node 200 is a compact,
mid-range router that provides high availability and high
performance and delivers high-touch IP services at optical speeds.
Intermediate node 200 supports various combinations of
communication protocols including, e.g., Asynchronous Transfer
Mode (ATM), Ethernet, Fast Ethernet, Gigabit Ethernet and
multi-channel T3. The intermediate node 200 comprises a plurality
of cards including line cards 210 and a network services engine
(NSE) card 300 interconnected by a switch fabric backplane 220.
Moreover, each card has a backplane interface 250 that, inter alia,
interfaces the card to the backplane 220 and enables the card to
send and receive various data and control signals to and from the
backplane 220.
[0028] The line cards 210 connect (interface) the intermediate
node, which may alternatively be configured as a switch 200, with
the network 100. To that end, the line cards 210 receive and
transmit data over the network through input ports 215 and output
ports 217, respectively, using various protocols such as, e.g.,
OC-3, OC-12, Fast Ethernet, T3. The line cards 210 forward data
received from the network 100 to the backplane 220, as well as
transmit data received from the backplane 220 to the network
100.
[0029] The switch fabric backplane 220 comprises logic and a
point-to-point interconnection backplane that provides an interface
between the line cards 210 and the NSE 300. That is, the backplane
220 provides interconnections between the cards that allow data and
signals to be transferred from one card to another.
[0030] The NSE 300 is adapted to provide processing of incoming and
outgoing packets. FIG. 3 is a partial block diagram of NSE 300
comprising backplane interface logic 250d, cluster interconnect
logic 320, processor 370, processor memory 360, packet processing
logic 350, port interface logic 380, one or more physical ports
385, traffic manager 330 and packet buffer 400. The backplane
interface logic 250d comprises logic that enables NSE 300 to
communicate over the backplane 220. For example, the backplane
interface logic 250d comprises logic that enables the NSE 300 to
communicate with other cards connected to the backplane 220 using
various data and control signals.
[0031] The cluster interconnect logic 320 comprises logic that
interfaces the backplane interface logic 250d and the processor 370
with the traffic manager 330. Preferably, the cluster interconnect
is embodied in a Field Programmable Gate Array (FPGA) that is
configured to enable the traffic manager 330 to transmit and
receive data to and from the backplane 220 and the processor
370.
[0032] The processor 370 comprises processing elements and logic
that are capable of executing instructions and generating memory
requests. An example of processor 370 that may be advantageously
used with the invention is the BCM1250 processor available from
Broadcom Corporation, Irvine, Calif. The processor memory 360 is a
computer readable medium that holds data and software routines
containing executable instructions. These data and software
routines enable (adapt) the processor 370 to perform various
functions, such as managing switch 200, as well as route
processing. The processor memory 360 may comprise one or more
memory devices (not shown) that are capable of storing executable
instructions and data. Preferably, these memory devices are
industry standard memory devices, such as Dynamic Random Access
Memory (DRAM) devices available from Micron Technology, Inc.,
Boise, Id.
[0033] The port interface logic 380 comprises logic that interfaces
the traffic manager 330 to the physical ports 385. To that end, the
port interface logic 380 includes logic that enables the traffic
manager 330 to transmit and receive packets to and from the ports
385, respectively. The ports 385 comprise logic that physically
interfaces the NSE 300 to the network 100.
[0034] The packet processing logic 350 comprises logic and
processing elements that, inter alia, classify a packet that has
been received by NSE 300. The packet processing logic 350 includes
logic that is configured to examine packet headers of received
packets and associate each packet with a FIFO queue contained in
the packet buffer 400. Preferably, the packet processing logic 350
is embodied in a series of Application Specific Integrated Circuits
(ASICs).
[0035] The traffic manager 330 comprises logic and memory elements
that are configured to, inter alia, enqueue and dequeue packets to
and from FIFO queues 405 contained in the packet buffer 400.
Moreover, the traffic manager 330 is configured to allocate and
deallocate blocks of memory contained in the packet buffer's
external DRAM and issue commands to the packet buffer 400 to direct
that buffer 400 to write and read packets to and from the FIFO
queues 405. To that end, traffic manager 330 includes an internal
packet memory (IPM) 332, a queue descriptor memory (QDM) 338, a
scheduler (SCH) 336 and a queue manager (QM) 334. The internal
packet memory 332 contains logic and memory elements that are used
to temporarily hold packets received from the switch fabric
backplane 220 and the physical ports 385. The queue descriptor
memory 338 holds information specific to each of the queues 405.
This information includes pointers to the head and tail of each
queue 405, as well as the size of each queue. The scheduler 336
contains logic and processing elements that perform traffic
management and shaping of traffic transmitted by the NSE 300 over
the switch fabric backplane 220 and ports 385. The queue manager
334 comprises logic and processing elements that, inter alia,
manage each of the FIFO queues 405 contained in the packet buffer
400. Preferably, traffic manager 330 is configured to support 8192
FIFO queues.
[0036] Packet buffer 400 comprises logic and memory elements that
enable packets to be written and read to and from the FIFO queues
405 in accordance with the present invention. Each FIFO queue 405
comprises a low-speed portion 492 (FIG. 4) and a high-speed portion
482. Moreover, each queue is associated with a queue identifier
(QID). Preferably, packet buffer 400 is configured to support 8192
FIFO queues.
[0037] Operationally, incoming packets are received from the
network 100 by the source line cards 210 and sent over the switch
fabric backplane 220 to NSE 300 where they are received by the
backplane interface logic 250d and transferred through the cluster
interconnect logic 320 to the traffic manager 330 for further
processing. Alternatively, packets are received from the network
100 by the physical ports 385 and transferred to the traffic
manager 330 through the port interface logic 380. The traffic
manager 330 stores each packet in the internal packet memory 332
and notifies the packet processing logic 350. The packet processing
logic 350 examines the packet's header, selects a FIFO queue 405
that is to receive the packet and conveys the FIFO queue's QID to
the queue manager 334. The queue manager 334 allocates, as
necessary, one or more blocks of the DRAM 490 (FIG. 4), associates
the allocated blocks with the QID and issues a series of commands
to the packet buffer 400 to place the packet in the allocated DRAM.
The queue manager 334 then notifies the scheduler 336 to schedule
the packet for dequeuing. When the packet is to be dequeued, the
scheduler 336 notifies the queue manager 334. The queue manager
334, in turn, issues a series of commands to the packet buffer 400
to dequeue the packet. The dequeued packet is then processed and
sent out onto the backplane 220 or one or more physical ports
385.
[0038] The present invention comprises a technique that enables the
implementation of a high-speed, high-density packet buffer
utilizing a combination of high-speed and low-speed memory devices.
The novel packet buffer is organized as a plurality of FIFO queues
where each FIFO queue is associated with a particular input or
output line. Each queue comprises a high-speed cache portion that
resides in a high-speed memory and a low-speed, high-density
portion that resides in a low-speed high-density memory. The
high-speed cache portion contains FIFO data associated with the
head and/or the tail of the novel FIFO queue. The low-speed,
high-density portion contains FIFO data that is not contained in
the high-speed cache portion.
[0039] FIG. 4 is a detailed partial schematic block diagram of
packet buffer 400 that can be used to implement the present
invention. Packet buffer 400 comprises interface logic 410, command
decode logic 420, read request queue logic 430, queue identifier
(QID) directory 460, queue head (Qhead) cache 480, write packet
queue logic 450, DRAM controller 470 and DRAM 490. Preferably, the
interface logic 410, command decode logic 420, read request queue
430, QID directory 460, Qhead cache 480, write packet queue 450 and
DRAM controller 470 are embodied in one or more ASICs.
[0040] The interface logic 410 comprises logic that is configured
to interface packet buffer 400 to the traffic manager 330. To that
end, the interface logic 410 generates the necessary data and
control signals that enable requests and data to be transferred
between the traffic manager 330 and packet buffer 400. The command
decode logic 420 is connected to the interface logic 410, the read
request queue logic 430 and the write packet queue logic 450, and
comprises logic configured to process requests received at
interface logic 410. If a read command is specified in the request,
the logic 420 forwards the request to the read request queue logic
430; otherwise if a write command is specified, the logic 420
forwards the request and data to the write packet queue logic
450.
[0041] The read request queue 430 comprises logic and memory
elements that are configured to hold and process read commands
received by the command decode logic 420. The read request queue
430 comprises a FIFO command queue 432 and command decode logic 434
that is configured to process a command when it reaches the head of
the command queue 432. Preferably, the FIFO command queue 432 is a
32-entry by 72-bit FIFO queue that is configured to hold up to 32
read requests.
[0042] Similarly, the write packet queue 450 comprises logic and
memory elements that are configured to hold and process write
commands and data received by the command decode logic 420. The
write packet queue 450 comprises a FIFO command queue 452, command
decode logic 454 that is configured to process a command when it
reaches the head of the command queue 452 and a write packet buffer
456 that is configured to hold data associated with the commands in
the command queue 452. Preferably, the write packet queue 450 is a
16-entry queue that is configured to hold up to 16 write
requests.
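The routing of requests to the read request queue and write packet queue can be sketched as follows. This is a simplified software analogy: the names Request and CommandDecoder, the string command encoding, and the behavior when a queue is full are assumptions, since the text does not describe them. Only the queue depths (32 and 16) come from the description.

```python
from collections import deque
from dataclasses import dataclass

READ_QUEUE_DEPTH = 32   # per the text: up to 32 read requests
WRITE_QUEUE_DEPTH = 16  # per the text: up to 16 write requests


@dataclass
class Request:
    command: str        # "read" or "write"; string encoding is an assumption
    data: bytes = b""   # payload accompanying a write


class CommandDecoder:
    def __init__(self):
        self.read_requests = deque()
        self.write_requests = deque()

    def dispatch(self, req: Request) -> None:
        if req.command == "read":
            queue, depth = self.read_requests, READ_QUEUE_DEPTH
        elif req.command == "write":
            queue, depth = self.write_requests, WRITE_QUEUE_DEPTH
        else:
            raise ValueError(f"unknown command: {req.command!r}")
        if len(queue) >= depth:
            # Full-queue handling is not described; raising is an assumption.
            raise OverflowError("command queue full")
        queue.append(req)


decoder = CommandDecoder()
decoder.dispatch(Request("write", b"packet bytes"))
decoder.dispatch(Request("read"))
print(len(decoder.read_requests), len(decoder.write_requests))
```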
[0043] DRAM controller 470 comprises logic that is configured to
process requests issued by the write packet queue 450 and QID
directory logic 460. To that end, the DRAM controller 470 contains
logic that decodes requests and transfers data associated with
those requests, preferably as 32-byte blocks, to and from the
low-speed portion 492 of the FIFO queues 405 contained in DRAM
490.
[0044] DRAM 490 comprises logic and memory elements that are
configured to hold the low-speed portion 492 of each FIFO queue
405, preferably as a series of one or more 4096-byte blocks.
Preferably, the memory elements comprise high-density commodity
memory devices capable of holding packet data, such as Reduced
Latency DRAM (RLDRAM) devices available from Infineon Technologies
Corporation, San Jose, Calif.
[0045] The Qhead cache 480 comprises logic and memory elements
configured to hold the high-speed portion 482 of FIFO queue 405,
which in the preferred embodiment, is the first 1024 bytes at the
head of the queue. Preferably, Qhead cache 480 comprises
high-speed, high-bandwidth embedded memory macros or external
memory devices, such as Static Random Access Memory (SRAM) devices
available from Micron Technology, Inc., Boise, Id.
[0046] The QID directory 460 comprises logic and memory elements
that are configured to, inter alia, issue requests to the DRAM
controller 470 and hold information specific to the FIFO queues
405. The QID directory 460 issues read requests to DRAM controller
470 to refill the Qhead cache 480 as directed by the read request
queue 430 and write packet queue 450. Moreover QID directory 460
comprises a database 462 that is configured to hold information
specific to the FIFO queues 405. Preferably, database 462 is
organized as a series of 8192 entries where each entry is
associated with a particular FIFO queue 405.
[0047] FIG. 5 is a schematic diagram of a typical entry 500
contained in the QID directory database 462. Entry 500 comprises a
parity field 510, a new entry field 520, a Qtail offset field 530,
a Qtail block address field 540, a Qhead end offset field 550, a
Qhead start offset field 560 and a Qhead block address field 570.
The parity field 510 contains a one-bit value that represents the
overall parity of the entry 500. The new entry field 520 comprises
a one-bit flag field that indicates whether or not the queue is
enqueuing data for the first time.
[0048] The Qtail block address field 540 holds a pointer, which is
the address of the 4096-byte block in DRAM 490 that is
associated with the tail of queue 405, and the Qtail offset field 530
holds the byte offset within the 4096-byte block pointed to by the
Qtail block address field 540. Together, the Qtail block address 540
and Qtail offset 530 yield the queue tail pointer 535, which is a
pointer to the byte address of the tail of queue 405.
[0049] The Qhead block address field 570 holds a pointer, which is
the address of the 4096-byte block in DRAM 490 that is associated
with the queue's head. The Qhead start offset field 560 holds the
byte offset within the 4096-byte block "pointed to" (referenced) by
the Qhead block address field 570. Collectively, the Qhead block
address 570 and the Qhead start offset 560 yield the queue head
pointer 565, which is a pointer to the byte address of the queue's
head.
[0050] The Qhead end offset field 550 holds the byte offset of the
last byte within the block pointed to by the Qhead block address
field 570 that is contained in the high-speed portion 482 of the
queue 405. The difference between the Qhead start offset and the
Qhead end offset yields the number of bytes currently in the
high-speed portion of the associated FIFO queue 405.
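The fields of entry 500 and the values derived from them can be summarized in a small data structure. The sketch below is illustrative only: the class and attribute names are hypothetical, the parity field is omitted, and it assumes that each block address is the byte address of the start of a block, so that a pointer is the block address plus the offset. The text does not state how the two fields are combined.

```python
from dataclasses import dataclass


@dataclass
class QidEntry:
    new_entry: bool          # queue is enqueuing data for the first time
    qtail_block_addr: int    # byte address of the block holding the tail (assumed)
    qtail_offset: int        # byte offset of the tail within that block
    qhead_block_addr: int    # byte address of the block holding the head (assumed)
    qhead_start_offset: int  # byte offset of the head within that block
    qhead_end_offset: int    # end of the cached data within that block

    def tail_pointer(self) -> int:
        return self.qtail_block_addr + self.qtail_offset

    def head_pointer(self) -> int:
        return self.qhead_block_addr + self.qhead_start_offset

    def cached_bytes(self) -> int:
        # Per the text: end offset minus start offset gives the cached byte count.
        return self.qhead_end_offset - self.qhead_start_offset


entry = QidEntry(False, 0x2000, 2048, 0x2000, 0, 1024)
print(hex(entry.head_pointer()), hex(entry.tail_pointer()), entry.cached_bytes())
```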
[0051] Suppose, for example, that traffic manager 330 needs to
place (enqueue) a 2048-byte packet located in the internal packet
memory 332 into FIFO queue 405a and dequeue that packet at
some later time as determined by the scheduler 336. Further suppose
that the packet processing logic 350 has examined the packet and
provided the traffic manager 330 with a QID associated with FIFO
queue 405a and that the new entry field 520 in the QID entry 500
associated with queue 405a indicates the queue is enqueuing data
for the first time. FIG. 7 is a high-level flow diagram of a
sequence of steps traffic manager 330 can use to enqueue the packet
to FIFO queue 405, schedule the packet for dequeuing and dequeue
the packet. The sequence begins at Step 702 and proceeds to Step
704 where queue manager 334 allocates a 4096-byte block in DRAM 490
and updates the queue tail pointer and queue size in the queue
descriptor memory entry for queue 405a to reflect the packet. At Step
706, queue manager 334 generates a request 600 to transfer the
packet data from the internal packet memory 332 to queue 405a.
[0052] FIG. 6 is a schematic diagram of request 600 that can be
used with the present invention. Request 600 comprises a command
field 610, an address field 630, a transfer size field 650 and a
queue identifier (QID) field 660. The command field 610 specifies
the operation to be performed by the packet buffer 400, e.g., read
or write data. The address field 630 specifies the location in the
DRAM where the data is to be read or written. Preferably, this
field is a 28-bit field that specifies a 32-byte block address. The
transfer size field 650 specifies the amount of data to be
transferred. Preferably, this field is a 7-bit field that specifies
the number of 32-byte blocks to be transferred. The QID field 660
specifies the FIFO queue 405.
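The field widths given for request 600 imply simple range limits, which the sketch below makes explicit. The names Request600 and blocks_for are hypothetical, the string command encoding is an assumption, and rounding a byte count up to whole 32-byte blocks is an assumption not stated in the text. The QID limit of 8192 reflects the preferred queue count rather than a stated field width.

```python
from dataclasses import dataclass

TRANSFER_UNIT = 32  # bytes per block, per the text
ADDRESS_BITS = 28   # address field width, per the text
SIZE_BITS = 7       # transfer size field width, per the text
NUM_QUEUES = 8192   # preferred number of FIFO queues, per the text


@dataclass(frozen=True)
class Request600:
    command: str          # "read" or "write" (encoding is an assumption)
    block_address: int    # DRAM address in 32-byte block units
    transfer_blocks: int  # number of 32-byte blocks to transfer
    qid: int              # queue identifier

    def __post_init__(self):
        if not 0 <= self.block_address < (1 << ADDRESS_BITS):
            raise ValueError("address does not fit in 28 bits")
        if not 0 <= self.transfer_blocks < (1 << SIZE_BITS):
            raise ValueError("transfer size does not fit in 7 bits")
        if not 0 <= self.qid < NUM_QUEUES:
            raise ValueError("QID out of range")


def blocks_for(num_bytes: int) -> int:
    """Number of 32-byte blocks needed to cover num_bytes (rounds up)."""
    return -(-num_bytes // TRANSFER_UNIT)


# A 2048-byte packet occupies 64 blocks of 32 bytes.
req = Request600("write", block_address=0, transfer_blocks=blocks_for(2048), qid=5)
print(req.transfer_blocks)
```

Under these widths, a single request can move at most 127 blocks (4064 bytes).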
[0053] Traffic manager 330 generates request 600 specifying a write
command in the command field 610, the address associated with the
allocated block in the address field 630, the size of the packet in
the transfer size field 650 and the QID for queue 405a in the QID
field 660. Traffic manager 330 then sends request 600 along with
the data to packet buffer 400, as indicated at Step 708.
[0054] FIGS. 8A and 8B are flow diagrams illustrating a sequence of
steps that enables a packet buffer, such as packet buffer 400, to
be implemented as a high-speed, high-density packet buffer using
high-speed and low-speed memory devices in accordance with the
present invention. The sequence begins at Step 802 and progresses
to Step 804 where request 600 and the associated data are received
at interface logic 410 and transferred to the command decode logic
420. Next at Step 806, the command decode logic examines the
command field 610 and determines if a read command is specified. As
indicated above, command field 610 specifies a write command, so
the sequence proceeds to Step 808 where the request including the
data is transferred to the write packet queue 450 and the data is
placed in the buffer 456.
[0055] At Step 810, the write packet queue 450 directs DRAM
controller 470 to write the data associated with request 600
contained in buffer 456 into the low-speed portion 492a of queue
405a at the location specified by address 630. Next at Step 812,
the write packet queue 450 updates the QID directory entry 500 for
queue 405a to reflect the packet that has just been added to the
queue 405a.
[0056] At Steps 813-814, write packet queue 450 examines the QID
directory entry 500 for queue 405a and determines if the high-speed
portion of the queue (hereinafter "Qhead cache 482a") needs to be
refilled. Specifically, the new entry field 520 is examined and if
the field 520 indicates the queue 405a is enqueuing data for the
first time, write packet queue 450 concludes that the cache 482a
needs to be refilled, clears the new entry field 520 to indicate the
queue is no longer enqueuing data for the first time and proceeds to
Step 816.
Otherwise, the difference between the Qhead end offset field 550
and Qhead start offset field 560 is calculated. If the difference
is less than the size of cache 482a, i.e., 1024 bytes, the write
packet queue 450 likewise concludes that cache 482a needs to be
refilled and proceeds to Step 816. Otherwise, cache 482a does not
need to be refilled, i.e., the cache is full, and the sequence
proceeds to Step 880 where the sequence ends.
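The refill decision of Steps 813-814 reduces to a short predicate. The sketch below is a minimal illustration; the function name needs_refill is hypothetical, and clearing the new entry flag is left to the caller.

```python
QHEAD_CACHE_SIZE = 1024  # bytes, per the preferred embodiment


def needs_refill(new_entry: bool, qhead_start_offset: int,
                 qhead_end_offset: int) -> bool:
    """Return True if the Qhead cache should be refilled."""
    if new_entry:
        # First enqueue for this queue: the cache is necessarily empty.
        return True
    cached = qhead_end_offset - qhead_start_offset
    return cached < QHEAD_CACHE_SIZE


print(needs_refill(True, 0, 0))      # first enqueue
print(needs_refill(False, 0, 512))   # cache partly filled
print(needs_refill(False, 0, 1024))  # cache full
```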
[0057] Since the new entry field indicates the queue is enqueuing
data for the first time, the new entry field is cleared and the
sequence proceeds to Step 816 where write packet queue 450
determines if the block address designated by Qhead block address
570 is the same as the block address designated by Qtail block
address 540, i.e., queue 405a is contained in a single block. If
queue 405a is not contained in a single block, the
sequence proceeds to Step 820 where the Qhead cache 482a is filled
until either the cache 482a is full or the end of the block pointed
to by Qhead block address 570 is reached. Assuming queue 405a is
contained in a single block, the sequence proceeds to Step 818
where cache 482a is refilled from the data contained in the
low-speed portion 492a of the queue 405a until cache 482a is filled
or the queue's tail address is reached. At Step 822, write packet
queue 450 updates Qhead end offset 550 to reflect the amount of
data placed in the Qhead cache 482a. The sequence then proceeds to
Step 880 where the sequence ends.
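The two refill limits of Steps 818 and 820 can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the described embodiment: addresses are assumed to be byte addresses, block_end is assumed to be the address one past the last byte of the head block, and the function and parameter names are hypothetical.

```python
def refill_length(cache_free: int, refill_addr: int, tail_addr: int,
                  block_end: int, same_block: bool) -> int:
    """Number of bytes to copy from the low-speed portion into the Qhead cache.

    cache_free  -- free space remaining in the Qhead cache, in bytes
    refill_addr -- address of the first byte not yet cached (assumed)
    tail_addr   -- the queue's tail address
    block_end   -- address one past the end of the head block (assumed)
    same_block  -- True when the Qhead and Qtail block addresses match
    """
    if same_block:
        # Step 818: stop at a full cache or at the queue's tail address.
        limit = tail_addr - refill_addr
    else:
        # Step 820: stop at a full cache or at the end of the head block.
        limit = block_end - refill_addr
    return min(cache_free, limit)
```

The returned length is the amount by which the Qhead end offset 550 would be advanced at Step 822 in this sketch.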
[0058] Referring again to FIG. 7, at Step 712 queue manager 334
schedules the packet to be dequeued at some later time with
scheduler 336. When scheduler 336 determines the packet is ready to
be dequeued, it notifies the queue manager 334, as indicated at
Step 714. The queue manager 334, at Step 716, then generates a
request 600, in a manner as described above, to read the packet
from the FIFO queue 405a. More specifically, queue manager 334
reads the entry associated with queue 405a from the queue
descriptor memory 338 and, using this information, generates
request 600 specifying a read command in the command field 610, the
address of the head of the queue in the address field 630, the size
of the packet, i.e., 2048 bytes, in the transfer size field 650 and
the QID associated with queue 405a in the QID field 660. Request
600 is then sent to packet buffer 400, as indicated at Step
718.
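The read request generated at Step 716 can be modeled as a simple record. The sketch below is illustrative only: the text does not specify field widths or encodings for request 600, so the Request class, the string value "read" and the function make_read_request are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical model of request 600 (encodings are assumptions)."""
    command: str        # command field 610
    address: int        # address field 630
    transfer_size: int  # transfer size field 650, in bytes
    qid: int            # QID field 660


def make_read_request(head_address: int, packet_size: int, qid: int) -> Request:
    """Build a read request for the packet at the head of a queue."""
    return Request(command="read", address=head_address,
                   transfer_size=packet_size, qid=qid)
```

For the 2048-byte packet of the example, make_read_request(head_address, 2048, qid) yields a request whose transfer size field holds 2048.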
[0059] At Step 720, the packet buffer 400 reads the data from queue
405a. More specifically, referring again to FIG. 8A, at Step 804
interface logic 410 receives request 600 and transfers the request
to the command decode logic 420 where the request is examined to
determine if the request specifies a read command, as indicated at
Step 806. Since the request contains a read command, the sequence
proceeds to Step 807 where the request is forwarded to the read
request queue 430. The sequence then proceeds to Step 840 (FIG. 8B)
where the read request queue 430 examines the QID directory entry
500 for queue 405a and determines if the Qhead cache 482a is empty,
i.e., contains no data. Preferably, this determination is made by
calculating the difference between Qhead end offset 550 and Qhead
start offset 560 for queue 405a and, if the difference is zero,
concluding the Qhead cache 482a is empty. If cache 482a is empty,
the sequence proceeds to Step 844 where read request queue 430
directs the DRAM controller 470 to acquire (read) the packet from
the low-speed portion 492a of the queue 405a and proceeds to Step
852.
[0060] As described above, cache 482a is not empty, so the sequence
proceeds to Step 846 where read request queue 430 directs the QID
directory 460 to acquire (read) the data from the Qhead cache 482a.
Next at Step 848, read request queue 430 determines if the data
read from Qhead cache 482a is sufficient to satisfy the request,
i.e., the amount of data read from the Qhead cache 482a equals the
amount of data specified in the transfer size field 650 of the
request 600. If so, the sequence proceeds to Step 852.
[0061] Since Qhead cache 482a provided only 1024 bytes of data,
which is less than the size specified in the transfer size field
650, the sequence proceeds to Step 850 where read request queue 430
directs the DRAM controller 470 to read the remaining data, i.e.,
1024 bytes, from the low-speed portion 492a of FIFO queue 405a.
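The division of a read between the Qhead cache and the low-speed portion (Steps 840 through 850) can be sketched as follows. This is a minimal illustration, not the implementation of the described embodiment, and the function name plan_read and its parameters are hypothetical.

```python
from typing import Tuple


def plan_read(cached_bytes: int, transfer_size: int) -> Tuple[int, int]:
    """Return (bytes_from_cache, bytes_from_low_speed) for one read request."""
    if cached_bytes == 0:
        # Step 844: the cache is empty, so the whole packet is read
        # from the low-speed portion.
        return 0, transfer_size
    # Step 846: read what the cache holds, up to the requested size.
    from_cache = min(cached_bytes, transfer_size)
    # Steps 848-850: any remainder is read from the low-speed portion.
    return from_cache, transfer_size - from_cache
```

With the values of the example, a 1024-byte cache and a 2048-byte transfer size, the sketch returns 1024 bytes from the cache and 1024 bytes from the low-speed portion.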
[0062] At Step 852, read request queue 430 updates the QID
directory entry 500 associated with queue 405a by updating the
Qhead start offset 560 field to reflect the amount of data that has
been read from queue 405a.
[0063] Next at Step 856, read request queue 430 determines if there
is data in FIFO queue 405a to refill the Qhead cache 482a. More
specifically, read request queue 430 calculates the queue's head
pointer by adding the Qhead block address 570 and the Qhead start
offset 560, calculates the queue's tail pointer by adding the Qtail
block address 540 and Qtail offset 530, and compares the head
pointer to the tail pointer to determine if they are equal. If
they are not equal, i.e., there is data in the queue 405a, the
sequence proceeds to Step 858 where a check is performed to
determine if the Qhead block
address 570 and Qtail block address 540 point to the same block. If
so, read request queue 430 directs the QID directory 460 to direct
the DRAM controller 470 to refill the Qhead cache 482a from the
DRAM 490 until the cache 482a is full or the queue's tail address
is reached, as indicated at Step 860. Otherwise, if the Qhead block
address 570 and the Qtail block address 540 do not point to the
same block, the DRAM controller is directed to refill cache 482a
until the cache 482a is full or the end of the block is reached, as
indicated at Step 862. The sequence then proceeds to Step 880 where
the sequence ends. In the present example, since all of the data
contained in queue 405a
was read and the queue's head and tail pointers are equal, read
request queue 430 concludes there is no data in queue 405a to
refill cache 482a and thus proceeds to Step 880 where the sequence
ends.
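The pointer comparison of Step 856 and the block check of Step 858 can be sketched as follows. This is an illustration only: the function name, the parameter names and the returned labels are hypothetical, and the block addresses and offsets are assumed to be in the same units so that they can be added.

```python
def refill_action(qhead_block: int, qhead_start: int,
                  qtail_block: int, qtail_offset: int) -> str:
    """Choose the refill behavior after a read (Steps 856-862)."""
    # Step 856: form the head and tail pointers and compare them.
    head = qhead_block + qhead_start    # fields 570 and 560
    tail = qtail_block + qtail_offset   # fields 540 and 530
    if head == tail:
        # No data remains in the queue, so nothing is refilled (Step 880).
        return "none"
    # Step 858: are the head and tail in the same block?
    if qhead_block == qtail_block:
        return "until_full_or_tail"        # Step 860
    return "until_full_or_block_end"       # Step 862
```

For the example above, in which the entire 2048-byte packet has been read, the head and tail pointers are equal and the sketch returns "none".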
[0064] Referring once again to FIG. 7, at Step 722 queue manager
334 examines queue 405a's entry in the queue descriptor memory
338 and determines that all of the data in the allocated block has
been read, i.e., the block is no longer in use, and thus
deallocates the block. The sequence then ends at Step 724.
[0065] It should be noted that in the above-described embodiment of
the invention the high-speed portion of the FIFO queue comprises
only a queue head cache; however, this is not a limitation of the
invention. Rather, in other embodiments of the invention, the
high-speed portion comprises other combinations of queue head and
queue tail caches. For example, FIG. 9 is an illustration of a FIFO
queue 900 that employs both a queue head cache and a queue tail
cache. Queue 900 comprises a high-speed portion 970 comprising tail
cache 915 and head cache 955 and a low-speed portion 925. The head
cache 955 comprises entries 951, 952 and 953 and the tail cache
comprises queue entries 911, 912 and 913. The low-speed portion
comprises a plurality of entries including a first entry 923 and a
last entry 921. Data enters FIFO queue 900 at 910 and is written to
the tail cache 915. When the tail cache 915 becomes full, data is
written to the low-speed portion 925 following path 920, in a
manner as described above. Data is read from queue 900, also in a
manner as described above, first from the head cache 955, then from
either the low-speed portion 925 following path 940, or the tail
cache 915 following path 930 if the tail pointer of the head cache
955 equals the head pointer of the tail cache 915. Likewise, the
head cache 955 is refilled, in a manner as described above, either
from the low-speed portion 925 following path 940, or the tail
cache 915 following path 930 if the tail pointer of the head cache
955 equals the head pointer of the tail cache 915.
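The data paths of FIG. 9 can be modeled in software as three chained FIFO segments. The sketch below is illustrative only and rests on assumptions not stated in the text: each cache is assumed to hold three entries (mirroring entries 951-953 and 911-913), a full tail cache is assumed to spill its oldest entry to the low-speed portion, and the condition "the tail pointer of the head cache equals the head pointer of the tail cache" is modeled as the low-speed portion being empty. The class and method names are hypothetical.

```python
from collections import deque


class TwoCacheFifo:
    """Hypothetical model of FIFO queue 900 with head and tail caches."""

    def __init__(self, cache_entries: int = 3):
        self.cache_entries = cache_entries
        self.head_cache = deque()  # head cache 955
        self.low_speed = deque()   # low-speed portion 925
        self.tail_cache = deque()  # tail cache 915

    def enqueue(self, item):
        """Data enters at 910 and is written to the tail cache."""
        if len(self.tail_cache) == self.cache_entries:
            # Tail cache full: move its oldest entry to the low-speed
            # portion (path 920) to make room.
            self.low_speed.append(self.tail_cache.popleft())
        self.tail_cache.append(item)

    def _refill_head(self):
        while len(self.head_cache) < self.cache_entries:
            if self.low_speed:
                # Path 940: refill from the low-speed portion.
                self.head_cache.append(self.low_speed.popleft())
            elif self.tail_cache:
                # Path 930: low-speed portion empty, take from the tail cache.
                self.head_cache.append(self.tail_cache.popleft())
            else:
                break

    def dequeue(self):
        """Read the oldest entry; raises IndexError when the queue is empty."""
        if not self.head_cache:
            self._refill_head()
        item = self.head_cache.popleft()
        self._refill_head()
        return item
```

Because entries move only from the tail cache to the low-speed portion to the head cache, first-in first-out order is preserved in this model regardless of how enqueue and dequeue operations are interleaved.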
[0066] It should be noted that certain methods employed by the
above-described novel technique may be implemented in whole or in
part using computer readable and executable instructions that are
stored in a computer readable medium, such as DRAM, and executed on
hardware containing processing elements capable of executing
instructions, such as a processor. For example, in one embodiment
of the invention, the methods performed by the traffic manager, the
command decode logic, the read request queue, the write packet
queue and the QID directory are implemented as a series of software
routines that are stored in processor memory 360 and executed by
processor 370.
[0067] In summary, the present invention incorporates a technique
that enables the implementation of a high-speed high-density packet
buffer without incurring the cost associated with wholly
implementing the buffer using only high-speed memory devices. It
will be apparent, however, that other variations and modifications
may be made to the described embodiments, with the attainment of
some or all of their advantages. Therefore, it is an object of the
appended claims to cover all such variations and modifications as
come within the true spirit and scope of the invention.
* * * * *