U.S. patent application number 10/463217 was filed with the patent office on 2003-06-16 and published on 2004-12-16 for processing a data packet.
Invention is credited to Adiletta, Matthew J., Hooper, Donald F., Kushlis, Robert J., Rosenbluth, Mark B., Wilkinson, Hugh M. III, Wolrich, Gilbert.
Application Number | 20040252686 10/463217 |
Document ID | / |
Family ID | 33511538 |
Filed Date | 2003-06-16 |
United States Patent
Application |
20040252686 |
Kind Code |
A1 |
Hooper, Donald F.; et al. |
December 16, 2004 |
Processing a data packet
Abstract
A device and method for processing a data packet at a device are
described. The device receives data packets and determines
available memory in one or more of local memories of a plurality of
execution threads. The device stores packet information in an
available one of the local memories of the execution threads.
Inventors: |
Hooper, Donald F. (Shrewsbury, MA);
Rosenbluth, Mark B. (Uxbridge, MA);
Wolrich, Gilbert (Framingham, MA);
Adiletta, Matthew J. (Bolton, MA);
Wilkinson, Hugh M. III (Newton, MA);
Kushlis, Robert J. (Worcester, MA) |
Correspondence
Address: |
FISH & RICHARDSON, PC
12390 EL CAMINO REAL
SAN DIEGO
CA
92130-2081
US
|
Family ID: |
33511538 |
Appl. No.: |
10/463217 |
Filed: |
June 16, 2003 |
Current U.S.
Class: |
370/389 ;
712/E9.053 |
Current CPC
Class: |
H04L 49/901 20130101;
H04L 49/9094 20130101; H04L 49/90 20130101; H04L 49/9031 20130101;
G06F 9/3851 20130101; H04L 49/9057 20130101 |
Class at
Publication: |
370/389 |
International
Class: |
H04L 012/28 |
Claims
What is claimed is:
1. A method of processing a data packet at a device, the method
comprising: receiving a data packet, determining available memory
in one or more of local memories of a plurality of execution
threads, and storing packet information in an available one of the
local memories of the execution threads.
2. The method of claim 1 wherein determining further comprises:
determining available memory in a dispatcher memory, and storing
further comprises: storing the packet information in the dispatcher
memory if the local memories of execution threads are
unavailable.
3. The method of claim 2 wherein determining further comprises:
determining available memory in a shared memory, and storing
further comprises: storing the packet information in the shared
memory if there is no available memory in the dispatcher
memory.
4. The method of claim 3 wherein storing further comprising:
storing the packet information in RAM if there is no available
memory in the shared memory.
5. The method of claim 2 wherein storing further comprising:
storing the packet information in the dispatcher memory if the
packet requires additional reassembly.
6. The method of claim 3 wherein storing further comprising:
storing the packet information in the shared memory if packet
requires additional reassembly.
7. The method of claim 3 wherein storing further comprising:
storing the packet information in RAM if the packet requires
additional reassembly.
8. The method of claim 1 wherein the data packet is received into a
receiver buffer.
9. The method of claim 2 wherein determining further comprises:
determining available memory in a dispatcher backup memory, and
storing further comprises: storing the packet information in the
dispatcher backup memory if no dispatcher memory is
available.
10. A computer program product, disposed on a computer readable
medium, for processing a data packet at a device, the program
comprising instructions for causing a processor to: receive a data
packet, determine available memory in one or more of local memories
of a plurality of execution threads, and store packet information
in an available one of the local memories of the execution
threads.
11. The program of claim 10 wherein instructions for causing a
processor to determine further comprise instructions for causing a
processor to: determine available memory in a dispatcher memory,
and instructions to store further comprises to: store the packet
information in the dispatcher memory if the local memories of
execution threads are unavailable.
12. The program of claim 11 wherein instructions for causing a
processor to determine further comprise instructions for causing a
processor to: determine available memory in a shared memory, and
instructions to store further comprises to: store the packet
information in the shared memory if there is no available memory in
the dispatcher memory.
13. The program of claim 12 wherein instructions for causing a
processor to store further comprise instructions for causing a
processor to store the packet information in RAM if there is no
available memory in the shared memory.
14. The program of claim 11 wherein instructions for causing a
processor to store further comprise instructions for causing a
processor to: store the packet information in the dispatcher memory
if the packet requires additional reassembly.
15. The program of claim 12 wherein instructions for causing a
processor to store further comprise instructions for causing a
processor to: store the packet information in the shared memory if
packet requires additional reassembly.
16. A system for processing a data packet, the system comprising:
at least one communication port; at least one Ethernet MAC (Medium
Access Control) device coupled to at least one of the at least one
communication ports; at least one processor having access to the at
least one Ethernet MAC device; and instructions for causing at
least one processor to: receive a data packet, determine available
memory in one or more of local memories of a plurality of execution
threads, and store packet information in an available one of the
local memories of the execution threads.
17. The system of claim 16 wherein instructions for causing a
processor to determine further comprises instruction for causing a
processor to: determine available memory in a dispatcher memory,
and instructions to store further comprises to: store the packet
information in the dispatcher memory if the local memories of
execution threads are unavailable.
18. The system of claim 16 wherein instructions for causing a
processor to determine further comprises instruction for causing a
processor to: determine available memory in a shared memory, and
instructions to store further comprises to: store the packet
information in the shared memory if there is no available memory in
the dispatcher memory.
19. The system of claim 18 wherein instructions for causing a
processor to store further comprises instruction for causing a
processor to: store the packet information in RAM if there is no
available memory in the shared memory.
20. The system of claim 17 wherein instructions for causing a
processor to store further comprises instruction for causing a
processor to: store the packet information in the dispatcher memory
if the packet requires additional reassembly.
21. The system of claim 17 wherein instructions for causing a
processor to store further comprises instruction for causing a
processor to: store the packet information in the shared memory if
packet requires additional reassembly.
22. A device for processing a data packet comprising: a packet
receiver buffer to receive a data packet, one or more local
memories of a plurality of execution threads, and a packet
dispatcher to determine available memory in the one or more of
local memories of a plurality of execution threads, and store
packet information in an available one of the local memories of the
execution threads.
23. The device of claim 22 further comprises: a dispatcher memory
and wherein the dispatcher further determines available memory in a
dispatcher memory, and stores the packet information in the
dispatcher memory if the local memories of execution threads are
unavailable.
24. The device of claim 23 further comprises: a shared memory and
wherein the dispatcher further determines available memory in a
shared memory, and stores the packet information in the shared
memory if there is no available memory in the dispatcher
memory.
25. The device of claim 24 further comprising: RAM and wherein the
dispatcher further stores the packet information in RAM if there is
no available memory in the shared memory.
26. The device of claim 23 wherein: the dispatcher further stores
the packet information in the dispatcher memory if the packet
requires additional reassembly.
27. The device of claim 24 wherein: the dispatcher further stores
the packet information in the shared memory if packet requires
additional reassembly.
28. The device of claim 25 wherein: the dispatcher further stores
the packet information in RAM if packet requires additional
reassembly.
29. The device of claim 23 wherein: the dispatcher further
determines available memory in a dispatcher backup memory, and
stores the packet information in the dispatcher backup memory if
the dispatcher memory is unavailable.
30. The device of claim 22 wherein the device is a component of a
network processor.
Description
BACKGROUND
[0001] Networks enable computers and other devices to exchange data
such as e-mail messages, web pages, audio, video, and so forth. To
send data across a network, a sending device typically constructs a
collection of packets. In networks, individual packets store some
portion of the data being sent. A receiver can reassemble the data
into its original form after receiving the packets.
[0002] A packet traveling across a network may make many "hops" to
intermediate network devices before reaching its final destination.
A packet includes data being sent and information used to deliver
the packet. This information is often stored in the packet's
"payload" and "header(s)", respectively. The header(s) may include
information for a number of different communication protocols that
define the information that should be stored in a packet. Different
protocols may operate at different layers. For example, a low level
layer generally known as the "link layer" coordinates transmission
of data over physical connections. A higher level layer generally
known as the "network layer" handles routing, switching, and other
tasks that determine how to move a packet forward through a
network.
[0003] Many different hardware and software schemes have been
developed to handle packets. For example, some designs use software
to program a general purpose CPU (Central Processing Unit)
processor to process packets. Other designs, such as those using
ASICs (application-specific integrated circuits), feature
dedicated, "hard-wired" approaches. Field programmable processors
enable software programmers to quickly reprogram network processor
operations.
DESCRIPTION OF DRAWINGS
[0004] FIG. 1 is a block diagram of a communication system
employing a hardware-based multithreaded processor.
[0005] FIG. 2 is a block diagram of a microengine unit employed in
the hardware-based multithreaded processor of FIG. 1.
[0006] FIG. 3 is a diagram of the processing of a packet.
[0007] FIG. 4 is a flow chart of the processing of a packet.
[0008] FIG. 5 is a flow chart of the initial handling and storing
of packet information prior to processing by the threads.
DETAILED DESCRIPTION
[0009] Referring to FIG. 1, a communication system 10 includes a
parallel, hardware-based multithreaded processor 12. The
hardware-based multithreaded processor 12 is coupled to a bus such
as a Peripheral Component Interconnect (PCI) bus 14, a memory
system 16 and a second bus 18. The system 10 is especially useful
for tasks that can be broken into parallel subtasks. Specifically,
the hardware-based multithreaded processor 12 is useful for tasks
that are bandwidth oriented rather than latency oriented. The
hardware-based multithreaded processor 12 has multiple microengines
22, each with multiple hardware controlled program threads that can
be simultaneously active and independently work on a task. A
program thread is an independent program that runs a series of
instructions. From the program's point of view, a program thread is
the information needed to serve one individual user or a particular
service request.
[0010] The hardware-based multithreaded processor 12 also includes
a central controller 20 that assists in loading microcode control
for other resources of the hardware-based multithreaded processor
12 and performs other general purpose computer type tasks such as
handling protocols and exceptions, and providing extra support for
packet processing where the microengines pass the packets off for
more detailed processing, such as in boundary conditions. In one
embodiment, the processor 20 is a Strong Arm® (Arm is a trademark
of ARM Limited, United Kingdom) based architecture. The general
purpose microprocessor 20 has an operating system. Through the
operating system, the processor 20 can call functions to operate on
microengines 22a-22f. The processor 20 can use any supported
operating system, preferably a real-time operating system. For the
core processor implemented as a Strong Arm architecture, operating
systems such as Microsoft NT real-time, VXWorks and TCUS, a
freeware operating system available over the Internet, can be
used.
[0011] The hardware-based multithreaded processor 12 also includes
a plurality of microengines 22a-22f. Microengines 22a-22f each
maintain a plurality of program counters in hardware and states
associated with the program counters. Effectively, a corresponding
plurality of sets of program threads can be simultaneously active
on each of the microengines 22a-22f while only one is actually
operating at one time.
[0012] In one embodiment, there are six microengines 22a-22f, each
having capabilities for processing four hardware program threads.
The six microengines 22a-22f operate with shared resources
including memory system 16 and bus interfaces 24 and 28. The memory
system 16 includes a Synchronous Dynamic Random Access Memory
(SDRAM) controller 26a and a Static Random Access Memory (SRAM)
controller 26b. SDRAM memory 16a and SDRAM controller 26a are
typically used for processing large volumes of data, e.g.,
processing of network payloads from network packets. The SRAM
controller 26b and SRAM memory 16b are used in a networking
implementation for low latency, fast access tasks, e.g., accessing
look-up tables, memory for the core processor 20, and so forth.
[0013] Hardware context swapping enables other contexts with unique
program counters to execute in the same microengine. Hardware
context swapping also synchronizes completion of tasks. For
example, two program threads could request the same shared
resource, e.g., SRAM. Each one of these separate units, e.g., the
FBUS interface 28, the SRAM controller 26b, and the SDRAM
controller 26a, reports back a flag signaling completion of an
operation when it completes a requested task from one of the
microengine program thread contexts. When the flag is received by
the microengine, the microengine can determine which program thread
to turn on.
[0014] As a network processor, the hardware-based multithreaded
processor 12 interfaces to network devices such as a media access
controller device, e.g., a 10/100BaseT Octal MAC 13a or a Gigabit
Ethernet device 13b, coupled to communication ports or other
physical layer devices.
[0015] In general, as a network processor, the hardware-based
multithreaded processor 12 can interface to different types of
communication devices or interfaces that receive/send large amounts
of data. The network processor can be included in a router 10 that,
in a networking application, routes network packets amongst devices
13a, 13b in a parallel manner. With the hardware-based
multithreaded processor 12, each network packet can be
independently processed.
[0016] The processor 12 includes a bus interface 28 that couples
the processor to the second bus 18. Bus interface 28 in one
embodiment couples the processor 12 to the so-called FBUS 18 (FIFO
bus). The FBUS interface 28 is responsible for controlling and
interfacing the processor 12 to the FBUS 18. The FBUS 18 is a
64-bit wide FIFO bus, used to interface to Media Access Controller
(MAC) devices. The processor 12 includes a second interface, e.g., a
PCI bus interface 24, that couples other system components that
reside on the PCI bus 14 to the processor 12. The units are coupled
to one or more internal buses. The internal buses are dual, 32 bit
buses (e.g., one bus for read and one for write). The
hardware-based multithreaded processor 12 also is constructed such
that the sum of the bandwidths of the internal buses in the
processor 12 exceeds the bandwidth of external buses coupled to the
processor 12. The processor 12 includes an internal core processor
bus 32, e.g., an ASB bus (Advanced System Bus) that couples the
processor core 20 to the memory controllers 26a, 26b and to an ASB
translator 30 described below. The ASB bus is a subset of the
so-called AMBA bus that is used with the Strong Arm processor core.
The processor 12 also includes a private bus 34 that couples the
microengine units to SRAM controller 26b, ASB translator 30 and
FBUS interface 28. A memory bus 38 couples the memory controllers
26a, 26b to the bus interfaces 24 and 28 and memory system 16
including flash ROM 16c used for boot operations and so forth.
[0017] Each of the microengines 22a-22f includes an arbiter that
examines flags to determine the available program threads to be
operated upon. The program threads of the microengines 22a-22f can
access the SDRAM controller 26a, SRAM controller 26b or FBUS
interface 28. The SDRAM controller 26a and SRAM controller 26b
each include a plurality of queues to store outstanding memory
reference requests. The queues either maintain order of memory
references or arrange memory references to optimize memory
bandwidth.
[0018] Although microengines 22 can use the register set to
exchange data, a scratchpad or shared memory is also provided to
permit microengines to write data out to the memory for other
microengines to read. The scratchpad is coupled to bus 34.
[0019] Referring to FIG. 2, an exemplary one of the microengines
22a-22f, e.g., microengine 22f, is shown. The microengine includes a
control store 70 which, in one implementation, includes a RAM of,
here, 1,024 words of 32 bits. The RAM stores a microprogram that is
loadable by the core processor 20. The microengine 22f also
includes controller logic 72. The controller logic includes an
instruction decoder 73 and program counter (PC) units 72a-72d. The
four micro program counters 72a-72d are maintained in hardware. The
microengine 22f also includes context event switching logic 74.
Context event logic 74 receives messages (e.g.,
SEQ_#_EVENT_RESPONSE; FBI_EVENT_RESPONSE;
SRAM_EVENT_RESPONSE; SDRAM_EVENT_RESPONSE; and ASB_EVENT_RESPONSE)
from each one of the shared resources, e.g., SRAM 26b, SDRAM 26a,
or processor core 20, control and status registers, and so forth.
These messages provide information on whether a requested task has
completed. Depending on whether a task requested by a program
thread has completed and signaled completion, the program thread
either needs to wait for that completion signal or, if the program
thread is enabled to operate, is placed on an available program
thread list (not shown).
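The way completion messages move a waiting program thread onto the
available list can be illustrated with a small software sketch. It
is not the hardware logic of FIG. 2. The class and method names are
hypothetical and chosen only for illustration; only the event names
come from the text above.

```python
from collections import deque


class ContextEventLogic:
    """Toy model: a thread waits for one event, then becomes available."""

    def __init__(self):
        self.waiting = {}         # thread id -> event name it is waiting for
        self.available = deque()  # threads ready to run, oldest first

    def wait_for(self, thread_id, event):
        """Record that a thread issued a request and must wait for `event`."""
        self.waiting[thread_id] = event

    def deliver(self, thread_id, event):
        """A shared resource reports completion of a request for a thread."""
        if self.waiting.get(thread_id) == event:
            del self.waiting[thread_id]
            self.available.append(thread_id)


logic = ContextEventLogic()
logic.wait_for(0, "SRAM_EVENT_RESPONSE")
logic.deliver(0, "SRAM_EVENT_RESPONSE")
print(list(logic.available))
```

A message that does not match what the thread is waiting for is
ignored in this sketch, so the thread stays off the available list
until its own request completes.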
[0020] In addition to event signals that are local to an executing
program thread, the microengines 22 employ signaling states that
are global. With signaling states, an executing program thread can
broadcast a signal state to the microengines 22. The program thread
in the microengines can branch on these signaling states. These
signaling states can be used to determine availability of a
resource or whether a resource is due for servicing.
[0021] The context event logic 74 has arbitration for the program
threads. In one embodiment, the arbitration is a round robin
mechanism. Other techniques could be used including priority
queuing or weighted fair queuing. The microengine 22f also includes
an execution box (EBOX) data path 76 that includes an arithmetic
logic unit 76a and general purpose register set 76b. The arithmetic
logic unit 76a performs arithmetic and logic operations as well as
shift operations. The register set 76b has a relatively large
number of general purpose registers. In this implementation there
are 64 general purpose registers in a first bank, Bank A and 64 in
a second bank, Bank B. The general purpose registers are windowed
so that they are relatively and absolutely addressable.
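The round robin arbitration mentioned in this paragraph can be
sketched in software as follows. This is a minimal illustration
under assumptions, not the arbiter of context event logic 74: the
class name, the `pick` method and the representation of ready
threads as a set are all hypothetical.

```python
class RoundRobinArbiter:
    """Pick the next ready context, starting after the last one picked."""

    def __init__(self, num_contexts):
        self.num_contexts = num_contexts
        self.last = num_contexts - 1  # so that context 0 is considered first

    def pick(self, ready):
        """`ready` is a set of context ids able to run; returns one or None."""
        for offset in range(1, self.num_contexts + 1):
            candidate = (self.last + offset) % self.num_contexts
            if candidate in ready:
                self.last = candidate
                return candidate
        return None  # no context is ready; the engine would idle


arbiter = RoundRobinArbiter(4)
print([arbiter.pick({0, 2, 3}) for _ in range(4)])
```

Because the search always starts one position after the most recent
winner, no ready context can be passed over twice in a row, which is
the property that distinguishes round robin from a fixed priority
scheme.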
[0022] The microengine 22f also includes a write transfer register
stack 78 and a read transfer register stack 80. These registers are also
windowed so that they are relatively and absolutely addressable.
Write transfer register stack 78 is where write data to a resource
is located.
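One possible reading of "relatively and absolutely addressable" is
that each program thread sees its own window of a register bank,
while an absolute index reaches any register in the bank. The sketch
below assumes that a 64-register bank is split evenly among four
contexts, giving 16 registers per window. That even split and the
function name are assumptions made for illustration and are not
stated in the text.

```python
REGISTERS_PER_BANK = 64  # from the text
NUM_CONTEXTS = 4         # four hardware program threads per microengine
WINDOW_SIZE = REGISTERS_PER_BANK // NUM_CONTEXTS  # assumed even split


def absolute_index(context, relative):
    """Translate a context-relative register number to an absolute one."""
    if not 0 <= context < NUM_CONTEXTS:
        raise ValueError("no such context")
    if not 0 <= relative < WINDOW_SIZE:
        raise ValueError("relative register outside the window")
    return context * WINDOW_SIZE + relative


print(absolute_index(2, 5))
```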
[0023] Similarly, read transfer register stack 80 is for return data
from a shared resource. Subsequent to or concurrent with data
arrival, an event signal from the respective shared resource, e.g.,
the SRAM controller 26b, SDRAM controller 26a or core processor 20,
will be provided to context event arbiter 74, which will then alert
the program thread that the data is available or has been sent. Both
transfer register banks 78 and 80 are connected to the execution
box (EBOX) 76 through a data path. In one implementation, the read
transfer register has 64 registers and the write transfer register
has 64 registers.
[0024] Each microengine 22a-22f supports multi-threaded execution
of multiple contexts. One reason for this is to allow one program
thread to start executing just after another program thread issues
a memory reference and must wait until that reference completes
before doing more work. This behavior maintains efficient hardware
execution of the microengines because memory latency is
significant.
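The benefit of swapping contexts while a memory reference is
outstanding can be shown with a small cycle-count simulation. It is
a sketch under stated assumptions: every thread repeats a fixed
number of work cycles followed by one memory reference of fixed
latency, a context swap costs nothing, and the lowest-numbered ready
thread is chosen. None of these figures or names come from the text.

```python
def simulate(num_threads, work_cycles, mem_latency, total_cycles):
    """Return how many cycles the engine spent executing instructions."""
    ready_at = [0] * num_threads            # cycle at which a thread may run
    remaining = [work_cycles] * num_threads
    busy = 0
    current = None
    for cycle in range(total_cycles):
        if current is None:
            for t in range(num_threads):    # choose a ready thread, if any
                if ready_at[t] <= cycle:
                    current = t
                    break
        if current is None:
            continue                        # all threads wait on memory
        busy += 1
        remaining[current] -= 1
        if remaining[current] == 0:
            # The thread issues a memory reference and swaps out.
            ready_at[current] = cycle + 1 + mem_latency
            remaining[current] = work_cycles
            current = None
    return busy


print(simulate(1, 2, 6, 80), simulate(4, 2, 6, 80))
```

With these assumed numbers, a single thread keeps the engine busy
for a quarter of the time, while four threads overlap each other's
memory waits completely.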
[0025] Special techniques, such as inter-thread communications to
communicate status and a thread_done register to provide a global
program thread communication scheme, are used for packet
processing. The thread_done register can be implemented as a
control and status register.
[0026] Network operations are implemented in the network processor
using a plurality of program threads, e.g., contexts, to process
network packets. For example, scheduler program threads could be
executed in one of the microprogram engines, e.g., 22a, whereas
processing program threads could execute in the remaining engines,
e.g., 22b-22f. The program threads (processing or scheduling
program threads) use inter-thread communications to communicate
status.
[0027] Program threads are assigned specific tasks such as receive
and transmit scheduling, receive processing, and transmit
processing, etc. Task assignment and task completion are
communicated between program threads through the inter-thread
signaling, registers with specialized read and write
characteristics, e.g., the thread_done register, SRAM 16b and data
stored in the internal scratchpad memory resulting from operations
such as bit set and bit clear.
[0028] Referring to FIG. 3, the packet dispatcher 302 resides on a
processor inside the network processor and requests packets from
the network interface. The packet dispatcher 302 is notified when a
packet segment (e.g., 128 bytes) has been received by a packet
receiver buffer 304. The packet dispatcher 302 moves the packet
segment payload into DRAM 306. The packet dispatcher 302 stores
packet reassembly state information to reassemble the packet. As
successive segments are received for a packet, the dispatcher 302
uses the state information to direct and assemble the segments in
space allocated in DRAM 306 by the packet dispatcher 302.
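The use of reassembly state to place successive segments in
allocated space can be sketched as follows. This is an illustration
only. The packet identifier, the fixed per-packet allocation, the
`last` flag and all names are assumptions; the text does not say how
the dispatcher identifies a packet or sizes its allocation.

```python
class Reassembler:
    """Toy dispatcher-side reassembly of packet segments into one buffer."""

    def __init__(self, dram_size, max_packet_len):
        self.dram = bytearray(dram_size)  # stands in for DRAM 306
        self.max_packet_len = max_packet_len
        self.next_free = 0                # simple allocator that never frees
        self.state = {}                   # packet id -> [base, bytes so far]

    def add_segment(self, packet_id, payload, last=False):
        """Store one segment; return (base, length) when the packet is done."""
        if packet_id not in self.state:
            if self.next_free + self.max_packet_len > len(self.dram):
                raise MemoryError("no space left for a new packet")
            self.state[packet_id] = [self.next_free, 0]
            self.next_free += self.max_packet_len
        entry = self.state[packet_id]
        base, received = entry
        if received + len(payload) > self.max_packet_len:
            raise ValueError("packet exceeds its allocated space")
        start = base + received
        self.dram[start:start + len(payload)] = payload
        entry[1] = received + len(payload)
        if last:
            del self.state[packet_id]     # reassembly state no longer needed
            return base, entry[1]
        return None


r = Reassembler(dram_size=64, max_packet_len=16)
r.add_segment("a", b"abcd")
print(r.add_segment("a", b"efgh", last=True))
```

Segments of different packets may arrive interleaved; the per-packet
state keeps each packet contiguous in its own region.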
[0029] Each packet received is assigned a sequence number, in
ascending order. The sequence number allows the packets to be
dequeued in the order they were received. Each sequence number
corresponds to a slot in a ring in memory called an Asynchronous
Insert Synchronous Remove (AISR) ring 308. When a thread 310 in the
pool of threads has taken its assigned packet and finished
processing the packet, the thread 310 sends the processed packet to
DRAM 306. The thread also signals completion of the processed
packet to the indexed location in the AISR 308, based on the
packet's sequence number. This ensures that the results are stored
in ascending addresses by order of packet arrival. The reorder
dequeue 312 reads the AISR 308 in ascending order, checking to see
if packet information has been assigned to the slot. The reorder
dequeue 312 will continue checking the slot in the AISR 308 until
packet information is found in the slot. The system provides a
First In First Out (FIFO) routine while efficiently processing
packets out of order.
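The asynchronous insert and synchronous remove behavior can be
sketched with a fixed-size ring indexed by sequence number. This is
a minimal software model, not the memory layout of AISR ring 308.
The class and method names are hypothetical, `None` is assumed to
mark an empty slot, and the remove method returns what is ready
instead of polling until a slot fills.

```python
class AISRRing:
    """Threads insert in any order; the dequeue removes in sequence order."""

    def __init__(self, num_slots):
        self.slots = [None] * num_slots
        self.next_seq = 0  # sequence number the reorder dequeue expects next

    def insert(self, seq, info):
        """Called by a thread when it finishes the packet numbered `seq`."""
        if not self.next_seq <= seq < self.next_seq + len(self.slots):
            raise ValueError("sequence number outside the ring window")
        self.slots[seq % len(self.slots)] = info

    def remove_in_order(self):
        """Drain every consecutive completed packet starting at next_seq."""
        drained = []
        while True:
            index = self.next_seq % len(self.slots)
            if self.slots[index] is None:
                return drained  # the next packet in order is not done yet
            drained.append(self.slots[index])
            self.slots[index] = None
            self.next_seq += 1


ring = AISRRing(4)
ring.insert(1, "p1")
ring.insert(0, "p0")
print(ring.remove_in_order())
```

A packet that finishes early simply waits in its slot until all
earlier sequence numbers have been filled and removed.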
[0030] When a packet is received, the dispatcher 302 assigns the
packet to a thread 310 in the pool of threads. Each thread in the
pool makes itself available by signaling the dispatcher via either
a thread mailbox 314 or a message CSR 316. Each thread 310 has a
memory that allows the thread to work on a presently assigned
packet and store the next assigned packet in memory. The thread 310
communicates its memory and processing availability and location of
the thread to the packet dispatcher 302. The dispatcher 302
communicates select packet state information back to the assigned
threads. The packet state information can include, for example, the
packet payload's address in DRAM 306 and the sequence number.
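A thread memory that holds the presently assigned packet and the
next assigned packet can be modeled as two slots. The sketch below
is illustrative; the class, its methods and the dictionary used for
packet state are assumptions, with only the payload address and
sequence number fields taken from the text.

```python
class ThreadMemory:
    """Two-slot local memory: the packet in progress and the next one."""

    def __init__(self):
        self.current = None
        self.next = None

    def has_room(self):
        """True if the dispatcher may assign another packet to this thread."""
        return self.current is None or self.next is None

    def assign(self, info):
        """Store packet state in the first free slot."""
        if self.current is None:
            self.current = info
        elif self.next is None:
            self.next = info
        else:
            raise RuntimeError("thread memory is full")

    def finish_current(self):
        """Complete the current packet and promote the next one, if any."""
        done = self.current
        self.current, self.next = self.next, None
        return done


memory = ThreadMemory()
memory.assign({"seq": 0, "dram_addr": 0x1000})
memory.assign({"seq": 1, "dram_addr": 0x1800})
print(memory.has_room())
```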
[0031] There are multiple methods by which the thread 310 can
communicate its availability and the packet dispatcher 302 can
assign a packet to that thread 310. A thread 310 can communicate
its availability through a Control and Status Register (CSR) 316.
Each thread can write to a few bits of the CSR 316. The packet
dispatcher 302 can read and clear the CSR 316, thus providing the
status of many threads at one time. Alternatively, the dispatcher
302 and threads 310 can communicate via "mailboxes" 314. The thread
310 can signal its availability by flagging or placing an
identifier in the mailbox 314. The dispatcher polls each thread
mailbox until it identifies an available thread. The dispatcher 302
can write the packet state information to the mailbox 314 for the
available thread.
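The read-and-clear use of the CSR can be sketched as a bit mask. The
text says each thread can write "a few bits"; the sketch assumes one
availability bit per thread, and the class and method names are
hypothetical.

```python
class AvailabilityCSR:
    """Toy status register with one availability bit per thread."""

    def __init__(self, num_threads):
        self.num_threads = num_threads
        self.bits = 0

    def signal_available(self, thread_id):
        """Called by a thread to set its own bit."""
        self.bits |= 1 << thread_id

    def read_and_clear(self):
        """Called by the dispatcher; returns every thread that signaled."""
        value, self.bits = self.bits, 0
        return [t for t in range(self.num_threads) if (value >> t) & 1]


csr = AvailabilityCSR(8)
csr.signal_available(3)
csr.signal_available(1)
print(csr.read_and_clear())
```

One read gives the dispatcher the status of many threads at once,
whereas the mailbox alternative requires polling each thread's
mailbox in turn.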
[0032] The threads 310 in the pool can finish their assignment at
any time. Some will take a long time, probing deep into the packet
header. Others will finish early. Once the thread 310 is finished
processing the packet, the thread sends the packet information to
the AISR ring 308 in the location of the sequence number given to
the packet during initial processing. The thread 310 is now
available to process the next packet and signals its availability
to the packet dispatcher 302. The reorder dequeue 312 cycles
through the AISR ring 308 and dequeues the packets to the network
based on the order the packets were received.
[0033] A backlog (or bottleneck) can result when the microengine
receives an above-average number of packets that require in-depth
processing. If the dispatcher 302 receives a new data packet from
the network at a time when all the threads 310 are processing
assigned data packets, then the dispatcher 302 is forced to drop
the new packet, leave the packet in the packet receiver buffer 304
or find temporary storage for it. The dispatcher 302 has a memory
318. Similar to the AISR ring 308 discussed earlier, the dispatcher
memory 318 is a ring that allows the dispatcher 302 to assign
packet state information to a slot in the memory ring. The
dispatcher 302 continues assigning newly enqueued packet state
information sequentially to the slots of the memory ring 318. When
threads 310 in the pool of threads become available, the dispatcher
302 assigns packet information starting with the oldest saved slot
and sequentially assigns packets to the memory of the newly
available threads 310.
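The dispatcher memory ring behaves like a bounded first-in,
first-out buffer: packet state is saved in the next sequential slot
and handed out oldest first. The following is a minimal sketch with
hypothetical names, not the layout of dispatcher memory 318.

```python
class DispatcherRing:
    """Fixed-size ring that returns saved packet state oldest first."""

    def __init__(self, num_slots):
        self.slots = [None] * num_slots
        self.oldest = 0  # index of the oldest saved slot
        self.count = 0   # number of occupied slots

    def is_full(self):
        return self.count == len(self.slots)

    def save(self, info):
        """Store packet state in the next sequential slot; False if full."""
        if self.is_full():
            return False
        self.slots[(self.oldest + self.count) % len(self.slots)] = info
        self.count += 1
        return True

    def take_oldest(self):
        """Return the oldest saved packet state, or None if nothing is saved."""
        if self.count == 0:
            return None
        info = self.slots[self.oldest]
        self.slots[self.oldest] = None
        self.oldest = (self.oldest + 1) % len(self.slots)
        self.count -= 1
        return info


ring = DispatcherRing(2)
ring.save("p0")
ring.save("p1")
print(ring.save("p2"), ring.take_oldest())
```

A `False` return from `save` corresponds to the case in which the
ring is filled and the dispatcher must look for storage elsewhere.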
[0034] If the backlog continues to the extent that all the slots of
the dispatcher memory ring 318 are filled, in one embodiment the
dispatcher starts to assign slots to a backup memory ring 320. This
process is similar to the process of assigning and retrieving slot
information from the memory ring 318. The difference is that the
backup ring can use memory that would normally be allocated to
other resources when there is no need for the backup ring. In
another embodiment, the primary dispatcher memory ring 318 is made
larger in order to handle the largest bottleneck of packet
processing.
[0035] In one embodiment, the dispatcher 302 can use the
microengine scratch memory 322 to store packet information. If a
packet-processing bottleneck causes all the slots in the dispatcher
memory 318 to become filled, the dispatcher 302 can assign packet
information to the microengine scratch memory 322. Once the
bottleneck is relieved the dispatcher 302 assigns the packet
information in the scratch memory 322 to the available thread
memory 310. The dispatcher 302 can also assign packet information
to the DRAM 306 if the dispatcher memory 318 and the scratch memory
322 are filled due to the bottleneck. The dispatcher 302 can also
assign packet information to the DRAM 306 if the dispatcher memory
318 is filled and the scratch memory 322 is filled with other data
assigned to scratch memory by the microengine processor. The
process provides for efficient storage of packet information during
bottlenecks while limiting the use of DRAM 306 bandwidth and
other memory resources of the microengine.
[0036] Referring to FIG. 4, the flowchart shows the processing of
data packets 400 by the microengine. The data packet is received
from the network into the receiver buffer 402. The dispatcher gives
the data packet a packet sequence number and assigns a location in
memory for the thread information 404. The sequence number allows
the packets to be processed by the threads in an order independent
of the order in which the packets will be dequeued back to the
network or general processor. The threads independently communicate
to the
packet dispatcher regarding their available state 406. A thread 408
in the pool can make itself available even when it is busy
processing a packet. The thread 408 stores the packet it is
processing and stores the next packet intended for processing by
the thread. This allows each thread 408 to handle two packets at a
time. Once the dispatcher determines an available location in a
thread 408, the packet dispatcher assigns the packet information to
the memory of the available thread 416. If the dispatcher
determines that there are no available threads at that time 408,
the packet dispatcher stores the packet information temporarily in
memory 410. The packet dispatcher continues to receive packets,
process the packets (e.g., assign a sequence number, a storage
location, and determine reassembly information), and store the
packet information in the next sequential memory slot 412.
[0037] Once the dispatcher determines a thread is available 414,
the dispatcher sends the packet information into the available
thread's local memory 416. The thread processes the packet and then
sends the packet information to the AISR ring in memory based on
the sequence number in the packet information 420. The reorder
dequeue sequentially pulls the packet information from the ring and
sends the packet to the packet's future destination 422. In the case
of a router, the packet would be sent onto the network to the next
router on the packet's path to its final destination.
[0038] Referring to FIG. 5, the dispatcher determines the most
efficient location to store the packet information 500. By storing
the packet information in a variety of locations, the dispatcher
can efficiently use the microengine's memory and handle overflow
produced by bottlenecks in thread processing. The packet is
initially received into the receiver buffer 502. The dispatcher
assigns the packet payload a location in memory and a sequence
number 504. The dispatcher determines if the packet has been
completely received and is ready for processing 506. If the packet
is complete, the dispatcher determines if there is an available
thread to process the packet 508. If a thread is available, the
dispatcher can send the packet information directly to the
available thread's memory 510. However, if there are no available
threads or the packet has not been completely reassembled, the
dispatcher determines the best location to store the packet
information until both of these conditions are satisfied. The
dispatcher checks the dispatcher's memory ring 512. If the memory
ring is available, the dispatcher assigns the packet information to
a slot in the memory ring 514. If the memory ring is filled and
unavailable, the dispatcher checks the memory slot availability of
the dispatcher's backup memory. If the backup memory has space
available, the packet information is assigned to a slot in the
backup memory ring structure 516. When both the backup and primary
memory of the dispatcher are filled, the dispatcher will check the
scratch memory of the microengine 520. If the scratch memory is
available, the dispatcher will assign the packet information to the
scratch memory 522. Otherwise, the dispatcher can assign the packet
information to DRAM 524. The process allows the dispatcher to
assign packet information to a variety of memory locations rather
than continually sending the overflow of packet information
directly to DRAM. The system provides efficient use of the
bandwidth of the DRAM and the scratch memory. The system also
leaves memory available to other processing resources when
bottlenecks are not present, and stores packet information
quickly.
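The storage decision of FIG. 5 reduces to an ordered series of
checks, which can be sketched as a single function. The function
name, its boolean parameters and the returned labels are
hypothetical; the order of the checks follows the paragraph above.

```python
def choose_location(packet_complete, thread_available,
                    ring_free, backup_free, scratch_free):
    """Return where the dispatcher would store the packet information."""
    if packet_complete and thread_available:
        return "thread memory"            # direct hand-off to a thread
    if ring_free:
        return "dispatcher memory ring"   # primary overflow storage
    if backup_free:
        return "dispatcher backup ring"
    if scratch_free:
        return "scratch memory"
    return "DRAM"                         # last resort


print(choose_location(True, False, False, False, True))
```

Placing DRAM last reflects the stated aim of not sending every
overflow directly to DRAM.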
[0039] A number of embodiments of the packet processing have been
described. Nevertheless, it will be understood that various
modifications may be made without departing from the scope of the
packet processing. Accordingly, other embodiments are within the
scope of the following claims.
* * * * *