U.S. patent application number 13/918748 was filed with the patent office on 2013-06-14 and published on 2014-06-05 for session-based forwarding.
The applicant listed for this patent is Aruba Networks, Inc. Invention is credited to Bhanu S. Gopalasetty, Ramsundar Janakiraman, Ravinder Verma.
Application Number | 20140153577 13/918748 |
Family ID | 50825368 |
Filed Date | 2013-06-14 |
United States Patent Application | 20140153577 |
Kind Code | A1 |
Janakiraman; Ramsundar; et al. | June 5, 2014 |
SESSION-BASED FORWARDING
Abstract
The present disclosure discloses a method and network device for
session based forwarding. Specifically, the disclosed system
receives a first packet in a session, and performs a route lookup
to determine a route for the first packet. Then, the system caches
a reference to the route and a neighbor in the session, and also
caches a reference to the session in a tunnel within which packets
in the session are to be forwarded. Based on a comparison between
the route version number cached in the session and the route
version number in a route table corresponding to the route
referenced by a route index in the session, the system determines
whether the route is stale. If so, the system performs another
route lookup to update the route. Moreover, the system uses cached
reference to the session in the tunnel for forwarding subsequent
packets in the session.
Inventors: | Janakiraman; Ramsundar (Sunnyvale, CA); Verma; Ravinder (San Jose, CA); Gopalasetty; Bhanu S. (San Ramon, CA) |
Applicant: |
Name | City | State | Country | Type |
Aruba Networks, Inc. | Sunnyvale | CA | US | |
Family ID: | 50825368 |
Appl. No.: | 13/918748 |
Filed: | June 14, 2013 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
61732829 | Dec 3, 2012 | |
Current U.S. Class: | 370/392 |
Current CPC Class: | H04L 63/101 20130101; H04L 61/25 20130101; H04L 45/566 20130101; H04L 47/122 20130101; H04L 47/19 20130101; H04L 47/28 20130101; H04L 49/3009 20130101; H04L 63/0272 20130101 |
Class at Publication: | 370/392 |
International Class: | H04L 12/745 20060101 H04L012/745 |
Claims
1. A method comprising: receiving, by a network device, a first
packet in a session; performing, by the network device, a route
lookup based on a header of the first packet to determine a route
for the first packet; and caching, by the network device, a
reference to the route and a neighbor in the session such that
subsequent packets in the session are routed based on the cached
reference in lieu of subsequent route lookups.
2. The method of claim 1, wherein the reference to the route
comprises one or more of: a route index, a route version number, a
neighbor index, and a neighbor index number.
3. The method of claim 1, further comprising: comparing, by the
network device, a first route version number cached in the session
and a second route version number in a route table corresponding to
the route referenced by a route index in the session; and
determining, by the network device, that the route is stale in
response to the first route version number being different from the
second route version number.
4. The method of claim 3, further comprising: comparing, by the
network device, a first neighbor index and version number cached in
the session with a second neighbor index and version number in a
neighbor table corresponding to the route referenced by the route
index in the session; and determining, by the network device, that
the route is stale in response to the first neighbor index or
version number being different from the second neighbor index or
version number.
5. The method of claim 4, further comprising: in response to
determining that the route is stale, performing another route
lookup to update the route with one or more of an updated route
index, an updated route version number, an updated neighbor index,
and an updated neighbor version number.
6. The method of claim 4, further comprising: in response to
determining that the route is stale and the session is inactive,
delaying route lookup until at least one packet is received in the
session.
7. The method of claim 3, wherein at least two paths with identical
cost corresponding to the route are stored in the route table, each
path being identified by a unique Equal Cost Multiple Path (ECMP)
index.
8. The method of claim 7, wherein, when a new ECMP index is added
to the route table, a subsequent session uses the path associated
with the new ECMP index and an existing session continues to use an
existing path associated with an existing ECMP index.
9. The method of claim 4, wherein, when at least two next hop nodes
use Virtual Router Redundancy Protocol (VRRP), the route is
determined to be stale based on difference between the first
neighbor version number cached in the session and the second
neighbor version number corresponding to the route in the neighbor
table.
10. The method of claim 3, further comprising: in response to the
route determined to be stale, performing another route lookup to
update the session with an updated route index and an updated route
version number; in response to the updated route index and the
updated route version corresponding to a shorter alternative route
than the route, forwarding subsequent packets in the session using
the shorter alternative route.
11. The method of claim 10, wherein the shorter alternative route
is stored in a patricia trie as a child node of a parent node,
wherein the parent node corresponds to the route, and wherein a
route version number of the route corresponding to the parent node
is increased in response to the child node being inserted in the
patricia trie.
12. The method of claim 1, further comprising: caching, by the
network device, a reference to the session in a tunnel within which
packets in the session are to be forwarded, thereby allowing direct
access to the route from the tunnel.
13. The method of claim 1, further comprising: encapsulating, by
the network device, the first packet based on information returned
from a bridge lookup prior to encrypting the first packet;
identifying, by the network device, a network interface that the
first packet is to be transmitted on; sending the first packet to a
security engine of the network device to encrypt the first packet;
and instructing the security engine to forward encrypted first
packet to the identified network interface in lieu of returning the
encrypted first packet to a processor within the network
device.
14. A network device having a symmetric multiprocessing
architecture, the network device comprising: a plurality of CPU
cores; a network interface to receive one or more data packets; and
a memory whose access is shared by the dedicated CPU core and the
plurality of CPU cores; wherein the plurality of CPU cores are to:
receive a first packet in a session; perform a route lookup based
on a header of the first packet to determine a route for the first
packet; and cache a reference to the route and a neighbor in the
session such that subsequent packets in the session are routed
based on the cached reference in lieu of subsequent route
lookups.
15. The network device of claim 14, wherein the reference to the
route comprises one or more of: a route index, a route version
number, a neighbor index, and a neighbor index number.
16. The network device of claim 14, wherein the plurality of CPU
cores are further to: compare a first route version number cached
in the session and a second route version number in a route table
corresponding to the route referenced by a route index in the
session; and determine that the route is stale in response to the
first route version number being different from the second route
version number.
17. The method of claim 16, wherein the plurality of CPU cores are
further to: compare a first neighbor index and version number
cached in the session with a second neighbor index and version
number in a neighbor table corresponding to the route referenced by
the route index in the session; and determine that the route is
stale in response to the first neighbor index or version number
being different from the second neighbor index or version
number.
18. The network device of claim 17, wherein the plurality of CPU
cores are further to: perform another route lookup to update the
route with one or more of an updated route index, an updated route
version number, an updated neighbor index, and an updated neighbor
version number in response to determining that the route is
stale.
19. The network device of claim 17, wherein the plurality of CPU
cores are further to: delay route lookup until at least one packet
is received in the session in response to determining that the
route is stale and the session is inactive.
20. The network device of claim 16, wherein at least two paths with
identical costs corresponding to the route are stored in the route
table, each path being identified by a unique Equal Cost Multiple
Path (ECMP) index.
21. The network device of claim 20, wherein, when a new ECMP index
is added to the route table, a subsequent session uses the path
associated with the new ECMP index and an existing session
continues to use an existing path associated with an existing ECMP
index.
22. The network device of claim 17, wherein, when at least two next
hop nodes use Virtual Router Redundancy Protocol (VRRP), the route
is determined to be stale based on difference between the first
neighbor version number cached in the session and the second
neighbor version number corresponding to the route in the neighbor
table.
23. The network device of claim 16, wherein the plurality of CPU
cores further to: perform another route lookup to update the
session with an updated route index and an updated route version
number in response to the route determined to be stale; forward
subsequent packets in the session using the shorter alternative
route in response to the updated route index and the updated route
version corresponding to a shorter alternative route than the
route.
24. The network device of claim 23, wherein the shorter alternative
route is stored in a patricia trie as a child node of a parent
node, wherein the parent node corresponds to the route, and wherein
a route version number of the route corresponding to the parent
node is increased in response to the child node being inserted in
the patricia trie.
25. The network device of claim 14, wherein the plurality of CPU
cores are further to: cache a reference to the session in a tunnel
within which packets in the session are to be forwarded, thereby
allowing direct access to the route from the tunnel.
26. The network device of claim 14, wherein the plurality of the
CPU cores are further to: encapsulate the first packet based on
information returned from a bridge lookup prior to encrypting the
first packet; identify a network interface that the first packet is
to be transmitted on; send the first packet to a security engine of
the network device to encrypt the first packet; and instruct the
security engine to forward encrypted first packet to the identified
network interface in lieu of returning the encrypted first packet
to a processor within the network device.
27. A non-transitory computer-readable storage medium storing
embedded instructions for a plurality of operations that are
executed by one or more mechanisms implemented within a network
device having a symmetric multiprocessing architecture, the
plurality of operations comprising: receiving a first packet in a
session; performing a route lookup to determine a route for the
first packet; caching a reference to the route in the session;
caching a reference to the session and a neighbor in a tunnel
within which packets in the session are forwarded; comparing a
first route version number cached in the session with a second
route version number in a route table corresponding to the route
referenced by a route index in the session; determining whether the
route is stale based on the first and second route version numbers;
performing another route lookup to update the route in response to
determining that the route is stale; and using cached reference to
the session in the tunnel for forwarding subsequent packets in the
session.
Description
RELATED APPLICATIONS
[0001] This application claims the benefit of priority on U.S.
Provisional Patent Application 61/732,829, filed Dec. 3, 2012, the
entire contents of which are incorporated by reference.
[0002] Related patent applications to the subject application
include the following: (1) U.S. Patent Application entitled "System
and Method for Achieving Enhanced Performance with Multiple
Networking Central Processing Unit (CPU) Cores" by Janakiraman, et
al., U.S. application Ser. No. 13/692,622, filed Dec. 3, 2012,
attorney docket reference no. 6259P186; (2) U.S. Patent Application
entitled "Ingress Traffic Classification and Prioritization with
Dynamic Load Balancing" by Janakiraman, et al., U.S. application
Ser. No. 13/692,608, filed Dec. 3, 2012, attorney docket reference
no. 6259P191; (3) U.S. Patent Application entitled "Method and
System for Maintaining Derived Data Sets" by Gopalasetty, et al.,
U.S. application Ser. No. 13/692,920, filed Dec. 3, 2012, attorney
docket reference no. 6259P192; (4) U.S. Patent Application entitled
"System and Method for Message handling in a Network Device" by
Palkar, et al., U.S. application Ser. No. ______, filed Jun. 14,
2013, attorney docket reference no. 6259P189; (5) U.S. Patent
Application entitled "Rate Limiting Mechanism Based on Device
Load/Capacity or Traffic Content" by Nambiar, et al., U.S.
application Ser. No. ______ , filed Jun. 14, 2013, attorney docket
reference no. 6259P185; (6) U.S. Patent Application entitled
"Control Plane Protection for Various Tables Using Storm Prevention
Entries" by Janakiraman, et al., U.S. application Ser. No. ______,
filed Jun. 14, 2013, attorney docket reference no. 6259P188. The
entire contents of the above applications are incorporated herein
by reference.
FIELD
[0003] The present disclosure relates to networking processing
performance of a symmetric multiprocessing (SMP) network
architecture. In particular, the present disclosure relates to a
system and method for providing session-based forwarding in a
pipelined forwarding model.
BACKGROUND
[0004] A symmetric multiprocessing (SMP) architecture generally is
a multiprocessor computer architecture where two or more identical
processors can connect to a single shared main memory. In the case
of multi-core processors, the SMP architecture can apply to the CPU
cores.
[0005] In an SMP architecture, multiple networking CPUs or CPU
cores can receive and transmit network traffic. While receiving and
transmitting the network traffic, the system may maintain a
flow-based engine that transmits the network traffic on a per-flow
basis. Each flow is uniquely identified by a session key. To allow
for efficient forwarding of flow-based network traffic, a network
routing system typically uses longest prefix match to perform route
lookup. Specifically, the routing system can look up in a route
table for next hop information based on which Internet Protocol
(IP) address provides the longest prefix match to the destination
IP address.
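As an illustration of the longest prefix match described above, the following is a minimal sketch and not the disclosed implementation. The route table contents, the next hop names, and the function name longest_prefix_match are assumptions made for the example. A linear scan is used for clarity; a production router would typically use a trie or similar structure.

```python
import ipaddress

# Hypothetical route table: prefix -> next hop name (illustrative values only).
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "gateway",
    ipaddress.ip_network("10.0.0.0/8"): "core-router",
    ipaddress.ip_network("10.1.0.0/16"): "branch-router",
}

def longest_prefix_match(dst, routes=ROUTES):
    """Return (prefix, next_hop) for the most specific prefix containing dst."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, next_hop in routes.items():
        # Only compare addresses and prefixes of the same IP version.
        if addr.version == prefix.version and addr in prefix:
            if best is None or prefix.prefixlen > best[0].prefixlen:
                best = (prefix, next_hop)
    return best
```

For example, a destination of 10.1.2.3 matches all three prefixes, and the /16 entry wins because it is the longest match.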
[0006] Nevertheless, the longest prefix match lookup may incur
excessive cost when the packets have long IP addresses, e.g., in
the scenario of IPv6 network. Moreover, because the network
topology and conditions can change dynamically, updating the route
table to reflect the route changes in the flow table can be costly
too. As a result, the rate of network convergence in the event of
route changes may be slow with the conventional routing mechanisms
that update the route information in the flow table.
BRIEF DESCRIPTION OF THE DRAWINGS
[0007] The present disclosure may be best understood by referring
to the following description and accompanying drawings that are
used to illustrate embodiments of the present disclosure.
[0008] FIG. 1 is a diagram illustrating an exemplary wireless
network environment according to embodiments of the present
disclosure.
[0009] FIG. 2 illustrates an exemplary architecture at multiple
processing planes according to embodiments of the present
disclosure.
[0010] FIG. 3 illustrates an exemplary network forwarding process
according to embodiments of the present disclosure.
[0011] FIG. 4 is a diagram illustrating exemplary routing tables
maintained in a shared memory according to embodiments of the
present disclosure.
[0012] FIG. 5 illustrates exemplary layer 3 and/or layer 2 packet
flow data structure according to embodiments of the present
disclosure.
[0013] FIG. 6 illustrates an exemplary route cache table according
to embodiments of the present disclosure.
[0014] FIGS. 7A-7C illustrate various routing tables according to
embodiments of the present disclosure.
[0015] FIGS. 8A-8C illustrate various scenarios in which a network
route may need to be updated according to embodiments of the
present disclosure.
[0016] FIG. 9 illustrates an exemplary trie data structure used in
session-based forwarding according to embodiments of the present
disclosure.
[0017] FIGS. 10A-10B illustrate processes for session-based
forwarding according to embodiments of the present disclosure.
[0018] FIG. 11 is a block diagram illustrating a system of
session-based forwarding according to embodiments of the present
disclosure.
DETAILED DESCRIPTION
[0019] In the following description, several specific details are
presented to provide a thorough understanding. While the context of
the disclosure is directed to SMP architecture performance
enhancement, one skilled in the relevant art will recognize,
however, that the concepts and techniques disclosed herein can be
practiced without one or more of the specific details, or in
combination with other components, etc. In other instances,
well-known implementations or operations are not shown or described
in detail to avoid obscuring aspects of various examples disclosed
herein. It should be understood that this disclosure covers all
modifications, equivalents, and alternatives falling within the
spirit and scope of the present disclosure.
Overview
[0020] Embodiments of the present disclosure relate to networking
processing performance. In particular, the present disclosure
relates to a system and method for providing efficient
session-based forwarding with multiple networking central
processing unit (CPU) cores. Specifically, the system achieves
efficient session-based forwarding by maintaining a version
associated with each route in a session table and determining
whether a route is stale based on the value of the version
associated with each route.
[0021] According to embodiments of the present disclosure, the
conventional route cache table that enumerates all destinations in
the shared memory is trimmed down to a regular neighbor table
without the need for longest prefix match (LPM) based route lookup.
The packet forwarding pipeline process is optimized by performing
route lookup only once per session flow (assuming that no route
changes during the session). The present disclosure allows for
caching a reference to a route in the session and caching a
reference to the session in a tunnel or a logical interface. This
not only replaces the conventional per-packet route lookup with a
per-flow lookup, but also allows direct access to route information
from the tunnel or logical interface.
[0022] Specifically, with the solution provided herein, a disclosed
network device receives a first packet in a session, and performs a
route lookup based on a header of the first packet to determine a
route for the first packet. Further, the network device caches a
reference to the route in the session such that subsequent packets
in the session are routed based on the cached reference in lieu of
subsequent route lookups. The reference to the route comprises one
or more of a route index (which may additionally include an equal
cost multiple path (ECMP) index), a route version number, a
neighbor index, and a neighbor index number.
[0023] For session based forwarding, the disclosed system compares
a route's version number in the session against the version number
in a route referred to by the index in the session. Likewise, the
disclosed system also compares the neighbor entry's version number
in the session against the version number in a neighbor referred to by
the index in the session.
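The version comparison described above can be sketched as follows. This is an illustrative model only: the class names, field names, and the use of Python lists as the route and neighbor tables are assumptions, not structures taken from the disclosure.

```python
from dataclasses import dataclass

@dataclass
class RouteEntry:
    next_hop: str
    version: int = 0      # bumped whenever the route entry changes

@dataclass
class NeighborEntry:
    mac: str
    version: int = 0      # bumped whenever the neighbor entry changes

@dataclass
class Session:
    route_index: int
    route_version: int
    neighbor_index: int
    neighbor_version: int

def cache_route(route_table, neighbor_table, route_index, neighbor_index):
    """Snapshot the current index/version pairs into a new session."""
    return Session(route_index, route_table[route_index].version,
                   neighbor_index, neighbor_table[neighbor_index].version)

def is_stale(session, route_table, neighbor_table):
    """A cached reference is stale if either version no longer matches."""
    route = route_table[session.route_index]
    neighbor = neighbor_table[session.neighbor_index]
    return (session.route_version != route.version
            or session.neighbor_version != neighbor.version)
```

In this sketch the per-packet check costs two indexed reads and two integer comparisons, which is the point of caching a reference instead of repeating the route lookup.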
[0024] For tunnel based forwarding, the disclosed system can
validate reference to the session by comparing the source and
destination IP addresses. Specifically, the disclosed system can
check the source and/or destination IP address in the tunnel
against the source and/or destination IP address in the
session.
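A minimal sketch of this tunnel-side validation follows. The Tunnel and Session classes, the session_index field, and the choice to compare both addresses (the text allows source and/or destination) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Session:
    src_ip: str
    dst_ip: str

@dataclass
class Tunnel:
    src_ip: str
    dst_ip: str
    session_index: Optional[int] = None   # cached reference to a session slot

def session_for_tunnel(tunnel, session_table):
    """Return the cached session if the reference is still valid, else None."""
    idx = tunnel.session_index
    if idx is None or idx >= len(session_table):
        return None
    session = session_table[idx]
    if session is None:
        return None
    # The slot may have been reused by another flow, so compare addresses.
    if (session.src_ip, session.dst_ip) != (tunnel.src_ip, tunnel.dst_ip):
        return None
    return session
```

When the function returns None, the caller would fall back to a regular session lookup and refresh the cached reference.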
[0025] If the disclosed network device determines that the route is
stale, the disclosed network device can perform another route
lookup to update the route with one or more of an updated route
index, an updated route version number, an updated neighbor index,
and an updated neighbor version number. In some embodiments,
however, if the disclosed network device determines that the route
is stale and the session is inactive, it will delay route lookup
until at least one packet is received in the session.
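The deferred lookup for inactive sessions can be sketched as below, under the assumption that staleness is only evaluated when a packet arrives. The Forwarder class, its lookups counter, and the lookup callable are hypothetical names introduced for this example.

```python
class Session:
    def __init__(self, dst):
        self.dst = dst
        self.route_index = None     # no route cached yet
        self.route_version = None

class Forwarder:
    def __init__(self, route_versions, lookup):
        self.route_versions = route_versions  # route index -> current version
        self.lookup = lookup                  # hypothetical callable: dst -> route index
        self.lookups = 0                      # counts full route lookups performed

    def forward(self, session):
        """Called once per packet; re-runs the lookup only if the cache is stale."""
        idx = session.route_index
        stale = idx is None or self.route_versions[idx] != session.route_version
        if stale:
            self.lookups += 1
            session.route_index = self.lookup(session.dst)
            session.route_version = self.route_versions[session.route_index]
        return session.route_index
```

Because nothing happens to a session between packets, a route that changes several times while the session is idle costs a single lookup when traffic resumes.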
[0026] In some embodiments, at least two paths with identical cost
corresponding to the route are stored in the route table. Each path is
identified by a unique Equal Cost Multiple Path (ECMP) index. When
a new ECMP index is added to the route table, a subsequent session
uses the path associated with the new ECMP index, but an existing
session continues to use an existing path associated with an
existing ECMP index.
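The ECMP behavior can be illustrated with the sketch below. It assumes that the position of a path in a list serves as its ECMP index and that a new session simply takes the most recently added index; the actual selection policy is not specified in this passage, and all names are hypothetical.

```python
class EcmpRoute:
    def __init__(self, paths):
        self.paths = list(paths)    # list position serves as the ECMP index

    def add_path(self, next_hop):
        """Add an equal-cost path and return its new ECMP index."""
        self.paths.append(next_hop)
        return len(self.paths) - 1

    def newest_index(self):
        return len(self.paths) - 1

class Session:
    def __init__(self, route):
        self.route = route
        # The ECMP index is fixed when the session is created (assumed policy).
        self.ecmp_index = route.newest_index()

    def next_hop(self):
        return self.route.paths[self.ecmp_index]
```

Since existing indices are never renumbered in this sketch, adding a path does not disturb sessions that already hold an index.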
[0027] In some embodiments, when at least two next hop nodes use
Virtual Router Redundancy Protocol (VRRP), the route is determined
to be stale based on difference between the first neighbor version
number cached in the session and the second neighbor version number
corresponding to the route in the neighbor table.
[0028] In some embodiments, if the route is determined to be stale,
the disclosed network device performs another route lookup to
update the session with an updated route index and an updated route
version number. If the updated route index and the updated
route version correspond to a shorter alternative route than the
route, the disclosed network device forwards subsequent packets in
the session using the shorter alternative route. In one embodiment,
the shorter alternative route is stored in a patricia trie as a
child node of a parent node. Specifically, the parent node
corresponds to the route; and, a route version number of the route
corresponding to the parent node is updated/incremented in response
to the child node being inserted in the patricia trie.
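The parent version bump can be sketched as follows. For brevity the example uses a plain binary trie keyed on bit strings rather than a path-compressed patricia trie; the version bookkeeping is the part being illustrated. Class and method names are assumptions.

```python
class Node:
    def __init__(self):
        self.children = {}      # "0" or "1" -> Node
        self.next_hop = None    # set only on nodes that hold a route
        self.version = 0

class RouteTrie:
    def __init__(self):
        self.root = Node()

    def insert(self, prefix_bits, next_hop):
        """Insert a route and bump the version of its nearest ancestor route."""
        node, parent_route = self.root, None
        for bit in prefix_bits:
            if node.next_hop is not None:
                parent_route = node          # closest covering route so far
            node = node.children.setdefault(bit, Node())
        node.next_hop = next_hop
        node.version += 1
        if parent_route is not None:
            parent_route.version += 1        # invalidates sessions caching the parent
        return node

    def lookup(self, addr_bits):
        """Return the deepest route node on the path of addr_bits, or None."""
        node = self.root
        best = node if node.next_hop is not None else None
        for bit in addr_bits:
            node = node.children.get(bit)
            if node is None:
                break
            if node.next_hop is not None:
                best = node
        return best
```

A session that cached the parent route sees the version mismatch on its next packet, repeats the lookup, and picks up the child route if the destination falls under it.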
[0029] In some embodiments, the disclosed network device
encapsulates the first packet based on information returned from a
bridge lookup prior to encrypting the first packet.
[0030] Furthermore, the disclosed network device identifies a
network interface that the first packet is to be transmitted on.
Then, the disclosed network device sends the first packet to a
security engine of the network device to encrypt the first packet,
and instructs the security engine to forward the encrypted first packet
to the identified network interface in lieu of returning the
encrypted first packet to a processor within the network
device.
Computing Environment
[0031] FIG. 1 shows an exemplary wireless digital network
environment according to embodiments of the present disclosure.
FIG. 1 includes one or more network controllers (such as
controller 100), one or more access points (such as access point
160), one or more client devices (such as client 170), a layer 2 or
layer 3 network 110, a routing device (such as router 120), a
gateway 130, Internet 140, and one or more web servers (such as web
server A 150, web server B 155, and web server C 158), etc.
[0032] Controller 100 is a hardware device and/or software module
that provides network management functions, which include but are not limited
to, controlling, planning, allocating, deploying, coordinating, and
monitoring the resources of a network, network planning, frequency
allocation, predetermined traffic routing to support load
balancing, cryptographic key distribution authorization,
configuration management, fault management, security management,
performance management, bandwidth management, route analytics and
accounting management, etc.
[0033] Moreover, assume that a number of access points, such as
access point 160, are interconnected with network controller 100.
Each access point may be interconnected with zero or more client
devices via either a wired interface or a wireless interface. In
this example, for illustration purposes only, assume that client
170 is associated with access point 160 via a wireless link. An
access point generally refers to a network device that allows
wireless clients to connect to a wired network. Access points
usually connect to a router via a wired network or can be a part of
a router.
[0034] Furthermore, controller 100 can be connected to router 120
through zero or more hops in a layer 3 or layer 2 network (such as
L2/L3 Network 110). Router 120 can forward traffic to and receive
traffic from Internet 140 through gateway 130. Router 120 generally
is a network device that forwards data packets between different
networks, thus creating an overlay internetwork. A router is
typically connected to two or more data lines from different
networks. When a data packet comes in one of the data lines, the
router reads the address information in the packet to determine its
destination. Then, using information in its routing table or
routing policy, the router directs the packet to the next/different
network. A data packet is typically forwarded from one router to
another router through the Internet until the packet gets to its
destination.
[0035] Gateway 130 is a network device that passes network traffic
from a local subnet to devices on other subnets. In the example in
FIG. 1, gateway 130 is a default gateway that often connects a
local network (such as L2/L3 Network 110) to Internet 140. In some
embodiments, gateway 130 may be a part of router 120 depending on
the configuration of router 120.
[0036] Web servers 150, 155, and 158 are hardware devices and/or
software modules that facilitate delivery of web content that can
be accessed through Internet 140. For example, web server 150 may
be assigned an IP address of 1.1.1.1 and used to host a first
Internet website (e.g., www.yahoo.com); web server 155 may be
assigned an IP address of 2.2.2.2 and used to host a second
Internet website (e.g., www.google.com); and, web server 158 may be
assigned an IP address of 3.3.3.3 and used to host a third Internet
website (e.g., www.facebook.com).
[0037] In packet switching networks, a flow generally refers to a
sequence of packets from a source network/client device to a
destination network/client device, which may be another host, a
multicast group, or a broadcast domain. A flow could consist of all
packets in a specific session connection or media stream. Each
layer 2 or layer 3 network session can be uniquely identified by a
session key, which may be a layer 3 network session key or a layer
2 network session key. A layer 3 network session key generally
includes information, such as a source Internet Protocol (IP)
address, a destination IP address, a protocol, a layer 4 source
port, a layer 4 destination port, etc. Moreover, a layer 2 network
session key generally includes a source Media Access Control (MAC)
address, a destination MAC address, Ethernet type, etc. The above
described session keys are maintained in a session table used for
session management.
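The session keys described above map naturally onto hashable tuples used as dictionary keys. The sketch below is illustrative; the field names, the dictionary-based session table, and the packets counter are assumptions rather than the disclosed layout.

```python
from typing import NamedTuple

class L3SessionKey(NamedTuple):
    src_ip: str
    dst_ip: str
    protocol: int      # IP protocol number, e.g. 6 for TCP
    src_port: int
    dst_port: int

class L2SessionKey(NamedTuple):
    src_mac: str
    dst_mac: str
    ether_type: int    # e.g. 0x0800 for IPv4

def get_or_create_session(key, table):
    """Return the session state for key, creating an empty entry on first use."""
    return table.setdefault(key, {"packets": 0})
```

Two packets carrying the same five fields produce equal keys and therefore land on the same session entry.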
General Architecture
[0038] FIG. 2 illustrates a general architecture including multiple
processing planes according to embodiments of the present
disclosure. Specifically, FIG. 2 includes at least a control plane
process 210, two or more datapath processors 220, a lockless shared
memory 260 accessible by the two or more datapath processors 220,
and a network interface 250.
[0039] Control plane process 210 may be running on one or more CPUs
or CPU cores, such as CP CPU 1 212, CP CPU 2 214, . . . CP CPU M
218. Furthermore, control plane process 210 typically handles
network control or management traffic generated by and/or
terminated at network devices as opposed to data traffic generated
and/or terminated at client devices.
[0040] According to embodiments of the present disclosure, datapath
processors 220 include a single exception processing CPU, such as a
slowpath (SP) processor (e.g., Exception Processing CPU 230), and
multiple forwarding CPUs, such as fastpath (FP) processors (e.g.,
Forwarding CPU 1 240, Forwarding CPU 2 242, . . . Forwarding CPU N
248). Only forwarding processors are able to receive data packets
directly from network interface 250. The exception processing
processor, on the other hand, only receives data packets from the
forwarding processors.
[0041] Lockless shared memory 260 is a flat structure that is
shared by all datapath processors 220, and not tied to any
particular CPU or CPUs. Any datapath processor 220 can read any
memory location within lockless shared memory 260. Therefore, both
the single exception processing processor (e.g., Exception
Processing CPU 230) and the multiple forwarding processors (e.g.,
Forwarding CPU 1 240, Forwarding CPU 2 242, . . . Forwarding CPU N
248) have read access to lockless shared memory 260, but only the
single exception processing processor (e.g., Exception Processing
CPU 230) has write access to lockless shared memory 260. More
specifically, any datapath processor 220 can have access to any
location in lockless shared memory 260 in the disclosed system.
[0042] Also, control plane process 210 is communicatively coupled
to exception processing CPU 230, such as a slowpath (SP) CPU, but
not to the forwarding CPUs, such as fastpath (FP) processors (e.g.,
Forwarding CPU 1 240, Forwarding CPU 2 242, . . . Forwarding CPU N
248). Thus, whenever control plane process 210 needs information
from datapath processors 220, control plane process 210 will
communicate with exception processing CPU 230, such as an SP
processor.
Network Forwarding
[0043] FIG. 3 illustrates an exemplary network forwarding process
according to embodiments of the present disclosure. A typical
pipeline process at a FP processor involves one or more of the
following operations:
[0044] Port lookup;
[0045] VLAN lookup;
[0046] Port-VLAN table lookup;
[0047] Bridge table lookup;
[0048] Firewall session table lookup;
[0049] Route table lookup;
[0050] Packet encapsulation;
[0051] Packet encryption;
[0052] Packet decryption;
[0053] Tunnel de-capsulation; and/or
[0054] Forwarding; etc.
[0055] Thus, the network forwarding process illustrated in FIG. 3
includes at least a port lookup operation 300, a virtual local area
network (VLAN) lookup operation 305, a port/VLAN lookup operation
310, a bridge lookup operation 315, a firewall session lookup
operation 320, a route lookup operation 325, a forward lookup
operation 330, an encapsulation operation 335, an encryption
operation 340, a tunnel decapsulation operation 345, a decryption
operation 350, and a transmit operation 360.
[0056] FIG. 4 is a diagram illustrating exemplary routing tables
maintained in a shared memory according to embodiments of the
present disclosure. Shared memory 400 can be used to store a
variety of tables to assist software network packet forwarding. For
example, the tables may include, but are not limited to, a bridge
table, a session table, a user table, a station table, a tunnel
table, a route table and/or route cache, etc. Specifically, in the
example illustrated in FIG. 4, shared memory 400 stores at least
one or more of a port table 410, a VLAN table 420, a bridge table
430, a station table 440, a route table 450, a route cache 460, a
session policy table 470, a user table 480, etc. Each table is used
during network forwarding operations illustrated in FIG. 3 for
retrieving relevant information in order to perform proper network
forwarding. For example, port table 410 is used during port lookup
operation to look up a port identifier based on the destination
address of a network packet. Likewise, VLAN table 420 is used
during VLAN lookup operation to look up a VLAN identifier based on
the port identifier and/or source/destination address(es). Note
that, a table can be used by multiple network forwarding
operations, and each network forwarding operation may need to
access multiple routing tables.
[0057] In some embodiments, shared memory 400 is a lockless shared
memory. Thus, multiple tables in shared memory 400 can be accessed
by multiple FP processors while the FP processors are processing
packets received from one or more network interfaces. If the FP
processor determines that a packet requires any special handling,
the FP processor will hand over the packet processing to the SP
processor. For example, the FP processor may find that a table entry
corresponding to the packet is missing, and therefore hand over
the packet processing to the SP processor. As another example, the
FP processor may find that the packet is a fragmented packet, and
thus hand over the packet processing to the SP processor.
[0058] A. Packet Flows
[0059] As mentioned above, a flow generally refers to a sequence of
packets from a source network/client device to a destination
network/client device, which may be another host, a multicast
group, or a broadcast domain. A flow could consist of all packets
in a specific session connection or media stream. FIG. 5
illustrates exemplary flow packets according to embodiments of the
present disclosure. Note that, a layer 2 or layer 3 packet flow may
include multiple fragmented packets, which include, e.g., a first
fragment (or a parent fragment) and one or more subsequent
fragments (or data fragments).
[0060] Also, each packet may include multiple portions. For
example, a packet with a L3 packet key 500 may include at least a
network layer (layer 3 or L3) header that includes L3 source IP
510, L3 destination IP 515, and protocol 520, a transport layer
(layer 4 or L4) header that includes L4 source port 525 and L4
destination port 530. As another example, a packet with a L2 packet
key 550 may include at least a media access control layer (layer 2
or L2) header that includes source media access control (MAC)
address 560, destination MAC address 570, and Ethernet type
580.
[0061] Subsequent fragments include at least a network layer (layer
3 or L3) header, but do not include any transport (layer 4 or L4)
header. Transport (layer 4 or L4) header is required for
session-based forwarding, for example, when firewall policies need
to be applied to the packet. Even though subsequent fragments do
not include any transport (layer 4 or L4) header, they are
typically subject to the same session policies as those applied
to the first fragment.
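The following is a minimal sketch, not taken from the disclosure, of how the L3 and L2 packet keys described above could be represented, and how a subsequent fragment without an L4 header could inherit the key of its first fragment. The names L3Key, L2Key, fragment_keys, and session_key are hypothetical, and matching fragments by the IP identification value (ip_id) is an assumption.

```python
from typing import NamedTuple

class L3Key(NamedTuple):
    src_ip: str
    dst_ip: str
    protocol: int
    src_port: int
    dst_port: int

class L2Key(NamedTuple):
    src_mac: str
    dst_mac: str
    ether_type: int

# Hypothetical map from a fragment identity to the key of its first fragment.
fragment_keys = {}

def session_key(src_ip, dst_ip, protocol, ip_id, src_port=None, dst_port=None):
    """Return the L3 key for a packet, or None for an unmatched fragment."""
    frag_id = (src_ip, dst_ip, protocol, ip_id)
    if src_port is not None and dst_port is not None:
        # First fragment (or unfragmented packet): the L4 header is present.
        key = L3Key(src_ip, dst_ip, protocol, src_port, dst_port)
        fragment_keys[frag_id] = key
        return key
    # Subsequent fragment: no L4 header, so reuse the first fragment's key.
    return fragment_keys.get(frag_id)
```

Under these assumptions, a subsequent fragment resolves to the same session as its first fragment, so the same session policies apply to both.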
[0062] B. Route Cache
[0063] FIG. 6 illustrates an exemplary route cache table according
to embodiments of the present disclosure. In the example shown,
route cache table 600 includes fields such as destination IP
address 605, next hop interface 610, gateway MAC address 615, and
neighbor information 620, which duplicates a corresponding neighbor
entry in a neighbor table. To populate route cache table 600, a
network device first performs a longest prefix match lookup in the
route table based on the destination address (e.g., "1.1.1.1") of a
network packet to find the next hop interface corresponding to
the destination address (e.g., Interface.sub.NH1). Assume, for
example, that the longest prefix match in the route table for "1.1.1.1"
is "1.1.0.0," which corresponds to the next hop interface
Interface.sub.NH1. Note that the next hop interface may include,
but is not limited to, a VLAN, a tunnel, and/or a port.
[0064] Then, the network device inserts an entry in route cache
table 600 with the destination IP address, the next hop
interface resulting from the route table lookup, and the default
gateway MAC address (e.g., MAC.sub.GW1) that is known to the
network device (e.g., a network controller). The inserted entry
will also include other information in its corresponding neighbor
entry Entry.sub.1 from the neighbor table. In the example
illustrated in FIG. 6, for destination IP address "2.2.2.2," the
next hop interface is Interface.sub.NH2, the default gateway MAC
address is MAC.sub.GW2, and the neighbor information includes
neighbor entry Entry.sub.2; for destination IP address "3.3.3.3,"
the next hop interface is Interface.sub.NH3, the default gateway
MAC address is MAC.sub.GW3, and the neighbor information includes
neighbor entry Entry.sub.3. Note that, when multiple routes exist
for a destination IP address, each route will correspond to a
different next hop interface although the default gateway address
may be the same. Thus, each route to the same destination IP
address corresponds to a unique entry in the route cache table.
[0065] In some embodiments, route cache table 600 is maintained as
a hash table by applying a hash function on the destination IP
address of the packet. Route cache table 600 may introduce a few
issues. First, the cost of matching the longest prefix for each packet
can be high. This is especially true in IPv6 networks, where IP
addresses are longer than conventional IPv4 addresses. Second,
as the number of hashed entries in route cache table increases, the
cost for looking up route cache table 600 also increases
accordingly. Third, maintaining the consistency of route cache
table 600 may result in additional costs.
[0066] For example, when the next hop address in a route
corresponding to "1.1.0.0" changes in the routing table, the system
would have to perform a reverse lookup to search for all
destination IP addresses for which "1.1.0.0" is the longest prefix
match, and update the next hop address in all of those entries.
Therefore, the convergence after a route change is slow because of
the costs involved in maintaining the consistency of route cache
table 600.
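The consistency cost described above can be sketched as follows. This is an illustrative model only: the dictionary-based route_table and route_cache, the function names, and the "/16" and "/0" prefix lengths are assumptions that do not appear in the disclosure.

```python
import ipaddress

# Hypothetical route table: prefix -> next hop interface.
route_table = {
    ipaddress.ip_network("0.0.0.0/0"): "Interface_NH0",
    ipaddress.ip_network("1.1.0.0/16"): "Interface_NH1",
}
# Route cache: destination IP string -> next hop interface.
route_cache = {}

def longest_prefix_match(dst):
    """Return the most specific prefix in route_table that contains dst."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in route_table if addr in net]
    return max(matches, key=lambda net: net.prefixlen)

def cached_next_hop(dst):
    """Populate the cache on a miss, then answer from the cache."""
    if dst not in route_cache:
        route_cache[dst] = route_table[longest_prefix_match(dst)]
    return route_cache[dst]

def change_route(prefix, next_hop):
    """Change a route and repair every cache entry that depends on it."""
    net = ipaddress.ip_network(prefix)
    route_table[net] = next_hop
    touched = 0
    for dst in route_cache:  # reverse lookup: every cached destination is scanned
        if longest_prefix_match(dst) == net:
            route_cache[dst] = next_hop
            touched += 1
    return touched
```

In this sketch a single route change requires one longest prefix match per cached destination, which is the slow convergence the paragraph above describes.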
[0067] C. Routing Tables
[0068] FIGS. 7A-7C illustrate various routing tables according to
embodiments of the present disclosure. The illustrated routing
tables in combination provide an alternative and an enhancement to
the route cache table described above. Specifically, FIG. 7A
illustrates an exemplary route table 700, which includes a field
for version 702, prefix/length 705 and a field for next hop address
710 and also an optional next hop interface for routes through P2P
interface (not shown). For example, according to route table 700,
the next hop address for "1.0.0.0/8" is IP.sub.NH1, and the version
of the route is 100; the next hop address for "0.0.0.0/0" is
IP.sub.NH2, and the version of the route is 101; etc.
[0069] FIG. 7B illustrates an exemplary session table 720, which
includes at least the following fields: source IP address 725,
destination IP address 730, route index 735, route version 740,
neighbor index 745, neighbor version 750, etc. Route index 735 may
also include a next hop index in cases where equal cost multiple
paths (ECMP) apply. In the first example shown in FIG. 7B, the
route index corresponding to a route from the source IP address
"100.100.100.100" to the destination IP address "1.1.1.1" is
route.sub.1. Also, the corresponding route version for route.sub.1
is 100; and, the corresponding neighbor index and neighbor version
are ARP.sub.1 and 10 respectively. In the second example, due to a
neighbor change in route.sub.1, the neighbor index, neighbor
version, and route version are updated to ARP.sub.2, 20, and 101
respectively responsive to the next hop change for route.sub.1.
[0070] Note that the version number 100 for route.sub.1 is the
route version number corresponding to route entry route.sub.1 in
the route table at the time when the disclosed system performs the
route lookup and inserts the reference to route.sub.1 into session
table 720. Likewise, the version number 10 for neighbor ARP.sub.1 is
the neighbor version number corresponding to the neighbor entry
ARP.sub.1 in the neighbor table at the time when the disclosed
system inserts the reference to ARP.sub.1 into session table
720.
[0071] FIG. 7C illustrates an exemplary neighbor table 760, which
includes at least the following fields: version 765, index 770, IP
address 775, MAC address 780, VLAN identifier 790, etc. Neighbor
table 760 provides a mapping between a destination IP address and
MAC address. In some embodiments, information in neighbor table 760
can be obtained by transmitting an Address Resolution Protocol
(ARP) request and analyzing the corresponding ARP response. In the
example shown in FIG. 7C, for the first neighbor with neighbor
index number 1, the IP address is 10.10.10.10, which corresponds to
VLAN 10; the corresponding MAC address is MAC.sub.1; and the
version number is 10. Likewise, for the second neighbor with
neighbor index number 2, the IP address is 18.18.18.18, which
corresponds to VLAN 18; the corresponding MAC address is MAC.sub.2;
and the version number is 20. Further, for the third neighbor with
neighbor index number 3, the IP address is 15.15.15.15, which
corresponds to VLAN 15; the corresponding MAC address is MAC.sub.3;
and the version number is 30. The above examples are provided for
illustration purposes only. Different neighbors may have the same
version number, but their IP address and MAC address will be
unique.
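As a rough sketch of how the route table, session table, and neighbor table of FIGS. 7A-7C relate to one another, the structures below use the field names from the description. The Python class names, the use of dictionaries keyed by index, and the concrete next hop address "10.10.10.10" for route index 1 are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class RouteEntry:
    version: int
    prefix: str
    next_hop_ip: str

@dataclass
class NeighborEntry:
    version: int
    ip: str
    mac: str
    vlan: int

@dataclass
class SessionEntry:
    src_ip: str
    dst_ip: str
    route_index: int       # reference to a route table entry, not a copy
    route_version: int     # route version observed when the reference was cached
    neighbor_index: int    # reference to a neighbor table entry
    neighbor_version: int  # neighbor version observed when the reference was cached

route_table = {1: RouteEntry(100, "1.0.0.0/8", "10.10.10.10")}
neighbor_table = {
    1: NeighborEntry(10, "10.10.10.10", "MAC_1", 10),
    2: NeighborEntry(20, "18.18.18.18", "MAC_2", 18),
}

# The session stores only indices and the versions seen at lookup time.
session = SessionEntry(
    "100.100.100.100", "1.1.1.1",
    route_index=1, route_version=route_table[1].version,
    neighbor_index=1, neighbor_version=neighbor_table[1].version,
)
```

Because the session holds references and version numbers rather than copies, a later change to a table entry can be noticed by comparing version numbers.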
[0072] In addition, there exists a firewall session policy table,
which includes information, such as permission (e.g., permit or
deny access), destination network address translation (DNAT),
source network address translation (SNAT), rate limiting, etc. In
some embodiments, a flow table can be used for stateful firewall
purposes in addition to firewall purposes. According to the present
disclosure, the firewall session policy table and/or flow table can
be modified to additionally include the next hop information and
version information, and used for routing purposes.
[0073] During operation, the system monitors every session based on
flow-based destination IP address, which persists through the
entire session. Because the value of the destination IP address
does not change for a particular session, information such as that
cached in the route cache can be cached in the session for easier
access during the session instead of performing a lookup operation
on a packet-by-packet basis. As mentioned previously in description
regarding FIG. 3, a typical forwarding pipeline includes a bridge
lookup operation, a firewall session policy lookup operation, and
subsequent routing operations. The firewall session policy lookup
operation checks the firewall policy based on the destination IP
address of the packets. For example, a user may be redirected to a
captive portal page when the user is trying to access a destination
IP address of "1.1.1.1" according to the enterprise firewall policy
configurations.
[0074] More specifically, the system will be forwarding packets to
"1.1.1.1" in a session (or a flow). Accordingly, the system will
initiate a route lookup based on the destination IP address of the
session, e.g., "1.1.1.1." For illustration purposes only, assume
that the route lookup returns a next hop IP address of
"10.10.10.10." The system will then perform a neighbor lookup based
on the resulting IP address of the next hop. In this example, it is
assumed that the neighbor lookup returns MAC.sub.1 (e.g.,
01:02:03:04:05:06) and VLAN identifier V.sub.10.
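The two-step lookup described above (route lookup, then neighbor lookup on the resulting next hop) might be sketched as follows. The table contents beyond "1.1.1.1", "10.10.10.10", and the example MAC address are hypothetical, as are the function names and the "192.0.2.1" default next hop.

```python
import ipaddress

# Hypothetical route table: prefix -> next hop IP address.
routes = {"0.0.0.0/0": "192.0.2.1", "1.0.0.0/8": "10.10.10.10"}
# Hypothetical neighbor table: next hop IP -> (MAC address, VLAN identifier).
neighbors = {"10.10.10.10": ("01:02:03:04:05:06", 10)}

def route_lookup(dst_ip):
    """Longest prefix match on the destination IP; returns the next hop IP."""
    addr = ipaddress.ip_address(dst_ip)
    candidates = [ipaddress.ip_network(p) for p in routes]
    best = max((n for n in candidates if addr in n), key=lambda n: n.prefixlen)
    return routes[str(best)]

def resolve(dst_ip):
    """Route lookup followed by neighbor lookup on the next hop."""
    next_hop = route_lookup(dst_ip)
    # None means the neighbor is unknown (e.g., an ARP request would be needed).
    return neighbors.get(next_hop)
```

With these tables, resolve("1.1.1.1") yields the MAC address and VLAN identifier of next hop "10.10.10.10", which is the result a session would then cache by reference.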
[0075] It is important to note that the firewall session policies
include source and destination IP addresses along with other keys.
Therefore, in every session for which the system performs
session-based forwarding, if the session is determined to be a
router session (which means that the session is not a
client-to-client session that the network system can simply forward
the packets by bridging the packets, but a session that requires a
router to route the packets to the Internet), the system can then
cache the routing lookup results in the session itself.
[0076] In another example, one router is configured as the default
router. Thus, every packet will be sent to the same MAC address
corresponding to the default router, but the next hop address will
be different depending on the destination IP address of each
session. In this example, it is possible to eliminate all user
VLANs and to use only one VLAN to route all packets to the default
router. In addition, a guest VLAN may be configured to route all
guest traffic to a network controller within the WLAN.
[0077] Therefore, the disclosed system may optimize the forwarding
pipeline by caching the routing information in each session.
However, it is possible that during an active session, a route to
the destination IP address may change. In such scenarios, the
disclosed system will update the session table to reflect the route
changes. To improve the efficiency in the updating operations,
rather than maintaining a copy of the relevant route information in
the session table, the disclosed system maintains a reference to an
entry in the route table and a version number corresponding to the
route reference, as well as a reference to an entry in the neighbor
table and a version number corresponding to the neighbor
reference.
[0078] With this solution, the disclosed system no longer needs to
perform a route lookup after a firewall session lookup, because the
system can obtain route information directly from the sessions.
Neither is it necessary to maintain a route cache in the system any
longer, which reduces the cost for routing operations compared with
other solutions.
Maintenance of Route Consistency
[0079] The disclosed system is also able to efficiently maintain
the consistency of the route information in the session table (or
flow table) and the route information in the route table and the
neighbor table (or ARP table). Because the reference and version
number to each route is maintained in the session, any time when a
route changes, the system will be able to quickly detect the route
change based on one or more of a change in the route reference,
route version, neighbor reference, and/or neighbor version.
Moreover, the system can easily update the session with the route
change by updating one or more of the route reference, route
version, neighbor reference, and/or neighbor version to reflect the
route change.
[0080] Specifically, to detect a route change, for every packet
being forwarded in a session, the system de-references the route
index and the neighbor index in the session entry to retrieve the
corresponding entries in the route table and the neighbor table.
The system then obtains the current version number corresponding to
the route index in the route table and the current version number
corresponding to the neighbor index in the neighbor table. Next,
the system determines whether certain conditions that indicate that
the route in the session table is stale have occurred. For example,
the system can determine whether the route version number and the
neighbor version number maintained in the session entry match the
current version numbers and/or neighbor index obtained above. If
either version number in the session table is different from its
corresponding version number in the route table or the neighbor
table or if the neighbor index in the session table is different
from its corresponding neighbor index in the neighbor table, then
the session entry is stale. Therefore, the system will perform a
route lookup using longest prefix match of the destination IP
address of the session, and update the route information based on
the results from the route lookup.
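A minimal sketch of the per-packet staleness check is given below. It compares only the two version numbers and treats a missing table entry as stale; the neighbor index comparison mentioned above is omitted for brevity. The dictionary layout, the function names, and the simplified full_lookup (which stands in for a real longest prefix match) are assumptions.

```python
def is_stale(session, route_table, neighbor_table):
    """Return True when the cached references no longer match the tables."""
    route = route_table.get(session["route_index"])
    neighbor = neighbor_table.get(session["neighbor_index"])
    if route is None or neighbor is None:
        return True  # the referenced entry was deleted
    return (route["version"] != session["route_version"]
            or neighbor["version"] != session["neighbor_version"])

def forward(session, route_table, neighbor_table, full_lookup):
    """Refresh the session only if it is stale, then return the next hop MAC."""
    if is_stale(session, route_table, neighbor_table):
        session.update(full_lookup(session["dst_ip"]))
    return neighbor_table[session["neighbor_index"]]["mac"]

route_table = {1: {"version": 100, "next_hop": "10.10.10.10"}}
neighbor_table = {
    1: {"version": 10, "ip": "10.10.10.10", "mac": "MAC_1"},
    2: {"version": 20, "ip": "18.18.18.18", "mac": "MAC_2"},
}

def full_lookup(dst_ip):
    # Hypothetical stand-in for a longest prefix match: always selects route 1,
    # then finds the neighbor entry whose IP equals the route's next hop.
    route = route_table[1]
    n_idx = next(i for i, n in neighbor_table.items() if n["ip"] == route["next_hop"])
    return {"route_index": 1, "route_version": route["version"],
            "neighbor_index": n_idx,
            "neighbor_version": neighbor_table[n_idx]["version"]}

session = {"dst_ip": "1.1.1.1", **full_lookup("1.1.1.1")}
```

In the common case the check costs two table reads and two integer comparisons per packet, and the full route lookup runs only after a version mismatch.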
[0081] Note that, in the route table, a version number
corresponding to a route index changes whenever there is a change
in the route, e.g., a change in the next hop address. On the other
hand, in the neighbor table, a version number corresponding to a
neighbor index changes whenever the mapping between the IP address
and the MAC address of a network node (e.g., a default gateway or a
next hop for a particular VLAN) changes, for example, when the
Ethernet interface changes.
[0082] The following sections describe a few exemplary scenarios in
which a route could be changed during an active session, and how
the system will detect the route change in each scenario. These
examples are provided for illustration only. They are not intended
to be an exhaustive list of all possible scenarios. One skilled in
the art can apply the techniques disclosed herein to detect other
types of route changes without departing from the spirit of the
invention.
[0083] A. Route Change
[0084] In this scenario, the default gateway's IP address may
change during a session because the route gets modified, for
example, from "10.10.10.10" at time point t.sub.1 to "18.18.18.18"
at time point t.sub.2. Assume that, in the route table at
t.sub.1, the route entry has the version number of 100, index value
of 1, prefix/length value of "1.0.0.0/8," and next hop IP address
of "10.10.10.10."
[0085] Accordingly, in the session table, at time point t.sub.1,
the session entry has values as shown in session entry 760 in
session table 720, e.g., having a source IP address of
"100.100.100.100," a destination IP address of "1.1.1.1," a route
index corresponding to route.sub.1, a route version of 100, a
neighbor index corresponding to ARP.sub.1, and a neighbor version
of 10.
[0086] Based on the change above, at time point t.sub.2, the same
route entry has the version number of 101, index value of 1,
prefix/length value of "1.0.0.0/8," and next hop IP address of
"18.18.18.18."
[0087] When the system receives the first packet in the session
after the time point t.sub.2, the system will detect that the route
version in the session table (e.g., 100) does not match the route
version in the route table (e.g., 101). Thus, the system will deem
the session entry as stale and perform a route lookup to update the
session entry.
[0088] After the update, the session entry will have the values as
shown in session entry 765 in session table 720, e.g., having a
source IP address of "100.100.100.100," a destination IP address of
"1.1.1.1," a route index corresponding to route.sub.1, a route
version of 101, a neighbor index corresponding to ARP.sub.2, and a
neighbor version of 20. Note that, the route version has changed
from 100 to 101; the neighbor index has changed from ARP.sub.1
(corresponding to "10.10.10.10" in neighbor table 760) to ARP.sub.2
(corresponding to "18.18.18.18" in neighbor table 760); and, the
neighbor version has changed from 10 (corresponding to ARP.sub.1 in
neighbor table 760) to 20 (corresponding to ARP.sub.2 in neighbor
table 760).
[0089] Similarly, when a route is deleted rather than modified, the
system will also detect a mismatch in the version numbers between
the session table and the route table, because the corresponding
route entry in the route table is missing. Therefore, the system
will initiate a route lookup to search for a new route to the
destination IP address based on the longest prefix match to the
current route table (in which the original route was deleted), and
update the session table with the information regarding the new
route.
[0090] Also, note that the route lookup is performed when the
system receives a packet in the session after time point t.sub.2.
Thus, if no packet is received, which indicates that the client is
in an idle state, then the session entry will remain stale
even though a change has occurred in the route table at time point
t.sub.2. This is because when the client is idle, the client is not
using the route, and thus there is no need to spend any resources
on updating the route for an idle client. Furthermore, if the
session entry remains stale for an excessive amount of time
because no client is using the session, the session entry will
eventually be removed without ever being updated with the route
change at all.
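The lazy behavior described above can be illustrated with the sketch below, in which a session is refreshed only when a packet arrives and an idle session is simply aged out. The timeout value, the Session class, and the explicit "now" timestamps are hypothetical; the disclosure does not specify how idle sessions are removed.

```python
IDLE_TIMEOUT = 30.0  # hypothetical idle limit, in seconds

class Session:
    def __init__(self, route_version, now):
        self.route_version = route_version
        self.last_seen = now
        self.refreshes = 0

def on_packet(session, current_route_version, now):
    """Staleness is checked, and repaired, only when a packet arrives."""
    if session.route_version != current_route_version:
        session.route_version = current_route_version  # stand-in for a route lookup
        session.refreshes += 1
    session.last_seen = now

def expire_idle(sessions, now):
    """Remove idle sessions without ever refreshing their stale routes."""
    idle = [k for k, s in sessions.items() if now - s.last_seen > IDLE_TIMEOUT]
    for key in idle:
        del sessions[key]
```

In this model, a route change at t.sub.2 costs nothing for the idle session: it is never refreshed and is eventually dropped.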
[0091] B. Equal Cost Multiple Paths (ECMP)
[0092] FIG. 8A illustrates a use case scenario of ECMP, where there
exist multiple paths with equal cost. In this example, a packet in
a session flow from node A 800 to a destination address in L2/L3
Network 880 needs to be forwarded by the disclosed system. Further,
node A 800 is interconnected with two different next hop nodes,
which are next hop A 810 and next hop B 820. Each of the two
different next hop nodes corresponds to a unique path to the
destination address with equal cost.
[0093] For illustration purposes, assume that next hop A 810 is
associated with the IP address of "10.10.10.10" and is pre-existing
at time point t.sub.1. Thus, at time point t.sub.1, in the route
table, the route entry has the version number of 101, index value
of 1, prefix/length value of "1.0.0.0/8," and next hop IP address
of "10.10.10.10" corresponding to next hop A 810.
[0094] In addition, assume that, at time point t.sub.2, a new
route with equal cost from the source address to the destination
address through next hop B 820 becomes available during the
session, and that next hop B 820 is associated with the IP
address of "18.18.18.18."
[0095] Accordingly, the route table will be updated. Thus, the same
route entry now has the version number of 101, index value of 1,
prefix/length value of "1.0.0.0/8," and next hop IP address of
"10.10.10.10; 18.18.18.18" corresponding to both next hop A 810 and
next hop B 820. In this example, each next hop IP address
corresponds to a unique ECMP index. In addition to caching the
route index, the session may also cache the ECMP index in the cases
involving ECMPs. Note that, the version number remains the same,
but the ECMP index is increased with the additional ECMP route
becoming available.
[0096] Subsequently, any new sessions to a destination address for
which "1.0.0.0/8" provides the longest prefix match will be using
the new ECMP route corresponding to next hop B 820 with the IP
address of "18.18.18.18." Nevertheless, any existing session will
continue to use next hop A 810 with the IP address of
"10.10.10.10," because the route version number has not changed. As
a result, the system will determine that the route through next hop
A 810 is not stale and does not need to be changed or updated.
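The ECMP behavior above can be sketched as follows: adding an equal-cost next hop extends the route entry without changing its version, so sessions that cached the earlier ECMP index are not invalidated. The dictionary layout and function names are assumptions made for illustration.

```python
route = {"version": 101, "prefix": "1.0.0.0/8", "next_hops": ["10.10.10.10"]}

def add_ecmp_next_hop(route, ip):
    """Add an equal-cost next hop; the route version is deliberately unchanged."""
    route["next_hops"].append(ip)
    return len(route["next_hops"]) - 1  # ECMP index of the new next hop

def new_session(route, ecmp_index):
    """Cache the route version and the chosen ECMP index in the session."""
    return {"route_version": route["version"], "ecmp_index": ecmp_index}

def is_stale(session, route):
    return session["route_version"] != route["version"]

def next_hop_for(session, route):
    """Dereference the cached ECMP index to obtain the next hop IP address."""
    return route["next_hops"][session["ecmp_index"]]
```

An existing session created before the new path was added keeps resolving to "10.10.10.10", while a session created afterwards can be pinned to "18.18.18.18".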
[0097] This scheme can be particularly useful when traffic from a
private IP network is to be forwarded to two or more networks
corresponding to different uplink service providers (e.g.,
AT&T.RTM. and Verizon.RTM.) via two or more different IP
addresses, such as IP.sub.1 and IP.sub.2. It is desirable that the
traffic to the first service provider is only transmitted via
IP.sub.1, and the traffic to the second service provider is only
transmitted via IP.sub.2. Therefore, the traffic from users of
different service providers will not get mixed with each other.
[0098] C. Virtual Router Redundancy Protocol (VRRP)
[0099] Virtual Router Redundancy Protocol (VRRP) is a computer
networking protocol that provides for automatic assignment of
available IP routers to participating hosts. This increases the
availability and reliability of routing paths via automatic default
gateway selections on an IP sub-network. The VRRP protocol achieves
this by creation of virtual routers, which are an abstract
representation of multiple routers, i.e., master and backup
routers, acting as a group. The default gateway of a participating
host is assigned to the virtual router instead of a physical
router. If the physical router that is routing packets on behalf of
the virtual router fails, another physical router is selected to
automatically replace it. The physical router that is forwarding
packets at any given time is called the master router. Thus, at any
given time, there is only one physical router that is actively
forwarding the traffic.
[0100] FIG. 8B illustrates a use case scenario of VRRP, where there
exist two alternative paths with alternative next hop nodes. In
this example, a packet in a session flow from node A 800 to a
destination address in L2/L3 Network 880 needs to be forwarded by
the disclosed system. Further, node A 800 is interconnected with
two different next hop nodes, which are next hop A 810 and next hop
B 820. Each of the two different next hop nodes corresponds to the
same virtual router under VRRP 815. Therefore, at any given time,
traffic from node A 800 will be able to reach next hop C 830 via
either next hop A 810 or next hop B 820 but not both, and the
traffic will continue to be forwarded to next hop C 830 via L2/L3
network 880.
[0101] Assume that, at time point t.sub.1, next hop A 810 is
active under VRRP 815 and corresponds to the IP address of
"10.10.10.10." Thus, in the neighbor table, the neighbor entry has
the version number of 10, index value of 1, IP address of
"10.10.10.10," MAC address of MAC.sub.A (corresponding to next hop
A 810), and VLAN identifier value of V.sub.10.
[0102] At time point t.sub.2, assume that next hop A 810 fails,
and next hop B 820 starts to function as the virtual router.
Therefore, in the neighbor table, the same neighbor entry now has
the version number of 11, index value of 1, IP address of
"10.10.10.10," MAC address of MAC.sub.B (corresponding to next hop
B 820), and VLAN identifier value of V.sub.10.
[0103] Because the neighbor version has changed from 10 to 11, the
system will determine that the route has become stale.
Consequently, the system will perform a route lookup and update the
session entry in the session table with the new neighbor version
number.
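The VRRP failover above can be sketched as a neighbor entry whose MAC address changes while its IP address and index stay the same, with the version number incremented on the change. The function names are hypothetical, and the refresh step is reduced to copying the new version in place of a full route lookup.

```python
neighbor_table = {
    1: {"version": 10, "ip": "10.10.10.10", "mac": "MAC_A", "vlan": 10},
}

def update_mac(index, mac):
    """Record a new IP-to-MAC mapping and bump the neighbor version."""
    entry = neighbor_table[index]
    if entry["mac"] != mac:
        entry["mac"] = mac
        entry["version"] += 1

def mac_for(session):
    """Return the current MAC, refreshing the session if its version is stale."""
    entry = neighbor_table[session["neighbor_index"]]
    if entry["version"] != session["neighbor_version"]:
        # Stand-in for the route lookup that updates the session entry.
        session["neighbor_version"] = entry["version"]
    return entry["mac"]
```

After the failover, the first packet in the session observes version 11 instead of 10, refreshes the session, and is forwarded to MAC_B.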
[0104] D. Shorter Alternative Route
[0105] FIG. 8C illustrates a use case scenario in which an
alternative shorter route is added during the session. In this
example, a packet in a session flow from node A 800 to a
destination node D 890 is initially forwarded through next hop B
820 and L2/L3 Network 880. Subsequently, assume that node A 800
is interconnected with an additional next hop node, which is next
hop A 810, and that next hop A 810 is further interconnected with
destination node D 890. Here, for illustration purposes only, it is
assumed that next hop A is associated with an IP address of
"1.1.0.0" and that next hop B is associated with an IP address of
"1.0.0.0." Therefore, there are two alternative routes between node
A 800 and destination node D 890. Specifically, traffic from node A
800 will be able to reach destination node D 890 via next hop A
810, which provides a shorter route than the original route via
next hop B 820 and L2/L3 Network 880.
[0106] In some embodiments, route information can be maintained in
a patricia trie. A patricia trie generally refers to a space-optimized
trie data structure, where each node with only one child is merged
with its child. As a result, every internal node has at least two
children. Unlike in regular tries, edges can be labeled with
sequences of elements as well as single elements. This makes them
much more efficient for small sets (especially if the strings are
long) and for sets of strings that share long prefixes.
[0107] FIG. 9 illustrates an exemplary patricia trie used in
session-based forwarding according to the present disclosure. In
this given example, patricia trie 900 includes at least root node
920, node 940, and node 960, which correspond to "0.0.0.0/0,"
"1.0.0.0/8," and "1.1.0.0/16" respectively. Note that, node 940 is
a child node of root node 920. Also, node 960 is a child node of
node 940 and a grandchild node of root node 920. Therefore, in the
patricia trie illustrated in FIG. 9, a child node always has a
longer prefix match than its parent node.
[0108] Furthermore, when a new route is inserted into a patricia
trie as a new node (e.g., assuming that node 960 is inserted as a
child node of node 940), the disclosed system performs at least two
operations: First, the system will add the new route (e.g.,
"1.1.0.0/16") to the route table with a new route index that is
different from the route index of the original route corresponding
to the parent node in the patricia trie. Second, the system will
increase, in the route table, the version number of the route
corresponding to the parent node of the inserted node (e.g., the
version number of route "1.0.0.0/8" will be increased from 100 to
101; note that the route "1.0.0.0/8" corresponds to parent node 940
of the inserted node 960 in this example).
[0109] Because the version number of the route corresponding to the
parent node in the patricia trie gets updated, the corresponding
route entry (e.g., "1.0.0.0/8") becomes stale due to the difference
in the route version number in the session table (e.g., 100) and in
the route table (e.g., 101). Thus, the system will perform a route
lookup to update the route information. As a result, the route
lookup will return the shorter route, e.g., "1.1.0.0/16", with a
new route index instead of the original route (e.g., "1.0.0.0/8").
Thus, subsequent traffic from the same source node to the same
destination node will be forwarded through the updated shorter
route.
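The insertion rule above can be sketched as follows. For brevity this sketch uses a flat dictionary and prefix containment in place of a patricia trie; only the version-bump rule on the parent prefix is modeled. The function names and the initial version of 100 for each new route are assumptions.

```python
import ipaddress

routes = {}  # route index -> {"net": network, "version": int}

def add_route(index, prefix, version=100):
    """Insert a route and bump the version of its closest covering prefix."""
    net = ipaddress.ip_network(prefix)
    parents = [r for r in routes.values()
               if r["net"] != net and net.subnet_of(r["net"])]
    if parents:
        # The closest covering prefix plays the role of the parent node.
        parent = max(parents, key=lambda r: r["net"].prefixlen)
        parent["version"] += 1
    routes[index] = {"net": net, "version": version}

def lookup(dst):
    """Longest prefix match; returns (route index, route version)."""
    addr = ipaddress.ip_address(dst)
    matches = [(i, r) for i, r in routes.items() if addr in r["net"]]
    index, route = max(matches, key=lambda m: m[1]["net"].prefixlen)
    return index, route["version"]
```

A session that cached the route for "1.0.0.0/8" at version 100 sees version 101 after "1.1.0.0/16" is inserted, repeats the lookup, and lands on the new route index.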
[0110] Note that, as mentioned above, when traffic from a private
IP network is to be forwarded to two or more networks corresponding
to different uplink service providers (e.g., AT&T.RTM. and
Verizon.RTM.) via two or more different IP addresses, such as
IP.sub.1 and IP.sub.2, it is desirable that the traffic to the
first service provider is only transmitted via IP.sub.1, and the
traffic to the second service provider is only transmitted via
IP.sub.2. In such scenarios, typically a network address
translation (NAT) of either source IP address or destination IP
address of the packets is involved. Nevertheless, when no NAT
operation is involved and a shorter route is found during a session
as in the example illustrated above in the description of FIG. 8C
and FIG. 9, it would be desirable to switch to a shorter route
during an existing session.
[0111] Note that the route lookup is performed only when the
system receives a packet in a session. Thus, if the new route
"1.1.0.0/16" can be used in a second session, but no packet is
received in the second session because the client is idle, then
the session entry will remain stale even though a new and
shorter route has been inserted in the route table. This is because
when the client is idle, the client is not using the route, and
thus there is no need to spend any resources on updating the route
for an idle client. Furthermore, if the session entry remains
stale for an excessive amount of time because no client is using
the session, the session entry will eventually be removed without
ever being updated with the route change at all.
[0112] It is important to note that, in the present disclosure,
only active purging is used, and there is no background purging
involved. Therefore, a session is updated only when there is
active traffic in the session. No background process is
used to update the session entries, because there is no need to
spend resources on a session while the session is idle.
Caching Session Information in Secured Tunnels
[0113] In some embodiments, a computing environment as illustrated
in FIG. 1 may have a L2/L3 network between controller 100 and
access point 160. In such embodiments, a L2 or L3 tunnel (e.g., a
Generic Routing Encapsulation (GRE) tunnel) can be established
between controller 100 and access point 160 for transmission of
packets to and from client 170. A tunnel is generally represented
as (source IP address, destination IP address, protocol, L4
attributes), and is usually identified by a unique session key.
[0114] As illustrated in FIG. 3, a typical network forwarding
process includes, inter alia, a bridge lookup 315 (e.g., on an
Ethernet packet formatted as IEEE 802.3 packet), a firewall session
lookup 320, a route lookup 325, a forwarding lookup 330, etc. Then,
if it is determined that the packet needs to be transmitted via a
tunnel between a controller and an access point, the packet is
further encapsulated with an outer header, e.g., a GRE header, to
be converted to an IEEE 802.11 packet format. The encapsulated
packet in IEEE 802.11 format again goes through the same series of
network forwarding operations, e.g., a firewall session lookup 320, a
route lookup 325, a forwarding lookup 330, etc., before it can be
properly forwarded to its destination.
[0115] In a system as illustrated in FIG. 2, one of datapath
processors 220 (e.g., FP CPU 1 240, FP CPU 2 242, . . . , or FP CPU
N 248) will perform a session and/or route lookup, then send the
packet to a security engine (not shown) for encryption. The
encrypted (and encapsulated) packet will be returned to
datapath processors 220, which will then forward the encrypted and
encapsulated packet to its corresponding network interface 250.
[0116] In some embodiments, datapath processors 220 may perform
encapsulation prior to sending the packet to the security engine
for encryption. In addition, datapath processors 220 may instruct
the security engine which destination network interface 250 is
associated with the packet. Therefore, after the security engine
completes the encryption of the packet, the security engine can
directly forward the packet to its corresponding destination
network interface 250 without returning the encapsulated packet to
datapath processors 220.
[0117] Specifically, the system will first perform a route and
neighbor (e.g., ARP) lookup, which will return a MAC address and a
VLAN identifier corresponding to the destination IP address in the
packet. Next, based on the combination of MAC and VLAN identifier,
the system performs a bridge lookup, which will return a
destination network interface that can be either a port identifier
or a tunnel identifier.
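The two-stage lookup described above can be sketched as follows. The
table contents are made-up illustrative values, the function names
are assumptions, and the route lookup is reduced to an exact-match
dictionary where a real route table would use longest-prefix
matching.

```python
from typing import NamedTuple


class Port(NamedTuple):
    port_id: int


class Tunnel(NamedTuple):
    tunnel_id: int


# Hypothetical table contents, for illustration only.
ROUTE_TABLE = {"10.1.1.5": "10.0.0.1"}                       # dst IP -> next hop IP
NEIGHBOR_TABLE = {"10.0.0.1": ("00:0b:86:00:00:01", 10)}     # next hop -> (MAC, VLAN)
BRIDGE_TABLE = {("00:0b:86:00:00:01", 10): Tunnel(7)}        # (MAC, VLAN) -> interface


def route_and_neighbor_lookup(dst_ip):
    """Returns the (MAC, VLAN) pair for the next hop toward dst_ip."""
    next_hop = ROUTE_TABLE[dst_ip]
    return NEIGHBOR_TABLE[next_hop]


def bridge_lookup(mac, vlan):
    """Returns the destination interface: a Port or a Tunnel."""
    return BRIDGE_TABLE[(mac, vlan)]


def resolve(dst_ip):
    mac, vlan = route_and_neighbor_lookup(dst_ip)
    return bridge_lookup(mac, vlan)
```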
[0118] Based on the MAC address corresponding to the destination IP
address, the system can determine whether the packet is a unicast
packet or a multicast packet. If the packet is a unicast packet,
the system can use a unicast key to encrypt the packet. On the
other hand, if the packet is a multicast packet, the system can use
a tunnel or multicast key to encrypt the packet.
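A minimal sketch of this key selection is shown below. It assumes
the standard IEEE 802 convention that a group (multicast or
broadcast) MAC address has the least-significant bit of its first
octet set; the function names and key values are hypothetical.

```python
def is_multicast_mac(mac: str) -> bool:
    """True if the group bit (LSB of the first octet) is set."""
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0x01)


def select_key(mac, unicast_key, multicast_key):
    """Chooses the encryption key based on the destination MAC address."""
    return multicast_key if is_multicast_mac(mac) else unicast_key
```

Note that under this convention a broadcast address such as
ff:ff:ff:ff:ff:ff is also classified as a group address.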
[0119] Furthermore, based on the destination network interface, the
system can determine whether the packet needs to be encapsulated. For
example, if the destination network interface returned from the
bridge lookup is associated with a GRE tunnel, then the system can
determine that the packet will need to be encapsulated with the GRE
headers before being forwarded to its destination. Typically, in
order to perform an encapsulation, the system needs to know the
tunnel information for the packet, which includes the source and
destination IP addresses (available in the header of the packet),
the transmission protocol (which can be determined based on the
tunnel identifier), and L4 attributes associated with the packet
(which are usually cached in the tunnel). Therefore, upon
successful bridge lookup, the system would be able to perform an
encapsulation of the packet based on the information returned from
the bridge lookup.
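The encapsulation decision can be sketched as follows. This is an
illustration under stated assumptions: TunnelInfo, gre_encapsulate
and maybe_encapsulate are hypothetical names, only the basic 4-byte
GRE header is built, and neither the outer IP header nor any
conversion between IEEE 802.3 and IEEE 802.11 formats is modeled.

```python
import struct
from dataclasses import dataclass

GRE_PROTO_TEB = 0x6558  # GRE protocol type for Transparent Ethernet Bridging


@dataclass(frozen=True)
class TunnelInfo:
    src_ip: str
    dst_ip: str
    protocol: int        # IP protocol number; 47 is GRE
    l4_attrs: tuple = ()


def gre_encapsulate(frame: bytes) -> bytes:
    # Basic GRE header: 16 bits of flags/version (all zero here),
    # followed by the 16-bit protocol type.
    return struct.pack("!HH", 0, GRE_PROTO_TEB) + frame


def maybe_encapsulate(frame, interface, tunnels):
    """interface is ("port", id) or ("tunnel", id), as returned by bridge lookup."""
    kind, ident = interface
    if kind != "tunnel":
        return frame, None              # plain port: forward unchanged
    info = tunnels[ident]               # cached tunnel information
    return gre_encapsulate(frame), info
```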
[0120] Note that if the system identifies that a packet needs to
be encrypted and encapsulated, the system can perform the
encapsulation prior to the encryption, thereby avoiding the need
for the packet to be returned to the datapath processors after
encryption. This simplified packet flow within the system, e.g.,
from the FP processors to security engine directly to network
interface without the packet being returned to the FP processors by
the security engine, allows for dramatic performance enhancement in
a high performance controlling and switching system.
[0121] After session entries are cached in a tunnel, the system can
combine the tunnel encapsulation operations with the L2/L3 lookups
(such as, firewall session lookup 320, route lookup 325, forwarding
lookup 330, etc.), and thereby avoid feeding the network forwarding
process twice with the same packet (but differently formatted as
IEEE 802.3 for the first time and IEEE 802.11 for the second time).
In one embodiment, a link from the tunnel to the session (e.g.,
index value of the corresponding session entry in the session
table) is maintained where the session further includes routing
information as described above. The link provides quick access
to important routing information stored in the session, thereby
allowing for determination of whether the system can leverage the
session information for simplified session-based forwarding (e.g.,
where no complex firewall operations are required for the
session).
[0122] Furthermore, for subsequent packets within the same session,
the system can use the cached link to the session entry to retrieve
the routing information, thus avoiding feeding the packets
through the session forwarding pipeline process. In summary, rather
than feeding every packet through the session forwarding pipeline
twice (first with IEEE 802.3 format and second with IEEE 802.11
format), the present disclosure allows for the first packet in a
flow to be sent through the session forwarding pipeline once
whereby the encryption and encapsulation operations are combined
into the pipeline process, and for any subsequent packets to bypass
the session forwarding pipeline by providing a direct link from
tunnel to the corresponding session entry, which caches the
corresponding routing information returned from the route lookup
performed for the first packet in the flow.
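The link from the tunnel to the session entry can be sketched as
follows. The class and field names are assumptions introduced for
illustration, and the full pipeline is reduced to a stub that
returns fixed route and neighbor indices.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionEntry:
    route_index: int
    neighbor_index: int
    needs_firewall: bool = False


@dataclass
class TunnelEntry:
    tunnel_id: int
    session_index: Optional[int] = None  # cached link into the session table


class Forwarder:
    def __init__(self):
        self.session_table = []
        self.pipeline_runs = 0  # how many packets took the full pipeline

    def _first_packet(self, tunnel):
        # Stand-in for firewall session lookup, route lookup and
        # forwarding lookup; the result is cached in a new session entry.
        self.pipeline_runs += 1
        self.session_table.append(SessionEntry(route_index=3, neighbor_index=5))
        tunnel.session_index = len(self.session_table) - 1
        return self.session_table[tunnel.session_index]

    def forward(self, tunnel):
        if tunnel.session_index is None:
            return self._first_packet(tunnel)
        session = self.session_table[tunnel.session_index]
        if session.needs_firewall:
            self.pipeline_runs += 1  # complex firewall handling: no bypass
        return session               # otherwise reuse cached routing information
```

Only the first packet pays for the pipeline; later packets reach the
cached routing information through a single index dereference.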
Processes for Session-Based Forwarding
[0123] FIGS. 10A-10B are flowcharts illustrating exemplary
processes for session-based forwarding. Specifically, FIG. 10A
illustrates an exemplary session-based forwarding process in which
route references are cached in the session to avoid per-packet
based route lookup. During operation, the disclosed system receives
a first data packet in a session (operation 1000). The system then
performs a route lookup to determine a route for the first packet
(operation 1005). In addition, for session-based forwarding, the
disclosed system caches a reference to the route and the neighbor
in the session (operation 1010). Further, the disclosed system
optionally caches a reference to the session and the neighbor in a
tunnel in cases where tunnel-based forwarding mechanism is used
(operation 1015). The reference to the route includes one or more
of a route index, a route version number, a neighbor index, and a
neighbor version number. In some embodiments, the tunnel can be a
GRE tunnel within which packets in the session are to be forwarded.
Note that caching the reference to the session in the tunnel
allows for direct access to the route information from the
tunnel.
[0124] Moreover, the disclosed system compares a first route
version number cached in the session with a second route version
number cached in a route (operation 1020), and then determines
whether the route is stale (operation 1025). In some embodiments,
the system further compares a first neighbor index and version
number cached in the session with a second neighbor index and
version number corresponding to the route in a neighbor table, and
determines that the route is stale if the first neighbor index or
version number is different from the second neighbor index or
version number.
[0125] If the system determines that the route is stale, the system
will perform another route lookup to update the route (operation
1030). Specifically, the system may update the route with one or
more of an updated route index, an updated route version number, an
updated neighbor index, and an updated neighbor version number.
Nevertheless, in some embodiments, if the system determines that
the route is stale but the session is inactive, the system will
delay route lookup until at least one packet is received in the
session.
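The staleness check and refresh of operations 1020 through 1030 can
be sketched as follows. The table layouts and the names RouteRef,
is_stale and refresh are assumptions made for this illustration; the
route lookup itself is passed in as a function.

```python
from dataclasses import dataclass


@dataclass
class RouteRef:
    route_index: int
    route_version: int
    neighbor_index: int
    neighbor_version: int


def is_stale(ref, route_table, neighbor_table):
    """route_table[i] -> (version, neighbor_index); neighbor_table[j] -> version."""
    route_version, neighbor_index = route_table[ref.route_index]
    if ref.route_version != route_version:
        return True
    if ref.neighbor_index != neighbor_index:
        return True
    return ref.neighbor_version != neighbor_table[neighbor_index]


def refresh(ref, dst, lookup, route_table, neighbor_table):
    """Repeats the route lookup only if the cached reference is stale."""
    if not is_stale(ref, route_table, neighbor_table):
        return ref
    route_index = lookup(dst)
    route_version, neighbor_index = route_table[route_index]
    return RouteRef(route_index, route_version,
                    neighbor_index, neighbor_table[neighbor_index])
```

Because refresh is invoked from the packet path, an inactive session
never triggers it, which matches the deferred lookup described above.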
[0126] In some embodiments, at least two paths with identical cost
corresponding to the route are stored in the route table; and, each
path is identified by a unique Equal Cost Multiple Path (ECMP)
index. When a new ECMP index is added to the route table, a
subsequent session uses the path associated with the new ECMP
index, but an existing session continues to use an existing path
associated with an existing ECMP index.
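The pinning of existing sessions to their ECMP index can be sketched
as follows. This is a simplified illustration: EcmpRoute and its
methods are hypothetical names, and the policy of assigning a new
session to the most recently added index is an assumption drawn from
the sentence above rather than a general ECMP selection scheme.

```python
class EcmpRoute:
    def __init__(self, paths):
        self.paths = list(paths)  # position in the list is the ECMP index
        self.pinned = {}          # session key -> ECMP index

    def add_path(self, next_hop):
        """Adds an equal-cost path and returns its new ECMP index."""
        self.paths.append(next_hop)
        return len(self.paths) - 1

    def path_for(self, session_key):
        if session_key not in self.pinned:
            # Assumption: a new session takes the most recently added index.
            self.pinned[session_key] = len(self.paths) - 1
        return self.paths[self.pinned[session_key]]
```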
[0127] In some embodiments where at least two next hop nodes use
Virtual Router Redundancy Protocol (VRRP), the route is determined to be
stale based on the difference between a first neighbor version
number cached in the session and a second neighbor version number
corresponding to the route in the neighbor table.
[0128] Next, when tunnel-based forwarding mechanism is used, the
system can use the cached reference to the session in the tunnel
for forwarding subsequent packets in the session (operation 1035).
Thus, the system only needs to perform a route lookup for the first
packet in a session unless there are route changes during the
session that prompt another route lookup to update the
route.
[0129] In other embodiments, when a route is determined to be
stale, the system performs another route lookup to update the
session with an updated route index and an updated route version
number. Such updated route index and updated route version number
may correspond to a shorter alternative route than the original
route. If so, the system will forward subsequent packets in the
session using the shorter alternative route. In one embodiment, the
shorter alternative route is stored in a patricia trie as a child
node of a parent node. Specifically, the parent node corresponds to
the route; and, a route version number corresponding to the parent
node is increased when a child node is inserted in the patricia
trie.
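The effect of inserting a child route can be sketched as follows.
For brevity this sketch does not build a patricia trie: it keeps a
flat dictionary of IPv4 prefixes and finds covering prefixes by
scanning, and it bumps the version of every covering prefix rather
than only the immediate parent. Both simplifications, and the
RouteTable name, are assumptions of this illustration.

```python
import ipaddress


class RouteTable:
    def __init__(self):
        self.versions = {}  # ip_network -> route version number

    def insert(self, prefix):
        net = ipaddress.ip_network(prefix)
        # Bump each existing covering prefix so that sessions caching it
        # see a version mismatch on their next packet.
        for parent in self.versions:
            if net != parent and net.subnet_of(parent):
                self.versions[parent] += 1
        self.versions.setdefault(net, 1)

    def lookup(self, addr):
        """Longest-prefix match; returns (prefix, version)."""
        ip = ipaddress.ip_address(addr)
        matches = [n for n in self.versions if ip in n]
        best = max(matches, key=lambda n: n.prefixlen)
        return best, self.versions[best]
```

A session that cached the parent prefix with version 1 detects the
mismatch after the insert, repeats the lookup, and lands on the more
specific child prefix.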
[0130] FIG. 10B illustrates an exemplary session-based forwarding
process in which a packet is encapsulated prior to being forwarded
to a security engine such that the security engine can forward
the encrypted packet directly to the corresponding network interface.
During operation, the disclosed system performs a bridge lookup
based on a received packet (operation 1040). Then, the system
encapsulates the packet based on information returned from the
bridge lookup (operation 1045). Also, the system identifies a
network interface that the packet is to be transmitted on
(operation 1050). Then, the system sends the packet to a security
engine for encryption (operation 1055). Furthermore, the system
instructs the security engine to forward the encrypted packet to
the identified network interface (operation 1060). Thus, unlike
a conventional forwarding process, the security engine does not need
to return the packet to a process within the system and can
directly forward the encrypted packets to a network via the
identified network interface.
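The flow of FIG. 10B can be sketched as follows. The SecurityEngine
class, the callback-style bridge lookup and the XOR placeholder are
all assumptions made for illustration; in particular, the XOR step
only marks where encryption happens and is not real encryption.

```python
class SecurityEngine:
    def __init__(self, interfaces):
        self.interfaces = interfaces  # interface id -> list used as a TX queue

    def encrypt_and_forward(self, packet, key, out_if):
        # Toy XOR placeholder standing in for the actual cipher.
        encrypted = bytes(b ^ key for b in packet)
        # Direct handoff to the interface: nothing is returned to the caller.
        self.interfaces[out_if].append(encrypted)


def forward(packet, bridge_lookup, engine, key):
    tunnel_header, out_if = bridge_lookup(packet)           # operations 1040, 1050
    encapsulated = tunnel_header + packet                   # operation 1045
    engine.encrypt_and_forward(encapsulated, key, out_if)   # operations 1055, 1060
```

Because the output interface is chosen before the packet is handed
to the security engine, the forwarding code touches each packet only
once.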
System for Session-Based Forwarding
[0131] FIG. 11 is a block diagram illustrating a network device
system for session-based forwarding according to embodiments of the
present disclosure. Network device 1100 includes at least a network
interface 1110 capable of communicating to a wired network, a
shared memory 1120 capable of storing data, an exception processing
processor core 1130 capable of processing network data packets, and
one or more forwarding processor cores, including forwarding
processor core 1142, forwarding processor core 1144, . . . ,
forwarding processor core 1148, which are capable of processing
network data packets. Moreover, network device 1100 may be used as
a network switch, network router, network controller, network
server, etc. Further, network device 1100 may serve as a node in a
distributed or a cloud computing environment.
[0132] Network interface 1110 can be any communication interface,
which includes but is not limited to, a modem, token ring
interface, Ethernet interface, wireless IEEE 802.11 interface
(e.g., IEEE 802.11n, IEEE 802.11ac, etc.), cellular wireless
interface, satellite transmission interface, or any other interface
for coupling network devices. In some embodiments, network
interface 1110 may be software-defined and programmable, for
example, via an Application Programming Interface (API), thus
allowing for remote control of the network device 1100.
[0133] Shared memory 1120 can include storage components, such as,
Dynamic Random Access Memory (DRAM), Static Random Access Memory
(SRAM), etc. In some embodiments, shared memory 1120 is a flat
structure that is shared by all datapath processors (including,
e.g., exception processing processor core 1130, forwarding
processor core 1142, forwarding processor core 1144, . . . ,
forwarding processor core 1148, etc.), and not tied to any
particular CPU or CPU cores. Any datapath processor can read any
memory location within shared memory 1120. Shared memory 1120 can
be used to store various tables to assist session-based packet
forwarding. For example, the tables may include, but are not
limited to, a bridge table, a session table, a user table, a
station table, a tunnel table, a route table and/or route cache,
etc. It is important to note that there is no locking mechanism
associated with shared memory 1120. Any datapath processor can have
access to any location in lockless shared memory in network device
1100.
[0134] Exception processing processor core 1130 typically includes
a networking processor core that is capable of processing network
data traffic. Exception processing processor core 1130 is a single
dedicated CPU core that typically handles table management. Note
that exception processing processor core 1130 only receives data
packets from
one or more forwarding processor cores, such as forwarding
processor core 1142, forwarding processor core 1144, . . . ,
forwarding processor core 1148. In other words, exception
processing processor core 1130 does not receive data packets
directly from any line cards or network interfaces. Only the
plurality of forwarding processor cores can send data packets to
exception processing processor core 1130. Moreover, exception
processing processor core 1130 is the only processor core having
the write access to shared memory 1120, and thereby will not cause
any data integrity issues even without a locking mechanism in place
for shared memory 1120.
[0135] Forwarding processor cores 1142-1148 also include networking
processor cores that are capable of processing network data
traffic. However, by definition, forwarding processor cores
1142-1148 only perform "fast" packet processing. Thus, forwarding
processor cores 1142-1148 do not block themselves and wait for
other components or modules during the processing of network
packets. Any packets requiring special handling or waiting by a
processor core will be handed over by forwarding processor cores
1142-1148 to exception processing processor core 1130.
[0136] Each of forwarding processor cores 1142-1148 maintains one
or more counters. The counters are defined as a regular data type,
for example, unsigned integer, unsigned long long, etc., in lieu of
an atomic data type. When a forwarding processor core 1142-1148
receives a packet, it may increment or decrement the values of the
counters to reflect network traffic information, including but not
limited to, the number of received frames, the number of received
bytes, error conditions and/or error counts, etc. A typical
pipeline process at forwarding processor cores 1142-1148 includes
one or more of: port lookup; VLAN lookup; port-VLAN table lookup;
bridge table lookup; firewall session table lookup; route table
lookup; packet encapsulation; packet encryption; packet decryption;
tunnel de-capsulation; forwarding; etc.
[0137] Moreover, forwarding processor cores 1142-1148 each can
maintain a fragment table. Upon receiving a data fragment without
information necessary for session processing (e.g., a transport
layer or L4 header), forwarding processor cores 1142-1148 will
queue the data fragments in their own fragment table, and perform
various fragment table management tasks.
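A per-core fragment table of this kind can be sketched as follows.
The sketch assumes IP fragmentation, where only the fragment at
offset zero carries the L4 header, and keys the table on (source,
destination, IP identification). The FragmentTable name and its
interface are hypothetical, and timeouts and cleanup of the table
are omitted.

```python
from collections import defaultdict


class FragmentTable:
    def __init__(self):
        self.pending = defaultdict(list)  # (src, dst, ip_id) -> queued payloads
        self.l4_known = {}                # (src, dst, ip_id) -> L4 ports

    def on_fragment(self, src, dst, ip_id, offset, payload, l4_ports=None):
        """Returns the fragments that can now proceed to session processing."""
        key = (src, dst, ip_id)
        if offset == 0:
            # First fragment: the L4 header is available, so release the queue.
            self.l4_known[key] = l4_ports
            return [payload] + self.pending.pop(key, [])
        if key in self.l4_known:
            return [payload]              # session information already known
        self.pending[key].append(payload) # hold until the first fragment arrives
        return []
```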
[0138] Periodically, exception processing processor core 1130 may
receive a query corresponding to one or more forwarding processor
cores 1142-1148 from a control plane process. Exception processing
processor core 1130 identifies one or more memory locations in the
shared memory storing data for the one or more forwarding processor
cores 1142-1148 corresponding to the query, retrieves one or more
data values at the identified memory locations, and responds to the
query. In some embodiments, exception processing processor core
1130 can further aggregate retrieved data values to generate an
aggregated data value, and respond to the query based on the
aggregated data value.
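The per-core counters and the aggregated query response can be
sketched as follows. The dictionary standing in for the shared
memory, the counter names and the handle_query function are
assumptions of this illustration. The point shown is that each core
updates only its own slot, so plain integers suffice, and the query
handler only reads.

```python
class ForwardingCore:
    def __init__(self, core_id, shared):
        self.core_id = core_id
        # Each core writes only its own slot, so no atomic type is needed.
        self.counters = shared.setdefault(
            core_id, {"rx_frames": 0, "rx_bytes": 0})

    def on_packet(self, packet):
        self.counters["rx_frames"] += 1
        self.counters["rx_bytes"] += len(packet)


def handle_query(shared, core_ids, name):
    """Reads the named counter for each requested core and sums the values."""
    return sum(shared[c][name] for c in core_ids)
```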
[0139] According to embodiments of the present disclosure, network
services provided by network device 1100, solely or in combination
with other wireless network devices, include, but are not limited
to, an Institute of Electrical and Electronics Engineers (IEEE)
802.1x authentication to an internal and/or external Remote
Authentication Dial-In User Service (RADIUS) server; a MAC
authentication to an internal and/or external RADIUS server; a
built-in Dynamic Host Configuration Protocol (DHCP) service to
assign wireless client devices IP addresses; an internal secured
management interface; Layer-3 forwarding; Network Address
Translation (NAT) service between the wireless network and a wired
network coupled to the network device; an internal and/or external
captive portal; an external management system for managing the
network devices in the wireless network; etc.
[0140] The present disclosure may be realized in hardware,
software, or a combination of hardware and software. The present
disclosure may be realized in a centralized fashion in one computer
system or in a distributed fashion where different elements are
spread across several interconnected computer systems coupled to a
network. A typical combination of hardware and software may be an
access point with a computer program that, when being loaded and
executed, controls the device such that it carries out the methods
described herein.
[0141] The present disclosure also may be embedded in
non-transitory fashion in a computer-readable storage medium (e.g.,
a programmable circuit; a semiconductor memory such as a volatile
memory such as random access memory "RAM," or non-volatile memory
such as read-only memory, power-backed RAM, flash memory,
phase-change memory or the like; a hard disk drive; an optical disc
drive; or any connector for receiving a portable memory device such
as a Universal Serial Bus "USB" flash drive), which comprises all
the features enabling the implementation of the methods described
herein, and which when loaded in a computer system is able to carry
out these methods. Computer program in the present context means
any expression, in any language, code or notation, of a set of
instructions intended to cause a system having an information
processing capability to perform a particular function either
directly or after either or both of the following: a) conversion to
another language, code or notation; b) reproduction in a different
material form.
[0142] As used herein, "digital device" generally includes a device
that is adapted to transmit and/or receive signaling and to process
information within such signaling such as a station (e.g., any data
processing equipment such as a computer, cellular phone, personal
digital assistant, tablet devices, etc.), an access point, data
transfer devices (such as network switches, routers, controllers,
etc.) or the like.
[0143] As used herein, "access point" (AP) generally refers to
receiving points for any known or convenient wireless access
technology which may later become known. Specifically, the term AP
is not intended to be limited to IEEE 802.11-based APs. APs
generally function as an electronic device that is adapted to allow
wireless devices to connect to a wired network via various
communications standards.
[0144] As used herein, the term "interconnect" or used
descriptively as "interconnected" is generally defined as a
communication pathway established over an information-carrying
medium. The "interconnect" may be a wired interconnect, wherein the
medium is a physical medium (e.g., electrical wire, optical fiber,
cable, bus traces, etc.), a wireless interconnect (e.g., air in
combination with wireless signaling technology) or a combination of
these technologies.
[0145] As used herein, "information" is generally defined as data,
address, control, management (e.g., statistics) or any combination
thereof. For transmission, information may be transmitted as a
message, namely a collection of bits in a predetermined format. One
type of message, namely a wireless message, includes a header and
payload data having a predetermined number of bits of information.
The wireless message may be placed in a format as one or more
packets, frames or cells.
[0146] As used herein, "wireless local area network" (WLAN)
generally refers to a communications network that links two or more
devices using some wireless distribution method (for example,
spread-spectrum or orthogonal frequency-division multiplexing
radio), and usually providing a connection through an access point
to the Internet; and thus, providing users with the mobility to
move around within a local coverage area and still stay connected
to the network.
[0147] As used herein, the term "mechanism" generally refers to a
component of a system or device to serve one or more functions,
including but not limited to, software components, electronic
components, electrical components, mechanical components,
electro-mechanical components, etc.
[0148] As used herein, the term "embodiment" generally refers to an
embodiment that serves to illustrate by way of example but not
limitation.
[0149] It will be appreciated by those skilled in the art that the
preceding examples and embodiments are exemplary and not limiting
to the scope of the present disclosure. It is intended that all
permutations, enhancements, equivalents, and improvements thereto
that are apparent to those skilled in the art upon a reading of the
specification and a study of the drawings are included within the
true spirit and scope of the present disclosure. It is therefore
intended that the following appended claims include all such
modifications, permutations and equivalents as fall within the true
spirit and scope of the present disclosure.
[0150] While the present disclosure has been described in terms of
various embodiments, the present disclosure should not be limited
to only those embodiments described, but can be practiced with
modification and alteration within the spirit and scope of the
appended claims. Likewise, where a reference to a standard is made
in the present disclosure, the reference is generally made to the
current version of the standard as applicable to the disclosed
technology area. However, the described embodiments may be
practiced under subsequent development of the standard within the
spirit and scope of the description and appended claims. The
description is thus to be regarded as illustrative rather than
limiting.
* * * * *
References