U.S. patent application number 14/324682 was filed with the patent office on 2014-07-07 and published on 2014-10-30 for intelligent graph walking.
The applicant listed for this patent is Cavium, Inc. Invention is credited to Imran Badr, Rajan Goyal, Muhammad Raghib Hussain.
Application Number | 20140324900 14/324682 |
Document ID | / |
Family ID | 40589300 |
Filed Date | 2014-07-07 |
United States Patent Application | 20140324900 |
Kind Code | A1 |
Hussain; Muhammad Raghib; et al. | October 30, 2014 |
Intelligent Graph Walking
Abstract
An apparatus, and corresponding method, for performing a search
for a match of at least one expression in an input stream is
presented. A graph including a number of interconnected nodes is
generated. A compiler may assign at least one starting node and at
least one ending node. The starting node includes a location table
with node position information of an ending node and a sub-string
value associated with the ending node. Using the node position
information and a string comparison function, intermediate nodes
located between the starting and ending nodes may be bypassed. The
node bypassing may reduce the number of memory accesses required to
read the graph.
Inventors: |
Hussain; Muhammad Raghib (Saratoga, CA); Goyal; Rajan (Saratoga, CA);
Badr; Imran (Fremont, CA) |
Applicant: |
Name | City | State | Country | Type |
Cavium, Inc. | San Jose | CA | US | |
Family ID: |
40589300 |
Appl. No.: |
14/324682 |
Filed: |
July 7, 2014 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
11982433 | Nov 1, 2007 | 8819217 |
14324682 | | |
Current U.S. Class: | 707/758 |
Current CPC Class: | H04L 63/1441 20130101; G06F 16/9024 20190101 |
Class at Publication: | 707/758 |
International Class: | G06F 17/30 20060101 G06F017/30 |
Claims
1. A method comprising: generating a graph including a plurality of
interconnected nodes in a device operatively coupled to a network,
the plurality of interconnected nodes including at least one
starting node and a plurality of ending nodes, the at least one
starting node associated with a comparison command and a location
table including multiple entries, each entry of the multiple
entries including node position information of a respective ending
node of the plurality of ending nodes and a location table string
value of a sub-string between the at least one starting node and
the respective ending node; and employing the comparison command to
compare at least one location table string value of the multiple
entries with an input sub-string value from an input stream to
detect a common sub-string of at least one expression matching in
the input stream, the input stream received from the network via a
hardware interface of the device.
2. The method of claim 1, wherein employing is based on positively
matching a given segment from the input stream at the at least one
starting node and identifying the at least one starting node as a
given node of the plurality of interconnected nodes associated with
a corresponding location table.
3. The method of claim 1, further comprising determining a first
length of the input sub-string value based on a second length of a
given location table string value of the at least one location
table string value of the multiple entries.
4. The method of claim 3, further comprising selecting a given
entry of the multiple entries having a longest length location
table string value of the at least one location table string value
in an event multiple location table string values of the at least
one location table string value are identified as matching multiple
input sub-string values.
5. The method of claim 1, further comprising recognizing the at
least one starting node as a starting node type of a plurality of
node types and employing the comparison command based on the
recognition.
6. The method of claim 1, wherein employing the comparison command
is based on determining that the at least one starting node is
associated with a forward arc that is associated with a given
segment from the input stream based on traversing the at least one
starting node with the given segment.
7. The method of claim 6, wherein the segment is a beginning
character of the at least one location table string value.
8. The method of claim 1, further comprising traversing a given end
node of the plurality of end nodes based on node position
information of a given entry of the multiple entries, the given
entry identified based on matching a given location table string
value of the multiple entries with the input sub-string value.
9. The method of claim 1, wherein to detect the common sub-string
includes comparing location table string values of the multiple
entries in any order.
10. The method of claim 1, wherein to detect the common sub-string
includes comparing each location table string value of the multiple
entries with different input sub-string values, concurrently.
11. The method of claim 1, wherein the device is a security
appliance.
12. An apparatus comprising: a compiler configured to generate a
graph including a plurality of interconnected nodes; and a memory
configured to store the generated graph, the plurality of
interconnected nodes including at least one starting node and a
plurality of ending nodes, the at least one starting node
associated with a comparison command and a location table including
multiple entries, each entry of the multiple entries including node
position information of a respective ending node of the plurality
of ending nodes and a location table string value of a sub-string
between the at least one starting node and the respective ending
node, the comparison command enabling a comparison of at least one
location table string value of the multiple entries with an input
sub-string value from an input stream received via a hardware
interface of the apparatus to detect a common sub-string of at
least one expression matching in the input stream.
13. An apparatus comprising: a hardware interface configured to
receive an input stream; and a walker configured to: traverse a
graph including a plurality of interconnected nodes, the plurality
of interconnected nodes including at least one starting node and a
plurality of ending nodes, the at least one starting node
associated with a comparison command and a location table including
multiple entries, each entry of the multiple entries including node
position information of a respective ending node of the plurality
of ending nodes and a location table string value of a sub-string
between the at least one starting node and the respective ending
node; and employ the comparison command to compare at least one
location table string value of the multiple entries with an input
sub-string value from an input stream to detect a common sub-string
of at least one expression matching in the input stream, the input
stream received from the network via a hardware interface of the
device.
14. The apparatus of claim 13, wherein the comparison command is
employed based on positively matching a given segment from the
input stream at the at least one starting node and identifying the
at least one starting node as a given node of the plurality of
interconnected nodes associated with a corresponding location
table.
15. The apparatus of claim 13, wherein the walker is further
configured to determine a first length of the input sub-string
value based on a second length of a given location table string
value of the at least one location table string value of the
multiple entries.
16. The apparatus of claim 15, wherein the walker is further
configured to select a given entry of the multiple entries having a
longest length location table string value of the at least one
location table string value in an event multiple location table
string values of the at least one location table string value are
identified as matching multiple input sub-string values.
17. The apparatus of claim 13, wherein the walker is further
configured to recognize the at least one starting node as a
starting node type of a plurality of node types and employing the
comparison command based on the recognition.
18. The apparatus of claim 13, wherein the walker is further
configured to employ the comparison command based on determining
that the at least one starting node is associated with a forward
arc that is associated with a given segment from the input stream
based on traversing the at least one starting node with the given
segment.
19. The apparatus of claim 18, wherein the segment is a beginning
character of the at least one location table string value.
20. The apparatus of claim 13, wherein the walker is further
configured to traverse a given end node of the plurality of end
nodes based on node position information of a given entry of the
multiple entries, the given entry identified based on matching a
given location table string value of the multiple entries with the
input sub-string value.
21. The apparatus of claim 13, wherein to detect the common
sub-string the walker is further configured to compare location
table string values of the multiple entries in any order.
22. The apparatus of claim 13, wherein to detect the common
sub-string the walker is further configured to compare each
location table string value of the multiple entries with different
input sub-string values, concurrently.
Description
RELATED APPLICATION
[0001] This application is a continuation of U.S. application Ser.
No. 11/982,433, filed Nov. 1, 2007. The entire teachings of the
above application are incorporated herein by reference.
BACKGROUND
[0002] The Open Systems Interconnection (OSI) Reference Model
defines seven network protocol layers (L1-L7) used to communicate
over a transmission medium. The upper layers (L4-L7) represent
end-to-end communications and the lower layers (L1-L3) represent
local communications.
[0003] Networking application aware systems need to process, filter
and switch a range of L3 to L7 network protocol layers, for
example, L7 network protocol layers such as, HyperText Transfer
Protocol (HTTP) and Simple Mail Transfer Protocol (SMTP), and L4
network protocol layers such as Transmission Control Protocol
(TCP). In addition to processing the network protocol layers, the
networking application aware systems need to simultaneously secure
these protocols with access and content based security through
L4-L7 network protocol layers including Firewall, Virtual Private
Network (VPN), Secure Sockets Layer (SSL), Intrusion Detection
System (IDS), Internet Protocol Security (IPSec), Anti-Virus (AV)
and Anti-Spam functionality at wire-speed.
[0004] Network processors are available for high-throughput L2 and
L3 network protocol processing, that is, performing packet
processing to forward packets at wire-speed. Typically, a general
purpose processor is used to process L4-L7 network protocols that
require more intelligent processing. Although a general purpose
processor can perform the compute intensive tasks, it does not
provide sufficient performance to process the data so that it can
be forwarded at wire-speed.
[0005] Content aware networking requires inspection of the contents
of packets at "wire speed." The content may be analyzed to
determine whether there has been a security breach or an intrusion.
A large number of patterns and rules in the form of regular
expressions are applied to ensure that all security breaches or
intrusions are detected. A regular expression is a compact method
for describing a pattern in a string of characters. The simplest
pattern matched by a regular expression is a single character or
string of characters, for example, /c/ or /cat/. The regular
expression also includes operators and meta-characters that have a
special meaning.
[0006] Through the use of meta-characters, the regular expression
can be used for more complicated searches such as, "abc*xyz". That
is, find the string "abc", followed by the string "xyz", with an
unlimited number of characters in-between "abc" and "xyz". Another
example is the regular expression "abc??abc*xyz;" that is, find the
string "abc," followed two characters later by the string "abc" and
an unlimited number of characters later by the string "xyz."
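The two searches above can be tried with Python's `re` module. This is only a sketch: it assumes that the `*` and `?` in the examples stand for "any run of characters" and "any single character", which standard regular expression syntax writes as `.*` and `.`. That mapping, and the names `GAP`, `TWO_THEN_GAP` and `found`, are illustrative assumptions and do not come from the application.

```python
import re

# Assumed translation of the notation used above into standard regex syntax:
#   "abc*xyz"      -> abc.*xyz       (any number of characters in between)
#   "abc??abc*xyz" -> abc..abc.*xyz  (exactly two characters, then any number)
# re.DOTALL lets "." also match newline bytes, as a packet payload might contain.
GAP = re.compile(r"abc.*xyz", re.DOTALL)
TWO_THEN_GAP = re.compile(r"abc..abc.*xyz", re.DOTALL)


def found(pattern, text):
    """Return True if the compiled pattern occurs anywhere in text."""
    return pattern.search(text) is not None
```

For instance, `found(GAP, "..abc123xyz..")` is true, while `found(TWO_THEN_GAP, "abc-abc....xyz")` is false because only one character separates the two "abc" strings.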
[0007] An Intrusion Detection System (IDS) application inspects the
contents of all individual packets flowing through a network, and
identifies suspicious patterns that may indicate an attempt to
break into or compromise a system. One example of a suspicious
pattern may be a particular text string in a packet followed 100
characters later by another particular text string.
[0008] Content searching is typically performed using a search
algorithm such as, Deterministic Finite Automata (DFA) to process
the regular expression. The DFA processes an input stream of
characters sequentially using a DFA graph and makes a state
transition based on the current character and state.
SUMMARY
[0009] A processor, and corresponding method, to search for a match
of at least one expression or sub-expression in an input stream is
presented. The processor, and corresponding method, may greatly
reduce the number of memory accesses required to search a graph by
employing a bypassing technique. The processor may comprise a
compiler configured to generate a graph of expressions including a
plurality of interconnected nodes, at least one ending node, and at
least one starting node, the starting node further including a
comparison command and a location table, the location table
including node position information of the at least one ending node
and a value of a sub-string between the at least one starting node
and the at least one ending node. The processor may also include a
memory unit configured to store the graph and a walker process
configured to consecutively traverse the nodes of the graph to
search for the match of the at least one expression in the input
stream. Upon reaching the at least one starting node, the walker
process may be configured to detect a common sub-string in the at
least one expression and the sub-string value in the location
table, using the comparison command. Upon detection of the common
sub-string, the walker process may be configured to bypass at least
two consecutively interconnected nodes to reach the at least one
ending node.
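The bypassing idea in paragraph [0009] can be sketched in a few lines of Python. Everything named here (`Node`, `walk`, `chain`, the use of a dictionary for arcs, and counting one "read" per visited node) is a hypothetical simplification for illustration, not the data layout or walker of the application. The sketch only matches an expression anchored at the start of the input.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    arcs: dict = field(default_factory=dict)            # character -> next node id
    location_table: list = field(default_factory=list)  # (sub-string, ending node id)
    mark: bool = False                                   # True once an expression has matched


def walk(graph, text, root=0):
    """Walk text through graph; return (matched, number_of_node_reads)."""
    node_id, pos, reads = root, 0, 0
    while True:
        node = graph[node_id]
        reads += 1  # modelled as one memory access per node visited
        if node.mark:
            return True, reads
        if pos >= len(text):
            return False, reads
        # Comparison command: try the location table entries, longest first.
        for sub, end_id in sorted(node.location_table, key=lambda e: -len(e[0])):
            if text.startswith(sub, pos):
                node_id, pos = end_id, pos + len(sub)  # bypass intermediate nodes
                break
        else:
            # No table entry matched: follow the ordinary single-character arc.
            nxt = node.arcs.get(text[pos])
            if nxt is None:
                return False, reads
            node_id, pos = nxt, pos + 1


def chain(word, bypass=False):
    """Build a straight-line graph for word, optionally with a location table."""
    graph = {i: Node(arcs={ch: i + 1}) for i, ch in enumerate(word)}
    graph[len(word)] = Node(mark=True)
    if bypass:
        graph[0].location_table.append((word, len(word)))
    return graph
```

With this model, walking "abcdef" through `chain("abcdef")` visits seven nodes, while the same walk through `chain("abcdef", bypass=True)` visits only the starting node and the ending node.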
[0010] The starting node may be a root node and/or an
interconnecting node, the interconnecting node including at least
two interconnections to at least two other nodes. The ending node
may be a mark node, the mark node indicating a matched expression.
The ending node may also be another starting node.
[0011] The walker process may be configured to bypass at least two
consecutively interconnected nodes and retrieve node position
information of the at least two consecutively interconnected nodes
with a single memory access.
BRIEF DESCRIPTION OF THE DRAWINGS
[0012] The foregoing will be apparent from the following more
particular description of example embodiments of the invention, as
illustrated in the accompanying drawings in which like reference
characters refer to the same parts throughout the different views.
The drawings are not necessarily to scale, emphasis instead being
placed upon illustrating embodiments of the present invention.
[0013] FIGS. 1A and 1B are block diagrams of a security appliance
including a network services processor and a protocol processor,
respectively.
[0014] FIG. 2 is a block diagram of the network services processor
shown in FIG. 1.
[0015] FIG. 3 is a block diagram illustrating content search
elements used by the processor of FIGS. 1A and 1B.
[0016] FIG. 4 is a block diagram of an example data structure that
is used by the Content Search Mechanism (CSM) to traverse a
graph.
[0017] FIG. 5 is an example of a DFA graph.
[0018] FIG. 6 is an example of a DFA graph according to an example
embodiment.
DETAILED DESCRIPTION
[0019] FIG. 1A is a block diagram of an example security appliance
102 including a network services processor 100. The security
appliance 102 may be a standalone system that may switch packets
received at one Ethernet port (Gig E) to another Ethernet port (Gig
E) and perform a plurality of security functions on received
packets prior to forwarding the packets. For example, the security
appliance 102 may be used to perform security processing on packets
received on a Wide Area Network prior to forwarding the processed
packets to a Local Area Network.
[0020] The network services processor 100 processes Open System
Interconnection network L2-L7 layer protocols encapsulated in
received packets. As is well-known to those skilled in the art, the
Open System Interconnection (OSI) reference model defines seven
network protocol layers (L1-L7). The physical layer (L1) represents
the actual interface, electrical and physical that connects a
device to a transmission medium. The data link layer (L2) performs
data framing. The network layer (L3) formats the data into packets.
The transport layer (L4) handles end to end transport. The session
layer (L5) manages communications between devices, for example,
whether communication is half-duplex or full-duplex. The
presentation layer (L6) manages data formatting and presentation,
for example, syntax, control codes, special graphics and character
sets. The application layer (L7) permits communication between
users, for example, file transfer and electronic mail.
[0021] The network services processor 100 may schedule and queue
work (packet processing operations) for upper level network
protocols, for example L4-L7, and allow processing of upper level
network protocols in received packets to be performed to forward
packets at wire-speed. Wire-speed is the rate of data transfer of
the network over which data is transmitted and received. By
processing the protocols to forward the packets at wire-speed, the
network services processor does not slow down the network data
transfer rate.
[0022] The network services processor 100 may include a plurality
of Ethernet Media Access Control interfaces with standard Reduced
Gigabit Media Independent Interface (RGMII) connections to the
off-chip PHYs 104a, 104b.
[0023] The network services processor 100 may also receive packets
from the Ethernet ports (Gig E) through the physical interfaces PHY
104a, 104b, perform L7-L2 network protocol processing on the
received packets, and forward processed packets through the
physical interfaces 104a, 104b to another hop in the network or the
final destination or through the PCI bus 106 for further processing
by a host processor. The network protocol processing may include
processing of network security protocols such as Firewall,
Application Firewall, Virtual Private Network (VPN) including IP
Security (IPSec) and/or Secure Sockets Layer (SSL), Intrusion
Detection System (IDS) and Anti-virus (AV).
[0024] The network services processor 100 may also include a low
latency memory controller for controlling low latency Dynamic
Random Access Memory (DRAM) 118.
[0025] The low latency DRAM 118 may be used for Internet Services
and Security applications allowing fast lookups, including the
string-matching that may be required for Intrusion Detection System
(IDS) or Anti Virus (AV) applications and other applications that
require string matching.
[0026] The network services processor 100 may perform pattern
search, regular expression processing, content validation,
transformation and security functions to accelerate packet
processing according to an embodiment of the present invention. The
regular expression
processing and pattern search may be used to perform string
matching for AV and IDS applications and other applications that
require string matching.
[0027] A DRAM controller in the network services processor 100 may
control access to an external Dynamic Random Access Memory (DRAM)
108 that is coupled to the network services processor 100. The DRAM
108 may store data packets received from the PHY interfaces 104a,
104b or the Peripheral Component Interconnect Extended (PCI-X)
interface 106 for processing by the network services processor 100.
In one embodiment, the DRAM interface supports 64 or 128 bit Double
Data Rate II Synchronous Dynamic Random Access Memory (DDR II
SDRAM) operating up to 800 MHz. The DRAM may also store rules data
required for lookup and pattern matching in DFA graph expression
searches.
[0028] A boot bus 110 may provide the necessary boot code which may
be stored in flash memory 112 and may be executed by the network
services processor 100 when the network services processor 100 is
powered-on or reset. Application code may also be loaded into the
network services processor 100 over the boot bus 110, from a device
114 implementing the Compact Flash standard, or from another
high-volume device, which can be a disk, attached via the PCI
bus.
[0029] The miscellaneous I/O interface 116 offers auxiliary
interfaces such as General Purpose Input/Output (GPIO), Flash, IEEE
802 two-wire Management Data Input/Output Interface (MDIO),
Universal Asynchronous Receiver-Transmitters (UARTs) and serial
interfaces.
[0030] It should be appreciated that the example security appliance
102 may alternatively include a protocol processor 101 (FIG. 1B).
The protocol processor 101 may include the elements of the network
services processor 100 with the addition of a content processing
accelerator 107, connected to the processor 101 via the PCI/PCI-X
connection 106, and an external DRAM 111 connected to the
accelerator 107. The accelerator 107 and DRAM 111 may be employed
in content search applications, therefore making all content
searching operations external to the processor 101.
[0031] FIG. 2 is a block diagram of the network services processor
100, or the protocol processor 101 shown in FIGS. 1A and 1B,
respectively. The network services processor 100, and/or the
protocol processor 101, delivers high application performance using
a plurality of processors (cores) 202 located on a L1 network
protocol. Network applications may be categorized into data plane
and control plane operations. Each of the cores 202 may be
dedicated to performing data plane or control plane operations. A
data plane operation may include packet operations for forwarding
packets. A control plane operation may include processing of
portions of complex higher level protocols such as Internet
Protocol Security (IPSec), Transmission Control Protocol (TCP) and
Secure Sockets Layer (SSL). A data plane operation may include
processing of other portions of these complex higher level
protocols.
[0032] A packet may be received by any one of the interface units
210a, 210b through a SPI-4.2 or RGMII interface. A packet may also
be received by the PCI interface 224. The interface unit 210a, 210b
handles L2 network protocol pre-processing of the received packet
by checking various fields in the L2 network protocol header
included in the received packet. After the interface unit 210a,
210b has performed L2 network protocol processing, the packet is
forwarded to the packet input unit 214. The packet input unit 214
may perform pre-processing of L3 and L4 network protocol headers
included in the received packet. The pre-processing includes
checksum checks for Transmission Control Protocol (TCP)/User
Datagram Protocol (UDP) (L4 network protocols).
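The application does not spell out the checksum computation. TCP, UDP and the IPv4 header all use the 16-bit ones' complement "Internet checksum", so a minimal Python sketch of that calculation is given here for orientation. The function names are illustrative, and the sketch leaves out the pseudo-header that a real TCP or UDP check also covers.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def verify(data_with_checksum: bytes) -> bool:
    """A block that already contains its checksum sums to zero."""
    return internet_checksum(data_with_checksum) == 0
```

A receiver can therefore run the same routine over the received bytes, checksum field included, and accept the packet only when the result is zero.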
[0033] The packet input unit 214 may write packet data into buffers
in Level 2 cache 212 or DRAM 108 in a format that is convenient to
higher-layer software executed in at least one processor 202 for
further processing of higher level network protocols. The packet
input unit 214 may also support a programmable buffer size and can
distribute packet data across multiple buffers to support large
packet input sizes.
[0034] The Packet order/work (POW) module (unit) 228 may queue and
schedule work (packet processing operations) for the processor 202.
Work is defined to be any task to be performed by a processor that
is identified by an entry on a work queue. The task can include
packet processing operations, for example, packet processing
operations for L4-L7 layers to be performed on a received packet
identified by a work queue entry on a work queue. Each separate
packet processing operation is a piece of the work to be performed
by a processor on the received packet stored in memory (L2 cache
memory 212 or DRAM 108). For example, the work may be the
processing of a received Firewall/Virtual Private Network (VPN)
packet. The processing of a Firewall/VPN packet may include the
following separate packet processing operations (pieces of work):
(1) defragmentation to reorder fragments in the received packet;
(2) IPSec decryption; (3) IPSec encryption; and (4) Network Address
Translation (NAT) or TCP sequence number adjustment prior to
forwarding the packet.
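As a rough illustration of how work for one packet could be split into work queue entries, the following Python sketch queues the four Firewall/VPN operations listed above. The class `WorkQueue`, its methods and the step names are hypothetical; the actual POW module is hardware and is not described at this level in the text.

```python
from collections import deque

# The four separate operations listed above for a Firewall/VPN packet.
FIREWALL_VPN_STEPS = (
    "defragment",
    "ipsec_decrypt",
    "ipsec_encrypt",
    "nat_or_tcp_seq_adjust",
)


class WorkQueue:
    """First-in, first-out queue of (packet id, operation) work entries."""

    def __init__(self):
        self._entries = deque()

    def add_packet(self, packet_id, steps=FIREWALL_VPN_STEPS):
        # Each separate packet processing operation becomes one piece of work.
        for step in steps:
            self._entries.append((packet_id, step))

    def next_work(self):
        """Hand the oldest pending entry to a processor, or None if idle."""
        return self._entries.popleft() if self._entries else None
```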
[0035] The network services processor 100, and/or the protocol
processor 101, may also include a memory subsystem. The memory
subsystem may include level 1 data cache memory 204 in each
processor 202, instruction cache in each processor 202, level 2
cache memory 212, a DRAM controller 216 for external DRAM memory
and the interface 230 to external low latency memory 118. The
memory subsystem is architected for multi-processor support and
tuned to deliver both high-throughput and low-latency required by
memory intensive content networking applications. Level 2 cache
memory 212 and external DRAM memory 108 (FIG. 1) may be shared by
all of the processors 202 and I/O co-processor devices.
[0036] The network services processor 100, and/or the protocol
processor 101, may also include application specific co-processors
that offload the processors 202 so that the network services
processor achieves high-throughput. The application specific
co-processors include a DFA co-processor 244 that performs
Deterministic Finite Automata (DFA) searches and a
compression/decompression co-processor 208 that performs compression
and decompression.
[0037] Each processor 202 may be a dual-issue, superscalar
processor with instruction cache 206, Level 1 data cache 204,
built-in hardware acceleration (crypto acceleration module) 200 for
cryptography algorithms with direct access to low latency memory
over the low latency memory bus 230. The low-latency direct-access
path to low latency memory 118 bypasses the L2 cache memory 212 and
can be directly accessed from both the processors (cores) 202 and a
DFA co-processor 244. In one embodiment, the latency to access the
low-latency memory is less than 40 nanoseconds.
[0038] Prior to describing the operation of the content search
macros used for regular expression processing and pattern search in
further detail, the other modules in the network services processor
100 will be described. In an example, after the packet has been
processed by the processors 202, a packet output unit (PKO) 218
reads the packet data from L2 cache or DRAM, performs L4 network
protocol post-processing (e.g., generates a TCP/UDP checksum),
forwards the packet through the interface unit 210a, 210b and frees
the L2 cache 212 or DRAM 108 locations used to store the
packet.
[0039] Each processor 202 is coupled to the L2 cache by a coherent
memory bus 234. The coherent memory bus 234 is the communication
channel for all memory and I/O transactions between the processors
202, an I/O Bridge (IOB) 232 and the Level 2 cache and controller
212.
[0040] A Free Pool Allocator (FPA) 236 maintains pools of pointers
to free memory in level 2 cache memory 212 and DRAM 108. A
bandwidth efficient (Last In First Out (LIFO)) stack is implemented
for each free pointer pool. If a pool of pointers is too large to
fit in the Free Pool Allocator (FPA) 236, the Free Pool Allocator
(FPA) 236 builds a tree/list structure in level 2 cache 212 or DRAM
108 using freed memory in the pool of pointers to store additional
pointers.
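The behaviour of a free pointer pool with a bounded on-chip part and an overflow area can be modelled as follows. This Python sketch is an assumption-laden simplification: `FreePool`, `on_chip_capacity` and the plain list used for the spilled pointers are invented for illustration, whereas the text describes a tree/list structure built in freed memory.

```python
class FreePool:
    """LIFO pool of free-memory pointers with a bounded on-chip portion."""

    def __init__(self, pointers, on_chip_capacity):
        self.capacity = on_chip_capacity
        self.on_chip = []   # pointers held inside the allocator
        self.spilled = []   # stands in for the structure kept in L2 cache or DRAM
        for pointer in pointers:
            self.free(pointer)

    def free(self, pointer):
        """Return a pointer to the pool, spilling when the on-chip part is full."""
        if len(self.on_chip) < self.capacity:
            self.on_chip.append(pointer)
        else:
            self.spilled.append(pointer)

    def alloc(self):
        """Pop the most recently freed pointer, or None if the pool is empty."""
        # Spilled pointers were always freed after the on-chip ones, so
        # taking them first keeps the pool strictly last in, first out.
        if self.spilled:
            return self.spilled.pop()
        if self.on_chip:
            return self.on_chip.pop()
        return None
```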
[0041] The I/O Bridge (IOB) 232 manages the overall protocol and
arbitration and provides coherent I/O partitioning. The IOB 232
includes a bridge 238 and a Fetch and Add Unit (FAU) 240. The
bridge 238 includes buffer queues for storing information to be
transferred between the I/O bus, coherent memory bus, the packet
input unit 214 and the packet output unit 218.
[0042] The Fetch and Add Unit (FAU) 240 is a 2KB register file
supporting read, write, atomic fetch-and-add, and atomic update
operations. The Fetch and Add Unit (FAU) 240 can be accessed from
both the processors 202 and the packet output unit 218. The
registers store highly-used values and thus reduce traffic to
access these values. Registers in the FAU 240 are used to maintain
lengths of the output queues that are used for forwarding processed
packets through the packet output unit 218.
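An atomic fetch-and-add on a small register file can be imitated in software with a lock, as in the Python sketch below. The class name, the register count passed to the constructor and the use of `threading.Lock` are assumptions made for illustration; the text does not state the register width or how atomicity is achieved in hardware.

```python
import threading


class FetchAndAddUnit:
    """Software stand-in for a register file with atomic fetch-and-add."""

    def __init__(self, num_registers):
        self._regs = [0] * num_registers
        self._lock = threading.Lock()

    def fetch_and_add(self, index, delta):
        """Atomically add delta to a register and return its previous value."""
        with self._lock:
            old = self._regs[index]
            self._regs[index] = old + delta
            return old

    def read(self, index):
        with self._lock:
            return self._regs[index]
```

A register used this way could track an output queue length: adding 1 when a packet is queued and -1 when it is sent, with several processors updating it concurrently.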
[0043] The PCI interface controller 224 has a DMA engine that
allows the processors 202 to move data asynchronously between local
memory in the network services processor and remote (PCI) memory in
both directions.
[0044] Typically, content aware application processing utilizes a
Deterministic Finite Automata (DFA) to recognize a pattern in the
content of a received packet. The DFA is a finite state machine,
that is, a model of computation including a set of states, a start
state, an input alphabet (set of all possible symbols) and a
transition function that maps input symbols and current states to a
next state. Computation begins in the start state and changes to
new states dependent on the transition function. The DFA is
deterministic, that is, the behavior can be completely predicted
from the input. The pattern is a finite number of strings of
characters (symbols) to search for in the input stream (string of
characters).
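The definition above maps directly onto a small table-driven program. The Python sketch below hand-builds a DFA that detects the string "cat" anywhere in the input; the state numbering, the `TRANSITIONS` table and the function `run` are illustrative choices, not taken from the application.

```python
# States: 0 = start, 1 = seen "c", 2 = seen "ca", 3 = seen "cat" (accepting).
START = 0
ACCEPTING = {3}

# Transition function: (current state, input symbol) -> next state.
# Any pair that is not listed falls back to the start state.
TRANSITIONS = {
    (0, "c"): 1,
    (1, "a"): 2,
    (1, "c"): 1,  # a repeated "c" still counts as having seen "c"
    (2, "t"): 3,
    (2, "c"): 1,
}


def run(text):
    """Feed text to the DFA one symbol at a time; True if "cat" was seen."""
    state = START
    for symbol in text:
        if state in ACCEPTING:
            break  # pattern already found; no need to read further
        state = TRANSITIONS.get((state, symbol), START)
    return state in ACCEPTING
```

Because the next state depends only on the current state and the current symbol, the same input always produces the same sequence of states.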
[0045] The pattern is commonly expressed using a regular expression
that includes atomic elements, for example, normal text characters
such as, A-Z, 0-9 and meta-characters such as, *, and |. The atomic
elements of a regular expression are the symbols (single
characters) to be matched. These are combined with meta-characters
that allow concatenation (+), alternation (|), and Kleene-star (*).
The meta-character for concatenation is used to create multiple
character matching patterns from a single character (or
sub-strings) while the meta-character for alternation (|) is used
to create a regular expression that can match any of two or more
sub-strings. The meta-character Kleene-star (*) allows a pattern to
match any number of occurrences, including none, of the preceding
character or string of characters. Combining different operators
and single characters allows complex expressions to be constructed.
For example, the expression (th(is|at)*) will match the following
character strings: th, this, that, thisis, thisat, thatis, or
thatat.
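The example expression can be checked with Python's `re` module, whose syntax for grouping, alternation and Kleene-star agrees with the notation used here. The helper name `matches` is an illustrative choice.

```python
import re

# "th" followed by zero or more repetitions of either "is" or "at".
PATTERN = re.compile(r"th(is|at)*")


def matches(candidate):
    """True if the whole candidate string is matched by the expression."""
    return PATTERN.fullmatch(candidate) is not None
```

All seven strings listed above satisfy `matches`, while a string such as "thi" does not, because "i" alone is not one of the alternatives.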
[0046] The character class construct [...] allows specifying a
list of characters to search for, e.g. gr[ea]y looks for both grey
and gray. A dash indicates a range of characters, for example,
[A-Z]. The meta-character "." matches any one character.
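The same three constructs exist in Python's `re` module, so they can be tried directly. The pattern names and the sample pattern `b.t` below are illustrative. One detail specific to Python is that "." does not match a newline unless `re.DOTALL` is given.

```python
import re

GREY = re.compile(r"gr[ea]y")    # character class: "e" or "a" in the middle
UPPER = re.compile(r"[A-Z]")     # range: any one upper-case ASCII letter
ANY_MIDDLE = re.compile(r"b.t")  # ".": any one character between "b" and "t"


def whole(pattern, text):
    """True if the pattern matches the entire text."""
    return pattern.fullmatch(text) is not None
```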
[0047] The input to the DFA state machine is typically a string of
(8-bit) bytes, that is, the alphabet is a single byte (one
character or symbol). Each byte in the input stream results in a
transition from one state to another state.
[0048] The states and the transition functions can be represented
by a graph, where each node in the graph represents a state and
arcs in the graph represent state transitions. The current state of
the state machine is represented by a node identifier that selects
a particular graph node. The graph may be stored in low latency
memory 118, or the main DRAM 108, and accessed by the processors
202 over the low latency bus. The processors 202 may access a
DFA-based graph stored in the low latency memory, or the main DRAM
108, directly. The graph will be described later in conjunction
with FIG. 5.
[0049] FIG. 3 is a block diagram illustrating content search macros
that may be used by a processor 202 in the network services
processor 100 shown in FIG. 2. Content search macros 300 may
include a walker software component (process) 302 for searching the
DFA-based content search graph that may be generated via a compiler
software component 304. The content search macros 300 may be stored
in L2/DRAM (212, 108) and may be executed by a processor 202. The
DFA-based content search graph may be stored in low latency memory
118 which is accessible directly by the processor 202 through the
low latency bus and low-latency memory controller shown in FIG. 2.
The compiler 304 translates expressions into a DFA-based content
search graph with a plurality of nodes.
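As a rough illustration of the compiling step, the sketch below turns a list of literal strings into a graph of numbered nodes with forward arcs and mark nodes. It is a simplification under stated assumptions: it handles literal strings only, not general regular expressions, and the function name compile_expressions and its node numbering are hypothetical and will not necessarily match the numbering of any figure.

```python
def compile_expressions(expressions):
    """Build forward arcs and mark nodes for a list of literal strings.

    Returns (arcs, marks) where arcs[node_id] maps a character to the
    next node ID and marks maps a node ID to the expression it completes.
    Node 0 is the root node.
    """
    arcs = [{}]   # arcs[0] belongs to the root node
    marks = {}
    for expr in expressions:
        node = 0
        for ch in expr:
            if ch not in arcs[node]:
                arcs.append({})                 # allocate a new node
                arcs[node][ch] = len(arcs) - 1  # forward arc to it
            node = arcs[node][ch]
        marks[node] = expr                      # last node is a mark node
    return arcs, marks


arcs, marks = compile_expressions(["CON", "CONTENT"])
```

Expressions that share a prefix share nodes, so `CON` and `CONTENT` above use the same first three forward arcs.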
[0050] After the compiler 304 has generated the content search
graph and the graph is stored in low latency memory 118, or in main
DRAM 108, the walker process 302 executed by one of the processors
202 walks input data (e.g., a string of characters) in the received
data packet one character at a time and outputs a set of matches
based on a search for a pattern in the input data using the content
search graph.
[0051] FIG. 4 is a block diagram of an example of a typical data
structure 400 that may be stored in Low Latency Memory Dynamic
Random Access Memory 118, or the main DRAM 108, and used by the
Content Search Mechanism (CSM) executing in a processor 202 to
traverse a graph. The data structure 400 may be generated by the
compiler component 304 based on the expressions to be searched for
in the input stream.
[0052] The data structure may include a plurality of nodes, for
example nodes 402 and 404, that may be used in a content search
graph according to an embodiment of the present invention. Each
node in the graph may include an array of 256 next node pointers,
one for each unique input byte value; that is, 2.sup.8 (256
possible values, or 256 addresses) representing an ASCII value of
the input. Each next node pointer contains a next node ID that
directly specifies the next node/state for the input byte
value.
[0053] As shown in FIG. 4, a current node 402 comprises 256 arcs.
Each arc represents an input ASCII value. For example, in node 402,
the arc addressed as `97` includes a next node pointer for the
character `a.` Similarly, a next node 404 also comprises 256 arcs,
each arc comprising a unique address and including a next node
pointer for a corresponding ASCII value.
[0054] The arcs of a node may be forward arcs (e.g., arcs which
point to next nodes in the DFA graph), backward arcs (e.g., arcs
which point back to a root node or a prior node), or repeating arcs
(e.g., arcs which point back to the node with which they are
associated). Arc 408 of node 404 comprises a node pointer to
node 404, and is therefore an example of a repeating arc. Arc 410
of node 404 comprises a next node pointer to node 402, which in
this context is considered to be a prior node, and therefore arc
410 is an example of a backward arc. In the example provided by
FIG. 4, the arc addressed as `66` of current node 402 comprises a
forward next node pointer 406 pointing to next node 404,
representing a character match of `B` with the input stream. It
should be appreciated that although FIG. 4 only shows 2 nodes, any
number of nodes may be included in a DFA based content search
graph.
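The node layout and the three kinds of arcs can be sketched as follows. This is a minimal illustration, not the stored format of data structure 400: each node is modeled as a list of 256 next node IDs, the dictionary keys reuse the reference numerals 402 and 404 for readability, and the byte values chosen for arcs 408 and 410, the default pointers, and the DEPTH table are hypothetical.

```python
ALPHABET_SIZE = 256  # one next node pointer per possible byte value (2**8)


def make_node(default_next):
    """Return a node: a list of 256 next node IDs, all set to default_next."""
    return [default_next] * ALPHABET_SIZE


graph = {
    402: make_node(402),  # assumption: unlisted arcs of node 402 repeat
    404: make_node(402),  # assumption: unlisted arcs of node 404 point back
}
graph[402][ord("B")] = 404  # arc addressed as 66: forward pointer 406
graph[404][ord("x")] = 404  # stands in for arc 408 (byte value hypothetical)
# Arc 410 (node 404 back to node 402) is covered by the default above;
# the byte 'y' is used below as a hypothetical address for it.

DEPTH = {402: 0, 404: 1}  # hypothetical distance of each node from the root


def classify_arc(src, byte):
    """Label the arc of node src addressed by byte."""
    dst = graph[src][byte]
    if dst == src:
        return "repeating"
    return "forward" if DEPTH[dst] > DEPTH[src] else "backward"
```

Indexing a node by the input byte value gives the next node ID in a single lookup, which is why each node carries a full 256-entry array.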
[0055] FIG. 5 provides an example of a DFA graph 500 compiled via
the compiler 304. For simplicity, only forward arcs have been
illustrated in the example graph 500. The node marked as `0` is the
root node and is a starting position for traversing the graph 500
with the walker process 302. The nodes are interconnected
through arcs represented by the lines connecting the nodes. The
arcs shown in FIG. 5 are forward arcs representing a character
match between the expression being searched and an input character.
The nodes comprising a double line (e.g., nodes 3, 9, 10, 13, 15,
and 16) are referred to as mark nodes and represent a string match
in the input stream. For example, the double line around node 3
represents a string match of `CON`; node 9 represents a string
match of `CONTENT`; node 10 represents a string match of `CONTEXT`;
node 13 represents a string match of `CONTINUE`; node 15 represents
a string match of `CONTINUUM`; and node 16 represents a string
match of `CONTENTS.` A table 504 illustrates all of the possible
expression matches, and the corresponding nodal paths, for the
example DFA graph 500.
[0056] In operation, the walker process 302 may evaluate the input
stream one byte at a time. As an example, consider the input stream
502. The walker 302 evaluates the first character of the input
stream 502 which is `B.` The walker then proceeds to the root node
to access the next node pointer associated with the character `B.`
In the example provided by the DFA graph 500, the root node only
includes a forward match for the character `C.` Therefore, the arc
associated with the character `B` is a repeating arc (not shown)
comprising a next node pointer pointing back to the root node
`0.`
[0057] The walker process 302 then proceeds to the next character
in the input stream 502 which is `C.` Upon locating the arc
associated with the character `C,` the walker 302 finds a next node
pointer providing a forward match and pointing to node `1.` The
walker process 302 then intakes the next two input stream characters,
`O` and `N,` and finds the associated arcs and next node pointers,
which provide forward matches leading through node `2` to node `3.` Since node `3`
is a mark node, the walker process 302 registers that an expression
match for the string `CON` in the input stream has been found.
[0058] Depending on the specific IDS application, the walker
process 302 may proceed to evaluate the next character in the input
stream 502 and analyze the character `W.` The arc in node `3`
associated with the character `W` comprises a backward next node
pointer to the root node `0` since the only forward match
associated with node `3` is for the character `T.` The walker
process then proceeds to search for the arc in the root node `0`
associated with the current character `W.` Upon finding that the
associated arc is a repeating arc, pointing back to the root node
`0,` the walker process 302 proceeds to evaluate the next character
in the input stream 502, which is `X.`
[0059] Upon evaluating the next input character `X,` the associated
arc in the root node `0` is a repeating next node pointer since the
root node does not comprise a forward match for the character `X.`
Following the same logic discussed above, the walker process 302
may then proceed to find an expression match for the string
`CONTENT` in mark node `9.` Upon reading the next character `J,`
the walker process traverses back to the root node `0` and the arc
and next node pointer associated with the character `J` are read
from the root node `0.` Upon detecting a repeating arc and reaching
the end of the input stream 502, the walker process 302 completes
its walking of the DFA graph 500.
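The byte-at-a-time walk described above can be sketched as follows. Several details are assumptions made for illustration: the input string is inferred from the characters named in the text, the numbering of nodes `5,` `8,` `11,` and `14` is a guess (the text does not state it), and every arc that is not a forward arc is treated as a repeating arc at the root node or a backward arc to the root node, after which the same character is examined again at the root.

```python
# Forward arcs of the example graph: (node, character) -> next node.
FORWARD = {
    (0, "C"): 1, (1, "O"): 2, (2, "N"): 3, (3, "T"): 4,
    (4, "I"): 5, (4, "E"): 6,            # node 5 numbering is assumed
    (6, "N"): 7, (6, "X"): 8,            # node 8 numbering is assumed
    (7, "T"): 9, (8, "T"): 10, (9, "S"): 16,
    (5, "N"): 11, (11, "U"): 12,         # node 11 numbering is assumed
    (12, "E"): 13, (12, "U"): 14, (14, "M"): 15,  # node 14 is assumed
}

# Mark nodes and the expression each one represents.
MARK = {3: "CON", 9: "CONTENT", 10: "CONTEXT",
        13: "CONTINUE", 15: "CONTINUUM", 16: "CONTENTS"}


def walk(data):
    """Walk data one character at a time and return the matches found."""
    node, i, matches = 0, 0, []
    while i < len(data):
        nxt = FORWARD.get((node, data[i]))
        if nxt is None:
            if node != 0:
                node = 0   # backward arc: re-examine this character at root
            else:
                i += 1     # repeating arc at the root: consume the character
            continue
        node = nxt         # forward arc: consume the character
        i += 1
        if node in MARK:
            matches.append(MARK[node])
    return matches


matches = walk("BCONWXCONTENTJ")  # input inferred from the text
```

In this sketch the walker passes through mark node `3` each time it reads `CON,` so that expression is registered on the way to `CONTENT` as well.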
[0060] Typically, in the reading of each arc, the walker process
302 makes one access to external memory (e.g., low latency memory
118 or DRAM 108). These external memory accesses may be extremely
costly and may require a significant amount of system resources.
Therefore, in an embodiment of the present invention, a DFA graph
is compiled with location tables in order to enable the walker process
to bypass nodes in the graph.
[0061] FIG. 6 illustrates a DFA graph according to an embodiment of
the present invention. The DFA graph 600 is similar to the DFA
graph 500 of FIG. 5 (e.g., the DFA graph 600 comprises the same
expressions to be searched as shown in table 504) with the addition
of location tables. Nodes `0,` `4,` `6,` and `12` all comprise a
location table generated by the compiler 304 during a compiling
stage. Nodes `0,` `4,` `6,` and `12` may be considered as starting
nodes and the location table of each starting node may comprise a
number of string values, with each string value corresponding to a
respective ending node. Using the node position information in the
location table, the walker process 302 may bypass the nodes
interconnected between the starting node and the ending node. It
should be appreciated that the compiler 304 may designate any node
as a starting and/or ending node.
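One way a compiler might derive a location table for a starting node is sketched below: it follows the forward arcs from the starting node and records the string spelled out on the way to each designated ending node. The function name build_location_table, the tuple layout, and the longest-first ordering are assumptions for illustration; the text does not specify how the compiler 304 builds or orders the table. Only the forward arcs along `CONTE` are included.

```python
# Forward arcs along the path C-O-N-T-E: (node, character) -> next node.
FORWARD = {(0, "C"): 1, (1, "O"): 2, (2, "N"): 3, (3, "T"): 4, (4, "E"): 6}


def build_location_table(forward, start, ending_nodes):
    """Return [(string value, ending node), ...] sorted longest first."""
    table = []
    stack = [(start, "")]
    while stack:
        node, prefix = stack.pop()
        if node in ending_nodes and prefix:
            table.append((prefix, node))
        # Follow every forward arc leaving this node.
        for (src, ch), dst in forward.items():
            if src == node:
                stack.append((dst, prefix + ch))
    table.sort(key=lambda entry: len(entry[0]), reverse=True)
    return table


root_table = build_location_table(FORWARD, start=0, ending_nodes={3, 4, 6})
```

Each entry pairs a string value with the node position of its ending node, which is the information a walker would need to skip the nodes in between.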
[0062] In operation, the walker process 302 analyzes the first byte
of an input stream 602, which is the character `B.` The walker
process 302 then proceeds to traverse the DFA graph 600 starting
with the root node `0.` Once the walker process reaches the root
node `0,` the walker process recognizes the root node as a starting
node, where each starting node in the DFA graph 600 further
comprises a location table. The walker process first looks for the
arc associated with the character `B.` Upon finding the arc and
next node pointer associated with the character `B,` the walker
process detects that the arc is a repeating arc pointing back to
the root node `0` and proceeds to analyze the next byte in the
input stream, which is `C.` The walker process 302 then proceeds to
find the arc and next node pointer associated with the character
`C.`
[0063] Upon finding that the arc and next node pointer associated
with the character `C` is a forward arc, the walker process 302
proceeds to search the location table since node `0` has been
identified as a starting node. The walker process 302 may comprise
a string comparison function which may be configured to compare the
location table string values with the characters in the input
stream 602. Therefore, since the root node `0` has been recognized
as a starting node, the walker process 302 employs its string
comparison function starting with the longest string value in the
location table `CONTE.` Since the first string value in the
location table of node `0` is five characters long, the string
comparison function of the walker process 302 compares the current
character `C` and the four characters following the `C` character.
Therefore, the location table string value `CONTE` and the input
sub-string `CONWX` are compared.
[0064] Since the two strings being compared are not equal, the
string comparison function reports a negative match to the walker
process 302. Upon receiving the negative match result from the
string comparison function, the walker process 302 then proceeds to
the next entry in the location table `CONT.` Similarly, the
location table string value `CONT` and the input sub-string `CONW`
are compared, resulting in a negative match.
[0065] The walker process 302 then proceeds to use the string
comparison function to compare the location table string value
`CON` and the input sub-string `CON.` This comparison results in a
positive string match. It should be appreciated that the walker
process 302 may utilize the string comparison function on the
entries of the location table in any order. Additionally, the
walker process 302 may evaluate any number of entries in the
location table at a time.
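The string comparison against the location table of the root node can be sketched as follows. The names ROOT_TABLE and lookup are hypothetical, the table is tried in the longest-first order used in the example (the text notes that any order may be used), and the input string is inferred from the characters named in the walkthrough.

```python
# Location table of root node 0: (string value, ending node).
ROOT_TABLE = [("CONTE", 6), ("CONT", 4), ("CON", 3)]

INPUT = "BCONWXCONTENTSCONTINUE"  # inferred from the walkthrough


def lookup(table, data, pos):
    """Return (string value, ending node) for the first table entry that
    equals the input sub-string starting at pos, or None if none does."""
    for value, ending_node in table:
        # str.startswith with a start offset compares len(value) characters.
        if data.startswith(value, pos):
            return value, ending_node
    return None


first = lookup(ROOT_TABLE, INPUT, 1)    # at the first 'C'
second = lookup(ROOT_TABLE, INPUT, 6)   # at the second 'C'
third = lookup(ROOT_TABLE, INPUT, 14)   # at the third 'C'
```

At the first `C` the entries `CONTE` and `CONT` fail against `CONWX` and `CONW` before `CON` succeeds, matching the sequence of comparisons described above.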
[0066] The walker process 302 may then proceed to obtain the
address of the ending node associated with the string value
resulting in the positive string match; in this example, the
string value `CON` yields the address of node `3.` Upon obtaining
the address of the ending node associated with the matched string
value, the walker process 302 bypasses the DFA graph and traverses
directly to the ending node. Therefore, the walker process 302 does
not traverse the intermediate nodes `1` and `2` comprised between
the starting node `0` and the ending node `3.` Once the walker
process 302 reaches node `3` the walker recognizes the node as a
mark node and reports that an expression match has been found.
[0067] Depending on the application for which the DFA graph is
utilized, the walker process 302 may continue to analyze the input
stream 602. The next character in the input stream 602 is the
character `W.` The walker process 302 searches node `3` for the arc
and next node pointer associated with the character `W.` Upon
obtaining the associated arc, the walker process 302 recognizes the
arc as a backward arc and traverses the DFA graph back to the root
node `0.` At the root node `0,` the walker process 302 searches for
the arc and next node pointer associated with the character `W.`
Upon finding a repeating arc, the walker process 302 will evaluate
the next character in the input stream 602. Similarly, the walker
process 302 finds that the arc associated with the next input
character `X` is a repeating arc. Thus, the walker process 302
remains in the root node `0` and proceeds to evaluate the next
input character.
[0068] The next input stream 602 character is `C.` The walker
process 302 proceeds to find the arc and next node pointer
associated with the character `C.` Upon finding that the associated
arc is a forward arc, since the root node `0` is identified as a
starting node, the walker process 302 proceeds to perform a string
comparison on an input stream sub-string and the string values in
the location table. Therefore, the first location table string
value `CONTE` is compared with an input sub-string of the same
length starting with the current character, `CONTE.`
[0069] The string comparison function provides a positive match
result, therefore prompting the walker process 302 to locate the
address of the associated end node. Upon locating the address of
the associated end node, the walker process 302 proceeds to bypass
the DFA graph to node `6.` Thus, the walker process 302 bypasses
the intermediate nodes `1`-`4` and traverses the DFA graph directly
to node `6.`
[0070] Upon reaching node `6` the walker process 302 proceeds to
analyze the next character `N` in the input stream 602 and searches
for its associated arc and next node pointer in node `6.` The
walker process 302 finds a forward arc associated with the
character `N` and also recognizes node `6` as a starting node.
Therefore, the walker process begins the string comparison function
starting with the first string value in the location table of node
`6.` The string comparison function compares the first location
table string value `NTS` with the input sub-string of the same
length starting with the current character, `NTS.`
[0071] The string comparison function provides a positive match
result, therefore prompting the walker process 302 to locate the
address of the associated end node. Upon locating the address of
the associated end node, the walker process 302 proceeds to bypass
the DFA graph to node `16.` Thus, the walker process 302 bypasses
the intermediate nodes `7` and `9` and traverses the DFA graph
directly to node `16.` Since node `16` is a mark node, the walker
process 302 detects that an expression match for the string
`CONTENTS` has been found.
[0072] The walker process 302 may then analyze the next character
in the input stream, which is `C.` The walker process searches the
current node, which is node `16,` for the associated arc and next
node pointer. The associated arc and next node pointer point in a
backward direction and return the walker to the root node
`0.`
[0073] Following the same logic, the walker process 302 bypasses
the DFA graph from the root node `0` to the node `4` upon finding a
string comparison match for the sub-string `CONT.` Upon analyzing
the location table associated with node `4,` the walker process 302
then bypasses to node `12` after detecting a string comparison
match for the sub-string `INU.` At node `12` the walker process may
then traverse to node `13,` where an expression match will be
detected for the expression `CONTINUE.`
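The complete bypass walk can be sketched end to end as follows. This is an illustration under stated assumptions rather than the walker process 302 itself: the input string is inferred from the walkthrough, the numbering of nodes `5,` `8,` `11,` and `14` is a guess, only the location table entries named in the text are included (the table of node `12` is left empty because its contents are not given), and non-forward arcs away from the root are treated as backward arcs to the root.

```python
FORWARD = {
    (0, "C"): 1, (1, "O"): 2, (2, "N"): 3, (3, "T"): 4,
    (4, "I"): 5, (4, "E"): 6, (6, "N"): 7, (6, "X"): 8,
    (7, "T"): 9, (8, "T"): 10, (9, "S"): 16,
    (5, "N"): 11, (11, "U"): 12,
    (12, "E"): 13, (12, "U"): 14, (14, "M"): 15,
}
MARK = {3: "CON", 9: "CONTENT", 10: "CONTEXT",
        13: "CONTINUE", 15: "CONTINUUM", 16: "CONTENTS"}

# Location tables of the starting nodes: (string value, ending node).
TABLES = {
    0: [("CONTE", 6), ("CONT", 4), ("CON", 3)],
    4: [("INU", 12)],
    6: [("NTS", 16)],
    12: [],  # contents not given in the text
}


def walk_with_bypass(data):
    """Return [(expression, nodes accessed since leaving the root), ...]."""
    node, i, path, matches = 0, 0, [], []
    while i < len(data):
        nxt = FORWARD.get((node, data[i]))
        if nxt is None:
            if node != 0:
                node, path = 0, []  # backward arc: retry character at root
            else:
                i += 1              # repeating arc at the root
            continue
        step = 1
        # Forward arc found: a starting node consults its location table.
        for value, ending_node in TABLES.get(node, []):
            if data.startswith(value, i):
                nxt, step = ending_node, len(value)  # bypass to ending node
                break
        node = nxt
        i += step
        path.append(node)
        if node in MARK:
            matches.append((MARK[node], list(path)))
    return matches


result = walk_with_bypass("BCONWXCONTENTSCONTINUE")
```

Because the bypass from node `6` jumps straight to node `16,` mark node `9` is never visited, so `CONTENT` is not reported separately in this sketch; the node lists in the result show which nodes were accessed on the way to each match.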
[0074] In the example illustrated by FIG. 6, three expressions have
been located. The first expression `CON` was obtained with a single
memory access (e.g., the single memory access of node `3`), the
second expression `CONTENTS` was obtained with two memory
accesses (e.g., an access to node `6` and node `16`), and
the third expression `CONTINUE` was obtained with three memory
accesses (e.g., an access to node `4,` node `12,` and node `13`).
Using the DFA graph traversing method illustrated in FIG. 5, the
first expression would have required three memory accesses, the
second expression would have required eight memory accesses, and
the third expression would have also required eight memory
accesses. Therefore, by employing the bypass process illustrated in
FIG. 6, the total number of memory accesses for these three
expressions may be reduced from nineteen to six.
[0075] While this invention has been particularly shown and
described with references to example embodiments thereof, it will
be understood by those skilled in the art that various changes in
form and details may be made therein without departing from the
scope of the invention encompassed by the appended claims.
* * * * *