U.S. patent application number 16/581637, for hierarchical aggregation of select network traffic statistics, was filed with the patent office on 2019-09-24 and published on 2020-01-16.
The applicant listed for this patent is Silver Peak Systems, Inc. Invention is credited to David Anthony Hughes, Pawan Kumar Singh.
Publication Number | 20200021506 |
Application Number | 16/581637 |
Family ID | 60573215 |
Filed Date | 2019-09-24 |
Publication Date | 2020-01-16 |
United States Patent Application | 20200021506 |
Kind Code | A1 |
Hughes; David Anthony; et al. | January 16, 2020 |
HIERARCHICAL AGGREGATION OF SELECT NETWORK TRAFFIC STATISTICS
Abstract
Disclosed herein are systems and methods for the collection,
aggregation, and processing of network traffic statistics for a
plurality of network appliances in a wide area network. Select
network traffic statistics can be collected and associated with a
hierarchical string, and aggregated over time. In this way, only
information that is likely to be relevant is gathered and
maintained, allowing for the maintenance of select network traffic
statistics for large-scale operations.
Inventors: | Hughes; David Anthony (Los Altos Hills, CA); Singh; Pawan Kumar (Los Altos, CA) |
Applicant: | Silver Peak Systems, Inc., Santa Clara, CA, US |
Family ID: | 60573215 |
Appl. No.: | 16/581637 |
Filed: | September 24, 2019 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
15180981 | Jun 13, 2016 | 10432484 |
16581637 | | |
Current U.S. Class: | 1/1 |
Current CPC Class: | G06F 16/22 20190101; H04L 43/067 20130101;
H04L 43/062 20130101; H04L 43/026 20130101; H04L 43/045 20130101;
G06F 16/248 20190101; G06F 16/284 20190101 |
International Class: | H04L 12/26 20060101 H04L012/26;
G06F 16/28 20060101 G06F016/28; G06F 16/248 20060101 G06F016/248;
G06F 16/22 20060101 G06F016/22 |
Claims
1. A method for aggregating select network traffic statistics for
each of a plurality of network appliances connected in a
communication network, the method comprising: for each flow from a
first network appliance, extracting a first attribute value of a
first flow attribute; for each flow from the first network
appliance, extracting a second attribute value of a second flow
attribute; building at least one hierarchical string with the
extracted first attribute value and the extracted second attribute
value; extracting at least one network metric for at least one
network characteristic associated with the at least one
hierarchical string; aggregating the at least one network metric
for the at least one network characteristic over a plurality of
flows to and from the first network appliance in the communication
network; generating an accumulating map that is updated in
substantially real time, the accumulating map comprising the at
least one hierarchical string and associated aggregated network
metrics for the first flow attribute and the second flow attribute
of the hierarchical string, wherein the accumulating map has a
target number of entries for a specified time period and an
eviction policy determines how information is aggregated once the
accumulating map reaches its target number of entries for the
specified time period, the eviction policy determining that a
record is aggregated into a higher level record of the accumulating
map and is evicted from the accumulating map; and transmitting the
accumulating map to a network information collector in
communication with the plurality of network appliances.
2. The method of claim 1, wherein information regarding each flow
to or from a given network appliance is collected in a flow
table.
3. The method of claim 1, wherein the first and the second flow
attributes are extracted at a first time interval.
4. The method of claim 1, wherein the accumulating map is
transmitted to the network information collector at a second time
interval, the second time interval being a different amount of time
than a first time interval.
5. The method of claim 1, wherein a new accumulating map is started
at the first network appliance after the aggregated information is
transmitted to the network information collector.
6. The method of claim 1, wherein the hierarchical string
represents a subset of network traffic statistics collected for the
first network appliance.
7. The method of claim 1, wherein the second attribute of the
hierarchical string further defines the first attribute of the
hierarchical string.
8. The method of claim 1, wherein the accumulating map comprises an
eviction log for collected information in excess of the target
number of entries for the specified time period, the eviction log
comprising a summary of strings of information in excess of the
target number of entries for the specified time period.
9. The method of claim 1, wherein the eviction policy determines
that once the target number of entries is reached for the specified
time period, any new information collected will be discarded, and
not aggregated during that time period.
10. The method of claim 1, wherein the eviction policy further
determines that an evicted record is moved to an eviction log when
aggregated into a higher level record of the accumulating map.
11. The method of claim 1, wherein the eviction policy determines
that a portion of at least one hierarchical string of information
is removed from the accumulating map to reduce the number of
entries below a maximum number of entries for the specified time
period.
12. The method of claim 1, wherein the eviction policy removes a
predetermined number of records from the accumulating map and moves
them to an eviction log, when a maximum number of entries for the
specified time period is reached.
13. The method of claim 1, further comprising: in response to a
query regarding network traffic from a user, displaying a portion
of the information collected from each network appliance on a
graphical user interface to the user.
14. The method of claim 9, wherein the eviction log is
post-processed to minimize information loss.
15. The method of claim 1, wherein the aggregated information is
stored in bins.
16. The method of claim 1, further comprising: for each flow from
the first network appliance, extracting a second network metric of
the first flow attribute and its corresponding value.
17. A system for aggregating select network traffic statistics,
comprising: a plurality of network appliances in a communication
network, each of the plurality of network appliances configured to:
collect a plurality of flow attributes for network traffic through
each network appliance; build at least one hierarchical string of
network traffic flow attributes with an extracted first attribute
value and an extracted second attribute value of the collected flow
attributes; extract at least one network metric for at least one
network characteristic associated with each of the at least one
hierarchical string; aggregate the at least one network metric for
the at least one network characteristic over a plurality of flows
to or from the network appliance; generate an accumulating map that
is updated in substantially real time, the accumulating map
comprising the at least one hierarchical string and associated
aggregated network metrics for a first flow attribute and a second
flow attribute of the hierarchical string, wherein the accumulating
map has a target number of entries for a specified time period and
an eviction policy determines that a record is aggregated into a
higher level record of the accumulating map and is evicted from the
accumulating map when the accumulating map reaches the target
number of entries for the specified time period; and transmit the
accumulating map to a network information collector in
communication with each network appliance; and the network
information collector configured to receive information from each
network appliance, and provide the information to a user on a
graphical user display.
18. The system of claim 17, wherein the second attribute of the
hierarchical string further defines the first attribute of the
hierarchical string.
19. The system of claim 17, wherein each of the plurality of
network appliances further generates at least one indexing data
structure for the accumulating map.
20. The system of claim 17, wherein the extracted first attribute
value and the extracted second attribute value are extracted at a
first time interval.
21. The system of claim 17, wherein the accumulating map is
transmitted to the network information collector at a second time
interval, the second time interval being a different amount of time
than a first time interval at which the extracted first attribute
value and the extracted second attribute value are extracted.
22. The system of claim 17, wherein the network appliance is
further configured to: generate a new accumulating map, after a
previous accumulating map is transmitted to the network information
collector.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] The present application is a Continuation of, and claims the
priority benefit of, U.S. patent application Ser. No. 15/180,981
filed on Jun. 13, 2016. The disclosure of the above-referenced
application is incorporated herein by reference in its entirety for
all purposes.
TECHNICAL FIELD
[0002] This disclosure relates generally to the collection,
aggregation, and processing of network traffic statistics for a
plurality of network appliances.
BACKGROUND
[0003] The approaches described in this section could be pursued,
but are not necessarily approaches that have previously been
conceived or pursued. Therefore, unless otherwise indicated, it
should not be assumed that any of the approaches described in this
section qualify as prior art merely by virtue of their inclusion in
this section.
[0004] An increasing number of network appliances, physical and
virtual, are deployed in communication networks such as wide area
networks (WAN). For each network appliance, it may be desirable to
monitor attributes and statistics of the data traffic handled by
the device. For example, information can be collected regarding
source IP addresses, destination IP addresses, traffic type, port
numbers, etc. for the traffic that passes through the network
appliance. Typically this information is collected for each data
flow using industry standards such as NetFlow and IPFIX. The
collected data is transported across the network to a collection
engine, stored in a database, and can be utilized for running
queries and generating reports regarding the network.
[0005] Since there can be any number of data flows processed by a
network appliance each minute (hundreds, thousands, or even
millions), this results in a large volume of data that is collected
each minute, for each network appliance. As the number of network
appliances in a communication network increases, the amount of data
generated can quickly become unmanageable. Moreover, transporting
all of this data across the network from each network appliance to
the collection engine can be a significant burden, as well as
storing and maintaining a database with all of the data. Further,
it may take longer to run a query and generate a report since the
amount of data to be processed and analyzed is so large.
[0006] Thus, there is a need for a more efficient mechanism for
collecting and storing network traffic statistics for a large
number of network appliances in a communication network.
SUMMARY
[0007] This summary is provided to introduce a selection of
concepts in a simplified form that are further described in the
Detailed Description below. This summary is not intended to
identify key features or essential features of the claimed subject
matter, nor is it intended to be used as an aid in determining the
scope of the claimed subject matter.
[0008] In various exemplary embodiments of the present disclosure, a
system for aggregating select network traffic statistics is
disclosed. The system comprises a plurality of network appliances
in a communication network configured to collect a plurality of
flow attributes for network traffic through each network appliance,
build a plurality of hierarchical strings of network traffic flow
attributes with extracted attribute values of those flow
attributes, extract at least one network metric for at least one
network characteristic associated with each of the plurality of
hierarchical strings, aggregate the at least one network metric
for the at least one network characteristic over the plurality of
flows, and transmit the aggregated information to a network
information collector in communication with each network appliance;
and the network information collector configured to receive the
information from each network appliance, and provide the
information to a user on a graphical user display in response to
the user running a query on the received information.
[0009] In other embodiments, a method for aggregating select
network traffic statistics for each of a plurality of network
appliances connected in a communication network is disclosed. The
method comprises: for each flow from a network appliance, extracting
an attribute value of a first flow attribute; for each flow from the
network appliance, extracting an attribute value of a second flow
attribute; building at least one hierarchical string with the
extracted attribute values; extracting at least one network metric
for at least one network characteristic associated with the at
least one hierarchical string; aggregating the at least one network
metric for the at least one network characteristic over a plurality
of flows; and transmitting the at least one hierarchical string and
associated aggregated network metrics to a network information
collector in communication with the network appliance.
[0010] Other features, examples, and embodiments are described
below.
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] Embodiments are illustrated by way of example and not by
limitation in the figures of the accompanying drawings, in which
like references indicate similar elements.
[0012] FIG. 1A depicts an exemplary system of the prior art.
[0013] FIG. 1B depicts an exemplary system within which the present
disclosure can be implemented.
[0014] FIG. 2 illustrates a block diagram of a network appliance,
in an exemplary implementation of the disclosure.
[0015] FIG. 3 depicts an exemplary flow table at a network
appliance.
[0016] FIG. 4A depicts an exemplary accumulating map at a network
appliance.
[0017] FIG. 4B depicts exemplary information from a row of an
accumulating map.
[0018] FIG. 5A depicts an exemplary sorting via bins for an
accumulating map.
[0019] FIG. 5B depicts an exemplary eviction policy for an
accumulating map.
[0020] FIG. 6 depicts an exemplary method for building a
hierarchical string.
DETAILED DESCRIPTION
[0021] The following detailed description includes references to
the accompanying drawings, which form a part of the detailed
description. The drawings show illustrations, in accordance with
exemplary embodiments. These exemplary embodiments, which are also
referred to herein as "examples," are described in enough detail to
enable those skilled in the art to practice the present subject
matter. The embodiments can be combined, other embodiments can be
utilized, or structural, logical, and electrical changes can be
made without departing from the scope of what is claimed. The
following detailed description is therefore not to be taken in a
limiting sense, and the scope is defined by the appended claims and
their equivalents. In this document, the terms "a" and "an" are
used, as is common in patent documents, to include one or more than
one. In this document, the term "or" is used to refer to a
nonexclusive "or," such that "A or B" includes "A but not B," "B
but not A," and "A and B," unless otherwise indicated.
[0022] The embodiments disclosed herein may be implemented using a
variety of technologies. For example, the methods described herein
may be implemented in software executing on a computer system
containing one or more computers, or in hardware utilizing either a
combination of microprocessors or other specially designed
application-specific integrated circuits (ASICs), programmable
logic devices, or various combinations thereof. In particular, the
methods described herein may be implemented by a series of
computer-executable instructions residing on a storage medium, such
as a disk drive, or computer-readable medium.
[0023] The embodiments described herein relate to the collection,
aggregation, and processing of network traffic statistics for a
plurality of network appliances.
[0024] FIG. 1A depicts an exemplary system 100 within which
embodiments of the prior art are implemented. The system comprises
a plurality of network appliances 110 in communication with a flow
information collector 120 over one or more wired or wireless
communication network(s) 160. The flow information collector 120 is
further in communication with one or more flow database(s) 125,
which in turn is in communication with a reporting engine 140 that
is accessible by a user 150.
[0025] Network appliance 110 collects information about network
flows that are processed through the appliance and maintains flow
records 112. These flow records are transmitted to the flow
information collector 120 and maintained in flow database 125. User
150 can access information from these flow records 112 via
reporting engine 140.
[0026] FIG. 1B depicts an exemplary system 170 within which the
present disclosure can be implemented. The system comprises a
plurality of network appliances 110 in communication with a network
information collector 180 over one or more wired or wireless
communication network(s) 160. The network information collector 180
is further in communication with one or more database(s) 130, which
in turn is in communication with a reporting engine 140 that is
accessible by a user 150. While network information collector 180,
database(s) 130, and reporting engine 140 are depicted in the
figure as separate, one or more of these engines can be part of the
same computing machine or distributed across many computers.
[0027] In a wide area network, there can be multiple network
appliances deployed in one or more geographic locations. Each
network appliance 110 comprises hardware and/or software elements
configured to receive data and optionally perform any type of
processing on the data, including but not limited to WAN
optimization techniques, before transmitting it to another appliance.
In various embodiments, the network appliance 110 can be configured
as an additional router or gateway. If a network appliance has
multiple interfaces, it can be transparent on some interfaces, and
act like a router/bridge on others. Alternatively, the network
appliance can be transparent on all interfaces, or appear as a
router/bridge on all interfaces. In some embodiments, network
traffic can be intercepted by another device and mirrored (copied)
onto network appliance 110. The network appliance 110 may further
be either physical or virtual. A virtual network appliance can be
in a virtual private cloud (not shown), managed by a cloud service
provider, such as Amazon Web Services, or others.
[0028] Network appliance 110 collects information about network
flows that are processed through the appliance in flow records 112.
From these flow records 112, network appliance 110 further
generates an accumulating map 114 containing select information
from many flow records 112 aggregated over a certain time period.
The flow records 112 and accumulating map 114 generated at network
appliance 110 are discussed in further detail below with respect to
FIGS. 3 and 4.
[0029] At certain time intervals, network appliance 110 transmits
information from the accumulating map 114 (and not flow records
112) to network information collector 180, which maintains this
information in one or more database(s) 130. User 150 can access
information from these accumulating maps via reporting engine 140,
or in some instances user 150 can access information from these
accumulating maps directly from a network appliance 110.
[0030] FIG. 2 illustrates a block diagram of a network appliance
110, in an exemplary implementation of the disclosure. The network
appliance 110 includes a processor 210, a memory 220, a WAN
communication interface 230, a LAN communication interface 240, and
a database 250. A system bus 280 links the processor 210, the
memory 220, the WAN communication interface 230, the LAN
communication interface 240, and the database 250. Line 260 links
the WAN communication interface 230 to another device, such as
another appliance, router, or gateway, and line 270 links the LAN
communication interface 240 to a user computing device, or other
networking device. While network appliance 110 is depicted in FIG.
2 as having these exemplary components, the appliance may have
additional or fewer components.
[0031] The database 250 comprises hardware and/or software elements
configured to store data in an organized format to allow the
processor 210 to create, modify, and retrieve the data. The
hardware and/or software elements of the database 250 may include
storage devices, such as RAM, hard drives, optical drives, flash
memory, and magnetic tape.
[0032] In some embodiments, some network appliances comprise
identical hardware and/or software elements. Alternatively, in
other embodiments, some network appliances may include hardware
and/or software elements providing additional processing,
communication, and storage capacity.
[0033] Each network appliance 110 can be in communication with at
least one other network appliance 110, whether in the same
geographic location, different geographic location, private cloud
network, customer datacenter, or any other location. As understood
by persons of ordinary skill in the art, any type of network
topology may be used. There can be one or more secure tunnels
between one or more network appliances. The secure tunnel may be
utilized with encryption (e.g., IPsec), access control lists
(ACLs), compression (such as header and payload compression),
fragmentation/coalescing optimizations and/or error detection and
correction provided by an appliance.
[0034] A network appliance 110 can further have a software program
operating in the background that tracks its activity and
performance. For example, information about data flows that are
processed by the network appliance 110 can be collected. Any type
of information about a flow can be collected, such as header
information (source port, destination port, source address,
destination address, protocol, etc.), packet count, byte count,
timestamp, traffic type, or any other flow attribute. This
information can be stored in a flow table 300 at the network
appliance 110. Flow tables will be discussed in further detail
below, with respect to FIG. 3.
[0035] In exemplary embodiments, select information from flow table
300 is aggregated and populated into an accumulating map, which is
discussed in further detail below with respect to FIG. 4.
Information from the accumulating map is transmitted by network
appliance 110 across communication network(s) 160 to network
information collector 180. In this way, the information regarding
flows processed by network appliance 110 is not transmitted
directly to network information collector 180, but rather a
condensed and aggregated version of selected flow information is
transmitted across the network, creating less network traffic.
[0036] After a flow table 300 is used to populate an accumulating
map, or on a certain periodic basis or activation of a condition,
flow table 300 may be discarded by network appliance 110 and a new
flow table is started. Similarly, after an accumulating map 400 is
received by network information collector 180, or on a certain
periodic basis or activation of a condition, accumulating map 400
may be discarded by network appliance 110 and a new accumulating
map is started.
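By way of non-limiting illustration, the discard-and-restart behavior described above can be sketched as follows. The class name `Appliance`, the methods `record` and `transmit`, and the `send` callback are hypothetical names introduced only for this sketch, and the sketch assumes that the collector's acknowledgment is the condition that triggers the start of a new accumulating map.

```python
class Appliance:
    """Hypothetical stand-in for a network appliance holding one accumulating map."""

    def __init__(self):
        # Maps a hierarchical string to an aggregated byte count.
        self.accumulating_map = {}

    def record(self, key, byte_count):
        # Aggregate the metric into the row for this hierarchical string.
        self.accumulating_map[key] = self.accumulating_map.get(key, 0) + byte_count

    def transmit(self, send):
        # Hand the current map to the collector; start a new map only if
        # the collector reports that it received the old one (an assumption).
        acknowledged = send(self.accumulating_map)
        if acknowledged:
            self.accumulating_map = {}
        return acknowledged


received = []


def send(snapshot):
    # Hypothetical collector: store the map and acknowledge receipt.
    received.append(snapshot)
    return True


appliance = Appliance()
appliance.record("/sampledomain1/computer1/port1", 1000)
appliance.record("/sampledomain1/computer1/port1", 500)
appliance.transmit(send)
```

After the call to `transmit`, the collector holds the aggregated row and the appliance begins accumulating into an empty map.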
[0037] Returning to FIG. 1B, network information collector 180
comprises hardware and/or software elements, including at least one
processor, for receiving data from network appliance 110 and
processing it. Network information collector 180 may process data
received from network appliance 110 and store the data in
database(s) 130. In various embodiments, database(s) 130 is a
relational database that stores the information from accumulating
map 400. The information can be stored directly into database(s)
130 or separated into columns and then stored in database(s)
130.
[0038] Database(s) 130 is further in communication with reporting
engine 140. Reporting engine 140 comprises hardware and/or software
elements, including at least one processor, for querying data in
database(s) 130, processing it, and presenting it to user 150 via a
graphical user interface. In this way, user 150 can run any type of
query on the stored data. For example, a user can run a query
requesting information on the most visited websites, or a "top
talkers" report, as discussed in further detail below.
[0039] FIG. 3 depicts an exemplary flow table 300 at network
appliance 110 for flows 1 through N, with N representing any
number. The flow table contains one or more rows of information for
each flow that is processed through network appliance 110. Data
packets transmitted and received between a single user and a single
website that the user is browsing can be parsed into multiple
flows. Thus, one browsing session for a user on a website may
comprise many flows. Typically a TCP flow begins with a SYN packet
and ends with a FIN packet. Other methods can be used for
determining the start and end of non-TCP flows. The attributes of
each of these flows, while they may be identical or substantially
similar, are by convention stored in different rows of flow table
300 since they are technically different flows.
[0040] In exemplary embodiments, flow table 300 may collect certain
information about the flow, such as header information 310, network
information 320, and other information 330. As would be understood
by a person of ordinary skill in the art, flow table 300 can
comprise fewer or additional fields than depicted in FIG. 3.
Moreover, even though header information 310 is depicted as having
three entries in exemplary flow table 300, there can be fewer or
additional entries for header information. Similarly, there can be
fewer or additional entries for network information 320 and for
other information 330 than the number of entries depicted in
exemplary flow table 300.
[0041] Header information 310 can comprise any type of information
found in a packet header, for example, source port, destination
port, source address (such as IP address), destination address,
protocol. Network information 320 can comprise any type of
information regarding the network, such as a number of bytes
received or a number of bytes transmitted during that flow.
Further, network information 320 can contain information regarding
other characteristics such as loss, latency, jitter, re-ordering,
etc. Flow table 300 may store a sum of the number of packets or
bytes of each characteristic, or a mathematical operator other than
the sum, such as maximum, minimum, mean, median, average, etc.
Other information 330 can comprise any other type of information
regarding the flow, such as traffic type or domain name (instead of
address).
[0042] In an example embodiment, entry 340 of flow N is the source
port for the flow, entry 345 is the destination port for the flow,
and entry 350 is the destination IP address for the flow. Entry 355
is the domain name for the website that flow N originates from or
is directed to, entry 360 denotes that the flow is for a voice
traffic type, and entry 365 is an application name (for example
from deep packet inspection (DPI)). Entry 370 contains the number
of packets in the flow and entry 375 contains a number of bytes in
the flow.
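By way of non-limiting illustration, one row of a flow table such as the row for flow N could be represented as the record sketched below. The field names and all sample values are hypothetical and chosen only to mirror entries 340 through 375; the actual layout of flow table 300 is not limited to this form.

```python
from dataclasses import dataclass


@dataclass
class FlowRecord:
    """One row of a flow table; field names are illustrative assumptions."""
    # Header information
    source_port: int        # cf. entry 340
    destination_port: int   # cf. entry 345
    destination_ip: str     # cf. entry 350
    # Other information
    domain_name: str        # cf. entry 355
    traffic_type: str       # cf. entry 360
    application: str        # cf. entry 365, e.g. from deep packet inspection
    # Network information
    packet_count: int       # cf. entry 370
    byte_count: int         # cf. entry 375


# Hypothetical sample values for flow N.
flow_n = FlowRecord(
    source_port=50123,
    destination_port=443,
    destination_ip="203.0.113.10",
    domain_name="sampledomain1",
    traffic_type="voice",
    application="example-app",
    packet_count=120,
    byte_count=48000,
)

# A flow table is then simply a collection of such rows, one per flow.
flow_table = [flow_n]
```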
[0043] The flow information regarding every flow is collected by
the network appliance 110 at all times, in the background. A
network appliance 110 could have one million flows every minute, in
which case a flow table for one minute of data for network
appliance 110 would have one million rows. Over time, this amount
of data becomes cumbersome to process, synthesize, and manipulate.
Conventional systems may transport a flow table directly to a flow
information collector, or to reduce the amount of data, retain only
a fraction of the records from the flow table as a sample. In
contrast, embodiments of the present disclosure reduce the amount
of data to be processed regarding flows, with minimal information
loss, by synthesizing selected information from flow table 300 into
an accumulating map. This synthesis can occur on a periodic basis
(such as every minute, every 5 minutes, every hour, etc.), or upon
the meeting of a condition, such as number of flows recorded in the
flow table 300, network status, or any other condition.
[0044] FIG. 4A depicts exemplary accumulating maps that are
constructed from information from flow table 300. A string of
information is built in a hierarchical manner from information in
flow table 300. A network administrator can determine one or more
strings of information to be gathered. For example, a network
administrator may determine that information should be collected
regarding a domain name, user computing device, and user computer's
port number that is accessing that domain. A user computing device
can identify different computing devices utilized by the same user
(such as a laptop, smartphone, desktop, tablet, smartwatch, etc.).
The user computing device can be identified in any manner, such as
by host name, MAC address, user ID, etc.
[0045] Exemplary table 400 has rows 1 through F, with F being any
number, for the hierarchical string "/domain name/computer/port"
that is built from this information. Since the accumulating map 400
is an aggregation of flow information, F will be a much smaller
value than N, the total number of flows from flow table 300.
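By way of non-limiting illustration, a hierarchical string of the form "/domain name/computer/port" could be built from a flow's attribute values as sketched below. The function name `build_hierarchical_string`, the dictionary keys, and the use of "/" as the only separator are assumptions made for this sketch.

```python
from typing import Mapping, Sequence


def build_hierarchical_string(flow: Mapping[str, object],
                              hierarchy: Sequence[str]) -> str:
    """Join the selected attribute values, most general first, with '/'."""
    return "/" + "/".join(str(flow[attribute]) for attribute in hierarchy)


# Hypothetical flow attributes taken from one row of a flow table.
flow = {
    "domain_name": "sampledomain1",
    "computer": "computer1",
    "port": "port1",
    "byte_count": 1500,
}

# The hierarchy chosen by the network administrator (an assumed example).
key = build_hierarchical_string(flow, ["domain_name", "computer", "port"])
```

Because each later attribute further defines the one before it, every flow with the same domain name, computer, and port yields the same string and therefore falls into the same row of the accumulating map.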
[0046] Exemplary table 450 shows data being collected for a string
of source IP address and destination IP address combinations. Thus,
information regarding which IP addresses are communicating with
each other is accumulated. Network appliance 110 can populate an
accumulating map for any number of strings of information from flow
table 300. In an exemplary embodiment, network appliance 110
populates multiple accumulating maps, each for a different string
hierarchy of information from flow table 300. While FIG. 4A depicts
only two string hierarchies, there can be fewer or additional
strings of information collected in accumulating maps.
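By way of non-limiting illustration, populating several accumulating maps from the same flows, each map for a different string hierarchy, could be sketched as follows. The map names, attribute keys, and sample addresses are hypothetical, and only a byte total is aggregated here for brevity.

```python
# Each accumulating map is defined by the ordered attributes of its hierarchy.
HIERARCHIES = {
    "domain_computer_port": ("domain", "computer", "port"),
    "src_dst": ("src_ip", "dst_ip"),
}

# Hypothetical flows; both come from the same computer to the same server.
flows = [
    {"domain": "sampledomain1", "computer": "computer1", "port": "port1",
     "src_ip": "10.0.0.5", "dst_ip": "203.0.113.10", "byte_count": 1000},
    {"domain": "sampledomain1", "computer": "computer1", "port": "port2",
     "src_ip": "10.0.0.5", "dst_ip": "203.0.113.10", "byte_count": 500},
]

# One dictionary per hierarchy: hierarchical string -> aggregated bytes.
maps = {name: {} for name in HIERARCHIES}

for flow in flows:
    for name, attributes in HIERARCHIES.items():
        key = "/" + "/".join(flow[a] for a in attributes)
        maps[name][key] = maps[name].get(key, 0) + flow["byte_count"]
```

In this sketch the two flows occupy two rows of the domain/computer/port map but collapse into a single row of the source/destination map.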
[0047] Row 410 in exemplary accumulating map 400 shows that during
the time interval represented, sampledomain1 was accessed by
computer1 from port1. All of the flows where sampledomain1 was
accessed by computer1 from port1 in flow table 300 are aggregated
into a single row, row 410, in accumulating map 400. The network
information 320 may be aggregated for the flows to depict a total
number of bytes received and a total number of packets received
from sampledomain1 accessed by computer1 via port1 during the time
interval of flow table 300. In this way, a large number of flows
may be condensed into a single row in accumulating map 400.
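By way of non-limiting illustration, the condensing of many flows into a single row can be sketched as follows. The flow values and the key names `bytes_rx` and `packets_rx` are hypothetical; the sketch assumes that the aggregation is a simple sum per hierarchical string.

```python
from collections import defaultdict

# Hypothetical rows of a flow table for one time interval.
flows = [
    {"domain": "sampledomain1", "computer": "computer1", "port": "port1",
     "bytes_rx": 1000, "packets_rx": 10},
    {"domain": "sampledomain1", "computer": "computer1", "port": "port1",
     "bytes_rx": 500, "packets_rx": 5},
    {"domain": "sampledomain1", "computer": "computer1", "port": "port2",
     "bytes_rx": 200, "packets_rx": 2},
]

# Accumulating map: hierarchical string -> running totals for that row.
accumulating_map = defaultdict(lambda: {"bytes_rx": 0, "packets_rx": 0})

for flow in flows:
    key = f"/{flow['domain']}/{flow['computer']}/{flow['port']}"
    row = accumulating_map[key]
    row["bytes_rx"] += flow["bytes_rx"]
    row["packets_rx"] += flow["packets_rx"]
```

Here three flows produce two rows: the two port1 flows are summed into one row, and the port2 flow forms a second row.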
[0048] As would be understood by a person of ordinary skill in the
art, while accumulating map 400 depicts a total number of bytes
received and a total number of packets received (also referred to
herein as a network characteristic), any attribute can be collected
and aggregated into accumulating map 400. For example, instead of a
sum of bytes received, accumulating map 400 can track a maximum
value, minimum value, median, percentile, or other numeric
attribute for a string. Additionally, the network characteristic
can be other characteristics besides number of packets or number of
bytes. Loss, latency, re-ordering, and other characteristics, such as
the number of flows that are aggregated into the row, can be tracked
for a string in addition to, or instead of, packets and bytes.
For example, packet loss and packet jitter can be measured by time
stamps and serial numbers from the flow table. Additional
information on measurement of network characteristics can be found
in commonly owned U.S. Pat. No. 9,143,455 issued on Sep. 22, 2015
and entitled "Quality of Service Using Multiple Flows", which is
hereby incorporated herein in its entirety.
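As one possible illustration of tracking attributes other than a sum, the following sketch keeps a flow count, total, minimum, and maximum for a single string. The class name MetricSummary and the latency samples are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class MetricSummary:
    """Running summary of one network characteristic for one string."""
    flows: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, value):
        """Fold one more observation into the summary."""
        self.flows += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

# Hypothetical latency samples, in milliseconds, for one string.
latency_ms = MetricSummary()
for sample in (12.0, 7.5, 30.0):
    latency_ms.add(sample)
```

A median or percentile cannot be maintained from these four fields alone; tracking one would require retaining the samples or an approximate summary of them.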
[0049] Row 430 shows that the same computer (computer1) accessed
the same domain name (sampledomain1), but from a different port
(port2). Thus, all of the flows in flow table 300 from port2 of
computer1 to sampledomain1 are aggregated into row 430. Similarly,
accumulating map 400 can be populated with information from flow
table 300 for any number of domains accessed by any number of
computers from any number of ports, as shown in row 440.
[0050] Flow table 300 may comprise data for one time interval while
accumulating map 400 can comprise data for a different time
interval. For example, flow table 300 can comprise data for all
flows through network appliance 110 over the course of a minute,
while data from 60 minutes can all be aggregated into one
accumulating map. Thus, if a user returns to the same website from
the same computer from the same port within the same hour, even
though this network traffic is on a different flow, the data can be
combined with the previous flow information for the same parameters
into the accumulating map. This significantly reduces the number of
records that are maintained. All activity between a computer and a
domain from a certain port is aggregated together as one record in
the accumulating map, instead of multiple records per flow. This
provides information in a compact manner for further processing,
while also foregoing the maintenance of all details about more
specific activities.
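A minimal sketch of folding several shorter intervals into one longer-running accumulating map follows. The per-minute tables and the function name fold_interval are hypothetical, and only a byte count is tracked for brevity.

```python
from collections import Counter

hourly_bytes = Counter()

def fold_interval(hourly, interval_table):
    """Add one interval's per-string byte counts into the longer-running map."""
    # Counter.update adds to existing counts instead of replacing them.
    hourly.update(interval_table)

# Hypothetical per-minute summaries keyed by hierarchical string.
minute_1 = {"/sampledomain1/computer1/port1": 100}
minute_2 = {"/sampledomain1/computer1/port1": 40, "/sampledomain2/computer1/port1": 10}

for table in (minute_1, minute_2):
    fold_interval(hourly_bytes, table)
```

The repeat visit in the second minute adds to the existing record rather than creating a new one.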
[0051] Exemplary accumulating map 450 depicts flow information for
another string--source IP address and destination IP address
combinations. In IPv4 addressing alone, there are approximately four
billion (2^32) possibilities for source IP addresses and approximately
four billion possibilities for destination IP addresses. A table of
all possible combinations of these IP addresses would be unwieldy to
collect and maintain. Further, the values for most combinations at a
particular network appliance 110 would be zero.
Thus, to maintain large volumes of data in a scalable way, the
accumulating map 450 only collects information regarding IP
addresses actually used as a source or destination, instead of
every possible combination of IP addresses.
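The scale of the difference can be worked through directly. In the sketch below, the contents of the observed map are hypothetical sample values.

```python
IPV4_ADDRESSES = 2 ** 32           # 4,294,967,296 possible IPv4 addresses
ALL_PAIRS = IPV4_ADDRESSES ** 2    # every ordered source/destination pair, 2**64

# A sparse map holds only the pairs actually seen (hypothetical values).
observed = {"/1.2.3.4/5.6.7.8": 54, "/1.2.3.4/9.9.9.9": 13}
rows_needed = len(observed)
```

A dense table would need 2^64 (roughly 1.8 x 10^19) rows, while the sparse map needs one row per observed pair.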
[0052] The accumulating map 450 can be indexed in different
indexing structures, as would be understood by a person of ordinary
skill in the art. For example, a hash table can be used where the
key is the string and a hash of the string is computed to find a
hash bin. In that bin is a list of strings and their associated
values. Furthermore, there can be additional indexing to make
operations (like finding smallest value) fast, as discussed herein.
An accumulating map may comprise the contents of the table, such as
that depicted in 400 and 450, and additionally one or more indexing
structures and additional information related to the table. In some
embodiments, only the table itself from the accumulating map may be
transmitted to network information collector 180. In other
embodiments, some or all of the additional information, such as
indexing information, may be transmitted with the table.
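The hash table arrangement described above can be sketched as follows. The number of bins, the use of CRC-32 as the hash, and the function names are assumptions chosen for illustration; the disclosure does not specify a particular hash function.

```python
import zlib

NUM_BINS = 8  # hypothetical number of hash bins

# Each bin holds a list of [string, value] entries.
bins = [[] for _ in range(NUM_BINS)]

def bin_index(key):
    """Hash the string to select a bin (CRC-32 is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_BINS

def add_bytes(key, nbytes):
    """Add nbytes to the entry for key, creating the entry if it is absent."""
    bucket = bins[bin_index(key)]
    for entry in bucket:
        if entry[0] == key:
            entry[1] += nbytes
            return
    bucket.append([key, nbytes])

def lookup(key):
    """Return the aggregated value for key, or None if it is not present."""
    for stored_key, value in bins[bin_index(key)]:
        if stored_key == key:
            return value
    return None

add_bytes("/1.2.3.4/5.6.7.8", 54)
add_bytes("/1.2.3.4/5.6.7.8", 10)
```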
[0053] The information from an accumulating map can be collected
from the network appliances and then stored in database(s) 130,
which may be a relational database. The scheme can use raw
aggregated strings and corresponding values in columns of the
database(s) 130, or separate columns can be used for each flow
attribute of the string and its corresponding values. For example,
port, computer, and domain name can all be separate columns in a
relational database, rather than stored as one column for the
string.
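Both storage schemes can be sketched with an in-memory SQLite database. The table names, column names, and the store helper are hypothetical and are not part of the disclosure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Scheme 1: the raw aggregated string in a single column.
conn.execute(
    "CREATE TABLE acc_raw (hier_string TEXT PRIMARY KEY, bytes_rx INTEGER, packets_rx INTEGER)"
)
# Scheme 2: one column per flow attribute of the string.
conn.execute(
    "CREATE TABLE acc_split (domain TEXT, computer TEXT, port TEXT, "
    "bytes_rx INTEGER, packets_rx INTEGER)"
)

def store(hier_string, bytes_rx, packets_rx):
    """Store one accumulating map row under both schemes."""
    conn.execute("INSERT INTO acc_raw VALUES (?, ?, ?)", (hier_string, bytes_rx, packets_rx))
    domain, computer, port = hier_string.strip("/").split("/")
    conn.execute(
        "INSERT INTO acc_split VALUES (?, ?, ?, ?, ?)",
        (domain, computer, port, bytes_rx, packets_rx),
    )

store("/sampledomain1/computer1/port1", 150, 3)
store("/sampledomain1/computer1/port2", 30, 1)

# With separate columns, a roll-up by domain is a plain GROUP BY.
per_domain = conn.execute(
    "SELECT domain, SUM(bytes_rx) FROM acc_split GROUP BY domain"
).fetchall()
```

Separate columns make per-attribute queries straightforward, while the single-column scheme preserves the string exactly as the appliance produced it.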
[0054] The reporting engine 140 allows a user 150 or network
administrator to run a query and generate a report from information
in accumulating maps that was stored in database(s) 130. For
example, a user 150 can query which websites were visited by
searching "/domain/*". A user 150 can query the top traffic types
by searching "/*/traffic type". Multi-dimensional searches can also
be run on data in database(s) 130. For example, who are the top
talkers and which websites are they visiting? For the top
destinations, who is going there? For the top websites, what are
the top traffic types? A network administrator can configure the
system to aggregate selected flow information based specifically on
the most common types of queries that are run on network data.
Further, multi-dimensional queries can be run on this aggregated
information, even though the data is not stored in a
multi-dimensional format (such as a cube).
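Wildcard and top-N queries of this kind can be sketched over a small in-memory map. The sample data and the functions query and top_by_level are hypothetical; shell-style matching from Python's fnmatch module is used here as a stand-in for whatever query syntax the reporting engine actually supports.

```python
from collections import Counter
from fnmatch import fnmatchcase

# Hypothetical accumulating map contents: string -> bytes received.
acc_map = {
    "/sampledomain1/computer1/port1": 150,
    "/sampledomain1/computer1/port2": 30,
    "/sampledomain2/computer2/port1": 70,
}

def query(acc, pattern):
    """Return the records whose string matches a shell-style wildcard pattern.

    Note that '*' in fnmatch also matches '/', so it can span several levels.
    """
    return {key: value for key, value in acc.items() if fnmatchcase(key, pattern)}

def top_by_level(acc, level, n=1):
    """Total the metric by the attribute at the given level and return the top n."""
    totals = Counter()
    for key, value in acc.items():
        totals[key.strip("/").split("/")[level]] += value
    return totals.most_common(n)

visits_to_domain1 = query(acc_map, "/sampledomain1/*")
traffic_from_computer2 = query(acc_map, "/*/computer2/*")
top_domain = top_by_level(acc_map, level=0)
```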
[0055] Further, by collecting flow information for a certain time
interval in flow table 300 (e.g., once a minute), and aggregating
selected flow information into one or more accumulating maps for a
set time interval (e.g., once an hour) at the network appliance
110, only relevant flow information is gathered by network
information collector 180 and maintained in database(s) 130. This
allows for efficient scalability of a large number of network
appliances in a WAN, since the amount of information collected and
stored is significantly reduced, compared to simply collecting and
storing all information about all flows through every network
appliance for all time. Through an accumulating map, information
can be aggregated by time, appliance, traffic type, IP address,
website/domain, or any other attribute associated with a flow.
[0056] While the strings of an accumulating map are depicted herein
with slashes, the information can be stored in an accumulating map
in any format, such as other symbols or even no symbol at all. A
string can be composed of binary records joined together to make a
string, or normal ASCII text, Unicode text, or concatenations
thereof. For example, row 410 can be represented as "sampledomain1,
computer1, port1" or in any number of ways. Further, instead of
delimiting a string by characters, it can be delimited by links and
values. Information can also be sorted lexicographically.
[0057] FIG. 4B depicts exemplary information from a row of an
accumulating map. A string is composed of an attribute value 412
(such as 1.2.3.4) of a first attribute 411 (such as source IP
address), and an attribute value 414 (such as 5.6.7.8) of a second
attribute 413 (such as destination IP address). For each string of
information, there is an associated network characteristic 415
(such as number of bytes received) and its corresponding network
metric 416 (such as 54) and there can optionally be a second
network characteristic 417 (number of packets received) and its
corresponding network metric 418 (such as 13). While two network
characteristics are depicted here, there can be only one network
characteristic or three or more network characteristics. Similarly,
there can be fewer or additional attributes in a string. This
information can also be stored as a binary key string 419 as
depicted in the figure.
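One way a binary key could be formed for a source IP and destination IP string is sketched below. The layout (two packed 4-byte IPv4 addresses, concatenated) is an assumption for illustration and is not necessarily the layout of binary key string 419 in the figure.

```python
import ipaddress

def binary_key(src, dst):
    """Concatenate the 4-byte packed forms of two IPv4 addresses (assumed layout)."""
    return ipaddress.IPv4Address(src).packed + ipaddress.IPv4Address(dst).packed

def text_key(key):
    """Convert the 8-byte binary key back to a slash-delimited string."""
    src, dst = key[:4], key[4:]
    return "/%s/%s" % (ipaddress.IPv4Address(src), ipaddress.IPv4Address(dst))

key = binary_key("1.2.3.4", "5.6.7.8")

# The record pairs the key with its network metrics (values from the example above).
record = {key: {"bytes_received": 54, "packets_received": 13}}
```

A fixed-width binary key of this kind is shorter than its text form and needs no delimiter between attributes.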
[0058] Furthermore, while data is discussed herein as being
applicable to a particular flow, a similar mechanism can be
utilized to gather data for a tunnel, instead of just a flow. For
example, a string of information comprising
"/tunnelname/application/website" can be gathered in an
accumulating map. In this way, information regarding which tunnel a
flow goes into and which application is using that tunnel can be
collected and stored. Data packets can be encapsulated into tunnel
packets, and a single string may collect information regarding each
of these packets as a way of tracking tunnel performance.
[0059] In various embodiments, an accumulating map, such as map
400, can have a maximum or target number of rows or records that
can be maintained. Since one purpose of the accumulating map is to
reduce the amount of flow information that is collected,
transmitted, and stored, it can be advantageous to limit the size
of the accumulating map. Once a defined number of records is
reached, then an eviction policy can be applied to determine how
new entries are processed. The eviction policy can be triggered
upon reaching a maximum number of records, or upon reaching a lower
target number of records.
[0060] In one eviction policy, any new strings of flow information
that are not already in the accumulating map will simply be
discarded for that time interval, until a new accumulating map is
started for the next time interval.
[0061] In a second eviction policy, the strings of information that
constitute overflow are summarized into a log file, called an
eviction log. The eviction log can be post-processed and
transmitted to the network information collector 180 at
substantially the same time as information from the accumulating
map. Alternatively, the eviction log may be consulted only at a
later time when further detail is required.
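The first two eviction policies can be sketched together. The record limit, the function name add_bounded, and the use of a dictionary as the eviction log are assumptions for illustration.

```python
MAX_RECORDS = 2  # hypothetical limit, kept small so the example overflows

def add_bounded(acc, key, nbytes, eviction_log=None, max_records=MAX_RECORDS):
    """Add nbytes for key, handling overflow by discarding or by logging."""
    if key in acc:
        acc[key] += nbytes          # existing strings always keep aggregating
    elif len(acc) < max_records:
        acc[key] = nbytes           # room left: start a new record
    elif eviction_log is not None:
        # Second policy: summarize the overflow in an eviction log.
        eviction_log[key] = eviction_log.get(key, 0) + nbytes
    # Otherwise (first policy) the new string is dropped for this interval.

discarding = {}          # map using the first policy
logged, log = {}, {}     # map and eviction log using the second policy
for key, nbytes in [("/a", 10), ("/b", 20), ("/c", 5), ("/a", 1), ("/c", 7)]:
    add_bounded(discarding, key, nbytes)
    add_bounded(logged, key, nbytes, eviction_log=log)
```

Both maps end with the same two records; only the second retains the overflow for "/c", in its eviction log.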
[0062] In a third eviction policy, when a new string needs to be
added to an accumulating map, then an existing record can be moved
from the accumulating map into an eviction log to make space for
the new string which is then added to the accumulating map. The
determination of which existing record to purge from the
accumulating map can be based on a metric. For example, the
existing entry with the least number of bytes received can be
evicted. In various embodiments there can also be a time parameter
for this so that new strings have a chance to aggregate and build
up before automatically being evicted for having the lowest number
of bytes. That is, to avoid a situation where the newest entry is
constantly evicted, a time parameter can be imposed to allow for
any desired aggregation of flows for the string.
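The third eviction policy, including the time parameter, can be sketched as follows. The function name, the min_age parameter, and the fallback of logging the new string when no record is old enough to evict are assumptions; the disclosure does not specify what happens in that case.

```python
def add_with_eviction(acc, first_seen, eviction_log, key, nbytes, now,
                      max_records=3, min_age=60.0):
    """Add nbytes for key, evicting the smallest sufficiently old record if full."""
    if key in acc:
        acc[key] += nbytes
        return
    if len(acc) >= max_records:
        # Only records that have had min_age seconds to aggregate may be evicted.
        candidates = [k for k in acc if now - first_seen[k] >= min_age]
        if not candidates:
            # Assumed fallback: log the new string instead of evicting.
            eviction_log[key] = eviction_log.get(key, 0) + nbytes
            return
        victim = min(candidates, key=acc.get)   # least number of bytes
        eviction_log[victim] = eviction_log.get(victim, 0) + acc.pop(victim)
        del first_seen[victim]
    acc[key] = nbytes
    first_seen[key] = now

acc, first_seen, log = {}, {}, {}
add_with_eviction(acc, first_seen, log, "/a", 100, now=0.0, max_records=2)
add_with_eviction(acc, first_seen, log, "/b", 5, now=0.0, max_records=2)
add_with_eviction(acc, first_seen, log, "/c", 1, now=10.0, max_records=2)   # nothing old enough
add_with_eviction(acc, first_seen, log, "/c", 1, now=100.0, max_records=2)  # evicts "/b"
add_with_eviction(acc, first_seen, log, "/d", 50, now=120.0, max_records=2) # "/c" is protected
```

In the last call, "/c" has the fewest bytes but is too new to evict, so the older "/a" is evicted instead.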
[0063] In some embodiments, to find the existing entry with the
least number of bytes to be evicted, the whole accumulating map can
be scanned. In other embodiments, the accumulating map is already
indexed by metric value (such as via an additional indexing
structure maintained alongside a hash table), so the lowest value
can be easily found.
[0064] In further embodiments, information from an accumulating map
can be stored in bins such as those depicted in FIG. 5A. In the
exemplary embodiment of FIG. 5A, aggregated network metric values
of a network characteristic are displayed, and bins are labeled
with various numeric ranges, such as 0-10, 11-40 and 41-100. Each
network metric is associated with the bin of its numeric range.
Thus strings and their corresponding aggregated values can be
placed in an indexing structure for the accumulating map in
accordance with the metric value of their corresponding network
characteristic. As a network metric increases (for example from new
flows being aggregated into the string), or as a network metric
decreases (for example from some strings being evicted), then the
entry can be moved to a different bin in accordance with its new
numeric range. In an exemplary embodiment, the table of an
accumulating map is a first data structure, a bin is a second data
structure, and sorting operations can be conducted in a third data
structure.
[0065] Placing data from accumulating map 400 in bins allows for
eviction to occur from the lowest value bin with data. Any record
can be evicted from the lowest value bin with data, or the lowest
value bin can be scanned to find the entry with the lowest network
metric for eviction.
[0066] The bins can also be arranged in powers of two to cover
bigger ranges of values. For example, bins can have ranges of 0-1,
2-3, 4-7, 8-15, 16-31, 32-63, 64-127 and so on. In this way, the
information from the accumulating map does not need to be kept perfectly
sorted by network metric, which can require a significant amount of
indexing.
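The power-of-two bins can be sketched with integer bit lengths. The function names and the use of a dictionary of sets as the bin index are assumptions for illustration.

```python
from collections import defaultdict

def bin_index(value):
    """Map a non-negative integer metric to its bin: 0-1 -> 0, 2-3 -> 1, 4-7 -> 2, ..."""
    return max(value.bit_length() - 1, 0)

def bin_range(index):
    """Return the inclusive (low, high) range covered by a bin."""
    return (0, 1) if index == 0 else (2 ** index, 2 ** (index + 1) - 1)

bins = defaultdict(set)   # bin index -> strings currently in that bin
where = {}                # string -> bin index it currently occupies

def place(key, value):
    """Put key in the bin for value, moving it if its bin has changed."""
    old, new = where.get(key), bin_index(value)
    if old is not None and old != new:
        bins[old].discard(key)
    bins[new].add(key)
    where[key] = new

def evict_candidate():
    """Return any string from the lowest-valued bin that holds data, or None."""
    occupied = [index for index, keys in bins.items() if keys]
    if not occupied:
        return None
    return next(iter(bins[min(occupied)]))

place("/a", 54)   # falls in bin 32-63
place("/b", 13)   # falls in bin 8-15
place("/b", 20)   # metric grew, so "/b" moves to bin 16-31
```

Within a bin the entries are unordered, which is the trade-off described above: an eviction candidate is found quickly without keeping the whole map sorted.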
[0067] In another exemplary embodiment, space can be freed up in an
accumulating map by combining multiple records that have common
attributes. For example, in the accumulating map of FIG. 5B, there
are two entries with the same domain and computer, but different
port numbers. The data from these entries can be combined by
keeping the domain and computer in the string, but removing the
port numbers. In this way, two or more records in the accumulating
map with common flow attributes can be aggregated into one record
by removing the uncommon attributes from the record. The bytes
received and packets received for the new condensed record are an
aggregation of the previous separate records. In this way, some
granularity may be lost from the accumulating map, but the
information that is lost is the least important as defined by the
order of attributes in the string: a lower level of the string is
removed while a higher level is kept.
Alternatively, of the two entries with the same domain and computer
but different port numbers, the record with the lowest number of
bytes may simply be evicted from the accumulating map and added to
the eviction log. There can also be a time interval allotted to the
record before it is evicted to allow flow data to be aggregated for
that string before eviction.
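Combining records that share their higher-level attributes can be sketched as follows. The function name collapse_last_level is hypothetical, and the sketch assumes every string in the map has the same number of levels.

```python
from collections import defaultdict

def collapse_last_level(acc):
    """Merge records that differ only in their last attribute.

    Assumes every key has the same number of levels.
    """
    groups = defaultdict(list)
    for key in acc:
        parent = key.rsplit("/", 1)[0]   # drop the lowest level, e.g. the port
        groups[parent].append(key)

    condensed = {}
    for parent, keys in groups.items():
        if len(keys) == 1:
            condensed[keys[0]] = dict(acc[keys[0]])   # nothing to combine
        else:
            condensed[parent] = {
                "bytes": sum(acc[k]["bytes"] for k in keys),
                "packets": sum(acc[k]["packets"] for k in keys),
            }
    return condensed

acc_map = {
    "/sampledomain1/computer1/port1": {"bytes": 150, "packets": 3},
    "/sampledomain1/computer1/port2": {"bytes": 30, "packets": 1},
    "/sampledomain2/computer2/port1": {"bytes": 70, "packets": 2},
}
condensed = collapse_last_level(acc_map)
```

The two sampledomain1 records become one record without a port, freeing one row; the record with no sibling keeps its full string.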
[0068] In a fourth eviction policy, a batch eviction can be
conducted on the accumulating map to free up space. For example, a
determination may be made of which records are the least useful and
then those are evicted from the accumulating map and logged in the
eviction log. In an exemplary embodiment, an accumulating map may
be capable of having 10,000 records. A batch eviction may remove
1,000 records at a time. However, any number of records can be
moved in a batch eviction process, and an accumulating map size can
be set to any number of records. A batch eviction can also remove
one or more bins of information.
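A batch eviction can be sketched as follows. Treating the records with the fewest bytes as the least useful is an assumption, as are the function name and the dictionary-based eviction log.

```python
import heapq

def batch_evict(acc, eviction_log, batch_size):
    """Move the batch_size records with the fewest bytes into the eviction log."""
    victims = heapq.nsmallest(batch_size, acc, key=acc.get)
    for key in victims:
        eviction_log[key] = eviction_log.get(key, 0) + acc.pop(key)
    return victims

# Hypothetical map contents: string -> bytes received.
acc = {"/a": 5, "/b": 1, "/c": 9, "/d": 3}
log = {}
evicted = batch_evict(acc, log, batch_size=2)
```

Evicting in batches amortizes the cost of selecting victims over many insertions, rather than paying it each time a new string arrives.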
[0069] FIG. 6 depicts an exemplary method for building a
hierarchical string and aggregating the associated values, as
discussed herein. In step 610, information about network traffic
flows is collected at a network appliance. In step 620, an
attribute value of a first flow attribute is
extracted when the flow ends, or on a periodic basis. For example,
if a flow attribute is source IP address, then the attribute value
of the source IP address (such as 1.2.3.4) is extracted. An
attribute value of a second flow attribute can also be extracted.
There can be any number of flow attributes extracted from flow
information. In step 630, at least one hierarchical string is built
with the extracted attribute values. For example, source IP may be
a part of only one, or multiple different hierarchical strings.
Network metric(s) for the associated network characteristic(s) of
the hierarchical string(s) are extracted in step 640, and the
network metrics are aggregated for the different flows into an
accumulating map record for each hierarchical string in step 650.
For example, a string of "/source IP/destination IP" can be built
from the various source and destination IP address combinations
with the aggregated network metrics of the network characteristic
of number of bytes exchanged between each source IP and destination
IP combination.
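The steps of the method can be sketched end to end. The flow records, the "app" attribute, the chosen hierarchies, and the function name process_flows are hypothetical; the sketch tracks only a byte count and shows the source IP taking part in two different hierarchical strings.

```python
from collections import defaultdict

# Hypothetical string hierarchies; source IP is part of both.
HIERARCHIES = [("src_ip", "dst_ip"), ("src_ip", "app")]

def process_flows(flows, hierarchies=HIERARCHIES):
    """Build one accumulating map of byte counts per hierarchy."""
    maps = {h: defaultdict(int) for h in hierarchies}
    for flow in flows:                               # step 610: collected flow information
        for hierarchy in hierarchies:
            values = [flow[a] for a in hierarchy]    # step 620: extract attribute values
            key = "/" + "/".join(values)             # step 630: build hierarchical string
            metric = flow["bytes"]                   # step 640: extract network metric
            maps[hierarchy][key] += metric           # step 650: aggregate into the record
    return maps

flows = [
    {"src_ip": "1.2.3.4", "dst_ip": "5.6.7.8", "app": "web", "bytes": 54},
    {"src_ip": "1.2.3.4", "dst_ip": "5.6.7.8", "app": "voice", "bytes": 13},
    {"src_ip": "1.2.3.4", "dst_ip": "9.9.9.9", "app": "web", "bytes": 10},
]
maps = process_flows(flows)
```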
[0070] The aggregated information may be sent from each network
device to the network information collector 180 as discussed
herein. The information can be transmitted as raw data, or may be
subjected to processing such as encryption, compression, or other
type of processing. The network information collector 180 may
initiate a request for the data from each network appliance, or the
network appliance may send it automatically, such as on a periodic
basis after the passage of a certain amount of time (for example,
every minute, every 5 minutes, every hour, etc.).
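As one illustration of processing before transmission, the sketch below serializes an accumulating map and compresses it. The choice of JSON and zlib, and the function names, are assumptions; the disclosure does not name a wire format, and encryption is omitted here.

```python
import json
import zlib

def encode_map(acc):
    """Serialize the table to JSON and compress it for transmission."""
    return zlib.compress(json.dumps(acc, sort_keys=True).encode("utf-8"))

def decode_map(payload):
    """Reverse encode_map on the collector side."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))

# Hypothetical table contents: string -> bytes received.
acc = {"/sampledomain1/computer1/port1": 150, "/sampledomain1/computer1/port2": 30}
payload = encode_map(acc)
restored = decode_map(payload)
```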
[0071] While the method has been described in these discrete steps,
various steps may occur in a different order, or concurrently.
Further, this method may be practiced for each incoming flow or
outgoing flow of a network appliance.
[0072] Thus, methods and systems for aggregating select network
traffic statistics are disclosed. Although embodiments have been
described with reference to specific examples, it will be evident
that various modifications and changes can be made to these example
embodiments without departing from the broader spirit and scope of
the present application. Therefore, these and other variations upon
the exemplary embodiments are intended to be covered by the present
disclosure. Accordingly, the specification and drawings are to be
regarded in an illustrative rather than a restrictive sense.
* * * * *