U.S. patent application number 13/443682 was published by the patent office on 2012-10-11 for meta-event generation based on time attributes.
Invention is credited to Debabrata Dash, Pravin S. Kothari, Hugh S. Njemanze, Shijie Wang.
Application Number | 13/443682 |
Publication Number | 20120260306 |
Family ID | 46002224 |
Publication Date | 2012-10-11 |
United States Patent Application | 20120260306 |
Kind Code | A1 |
Njemanze; Hugh S.; et al. |
October 11, 2012 |
META-EVENT GENERATION BASED ON TIME ATTRIBUTES
Abstract
First stage meta-events are generated based on analyzing time
attributes of base events received from a network component. Second
stage meta-events are generated based on a number of the first
stage meta-events that have a time attribute falling within a time
period. An amount of time that has passed since a most-recent
second stage meta-event was generated is determined, and if a
threshold time period does not exceed the amount of time that has
passed since the most-recent second stage meta-event was generated,
a third stage meta-event is generated.
Inventors: | Njemanze; Hugh S. (Los Altos, CA); Kothari; Pravin S. (San Jose, CA); Dash; Debabrata (Sunnyvale, CA); Wang; Shijie (San Jose, CA) |
Filed: | April 10, 2012 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
10308767 | Dec 2, 2002 | 8176527 |
13443682 | | |
Current U.S. Class: | 726/1; 726/22; 726/23 |
Current CPC Class: | G06Q 10/06 20130101; G06F 21/554 20130101 |
Class at Publication: | 726/1; 726/22; 726/23 |
International Class: | G06F 21/00 20060101 G06F021/00 |
Claims
1-41. (canceled)
42. A computer-implemented method, comprising: generating first
stage meta-events based on analyzing time attributes of base events
received from at least one network component; generating second
stage meta-events based on a number of the first stage meta-events
that have a time attribute falling within a time period;
determining an amount of time that has passed since a most-recent
second stage meta-event was generated; and if a threshold time
period does not exceed the amount of time that has passed since the
most-recent second stage meta-event was detected, generating a
third stage meta-event.
43. The method of claim 42, wherein generating first stage
meta-events comprises: identifying a first rule that indicates a
threshold number of base events and a first time period;
determining how many base events include a time attribute that
falls within the first time period; and when the threshold number
of base events does not exceed the number of base events whose time
attributes fall within the first time period, generating a first
stage meta-event.
44. The method of claim 43, comprising: activating the first rule
dynamically.
45. The method of claim 43, comprising: detecting improper rule
syntax.
46. The method of claim 43, comprising: detecting a loop condition
generated by the first rule.
47. The method of claim 43, comprising: detecting rule
feedback.
48. The method of claim 43, comprising: filtering the base events
based on a condition before determining how many base events
include a time attribute that falls within the first time
period.
49. The method of claim 48, wherein filtering the base events based
on the condition comprises discarding the base events that do not
satisfy the condition.
50. The method of claim 42, wherein the at least one network
component comprises an intrusion detection system.
51. The method of claim 42, comprising: aligning time of the base
events generated by different network components.
52. The method of claim 42, comprising: performing an action
specified by the first rule to notify an individual of the
first-stage meta-events.
53. A system, comprising: data storage to store base events
received from at least one network component; and a processor to
generate first stage meta-events based on analyzing time attributes
of base events received from at least one network component,
generate second stage meta-events based on a number of the first
stage meta-events that have a time attribute falling within a time
period, determine an amount of time that has passed since a
most-recent second stage meta-event was generated, and if a
threshold time period does not exceed the amount of time that has
passed since the most-recent second stage meta-event was detected,
generate a third stage meta-event.
54. The system of claim 53, wherein the processor to generate first
stage meta-events comprises the processor to identify a first rule
that indicates a threshold number of base events and a first time
period, determine how many base events include a time attribute
that falls within the first time period, and when the threshold
number of base events does not exceed the number of base events
whose time attributes fall within the first time period, generate a
first stage meta-event.
55. The system of claim 54, wherein the processor is to activate
the first rule dynamically.
56. The system of claim 54, wherein the processor is to detect
improper rule syntax.
57. The system of claim 54, wherein the processor is to detect a
loop condition generated by the first rule.
58. The system of claim 54, wherein the processor is to filter the
base events based on a condition before determining how many base
events include a time attribute that falls within the first time
period.
59. The system of claim 58, wherein the processor is to discard the
base events that do not satisfy the condition.
60. A non-transitory computer readable medium storing machine
readable instructions that when executed perform instructions to:
generate first stage meta-events based on analyzing time attributes
of base events received from at least one network component;
generate second stage meta-events based on a number of the first
stage meta-events that have a time attribute falling within a time
period; determine an amount of time that has passed since a
most-recent second stage meta-event was generated; and if a
threshold time period does not exceed the amount of time that has
passed since the most-recent second stage meta-event was detected,
generate a third stage meta-event.
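The staged method recited in claims 42 and 43 can be illustrated with a minimal Python sketch. It is not the applicants' implementation. It assumes that each event is reduced to a numeric timestamp in seconds, reads "falling within a time period" as a trailing window ending at the newest event, and resets the window after each meta-event. The names `threshold_stage`, `third_stage` and `quiet_period` are hypothetical. One counting function serves both the first and second stages, because both compare a threshold against the number of events inside a window.

```python
from typing import List, Optional, Sequence


def threshold_stage(times: Sequence[float], threshold: int, window: float) -> List[float]:
    """Return the timestamps at which a meta-event is generated.

    A meta-event fires when the threshold does not exceed the number of
    events whose timestamps lie within `window` seconds of the newest one.
    Assumption: the window is reset after each firing so the same events
    are not counted twice.
    """
    ordered = sorted(times)
    fired: List[float] = []
    start = 0
    for end, t in enumerate(ordered):
        # Drop events that fall outside the trailing window ending at t.
        while t - ordered[start] > window:
            start += 1
        if threshold <= end - start + 1:
            fired.append(t)
            start = end + 1  # hypothetical reset policy
    return fired


def third_stage(second_stage: Sequence[float], now: float, quiet_period: float) -> Optional[float]:
    """Generate a third stage meta-event (returned as its timestamp) when the
    threshold `quiet_period` does not exceed the time elapsed since the most
    recent second stage meta-event; otherwise return None."""
    if not second_stage:
        return None
    elapsed = now - max(second_stage)
    return now if quiet_period <= elapsed else None


# Usage: base events arrive in three bursts.
base_events = [0, 1, 2, 10, 11, 12, 20, 21, 22]
first = threshold_stage(base_events, threshold=3, window=5)   # [2, 12, 22]
second = threshold_stage(first, threshold=2, window=15)       # [12]
```

With `second == [12]`, calling `third_stage(second, now=40, quiet_period=20)` returns 40 because 28 seconds have passed. Calling it with `now=25` returns None.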
Description
PRIORITY
[0001] This application is a continuation of U.S. patent
application Ser. No. 10/308,767, filed Dec. 2, 2002, entitled
"CORRELATION ENGINE WITH SUPPORT FOR TIME-BASED RULES", which is
incorporated by reference in its entirety.
FIELD OF THE INVENTION
[0002] The present invention relates to a computer-based system for
the identification and processing of security events from
heterogeneous sources, including a correlation engine with support
for time-based rules.
BACKGROUND
[0003] Computer networks and systems have become indispensable
tools for modern business. Today terabits of information on
virtually every subject imaginable are stored in and accessed
across such networks by users throughout the world. Much of this
information is, to some degree, confidential and its protection is
required. Not surprisingly then, intrusion detection systems (IDS)
have been developed to help uncover attempts by unauthorized
persons and/or devices to gain access to computer networks and the
information stored therein.
[0004] Intrusion detection may be regarded as the art of detecting
inappropriate, incorrect or anomalous activity within or concerning
a computer network or system. The most common approaches to
intrusion detection are statistical anomaly detection and
pattern-matching detection. IDS that operate on a host to detect
malicious activity on that host are called host-based IDS (and may
exist in the form of host wrappers/personal firewalls or
agent-based software), and those that operate on network data flows
are called network-based IDS. Host-based intrusion detection
involves loading software on the system (the host) to be monitored
and using log files and/or the host's auditing agents as sources of
data. In contrast, a network-based intrusion detection system
monitors the traffic on its network segment and uses that traffic
as a data source. Packets captured by the network interface cards
are considered to be of interest if they match a signature.
[0005] Regardless of the data source, there are two complementary
approaches to detecting intrusions: knowledge-based approaches and
behavior-based approaches. Almost all IDS tools in use today are
knowledge-based. Knowledge-based intrusion detection techniques
involve comparing the captured data to information regarding known
techniques to exploit vulnerabilities. When a match is detected, an
alarm is triggered. Behavior-based intrusion detection techniques,
on the other hand, attempt to spot intrusions by observing
deviations from normal or expected behaviors of the system or the
users (models of which are extracted from reference information
collected by various means). When a suspected deviation is
observed, an alarm is generated.
[0006] Advantages of the knowledge-based approaches are that they
have the potential for very low false alarm rates, and the
contextual analysis proposed by the intrusion detection system is
detailed, making it easier for a security officer using such an
intrusion detection system to take preventive or corrective action.
Drawbacks include the difficulty in gathering the required
information on the known attacks and keeping it up to date with new
vulnerabilities and environments.
[0007] Advantages of behavior-based approaches are that they can
detect attempts to exploit new and unforeseen vulnerabilities. They
are also less dependent on system specifics. However, the high
false alarm rate is generally cited as a significant drawback of
these techniques and because behaviors can change over time, the
incidence of such false alarms can increase.
[0008] With both knowledge-based and behavior-based systems,
matches are detected with the aid of a rules engine. Many current
rules engines implement a standard RETE algorithm because the rules
engine's performance is, in theory, independent of the number of
rules that are used.
[0009] Regardless of whether a host-based or a network-based
implementation is adopted and whether that implementation is
knowledge-based or behavior-based, an intrusion detection system is
only as useful as its ability to discriminate between normal system
usage and true intrusions (accompanied by appropriate alerts). If
intrusions can be detected and the appropriate personnel notified
in a prompt fashion, measures can be taken to avoid compromises to
the protected system. Otherwise such safeguarding cannot be
provided. Accordingly, what is needed is a system that can provide
accurate and timely intrusion detection and alert generation so as
to effectively combat attempts to compromise a computer network or
system.
SUMMARY OF INVENTION
[0010] A rules engine with support for time-based rules is
disclosed. A method performed by the rules engine comprises
receiving security events generated by a number of network devices.
The security events are aggregated. One or more time-based rules
are provided to a RETE engine. The aggregated security events are
provided to the RETE engine at specific times associated with the
time-based rules. The security events are cross-correlated with the
one or more time-based rules, and one or more first stage
meta-events are reported.
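One way to read "providing the aggregated security events to the RETE engine at specific times associated with the time-based rules" is shown in the sketch below. The RETE network is replaced by a plain predicate per rule. The `TimeRule` and `TimedFeeder` names and the fixed evaluation interval are hypothetical. Each rule is evaluated only when its next due time arrives, and only against the buffered events whose timestamps fall in the interval that just ended.

```python
import heapq
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class TimeRule:
    name: str
    interval: float                        # hypothetical: rule is evaluated every `interval` seconds
    matches: Callable[[List[dict]], bool]  # stand-in for the RETE match of this rule


class TimedFeeder:
    """Buffers aggregated events and hands them to each rule at its due times."""

    def __init__(self, rules: List[TimeRule]) -> None:
        self.rules: Dict[str, TimeRule] = {r.name: r for r in rules}
        self.buffer: List[dict] = []  # a real engine would also expire old events
        # Min-heap of (next due time, rule name).
        self.due: List[Tuple[float, str]] = [(r.interval, r.name) for r in rules]
        heapq.heapify(self.due)

    def add(self, event: dict) -> None:
        self.buffer.append(event)

    def advance(self, now: float) -> List[str]:
        """Evaluate every rule whose due time is <= now; return the meta-events reported."""
        reported: List[str] = []
        while self.due and self.due[0][0] <= now:
            due_at, name = heapq.heappop(self.due)
            rule = self.rules[name]
            window = [e for e in self.buffer if due_at - rule.interval < e["time"] <= due_at]
            if rule.matches(window):
                reported.append(f"{name}@{due_at}")
            # Schedule the rule's next evaluation.
            heapq.heappush(self.due, (due_at + rule.interval, name))
        return reported


# Usage: report a meta-event when three or more events land in one 10-second interval.
feeder = TimedFeeder([TimeRule("burst", 10, lambda evs: len(evs) >= 3)])
for t in (1, 2, 3, 15):
    feeder.add({"time": t})
```

The heap keeps the next due time of every rule in order, so `advance` touches only the rules that are actually due.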
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] The present invention is illustrated by way of example, and
not limitation, in the figures of the accompanying drawings in
which:
[0012] FIG. 1 illustrates one embodiment of a computer-based system
for capturing, normalizing and reporting security events from
heterogeneous sources configured in accordance with the present
invention;
[0013] FIG. 2 illustrates procedures followed by an agent
configured in accordance with an embodiment of the present
invention when collecting, normalizing and reporting security event
data;
[0014] FIG. 3 illustrates procedures followed by a manager
configured in accordance with an embodiment of the present
invention when analyzing security event data and generating alerts
based thereon;
[0015] FIG. 4 illustrates one embodiment of a rules engine in
accordance with the present invention; and
[0016] FIGS. 5A and 5B illustrate a set of procedures followed by a
rules engine in accordance with an embodiment of the present
invention.
DETAILED DESCRIPTION
[0017] Described herein is a computer-based system for the
identification and processing of security events from heterogeneous
sources, including a correlation engine with support for time-based
rules. The system (one embodiment of which is manifest as computer
software) implements a method that comprises receiving security
events generated by a number of network devices. The security
events are aggregated. One or more time-based rules are provided to
a RETE engine. The aggregated security events are provided to the
RETE engine at specific times associated with the time-based rules.
The security events are cross-correlated with the one or more
time-based rules, and one or more first stage meta-events are
reported.
[0018] Although the present system will be discussed with reference
to various illustrated examples, these examples should not be read
to limit the broader spirit and scope of the present invention. For
example, the examples presented herein describe distributed agents,
managers and consoles, which are but one embodiment of the present
invention. The general concepts and reach of the present invention
are much broader and may extend to any computer-based or
network-based security system. Also, examples of the messages that
may be passed to and from the components of the system and the data
schemas that may be used by components of the system are given in
an attempt to further describe the present invention, but are not
meant to be all-inclusive examples and should not be regarded as
such.
[0019] Some portions of the detailed description that follows are
presented in terms of algorithms and symbolic representations of
operations on data within a computer memory. These algorithmic
descriptions and representations are the means used by those
skilled in the computer science arts to most effectively convey the
substance of their work to others skilled in the art. An algorithm
is here, and generally, conceived to be a self-consistent sequence
of steps leading to a desired result. The steps are those requiring
physical manipulations of physical quantities. Usually, though not
necessarily, these quantities take the form of electrical or
magnetic signals capable of being stored, transferred, combined,
compared and otherwise manipulated. It has proven convenient at
times, principally for reasons of common usage, to refer to these
signals as bits, values, elements, symbols, characters, terms,
numbers or the like. It should be borne in mind, however, that all
of these and similar terms are to be associated with the
appropriate physical quantities and are merely convenient labels
applied to these quantities. Unless specifically stated otherwise,
it will be appreciated that throughout the description of the
present invention, use of terms such as "processing", "computing",
"calculating", "determining", "displaying" or the like, refer to
the action and processes of a computer system, or similar
electronic computing device, that manipulates and transforms data
represented as physical (electronic) quantities within the computer
system's registers and memories into other data similarly
represented as physical quantities within the computer system
memories or registers or other such information storage,
transmission or display devices.
[0020] As indicated above, one embodiment of the present invention
is instantiated in computer software, that is, computer readable
instructions, which, when executed by one or more computer
processors/systems, instruct the processors/systems to perform the
designated actions. Such computer software may be resident in one
or more computer readable media, such as hard drives, CD-ROMs,
DVD-ROMs, read-only memory, read-write memory and so on. Such
software may be distributed on one or more of these media, or may
be made available for download across one or more computer networks
(e.g., the Internet). Regardless of the format, the computer
programming, rendering and processing techniques discussed herein
are simply examples of the types of programming, rendering and
processing techniques that may be used to implement aspects of the
present invention. These examples should in no way limit the
present invention, which is best understood with reference to the
claims that follow this description.
[0021] Referring now to FIG. 1, an example of a computer-based
system 10 architected in accordance with an embodiment of the
present invention is illustrated. System 10 includes agents 12, one
or more managers 14 and one or more consoles 16 (which may include
browser-based versions thereof). In some embodiments, agents,
managers and/or consoles may be combined in a single platform or
distributed in two, three or more platforms (such as in the
illustrated example). The use of this multi-tier architecture
supports scalability as a computer network or system grows.
[0022] Agents 12 are software programs that provide efficient,
real-time (or near real-time) local event data capture and
filtering from a variety of network security devices and/or
applications. The primary sources of security events are common
network elements including firewalls, intrusion detection systems
and operating system logs. Agents 12 can collect events from any
source that produces event logs or messages and can operate at the
native device, at consolidation points within the network, and/or
through simple network management protocol (SNMP) traps.
[0023] Managers 14 are server-based components that further
consolidate, filter and cross-correlate events received from the
agents, employing a rules engine 18 and a centralized event
database 20. One role of manager 14 is to capture and store all of
the real-time and historic event data to construct (via database
manager 22) a complete, enterprise-wide picture of security
activity. The manager 14 also provides centralized administration,
notification (through one or more notifiers 24), and reporting, as
well as a knowledge base 28 and case management workflow. The
manager 14 may be deployed on any computer hardware platform and
one embodiment utilizes an Oracle™ database. Communications
between manager 14 and agents 12 may be bi-directional (e.g., to
allow manager 14 to transmit commands to the platforms hosting
agents 12) and encrypted. In some installations, managers 14 may
act as concentrators for multiple agents 12 and can forward
information to other managers (e.g., deployed at a corporate
headquarters).
[0024] Consoles 16 are computer- (e.g., workstation-) based
applications that allow security professionals to perform
day-to-day administrative and operational tasks such as event
monitoring, rules authoring, incident investigation and reporting.
Access control lists allow multiple security professionals to use
the same system and event database, with each having their own
views, correlation rules, alerts, reports and knowledge base
appropriate to their responsibilities. A single manager 14 can
support multiple consoles 16.
[0025] In some embodiments, a browser-based version of the console
16 may be used to provide access to security events, knowledge base
articles, reports, notifications and cases. That is, the manager 14
may include a web server component accessible via a web browser
hosted on a personal computer (which takes the place of console 16)
to provide some or all of the functionality of a console 16.
Browser access is particularly useful for security professionals
who are away from the consoles 16 and for part-time users.
Communication between consoles 16 and manager 14 is bi-directional
and may be encrypted.
[0026] Through the above-described architecture the present
invention can support a centralized or decentralized environment.
This is useful because an organization may want to implement a
single instance of system 10 and use an access control list to
partition users. Alternatively, the organization may choose to
deploy separate systems 10 for each of a number of groups and
consolidate the results at a "master" level. Such a deployment can
also achieve a "follow-the-sun" arrangement where geographically
dispersed peer groups collaborate with each other by passing
primary oversight responsibility to the group currently working
standard business hours. Systems 10 can also be deployed in a
corporate hierarchy where business divisions work separately and
support a rollup to a centralized management function.
[0027] Examining each of the various components in further detail,
we begin with the agents 12. Agents 12 are used to collect, reduce
and normalize the enormous amount of data that is generated by a
network's security devices before a manager 14 acts on the data. As
will become evident, this process goes beyond simple log
consolidation. Before presenting those details, however, and to
understand why such measures are desirable, some background
regarding how analysts currently cope with security event
information generated by multiple network devices is useful.
[0028] Conventional intrusion detection systems can help an analyst
detect an attack directed at a network resource such as a server.
Usually, such investigations are launched in response to an alert
generated by the IDS. As a first step after receiving such an
alert, an analyst might review perimeter router logs to see if a
router associated with the network passed a packet that triggered
the alert. If such a packet were discovered, the analyst would
likely then want to review one or more firewall logs to see if any
existing filters blocked the suspect packet. Assume, for the sake
of this example, the suspect packet got past any firewalls; further
investigation would be necessary to determine whether the integrity
of the server itself was compromised. Such an integrity check may
be performed using a conventional software application such as
Tripwire, which is a file integrity checker employing MD5
checksums, to see which files, if any, had been accessed or
modified. Finally, the analyst may have to examine a Syslog or an
EventLog from the subject server, as well as any tcpdump data
collected by a dedicated tcpdump host, for the segment of time
surrounding the attack to determine what actually happened.
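The checksum comparison performed by a file integrity checker such as Tripwire can be sketched as follows. This is a hypothetical illustration and does not reproduce Tripwire's code or configuration. It records an MD5 digest per file as a baseline and later reports files that were added, removed or modified. A content checksum reveals modification only. Detecting mere access would require other metadata, such as access times.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable


def md5_digest(path: Path) -> str:
    """MD5 of a file's contents, read in chunks to bound memory use."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(paths: Iterable[Path]) -> Dict[str, str]:
    """Digest for every regular file in `paths` (baseline or current state)."""
    return {str(p): md5_digest(p) for p in paths if p.is_file()}


def changed_files(baseline: Dict[str, str], current: Dict[str, str]) -> Dict[str, str]:
    """Compare two snapshots and classify each difference."""
    report: Dict[str, str] = {}
    for name in sorted(baseline.keys() | current.keys()):
        if name not in current:
            report[name] = "removed"
        elif name not in baseline:
            report[name] = "added"
        elif baseline[name] != current[name]:
            report[name] = "modified"
    return report
```

A baseline could be taken with `snapshot(Path("/etc").rglob("*"))` while the server is known to be good and compared after an alert. MD5 is no longer considered collision resistant, so a present-day checker would more likely use a SHA-2 digest.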
[0029] By this time the analyst has accessed many different systems
and looked at several different types of logs in an effort to
distill a comprehensive view of the attack. This can be a
significant amount of work, and time taken in such review and
analysis is time lost from the vitally important tasks of securing
the network and restoring the compromised server to make sure that
no other systems will be affected. The present invention helps to
minimize the time spent on such analysis by consolidating all the
relevant information in a single logging facility, allowing the
analyst to look at the data in whatever sequence or depth he or she
requires.
[0030] More than just consolidation, though, the present agents 12
provide data normalization, which is of great benefit when an
analyst must deal with security incidents in a heterogeneous
network environment. To understand why normalization is helpful,
consider a typical enterprise environment, which consists of many
different types of network devices ranging from border routers and
VPN devices to firewalls and authentication servers, and a wide
range of application servers such as web servers, e-mail servers
and database servers. Each of these devices generates logs that, as
described above, are sources of data to a security analyst.
However, it is seldom, if ever, the case that two manufacturers will
use the same event logging mechanism or format their event logs
identically. For example, a Cisco Systems PIX™ firewall will not
report an accepted packet in the same way as a Check Point firewall
or even in the same fashion as a Cisco Systems router.
[0031] An example of the types of various reports that might be
generated by different network devices is presented below in Table
1, which shows examples of logs from different network devices,
each reporting the same packet traveling across a network. In
particular, these logs represent a remote printer buffer overflow
that connects to IIS servers over port 80.
TABLE 1
Examples of Event Logs for Different Network Devices
Network Device | Event Log
Check Point firewall | "14" "21Dec2001" "12:10:29" "eth-s1p4c0" "ip.of.firewall" "log" "accept" "www-http" "65.65.65.65" "10.10.10.10" "tcp" "4" "1355" "" "" "" "" "" "" "" "" "" "firewall" " len 68"
Cisco Systems router | Dec 21 12:10:27: %SEC-6-IPACCESSLOGP: list 102 permitted tcp 65.65.65.65(1355) -> 10.10.10.10(80), 1 packet
Cisco Systems PIX firewall | Dec 21 2001 12:10:28: %PIX-6-302001: Built inbound TCP connection 125891 for faddr 65.65.65.65/1355 gaddr 10.10.10.10/80 laddr 10.0.111.22/80
Snort | [**] [1:971:1] WEB-IIS ISAPI .printer access [**] [Classification: Attempted Information Leak] [Priority: 3] 12/21-12:10:29.100000 65.65.65.65:1355 -> 10.10.10.10:80 TCP TTL:63 TOS:0x0 ID:5752 IpLen:20 DgmLen:1234 DF ***AP*** Seq: 0xB13810DC Ack: 0xC5D2E066 Win: 0x7D78 TcpLen: 32 TCP Options (3) => NOP NOP TS: 493412860 0 [Xref => http://cve.mitre.org/cgi-bin/cvename.cgi?name=CAN-2001-0241] [Xref => http://www.whitehats.com/info/IDS533]
[0032] The Check Point record contains the following fields: event
id, date, time, firewall interface, IP address of the firewall
interface, logging facility, action, service, source IP, target IP,
protocol, source port, some Check Point specific fields and then
the size of the datagram. This report is, to say the least,
difficult for a human analyst to read (especially with all the
empty fields that are represented by double quotes). The Cisco
router has a different format: date, time, logging facility, event
name, source IP, source port, target address, target port, and
number of packets. The Cisco PIX firewall, which is produced by the
same manufacturer as the router, uses yet another format: date,
time, event name, source IP, source port, translated address or
target address, target port, local address, and local port.
[0033] The final record is a Snort alert that claims this traffic
was malicious. Snort is a well-known IDS and the fields it
populates are: exploit or event name, classification, priority,
date, time, source IP, source port, target IP, target port,
protocol, TTL (time to live), type of service, ID, IP length,
datagram length, tcp flags, sequence number, acknowledgement
number, window size, and tcp length. Snort also reports additional
data such as references to investigate the exploit.
[0034] Agents 12 may be deployed in connection with some or all of
these (and other) network components and applications. For example,
in FIG. 1, agent 12a is deployed in connection with an IDS (such as
Snort). Agent 12b is deployed in connection with a firewall (such
as the Check Point firewall and/or the Cisco PIX firewall). Agent
12c is deployed in connection with other network components or
agents (e.g., a router). Each of these agents receives the event
information from its associated network device or application in
that device's or application's native format and converts (or
normalizes) the information to a common schema. This normalization
allows for later storage of the event information in a format that
can more readily be utilized by an analyst.
[0035] Many normalized schemas can be used and, in general,
choosing the fields of a common schema may be based on content
rather than semantic differences between device logs and/or
manufacturers. To accomplish this normalization, agents 12 are
equipped with a parser configured to extract values from the events
as reported by the individual network devices/applications and
populate the corresponding fields in the normalized schema. Table 2
is an example of a normalized schema for the data reported by the
devices in Table 1.
TABLE 2
Common Schema Representation of Event Data
Date | Time | Event Name | Src_IP | Src_Port | Tgt_IP | Tgt_Port | Device Type | Additional data
Dec. 21, 2001 | 12:10:29 | accept | 65.65.65.65 | 1355 | 10.10.10.10 | 80 | Check Point |
Dec. 21, 2001 | 12:10:27 | list 102 permitted tcp | 65.65.65.65 | 1355 | 10.10.10.10 | 80 | Cisco Router |
Dec. 21, 2001 | 12:10:28 | built inbound tcp connection | 65.65.65.65 | 1355 | 10.10.10.10 | 80 | Cisco PIX |
Dec. 21, 2001 | 12:10:29 | WEB-IIS ISAPI .printer access | 65.65.65.65 | 1355 | 10.10.10.10 | 80 | Snort | TCP TTL: 63 TOS: 0x0 ID: 5752 IpLen: 20 DgmLen: 1234 DF ***AP*** Seq: 0xB13810DC Ack: 0xC5D2E066 Win: 0x7D78 TcpLen: 32 TCP Options (3) => NOP NOP TS: 493412860 0
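A parser of the kind described in paragraph [0035] can be sketched for the Cisco router record in Table 1. The regular expression below is an assumption fitted to that single sample line and does not describe Cisco's log format in full. The router record omits the year, so the caller must supply it (hypothetically, from the agent's own clock). The output keys follow the Table 2 columns.

```python
import re
from typing import Dict, Optional, Union

# Hypothetical pattern matching the Cisco router access-list record shown in Table 1.
CISCO_ACL = re.compile(
    r"(?P<month>[A-Z][a-z]{2}) (?P<day>\d{1,2}) (?P<time>\d{2}:\d{2}:\d{2}): "
    r"%(?P<facility>[A-Z0-9-]+): "
    r"(?P<name>list \d+ \w+ \w+) "
    r"(?P<src_ip>[\d.]+)\((?P<src_port>\d+)\) -> "
    r"(?P<tgt_ip>[\d.]+)\((?P<tgt_port>\d+)\)"
)


def parse_cisco_acl(line: str, year: int) -> Optional[Dict[str, Union[str, int]]]:
    """Map one router record onto the common schema of Table 2, or return None."""
    m = CISCO_ACL.match(line)
    if m is None:
        return None
    return {
        "date": f"{m['month']}. {int(m['day'])}, {year}",
        "time": m["time"],
        "event_name": m["name"],
        "src_ip": m["src_ip"],
        "src_port": int(m["src_port"]),
        "tgt_ip": m["tgt_ip"],
        "tgt_port": int(m["tgt_port"]),
        "device_type": "Cisco Router",
    }


record = parse_cisco_acl(
    "Dec 21 12:10:27: %SEC-6-IPACCESSLOGP: list 102 permitted tcp "
    "65.65.65.65(1355) -> 10.10.10.10(80), 1 packet",
    year=2001,
)
```

Each device family would need its own pattern of this kind. This is why the agents, rather than the manager, carry device-specific parsers.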
[0036] Table 2 reports the same four events described earlier, this
time in a normalized fashion. Each of the agents 12 is configured
to extract the relevant data from events reported by its associated
network device/application and map that data to the corresponding
common schema representation. For instance, the Check Point firewall
reports a target port as www-http, not as port 80 as is the case
for most other network devices. Therefore, an agent 12 associated
with the Check Point firewall is configured with an appropriate
lookup mechanism (e.g., a table) to ensure that "www-http" as
reported by the firewall gets translated into "port 80" when the
agent 12 reports the event to the manager 14.
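The lookup mechanism can be as simple as a dictionary that is consulted when a field holds a service name instead of a number. The table below is a hypothetical excerpt. A real agent would carry a much fuller mapping or consult the host's services database. Unknown names raise an error instead of being guessed.

```python
from typing import Dict, Union

# Hypothetical excerpt of a service-name table (well-known TCP ports).
SERVICE_PORTS: Dict[str, int] = {"www-http": 80, "http": 80, "https": 443, "ssh": 22}


def normalize_port(value: Union[str, int]) -> int:
    """Return a numeric port for either a number or a service name."""
    if isinstance(value, int):
        return value
    text = value.strip().lower()
    if text.isdigit():
        return int(text)
    if text not in SERVICE_PORTS:
        raise ValueError(f"unknown service name: {value!r}")
    return SERVICE_PORTS[text]
```

For example, `normalize_port("www-http")` returns 80, which is the value the Check Point agent would place in the Tgt_Port field of Table 2.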
[0037] Similarly, the agents 12 may need to be configured to
convert the date/time stamp formats used by the various network
devices/applications into a common date/time representation. That
is, because the different network devices/applications all use
different date/time formats, the agents cannot simply report the
date/time stamps reported by the device/application. Instead, the
agents 12 may be configured to convert local date/time stamps to a
universal date/time notation, such as Greenwich Mean Time.
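Conversion to a universal notation can be sketched with Python's standard `datetime` module. The agent would be configured with each device's timestamp format and UTC offset; the -8 hour offset used below is purely illustrative. A fixed offset ignores daylight-saving changes, so `zoneinfo` could be used where named time zones are known. The result is expressed in UTC, which for this purpose coincides with Greenwich Mean Time.

```python
from datetime import datetime, timedelta, timezone


def to_utc(stamp: str, fmt: str, utc_offset_hours: float) -> str:
    """Parse a device-local timestamp and render it in UTC."""
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    local = datetime.strptime(stamp, fmt).replace(tzinfo=local_zone)
    return local.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


# Formats of the Cisco PIX and Check Point records in Table 1.
PIX_FORMAT = "%b %d %Y %H:%M:%S"       # e.g. "Dec 21 2001 12:10:28"
CHECKPOINT_FORMAT = "%d%b%Y %H:%M:%S"  # e.g. "21Dec2001 12:10:29" (date and time fields joined)
```

For example, `to_utc("Dec 21 2001 12:10:28", PIX_FORMAT, -8)` returns `"2001-12-21 20:10:28"`.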
[0038] In addition to normalizing event data by fields, agents 12
can parse the event data stream and set field values based on
conventions and practices of the organization. For example, the
variety of event severity levels that devices produce can all be
normalized at the agent level into a single, consistent
hierarchy.
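Severity normalization can be sketched as one small mapping function per device family, each feeding a single organization-wide scale. The four-level scale and its cut-off points are hypothetical policy choices. The inputs follow two established conventions: Cisco syslog severities run from 0 (emergencies) to 7 (debugging), and Snort priority 1 is the most severe.

```python
from typing import Callable, Dict

# Hypothetical organization-wide hierarchy, least to most severe.
LEVELS = ("low", "medium", "high", "very-high")


def from_syslog(level: int) -> str:
    """Cisco-style syslog severity, 0 (emergencies) .. 7 (debugging)."""
    if level <= 1:
        return "very-high"
    if level <= 3:
        return "high"
    if level <= 5:
        return "medium"
    return "low"


def from_snort_priority(priority: int) -> str:
    """Snort priority, where 1 is the most severe."""
    return {1: "high", 2: "medium"}.get(priority, "low")


NORMALIZERS: Dict[str, Callable[[int], str]] = {
    "cisco": from_syslog,
    "snort": from_snort_priority,
}


def normalize_severity(device_family: str, raw: int) -> str:
    return NORMALIZERS[device_family](raw)


def rank(level: str) -> int:
    """Position in LEVELS, so normalized severities can be compared."""
    return LEVELS.index(level)
```

Under this mapping, both the router record in Table 1 (`%SEC-6-...`, severity 6) and the Snort alert (priority 3) normalize to "low".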
[0039] Thus, agents 12 collect and process events generated by
heterogeneous network devices/applications throughout an
enterprise. Alerts can come from routers, e-mail logs, anti-virus
products, firewalls, intrusion detection systems, access control
servers, VPN systems, NT Event Logs, Syslogs, and other sources
where security threat information is detected and reported. In some
embodiments, each event generator has an agent 12 assigned to
collect all relevant security information, while in other
embodiments agents are shared among two or more event generators.
Thus, depending on the device/application to be monitored and the
in-place infrastructure, a choice is provided for simple log
parsing and loading, network listening (e.g., through SNMP traps),
installation on aggregation points (Syslog servers and
concentrators) and full distribution to all security-relevant
devices.
[0040] In addition to collecting and normalizing data from security
devices, the agents 12 intelligently manage the data with:
[0041] Filtering: each agent 12 can be configured according to conditions
by which data will be collected and sent to the manager 14. This
helps to reduce the need to collect and manage large volumes of
unwanted data.
[0042] Aggregation: based on the time period selected, the agents 12
can collect duplicate alerts but send only a single message with a
count of the total number of such alerts to the manager 14. This
helps reduce the amount of traffic transmitted across the network.
[0043] Batching: agents 12 can be configured to send a collection of
alerts at one time rather than sending alerts immediately after each
occurrence.
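The three data-management functions can be combined in one short sketch. It assumes events are dictionaries with hypothetical "name", "source_ip", and "time" keys. It also assumes that "duplicate" means the same event name from the same source within one aggregation period. Real agents may define duplicates and periods differently.

```python
from collections import Counter


def prepare_transmissions(events, keep, period):
    """Filter, aggregate, and batch events the way an agent might before sending them.

    events: iterable of dicts with "name", "source_ip", and "time" (seconds) keys.
    keep:   filter condition; events for which it returns False are never sent.
    period: aggregation period in seconds.
    Returns one batch per period; each batch is a list of messages with counts.
    """
    per_period = {}
    for event in events:
        if not keep(event):                                   # Filtering
            continue
        bucket = int(event["time"] // period)
        key = (event["name"], event["source_ip"])
        per_period.setdefault(bucket, Counter())[key] += 1    # Aggregation
    # Batching: everything from one period goes out in a single transmission.
    return [
        [{"name": name, "source_ip": ip, "count": count}
         for (name, ip), count in sorted(counts.items())]
        for _, counts in sorted(per_period.items())
    ]
```

For example, three identical port-scan alerts in the same 60-second period produce one message with a count of 3 instead of three separate messages.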
[0044] FIG. 2 illustrates the various processes performed by agents
12 from the point of view of the event information. Initially, at
step 30, the raw event information is received or collected from
the native network device or application in that
device's/application's native format. At this point (or,
optionally, following normalization), data filters may be applied
to reduce the volume of data being passed for further analysis
(step 32). Such filtering is optional and may involve assessing the
captured data against one or more conditions to determine whether
or not the data is relevant for further analysis.
[0045] Thereafter, the event data is normalized at step 34. As
indicated above, the normalization may occur at the field and/or
the field value level. Further, the normalization may involve
translation of the field values into nomenclatures/formats used
across an enterprise.
[0046] Following normalization, the event data may, optionally, be
aggregated (step 36) before being transmitted to the manager 14
(step 38). The transmissions may occur as the events are captured
or may be made on a batched basis. In either case, the messages
used to transmit the event data preferably include all of the
source fields of an event. By delivering the entire event data set
(i.e., all of the source fields) organized in a consistent format
(i.e., the common schema), powerful upstream data management,
cross-correlation, display, and reporting are available to the
security team. In some embodiments the event data is discarded
after successful transmission to the manager 14, but in other cases
the data may be cached for a time at the agent 12 to permit later
replay of the data.
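The FIG. 2 flow, together with the optional cache kept for later replay, might look like the sketch below. The normalize and send hooks are hypothetical stand-ins supplied by the caller. Aggregation (step 36) is omitted for brevity. The bounded deque is one possible way to cache data "for a time", not necessarily the mechanism used in practice.

```python
from collections import deque


class Agent:
    """Sketch of the FIG. 2 flow: collect (30), filter (32), normalize (34), transmit (38).

    normalize and send are hypothetical hooks supplied by the caller; aggregation
    (step 36) is omitted. A bounded cache keeps recently sent events for replay.
    """

    def __init__(self, normalize, send, keep=lambda raw: True, cache_size=1000):
        self.normalize = normalize
        self.send = send
        self.keep = keep
        self.cache = deque(maxlen=cache_size)  # oldest entries fall off automatically

    def handle(self, raw):
        """Step 30: a raw event in the device's native format arrives here."""
        if not self.keep(raw):          # step 32: optional filtering
            return
        event = self.normalize(raw)     # step 34: map every source field to the common schema
        self.send(event)                # step 38: transmit to the manager
        self.cache.append(event)        # retained for later replay

    def replay(self):
        """Re-send the cached events in their original order."""
        for event in list(self.cache):
            self.send(event)
```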
[0047] Referring again to FIG. 1, the manager 14 includes one or
more agent managers 26, which are responsible for receiving the
event data messages transmitted by the agents 12. Where
bi-directional communication with the agents 12 is implemented,
these agent managers 26 may be used to transmit messages to the
agents 12. If encryption is employed for agent-manager
communications (which is optional), the agent manager 26 is
responsible for decrypting the messages received from agents 12 and
encrypting any messages transmitted to the agents 12.
[0048] Once the event data messages have been received (and if
necessary decrypted), the event data is passed to the rules engine
18. Rules engine 18 is at the heart of the manager 14 and is used
to cross-correlate the event data with security rules in order to
generate meta-events. Meta-events, in the context of the present
invention, are instances of (usually) multiple individual event
data elements (gathered from heterogeneous sources) that
collectively satisfy one or more rule conditions such that an
action is triggered. Stated differently, the meta-events represent
information gathered from different sensors and presented as
correlated results (i.e., the decision output of the rules engine
18 indicating that different events from different sources are
associated with a common incident as defined by one or more
rules).
[0049] The actions triggered by the rules may include notifications
transmitted (e.g., via notifier 24) to designated destinations
(e.g., security analysts may be notified via the consoles 16,
e-mail messages, a call to a telephone, cellular telephone,
voicemail box and/or pager number or address, or by way of a
message to another communication device and/or address such as a
facsimile machine, etc.) and/or instructions to network devices
(e.g., via agents 12 or via external scripts or programs to which
the notifier 24 may pass arguments) to take action to thwart a
suspected attack (e.g., by reconfiguring one or more of the network
devices and/or modifying or updating access lists, etc.). The
information sent with the notification can be configured to include
the most relevant data based on the event that occurred and the
requirements of the analyst. In some embodiments, unacknowledged
notifications will result in automatic retransmission of the
notification to another designated operator.
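Automatic retransmission of unacknowledged notifications can be modeled as an escalation chain. In this sketch, send and acknowledged are hypothetical hooks. They stand in for pager, e-mail, or console delivery and for waiting a configured time for an acknowledgment.

```python
def notify_until_acknowledged(alert, operators, send, acknowledged):
    """Send `alert` to each designated operator in turn until one acknowledges.

    send(operator, alert) delivers the notification; acknowledged(operator, alert)
    returns True if that operator acknowledged within the allowed time. Both are
    hypothetical hooks. Returns the operator who acknowledged, or None.
    """
    for operator in operators:
        send(operator, alert)
        if acknowledged(operator, alert):
            return operator
    return None  # nobody acknowledged; the caller may log or escalate further
```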
[0050] As discussed below, when meta-events are generated by the
rules engine 18, on-screen notifications may be provided to
consoles 16 to prompt users to open cases for investigation of the
events which led to the notification. This may include accessing
knowledge base 28 to gather information regarding similar attack
profiles and/or to take action in accordance with specified
procedures. The knowledge base 28 contains reference documents
(e.g., in the form of web pages and/or downloadable documents) that
provide a description of the threat, recommended solutions,
reference information, company procedures and/or links to
additional resources. Indeed, any information can be provided
through the knowledge base 28. By way of example, these
pages/documents can have as their source: user-authored articles,
third-party articles, and/or security vendors' reference
material.
[0051] The rules engine 18 is based on a RETE engine configured to
preserve event information state over configurable time windows so
as to provide correlation of the event data according to specified
rules. Correlation is generally regarded as a process of bringing
information items into mutual relation. In the context of the
present invention, correlation through rules engine 18 provides the
ability to access, analyze, and relate different attributes of
events from multiple sources to bring something to the attention of
an analyst that might (or likely would) have otherwise gone
unnoticed. In other words, the rules engine 18 provides the ability
to determine what type of incident is represented by a collection
of events reported by a number of heterogeneous network devices
and/or applications. Because the collected event data is normalized
into a common event schema, correlation can be performed utilizing
any field including, but not limited to, geography, device type,
source, target, time thresholds, and/or event type. Based on alerts
generated by the rules engine 18, operators are provided with a
workflow for investigating these incidents.
[0052] Turning to FIG. 3, the manager 14 receives (step 40) and
analyzes (step 42) the event data reported by agents 12 in
real-time (or near real-time owing to network latencies and
depending upon whether or not batched message transmission is used)
according to a set of flexible rules. The rules define which events
generate an alert, when those events generate an alert, and what
actions are associated with the alert. Hence, the rules may be
written to contain event conditions, thresholds, and actions. In
some embodiments the rule conditions may be specified using Boolean
operators and/or database queries. When incoming events match a
particular rule's conditions and thresholds, causing a meta-event
to be generated (step 44), the rule automatically fires the action
that has been defined (step 46). Such actions can include, but are
not limited to: executing a pre-determined command or script,
logging the alert, sending the alert to the consoles 16, sending
the alert to notification designees, setting custom severity levels
for the alert based on cumulative activity, adding a source to a
suspicious list or a target to a vulnerable list, and/or a
combination of these actions.
[0053] Rules may be created at the manager 14 and/or at the
consoles 16 using a flexible scripting language. An example of a
rule might be:
[0054] If(an ids evasion attack) occurs (from the same source ip
address) (3 times) within (2 minutes) then (send message to console)
and (notify the security supervisor via pager).
In this example, the incoming event data would be compared against
the rule conditions and thresholds (in the above example 3 events
that satisfy the condition of an IDS evasion attack are required
and all must originate from a common source IP address and be
detected within 2 minutes of each other), and if those criteria are
satisfied the designated actions (here, sending an alert message to
the consoles 16 and also notifying a security supervisor via a
pager) would be performed. The correlation rules that operate on
the events evaluate threats and attacks according to selected
criteria (e.g., degree of threat, level of success, vulnerability
of target and value of target) and generate alerts according to a
security intelligence taxonomy that focuses attention on the most
dangerous and potentially most damaging attacks. For example,
threats to network assets that are deemed not to have succeeded or
that are not likely to succeed may be coded green, while those that
have succeeded or have a high probability of success might be coded
red. The value of the security intelligence taxonomy lies in its
ability to eliminate false positives while clearly identifying real
threats to vulnerable and valuable assets.
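The threshold check in the example rule (3 IDS evasion events from the same source IP address within 2 minutes) can be sketched as a sliding window per source address. The sketch assumes events arrive in time order as dictionaries with hypothetical "category", "source_ip", and "time" keys. It is not the scripting language mentioned above. The rule's conditions, threshold, and actions are simply passed in as Python callables and numbers.

```python
from collections import defaultdict, deque


class ThresholdRule:
    """Fire an action when `count` matching events share a key within `window` seconds.

    Assumes events arrive in time order; condition, key, and action are caller-supplied.
    """

    def __init__(self, condition, count, window, key, action):
        self.condition = condition
        self.count = count
        self.window = window
        self.key = key
        self.action = action
        self.recent = defaultdict(deque)   # key -> times of recent matching events

    def on_event(self, event):
        if not self.condition(event):
            return False
        times = self.recent[self.key(event)]
        times.append(event["time"])
        # Drop matches too old to be "within window of each other" (the newest is never dropped).
        while event["time"] - times[0] > self.window:
            times.popleft()
        if len(times) >= self.count:
            self.action(self.key(event), list(times))
            times.clear()                  # start counting afresh after firing
            return True
        return False


# The rule from the text: 3 IDS evasion events, same source IP, within 120 seconds.
ids_evasion_rule = ThresholdRule(
    condition=lambda e: e["category"] == "ids-evasion",
    count=3,
    window=120,
    key=lambda e: e["source_ip"],
    action=lambda ip, times: print(f"console + pager: IDS evasion x{len(times)} from {ip}"),
)
```

Keeping a separate window per key is what enforces "from the same source IP address". Events from different sources never count toward each other's threshold.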
[0055] In general, the rules may be designed to capture threats and
attacks that are typical in large, diverse networks and may be
organized to provide multiple lines of defense by detecting
specific activities and grouping them according to level of threat:
[0056] Reconnaissance: zone transfer, port scan, protocol scanning, etc.
[0057] Suspicious: illegal outgoing traffic, unusual levels of alerts
from the same host, etc.
[0058] Attack: overflow, IDS evasion, virus, denial of service, etc.
[0059] Successful compromise: backdoor, root compromise, covert
channel exploit, etc.
Similar events and signatures may be grouped into rule
categories that can be utilized by the rules to insulate the rule
from changes in vendor-specific event details. For example, event
names may change between product releases or new devices may be
added to the network infrastructure with a new set of nomenclature.
Since the rule categories map similar signatures into a single name
that is used by the rules engine, if an individual network device
changes taxonomy, only the mapping is changed, not the rule
definition. Therefore, despite changes in individual devices, the
investment in custom defined rules is preserved.
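The insulating effect of rule categories can be shown in a few lines. The device names, signature strings, and category label below are hypothetical. The point is that the rule tests only the category, so a renamed vendor signature requires one new mapping entry and no change to the rule.

```python
# Hypothetical signature-to-category mapping, maintained separately from the rules.
CATEGORY_MAP = {
    ("ids-vendor-a", "TCP_PORTSCAN"): "Reconnaissance/PortScan",
    ("ids-vendor-b", "portsweep"): "Reconnaissance/PortScan",
}


def category_of(event):
    """Look up the rule category for a vendor-specific signature."""
    return CATEGORY_MAP.get((event["device"], event["signature"]), "Unknown")


def port_scan_rule(event):
    # The rule refers only to the category, never to vendor signature names.
    return category_of(event) == "Reconnaissance/PortScan"
```

If vendor A renames its signature in a new release, adding `CATEGORY_MAP[("ids-vendor-a", "<new name>")] = "Reconnaissance/PortScan"` is enough; `port_scan_rule` stays as written.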
[0060] After the events are processed by rules engine 18, the raw
event data as well as any meta-events that were generated are
stored in database 20 (step 48). In some embodiments, the raw event
data may be stored prior to or concurrently with processing of the
data by rules engine 18. Regardless of the sequence, such storage
of the event data (and the meta-events generated by the rules
engine 18) preserves a historical record of the event traffic and
allows for replaying of the events through an existing or a new
rule set (either at the manager 14 or the consoles 16) in order to
assess the efficacy of new rules, for training purposes, and/or for
case investigation.
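Replaying stored events through an existing or new rule set can be sketched as below. It assumes stored events carry hypothetical "id" and "time" fields and that each rule is a named predicate. Events are sorted by time first because stateful, time-based rules depend on the original order.

```python
def replay(stored_events, rules):
    """Re-run stored events, in original time order, through a (possibly new) rule set.

    rules: mapping of rule name -> predicate on a single event.
    Returns (rule name, event id) pairs for every match, in replay order.
    """
    fired = []
    for event in sorted(stored_events, key=lambda e: e["time"]):
        for name, rule in rules.items():
            if rule(event):
                fired.append((name, event["id"]))
    return fired
```

Comparing the output of an old and a new rule set on the same stored events is one simple way to assess the efficacy of new rules before activating them.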
[0061] Correlation via the rules ensures that credible threats and
attacks come to the attention of the security staff on a
high-priority basis. Hence, once an alert is received, the operator
can perform in-depth analysis and take aggressive action secure in
the knowledge that the effort is well spent. When a rule match is
reported to a console 16, the analyst can quickly drill down
(through an associated graphical user interface) to see all of the
individual events that caused the rule to fire. If necessary, the
analyst can investigate even further to see all of the individual
data elements captured for those events.
[0062] When action is required, the present invention provides a
full set of tools and services for the operator. Resources such as
the rule definition, a knowledge base article containing company
policies and recommended actions, and the development of a complete
case docket describing the problem assist the operator in
responding immediately to critical security threats. If necessary,
the operator can proactively deal with an attack by launching
specific applications or scripts from the console 16 to reconfigure
device settings or change access privileges.
[0063] The console 16 provides a centralized view into the security
status of an enterprise and gives administrators, analysts, and
operators an interface to perform security management tasks. In
various embodiments, the console provides event display in
real-time or in replay mode (i.e., the ability to play back events
from a given time period according to a VCR or DVD metaphor).
Replay may be had from the events stored in database 20 or, in some
instances, from caches associated with agents 12. This latter form
of replay is especially useful because it provides improved
simulation of actual network conditions as the events are played
out across the same network as during the original attack.
[0064] Consoles 16 also provide operators with complete drill-down
capability from the highest level of detail (e.g., the entire range
of events) to the lowest level of detail (e.g., fields within a
single event). This allows analysts to probe at whatever level of
detail is required to gain further insight into an attack and
assess vulnerability. This varying level of detailed analysis is
made possible because the agents 12 report all of the event data
fields, not merely a subset thereof. By way of example, one tool
provides analysts with the ability to quickly see similar
characteristics of events using a cursor control operation, such as
a mouse click. For example, if analysts are presented with a
meta-event alert that consists of, say, twenty or more individual
events reported by several different agents associated with
different network devices, the present user interface associated
with consoles 16 allows the analyst to quickly visualize only the
common fields of these events (e.g., a source IP address)
by simply highlighting the events and performing a mouse
click/select operation.
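The "common fields" view amounts to intersecting the field/value pairs of the selected events. This sketch assumes each event is a dictionary of normalized fields. The field names used in the usage example are hypothetical.

```python
def common_fields(events):
    """Return the fields (name -> value) that every selected event shares."""
    if not events:
        return {}
    first, rest = events[0], events[1:]
    return {
        name: value
        for name, value in first.items()
        if all(name in other and other[name] == value for other in rest)
    }
```

For instance, selecting three events that share only a source IP address would return just `{"source_ip": ...}`, hiding the differing target addresses and reporting agents. This works only because the agents deliver every field in the common schema.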
[0065] Once security personnel have been notified of a meta-event,
they can utilize the knowledge base to determine the appropriate
actions. In addition, security analysts may undertake
investigations of events and/or meta-events. In general, such
matters can be assigned to so-called cases. Stated differently,
cases create a workflow and oversight environment for situations
where there are suspicious events requiring further investigation.
Once a case is created, it can be assigned to an operator,
investigated, and resolved based on the business policies and
practices of the enterprise (e.g., as documented in knowledge base
28). The security staff can also add narration and event
information to a case, or view open cases to determine their status
and any required next steps.
[0066] Consoles 16 also provide a front-end for the administration
of the entire system 10. This may include system configuration such
as setting up operators, notification, agent behavior, etc. User
management (such as creating and modifying users, access, roles,
and responsibilities), rules management (e.g., authoring, viewing,
and updating rules), and workflow management (e.g., setting up the
flow of actions taken when an event is received) may also be
handled through the consoles 16. Finally, the consoles 16 allow for
remote access, thus supporting divisional responsibility and
"follow-the-sun" management.
[0067] Having thus described the elements of system 10, it is
helpful to present a more in-depth look at rules engine 18. As
described above, rules engine 18 is used to cross-correlate the
event data with security rules in order to generate meta-events.
FIG. 4 illustrates one embodiment of a rules engine 18 configured
in accordance with the present invention. Rules engine 18 includes
six components: a partial matcher 410, a memory manager 420, a RETE
engine 430, a time tracker 440, a rules manager 490, and an action
engine 450.
[0068] The partial matcher 410 receives events via the agent
manager 26. The events may be gathered from different security
devices. Partial matcher 410 determines which rule in the system is
interested in a particular event it received. An event is
considered to be interesting if a rule used by the system mentions
one or more attributes of the event. For example, an event can be
considered interesting if the event has a particular source address
from a particular subnet. Partial matcher 410 groups, batches, or
aggregates interesting events that are related to one another and
to one or more conditions of a particular rule.
Additionally, the partial matcher 410 is aware of time windows
associated with a rule used by the system. By knowing the time
window associated with a rule, partial matcher 410 can compute the
last moment in time at which that rule remains interesting and
should be processed by RETE engine 430.
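The partial matcher's behavior can be approximated as follows. Each hypothetical rule lists the event attributes it mentions, along with a predicate for each attribute. An event is grouped under every rule for which at least one mentioned attribute is present and satisfies its predicate. The "last interesting moment" is assumed here to be the latest grouped event's time plus the rule's window. The patent text does not fix the exact formula.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    conditions: dict   # attribute name -> predicate on that attribute's value
    window: float      # the rule's time window, in seconds


def in_subnet(cidr):
    """Predicate: is an IP address string inside the given subnet?"""
    network = ipaddress.ip_network(cidr)
    return lambda value: ipaddress.ip_address(value) in network


class PartialMatcher:
    def __init__(self, rules):
        self.rules = rules
        self.groups = {}      # rule name -> interesting events grouped for that rule
        self.deadline = {}    # rule name -> last moment the group is still worth processing

    def receive(self, event):
        """Group the event under every rule that mentions one of its attributes."""
        interested = []
        for rule in self.rules:
            if any(attr in event and test(event[attr]) for attr, test in rule.conditions.items()):
                self.groups.setdefault(rule.name, []).append(event)
                # A later event can still combine with this one until the window closes.
                self.deadline[rule.name] = max(self.deadline.get(rule.name, float("-inf")),
                                               event["time"] + rule.window)
                interested.append(rule.name)
        return interested
```

Usage: `PartialMatcher([Rule("watch-subnet", {"source_ip": in_subnet("10.1.0.0/16")}, window=120)])` accepts an event from 10.1.2.3 at time 5 and records 125 as the rule's deadline. An event from 192.168.0.1 is ignored.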
[0069] The memory manager 420 keeps track of the events that are
operated on by RETE engine 430. Aggregated events are passed from
the partial matcher 410 to the memory manager 420. The aggregated
events can have an expiration time (i.e., the aggregated event is
only of interest for a period of time). Memory manager 420 provides
events to RETE engine 430 and deletes the events when the event
reaches its expiration time. In other words, memory manager 420
feeds and deletes events to and from the RETE engine 430 as needed
to provide statefulness.
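One way to "feed and delete" events by expiration time is a priority queue ordered by expiry. In the sketch below, `working_memory` is a hypothetical stand-in for what the RETE engine can see. It is not an actual RETE interface.

```python
import heapq
import itertools


class MemoryManager:
    """Hold aggregated events for the RETE engine and drop them once they expire."""

    def __init__(self):
        self.working_memory = {}            # event id -> event currently visible to the engine
        self._expiry_heap = []              # (expires_at, tie-breaker, event id)
        self._tie = itertools.count()       # keeps heap entries comparable on equal times

    def add(self, event_id, event, expires_at):
        self.working_memory[event_id] = event
        heapq.heappush(self._expiry_heap, (expires_at, next(self._tie), event_id))

    def expire(self, now):
        """Remove every event whose expiration time is at or before `now`; return their ids."""
        removed = []
        while self._expiry_heap and self._expiry_heap[0][0] <= now:
            _, _, event_id = heapq.heappop(self._expiry_heap)
            if event_id in self.working_memory:
                del self.working_memory[event_id]
                removed.append(event_id)
        return removed
```

The heap makes each expiration check proportional to the number of events actually expiring, rather than to the total number held in memory.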
[0070] RETE engine 430 implements the RETE algorithm, which scales
to many hundreds of rules while its performance is independent of
the number of rules it considers. In operation, RETE engine 430
loads user-written, time-based rules; once a rule is
active, engine 430 analyzes the events provided by memory manager
420 and generates a first stage meta-event which can result in the
performance of an action in response to the correlated events. More
specifically, the RETE engine 430 reports instances where the
rules are satisfied.
[0071] Rules manager 490 provides user-defined, time-based rules to
the RETE engine 430. The user-defined rules are generated via
console/browser interfaces 16. Furthermore, a user can provide
instructions to rules manager 490 for activating or deactivating
rules dynamically.
[0072] Time tracker 440 allows the RETE engine 430 to process
time-based rules. Time-based rules are triggered when events that
occur over a period of time collectively are recognized as being
associated with similar occurrences. Time tracker 440 receives
meta-events generated by RETE engine 430 and groups related
meta-events together. This correlation of the meta-events is used
to determine if the threshold of a time-based rule is reached. For
example, time-based rules can require that an event occur ten times
in an hour to signify that an action need be performed. If that
threshold is not reached, the group of events is terminated. If the
threshold is reached, then a second stage meta-event is generated.
Time tracker 440 communicates with action engine 450, which
executes the actions specified in the rule loaded into RETE engine
430. Meta-events are a hypothesized description of the real-world
scenario of what various sensors of security devices independently
report as events. These meta-events may then be fed back into rules
engine 18 to be used as an event. Action engine 450 can notify the
user that a meta-event occurred via e-mail, a website, or a
notifier on the console. Additionally, time tracker 440 reports a
third stage meta-event if no second stage meta-events occur or when
second stage meta-events cease to occur. The third stage meta-event
detects the end of a security attack and indicates the magnitude of
the attack.
[0073] Although the functional blocks of a rules engine are
depicted in one embodiment within rules engine 18, one or more of
these functional blocks can be distributed in other systems. Rules
engine 18 has additional functionalities, such as detection of
improper rule syntax, loop detection, rule feedback detection,
aggregation of joint events, and timeline alignment. Detection of
improper rule syntax allows for the deactivation of rules that are
abusive to the system, for example through inefficient memory
consumption or CPU usage. Rules engine 18 can also detect if a user-defined rule
generates a loop condition in which the same events are provided
repeatedly to the RETE engine 430. Similarly, rules engine 18 can
detect rule feedback. Rule feedback occurs when a meta-event that
is generated by rules engine 18 is fed back into rules engine 18
and results in an abusive or destructive consumption of memory or
CPU processing.
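Loop and feedback detection could be approached by tracking how deeply a meta-event is derived from base events and which events have already been offered to the engine. This is only a sketch of one possible heuristic. The depth field and the limit `MAX_FEEDBACK_DEPTH` are hypothetical, and the patent does not specify how rules engine 18 performs the detection.

```python
from dataclasses import dataclass

MAX_FEEDBACK_DEPTH = 3   # hypothetical limit on meta-events built from meta-events


@dataclass(frozen=True)
class Event:
    id: str
    depth: int = 0   # 0 for base events; meta-events are one level deeper than their inputs


def make_meta_event(meta_id, inputs):
    """Build a meta-event one level deeper than its deepest input."""
    return Event(meta_id, 1 + max(e.depth for e in inputs))


def accept_feedback(event, seen_ids):
    """Decide whether a (meta-)event may be fed back into the rules engine."""
    if event.depth > MAX_FEEDBACK_DEPTH:
        return False            # runaway feedback: meta-events of meta-events of ...
    if event.id in seen_ids:
        return False            # loop: the same event offered to the engine again
    seen_ids.add(event.id)
    return True
```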
[0074] FIG. 5 illustrates procedures followed by a rules engine 18
in accordance with an embodiment of the present invention. Partial
matcher 410 receives an event (step 510) and determines if the
event is interesting as described above (decision block 530). If
the event is not of interest, the next event is received by partial
matcher 410 (step 520) and the process determines if the next event
is interesting (decision block 530). If an event is determined to
be of interest, these interesting events are aggregated by partial
matcher 410 (step 530). In parallel with the processing performed
by partial matcher 410, rules manager 490 compiles a user-defined,
time-based rule (step 545) associated with the aggregated events of
interest. The compiled rule and aggregated events are provided to
RETE engine 430 (step 550) for a time period specified by the rule.
If the time period expires, the time-based rule is no longer
provided to the RETE engine 430.
[0075] Upon receipt of the compiled rule and interesting events,
the RETE engine 430 determines if one or more of the aggregated
events matches a processed rule (decision block 560). If no matches
occur, then new events are received at rules engine 18 (step 520),
and the process described above is repeated. If a match occurs,
then a first stage meta-event is generated (step 570). Although not
shown, first stage meta-events can be reported, via the console 16
or with another reporting mechanism described above.
[0076] First stage meta-events are aggregated and processed by time
tracker 440 to determine whether, in subsequent time periods, a
threshold number of repeat matches of similar interesting events
occurs (decision block 580). For example, if five first stage
meta-events occurred in a first ten minute time period, time
tracker 440 determines if five more first stage meta-events occur
in a subsequent ten minute time period. If the threshold is met
(i.e., five meta-events occur in ten minutes), then a second stage
meta-event is generated (step 590). After generating a second stage
meta-event, the process described above repeats, by accepting
events at a rules engine 18 (step 520). Additional second stage
meta-events can occur; however, if no second stage meta-events
occur or they stop occurring in subsequent time periods, a third stage
meta-event is generated (step 599). The generation of a third stage
meta-event signifies that a security attack on the system has
ended. It also measures how long the attack lasted and the attack's
magnitude in terms of the number of network computers attacked.
Both second and third stage meta-events can be reported as actions
performed by action engine 450. By reporting second stage
meta-events, the detection of repetitive behavior attacks and
enterprise wide attacks is improved. Process 500 can occur within
the process shown in FIG. 3, above. Specifically, process 500
describes how event data is processed through rules engine 18, as
shown at step 42 of FIG. 3.
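The second and third stages of FIG. 5 can be sketched as a pass over fixed time periods. This is a simplified illustration with several assumptions. First stage meta-events are (time, target) pairs. A second stage meta-event is emitted for each period whose count reaches the threshold. A single period without a second stage meta-event ends the attack and yields a third stage meta-event carrying the attack's first and last periods (its duration) and the number of distinct targets (its magnitude).

```python
from collections import defaultdict


def track(first_stage, period, threshold):
    """Derive second and third stage meta-events from first stage meta-events.

    first_stage: iterable of (time, target) pairs, in any order.
    Returns ("second", period_index, count) and
    ("third", first_period, last_period, distinct_targets) tuples in time order.
    """
    buckets = defaultdict(list)
    for time, target in first_stage:
        buckets[int(time // period)].append(target)
    results = []
    if not buckets:
        return results
    attack_start, targets = None, set()
    # Scan one period past the last event so an ongoing attack is always closed.
    for p in range(min(buckets), max(buckets) + 2):
        hits = buckets.get(p, [])
        if len(hits) >= threshold:
            results.append(("second", p, len(hits)))
            if attack_start is None:
                attack_start = p
            targets.update(hits)
        elif attack_start is not None:
            # Second stage meta-events stopped: report the end of the attack.
            results.append(("third", attack_start, p - 1, len(targets)))
            attack_start, targets = None, set()
    return results
```

With ten-minute periods (`period=600`) and `threshold=5`, as in the example above, five events in the first period and six in the second produce two second stage meta-events. They are followed by one third stage meta-event once a period passes with fewer than five events.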
[0077] Thus, a computer-based system for capturing, correlating, and
reporting security events from heterogeneous sources, including a
correlation engine with support for time-based rules has been
described. In the foregoing description, the various examples and
embodiments were meant to be illustrative of the present invention
and not restrictive in terms of their scope. Accordingly, the
invention should be measured only in terms of the claims, which
follow.
* * * * *
References