U.S. patent application number 12/548281 was filed with the patent office on 2009-08-26 and published on 2010-03-04 for detection of non-occurrences of events using pattern matching.
This patent application is currently assigned to Oracle International Corporation. Invention is credited to Rakesh Komuravelli, Shailendra Mishra, Anand Srinivasan.
Application Number | 20100057737 12/548281 |
Document ID | / |
Family ID | 41726784 |
Filed Date | 2010-03-04 |
United States Patent Application | 20100057737 |
Kind Code | A1 |
Srinivasan; Anand; et al. | March 4, 2010 |
DETECTION OF NON-OCCURRENCES OF EVENTS USING PATTERN MATCHING
Abstract
Techniques for detecting non-occurrence of an event within a
time period following the occurrence of another event. In one
embodiment, language extensions are provided to a language that
enable queries to be formulated for detecting non-occurrences using
that language.
Inventors: | Srinivasan; Anand (Bangalore, IN); Komuravelli; Rakesh (Hyderabad, IN); Mishra; Shailendra (Fremont, CA) |
Correspondence Address: | TOWNSEND AND TOWNSEND AND CREW LLP/ORACLE, TWO EMBARCADERO CENTER, 8TH FLOOR, SAN FRANCISCO, CA 94111-3834, US |
Assignee: | Oracle International Corporation, Redwood Shores, CA |
Family ID: | 41726784 |
Appl. No.: | 12/548281 |
Filed: | August 26, 2009 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
61092983 | Aug 29, 2008 | |
Current CPC Class: | G06K 9/62 20130101 |
Class at Publication: | 707/6; 719/318; 707/E17.039 |
International Class: | G06F 17/30 20060101 G06F017/30; G06F 9/44 20060101 G06F009/44 |
Claims
1. A computer-readable storage medium storing a plurality of
instructions for controlling a processor to process a data stream
of events, the plurality of instructions comprising: instructions
that cause the processor to receive a query for detecting
non-occurrence of a first event within a time period following
occurrence of a second event, the query specifying a pattern;
instructions that cause the processor to generate a modified
pattern by adding a first symbol to the pattern; instructions that
cause the processor to generate an automaton for the query based
upon the modified pattern; and instructions that cause the
processor to detect an instance in the data stream of
non-occurrence of the first event within the time period following
occurrence of the second event using the automaton.
2. The computer-readable storage medium of claim 1 wherein the
plurality of instructions further comprises: instructions that
cause the processor to determine whether the query is for detecting
non-occurrence of the first event; and instructions that cause the
processor to generate the modified pattern only upon determining
that the query is for detecting non-occurrence of the first
event.
3. The computer-readable storage medium of claim 1 wherein the
plurality of instructions comprises instructions that cause the
processor to determine the time period from the query.
4. The computer-readable storage medium of claim 1 wherein the
plurality of instructions that cause the processor to detect the
one or more instances comprises: instructions that cause the
processor to associate a target time with a binding, wherein the
target time is based upon the time of the first element in the
binding and the time period; instructions that cause the processor
to receive an input; and instructions that cause the processor to
compare a time associated with the input with the target time.
5. The computer-readable storage medium of claim 4 wherein the
input is a heartbeat.
6. The computer-readable storage medium of claim 4 wherein the
input is another event received in the data stream.
7. The computer-readable storage medium of claim 4 wherein the
plurality of instructions further comprise: instructions that cause
the processor to, upon determining that the time associated with
the input equals or exceeds the target time, cause the automaton to
move to a final state.
8. A system for processing a data stream of events, the system
comprising: a memory storing a plurality of instructions; and a
processor coupled to the memory, the processor configured to
execute the plurality of instructions to: receive a query for
detecting non-occurrence of a first event within a time period
following occurrence of a second event, the query specifying a
pattern; generate a modified pattern by adding a first symbol to
the pattern; generate an automaton for the query based upon the
modified pattern; and detect an instance in the data stream of
non-occurrence of the first event within the time period following
occurrence of the second event using the automaton.
9. The system of claim 8 wherein the processor is configured to:
determine whether the query is for detecting non-occurrence of the
first event; and generate the modified pattern only upon
determining that the query is for detecting non-occurrence of the
first event.
10. The system of claim 8 wherein the processor is configured to
determine the time period from the query.
11. The system of claim 8 wherein the processor is configured to:
associate a target time with a binding, wherein the target time is
based upon the time of the first element in the binding and the
time period; receive an input; and compare a time associated with
the input with the target time.
12. The system of claim 11 wherein the input is a heartbeat.
13. The system of claim 11 wherein the input is another event
received in the data stream.
14. The system of claim 11 wherein the processor is configured to,
upon determining that the time associated with the input equals or
exceeds the target time, cause the automaton to move to a final
state.
15. A computer-implemented method of processing a data stream of
events, the method comprising: receiving, by a processing system, a
query for detecting non-occurrence of a first event within a time
period following occurrence of a second event, the query specifying
a pattern; generating, by the processing system, a modified pattern
by adding a first symbol to the pattern string; generating, by the
processing system, an automaton for the query based upon the
modified pattern; and detecting, by the processing system, an
instance in the data stream of non-occurrence of the
first event within the time period following occurrence of the
second event using the automaton.
16. The method of claim 15 further comprising: determining, by the
processing system, whether the query is for detecting
non-occurrence of the first event; and generating, by the
processing system, a modified pattern by adding a first symbol to
the pattern only upon determining that the query is for detecting
non-occurrence of the first event.
17. The method of claim 15 further comprising determining, by the
processing system, the time period from the query.
18. The method of claim 15 wherein detecting the one or more
instances comprises: associating a target time with a binding,
wherein the target time is based upon the time of the first element
in the binding and the time period; receiving an input; and
comparing a time associated with the input with the target
time.
19. The method of claim 18 wherein the input is a heartbeat or
another event received in the data stream.
20. The method of claim 18 further comprising, upon determining
that the time associated with the input equals or exceeds the
target time, causing the automaton to move to a final state.
Description
CROSS-REFERENCES TO RELATED APPLICATIONS
[0001] This application claims the benefit and priority under 35
U.S.C. 119(e) of U.S. Provisional Application Ser. No. 61/092,983,
filed Aug. 29, 2008, entitled FRAMEWORK FOR SUPPORTING REGULAR
EXPRESSION-BASED PATTERN MATCHING IN DATA STREAMS, the contents of
which are herein incorporated by reference in their entirety for
all purposes.
[0002] The present application incorporates by reference for all
purposes the entire contents of the following related applications
filed concurrently with the present application:
[0003] (1) U.S. application Ser. No. ______ titled FRAMEWORK FOR
SUPPORTING REGULAR EXPRESSION-BASED PATTERN MATCHING IN DATA
STREAMS (Attorney Docket No. 021756-056000US;
OID-2008-153-01US);
[0004] (2) U.S. application Ser. No. ______ titled TECHNIQUES FOR
MATCHING A CERTAIN CLASS OF REGULAR EXPRESSION-BASED PATTERNS IN
DATA STREAMS (Attorney Docket No. 021756-056500US;
OID-2008-152-01US);
[0005] (3) U.S. application Ser. No. ______ titled TECHNIQUES FOR
PERFORMING REGULAR EXPRESSION-BASED PATTERN MATCHING IN DATA
STREAMS (Attorney Docket No. 021756-056700US; OID-2008-153-02US);
and
[0006] (4) U.S. application Ser. No. ______ titled DETECTION OF
RECURRING NON-OCCURRENCES OF EVENTS USING PATTERN MATCHING
(Attorney Docket No. 021756-058900US; OID-2008-269-02US).
BACKGROUND OF THE INVENTION
[0007] The present application relates to processing of data
streams and more particularly to techniques for detecting the
non-occurrence of an event within a time period following the
occurrence of another event.
[0008] Databases have traditionally been used in applications that
require storage of data and querying capability on the stored data.
Existing databases are thus best equipped to run queries over
finite stored data sets. However, the traditional database model is
not well suited for a growing number of modern applications in
which data is received as a stream of data events instead of a
bounded data set. A data stream, also referred to as an event
stream, is characterized by a real-time, potentially continuous,
sequence of events. A data or event stream thus represents
unbounded sets of data. Examples of sources that generate data
streams include sensors and probes (e.g., RFID sensors, temperature
sensors, etc.) configured to send a sequence of sensor readings,
financial tickers, network monitoring and traffic management
applications sending network status updates, click stream analysis
tools, and others.
[0009] Pattern matching is commonly used for analyzing data. For
example, data stored in a database may be analyzed to determine if
the data matches a pattern. It is desirable to efficiently perform
pattern matching on data received in the form of data or event
streams.
BRIEF SUMMARY OF THE INVENTION
[0010] Embodiments of the present invention provide techniques for
detecting non-occurrence of an event within a time period following
the occurrence of another event. In one embodiment, language
extensions are provided for a querying language that enable queries
to be formulated for detecting non-occurrences of events.
[0011] In one embodiment, during runtime processing of a query, the
query is analyzed to determine if the query is for detection of a
non-occurrence of an event. If so, the pattern specified in the
query is modified by suffixing a special symbol (e.g., `#`) to the
pattern, where the `#` symbol represents timer events. An FSA is
then built for the modified pattern and used during runtime to
guide detection of the non-occurrences.
[0012] According to an embodiment of the present invention,
techniques are provided for processing a data stream of events. A
query may be received for detecting non-occurrence of a first event
within a time period following occurrence of a second event. The
query may specify a pattern. A modified pattern is generated by
adding a first symbol to the pattern specified in the query. An
automaton may then be generated for the query based upon the
modified pattern. An instance of non-occurrence of the first event
within the time period following occurrence of the second event in
the data stream may be detected using the generated automaton. One
or more actions may be performed upon determining an instance of
non-occurrence of the first event within a time period following
occurrence of the second event. In one embodiment, the time period
may be determined from the query itself.
[0013] In one embodiment, a determination is made as to whether the
query is for detecting a non-occurrence of an event. Upon
determining that the query is for detecting non-occurrence of an
event, the pattern specified in the query is modified by adding a first
symbol to it, thereby generating a modified pattern.
[0014] In one embodiment, detecting the non-occurrences may
comprise associating a target time with a binding, wherein the
target time is based upon the time of the first element in the
binding and the time period. An input may then be received. The
input may be a heartbeat or another event received in the data
stream. The time associated with the input may then be compared
with the target time. Further processing may then depend upon
the results of this comparison.
[0015] In one embodiment, if the time associated with the input
equals or exceeds the target time, then the processing may force
the automaton to move to a final state (force the modified pattern
to be matched).
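The summary above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the names `modify_pattern`, `Binding`, and `on_input` are hypothetical, and computing the target time as the sum of the first element's time and the time period is an assumption, since the text says only that the target time is "based upon" those two values. The sketch also omits the case in which the awaited event does arrive before the target time.

```python
from dataclasses import dataclass
from typing import List

TIMER_SYMBOL = "#"  # special symbol standing for timer events


def modify_pattern(symbols: List[str], is_non_occurrence_query: bool) -> List[str]:
    """Suffix the timer symbol only when the query detects a non-occurrence."""
    if is_non_occurrence_query:
        return symbols + [TIMER_SYMBOL]
    return list(symbols)


@dataclass
class Binding:
    first_element_time: int  # timestamp of the first element in the binding
    time_period: int         # time period determined from the query
    in_final_state: bool = False

    @property
    def target_time(self) -> int:
        # Assumption: target time = first element time + time period.
        return self.first_element_time + self.time_period

    def on_input(self, input_time: int) -> bool:
        """Process the time of a heartbeat or event.

        Returns True once the automaton has been forced to its final state.
        """
        if input_time >= self.target_time:
            self.in_final_state = True
        return self.in_final_state
```

In this sketch a heartbeat and an ordinary event are handled identically, because only the time associated with the input is compared with the target time.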
[0016] In one embodiment, the query may comprise a language
extension that indicates whether or not the query is for detecting
the non-occurrence of an event.
[0017] The foregoing, together with other features and embodiments
will become more apparent when referring to the following
specification, claims, and accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
[0018] FIG. 1 is a simplified block diagram of a system that
incorporates an embodiment of the present invention;
[0019] FIG. 2 depicts an example of a query comprising a regular
expression specifying a pattern to be matched according to an
embodiment of the present invention;
[0020] FIG. 3 is a simplified flowchart depicting a method of
performing pattern matching on an event stream according to an
embodiment of the present invention;
[0021] FIG. 4 is a simplified flowchart depicting a method of
performing pattern matching on an event stream based upon the type
of the pattern according to an embodiment of the present
invention;
[0022] FIG. 5 depicts an example of a query 500 identifying a Class
A pattern according to an embodiment of the present invention;
[0023] FIG. 6 depicts a simplified flowchart depicting a method of
maintaining bindings for Class A patterns according to an
embodiment of the present invention;
[0024] FIG. 7 is a simplified flowchart depicting a method for
performing pattern matching for Class A patterns after receiving
each event in an event stream according to an embodiment of the
present invention;
[0025] FIG. 8 is an example of a query specifying a Class B pattern
but not a Class A pattern according to an embodiment of the present
invention;
[0026] FIG. 9 is a simplified flowchart depicting a method for
performing operations at compile time including constructing an
automaton for a general Class B pattern according to an embodiment
of the present invention;
[0027] FIGS. 10A and 10B depict a simplified flowchart depicting
runtime processing performed for detecting a Class B pattern in an
input event stream according to an embodiment of the present
invention;
[0028] FIGS. 11A-11I depict various state machines for generating
an automaton for a regular expression according to an embodiment of
the present invention;
[0029] FIGS. 12A-12D depict state machines for constructing an
automaton for an example regular expression according to an
embodiment of the present invention;
[0030] FIG. 13 depicts an example of a query that may be used to
detect the non-occurrence of an event within a time period
following the occurrence of another event according to an
embodiment of the present invention;
[0031] FIG. 14 depicts a simplified flowchart depicting additional
processing performed at compile time for a query for detecting
non-occurrences according to an embodiment of the present
invention;
[0032] FIG. 15 depicts a simplified flowchart depicting processing
for detecting non-occurrences according to an embodiment of the
present invention;
[0033] FIG. 16 depicts an example of a query that may be used to
detect recurring non-occurrences of an event according to an
embodiment of the present invention;
[0034] FIG. 17 depicts a simplified flowchart depicting processing
for detecting recurring non-occurrences according to an embodiment
of the present invention;
[0035] FIG. 18 is a simplified block diagram illustrating
components of a system environment 1800 that may be used in
accordance with an embodiment of the present invention; and
[0036] FIG. 19 is a simplified block diagram of a computer system
1900 that may be used in accordance with embodiments of the present
invention.
DETAILED DESCRIPTION OF THE INVENTION
[0037] In the following description, for the purposes of
explanation, specific details are set forth in order to provide a
thorough understanding of embodiments of the invention. However, it
will be apparent that the invention may be practiced without these
specific details.
[0038] Embodiments of the present invention provide techniques for
detecting patterns in data or event streams. A pattern to be
detected may be specified using a regular expression. Events
received in data streams are processed during runtime to detect
occurrences of the pattern specified by the regular expression in
the data stream. Techniques are provided for detecting
non-occurrence of an event within a time period following the
occurrence of another event. In one embodiment, language extensions
are provided to a language that enable queries to be formulated for
detecting non-occurrences using that language.
[0039] FIG. 1 is a simplified block diagram of a system 100 that
may incorporate an embodiment of the present invention. As depicted
in FIG. 1, system 100 comprises an events processing server 102
that is configured to process one or more incoming data or event
streams 104, 106, and 108. Streams 104, 106, and 108 may be
received from different sources including a database, a file, a
messaging service, various applications, devices such as various
types of sensors (e.g., RFID sensors, temperature sensors, etc.),
tickers, and the like. Server 102 may receive the streams via a
push-based mechanism or a pull-based mechanism or other
mechanisms.
[0040] A data or event stream is a real-time sequence of events.
Multiple events may be received in a stream. The data stream can
thus be considered as a stream of unbounded sets of data. In one
embodiment, a data stream is a sequence of <tuple, timestamp>
pairs. The tuple refers to the data portion of a stream. A tuple
may be considered as similar to a row in a table. The tuples in a
stream have a schema. A stream can include multiple tuples.
Timestamps define an order over the tuples in a data stream. The
timestamps in a data stream may reflect an application's notion of
time. For example, the timestamp may be set by an application on
the system receiving an event stream. The receiving system may
timestamp an event on receipt as configured by the application, for
example, if specified in the CREATE STREAM DDL that is used to
define a structure of the events stream and the mechanism used to
use application time or system time as the timestamp. In other
embodiments, the timestamp associated with a tuple may correspond
to the time of the application sending the data events. The
timestamp is part of the schema of a stream. There could be one or
multiple tuples with the same timestamp in a stream. The tuples in
a stream can be viewed as a series of events and accordingly the
data stream is also referred to as an event stream. An event stream
can thus be considered to comprise a series of events, each with an
associated timestamp. For example, an event stream may comprise a
series of temperature readings from a sensor such as 10°,
15°, 20°, etc. and associated time stamps. For
purposes of this application, the terms "tuple" and "event" are
being used interchangeably.
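As an illustration only, the <tuple, timestamp> view of a stream might be modeled as follows in Python. The `Event` type, its field names, and the integer timestamps are assumptions made for the example and do not come from the specification.

```python
from typing import Dict, List, NamedTuple


class Event(NamedTuple):
    data: Dict[str, float]  # the tuple: attribute values following the stream schema
    timestamp: int          # application or system time attached to the tuple


def is_time_ordered(events: List[Event]) -> bool:
    """Timestamps define an order; equal timestamps are permitted."""
    return all(a.timestamp <= b.timestamp for a, b in zip(events, events[1:]))


# Temperature readings from a sensor; two tuples share timestamp 2.
readings = [
    Event({"temperature": 10.0}, 1),
    Event({"temperature": 15.0}, 2),
    Event({"temperature": 20.0}, 2),
]
```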
[0041] System 100 comprises an event processing server 102 that is
configured to process event streams. Event processing server 102
may receive one or more event streams. As shown in FIG. 1, event
processing server 102 receives streams 104, 106, and 108. Each
event stream comprises one or more events. The events in a stream
are received by server 102 in a sequence at specific time points.
Server 102 is configured to perform various types of processing on
the incoming streams. According to an embodiment of the present
invention, server 102 is configured to detect patterns in the
incoming event streams based upon the events in the event streams
received by server 102. In one embodiment, server 102 performs the
pattern matching without doing any backtracking processing on the
events of the stream being analyzed as the events are received by
server 102. Pattern matching may be performed using a type of
continuous query that is applied to the incoming streams. Server
102 may also perform other types of processing on the input streams
such as running other continuous queries on the incoming event
streams, and other operations. An example of an event processing
server is the Oracle Complex Event Processor from Oracle™
Corporation.
[0042] In the embodiment depicted in FIG. 1, server 102 comprises a
pattern matching module 110 that is configured to perform
processing related to pattern matching for one or more event
streams. As depicted in FIG. 1, pattern matching module 110
comprises a pattern input interface 112, a class-technique
determinator 113, an automaton generator 114, and a matcher 116.
Pattern input interface 112 provides an interface for receiving
information specifying patterns to be matched in the event streams.
Pattern input interface 112 may provide a graphical user interface
that allows information to be entered specifying one or more
patterns to be matched, a command line interface for specifying the
patterns to be matched, or some other interface. A pattern to be
matched may be specified by a user of server 102. Information
identifying a pattern to be matched may also be received from other
sources, for example, from other components or modules of event
processing server 102, or other systems or applications.
[0043] In one embodiment, patterns to be matched are specified
using regular expressions. A regular expression is a string of
symbols (also referred to as correlation names or correlation
variables) representing the pattern to be matched. The regular
expression is built using one or more symbols and may use one or
more operators. Examples of operators include but are not limited
to a concatenation operator (e.g., an "AND" operator between
symbols in a regular expression may be used to indicate an AND
relationship between the symbols), alternation operator (e.g., a
vertical bar "|" may separate symbols in a regular expression
indicating an OR condition for the symbols), one or more
quantifiers, grouping operator (e.g., indicated by parentheses),
and the like. Examples of quantifiers include an asterisk `*`
implying zero or more occurrences of the symbol with which the
quantifier is associated, a plus sign `+` implying one or more
occurrences of the symbol with which the quantifier is associated,
a question mark `?` implying zero or one occurrences of the symbol
with which the quantifier is associated, reluctant quantifiers, and
the like. Examples of operators and quantifiers that may be used,
including associated syntax for the regular expressions, are
provided and described in Fred Zemke et al., "Pattern Matching in
Sequence of Rows (12)," ISO/IEC JTC1/SC32 WG3:URC-nnn, ANSI NCITS
H2-2006-nnn, Jul. 31, 2007, the entire contents of which are herein
incorporated by reference for all purposes.
[0044] In the past, regular expressions have been mainly used to
find patterns in strings. In embodiments of the present invention,
the power of regular expressions is used to match patterns in event
streams received by event processing server 102. Regular
expressions provide a simple, concise, and flexible way for
specifying patterns to be matched. In the embodiment depicted in
FIG. 1, event processing server 102 may receive pattern information
118 specifying a regular expression to be matched in one or more
event streams. In one embodiment, the pattern may be specified
using pattern input interface 112 of pattern matching module
110.
[0045] Pattern information 118 may be provided using different
languages. In one embodiment, a programming language such as SQL,
which is commonly used to query databases, may be used. Extensions
may be provided to SQL to express the pattern to be matched for
event streams. For example, pattern information 118 may specify a
SQL query comprising a regular expression specifying a pattern to
be matched in one or more event streams received by event
processing server 102.
[0046] Oracle supports CQL (Continuous Query Language)
in Complex Events Processing (CEP) products. CQL is very similar to
SQL with extensions for stream processing. Pattern matching
constructs proposed to extend SQL to specify pattern matching via
regular expressions (e.g., the constructs described in Fred Zemke
et al., "Pattern Matching in Sequence of Rows (12)," ISO/IEC
JTC1/SC32 WG3:URC-nnn, ANSI NCITS H2-2006-nnn, Jul. 31, 2007, the
entire contents of which are herein incorporated by reference for
all purposes) have been adopted in CQL to extend CQL for the
purpose of specifying pattern matching requirements over event
streams.
[0047] Typically, pattern matching for a query pattern occurs only
over a single input stream. Pattern matching may also be performed
over multiple event streams, for example, using CQL. In one
embodiment, this may be done by first performing a UNION of all the
relevant input streams over which pattern matching is to be done
with the result defining a view corresponding to an intermediate
stream, and the pattern to be matched can be specified over this
single intermediate stream. The pattern will then be matched to all
the streams included in the view.
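The UNION step can be pictured, under simplifying assumptions, as a timestamp-ordered merge of the input streams into one intermediate stream. The Python sketch below is not CQL and is not taken from the specification. It assumes each stream is an already time-ordered iterable of (tuple, timestamp) pairs, and the function name `union_streams` is hypothetical.

```python
import heapq
from typing import Any, Dict, Iterable, Iterator, Tuple

Event = Tuple[Dict[str, Any], int]  # (tuple data, timestamp)


def union_streams(*streams: Iterable[Event]) -> Iterator[Event]:
    """Merge time-ordered streams into one intermediate stream ordered by timestamp."""
    return heapq.merge(*streams, key=lambda event: event[1])


ticker_a = [({"price": 31}, 1), ({"price": 29}, 4)]
ticker_b = [({"price": 35}, 2), ({"price": 36}, 3)]
intermediate = list(union_streams(ticker_a, ticker_b))
```

A pattern would then be evaluated against `intermediate` exactly as it would be against a single input stream.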
[0048] FIG. 2 depicts an example of a query 200 that may be
provided specifying a pattern to be matched over an event stream
according to an embodiment of the present invention. Query 200
comprises a FROM clause 202 that specifies an event stream "Ticker"
over which pattern matching is to be performed. "Ticker" may
represent a single event stream or a UNION of multiple streams.
[0049] Query 200 comprises a PATTERN component 203 that specifies a
regular expression 204 identifying the pattern to be recognized in
the event stream "Ticker". The regular expression (A B C A B D) in
query 200 comprises several symbols or correlation names. The
pattern specified in FIG. 2 is an example of a simple nonrecurring
pattern. It is nonrecurring since each symbol in the pattern
specifies only a single occurrence of that symbol and does not
include recurrences of the symbol. The alphabet set for a pattern
comprises distinct symbols in the pattern. For the above example,
the alphabet set is {A, B, C, D}. Each symbol in the alphabet
corresponds to a variable name corresponding to a Boolean condition
that is specified in the DEFINE component 206 of the query.
[0050] The DEFINE component 206 of query 200 specifies Boolean
conditions (or predicates) that define the symbols declared in the
regular pattern. For the example depicted in FIG. 2, the symbols
declared in pattern 204 include A, B, C, and D. The predicates or
Boolean conditions associated with the symbols are defined by the
DEFINE component as follows:
Symbol   Predicate
A        30 <= A.price <= 40
B        B.price < PREV(B.price)
C        C.price <= PREV(C.price)
D        D.price > PREV(D.price)
It should be understood that not all symbols defined in a regular
expression require an associated predicate. A symbol with no
associated predicate is by default assumed to be always matched or
true. Such a symbol may be used to match any event in an event
stream.
[0051] The predicates depicted above are all related to the price
attribute of an event. An event may have one or more attributes.
The predicates may be based upon these attributes. A particular
symbol is deemed to be matched by an input event received in an
event stream if the predicate associated with the symbol is matched
or satisfied due to the input event. For example, symbol A in FIG.
2 is matched by a received event if the price attribute of the
received event is greater than or equal to 30 and less than or
equal to 40. Whether or not a predicate associated with a symbol is
matched may depend on the present event and/or previously received
events. For example, symbol B in FIG. 2 is matched by a received
event if the price attribute of the received event is less than the
price attribute of the event received just immediately preceding
the presently received event. For the "Symbol" partition, when a
PARTITION BY is specified (as in this example), PREV is the
previous input received for that partition. A received input event
in an event stream may cause zero or more symbols of the regular
expression to be matched.
[0052] For the symbols and associated predicates depicted in FIG. 2:
[0053] (1) the symbol A is matched by an event received in the
Ticker event stream if the value of the price attribute of the
event is greater than or equal to 30 and less than or equal to 40;
[0054] (2) the symbol B is matched by an event received in the
Ticker event stream if the value of the price attribute of the
received event is less than the price of the event received just
immediately preceding the presently received event;
[0055] (3) the symbol C is matched by an event received in the
Ticker event stream if the value of the price attribute of the
received event is less than or equal to the price of the event
received just immediately preceding the presently received event; and
[0056] (4) the symbol D is matched by an event received in the
Ticker event stream if the value of the price attribute of the
received event is greater than the price of the event received just
immediately preceding the presently received event.
As discussed earlier, for the "Symbol" partition, when a PARTITION BY
is specified (as in this example), PREV is the previous input
received for that partition.
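The four predicates can be written out as a small Python sketch to show that one event may match several symbols at once. The function name `matched_symbols` is hypothetical, and treating a predicate that uses PREV as unsatisfied when no previous event exists is an assumption rather than something stated in the text.

```python
from typing import Callable, Dict, Optional, Set

# Each predicate receives the current price and the previous price in the partition.
Predicate = Callable[[float, Optional[float]], bool]

PREDICATES: Dict[str, Predicate] = {
    "A": lambda price, prev: 30 <= price <= 40,
    "B": lambda price, prev: prev is not None and price < prev,
    "C": lambda price, prev: prev is not None and price <= prev,
    "D": lambda price, prev: prev is not None and price > prev,
}


def matched_symbols(price: float, prev_price: Optional[float]) -> Set[str]:
    """Return every symbol whose predicate is satisfied by the current event."""
    return {symbol for symbol, pred in PREDICATES.items() if pred(price, prev_price)}
```

For example, an event with price 35 arriving after an event with price 38 satisfies the predicates for A, B, and C simultaneously.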
[0057] As evident from the above, matching of symbols in a regular
expression to events received in an event stream is quite different
from conventional pattern matching in strings using regular
expressions. In event stream pattern matching, a symbol in a
regular expression is considered matched by a received event only
if the predicate associated with the symbol is satisfied by the
event. This is unlike character string matching using regular
expressions wherein a symbol is matched if that symbol itself is
present in the string to be matched. Further, in event stream
pattern matching, multiple predicates can be satisfied at the same
time and as a result multiple symbols may be matched by a received
input event. This is not the case in regular string matching.
Several other differences exist between pattern matching in strings
and pattern matching in event streams.
[0058] For the pattern specified in FIG. 2, the pattern
corresponding to (ABCABD) is matched in the event stream when
symbol A is matched, followed by a match of B, followed by a match
of C, followed by a match of A, followed by a match of B, and
followed by a match of D. An example of runtime pattern matching
processing performed for the pattern depicted in FIG. 2 is
described below.
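To make the sequencing concrete, the following Python sketch tracks, for each incoming event, which prefixes of the pattern have been matched by the most recent events, so no backtracking over earlier events is needed. It is a simplified state-set simulation written for illustration, not the matching technique of the specification. The function name `find_matches` is hypothetical, and the input is assumed to be, for each event, the set of symbols that event matched.

```python
from typing import Iterable, List, Sequence, Set


def find_matches(pattern: Sequence[str], matched_per_event: Iterable[Set[str]]) -> List[int]:
    """Return the indexes of events at which the full pattern completes.

    A position p in `active` means the last p events matched pattern[0:p].
    """
    active: Set[int] = set()
    completions: List[int] = []
    for index, matched in enumerate(matched_per_event):
        next_active: Set[int] = set()
        # Position 0 is always a candidate: a new match may start at any event.
        for position in active | {0}:
            if pattern[position] in matched:
                next_active.add(position + 1)
        if len(pattern) in next_active:
            completions.append(index)
            next_active.discard(len(pattern))
        active = next_active
    return completions


PATTERN = ["A", "B", "C", "A", "B", "D"]
# Symbols matched by six consecutive events, e.g. prices 40, 38, 36, 35, 33, 34.
EVENTS = [{"A"}, {"A", "B", "C"}, {"A", "B", "C"}, {"A", "B", "C"}, {"A", "B", "C"}, {"A", "D"}]
```

With these inputs the pattern completes at the sixth event (index 5). The sketch reports every completion and does not model the AFTER MATCH SKIP TO clause.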
[0059] As indicated above, a pattern may be specified using a
query, such as a CQL query depicted in FIG. 2. In one embodiment,
the syntax for such a query follows the query standards specified
in Fred Zemke et al., "Pattern Matching in Sequence of Rows (12),"
ISO/IEC JTC1/SC32 WG3:URC-nnn, ANSI NCITS H2-2006-nnn, Jul. 31,
2007, the entire contents of which are herein incorporated by
reference for all purposes. Some of the components of the query
include:
[0060] FROM <data_stream_name>--specifies the event stream over
which pattern matching is to be performed.
[0061] MATCH_RECOGNIZE--Clause that contains all the sub-clauses or
components relevant to the pattern specification.
[0062] PARTITION BY--Used to specify how the event stream is to be
partitioned. If this clause is not used, then all the events
constitute one partition.
[0063] AFTER MATCH SKIP TO--This clause determines the resumption
point of pattern matching after a match has been found in the event
stream.
[0064] PATTERN--Used to specify a regular expression that is built
using one or more symbols and may contain operators.
[0065] DEFINE--This component is used to specify the predicates
that define the symbols declared in the pattern.
[0066] As described above, SQL extensions are provided for
specifying a query for performing pattern matching over event
streams. The query may comprise a regular expression identifying
the pattern to be matched and predicates defining or associated
with symbols in the regular expression. The extensions enhance the
ability to use SQL for performing pattern matching on event
streams.
[0067] Referring back to FIG. 1, pattern information 118 is
provided to class-technique determinator module 113 for further
processing. Class-technique determinator module 113 is configured
to identify a type or class of pattern based upon information 118
and to further determine a pattern matching technique to be used
for performing pattern matching for the identified pattern type or
class. Pattern matching module 110 is capable of performing pattern
matching for different types of patterns. In one embodiment, the
type or class
for a pattern is determined based upon the regular expression
specified in information 118 and/or based upon the predicates
associated with the symbols in the regular expression.
Class-technique determinator module 113 is configured to analyze
the regular expression and predicates specified in information 118
and determine a pattern class or type based upon the analysis.
[0068] In one embodiment, class-technique determinator module 113
uses pattern type information 120 to determine the pattern class or
type for the information provided in 118. Pattern type information
120 may identify different pattern types or classes and
characteristics associated with the different pattern classes.
Pattern matching module 110 may use pattern type information 120 to
automatically identify a particular type or class of pattern for
the pattern specified in pattern information 118 from among
multiple pattern classes that module 110 is capable of processing.
In another embodiment, pattern type information 120 is not needed,
and pattern matching module 110 may be configured to automatically
determine a type of pattern by analyzing the information provided
in pattern information 118.
[0069] In one embodiment, pattern matching module 110 is configured
to apply different pattern matching techniques for different types
or classes of patterns. After a pattern type has been determined
for the pattern specified in pattern information 118, module 113 is
configured to determine a particular pattern matching technique,
from among multiple available techniques, that is suited for
performing pattern matching for the determined pattern. In this
manner, a customized pattern matching technique or a technique that
is best suited for performing pattern matching for the determined
pattern type is determined. This helps to improve the efficiency of
the pattern matching process for specific types of patterns.
[0070] In one embodiment, class-to-technique information 124 may be
provided to pattern matching module 110 identifying one or more
pattern classes and one or more pattern matching techniques to be
used for detecting pattern matches for each pattern class. After a
class of pattern specified in pattern information 118 has been
determined, pattern matching module 110 may use class-to-technique
information 124 to determine a specific pattern matching technique
to be used for finding matches in the event stream. For example, if
the pattern is determined to be a Class A pattern, then a pattern
matching technique appropriate for a Class A pattern may be used
for performing the pattern matching. Whereas, if the pattern is
determined to be a Class B pattern, then a pattern matching
technique appropriate for a Class B pattern may be used for
performing the pattern matching.
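By way of illustration only, the class-to-technique lookup described above may be sketched as a simple table from class name to matching routine. The class names, the routine names, and the classification rule used here (a pattern with no regular expression operators is treated as Class A) are assumptions made for this sketch and are not taken from the described embodiment.

```python
from typing import Callable, Dict, List

# Hypothetical technique implementations; names are illustrative only.
def match_simple_sequence(pattern: List[str]) -> str:
    return "prefix-automaton technique for " + " ".join(pattern)

def match_general_regex(pattern: List[str]) -> str:
    return "general technique for " + " ".join(pattern)

# Stand-in for class-to-technique information 124.
CLASS_TO_TECHNIQUE: Dict[str, Callable[[List[str]], str]] = {
    "CLASS_A": match_simple_sequence,
    "CLASS_B": match_general_regex,
}

def classify(pattern: List[str]) -> str:
    """Assumed rule: any operator character in a symbol means Class B."""
    operators = set("*+?|")
    has_operator = any(ch in operators for sym in pattern for ch in sym)
    return "CLASS_B" if has_operator else "CLASS_A"

def select_technique(pattern: List[str]) -> Callable[[List[str]], str]:
    """Determine the class, then look up the technique for that class."""
    return CLASS_TO_TECHNIQUE[classify(pattern)]

technique = select_technique(["A", "B", "C", "A", "B", "D"])
```

Keeping the mapping in a table rather than in branching logic mirrors the description of information 124 as data supplied to the module, so classes and techniques can be added without changing the lookup code.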
[0071] In one embodiment, the pattern matching process comprises
constructing a finite state automaton (FSA) for a given pattern and
then using the constructed FSA to guide the pattern matching
process during runtime as events are received. Automaton generator
114 is configured to parse the input regular expression received
via interface 112 and build an automaton for the pattern to be
matched. One or more automata constructed by generator 114 may be
stored as automata information 122. The automaton generated for a
pattern is then used in runtime by matcher 116 to guide detection
of the pattern in event streams 104, 106, and 108.
[0072] As previously indicated, the pattern matching process may be
customized for certain classes of patterns. In one embodiment,
automaton generator 114 may receive information from
class-technique module 113 identifying the class of the pattern and
the technique to be used for performing the pattern matching for
the identified class of pattern. Automaton generator 114 may then
generate an automaton using the selected pattern matching
technique.
[0073] Matcher 116 is configured to process the events received in
the event streams during runtime to detect occurrences of the
specified pattern in the incoming event streams. Matcher 116 uses
the automaton generated by automaton generator 114 to guide the
pattern matching process. For each event stream, the automaton is
used as a guide to indicate how much of the specified pattern is
matched by the events received in the event stream at any point in
time. In one embodiment, bindings are maintained by matcher 116
after each event in an event stream is processed to capture the
state of partial or full matches of the pattern at any point in
time. A binding is like an assignment of contiguous events (and in
the case of PARTITIONS, contiguous within the PARTITION) to one or
more correlation names that corresponds to a partial (or possibly
full) match that satisfies all the DEFINE predicates associated
with the pattern. A binding indicates the degree to which a
pattern is matched as a result of the last received event. Bindings
stored after receiving an event may indicate partial matches that
have the potential of becoming longer matches or full matches. If a
particular pattern matching technique has been selected, matcher
116 may perform the processing according to the selected
technique.
[0074] Matcher 116 may be configured to take one or more actions
when a particular pattern is matched or detected in an event
stream. For example, when a pattern is matched, matcher 116 may
send a signal indicating that the pattern has been matched. The
signal may be forwarded to one or more components of events
processing server 102 or some other system for further processing.
In one embodiment, the action may include outputting the events
that resulted in the pattern being matched.
[0075] System 100 depicted in FIG. 1 is an example of a system
which may incorporate an embodiment of the present invention.
Various other embodiments and variations are possible. Similarly,
the various modules depicted in FIG. 1 are shown as examples and
are not intended to limit the scope of the present invention. In
alternative embodiments, more or fewer modules may be present. The
various modules depicted in FIG. 1 may be implemented in software
(e.g., code, program, instructions) executed by a processor,
hardware, or combinations thereof. For example, in some
embodiments, a separate class-technique determinator module 113 may
not be provided. In such embodiments, the processing performed by
module 113 may instead be performed by automaton generator 114 and
matcher 116. In one such embodiment, automaton generator 114 may be
configured to automatically determine a pattern class or type for
the pattern specified in pattern information 118 and build an
automaton. Matcher 116 may be configured to determine a pattern
matching technique to be used for the determined pattern and then
apply the determined technique during runtime processing of events
received in an event stream.
[0076] FIG. 3 is a simplified flowchart 300 depicting a method of
performing pattern matching on an event stream according to an
embodiment of the present invention. In one embodiment, the method
depicted in FIG. 3 is performed by pattern matching module 110
depicted in FIG. 1. The processing depicted in FIG. 3 may be
performed by software (e.g., code, program, instructions) executed
by a processor, in hardware, or combinations thereof. The software
may be stored in a computer-readable storage medium. The method
depicted in FIG. 3 may be applied to multiple event streams.
[0077] As depicted in FIG. 3, processing is initiated upon
receiving information identifying a pattern to be matched (step
302). In one embodiment, the information received in 302 comprises
a regular expression specifying the pattern to be matched. For
example, a query may be received in 302 specifying a regular
expression identifying a pattern to be matched. The information
received in 302 may also identify the event streams that are to be
analyzed to determine if events received in the event streams match
the specified pattern. The information received in 302 may also
specify predicates associated with the symbols in the regular
expression.
[0078] An automaton is then constructed for the pattern received in
302 (step 304). The automaton generated in 304 may be a finite
state automaton (FSA).
[0079] The automaton constructed in 304 is then used during runtime
to guide the pattern matching process to detect presence of the
specified pattern in the specified event streams (step 306). As
part of the processing, the event streams to be analyzed are passed
through a state machine corresponding to the automaton generated in
304. As part of the processing in 306, bindings are maintained
after each event received in an event stream has been analyzed to
store the state of pattern matches, including partial matches that
have the potential to turn into full matches, after processing the
received event. As previously described, a binding is used to
encapsulate a full or partial pattern match and maintains
references to the received events of the stream that comprise the
full or partial matched pattern.
[0080] One or more actions may be performed upon detecting a
pattern match in an input event stream being analyzed (step 308).
The actions performed may include sending a signal indicating a
match, outputting the events in the event stream that resulted in
the pattern match, and other actions.
[0081] Steps 302 and 304 typically represent design time or compile
time activities that are performed before the pattern matching
analysis may be performed. Steps 306 and 308 represent runtime
activities that are performed in real time as events in an event
stream are received and processed.
[0082] As indicated above, in one embodiment of the present
invention, the type or class of the pattern to be matched is
determined and then used to customize the pattern matching
processing. FIG. 4 is a simplified flowchart 400 depicting a method
of performing pattern matching on an event stream based upon the
type of the pattern according to an embodiment of the present
invention. In one embodiment, the method depicted in FIG. 4 is
performed by pattern matching module 110 depicted in FIG. 1. The
processing depicted in FIG. 4 may be performed by software (e.g.,
code, program, instructions) executed by a processor, in hardware,
or combinations thereof. The software may be stored in a
computer-readable storage medium.
[0083] As depicted in FIG. 4, processing is initiated upon
receiving information identifying a pattern to be matched (step
402). The information received in 402 may comprise a regular
expression identifying the pattern to be detected in an event
stream and information identifying predicates associated with the
one or more symbols in the regular expression.
[0084] Processing is then performed to determine a type or class of
pattern for the pattern received in 402 (step 404). In one
embodiment, the class of pattern may be determined based upon the
regular expression received in 402 and/or the predicates defined
for the symbols in the regular expression. For example, the type or
class of pattern may be determined based upon the contents of the
PATTERN and DEFINE clauses. Accordingly, in one embodiment, as part
of the processing performed in 404, the information received in 402
is parsed to determine the contents of the PATTERN and DEFINE
clauses. A type or class is then determined based upon analysis of
the extracted contents.
[0085] In one embodiment, preconfigured information identifying
different types of patterns and their associated characteristics
may be used to facilitate the pattern type identification in 404.
For example, as depicted in FIG. 1, pattern type information 120
may be used to facilitate determination of the class or
type. In one embodiment, pattern type information 120 may identify
different types or classes of patterns and information specifying
characteristics of each type or class.
[0086] A technique to be used for performing the pattern matching
processing is then determined based upon the pattern class or type
determined in 404 (step 406). In one embodiment, preconfigured
information identifying different types of patterns and techniques
to be used for each class may be used to facilitate identification
of the technique in 406. For example, as depicted in FIG. 1,
class-to-technique information 124 may be used to facilitate
determination of a technique to be used based upon the class or
type determined in 404.
[0087] Pattern matching processing is then performed by applying
the technique determined in 406 (step 408). The processing
performed in 408 may include constructing an automaton for the
pattern received in 402. The automaton generation may be performed
according to the technique determined in 406. Accordingly, the
automaton generated in 408 may be customized for the particular
class or type determined in 404.
[0088] Further, as part of the processing performed in 408, the
automaton that is constructed may then be used during runtime to
guide the pattern matching process to detect presence of the
specified pattern in a specified event stream. The pattern
detection may be performed per the technique determined in 406. In
this manner, a pattern matching technique that is customized for or
well suited for the type or class determined in 404 is used in
408.
[0089] As part of the processing performed in 408, the events
received in an event stream are processed and passed through a
state machine corresponding to the automaton generated in 408. As
part of the processing in 408, bindings are maintained after each
received event to represent the state of pattern matches including
partial matches that have the potential to turn into full
matches.
[0090] One or more actions may be performed upon detecting a full
pattern match in the input event stream (step 410). The actions
performed may include sending a signal indicating a match,
outputting event instances that resulted in the full pattern match,
and other actions.
[0091] Steps 402, 404, and 406 represent design time or compile
time activities that are performed before the runtime pattern
matching may be performed. Steps 408 and 410 represent runtime
activities that are performed in real time as events in an event
stream are received and processed.
[0092] As described above, an automaton such as a finite state
automaton (FSA) is generated for a pattern to be matched prior to
runtime processing. For example, an automaton is generated for the
pattern corresponding to the regular expression (A B C A B D)
depicted in FIG. 2. In one embodiment, the automaton generated for
the example in FIG. 2 has seven states including a start state Q0
and one state for each symbol position in the pattern with state Qi
corresponding to pattern symbol position Pi. Since there are six
symbol positions in the pattern (A B C A B D), the seven states for
this pattern are Q0 (initial state), Q1 (state representing partial
match of the 1st symbol "A"), Q2 (state representing partial
match of the 1st and 2nd symbols "AB"), Q3 (state
representing partial match of the 1st, 2nd, and 3rd
symbols "ABC"), Q4 (state representing partial match of the
1st, 2nd, 3rd, and 4th symbols "ABCA"), Q5
(state representing partial match of the 1st, 2nd,
3rd, 4th, and 5th symbols "ABCAB"), and Q6 (final
state representing full match of pattern "ABCABD"). The alphabet
for the pattern is {A, B, C, D}. An extra symbol may be added to
represent an event that does not match any of the specified symbols
in the pattern. In this example, this extra symbol may be
represented by letter R. Hence, the alphabet for the above pattern
depicted in FIG. 2 is the set {A, B, C, D, R}.
[0093] Table A (shown below) depicts a state transition function
table created for the FSA generated for the pattern identified in
FIG. 2 according to an embodiment of the present invention.
TABLE A
State   Alphabet(s)    Next State
Q0      A              Q1
Q0      B, C, D, R     Q0
Q1      B              Q2
Q1      A, C, D, R     Q0
Q2      C              Q3
Q2      A, B, D, R     Q0
Q3      A              Q4
Q3      B, C, D, R     Q0
Q4      B              Q5
Q4      A, C, D, R     Q0
Q5      D              Q6
Q5      A, B, C, R     Q0
[0094] In Table A, the first column shows an initial state. The
third column of the table identifies a state to which a transition
is made from the initial state upon receiving an event that matches
the symbols identified in the second column of the table. For
example, as shown by the first two rows of Table A, the FSA starts
in state Q0. If an event received in the event stream causes the
first symbol A in the pattern to be matched, then the FSA
transitions from state Q0 to state Q1. However, if any other symbol
(e.g., B, C, D, or R) is matched by the received event, then the
FSA continues to be in state Q0. Likewise, when in state Q1, if the
next event causes symbol B to be matched, then the FSA transitions
to state Q2 and if the event matches A, C, D, or R, then the
state reverts to state Q0. In this manner, Table A specifies the
automaton for the pattern identified by regular expression (A B C A
B D).
[0095] The general idea for the FSA is to have one state per prefix
of the pattern to be matched. There is a forward transition from a
state Qi only corresponding to the matching of the symbol that when
concatenated with the prefix associated with state Qi produces the
prefix associated with the state Qi+1. For all other symbols the
transition is to state Q0. For example consider state Q5. The
prefix corresponding to this state is ABCAB. If the next event in
the event stream matches the symbol D, the FSA machine will
transition to state Q6 since the prefix associated with state Q6 is
ABCABD which is the concatenation of ABCAB (the prefix associated
with state Q5) and the matched symbol D. On the other hand, if the
next event in the event stream matches the symbol C, the FSA state
machine will transition to state Q0.
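The construction described above may be sketched as follows. This is a minimal illustration that assumes a simple sequence pattern with no operators; the function name build_transitions and the dictionary representation are hypothetical, and "R" is used as the extra symbol for events that match none of the pattern symbols.

```python
def build_transitions(pattern, other="R"):
    """Return {(state_index, symbol): next_state_index} for a simple pattern.

    State i corresponds to the prefix pattern[:i]. The only forward
    transition out of state i is on the symbol pattern[i]; every other
    symbol leads back to state 0. The final state has no outgoing entries.
    """
    alphabet = sorted(set(pattern)) + [other]
    table = {}
    for i in range(len(pattern)):
        for sym in alphabet:
            table[(i, sym)] = i + 1 if sym == pattern[i] else 0
    return table

table = build_transitions(list("ABCABD"))

# Print the rows in the same order as a state transition table.
for (state, sym), nxt in sorted(table.items()):
    print(f"Q{state}  {sym}  Q{nxt}")
```

For the pattern (A B C A B D) this yields 30 entries (six non-final states times five alphabet symbols), which group into the twelve rows of Table A.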
[0096] The FSA generated for the pattern identified by (A B C A B
D) is then used at runtime to guide the detection of the specified
pattern in an event stream. For purposes of illustrating how the
FSA of Table A may be applied to an event stream, it is assumed
that the event stream comprises events as shown below in Table B
and are received in the sequence depicted in Table B. The event
stream may be for example a ticker event stream comprising the
price of a stock.
TABLE B
Seq #              0    1    2    3    4    5    6    7
Price              36   35   35   34   32   32   31   45
Matching Symbols   A    A    A    A    A    A    A    D
                        B    C    B    B    C    B
                        C         C    C         C
As shown in Table B, events are received in sequence (as determined
by the time stamp associated with each event) and have price
attributes: 36, 35, 35, 34, 32, 32, 31, 45. The third row in Table
B depicts, for each sequence time point, the symbols of the pattern
that are matched by the price attribute of the event received at
that time point. For example, at sequence #0, an event is received
having a price attribute of 36 that results in symbol A being
matched since 30<=36<=40, satisfying the predicate
associated with symbol A. Price 35 received at sequence #1 causes
the following matches: A (since 30<=35<=40), B (since
35<36 (the previous price)), and C (since 35<=36 (the
previous price)). Likewise, price 35 received at seq #2 results in
the following matches: A (since 30<=35<=40) and C (since
35<=35). Price 34 received at seq #3 results in the following
matches: A (since 30<=34<=40), B (since 34<35), and C (since
34<=35). Price 32 received at seq #4 results in the following
matches: A (since 30<=32<=40), B (since 32<34), and C
(since 32<=34). Price 32 received at seq #5 results in the
following matches: A (since 30<=32<=40) and C (since
32<=32). Price 31 received at seq #6 results in the following
matches: A (since 30<=31<=40), B (since 31<32), and C
(since 31<=32). Price 45 received at seq #7 results in the
following matches: only D (since 45>31).
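The symbol matches listed above may be reproduced with a short sketch. The predicates used here are inferred from the comparisons in the worked example (A: 30<=price<=40; B: price<previous price; C: price<=previous price; D: price>previous price) and are an assumption of this sketch, as is the choice to treat predicates that refer to the previous price as unsatisfied for the first event.

```python
def matched_symbols(price, prev):
    """Return the symbols whose (assumed) predicates this event satisfies."""
    symbols = []
    if 30 <= price <= 40:
        symbols.append("A")
    if prev is not None:          # predicates that need a previous price
        if price < prev:
            symbols.append("B")
        if price <= prev:
            symbols.append("C")
        if price > prev:
            symbols.append("D")
    return symbols

prices = [36, 35, 35, 34, 32, 32, 31, 45]
rows = []
prev = None
for price in prices:
    rows.append(matched_symbols(price, prev))
    prev = price

for seq, (price, symbols) in enumerate(zip(prices, rows)):
    print(seq, price, symbols)
```

Note that a single event may satisfy several predicates at once (for example, the event at seq #1 matches A, B, and C), which is the property that distinguishes event stream matching from character string matching.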
[0097] Table C shows the state of the FSA (of Table A) after
receipt of each event in the event stream according to the sequence
indicated in Table B.
TABLE C
Seq #  Price  Matched Symbol  State of FSA            Stored Bindings
                              {Q0}                    Q0: (*, *, *, *, *, *)
0      36     A               {Q0, Q1}                Q0: (*, *, *, *, *, *)
                                                      Q1: (0, *, *, *, *, *)
1      35     A, B, C         {Q0, Q1, Q2}            Q0: (*, *, *, *, *, *)
                                                      Q1: (1, *, *, *, *, *)
                                                      Q2: (0, 1, *, *, *, *)
2      35     A, C            {Q0, Q1, Q3}            Q0: (*, *, *, *, *, *)
                                                      Q1: (2, *, *, *, *, *)
                                                      Q3: (0, 1, 2, *, *, *)
3      34     A, B, C         {Q0, Q1, Q2, Q4}        Q0: (*, *, *, *, *, *)
                                                      Q1: (3, *, *, *, *, *)
                                                      Q2: (2, 3, *, *, *, *)
                                                      Q4: (0, 1, 2, 3, *, *)
4      32     A, B, C         {Q0, Q1, Q2, Q3, Q5}    Q0: (*, *, *, *, *, *)
                                                      Q1: (4, *, *, *, *, *)
                                                      Q2: (3, 4, *, *, *, *)
                                                      Q3: (2, 3, 4, *, *, *)
                                                      Q5: (0, 1, 2, 3, 4, *)
5      32     A, C            {Q0, Q1, Q3, Q4}        Q0: (*, *, *, *, *, *)
                                                      Q1: (5, *, *, *, *, *)
                                                      Q3: (3, 4, 5, *, *, *)
                                                      Q4: (2, 3, 4, 5, *, *)
6      31     A, B, C         {Q0, Q1, Q2, Q4, Q5}    Q0: (*, *, *, *, *, *)
                                                      Q1: (6, *, *, *, *, *)
                                                      Q2: (5, 6, *, *, *, *)
                                                      Q4: (3, 4, 5, 6, *, *)
                                                      Q5: (2, 3, 4, 5, 6, *)
7      45     D               {Q0, Q6}                Q0: (*, *, *, *, *, *)
                                                      Q6: (2, 3, 4, 5, 6, 7)
[0098] The first column of Table C "Seq #" identifies the sequence
number indicating the sequence time point at which an event is
received. The second column "Price" indicates, for each sequence,
the value of the price attribute of the event received in the event
stream in that sequence point. The third column "Matched Symbol"
identifies, for each event, the symbol or correlation name(s) that
is matched by the event received at the sequence. Zero or more
symbols may be matched by an event. The fourth column "State of
FSA" identifies, for each sequence, the different states in which
the FSA may be after processing the event received in that
sequence. The fifth column "Stored Bindings", for each sequence
time point, indicates the bindings that are stored for a sequence
time point after processing an event received at that sequence time
point. Each binding identifies a partial or full match of the
pattern to be matched. In Table C, each binding identifies a state
representing a partial or full match after processing an event.
Each binding also identifies the events that cause the binding to
be in that state. For example, a binding Q2: (2,3,*,*,*,*)
represents a partial match (of the first two symbols) of the
pattern being matched and corresponds to the FSA being in state Q2
due to prices associated with events received in seq #2 and seq #3.
As another example, a binding Q3: (3,4,5,*,*,*) indicates that the
binding corresponds to the FSA being in state Q3 due to a partial
match due to prices associated with events received in sequences
#3, #4, and #5. A Q0: (*,*,*,*,*,*) indicates a binding
corresponding to the Q0 state, which is the starting state, and
represents no match of the pattern being matched.
[0099] Bindings stored after processing an event encapsulate
partial or full matches. A binding indicates the degree to which a
pattern is matched as a result of the last received event. Bindings
stored after receiving an event may indicate partial matches that
have the potential of becoming longer matches or full matches. They
contain the mapping information between a symbol and the event from
the stream. At any point in time, for the pattern identified by
regular expression depicted in FIG. 2, one instance of a binding is
maintained per state that the machine is in. For example, after seq
#3, four bindings are maintained, one binding for each of states
Q0, Q1, Q2, and Q4. This follows from the construction of the FSA.
Each state of the FSA corresponds to a unique prefix of the
pattern. For the simple pattern (A B C A B D), the length of the
prefix associated with each state is fixed. The number of elements
in the binding associated with a state that are not `*`, i.e., they
are valid event associations, is equal to the length of the pattern
prefix associated with the state. The set of valid event
associations in a binding are always contiguous events of the event
stream and are the last k events of the event stream, where k is
the length of the pattern prefix associated with the state.
Accordingly, exactly one instance of a binding is maintained per
state that the FSA machine is in after receiving each event. As
will be discussed below in further detail, for certain types of
patterns, like the pattern depicted in FIG. 2, the number of
bindings at any point in time is bounded from above by the number of
possible states of the FSA, which is one plus the number of symbols
in the regular expression specifying the pattern. Accordingly, for
the regular expression (A B C A B D), the maximum number of
bindings that are maintained at any time is 6+1=7. A binding can be
thought of as a vector of length m, where m is the number of symbol
positions in the pattern, with position i of the vector
corresponding to the symbol Pi of the pattern. The entry at
position i is the sequence number of the event in the event stream
that is bound to that position.
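Because the valid entries of a binding for a simple pattern are always the last k events of the stream, a binding can be reconstructed from the state index and the sequence number of the most recent event alone. The following sketch illustrates this property and the upper bound on the number of bindings; the function names and the use of "*" as a placeholder string are assumptions made for illustration.

```python
STAR = "*"

def binding_for(state, last_seq, m=6):
    """Binding vector for state Q<state> after the event at last_seq.

    The first `state` positions hold the last `state` sequence numbers
    (contiguous, ending at last_seq); the remaining positions are unbound.
    """
    first = last_seq - state + 1
    bound = list(range(first, last_seq + 1))
    return tuple(bound + [STAR] * (m - state))

def max_bindings(pattern_length):
    """One binding per FSA state: one state per symbol plus the start state."""
    return pattern_length + 1

print(binding_for(2, 3))   # state Q2 after seq #3
print(binding_for(3, 5))   # state Q3 after seq #5
print(max_bindings(6))     # bound for the pattern (A B C A B D)
```

This is why, for patterns of this kind, storage for bindings does not grow with the length of the event stream.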
[0100] The processing depicted in Table C may be described as
follows:
[0101] (1) The FSA starts in state Q0.
[0102] (2) At seq #0, an event is received with price 36. This
results in a match with symbol A and causes the FSA to be in two
possible states Q0 and Q1. The binding corresponding to state Q0 is
Q0: (*,*,*,*,*,*) indicating that the binding represents no partial
pattern match. The binding corresponding to state Q1 is Q1:
(0,*,*,*,*,*) indicating that the binding represents a partial
pattern match ("A") due to the event received in seq #0.
[0103] (3) At seq #1, an event is received with price 35. This
results in a match with symbols A, B, and C and causes the FSA to
be in three possible states Q0, Q1, and Q2. The binding
corresponding to state Q0 is Q0: (*,*,*,*,*,*) indicating that the
binding represents no partial pattern match. The binding
corresponding to state Q1 is Q1: (1,*,*,*,*,*) indicating that the
binding represents a partial pattern match ("A") due to the event
received in seq #1. The binding corresponding to state Q2 is Q2:
(0,1,*,*,*,*) indicating that the binding represents a partial
pattern match ("AB") due to the events received in seq #0 and seq
#1.
[0104] (4) At seq #2, an event is received with price 35. This
results in a match with symbols A and C and causes the FSA to be in
three possible states Q0, Q1, and Q3. The binding corresponding to
state Q0 is Q0: (*,*,*,*,*,*) indicating that the binding
represents no pattern match. The binding corresponding to state Q1
is Q1: (2,*,*,*,*,*) indicating that the binding represents a
partial pattern match ("A") due to the event received in seq #2.
The binding corresponding to state Q3 is Q3: (0,1,2,*,*,*)
indicating that the binding represents a partial pattern match
("ABC") due to the events received in seq #0, seq #1, and seq
#2.
[0105] (5) At seq #3, an event is received with price 34. This
results in a match with symbols A, B, and C and causes the FSA to
be in four possible states Q0, Q1, Q2, and Q4. The binding
corresponding to state Q0 is Q0: (*,*,*,*,*,*) indicating that the
binding represents no pattern match. The binding corresponding to
state Q1 is Q1: (3,*,*,*,*,*) indicating that the binding
represents a partial pattern match ("A") due to the event received
in seq #3. The binding corresponding to state Q2 is Q2:
(2,3,*,*,*,*) indicating that the binding represents a partial
pattern match ("AB") due to the events received in seq #2 and seq
#3. The binding corresponding to state Q4 is Q4: (0,1,2,3,*,*)
indicating that the binding represents a partial pattern match
("ABCA") due to the events received in seq #0, seq #1, seq #2, and
seq #3.
[0106] (6) At seq #4, an event is received with price 32. This
results in a match with symbols A, B, and C and causes the FSA to
be in five possible states Q0, Q1, Q2, Q3, and Q5. The binding
corresponding to state Q0 is Q0: (*,*,*,*,*,*) indicating that the
binding represents no pattern match. The binding corresponding to
state Q1 is Q1: (4,*,*,*,*,*) indicating that the binding
represents a partial pattern match ("A") due to the event received
in seq #4. The binding corresponding to state Q2 is Q2:
(3,4,*,*,*,*) indicating that the binding represents a partial
pattern match ("AB") due to the events received in seq #3 and seq
#4. The binding corresponding to state Q3 is Q3: (2,3,4,*,*,*)
indicating that the binding represents a partial pattern match
("ABC") due to the events received in seq #2, seq #3, and seq #4.
The binding corresponding to state Q5 is Q5: (0,1,2,3,4,*)
indicating that the binding represents a partial pattern match
("ABCAB") due to the events received in seq #0, seq #1, seq #2, seq
#3, and seq #4. It should be noted here that binding Q5 is just one
match from a complete pattern match.
[0107] (7) At seq #5, an event is received with price 32. This
results in a match with symbols A and C and causes the FSA to be in
four possible states Q0, Q1, Q3, and Q4. The binding corresponding
to state Q0 is Q0: (*,*,*,*,*,*) indicating that the binding
represents no pattern match. The binding corresponding to state Q1
is Q1: (5,*,*,*,*,*) indicating that the binding represents a
partial pattern match ("A") due to the event received in seq #5.
The binding corresponding to state Q3 is Q3: (3,4,5,*,*,*)
indicating that the binding represents a partial pattern match
("ABC") due to the events received in seq #3, seq #4, and seq #5.
The binding corresponding to state Q4 is Q4: (2,3,4,5,*,*)
indicating that the binding represents a partial pattern match
("ABCA") due to the events received in seq #2, seq #3, seq #4, and
seq #5.
[0108] (8) At seq #6, an event is received with price 31. This
results in a match with symbols A, B, and C and causes the FSA to
be in five possible states Q0, Q1, Q2, Q4, and Q5. The binding
corresponding to state Q0 is Q0: (*,*,*,*,*,*) indicating that the
binding represents no pattern match. The binding corresponding to
state Q1 is Q1: (6,*,*,*,*,*) indicating that the binding
represents a partial pattern match ("A") due to the event received
in seq #6. The binding corresponding to state Q2 is Q2:
(5,6,*,*,*,*) indicating that the binding represents a partial
pattern match ("AB") due to the events received in seq #5 and seq
#6. The binding corresponding to state Q4 is Q4: (3,4,5,6,*,*)
indicating that the binding represents a partial pattern match
("ABCA") due to the events received in seq #3, seq #4, seq #5, and
seq #6. The binding corresponding to state Q5 is Q5: (2,3,4,5,6,*)
indicating that the binding represents a partial pattern match
("ABCAB") due to the events received in seq #2, seq #3, seq #4, seq
#5, and seq #6.
[0109] (9) At seq #7, an event is received with price 45. This results
in a match with symbol D and causes the FSA to be in two possible
states Q0 and Q6. The binding corresponding to state Q0 is Q0:
(*,*,*,*,*,*) indicating that the binding represents no partial
pattern match. The binding corresponding to state Q6 is Q6:
(2,3,4,5,6,7) indicating that the binding represents a full pattern
match ("ABCABD") due to the events received in seq #2, seq #3, seq
#4, seq #5, seq #6, and seq #7.
[0110] State Q6, which is reached after the event in seq #7, is
the final state of the FSA and represents a full pattern
match. In the above example, the full pattern is matched due to
events received at sequence numbers 2, 3, 4, 5, 6, and 7. The events
received at these sequence numbers are italicized in Table B and in
Table C, and their corresponding states that resulted in a full match
have been underlined. One or more actions may be initiated after the
pattern match. The actions may include outputting the events that
resulted in the final state. For the pattern indicated in FIG. 2,
after the final state is reached, the state of the FSA machine is
set back to {Q0}, the initial state. This is done since the length
of the matching bindings at the final state is always fixed.
[0111] After a match is found, the resumption point of the pattern
matching is determined based upon the AFTER MATCH SKIP TO clause
(shown in FIG. 2) in the query. With reference to a match contained
in another full match, the AFTER MATCH SKIP TO clause determines if
overlapping matches are to be reported. By default, the AFTER MATCH
SKIP TO clause is set to PAST LAST ROW, which indicates that
once a match is obtained, overlapping partial matches are discarded
and pattern matching is resumed anew. For details on the AFTER
MATCH SKIP TO clause, please refer to Fred Zemke et al., "Pattern
Matching in Sequence of Rows (12)," ISO/IEC JTC1/SC32 WG3:URC-nnn,
ANSI NCITS H2-2006-nnn, Jul. 31, 2007.
[0112] An interesting thing to note in the above example is the set
of transitions that occur on processing the event at seq #5. After
the event received at seq #4, the FSA is one event away from a full
pattern match. The event received at seq #5 does not
complete a full pattern match. Nevertheless, binding Q3: (2,3,4,*,*,*),
representing a partial match ("ABC") after seq #4, progresses to
binding Q4: (2,3,4,5,*,*) after seq #5, representing a partial
match ("ABCA").
[0113] As described above, pattern matching after receiving an
event is performed based upon the received event and events
received prior to the received event. The bindings that are stored
after processing of an event enable the pattern matching to be
performed without backtracking or re-scanning of the received
events. In one embodiment, an event is processed upon its receipt
by server 102. After an event is received and processed, the extent
to which the pattern specified by the regular expression is matched
based upon the most recently received event and one or more events
received prior to the most recently received event is determined.
The bindings stored after receipt and processing of an event
encapsulate the extent of the matches. The bindings stored after
the receipt of the last received event are then used to determine
pattern matches after receipt of the next event. As a result of the
use of bindings, the one or more events that have been received
prior to the most recently received event do not have to be
processed again for the pattern matching processing. In this
manner, for purposes of pattern matching, an event is processed
only once upon receipt of the event by server 102. After an event
has been received and processed, the event does not have to be
processed again as more events are received by server 102. The
bindings after a sequence store the match information that is used
for evaluating the DEFINE predicates and evaluating the MEASURES
clause on obtaining a full match. In this manner, backtracking of
events is not performed for the pattern matching according to an
embodiment of the present invention.
[0114] As another example, consider the sequence of events depicted
in Table D and the matched symbols:
TABLE D

| Seq No.          | 0  | 1    | 2  | 3    | 4    | 5  | 6    | 7    | 8  |
| Price            | 36 | 25   | 25 | 34   | 25   | 25 | 31   | 25   | 45 |
| Matching Symbols | A  | B, C | C  | A, D | B, C | C  | A, D | B, C | D  |
[0115] Table E shows the state of the FSA (of Table A) after
receipt of each event in an event stream according to the sequence
indicated in Table D.
TABLE E

| Seq # | Price | Matched Symbol | State of FSA | Stored Bindings |
|       |       |                | {Q0}         | Q0: (*, *, *, *, *, *) |
| 0     | 36    | A              | {Q0, Q1}     | Q0: (*, *, *, *, *, *); Q1: (0, *, *, *, *, *) |
| 1     | 25    | B, C           | {Q0, Q2}     | Q0: (*, *, *, *, *, *); Q2: (0, 1, *, *, *, *) |
| 2     | 25    | C              | {Q0, Q3}     | Q0: (*, *, *, *, *, *); Q3: (0, 1, 2, *, *, *) |
| 3     | 34    | A, D           | {Q0, Q1, Q4} | Q0: (*, *, *, *, *, *); Q1: (3, *, *, *, *, *); Q4: (0, 1, 2, 3, *, *) |
| 4     | 25    | B, C           | {Q0, Q2, Q5} | Q0: (*, *, *, *, *, *); Q2: (3, 4, *, *, *, *); Q5: (0, 1, 2, 3, 4, *) |
| 5     | 25    | C              | {Q0, Q3}     | Q0: (*, *, *, *, *, *); Q3: (3, 4, 5, *, *, *) |
| 6     | 31    | A, D           | {Q0, Q1, Q4} | Q0: (*, *, *, *, *, *); Q1: (6, *, *, *, *, *); Q4: (3, 4, 5, 6, *, *) |
| 7     | 25    | B, C           | {Q0, Q2, Q5} | Q0: (*, *, *, *, *, *); Q2: (6, 7, *, *, *, *); Q5: (3, 4, 5, 6, 7, *) |
| 8     | 45    | D              | {Q0, Q6}     | Q0: (*, *, *, *, *, *); Q6: (3, 4, 5, 6, 7, 8) |
[0116] As depicted in Table E, the final state Q6 is reached due to
events received at sequence numbers 3, 4, 5, 6, 7, and 8. The
events resulting in a full match are italicized in the second
column and the corresponding symbols that result in the match are
underlined in the third column.
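The walk-through in Table E can be reproduced with a short simulation. The following Python sketch is illustrative only and is not the claimed implementation: it assumes the matched symbols of Table D are already known for each event (the DEFINE predicates of FIG. 2 are not evaluated), and the names PATTERN, TABLE_D, and match_stream are hypothetical. Each state Qi is represented by the integer i and stores at most one binding, namely the sequence numbers of the last i events.

```python
# Fixed-length pattern from the example; state i means "first i symbols matched".
PATTERN = "ABCABD"

# (sequence number, matched symbols) pairs taken from Table D.
TABLE_D = [(0, "A"), (1, "BC"), (2, "C"), (3, "AD"), (4, "BC"),
           (5, "C"), (6, "AD"), (7, "BC"), (8, "D")]

def match_stream(events):
    """Return the bindings of all full matches, processing each event once."""
    bindings = {0: ()}          # Q0 is always present with an empty binding
    matches = []
    for seq, symbols in events:
        advanced = {0: ()}
        for state, bound in bindings.items():
            # Advance a partial match when the event matches the next symbol.
            if PATTERN[state] in symbols:
                advanced[state + 1] = bound + (seq,)
        if len(PATTERN) in advanced:
            # Final state reached: report the match and reset to {Q0}.
            matches.append(advanced[len(PATTERN)])
            advanced = {0: ()}
        bindings = advanced
    return matches
```

Calling match_stream(TABLE_D) reports the single binding made of sequence numbers 3 through 8, in line with the last row of Table E. No earlier event is revisited, since only the stored bindings are extended.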
Class A Patterns
[0117] As indicated above, the technique used for performing
pattern matching may be different for different types or classes of
patterns. This section describes a pattern matching technique used
for a specific simplified pattern referred to as a Class A pattern.
The customized processing described in this section may be applied
for detecting Class A patterns in one or more event streams.
[0118] In one embodiment, a Class A pattern is defined as follows:
[0119] Let the pattern to be matched be P = (P_1, P_2, ..., P_m),
where m >= 1.
[0120] A Class A pattern is one where each P_i is exactly one of the
following:
[0121] C_i: a symbol without any quantifier
[0122] C_i*: a symbol followed by a greedy * quantifier, indicating
zero or more occurrences of C_i
[0123] C_i+: a symbol followed by a greedy + quantifier, indicating
one or more occurrences of C_i
[0124] C_i?: a symbol followed by a greedy ? quantifier, indicating
zero or one occurrence of C_i
[0125] Further, for a Class A pattern, the predicate defined for a
symbol cannot be dependent on any other symbols. In other words,
the predicate for a symbol has to be independent of other symbols.
Accordingly, a predicate defined for a symbol in a Class A pattern
does not include other symbols. For example, the predicate for
symbol A cannot be (A.price<B.price),
since that predicate is dependent on symbol B. Also,
aggregation operators (e.g., sum) over the same symbol are not
permitted in the DEFINE clause for a Class A pattern. Further, for
Class A patterns, only the regular expression concatenation
operator is allowed to link the symbols in the regular expression,
as shown above. Other regular expression operators such as
ALTERNATION (or |) and GROUPING are not permitted.
[0126] As described above, in order for a pattern specified in a
query to be classified as a Class A pattern, the pattern and the
predicates associated with symbols in the pattern have to satisfy
certain restrictions described above. Accordingly, as part of
determining a pattern type for a pattern specified in a query,
pattern matching module 110 is configured to extract the regular
expressions and predicates from the query and determine if the
various characteristics (limitations) of a Class A pattern are
satisfied. The pattern is classified as a Class A pattern only if
the restrictions are satisfied. This processing may be performed,
for example, in step 404 depicted in FIG. 4.
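As a rough illustration of this classification step, the following Python sketch checks the structural restrictions listed above. It is a simplification under stated assumptions: the pattern is given as a whitespace-separated string, predicate_refs is a hypothetical mapping from each symbol to the set of symbols its predicate refers to, and the check for aggregation operators is omitted. The function name is_class_a is also hypothetical.

```python
import re

# One pattern element: a symbol name followed by at most one greedy quantifier.
ELEMENT = re.compile(r"[A-Za-z_]\w*[*+?]?")

def is_class_a(pattern, predicate_refs):
    """Return True if the pattern and predicates satisfy the Class A restrictions.

    pattern        -- e.g. "A B* C" (concatenation only, whitespace separated)
    predicate_refs -- e.g. {"A": {"A"}, "B": {"B"}}: symbols each predicate uses
    """
    elements = pattern.split()
    if not elements:                      # m >= 1 is required
        return False
    # Alternation, grouping, and reluctant quantifiers fail this test.
    if not all(ELEMENT.fullmatch(e) for e in elements):
        return False
    # A predicate may refer only to its own symbol.
    return all(refs <= {sym} for sym, refs in predicate_refs.items())
```

For instance, the pattern "A B* C" with independent predicates passes, while a predicate for A that refers to B, or a pattern using | or parentheses, is rejected.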
[0127] Without loss of generality, it can be assumed that each
C_i is distinct (although this is not a requirement for Class A
patterns). It can be shown that the other cases, where the
C_i's are not distinct, can be reduced to an instance of the
case where the C_i's are distinct. This can be done by, for
each repetition, replacing the repeated symbol with a new symbol
defined using the same predicate.
[0128] The example query 500 depicted in FIG. 5 depicts an example
of a Class A pattern and will be used to illustrate detection of
Class A patterns in an event stream according to an embodiment of
the present invention. Query 500 comprises a regular expression 502
(AB*C) specifying the pattern to be detected. The pattern specified
by regular expression 502 is an example of a recurring pattern due
to the B* portion since it specifies zero or more recurrences of B.
The predicates for the symbols A, B, and C are defined by DEFINE
clause 504. As can be seen in FIG. 5, the predicate for each symbol
is defined such that it is not dependent on any other symbol. In
other words, a predicate for a symbol does not include other
symbols. For example, the predicate for A does not include B, or
C.
[0129] Pattern matching module 110 is configured to analyze query
500 as specifying a pattern that is a Class A pattern. Upon
recognizing a pattern as a Class A pattern, pattern matching module
110 is configured to determine and apply a technique that is
specified for Class A pattern processing. In one embodiment,
according to the selected technique, a finite state automaton (FSA)
is created for the pattern. The following definitions are
introduced to formally describe the structure of the automaton.
[0130] Let Σ (alphabet) = {C_i | 1 ≤ i ≤ m}
[0131] Let e_i denote the i-th event of the input event stream
[0132] Let C(i) ⊆ Σ be defined as C(i) = {C_j | e_i satisfies the
predicate defining correlation name C_j}
[0133] FOLLOW(i), for 0 ≤ i ≤ m, is defined as follows:
[0134] FOLLOW(m) = {$}, where $ is a special symbol
[0135] For 0 ≤ i ≤ m-1,
$$\mathrm{FOLLOW}(i) = \begin{cases} \mathrm{FOLLOW}(i+1) \cup \{C_{i+1}\} & \text{if } P_{i+1} = C_{i+1}^{*} \text{ or } P_{i+1} = C_{i+1}? \\ \{C_{i+1}\} & \text{otherwise} \end{cases}$$
[0136] The automaton corresponding to the pattern, M(P) = (Q,
Σ, δ, Q0, F), is defined as follows:
Q = {Q0} ∪ {Qi | 1 ≤ i ≤ m}.
Intuitively, there is one initial state and then one state per
pattern symbol (or per correlation name since it is assumed without
loss of generality that the symbols are distinct).
[0137] Σ is the alphabet given by Σ = {C_i | 1 ≤ i ≤ m}
[0138] δ is the state transition function (defined in detail
below)
[0139] Q0 is the initial state
[0140] F is the set of final states; F = {Qi | Qi ∈ Q and
FOLLOW(i) contains $}
[0141] The state transition function for a state Qi, for
0 ≤ i ≤ m, is defined as follows:
$$\delta(Q_i, C_j) = \begin{cases} Q_j & \text{where } j > i \text{ and } C_j \in \mathrm{FOLLOW}(i) - \{\$\} \\ Q_i & \text{if } j = i \text{ and } (P_i = C_i^{*} \text{ or } P_i = C_i^{+}) \\ Q_0 & \text{otherwise} \end{cases}$$
[0142] Applying the above construction technique to the example
depicted in FIG. 5 yields the following automaton:
FOLLOW(3)={$}
FOLLOW(2)={C}
FOLLOW(1)={C, B}
FOLLOW(0)={A}
[0143] Table F depicts the state transition for the automaton
constructed for the pattern specified in FIG. 5.
TABLE F

| State | Alphabet | Next State |
| Q0    | A        | Q1         |
| Q1    | B        | Q2         |
| Q1    | C        | Q3         |
| Q2    | C        | Q3         |
| Q2    | B        | Q2         |

All other transitions lead to the next state of Q0.
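The construction of FOLLOW(i), δ, and F can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the claimed implementation: symbols are assumed distinct, the pattern is given as a whitespace-separated string, states are represented by their indices 0..m, transitions absent from the returned dictionary are understood to lead to Q0, and the function names parse, follow_sets, and build_delta are hypothetical.

```python
def parse(pattern):
    """'A B* C' -> [('A', ''), ('B', '*'), ('C', '')] (symbol, quantifier)."""
    elems = []
    for tok in pattern.split():
        quant = tok[-1] if tok[-1] in "*+?" else ""
        elems.append((tok.rstrip("*+?"), quant))
    return elems

def follow_sets(elems):
    """FOLLOW(i) for 0 <= i <= m, computed from FOLLOW(m) = {'$'} downwards."""
    m = len(elems)
    follow = [None] * (m + 1)
    follow[m] = {"$"}
    for i in range(m - 1, -1, -1):
        sym, quant = elems[i]            # elems[i] is P_(i+1)
        if quant in ("*", "?"):          # P_(i+1) may match zero events
            follow[i] = {sym} | follow[i + 1]
        else:
            follow[i] = {sym}
    return follow

def build_delta(elems):
    """Return (delta, finals); delta maps (state, symbol) -> next state."""
    m = len(elems)
    follow = follow_sets(elems)
    delta = {}
    for i in range(m + 1):
        for j in range(i + 1, m + 1):    # forward transitions, j > i
            sym = elems[j - 1][0]
            if sym in follow[i]:
                delta[(i, sym)] = j
        if i >= 1 and elems[i - 1][1] in ("*", "+"):
            delta[(i, elems[i - 1][0])] = i   # self-loop for repeating symbols
    finals = {i for i in range(m + 1) if "$" in follow[i]}
    return delta, finals
```

For the pattern of FIG. 5, build_delta(parse("A B* C")) yields the five transitions of Table F with state 3 as the only final state.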
[0144] The following observations follow directly from the
construction above:
[0145] (1) If state Qi ∈ F, then for all j > i, Qj ∈ F
[0146] (2) If δ(Qi, C_j) != Q0 for j > i, then δ(Q_(i-1), C_j)
also != Q0
[0147] The FSA constructed for the Class A pattern is then used
during pattern matching to guide detection of the specified Class A
pattern in event streams during runtime processing. Table G depicts
an example of an input event stream for illustrating the processing
that occurs at runtime for detecting a Class A pattern. As with
previous examples, Table G indicates the sequence in which events
are received in the event stream. For each event, the table shows
the position of the event in the event stream, the price attribute
of the event, and, for each sequence point, the symbols that are
matched by the input event received at that sequence point. A
symbol is considered matched by an event if the predicate
associated with the symbol is satisfied by the event. As explained
below, the underlined items in Table G constitute an instance of a
full pattern match for pattern (AB*C).
TABLE G

| Seq No.          | 1  | 2    | 3       | 4       | 5  |
| Price            | 40 | 20   | 10      | 10      | 40 |
| Matching Symbols | A  | A, B | A, B, C | A, B, C | A  |
[0148] During runtime, the FSA machine of Table F is used to detect
the specified pattern in any one of several events based upon the
prices associated with the input events. For a Class A pattern,
multiple event matches may arise due to the non-determinism (since
one event may match multiple correlation names) in the input
events. For example, in the above example, upon receiving event
e_5, the FSA machine is (logically) looking for a pattern match
in one of 18 event matches (1 x 2 x 3 x 3 x 1 choices of symbol
for the five events) such as {AAAAA, AAABA, AAACA, ABAAA, ...}.
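The count of 18 can be checked by enumerating every way of assigning one matched symbol to each event of Table G. The Python sketch below is purely illustrative; the names MATCHING and event_matches are hypothetical.

```python
from itertools import product

# Symbols matched by the events at seq #1..#5 in Table G.
MATCHING = ["A", "AB", "ABC", "ABC", "A"]

def event_matches(matching):
    """Enumerate every assignment of one matched symbol per event."""
    return ["".join(choice) for choice in product(*matching)]
```

Here len(event_matches(MATCHING)) is 18, and the list contains the strings AAAAA, AAABA, AAACA, and ABAAA mentioned above.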
[0149] At every point in time (i.e., at any sequence number), after
processing an input event e_i, the state of the automaton
machine may be defined as follows:
S(i) = {q | q ∈ Q} with S(0) = {Q0}
Now S(i+1) is given as follows:
$$S(i+1) = \bigcup_{q \in S(i)} \{\delta(q, a) \mid a \in C(i+1)\} \cup \{Q_0\}$$
Accordingly, for each state in S(i), the next state is found for
each symbol that the input event can be bound to. S(i+1) represents
the union of all these states. Maintaining S(i) is the way to
simulate simultaneous detection of the pattern in any one of
several matches. This may be implemented by maintaining S(i) at any
point in the manner described above with the addition that Q0 is
always in S(i).
[0150] Table H shown below illustrates application of the automaton
depicted in Table F to the events received according to Table
G.
TABLE H

| Seq # | Event e_i (price) | Matching Symbol | State of FSA     |
|       |                   |                 | {Q0}             |
| 1     | 40                | A               | {Q0, Q1}         |
| 2     | 20                | A, B            | {Q0, Q1, Q2}     |
| 3     | 10                | A, B, C         | {Q0, Q1, Q2, Q3} |
| 4     | 10                | A, B, C         | {Q0, Q1, Q2, Q3} |
| 5     | 40                | A               | {Q0, Q1}         |
[0151] As depicted in Table H, the event received at seq #1
(price=40) results in symbol A being matched and causes the FSA to
be in states Q0 and Q1 ("A"). The event received at seq #2
(price=20) results in symbols A and B being matched and causes the
FSA to be in states Q0, Q1 ("A"), and Q2 ("AB*"). The event
received at seq #3 (price=10) results in symbols A, B, and C being
matched and causes the FSA to be in states Q0, Q1 ("A"), Q2
("AB*"), and Q3 ("AB*C"). Since Q3 is the final state (underlined
in Table H), it indicates a full pattern match for pattern AB*C.
Even though a full pattern match has been found, in one embodiment,
the full pattern match is not output and pattern matching continues
to find the longest pattern match. The event received at seq #4
(price=10) results in symbols A, B, and C being matched and
causes the FSA to be in states Q0, Q1, Q2, and Q3. Since Q3 is the
final state, it again indicates a match for pattern AB*C. The event
received at seq #5 (price=40) results in symbol A being matched and
causes the FSA to be in states Q0 and Q1. At this point, no
longer match is possible and the matched pattern at seq #4 is
output. In this manner, pattern matching on the input events is
performed.
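The state sets in Table H follow mechanically from the transition table of Table F and the definition of S(i+1). The Python sketch below shows this under the assumption that the matched symbols per event are supplied directly; DELTA and simulate are hypothetical names, and bindings are deliberately left out.

```python
# Transitions of Table F; any (state, symbol) pair not listed leads to Q0.
DELTA = {("Q0", "A"): "Q1", ("Q1", "B"): "Q2", ("Q1", "C"): "Q3",
         ("Q2", "B"): "Q2", ("Q2", "C"): "Q3"}

def simulate(symbols_per_event):
    """Return the list S(1), S(2), ... of state sets, starting from S(0) = {Q0}."""
    current = {"Q0"}
    history = []
    for symbols in symbols_per_event:
        # Union of delta(q, a) over all current states q and matched symbols a,
        # with Q0 always kept so that a new match can start at any event.
        current = {DELTA.get((q, a), "Q0") for q in current for a in symbols}
        current |= {"Q0"}
        history.append(current)
    return history
```

Applied to the matched symbols of Table G, simulate(["A", "AB", "ABC", "ABC", "A"]) produces the five state sets listed in Table H.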
[0152] For a Class A pattern, preferment rules are used to
determine which matches to store. For example, all matches may be
stored or only the longest match may be stored. Rules that control
such behavior are referred to as preferment rules. In one
embodiment, preferment may be given to matches based upon the
following priorities:
[0153] (1) A match that begins at an earlier event is preferred over
a match that begins at a later event.
[0154] (2) Of two matches matching a greedy quantifier, the longer
match is preferred.
[0155] (3) Of two matches matching a reluctant quantifier, the
shorter match is preferred.
Matches are then chosen and maintained per the preferment rules. For
information on preferment rules, please also refer to Fred Zemke et
al., "Pattern Matching in Sequence of Rows (12)," ISO/IEC JTC1/SC32
WG3:URC-nnn, ANSI NCITS H2-2006-nnn, Jul. 31, 2007. For example, in
the example
of Table H, if seq #5 were to evaluate to a C, then the longest
match would be A B B B C (since seq #4 also evaluates to a B) and
not the current A B B C. Further, if seq #5 were a C, then as per
the default SKIP clause (which is SKIP PAST LAST ROW), the
overlapping match A B B C would not be reported.
[0156] Bindings are maintained to facilitate the pattern matching
without performing backtracking on the input events. A binding
indicates the degree to which a pattern is matched as a result of
the last received event. Bindings stored after receiving an event
may indicate partial matches that have the potential of becoming
longer matches or full matches. For simple non-recurring patterns,
as specified in FIG. 2 and described above, for a state Qi, the
length of the binding (the number of non-star elements in the
binding, i.e., the number of elements that have an associated event
from the stream mapped) is "i" and since this is always the last
"i" events of the event stream, the binding is unique for a state
and thus there is one binding per state. However, for recurring
patterns, such as the pattern specified in FIG. 5, there could be
multiple bindings applicable for a given state at any point in
time. For example, in Table H, after processing e_3 (i.e., the event
received at seq #3) for state Q2 (the state representing matching of
the first two symbols of the pattern), both (1) (A=2, B=3) (i.e., A
matched by the event at seq #2 and B matched by the event at seq #3)
and (2) (A=1, B=2, B=3) are valid bindings. However, since the
predicate defining a symbol is defined independent of other symbols
(a feature of Class A patterns), it follows that the set of symbols
that an event can be bound to is independent of the current
bindings. This is a feature of Class A patterns (and as will be
described below differentiates it from Class B patterns).
[0157] As is evident from the above, a full match state for a Class
A pattern may have multiple bindings associated with it. Preferment
rules may be preconfigured for the pattern matching system for
selecting a particular binding from the multiple bindings. For
example, as depicted in FIG. 1, preferment rules 126 may be
configured that are used by pattern matching module 110 to select
which binding, from among many, to select. In one embodiment, these
preferment rules may be configured to follow the preferment rules
defined by a specification being supported by the pattern matching
system. For example, the preferment rules for an embodiment of the
present invention may be configured to follow and support the
preferment rules defined by the "Pattern Matching in Sequences of
Rows (12)" specification. Preferment rules may be preconfigured for
the pattern matching system specifying rules for selecting a
partial binding for a state for further processing from among
multiple partial bindings associated with that state.
[0158] Since an FSA induces a right-invariant equivalence relation
over Σ* (that is, if x R_M y then xz R_M yz for strings x, y and
any string z), and the preferment rules used for the processing are
prefix based, only one of these bindings needs to be maintained
(for the SKIP PAST LAST ROW case). From the above, it follows that
the number of active bindings stored at any point in time for a
Class A pattern is bounded by the number of states, which is one plus
the length of the specified pattern, i.e., one plus the number of
symbol positions in the specified Class A pattern. For example, for
the pattern AB*C, the maximum number of bindings maintained at any
point is 3+1=4 bindings. Thus, in the default SKIP case (SKIP PAST
LAST ROW) where the requirement is NOT to report overlapping
matches, after processing every input event, there need be at most
only one binding per state of the automaton.
[0159] For example, consider the previous example depicted in Table
H where after the event received at seq #3 (referred as e3), there
are two possible bindings that are associated with state Q2: (1)
(A=2, B=3) and (2) (A=1, B=2, B=3). Now suppose "x" is the sequence
of events corresponding to the first binding (i.e., x=(A=2, B=3))
and "y" is the sequence of events corresponding to the second
binding (i.e., y=(A=1, B=2, B=3)), then for every following
sequence of events "z", both "xz" and "yz" will be in the same
state of the automaton. This indicates right equivalence. The
reason for this is due to the nature of a Class A pattern where
predicates are independent of other symbols and hence the set of
correlation names or symbols that an event can be bound to is
independent of binding context.
[0160] Further, it can be shown that, whenever "xz" is in a final
state, "yz" would be the match preferred over the match "xz" if and
only if "y" is preferred over "x" per the preferment rules. Hence,
it suffices to maintain the second binding (corresponding to "y")
and discard the first binding (corresponding to "x") after
processing input event e3, for state Q2.
[0161] For simple non-recurring patterns as specified in FIG. 2 and
described above, when a state S(i) contains a final state (a state
that is a member of F), the binding associated with the state is
immediately output and S(i) is reset to {Q0}. This could be done
since the length of matching bindings was always fixed, which meant
that one could not get two matching bindings where one was properly
contained in the other. In the case of simple recurring patterns as
depicted in FIG. 5, it is possible to get two matches where one is
properly contained in the other. For instance, in the example
above, after processing e_3, S(i) contains the final state Q3.
If the associated binding is (A=1, B=2, C=3), it
cannot be concluded at this point that this is part of the output
since the binding (A=1, B=2, B=3) could develop into a longer match
that would take precedence over (A=1, B=2, C=3) by the preferment
rules. This is exactly what happens after processing e_4.
[0162] Based on the above, in one embodiment, bindings may be
classified into 3 categories:
[0163] (1) Matched and reported bindings: These are bindings that
constitute a full pattern match and are output;
[0164] (2) Matched but unsure bindings: These are bindings that
constitute a full pattern match but there are other partial matches
that could develop into matches and contain this binding. Hence,
these bindings are not output at this point in time;
[0165] (3) Partial matches: These are bindings that are partial
matches that have the potential to become full matches.
[0166] Further, the following may be defined for bindings:
[0167] Let "b" be a binding.
[0168] Interval of a binding INT(b)=(i,j), where i is the least
sequence number of an event that is part of this binding and j is
the highest sequence number of an event in the binding. For example,
for the binding b=(A=1, B=2, C=3), INT(b)=(1,3)
[0169] left(b)=i where INT(b)=(i,j)
[0170] right(b)=j where INT(b)=(i,j)
[0171] length(b)=j-i+1 where INT(b)=(i,j)
[0172] For a set TB of bindings, min(TB) = min{left(b) | b ∈ TB} and
max(TB) = max{right(b) | b ∈ TB}
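These definitions translate directly into small helper functions. In the Python sketch below, a binding is assumed to be a list of (symbol, sequence number) pairs; this representation and the function names are illustrative choices, not part of the described embodiment.

```python
def interval(binding):
    """INT(b): (least, highest) sequence number of the events in the binding."""
    seqs = [seq for _symbol, seq in binding]
    return (min(seqs), max(seqs))

def left(binding):
    return interval(binding)[0]

def right(binding):
    return interval(binding)[1]

def length(binding):
    i, j = interval(binding)
    return j - i + 1

def min_left(bindings):
    """min(TB): the smallest left(b) over a non-empty set of bindings."""
    return min(left(b) for b in bindings)

def max_right(bindings):
    """max(TB): the largest right(b) over a non-empty set of bindings."""
    return max(right(b) for b in bindings)
```

For the binding (A=1, B=2, C=3), written as [("A", 1), ("B", 2), ("C", 3)], interval gives (1, 3) and length gives 3, matching the example above.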
[0173] Based upon the above, it can be shown that in the case of
Class A recurrent patterns as specified in FIG. 5, there can be at
most one binding that is in the matched but unsure category at any
point in time. Further, we can show that left(d)=min(TB) (where `d`
is the unique binding in the matched but unsure category). This
essentially follows from the above based on the structure of the
automaton constructed.
[0174] Accordingly, maintaining bindings during runtime processing
essentially involves maintaining the vector (d, B) where d is the
binding (if any) in the matched but unsure category and B is the
set of partial match bindings. On processing e_i, let F1 denote
the set of final states in S(i) and let B1 denote the set of
bindings associated with the states in F1. FIG. 6 depicts a
simplified flowchart 600 depicting a method of maintaining bindings
for Class A patterns according to an embodiment of the present
invention. The method may be performed by software (program, code,
instructions executed by a processor), in hardware, or combinations
thereof. In one embodiment, the processing is performed by matcher
116 depicted in FIG. 1.
[0175] As depicted in FIG. 6, a determination is made as to whether
F1 and d are empty or non-empty (step 602). Here, "d" refers to the
value of the symbol after processing event e_(i-1) and "F1" refers to
the set after processing event e_i. B corresponds to the set of
partial match bindings after processing event e_i. Processing
is then performed based upon the determinations in 602. There can
be one of four situations:
[0176] (1) If F1 and d are both determined to be empty, then the set
of stored bindings B is updated (step 604). Accordingly, on
processing e_i, there will be a change to the set of stored bindings
and B is updated. Nothing else is done in 604.
[0177] (2) If F1 is determined to be non-empty and d is determined to
be empty, then among the bindings in B1, a unique binding b is picked
after applying preferment rules (step 606). A determination is then
made if left(b) is less than min(B) and if there is no transition
from the final state corresponding to b (step 608). If the condition
in 608 is met, then b is output and all bindings h from B and their
corresponding states from S(i) are removed except for Q0 (step 610).
If the condition in 608 is not satisfied, then it implies that
left(b)=min(B) since it is not possible that left(b)>min(B). In this
case, d is set to b, and all bindings h from B where left(h)>left(b)
are removed along with their corresponding states from S(i), except
for Q0 (step 612).
[0178] (3) If F1 is non-empty and d is non-empty, then the old d is
discarded (step 614). Processing then proceeds with step 606.
[0179] (4) If F1 is empty and d is non-empty, then if left(d)<min(B)
(step 616), d is output (step 618) and then set to null (step 620).
Otherwise (i.e., left(d)==min(B), since left(d) cannot be greater
than min(B)), nothing is done.
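The four situations above can be sketched as a single per-event update function. The Python code below is a simplified sketch under several assumptions and is not the claimed implementation: it hard-codes the automaton of Table F for pattern AB*C, takes the matched symbols of each event as input, represents a binding as a tuple of (symbol, sequence number) pairs, treats min(B) as infinite when B is empty, and does not flush a pending matched but unsure binding at end of stream. All names (DELTA, FINAL, prefer, step, run, TABLE_G) are hypothetical.

```python
# Transitions of Table F for pattern A B* C; missing entries lead back to Q0.
DELTA = {("Q0", "A"): "Q1", ("Q1", "B"): "Q2", ("Q1", "C"): "Q3",
         ("Q2", "B"): "Q2", ("Q2", "C"): "Q3"}
FINAL = {"Q3"}
TABLE_G = [(1, "A"), (2, "AB"), (3, "ABC"), (4, "ABC"), (5, "A")]

def left(binding):
    """Sequence number of the first event in a binding."""
    return binding[0][1]

def prefer(bindings):
    """Preferment: earliest start wins; on a tie the longer (greedy) one wins."""
    return min(bindings, key=lambda b: (left(b), -len(b)))

def has_out_transition(state):
    return any(src == state for (src, _sym) in DELTA)

def step(seq, symbols, partial, d):
    """Process one event.

    partial -- {state: binding} for non-final states (the set B)
    d       -- (state, binding) in the matched but unsure category, or None
    Returns (partial, d, output) where output is a reported binding or None.
    """
    sources = [("Q0", ())] + list(partial.items())
    if d is not None and d[1][-1][1] == seq - 1:
        sources.append(d)                 # d may still grow, e.g. via D* in AB+CD*
    candidates = {}
    for state, binding in sources:        # "Update Bindings"
        for sym in symbols:
            nxt = DELTA.get((state, sym), "Q0")
            if nxt != "Q0":
                candidates.setdefault(nxt, []).append(binding + ((sym, seq),))
    best = {s: prefer(bs) for s, bs in candidates.items()}   # one binding per state
    new_partial = {s: b for s, b in best.items() if s not in FINAL}
    finals = [(s, b) for s, b in best.items() if s in FINAL]  # F1 with B1
    min_b = min((left(b) for b in new_partial.values()), default=float("inf"))
    output = None
    if finals:                            # situations (2) and (3): old d is dropped
        state, b = min(finals, key=lambda sb: (left(sb[1]), -len(sb[1])))
        if left(b) < min_b and not has_out_transition(state):
            output, d, new_partial = b, None, {}              # step 610
        else:                                                 # step 612
            d = (state, b)
            new_partial = {s: h for s, h in new_partial.items()
                           if left(h) <= left(b)}
    elif d is not None and left(d[1]) < min_b:                # situation (4)
        output, d = d[1], None
    return new_partial, d, output

def run(events):
    partial, d, outputs = {}, None, []
    for seq, symbols in events:
        partial, d, out = step(seq, symbols, partial, d)
        if out is not None:
            outputs.append(out)
    return outputs
```

With the events of Table G, run(TABLE_G) reports the single binding (A=1, B=2, B=3, C=4), and it does so only while processing the event at seq #5, once no longer match is possible.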
[0180] Table I shown below depicts the processing after receipt of
each event according the automaton depicted in Table F.
TABLE I

| Si | E_i | C       | State S(i)       | Update Bindings | Apply Preferment | Partial Bindings | Matched but unsure | Output |
|    |     |         | {Q0}             | Q0: ( ) | Q0: ( ) | Q0: ( ) | | |
| 1  | 40  | A       | {Q0, Q1}         | Q0: ( ); Q1: (A=1) | Q0: ( ); Q1: (A=1) | Q0: ( ); Q1: (A=1) | | |
| 2  | 20  | A, B    | {Q0, Q1, Q2}     | Q0: ( ); Q1: (A=2); Q2: (A=1, B=2) | Q0: ( ); Q1: (A=2); Q2: (A=1, B=2) | Q0: ( ); Q1: (A=2); Q2: (A=1, B=2) | | |
| 3  | 10  | A, B, C | {Q0, Q1, Q2, Q3} | Q0: ( ); Q1: (A=3); Q2: (A=2, B=3); Q2: (A=1, B=2, B=3); Q3: (A=2, C=3); Q3: (A=1, B=2, C=3) | Q0: ( ); Q2: (A=1, B=2, B=3); Q3: (A=1, B=2, C=3) | Q0: ( ); Q2: (A=1, B=2, B=3) | Q3: (A=1, B=2, C=3) | |
| 4  | 10  | A, B, C | {Q0, Q1, Q2, Q3} | Q0: ( ); Q1: (A=4); Q2: (A=1, B=2, B=3, B=4); Q3: (A=1, B=2, B=3, C=4) | Q0: ( ); Q2: (A=1, B=2, B=3, B=4); Q3: (A=1, B=2, B=3, C=4) | Q0: ( ); Q2: (A=1, B=2, B=3, B=4) | Q3: (A=1, B=2, B=3, C=4) | |
| 5  | 40  | A       | {Q0, Q1}         | Q0: ( ); Q1: (A=5) | Q0: ( ); Q1: (A=5) | Q0: ( ); Q1: (A=5) | | (A=1, B=2, B=3, C=4) |
[0181] In Table I, the first column shows the sequence in which
events are received. The price attribute of each event is depicted
in the second column. The third column shows the symbol matches
after processing an event. The fourth column "State S(i)" depicts
the possible states of the finite automaton after processing each
received event. The fifth column "Update Bindings" identifies the
updated bindings corresponding to the states after processing an
event. It should be noted that in the fifth column, a state may
have multiple bindings associated with it. The sixth column "Apply
Preferment" identifies bindings selected after applying preferment
rules and/or deleting bindings as per steps 610 and 612 from FIG. 6
to the states and associated bindings depicted in the fifth column.
As previously described, preferment rules are used to select a
binding for a state from a set of bindings. Preferment rules are
used in conjunction with the SKIP clause. It is sufficient to
maintain only one binding per state. At times, after processing an
input there may be states with more than one binding (as in this
example). For each such state, the most "preferred" binding is
retained and others are discarded. This is because, as mentioned
earlier, it can be shown that (right equivalence) the retained
binding will always yield a match that would be preferred compared
to equivalent matches resulting from the same suffix applied on the
other competing bindings that are discarded (for the same state).
In Table I, the bindings that are deleted as a result of applying
preferment rules are shown with a strikethrough. The seventh column
"Partial Bindings" identifies partial bindings after processing an
event after applying the preferment rules. The eighth column
"Matched but unsure" identifies the matched but unsure bindings
after processing an event after applying the preferment rules. The
ninth column "Output" identifies a binding that results in a
pattern match that is output.
[0182] As indicated above, for Class A patterns, a state after
processing an event can have multiple bindings associated with it.
After applying preferment rules, one or more of the bindings
associated with a state may be deleted. In situations where
overlapping matches are not wanted (e.g., if the SKIP PAST LAST ROW
clause is used), some bindings may be deleted even if the
deleted binding is the only binding for a state. For example, in
Table I, after S3, it is known for sure that the first 3 events
will participate in a full but unsure match, and since overlapping
matches are not needed (SKIP PAST LAST ROW), there is no point in
keeping A=3 in Q1 since it intersects with a previous match.
Accordingly, after applying preferment rules, the binding A=3 in Q1
is deleted. In this manner, those bindings that can yield full
matches that will be lower in preferment than an already determined
full match can be deleted. If a state does not have any bindings
associated with it, it is referred to as an inactive state and is
also deleted. For example, in Table I, state Q1 is deleted after S3
and S4.
[0183] The following example illustrates how bindings are stored
after receiving each event according to an embodiment of the
present invention. For this example, let S(c1 integer) be an input
event stream. A query Q may be received specifying a pattern to be
detected in event stream S as follows:
    SELECT *
    FROM S MATCH_RECOGNIZE (
        MEASURES A.c1 as c1, B.c1 as c2, C.c1 as c3, D.c1 as c4
        PATTERN (A B+ C D*)
        DEFINE
            A as A.c1 % 2 == 0,
            B as B.c1 % 3 == 0,
            C as C.c1 % 5 == 0,
            D as D.c1 % 7 == 0
    ) as T
[0184] In the above example, the (X.c1 % Y == 0) predicates test
whether X.c1 is divisible by Y. The pattern specified in the query
is a Class A pattern. Table J shows a sequence of events received
in event stream S and the matched symbols for the pattern
(AB+CD*)
TABLE J

| Seq No.          | 0 | 1 | 2       | 3    | 4  | 5 |
| c1               | 2 | 3 | 30      | 14   | 77 | 4 |
| Matching Symbols | A | B | A, B, C | A, D | D  | A |
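The matching symbols in Table J follow from evaluating the four divisibility predicates of the DEFINE clause against each c1 value. The Python sketch below illustrates this; PREDICATES, STREAM, and matching_symbols are hypothetical names, and the lambdas simply restate the predicates of the query.

```python
# Predicates from the DEFINE clause; each depends only on its own symbol.
PREDICATES = {
    "A": lambda c1: c1 % 2 == 0,
    "B": lambda c1: c1 % 3 == 0,
    "C": lambda c1: c1 % 5 == 0,
    "D": lambda c1: c1 % 7 == 0,
}

# c1 values of the events in Table J, in arrival order.
STREAM = [2, 3, 30, 14, 77, 4]

def matching_symbols(c1):
    """Return the symbols whose predicate is satisfied by the event value c1."""
    return [sym for sym, pred in PREDICATES.items() if pred(c1)]
```

Because every predicate refers only to the current event, the set of matched symbols can be computed without consulting any binding, which is the property that makes this a Class A pattern.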
[0185] The FSA for the above query will have the following states:
[0186] Q0: initial state
[0187] Q1: corresponding to A
[0188] Q2: corresponding to AB+
[0189] Q3: corresponding to AB+C
[0190] Q4: corresponding to AB+CD*
Among these, Q3 and Q4 are the final states. It should also be noted
that both final states have an outgoing transition: from Q3 to Q4 on
D, and from Q4 to itself on D.
[0191] Processing of the input events, per the flowchart depicted
in FIG. 6 and described above, occurs as shown below. In the
description below, "d" refers to the matched but unsure binding, if
any, held after processing event e.sub.(i-1), "F1" refers to the set
of final states reached after processing event e.sub.i, and "B"
refers to the set of partial bindings after processing event
e.sub.i.
[0192] Sequence #1
[0193] Current Input=2
[0194] Matching symbols=A
[0195] F1={ }--empty
[0196] d={ }--empty
This is the case where d is empty and F1 is also empty.
[0197] S(1)={Q0, Q1}
B={<(A=2), Q1>}, where <. . . > represents a single binding, (A=2)
is the symbol to input mapping, and Q1 represents the state
associated with this binding.
[0198] Sequence #2
[0199] Current Input=3
[0200] Matching symbols=B
[0201] F1={ }--empty
[0202] d={ }--empty
This is the case where d is empty and F1 is also empty.
[0203] S(2)={Q0, Q2}
[0204] B={<(A=2, B=3), Q2>}
[0205] Sequence #3
[0206] Current Input=30
[0207] Matching symbols=A, B, C
[0208] F1={Q3}
This is the case where d is empty and F1 is not empty.
[0209] S(3)={Q0, Q1, Q2, Q3}
[0210] B={<(A=2, B=3, B=30), Q2>, <(A=30), Q1>}
[0211] b is the unique binding in the final state, namely <(A=2,
B=3, C=30), Q3>
[0212] left(b)=seq#1=min(B)=seq#1
[0213] Thus, d=b=<(A=2, B=3, C=30), Q3>
Also, the binding <(A=30), Q1> with seq#3>left(b) is deleted and
its associated state Q1 is removed from S(3). So,
[0214] S(3)={Q0, Q2, Q3}
[0215] B={<(A=2, B=3, B=30), Q2>}
[0216] Sequence #4
[0217] Current Input=14
[0218] Matching symbols=A, D
[0219] F1={Q4}
This is the case where d is not empty and F1 is also not empty.
[0220] S(4)={Q0, Q1, Q4}
[0221] B={<(A=14), Q1>}
[0222] b is the unique binding in the final state, namely <(A=2,
B=3, C=30, D=14), Q4>
[0223] left(b)=seq#1<min(B)=seq#4. However, there is a transition
out of state Q4.
[0224] Thus, old d=<(A=2, B=3, C=30), Q3> is discarded and now
[0225] d=<(A=2, B=3, C=30, D=14), Q4>
Also, S(4) is reset to {Q0, Q4} since the binding <(A=14), Q1> is
deleted and its associated state Q1 is removed from S(4).
[0226] Thus, B={ }
[0227] Sequence #5
[0228] Current Input=77
[0229] Matching symbols=D
[0230] F1={Q4}
This is the case where d is not empty and F1 is also not empty.
[0231] S(5)={Q0, Q4}
[0232] B={ }
[0233] b is the unique binding in the final state, namely <(A=2,
B=3, C=30, D=14, D=77), Q4>
[0234] left(b)=seq#1; min(B) is not defined since B is empty. Also,
there is a transition out of state Q4.
[0235] Thus, old d=<(A=2, B=3, C=30, D=14), Q4> is discarded and now
d=<(A=2, B=3, C=30, D=14, D=77), Q4>
[0236] Sequence #6
[0237] Current Input=4
[0238] Matching symbols=A
[0239] F1={ }--empty
This is the case where d is not empty and F1 is empty.
[0240] S(6)={Q0, Q1}
[0241] B={<(A=4), Q1>}
[0242] left(d)=seq#1<min(B)=seq#6
Thus, d=<(A=2, B=3, C=30, D=14, D=77), Q4> is output and d is reset
back to empty. Note that S(6) and B={<(A=4), Q1>} remain as they
are and nothing is deleted in this case.
[0243] FIG. 7 is a simplified flowchart 700 depicting a method for
performing pattern matching for Class A patterns after receiving
each event in an event stream according to an embodiment of the
present invention. The method may be performed by software
(program, code, instructions executed by a processor), in hardware,
or combinations thereof. The software may be stored on a
computer-readable storage medium. In one embodiment, the processing
is performed by matcher 116 depicted in FIG. 1.
[0244] The processing is initiated upon receiving an event (step
702). Symbols, if any, that are matched due to the event received
in 702 are determined (step 704). One or more states for the
automaton are determined based upon the symbols determined to be
matched in 704 and based upon bindings, if any, stored prior to
receiving the event received in 702 (step 706). For example, the
bindings stored upon receiving and processing the event received
prior to the event received in 702 may be used to determine the
state(s) of the automaton.
[0245] Updated bindings are then determined and maintained for the
states determined in 706 (step 708). In one embodiment, the
processing depicted in FIG. 6 may be performed in step 708 as part
of determining which bindings to update and maintain. The
processing in step 708 may comprise updating the previously stored
bindings, applying preferment rules to select bindings from among
bindings associated with the same state, determining matched but
unsure bindings, and determining matched bindings that are to be
output. Full pattern matches, if any, that are to be output are
then determined based upon whether or not the updated bindings
comprise any bindings representing full pattern matches which are
to be output (step 710). Events corresponding to the full pattern
matches, if any, determined in 710 are then output (step 712).
Other actions, triggered by a full pattern match, may also be
performed in 712. The processing depicted in FIG. 7 is performed
upon receiving each event in the event stream. The bindings
determined and maintained in 708 are then used during processing of
the next received event.
[0246] In one embodiment, the processing depicted in FIG. 6 and
described above is performed in steps 708, 710, and 712 of FIG.
7.
[0247] The technique described above is capable of detecting Class
A patterns, including recurring and non-recurring patterns, in
input event streams without performing any backtracking of the
input events. Further, due to application of preferment rules, only
one binding associated with a given state of the FSA is maintained
at any point in time. As a result, the number of bindings to be
maintained at any time point after processing an event is bounded
by the number of states, which in turn is proportional to the
length of the pattern to be matched. This enables pattern matching
of Class A patterns to be performed efficiently in polynomial space
and time over the number of symbols making up the pattern to be
matched. The technique is thus very scalable and pattern matching
can be performed at reduced costs.
[0248] Further, since the predicates defining symbols for Class A
patterns are defined independent of other symbols, it follows that
the set of symbols that an input event can be bound to is
independent of the current bindings. Further, since an FSA induces
a right-invariant equivalence relation R.sub.M over S* (that is, if
x R.sub.M y then xz R.sub.M yz for strings x, y and any string z),
and the preferment rules are prefix based, only one of these
bindings is maintained per state. Accordingly, the number of active
bindings at any point in time is bounded by the number of states,
which is equal to one plus the length of the pattern to be matched.
[0249] In the manner described above, embodiments of the present
invention are capable of automatically identifying a Class A pattern
based upon the input pattern to be matched and based upon the
predicates associated with the pattern symbols. Upon identifying a
pattern as a Class A pattern, embodiments of the present invention
are configured to select and apply a pattern matching technique
that is appropriate for processing Class A patterns. The
application of the selected technique enables processing of event
streams for detecting Class A patterns to be performed efficiently
(e.g., in terms of memory and processing resources used) and in a
manner that is scalable.
[0250] Class A patterns represent a class of patterns which can be
used to model a large number of use cases. Since the number of
patterns that can be generally specified can be quite large and may
require a significant amount of computing resources, it becomes
beneficial to identify a subclass (e.g., Class A patterns) of the
global universe of patterns that is widely used and for which an
efficient customized pattern matching solution is applied as
described above.
[0251] As described above, Class A patterns represent a subset of
patterns that may be specified for pattern matching. The following
section describes a more generalized technique for performing
pattern matching in input event streams for patterns that may not
fall under the umbrella of Class A patterns.
Class B Patterns (General Patterns)
[0252] The above section described a technique for identifying
Class A patterns and performing pattern matching for this specific
subclass of patterns in polynomial space and polynomial time.
However, there are several patterns that do not qualify as Class A
patterns. This section describes a technique for performing pattern
matching for general patterns, which will be referred to as Class B
patterns to differentiate them from Class A patterns. A Class B
pattern may include a Class A pattern.
[0253] Class B patterns are general patterns that are not
restricted by the limitations imposed on Class A patterns. One of
the differences between processing of Class A patterns and the
Class B patterns is that there may be multiple bindings in the
Class B patterns scenario that are associated with a given state of
the FSA at any point in time and that need to be maintained for
processing of the next event while in the Class A pattern case at
most one binding associated with a given state of the FSA may be
maintained at any point in time. As a result, for Class A patterns
the number of bindings that is maintained after processing an
event is bounded by the number of states, which itself is
proportional to the number of symbols in the pattern to be matched,
thus yielding a solution that is polynomial in space and time over
the number of symbols making up that pattern.
[0254] Further, unlike Class A patterns, a predicate associated
with a symbol in a Class B pattern may contain references to other
symbols (e.g., a predicate associated with a symbol A may be defined as
(A.price<B.price), where B is another symbol having its own
associated predicate). Accordingly, a predicate for a symbol in
Class B patterns may be dependent on other symbols.
[0255] The framework for performing Class B pattern matching may be
divided into two stages: (1) a compile or design time stage during
which an automaton is constructed for the query; and (2) a runtime
stage during which the automaton generated in (1) is used to guide
the pattern matching process. Bindings representing partial and/or
full matches are maintained during the runtime stage processing. In
the first stage, if the pattern is included in a query, the query
is compiled into an execution plan that comprises the automaton for
the query. The plan is then executed at runtime.
[0256] FIG. 8 is an example of a query 800 specifying a Class B
pattern 802 according to an embodiment of the present invention.
The pattern tkpattern_q10 depicted in FIG. 8 is a Class B pattern
but not a Class A pattern (e.g., the aggregate "avg(B.c1)" in the
predicate defining C is not allowed in a Class A pattern; also the
predicate for B is dependent on A, which is not allowed in a Class
A pattern). The predicates for the symbols in the pattern are
defined by DEFINE clause 804. As can be seen from DEFINE clause
804, the predicate for one symbol may be dependent upon another
symbol. For example, the predicate for symbol B is dependent upon
symbol A, and the predicate for symbol C is dependent on symbol B.
At compile time processing, query 800 may be compiled into an
execution plan that is used for runtime processing. A Finite State
Automaton (FSA) is constructed corresponding to the regular
expression and the predicates specified in query 800. As an example,
the following sequence matches the pattern depicted in FIG. 8:
TABLE-US-00013
10 11 12 13 2
A  B  B  B  C
[0257] FIG. 9 is a simplified flowchart 900 depicting a method for
performing operations at compile time including constructing an
automaton for a general Class B pattern according to an embodiment
of the present invention. The method may be performed by software
(program, code, instructions executed by a processor), in hardware,
or combinations thereof. In one embodiment, the processing is
performed by automaton generator 114 depicted in FIG. 1. Flowchart
900 assumes that the input regular expression has already been
determined to specify a Class B pattern.
[0258] As depicted in FIG. 9, a standard grammar for regular
expressions is used to create a parse tree for the specified
regular expression specifying the pattern (step 902). The parse
tree obtained in 902 is then used as a guide to recursively
construct the FSA (step 904). The base case for recursion is a
simple symbol (e.g., A) or symbol followed by a quantifier (e.g.,
A*).
[0259] The out-transitions from each state are ordered to handle
the preferment rules (step 906). In order to identify the most
preferred match, while applying transitions to a binding in state
S, the transitions are applied in a specific order. This order
(among the set of transitions from a state) is determined at
compile time and stored as part of the automaton. Included in this
order, is a rank for the "finality" of the state (applicable only
for final states). This is done to accommodate the preferment
rules. In the presence of reluctant quantifiers (such as *?)
sometimes "shorter" matches are preferred. However, for greedy
quantifiers (such as *) "longer" matches are preferred. Using this
technique of introducing a rank amongst the set of out transitions
for an "imaginary" transition corresponding to the finality of a
state (applies only to a final state), such decisions can be made
in a manner consistent with the overall framework (and without
having to special case the final state during runtime processing)
that handles the preferment rules. The following invariant holds:
there is only one start state, and the start state does not have
any incoming edges.
[0260] Several operators such as CONCATENATION, ALTERNATION, and
various QUANTIFIERS (e.g., *, ?, +) may be used for one or a group
of symbols. Examples of operators and quantifiers that may be used
are described in "Pattern matching in sequences of rows (12)"
document. These operators and quantifiers are handled in the
recursion step in 904. For example, the CONCATENATION operator may
be handled in the recursive step in the following manner. Consider
an example R.S where R is the left regular expression and S is the
right regular expression. Let F be the FSA corresponding to R and G
be the FSA corresponding to S. Let Y be the set of final states in
F. Let J be the start state of G. The FSA corresponding to the
union of F and G may be called H. Accordingly, all the
out-transitions from state J in FSA G are copied to each state in Y
in FSA H. These new transitions are introduced to each state in Y at
the position of its "finality" rank in the same order as they
appeared in state J of FSA G (note that all states in Y were final
in F and hence would have a "finality" rank in the order of their
out transitions). Note that each state in Y remains final if and
only if J was a final state in G. State J is then removed from H.
Other operators such as ALTERNATION, GROUPING, and Quantifiers
across groups may also be handled individually in a similar
manner.
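The CONCATENATION step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the described embodiment: each machine is assumed to be a list of per-state transition lists ordered by decreasing preference, state 0 is assumed to be the start state, and the "finality" rank is assumed to be represented by a pseudo-transition on the symbol "Final" to an undefined state -1. The names FINAL, UNDEFINED, and concatenate are hypothetical.

```python
FINAL = "Final"   # pseudo-symbol marking a state's "finality" rank
UNDEFINED = -1    # destination of the "Final" pseudo-transition


def concatenate(left, right):
    """Merge two machines given as lists of ordered (symbol, dest) lists."""
    offset = len(left) - 1  # right state r (r >= 1) becomes state offset + r

    def renumber(dest):
        return dest if dest == UNDEFINED else dest + offset

    # Out-transitions of the right start state J, renumbered for the merge.
    start_of_right = [(sym, renumber(dest)) for sym, dest in right[0]]

    merged = []
    for transitions in left:
        spliced = []
        for sym, dest in transitions:
            if sym == FINAL:
                # Insert J's transitions at the finality rank, in their
                # original order; the state stays final only if J was final.
                spliced.extend(start_of_right)
            else:
                spliced.append((sym, dest))
        merged.append(spliced)

    # J itself is dropped; the remaining right states are appended.
    for transitions in right[1:]:
        merged.append([(sym, renumber(dest)) for sym, dest in transitions])
    return merged


machine_a = [[("A", 1)], [(FINAL, UNDEFINED)]]
machine_b = [[("B", 1)], [(FINAL, UNDEFINED)]]
print(concatenate(machine_a, machine_b))
```

Splicing the whole transition list of J, including any "Final" pseudo-transition it carries, is what makes a state in Y remain final exactly when J was final.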
[0261] Referring back to FIG. 9, all referenced aggregations are
collected and "init" and "incr" evaluators for the aggregations
and their corresponding input expressions are prepared (step 908).
As part of processing every input and updating the bindings, the
specified aggregations are also incrementally maintained. These
aggregations are stored along with the bindings. For example,
suppose there is a reference to sum(B.c1) (where this could be
referenced in one of the DEFINE predicates or the MEASURES clause),
then this would be dependent on the binding. For a binding with
(A=1, B=2, B=3), sum(B.c1) would be 2+3=5 while for another binding
(A=1, B=2) sum(B.c1)=2. Init and Incr evaluators are mechanisms
used to initialize the aggregations when a binding is newly created
and to incrementally maintain the aggregation as the binding is
"grown".
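The incremental maintenance of sum(B.c1) described above may be sketched as follows. The Python code is only an illustration under assumptions: the Binding class and the functions init_binding and incr_binding are hypothetical names, and only the single aggregate sum(B.c1) is tracked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binding:
    events: tuple  # (symbol, c1) pairs in arrival order
    sum_b: int     # running value of sum(B.c1) for this binding


def init_binding(symbol, c1):
    """'init' step: create a binding from its first event."""
    return Binding(((symbol, c1),), c1 if symbol == "B" else 0)


def incr_binding(binding, symbol, c1):
    """'incr' step: grow a binding by one event and update the aggregate."""
    delta = c1 if symbol == "B" else 0
    return Binding(binding.events + ((symbol, c1),), binding.sum_b + delta)


b1 = init_binding("A", 1)       # (A=1)
b2 = incr_binding(b1, "B", 2)   # (A=1, B=2)
b3 = incr_binding(b2, "B", 3)   # (A=1, B=2, B=3)
print(b2.sum_b, b3.sum_b)
```

Storing the running aggregate with each binding avoids rescanning the bound events whenever a predicate or MEASURES expression refers to it.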
[0262] Evaluators are then prepared for each of the defined
predicates (e.g., predicates specified by the DEFINE clause in a
query) (step 910). Evaluators are also prepared for the expressions
in the MEASURES clause (step 912). These evaluators are the
mechanism used to evaluate a DEFINE predicate to determine the set
of symbols that an input event corresponds to. For Class B
patterns, this is done in the context of the binding, i.e., for a
binding b1, the same input may evaluate to a symbol A while for
binding b2, it may evaluate to B.
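The dependence of symbol evaluation on the binding can be illustrated with a short Python sketch. The two predicates below are hypothetical and are not the predicates of FIG. 8; they are assumed only to show that the same event may map to different symbols under different bindings.

```python
def is_b(c1, binding):
    # Hypothetical predicate: B.c1 must exceed the value bound to A.
    return c1 > binding["A"][-1]


def is_c(c1, binding):
    # Hypothetical predicate: C.c1 must be below avg(B.c1) for this binding.
    b_values = binding.get("B", [])
    return bool(b_values) and c1 < sum(b_values) / len(b_values)


PREDICATES = {"B": is_b, "C": is_c}


def symbols_for(c1, binding):
    """Evaluate every predicate for one event in the context of one binding."""
    return [sym for sym, pred in PREDICATES.items() if pred(c1, binding)]


b1 = {"A": [10]}
b2 = {"A": [15], "B": [20, 30]}
print(symbols_for(12, b1))
print(symbols_for(12, b2))
```

Here the event with value 12 evaluates to B in the context of b1 and to C in the context of b2.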
[0263] The FSA constructed as described above and depicted in FIG.
9 may then be used to guide detection of Class B patterns in input
event streams during runtime processing. In one embodiment, the FSA
constructed at compile time is used as a guide to indicate how much
of the specified pattern has matched at any point in time. As with
Class A patterns, bindings are also used to capture the partial or
full match states at any point in time during the runtime
processing. As previously indicated, a binding can be thought of as
an assignment of contiguous events to symbols that corresponds to a
partial or full match that satisfies all the DEFINE predicates.
[0264] The runtime pattern matching processing may be explained
using the following definitions. Consider a situation where an
input tuple or event i has been processed and an input tuple or
event (i+1) is to be processed. A binding after processing of event
i may be in one of the following three disjoint states:
[0265] (1) Partial Active (PA) state--binding not in a final state.
A binding is in this state if it is not in a final state but has
the potential of growing into a full match;
[0266] (2) Only Matched but unsure (MU) state--binding in a final
state with no out-transitions (i.e., no transitions to another
state or the same final state); or
[0267] (3) Both active and matched but unsure (AMU) state--binding
in a final state with out-transitions to another state or the same
final state.
Further, let AB be the set of active bindings (i.e., bindings in
states PA and AMU). Let FB be the set of final bindings (i.e.,
bindings in states MU and AMU). Let AFB=AB union FB. Let AFBL be an
ordered list of bindings from set AFB in decreasing order of
preferment. (It may be noted that preferment rules may be defined
not only for bindings in FB but for all bindings.)
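The three binding states and the sets AB and FB can be expressed compactly. The following Python sketch is an illustration only; it assumes an FSA stored as a dictionary of out-transitions per state, reuses the states of the (A B+ C D*) example purely for concreteness, and uses hypothetical names (classify, TRANSITIONS, FINAL_STATES).

```python
def classify(state, final_states, out_transitions):
    """Return "PA", "MU", or "AMU" for a binding sitting in `state`."""
    if state not in final_states:
        return "PA"
    return "AMU" if out_transitions.get(state) else "MU"


# Out-transitions of the FSA for PATTERN (A B+ C D*).
TRANSITIONS = {
    "Q0": {"A": "Q1"},
    "Q1": {"B": "Q2"},
    "Q2": {"B": "Q2", "C": "Q3"},
    "Q3": {"D": "Q4"},
    "Q4": {"D": "Q4"},
}
FINAL_STATES = {"Q3", "Q4"}

# Two hypothetical bindings, identified by name, and the state each is in.
bindings = {"b1": "Q2", "b2": "Q4"}
kinds = {n: classify(s, FINAL_STATES, TRANSITIONS) for n, s in bindings.items()}
AB = {n for n, k in kinds.items() if k in ("PA", "AMU")}  # active bindings
FB = {n for n, k in kinds.items() if k in ("MU", "AMU")}  # final bindings
print(kinds, AB, FB)
```

A binding in the AMU state belongs to both AB and FB, which is why AFB is defined as a union rather than a disjoint sum.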
[0268] Derived bindings may also be defined. Consider a binding b1
in AB after event i has been processed. Now suppose on processing
event (i+1), this binding can be "grown" to bd1, bd2, . . . bdk.
Then all these bindings are considered to be derived from b1.
[0269] Based upon the above definitions, the following
observations/invariants may be made. [0270] (1) The "last" event
(by event sequence number) in every binding in AB is the same and
is event i. [0271] (2) For every pair of bindings fb1, fb2 in FB,
INT(fb1) and INT(fb2) do not intersect. [0272] (3) For every
binding fb in FB, there exists a binding ab in AB such that ab==fb
OR ab !=fb and ab is preferred to fb as per preferment rules.
[0273] (4) Consider distinct bindings fb in FB, ab in AB. If fb is
preferred to ab as per preferment rules, then INT(ab) does not
intersect with INT(fb). [0274] (5) Suppose bd1 is derived from b1
and bd2 is derived from b2. Then bd1 is preferred to bd2 iff b1 is
preferred to b2.
[0275] Based on the above, the following invariants also hold:
[0276] (1) If list AFBL is not empty, then the first binding in the
list AFBL is in AB (follows from 3 above); and [0277] (2) If there
was a binding that moved into set FB for the first time during
processing event i, and if it is still in FB, then this is the last
binding in the list AFBL (follows from 1, 2, 4 above).
[0278] FIGS. 10A and 10B depict a simplified flowchart 1000
depicting runtime processing performed for detecting a Class B
pattern in an input event stream according to an embodiment of the
present invention. The method may be performed by software
(program, code, instructions executed by a processor), in hardware,
or combinations thereof. In one embodiment, the processing is
performed by matcher 116 depicted in FIG. 1. The method depicted in
FIGS. 10A and 10B shows processing that is performed upon receiving
an event e.sub.i+1.
[0279] As depicted in FIG. 10A, processing is initiated upon
receiving an event e.sub.i+1 (step 1001). A new list of bindings
NEW_AFBL is initialized to an empty list (step 1002). The AFBL list
is accessed (step 1004). As described above, the AFBL list is an
ordered list of bindings from set AFB in decreasing order of
preferment, wherein the set AFB is a union of bindings in set AB
(the set of active bindings (i.e., bindings in states PA and AMU))
and bindings in set FB (the set of final bindings (i.e., bindings
in states MU and AMU)).
[0280] Bindings are then iteratively selected from list AFBL in
decreasing order of preferment and processed according to steps
1006, 1008, 1010, 1012, 1014, 1016, 1017, 1018, 1020, and 1022. A
binding from list AFBL is selected in decreasing order of
preferment (step 1006). The binding selected in step 1006 will be
referred to as binding "b". A check is then made to see if b is in
AB (step 1008). If binding b is determined to not be in AB, then
binding b is inserted into list NEW_AFBL (step 1010). Processing
then continues with step 1022.
[0281] If it is determined in 1008 that b is in AB, then ALPHA is
set to the symbols that are matched by event e.sub.i+1 (step 1012).
The symbols in ALPHA are then selected iteratively in order of the
out-transitions on the corresponding state of the FSA and processed
according to steps 1014, 1016, 1017, 1018, and 1020. A symbol is
selected from ALPHA for processing (step 1014). The symbol selected
in 1014 will be referred to as symbol "a". A check is then made to
see if there is a binding ba obtained by applying symbol a on binding
b (step 1016). If such a binding ba exists then the binding ba is
inserted into list NEW_AFBL (step 1017). Further, if ba is moving
into FB for the first time then newFinal(ba) is set to TRUE (step
1018). This is to identify whether in this iteration there is a
binding that has moved into a final state. This means that all
bindings that would be created following this need not be
considered (hence need not be inserted into NEW_AFBL). Nothing is
done if no such binding ba exists. A check is then done to see if
all symbols in ALPHA have been processed (step 1020). If all
symbols in ALPHA have not been processed, then processing continues
with step 1014 wherein the next symbol that has not yet been
processed is selected. Else processing continues with step
1022.
[0282] A check is then made to see if all bindings in AFBL have
been processed (in decreasing order of preferment) (step 1022). If
it is determined in 1022 that all bindings in AFBL have not been
processed, then processing continues with step 1006 wherein another
unprocessed binding is selected. If it is determined in 1022 that
all bindings in AFBL have been processed, then, in order to
handle state Q0, new bindings are inserted into NEW_AFBL in
appropriate order (step 1024). The order would correspond to
iterating through the out-transitions of state Q0 in the order in
which they appear in the FSA. The processing corresponds to the
case where b is in AB:
[0283] 1. Let ALPHA be the alphabets that tuple (i+1) evaluates to
for the binding b
[0284] 2. Iterate through the alphabets in ALPHA in order of their
occurrence in the out transitions of the current state of the
binding
  [0285] 1. Let a be the alphabet for this iteration
  [0286] 2. Insert binding ba obtained by applying alphabet a on
  binding b into NEW_AFBL. Further, if ba is moving into FB for the
  first time, then mark newFinal(ba)=true
  [0287] 3. If there is no such binding by applying alphabet a on
  binding b, then do nothing
[0288] The bindings in list NEW_AFBL are then processed. In order
to facilitate the processing, a variable "delete_remaining" is set
to FALSE and another variable "found_ab" is set to FALSE (step
1026). The bindings in list NEW_AFBL are processed according to
steps 1028 onwards (1028 to 1054). A binding is selected from
NEW_AFBL for processing (step 1028). The binding selected in 1028
will be referred to as b.
[0289] Processing is then performed based upon the state of the
selected binding, whether it is MU, AMU, or PA. A check is
performed to see if the state of the binding selected in 1028 is MU
(step 1030). If the state of binding b is not MU, then processing
continues with step 1042. If it is determined in 1030 that the
state of the binding is MU, then if delete_remaining is TRUE then
the binding b is deleted from NEW_AFBL (step 1032). Then, if
newFinal(b) is TRUE, delete_remaining is set to TRUE (step 1034).
Then, if found_ab is FALSE (i.e., !found_ab), then binding b is
reported as a match and binding b is deleted from NEW_AFBL (step
1040).
[0290] A check is made to see if the state of binding b selected in
1028 is AMU (step 1042). If the state of binding b is not AMU, then
processing continues with step 1050. If it is determined in 1042
that the state of the binding is AMU, then if delete_remaining is
FALSE (i.e., !delete_remaining) and newFinal(b) is true, then
delete_remaining is set to TRUE (step 1044). Else, if
delete_remaining is TRUE then binding b is deleted from NEW_AFBL
(step 1046). The variable found_ab is then set to TRUE (step
1048).
[0291] If the state of binding b is neither MU nor AMU, then the
state of b is PA. In such a case, if delete_remaining is set to
TRUE then binding b is deleted from NEW_AFBL (step 1050). The
variable found_ab is set to TRUE (step 1052). Processing then
continues with step 1054.
[0292] A check is then made to see if all bindings in list NEW_AFBL
have been processed (step 1054). If all bindings in NEW_AFBL are
determined to have been processed then the processing ends, else
processing continues with step 1028 wherein a new binding is
selected for processing.
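Steps 1026 through 1054 may be sketched in Python as follows. This is one literal reading of the steps, offered under assumptions rather than as the implementation of the described embodiment: the Binding class and prune_and_report function are hypothetical, the input list is assumed to be already ordered by decreasing preferment, and an MU binding deleted in step 1032 is assumed to skip steps 1034 and 1040, which the text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class Binding:
    name: str
    state: str              # "PA", "MU", or "AMU"
    new_final: bool = False  # newFinal(b)


def prune_and_report(new_afbl):
    """Return (kept, reported) for a list ordered by decreasing preferment."""
    delete_remaining = False  # step 1026
    found_ab = False          # step 1026
    kept, reported = [], []
    for b in new_afbl:        # steps 1028 and 1054
        if b.state == "MU":
            if delete_remaining:
                continue                  # step 1032: b is deleted
            if b.new_final:
                delete_remaining = True   # step 1034
            if not found_ab:
                reported.append(b)        # step 1040: report, then delete
            else:
                kept.append(b)
        elif b.state == "AMU":
            deleted = False
            if not delete_remaining and b.new_final:
                delete_remaining = True   # step 1044
            elif delete_remaining:
                deleted = True            # step 1046
            found_ab = True               # step 1048
            if not deleted:
                kept.append(b)
        else:  # PA
            if not delete_remaining:      # step 1050 deletes otherwise
                kept.append(b)
            found_ab = True               # step 1052
    return kept, reported


example = [Binding("p1", "PA"), Binding("a1", "AMU", True),
           Binding("p2", "PA"), Binding("m1", "MU")]
kept, reported = prune_and_report(example)
print([b.name for b in kept], [b.name for b in reported])
```

In this example a1 has newly reached a final state, so every less preferred binding after it (p2 and m1) is dropped, and nothing is reported because the more preferred active binding p1 may still grow.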
[0293] At the end of processing of event e.sub.i+1, it can be
verified that the seven invariants listed above hold. It should be
noted that the method depicted in FIGS. 10A and 10B and described
above may also be used for detecting Class A patterns, including
Class A simple recurring patterns, since these patterns are just a
subclass of the general Class B patterns. The method depicted in
FIGS. 10A and 10B and
described above may be implemented with two separate lists: i)
Partial Active PA and ii) Both active and Matched but unsure list
AMU. In one embodiment, rather than have a single AFB list, it is
also possible to have two lists--AB and FB.
Output Ordering for Partition by Support for Patterns
[0294] In an embodiment of the present invention, an input event
stream may be partitioned into multiple sub-streams based, for
example, upon a symbol. The pattern matching techniques described
above may then be applied over each sub-stream. In this scenario,
a query comprising a regular expression is compiled into a PLAN
that is then used at runtime processing for detecting the pattern
over the partitioned multiple streams. The compilation process
comprises building an FSA for the regular expression. This may
involve building a base FSA for each symbol and then combining the
FSAs to form a single FSA, which is then used at runtime
processing.
[0295] The following SQL query provides an example of how an input
event stream may be partitioned using a symbol, and pattern
matching may then be applied over each sub-stream.
TABLE-US-00014
create query double_bottom as
select symbol, start_price, end_price
from Ticker MATCH_RECOGNIZE (
  PARTITION BY symbol
  MEASURES A.symbol as symbol,
           A.price as start_price,
           LAST(Z.price) as end_price
  PATTERN (A W+ X+ Y+ Z+)
  DEFINE W as (W.price < PREV(W.price)),
         X as (X.price > PREV(X.price)),
         Y as (Y.price < PREV(Y.price)),
         Z as (Z.price > PREV(Z.price))
)
[0296] The complex pattern that this query specifies is what is
known as a "double bottom" or "W" shaped pattern. The requirement
is to match non-overlapping maximal "W" patterns and from all the
events constituting a match, output a single event corresponding to
the match that reports the symbol, the price at the beginning of
the fall and the price at the end of the last rise.
[0297] The following definitions are used to describe the pattern
matching processing: [0298] 1) ReadyToOutputList--This list
contains all the potential output bindings in the increasing order
of output timestamp. These bindings are on hold because there
exists an unsure binding with lesser output timestamp in some other
partition.
[0299] The pattern matching process proceeds as follows: [0300] (1)
After processing the current input tuple, collect all the bindings
in the current partition, which can be output and move them to
ReadyToOutputList. [0301] (2) Get minimum matched timestamp
(minMatchedTs) of all the unsure bindings of all the partitions.
[0302] (3) Emit all the bindings in the ReadyToOutputList whose
matched timestamp is less than minMatchedTs.
[0303] The processing described above ensures that ready to output
bindings are on hold until all the potential output bindings
(unsure bindings) whose matched timestamp is less than the ready to
output bindings either become ready to output or cannot be
developed further.
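The hold-and-release rule described above can be sketched as follows. The Python code is illustrative only; the function name emit_ready and the tuple layout are hypothetical, and the behavior when no unsure bindings exist (everything is released) is an assumption not stated in the text.

```python
def emit_ready(ready_to_output, unsure_timestamps):
    """Split the ready list into (emitted, still_held).

    ready_to_output: list of (matched_ts, binding) in increasing timestamp
    order. unsure_timestamps: matched timestamps of the unsure bindings of
    all partitions.
    """
    # Assumption: with no unsure bindings, nothing can block output.
    min_matched_ts = min(unsure_timestamps, default=float("inf"))
    emitted = [item for item in ready_to_output if item[0] < min_matched_ts]
    held = [item for item in ready_to_output if item[0] >= min_matched_ts]
    return emitted, held


ready = [(3, "match in partition X"), (7, "match in partition Y"),
         (9, "second match in partition X")]
emitted, held = emit_ready(ready, unsure_timestamps=[8, 12])
print(emitted, held)
```

In this example the binding with timestamp 9 stays on hold because an unsure binding with timestamp 8 exists in some partition and might still have to be output first.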
[0304] The responsibility of the pattern processor is to output the
matched events in order of time. In one partition there may be a
match ready for output while in another partition there is a match
that is being held back by preferment.
[0305] As described above, extensions are provided to SQL that
transform SQL into a rich expressive language for performing
pattern matching using regular expressions. The extensions enhance
the ability of SQL to support pattern matching on events.
Extensions to support or model regular expression-based pattern
matching on event streams may also be provided for other
programming languages.
[0306] Pattern matching using regular expressions over continuously
arriving events of an event stream, as described above, has wide
applicability in various fields and applications. Examples include
financial services, RFID based applications such as inventory
management, click stream analysis applications, electronic health
systems, and the like. For example, in financial services, a trader
may use the pattern matching techniques described above to identify
trends in the stock market based upon a stream of incoming ticker
stock prices. As another example, in RFID-based tracking and
monitoring, the pattern matching techniques described above may be
used to track valid paths of shipments and detect anomalies.
[0307] While embodiments of the present invention have been
described above with respect to Class A and Class B patterns, in
alternative embodiments various different types of patterns may be
recognized and processed accordingly. For a particular class of
pattern that is detected, an embodiment of the present invention is
configured to select and apply a pattern matching technique that is
appropriate for that type of pattern. In this manner, embodiments
of the present invention optimize and increase the efficiency of
pattern matching performed over event streams.
Automaton Construction
[0308] This section describes techniques for constructing a
nondeterministic finite state automaton (NFSA) given a regular
expression used to express a pattern to be recognized according to
an embodiment of the present invention. Description is provided for
generating an NFSA for a basic regular expression such as `A`.
Description is then provided for the necessary transformations for
each of the regular expression operators such as Concatenation,
Alternation, Greedy Quantifiers--`*`,`+`,`?` and Lazy
Quantifiers--`*?`, `+?`, `??`.
[0309] In one embodiment, a regular expression is maintained in a
tree representation (which is actually a unique parse tree for that
regular expression) constructed by taking into account the
precedence and associativity of regular expression operators. Like
most algorithms operating on a tree data structure, the process of
constructing an NFSA from a regular expression in tree form is
recursive in nature. A standard NFSA is first constructed for each
basic regular expression, which is simply a single correlation
variable, and the entire NFSA is then built up gradually by applying
the transformations for the operators involved.
(a) Machine for a Basic Regular Expression
[0310] A basic regular expression is simply a single correlation
variable, e.g., A. The machine for this basic regular expression
consists of two states: 0 and 1.
[0311] State 0: This is the start state and it has a transition going to state 1 on encountering the alphabet A in the input.
[0312] State 1: This is the final state and has one transition going to the "undefined" state (state number -1) on the alphabet "Final" (indicated by F in FIGS. 11A-11I).
[0313] A state diagram for a single correlation variable e.g. `A`
is shown in FIG. 11A. In FIGS. 11A-11I, a rectangular box below the
state shows the <alphabet, dest state> pairs (transitions) of
that state in the decreasing order of preference.
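As a non-limiting illustration, the two-state machine described above might be represented as follows. The representation (one ordered list of <alphabet, destination state> pairs per state) and the names FINAL, UNDEFINED, basic_nfsa, and is_final are assumptions made for this sketch and do not appear in the described embodiments.

```python
FINAL = "F"        # the special "Final" alphabet
UNDEFINED = -1     # destination of a "Final" transition

def basic_nfsa(variable):
    """Return the two-state machine for a single correlation variable."""
    return [
        [(variable, 1)],        # state 0: start state, moves to state 1 on the variable
        [(FINAL, UNDEFINED)],   # state 1: final state, "Final" leads to the undefined state
    ]

def is_final(state):
    """A state is final when it carries a "Final" transition."""
    return any(symbol == FINAL for symbol, _ in state)

machine_a = basic_nfsa("A")
```

In this sketch the position of a pair within a state's list encodes its preference, with earlier entries preferred.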
(b) Concatenation Operator (.)
[0314] The concatenation operator is a binary operator. First, the
NFSA for the left and right operands are obtained and then merged
to get the NFSA for the concatenated regular expression.
[0315] Let `L` be the number of states in the left operand NFSA and
[0316] `R` be the number of states in the right operand NFSA.
The start state of the right NFSA is not carried over, so the number
of states in the merged NFSA is L+R-1, and the states are numbered 0
through L+R-2.
[0317] The steps for merging the two NFSAs are as follows:
[0318] (1) For every state `S` in the left NFSA:
[0319] Copy all the transitions (<alphabet, destination state> pairs) of `S` to the corresponding state of the merged NFSA.
[0320] (2) For every final state of the left NFSA:
[0321] In the corresponding state of the merged NFSA, add all the start state transitions of the right NFSA, in the order in which they appear, at the position where the `Final` transition appears for that state. In the process, that `Final` transition is removed.
[0322] If the start state of the right NFSA is final (i.e., the language of the right regular expression accepts `epsilon`), then make this state of the merged NFSA a final state.
[0323] (3) For every state `S` of the right NFSA except the start state:
[0324] Copy the transitions of state `S` to the state `S+L-1` of the merged NFSA, incrementing the destination state field by L-1.
[0325] Make the state `S+L-1` of the merged NFSA final if the state `S` of the right NFSA is final.
FIG. 11B depicts how the state diagram of `AB` is obtained from the
state diagrams of `A` and `B` by following the above technique.
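The merging steps for concatenation may be sketched as follows. This is a minimal illustration under stated assumptions: each state is an ordered list of <alphabet, destination state> pairs, a state is final when it carries a "Final" pair, and the destinations of the right start state's transitions are shifted by L-1 when they are spliced in (the text states the shift explicitly only for step 3). The names concatenate and shift are hypothetical.

```python
FINAL = "F"
UNDEFINED = -1

def shift(transitions, offset):
    """Renumber destinations, leaving the undefined state (-1) untouched."""
    return [(sym, dest if dest == UNDEFINED else dest + offset)
            for sym, dest in transitions]

def concatenate(left, right):
    """Merge two machines into one with L+R-1 states."""
    offset = len(left) - 1
    right_start = shift(right[0], offset)
    merged = []
    for state in left:                        # steps (1) and (2)
        new_state = []
        for sym, dest in state:
            if sym == FINAL:
                # Replace "Final" in place with the right start-state transitions;
                # if the right start state is final, its "Final" pair comes along.
                new_state.extend(right_start)
            else:
                new_state.append((sym, dest))
        merged.append(new_state)
    for state in right[1:]:                   # step (3)
        merged.append(shift(state, offset))
    return merged

a = [[("A", 1)], [(FINAL, UNDEFINED)]]
b = [[("B", 1)], [(FINAL, UNDEFINED)]]
ab = concatenate(a, b)
```

For the machines of `A` and `B`, the result has three states (L+R-1 = 2+2-1), numbered 0 through 2.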
(c) Alternation Operator (|)
[0326] The alternation operator is a binary operator. First, the
NFSAs for the left and right regular expressions are obtained, and
they are then joined together to get the NFSA for the complete
regular expression.
[0327] Let `L` be the number of states in the left operand NFSA and
[0328] `R` be the number of states in the right operand NFSA.
In the joined NFSA, a new start state is created that has all the
transitions of the start states of both the left and right NFSAs.
The original start states of the left and right NFSAs are discarded,
so the number of states in the joined NFSA is L+R-1, and the states
are numbered 0 through L+R-2.
[0329] The two NFSAs are joined as follows:
[0330] (1) For every state `S` in the left NFSA:
[0331] Copy all the transitions (<alphabet, destination state> pairs) of `S` to the corresponding state of the joined NFSA.
[0332] Also mark the state `S` of the joined NFSA as final if state `S` of the left NFSA is final.
[0333] (2) If the start states of both the left and right NFSAs are final, then:
[0334] Append all the start state transitions of the right NFSA, excluding the `Final` transition, in the order in which they appear, to the transitions of the start state of the joined NFSA. While appending, increment the destination state field by L-1.
[0335] Else: Append all the start state transitions of the right NFSA, in the order in which they appear, to the transitions of the start state of the joined NFSA. While appending, increment the destination state field by L-1.
[0336] Make the start state of the joined NFSA final if the start state of the right NFSA is final.
[0337] (3) For every state `S` of the right NFSA except the start state:
[0338] Copy the transitions of state `S` to the state `S+L-1` of the joined NFSA, incrementing the destination state field by L-1.
[0339] Make the state `S+L-1` of the joined NFSA final if the state `S` of the right NFSA is final.
[0340] FIG. 11C depicts an example of how to get state diagram for
(A|B) from state diagrams of alternation components.
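The joining steps for alternation may be sketched in the same style. The list-of-pairs representation and the name alternate are assumptions of this sketch; finality is again modeled by the presence of a "Final" pair, whose destination (-1) is never shifted.

```python
FINAL = "F"
UNDEFINED = -1

def alternate(left, right):
    """Join two machines under `|`, giving L+R-1 states."""
    offset = len(left) - 1
    left_start_final = any(sym == FINAL for sym, _ in left[0])
    start = list(left[0])                       # step (1) for the start state
    for sym, dest in right[0]:                  # step (2)
        if sym == FINAL:
            if left_start_final:
                continue                        # both starts final: keep one "Final"
            start.append((FINAL, UNDEFINED))    # right start final: joined start is final
        else:
            start.append((sym, dest + offset))
    joined = [start] + [list(state) for state in left[1:]]
    for state in right[1:]:                     # step (3)
        joined.append([(sym, dest if dest == UNDEFINED else dest + offset)
                       for sym, dest in state])
    return joined

a = [[("A", 1)], [(FINAL, UNDEFINED)]]
b = [[("B", 1)], [(FINAL, UNDEFINED)]]
a_or_b = alternate(a, b)
```

Because the left transitions are listed first in the joined start state, the left alternative is preferred over the right one in this sketch.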
(d) Quantifiers
[0341] Quantifiers that may be supported in a regular expression
include greedy quantifiers (*, +, ?) and lazy (*?, +?, ??)
quantifiers. These quantifiers can be applied over simple regular
expressions like A, B, etc. or over complex regular expressions
formed by applying `concatenation` and/or `alternation` over simple
regular expressions, e.g., (AB)*, (A|B)*, etc. Repeated application
of these quantifiers in any order is also allowed, yielding regular
expressions like (A*)?, (A*?B+C?)+, etc. In the techniques
described below for handling quantifiers, a machine constructed for
the unary operand (whether simple/complex) of the quantifier is
first obtained and then the constructed NFSA is appropriately
modified depending on the quantifier, as per the rules stated
below.
[0342] (1) `*` Quantifier (Greedy)
[0343] Let P=R*
Case 1--L(R) does not contain epsilon
TABLE-US-00015
Start state of R is final | Cannot be the case (since L(R) does not contain epsilon)
Start state of R is not final | Make start state final and the rank of "final" is the lowest (do this start state processing last)
Non-start final states | For each such state, copy over all start-state transitions between current "final" and its predecessor. Then, for each duplicate transition, remove the lower ranked duplicate. (Here duplicate transition is one where alphabet and destination state are identical)
Case 2--L(R) contains epsilon
TABLE-US-00016
Start state of R is final | No change, leave it as is
Start state of R is not final | Cannot be the case (since L(R) contains epsilon)
Non-start final states | For each such state, copy over all start-state transitions between current "final" and its predecessor. This includes the "final transition" from the start state. Remove the original "final transition" of this final state. Then, for each duplicate transition, remove the lower ranked duplicate. (Here duplicate transition is one where alphabet and destination state are identical)
[0344] FIG. 11D depicts an example of the state machine for `A*`
obtained by applying rules for `*` over state machine for `A`.
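One possible reading of the Case 1 rules for the greedy `*` quantifier is sketched below. It assumes that "between current final and its predecessor" means the start-state transitions are inserted immediately ahead of the "Final" pair of each non-start final state. The representation and the names star_greedy and dedupe are hypothetical, and the sketch covers only the case where L(R) does not contain epsilon.

```python
FINAL = "F"
UNDEFINED = -1

def dedupe(transitions):
    """Keep the first (higher ranked) copy of each <alphabet, destination> pair."""
    seen, kept = set(), []
    for pair in transitions:
        if pair not in seen:
            seen.add(pair)
            kept.append(pair)
    return kept

def star_greedy(machine):
    """Apply `*` to a machine whose start state is not final (Case 1)."""
    start = list(machine[0])
    if (FINAL, UNDEFINED) in start:
        raise ValueError("Case 1 requires a non-final start state")
    result = [start]
    for state in machine[1:]:
        if (FINAL, UNDEFINED) in state:
            at = state.index((FINAL, UNDEFINED))
            # Insert the start-state transitions just ahead of "Final".
            state = dedupe(state[:at] + start + state[at:])
        result.append(list(state))
    # The start state is processed last: it becomes final, with "Final" ranked lowest.
    result[0] = start + [(FINAL, UNDEFINED)]
    return result

a_star = star_greedy([[("A", 1)], [(FINAL, UNDEFINED)]])
```

Processing the start state last keeps its newly added "Final" pair from being copied into the other final states.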
[0345] (2) `+` Quantifier (Greedy)
[0346] Let P=R+
Case 1--L(R) does not contain epsilon
TABLE-US-00017
Start state of R is final | Cannot be the case (since L(R) does not contain epsilon)
Start state of R is not final | No change, leave it as is.
Non-start final states | For each such state, copy over all start-state transitions between current "final" and its predecessor. Then, for each duplicate transition, remove the lower ranked duplicate. (Here duplicate transition is one where alphabet and destination state are identical)
Case 2--L(R) contains epsilon
TABLE-US-00018
Start state of R is final | No change, leave it as is
Start state of R is not final | Cannot be the case (since L(R) contains epsilon)
Non-start final states | For each such state, copy over all start-state transitions between current "final" and its predecessor. This includes the "final transition" from the start state. Remove the original "final transition" of this final state. Then, for each duplicate transition, remove the lower ranked duplicate. (Here duplicate transition is one where alphabet and destination state are identical)
[0347] FIG. 11E depicts an example of the state machine for `A+`
obtained by applying rules for `+` over the state machine for
`A`.
[0348] (3) `?` Quantifier (Greedy)
[0349] Let P=R?
Case 1--L(R) does not contain epsilon
TABLE-US-00019
Start state of R is final | Cannot be the case (since L(R) does not contain epsilon)
Start state of R is not final | Make the start state final and the rank of "final" is the lowest.
Non-start final states | Nothing needs to be done
Case 2--L(R) contains epsilon
TABLE-US-00020
Start state of R is final | No change, leave it as is
Start state of R is not final | Cannot be the case (since L(R) contains epsilon)
Non-start final states | Nothing needs to be done.
[0350] FIG. 11F depicts an example of the state machine for `A?`
obtained by applying rules for `?` over state machine for `A`.
[0351] (4) `*?` Quantifier (Lazy)
[0352] Let P=R*?
Case 1--L(R) does not contain epsilon
TABLE-US-00021
Start state of R is final | Cannot be the case (since L(R) does not contain epsilon)
Start state of R is not final | Make the start state final and the rank of "final" is first (do this processing last) (Since epsilon is to be given more preference over any non-empty string, rank of final is first).
Non-start final states | For each such state, copy all start state transitions between current `final` and its immediate successor. Then for each duplicate transition, remove the lower ranked duplicate
Case 2--L(R) contains epsilon
TABLE-US-00022
Start state of R is final | Remove the original `final` transition and put it in the first place. (Since epsilon is to be given more preference over any non-empty string, rank of final is first)
Start state of R is not final | Cannot be the case (since L(R) contains epsilon)
Non-start final states | For each such state, copy all start state transitions between current "final" and its immediate successor. No need to include the "final transition" from the start state. Then, for each duplicate transition, remove the lower ranked duplicate.
[0353] FIG. 11G depicts an example of the state machine for `A*?`
obtained by applying the rules for `*?` over state machine for
`A`.
[0354] (5) `+?` Quantifier (Lazy)
[0355] Let P=R+?
Case 1--L(R) does not contain epsilon
TABLE-US-00023
Start state of R is final | Cannot be the case (since L(R) does not contain epsilon)
Start state of R is not final | No change, leave it as is.
Non-start final states | For each such state, copy all start state transitions between current `final` and its immediate successor. Then for each duplicate transition, remove the lower ranked duplicate.
Case 2--L(R) contains epsilon
TABLE-US-00024
Start state of R is final | No change, leave it as is. (No change in preference of epsilon needed here)
Start state of R is not final | Cannot be the case (since L(R) contains epsilon)
Non-start final states | For each such state, copy all start state transitions between current "final" and its immediate successor. No need to include the "final transition" from the start state. Then, for each duplicate transition, remove the lower ranked duplicate.
[0356] FIG. 11H depicts an example of a state machine for `A+?`
obtained by applying the rules for `+?` over state machine for
`A`.
[0357] (6) `??` Quantifier (Lazy)
[0358] Let P=R??
Case 1--L(R) does not contain epsilon
TABLE-US-00025
Start state of R is final | Cannot be the case (since L(R) does not contain epsilon)
Start state of R is not final | Make the start state `final` and rank of `final` is first. (Since epsilon is to be given more preference over any non-empty string, rank of final is first)
Non-start final states | Nothing needs to be done
Case 2--L(R) contains epsilon
TABLE-US-00026
Start state of R is final | Remove the original final transition and put it in the first place. (Since epsilon is to be given more preference over any non-empty string, rank of final is first)
Start state of R is not final | Cannot be the case (since L(R) contains epsilon)
Non-start final states | Nothing needs to be done
[0359] FIG. 11I depicts an example of a state machine for `A??`
obtained by applying rules for `??` over state machine for `A`.
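The difference between the greedy `?` and lazy `??` quantifiers comes down to where the "Final" pair is ranked in the start state, which the following sketch illustrates. The representation (ordered lists of <alphabet, destination state> pairs, earlier entries preferred) and the function names are assumptions of this sketch.

```python
FINAL = "F"
UNDEFINED = -1

def optional_greedy(machine):
    """`?`: make the start state final with "Final" ranked lowest, if not already final."""
    result = [list(state) for state in machine]
    if (FINAL, UNDEFINED) not in result[0]:
        result[0].append((FINAL, UNDEFINED))
    return result

def optional_lazy(machine):
    """`??`: the start state is final with "Final" ranked first."""
    result = [list(state) for state in machine]
    others = [pair for pair in result[0] if pair[0] != FINAL]
    result[0] = [(FINAL, UNDEFINED)] + others
    return result

a = [[("A", 1)], [(FINAL, UNDEFINED)]]
a_greedy = optional_greedy(a)
a_lazy = optional_lazy(a)
```

In both cases the non-start final states are left untouched, matching the "Nothing needs to be done" rows of the tables.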
Example
[0360] This section provides an example of constructing an NFSA
based upon the rules described above. Consider a regular expression
"(A?B*|C*?)+". Here `+` is applied over the expression in brackets.
The expression inside brackets has two alternation components:
`A?B*` and `C*?`. The first of these components is the
concatenation of `A?` and `B*`. The process of constructing a state
machine for this regular expression is as follows:
[0361] (1) The state machine for the first alternation component `A?B*` is obtained by applying the rules of the concatenation operator to the state machines for `A?` and `B*`. This is shown in FIG. 12A.
[0362] (2) The state diagram for the second alternation component `C*?` is shown in FIG. 12B.
[0363] (3) The rules of the alternation operator are applied to the state machines of the two components to get the state machine for `(A?B*|C*?)`, as shown in FIG. 12C.
[0364] (4) Finally, the rules for the `+` quantifier are applied to the state machine obtained in the last step to get the state machine for the complete regular expression, as shown in FIG. 12D.
Analysis of Technique for Constructing NFSA
[0365] Let `N` be the number of alphabets in the regular expression
and `S` be the number of states in the state diagram of that
regular expression constructed by using the algorithms described in
this document. Then S=N+1 always, which means S is linear in `N`.
This can be proven by induction as follows:
[0366] (1) Base case: When N=1 (the single correlation variable case), the number of states is two. So S=N+1 holds.
[0367] (2) Assumption: Assume S=N+1 holds for all N<=K.
[0368] (3) Induction step: To prove the claim for N=K+1, consider the following cases:
[0369] Concatenation: When the state diagram of a regular expression with `n1` variables is joined by concatenation to the state diagram of a regular expression with `n2` variables, where n1 and n2 are both <=K, the number of states in the resultant state diagram is S=S1+S2-1 (since the initial state of the second NFSA is removed in the process), where S1, the number of states in the first NFSA, is n1+1 (follows from step 2), and
[0370] S2, the number of states in the second NFSA, is n2+1 (follows from step 2).
[0371] Therefore, S=n1+1+n2+1-1=n1+n2+1=N+1, since N=n1+n2.
[0372] Alternation: When the state diagram of a regular expression with `n1` variables is joined by alternation to the state diagram of a regular expression with `n2` variables, where n1 and n2 are both <=K, the number of states in the resultant state diagram is
[0373] S=S1+S2-1 (since the initial states of both NFSAs are removed in the process and a new initial state is added),
[0374] where S1, the number of states in the first NFSA, is n1+1 (follows from step 2), and
[0375] S2, the number of states in the second NFSA, is n2+1 (follows from step 2).
[0376] Therefore, S=n1+1+n2+1-1=n1+n2+1=N+1, since N=n1+n2.
[0377] Quantifiers: The algorithms for quantifiers do not change the number of states.
[0378] So in all cases S=N+1 is proved. Hence the number of states is linear in N.
[0379] The worst case bound on the number of transitions (M) is
[0380] M=number of alphabets (N)*number of states (S).
[0381] This follows from the invariant that there can be at most one transition for a given [0382] <state, alphabet> pair.
[0383] So in the worst case, every state will have one transition on every alphabet.
[0384] M=N*S=N*(N+1)=O(N^2).
[0385] So in the worst case, M is quadratic in N.
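For reference, the state and transition counts derived above can be collected in a single display. The subscripted symbols are shorthand introduced here for the quantities S, S1, S2, n1, n2, N, and M used in the text.

```latex
\begin{align*}
S_{\text{concatenation}} &= S_1 + S_2 - 1 = (n_1 + 1) + (n_2 + 1) - 1 = n_1 + n_2 + 1 = N + 1,\\
S_{\text{alternation}}   &= S_1 + S_2 - 1 = (n_1 + 1) + (n_2 + 1) - 1 = n_1 + n_2 + 1 = N + 1,\\
M &\le N \cdot S = N (N + 1) = O(N^2).
\end{align*}
```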
Detection of Non-Occurrences of Events
[0386] In the embodiments described above, pattern matching
processing is performed during runtime upon receiving an event.
Accordingly, the pattern matching described above is based upon the
arrival of events in a data stream. There are several situations
across different industry domains where there is a need to detect
that a specific event has NOT happened within a time period
following the occurrence of another event. In other words, in such
situations it is important to detect the non-occurrence of an event
within a time period following the occurrence of another event.
These use cases may be referred to as "non-event" detection use
cases.
[0387] For example, in an airline baggage check-in application, the
application may be required to detect situations where a
passenger's bag has not been scanned within a certain period of
time (e.g., 5 minutes) following the passenger's check-in.
Non-detection of a bag scan within the time period after a passenger's
check-in may indicate a potential lost bag scenario. Early
detection of such situations allows the airline company time to
react and take actions to resolve a missed bag situation.
[0388] The pattern matching techniques discussed above may be used
to detect such non-occurrences. For example, pattern matching
module 110 may be configured to detect such non-occurrences. In one
embodiment, language extensions are provided to a language (such as
CQL) that enable a user to formulate queries for detection of
non-occurrences of events using that language. One language
extension may enable a user to specify in the query that the query
is for detection of non-occurrences of events. Another language
extension may enable the user to specify a time period during which
the non-occurrence is to be monitored. For purposes of the
description below, the event that starts the time period within
which non-occurrence of an event is to be detected will be referred
to as the "trigger event" since it triggers the start of the
time period. The event whose non-occurrence is to be monitored
during the time period will be referred to as a "non-event".
[0389] FIG. 13 depicts an example of a query 1300 that may be used
to detect the non-occurrence of an event within a time period
following the occurrence of another event according to an
embodiment of the present invention. Query 1300 depicted in FIG. 13
is merely an example of a query that may be used and is not
intended to limit the scope of the invention as recited in the
claims. Other queries may be used in alternative embodiments.
[0390] As depicted in FIG. 13, query 1300 comprises a new language
extension "INCLUDE TIMER EVENTS" clause 1304 that indicates that
query 1300 is a special query for detecting non-occurrence of an
event within a time period following the occurrence of another
event. Pattern 1302 may be used to specify the trigger event.
Another new language extension "DURATION" clause 1306 enables a
user to specify the time period after the occurrence of a trigger
event within which non-occurrence of a non-event is to be
monitored. One or more actions may be performed upon the detection
of non-occurrence of the non-event within the defined time period.
Various different types of actions may be performed such as raising
an alert, preventative actions, and others.
[0391] In the example depicted in FIG. 13,
"baggageCheckIn_u_baggageTracking" is an input stream being
analyzed. This input stream may comprise events of different types
including an event that indicates a passenger check-in (cType=0)
and an event that indicates a baggage scan (cType=1) following a
check-in. In query 1300, the pattern 1302 comprises two symbols A
and B. The condition or predicate for symbol A is defined such that
event A occurs when a check-in event (cType=0) is received in the
data stream. Accordingly, symbol A matches a check-in event. B*
matches all events other than the required baggage scan, i.e., it
is used to ignore all irrelevant events relative to the bag in
question. Now, this pattern will match (i.e., the automaton for
this pattern will transition to a final state) 5 minutes following
the occurrence of the check-in event unless the corresponding
baggage scan occurs prior to this. A pattern match in this context
corresponds to raising an alert signaling a potential missed bag
situation.
[0392] As with previously described queries, at compile time, an
FSA is constructed for query 1300 and the constructed FSA is then used
to guide the pattern matching during runtime processing. The FSA is
constructed as previously described for Class A and Class B
patterns with some additional processing as depicted in FIG. 14.
FIG. 14 depicts a simplified flowchart 1400 depicting additional
processing performed at compile time for a query for detecting
non-occurrences according to an embodiment of the present
invention. The processing depicted in FIG. 14 may be performed by
software (e.g., code, program, instructions) executed by a
processor, in hardware, or combinations thereof. The software may
be stored on a computer-readable storage medium.
[0393] As depicted in FIG. 14, a determination is made whether the
query is one for detecting non-occurrences of a non-event (step
1402). In one embodiment, automaton-generator 114 may determine
this based upon the presence or absence of a special language
extension clause (e.g., the "INCLUDE TIMER EVENTS" clause in the
query). As part of 1402, the input query may be processed to
determine the existence of the INCLUDE TIMER EVENTS clause. If such
a clause is detected in 1402, then a new variable or symbol (e.g.,
`#`) is introduced into the pattern specified by the query (step
1404). The `#` symbol represents timer events. In one embodiment,
the original pattern specified in the query is modified by
suffixing a `#` symbol to the pattern. For example, the original
"AB*" pattern in FIG. 13 is modified to "AB*#". This may be done
internally by pattern matching module 110. The user or system
providing the query need not be aware of this modification. The
modified pattern is then treated as a Class B pattern and an FSA is
constructed for the modified pattern using the techniques described
above for Class B patterns (step 1406). The FSA constructed in 1406
is then used during runtime to guide the detection of
non-occurrences of a specific event within a time period following
the occurrence of the trigger event. If it is determined in 1402
that the query is not one for detecting non-occurrences, then
processing proceeds for construction of a FSA as described above
for Class A or Class B patterns (step 1408).
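The compile-time decision of FIG. 14 may be sketched as follows. This is an illustration only: the function name prepare_pattern, the Boolean flag standing in for detection of the INCLUDE TIMER EVENTS clause, and the returned class label are assumptions of the sketch rather than part of the described embodiments.

```python
TIMER_SYMBOL = "#"  # stands for a timer event

def prepare_pattern(pattern, include_timer_events):
    """Return (pattern to compile, forced pattern class or None)."""
    if include_timer_events:                  # step 1402: clause is present
        # Steps 1404 and 1406: suffix '#' and treat the result as Class B.
        return pattern + TIMER_SYMBOL, "B"
    return pattern, None                      # step 1408: classify as usual

modified = prepare_pattern("AB*", True)
unchanged = prepare_pattern("AB*", False)
```

The caller never sees the `#` symbol in its own query text; only the internally compiled pattern carries it.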
[0394] As described above, the newly inserted symbol represents a
timer event. Unlike the processing described above for Class A and
Class B where the arrival of an event triggers the pattern matching
analysis and triggers state transitions in an FSA, in case of
detection of non-occurrences, the passage of time may also trigger
a transition of the FSA to a final state and thus cause an action
to be performed. This time-based transition is represented by the
`#` symbol.
[0395] There are different ways in which the passage of time may be
detected by pattern matching module 110. In one embodiment, as
events are received in the data stream, the timestamps associated
with the events are used to detect the passage of time. As
previously described, the timestamps associated with events in a
data stream may reflect an application's notion of time. For
example, the timestamp may be set by an application on the system
receiving an event stream or alternatively the timestamp associated
with an event may correspond to the time of the application sending
the data events. In either scenario, the timestamps associated with
events may be used to determine passage of time. For example, a
trigger event e1 (e.g., check-in) may have an associated time
stamp of t=3. If the time duration for which non-occurrence is to
be detected is 5 time units, then an alert is to be raised if the bag
scan event does not arrive before t=(3+5), i.e., before t=8. In this
scenario, if an event is received in the data stream with an
associated time stamp of t=10, pattern matching module 110 infers
that time has passed to t=10, i.e., beyond t=8, and that the
non-event has not occurred in the specified
time period. In this manner, time stamps associated with events
received after the trigger event may be used to determine the
passage of time, and thus used to determine non-occurrence of a
non-event within the specified time period after the occurrence of
a trigger event.
[0396] In addition to timestamps associated with events, a
heartbeat event (or heartbeat) may also be used to determine the
passage of time. The heartbeat may be generated by the event
processing server based upon a clock of the event processing server
or some other time or may be provided by the application. The
heartbeat event may be periodically fed to pattern matching module
110 to indicate the passage of time. The heartbeat allows pattern
matching module 110 to determine the passage of time without being
reliant on the arrival of events in the data stream. This is
especially useful when the arrival of events in a data stream is
very intermittent and cannot be relied on to determine passage of
time. For example, when a data stream goes silent or has not
received an event for a certain period of time, pattern matching
module 110 may use the heartbeat information to determine passage
of time.
[0397] The runtime processing performed for detecting
non-occurrences generally follows the runtime processing performed
for detecting presence of Class B patterns as described above with
additional processing performed to handle timer events. FIG. 15
depicts a simplified flowchart 1500 depicting processing for
detecting non-occurrences according to an embodiment of the present
invention. The processing depicted in FIG. 15 may be performed by
software (e.g., code, program, instructions) executed by a
processor, in hardware, or combinations thereof. The software may
be stored on a computer-readable storage medium. It should be noted
that there may be several active bindings present when an input is
received. The processing in FIG. 15 and described below would be
applied to all these bindings.
[0398] As depicted in FIG. 15, a new input is received (step 1502).
The new input may be an actual event received in the data stream or
a heartbeat event. A list of active bindings BINDING_LIST is then
accessed (step 1504). In one embodiment, the bindings in the
BINDING_LIST are sorted in non-decreasing order based upon the
target times associated with the bindings in the list. Accordingly,
in the sorted BINDING_LIST, the top element in the list is a
binding with the least associated value of target time.
[0399] A target time associated with a binding is defined as the
time of the first element of the binding plus the duration
specified in the query. The first element is the triggering event.
For example, in the "AB*" example of FIG. 13, the event matching
symbol "A" would be the first event or trigger event. The trigger
event is the event that results in the construction of a new
binding (see 1524 below) (as opposed to other events, the B's in
the AB* example, that result in "growing" an existing binding but
not creating a new binding). As an example, if the time of the
first element (the trigger element) is t=1, and the time duration
is 5 time units, then the target time associated with the active
binding is t=6. For example, if the active binding is for AB*#,
then the target time is the time of event A plus the time
duration.
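The target time computation and the ordering of BINDING_LIST may be illustrated as follows. Keeping the list sorted with the standard-library bisect module, and representing each binding as a (target time, name) tuple, are choices made for this sketch; the description above only requires that the bindings be sorted in non-decreasing order of target time.

```python
import bisect

def target_time(trigger_time, duration):
    """Target time = time of the trigger event plus the query's duration."""
    return trigger_time + duration

DURATION = 5  # time units, as in the example above

binding_list = []
# Bindings are inserted out of order on purpose to show that the list stays sorted.
for name, trigger_time in [("b2", 2), ("b1", 1), ("b3", 4)]:
    bisect.insort(binding_list, (target_time(trigger_time, DURATION), name))

top = binding_list[0]  # binding with the least target time
```

With this ordering, checking only the top element is enough to know whether any binding's time period has elapsed.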
[0400] The top element (i.e., the binding with the least associated
target time) is then selected from BINDING_LIST and referred to as
binding B (step 1506). A check is then made to see if B is null
(step 1508). If B is determined to be null then processing
continues with step 1524 described below. If B is determined not to
be null, then a check is made to see if the input time (i.e., the
time associated with the input received in 1502) is greater than or
equal to the target time associated with binding B (step 1510). If
it is determined in 1510 that the input time is greater than or
equal to the target time associated with binding B, then it is
inferred that the time period has elapsed without the occurrence of
the event in the context of binding B, i.e., a non-occurrence is
detected in the context of binding B. In this situation, the input
is treated as if a heartbeat event had been received at the target
time associated with binding B, which moves the FSA to the final
state because the special symbol # is matched (step 1512). Processing is
then performed according to usual Class B processing corresponding
to the heartbeat at target time associated with binding B (step
1514). As part of the processing in 1514, an action may be
performed such as outputting an alert indicating the non-occurrence
of an event in the context of binding B. The timestamp associated
with the output event (indicating non-event) is the target time for
binding B. As part of the processing in 1514, the current binding B
is removed from BINDING_LIST since it was output.
[0401] The next unprocessed binding from BINDING_LIST is then
selected and referred to as binding B (step 1518). Processing then
continues with 1508.
[0402] Referring back to step 1510, if it is determined that the
input time is not greater than or equal to the target time
associated with binding B (i.e., the input time is less than the
target time), then a check is made to see if the input is a
heartbeat event (step 1520). If it is determined that the input
event received in 1502 is a heartbeat, then no further processing
is needed and processing ends. If the input event is not a
heartbeat event then the input event is treated as the data event
received at the input time and usual Class B processing is
performed to determine if any correlation variable predicates are
matched by the input event in the context of binding B (step 1522).
Processing then continues with step 1518.
[0403] Referring back to step 1508, if it is determined that B is
null, then a new binding is created if applicable, i.e., if the input
matches an alphabet out of the start state (step 1524). A target time
is associated with the newly created binding, where the target time
is the time of the first element of the new binding (i.e., the
current input time) plus the duration specified in the query. The new
binding is then appended to the end of BINDING_LIST. Processing then ends.
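The flow of FIG. 15 may be sketched for the baggage query as follows. This is a simplified illustration under several assumptions: events are dictionaries carrying only the cType and bagId fields, a heartbeat is represented by passing no event, the predicate for B is taken to be "anything except the scan of this binding's bag", the duration is 5 time units, and input times are non-decreasing so that appending a new binding keeps the list sorted. The names Binding, matches_a, matches_b, and process_input are hypothetical.

```python
from dataclasses import dataclass, field

DURATION = 5  # assumed value of the DURATION clause, in time units

@dataclass
class Binding:
    trigger: dict                               # the event matched by symbol A
    target_time: int                            # trigger time plus DURATION
    grown: list = field(default_factory=list)   # events matched by B

def matches_a(event):
    return event["cType"] == 0                  # a check-in starts a binding

def matches_b(binding, event):
    # B ignores everything except the scan of this binding's own bag.
    own_scan = event["cType"] == 1 and event["bagId"] == binding.trigger["bagId"]
    return not own_scan

def process_input(binding_list, time, event=None):
    """Handle one input; event=None stands for a heartbeat. Returns alerts."""
    alerts, kept = [], []
    for i, binding in enumerate(binding_list):  # sorted by target time
        if time >= binding.target_time:         # step 1510
            # Steps 1512/1514: implicit heartbeat at the target time matches '#'.
            alerts.append((binding.trigger["bagId"], binding.target_time))
        elif event is None:                     # step 1520: heartbeat, nothing more to do
            kept.extend(binding_list[i:])
            break
        elif matches_b(binding, event):         # step 1522: the binding grows
            binding.grown.append(event)
            kept.append(binding)
        # Otherwise the binding cannot grow and is dropped.
    if event is not None and matches_a(event):  # step 1524: new binding
        kept.append(Binding(event, time + DURATION))
    binding_list[:] = kept
    return alerts

stream = [
    (1, {"cType": 0, "bagId": "bag1"}),
    (2, {"cType": 0, "bagId": "bag2"}),
    (3, {"cType": 1, "bagId": "bag2"}),
    (4, {"cType": 0, "bagId": "bag3"}),
    (15, {"cType": 0, "bagId": "bag4"}),
]
bindings, alerts = [], []
for time, event in stream:
    alerts += process_input(bindings, time, event)
```

Each alert carries the target time of its binding rather than the time of the input that revealed the non-occurrence, mirroring the timestamp rule described for step 1514.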
[0404] For example, consider the AB* example of FIG. 13. Suppose A
happens at t=1, so that the target time is t=6. Now suppose a
non-heartbeat input event "e" is next received at t=10. In this case,
there is only one active binding B. The "implicit/imaginary" heartbeat event
at t=6 (with respect to binding B) moves the FSA to a final state
(since target time in this case is t=6). There is a round of Class
B processing at this time for processing the heartbeat event at
t=6. Further Class B processing is then performed to see if the
non-heartbeat (data) event with t=10 causes any predicates to be
matched.
[0405] Although steps 1512 and 1514 are shown as separate steps in
FIG. 15, the processing performed in these steps may be performed
together as part of the processing performed in step 1514.
[0406] The following depicts an example of a stream of events and
the corresponding FSA states per the processing depicted in FIG.
15. The example is based upon the query depicted in FIG. 13. The
FSA for the query is indicated by the following transition table,
where Q0 is the starting state and Q3 is the final state.
TABLE-US-00027 Transition Table
Source State | Symbol | Destination State
Q0 | A | Q1
Q1 | B | Q2
Q2 | B | Q2
Q2 | # | Q3
Q1 | # | Q3
[0407] As indicated in the transition table, when in the initial
state Q0, if the input event matches A, then the FSA transitions
from Q0 to Q1. In state Q1, when the input event matches B, then
the FSA moves from Q1 to Q2. In state Q2, when the input event
matches B, then the FSA remains in state Q2. In either state Q1 or
Q2, upon the detection of a non-occurrence of an event, symbol # is
matched, which forces the FSA to move to the final state Q3.
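The transition table above can be written down directly as a lookup structure. The dictionary encoding and the run helper below are assumptions made for illustration; the sequence passed to run consists of the symbols already matched by successive inputs, with `#` standing for the timer event.

```python
TRANSITIONS = {
    ("Q0", "A"): "Q1",
    ("Q1", "B"): "Q2",
    ("Q2", "B"): "Q2",
    ("Q2", "#"): "Q3",
    ("Q1", "#"): "Q3",
}
START_STATE = "Q0"
FINAL_STATE = "Q3"

def run(symbols, state=START_STATE):
    """Follow the table; return the state reached, or None if no transition applies."""
    for symbol in symbols:
        state = TRANSITIONS.get((state, symbol))
        if state is None:
            return None
    return state

after_timeout = run("ABB#")     # check-in, two irrelevant events, then the timer
immediate_timeout = run("A#")   # check-in followed directly by the timer
```

Both sequences end in Q3, reflecting that the timer event leads to the final state from either Q1 or Q2.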
[0408] For purposes of this example, let the schema of the input stream
[0409] baggageCheckIn_u_baggageTracking be
[0410] baggageCheckIn_u_baggageTracking(ctype, reservationLocator, bagId, flightNumber, flightSegment)
Further, assume that the events on the data stream are:
[0411] Event e1=(0, p1, bag1, FL100, A-B) at t=1
[0412] Event e2=(0, p2, bag2, FL100, A-B) at t=2
[0413] Event e3=(1, p2, bag2, FL100, A-B) at t=3
[0414] Event e4=(0, p3, bag3, FL100, A-B) at t=4
[0415] Event e5=(0, p4, bag4, FL100, A-B) at t=15
[0416] The following depicts the processing that happens at
runtime:
Before receiving any events there are no bindings.
[0417] (1) Event e1=(0, p1, bag1, FL100, A-B) at t=1
A new binding b1 is created where b1=(A=e1, target time=6),
state=Q1.
[0418] (2) Event e2=(0, p2, bag2, FL100, A-B) at t=2
[0419] For binding b1, the input time is less than the target time
(per 1510), so b1 becomes b1=(A=e1, B=e2, target time=6), state=Q2.
Also, since it is ALL MATCHES, a new binding b2 is created with
b2=(A=e2, target time=7), state=Q1.
[0420] (3) Event e3=(1, p2, bag2, FL100, A-B) at t=3
For binding b1, the input time is less than the target time (per
1510), so b1 becomes b1=(A=e1, B=e2, B=e3, target time=6),
state=Q2. For binding b2, the input time is less than the target
time (per 1510); however, the event does not match symbol B, hence
binding b2 is destroyed since it cannot be grown further. Note that
this corresponds to passenger p2's bag being scanned within 5 time
units of his check-in (p2's check-in event was e2).
[0421] (4) Event e4=(0, p3, bag3, FL100, A-B) at t=4
For binding b1, the input time is less than the target time (per
1510), so b1 becomes b1=(A=e1, B=e2, B=e3, B=e4, target time=6),
state=Q2. Also, a new binding b3 is created with b3=(A=e4, target
time=9), state=Q1.
[0422] (5) Event e5=(0, p4, bag4, FL100, A-B) at t=15
For binding b1, the input time is no longer less than the target
time (per 1510), so the event is first treated as a heartbeat at
t=6 (per 1512). Now, b1=(A=e1, B=e2, B=e3, B=e4, #=e5', target
time=6), state=Q3, where e5' is the heartbeat inferred from event
e5 at t=6. Since Q3 is the final state, the non-event corresponding
to the missed bag scan for passenger p1's bag bag1 is reported.
There is another binding, b3, and for binding b3 the input time is
also not less than the target time (per 1510), so the event is
first treated as a heartbeat at t=9 (per 1512). Now, b3=(A=e4,
#=e5'', target time=9), state=Q3, where e5'' is the heartbeat
inferred from event e5 at t=9. Since Q3 is the final state, the
non-event corresponding to the missed bag scan for passenger p3's
bag bag3 is reported. A new binding b4 is created with b4=(A=e5,
target time=20), state=Q1.
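The walk-through above can be reproduced with a short simulation. This is a sketch under stated assumptions rather than the processing of FIG. 15 itself: the query of FIG. 13 is not reproduced here, so the predicates for symbols A and B (matches_a, matches_b) and the 5-unit DURATION are inferred from the example, and all function and variable names are hypothetical.

```python
DURATION = 5  # assumed time limit, inferred from target time = event time + 5


def matches_a(event):
    # Assumed predicate for symbol A: ctype 0 is a check-in event.
    return event[0] == 0


def matches_b(binding, event):
    # Assumed predicate for symbol B: any event except the bag scan (ctype 1)
    # for the reservation bound to A in this binding.
    return not (event[0] == 1 and event[1] == binding["a"][1])


def detect_non_events(events):
    """events: list of (time, event tuple or None for a heartbeat)."""
    bindings, reports = [], []
    for t, event in events:
        # Bindings whose target time has been reached see an implicit
        # heartbeat at their target time, match '#', and are reported.
        for b in [b for b in bindings if b["target_time"] <= t]:
            reports.append((b["a"][2], b["target_time"]))
        bindings = [b for b in bindings if b["target_time"] > t]
        if event is None:
            continue  # a heartbeat carries no data to match against A or B
        # Remaining bindings either grow on B or are destroyed.
        bindings = [b for b in bindings if matches_b(b, event)]
        for b in bindings:
            b["bs"].append(event)
        # ALL MATCHES: every event matching A starts a new binding.
        if matches_a(event):
            bindings.append({"a": event, "bs": [], "target_time": t + DURATION})
    return reports, bindings


EVENTS = [
    (1, (0, "p1", "bag1", "FL100", "A-B")),
    (2, (0, "p2", "bag2", "FL100", "A-B")),
    (3, (1, "p2", "bag2", "FL100", "A-B")),
    (4, (0, "p3", "bag3", "FL100", "A-B")),
    (15, (0, "p4", "bag4", "FL100", "A-B")),
]
reports, active = detect_non_events(EVENTS)
```

With the five events of the example, reports holds one entry for bag1 with timestamp 6 and one for bag3 with timestamp 9, and the only binding still active is the one started by e5 with target time 20.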
[0423] In the manner described above, embodiments of the present
invention provide language extensions to a querying language such
as CQL that enable a user to formulate queries that cause pattern
matching module 110 to detect non-occurrences of events for a time
period after the occurrence of a trigger event. Various different
queries for various different use cases may be formulated using the
language extensions. During runtime processing, the pattern
specified in the query is modified by suffixing a special symbol
(e.g., `#`) to the pattern. The `#` symbol represents timer events.
An FSA is then built for the modified pattern and used during
runtime to guide detection of non-occurrences of events.
[0424] While the description above has described detecting the
non-occurrence of an event within a certain time period following
occurrence of another event, the teachings described above may also
be applied to detecting non-occurrence of an event within a time
period following the occurrence of a pattern of events. Further,
the teachings described above may also be applied to detecting
non-occurrence of a pattern of events within a time period
following the occurrence of a pattern of events or the occurrence
of an event.
[0425] In the context of non-event detection, the processing in the
case with the PARTITION BY clause is largely the same as in the
case where there is no PARTITION BY clause, except for the
following differences:
[0426] (1) The bindings for which input time>=target time are
processed first as per steps 1512 and 1514, independent of which
partition each binding is associated with.
[0427] (2) However, when the input time is less than the target
time, only those bindings that belong to the same partition as the
current input are considered. If the current event is a heartbeat,
only case (1) applies and case (2) does not, since a heartbeat is
not associated with any specific partition.
[0428] In one embodiment, to implement the above, the following two
sets of data structures are used:
[0429] (1) A global linked list containing bindings from all
partitions sorted in non-decreasing order of target time.
[0430] (2) A HashTable that is indexed on the partition key. The
value for each partition key is the list of all bindings associated
with that particular partition.
[0431] In the case where there is no PARTITION BY clause, only the
first data structure is present (this is referred to in FIG. 15 and
the description above as BINDING_LIST). The second is not required,
since without a PARTITION BY clause there is no partition key.
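The two data structures can be sketched as follows. This is an illustrative sketch only: the text describes a global linked list, whereas the sketch substitutes a sorted Python list maintained with the standard bisect module, and the class name BindingStore and its methods are hypothetical.

```python
import bisect
from collections import defaultdict


class BindingStore:
    """Global time-ordered list plus a per-partition index of bindings."""

    def __init__(self):
        # Entries are (target_time, seq, partition_key, binding). The unique
        # seq breaks ties so the binding objects are never compared.
        self._by_time = []
        self._by_partition = defaultdict(list)
        self._seq = 0

    def add(self, partition_key, target_time, binding):
        entry = (target_time, self._seq, partition_key, binding)
        self._seq += 1
        bisect.insort(self._by_time, entry)  # keeps non-decreasing target time
        self._by_partition[partition_key].append(entry)

    def pop_due(self, input_time):
        """Remove and return (target_time, binding) for every binding with
        target_time <= input_time, regardless of partition."""
        cut = bisect.bisect_right(self._by_time, (input_time, float("inf")))
        due, self._by_time = self._by_time[:cut], self._by_time[cut:]
        for entry in due:
            self._by_partition[entry[2]].remove(entry)
        return [(entry[0], entry[3]) for entry in due]

    def in_partition(self, partition_key):
        """Bindings of one partition, for inputs earlier than their target time."""
        return [entry[3] for entry in self._by_partition.get(partition_key, [])]
```

In this sketch, pop_due serves case (1) above, where expired bindings are handled across all partitions, and in_partition serves case (2), where only bindings in the partition of the current input are considered.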
Detection of Recurring Non-Occurrences of an Event
[0432] As described above, there is a need in several applications
to detect the non-occurrence of an event within a time period
following the occurrence of another event. In yet other
applications, there is a need to detect recurring instances of such
a non-occurrence. These situations may be referred to as "recurring
non-event" detection use cases. In these situations, the
non-occurrence of a specific event within a time period is to be
detected where the time period gets incremented until the
occurrence of another event or some stoppage condition is met. It
should be noted that the detection of such recurring
non-occurrences is not a simple application of multiple detections
of non-occurrences described above.
[0433] For example, consider a flight monitoring application. The
requirement in such an application may be to prompt EVERY 5 minutes
after a flight's scheduled departure time has passed until the time
the flight actually takes off. Such a prompting system may be used
by the airport authorities to monitor flight delays and to initiate
actions for addressing the causes for the delay.
[0434] As with detection of a single non-occurrence (described
above), in the detection of recurring non-occurrences, the
occurrence of a trigger event starts the first time period during
which the non-occurrence of a specific event is to be detected. In
the above example, the trigger event is the changing of the flight
status to FLIGHT DEPARTURE. The trigger event initiates the first
time period of a series of recurring time periods in sequence. The
non-occurrence of the FLIGHT TAKE OFF event is detected for each
time period in the recurring series of time periods. If the
non-occurrence of the specified event is detected during a time
period then an action, such as an alert, may be taken at the end of
that time period (e.g., a prompt every 5 minutes in the above
example). The recurring time periods continue until a stoppage
condition is met. The stoppage condition may be the occurrence of
the specific event, the occurrence of some other event, the
recurrence having occurred a certain number of times, etc. In this
manner, the recurring non-occurrence of a specified event is
detected.
[0435] The pattern matching techniques discussed above may be used
to detect such recurring non-occurrences. For example, pattern
matching module 110 may be configured to detect such recurring
non-occurrences. In one embodiment, language extensions are
provided to a language (e.g., to CQL) that enable a user to
formulate queries for detection of recurrences of non-occurrences
of events using that language. One language extension may enable a
user to specify in the query that the query is for detection of
non-occurrences of events. Another language extension may enable
the user to specify a length of the time period and the frequency
of recurrence of the time period during which the non-occurrences
are to be detected.
[0436] FIG. 16 depicts an example of a query 1600 that may be used
to detect recurring non-occurrences of an event according to an
embodiment of the present invention. Query 1600 depicted in FIG. 16
is merely an example of a query that may be used and is not
intended to limit the scope of the invention as recited in the
claims. Other queries may be used in alternative embodiments.
[0437] Query 1600 models the delayed flight example discussed
above. As depicted in FIG. 16, query 1600 comprises a new language
extension "INCLUDE TIMER EVENTS" clause 1604 that indicates that
query 1600 is a special query for detecting non-occurrence of an
event. Pattern 1602 may be used to specify the trigger event.
Another new language extension "DURATION MULTIPLES OF" clause 1606
enables a user to specify a length of a time period and the
recurring frequency of the time period after the occurrence of a
trigger event within which non-occurrence of a specific event is to
be detected. For each time period in the recurring sequence of time
periods, one or more actions may be performed upon the detection of
a non-occurrence of the specific event within that time period.
Various different types of actions may be performed such as raising
an alert, preventative actions, and others.
[0438] In the example depicted in FIG. 16,
"Flight_departure_takeoff_stream" is an input stream comprising
events including "flight departure announced" and "flight take
off" events. In query 1600, the symbol A matches an event where the
departure for the flight has been announced. The symbol B* matches
all events other than the flight takeoff event for the flight in
question. Now, pattern 1602 will match EVERY 5 minutes following
the event of the flight departure announcement until the flight
takeoff event arrives. In this example, the flight takeoff event is
the stoppage condition. A pattern match (i.e., detection of
non-occurrence of an event) in this context may take an action such
as alerting the concerned authority to take necessary action
arising due to delay in the flight takeoff. In the above, example,
the stoppage condition is the occurrence of the flight takeoff
event, which is also the event whose non-occurrence is being
detected.
[0439] As described above for detection of non-occurrence of an
event, the passage of time may be determined based upon times
associated with events received via the data stream and/or based
upon a heartbeat. This is also the case for detecting recurring
non-occurrences.
[0440] At compile time, an FSA is constructed for query 1600 and
the constructed FSA is then used to guide the pattern matching
during runtime processing. The FSA is constructed as previously
described for Class A and Class B patterns with some additional
processing as depicted in FIG. 14 and described above. As part of
the processing, upon determining that the query is for detection of
non-occurrences of an event (e.g., due to the presence of INCLUDE
TIMER EVENTS clause 1604 in the query), a special symbol (e.g.,
`#`) is introduced into the pattern specified by the query. For
example, the original "AB*" pattern in FIG. 16 is modified to
"AB*#". The
modified pattern is then treated as a Class B pattern and an FSA is
constructed for the modified pattern using the techniques described
above for Class B patterns. The constructed FSA is then used during
runtime to guide the detection of recurring non-occurrences after
the occurrence of a trigger event and until the stoppage condition
is satisfied.
[0441] FIG. 17 depicts a simplified flowchart 1700 depicting
processing for detecting recurring non-occurrences according to an
embodiment of the present invention. The processing depicted in
FIG. 17 may be performed by software (e.g., code, program,
instructions) executed by a processor, in hardware, or combinations
thereof. The software may be stored on a computer-readable storage
medium. It should be noted that there may be several active
bindings present when an input is received. The processing in FIG.
17 and described below would be applied to these bindings.
[0442] Flowchart 1700 depicted in FIG. 17 is quite similar to
flowchart 1500 depicted in FIG. 15 with step 1702 replacing step
1514 in FIG. 15. The other steps are as described above with
respect to FIG. 15. In step 1702, processing is performed according
to usual Class B processing corresponding to the heartbeat at
target time associated with binding B. As part of the processing in
1702, an action may be performed such as outputting an alert
indicating the non-occurrence of an event in the context of binding
B. The timestamp associated with the output event (indicating
non-event) is the target time for binding B. As part of the
processing in 1702, the current binding B is removed from
BINDING_LIST since it was output. Further, as part of the
processing in 1702, a new binding is created based upon the binding
that was output. An incremented target time is associated with the
new binding, where the target time is the previous target time
incremented by the period specified in the query. The new binding
with the associated new target time is then inserted at the end of
BINDING_LIST. Processing then continues with step 1518.
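The processing attributed to step 1702 can be sketched as a single function. This is a minimal sketch under assumptions, not the claimed implementation: bindings are modeled as plain dictionaries with hypothetical keys "bound" and "target_time", and the function name fire_and_rearm is chosen for this illustration.

```python
def fire_and_rearm(binding_list, index, period):
    """Output the binding at `index`, remove it from the list, and append a
    copy whose target time is advanced by `period`. Returns the alert."""
    binding = binding_list.pop(index)  # the output binding leaves the list
    alert = {
        "bound": dict(binding["bound"]),
        "timestamp": binding["target_time"],  # output time is the target time
    }
    renewed = {
        "bound": dict(binding["bound"]),
        "target_time": binding["target_time"] + period,
    }
    binding_list.append(renewed)  # inserted at the end of the binding list
    return alert
```

For instance, with a list holding bindings with target times 6 and 8 and a period of 5, firing the first binding yields an alert stamped 6 and leaves the list holding target times 8 and 11, in that order.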
[0443] The following example is based upon the query depicted in
FIG. 16. The FSA for the query is indicated by the transition table
shown below, where Q0 is the starting state and Q3 is the final
state.
TABLE-US-00028 Transition Table
Source State | Symbol | Destination State
Q0 | A | Q1
Q1 | B | Q2
Q2 | B | Q2
Q2 | # | Q3
Q1 | # | Q3
[0444] For purposes of this example, let the schema of the input
stream
[0445] Flight_departure_takeoff_stream
be
[0446] Flight_departure_takeoff_stream (ctype, flightNumber)
[0447] Ctype=0 indicates a departure announcement event
[0448] Ctype=1 indicates a flight take off event
[0449] Further, assume that the events received on the data stream
are:
[0450] Event e1=(0, FL100) at t=1
[0451] Event e2=(0, FL101) at t=3
[0452] Event e3=heartbeat at t=6
[0453] Event e4=(1, FL101) at t=7
[0454] Event e5=(0, FL102) at t=13
[0455] Event e6=(1, FL100) at t=15
The following depicts the processing that happens at runtime for
the events received as shown above:
[0456] Before receiving any events there are no bindings.
[0457] (1) Event e1=(0, FL100) at t=1
A new binding b1 is created where b1=(A=e1, target time=6),
state=Q1.
[0458] (2) Event e2=(0, FL101) at t=3
For binding b1, the input time (3) is less than the target time
(6), so b1 becomes
[0459] b1=(A=e1, B=e2, target time=6), state=Q2.
A new binding b2 is created with b2=(A=e2, target time=8),
state=Q1.
[0460] (3) Event e3=heartbeat at t=6
For binding b1, input time = target time, hence b1 becomes
b1=(A=e1, B=e2, #=e3, target time=6), state=Q3. Since Q3 is final,
the non-event corresponding to flight takeoff delayed is reported.
Further, new binding b3 is created based on b1 and its target time
is incremented by 5 units. So b3=(A=e1, B=e2, target time=11),
state=Q2. For binding b2, input time < target time, so there is no
further processing since the input is a heartbeat.
[0461] (4) Event e4=(1, FL101) at t=7
For binding b2, input time (7) < target time (8), but the binding
cannot be grown since the event does not match symbol B (for this
binding). Hence the binding is deleted. This corresponds to takeoff
of the flight within the time limit following the departure
announcement, i.e., there is no delay in flight takeoff. For
binding b3, input time (7) < target time (11), so b3 becomes
b3=(A=e1, B=e2, B=e4, target time=11), state=Q2.
[0462] (5) Event e5=(0, FL102) at t=13
For binding b3, input time (13) > target time (11). The event is
first treated as a heartbeat event e5' at t=11. Hence b3 becomes
b3=(A=e1, B=e2, B=e4, #=e5', target time=11), state=Q3. Since Q3 is
final, the non-event corresponding to flight takeoff delayed is
reported (for a second time for flight FL100) at t=11. Further, new
binding b4 is created based on b3 and its target time is
incremented by 5 units. So b4=(A=e1, B=e2, B=e4, target time=16),
state=Q2. b4 is added to the end of BINDING_LIST. The input is now
treated as data event e5 at t=13, and binding b4 is yet to be
processed. For binding b4, input time (13) < target time (16), so
b4 becomes b4=(A=e1, B=e2, B=e4, B=e5, target time=16), state=Q2. A
new binding b5 is created with b5=(A=e5, target time=18),
state=Q1.
[0463] (6) Event e6=(1, FL100) at t=15
For binding b4, input time (15) < target time (16), but the binding
cannot be grown since the event does not match symbol B (for this
binding). Hence the binding is deleted. This corresponds to takeoff
of the flight, thus there is no further monitoring for takeoff
delays for this flight. For binding b5, input time (15) < target
time (18), hence b5 becomes b5=(A=e5, B=e6, target time=18),
state=Q2.
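The six steps above can be reproduced with a short simulation. It is a sketch under stated assumptions rather than the processing of FIG. 17 itself: the query of FIG. 16 is not reproduced here, so the predicates for A and B and the 5-unit PERIOD are inferred from the example, and every name in the code is hypothetical.

```python
PERIOD = 5  # assumed DURATION MULTIPLES OF value, in the example's time units


def matches_a(event):
    # Assumed predicate for symbol A: ctype 0 is a departure announcement.
    return event[0] == 0


def matches_b(binding, event):
    # Assumed predicate for symbol B: any event except the takeoff (ctype 1)
    # of the flight bound to A in this binding.
    return not (event[0] == 1 and event[1] == binding["a"][1])


def detect_recurring_non_events(events):
    """events: list of (time, event tuple or None for a heartbeat)."""
    bindings, alerts = [], []
    for t, event in events:
        # Fire each binding whose target time has been reached, then re-arm
        # it with target time + PERIOD at the end of the list.
        i = 0
        while i < len(bindings):
            b = bindings[i]
            if b["target_time"] <= t:
                bindings.pop(i)
                alerts.append((b["a"][1], b["target_time"]))
                bindings.append({"a": b["a"], "bs": list(b["bs"]),
                                 "target_time": b["target_time"] + PERIOD})
            else:
                i += 1
        if event is None:
            continue  # heartbeat: nothing to match against A or B
        # Remaining bindings grow on B or are deleted (stoppage condition).
        bindings = [b for b in bindings if matches_b(b, event)]
        for b in bindings:
            b["bs"].append(event)
        if matches_a(event):
            bindings.append({"a": event, "bs": [], "target_time": t + PERIOD})
    return alerts, bindings


EVENTS = [
    (1, (0, "FL100")),
    (3, (0, "FL101")),
    (6, None),
    (7, (1, "FL101")),
    (13, (0, "FL102")),
    (15, (1, "FL100")),
]
alerts, active = detect_recurring_non_events(EVENTS)
```

With the six events of the example, alerts holds two delay reports for FL100, stamped 6 and 11, and the only binding left active is the one started by e5 for FL102 with target time 18.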
[0464] In the manner described above, embodiments of the present
invention provide language extensions (e.g., ANSI extensions) to a
querying language such as CQL that enable a user to formulate
queries that cause pattern matching module 110 to detect
non-occurrences of a specific event for recurring time periods
following the occurrence of a trigger event until a stoppage
condition is met.
[0465] While the description above has described detecting the
non-occurrence of an event over recurring time periods following
occurrence of another event, the teachings described above may also
be applied to detecting non-occurrence of an event over recurring
time periods following the occurrence of a pattern of events.
Further, the teachings described above may also be applied to
detecting non-occurrence of a pattern of events over recurring time
periods following the occurrence of a pattern of events or the
occurrence of an event.
[0466] FIG. 18 is a simplified block diagram illustrating
components of a system environment 1800 that may be used in
accordance with an embodiment of the present invention. As shown,
system environment 1800 includes one or more client computing
devices 1802, 1804, 1806, 1808, which are configured to operate a
client application such as a web browser, proprietary client (e.g.,
Oracle Forms), or the like. In various embodiments, client
computing devices 1802, 1804, 1806, and 1808 may interact with a
server 1812.
[0467] Client computing devices 1802, 1804, 1806, 1808 may be
general purpose personal computers (including, by way of example,
personal computers and/or laptop computers running various versions
of Microsoft Windows and/or Apple Macintosh operating systems),
cell phones or PDAs (running software such as Microsoft Windows
Mobile and being Internet, e-mail, SMS, Blackberry, or other
communication protocol enabled), and/or workstation computers
running any of a variety of commercially-available UNIX or
UNIX-like operating systems (including without limitation the
variety of GNU/Linux operating systems). Alternatively, client
computing devices 1802, 1804, 1806, and 1808 may be any other
electronic device, such as a thin-client computer, Internet-enabled
gaming system, and/or personal messaging device, capable of
communicating over a network (e.g., network 1810 described below).
Although exemplary system environment 1800 is shown with four
client computing devices, any number of client computing devices
may be supported. Other devices such as devices with sensors, etc.
may interact with server 1812.
[0468] System environment 1800 may include a network 1810. Network
1810 may be any type of network familiar to those skilled in the
art that can support data communications using any of a variety of
commercially-available protocols, including without limitation
TCP/IP, SNA, IPX, AppleTalk, and the like. Merely by way of
example, network 1810 can be a local area network (LAN), such as an
Ethernet network, a Token-Ring network and/or the like; a wide-area
network; a virtual network, including without limitation a virtual
private network (VPN); the Internet; an intranet; an extranet; a
public switched telephone network (PSTN); an infra-red network; a
wireless network (e.g., a network operating under any of the IEEE
802.11 suite of protocols, the Bluetooth protocol known in the art,
and/or any other wireless protocol); and/or any combination of
these and/or other networks.
[0469] System environment 1800 also includes one or more server
computers 1812 which may be general purpose computers, specialized
server computers (including, by way of example, PC servers, UNIX
servers, mid-range servers, mainframe computers, rack-mounted
servers, etc.), server farms, server clusters, or any other
appropriate arrangement and/or combination. In various embodiments,
server 1812 may be adapted to run one or more services or software
applications described in the foregoing disclosure. For example,
server 1812 may correspond to an events processing server as
depicted in FIG. 1 that includes a pattern matching module as
depicted in FIG. 1.
[0470] Server 1812 may run an operating system including any of
those discussed above, as well as any commercially available server
operating system. Server 1812 may also run any of a variety of
additional server applications and/or mid-tier applications,
including HTTP servers, FTP servers, CGI servers, Java servers,
database servers, and the like. Exemplary database servers include
without limitation those commercially available from Oracle,
Microsoft, Sybase, IBM and the like.
[0471] System environment 1800 may also include one or more
databases 1814, 1816. Databases 1814, 1816 may reside in a variety
of locations. By way of example, one or more of databases 1814,
1816 may reside on a storage medium local to (and/or resident in)
server 1812. Alternatively, databases 1814, 1816 may be remote from
server 1812, and in communication with server 1812 via a
network-based or dedicated connection. In one set of embodiments,
databases 1814, 1816 may reside in a storage-area network (SAN)
familiar to those skilled in the art. Similarly, any necessary
files for performing the functions attributed to server 1812 may be
stored locally on server 1812 and/or remotely, as appropriate. In
one set of embodiments, databases 1814, 1816 may include relational
databases, such as Oracle 10g, that are adapted to store, update,
and retrieve data in response to SQL-formatted commands.
[0472] FIG. 19 is a simplified block diagram of a computer system
1900 that may be used in accordance with embodiments of the present
invention. For example, server 102 may be implemented using a system
such as system 1900. Computer system 1900 is shown comprising
hardware elements that may be electrically coupled via a bus 1924.
The hardware elements may include one or more central processing
units (CPUs) 1902, one or more input devices 1904 (e.g., a mouse, a
keyboard, etc.), and one or more output devices 1906 (e.g., a
display device, a printer, etc.). Computer system 1900 may also
include one or more storage devices 1908. By way of example, the
storage device(s) 1908 may include devices such as disk drives,
optical storage devices, and solid-state storage devices such as a
random access memory (RAM) and/or a read-only memory (ROM), which
can be programmable, flash-updateable and/or the like.
[0473] Computer system 1900 may additionally include a
computer-readable storage media reader 1912, a communications
subsystem 1914 (e.g., a modem, a network card (wireless or wired),
an infra-red communication device, etc.), and working memory 1918,
which may include RAM and ROM devices as described above. In some
embodiments, computer system 1900 may also include a processing
acceleration unit 1916, which can include a digital signal
processor (DSP), a special-purpose processor, and/or the like.
[0474] Computer-readable storage media reader 1912 can further be
connected to a computer-readable storage medium 1910, together
(and, optionally, in combination with storage device(s) 1908 )
comprehensively representing remote, local, fixed, and/or removable
storage devices plus storage media for temporarily and/or more
permanently containing computer-readable information.
Communications subsystem 1914 may permit data to be exchanged with
network 1810 and/or any other computer described above with respect
to system environment 1800.
[0475] Computer system 1900 may also comprise software elements,
shown as being currently located within working memory 1918,
including an operating system 1920 and/or other code 1922, such as
an application program (which may be a client application, Web
browser, mid-tier application, RDBMS, etc.). In an exemplary
embodiment, working memory 1918 may include executable code and
associated data structures (such as caches) used for the pattern
matching method described above. It should be appreciated that
alternative embodiments of computer system 1900 may have numerous
variations from that described above. For example, customized
hardware might also be used and/or particular elements might be
implemented in hardware, software (including portable software,
such as applets), or both. Further, connection to other computing
devices such as network input/output devices may be employed.
[0476] Storage media and computer readable media for containing
code, or portions of code, can include any appropriate media known
or used in the art, including storage media and communication
media, such as but not limited to volatile and non-volatile,
removable and non-removable media implemented in any method or
technology for storage and/or transmission of information such as
computer readable instructions, data structures, program modules,
or other data, including RAM, ROM, EEPROM, flash memory or other
memory technology, CD-ROM, digital versatile disk (DVD) or other
optical storage, magnetic cassettes, magnetic tape, magnetic disk
storage or other magnetic storage devices, data signals, data
transmissions, or any other medium which can be used to store or
transmit the desired information and which can be accessed by a
computer.
[0477] Although specific embodiments of the invention have been
described, various modifications, alterations, alternative
constructions, and equivalents are also encompassed within the
scope of the invention. Embodiments of the present invention are
not restricted to operation within certain specific data processing
environments, but are free to operate within a plurality of data
processing environments. Additionally, although embodiments of the
present invention have been described using a particular series of
transactions and steps, it should be apparent to those skilled in
the art that the scope of the present invention is not limited to
the described series of transactions and steps.
[0478] Further, while embodiments of the present invention have
been described using a particular combination of hardware and
software, it should be recognized that other combinations of
hardware and software are also within the scope of the present
invention. Embodiments of the present invention may be implemented
only in hardware, or only in software, or using combinations
thereof.
[0479] The specification and drawings are, accordingly, to be
regarded in an illustrative rather than a restrictive sense. It
will, however, be evident that additions, subtractions, deletions,
and other modifications and changes may be made thereunto without
departing from the broader spirit and scope as set forth in the
claims.
* * * * *