U.S. patent application number 14/050808 was filed with the patent office on 2013-10-10 and published on 2014-04-17 for data analytic engine towards the self-management of complex physical systems.
This patent application is currently assigned to NEC Laboratories America, Inc. The applicant listed for this patent is NEC Laboratories America, Inc. Invention is credited to Haifeng Chen, Min Ding, Guofei Jiang, Bin Liu, Abhishek Sharma, Kenji Yoshihira.
Application Number | 20140108324 14/050808 |
Document ID | / |
Family ID | 50476329 |
Filed Date | 2013-10-10 |
United States Patent Application | 20140108324 |
Kind Code | A1 |
Chen; Haifeng ; et al. | April 17, 2014 |
DATA ANALYTIC ENGINE TOWARDS THE SELF-MANAGEMENT OF COMPLEX
PHYSICAL SYSTEMS
Abstract
Systems and methods for modeling system dynamics, including
extracting features representative of a temporal evolution of a
dynamical system, further including deriving one or more vector
trajectories by performing sliding window segmentation of one or
more time series; applying a linear test to determine whether the
one or more vector trajectories are linear or nonlinear; and
performing linear or nonlinear subspace decomposition on the vector
trajectory based on the linear test. The system and method may
generate a system evolution model from the extracted features of
the dynamical system and determine a fitness score of the system
evolution model.
Inventors: | Chen; Haifeng; (Old Bridge, NJ) ; Ding; Min; (Chalfont, PA) ; Liu; Bin; (Piscataway, NJ) ; Sharma; Abhishek; (New Brunswick, NJ) ; Yoshihira; Kenji; (Princeton Junction, NJ) ; Jiang; Guofei; (Princeton, NJ) |
Applicant: |
Name | City | State | Country | Type |
NEC Laboratories America, Inc. | Princeton | NJ | US | |
Assignee: | NEC Laboratories America, Inc. Princeton NJ |
Family ID: |
50476329 |
Appl. No.: |
14/050808 |
Filed: |
October 10, 2013 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
61713006 | Oct 12, 2012 | |
61719475 | Oct 28, 2012 | |
Current U.S. Class: | 706/52 |
Current CPC Class: | G06N 20/00 20190101; G06F 17/18 20130101; G06F 11/3072 20130101; G06F 11/3055 20130101; G06F 17/16 20130101; G06N 5/048 20130101 |
Class at Publication: | 706/52 |
International Class: | G06N 5/04 20060101 G06N005/04 |
Claims
1. A method for modeling system dynamics, comprising: extracting
features representative of a temporal evolution of a dynamical
system, including; deriving one or more vector trajectories by
performing sliding window segmentation of one or more time series;
applying a linear test to determine whether the one or more vector
trajectories are linear or nonlinear; and performing linear or
nonlinear subspace decomposition on the vector trajectory based on
the linear test; and generating, using a processor, a system
evolution model from the extracted features of the dynamical
system; and determining a fitness score of the system evolution
model.
2. The method as recited in claim 1, wherein the dynamical system
is a deterministic system.
3. The method as recited in claim 1, wherein the system evolution
model is constructed using a Vector-Autoregressive (VAR)
technique.
4. The method as recited in claim 1, wherein the system evolution
model is constructed using a density based approach for a time
series which lack smoothness in the one or more vector
trajectories.
5. The method as recited in claim 1, wherein the system evolution
model monitors a current system status and compares the current
system status with the system evolution model to detect
anomalies.
6. The method as recited in claim 1, wherein a window size is
increased during the sliding window segmentation to improve
detection precision.
7. A system for modeling system dynamics, comprising: a processor
configured to extract features representative of a temporal
evolution of a dynamical system, including; a sliding window
segmentation module configured to derive one or more vector
trajectories of one or more time series; a linear testing module
configured to determine whether the one or more vector trajectories
are linear or nonlinear; and a subspace decomposition module
configured to perform linear or nonlinear subspace decomposition on
the vector trajectory based on the linear test; and a modeling
module configured to generate a system evolution model from the
extracted features of the dynamical system; and an analytic engine
configured to determine a fitness score of the system evolution
model.
8. The system as recited in claim 7, wherein the dynamical system
is a deterministic system.
9. The system as recited in claim 7, wherein the system evolution
model is constructed using a Vector-Autoregressive (VAR)
technique.
10. The system as recited in claim 7, wherein the system evolution
model is constructed using a density based approach for a time
series which lack smoothness in the one or more vector
trajectories.
11. The system as recited in claim 7, wherein the system evolution
model monitors a current system status and compares the current
system status with the system evolution model to detect
anomalies.
12. The system as recited in claim 7, wherein a window size is
increased during the sliding window segmentation to improve
detection precision.
Description
RELATED APPLICATION INFORMATION
[0001] This application claims priority to provisional application
Ser. No. 61/713,006, filed on Oct. 12, 2012, and provisional
application Ser. No. 61/719,425, filed on Oct. 28, 2012, each
incorporated herein by reference.
BACKGROUND
[0002] 1. Technical Field
[0003] The present invention relates to complex systems, and, in
particular, to modeling, analyzing, and managing time series
dynamics in complex physical systems.
[0004] 2. Description of the Related Art
[0005] Recently, cyber-physical systems (CPS) have risen to
prominence with examples including automobile and intelligent
transportation systems, medical devices and healthcare systems,
smart grid, industrial monitoring, etc. Since CPS integrates
computing, communication, and storage capabilities in the
monitoring entities, a significant amount of measurement data can
now be collected, which contains useful knowledge to provide an
opportunity for system self-management. Modern industries, such as
power plant systems, chemical systems and a variety of
manufacturers, deploy massive sensors to monitor the status of
physical systems. As a result, a large amount of time series
observations are collected from the massive sensors and an
effective method is needed to model the system dynamics from the
data. A good modeling of the underlying dynamics is important for a
wide range of applications, such as enhancing the understanding of
the underlying dynamics, improving predictive capabilities,
monitoring the status of the system, and facilitating anomaly
detection and failure diagnosis.
[0006] There has been research analyzing measurement data from
physical systems to improve the self-manageability of systems, and
this research may be classified into two categories: domain
specific techniques and domain independent solutions. The domain
specific techniques mainly rely on system experts to define rules
or policies to extract related knowledge from the data, thus a full
understanding of the properties of the systems is necessary. Such
methods require extensive human involvement and are domain
specific. Also, it may be difficult to obtain complete domain
knowledge as system scale and complexity increase. The
domain independent solutions attempt to extract knowledge from data
by general analytic tools rather than system experts (e.g.,
similarity based approach, support vector machine (SVM) based
learning method, etc.). However, these solutions need a large amount
of historical data and are computationally expensive.
[0007] Since CPS integrates computing, communication, and storage
capabilities in the monitoring entities, a significant amount of
measurement data can now be collected, which contains useful
knowledge to provide an opportunity for system self-management.
However, a number of challenges exist to extract knowledge from
measurement data. For example, the data usually have thousands or
even millions of attributes with each exhibiting different
behaviors. Some attributes are also correlated with each other due
to the dependencies between system components. It is necessary to
have advanced analytic techniques for system measurements to
benefit management tasks such as anomaly detection, capacity
planning, and so on.
SUMMARY
[0008] A method for modeling system dynamics, comprising extracting
features representative of a temporal evolution of a dynamical
system, further comprising; deriving one or more vector
trajectories by performing sliding window segmentation of one or
more time series, applying a linear test to determine whether the
one or more vector trajectories are linear or nonlinear, and
performing linear or nonlinear subspace decomposition on the vector
trajectory based on the linear test; and generating, using a
processor, a system evolution model from the extracted features of
the dynamical system; and determining a fitness score of the system
evolution model.
[0009] A system for modeling system dynamics, comprising a
processor configured to extract features representative of a
temporal evolution of a dynamical system, further comprising; a
sliding window segmentation module configured to derive one or more
vector trajectories of one or more time series, a linear testing
module configured to determine whether the one or more vector
trajectories are linear or nonlinear; and a subspace decomposition
module configured to perform linear or nonlinear subspace
decomposition on the vector trajectory based on the linear test;
and a modeling module configured to generate a system evolution
model from the extracted features of the dynamical system; and an
analytic engine configured to determine a fitness score of the
system evolution model.
[0010] These and other features and advantages will become apparent
from the following detailed description of illustrative embodiments
thereof, which is to be read in connection with the accompanying
drawings.
BRIEF DESCRIPTION OF DRAWINGS
[0011] The disclosure will provide details in the following
description of preferred embodiments with reference to the
following figures wherein:
[0012] FIG. 1 is a block/flow diagram of a computing system for
modeling time series dynamics in accordance with the present
principles;
[0013] FIG. 2 is a block/flow diagram of a method for modeling time
series dynamics in accordance with the present principles;
[0014] FIG. 3 is a block/flow diagram of a method for modeling time
series dynamics with integrated outputs in accordance with the
present principles;
[0015] FIG. 4 is a block/flow diagram of a method for generating
modeling profiles in accordance with the present principles;
[0016] FIG. 5 is a block/flow diagram of a method for data property
discovery for time series dynamics in accordance with the present
principles; and
[0017] FIG. 6 is a block/flow diagram of a system for modeling time
series dynamics with integrated outputs in accordance with the
present principles.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
[0018] The process statuses of complex physical systems may be
monitored by a large number of sensors. The processes in a complex
physical system usually develop some temporal patterns which are
inherently governed by the underlying dynamical system and can be
captured by the analysis of the time series observations from the
sensors. Indeed, the modeling of the system dynamics for temporal
patterns can be beneficial to a wide range of applications.
However, given the noisy, nonlinear and heterogeneous time series
data, it is a critical challenge to model the system dynamics
without a priori knowledge of the system. In one embodiment
according to the present principles, a method to model time series
dynamics without much a priori knowledge about the system is
illustratively provided.
[0019] The above-mentioned method may be based on the fact that an
underlying dynamical system of a complex physical system is
deterministic and that the state trajectory demonstrates a
self-similar behavior governed by a certain smooth evolution
function. Specifically, sliding window segmentation may be
performed on the time series data to derive a vector trajectory
that includes all the dynamics information. Then, features of
system dynamics may be extracted by linear or nonlinear subspace
decomposition, which may be decided by an intrinsic dimension based
criterion. Next, an approach with temporal evolution awareness may
be employed to model the system dynamics.
[0020] There are several challenges in modeling the system dynamics.
First, a general method is needed that can model the time series
dynamics given only the time series observations from such physical
systems, without a priori knowledge of the underlying system model
itself. Second, the
time series from physical systems may exhibit both linear and
nonlinear behaviors. Modeling linear time series dynamics has been
researched, but it is still a challenge to model nonlinear time
series dynamics from the observation sequences. Finally, because
the time series data from physical systems are heterogeneous, the
time series are dramatically different in terms of shape, trend,
seasonal variation and periodicity. Therefore, a general a priori
knowledge free modeling method is needed to model system dynamics
from the noisy, nonlinear and heterogeneous time series.
[0021] In one embodiment, the underlying dynamical system of a
physical system may be deterministic, and this deterministic
characteristic may be exploited according to the present
principles. For a deterministic system, the underlying system
dynamical states may lie on a manifold in the state space endowed
with a family of smooth evolution functions. Therefore there may be
two important steps involved in the modeling of the nonlinear time
series dynamics. First, good features that represent the evolution
of the dynamical system may be extracted. Second, a modeling method
that considers the temporal evolution of the features of dynamics
is employed according to the present principles. Features of system
dynamics that capture the temporal dynamics of a time series may be
extracted through sliding window segmentation. A linear or
nonlinear subspace method may be employed to extract features of
dynamics based on intrinsic dimension criterion.
[0022] In one embodiment, a general data analytic engine for
physical systems may be employed to reduce the burden from
operators in management tasks according to the present principles.
Measurements may be collected during normal system operations, and
the engine may treat the data from each attribute as a time series,
and may discover typical properties or patterns from and across
time series. For example, some time series may demonstrate periodic
behaviors due to the repeated physical processes that generate the
data. Some time series may be highly correlated with some others in
the measurements. Given a large number of time series, the engine
may discover related properties for each time series and profile
them based on the learned patterns. The learned data profiles may
bridge the gap between raw data and system management policies, and
hence may help operators to better understand the system and
improve performance. The learned data profiles may be employed to
detect anomalies in the system. For example, given the online
measurements from system operations, the newly observed data may be
evaluated with respect to its related profile, and if there are
significant deviations, some alerts may be issued to notify
operators to check the system operation.
[0023] In one embodiment, the unique features of data
characteristics in physical systems may provide a guideline for
developing the analytic engine. Considering that there may be a
high diversity of data behaviors, the engine may include an
ensemble of analysis models, each of which may explore a specific
property from the data (e.g., constant model, periodic model,
cumulative sum (CUSUM) model, AutoRegressive model with eXternal
input (ARX), etc.). The properties considered in those models may
include different compositions of attributes, including those from
a single attribute, every pair of attributes, a group of
attributes, or the whole data set. Each model in the engine may
discover the group of time series that follows the property it is
associated with. As the models in the engine may cover all the
measurement data, each time series may ultimately find at least one
model that can capture the behavior of its evolutions.
[0024] As the output, the engine may provide a profile for each
attribute in the monitoring data. The profile may describe the
expected behavior of a time series according to the learned
properties in models. For example, the mean and variance statistics
may be employed to profile a time series whose values are close to
a constant. The shape and length of the period may be employed to
profile periodic time series. Based on those profiles, the status
of future time series may be checked, and hence detect anomalies of
system operation. For example, given a newly observed value of each
attribute, it may be compared with the expected value computed from
the profile, and an alert may be generated if their deviation is
large (e.g., over a threshold). By integrating and analyzing alerts
from all the time series, the engine may provide a clear indication
about the global status of system operations.
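The profile-and-compare step described above can be sketched for the simplest case, a time series whose values stay close to a constant. This is only an illustrative sketch: the function names `constant_profile` and `is_anomalous`, and the rule of alerting when a value deviates from the profiled mean by more than a chosen number of standard deviations, are assumptions made here and are not specified in the application.

```python
from statistics import mean, stdev

def constant_profile(history):
    """Profile a near-constant time series by its mean and standard deviation."""
    return {"mean": mean(history), "std": stdev(history)}

def is_anomalous(value, profile, n_sigmas=3.0):
    """Flag a new observation whose deviation from the profiled mean
    exceeds n_sigmas standard deviations (assumed threshold rule)."""
    return abs(value - profile["mean"]) > n_sigmas * profile["std"]

# Hypothetical measurements collected during normal operation.
history = [10.0, 10.2, 9.8, 10.1, 9.9]
profile = constant_profile(history)

for new_value in (10.3, 12.0):
    print(new_value, "alert" if is_anomalous(new_value, profile) else "ok")
```

A periodic profile would replace the mean with the expected value at the current phase of the period; the comparison against a threshold would stay the same.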
[0025] The engine employed according to the present principles is
efficient because the learned properties of each time series may be
represented in a parametric way (e.g., using equations and
parameters), which provides efficient computation times. In
addition, a common input and output interface for each model in the
analytic engine may be employed, and as a result, new models may be
easily added to further improve the engine performance.
[0026] Referring now to the drawings in which like numerals
represent the same or similar elements and initially to FIG. 1, a
block/flow diagram showing a computing system for modeling time
series dynamics 100 is illustratively depicted in accordance with
one embodiment of the present principles. The system 100 may
include a workstation or system 102. The system 102 preferably
includes one or more processors 104 and memory 106 for storing
applications, modules, and other data.
[0027] In one embodiment according to the present principles, the
system 102 may include one or more displays 108 for viewing. The
displays 108 may permit a user to interact with the system 102 and
its components and functions. This may be further facilitated by a
user interface 110, which may include a mouse, joystick, or any
other peripheral or control to permit user interaction with the
system 102 and/or its devices. It should be understood that the
components and functions of the system 102 may be integrated into
one or more systems or workstations.
[0028] The system 102 may receive input 112, which may include time
series data 114, and a sliding window segmentation module 116 may
perform sliding window segmentation on the time series data 114. A
feature extraction module 118 may extract features of system
dynamics by linear or nonlinear subspace decomposition that represent
the temporal evolution of the system 102, and a modeling module 120
and/or an analytic engine 122 may model system dynamics based on
the features extracted by the feature extraction module 118. A
model integrator module 124 may be implemented to combine
information from different models, and to generate an overall
report of system operation. The system may generate output in block
128, which may include a temporal system dynamics model 130 and
anomalies 132 detected by an anomaly detection module in block
126.
[0029] Referring now to FIG. 2, a method for modeling time series
dynamics 200 is illustratively shown in accordance with the present
principles. In one embodiment, one or more time series are
generated in block 202. To extract representative features of time
series dynamics, sliding window segmentation of the time series may
be performed in block 204 to derive a vector trajectory. In time
series segmentation, the dynamics of a time series $X=\{x_1,
x_2, \ldots, x_t, \ldots\}$ may be represented by $x_t$ as
well as its past observations, and $x_1, x_2, \ldots, x_t$
may represent observations at different times. As a result, time
series dynamics may be contained in one or more vector trajectories
constructed by sliding window segments of time series with a window
size d. A linear test 206 may be performed to determine whether a
time series is linear or nonlinear. The linear test 206 may employ,
for example, an intrinsic dimension based criterion to determine
whether to use a linear subspace decomposition or nonlinear
subspace decomposition method to derive the features of system
dynamics. When the test indicates linear behavior, a linear subspace
decomposition may be performed on
the vector trajectory to extract features of system dynamics in
block 208, and the linear subspace decomposition may be optimal in
the least squares sense. For time series with highly nonlinear
behavior, a nonlinear subspace decomposition may be performed on the
vector trajectory to extract features of system dynamics in block
210.
[0030] In one embodiment according to the present principles, a
complementary method may be employed to model system dynamics. A
Vector-Autoregressive (VAR) model 212 may be employed to model the
trajectory of features. A fitness score may be determined in block
214. Time series that lack smoothness in their feature
trajectories may result in
a low fitness score in the VAR model, and a density based model 216
may be employed to model their behaviors. The built model may be
applied for applications such as, for example, anomaly detection
and reporting in block 218 for time series with any fitness score
(e.g., High, Low).
[0031] In one embodiment, a unique feature series pattern may arise
for a deterministic physical system starting from a given state. To
model the system dynamics, good features that represent the
evolution of the system dynamics may be extracted. The time series
may first be segmented with a sliding window size to derive a
vector trajectory that includes all the dynamics information in
block 204. Then the features of system dynamics may be extracted by
linear or nonlinear subspace decomposition in blocks 208 and 210,
which may be decided by an intrinsic dimension based criterion
linear test in block 206. The extracted features may be
non-redundant which may be beneficial for the modeling complexity
and robust to the noise in the data, and the extracted features may
present smooth trajectories.
[0032] In one embodiment according to the present principles, a
time series $X=(x_1, x_2, \ldots, x_t, \ldots)$ from a
physical system may be generated in block 202. A consecutive
subsequence with length d may be employed to fix the state. In a
physical system, a mapping function, which may be unknown, may
generate the time series and may be modeled as differential
equations to represent the underlying physical process. A state of
the physical system may be represented as a vector sequence that
covers consecutive observations. Time series dynamics may be
included in a segment of time series, and the dynamics of a time
series X may be represented by $x_t$ as well as its past
observations. As such, time series segmentation may be
advantageously employed according to the present principles to
model the system dynamics.
[0033] In one embodiment, moving window segmentation may be
performed by segmenting time series observations with a sliding
window of size d in block 204 according to the present principles
using, for example, $z_t=[x_t, x_{t+1}, \ldots,
x_{t+(d-1)}]$. A segment vector trajectory matrix
may then be generated (e.g., $Z=[z_1, z_2, \ldots,
z_n]^T$), where Z represents a trajectory matrix and
$z_1, z_2, \ldots, z_n$ may represent observations at
different times. In trajectory matrix Z, the dynamics of the
original time series may be represented by consecutive subsequences
with length d. The trajectory matrix may include information about
evolution of the dynamics of a time series. A subspace
decomposition method may be employed to derive the features of
system dynamics. As the dynamics may include linear and nonlinear
dynamics, a linear or nonlinear subspace decomposition may be
employed in blocks 208 and 210 based on a linear test (e.g.,
intrinsic dimension based criterion) in block 206.
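The sliding window segmentation described above can be sketched as follows. This is a minimal illustration using NumPy; the function name `trajectory_matrix` is a hypothetical choice made here, and the sketch assumes a single, evenly sampled time series and a stride of one sample between consecutive windows.

```python
import numpy as np

def trajectory_matrix(x, d):
    """Stack the sliding-window segments z_t = [x_t, ..., x_{t+d-1}]
    as the rows of the trajectory matrix Z."""
    x = np.asarray(x, dtype=float)
    if d < 1 or d > len(x):
        raise ValueError("window size d must satisfy 1 <= d <= len(x)")
    n = len(x) - d + 1  # number of segments
    return np.stack([x[t:t + d] for t in range(n)])

# A five-sample series with window size d = 3 yields three segments.
Z = trajectory_matrix([1, 2, 3, 4, 5], d=3)
print(Z)
```

Each row of `Z` is one segment vector, so consecutive rows trace the vector trajectory over time.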
[0034] In one embodiment, after determining the trajectory matrix
Z, singular value decomposition may be applied using, for example,
$Z_{d\times n}=U\Sigma V^T$, where the elements of $\Sigma$ may be the
singular values, $\Sigma=\mathrm{diag}(\lambda_1, \ldots, \lambda_k,
\lambda_{k+1}, \ldots, \lambda_r)$, and d is the window length, n is the
number of samples in a time series matrix, U is a transformed
matrix, $V^T$ is a transposed matrix, and $\lambda$ may represent
the outcome of the transformation. Based on the magnitude of
singular values $\lambda_i$, the space of Z may be decomposed
into signal and noise subspaces. In the decomposition
$$Z_{d\times n}=U\Sigma V^T=U_S\Sigma_S V_S^T+U_n\Sigma_n V_n^T,$$
$U_S=[u_1, u_2, \ldots, u_k]$ may be selected from the left k columns
of U and may form the bases of the signal space, where $U_S$
represents the signal part of U, and $u_1, u_2, \ldots, u_k$
represent the corresponding columns of the U matrix. An eigenvector
number k may be chosen such that
$$k = \min_k \left\{ \frac{\sum_{i=1}^{k}\lambda_i^2}{\sum_{i}\lambda_i^2} \geq \epsilon \right\},$$
where $\epsilon$ is a predefined threshold value.
[0035] In one embodiment, a vector $z\in\mathbb{R}^d$ may be
represented by the summation of two projection vectors from two
subspaces, $z=\hat{z}+\tilde{z}$, with the signal
part $\hat{z}$ and the noise part $\tilde{z}$, where z represents a
vector, $\mathbb{R}$ is the real number space, and d is the dimension
of the space. A low-dimensional representation of $\hat{z}$ in the
signal subspace may be obtained as $y=U_S^T z$, which may be the
principal component vector of z and may represent the r-dimensional
coordinates of $\hat{z}$ in the signal subspace. Then the feature of
system dynamics may be defined as $y=U_S^T z$.
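The linear subspace decomposition above can be sketched with NumPy's SVD. This is a sketch under stated assumptions: the function name `linear_features` and the default threshold are hypothetical, the trajectory matrix is passed with one segment per row (so it is transposed to the d-by-n layout before the SVD), and k is taken as the smallest number of singular values whose cumulative squared sum reaches the threshold fraction of the total.

```python
import numpy as np

def linear_features(Z, eps=0.95):
    """Z: (n, d) trajectory matrix, one segment per row.
    Returns (Y, U_S, k), where row t of Y is y_t = U_S^T z_t."""
    Z = np.asarray(Z, dtype=float)
    # SVD of the d x n matrix whose columns are the segment vectors.
    U, s, _ = np.linalg.svd(Z.T, full_matrices=False)
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    # Smallest k whose cumulative energy ratio reaches eps (clamped to len(s)).
    k = min(int(np.searchsorted(energy, eps)) + 1, len(s))
    U_S = U[:, :k]   # basis of the signal subspace
    Y = Z @ U_S      # features of system dynamics
    return Y, U_S, k

# Example: a pure sinusoid, whose trajectory matrix has rank 2.
x = np.sin(0.2 * np.arange(200))
d = 10
Z = np.stack([x[i:i + d] for i in range(len(x) - d + 1)])
Y, U_S, k = linear_features(Z, eps=0.99)
print(k, Y.shape)
```

Because a single sinusoid spans a two-dimensional subspace, two features are enough to reconstruct every window in this example; noisy data would leave a residual in the noise subspace.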
[0036] In one embodiment, a time series (e.g., $\{x_1, \ldots,
x_t, \ldots\}$) may exhibit highly nonlinear behaviors. The
trajectory matrix Z constructed by, for example, sliding window
segmentation, may lie on a nonlinear manifold. Therefore, the
linear subspace decomposition, which may be optimal in the least
squares sense, may not be appropriate for this type of time series.
Instead, when presented with a high level of nonlinearity, a nonlinear
transformation of the segment trajectory into a
feature space F may be applied by $\Phi: z\to\Phi(z)$, where $\Phi(z)$
represents a feature of the z trajectory. Then subspace decomposition
may be performed in the feature space F to extract features of
system dynamics in block 210.
[0037] In one embodiment, after determining a set of N segment
trajectory vectors $z_j$, they may be mapped into a
higher-dimensional feature space by using, for example,
$z\to\Phi(z)$, which may result in $\Phi(Z)=[\Phi(z_1),
\Phi(z_2), \ldots, \Phi(z_N)]$. In the feature space,
subspace decomposition may then be performed on $\Phi(Z)$. The
subspace decomposition on $\Phi(Z)$ may be equivalent to diagonalizing
$$C = \frac{1}{N}\sum_{j=1}^{N}\Phi(z_j)\Phi(z_j)^T,$$
where C represents a covariance matrix of $\Phi(Z)$, and N
represents the number of elements.
[0038] The diagonalization may be equivalent to solving
$K\alpha=N\lambda\alpha$,
where $\alpha$ denotes the column vector with entries $\alpha_1,
\ldots, \alpha_N$. The function
$K_{i,j}=(\Phi(z_i)\cdot\Phi(z_j))$ is called the kernel
function, and may be employed to compute the dot product instead of
computing the mapping $\Phi$. Choosing the kernel function may be
equivalent to choosing $\Phi$, the embedding map function. Commonly
used kernel functions include the polynomial kernel $K(x_i,
x_j)=(x_i^T x_j+1)^p$ and the Gaussian RBF kernel
$$K(x_i, x_j) = \exp\left(-\frac{1}{2\sigma^2}\|x_i - x_j\|^2\right).$$
In one embodiment, $z_i$ may be a point with an image
$\Phi(z_i)$ in F, and
$V^k\cdot\Phi(z_i)=\sum_{j=1}^{N}\alpha_j^k K(z_i, z_j)$
may be the nonlinear subspace decomposition
corresponding to $\Phi$, where $V^k$ represents a transformation
matrix. The quantity
$$y_i = \sum_{j=1}^{N}\alpha_j^k K(z_i, z_j)$$
may be defined as the system dynamic feature extracted in the
feature space for segment vector $z_i$, where $y_i$ represents
the result of the transformation.
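The nonlinear subspace decomposition above corresponds to what is commonly called kernel principal component analysis, and it can be sketched as follows. This is an illustrative sketch, not the applicants' implementation: the function names `rbf_kernel` and `kernel_features` are hypothetical, the Gaussian RBF kernel is used, the kernel matrix is not centered in feature space because the text does not mention centering, and each coefficient vector is scaled so that the corresponding direction in feature space has unit length.

```python
import numpy as np

def rbf_kernel(Z, sigma):
    """Gaussian RBF kernel matrix K[i, j] = exp(-||z_i - z_j||^2 / (2 sigma^2))."""
    sq = np.sum(Z ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kernel_features(Z, sigma=1.0, k=2):
    """Z: (N, d) segment vectors. Returns Y of shape (N, k) with
    Y[i, m] = sum_j alpha_j^m K(z_i, z_j) for the top-k eigenvectors."""
    Z = np.asarray(Z, dtype=float)
    K = rbf_kernel(Z, sigma)
    w, A = np.linalg.eigh(K)             # eigenvalues in ascending order
    order = np.argsort(w)[::-1][:k]      # indices of the k largest
    w, A = w[order], A[:, order]
    A = A / np.sqrt(w)                   # scale so that alpha^T K alpha = 1
    return K @ A

# Example: points on a circle, a curved (nonlinear) trajectory in the plane.
t = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
Z = np.column_stack([np.cos(t), np.sin(t)])
Y = kernel_features(Z, sigma=1.0, k=2)
print(Y.shape)
```

The eigenvalues returned by `eigh` correspond to the product of N and lambda in the eigenproblem, so no separate division by N is needed for the scaling step.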
[0039] In one embodiment, nonlinear models may be more accurate
when the nonlinearities of the underlying structure are not
effectively approximated by linear models. To determine which model
to choose, given a set of measurements, an intrinsic dimension of the
data may first be estimated, and then a statistical test may be
employed in block 206 to decide whether to use the linear or
nonlinear model. Intrinsic dimension may be the number of variables
employed to represent a manifold in $\mathbb{R}^d$. An estimator
designed to determine
the dimension of the attractor of a chaotic dynamical system from
samples of its time series may be employed according to the present
principles. Given r-dimensional data embedded in a d-dimensional
space, $z\in\mathbb{R}^d$, the correlation integral ($C_n$)
may be defined as
$$C_n(v) = \frac{2}{n(n-1)}\sum_{i<j}^{n} I\{\|z_i - z_j\| < v\},$$
where $I$ is the indicator function, v is a distance threshold, and n
is the number of samples, so that $n(n-1)/2$ is the number of
possible pairs. $C_n(v)$ may be computed for different $v_i$, and a
line may be fit through $[\log v_i, \log C_n(v_i)]$ to derive r from
its slope, where r represents an intrinsic dimension.
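The intrinsic dimension estimate above can be sketched by computing the correlation integral at several radii and fitting a line in log-log coordinates. This is a minimal sketch: the function names `correlation_integral` and `intrinsic_dimension` and the choice of radii are assumptions made here, and the all-pairs distance computation is only suitable for small sample sizes.

```python
import numpy as np

def correlation_integral(Z, v):
    """Fraction of sample pairs (i < j) whose Euclidean distance is below v."""
    Z = np.asarray(Z, dtype=float)
    n = Z.shape[0]
    diff = Z[:, None, :] - Z[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))
    iu = np.triu_indices(n, k=1)  # indices of the pairs with i < j
    return 2.0 * np.count_nonzero(dist[iu] < v) / (n * (n - 1))

def intrinsic_dimension(Z, radii):
    """Slope of the line fitted through (log v, log C_n(v))."""
    log_v, log_c = [], []
    for v in radii:
        c = correlation_integral(Z, v)
        if c > 0:  # skip radii that contain no pairs
            log_v.append(np.log(v))
            log_c.append(np.log(c))
    slope, _intercept = np.polyfit(log_v, log_c, 1)
    return float(slope)

# Example: points on a straight line embedded in 3-D space.
t = np.linspace(0.0, 1.0, 400)
Z = np.outer(t, np.array([1.0, 2.0, 2.0]) / 3.0)
r = intrinsic_dimension(Z, radii=[0.02, 0.04, 0.08, 0.16])
print(round(r, 2))
```

For the line example the fitted slope should come out close to one, reflecting a one-dimensional structure even though the points live in three dimensions.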
[0040] In one embodiment, the linear model may be tested to
determine whether it is sufficient for
discovering the geometry of data samples based on the estimated
dimension r. First, SVD of the data matrix may be performed, and
whether the linear subspace with dimension r covers enough of the
variance of the original space may be determined. The ratio may
be defined as
$$\gamma = \left[\frac{\lambda_1^2+\lambda_2^2+\cdots+\lambda_r^2}{\lambda_1^2+\lambda_2^2+\cdots+\lambda_k^2}\right]^{1/2},$$
where the $\lambda_i$ are the singular values from the SVD,
$k=\min\{d,n\}$, and $\gamma$ represents a ratio of signal power. If
the $\gamma$ value is larger than a predefined threshold, the linear
model may be employed to characterize the normal data. Otherwise, the
nonlinear model may be employed.
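The linear test above can be sketched as a ratio of singular values followed by a threshold comparison. This is a sketch under stated assumptions: the function names `signal_power_ratio` and `is_linear` and the default threshold of 0.95 are hypothetical, and the estimated intrinsic dimension r is assumed to have been rounded to a positive integer beforehand.

```python
import numpy as np

def signal_power_ratio(Z, r):
    """gamma = sqrt(sum of the r largest squared singular values
    divided by the sum of all squared singular values)."""
    s = np.linalg.svd(np.asarray(Z, dtype=float), compute_uv=False)
    return float(np.sqrt(np.sum(s[:r] ** 2) / np.sum(s ** 2)))

def is_linear(Z, r, threshold=0.95):
    """True when an r-dimensional linear subspace covers enough signal power."""
    return signal_power_ratio(Z, r) > threshold

# A rank-one matrix is fully captured by a one-dimensional subspace.
rank_one = np.outer([1.0, 2.0, 3.0], [1.0, 1.0])
print(signal_power_ratio(rank_one, 1), is_linear(rank_one, 1))

# The 2x2 identity spreads its power evenly, so one dimension is not enough.
print(signal_power_ratio(np.eye(2), 1), is_linear(np.eye(2), 1))
```

When `is_linear` returns False, the nonlinear decomposition would be chosen instead.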
[0041] In one embodiment, the system dynamics may be modeled based
on the extracted features. The projected features may demonstrate
deterministic trajectories along time. Here, a complementary
strategy may be adopted to model the system dynamics. A
Vector-Autoregressive (VAR) technique may be employed in block 212
to model the trajectory of features as it considers temporal
smoothness of trajectory. Because a small set of time series may
lack smoothness in feature trajectories, these time series may
result in a low fitness score in the VAR model, and as such, a
density based approach may be employed to model their behaviors in
block 216.
[0042] As a deterministic system may demonstrate a self-similar
behavior, which means that a current state may be predicted based
on states in recent past, current system states may be modeled as a
linear function of previous states according to one embodiment. As
such, the following recursive state transition function may be
employed according to the present principles:
$$y_t = c + \sum_{i=1}^{p} \Pi_i y_{t-i} + \epsilon(t),$$
where y.sub.t may be the extracted feature of the dynamical systems
and .PI..sub.i may be a k.times.k transition matrix. Vector c may
be a vector of intercept terms so that nonzero mean system dynamics
may be modeled. The influence of external randomness on a physical
system during its interaction with its surroundings may be modeled
using Gaussian noise .epsilon.(t).about.N(0,.SIGMA.).
[0043] There may be several estimation methods employed to obtain
VAR parameters in accordance with the present principles, such as
the least square estimation and maximum likelihood estimation,
which may be conditioned on the first p observations. In one
embodiment, the least square estimation is employed according to
the present principles. For example, there may be N+p system
dynamic features {y.sub.t}.sub.1-p.sup.N available in one
embodiment. The estimation may be conditioned on the first p
observations (e.g., {y.sub.1-p, . . . , y.sub.0}) and may be
based on the last N observations (e.g., {y.sub.1, . . . ,
y.sub.N}). The model may be represented more compactly
as y.sub.t=Bw.sub.t, with B being the k.times.(kp+1)
parameter matrix B=[c, .PI..sub.1, . . . ,
.PI..sub.p] and w.sub.t=[1, y.sub.t-1, . . . ,
y.sub.t-p].sup.T of dimension kp+1.
[0044] In one embodiment, based on the initial conditional vector
w.sub.1=[1, y.sub.0, . . . , y.sub.1-p].sup.T, the estimation of
{circumflex over (B)} may be represented by
$$\hat{B}_{k \times (kp+1)} = \left[ \sum_{t=1}^{N} y_t w_t^T \right] \left[ \sum_{t=1}^{N} w_t w_t^T \right]^{-1}.$$
It is noted that {circumflex over (B)} may contain the constant
term c and coefficients {{circumflex over (.PI.)}.sub.1,
{circumflex over (.PI.)}.sub.2, . . . , {circumflex over
(.PI.)}.sub.p}. The Bayesian Information Criterion (BIC) may be employed
to determine the optimal lag p in the VAR model in block 212.
After learning the system dynamic model parameters {c, {circumflex
over (.PI.)}.sub.1, {circumflex over (.PI.)}.sub.2, . . . ,
{circumflex over (.PI.)}.sub.p} and given the p previous dynamic
features y.sub.t, y.sub.t-1, . . . , y.sub.t+1-p, the one-step-ahead
dynamic feature value may be predicted by using, for example,
{circumflex over (y)}.sub.t+1=c+{circumflex over (.PI.)}.sub.1y.sub.t+{circumflex over
(.PI.)}.sub.2y.sub.t-1+ . . . +{circumflex over
(.PI.)}.sub.py.sub.t+1-p.
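For illustration, the least square estimation of the VAR parameters and the one-step prediction may be sketched as below. The sketch assumes the features are stored as an array with one row per time step, and it solves the least squares problem with `numpy.linalg.lstsq` rather than forming the inverse explicitly (the two agree whenever the inverse exists). The function names are hypothetical, and BIC-based selection of the lag p is not shown.

```python
import numpy as np


def fit_var(y: np.ndarray, p: int) -> np.ndarray:
    """Least-squares estimate of B = [c, Pi_1, ..., Pi_p] from features y of shape (T, k)."""
    T, k = y.shape
    # The row for time t is w_t = [1, y_{t-1}, ..., y_{t-p}]; the first p samples only condition the fit.
    W = np.array([np.concatenate(([1.0], y[t - p:t][::-1].ravel())) for t in range(p, T)])
    Y = y[p:]
    # Minimizes ||W B^T - Y||, which gives B = (sum y_t w_t^T)(sum w_t w_t^T)^{-1} when that inverse exists.
    B_T, *_ = np.linalg.lstsq(W, Y, rcond=None)
    return B_T.T  # shape (k, k * p + 1)


def predict_next(B: np.ndarray, y: np.ndarray, p: int) -> np.ndarray:
    """One-step-ahead prediction from the last p rows of y."""
    w = np.concatenate(([1.0], y[-p:][::-1].ravel()))
    return B @ w
```

The first column of the returned matrix holds the intercept vector c, and each following block of k columns holds one transition matrix, in the same order as the compact expression y.sub.t=Bw.sub.t.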
[0045] In one embodiment, the goodness of the modeling of system
dynamics may be determined and tested by, for example, a metric in
block 214. One method to measure goodness of fit is the R square
(R.sup.2), which may be defined as
$$R^2 = 1 - \frac{\text{residual sum of squares}}{\text{total sum of squares}}.$$
Considering N predictions, the measure may become
$$R^2 = 1 - \frac{\sum_{t=n+1}^{n+N} \lVert y_t - \hat{y}_t \rVert^2}{\sum_{t=n+1}^{n+N} \lVert y_t - \bar{y} \rVert^2},$$
where $\bar{y} = \frac{1}{N} \sum_{t=n+1}^{n+N} y_t$. The R.sup.2 fit score may be a
scale free metric, and a larger R.sup.2 may indicate that the model
provides a closer fit to the data.
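The R.sup.2 measure over N predictions may be computed as in the following sketch, which assumes the observed and predicted features are arrays with one row per prediction; the function name is hypothetical.

```python
import numpy as np


def r_squared(y_true, y_pred) -> float:
    """R^2 over N predictions; rows are time steps, columns are feature dimensions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Residual sum of squares: squared distance between observation and prediction.
    residual = np.sum((y_true - y_pred) ** 2)
    # Total sum of squares: squared distance between observation and the observed mean.
    total = np.sum((y_true - y_true.mean(axis=0)) ** 2)
    return float(1.0 - residual / total)
```

A perfect prediction gives a score of 1, and always predicting the mean of the observations gives a score of 0. The score is undefined when the observations are exactly constant, since the total sum of squares is then zero.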
[0046] In one embodiment, a small set of time series may not be
effectively modeled using the VAR model, and for such time series,
a low fitness score may be generated. A nonparametric density
method may be employed to compensate for the low fitness score in
block 216. Instead of modeling temporal coherence, a density-based
method may build a distribution on previously observed feature
values. Density estimation may estimate y.sub.t+1 as the current
value y.sub.t plus a weighted sum of the one-step differences of its
k-NN neighbors. The
nonparametric kernel density based regression model may be
represented by
$$\hat{y}_{t+1} = \sum_{k=1}^{N_n(y_t)} w(y_t, y_k)\,(y_{k+1} - y_k + y_t)$$
where y.sub.k may be one of the nearest neighbors of y.sub.t
obtained from a k-NN search on previous feature values, and
y.sub.k+1 may be the corresponding next point of y.sub.k in the
feature trajectory. The weights may be computed using a kernel
function such as
$$w(y_t, y_k) = \frac{K_h(y_t - y_k)}{\sum_{j=1}^{N_n(y_t)} K_h(y_t - y_j)} \quad \text{with} \quad K_h(b) = \frac{1}{h} K\!\left(\frac{b}{h}\right),$$
where K() may be a kernel function and h may be the bandwidth of
the kernel. In one embodiment, the Gaussian kernel may be set to be
N(0, 2).
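A minimal sketch of the density based prediction is given below. It assumes a Euclidean k-NN search over the stored trajectory and a unit-variance Gaussian kernel applied to the neighbor distance; the function name, the default number of neighbors, and the default bandwidth are hypothetical choices made for illustration.

```python
import numpy as np


def knn_kernel_predict(history: np.ndarray, y_t: np.ndarray,
                       n_neighbors: int = 5, h: float = 1.0) -> np.ndarray:
    """Predict the next feature as y_t plus a kernel-weighted mean of neighbor increments."""
    # Only points that have a successor in the trajectory can serve as neighbors.
    candidates = history[:-1]
    successors = history[1:]
    dist = np.linalg.norm(candidates - y_t, axis=1)
    idx = np.argsort(dist)[:n_neighbors]
    # Gaussian kernel K_h(b) = K(b / h) / h; the 1/h factor cancels after normalization.
    k = np.exp(-0.5 * (dist[idx] / h) ** 2)
    w = k / k.sum()
    return w @ (successors[idx] - candidates[idx] + y_t)
```

Because the weights sum to one, the prediction equals y.sub.t plus the weighted average of the neighbors' one-step differences. If every neighbor is many bandwidths away, the kernel values may underflow to zero, so a practical version would need to guard the normalization.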
[0047] In one embodiment, the present principles may be applied for
time series anomaly detection in block 218. Anomalies in time
series data may be data points that significantly deviate from the
normal pattern of the data sequence, and temporal aspects of the
data may be considered. A modeling and prediction based method may
be employed: a system evolution model may be constructed from the
past time series observations, and the current system status may be
monitored by checking incoming data against the model to determine if
an anomaly has occurred. The built system dynamic model may have
already considered the temporal dynamical information of the
original time series. Therefore anomaly detection may be performed
by comparing the predicted feature value and the current feature
value. For example, if {circumflex over (y)}.sub.t+1 is the feature value predicted by
{circumflex over (y)}.sub.t+1=c+{circumflex over (.PI.)}.sub.1y.sub.t+{circumflex over
(.PI.)}.sub.2y.sub.t-1+ . . . +{circumflex over
(.PI.)}.sub.py.sub.t+1-p or
$$\hat{y}_{t+1} = \sum_{k=1}^{N_n(y_t)} w(y_t, y_k)\,(y_{k+1} - y_k + y_t),$$
as shown above, and the feature value for the next one step time
series x.sub.t+1 is y.sub.t+1=U.sub.s.sup.Tz.sub.t+1, an alarm may
be flagged when
.parallel.y.sub.t+1-{circumflex over (y)}.sub.t+1.parallel..gtoreq..epsilon., where
.epsilon. is a threshold value.
[0048] In one embodiment, the present principles may be applied to
a large collection of time series from a physical system (e.g., a
power plant system), and the anomaly detection performance may be
compared. A plurality of time series may include sample
observations collected over a period of time, and together these
time series may capture the entire system state. The model may be
built from normal operations, and anomalies may be detected based
on the learned model.
[0049] The evaluation comparisons may include two parts. First, the
present principles may be applied without nonlinear subspace
decomposition and compared with a standard time series modeling
method, the autoregressive (AR) model. This means that the features
of dynamics may be extracted by linear subspace decomposition. The AR method
may be employed as a baseline because the VAR model may be employed
on the feature vector, whereas the AR method may model the time series
directly.
[0050] During the second comparison, the anomaly detection
performances on time series with highly nonlinear behavior that
do not pass the intrinsic dimension based linear test may be
compared. More specifically, the anomaly detection performances
between the nonlinear subspace decomposition and linear subspace
decomposition may be compared. It is noted that the present
principles provide for a much lower false alarm rate as compared to
the AR model, and the present principles are more robust to
different threshold levels. Furthermore, anomaly detection
performance for all different levels of anomalies (e.g., weak,
middle, strong), has been shown to achieve a more than 10 percent
improvement of detection rate, while significantly lowering false
alarms when compared to the AR model.
[0051] In one embodiment, the sliding window segmentation process
may have one parameter, namely the window size d. The window size d
may affect the performance of anomaly detection. To investigate the
effect of window size d, different values of d may be set, and the
corresponding detection rates may be obtained. In one embodiment,
increasing the window size d may recover the system dynamics better,
which may improve detection precision. It is noted that the
detection rate may not be sensitive to the change of window size.
This may be in part because the reduced representation of the
original trajectory may have already been extracted during the
system dynamic extraction.
[0052] Furthermore, the reduced representation of the dynamic
feature may avoid any potential underfitting problems during the
system modeling stage. Without subspace decomposition, the
transition matrices may have dimensions d.times.d. When d is large,
this may lead to an underfitting problem for limited training
samples. After subspace
decomposition, the dimensions may become k.times.k, and k may be
much smaller than d. An underfitting problem may arise when
the embedding dimension d is high, but the present principles may
mitigate this problem by employing system dynamic feature
extraction.
[0053] Referring now to FIG. 3, a block/flow diagram of a method
for modeling time series dynamics with integrated outputs 300 is
illustratively shown in accordance with the present principles. In
one embodiment, time series data may be input in block 302.
Properties of the data may be profiled in block 304, and complex
data dependencies may be discovered (e.g., from bottom to top) in
block 306. Model training 308 and model testing 310 may be
performed using an ensemble of analysis models, each of which may
cover a specific data property within a unified framework. An
architecture may be provided for integrating outputs from a
plurality of analysis models in block 312, and the final system
status report may be output in block 314.
[0054] Referring now to FIG. 4, a block/flow diagram of a method
for generating modeling profiles 400 is illustratively depicted in
accordance with the present principles. Time series data may be
generated and/or input in block 402, and may be represented by
x.sub.t. The model parameters (e.g., .theta.) may be learned (e.g.,
estimated) in block 406 based on the input data. In one embodiment,
different models may have different types of parameters and may
follow different learning processes. For example, in a model that
discovers the periodic behavior of x.sub.t, the frequency
distribution and auto-correlation function (ACF) of x.sub.t may be
leveraged to learn the parameters that describe the shape of
x.sub.t's period. In learning the dependency between a pair of
attributes, the parameters of an ARX model may be employed to
capture their delay-aware correlations.
[0055] In one embodiment, different models may have different
representations, but they may all include a common metric, which
may be represented as the fitness score F. The fitness score may
reflect the goodness of fit for a given time series. For example, a
time series with weak periodicity behaviors may receive a low
fitness score in the periodic model. The computation of fitness may
vary with each model. In the periodic model it may be based on the
magnitude of dominant components in x.sub.t's frequency
distribution, whereas the fitness of pairwise correlation may be
based on the average estimation error of the ARX model. The fitness
score may be employed to remove irrelevant time series from each
model. For example, a threshold may be defined and time series
whose fitness scores are below that threshold may be pruned out in block
408. As a result, each model may cover a group of time series that
follow the data property of that model.
[0056] In one embodiment, if x.sub.t has a high fitness score, that
time series may be further profiled in block 410 based on the
learned properties to determine an overall modeling profile 404. In
addition to the fitness score F, two other elements may also be
included in x.sub.t's profile: the learned model parameters
.theta., as well as the maximum error .DELTA. obtained when fitting
x.sub.t to the model based on .theta.. While model parameters are
employed to generate the prediction of future observations during
online monitoring, the maximum error .DELTA. may be employed as the
threshold to check the deviation of the predicted value from the
real observation. Given a large number of time series, the model
may scan them one by one, learn the model parameters, and may only
keep those with high fitness scores. For each selected time series,
a profile {.theta., F, .DELTA.} may be built, so that its future
observations may be checked based on that profile, and the model
profiles may be output in block 412.
[0057] Referring now to FIG. 5, a block/flow diagram of a method
for data property discovery for time series dynamics 500 is
illustratively depicted in accordance with the present principles.
In one embodiment, data in physical systems may be relatively
deterministic and may follow certain regularities. Each attribute
may be profiled once appropriate properties in its data are
discovered. However, different attributes may follow different
patterns that may be unknown during profiling. The analytic engine
may define a plurality of potential properties that may be expected
to exist in measurement data based on physical systems. For each
candidate data property, a model may be constructed to discover the
group of time series that follow the appropriate property and
corresponding profiles may be built.
[0058] In one embodiment, there may be a plurality of properties in
physical data that correspond to different aspects of system
evolutions. For example, for each individual attribute, its time
series may show constant, periodic, autoregressive, or other
behaviors. There may also be various types of dependencies across
different groups of attributes. Data properties modeled in the
engine may be classified into four categories, which are
illustratively depicted in FIG. 5, and labeled as layer 1 through
layer 4. Each layer may represent a specific data dependency
relationship, ranging from the single attribute analysis 508,
pairwise relationship analysis 506, group wise analysis 504, to
full attribute analysis 502. While the first layer may mainly
describe data properties within a single attribute, the upper
layers may gradually capture higher-order attribute interactions to
model more complex data properties in accordance with the
present principles.
[0059] In one embodiment, the single attribute analysis 508 may
build a plurality of models to describe properties from individual
time series, such as the periodic model for signals with
periodicity, the constant model for nearly constant signals, the
cumulative sum (CUSUM) model for time series with weak dynamics,
the autoregressive (AR) model to measure linear signal dynamics,
etc. Pairwise analysis may be performed in block 506 to model
correlations between one or more pairs of system attributes. Note
that the measurement of each attribute may be a time series, and
the correlation of two time series may involve multiple time
points. That is, the value of one attribute may affect not only the
immediate value of the other attribute but also may affect values
with some time delays. The AutoRegressive model with eXternal input
(ARX) may be leveraged because it may learn the delay-aware
correlation between each pair of time series, and a new search
mechanism that quickly identifies attribute pairs exhibiting high
correlations may be employed according to the present
principles.
[0060] In one embodiment, a group-wise analysis 504 may be
performed for relationships among multiple time series. A
delay-aware multivariate regression model may be employed in the
engine to cluster time series and capture correlations in each
cluster. The top layer relates to performing an all-attributes
analysis 502, which may analyze the behavior and dependencies for
all attributes in the systems. Following the group-wise analysis
504, the all-attributes analysis may consider inter-cluster
correlations of attributes to obtain a global picture of data
distribution.
[0061] Each data property in the layers 502, 504, 506, 508 may be
associated with a model in the engine, which may discover a set of
time series that follows that property. While the analytic engine
may model as many data properties as possible, it may not be
guaranteed to include all the necessary properties in all
measurement data. However, a common interface may be employed for
all the models in the engine, so that new models may be easily
added to enhance engine performance. A model integrator may be
employed to combine information from all the models and to generate
an overall report of system operation.
[0062] Referring now to FIG. 6, a block/flow diagram of a system
for modeling time series dynamics with integrated outputs 600 is
illustratively depicted in accordance with the present principles.
In one embodiment, data is collected by employing, for example,
physical sensors and/or monitoring servers in block 602. The
sensors may collect a plurality of types of information (e.g., temperature,
pressure, etc.), and time series data may be generated and input in
block 604 into the modeling and integration system 606.
[0063] In one embodiment, the profile modeling/analytic engine
module 608 may include a plurality of models (e.g., Model 1, Model
2, . . . Model N, as shown in blocks 610, 612, and 614,
respectively), each of which may profile a group of time series
that follow a specific data property. The models may include all
the measurement data, and as such, each time series may generate a
model that includes the profile of its behaviors. In the online
monitoring process, as shown in FIG. 5, new measurements of all
time series will be dispatched to corresponding models. The model
status checking/anomaly detection module 616 may check the values
of associated time series based on their profiles, and may report
the status to the model integrator module 618. The model integrator
module 618 may combine reports from some or all of the models and
may generate and output a global report of the system status in
block 620.
[0064] In one embodiment, each model may contain a profile (e.g.,
{.theta., F, .DELTA.}) for each covered time series. During the
monitoring process, the value of new observations at every time t
may be predicted based on x.sub.t's profile as well as its past
observations at, for example, t-1, t-2, . . . , t-N. The predicted
value of x.sub.t may be represented as {circumflex over (x)}.sub.t.
The model may also receive the real observation x.sub.t at time t,
and the absolute difference between {circumflex over (x)}.sub.t and
real observation x.sub.t may be calculated as the residual
r.sub.t=|x.sub.t-{circumflex over (x)}.sub.t|. If r.sub.t is larger
than the threshold .DELTA. in x.sub.t's profile, it may indicate
that the newly observed value deviates from the model. If a
predetermined number (e.g., 3, 4, etc.) of consecutive threshold
violations are observed, the profile modeling/analytic engine
module 608 may determine that the deviation is caused by some
faults in the system rather than the noise, and may send a status
report to the model integrator module 618. The status report may be
represented as an information tuple (e.g., time, model ID,
attribute ID, fitness), and may contain one or more of a time stamp
of the last violation, IDs of the model and time series, and the
fitness score of that time series with respect to the model.
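The monitoring logic of this paragraph may be sketched as follows. The class and field names are hypothetical, the default of three consecutive violations is one of the example values mentioned above, and the choice to keep reporting at every step while the violation run continues is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass
class Profile:
    theta: Any      # learned model parameters
    fitness: float  # fitness score F
    delta: float    # maximum training error, used as the residual threshold


class SeriesMonitor:
    """Tracks consecutive threshold violations for one time series (hypothetical helper)."""

    def __init__(self, model_id: str, attribute_id: str, profile: Profile,
                 max_violations: int = 3):
        self.model_id = model_id
        self.attribute_id = attribute_id
        self.profile = profile
        self.max_violations = max_violations
        self.violations = 0

    def check(self, time: int, observed: float,
              predicted: float) -> Optional[Tuple[int, str, str, float]]:
        """Return a status report tuple once enough consecutive violations occur, else None."""
        residual = abs(observed - predicted)
        # A residual within the threshold is treated as noise and resets the run.
        self.violations = self.violations + 1 if residual > self.profile.delta else 0
        if self.violations >= self.max_violations:
            return (time, self.model_id, self.attribute_id, self.profile.fitness)
        return None
```

Isolated violations are thereby treated as noise, and only a sustained run produces the tuple (time, model ID, attribute ID, fitness) that is forwarded to the model integrator.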
[0065] In one embodiment, at each time t, the model integrator
module 618 may receive status reports from all the models 610, 612,
614, each of which may relate to an alert from measurement data.
The number of those alerts may reflect the health of the system. In
addition, since the alerts may be derived by inputting related time
series into the model, the goodness of that fit may reflect the
overall reliability of the alert. Therefore, the summation of
fitness values from the received status reports may be employed to
describe the system status, which may be denoted as the `anomaly
score` of the system, and may be detected in block 616. The anomaly
score may be based on the sum of alerts at time t with each alert
weighted by its associated fitness value. A high anomaly score may
mean that the system significantly deviates from its normal
situations. Once the anomaly score exceeds a predefined threshold,
the model integrator module 618 may generate and output an alarm in
block 620 so that system operators may be informed regarding any
possible problems with the operation of the system.
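The integration step may be illustrated by the short sketch below, which assumes each status report is the tuple (time, model ID, attribute ID, fitness) described earlier. The function names and the alarm threshold are hypothetical.

```python
from typing import Iterable, Tuple

Report = Tuple[int, str, str, float]  # (time, model ID, attribute ID, fitness)


def anomaly_score(reports: Iterable[Report]) -> float:
    """Sum the fitness values of all status reports received at one time step."""
    return sum(fitness for (_time, _model_id, _attribute_id, fitness) in reports)


def should_alarm(reports: Iterable[Report], threshold: float) -> bool:
    """Raise an alarm once the fitness-weighted alert count exceeds the threshold."""
    return anomaly_score(reports) > threshold
```

Weighting each alert by its fitness means that an alert from a well-fitting model contributes more to the score than an alert from a poorly fitting one.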
[0066] In one embodiment, the fitness value in each model may be
normalized between -1 and 1, so that all the models may be treated
equally in the status integration. Once an alarm has been
generated, the operator may pinpoint the root cause of the problem
based on a plurality of factors including the model IDs and
attribute IDs from status reports, as well as a plurality of
advanced diagnosis models, some of which are discussed in further
detail hereinafter. The first four models discussed (constant
model, cumulative sum (CUSUM) model, auto-regressive (AR) model,
and periodic model) may be associated with the single attribute
analysis, which may model the data properties in individual time
series. One model (pairwise time series model) may be associated
with the pairwise analysis, and another model (multivariate time
series model) may be associated with the group-wise analysis.
[0067] In one embodiment, the constant model may capture signals
whose values are almost a constant (e.g., having small deviations
around the mean). Given measurements of a time series (e.g.,
x.sub.t, t=1, . . . , N), the mean
$$\mu = \frac{1}{N} \sum_{t=1}^{N} x_t,$$
where N represents the number of samples in the time series, and the
variance
$$\sigma^2 = \frac{1}{N-1} \sum_{t=1}^{N} (x_t - \mu)^2$$
may be computed as model parameters .theta.=[.mu.,.sigma..sup.2].
The following fitness score describes an illustrative example of
the level of variances that may be present in x.sub.t: F=1- {square
root over (.sigma..sup.2/.mu.)}. In one embodiment, low
.sigma..sup.2 values may lead to high fitness scores, where F
represents the fitness score. F may reach the highest value 1 when
the time series is a pure constant (e.g., .sigma..sup.2=0). The
levels of .sigma..sup.2 may also depend on the mean .mu.. For
example, a signal with mean .mu.=1000 and variance .sigma..sup.2=1
may be nearer to a constant compared with a signal with .mu.=10 and
variance .sigma..sup.2=1. Therefore, the mean .mu. may be added in
F=1- {square root over (.sigma..sup.2/.mu.)} to represent the
effect of base value in the fitness.
[0068] The model may select those time series whose fitness F is
larger than .gamma., where .gamma. may be a predefined parameter
based on expectations of constant signals, and a common .gamma. can
range from 0.3 to 0.8. For the selected time series, the threshold
.DELTA. used for the online monitoring may be further computed as
follows:
$$\Delta = \frac{\max_{t=1,\ldots,N} \lvert x_t - \mu \rvert}{F},$$
[0069] which may be defined as the maximum deviation of x.sub.t
from mean .mu. divided by the fitness F. In one embodiment, larger
deviations (e.g., |x.sub.t-.mu.|) may lead to a larger bound for
error checking. The fitness F may be placed in the denominator
of
$$\Delta = \frac{\max_{t=1,\ldots,N} \lvert x_t - \mu \rvert}{F}$$
because those time series with lower fitness may have larger
uncertainties and hence may employ a wider band in checking their
behaviors.
[0070] In one embodiment, by employing the profile {.theta., F,
.DELTA.} of each covered time series x.sub.t, the residual of each
new observation obtained during the online monitoring may be
computed by r.sub.t=|x.sub.t-.mu.| and compared with the threshold
.DELTA. to check the online status. Note that as compared with
r.sub.t=|x.sub.t-{circumflex over (x)}.sub.t|, the expected value
of x.sub.t during the monitoring may be its mean .mu..
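The constant model may be sketched end to end as follows. The sketch assumes a positive mean, so that the square root in the fitness score is defined, and uses a hypothetical default of .gamma.=0.3 taken from the lower end of the range given above. The function names and the dictionary layout of the profile are illustrative only.

```python
import math
from typing import Optional, Sequence


def fit_constant_model(x: Sequence[float], gamma: float = 0.3) -> Optional[dict]:
    """Build the profile {theta, F, delta}, or return None when F does not exceed gamma."""
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / (n - 1)
    # Assumes mu > 0 so that the square root in F = 1 - sqrt(var / mu) is defined.
    fitness = 1.0 - math.sqrt(var / mu)
    if fitness <= gamma:
        return None
    # Maximum training deviation, widened for series with lower fitness.
    delta = max(abs(v - mu) for v in x) / fitness
    return {"theta": (mu, var), "F": fitness, "delta": delta}


def violates(profile: dict, x_new: float) -> bool:
    """Compare the residual r_t = |x_t - mu| against the threshold delta."""
    mu, _ = profile["theta"]
    return abs(x_new - mu) > profile["delta"]
```

For the series 9, 10, 11, 10 the fitness is about 0.74 and the threshold is about 1.35, so a new value of 11 stays within the band while a new value of 12 is reported as a violation.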
[0071] While the constant model may discover time series with small
variances, in one embodiment, the CUSUM model may focus on a set of
attributes with stable means. That is, although some attributes may
include large variances, their values may continuously oscillate
around the mean. CUSUM is a sequential analysis technique employed
in process control. It may define two counters C.sup.+ and C.sup.-
for each time series x.sub.t, which may accumulate the deviation of
x.sub.t above the mean (e.g., x.sub.t-.mu.) and below the mean
(e.g., .mu.-x.sub.t), respectively.
[0072] Because x.sub.t may oscillate around the mean, if the value
of x.sub.t-1 is above the mean, which may lead to an increase of
C.sup.+ and decrease of C.sup.-, the next value x.sub.t may be
expected to be below the mean (e.g., with a decrease of C.sup.+
and increase of C.sup.-). As a consequence, the counters C.sup.+
and C.sup.- may normally keep small values. To measure the
goodness of the CUSUM model for a selected time series, the fitness
score may be computed, the counters may be updated for each new
observation in the online monitoring, and the deviation error may
be computed.
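The counter update is not spelled out above. As an assumption, the sketch below uses the standard tabular CUSUM recursion from process control, in which each counter is floored at zero and an optional slack value is subtracted at every step. The function name and the slack parameter are hypothetical, and the fitness score and deviation error of the CUSUM model are not shown.

```python
from typing import Iterable, List, Tuple


def cusum_counters(x: Iterable[float], mu: float,
                   slack: float = 0.0) -> List[Tuple[float, float]]:
    """Return the (C+, C-) pair after each observation."""
    c_plus, c_minus = 0.0, 0.0
    history = []
    for value in x:
        # C+ accumulates deviation above the mean, C- deviation below it; both are floored at zero.
        c_plus = max(0.0, c_plus + (value - mu) - slack)
        c_minus = max(0.0, c_minus + (mu - value) - slack)
        history.append((c_plus, c_minus))
    return history
```

For a series that alternates between 11 and 9 around a mean of 10, both counters stay at or below 1, whereas a series that remains at 11 drives C.sup.+ steadily upward, which is the behavior a drift in the mean would produce.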
[0073] In one embodiment, the AR model is employed according to the
present principles. The AR model is a general technique in time
series analysis which may capture the auto-regressive behavior
of a series (e.g., the current value of the time series may be highly
correlated with previous observations). Given the measurements of
x.sub.t as training data, the least square regression may be
employed to estimate model parameters, and also to determine the
estimated value {circumflex over (x)}.sub.t. The AR model may
select time series whose fitness indicates strong
auto-regressive behaviors. For each selected series, its threshold
.DELTA. may be determined from the estimation errors r.sub.t, and
may be the maximum of r.sub.t from the training data. During the
online monitoring, given every new x.sub.t, the predicted value
{circumflex over (x)}.sub.t may be estimated, and the computed
residual r.sub.t may be compared with the threshold .DELTA. to
infer the status of x.sub.t.
[0074] In one embodiment, a periodic model which may focus on
signals that exhibit periodicity behaviors is employed in
accordance with the present principles. The periodic model may
contain two parts: periodicity detection and time series profiling.
In the first step, the time series in the frequency domain may be
analyzed and a fitness score may be defined to measure whether the
series is periodic or not. For a time series x.sub.t with high
fitness, the second step may estimate the period length T from
x.sub.t's auto-correlation function (ACF) and may build a profile
for x.sub.t. A feature of a periodic time series x.sub.t is that it
may contain dominant frequencies in its evolution. Fast Fourier
transform (FFT) may be employed to compute the power spectrum
X.sub.n of x.sub.t, where n=1, . . . , .left brkt-top.N.right
brkt-bot., and the majority of the power may be occupied by a small
number of dominant frequency components.
[0075] In one embodiment, to capture the `dominance` phenomenon in
the power distribution, power X.sub.ns may be ordered from high to
low, and the minimum number of top components, denoted as K, that
contains 90 percent of X.sub.n's total power, may be determined. A
small K value may mean that there exist dominant frequencies in
x.sub.t's power spectrum, and hence x.sub.t may be more likely to
exhibit periodic behaviors. The probability distribution of K,
P(K), when x.sub.t is a random sequence (e.g., non-periodic), may
be estimated. From that distribution, a bound K* may be inferred
such that as long as K is smaller than K* there exists a high
confidence that x.sub.t may be a periodic signal. As a result, the
fitness of x.sub.t may be defined as
$$F = \frac{K^* - 2K + 1}{K^* - 1},$$
where F represents the fitness score, K* represents the number of
expected components in a power distribution, and K represents the
actual number of components in a power distribution. When there is
only one dominant frequency component in x.sub.t's power
distribution, K may equal 1, giving the highest fitness score
of F=1. When the number K reaches K*, the fitness score may drop to
-1. In one embodiment, only those time series whose fitness scores
are larger than -1 (e.g., with less than K* dominant frequencies)
may be selected.
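The periodicity detection step may be sketched as below. The sketch assumes the power spectrum is taken from the real FFT of the mean-removed series with the zero-frequency term excluded, which is a choice of this illustration and is not stated above. The function names are hypothetical, and the estimation of the bound K* from the distribution P(K) is not shown.

```python
import numpy as np


def dominant_component_count(x, power_fraction: float = 0.9) -> int:
    """Smallest number K of top spectral components holding the given fraction of total power."""
    x = np.asarray(x, dtype=float)
    # Remove the mean and skip the zero-frequency bin so the offset does not dominate.
    spectrum = np.abs(np.fft.rfft(x - x.mean())) ** 2
    power = np.sort(spectrum[1:])[::-1]
    cumulative = np.cumsum(power) / power.sum()
    # First index whose cumulative share reaches the fraction, converted to a count.
    return int(np.searchsorted(cumulative, power_fraction) + 1)


def periodic_fitness(k: int, k_star: int) -> float:
    """F = (K* - 2K + 1) / (K* - 1): equals 1 when K = 1 and -1 when K = K*."""
    return (k_star - 2 * k + 1) / (k_star - 1)
```

A pure sinusoid with an integer number of cycles in the window gives K=1 and therefore F=1, while a white-noise sequence spreads its power over many components and gives a much larger K.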
[0076] In one embodiment, once x.sub.t is determined as periodic,
its period length may be further estimated and corresponding
profile may be built for it. Frequency analysis may still be
employed to infer x.sub.t's period length T, but the result may not be
accurate due to the low time resolution in the frequency domain.
Instead, the auto-correlation function (ACF) of x.sub.t may be
computed. The ACF may describe the correlation of x.sub.t with its
lagged observations for all different lags .tau.. If x.sub.t is
periodic with length T, x.sub.t may be maximally correlated with
its previous cycles (e.g., x.sub.t-T, x.sub.t-2T). That is, there
may exist peaks at positions .tau.=kT, k=1, 2, . . . , in the ACF
curve, where k indexes successive cycles, and .tau.
represents positions of possible peaks. A robust method (e.g.,
algorithm) may be employed to extract those peaks, and the T value
may be obtained by estimating the greatest common divisor (GCD) of
the peak positions.
In one embodiment, once the period length T is obtained, the
harmonic model may be employed to describe the shape of the period:
x.sub.t=.SIGMA..sub.j=1.sup.J{a.sub.j sin(j.omega.t)+b.sub.j
cos(j.omega.t)}+c, which may be represented as the composition of a
set of harmonic waves with frequencies .omega., 2.omega., . . . ,
N.omega., where x.sub.t is a time series model, and J is a number
of harmonic components. The fundamental frequency may be
represented as .omega.=2.pi./T, and there may be N=.left
brkt-top.T/2.right brkt-bot. possible frequencies due to the
Nyquist sampling theorem. Given the measurements of x.sub.t as the
training data, the least squares regression may be employed to
estimate model parameters (e.g., .theta.=[a.sub.1, b.sub.1, . . . ,
a.sub.J, b.sub.J, c]). The residuals of x.sub.t may be obtained
from the harmonic model (e.g., r.sub.t=x.sub.t-{circumflex over
(x)}.sub.t) where {circumflex over (x)}.sub.t may be the estimated
value from x.sub.t=.SIGMA..sub.j=1.sup.J{a.sub.j
sin(j.omega.t)+b.sub.j cos(j.omega.t)}+c. The threshold .DELTA. may
then be defined as the maximum r.sub.t value obtained from model
training. During the online monitoring, given a new x.sub.t value
at time t, its estimated value may be obtained based on the
harmonic model with parameters .theta., and the residual
r.sub.t may be generated. The status of x.sub.t may then be inferred
by comparing r.sub.t with the threshold .DELTA..
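The harmonic model fit may be sketched with an ordinary least squares solve, as below. The sketch assumes the training samples are taken at integer times t=0, 1, . . . , and it takes the threshold as the maximum absolute residual, in line with the residual definition used for online monitoring. The function names are hypothetical.

```python
import numpy as np


def harmonic_design(t: np.ndarray, period: float, n_harmonics: int) -> np.ndarray:
    """Columns [sin(w t), cos(w t), ..., sin(J w t), cos(J w t), 1] with w = 2 pi / T."""
    omega = 2.0 * np.pi / period
    cols = []
    for j in range(1, n_harmonics + 1):
        cols.append(np.sin(j * omega * t))
        cols.append(np.cos(j * omega * t))
    cols.append(np.ones_like(t))
    return np.column_stack(cols)


def fit_harmonic(x, period: float, n_harmonics: int):
    """Return theta = [a_1, b_1, ..., a_J, b_J, c] and the threshold delta."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x), dtype=float)
    A = harmonic_design(t, period, n_harmonics)
    theta, *_ = np.linalg.lstsq(A, x, rcond=None)
    # Threshold: largest absolute training residual.
    delta = float(np.max(np.abs(x - A @ theta)))
    return theta, delta


def predict_harmonic(theta: np.ndarray, t, period: float) -> np.ndarray:
    """Estimated value(s) of the series at time(s) t under the fitted model."""
    n_harmonics = (len(theta) - 1) // 2
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return harmonic_design(t, period, n_harmonics) @ theta
```

During monitoring, the residual between a new observation and `predict_harmonic` at the same time index may be compared with the returned threshold.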
[0077] In one embodiment, a pairwise time series model is employed
according to the present principles. While previously discussed
models deal with data properties in a single time series, there
also may exist correlations among a group of time series due to the
dependencies between physical processes in the system. The pairwise
time series model may focus on the dependencies between one or more
pairs of time series (e.g., x.sub.t and y.sub.t). There are several
ways to model their dependencies, and for illustrative purposes, an
engine employing an AutoRegressive model with eXternal input (ARX)
will be discussed to describe the relationship because ARX may
capture dependencies of both current and previous values of two
time series. That is, the current value (e.g., y.sub.t) may be
correlated with not only the current value of x.sub.t but also the
past values of both x.sub.t and y.sub.t. The maximum lags of both
time series involved may be represented as [n, m, k], and the
maximum lags may be called the order of ARX.
[0078] As opposed to the models previously discussed above, the
pairwise time series model may search the correlation between each
pair of time series in the training process, which may require many
computations for systems with a large number of attributes. As
such, a technique that utilizes some heuristics to speed up the
correlation search may be employed. For example, if it is found
that (x.sub.t,y.sub.t) and (y.sub.t,z.sub.t) are correlated pairs,
it may be directly inferred that x.sub.t and z.sub.t are also
correlated.
[0079] In one embodiment according to the present principles, a
multivariate time series model may be employed. In addition to the
correlation between pairs of time series discussed above, the
correlation among a group of time series may also be modeled
according to the present principles. That is, the values of time
series y.sub.t may depend on multiple other series (e.g.,
x.sub.t.sup.(1), x.sub.t.sup.(2), . . . ) rather than just one. The
multivariate regression may be employed to describe the
relationship:
y.sub.t=w.sub.0+w.sub.1x.sub.t.sup.(1)+w.sub.2x.sub.t.sup.(2)+ . . . +w.sub.gx.sub.t.sup.(g),
where y.sub.t and each x.sub.t.sup.(i) represent a time series, and
w.sub.0, . . . , w.sub.g represent the coefficients of the
dependency between y.sub.t and the series x.sub.t.sup.(i). The
subset of time series x.sub.t.sup.(1), . . . , x.sub.t.sup.(g) that
is actually involved in the above equation is generally not known,
and it is not necessary to know it in advance. Without domain
knowledge, all other attributes except y.sub.t may be candidates.
In the model, all the possible variables may be placed on the right
side of the above equation, and then the model coefficients (e.g.,
w=[w.sub.0, w.sub.1, . . . , w.sub.g]) may be regularized to select
the relevant coefficients. That is, rather than using the ordinary
least squares method, the coefficients may be estimated by
minimizing a regularized objective function.
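For illustrative purposes only, one way to regularize the
coefficients so that irrelevant candidates are dropped may be
sketched as follows. The particular objective is an assumption: the
sketch uses an L1 (lasso) penalty minimized by coordinate descent,
which is one common choice for selecting variables, and the names
soft_threshold and lasso, the penalty weight lam, and the candidate
series are hypothetical.

```python
import math

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso(X, y, lam, sweeps=50):
    """Coordinate descent for (1/2N)*sum((y - w0 - X w)^2) + lam*sum(|w_j|)."""
    n, g = len(X), len(X[0])
    w0, w = sum(y) / n, [0.0] * g
    for _ in range(sweeps):
        for j in range(g):
            # Correlation of candidate j with the residual that ignores j.
            rho = 0.0
            for i in range(n):
                others = sum(w[q] * X[i][q] for q in range(g) if q != j)
                rho += X[i][j] * (y[i] - w0 - others)
            rho /= n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
        # The intercept w0 is not penalized.
        w0 = sum(y[i] - sum(w[q] * X[i][q] for q in range(g))
                 for i in range(n)) / n
    return w0, w

# Hypothetical candidates: y depends on series 1 and 3 but not on series 2.
N = 100
X = [[math.sin(2 * math.pi * t / N),
      math.sin(4 * math.pi * t / N),
      math.cos(2 * math.pi * t / N)] for t in range(N)]
y = [1.0 + 2.0 * row[0] - 3.0 * row[2] for row in X]

w0, w = lasso(X, y, lam=0.05)
# Candidates with a nonzero coefficient are treated as relevant.
selected = [j + 1 for j, wj in enumerate(w) if wj != 0.0]
```

In this sketch the penalty shrinks the retained coefficients
slightly toward zero and sets the coefficient of the unrelated
candidate to zero, so that only the relevant series remain in
selected.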
[0080] In the training process, the multivariate model may treat
every attribute as y.sub.t and may apply the above optimization to
discover its dependent attributes. Based on the parameters of the
discovered relationship (e.g., .theta.=[w.sub.0,
w.sub.1, . . . , w.sub.g]), the estimation error
r.sub.t may be obtained. The fitness may then be determined, and
for those relationships with high fitness (e.g., F>0.6), the
threshold .DELTA. may be computed as the maximum of r.sub.t over
the training data. During monitoring, the error r.sub.t may be
computed for new observations and compared with the
threshold .DELTA. to check the status. As in the pairwise
time series model, past observations of each time series may be
added to the regression.
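For illustrative purposes only, the fitness check and threshold
computation described above may be sketched as follows. The fitness
formula is an assumption made for this sketch (one minus the ratio
of the root of the squared estimation error to the root of the
squared deviation of y.sub.t from its mean), as is the use of the
absolute error as r.sub.t; the names profile and check and the
sample values are hypothetical.

```python
import math

def profile(y, y_hat, min_fitness=0.6):
    """Return (fitness, threshold); threshold is None for a poor fit."""
    residuals = [abs(a - b) for a, b in zip(y, y_hat)]
    mean_y = sum(y) / len(y)
    sse = sum(r * r for r in residuals)
    sst = sum((a - mean_y) ** 2 for a in y)
    # Assumed fitness score: 1 for a perfect fit, 0 for a mean-only fit.
    fitness = 1.0 - math.sqrt(sse / sst) if sst > 0 else 0.0
    # Only relationships with high fitness are kept for monitoring.
    delta = max(residuals) if fitness > min_fitness else None
    return fitness, delta

def check(y_new, y_hat_new, delta):
    """Compare the estimation error of a new observation with delta."""
    return "anomaly" if abs(y_new - y_hat_new) > delta else "normal"

# Hypothetical training observations and two candidate model estimates.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
good_fit, good_delta = profile(y, [1.1, 1.9, 3.0, 4.1, 4.9])
poor_fit, poor_delta = profile(y, [3.0, 3.0, 3.0, 3.0, 3.0])
```

In this sketch, the first relationship is retained with a threshold
taken from its training residuals, while the second is discarded
because its fitness does not exceed 0.6.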
[0081] In one embodiment, the multi-layer time series analytic
engine has been evaluated on several physical systems, including a
power plant system, an automobile system, and a glass manufacturing
plant, and has advantageously exhibited a high level of
effectiveness in system modeling and anomaly detection applications.
One important application of the analytic engine is automatic
system monitoring. The system may be found to be in a healthy
condition if all the monitoring data fit the estimated models.
Anomaly events may be detected when one or more of the models
are broken by the testing data. In the training stage, a set of
different models may be learned and employed for
prediction.
[0082] As discussed above, the data analytic engine according to
the present principles may be employed for complex physical system
self-management. Based on the strong regularity and high diversity
observed in the data of physical systems, the analytic
engine may profile the system monitoring data with an ensemble of
models, each of which may capture a specific data property.
The extracted data profiles may be employed to facilitate a
plurality of management tasks, such as system status monitoring and
online anomaly detection.
[0083] Embodiments described herein may be entirely hardware,
entirely software, or may include both hardware and software
elements.
In a preferred embodiment, the present invention is implemented in
software, which includes but is not limited to firmware, resident
software, microcode, etc.
[0084] Embodiments may include a computer program product
accessible from a computer-usable or computer-readable medium
providing program code for use by or in connection with a computer
or any instruction execution system. A computer-usable or
computer-readable medium may include any apparatus that stores,
communicates, propagates, or transports the program for use by or
in connection with the instruction execution system, apparatus, or
device. The medium can be a magnetic, optical, electronic,
electromagnetic, infrared, or semiconductor system (or apparatus or
device) or a propagation medium. The medium may include a
computer-readable storage medium such as a semiconductor or solid
state memory, magnetic tape, a removable computer diskette, a
random access memory (RAM), a read-only memory (ROM), a rigid
magnetic disk and an optical disk, etc.
[0085] A data processing system suitable for storing and/or
executing program code may include at least one processor coupled
directly or indirectly to memory elements through a system bus. The
memory elements can include local memory employed during actual
execution of the program code, bulk storage, and cache memories
which provide temporary storage of at least some program code to
reduce the number of times code is retrieved from bulk storage
during execution. Input/output or I/O devices (including but not
limited to keyboards, displays, pointing devices, etc.) may be
coupled to the system either directly or through intervening I/O
controllers.
[0086] Network adapters may also be coupled to the system to enable
the data processing system to become coupled to other data
processing systems or remote printers or storage devices through
intervening private or public networks. Modems, cable modems, and
Ethernet cards are just a few of the currently available types of
network adapters.
[0087] Having described preferred embodiments of a system and
method for modeling, analyzing, and managing time series dynamics
in complex physical systems (which are intended to be illustrative
and not limiting), it is noted that modifications and variations
can be made by persons skilled in the art in light of the above
teachings. It is therefore to be understood that changes may be
made in the particular embodiments disclosed which are within the
scope of the invention as outlined by the appended claims. Having
thus described aspects of the invention, with the details and
particularity required by the patent laws, what is claimed and
desired protected by Letters Patent is set forth in the appended
claims.
* * * * *