U.S. patent application number 17/110276, for adaptive log analysis, was filed with the patent office on 2020-12-02 and published on 2022-06-02.
The applicant listed for this patent is INTERNATIONAL BUSINESS MACHINES CORPORATION. Invention is credited to Jae-Wook Ahn, Raghu Kiran Ganti, Shreeranjani Srirangamsridharan, Mudhakar Srivatsa.
Application Number | 17/110276 |
Publication Number | 20220171670 |
Family ID | 1000005292853 |
United States Patent Application 20220171670, Kind Code A1
Srivatsa; Mudhakar; et al.
June 2, 2022
ADAPTIVE LOG ANALYSIS
Abstract
A method for obtaining information and status about a monitored
system by adaptively analyzing log messages is provided. A log
analyzer receives log messages generated by a monitored system. The
log analyzer identifies static and variable portions in the
received log messages. The log analyzer generates a template based
on the identified static and variable portions of the received log
messages. The log analyzer computes a metric for the generated
template based on a number of log messages that fall within the
template. The log analyzer reports a status in the monitored system
based on the computed metric.
Inventors: Srivatsa; Mudhakar (White Plains, NY); Ganti; Raghu Kiran (White Plains, NY); Ahn; Jae-Wook (Nanuet, NY); Srirangamsridharan; Shreeranjani (White Plains, NY)
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION, ARMONK, NY, US
Filed: December 2, 2020
Current U.S. Class: 1/1
Current CPC Class: G06F 16/1865 20190101; G06F 40/186 20200101; H04L 41/069 20130101; G06F 11/0709 20130101; G06F 40/242 20200101; G06F 11/0772 20130101
International Class: G06F 11/07 20060101 G06F011/07; G06F 16/18 20060101 G06F016/18; G06F 40/186 20060101 G06F040/186; G06F 40/242 20060101 G06F040/242; H04L 12/24 20060101 H04L012/24
Claims
1. A computing device comprising: a processor; and a storage device
storing a set of instructions, wherein an execution of the set of
instructions by the processor configures the computing device to
perform acts comprising: receiving one or more log messages
generated by a monitored system; identifying static and variable
portions in the received log messages; generating a template based
on the identified static and variable portions of the received one
or more log messages; computing a metric for the generated template
based on a number of log messages of the one or more log messages
that fall within the template; and reporting a status in the
monitored system based on the computed metric.
2. The computing device of claim 1, wherein the static and variable
portions of the one or more log messages are identified by using a
dictionary of meaningful words that are identified based on
statistics of words appearing in log messages.
3. The computing device of claim 1, wherein the static and variable
portions of the one or more log messages are identified by a list
of words that co-occur in the log messages.
4. The computing device of claim 1, wherein an execution of the set
of instructions by the processor further configures the computing
device to perform an act comprising: determining a time frame of
occurrence for the reported status based on a time stamp of a log
message that falls within the template.
5. The computing device of claim 1, wherein the status of the
monitored system is reported based on a template having a highest
metric among a plurality of generated templates.
6. The computing device of claim 1, wherein the template is
incrementally updated based on one or more subsequently received
log messages.
7. The computing device of claim 1, wherein an execution of the set
of instructions by the processor further configures the computing
device to perform an act comprising: identifying a set of related
log messages and identifying a set of templates that the set of
related log messages fall within, as a template model.
8. The computing device of claim 7, wherein an execution of the set
of instructions by the processor further configures the computing
device to perform acts comprising: adding a particular template to
the template model when one or more log messages of the set of
related log messages fall within the particular template; and
removing a given particular template from the template model when
no log messages of the set of related log messages fall within the
given particular template.
9. The computing device of claim 7, wherein an execution of the set
of instructions by the processor further configures the computing
device to perform acts comprising: determining a time frame of
occurrence for the reported status based on one or more time stamps
in incoming log messages that fall within a template model that is
related to the reported status.
10. The computing device of claim 1, wherein the metric is computed
based on a ratio of number of log messages that fall within the
template with respect to total number of log messages.
11. A computer program product comprising: one or more
non-transitory computer-readable storage devices and program
instructions stored on at least one of the one or more
non-transitory storage devices, the program instructions executable
by a processor, the program instructions comprising sets of
instructions for: receiving one or more log messages generated by a
monitored system; identifying static and variable portions in the
received log messages; generating a template based on the
identified static and variable portions of the received one or more
log messages; computing a metric for the generated template based
on a number of log messages of the one or more log messages that
fall within the template; and reporting a status in the monitored
system based on the computed metric.
12. A computer-implemented method comprising: receiving log
messages generated by a monitored system; identifying static and
variable portions in the received log messages; generating a
template based on the identified static and variable portions of
the received log messages; computing a metric for the generated
template based on a number of log messages of the one or more log
messages that fall within the template; and reporting a status in
the monitored system based on the computed metric.
13. The computer-implemented method of claim 12, wherein the static
and variable portions of the one or more log messages are
identified by using a dictionary of meaningful words that are
identified based on statistics of words appearing in log
messages.
14. The computer-implemented method of claim 12, wherein the static
and variable portions of the one or more log messages are
identified by a list of words that co-occur in the log
messages.
15. The computer-implemented method of claim 12, further comprising
determining a time frame of occurrence for the reported status
based on a time stamp of a log message that falls within the
template.
16. The computer-implemented method of claim 12, wherein the
template is incrementally updated based on one or more subsequently
received log messages.
17. The computer-implemented method of claim 12, further
comprising: identifying a set of related log messages; and
identifying a set of templates that the set of related log messages
fall within as a template model.
18. The computer-implemented method of claim 17, further comprising
adding a particular template to the template model when one or more
log messages of the set of related log messages fall within the
particular template.
19. The computer-implemented method of claim 17, further comprising
removing a particular template from the template model when no log
messages of the set of related log messages fall within the
particular template.
20. The computer-implemented method of claim 12, wherein the metric
is computed based on a ratio of a number of log messages that fall
within the template with respect to a total number of log messages.
Description
BACKGROUND
Technical Field
[0001] The present disclosure generally relates to processing and
analysis of log messages generated by computing or network
systems.
Description of the Related Arts
[0002] In computing, a log is a message or file that records either
events that occur in an operating system or in other running software, or
messages between different users of communications software. For
example, a transaction log is a file of the communication between a
system and the users of that system, or a data collection method
that automatically captures the type, content, or time of
transactions made by a user with that system. For Web searching, a
transaction log is an electronic record of interactions that have
occurred during a searching episode between a Web search engine and
users searching for information on that Web search engine. A
logging system enables a dedicated, standardized subsystem to
generate, filter, record, and analyze log messages. Analyzing the
data stored in transaction logs may provide valuable insight into
understanding the system that generates the logs.
[0003] In managing modern computing systems, it is of great interest
to have real-time information about errors so that an administrator
may determine where in the system errors are occurring and which
errors are the most dominant. In a system that has hierarchical
structures such as clusters, nodes, pods, and applications, it is
useful to identify regions in the system in which dominant errors
are occurring.
SUMMARY
[0004] Some embodiments of the disclosure provide a log analyzer
for obtaining information and status about a monitored system by
adaptively analyzing log messages. The log analyzer receives log
messages generated by a monitored system. The log analyzer
identifies static and variable portions in the received log
messages. The log analyzer generates a template based on the
identified static and variable portions of the received log
messages. The log analyzer computes a metric for the generated
template based on a number of log messages that fall within the
template. The log analyzer reports a status in the monitored system
based on the computed metric.
[0005] In some embodiments, the static and variable portions of the
log messages are identified by using a dictionary of meaningful
words that are identified based on statistics of words appearing in
log messages. The statistics of the words appearing in log messages
may be used to identify meaningful words to include in the
dictionary and meaningless (or low entropy) words to exclude from
the dictionary. In some embodiments, the static and variable
portions of the log messages are identified by a list of words that
co-occur in the log messages. In some embodiments, the log analyzer
allows a user or a subject matter expert (SME) to update the dictionary.
[0006] In some embodiments, a template is incrementally updated
based on subsequently received log messages of a same domain. In
some embodiments, existing templates are used as a seed to create
new templates. In some embodiments, the log analyzer groups a set
of related log messages and identifies a set of templates that the
set of related log messages fall within as a template model. The
log analyzer may add a particular template to the template model
when one or more log messages of the set of related log messages
fall within the particular template. The log analyzer may also
remove a particular template from the template model when no log
message of the set of related log messages falls within the
particular template.
[0007] In some embodiments, the log analyzer computes a metric for
each template in its template library with regard to the incoming
log messages. In some embodiments, the metric is computed based on
a ratio of the number of log messages that fall within the template
to the total number of log messages.
[0008] The log analyzer may also determine a time frame of
occurrence for the reported status based on a time stamp of a log
message that falls within the template. In some embodiments, the
reported status of the monitored system is identified based on a
template having a highest metric among multiple templates in the
template library. The log analyzer may determine a time frame of
occurrence for the reported status based on one or more timestamps
in the incoming log messages that fall within a template model that
is related to the reported status.
[0009] By inferring templates from log messages, the log analyzer
is able to automatically detect issues arising from a monitored
system that generates the log messages. The log analyzer measures
the performance metrics of the templates with regard to the log
messages in order to further refine the templates. The log analyzer
therefore improves the efficiency of system monitoring.
[0010] The preceding Summary is intended to serve as a brief
introduction to some embodiments of the disclosure. It is not meant
to be an introduction or overview of all inventive subject matter
disclosed in this document. The Detailed Description that follows
and the Drawings that are referred to in the Detailed Description
will further describe the embodiments described in the Summary as
well as other embodiments. Accordingly, to understand all the
embodiments described by this document, a Summary, Detailed
Description and the Drawings are provided. Moreover, the claimed
subject matter is not to be limited by the illustrative details in
the Summary, Detailed Description, and the Drawings, but rather is
to be defined by the appended claims, because the claimed subject
matter can be embodied in other specific forms without departing
from the spirit of the subject matter.
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] The drawings are of illustrative embodiments. They do not
illustrate all embodiments. Other embodiments may be used in
addition or instead. Details that may be apparent or unnecessary
may be omitted to save space or for more effective illustration.
Some embodiments may be practiced with additional components or
steps and/or without all of the components or steps that are
illustrated. When the same numeral appears in different drawings,
it refers to the same or like components or steps.
[0012] FIG. 1 conceptually illustrates a log analyzer that analyzes
log messages generated by a monitored system to identify issues in
the system, consistent with an illustrative embodiment.
[0013] FIG. 2 illustrates corresponding example log messages and
example templates.
[0014] FIG. 3a provides an example dictionary of stop words that
are identified by the log analyzer.
[0015] FIG. 3b provides an example list of co-occurring words or
tokens identified by the log analyzer.
[0016] FIG. 4 illustrates example templates and the corresponding
number of log messages that fall within the template.
[0017] FIG. 5 conceptually illustrates an example report generated
by the log analyzer.
[0018] FIG. 6 conceptually illustrates a process for extracting
templates from log messages and for using metrics of the extracted
templates to report the status of a monitored system, consistent
with an illustrative embodiment.
[0019] FIG. 7 shows a block diagram of the components of a data
processing system in accordance with an illustrative embodiment of
the present disclosure.
[0020] FIG. 8 illustrates an example cloud-computing
environment.
[0021] FIG. 9 illustrates a set of functional abstraction layers
provided by a cloud-computing environment, consistent with an
illustrative embodiment.
DETAILED DESCRIPTION
[0022] In the following detailed description, numerous specific
details are set forth by way of examples in order to provide a
thorough understanding of the relevant teachings. However, it
should be apparent that the present teachings may be practiced
without such details. In other instances, well-known methods,
procedures, components, and/or circuitry have been described at a
relatively high-level, without detail, in order to avoid
unnecessarily obscuring aspects of the present teachings.
[0023] Some embodiments of the disclosure provide a log analyzer
for obtaining information and status about a system based on log
messages generated by the system. As log messages tend to be mostly
repetitive lines with missing variables, the log analyzer extracts
patterns from the content of the log messages to explain the issues
at hand, such as where in the system errors are occurring and what
types of errors are occurring. The log analyzer is capable of
learning by inferring new templates based on the content of the
received log messages and does not presume any specific known
format.
[0024] In some embodiments, the log analyzer performs template
extraction by identifying static and variable portions of a
heterogeneous collection of timestamped log data. The log analyzer
treats log data as a multi-variate timeseries, recursively learning
templates by obtaining input to create custom variables. The
identified static and variable portions are used to form new
templates. The learned templates can in turn be viewed as a time
series that shows events and issues occurring in the system at
various time intervals. In some embodiments, the log analyzer
incrementally learns new templates through seeding of an original
model template. The log analyzer determines a ratio or percentage
of log messages for each template as a metric for the template. The
log analyzer also captures metrics of newly appeared templates as
well as metrics of disappeared templates.
[0025] FIG. 1 conceptually illustrates a log analyzer 100 that
analyzes log messages generated by a monitored system 110 to
identify issues in the system. An example of such a system may be a
passenger management system that includes component systems for
billing, notification, payments, driver management, trip
management, etc. The monitored system 110 may also include various
network or communications interface components. As illustrated, the
system 110 generates log messages 120, which may reflect various
real-time statuses, events, errors, or other types of issues
occurring at different component systems or different hierarchical
levels of the system 110. The log messages 120 generated may
exhibit different patterns that correspond to different issues
being reported by different components or hierarchical levels of
the system 110.
[0026] The log analyzer 100 receives the generated log messages
120, and generates various templates based on the different
patterns found in the log messages 120. As different patterns in
log messages may correspond to different issues occurring in the
system 110 at different times, the log analyzer 100 may detect and
report those issues based on metrics related to individual
templates (e.g., number of messages having patterns that fit a
particular template).
[0027] As illustrated, the log analyzer 100 includes a time series
slicing module 130, a template generation module 132, a template
matching module 134, a diagnostic module 136, and a user interface
module 138. In some embodiments, a computing device implements the
modules 130-138. In some embodiments, the modules 130-138 are
modules of software instructions being executed by one or more
processing units (e.g., a processor) of the computing device. In
some embodiments, the modules 130-138 are modules of hardware
circuits implemented by one or more integrated circuits (ICs) of an
electronic apparatus. Though the modules 130-138 are
illustrated as separate modules, some of the modules can be
combined into a single module. An example computing device 700 that
may implement the log analyzer 100 will be described by reference
to FIG. 7 below.
[0028] The time series slicing module 130 assigns log messages to
different time slices or time frames. Because templates are extracted
from log messages, time slicing allows the log analyzer 100 to associate
templates extracted from a set of log messages with specific
time frames. In some embodiments, time slicing also allows the log analyzer
100 to report the time at which status or events of the system 110
are occurring. In some embodiments, the log messages 120 are time
sliced based on time stamps associated with the log messages. In
some embodiments, the log messages are time sliced based on an
anchoring message. In some embodiments, the time series slicing
module 130 assigns a burst of log messages into a same time slice.
In some embodiments, the log messages being time sliced into a same
time frame are related messages of a same domain (e.g., from a same
region of the monitored system 110).
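The two slicing strategies mentioned above (fixed windows over time stamps, and placing a burst of messages into one slice) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function names `slice_fixed` and `slice_bursts`, the window width, and the gap that ends a burst are hypothetical, and slicing on an anchoring message is not shown.

```python
from datetime import datetime, timedelta


def slice_fixed(messages, width):
    """Group (timestamp, text) pairs into fixed-width windows keyed by window start."""
    slices = {}
    for ts, text in messages:
        # Floor the offset from datetime.min (a midnight) to a whole number of windows.
        start = datetime.min + ((ts - datetime.min) // width) * width
        slices.setdefault(start, []).append(text)
    return slices


def slice_bursts(messages, max_gap):
    """Open a new slice whenever the gap since the previous message exceeds max_gap."""
    slices = []
    last = None
    for ts, text in sorted(messages):
        if last is None or ts - last > max_gap:
            slices.append([])  # the previous burst has ended
        slices[-1].append(text)
        last = ts
    return slices


msgs = [
    (datetime(2020, 12, 2, 10, 0, 5), "Retry count: 3"),
    (datetime(2020, 12, 2, 10, 0, 20), "Retry count: 4"),
    (datetime(2020, 12, 2, 10, 7, 0), "user bob logged in"),
]
print(slice_fixed(msgs, timedelta(minutes=5)))
print(slice_bursts(msgs, timedelta(seconds=30)))
```

With 5-minute windows the first two messages share the 10:00 slice and the third lands in the 10:05 slice; with a 30-second burst gap the same split occurs because the third message arrives more than six minutes after the second.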
[0029] The template generation module 132 extracts templates by
analyzing patterns in the log messages or in a given dataset. Unlike
documents typically written by humans, log messages are generated
by machines through print-style statements. These statements
typically have variables filled in at the time the logs are
generated. For example, consider the message "Failed to index
IndexableDocument with id index,None))", in which the string
web_crawl_566612-ffff-abcd-8888-00099999 is potentially an
identifier for the document being indexed. The goal of template
extraction is then to identify the variables in each of these log
messages.
[0030] In some embodiments, each log message is parsed into tokens
or component words. Standard tokenization or parsing techniques based
on specific symbols or words may not work, since the log messages may
be laced with JSON, parentheses, and other software-related event
monitoring instrumentation. To parse the log messages, the template
generation module 132 is bootstrapped with regular expression (regex)
processing capabilities. For example, punctuation at the beginning and
end of a string is detached and added as separate tokens, so that
"www.abc.com:8080" enclosed in punctuation is processed as three
tokens: the leading punctuation mark, "www.abc.com:8080", and the
trailing punctuation mark. The template generation module 132 may also
perform regular-expression-based type-marking; for example,
"www.abc.com:8080" is type-marked as "INTERNET_ADDR_WITH_PORT". The
template generation module 132 may also perform sequence regular
expression replacement, such as replacing the "1" in "1 sec" with
"NUMBER" and the "sec" with "TIME". The template generation module 132
may also receive user-defined regular expressions from the user
interface module 138 and be configured to perform the corresponding
parsing operations.
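A minimal sketch of regex-based tokenization and type-marking, assuming whitespace-separated words. The rule list `TYPE_RULES`, the punctuation sets in `EDGE_PUNCT`, the type names IP_ADDR and EMAIL, and the function names are hypothetical; only INTERNET_ADDR_WITH_PORT, NUMBER, and TIME come from the text above. A user-defined rule can be supplied by passing an extended rule list to `type_mark`.

```python
import re

# Hypothetical type-marking rules, tried in order; the disclosure does not list its rules.
TYPE_RULES = [
    ("INTERNET_ADDR_WITH_PORT", re.compile(r"[\w.-]+\.[A-Za-z]{2,}:\d+")),
    ("IP_ADDR", re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")),
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("NUMBER", re.compile(r"\d+(?:\.\d+)?")),
    ("TIME", re.compile(r"ms|sec|min|h")),
]
# Splits a word into leading punctuation, core text, and trailing punctuation.
EDGE_PUNCT = re.compile(r"([\"'(\[{<]*)(.*?)([\"')\]}>.,;:]*)")


def tokenize(message):
    """Split on whitespace, then detach leading/trailing punctuation as separate tokens."""
    tokens = []
    for word in message.split():
        lead, core, trail = EDGE_PUNCT.fullmatch(word).groups()
        tokens.extend(lead)  # each leading punctuation character is its own token
        if core:
            tokens.append(core)
        tokens.extend(trail)  # likewise for trailing punctuation
    return tokens


def type_mark(tokens, rules=TYPE_RULES):
    """Replace each token that fully matches a rule with that rule's type name."""
    marked = []
    for tok in tokens:
        for name, pattern in rules:
            if pattern.fullmatch(tok):
                marked.append(name)
                break
        else:
            marked.append(tok)  # no rule matched: keep the token as-is
    return marked


print(type_mark(tokenize("connect (www.abc.com:8080) timed out after 1 sec")))
```

For the sample message this prints `['connect', '(', 'INTERNET_ADDR_WITH_PORT', ')', 'timed', 'out', 'after', 'NUMBER', 'TIME']`. Putting a user rule first, for instance `[("TXN_ID", re.compile(r"TransactionID-\w+"))] + TYPE_RULES` (the name TXN_ID is illustrative), lets it take precedence over the built-in rules.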
[0031] FIG. 2 illustrates corresponding example log messages
211-214 and example templates 221-224. The example templates
221-224 are generated based on (or extracted from) the patterns of
the log messages 211-214, respectively. Various portions of the
messages 211-214 are identified as variable or static by the log
analyzer 100. For example, in the message 211, "error" and "Found
log configuration_id:" are identified as static portions of the
message 211 and are reproduced in the extracted template 221. On
the other hand, the string "TransactionID-AE12345678" and the
string "20191016-011111-777-szM1ABab" are identified as variable
portions of the message 211 and are replaced by a notation
"<*>" in the extracted template 221.
[0032] The template generation module 132 may also identify a
variable portion of the log messages 120 as belonging to a
particular type. The log analyzer 100 may type-mark that variable
portion based on certain known structures in a template. For
example, for the message 214, the log analyzer 100 type-marks the
variable portion of the message between the static portions
"httpExecutor Retry count:" and "Reach maximum allowed retries" as
"NUMBER" in the template 224. In some embodiments, the template
generation module 132 is configured to identify multiple different
types of variables, such as email addresses, IP addresses, time
stamps, etc.
[0033] In some embodiments, the template generation module 132 also
performs dictionary induction and filtering. Specifically, the
template generation module 132 compiles a dictionary 140, which
includes a list of words or tokens that are known to be meaningful
in message logs. Words or tokens that are determined to be
meaningless or low entropy are excluded from the dictionary 140.
The meaningful words are also referred to as stop words in some
embodiments. The template generation module 132 may also receive
user input from the user interface 138 to add or remove stop words
from the dictionary 140. The template generation module 132
extracts templates from the log messages based on the static and
variable words identified from the log messages 120, and by using
the content of the dictionary 140.
[0034] In some embodiments, the message parser determines which
words are meaningful and which words are meaningless based on
statistics, e.g., the number of times that the log analyzer 100
encounters each word in the received log messages. FIG. 3a provides
an example dictionary 310 of stop words that are identified by the
log analyzer 100. Each identified stop word is accompanied by its
corresponding statistics. In some embodiments, the statistics of a
word are used to establish the word's meaningfulness. In some
embodiments, the message parser may also determine which sets or
pairs of words tend to co-occur. These sets or pairs of words or
tokens appear together or provide context for each other. FIG. 3b
provides an example list 320 of co-occurring words or tokens
identified by the log analyzer. In some embodiments, the template
generation module 132 may index sentences and type-mark
co-occurring pairs. For example, the string "ABC:www.abc.com" may
be type-marked as <0,"ABC">,
<1,INTERNET_ADDR_WITH_PORT>.
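The word statistics and co-occurrence lists described above might be computed as in the following sketch. It assumes, as one simple choice of statistic, that a token's document frequency (the number of messages that contain it) decides whether the token enters the dictionary; the cutoff `min_count` and the function names are hypothetical, and the disclosure does not fix a particular statistic.

```python
from collections import Counter
from itertools import combinations


def word_statistics(messages):
    """Count, for each token, how many messages contain it (document frequency)."""
    return Counter(tok for m in messages for tok in set(m.split()))


def induce_dictionary(messages, min_count=2):
    """Keep tokens seen in at least min_count messages, along with their counts."""
    stats = word_statistics(messages)
    return {tok: c for tok, c in stats.items() if c >= min_count}


def cooccurring_pairs(messages, min_count=2):
    """Count unordered token pairs that appear together in the same message."""
    pairs = Counter()
    for m in messages:
        # Sorting makes each pair's key order-independent.
        pairs.update(combinations(sorted(set(m.split())), 2))
    return {p: c for p, c in pairs.items() if c >= min_count}


logs = [
    "httpExecutor Retry count: 3 Reach maximum allowed retries",
    "httpExecutor Retry count: 7 Reach maximum allowed retries",
    "disk full on node-4",
]
print(induce_dictionary(logs))
print(cooccurring_pairs(logs)[("Reach", "Retry")])
```

The retry counts "3" and "7" each occur in a single message, so they are excluded from both the dictionary and the co-occurrence list, while the surrounding static words are kept.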
[0035] In some embodiments, the template generation module 132 uses
a finite state machine with a single counter for variable length
parsing and for processing conditional expressions such as if-else,
for loops, open and close parenthesis or brackets, etc. In some
embodiments, the template generation module 132 may traverse the
hierarchy of the system 110 when processing a log message, and the
counter-based mechanism is used to track such traversal.
[0036] In some embodiments, the template generation module 132
performs a tokenization step to identify tokens in the content of
log messages. In some embodiments, an extracted template may
include regular expressions with a single counter to represent
tokens. A regular expression with a counter may include two regular
expressions, pos_regex and neg_regex: when pos_regex is encountered,
the counter is incremented by one, and when neg_regex is encountered,
the counter is decremented by one. In some
embodiments, an extracted template may include a tree of named keys
to represent a JSON string or XML string.
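The single-counter mechanism driven by pos_regex and neg_regex can be illustrated with a short sketch that collapses a bracketed span of any length into one placeholder token. The default bracket patterns, the `<*>` placeholder, and the function names are assumptions made for the example.

```python
import re


def match_balanced(tokens, start, pos_regex=r"[\[{(]", neg_regex=r"[\]})]"):
    """Return the exclusive end index of the balanced span opened at tokens[start], or None.

    A single counter is incremented on tokens matching pos_regex and decremented on
    tokens matching neg_regex; the span closes when the counter returns to zero.
    """
    pos, neg = re.compile(pos_regex), re.compile(neg_regex)
    if start >= len(tokens) or not pos.fullmatch(tokens[start]):
        return None
    counter = 0
    for k in range(start, len(tokens)):
        if pos.fullmatch(tokens[k]):
            counter += 1
        elif neg.fullmatch(tokens[k]):
            counter -= 1
        if counter == 0:
            return k + 1
    return None  # unbalanced: the counter never returned to zero


def collapse_spans(tokens, placeholder="<*>"):
    """Replace every balanced bracketed span, whatever its length, with one placeholder."""
    out, k = [], 0
    while k < len(tokens):
        end = match_balanced(tokens, k)
        if end is None:
            out.append(tokens[k])
            k += 1
        else:
            out.append(placeholder)
            k = end
    return out
```

For example, `collapse_spans(["call", "(", "a", "(", "b", ")", "c", ")", "done"])` returns `["call", "<*>", "done"]`, because the counter only returns to zero at the outer closing parenthesis. Messages whose bracketed payloads differ in length therefore reduce to the same token sequence.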
[0037] In some embodiments, the template generation module 132
performs a sequence clustering algorithm based on the co-occurrence and
index of words to extract templates. Specifically, the algorithm
learns the conditional probability Pr(i|j) of the i-th token in a log
given the j-th token. The token at index i is considered a variable if

$$\frac{1}{n-1}\sum_{j \neq i} \Pr(i \mid j) < \text{threshold},$$

where n is the length of the log expressed in number of tokens. The
identified variables are replaced by a static token $VARIABLE, and the
algorithm is repeated to identify more variables until none can be
found. In some
embodiments, the threshold is a parameter that the user or a
subject matter expert (SME) may control through the user interface
138.
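The variable test above can be sketched as follows. The sketch rests on two assumptions that the disclosure does not state: Pr(i|j) is estimated from counts over a group of logs that share the same token length, as the fraction of logs having token t_j at index j that also have token t_i at index i; and the threshold defaults to 0.5. The function names are hypothetical.

```python
from collections import Counter

VARIABLE = "$VARIABLE"


def find_variables(logs, threshold):
    """One pass: return (log index, token index) pairs judged to be variables."""
    n = len(logs[0])
    if n < 2:
        return []
    single = Counter()  # (j, token_j) -> number of logs with token_j at index j
    joint = Counter()   # (i, token_i, j, token_j) -> number of logs with both
    for log in logs:
        for j, tok in enumerate(log):
            single[(j, tok)] += 1
        for i in range(n):
            for j in range(n):
                if i != j:
                    joint[(i, log[i], j, log[j])] += 1
    found = []
    for k, log in enumerate(logs):
        for i, tok in enumerate(log):
            if tok == VARIABLE:
                continue  # already replaced in an earlier round
            # Average of Pr(i|j) = joint / single over all j != i.
            score = sum(joint[(i, tok, j, log[j])] / single[(j, log[j])]
                        for j in range(n) if j != i) / (n - 1)
            if score < threshold:
                found.append((k, i))
    return found


def extract_templates(logs, threshold=0.5):
    """Repeat the variable test until no new variables are found."""
    logs = [list(log) for log in logs]
    assert len({len(log) for log in logs}) == 1, "group logs by token count first"
    while True:
        found = find_variables(logs, threshold)
        if not found:
            return logs
        for k, i in found:
            logs[k][i] = VARIABLE
```

With the tokenized logs "user alice logged in", "user bob logged in", and "user carol logged in", the second token scores 1/3 in the first round and becomes $VARIABLE. In the second round every remaining token scores 1, so the loop stops with the template ["user", "$VARIABLE", "logged", "in"].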
[0038] In some embodiments, a token can itself be recursively
templated. For example, in some embodiments, in a coarse-grained
application of template extraction, an entire JSON string inside a text log message may
appear as a single token. However, that token can be further
specialized into JSON templates based on the tree of named keys
approach mentioned above. This step may be done with user or SME
feedback through the user interface 138 to selectively template
tokens identified in any round of template extraction. The step can
be recursively applied any number of times to produce richer (or
more fine-grained) templates.
[0039] Data in log messages is often nested, and the template
generation module 132 may correspondingly extract templates
recursively and hierarchically. For example, for a system having a
microservice architecture, log data can become deeply nested as
microservices can call each other in a nested fashion. For example,
consider the call chain C->B->A: if A throws an error, the error is
captured by B, which formats it as JSON and throws it to C, which in
turn also formats it as JSON. This results in a nested log of the
form JSON_C{JSON_B{JSON_A}}.
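Recursive templating of nested JSON, as in the C->B->A chain above, might look like the following sketch. It builds a tree of named keys, replaces leaf values with type markers, and re-parses any string leaf that itself holds serialized JSON. The markers BOOL and NULL and the function name are assumptions; "<*>" and NUMBER follow the notation of the templates in FIG. 2.

```python
import json


def json_template(value):
    """Map parsed JSON to a tree of named keys whose leaves are type markers."""
    if isinstance(value, dict):
        return {key: json_template(v) for key, v in value.items()}
    if isinstance(value, list):
        return [json_template(v) for v in value]
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "BOOL"
    if isinstance(value, (int, float)):
        return "NUMBER"
    if isinstance(value, str):
        try:
            inner = json.loads(value)  # a string leaf may hold a nested JSON log
        except ValueError:
            return "<*>"
        if isinstance(inner, (dict, list)):
            return json_template(inner)  # recurse into the nested log
        return "<*>"
    return "NULL"


# Nested log of the form JSON_C{JSON_B{JSON_A}} from the call chain C->B->A.
json_a = json.dumps({"service": "A", "code": 500, "msg": "disk full"})
json_b = json.dumps({"service": "B", "cause": json_a})
json_c = {"service": "C", "cause": json_b}
print(json_template(json_c))
```

Two error logs that differ only in their leaf values (service names, codes, messages) produce the same tree, which can then serve as one fine-grained template.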
[0040] The extracted templates are stored in a template library
142. The template library 142 stores one or more template models.
Each template model is a collection of templates that correspond to
a set of (related) log messages that are generated within a same
time frame (as divided by the time series slicing module 130). The log
messages are analyzed to detect the occurrence of changes and unusual
events. Subsequent log messages arriving at the log analyzer 100
may fall into templates belonging to different template models. In
some embodiments, a new template, which did not previously exist in
an existing template model, may appear in that template model when one
or more log messages fall within the new template.
Conversely, an existing template may disappear from an existing
template model when there are no log messages that fall under the
template.
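Adding and removing templates from a template model can be sketched as a set comparison. The sketch assumes that a template is a string in which each "<*>" stands for one whitespace-free token and that a template belongs to the model whenever at least one related message falls within it; the function names are hypothetical.

```python
import re


def falls_within(template, message):
    """True if message matches template, treating each '<*>' as one non-space token."""
    pattern = r"\S+".join(re.escape(part) for part in template.split("<*>"))
    return re.fullmatch(pattern, message) is not None


def update_model(model, library, related_messages):
    """Recompute a template model from one set of related messages.

    Returns (new_model, appeared, disappeared) as sets of template strings.
    """
    candidates = model | library
    new_model = {t for t in candidates
                 if any(falls_within(t, m) for m in related_messages)}
    return new_model, new_model - model, model - new_model
```

For a model {"user <*> logged in"} and a slice whose only related message is "Retry count: 5", the library template "Retry count: <*>" appears in the model and "user <*> logged in" disappears from it.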
[0041] The template models included in the template library 142 may
also include pre-existing models, which are learned from a large
volume of historical data. These pre-existing template models can
be used to match, predict, or identify templates for a given input.
The pre-existing models are useful when the expected logs are more
or less following the same format as in the past. The template
models in the template library 142 can also be used to detect
anomalies in the system 110, such as when no pre-existing template
models match the incoming log messages.
[0042] The template matching module 134 applies the extracted
templates to the log messages 120 from the system 110 and computes
a metric for each extracted template. Computing metrics for
templates allows changes to templates over time to be quantified.
In some embodiments, the metric of a template is a matching score
that is determined based on a number of log messages that fall
within the template, i.e., a number of log messages having content
that fits the pattern described by the template. FIG. 4 illustrates
example templates and the corresponding number of log messages that
fall within the template. In some embodiments, the template metric
is determined based on a ratio of the number of log messages that fall
within the template to the total number of log messages.
In some embodiments, whether a template newly appears in a
template model or disappears from a template model is determined
based on the metrics measured for the template.
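A ratio-based template metric of this kind can be sketched as follows. The sketch assumes that templates use "<*>" for one non-space token and NUMBER for a run of digits, and that each message is counted against the first template it matches; the function names are hypothetical.

```python
import re
from collections import Counter


def template_to_regex(template):
    """Compile a template: '<*>' matches one non-space token, 'NUMBER' matches digits."""
    parts = re.split(r"(<\*>|NUMBER)", template)
    return re.compile("".join(
        r"\S+" if p == "<*>" else r"\d+" if p == "NUMBER" else re.escape(p)
        for p in parts))


def template_metrics(templates, messages):
    """Return ({template: fraction of messages within it}, count of unmatched messages)."""
    compiled = [(t, template_to_regex(t)) for t in templates]
    counts = Counter()
    unmatched = 0
    for msg in messages:
        for t, rx in compiled:
            if rx.fullmatch(msg):
                counts[t] += 1  # first matching template wins in this sketch
                break
        else:
            unmatched += 1
    total = len(messages)
    return {t: counts[t] / total if total else 0.0 for t in templates}, unmatched
```

For the messages "httpExecutor Retry count: 3", "httpExecutor Retry count: 12", "user bob logged in", and "disk full", the template "httpExecutor Retry count: NUMBER" scores 0.5, "user <*> logged in" scores 0.25, and one message is left unmatched. A rising unmatched count is the kind of signal that the next paragraph uses to trigger new-template generation.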
[0043] In some embodiments, the log analyzer 100 generates a new
template when more than a threshold number of incoming log messages
fail to fall within any existing templates. The log analyzer may
generate the new template based on an existing template that is
selected from the template library 142. The log analyzer 100 may
select a template having a best correlation score with the log
messages that fail to fall within any existing templates. The
selected existing template serves as an original template from
which a child template is evolved. The inference or extraction
of templates is therefore an iterative process, where existing
templates may be used to create newer templates, and the existing
templates may be incrementally updated as data from newer log
messages (from the same domain as the log messages used to extract
the original existing template) become available.
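One way to realize this seeding is sketched below. The correlation score (Jaccard overlap of token sets), the rule for deriving the child (a static seed position stays static only when every unmatched message agrees on its token, and a seed variable stays a variable), and the function names are all assumptions; the disclosure does not specify them.

```python
WILDCARD = "<*>"


def correlation(template, message):
    """Hypothetical score: Jaccard overlap of static template tokens and message tokens."""
    a = {t for t in template.split() if t != WILDCARD}
    b = set(message.split())
    return len(a & b) / len(a | b) if a | b else 0.0


def evolve_child(templates, unmatched, threshold):
    """Derive a child template from the best-correlated seed once more than
    `threshold` messages fall within no existing template; otherwise return None."""
    if len(unmatched) <= threshold or not templates:
        return None
    seed = max(templates, key=lambda t: sum(correlation(t, m) for m in unmatched))
    seed_tokens = seed.split()
    # This sketch only aligns messages whose token count equals the seed's.
    aligned = [m.split() for m in unmatched if len(m.split()) == len(seed_tokens)]
    if not aligned:
        return None
    child = []
    for pos, seed_tok in enumerate(seed_tokens):
        column = {toks[pos] for toks in aligned}
        if seed_tok != WILDCARD and len(column) == 1:
            child.append(column.pop())  # all messages agree: static text
        else:
            child.append(WILDCARD)  # inherited variable, or messages disagree
    return " ".join(child)
```

With the library ["user <*> logged in", "httpExecutor Retry count: <*>"], three unmatched messages of the form "user X logged out", and a threshold of 2, the first template is chosen as the seed and the child "user <*> logged out" is evolved from it.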
[0044] The diagnostic module 136 reports the status of the system based
on the metrics that are determined for the templates. Specifically,
incoming log messages 120 are analyzed using template models in the
template library 142, such that the log messages of a same domain
may disproportionately fall within the templates of a particular
template model, and the particular template model may in turn have
higher metrics than other template models in the library for those
log messages. The diagnostic module 136 uses the template metrics
reported by the template matching module 134 to identify the
highest scoring template model. The diagnostic module 136 may
identify different template models for different time frames or
time slices, the different time frames being defined by the time
series slicing module 130 based on the time stamps of the incoming
log messages.
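A minimal sketch of selecting the highest scoring template model per time slice follows. It assumes fixed-length slices (a hypothetical `slice_seconds` of one hour), a template model represented as a named list of token templates with a `<*>` placeholder, and a model score equal to the fraction of a slice's messages that fall within at least one of the model's templates. None of these choices are prescribed by the disclosure.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

WILDCARD = "<*>"  # hypothetical placeholder for a variable portion


def matches(template: Sequence[str], message: str) -> bool:
    """Same token count, and every static token must match exactly."""
    tokens = message.split()
    return len(template) == len(tokens) and all(
        t == WILDCARD or t == tok for t, tok in zip(template, tokens))


def model_score(model: List[List[str]], messages: List[str]) -> float:
    """Assumed model metric: fraction of messages falling within at least
    one template of the model."""
    if not messages:
        return 0.0
    hits = sum(any(matches(t, m) for t in model) for m in messages)
    return hits / len(messages)


def best_model_per_slice(models: Dict[str, List[List[str]]],
                         logs: List[Tuple[float, str]],
                         slice_seconds: float = 3600.0) -> Dict[int, str]:
    """Group (timestamp, message) pairs into fixed-length time slices and
    return the name of the highest-scoring template model for each slice."""
    slices: Dict[int, List[str]] = defaultdict(list)
    for ts, msg in logs:
        slices[int(ts // slice_seconds)].append(msg)
    return {
        idx: max(models, key=lambda name: model_score(models[name], msgs))
        for idx, msgs in sorted(slices.items())
    }


if __name__ == "__main__":
    models = {
        "TM1": [["Connection", "from", WILDCARD, "closed"]],
        "TM2": [["Disk", WILDCARD, "usage", "at", WILDCARD]],
    }
    logs = [(10.0, "Connection from 10.0.0.1 closed"),
            (20.0, "Connection from 10.0.0.2 closed"),
            (3700.0, "Disk /dev/sda1 usage at 91%"),
            (3800.0, "Disk /dev/sdb1 usage at 95%")]
    print(best_model_per_slice(models, logs))  # {0: 'TM1', 1: 'TM2'}
```

Fixed-length windows are only one way to slice the time series; the slicing could just as well follow bursts or gaps in the time stamps.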
[0045] FIG. 5 conceptually illustrates an example report generated
by the log analyzer 100. As illustrated, at different times of a
day, different template models (TM1 through TM10) are reported as
being the highest scoring template model based on metrics that are
computed for those template models. In some embodiments, at least
some of the template models are associated with different
components or hierarchical levels of the system, or with certain
types of events or issues. The diagnostic module 136 may use the
scores or metrics of templates or template models to report a
status or an issue of the system at a specific time frame.
[0046] FIG. 6 conceptually illustrates a process 600 for extracting
templates from log messages and for using metrics of the extracted
templates to report the status of a monitored system, consistent
with an illustrative embodiment. In some embodiments, one or more
processing units (e.g., a processor) of a computing device
implementing the log analyzer 100 perform the process 600 by
executing instructions stored in a computer readable medium.
[0047] The log analyzer receives (at block 610) log messages
generated by a monitored system (such as the system 110). The log
analyzer identifies (at block 620) static and variable portions in
the received log messages. In some embodiments, the static and
variable portions of the log messages are identified by using a
dictionary of meaningful words that are identified based on
statistics of words appearing in log messages. The statistics of
the words appearing in log messages may be used to identify
meaningful words to include in the dictionary and meaningless (or
low entropy) words to exclude from the dictionary. In some
embodiments, the static and variable portions of the log messages
are identified by using a list of words that co-occur in the log
messages. In some embodiments, the log analyzer allows a user or an
SME to update the dictionary.
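The dictionary-based split into static and variable portions can be sketched as below. The statistic used here is an assumption for illustration only: a word counts as meaningful if it occurs in at least a hypothetical fraction `min_doc_fraction` of the messages and contains no digits, and `extra_words` stands in for dictionary updates made by a user or an SME. Dictionary words become static tokens of a template and all other tokens become the hypothetical `<*>` placeholder.

```python
from collections import Counter
from typing import Iterable, List, Set

WILDCARD = "<*>"  # hypothetical placeholder for a variable portion


def build_dictionary(messages: List[str], min_doc_fraction: float = 0.3,
                     extra_words: Iterable[str] = ()) -> Set[str]:
    """Assumed heuristic: keep words that appear in at least
    `min_doc_fraction` of the messages and contain no digits, then add any
    manually supplied words (standing in for user or SME updates)."""
    doc_freq: Counter = Counter()
    for msg in messages:
        doc_freq.update(set(msg.split()))  # count each word once per message
    cutoff = min_doc_fraction * len(messages)
    words = {w for w, n in doc_freq.items()
             if n >= cutoff and not any(c.isdigit() for c in w)}
    return words | set(extra_words)


def to_template(message: str, dictionary: Set[str]) -> List[str]:
    """Static portions are dictionary words; every other token is variable."""
    return [tok if tok in dictionary else WILDCARD for tok in message.split()]


if __name__ == "__main__":
    logs = ["User alice logged in from 10.0.0.1",
            "User bob logged in from 10.0.0.2",
            "User carol logged in from 10.0.0.9",
            "Disk quota exceeded for dave"]
    d = build_dictionary(logs)
    print(to_template(logs[0], d))
    # ['User', '<*>', 'logged', 'in', 'from', '<*>']
    d2 = build_dictionary(logs, extra_words={"Disk", "quota", "exceeded", "for"})
    print(to_template(logs[3], d2))
    # ['Disk', 'quota', 'exceeded', 'for', '<*>']
```

The last message shows why manual updates can matter: a rare but meaningful message is missed by a purely frequency-based statistic until its words are added to the dictionary.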
[0048] The log analyzer generates (at block 630) a template based
on the identified static and variable portions of the received log
messages. In some embodiments, a template is incrementally updated
based on subsequently received log messages of a same domain. In
some embodiments, existing templates are used as seeds to create new
templates. In some embodiments, the log analyzer groups a set of
related log messages and identifies, as a template model, the set of
templates within which the related log messages fall. The log
analyzer may add a particular template to the template model when
one or more log messages of the set of related log messages fall
within the particular template. The log analyzer may also remove a
particular template from the template model when no log message of
the set of related log messages falls within the particular
template.
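The add and remove rules for a template model can be expressed compactly. The sketch below assumes templates are token tuples using a hypothetical `<*>` placeholder and that the candidate templates come from a template library; the function name `update_template_model` is illustrative only.

```python
from typing import Iterable, List, Sequence, Set, Tuple

WILDCARD = "<*>"  # hypothetical placeholder for a variable portion
Template = Tuple[str, ...]


def matches(template: Sequence[str], message: str) -> bool:
    """Same token count, and every static token must match exactly."""
    tokens = message.split()
    return len(template) == len(tokens) and all(
        t == WILDCARD or t == tok for t, tok in zip(template, tokens))


def update_template_model(model: Set[Template], library: Iterable[Template],
                          related_messages: List[str]) -> Set[Template]:
    """Add library templates that at least one related message falls within,
    and drop model templates that no related message falls within."""
    candidates = set(model) | set(library)
    return {t for t in candidates
            if any(matches(t, m) for m in related_messages)}


if __name__ == "__main__":
    library = {("Connection", "from", WILDCARD, "closed"),
               ("Connection", "from", WILDCARD, "reset"),
               ("Disk", WILDCARD, "usage", "at", WILDCARD)}
    model = {("Disk", WILDCARD, "usage", "at", WILDCARD)}
    related = ["Connection from 10.0.0.1 closed",
               "Connection from 10.0.0.2 reset"]
    print(sorted(update_template_model(model, library, related)))
    # [('Connection', 'from', '<*>', 'closed'), ('Connection', 'from', '<*>', 'reset')]
```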
[0049] The log analyzer computes (at block 640) a metric for the
generated template based on a number of log messages that fall
within the template. In some embodiments, the log analyzer computes
a metric for each template in its template library with regard to
the incoming log messages. In some embodiments, the metric is
computed based on the ratio of the number of log messages that fall
within the template to the total number of log messages.
[0050] The log analyzer reports (at block 650) a status in the
monitored system based on the computed metric. The status may
relate to an issue or an error occurring at a particular region of
the monitored system. The log analyzer may also determine a time
frame of occurrence for the reported status based on a time stamp
of a log message that falls within the template. In some
embodiments, the reported status of the monitored system is
identified based on a template having a highest metric among
multiple templates in the template library. The log analyzer may
determine a time frame of occurrence for the reported status based
on one or more time stamps in the incoming log messages that fall
within a template model that is related to the reported status.
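Reporting a status together with its time frame of occurrence could look roughly like the sketch below. It assumes each template in the library carries a hypothetical status label (for example, the component or issue type it was associated with), ranks templates by the count of matching messages (which orders templates the same way as the ratio metric, since all share the same total), and takes the time frame as the earliest and latest time stamps among the matching messages. All names and labels are illustrative assumptions.

```python
from typing import Dict, List, Optional, Sequence, Tuple

WILDCARD = "<*>"  # hypothetical placeholder for a variable portion


def matches(template: Sequence[str], message: str) -> bool:
    """Same token count, and every static token must match exactly."""
    tokens = message.split()
    return len(template) == len(tokens) and all(
        t == WILDCARD or t == tok for t, tok in zip(template, tokens))


def report_status(library: Dict[str, Tuple[List[str], str]],
                  logs: List[Tuple[float, str]]
                  ) -> Optional[Tuple[str, float, float]]:
    """`library` maps a template name to (template tokens, hypothetical status
    label). Returns (status, first_ts, last_ts) for the template with the most
    matching messages, or None if no message falls within any template."""
    best_name: Optional[str] = None
    best_hits: List[float] = []
    for name, (tpl, _) in library.items():
        hits = [ts for ts, msg in logs if matches(tpl, msg)]
        if len(hits) > len(best_hits):
            best_name, best_hits = name, hits
    if best_name is None:
        return None
    status = library[best_name][1]
    return status, min(best_hits), max(best_hits)


if __name__ == "__main__":
    library = {
        "T1": (["Connection", "from", WILDCARD, "reset"],
               "network: connection resets"),
        "T2": (["Disk", WILDCARD, "usage", "at", WILDCARD],
               "storage: disk pressure"),
    }
    logs = [(100.0, "Disk /dev/sda1 usage at 91%"),
            (160.0, "Connection from 10.0.0.1 reset"),
            (220.0, "Connection from 10.0.0.2 reset"),
            (400.0, "Connection from 10.0.0.3 reset")]
    print(report_status(library, logs))
    # ('network: connection resets', 160.0, 400.0)
```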
[0051] By inferring templates from log messages, the log analyzer
is able to automatically detect issues arising from a monitored
system that generates the log messages. The log analyzer measures
the performance metrics of the templates with regard to the log
messages in order to further refine the templates. The log analyzer
therefore improves the efficiency of system monitoring.
[0052] The present application may be a system, a method, and/or a
computer program product at any possible technical detail level of
integration. The computer program product may include a computer
readable storage medium (or media) having computer readable program
instructions thereon for causing a processor to carry out aspects
of the present disclosure.
[0053] The computer readable storage medium can be a tangible
device that can retain and store instructions for use by an
instruction execution device. The computer readable storage medium
may be, for example, but is not limited to, an electronic storage
device, a magnetic storage device, an optical storage device, an
electromagnetic storage device, a semiconductor storage device, or
any suitable combination of the foregoing. A non-exhaustive list of
more specific examples of the computer readable storage medium
includes the following: a portable computer diskette, a hard disk,
a random access memory (RAM), a read-only memory (ROM), an erasable
programmable read-only memory (EPROM or Flash memory), a static
random access memory (SRAM), a portable compact disc read-only
memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a
floppy disk, a mechanically encoded device such as punch-cards or
raised structures in a groove having instructions recorded thereon,
and any suitable combination of the foregoing. A computer readable
storage medium, as used herein, is not to be construed as being
transitory signals per se, such as radio waves or other freely
propagating electromagnetic waves, electromagnetic waves
propagating through a waveguide or other transmission media (e.g.,
light pulses passing through a fiber-optic cable), or electrical
signals transmitted through a wire.
[0054] Computer readable program instructions described herein can
be downloaded to respective computing/processing devices from a
computer readable storage medium or to an external computer or
external storage device via a network, for example, the Internet, a
local area network, a wide area network and/or a wireless network.
The network may comprise copper transmission cables, optical
transmission fibers, wireless transmission, routers, firewalls,
switches, gateway computers and/or edge servers. A network adapter
card or network interface in each computing/processing device
receives computer readable program instructions from the network
and forwards the computer readable program instructions for storage
in a computer readable storage medium within the respective
computing/processing device. Computer readable program instructions
for carrying out operations of the present disclosure may be
assembler instructions, instruction-set-architecture (ISA)
instructions, machine instructions, machine dependent instructions,
microcode, firmware instructions, state-setting data, configuration
data for integrated circuitry, or either source code or object code
written in any combination of one or more programming languages,
including an object oriented programming language such as
Smalltalk, C++, or the like, and procedural programming languages,
such as the "C" programming language or similar programming
languages. The computer readable program instructions may execute
entirely on the user's computer, partly on the user's computer, as
a stand-alone software package, partly on the user's computer and
partly on a remote computer or entirely on the remote computer or
server. In the latter scenario, the remote computer may be
connected to the user's computer through any type of network,
including a local area network (LAN) or a wide area network (WAN),
or the connection may be made to an external computer (for example,
through the Internet using an Internet Service Provider). In some
embodiments, electronic circuitry including, for example,
programmable logic circuitry, field-programmable gate arrays
(FPGA), or programmable logic arrays (PLA) may execute the computer
readable program instructions by utilizing state information of the
computer readable program instructions to personalize the
electronic circuitry, in order to perform aspects of the present
disclosure.
[0055] Aspects of the present disclosure are described herein with
reference to flowchart illustrations and/or block diagrams of
methods, apparatus (systems), and computer program products
according to embodiments of the disclosure. It will be understood
that each block of the flowchart illustrations and/or block
diagrams, and combinations of blocks in the flowchart illustrations
and/or block diagrams, can be implemented by computer readable
program instructions. These computer readable program instructions
may be provided to a processor of a computer, or other programmable
data processing apparatus to produce a machine, such that the
instructions, which execute via the processor of the computer or
other programmable data processing apparatus, create means for
implementing the functions/acts specified in the flowchart and/or
block diagram block or blocks. These computer readable program
instructions may also be stored in a computer readable storage
medium that can direct a computer, a programmable data processing
apparatus, and/or other devices to function in a particular manner,
such that the computer readable storage medium having instructions
stored therein comprises an article of manufacture including
instructions which implement aspects of the function/act specified
in the flowchart and/or block diagram block or blocks.
[0056] The computer readable program instructions may also be
loaded onto a computer, other programmable data processing
apparatus, or other device to cause a series of operational steps
to be performed on the computer, other programmable apparatus or
other device to produce a computer implemented process, such that
the instructions which execute on the computer, other programmable
apparatus, or other device implement the functions/acts specified
in the flowchart and/or block diagram block or blocks. The
flowchart and block diagrams in the Figures (e.g., FIG. 6)
illustrate the architecture, functionality, and operation of
possible implementations of systems, methods, and computer program
products according to various embodiments of the present
disclosure. In this regard, each block in the flowchart or block
diagrams may represent a module, segment, or portion of
instructions, which comprises one or more executable instructions
for implementing the specified logical function(s). In some
alternative implementations, the functions noted in the blocks may
occur out of the order noted in the Figures. For example, two
blocks shown in succession may, in fact, be executed substantially
concurrently, or the blocks may sometimes be executed in the
reverse order, depending upon the functionality involved. It will
also be noted that each block of the block diagrams and/or
flowchart illustration, and combinations of blocks in the block
diagrams and/or flowchart illustration, can be implemented by
special purpose hardware-based systems that perform the specified
functions or acts or carry out combinations of special purpose
hardware and computer instructions.
[0057] FIG. 7 shows a block diagram of the components of data
processing systems 700 and 750 that may be used to implement a
system for extracting templates from received log messages (i.e.,
the log analyzer 100) in accordance with an illustrative embodiment
of the present disclosure. It should be appreciated that FIG. 7
provides only an illustration of one implementation and does not
imply any limitations with regard to the environments in which
different embodiments may be implemented. Many modifications to the
depicted environments may be made based on design and
implementation requirements.
[0058] Data processing systems 700 and 750 are representative of
any electronic device capable of executing machine-readable program
instructions. Data processing systems 700 and 750 may be
representative of a smart phone, a computer system, PDA, or other
electronic devices. Examples of computing systems, environments,
and/or configurations that may be represented by data processing
systems 700 and 750 include, but are not limited to, personal
computer systems, server computer systems, thin clients, thick
clients, hand-held or laptop devices, multiprocessor systems,
microprocessor-based systems, network PCs, minicomputer systems,
and distributed cloud computing environments that include any of
the above systems or devices.
[0059] The data processing systems 700 and 750 may include a set of
internal components 705 and a set of external components 755
illustrated in FIG. 7. The set of internal components 705 includes
one or more processors 720, one or more computer-readable RAMs 722
and one or more computer-readable ROMs 724 on one or more buses
726, and one or more operating systems 728 and one or more
computer-readable tangible storage devices 730. The one or more
operating systems 728 and programs such as the programs for
executing the process 600 are stored on one or more
computer-readable tangible storage devices 730 for execution by one
or more processors 720 via one or more RAMs 722 (which typically
include cache memory). In the embodiment illustrated in FIG. 7,
each of the computer-readable tangible storage devices 730 is a
magnetic disk storage device of an internal hard drive.
Alternatively, each of the computer-readable tangible storage
devices 730 is a semiconductor storage device such as ROM 724,
EPROM, flash memory or any other computer-readable tangible storage
device that can store a computer program and digital
information.
[0060] The set of internal components 705 also includes an R/W drive
or interface 732 to read from and write to one or more portable
computer-readable tangible storage devices 786 such as a CD-ROM,
DVD, memory stick, magnetic tape, magnetic disk, optical disk or
semiconductor storage device. The instructions for executing the
process 600 can be stored on one or more of the respective portable
computer-readable tangible storage devices 786, read via the
respective R/W drive or interface 732 and loaded into the
respective hard drive 730.
[0061] The set of internal components 705 may also include network
adapters (or switch port cards) or interfaces 736 such as TCP/IP
adapter cards, wireless Wi-Fi interface cards, or 3G or 4G wireless
interface cards or other wired or wireless communication links.
Instructions of processes or programs described above can be
downloaded from an external computer (e.g., server) via a network
(for example, the Internet, a local area network, or another wide
area network) and respective network adapters or interfaces 736.
From the network adapters (or switch port adaptors) or interfaces
736, the instructions and data of the described programs or
processes are loaded into the respective hard drive 730. The
network may comprise copper wires, optical fibers, wireless
transmission, routers, firewalls, switches, gateway computers
and/or edge servers.
[0062] The set of external components 755 can include a computer
display monitor 770, a keyboard 780, and a computer mouse 784. The
set of external components 755 can also include touch screens,
virtual keyboards, touch pads, pointing devices, and other human
interface devices. The set of internal components 705 also includes
device drivers 740 to interface to computer display monitor 770,
keyboard 780 and computer mouse 784. The device drivers 740, R/W
drive or interface 732 and network adapter or interface 736
comprise hardware and software (stored in storage device 730 and/or
ROM 724).
[0063] It is to be understood that although this disclosure
includes a detailed description of cloud computing, implementation
of the teachings recited herein is not limited to a cloud
computing environment. Rather, embodiments of the present
disclosure are capable of being implemented in conjunction with any
other type of computing environment now known or later developed.
Cloud computing is a model of service delivery for enabling
convenient, on-demand network access to a shared pool of
configurable computing resources (e.g., networks, network
bandwidth, servers, processing, memory, storage, applications,
virtual machines, and services) that can be rapidly provisioned and
released with minimal management effort or interaction with a
provider of the service. This cloud model may include at least five
characteristics, at least three service models, and at least four
deployment models.
[0064] On-demand self-service: a cloud consumer can unilaterally
provision computing capabilities, such as server time and network
storage, as needed, automatically and without requiring human
interaction with the service's provider.
[0065] Broad network access: capabilities are available over a
network and accessed through standard mechanisms that promote use
by heterogeneous thin or thick client platforms (e.g., mobile
phones, laptops, and PDAs).
[0066] Resource pooling: the provider's computing resources are
pooled to serve multiple consumers using a multi-tenant model, with
different physical and virtual resources dynamically assigned and
reassigned according to demand. There is a sense of location
independence in that the consumer generally has no control or
knowledge over the exact location of the provided resources but may
be able to specify location at a higher level of abstraction (e.g.,
country, state, or datacenter).
[0067] Rapid elasticity: capabilities can be rapidly and
elastically provisioned, in some cases automatically, to quickly
scale out and rapidly released to quickly scale in. To the
consumer, the capabilities available for provisioning often appear
to be unlimited and can be purchased in any quantity at any
time.
[0068] Measured service: cloud systems automatically control and
optimize resource use by leveraging a metering capability at some
level of abstraction appropriate to the type of service (e.g.,
storage, processing, bandwidth, and active user accounts). Resource
usage can be monitored, controlled, and reported, providing
transparency for both the provider and consumer of the utilized
service.
[0069] Software as a Service (SaaS): the capability provided to the
consumer is to use the provider's applications running on a cloud
infrastructure. The applications are accessible from various client
devices through a thin client interface such as a web browser
(e.g., web-based e-mail). The consumer does not manage or control
the underlying cloud infrastructure including network, servers,
operating systems, storage, or even individual application
capabilities, with the possible exception of limited user-specific
application configuration settings.
[0070] Platform as a Service (PaaS): the capability provided to the
consumer is to deploy onto the cloud infrastructure
consumer-created or acquired applications created using programming
languages and tools supported by the provider. The consumer does
not manage or control the underlying cloud infrastructure including
networks, servers, operating systems, or storage, but has control
over the deployed applications and possibly application hosting
environment configurations.
Infrastructure as a Service (IaaS): the
capability provided to the consumer is to provision processing,
storage, networks, and other fundamental computing resources where
the consumer is able to deploy and run arbitrary software, which
can include operating systems and applications. The consumer does
not manage or control the underlying cloud infrastructure but has
control over operating systems, storage, deployed applications, and
possibly limited control of select networking components (e.g.,
host firewalls).
[0071] Private cloud: the cloud infrastructure is operated solely
for an organization. It may be managed by the organization or a
third party and may exist on-premises or off-premises.
[0072] Community cloud: the cloud infrastructure is shared by
several organizations and supports a specific community that has
shared concerns (e.g., mission, security requirements, policy, and
compliance considerations). It may be managed by the organizations
or a third party and may exist on-premises or off-premises.
[0073] Public cloud: the cloud infrastructure is made available to
the general public or a large industry group and is owned by an
organization selling cloud services.
[0074] Hybrid cloud: the cloud infrastructure is a composition of
two or more clouds (private, community, or public) that remain
unique entities but are bound together by standardized or
proprietary technology that enables data and application
portability (e.g., cloud bursting for load-balancing between
clouds).
[0075] A cloud-computing environment is service oriented with a
focus on statelessness, low coupling, modularity, and semantic
interoperability. At the heart of cloud computing is an
infrastructure that includes a network of interconnected nodes.
[0076] Referring now to FIG. 8, an illustrative cloud computing
environment 850 is depicted. As shown, cloud computing environment
850 includes one or more cloud computing nodes 810 with which local
computing devices used by cloud consumers, such as, for example,
personal digital assistant (PDA) or cellular telephone 854A,
desktop computer 854B, laptop computer 854C, and/or automobile
computer system 854N may communicate. Nodes 810 may communicate
with one another. They may be grouped (not shown) physically or
virtually, in one or more networks, such as Private, Community,
Public, or Hybrid clouds as described hereinabove, or a combination
thereof. This allows cloud computing environment 850 to offer
infrastructure, platforms and/or software as services for which a
cloud consumer does not need to maintain resources on a local
computing device. It is understood that the types of computing
devices 854A-N shown in FIG. 8 are intended to be illustrative only
and that computing nodes 810 and cloud computing environment 850
can communicate with any type of computerized device over any type
of network and/or network addressable connection (e.g., using a web
browser).
[0077] Referring now to FIG. 9, a set of functional abstraction
layers provided by cloud computing environment 850 (of FIG. 8) is
shown. It should be understood that the components, layers, and
functions shown in FIG. 9 are intended to be illustrative only and
embodiments of the disclosure are not limited thereto. As depicted,
the following layers and corresponding functions are provided:
[0078] Hardware and software layer 960 includes hardware and
software components. Examples of hardware components include:
mainframes 961; RISC (Reduced Instruction Set Computer)
architecture based servers 962; servers 963; blade servers 964;
storage devices 965; and networks and networking components 966. In
some embodiments, software components include network application
server software 967 and database software 968.
[0079] Virtualization layer 970 provides an abstraction layer from
which the following examples of virtual entities may be provided:
virtual servers 971; virtual storage 972; virtual networks 973,
including virtual private networks; virtual applications and
operating systems 974; and virtual clients 975.
[0080] In one example, management layer 980 may provide the
functions described below. Resource provisioning 981 provides
dynamic procurement of computing resources and other resources that
are utilized to perform tasks within the cloud computing
environment. Metering and Pricing 982 provide cost tracking as
resources are utilized within the cloud computing environment, and
billing or invoicing for consumption of these resources. In one
example, these resources may include application software licenses.
Security provides identity verification for cloud consumers and
tasks, as well as protection for data and other resources. User
portal 983 provides access to the cloud-computing environment for
consumers and system administrators. Service level management 984
provides cloud computing resource allocation and management such
that required service levels are met. Service Level Agreement (SLA)
planning and fulfillment 985 provide pre-arrangement for, and
procurement of, cloud computing resources for which a future
requirement is anticipated in accordance with an SLA.
[0081] Workloads layer 990 provides examples of functionality for
which the cloud computing environment may be utilized. Examples of
workloads and functions which may be provided from this layer
include: mapping and navigation 991; software development and
lifecycle management 992; virtual classroom education delivery 993;
data analytics processing 994; transaction processing 995; and
workload 996. In some embodiments, the workload 996 performs some
of the operations of the log analyzer 100, e.g., template
generation and matching.
[0082] The foregoing one or more embodiments implement a log
analyzing system within a computer infrastructure by having one or
more computing devices process log messages and infer
templates from the log messages. The computer infrastructure is
further used to compute performance metrics for the inferred
templates.
[0083] The descriptions of the various embodiments of the present
disclosure have been presented for purposes of illustration, but
are not intended to be exhaustive or limited to the embodiments
disclosed. Many modifications and variations will be apparent to
those of ordinary skill in the art without departing from the scope
and spirit of the described embodiments. The terminology used
herein was chosen to best explain the principles of the
embodiments, the practical application or technical improvement
over technologies found in the marketplace, or to enable others of
ordinary skill in the art to understand the embodiments disclosed
herein.