U.S. patent application number 14/804528, for providing fault injection to cloud-provisioned machines, was published by the patent office on 2017-01-26.
The applicant listed for this patent is International Business Machines Corporation. The invention is credited to Yu Deng, Ruchi Mahindru, Anca Sailer, Soumitra Sarkar, and Long Wang.
Application Number: 20170024299 (Appl. No. 14/804528)
Family ID: 57836178
Publication Date: 2017-01-26
United States Patent Application 20170024299, Kind Code A1
Deng; Yu; et al.
January 26, 2017
Providing Fault Injection to Cloud-Provisioned Machines
Abstract
Methods, systems, and computer program products for providing
fault injection to Cloud-provisioned machines are provided herein.
A method includes determining one or more fault conditions to be
associated with a fault injection implementation based on one or
more parameters associated with a request for the fault injection
implementation; generating a specification for a lifecycle of the
fault injection implementation based on the one or more fault
conditions; and executing the fault injection implementation in a
target system, wherein said executing comprises effecting the
lifecycle of the fault injection implementation according to the
generated specification.
Inventors: Deng; Yu (Yorktown Heights, NY); Mahindru; Ruchi (Elmsford, NY); Sailer; Anca (Scarsdale, NY); Sarkar; Soumitra (Cary, NC); Wang; Long (White Plains, NY)
Applicant: International Business Machines Corporation, Armonk, NY, US
Family ID: 57836178
Appl. No.: 14/804528
Filed: July 21, 2015
Current U.S. Class: 1/1
Current CPC Class: G06F 11/263 20130101; G06F 11/3414 20130101
International Class: G06F 11/263 20060101 G06F011/263; G06F 11/22 20060101 G06F011/22
Claims
1. A method, comprising: determining one or more fault conditions
to be associated with a fault injection implementation based on one
or more parameters associated with a request for the fault
injection implementation, wherein said determining is carried out
automatically by a decision-making component executing on a
hardware processor; generating a specification for a lifecycle of
the fault injection implementation based on the one or more fault
conditions, wherein said generating is carried out automatically by
a fault injection specification component executing on the hardware
processor and communicatively linked to the decision-making
component; and executing the fault injection implementation in a
target system, wherein said executing comprises effecting the
lifecycle of the fault injection implementation according to the
generated specification, and wherein said executing is carried out
automatically by a fault injection execution component executing on
the hardware processor and communicatively linked to the
decision-making component and the fault injection specification
component.
2. The method of claim 1, wherein the one or more fault conditions
comprise fault type.
3. The method of claim 1, wherein the one or more fault conditions
comprise fault occasion.
4. The method of claim 1, wherein the one or more fault conditions
comprise fault location.
5. The method of claim 1, wherein the one or more fault conditions
comprise target system workload.
6. The method of claim 1, wherein the one or more parameters
associated with the request comprise identification of the target
system.
7. The method of claim 1, wherein said determining further
comprises determining the one or more fault conditions to be
associated with the fault injection implementation based on
feedback provided by an individual issuing the request.
8. The method of claim 1, wherein said determining further
comprises determining the one or more fault conditions to be
associated with the fault injection implementation based on input
provided by a knowledge-based system.
9. The method of claim 1, wherein said determining further
comprises determining the one or more fault conditions to be
associated with the fault injection implementation based on input
provided by an error detection system.
10. The method of claim 1, wherein said determining further
comprises determining the one or more fault conditions to be
associated with the fault injection implementation based on input
provided by a cloud monitoring system.
11. The method of claim 1, wherein said determining the one or more
fault conditions to be associated with the fault injection
implementation comprises ranking multiple fault conditions based on
one or more variables.
12. The method of claim 11, wherein the one or more variables
comprise a likelihood of each of the multiple fault conditions
causing the target system to fail.
13. The method of claim 11, wherein the one or more variables
comprise severity of failures caused by each of the multiple fault
conditions.
14. The method of claim 11, wherein the one or more variables
comprise a frequency of usage of each of the multiple fault
conditions in systems analogous to the target system.
15. The method of claim 11, wherein the one or more variables
comprise a frequency of past usage of each of the multiple fault
conditions in the target system.
16. The method of claim 1, wherein at least one of (i) said
determining, (ii) said generating, and (iii) said executing is a
cloud-based service.
17. The method of claim 1, comprising: monitoring one or more
predetermined items of data during the fault injection
implementation in the target system.
18. The method of claim 17, comprising: outputting the one or more
predetermined items of data monitored during the fault injection
implementation to a database.
19. A computer program product, the computer program product
comprising a computer readable storage medium having program
instructions embodied therewith, the program instructions
executable by a computing device to cause the computing device to:
determine one or more fault conditions to be associated with a
fault injection implementation based on one or more parameters
associated with a request for the fault injection implementation;
generate a specification for a lifecycle of the fault injection
implementation based on the one or more fault conditions; and
execute the fault injection implementation in a target system,
wherein said executing comprises effecting the lifecycle of the
fault injection implementation according to the generated
specification.
20. A system comprising: a memory; and at least one processor
coupled to the memory and configured for: determining one or more
fault conditions to be associated with a fault injection
implementation based on one or more parameters associated with a
request for the fault injection implementation; generating a
specification for a lifecycle of the fault injection implementation
based on the one or more fault conditions; and executing the fault
injection implementation in a target system, wherein said executing
comprises effecting the lifecycle of the fault injection
implementation according to the generated specification.
Description
FIELD
[0001] The present application generally relates to information
technology, and, more particularly, to fault injection
techniques.
BACKGROUND
[0002] Fault injection (FI) is commonly used for evaluating the
resilience of systems. Existing FI approaches, however, involve a
significant amount of manual decision making, such as determining
what types of errors should be injected, when a fault should be
injected, which object, component, process, and/or
software-stack level should be the target of the fault injection,
which value and/or variable in the target object, component,
process, and/or software-stack level should be injected with what
erroneous value, and what workload should be used for fault
injection trials. Such approaches are therefore inefficient,
costly, and time-consuming to carry out.
SUMMARY
[0003] In one aspect of the present invention, techniques for
providing fault injection to Cloud-provisioned machines are
provided. An exemplary computer-implemented method can include
steps of determining one or more fault conditions to be associated
with a fault injection implementation based on one or more
parameters associated with a request for the fault injection
implementation; generating a specification for a lifecycle of the
fault injection implementation based on the one or more fault
conditions; and executing the fault injection implementation in a
target system, wherein said executing comprises effecting the
lifecycle of the fault injection implementation according to the
generated specification.
[0004] Another aspect of the invention or elements thereof can be
implemented in the form of an article of manufacture tangibly
embodying computer readable instructions which, when implemented,
cause a computer to carry out a plurality of method steps, as
described herein. Furthermore, another aspect of the invention or
elements thereof can be implemented in the form of an apparatus
including a memory and at least one processor that is coupled to
the memory and configured to perform noted method steps. Yet
further, another aspect of the invention or elements thereof can be
implemented in the form of means for carrying out the method steps
described herein, or elements thereof; the means can include
hardware module(s) or a combination of hardware and software
modules, wherein the software modules are stored in a tangible
computer-readable storage medium (or multiple such media).
[0005] These and other objects, features and advantages of the
present invention will become apparent from the following detailed
description of illustrative embodiments thereof, which is to be
read in connection with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
[0006] FIG. 1 is a diagram illustrating a system architecture,
according to an example embodiment of the invention;
[0007] FIG. 2 is a flow diagram illustrating techniques according
to an embodiment of the invention; and
[0008] FIG. 3 is a system diagram of an exemplary computer system
on which at least one embodiment of the invention can be
implemented.
DETAILED DESCRIPTION
[0009] As described herein, an aspect of the present invention
includes an intelligent service providing fault injection to
Cloud-provisioned machines. An example embodiment of the invention
can be implemented as a service that can be provided to customers,
administrators, or other users for automatically conducting fault
injection on specified machines, applications, and/or systems,
thereby minimizing human intervention. By way of example, the
service can be invoked via representational state transfer (REST)
application programming interface (API) requests.
[0010] At least one embodiment of the invention includes
intelligently generating multiple decisions required by a fault
injection process based on one or more inputs and forms of
feedback. The inputs for the decisions can be derived from
knowledge about the target machines, application and/or systems.
The knowledge can be derived, for example, from management tools,
monitoring capabilities, collected performance data, predefined
virtual machine (VM) images, history of the application and/or
system behavior, workload history, etc. As noted, one or more
embodiments of the invention can additionally include
feedback-based refinement of the decisions, carried out
iteratively, for example.
[0011] Additionally, at least one embodiment of the invention
includes generating a service that, when implemented, intelligently
performs the procedures required during a fault injection
lifecycle. In such a service, deployment steps adapt to the
specified target machines, applications, and/or systems. The
service also plans and sets up the given fault injection campaign
based on intelligent decision making as well as user-provided input
information. Further, as detailed herein, the service orchestrates
each fault injection experiment and generates a reliability
evaluation.
[0012] At least one embodiment of the invention includes utilizing
machine learning and data pertaining to past FI experiences to
process and generate decisions for present FI instances. Also, as
described herein, feedback from FI results can be leveraged to
refine the decision-making mechanism. In addition to intelligently
and automatically performing all procedures in the fault injection
life-cycle, one or more embodiments of the invention include
generating an overall specification of an entire given fault
injection process. Such a specification can include the workload
for each FI experiment, deployment steps, a strategy for driving the
FI life-cycle, fault specifications, etc.
[0013] FIG. 1 is a diagram illustrating a system architecture,
according to an embodiment of the invention. By way of
illustration, FIG. 1 depicts an FI request submitted by an
administrator and/or a customer (with parameters) that is received
via an API 102 and forwarded to a decision-making planner component
104 for processing. The parameters of the request may include, as
free-form strings or structured data, the target machines for the
fault injection, the configuration file for the fault injection
(including user name and password, for example), the purpose of the
fault injection, guidelines and/or heuristics on fault types, etc.
The decision-making planner component 104 includes a sub-component
(or engine) 106 for deciding fault type, a sub-component (or
engine) 108 for deciding fault occasion, a sub-component (or
engine) 110 for deciding fault location, and a sub-component (or
engine) 112 for deciding workload. As further detailed herein, the
decision-making planner component 104 outputs data to an FI plan
specification component 114, which forwards input to an FI plan
executor component 118.
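The request parameters listed above could, as one possibility, be carried in the JSON body of a REST call to API 102. The following is a minimal sketch under that assumption; the helper `build_fi_request` and the field names `target_machines`, `config_file`, `purpose`, and `fault_type_guidelines` are hypothetical, since the application does not define a request schema.

```python
import json


def build_fi_request(target_machines, config_file, purpose, fault_type_guidelines=None):
    """Assemble a hypothetical FI request body as a JSON string (schema is illustrative only)."""
    if not target_machines:
        raise ValueError("at least one target machine is required")
    body = {
        "target_machines": list(target_machines),
        # Path or identifier of the FI configuration file (which may hold credentials).
        "config_file": config_file,
        "purpose": purpose,
        # Free-form guidelines and/or heuristics on fault types.
        "fault_type_guidelines": list(fault_type_guidelines or []),
    }
    return json.dumps(body, sort_keys=True)


request = build_fi_request(
    ["vm-db-01", "vm-web-02"],
    "fi-config.yaml",
    "evaluate database failover",
    ["prefer network errors"],
)
print(request)
```

Sorting keys keeps the serialized body stable, which makes requests easy to log and compare.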
[0014] The FI plan executor component 118 includes a sub-component
120 for installing a central FI controller, a sub-component 122 for
deploying an FI agent and fault injector, a sub-component 124 for
setting up an FI campaign, and a sub-component 126 for managing the
life-cycle of an FI experiment. Additionally, the FI plan executor
component 118 provides data to cloud manipulation facilities 116 as
well as multiple databases. Such cloud manipulation facilities can
include VM provisioning capability, image management units,
software-defined networking, storage and compute resource
controllers, security authentication mechanisms, a workflow
orchestrator, etc. Such databases can include, for example, a cloud
data and/or monitoring data database 130, a fault and/or error
knowledge base 132, and an FI analysis results database 134, all of
which can receive feedback from users and/or administrators. By way
of example, and as depicted in FIG. 1, the FI plan executor
component 118 can provide monitoring data to the cloud data and/or
monitoring data database 130 as well as the fault and/or error
knowledge base 132. Additionally, the FI plan executor component
118 can provide an FI collection to the FI analysis results
database 134.
[0015] As additionally illustrated in FIG. 1, cloud data and/or
monitoring data database 130, fault and/or error knowledge base
132, and FI analysis results database 134 can each interact with
(and receive data from) an analytics engine 128, which receives
queries from the decision-making planner component 104.
[0016] By way of further description, component 102 exposes an API
for external entities to invoke the fault injection service. After
receiving a request, component 102 processes the request and
assigns a task to the decision-making planner 104. The
decision-making planner 104 uses analytics mechanisms to
automatically make decisions on fault injection, wherein such
decisions can include fault type (via sub-component 106), fault
occasion (via sub-component 108), fault location (via sub-component
110) and workload for the fault injection experiments (via
sub-component 112). Specifically, component 104 formulates certain
queries, issues the queries to the analytics engine 128, obtains
responses from the engine 128, and makes decisions based on the
responses.
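The division of labor among API 102, planner 104, specification 114, and executor 118 can be sketched as three functions chained together. This is a minimal illustration, assuming the analytics engine 128 is replaced by a plain dictionary keyed by (decision, target) and that execution only logs each step instead of calling cloud facilities; `plan`, `specify`, `execute`, and the step names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FaultConditions:
    fault_type: str  # decided by sub-component 106
    occasion: str    # decided by sub-component 108
    location: str    # decided by sub-component 110
    workload: str    # decided by sub-component 112


@dataclass
class FIPlanSpec:
    conditions: FaultConditions
    steps: list = field(default_factory=list)


def plan(request, analytics):
    """Planner 104: issue one query per decision to a stand-in analytics engine (a dict here)."""
    target = request["target"]
    answers = {d: analytics[(d, target)] for d in ("fault_type", "occasion", "location", "workload")}
    return FaultConditions(**answers)


def specify(conditions):
    """Specification 114: record the decisions plus the executor's lifecycle steps."""
    steps = ["install_controller", "deploy_agent_and_injector", "setup_campaign", "run_experiment"]
    return FIPlanSpec(conditions, steps)


def execute(spec):
    """Executor 118: run each step in order (stubbed as log entries instead of cloud calls)."""
    c = spec.conditions
    return [f"{step}: {c.fault_type} at {c.location} on {c.occasion}" for step in spec.steps]


analytics = {
    ("fault_type", "db"): "network_error",
    ("occasion", "db"): "request_received",
    ("location", "db"): "socket_library",
    ("workload", "db"): "read_heavy",
}
for line in execute(specify(plan({"target": "db"}, analytics))):
    print(line)
```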
[0017] Additionally, component 114 represents the output of the
decision-making planner 104; that is, all decisions made by the
planner 104 on fault injection are automatically documented in an
FI plan specification 114. Further, component 116 includes the
facilities and capabilities provided by the Cloud infrastructure and
leveraged by one or more embodiments of the invention for injecting
faults. These facilities and capabilities can include VM
provisioning capability, image management units, a software-defined
network, a storage and compute resource controller, security
authentication mechanisms, a workflow orchestrator, etc. Component
118 executes the FI plan 114 and performs the actual fault injection
by leveraging the cloud manipulation facilities 116.
Particularly, in one or more embodiments of the invention, at least
four tasks are carried out by component 118: installation of the
central FI controller (via sub-component 120), deployment of FI
agents and fault injectors onto the target machines (via
sub-component 122), setup of the fault injection campaigns (via
sub-component 124), and management of the FI experiment lifecycles
(via sub-component 126). Moreover, the executor 118 collects
monitoring and performance data of the target machines and the
cloud environment, as well as the fault injection-related data (for
example, fault type, fault occasion, fault location, workload,
etc.), and places such data into the cloud data/monitoring data
database 130 and the knowledge base on fault/error 132,
respectively.
[0018] Accordingly, database 130 represents a data repository that
stores the collected monitoring and performance data of the target
machines and the cloud environment. The collected data can be
derived from the FI plan executor 118 and the cloud facilities 116,
as well as from the users or admins directly. Also, knowledge base
132 represents a data repository that stores the knowledge on
faults and errors and other fault/error related data. The data can
be derived from the FI plan executor 118 and the cloud facilities
116, as well as from the users or admins directly. Further,
database 134 represents a data repository that stores the analysis
results produced by the analytics engine 128. Such results can be
used in future analyses by the analytics engine 128, and
users/admins can also use them for other purposes.
[0019] As noted above, decision-making planner component 104
includes sub-component 106 for deciding fault type, sub-component
108 for deciding fault occasion, sub-component 110 for deciding
fault location, and sub-component 112 for deciding workload.
Accordingly, sub-component 106 determines what type of fault to
inject. Fault types can include a hardware fault (for example, a
broken hardware device, random bit flips, etc.), a network error
(for example, a socket that is occupied, not released, and/or
forbidden, a misconfigured network firewall, a switch failure,
etc.), an application and/or middleware failure (for example, a
failure of individual database software processes, a failure of
WebSphere processes, etc.), a configuration error (for example, an
incorrect setup of database software, an erroneous setup of ports,
etc.), and/or incorrect parameters from workloads and/or user
inputs.
[0020] Additionally, sub-component 106 carries out automated and
intelligent decision-making based on, for example, learning what
types of errors were encountered for the target machine, system
and/or application and for the same type of applications in the
literature, as well as based on learning from the history of the
target machine, system and/or application accessed from
entity-specific databases (as well as other machines, systems
and/or applications accessed from separate sources) regarding the
distribution of past faults and/or errors of various types. Such
decision-making can also be based, for example, on Cloud data
pertaining to the target machine, system and/or application.
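One simple way to turn a history of past faults into a fault-type decision is to estimate the empirical distribution of fault types and inject the most frequent ones first. The sketch below assumes history records are dictionaries with a `fault_type` key; the record format, the type labels, and the top-k selection are illustrative assumptions rather than the application's method.

```python
from collections import Counter


def fault_type_distribution(history):
    """Estimate the relative frequency of each fault type in past fault/error records."""
    counts = Counter(record["fault_type"] for record in history)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {fault_type: n / total for fault_type, n in counts.items()}


def choose_fault_types(history, k=2):
    """Return the k most frequent fault types (ties broken alphabetically)."""
    dist = fault_type_distribution(history)
    return sorted(dist, key=lambda t: (-dist[t], t))[:k]


# Hypothetical history merged from the target's own records and similar systems.
history = [
    {"fault_type": "network_error"},
    {"fault_type": "configuration_error"},
    {"fault_type": "network_error"},
    {"fault_type": "middleware_failure"},
    {"fault_type": "network_error"},
    {"fault_type": "configuration_error"},
]
print(choose_fault_types(history))  # ['network_error', 'configuration_error']
```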
[0021] Based on such learning, sub-component 106 infers one or more
rules on given types of variables and configuration values. By way
of example, one such rule might state that each octet of an
Internet Protocol version 4 (IPv4) address cannot be greater than
255. Also, in addition to using random fault values, at least one
embodiment of the invention includes using appropriate fault types
and fault values derived from the above-noted learning to reduce the
FI space.
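A rule such as the 255 upper bound on each dotted-decimal component of an IPv4 address lets an injector derive a handful of targeted invalid values instead of sampling random ones. The sketch below is one possible reading, assuming boundary-value mutation as the strategy; `boundary_fault_values` is a hypothetical helper, while `ipaddress.IPv4Address` is the Python standard-library parser, used here only to confirm that each generated value breaks the rule.

```python
import ipaddress

OCTET_MAX = 255  # inferred rule: each IPv4 octet is at most 255


def is_valid_ipv4(text):
    """Check a dotted-decimal string against the rule using the standard library."""
    try:
        ipaddress.IPv4Address(text)
    except ValueError:  # AddressValueError is a subclass of ValueError
        return False
    return True


def boundary_fault_values(address):
    """Build a few faulty variants of a valid address that each break the rule in one place."""
    octets = address.split(".")
    faults = []
    for i in range(len(octets)):
        mutated = list(octets)
        mutated[i] = str(OCTET_MAX + 1)  # first value outside the valid range
        faults.append(".".join(mutated))
    faults.append(".".join(octets[:-1]))  # one octet missing
    return faults


faults = boundary_fault_values("10.0.0.12")
print(faults)  # ['256.0.0.12', '10.256.0.12', '10.0.256.12', '10.0.0.256', '10.0.0']
```

Five targeted values replace an unbounded space of random strings, which is the kind of FI-space reduction the paragraph describes.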
[0022] Further, as noted above, sub-component 108 determines when
to inject a fault. Accordingly, sub-component 108 carries out
automated and intelligent decision-making based on, for example,
observed and extracted (from relevant entity-specific databases,
for example) scenarios of error occurrences that were encountered
for the target machine, system and/or application and for the same
type of machines, systems and/or applications as encountered in
other data sources. The observed and extracted results may be
manually obtained by users via personal insights and/or interactive
tool-aided inspection, or may be automatically obtained through one
or more monitoring tools and/or discovery tools.
[0023] Sub-component 108 also carries out smart profiling of the
target machine, system and/or application to identify one or more
occasion points of known applications and/or middleware. For
example, certain applications and/or middleware have different
stages (for example, connecting, request received, metadata
retrieved, etc.), and such knowledge for known applications and/or
middleware can be leveraged by the fault injection service via
sub-component 108. Further, in at least one embodiment of the
invention, an FI occasion determination can be linked to certain
stages. Also, in a controlled environment (such as, for example, a
Cloud environment), at least one embodiment of the invention can
include leveraging monitoring infrastructure and known attributes
and/or tags available for the target machine, system and/or
application; for example, the stage information can be encoded as
an attribute or a tag available to the controlled environment.
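Linking a fault occasion to an application stage can be illustrated by scanning a stream of monitoring events that carry stage tags and triggering the injector when the chosen stage first appears. This is a minimal sketch; the event format (`t`, `stage`) and the function `run_with_occasion` are assumptions, with stage names taken from the examples of connecting, request received, and metadata retrieved.

```python
def run_with_occasion(events, occasion_stage, inject):
    """Scan monitored events and call inject() once, at the first event tagged with the chosen stage.

    Returns the timestamp at which the fault was injected, or None if the stage never occurred.
    """
    for event in events:
        if event.get("stage") == occasion_stage:
            inject(event)
            return event["t"]
    return None


# Hypothetical monitoring events carrying stage tags.
events = [
    {"t": 0.0, "stage": "connecting"},
    {"t": 0.4, "stage": "request_received"},
    {"t": 0.9, "stage": "metadata_retrieved"},
]
injected = []
print(run_with_occasion(events, "metadata_retrieved", injected.append))  # 0.9
```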
[0024] Additionally, one or more embodiments can also include
problem-driven profiling, wherein the user and/or administrator can
specify what problems and/or which parts of the machine, system
and/or application should be the focus of a given FI study. In
studying focused problems or parts of the target machine, system
and/or application, at least one embodiment of the invention
includes locating and/or identifying the correct workloads and
inputs during profiling, and then identifying the correct fault
occasions from the profiling. One example technique for identifying
correct workloads and inputs is to implement the FI service to
launch different types of workloads with different inputs, and to
monitor whether the focused parts or the parts related to the
focused problems are involved during the workload execution. The
monitored results can be saved so that when the FI service handles
another fault injection request, these results can be used as a
reference to help determine the correct workload and input.
Additionally, one example technique for identifying correct fault
occasions is to limit fault occasions to those periods when the
focused parts are executing.
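The profiling technique in this paragraph (launch workloads, note which parts they exercise, and restrict occasions to when focused parts run) can be sketched with execution traces recorded as (time step, component) pairs. The trace format, component names, and helper functions are hypothetical.

```python
def profile_workloads(traces, focus):
    """Keep the workloads whose execution trace touches at least one focused component."""
    return {name for name, trace in traces.items() if any(comp in focus for _, comp in trace)}


def focused_occasions(trace, focus):
    """Restrict candidate fault occasions to the time steps at which focused components run."""
    return [t for t, comp in trace if comp in focus]


# Hypothetical profiling traces: (time step, component that was executing).
traces = {
    "read_heavy": [(0, "web"), (1, "cache"), (2, "web")],
    "write_heavy": [(0, "web"), (1, "db"), (2, "replication"), (3, "web")],
}
focus = {"db", "replication"}
selected = profile_workloads(traces, focus)  # could be saved for reuse on later requests
print(selected)  # {'write_heavy'}
print(focused_occasions(traces["write_heavy"], focus))  # [1, 2]
```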
[0025] Referring again to decision-making planner component 104,
sub-component 110 determines where (for example, which target
component) and into what item (for example, specific values and/or
variables) to inject the fault. Also, sub-component 110
carries out automated and intelligent decision-making based on, for
example, observing and extracting (from relevant entity-specific
databases, for example) a distribution of error locations for the
target machine, system and/or application, as well as for similar
types of machines, systems and/or applications from separate data
sources. Accordingly, such distributions of different error
locations and/or target components can be inferred by learning from
historical data such as, for example, past error behavior of the
target machine, system and/or application as well as other similar
machines, systems and/or applications. Such error behavior can
include observations of system logs and application logs, monitored
message flows, a dump of call stacks at the failure point, an
application output, a metadata record in a middleware database,
etc.
[0026] In accordance with one or more embodiments of the invention,
there can be multiple fault locations for the same type of error.
By way of example, for a network error, at least one embodiment of
the invention can include injecting into a switch, a network
interface controller (NIC), a device driver, a system call, a
socket library, a library call, an application, etc. Such fault
location determinations can be made based on, for example, learning
from Cloud data on target machines, systems and/or applications, as
well as on learning from a knowledge base on faults and/or errors
(such as KB 132 in FIG. 1) pertaining to a multitude of
systems.
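Choosing among several injection layers for the same error type, as in the network-error example above, could rank the layers by how often the knowledge base recorded errors at each one. The sketch below assumes such per-layer counts are available; the counts and `rank_locations` are illustrative only.

```python
# Injection layers for a network error, as enumerated in the paragraph above.
NETWORK_ERROR_LOCATIONS = [
    "switch", "nic", "device_driver", "system_call",
    "socket_library", "library_call", "application",
]


def rank_locations(candidates, kb_counts):
    """Order candidate locations by how often errors were recorded there; ties keep table order."""
    # sorted() is stable, so locations with equal counts stay in their original order.
    return sorted(candidates, key=lambda loc: -kb_counts.get(loc, 0))


# Hypothetical counts of past network errors per layer, e.g., from a fault/error knowledge base.
kb_counts = {"socket_library": 7, "nic": 3, "application": 3}
print(rank_locations(NETWORK_ERROR_LOCATIONS, kb_counts)[:3])  # ['socket_library', 'nic', 'application']
```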
[0027] Also, at least one embodiment of the invention includes
prioritizing injection of multiple faults given constraints of time
and/or cost, as well as with the aim of improving FI efficiency. Such
prioritization can be based, for example, on the probability of the
given faults, impact (severity) of the given faults, and/or the
cost associated with the given faults (in terms of central
processing unit (CPU), memory, disk, monetary expense, etc.).
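The application names probability, severity, and cost as prioritization inputs but does not fix a formula. As one possible heuristic, swapped in here for illustration, the sketch below ranks faults by expected impact per unit cost (probability × severity ÷ cost) and greedily selects them until a budget is spent. This greedy rule is an approximation and does not guarantee the best possible selection; all fault names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CandidateFault:
    name: str
    probability: float  # estimated likelihood of the fault occurring
    severity: float     # impact if it occurs, e.g., on a 1-10 scale
    cost: float         # cost to inject, e.g., CPU-hours or dollars (must be > 0)


def prioritize(candidates, budget):
    """Greedily pick faults by expected impact per unit cost until the budget is spent."""
    ranked = sorted(candidates, key=lambda f: f.probability * f.severity / f.cost, reverse=True)
    chosen, spent = [], 0.0
    for fault in ranked:
        if spent + fault.cost <= budget:
            chosen.append(fault.name)
            spent += fault.cost
    return chosen


candidates = [
    CandidateFault("disk_full", 0.2, 8, 2),
    CandidateFault("db_process_crash", 0.1, 10, 1),
    CandidateFault("random_bit_flip", 0.01, 9, 4),
    CandidateFault("port_misconfiguration", 0.3, 4, 3),
]
print(prioritize(candidates, budget=4.0))  # ['db_process_crash', 'disk_full']
```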
[0028] In identifying the value and/or variable into which to
inject a given fault, at least one embodiment of the invention
includes identifying a set of values and/or variables from the
target component and selecting a value from the set. Such a value
might include, for example, a configuration attribute value, a
variable in the stack, a value in the control flow, etc. In one or
more embodiments of the invention, an exact variable need not be
utilized for evaluating resilience.
[0029] Referring again to the decision-making planner component 104
in FIG. 1, sub-component 112 determines the workload, that is, the
type of request mix and the amount of the load (for example,
requests per minute). Additionally, sub-component 112 carries out
automated and intelligent decision-making based on, for example,
automatically extracting workload data from historical data
pertaining to typical workloads for the target machine, system
and/or application, as well as for the same type of machines,
systems and/or applications. By way of example, such extracting can
result in repeating the average workload of the past week and/or
month for a given machine, system and/or application.
[0030] In one or more embodiments of the invention, target
components (such as fault locations) can be associated with certain
types of workloads. For example, if the injection of faults into
migration components is desired, a migration workload can be
utilized. By way of further example, if the injection of
resource-exhaustion faults is desired, heavy workloads can be
utilized. Such an embodiment of the invention can include
implementing and maintaining a table that maps fault components to
workloads (by type and amount).
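The two workload rules above, repeating a historical average and mapping target components to workload types, can be combined in a short sketch. The table entries, the sample format, and the `HEAVY_FACTOR` multiplier for resource-exhaustion faults are assumptions made for illustration.

```python
from collections import Counter
from statistics import mean

# Hypothetical table mapping fault components to the workload type that exercises them.
COMPONENT_WORKLOADS = {"migration": "migration", "memory": "heavy", "disk": "heavy"}
HEAVY_FACTOR = 3  # assumed multiplier for resource-exhaustion experiments


def average_workload(samples):
    """Average the request rate and request mix over historical samples (e.g., the past week)."""
    rate = mean(s["requests_per_minute"] for s in samples)
    mix = Counter()
    for s in samples:
        mix.update(s["mix"])  # adds the per-kind request counts
    total = sum(mix.values())
    return rate, {kind: n / total for kind, n in mix.items()}


def choose_workload(component, samples):
    """Pick the workload type from the table, then size it from the historical average."""
    kind = COMPONENT_WORKLOADS.get(component, "typical")
    rate, mix = average_workload(samples)
    if kind == "heavy":
        rate *= HEAVY_FACTOR
    return {"type": kind, "requests_per_minute": rate, "mix": mix}


# Hypothetical per-day samples: requests per minute and counts of each request kind.
samples = [
    {"requests_per_minute": 100, "mix": {"read": 80, "write": 20}},
    {"requests_per_minute": 140, "mix": {"read": 60, "write": 40}},
]
print(choose_workload("disk", samples))
```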
[0031] Referring back to FIG. 1, component 114 generates the
specification of a given fault injection plan. Such a specification
can be based, for example, on data pertaining to fault occasions
(such as data pertaining to a sequence of events, for example,
various functions, matching of data, etc.), data pertaining to fault
type, and data pertaining to fault location. Additionally, an FI
plan can include a specification of a given fault injection
experiment, a specification of the workload for each fault injection
experiment, steps for automatic deployment of the test bed and fault
injection infrastructure and/or tools, a strategy for driving the
life-cycle of an FI experiment from beginning to end (which can
include, for example, creating the scripts for each step in the
life-cycle), and documentation of the monitoring tools and logs that
are leveraged for collecting data.
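An FI plan with the contents listed above could be represented as nested records and serialized for the executor. The sketch below assumes Python dataclasses and JSON as the serialization format; the class and field names, script names, and monitoring sources are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class FaultSpec:
    fault_type: str
    location: str
    occasion: str


@dataclass
class ExperimentSpec:
    fault: FaultSpec
    workload: dict  # type and amount of load for this experiment


@dataclass
class FIPlan:
    deployment_steps: list   # automatic deployment of the test bed and FI tools
    experiments: list        # one ExperimentSpec per fault injection experiment
    lifecycle_scripts: list  # scripts that drive each experiment from beginning to end
    monitoring: list = field(default_factory=list)  # monitoring tools and logs used for data collection

    def to_json(self):
        """Serialize the plan, including nested experiment specs, as JSON."""
        return json.dumps(asdict(self), indent=2)


fi_plan = FIPlan(
    deployment_steps=["provision_vms", "install_fi_controller", "deploy_fi_agents"],
    experiments=[
        ExperimentSpec(
            FaultSpec("network_error", "socket_library", "request_received"),
            {"type": "read_heavy", "requests_per_minute": 120},
        )
    ],
    lifecycle_scripts=[
        "start_vm_images.sh", "start_workload.sh", "inject_fault.sh",
        "wait_for_completion.sh", "copy_logs.sh",
    ],
    monitoring=["syslog", "application_log"],
)
print(fi_plan.to_json())
```

`dataclasses.asdict` recurses into the list of nested dataclasses, so the whole plan serializes in one call.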
[0032] As additionally noted above in connection with the
description of FIG. 1, the FI plan executor component 118 includes
sub-component 120 for installing a central FI controller,
sub-component 122 for deploying an FI agent and fault injector,
sub-component 124 for setting-up an FI campaign, and sub-component
126 for managing the life-cycle of an FI experiment. Fault
injection requires precise coordination of the FI campaign setup,
workload execution, and the interception and injection performed by
the fault injector, especially when the workload executes across
multiple machines. Therefore, a central controller is typically
created when conducting fault injection. In the example embodiment
of the invention depicted in FIG. 1, sub-component 120 installs the
central FI controller. Accordingly, via sub-component 122 and
sub-component 124, the FI plan executor component 118 can carry out
deployment and experiment setup of an FI plan. As detailed herein,
a fault injection plan specifies steps of automatic deployment of
the test bed and fault injection infrastructure and/or tools. The
FI plan executor component 118 invokes the corresponding scripts
and workflows (via sub-component 122) to carry out the automatic
deployment using Cloud capabilities.
[0033] Additionally, the FI plan executor component 118 drives the
life-cycle of each fault injection experiment via sub-component 126.
Such actions include initializing the test bed machines and saving
the VM images of the initialized machines. Such initialization
includes installing the FI infrastructure and tools, as well as
Cloud monitoring tools, onto the machines with the target
applications and/or systems. Also, initialization includes
generating scripts for the life-cycle, which can include, for
example, scripts for starting the saved VM images, starting the
workload, injecting the fault, waiting for experiment completion,
and copying the logs and monitoring data.
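The lifecycle scripts listed above run in a fixed order, and logs and monitoring data are worth collecting even when an early step fails. The following minimal sketch drives such a sequence; the step names mirror the text, while `drive_experiment` and the stubbed callables are hypothetical stand-ins for generated scripts.

```python
def drive_experiment(steps, collect):
    """Run lifecycle steps in order, stop at the first failure, then always collect data.

    steps is a list of (name, callable) pairs; collect copies logs and monitoring data.
    Returns the names of the steps that ran to completion and an error message (or None).
    """
    completed, error = [], None
    for name, action in steps:
        try:
            action()
        except Exception as exc:  # record the failure and stop advancing the lifecycle
            error = f"{name}: {exc}"
            break
        completed.append(name)
    collect()  # logs and monitoring data are copied whether or not the run succeeded
    completed.append("copy_logs_and_monitoring_data")
    return completed, error


# Stubbed steps; in practice each would invoke a generated script (names are hypothetical).
steps = [
    ("start_saved_vm_images", lambda: None),
    ("start_workload", lambda: None),
    ("inject_fault", lambda: None),
    ("wait_for_completion", lambda: None),
]
print(drive_experiment(steps, collect=lambda: None))
```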
[0034] At least one embodiment of the invention additionally
includes providing a set of fault injector primitives for changing
certain types of values. Such fault injector primitives can
include, for example, simple fault injector primitives for certain
simple fault types (such as process crashes, etc.), as well as
complicated fault injectors for certain fault types (such as
communication errors, incorrect return values, etc.). Additionally,
at least one embodiment of the invention can include incorporating
custom fault injector primitives provided by users and/or
administrators.
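A registry of injector primitives that is open to user-supplied additions could look like the sketch below. For illustration it operates on in-process Python functions rather than on real processes or network links; the registry, the decorator `injector`, and both primitives are hypothetical.

```python
import functools

INJECTORS = {}  # fault type name -> injector primitive


def injector(fault_type):
    """Register a fault injector primitive; users could register custom primitives the same way."""
    def register(primitive):
        INJECTORS[fault_type] = primitive
        return primitive
    return register


@injector("incorrect_return_value")
def wrong_return(func, bad_value):
    """Wrap func so that it still runs but returns bad_value instead of its real result."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        func(*args, **kwargs)
        return bad_value
    return wrapper


@injector("communication_error")
def connection_failure(func):
    """Wrap func so that every call raises ConnectionError, as if a peer were unreachable."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        raise ConnectionError(f"injected fault in {func.__name__}")
    return wrapper


def fetch_quota(user):
    return 100


faulty = INJECTORS["incorrect_return_value"](fetch_quota, -1)
print(faulty("alice"))  # -1
```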
[0035] Also, one or more embodiments of the invention include
implementing low-latency detection of experiment completeness,
which utilizes monitoring tools to efficiently determine experiment
completeness. Additionally, in at least one embodiment of the
invention, feedback from the FI plan executor component 118 can be
applied as input to the decision-making for refinement. Feedback
from the FI plan executor 118 is stored in the data repositories
130 and 132. Subsequently, the decision-making planner 104 can use
the feedback stored in repositories 130 and 132 to perform refined
decision-making.
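Low-latency completeness detection can be approximated by polling a monitoring probe at a short interval and stopping as soon as it reports a terminal outcome. This is a minimal sketch, assuming a probe callable that returns None while the experiment runs; the function name, outcome strings, and default timings are hypothetical.

```python
import time


def wait_for_completion(probe, timeout_s=60.0, interval_s=0.5):
    """Poll a monitoring probe until it reports a terminal outcome or the timeout expires.

    probe() returns None while the experiment is still running, otherwise an outcome string.
    A shorter polling interval lowers detection latency at the cost of more monitoring calls.
    """
    start = time.monotonic()
    while time.monotonic() - start < timeout_s:
        outcome = probe()
        if outcome is not None:
            return outcome, time.monotonic() - start
        time.sleep(interval_s)
    return "timeout", time.monotonic() - start


# Hypothetical probe that reports completion on its third poll.
readings = iter([None, None, "completed"])
print(wait_for_completion(lambda: next(readings), timeout_s=5.0, interval_s=0.01)[0])  # completed
```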
[0036] Further, as detailed herein, intelligent FI result analysis
is carried out by the analytics engine 128. In one or more
embodiments of the invention, the analytics engine 128 combines
multiple analysis methods to answer reliability-related queries.
Such analysis methods can include, for example, generating
histograms of outcome cases (crash, hang, fail-silent violation,
success, and finer-grained outcome categories), log correlation for
tracing error propagation, clustering of faults or failures, and
correlation analysis among fault type, outcome category, workload
scenario, and various metrics. Such an embodiment of the invention
includes selecting a particular combination of analysis methods to
answer a given query; that is, in at least one embodiment of the
invention, a query-specific combination of methods is used.
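The following Python sketch illustrates two of the analysis methods named above, an outcome histogram and a fault-type versus outcome cross-tabulation, together with a query-specific selection of methods. The record format, the query names and the `QUERY_METHODS` mapping are hypothetical; log correlation and clustering are omitted.

```python
from collections import Counter
from typing import Callable, Dict, List

# One record per FI run, e.g. {"fault_type": "process_crash", "outcome": "hang"}.
Record = Dict[str, str]


def outcome_histogram(records: List[Record]) -> Counter:
    """Count runs per outcome category (crash, hang, success, ...)."""
    return Counter(r["outcome"] for r in records)


def fault_outcome_table(records: List[Record]) -> Counter:
    """Cross-tabulate fault type against outcome category."""
    return Counter((r["fault_type"], r["outcome"]) for r in records)


# Hypothetical mapping from a query to the combination of methods that answers it.
QUERY_METHODS: Dict[str, List[Callable[[List[Record]], Counter]]] = {
    "overall_reliability": [outcome_histogram],
    "which_faults_cause_hangs": [outcome_histogram, fault_outcome_table],
}


def answer(query: str, records: List[Record]) -> List[Counter]:
    """Run the query-specific combination of analysis methods."""
    return [method(records) for method in QUERY_METHODS[query]]
```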
[0037] Also, in one or more embodiments of the invention, the
analytics engine 128 can implement long-term analysis. By way of
example, if the same fault injection is run multiple times at
different points in time, different outcomes may result, and such
data can be used for long-term analysis. Additionally, at least one
embodiment of the invention can include automatically scheduling
fault injection experiments based on certain conditions and/or
temporal parameters. Example conditions can include a new code
release, the passage of a certain period of time relevant to
software aging, etc., and example temporal parameters can include
the time elapsed between fault injections.
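Scheduling conditions of this kind can be expressed as a predicate that a scheduler evaluates periodically. In the sketch below, the field names and thresholds are assumptions chosen for illustration. The software-aging check fires only if no experiment has run since the target crossed the aging threshold, so it does not retrigger on every evaluation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class SchedulerState:
    last_run: Optional[datetime]  # when the last FI experiment ran (None if never)
    last_release_seen: str        # code release covered by the last experiment
    current_release: str          # code release currently deployed
    uptime: timedelta             # how long the target has been running


def should_schedule(state: SchedulerState, now: datetime,
                    min_interval: timedelta, aging_threshold: timedelta) -> bool:
    """Decide whether to launch a new FI experiment now."""
    if state.last_run is None:
        return True
    # Condition: a new code release has been deployed since the last run.
    if state.current_release != state.last_release_seen:
        return True
    # Condition: software aging; trigger once after the threshold is crossed.
    reached_aging_at = now - state.uptime + aging_threshold
    if state.uptime >= aging_threshold and state.last_run < reached_aging_at:
        return True
    # Temporal parameter: minimum time elapsed between fault injections.
    return now - state.last_run >= min_interval
```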
[0038] FIG. 2 is a flow diagram illustrating techniques according
to an embodiment of the present invention. Step 202 includes
determining one or more fault conditions to be associated with a
fault injection implementation based on one or more parameters
associated with a request for the fault injection implementation,
wherein said determining is carried out automatically by a
decision-making component executing on a hardware processor. The
fault conditions can include fault type, fault occasion, fault
location, and/or target system workload. Additionally, the one or
more parameters associated with the request can include
identification of the target system.
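The four fault conditions named in step 202 map naturally onto a small record type. The sketch below enumerates candidate conditions for the target identified in the request; the `catalog` dictionary stands in for whatever knowledge the decision-making component actually draws on, and its keys, like the request keys, are assumptions.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List


@dataclass(frozen=True)
class FaultCondition:
    fault_type: str  # what to inject, e.g. "process_crash"
    occasion: str    # when to inject, e.g. "startup"
    location: str    # where to inject, e.g. a component of the target
    workload: str    # workload running on the target at injection time


def candidate_conditions(request: Dict[str, object],
                         catalog: Dict[str, List[str]]) -> List[FaultCondition]:
    """Enumerate fault conditions for the target system named in the request.

    If the request restricts the locations, only those are used; otherwise
    every component listed in the (hypothetical) catalog is considered.
    """
    target = request["target_system"]
    locations = request.get("locations") or catalog["components"]
    return [FaultCondition(t, o, f"{target}/{loc}", w)
            for t, o, loc, w in product(catalog["fault_types"],
                                        catalog["occasions"],
                                        locations,
                                        catalog["workloads"])]
```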
[0039] Determining the one or more fault conditions can also be
based, for example, on feedback provided by an individual issuing
the request, input provided by a knowledge-based system (which
surveys literature to determine fault conditions), input provided
by an error detection system, and/or input provided by a cloud
monitoring system. Additionally, determining the one or more fault
conditions can include ranking multiple fault conditions based on
one or more variables. Such variables can include, for example, a
likelihood of each of the multiple fault conditions causing the
target system to fail, severity of failures caused by each of the
multiple fault conditions, a frequency of usage of each of the
multiple fault conditions in systems analogous to the target
system, a frequency of past usage of each of the multiple fault
conditions in the target system, and/or one or more user-specified
rules.
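Ranking over several variables is commonly done with a weighted score; the sketch below uses that approach as one possibility, not as the ranking method of this disclosure. The weights, the 0 to 1 normalization, and the choice to penalize conditions already exercised often on the target (the negative `past_usage` weight) are all assumptions; user-specified rules are modeled as filters.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class ScoredCondition:
    name: str
    likelihood: float        # estimated chance of making the target fail, 0..1
    severity: float          # normalized severity of the resulting failure, 0..1
    analog_frequency: float  # normalized usage in analogous systems, 0..1
    past_usage: float        # normalized past usage on this target, 0..1


# Illustrative weights; the negative past_usage weight favors conditions
# that have rarely been exercised on this target (an assumption).
DEFAULT_WEIGHTS: Dict[str, float] = {"likelihood": 0.4, "severity": 0.3,
                                     "analog_frequency": 0.2, "past_usage": -0.1}

Rule = Callable[[ScoredCondition], bool]  # user rule: True keeps the condition


def rank_conditions(conds: List[ScoredCondition],
                    weights: Dict[str, float] = DEFAULT_WEIGHTS,
                    rules: Sequence[Rule] = ()) -> List[ScoredCondition]:
    """Filter by user rules, then sort by weighted score, highest first."""
    kept = [c for c in conds if all(rule(c) for rule in rules)]

    def score(c: ScoredCondition) -> float:
        return sum(w * getattr(c, field) for field, w in weights.items())

    return sorted(kept, key=score, reverse=True)
```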
[0040] Step 204 includes generating a specification for a lifecycle
of the fault injection implementation based on the one or more
fault conditions, wherein said generating is carried out
automatically by a fault injection specification component
executing on the hardware processor and communicatively linked to
the decision-making component. Step 206 includes executing the
fault injection implementation in a target system, wherein said
executing comprises effecting the lifecycle of the fault injection
implementation according to the generated specification, and
wherein said executing is carried out automatically by a fault
injection execution component executing on the hardware processor
and communicatively linked to the decision-making component and the
fault injection specification component.
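The chaining of the three components in steps 202, 204 and 206 can be sketched as follows. The class names, the single placeholder decision policy, and the phase names are hypothetical; real components would consult the data sources discussed above and drive actual Cloud machines rather than in-process callbacks.

```python
from typing import Callable, Dict, List

Condition = Dict[str, str]
Lifecycle = Dict[str, object]


class DecisionMaker:
    def determine(self, request: Dict[str, str]) -> List[Condition]:
        # Placeholder policy: one crash fault on the requested target system.
        return [{"fault_type": "process_crash", "location": request["target_system"]}]


class SpecGenerator:
    PHASES: List[str] = ["start_vms", "start_workload", "inject_fault",
                         "await_completion", "collect_logs"]

    def generate(self, conditions: List[Condition]) -> List[Lifecycle]:
        # One lifecycle per fault condition, each with the same ordered phases.
        return [{"condition": c, "phases": list(self.PHASES)} for c in conditions]


class Executor:
    def __init__(self, handlers: Dict[str, Callable[[Condition], None]]):
        self.handlers = handlers  # one callback per lifecycle phase

    def execute(self, spec: List[Lifecycle]) -> None:
        for lifecycle in spec:
            for phase in lifecycle["phases"]:
                self.handlers[phase](lifecycle["condition"])


def run_fault_injection(request: Dict[str, str], decider: DecisionMaker,
                        generator: SpecGenerator, executor: Executor) -> List[Lifecycle]:
    conditions = decider.determine(request)  # step 202
    spec = generator.generate(conditions)    # step 204
    executor.execute(spec)                   # step 206
    return spec
```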
[0041] The techniques depicted in FIG. 2 can additionally include
monitoring one or more predetermined items of data during the fault
injection implementation in the target system, and outputting the
one or more predetermined items of data monitored during the fault
injection implementation to a database. Further, in at least one
embodiment of the invention, step 202, step 204, and/or step 206
can be provided as a cloud-based service.
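Persisting the predetermined monitoring items can be as simple as a narrow table keyed by experiment. The sketch below uses Python's built-in `sqlite3` module as a stand-in for whichever database an implementation uses; the table name, its layout and the sample format are assumptions.

```python
import sqlite3
from typing import Dict, Iterable


def record_monitoring(conn: sqlite3.Connection, experiment_id: str,
                      samples: Iterable[Dict[str, float]],
                      items: Iterable[str]) -> int:
    """Store only the predetermined monitoring items from each sample.

    Each sample is a dict of metric name -> value collected during the run;
    metrics not listed in 'items' are ignored. Returns the rows inserted.
    """
    items = list(items)
    conn.execute("""CREATE TABLE IF NOT EXISTS fi_monitoring (
                        experiment_id TEXT, seq INTEGER, item TEXT, value REAL)""")
    rows = [(experiment_id, seq, item, sample[item])
            for seq, sample in enumerate(samples)
            for item in items if item in sample]
    conn.executemany("INSERT INTO fi_monitoring VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```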
[0042] Also, an additional embodiment of the invention includes
determining a set of multiple fault conditions to be associated
with a target system fault injection implementation based on one or
more parameters associated with a request for the fault injection
implementation, wherein said set of multiple fault conditions
comprises at least: (i) fault type, (ii) fault occasion, (iii)
fault location, and (iv) target system workload, and wherein said
determining is carried out by a decision-making component executing
on a hardware processor. Such an embodiment also includes
generating a specification for a lifecycle of the fault injection
implementation based on the set of multiple fault conditions,
wherein said generating is carried out by a fault injection
specification component executing on the hardware processor and
communicatively linked to the decision-making component.
Additionally, such an embodiment includes executing the fault
injection implementation in the target system, wherein said
executing comprises effecting the lifecycle of the fault injection
implementation according to the generated specification, and
wherein said executing is carried out by a fault injection
execution component executing on the hardware processor and
communicatively linked to the decision-making component and the
fault injection specification component. Further, such an
embodiment includes monitoring one or more predetermined items of
data during the fault injection implementation in the target
system, and outputting the one or more predetermined items of data
monitored during the fault injection implementation to a database
executing on the hardware processor and communicatively linked to
the execution component.
[0043] The techniques depicted in FIG. 2 can also, as described
herein, include providing a system, wherein the system includes
distinct software modules, each of the distinct software modules
being embodied on a tangible computer-readable recordable storage
medium. All of the modules (or any subset thereof) can be on the
same medium, or each can be on a different medium, for example. The
modules can include any or all of the components shown in the
figures and/or described herein. In an aspect of the invention, the
modules can run, for example, on a hardware processor. The method
steps can then be carried out using the distinct software modules
of the system, as described above, executing on a hardware
processor. Further, a computer program product can include a
tangible computer-readable recordable storage medium with code
adapted to be executed to carry out at least one method step
described herein, including the provision of the system with the
distinct software modules.
[0044] Additionally, the techniques depicted in FIG. 2 can be
implemented via a computer program product that can include
computer useable program code that is stored in a computer readable
storage medium in a data processing system, and wherein the
computer useable program code was downloaded over a network from a
remote data processing system. Also, in an aspect of the invention,
the computer program product can include computer useable program
code that is stored in a computer readable storage medium in a
server data processing system, and wherein the computer useable
program code is downloaded over a network to a remote data
processing system for use in a computer readable storage medium
with the remote system.
[0045] An aspect of the invention or elements thereof can be
implemented in the form of an apparatus including a memory and at
least one processor that is coupled to the memory and configured to
perform exemplary method steps.
[0046] Additionally, an aspect of the present invention can make
use of software running on a computer or workstation. With
reference to FIG. 3, such an implementation might employ, for
example, a processor 302, a memory 304, and an input/output
interface formed, for example, by a display 306 and a keyboard 308.
The term "processor" as used herein is intended to include any
processing device, such as, for example, one that includes a CPU
(central processing unit) and/or other forms of processing
circuitry. Further, the term "processor" may refer to more than one
individual processor. The term "memory" is intended to include
memory associated with a processor or CPU, such as, for example,
RAM (random access memory), ROM (read only memory), a fixed memory
device (for example, hard drive), a removable memory device (for
example, diskette), a flash memory and the like. In addition, the
phrase "input/output interface" as used herein, is intended to
include, for example, a mechanism for inputting data to the
processing unit (for example, mouse), and a mechanism for providing
results associated with the processing unit (for example, printer).
The processor 302, memory 304, and input/output interface such as
display 306 and keyboard 308 can be interconnected, for example,
via bus 310 as part of a data processing unit 312. Suitable
interconnections, for example via bus 310, can also be provided to
a network interface 314, such as a network card, which can be
provided to interface with a computer network, and to a media
interface 316, such as a diskette or CD-ROM drive, which can be
provided to interface with media 318.
[0047] Accordingly, computer software including instructions or
code for performing the methodologies of the invention, as
described herein, may be stored in associated memory devices (for
example, ROM, fixed or removable memory) and, when ready to be
utilized, loaded in part or in whole (for example, into RAM) and
implemented by a CPU. Such software could include, but is not
limited to, firmware, resident software, microcode, and the
like.
[0048] A data processing system suitable for storing and/or
executing program code will include at least one processor 302
coupled directly or indirectly to memory elements 304 through a
system bus 310. The memory elements can include local memory
employed during actual implementation of the program code, bulk
storage, and cache memories which provide temporary storage of at
least some program code in order to reduce the number of times code
must be retrieved from bulk storage during implementation.
[0049] Input/output or I/O devices (including, but not limited to,
keyboards 308, displays 306, pointing devices, and the like) can be
coupled to the system either directly (such as via bus 310) or
through intervening I/O controllers (omitted for clarity).
[0050] Network adapters such as network interface 314 may also be
coupled to the system to enable the data processing system to
become coupled to other data processing systems or remote printers
or storage devices through intervening private or public networks.
Modems, cable modems and Ethernet cards are just a few of the
currently available types of network adapters.
[0051] As used herein, including the claims, a "server" includes a
physical data processing system (for example, system 312 as shown
in FIG. 3) running a server program. It will be understood that
such a physical server may or may not include a display and
keyboard.
[0052] As will be appreciated by one skilled in the art, aspects of
the present invention may be embodied as a system, method and/or
computer program product. Accordingly, aspects of the present
invention may take the form of an entirely hardware embodiment, an
entirely software embodiment (including firmware, resident
software, micro-code, etc.) or an embodiment combining software and
hardware aspects that may all generally be referred to herein as a
"circuit," "module" or "system." Furthermore, as noted herein,
aspects of the present invention may take the form of a computer
program product that may include a computer readable storage medium
(or media) having computer readable program instructions thereon
for causing a processor to carry out aspects of the present
invention.
[0053] The computer readable storage medium can be a tangible
device that can retain and store instructions for use by an
instruction execution device. The computer readable storage medium
may be, for example, but is not limited to, an electronic storage
device, a magnetic storage device, an optical storage device, an
electromagnetic storage device, a semiconductor storage device, or
any suitable combination of the foregoing. A non-exhaustive list of
more specific examples of the computer readable storage medium
includes the following: a portable computer diskette, a hard disk,
a random access memory (RAM), a read-only memory (ROM), an erasable
programmable read-only memory (EPROM or Flash memory), a static
random access memory (SRAM), a portable compact disc read-only
memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a
floppy disk, a mechanically encoded device such as punch-cards or
raised structures in a groove having instructions recorded thereon,
and any suitable combination of the foregoing. A computer readable
storage medium, as used herein, is not to be construed as being
transitory signals per se, such as radio waves or other freely
propagating electromagnetic waves, electromagnetic waves
propagating through a waveguide or other transmission media (for
example, light pulses passing through a fiber-optic cable), or
electrical signals transmitted through a wire.
[0054] Computer readable program instructions described herein can
be downloaded to respective computing/processing devices from a
computer readable storage medium or to an external computer or
external storage device via a network, for example, the Internet, a
local area network, a wide area network and/or a wireless network.
The network may comprise copper transmission cables, optical
transmission fibers, wireless transmission, routers, firewalls,
switches, gateway computers and/or edge servers. A network adapter
card or network interface in each computing/processing device
receives computer readable program instructions from the network
and forwards the computer readable program instructions for storage
in a computer readable storage medium within the respective
computing/processing device.
[0055] Computer readable program instructions for carrying out
operations of the present invention may be assembler instructions,
instruction-set-architecture (ISA) instructions, machine
instructions, machine dependent instructions, microcode, firmware
instructions, state-setting data, or either source code or object
code written in any combination of one or more programming
languages, including an object oriented programming language such
as Java, Smalltalk, C++ or the like, and conventional procedural
programming languages, such as the "C" programming language or
similar programming languages. The computer readable program
instructions may execute entirely on the user's computer, partly on
the user's computer, as a stand-alone software package, partly on
the user's computer and partly on a remote computer or entirely on
the remote computer or server. In the latter scenario, the remote
computer may be connected to the user's computer through any type
of network, including a local area network (LAN) or a wide area
network (WAN), or the connection may be made to an external
computer (for example, through the Internet using an Internet
Service Provider). In some embodiments, electronic circuitry
including, for example, programmable logic circuitry,
field-programmable gate arrays (FPGA), or programmable logic arrays
(PLA) may execute the computer readable program instructions by
utilizing state information of the computer readable program
instructions to personalize the electronic circuitry, in order to
perform aspects of the present invention.
[0056] Aspects of the present invention are described herein with
reference to flowchart illustrations and/or block diagrams of
methods, apparatus (systems), and computer program products
according to embodiments of the invention. It will be understood
that each block of the flowchart illustrations and/or block
diagrams, and combinations of blocks in the flowchart illustrations
and/or block diagrams, can be implemented by computer readable
program instructions.
[0057] These computer readable program instructions may be provided
to a processor of a special purpose computer or other programmable
data processing apparatus to produce a machine, such that the
instructions, which execute via the processor of the computer or
other programmable data processing apparatus, create means for
implementing the functions/acts specified in the flowchart and/or
block diagram block or blocks. These computer readable program
instructions may also be stored in a computer readable storage
medium that can direct a computer, a programmable data processing
apparatus, and/or other devices to function in a particular manner,
such that the computer readable storage medium having instructions
stored therein comprises an article of manufacture including
instructions which implement aspects of the function/act specified
in the flowchart and/or block diagram block or blocks.
[0058] The computer readable program instructions may also be
loaded onto a computer, other programmable data processing
apparatus, or other device to cause a series of operational steps
to be performed on the computer, other programmable apparatus or
other device to produce a computer implemented process, such that
the instructions which execute on the computer, other programmable
apparatus, or other device implement the functions/acts specified
in the flowchart and/or block diagram block or blocks.
[0059] The flowchart and block diagrams in the figures illustrate
the architecture, functionality, and operation of possible
implementations of systems, methods, and computer program products
according to various embodiments of the present invention. In this
regard, each block in the flowchart or block diagrams may represent
a module, segment, or portion of instructions, which comprises one
or more executable instructions for implementing the specified
logical function(s). In some alternative implementations, the
functions noted in the block may occur out of the order noted in
the figures. For example, two blocks shown in succession may, in
fact, be executed substantially concurrently, or the blocks may
sometimes be executed in the reverse order, depending upon the
functionality involved. It will also be noted that each block of
the block diagrams and/or flowchart illustration, and combinations
of blocks in the block diagrams and/or flowchart illustration, can
be implemented by special purpose hardware-based systems that
perform the specified functions and/or acts or carry out
combinations of special purpose hardware and computer
instructions.
[0060] It should be noted that any of the methods described herein
can include an additional step of providing a system comprising
distinct software modules embodied on a computer readable storage
medium; the modules can include, for example, any or all of the
components detailed herein. The method steps can then be carried
out using the distinct software modules and/or sub-modules of the
system, as described above, executing on a hardware processor 302.
Further, a computer program product can include a computer-readable
storage medium with code adapted to be implemented to carry out at
least one method step described herein, including the provision of
the system with the distinct software modules.
[0061] In any case, it should be understood that the components
illustrated herein may be implemented in various forms of hardware,
software, or combinations thereof, for example, application
specific integrated circuit(s) (ASICs), functional circuitry, an
appropriately programmed digital computer with associated memory,
and the like. Given the teachings of the invention provided herein,
one of ordinary skill in the related art will be able to
contemplate other implementations of the components of the
invention.
[0062] The terminology used herein is for the purpose of describing
particular embodiments only and is not intended to be limiting of
the invention. As used herein, the singular forms "a," "an" and
"the" are intended to include the plural forms as well, unless the
context clearly indicates otherwise. It will be further understood
that the terms "comprises" and/or "comprising," when used in this
specification, specify the presence of stated features, steps,
operations, elements, and/or components, but do not preclude the
presence or addition of another feature, step, operation, element,
component, and/or group thereof.
[0063] At least one aspect of the present invention may provide a
beneficial effect such as, for example, performing automatic fault
injection upon target machines, systems and/or applications to
minimize user intervention.
[0064] The descriptions of the various embodiments of the present
invention have been presented for purposes of illustration, but are
not intended to be exhaustive or limited to the embodiments
disclosed. Many modifications and variations will be apparent to
those of ordinary skill in the art without departing from the scope
and spirit of the described embodiments. The terminology used
herein was chosen to best explain the principles of the
embodiments, the practical application or technical improvement
over technologies found in the marketplace, or to enable others of
ordinary skill in the art to understand the embodiments disclosed
herein.