U.S. patent application number 15/222881 was filed with the patent office on 2016-07-28 and published on 2017-02-02 as publication number 20170034016 for data analytics and management of computing infrastructures.
The applicant listed for this patent is Metriv, Inc. The invention is credited to Christopher Baker, Theodore A. Carroll, John Scumniotales, and Bruce Twito.
United States Patent Application 20170034016, Kind Code A1
Carroll; Theodore A.; et al.
February 2, 2017
Appl. No.: 15/222881
Family ID: 57886129
DATA ANALYTICS AND MANAGEMENT OF COMPUTING INFRASTRUCTURES
Abstract
Methods, systems, and techniques for analyzing and managing time
series workload data are provided. Example embodiments provide a
Data Management and Analysis platform that enables end users to
modernize their system configurations to incorporate external
services techniques, such as cloud technologies, and to utilize
virtualization technologies to host some of their functions in a
manner that improves the overall performance of their
configurations. This may be particularly useful in an IT
environment with many interdependent systems where it is hard to
analyze and determine where externally hosted or hybrid (e.g., both
remote and local) systems might improve the overall delivery of
services and cost to end users.
Inventors: Carroll; Theodore A. (Seattle, WA); Twito; Bruce (Lake Forest Park, WA); Scumniotales; John (Bainbridge Island, WA); Baker; Christopher (Clyde Hill, WA)
Applicant: Metriv, Inc.; Seattle, WA, US
Family ID: 57886129
Appl. No.: 15/222881
Filed: July 28, 2016
Related U.S. Patent Documents:
Application No. 62/198,052, filed Jul. 28, 2015
Current U.S. Class: 1/1
Current CPC Class: H04L 41/0813 (20130101); H04L 43/026 (20130101); H04L 43/04 (20130101); H04L 41/142 (20130101); H04L 41/0853 (20130101)
International Class: H04L 12/26 (20060101); H04L 12/24 (20060101)
Claims
1. A method for automatically analyzing performance of a computing
system configuration comprising: installing a plurality of sensors
into the computing system configuration having a plurality of
computing nodes, wherein each sensor is one of a native sensor that
receives data directly from an operating system of a computing node
of the computing system configuration, a multiplexing sensor that
is a pluggable service configured to discover and collect
configuration information from multiple computing nodes in the
computing system configuration using Application Programming
Interfaces (APIs) of environments located behind a firewall, or a
cloud sensor that collects performance information from APIs
available from a cloud-based service, wherein the sensors are
hardware, software, or a combination of both; and under
control of a computing system, receiving data from the plurality of
installed sensors; automatically aggregating the received data and
performing statistical analytics on the aggregated data regarding
one or more of resource requirements, interdependencies, and
efficacy of the computing nodes of the computing system
environment; and forwarding the results of the statistical
analytics as recommendations for configuration changes.
2. The method of claim 1 wherein the forwarding the results of the
statistical analytics as recommendations further comprises: under
control of the computing system, automatically determining which
nodes or interdependent sets of nodes are most appropriate to
consider for migration to cloud technologies; automatically
determining a recommended set of steps to update and move the
determined nodes to the cloud; automatically calculating a
projection of costs for moving the determined nodes to the cloud;
and outputting an indication of the determined nodes to consider
for migration, the recommended steps, and the calculated projection
of costs.
3. The method of claim 2 wherein the automatically determining
which nodes are most appropriate to consider for migration to cloud
technologies incorporates input from user-solicited data from
metadata or questionnaires regarding types of workloads and
environments that users considered appropriate to migrate.
4. The method of claim 1 wherein the forwarding the results of the
statistical analytics as recommendations further comprises: under
control of the computing system, automatically determining which
nodes or interdependent sets of nodes are most appropriate to
consider for migration to virtualized operating environments;
automatically determining a recommended set of steps to update and
move the determined nodes to virtualized operating environments;
automatically calculating a projection of costs for moving the
determined nodes to the virtualized operating environments; and
outputting an indication of the determined nodes to consider for
migration, the recommended steps, and the calculated projection of
costs.
5. The method of claim 4 wherein the automatically determining
which nodes are most appropriate to consider for migration to
virtualized operating environments incorporates input from
user-solicited data from metadata or questionnaires regarding types of
workloads and environments that users considered appropriate to
migrate.
6. The method of claim 1 wherein the performing statistical
analytics on the aggregated data regarding the efficacy of
different environments for the workload of the computing nodes of
the computing system environment further utilizes machine learning
algorithms to aid in determining which nodes are most appropriate
to consider for migration to cloud technologies or to
virtualization environments.
7. The method of claim 1 wherein the performing statistical
analytics further comprises: using statistical techniques,
determining whether a user might become interested in configuration
of a particular node; and when it is determined that the user might
become interested in configuration of the particular node, adjusting
a data collection interval and report-to-server interval to provide
appropriate balance between measurement frequency and data volume
for the particular node.
8. The method of claim 1, further comprising: assigning a category
to each of the computing nodes according to a type of work
performed at the computing node; and alerting an end user to
anomalies by alerting the user when a computing node appears to
depart from the type of work performed according to the category
assigned to that computing node.
9. The method of claim 8, further comprising integrating anomalies
into the forwarding the results of the statistical analytics as
recommendations for configuration changes.
10. The method of claim 8 wherein the categories reflect at least
one of a SQL Server, a database server, a web server, an end-user
computer, a containerization server, a virtualization server, an
application server, a cloud-based service, a microservices server,
an LDAP server, a DNS server, a file server, or another server or
containerized service having a role in a system with interdependent
services on multiple nodes.
11. A computer-readable storage medium comprising contents that,
when executed, instruct a computer processor to perform the method
of claim 1.
12. A sensor-based data management and analytics system comprising:
a plurality of sensors, each sensor comprised of hardware,
software, or a combination of both, wherein each sensor is
installed into a computing system configuration, and wherein each
sensor is configured as a native sensor that receives data directly
from an operating system of a computing node of the computing
system configuration, a multiplexing sensor that is a pluggable
service configured to discover and collect configuration
information from multiple computing nodes in the computing system
configuration using Application Programming Interfaces (APIs) of
environments located behind a firewall, or a cloud sensor that
collects performance information from APIs available from a
cloud-based service; a cloud-based service structured to: receive
data from the plurality of installed sensors; automatically
aggregate the received data and perform statistical analytics on
the aggregated data regarding the efficacy of different
environments for the workload of the computing nodes of the
computing system environment; and forward the results of the
statistical analytics as recommendations for configuration
changes.
13. The data management and analytics system of claim 12 wherein
the recommendations for configuration changes comprise a
recommendation to migrate a node to a cloud-based environment or a
recommendation to migrate a node to be a hosted environment using
virtualization technologies.
14. The data management and analytics system of claim 12 wherein
the recommendations for configuration changes further comprise: a
determination of which nodes or interdependent sets of nodes are
most appropriate to consider for migration to cloud technologies; a
determination of a recommended set of steps to update and move the
determined nodes to the cloud; a calculation of a projection of
costs for moving the determined nodes to the cloud; and an
indication of the determined nodes to consider for migration, the
recommended steps, and the calculated projection of costs.
15. The data management and analytics system of claim 12 wherein
the cloud-based service further comprises: data services structured
to receive and send data to the plurality of installed sensors; a
query engine structured with a user interface that provides a query
language and executes queries to determine attributes of one or
more of the computing nodes based upon data received from the
plurality of installed sensors; an analytics service structured to
provide statistical analytics and predictive analytics based upon
data received from the plurality of installed sensors or specified
by an executed query; and data storage structured to store data
received from the plurality of installed sensors.
16. The data management and analytics system of claim 12 wherein
the plurality of sensors include sensors for mobile devices and
cloud-hosted devices.
17. A sensor installed in an operating system of a computing
system, the sensor configured as a native sensor in the operating
system and configured to obtain and communicate data regarding the
resource requirements, interdependencies, and efficacy of the
computing system in a current state of the computing system even
when the computing system is migrated to an environment located
behind a firewall such that the operating system is hosted on the
environment located behind a firewall or even when the computing
system is migrated to a cloud-hosted environment such that the
operating system is hosted on the cloud-hosted environment.
18. The sensor of claim 17 wherein the data is communicated
according to a set of received policies.
19. The sensor of claim 17 wherein the data obtained from the
sensor is aggregated with data from other sensors when the
computing system is migrated to the environment located behind a
firewall or to the cloud-based environment.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional
Patent Application No. 62/198,052, entitled "DATA ANALYTICS AND
MANAGEMENT," filed Jul. 28, 2015, which is incorporated herein by
reference in its entirety.
TECHNICAL FIELD
[0002] The present disclosure relates to methods, techniques, and
systems for analyzing and managing data and, in particular, to
methods, techniques, and systems for sensing, analyzing, and
managing time series workload data.
BACKGROUND
[0003] Computing system environments have become incredibly
complicated as many involve a variety of types of hardware,
software, nodes, use of external systems, and the like. It has
become hard to decipher when it is appropriate to take advantage of
modernization techniques such as virtualization and cloud
offloading and for what systems given the interdependency of
various computational nodes and the difficulty of collecting,
correlating, and analyzing the myriad of data required to
understand dynamic system behaviors and resource use.
BRIEF DESCRIPTION OF THE DRAWINGS
[0004] The patent or application file contains at least one drawing
executed in color. Copies of this patent or patent application
publication with color drawing(s) will be provided by the Office
upon request and payment of the necessary fee.
[0005] FIG. 1A is an overview block diagram of a Data Management
and Analytics Platform.
[0006] FIG. 1B is an overview block diagram of an example
embodiment of a Data Management and Analytics Platform.
[0007] FIG. 2 is a block diagram of an example embodiment of a
native sensor architecture in an example Data Management and
Analytics Platform.
[0008] FIG. 3 is a block diagram of an example embodiment of a
multiplexing sensor architecture in an example Data Management and
Analytics Platform.
[0009] FIG. 4 is a block diagram of an example embodiment of a
cloud sensor architecture in an example Data Management and
Analytics Platform.
[0010] FIG. 5 is a block diagram of an example embodiment of a
Secure Data Services (web service) architecture that processes
information from sensors in an example Data Management and
Analytics Platform.
[0011] FIG. 6 is a block diagram of an example embodiment of an
analytics website for accessing functionality of an example Data
Management and Analytics Platform.
[0012] FIG. 7 is an example chart with data showing physical
servers in a computing environment that could be virtualized.
[0013] FIG. 8 is an example schematic illustrating the MQL
execution environment.
[0014] FIG. 9 is a block diagram showing how MQL converts a query
into an executable plan.
[0015] FIG. 10 is a block diagram illustrating the execution
process of an MQL query.
[0016] FIG. 11 is a block diagram illustrating different types of
workload management for MQL execution.
[0017] FIG. 12 is an example block diagram of a computing system
for practicing embodiments of a Data Analytics and Management
Platform.
[0018] FIG. 13 is an example flow diagram of logic for
interoperating with specialized sensors to obtain configuration and
system information.
DETAILED DESCRIPTION
[0019] Embodiments described herein provide enhanced computer- and
network-based methods, techniques, and systems for analyzing and
managing time series workload data to enable users to understand
and modernize their system configurations to incorporate external
services techniques, such as cloud technologies, and to utilize
virtualization technologies to host some of their functions in a
manner that reduces costs, increases reliability, updates
technology, and improves the overall performance of their
configurations. This may be particularly useful in an IT
environment with many interdependent systems where it is hard to
analyze and determine where externally hosted or hybrid (e.g., both
remote and local) systems might improve the overall delivery of
services and cost to end users.
[0020] Example embodiments provide a Data Management and Analytics
Platform ("DMAP") 100 as shown in FIG. 1, which enables users such
as system administrators to determine which parts of the
configuration are better migrated to external services (such as
cloud based and virtualized technologies) and which are better to
remain as locally served. The DMAP 100 itself contains a set of
cloud based services that continuously store, collect, and process
data from a set of sensors (hardware, software, or both) and make
the data available for search using a proprietary search engine and
query language and utilize a machine learning complemented analysis
to understand the obtained sensor data. For example, Metriv Secure
Data Service 121 and Big Data Store 120 store, collect, and process
data from aggregation, native, and cloud sensors 104, 105, and 112,
respectively. Analytics engine 125, using job manager 122 to
implement MQL queries 124 and machine learning tool 123, analyzes
the obtained data. Once the data from the sensors is analyzed, the
DMAP can recommend cloud and virtualization modernization
strategies that provide more efficient (and less costly) storage
and performance by utilizing cloud and virtualized technologies
such as Amazon AWS, VMware, Azure, OpenStack, vCloud Air, Docker,
Citrix XenApp, and Microsoft Hyper-V. The machine learning analysis
tools 123 of the DMAP provide advanced predictive analytics,
modernization costing and planning, so that better decisions with
respect to use of public/private clouds and virtualization can be
incorporated into an overall system configuration (such as an IT
infrastructure of a corporation).
[0021] Also, although certain terms are used primarily herein,
other terms could be used interchangeably to yield equivalent
embodiments and examples. In addition, terms may have alternate
spellings which may or may not be explicitly mentioned, and all
such variations of terms are intended to be included.
Metriv--an Example Data Management and Analytics Platform
[0022] Overview
[0023] Metriv is an advanced data management and analytics platform
that allows its users to visualize their IT infrastructure and
create data-based plans to modernize and optimize IT based on
critical time-series system, service, and application workload.
[0024] Metriv includes a set of cloud-based services that collect
and process data collected by Metriv Sensors and makes the data
available for search and charting on the Metriv Analytics website
and to third-party business intelligence (BI) and IT operations
management tools such as Tableau, Splunk, IBM Tivoli, Microsoft
System Center and more. Data rendering, chart rendering and search
are accomplished with the domain-specific Metriv Query Language
(MQL).
[0025] FIG. 1B shows the Metriv high-level architecture.
[0026] Data Analytics and Management Platform 150 comprises a
variety of web-based components for managing data retrieved from
various components of an IT infrastructure. Metriv sensors
communicate with a publicly accessible web service called Metriv
Secure Data Services (e.g., Mojo) that handles policy, inventory,
data collection and data rollup. When data is available from
cloud-accessible APIs (e.g. AWS, Azure, DNS), Metriv uses its own
cloud-based sensor. When API data is behind enterprise firewalls,
such as VMware discovery and internal DNS, the Metriv Sensor Hub,
installed behind the firewall, collects and multiplexes data then
forwards the data to Metriv. For data accessible only when running
locally in the native operating system (e.g. process details,
network communications, performance data, installed packages), the
Metriv Native Sensor is installed on either Windows or Linux
platforms then sends its compressed summary data to Metriv Secure
Data Services. In the example described these are software sensors
although in other embodiments, the sensors could be hardware or
some mix of hardware, firmware, or software.
[0027] The primary user interface to the system is an analytics web
site which handles user login, charting, dashboards, scenarios,
alerting, and query building. The analytics web site generates
queries to render reports, graphs, and dashboards. These queries,
expressed in the MQL (Metriv Query Language), are submitted to a
distributed query engine for parallel execution. MQL is a rich,
user-accessible query and statistics language patterned after the
Unix pipe model (which filters data through a series of pipes--or
programs) with rich built-in charting capabilities.
[0028] Appendix A, incorporated herein by reference in its
entirety, includes a set of screen displays for an example Metriv
Data Management and Analytics Platform.
[0029] Appendix B, incorporated herein by reference in its
entirety, is a User Guide for the MQL (Metriv Query Language) also
referred to as Mojo. MQL is used for specifying queries to generate
charts and reports to provide user accessible analytics.
[0030] Appendix C, incorporated herein by reference in its
entirety, is a presentation directed to showing the benefits of
using the Metriv Data Management and Analytics Platform.
[0031] Metriv Sensors
[0032] For Metriv to provide insights into configuration,
workloads, and system inter-dependencies, the system needs data
from multiple sources. Metriv uses various types of sensors that
collect and transform data then deliver the data to the Metriv
Secure Data (web) Service, where it is further transformed,
summarized, and stored for access by Metriv Analytics and other
business intelligence (BI) and analytic solutions.
[0033] Two closely related concepts embodied in the system are that
of a sensor and node. A sensor collects data about one or more
nodes. A single node may have multiple sensors collecting data for
it. For example, a single node may have a native sensor collecting
detailed time series and inventory data and a VMware sensor that is
collecting VMware-specific data. Metriv integrates both sets of
data into a single node-oriented view.
[0034] Sensors collect information about one or more nodes and send
data to the Mojo service. The three major sensor types are:
[0035] Native Sensor--Native sensors gather information that can only
be retrieved by running natively within the OS.
[0036] Sensor Hub "Juju"--A multiplexed (aggregating) pluggable
service that runs behind the corporate firewall to discover and
collect information about the local environment (such as a
virtualization environment).
[0037] Cloud Sensor "Shango"--A Metriv-hosted sensor that collects
information from cloud-available APIs.
[0038] Metriv Sensors are designed to be ported to new platforms
with minimal changes.
[0039] Sensor installers are built by Metriv specifically for each
account. The sensor deployment package contains credentials to be
authorized to a specific Metriv account. No post-installation
configuration is required for the sensor to register so both
installation and installation automation are trivial.
[0040] Native Sensor for Windows and Unix
[0041] The native sensor runs locally on the native operating
system (OS) of virtual machines (VMs), cloud instances, or
non-virtualized hardware. The native sensor is a compact, efficient
service (daemon) that collects a multitude of information and can
be extended to collect more, often just with a simple policy
change. The native sensor may collect:
[0042] OS and hardware configuration and settings
[0043] System-level performance counters
[0044] Running processes and related performance counters
[0045] Network connections
[0046] File contents
[0047] On Windows the native sensor may also collect:
[0048] Installed applications
[0049] User activity, including active applications and web URLs accessed
[0050] Data collection is driven by policy. Only policy-requested
information is collected and monitored. Policies are extensible:
collection of new attributes (such as a registry entry on Windows) or
performance counters can generally be enabled with a simple policy
change. Performance counter and network connection statistics are
aggregated by the sensor, effectively reducing data volume yet
still allowing some number of minutes (e.g., 15-minute) level
detail with min, max, and mean values sampled at regular (for
example, 15-second) intervals.
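This aggregation scheme can be sketched as follows: a minimal illustration, assuming 15-second samples rolled into 15-minute windows, with all names hypothetical rather than Metriv's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Window:
    """A fixed-width rollup of samples carrying count, min, max, and mean."""
    count: int = 0
    total: float = 0.0
    min: float = float("inf")
    max: float = float("-inf")

    def add(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.min = min(self.min, value)
        self.max = max(self.max, value)

    @property
    def mean(self) -> float:
        return self.total / self.count

def rollup(samples, window_secs=900):
    """Aggregate (timestamp, value) samples into 15-minute (900 s) windows,
    reducing data volume while preserving min/max/mean detail."""
    windows = {}
    for ts, value in samples:
        key = ts - (ts % window_secs)  # start time of the enclosing window
        windows.setdefault(key, Window()).add(value)
    return windows
```

A sensor sampling every 15 seconds would thus upload one small record per counter per window instead of 60 raw samples.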
[0051] A local data store (e.g., SQLite) is used to save some types
of data between uploads to the Metriv Secure Data Services web
service (e.g., Mojo), so critical data is reliably collected even
when no Internet connection is available.
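A store-and-forward buffer of this kind might look like the following sketch; the schema and names are illustrative assumptions, not the sensor's actual design.

```python
import sqlite3

class LocalBuffer:
    """Queue samples in a local SQLite store and delete them only after a
    successful upload, so data survives gaps in Internet connectivity."""
    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS pending"
            " (id INTEGER PRIMARY KEY, payload TEXT)")

    def enqueue(self, payload: str) -> None:
        with self.db:
            self.db.execute(
                "INSERT INTO pending (payload) VALUES (?)", (payload,))

    def flush(self, upload) -> int:
        """Attempt to upload all pending rows in order; rows are kept for
        retry if the upload callable raises (e.g., no connection)."""
        rows = self.db.execute(
            "SELECT id, payload FROM pending ORDER BY id").fetchall()
        sent = 0
        for row_id, payload in rows:
            try:
                upload(payload)
            except OSError:
                break  # connection lost: retry remaining rows next flush
            with self.db:
                self.db.execute("DELETE FROM pending WHERE id = ?", (row_id,))
            sent += 1
        return sent
```

Deleting a row only after its upload succeeds is what makes the collection reliable across outages.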
[0052] FIG. 2 depicts the Native Sensor internal architecture.
[0053] Metriv Sensor Hub "Juju" (Aggregating or Multiplexing
Sensor)
[0054] Juju is written in portable Python, and is currently
delivered as a Windows service. The two plugins currently included
with Juju collect VMware data and perform reverse DNS lookup of
systems behind the corporate firewall.
[0055] VMware Plugin
[0056] Juju is configured with credentials it uses to discover and
collect information about VMware hosts and guests. It tracks VMware
performance counters for each VM host and guest, along with
topology information about the VMware deployment.
[0057] For VMs monitored by both the VMware sensor and native
sensors, Metriv reports on them as a single host and shows both the
VMware and guest OS statistics combined. Of course with native
sensors, much more workload, process, and configuration information
is available compared to a VMware-only sensor.
[0058] Reverse DNS Plugin
[0059] Juju receives lists of IP addresses and performs reverse DNS
lookups to find hostnames from private IP address ranges that
cannot otherwise be resolved. This way Metriv can show system names
for computers behind a firewall.
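Such a reverse-lookup pass can be sketched with the standard library; this is a simplified illustration, and the actual Juju plugin may differ.

```python
import socket

def resolve_private_ips(addresses):
    """Perform reverse DNS lookups for a list of IP addresses and return
    {ip: hostname}. Addresses with no PTR record map to None."""
    names = {}
    for ip in addresses:
        try:
            hostname, _aliases, _addrs = socket.gethostbyaddr(ip)
            names[ip] = hostname
        except OSError:
            # socket.herror/gaierror subclass OSError: unresolvable address
            names[ip] = None
    return names
```

Run from inside the firewall, this maps otherwise-anonymous private IP ranges back to system names.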
[0060] Future Juju Plugins
[0061] The Juju sensor is built on a library that is designed and
envisioned to be extended to collect data from catalogs, databases,
log files, JMX metrics and other sources.
[0062] Metriv envisions plugins to discover nodes and topology by
using Active Directory APIs, DNS zone lists, network topology
discovery components, operations systems such as Microsoft System
Center, and other behind-the-firewall data sources.
[0063] FIG. 3 depicts an example architecture for a multiplexing
sensor.
[0064] Metriv Cloud Sensor "Shango"
[0065] The Metriv cloud sensor is granted rights to use APIs of
cloud providers such as Amazon Web Services, Microsoft Azure,
Google Cloud, and other cloud providers. Similar to the VMware
sensor, the Metriv Cloud Sensor discovers nodes and collects
performance counters, then makes them available from Metriv
Analytics and MQL.
[0066] Shango uses a Staged Event Driven Architecture (SEDA), where
successive phases of node discovery and metric collection are
queued for workers. This allows scalable, fast discovery of nodes
and node changes while allowing time-series data to be collected in
independent steps.
[0067] The stages are currently:
[0068] Enqueue discovery for each configured Metriv user account
[0069] Discover and register instances, then enqueue metric discovery
[0070] Discover available metrics and enqueue metric collection
[0071] Collect metrics
[0072] Each stage has its own queue and celery worker pool so
stages can be scaled as needed.
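The staged pipeline above can be sketched as follows; the queues are drained sequentially here for illustration, whereas in the described system each queue would feed an independently scaled celery worker pool.

```python
from queue import Queue

def run_pipeline(accounts, discover_instances, discover_metrics, collect):
    """SEDA-style staging: each stage consumes its own queue and enqueues
    work for the next (accounts -> instances -> metrics -> samples)."""
    account_q, instance_q, metric_q = Queue(), Queue(), Queue()
    results = []
    for account in accounts:                  # stage 1: enqueue discovery
        account_q.put(account)
    while not account_q.empty():              # stage 2: instance discovery
        for inst in discover_instances(account_q.get()):
            instance_q.put(inst)
    while not instance_q.empty():             # stage 3: metric discovery
        for metric in discover_metrics(instance_q.get()):
            metric_q.put(metric)
    while not metric_q.empty():               # stage 4: metric collection
        results.append(collect(metric_q.get()))
    return results
```

Because each stage has its own queue, node discovery can run fast while time-series collection proceeds at its own pace.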
[0073] FIG. 4 is an example architecture of a cloud sensor.
[0074] Metriv Secure Data Service (e.g., Mojo Service)
[0075] Mojo is a web service that receives and processes
information from sensors. Mojo offers the following services:
[0076] Authorization and registration: Mojo verifies the sensor has
correct credentials.
[0077] Node registration/re-registration: Mojo recognizes if a node
has already registered--even if the node has been re-imaged. During
node registration, Mojo also recognizes if the node has already been
discovered by another sensor to map multiple sensors to a single node.
[0078] Node migration tracking: Mojo recognizes nodes with the native
sensor as they move among VM hosts or are migrated to cloud services
via lift-and-shift imaging techniques.
[0079] Policy provider: Mojo provides data collection policies for the
sensor to follow.
[0080] Event and performance counter storage: Mojo asks the sensor for
all data that has not yet been stored.
[0081] Data rollup and on-the-fly analytics: Mojo performs on-the-fly
analytics, immediately updating long-term statistics with new data.
[0082] In one implementation, as shown in FIG. 5, the Mojo web
service is written in Python on top of the Flask web application
framework and uses gevent to provide event-loop style execution of
concurrent web requests. Mojo is run behind nginx with multiple
Mojo processes on each instance. Here, a "model" component denotes
a set of classes that use standard programming techniques to drive
several key components of the system including a Cassandra database
schema.
[0083] When a sensor recognizes a node, it calls Mojo to register
the node. Mojo decides, based on the information in the initial
request, which logical node the sensor is reporting on and informs
the sensor of that node's identity.
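One plausible sketch of this registration decision follows; the matching keys are assumptions for illustration, not Metriv's actual identity logic.

```python
import uuid

class Registry:
    """Map a sensor's identifying facts (machine id, MAC, hostname) to a
    logical node identity, so multiple sensors reporting on the same
    machine--or a re-imaged machine--resolve to one node."""
    def __init__(self):
        self.by_key = {}  # (fact name, fact value) -> node id

    def register(self, facts: dict) -> str:
        # Prefer the most stable identifiers first.
        for key in ("machine_id", "mac", "hostname"):
            value = facts.get(key)
            if value and (key, value) in self.by_key:
                node_id = self.by_key[(key, value)]
                break
        else:
            node_id = str(uuid.uuid4())  # previously unseen node
        # Record every offered fact so future sensors match this node.
        for key in ("machine_id", "mac", "hostname"):
            if facts.get(key):
                self.by_key[(key, facts[key])] = node_id
        return node_id
```

A second sensor that reports any overlapping fact is folded into the same logical node rather than creating a duplicate.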
[0084] Subsequent requests take the form of HTTP POSTS that contain
messages from one or more sensors. These messages may ask for any
pending messages (such as policy changes, immediate commands, or
upgrade notifications) and report time series or inventory data.
[0085] As is typical of most NoSQL databases, the storage of the
data must be done with an eye to the query patterns that the
application will use. In a typical NoSQL application, data is
stored in a de-normalized form. Mojo does multiple writes to
Cassandra to accomplish this along with a small amount of online
rollup and summarization. Other data storage capabilities may also
be incorporated in addition to or instead of Cassandra.
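The denormalized multi-write pattern can be illustrated with dictionaries standing in for Cassandra tables; the table layouts are hypothetical.

```python
def write_sample(tables, node, metric, ts, value):
    """Write one logical sample to several 'tables', each keyed for a
    different query path, plus a small online rollup."""
    # Path 1: all samples for a node, in time order.
    tables["by_node"].setdefault(node, []).append((ts, metric, value))
    # Path 2: one metric across all nodes.
    tables["by_metric"].setdefault(metric, []).append((ts, node, value))
    # Online rollup: running maximum per (node, metric), updated on write.
    key = (node, metric)
    prior = tables["rollup"].get(key)
    tables["rollup"][key] = value if prior is None else max(prior, value)
```

Paying for several writes at ingest time is the usual NoSQL trade that makes each later query a single cheap partition read.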
[0086] Metriv Analytics
[0087] The Metriv Analytics website is the primary user interface
for the system. It allows users to login, invite other authorized
users to access data, administer sensors, and create reports, graphs
and dashboards.
[0088] The website uses several common open-source technologies
including Python, Flask, Bootstrap, require.js, Backbone, and
JQuery. It also uses Highcharts and D3 for graphing.
[0089] Apart from account management, all data is stored in
Cassandra. In most areas, the analytics site does not access
Cassandra directly. One of the driving requirements for the Metriv
system is that customers must be able to access their data in ways
not anticipated, while still allowing Metriv to provide
"guardrails" that keep users from writing queries that degrade
performance of the service for other customers.
[0090] To this end, queries, charts, and reports are done through a
query language called Metriv Query Language (MQL)--see section
below for more details--that allows efficient access, parallelized
executions, and control over resources consumed.
[0091] While MQL is at the heart of the analytics experience, it is
often the case that users can get the answers they need without
ever seeing MQL by using the interactive query building and
dashboard building facilities of the analytics site.
[0092] FIG. 6 illustrates a layer diagram of the analytics
website.
[0093] Metriv Query Language (MQL) Query, Statistics, and Chart
Rendering
[0094] The Metriv Query Language (MQL) is patterned after the Unix
pipe concept, which has become a common paradigm for machine-data
systems like AppSumo and Splunk. By selecting a paradigm familiar
to system administrators, Metriv keeps the cognitive load down as
administrator users switch from system to system.
[0095] The following example MQL query renders a chart of idle time
for each hour of the day by node. The query results
are displayed in FIG. 7. This highlights any daily cyclic usage
patterns:
sample where device.virtualizationType=="Not Virtualized"
    and counter.categoryName=="Processor"
    and counter.counterName=="% Processor Time"
| eval hour = hourofday(_time), idle_peak = 100.0 - max
| chart avg(idle_peak) over month by device.deviceName
[0096] The chart identifies physical servers whose current workloads
could likely be consolidated through virtualization.
[0097] The chart shown in FIG. 7 shows two systems (metriv-bvt-2012
and metriv-bvt-win8) that, even at their peak workload, are idle
60-80% of the time throughout the entire last month. Since these
systems are idle a significant portion of time, they would be good
candidates for virtualization on hardware that could share
resources between them and other systems. The third system,
win-9rh8pthrfr9, is idle less than 20% of the time on average, so it
would not be as attractive a candidate to share resources with
other systems.
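The idle-peak computation performed by the query above can be sketched in plain Python; the data handling and field names here are illustrative.

```python
from collections import defaultdict

def idle_peak_by_device(samples):
    """For each (device, hour-of-day) bucket, idle_peak = 100 - max busy %;
    then average idle_peak per device, mirroring
    `eval idle_peak = 100.0 - max | chart avg(idle_peak) by deviceName`.

    `samples` is an iterable of (device, unix_ts, busy_percent)."""
    buckets = defaultdict(list)           # (device, hour) -> busy readings
    for device, ts, busy_pct in samples:
        hour = (ts // 3600) % 24          # hourofday(_time)
        buckets[(device, hour)].append(busy_pct)
    per_device = defaultdict(list)
    for (device, _hour), values in buckets.items():
        per_device[device].append(100.0 - max(values))
    return {d: sum(v) / len(v) for d, v in per_device.items()}
```

A device whose average idle peak stays high across all hours, like the two systems called out above, is a strong virtualization candidate.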
[0098] MQL Query Evaluation
[0099] The MQL Service, which is implemented in Scala, consists of
a set of load-balanced job managers that take incoming queries,
parse them, prepare execution plans, coordinate distributed
execution, and, optionally, perform any final transformation on the
results, which are then returned in JSON format.
[0100] The MQL Service creates data-flow operator graphs to process
queries, much like most modern query processing systems. One major
difference between the MQL Service and a traditional database
server is that the MQL Service translates large portions of an
operator graph into distributed operations using Apache Spark as an
underlying execution engine.
[0101] FIG. 8 illustrates the MQL execution environment for
executing MQL queries.
[0102] The execution cluster is a set of resources managed by
Apache Mesos. Metriv uses Apache Spark as an integral part of the
execution engine, which allows it to distribute subqueries to
Mesos-scheduled worker processes for parallel execution. S3/AVRO is
used as a serialization format; however, other serialization
formats may be incorporated.
[0103] The workers are partition-aware; that is, they know which
Cassandra node holds the data in which they are interested, and they
send requests for data directly to the Cassandra node that can most
efficiently provide it.
[0104] Because Cassandra is a NoSQL database, data is often
denormalized to provide for multiple efficient query paths. The MQL
Service provides a consistent logical model of the data while
selecting the most efficient query path under the covers. Because
of this and the worker process partition awareness, many common
queries retrieve data directly from the Cassandra cluster node
holding it, with the same efficiency and latency one might expect
from a traditional SQL indexed query.
[0105] The diagram in FIG. 9 shows the series of steps that the MQL
Service takes in order to convert a query into an executable
plan.
[0106] Once the MQL query is converted into an operator graph, the
predicates involved in the query are pushed down the tree to the
data source graph node. Each data source graph node knows what
access patterns are available and can, based on the predicates,
select the most efficient access pattern.
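To make the pushdown step concrete, the following is a minimal Python sketch of the idea described above. All class names, method names, and cost figures are invented for illustration; they are not Metriv's actual implementation.

```python
# Minimal sketch of predicate pushdown: filters written at the top of an
# operator graph are moved down to the data-source node, which then picks
# the cheapest access pattern that the pushed-down predicates allow.

class DataSource:
    def __init__(self, access_patterns):
        # access_patterns maps an indexed column name to a cost estimate
        self.access_patterns = access_patterns
        self.predicates = []

    def push_down(self, predicate_column):
        self.predicates.append(predicate_column)

    def choose_access_pattern(self):
        # Pick the cheapest access pattern usable given the predicates;
        # fall back to a full scan when no predicate matches an index.
        usable = {c: cost for c, cost in self.access_patterns.items()
                  if c in self.predicates}
        if not usable:
            return "full_scan"
        return min(usable, key=usable.get)

class Filter:
    def __init__(self, column, child):
        self.column, self.child = column, child

def push_predicates(node):
    """Walk the operator graph, pushing each Filter's column down to the
    DataSource at the bottom of the graph."""
    while isinstance(node, Filter):
        child = node.child
        src = child
        while isinstance(src, Filter):   # descend to the data source
            src = src.child
        src.push_down(node.column)
        node = child
    return node

source = DataSource({"device.deviceName": 1.0, "_time": 0.2})
graph = Filter("counter.counterName", Filter("_time", source))
push_predicates(graph)
print(source.choose_access_pattern())  # prints "_time", the cheapest indexed path
```

The data source here selects among its known access patterns purely by cost, which mirrors the description above of each data-source graph node knowing which access patterns are available.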
[0107] Once the predicates are pushed down, the top level node can
be asked to iterate its results in a stream. The diagram in FIG. 10
shows the execution process:
[0108] One important point is that most operations can be
distributed around a set of workers; however, some operations
cannot. The execution engine keeps track of each graph node's
ability to stream starting from the data source. The execution
engine has a bias for remaining in the mode that it's currently in.
Since data sources are always streaming, this means that the entire
execution tree has a bias towards streaming, but it can be
"switched over" by a node that indicates that all its upstream
callers should be executed in what is called "collecting" mode. In
collecting mode the upstream callers are centralized. In general,
collecting occurs in the "job manager" (see FIG. 8) as opposed to
by the workers, which implement streaming. FIG. 11 illustrates how
commands are executed as tasks in either streaming mode or
collecting mode.
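The mode-propagation bias described above can be sketched as a small simulation. This is an illustrative model only, assuming each command either forces a mode or inherits the engine's current one; the command names and behavior flags are stand-ins, not Metriv's actual execution engine.

```python
# Illustrative sketch of execution-mode bias: data sources start in
# "streaming" mode, a collecting command switches the pipeline over, and
# the engine then stays in its current mode until a command forces a change
# (e.g. a "distribute" command forcing work back onto the cluster).

def assign_modes(commands):
    """commands: list of (name, forces_mode) where forces_mode is
    'streaming', 'collecting', or None (inherit the current mode)."""
    mode = "streaming"          # data sources always stream
    assigned = []
    for name, forces_mode in commands:
        if forces_mode is not None:
            mode = forces_mode
        assigned.append((name, mode))
    return assigned

pipeline = [
    ("sample", None),             # data source: streams
    ("eval", None),               # inherits streaming
    ("stats", "collecting"),      # accumulators gathered in the job manager
    ("chart", None),              # stays in local memory after collection
    ("distribute", "streaming"),  # forces work back onto the cluster
]
for name, mode in assign_modes(pipeline):
    print(f"{name}: {mode}")
```

Running this prints streaming for `sample` and `eval`, collecting for `stats` and `chart`, and streaming again after `distribute`, matching the bias-toward-current-mode behavior described above.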
[0109] The "stats" command is a good example of a node that does
its work using streaming mode but ends up collecting its results.
Stats uses Apache Spark accumulators to distribute the work of
collecting statistics across the pool of workers, but the final
accumulator values are collected into the Mojo job manager. From
that point on, the execution engine has a bias towards working in
local memory, although it can be forced to work in the distributed
environment, again using the "distribute" command.
[0110] Machine Learning and Metriv
[0111] This section describes Metriv's use of machine learning as
well as future directions.
[0112] Current Uses
[0113] Metriv currently uses machine learning (ML) techniques in
three areas: recommendations, workload classification, and
communications analysis.
[0114] Instance Type Recommendations
[0115] Recommending instance types is, in machine learning terms, a
classification problem. You have an unknown instance (the machine
for which you want a recommendation) and training data, which is a
collection of instances labeled with their AWS (or GCP,
Azure, . . . ) instance type. This is the classical setup for a
supervised classification algorithm.
[0116] There are a number of classification algorithms available to
use with Metriv. Since a premium is placed on being able to provide
the user with an explanation of why a certain recommendation is
made, Metriv prefers to stay away from "black-box" algorithms.
Therefore, the recommendations system is based on a kNN (k nearest
neighbor) technique. kNN requires that each instance be encoded as
a vector and that the training data be labeled with the appropriate
class. Metriv encodes successful migrations as a vector of various
attributes including:
[0117] CPU (in MHz)
[0118] Memory (in MB)
[0119] Ephemeral Storage Requirements (in GB)
[0120] Required Ephemeral IOPS
[0121] Required EBS (or equivalent) IOPS
[0122] Required network throughput.
[0123] An instance for which a recommendation is needed is then
converted into the same vector space, based on the provisioning
method required: as previously configured, at the 95% utilization
level, or at the average utilization. The nearest k successful
migrations are collected, and the instance type that represents the
largest proportion of those k migrations is chosen as the final
recommendation, subject to some restrictions.
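The majority-vote step can be sketched as follows. This is a simplified illustration of the kNN approach described above: the vectors, labels, and instance-type names are made up, and the hard-constraint filtering and headroom scaling discussed below are omitted for brevity.

```python
import math
from collections import Counter

# Each successful migration is a feature vector labeled with its instance
# type; the k nearest migrations vote, and the majority label wins.

def recommend(candidate, migrations, k=3):
    """migrations: list of (vector, instance_type). Returns the instance
    type held by the largest share of the k nearest migrations."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    nearest = sorted(migrations, key=lambda m: dist(candidate, m[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Vector layout: [CPU MHz, memory MB, ephemeral GB, eph. IOPS, EBS IOPS, net Mbps]
migrations = [
    ([2400, 8192, 80, 300, 1000, 500], "m3.xlarge"),
    ([2500, 8000, 100, 350, 1200, 450], "m3.xlarge"),
    ([2600, 16384, 160, 600, 3000, 1000], "m3.2xlarge"),
    ([1200, 2048, 20, 100, 300, 100], "t2.small"),
]
print(recommend([2450, 8100, 90, 320, 1100, 480], migrations))  # m3.xlarge
```

In a production setting the raw Euclidean distance used here would be misleading, which is exactly the scaling problem discussed in the "large complication" paragraph below.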
[0124] Because of the granularity of control afforded by EBS
(storage) through differing storage types, and provisioned IOPS
volumes, recommending a storage configuration is much more
straightforward and requires no machine learning techniques.
[0125] There are a number of small complications and one large one.
First, the successful migration vectors must be filtered to
eliminate possibilities that would be under-provisioned based on a
hard constraint. For instance, Metriv will not recommend an
instance type that under-provisions either ephemeral disk or
memory. It might, however, allow slight under-provisioning of CPU,
IOPS, or Network throughput. Metriv will also not provision
ephemeral disk space in place of on-premises persistent storage. In
addition, the system may scale the successful migration data to
allow for "headroom" in the recommendation.
[0126] The large complication is that simply using raw CPU, Memory,
disk, and network traffic does not take into account how one might
want to trade-off one requirement vs. another. The relative scale
of these elements can have a dramatic effect on recommendations. To
account for the possibility of trade-offs, Metriv must scale the
individual elements of the feature vectors according to how they
should be weighted in the trade-offs. The system does this by
converting the feature vector into currency (dollars). Viewed
another way: if one is willing to spend X dollars on IOPS (or CPU,
etc.), what is the closest instance type? The conversion from an individual
feature to currency is standardized across providers using a
multiple regression analysis of pricing data.
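The currency conversion can be sketched as a simple per-unit scaling. The text says the per-unit prices are standardized by regression over provider pricing data; the prices below are invented placeholders, not real provider rates, and the feature set is truncated for illustration.

```python
# Each raw feature is multiplied by a per-unit dollar price so that
# nearest-neighbor distances trade features off in comparable dollar
# terms rather than in incomparable raw units (MHz vs. MB vs. IOPS).

# dollars per unit of each feature: CPU MHz, memory MB, storage GB, IOPS
PRICE_PER_UNIT = [0.01, 0.004, 0.05, 0.02]

def to_dollars(features):
    return [f * p for f, p in zip(features, PRICE_PER_UNIT)]

raw = [2400, 8192, 80, 500]          # MHz, MB, GB, IOPS
print([round(v, 2) for v in to_dollars(raw)])  # [24.0, 32.77, 4.0, 10.0]
```

After this scaling, a difference of one dollar of CPU and one dollar of IOPS contribute equally to the kNN distance, which is the trade-off weighting the paragraph above describes.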
[0127] Instance Recommendation Futures
[0128] Currently, training instance data are sparse and synthesized
from published information. This has advantages from a computational
perspective and few downsides. Eventually, Metriv would choose to
take a sample of successful migrations as training data. The
goal would be to display to the user information that
says, "Of 213 successful migrations of similar workloads, 80% were
onto m3.2xlarge instances." The explanatory power of such a
statement would be very desirable.
[0129] In some instances, Metriv tracks migrations automatically,
using infrastructure already in place to capture this data.
[0130] Workload Classification
[0131] The topic of workload classification is difficult if one is
unwilling to settle for a simple rule-based system, because the
number of potential indicators that could be used to determine a
workload is very large. In Metriv, users are allowed to tag servers
as performing a certain workload. Metriv then uses those tags as
training data (across all customers) and applies machine learning
techniques to classify each system according to its actual usage.
[0132] First, the workload on a machine is converted to a large
vector, which tends to be about 60K elements in length. In this
vector are elements representing the ports on which the system is
communicating (one element per port), what processes are running,
CPU utilization by process, read IO per process, and write IO per
process. With such a large set of possible measures on which to base
decisions, Metriv must be careful to choose an algorithm that will
provide good results without requiring excessive computational
resources. Classification trees are the currently favored approach,
although other approaches may yield similar results.
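The encoding step can be illustrated with a toy version of such a feature vector. The fixed universes of ports and processes below are tiny invented examples (the text says the real vector tends toward ~60K elements), and the slot layout is an assumption for illustration only.

```python
# A rough sketch of encoding observed activity into a sparse feature
# vector: one slot per known port, plus four slots per known process
# (running flag, CPU %, read IO, write IO).

PORTS = [80, 135, 1433, 3268, 5051]
PROCESSES = ["sqlservr", "nginx", "kworker"]

def encode(observation):
    vec = [0.0] * (len(PORTS) + 4 * len(PROCESSES))
    for port in observation["ports"]:
        vec[PORTS.index(port)] = 1.0           # port-activity slots
    base = len(PORTS)
    for i, proc in enumerate(PROCESSES):
        stats = observation["processes"].get(proc)
        if stats:
            # running flag, CPU %, read IO, write IO
            vec[base + 4 * i: base + 4 * i + 4] = [
                1.0, stats["cpu"], stats["read_io"], stats["write_io"]]
    return vec

obs = {"ports": [1433, 135],
       "processes": {"sqlservr": {"cpu": 42.0, "read_io": 120.0,
                                  "write_io": 310.0}}}
vec = encode(obs)
print(len(vec), vec[2], vec[5:9])  # 17 1.0 [1.0, 42.0, 120.0, 310.0]
```

Note how sparse the result is even in this toy example: most slots remain zero, which is why algorithm choice matters at 60K elements.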
[0133] Classification trees work by trying to find a simple
condition like, "SQL Server is running" that can be used to explain
the labels on the training data. Then, within both the true and
false branches of that condition, the algorithm continues the same
process, recursively, until it reaches a maximum depth. Then paths
which don't significantly improve the total performance of the
classification tree are pruned out. The end result looks something
like:
If (feature running:bzip2 (30291) <= -0.031901041666666664)
  If (feature port:5051 (25690) <= -0.004014615767776769)
    If (feature port:10080 (80) <= 0.06545305860040496)
      If (feature port:3268 (24371) <= 9.218645599886592)
        If (feature port:10123 (123) <= -0.022074842409859904)
          If (feature running:cbengine (30295) <= -1.0)
            If (feature port:1688 (7395) <= -0.015331052639731254)
              Predict: _NONE (31.0 (prob = 1.0))
            Else (feature port:1688 (7395) > -0.015331052639731254)
              If (feature port:135 (3769) <= -0.0020500225574119103)
                Predict: _NONE (31.0 (prob = 1.0))
              Else (feature port:135 (3769) > -0.0020500225574119103)
                Predict: Teds (25.0 (prob = 1.0))
          Else (feature running:cbengine (30295) > -1.0)
            Predict: Windows Domain Controller (29.0 (prob = 1.0))
        Else (feature port:10123 (123) > -0.022074842409859904)
          Predict: Microsoft SCCM Site Server (16.0 (prob = 1.0))
      Else (feature port:3268 (24371) > 9.218645599886592)
        If (feature port:139 (4198) <= -0.05097777503734046)
          Predict: _NONE (31.0 (prob = 1.0))
        . . . ETC
[0134] Other Considerations
[0135] Currently, the operating system is not part of the feature
vector; therefore the classification tree algorithm sometimes
latches onto the existence of a running process as a proxy for OS
type (for instance, "kworker"). This does not affect accuracy, but
explanations derived from decision trees would read much more
nicely if they referred to the OS type instead of a random but
ubiquitous running process.
[0136] The Metriv system currently uses the classification tree to
detect a single workload class. In cases where multiple workloads
are performed by a single system, future classifiers will use
logistic regression against a feature vector that includes
performance attributes of processes using resources on each
system.
[0137] Network Topology Analysis
[0138] The network topology support in scenarios uses
graph-theoretic algorithms to help tame the chaos, as it is easy for
a graph of all connected systems to quickly expand to cover the
entire network. While these algorithms do not traditionally fall
under the category of machine learning, they certainly appear that
way to an end user.
[0139] Currently, a user can select a set of nodes to be included
in a scenario. A scenario represents a set of systems that the user
would like to migrate. A key concern is whether the user is
forgetting other systems on which the currently included systems
depend. To help highlight these systems, the Metriv system
displays all the dependencies on which the included systems depend,
both directly and indirectly. To do this, the system represents the
communications between all nodes on the network as an undirected
graph. Then it uses a distributed graph clique algorithm to pick
out inter-communicating subsets. Finally, the system presents to
the user only those subsets which include at least one member of
the scenario.
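The subset-selection step can be sketched as follows, under the simplifying assumption that "inter-communicating subsets" behave like connected components of the undirected communication graph (the actual clique algorithm is distributed and may differ). Node names are invented.

```python
from collections import defaultdict

# Find inter-communicating subsets of the communication graph, then keep
# only those subsets that touch at least one member of the user's scenario.

def components(edges, nodes):
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b); adj[b].add(a)
    seen, comps = set(), []
    for n in nodes:
        if n in seen:
            continue
        stack, comp = [n], set()
        while stack:                    # iterative DFS over the subset
            cur = stack.pop()
            if cur in comp:
                continue
            comp.add(cur); seen.add(cur)
            stack.extend(adj[cur] - comp)
        comps.append(comp)
    return comps

nodes = {"web1", "db1", "app1", "hr1", "hr-db"}
edges = [("web1", "app1"), ("app1", "db1"), ("hr1", "hr-db")]
scenario = {"web1"}            # systems the user chose to migrate
shown = [c for c in components(edges, nodes)
         if c & scenario]      # only subsets touching the scenario
print(sorted(shown[0]))        # ['app1', 'db1', 'web1']
```

Here `hr1` and `hr-db` form their own subset, and because it contains no scenario member it is not shown to the user; marking a node as a keystone system (described next) would amount to deleting it from `edges` before this computation.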
[0140] The result can be very large due to the presence of systems
that we call "keystone" systems. These might be DNS servers or other
highly utilized services. Since the inclusion of these systems in a
scenario can result in the inclusion (by transitivity) of all the
systems in the network, we allow the user to mark systems as
keystone systems. This has the effect of removing those nodes from
the graph clique calculation.
[0141] Other Considerations
[0142] N-tiered application detection. An extension of the current
graph-theoretic algorithms can be used to find n-tiered
applications. The steps are as follows:
[0143] 1. Build the communication graph using only systems and
their links where the service type or communication type might be
part of an application cluster. For instance, for a traditional
n-tiered application, the algorithm might include all database
servers, all http servers, and all nodes that communicate with http
servers.
[0144] 2. Run the graph cliques algorithm on the resulting
graph.
[0145] 3. Return just the cliques that have all the conditions
specified in the first step (i.e. the clique must contain at least
one database server, at least one http server, and at least one
node communicating with one of the http servers).
[0146] 4. Each clique then represents a separate n-tiered
application.
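The numbered steps above can be sketched as follows, assuming the candidate cliques of step 2 have already been computed; this shows step 1's role assignment and step 3's check that each clique contains every required role. Role labels and node names are illustrative inventions.

```python
# Step 3 of n-tier detection: keep only those cliques that contain at
# least one node of every required role (database, http server, and a
# node communicating with an http server).

ROLES = {
    "db1": "database", "db2": "database",
    "web1": "http", "web2": "http",
    "client1": "http-client", "printer1": "printer",
}
REQUIRED = {"database", "http", "http-client"}

def n_tier_applications(cliques):
    apps = []
    for clique in cliques:
        roles = {ROLES.get(n) for n in clique}
        if REQUIRED <= roles:          # clique covers every required tier
            apps.append(sorted(clique))
    return apps

cliques = [
    {"db1", "web1", "client1"},        # full 3-tier stack
    {"web2", "client1"},               # missing a database tier
    {"printer1", "db2"},               # wrong roles entirely
]
print(n_tier_applications(cliques))    # [['client1', 'db1', 'web1']]
```

Each surviving clique then represents a separate n-tiered application, per step 4.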
[0147] In addition, Metriv has the ability to filter out
communications links that are sporadic. There is also a capability
which collapses multiple systems into a single system for the
purposes of graph calculations.
[0148] The following natural progressions are planned to use
currently-held data to provide additional capabilities:
[0149] Future Applications of ML Techniques in Metriv
[0150] Periodicity detection will recognize periodic workloads where
auto scaling groups should be used to reduce costs and enhance
scalability at load.
[0151] Per-machine outlier detection based on PCA or auto-encoders
may be used to find systems which suddenly change behavior.
[0152] Per-workload-class outlier detection detects when a node
classified as a SQL Server stops behaving like a healthy SQL
Server.
[0153] Predicting out-of-resource conditions with time-series
modelling to detect possible out-of-resource issues in advance of
the problem manifesting itself.
[0154] Automatic detection and labelling of keystone service nodes,
such as file servers, DNS servers, and authentication services.
Automatic discovery of services such as these can also enhance
n-tier service discovery by providing cut points for topology
graphs.
[0155] Other Enhancements
[0156] Application/Service Mapping
[0157] By mapping command lines into higher-level apps, it is
possible to both give friendly names to long command lines and to
also map similar command lines to a single, common service. Metriv
has planned a UI-based naming process initially, but expects to use
supervised learning techniques once sufficient training data is
available to automatically map various command lines to
services.
[0158] The mapping of multiple command lines into a higher-level
named app or service reduces the number of features used in
workload classification and increases similarity among nodes
where small differences in the command line might otherwise cause
ML algorithms to consider them completely separate
applications.
[0159] Inside-the-Firewall Proxy
[0160] Since enterprise IT systems often do not have direct access
to the Internet, native Metriv agents running on them must instead
connect to a Mojo service that proxies communications between them
and the Metriv cloud service.
[0161] Extend Native Sensor to Collect Per-Process Network
Activity
[0162] Per-process network activity is to be captured and
correlated with TCP sessions in order to understand which processes
are communicating among nodes and with how much traffic volume.
This information will be used to enhance topology analysis and the
Scenarios UI to include process and traffic detail.
[0163] Component-Aware Plugins
[0164] Service multiplexing components such as web servers (NGINX,
Apache, IIS) and databases (SQL Server, MySQL, PostgreSQL) can have
their configuration and versions interrogated via their APIs or by
reading configuration data from the filesystem. When systems are
discovered via running processes, a policy will direct the Native
Sensor to find and deliver configuration information to Metriv. We
envision systems to partially or fully automate migration of
such apps to cloud services or into container services such as
Docker, along with identifying possible risks associated with
running obsolete or insecure software versions.
[0165] Automated N Tier Service Discovery
[0166] Using the N Tier Service discovery approaches described in
the Machine Learning section, Metriv will automatically identify
and name migration Scenarios.
[0167] Docker Plugin for Metriv Sensor Hub
[0168] Docker discovery and performance APIs fit well into the
current Metriv Sensor Hub plugin model and will be a powerful and
straightforward extension of Metriv capabilities into containerized
services.
Example Computing System Environment
[0169] Example embodiments described herein provide applications,
tools, data structures and other support to implement a data
management and analytics platform to be used for analyzing and
optimizing IT infrastructure using modernization techniques such as
cloud deployment and virtualization. Other embodiments of the
described techniques may be used for other purposes. In this
description, numerous specific details are set forth, such as data
formats and code sequences, etc., in order to provide a thorough
understanding of the described techniques. The embodiments
described also can be practiced without some of the specific
details described herein, or with other specific details, such as
changes with respect to the ordering of the logic, different logic,
etc. Thus, the scope of the techniques and/or functions described
is not limited by the particular order, selection, or
decomposition of aspects described with reference to any particular
routine, module, component, and the like.
[0170] FIG. 12 is an example block diagram of an example computing
system that may be used to practice embodiments of a DMAP described
herein. Note that one or more general purpose virtual or physical
computing systems suitably instructed or a special purpose
computing system may be used to implement a DMAP. Further, the
DMAP may be implemented in software, hardware, firmware, or in some
combination to achieve the capabilities described herein. The DMAP
1210 shown is an example computer system that may be used to
provide the web services shown in FIG. 1A above. This (server
side) computing system 1200 may be connected via one or more
networks, e.g. network 1250, to one or more sensors 1265 or client
computing systems 1260.
[0171] The computing system 1200 may comprise one or more server
and/or client computing systems and may span distributed locations.
In addition, each block shown may represent one or more such blocks
as appropriate to a specific embodiment or may be combined with
other blocks. Moreover, the various blocks of the DMAP 1210 may
physically reside on one or more machines, which use standard
(e.g., TCP/IP) or proprietary interprocess communication mechanisms
to communicate with each other.
[0172] In the embodiment shown, computer system 1200 comprises a
computer memory ("memory") 1201, a display 1202, one or more
Central Processing Units ("CPU") 1203, Input/Output devices 1204
(e.g., keyboard, mouse, CRT or LCD display, etc.), other
computer-readable media 1205, and one or more network connections
1206. The DMAP 1210 is shown residing in memory 1201. In other
embodiments, some portion of the contents, some of, or all of the
components of the DMAP 1210 may be stored on and/or transmitted
over the other computer-readable media 1205. The components of the
DMAP 1210 preferably execute on one or more CPUs 1203 and manage
the acquisition and analysis of time series workload data as
described herein. Other code or programs 1230 and potentially other
data repositories, such as data repository 1220, also reside in the
memory 1201, and preferably execute on one or more CPUs 1203. Of
note, one or more of the components in FIG. 12 may not be present
in any specific implementation. For example, some embodiments
embedded in other software may not provide means for user input or
display.
[0173] In a typical embodiment, the DMAP 1210 includes one or more
workload sensor interfaces 1211, one or more secure data services
1212, one or more query engines, one or more analytic and/or
machine learning engines 1214, and one or more data storage
interfaces. In at least some embodiments, the one or more data
storage interfaces 1217 is provided external to the DMAP and is
available, potentially, over one or more networks 1250. Other
and/or different modules may be implemented. In addition, the DMAP
may interact via a network 1250 with external analytics or machine
learning code 1255 that uses results computed by the DMAP 1210 to
generate recommendations, one or more client computing systems
1260, and/or one or more client system sensor devices 1265. Also,
of note, the data repositories 1215 and 1216 may be provided
external to the DMAP as well, for example in a knowledge base
accessible over one or more networks 1250.
[0174] In an example embodiment, components/modules of the DMAP
1210 are implemented using standard programming techniques. For
example, the DMAP 1210 may be implemented as a "native" executable
running on the CPU 1203, along with one or more static or dynamic
libraries. In other embodiments, the DMAP 1210 may be implemented
as instructions processed by a virtual machine. A range of
programming languages known in the art may be employed for
implementing such example embodiments, including representative
implementations of various programming language paradigms,
including but not limited to, object-oriented, functional,
procedural, scripting, and declarative.
[0175] The embodiments described above may also use well-known or
proprietary, synchronous or asynchronous client-server computing
techniques. Also, the various components may be implemented using
more monolithic programming techniques, for example, as an
executable running on a single CPU computer system, or
alternatively decomposed using a variety of structuring techniques
known in the art, including but not limited to, multiprogramming,
multithreading, client-server, or peer-to-peer, running on one or
more computer systems each having one or more CPUs. Some
embodiments may execute concurrently and asynchronously and
communicate using message passing techniques. Equivalent
synchronous embodiments are also supported.
[0176] In addition, programming interfaces to the data stored as
part of the DMAP 1210 (e.g., in the data repositories 1216 and
1217) can be available by standard mechanisms such as through C,
C++, C#, and Java APIs; libraries for accessing files, databases,
or other data repositories; through scripting languages such as
XML; or through Web servers, FTP servers, or other types of servers
providing access to stored data. The data repositories 1215 and
1216 may be implemented as one or more database systems, file
systems, or any other technique for storing such information, or
any combination of the above, including implementations using
distributed computing techniques.
[0177] Also the example DMAP 1210 may be implemented in a
distributed environment comprising multiple, even heterogeneous,
computer systems and networks. Different configurations and
locations of programs and data are contemplated for use with the
techniques described herein. In addition, the server and/or
client may be physical or virtual computing systems and may reside
on the same physical system. Also, one or more of the modules may
themselves be distributed, pooled or otherwise grouped, such as for
load balancing, reliability or security reasons. A variety of
distributed computing techniques are appropriate for implementing
the components of the illustrated embodiments in a distributed
manner including but not limited to TCP/IP sockets, RPC, RMI, HTTP,
Web Services (XML-RPC, JAX-RPC, SOAP, etc.) and the like. Other
variations are possible. Also, other functionality could be
provided by each component/module, or existing functionality could
be distributed amongst the components/modules in different ways,
yet still achieve the functions of a DMAP.
[0178] Furthermore, in some embodiments, some or all of the
components of the DMAP 1210 may be implemented or provided in other
manners, such as at least partially in firmware and/or hardware,
including, but not limited to one or more application-specific
integrated circuits (ASICs), standard integrated circuits,
controllers executing appropriate instructions, and including
microcontrollers and/or embedded controllers, field-programmable
gate arrays (FPGAs), complex programmable logic devices (CPLDs),
and the like. Some or all of the system components and/or data
structures may also be stored as contents (e.g., as executable or
other machine-readable software instructions or structured data) on
a computer-readable medium (e.g., a hard disk; memory; network;
other computer-readable medium; or other portable media article to
be read by an appropriate drive or via an appropriate connection,
such as a DVD or flash memory device) to enable the
computer-readable medium to execute or otherwise use or provide the
contents to perform at least some of the described techniques. Some
or all of the components and/or data structures may be stored on
tangible, non-transitory storage mediums. Some or all of the system
components and data structures may also be stored as data signals
(e.g., by being encoded as part of a carrier wave or included as
part of an analog or digital propagated signal) on a variety of
computer-readable transmission mediums, which are then transmitted,
including across wireless-based and wired/cable-based mediums, and
may take a variety of forms (e.g., as part of a single or
multiplexed analog signal, or as multiple discrete digital packets
or frames). Such computer program products may also take other
forms in other embodiments. Accordingly, embodiments of this
disclosure may be practiced with other computer system
configurations.
Example Data Analytics Processes
[0179] FIG. 13 is an example flow diagram of logic for
interoperating with specialized sensors to obtain configuration and
system information. In an example DMAP, the services shown in FIGS.
1A and 12 perform logic 200 to obtain information regarding
configurations in which sensors are installed and to set policies
for these sensors regarding, for example, what is being measured
and frequencies for doing so.
[0180] More specifically, in block 1301, the DMAP installs a
plurality of sensors into the various nodes of the configuration
being monitored. For example, these sensors may be of the native
type, a pluggable sensor into, for example, a virtualized
environment that uses Application Programming Interfaces (APIs) of
the virtual environment to obtain data, or a cloud sensor that uses
cloud APIs to measure aspects of the node.
[0181] In some example DMAPs, the native sensors that are installed
into the operating system of a node "move" with the node so that
information from the node's operating system can be obtained from
these sensors even when the node has been migrated to a virtualized
environment or the cloud. In these latter cases, the data from the
native sensors are combined (e.g., aggregated) with data from other
sensors later installed on the node (e.g., cloud or virtual
pluggable sensors) to continue to obtain even more accurate and
abundant information.
[0182] In block 1302, the DMAP is able to set or communicate
policies to the plurality of sensors. For example, in the Metriv
example described above, these policies can be communicated via the
Metriv Secure Data Services web service.
[0183] In block 1303, the DMAP receives data from one or more of
the plurality of sensors according to the set policies.
[0184] In block 1304, the DMAP aggregates the data received from
the various sensors and performs statistical analysis on the data.
This statistical analysis is used to determine whether to recommend
any changes to the configuration of one or more nodes and what
steps the user should take to migrate. In some instances, the
statistical techniques include cosine similarity or k-means
clustering algorithms. In some instances the analysis involves
using predictive analytics such as those available through standard
machine learning techniques (e.g., Bayesian algorithms, clustering,
etc.). In some instances, the analysis takes into account data
obtained from humans using other systems, such as through answers
to questionnaires to train the machine learning algorithms. In this
case, aspects of "socialized" data may be obtained from live human
experience of similar migrations, etc.
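The cosine-similarity comparison mentioned above can be sketched as follows. The feature vectors here are invented, and this is a minimal illustration rather than the platform's actual analysis pipeline.

```python
import math

# Two nodes' workload feature vectors are compared by the angle between
# them, so nodes with proportionally similar resource profiles score near
# 1.0 regardless of absolute scale.

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

node_a = [2400, 8192, 80, 500]     # CPU MHz, memory MB, disk GB, IOPS
node_b = [4800, 16384, 160, 1000]  # the same shape of workload, 2x larger
node_c = [2400, 512, 2000, 50]     # a very different resource profile
print(round(cosine_similarity(node_a, node_b), 3))  # 1.0 (same direction)
print(cosine_similarity(node_a, node_c) < 0.9)      # True
```

This scale insensitivity is why cosine similarity is useful for matching workload shape across machines of different sizes; the raw-unit scaling caveat from the recommendation discussion applies here as well.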
[0185] For example, in some example DMAPs, determining which nodes
are most appropriate to consider for migration to virtualized
environments or to the cloud uses data collected by other sensors
during successful migrations to populate feature vectors that are
used by k-nearest neighbors or Support Vector Machines (SVM)
algorithms to find similar systems or interdependent sets of nodes
to predict the behavior of similar migrations. These feature vectors
or SVM algorithms are then used to determine whether a node being
considered for migration is likely to migrate successfully.
[0186] Also, although not shown, in some instances it is desirable
to consider the interdependencies when determining migration
recommendations. For example, analysis of the interdependencies
indicated by network traffic and other data such as VMware topology
of virtual or physical computers can be used to determine which
subset of an interacting set of computers to move as a single
project. The DMAP can accomplish this by using graph analysis
algorithms to determine groups of isolated computers. Additionally,
the DMAP can generate all the possible ways to break dependencies
among these isolated groups of computer nodes (or computer systems)
and then use a filtering step to remove non-viable solutions
followed by a scoring step to select the top-n solutions. The
filtering step can be accomplished by heuristics but may be more
effectively accomplished using Support Vector Machines (SVM) or
other machine learning solutions. The scoring step can likewise be
accomplished by heuristics but can also be implemented using a
Weighted Least Squares (WLS) algorithm. Other algorithms and
machine learning techniques may be similarly incorporated.
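The filter-then-score selection described above can be sketched as follows. The candidate structures, viability heuristic, and scoring function here are invented stand-ins for the SVM and WLS approaches the text mentions.

```python
# Generate-filter-score pattern: candidate ways of breaking dependencies
# are first filtered by a viability heuristic, then scored, and the top-n
# survivors kept.

def top_n_solutions(candidates, n=2):
    # filtering step: drop non-viable candidates (here: too many broken links)
    viable = [c for c in candidates if c["links_broken"] <= 3]
    # scoring step: prefer fewer broken links, then smaller migration groups
    def score(c):
        return c["links_broken"] * 10 + c["group_size"]
    return sorted(viable, key=score)[:n]

candidates = [
    {"name": "A", "links_broken": 1, "group_size": 4},
    {"name": "B", "links_broken": 5, "group_size": 2},   # filtered out
    {"name": "C", "links_broken": 2, "group_size": 3},
    {"name": "D", "links_broken": 1, "group_size": 9},
]
print([c["name"] for c in top_n_solutions(candidates)])  # ['A', 'D']
```

Swapping the heuristic filter for a trained classifier and the linear score for a WLS-fitted model preserves this same two-stage structure.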
[0187] In addition, although not shown, a similar analysis and
techniques can be used to determine a recommended set of steps to
update and move the determined nodes to virtualized operating
environments by using sensor data and user interactions with the
website to track successful migration steps and apply the same
recommended steps to new migrations.
[0188] In addition, although not shown, a similar analysis and
techniques can be used to calculate and communicate a projection of
costs for moving the determined nodes to the virtualized operating
environments by using k-nearest neighbors to discover similar
performance and workload profiles available from commercial cloud
providers and/or virtualization environment providers.
[0189] In block 1305, the DMAP responds to user queries or system
queries specified, for example, using the MQL language, and
provides visualizations using a web service (e.g., Mojo services in
FIG. 1A/1B, Secure Data Services 1212 in FIG. 12).
[0190] In block 1306, the DMAP forwards (e.g., outputs,
communicates, sends, etc.) results of the analytics typically as
recommendations for configuration adjustments where desirable,
including, for example, the recommended steps and a projection of
costs.
As indicated earlier, some of these logical blocks may or
may not be present in any particular implementation. In addition,
one or more of these blocks of logic may be performed in different
orders. Also, additional blocks of logic may be integrated into the
flow shown in FIG. 13.
[0192] In some instances, the analysis described with reference to
FIG. 13 is aided by categorizing the nodes in a computing system.
For example, the nodes may be categorized according to the type of
work performed by the node. Then, when a computing node appears to
depart from the type of work performed according to the assigned
category, a user may be alerted using anomaly detection and outlier
detection algorithms on the data collected by the sensors. In some
instances of an example DMAP, the categories reflect one or more of
a SQL Server, a database server, a web server, an end-user
computer, a containerization server, a virtualization server, an
application server, a cloud-based service, a microservices server,
an LDAP server, a DNS server, a file server, or another server or
containerized service that has a role in a system with
interdependent services on multiple nodes.
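A simple form of the outlier detection mentioned above is a z-score test of a node's sensor readings against the baseline for its assigned category. The CPU-utilization figures and the three-standard-deviation threshold below are illustrative assumptions, not a statement of the DMAP's actual detection algorithm.

```python
import statistics

def flag_anomalies(samples, baseline, threshold=3.0):
    """Flag sensor readings that depart from the category's baseline
    by more than `threshold` sample standard deviations."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    return [x for x in samples if abs(x - mean) > threshold * stdev]

# Hypothetical baseline CPU% for nodes categorized as "web server"
baseline = [20, 22, 19, 21, 23, 20, 18, 22]
today = [21, 19, 95, 20]  # the 95% reading suggests off-category work
print(flag_anomalies(today, baseline))  # [95]
```

A reading flagged this way would trigger the user alert, since the node appears to depart from the type of work implied by its category.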
CONCLUSION
[0193] All of the above U.S. patents, U.S. patent application
publications, U.S. patent applications, foreign patents, foreign
patent applications and non-patent publications referred to in this
specification and/or listed in the Application Data Sheet,
including but not limited to U.S. Provisional Patent Application
No. 62/198,052, entitled "DATA ANALYTICS AND MANAGEMENT," filed
Jul. 28, 2015, are incorporated herein by reference, in their
entirety.
[0194] From the foregoing it will be appreciated that, although
specific embodiments have been described herein for purposes of
illustration, various modifications may be made without deviating
from the spirit and scope of the invention. For example, the
methods and systems for performing data analytics and management
discussed herein are applicable to architectures other than a
cloud and virtualization architecture. Also, the methods and
systems discussed herein are applicable to differing protocols,
communication media (optical, wireless, cable, etc.) and devices
(such as wireless handsets, electronic organizers, personal digital
assistants, portable email machines, game machines, pagers,
navigation devices such as GPS receivers, etc.).
* * * * *