U.S. patent application number 12/008,954 was filed with the patent office on January 14, 2008, and published on June 4, 2009, as publication number 20090144319, for "External system integration into automated attribute discovery." Invention is credited to Abe Achkinazi and Rajendra Bhagwatisingh Panwar.

United States Patent Application 20090144319
Kind Code: A1
Panwar; Rajendra Bhagwatisingh; et al.
June 4, 2009
External system integration into automated attribute discovery
Abstract
Methods and apparatus to transform attribute data about assets
in a source system data model into attribute data about the same
assets in a target system data model. The first step is to extract, from attribute data collected about the inventory assets of a business entity, the attribute data needed to populate the attributes in objects representing those inventory assets in a target system data model. Transformation rules are written which
are designed to make all conversions necessary in semantics, units
of measure, etc. to transform the source system attribute data into
attribute data for the target system which has the proper data
format. These transformation rules are executed on a computer on
the extracted attribute data and the transformed attribute data is
stored in an ER model. In the preferred embodiment, the
transformation rules are object-oriented in that transformation
rules for subtypes can be inherited from their parent types or
classes. An export adapter which is capable of invoking the
application programmatic interface of the target system CMDB is
then used to export the transformed attribute data stored in the ER
model to the target system CMDB. A heuristic method to create
self-consistent data blocks without exceeding a maximum size limit
involves loading instances of entity types and all related
instances in the order of decreasing connectivity metric.
Inventors: Panwar; Rajendra Bhagwatisingh (Mountain View, CA); Achkinazi; Abe (Sunnyvale, CA)
Correspondence Address: RONALD CRAIG FISH, P.O. BOX 820, LOS GATOS, CA 95031, US
Family ID: 61281181
Appl. No.: 12/008,954
Filed: January 14, 2008

Related U.S. Patent Documents: Application Number 11/998,635, filed November 29, 2007; and the present application, 12/008,954.

Current U.S. Class: 1/1; 707/999.102; 707/E17.044
Current CPC Class: G06F 9/54 (20130101); G06F 16/289 (20190101); G06F 16/258 (20190101); G06F 16/212 (20190101); G06F 17/30569 (20130101); G06F 16/254 (20190101); G06F 16/25 (20190101); G06F 16/288 (20190101)
Class at Publication: 707/102; 707/E17.044
International Class: G06F 17/30 (20060101) G06F 017/30
Claims
1. A process to build self-consistent data blocks for a CMDB system
or any other external system using a data block maximum size,
comprising: A) dividing a graph of entity types into independent
groups, each group including entity types related to other entity
types by parent-child relationships; B) computing the cardinality
of each relationship; C) computing a connectivity metric for each
entity type; D) sorting the entity types into decreasing order of
connectivity metric; E) loading related instances of related entity
types into one or more data blocks so as to not substantially
exceed a maximum data size for each block starting with the entity
type of highest connectivity metric and working downward on the
sorted list of entity types generated in step D; F) loading into
said one or more data blocks any instances of entity types not
related to any other entity types; G) marking each instance which
has been loaded into a data block as done; and H) repeating steps
E, F and G until all instances of all entity types of all
independent groups have been loaded into one or more data
blocks.
2. The process of claim 1 further comprising a step B1 comprising
calculating a weighting ratio for each parent-child relationship by
computing the ratio S2 divided by S1, where S2 is the data size of
an instance of the child type entity in said relationship and S1 is
the data size of an instance of the parent type entity, and further
comprising a step B2 of multiplying said cardinality of each said
relationship by said weighting ratio of said relationship so as to
generate a normalized cardinality for each said relationship before
computing said connectivity metric for each said entity type, and
wherein step C comprises computing said connectivity metric for each entity type using said normalized cardinality for each relationship of said entity type.
3. The process of claim 1 wherein step E comprises: I) starting
with the independent group which contains the entity type with the
highest connectivity metric, hereafter referred to as Tmax; J)
calculating N divided by K where N is the maximum block size
expressed as the maximum number of entities and K is the number of
entities in the group; K) load N/K instances of entity type Tmax
into a data block referred to as currentBlock; L) load said
currentBlock with all instances related to said N/K instances of
entity type Tmax at relation distance one from said instances of
entity type Tmax loaded in step K; M) for each related instance
loaded in step L, determine other related instances of entity types
at relation distance one from said instances of entity types loaded
in step L; N) load into said current block all instances determined
in step M which are related to instances loaded in step L; O)
repeat search and loading processing like that of step M and N as
many times as necessary to load instances related to said instances
of entity type Tmax loaded in step K at each level of relation in
said group until all instances of said group are loaded into said
currentBlock.
4. The process of claim 2 wherein step E comprises: I) starting
with the independent group which contains the entity type with the
highest connectivity metric calculated using said normalized
cardinalities; J) calculating N divided by K where N is the maximum
block size expressed as the maximum number of entities and K is the
number of entities in the group; K) load N/K instances of entity
type Tmax into a data block referred to as currentBlock; L) load
said currentBlock with all instances related to said N/K instances
of entity type Tmax at relation distance one from said instances of
entity type Tmax loaded in step K; M) for each related instance
loaded in step L, determine other related instances of entity types
at relation distance one from said instances of entity types loaded
in step L; N) load into said current block all instances determined
in step M which are related to instances loaded in step L; O)
repeat search and loading processing like that of step M and N as
many times as necessary to load instances related to said instances
of entity type Tmax loaded in step K at each level of relation in
said group until all instances of said group are loaded into said
currentBlock.
5. The process of claim 2 wherein step E comprises: I) starting
with the independent group which contains the entity type with the
highest connectivity metric calculated using said normalized
cardinalities; J) calculating N divided by K where N is the maximum
block size expressed as the maximum number of entities and K is the
number of entities in the group; K) load N/K instances of entity
type Tmax into a data block referred to as currentBlock; L) load
said currentBlock with all instances related to said N/K instances
of entity type Tmax at relation distance one from said instances of
entity type Tmax loaded in step K; M) for each related instance
loaded in step L, determine other related instances of entity types
at relation distance one from said instances of entity types loaded
in step L; N) load into said current block all instances determined
in step M which are related to instances loaded in step L; O)
repeat search and loading processing like that of step M and N as
many times as necessary to load instances related to said instances
of entity type Tmax loaded in step K at each level of relation in
said group until all instances of said group are loaded into said
currentBlock; P) starting with the group in which the entity type
with the second highest connectivity metric resides, repeat steps J
through O until all instances of said entity type with said second
highest connectivity metric and instances related thereto
at all levels of relation are loaded into a data block; and Q)
repeating processing like step P as many times as necessary for all
entity types in all groups until all instances of all entity types
have been loaded into one or more data blocks.
6. The process of claim 1 wherein step E comprises: I) starting
with the independent group which contains the entity type with the
highest connectivity metric; J) calculating N divided by K where N
is the maximum block size expressed as the maximum number of
entities and K is the number of entities in the group; K) load N/K
instances of entity type Tmax into a data block referred to as
currentBlock; L) load said currentBlock with all instances related
to said N/K instances of entity type Tmax at relation distance one
from said instances of entity type Tmax loaded in step K; M) for
each related instance loaded in step L, determine other related
instances of entity types at relation distance one from said
instances of entity types loaded in step L; N) load into said
current block all instances determined in step M which are related
to instances loaded in step L; O) repeat search and loading
processing like that of step M and N as many times as necessary to
load instances related to said instances of entity type Tmax loaded
in step K at each level of relation in said group until all
instances of said group are loaded into said currentBlock; P)
starting with the group in which the entity type with the second
highest connectivity metric resides, repeat steps J through O until
all instances of said entity type with said second highest
connectivity metric and instances related thereto at all
levels of relation are loaded into a data block; and Q) repeating
processing like step P as many times as necessary for all entity
types in all groups until all instances of all entity types have
been loaded into one or more data blocks.
7. The process of claim 1 wherein step A comprises: A1) adding data
to relationship tables that record parent-child entity
relationships between entity types in said graph so as to make the
graph non-directed such that the parent entity type of any entity
type may be determined from information in said relationship table;
A2) starting with any first entity on said non-directed graph, find
all other entities on said graph which are either parent or child
to said first entity and putting all the entities found in a first
group along with their relationships; A3) proceeding to a parent or
child entity type found in step A2, referred to hereafter as the
second entity, finding all entity types which are related to said
second entity as either parent or child, and adding the entity
types so found to said first group along with their relationships;
A4) repeating step A3 for every other parent or child entity of
said first entity and putting all found entities in said first
group along with their relationships; A5) proceeding to another
entity type node in said graph which is one of the entity types
found in steps A3 or A4, which will hereafter be referred to as the
third entity, and finding all entity types which are related to
said third entity as either parent or child, and adding said entity
types so found to said first group along with their relationships;
A6) repeating the steps of selecting an entity node, finding all
parent and child entity types of said selected entity node, and
adding the found entity types to said first group along with their
relationships until no further new entity types not previously
found can be found, and declaring said first group completed; and
A7) repeating steps A1 through A6 to create another independent
group; and A8) repeating step A7 as many times as necessary to
create as many independent groups as are necessary to include all
entity types on said graph.
8. The process of claim 2 wherein step C comprises adding up the
normalized cardinality for all incoming and outgoing relationships
of an entity to calculate said connectivity metric.
9. A computer-readable medium which bears instructions which, when executed by a computer, cause said computer to carry out a process
to build self-consistent data blocks for a CMDB system or any other
external system using a data block maximum size, said process
comprising: A) dividing a graph of entity types into independent
groups, each group including entity types related to other entity
types by parent-child relationships; B) computing the cardinality
of each relationship; C) computing a connectivity metric for each
entity type; D) sorting the entity types into decreasing order of
connectivity metric; E) loading related instances of related entity
types into one or more data blocks so as to not substantially
exceed a maximum data size for each block starting with the entity
type of highest connectivity metric and working downward on the
sorted list of entity types generated in step D; F) loading into
said one or more data blocks any instances of entity types not
related to any other entity types; G) marking each instance which
has been loaded into a data block as done; and H) repeating steps
E, F and G until all instances of all entity types of all
independent groups have been loaded into one or more data
blocks.
10. The computer-readable medium of claim 9 further bearing
computer-readable instructions which, when executed, cause a
computer to carry out a further step B1 comprising calculating a
weighting ratio for each parent-child relationship by computing the
ratio S2 divided by S1, where S2 is the data size of an instance of
the child type entity in said relationship and S1 is the data size
of an instance of the parent type entity, and further bearing
computer-readable instructions which, when executed, cause a
computer to carry out a further step B2 of multiplying said
cardinality of each said relationship by said weighting ratio of
said relationship so as to generate a normalized cardinality for
each said relationship before computing said connectivity metric
for each said entity type, and further bearing computer-readable
instructions which, when executed, cause a computer to carry out
step C by computing said connectivity metric for each entity type
using said normalized cardinality for each relationship of said
entity type.
11. The computer-readable medium of claim 9 further bearing
computer-readable instructions which, when executed, cause a
computer to carry out step E by performing the following steps: I)
starting with the independent group which contains the entity type
with the highest connectivity metric, hereafter referred to as
Tmax; J) calculating N divided by K where N is the maximum block
size expressed as the maximum number of entities and K is the
number of entities in the group; K) load N/K instances of entity
type Tmax into a data block referred to as currentBlock; L) load
said currentBlock with all instances related to said N/K instances
of entity type Tmax at relation distance one from said instances of
entity type Tmax loaded in step K; M) for each related instance
loaded in step L, determine other related instances of entity types
at relation distance one from said instances of entity types loaded
in step L; N) load into said current block all instances determined
in step M which are related to instances loaded in step L; O)
repeat search and loading processing like that of step M and N as
many times as necessary to load instances related to said instances
of entity type Tmax loaded in step K at each level of relation in
said group until all instances of said group are loaded into said
currentBlock.
12. The computer-readable medium of claim 9 further bearing
computer-readable instructions which, when executed, cause a
computer to carry out step E by performing the following steps: I)
starting with the independent group which contains the entity type
with the highest connectivity metric calculated using said
normalized cardinalities; J) calculating N divided by K where N is
the maximum block size expressed as the maximum number of entities
and K is the number of entities in the group; K) load N/K instances
of entity type Tmax into a data block referred to as currentBlock;
L) load said currentBlock with all instances related to said N/K
instances of entity type Tmax at relation distance one from said
instances of entity type Tmax loaded in step K; M) for each related
instance loaded in step L, determine other related instances of
entity types at relation distance one from said instances of entity
types loaded in step L; N) load into said current block all
instances determined in step M which are related to instances
loaded in step L; O) repeat search and loading processing like that
of step M and N as many times as necessary to load instances
related to said instances of entity type Tmax loaded in step K at
each level of relation in said group until all instances of said
group are loaded into said currentBlock.
13. The computer-readable medium of claim 10 further bearing
computer-readable instructions which, when executed, cause a
computer to carry out step E by performing the following steps: I)
starting with the independent group which contains the entity type
with the highest connectivity metric calculated using said
normalized cardinalities; J) calculating N divided by K where N is
the maximum block size expressed as the maximum number of entities
and K is the number of entities in the group; K) load N/K instances
of entity type Tmax into a data block referred to as currentBlock;
L) load said currentBlock with all instances related to said N/K
instances of entity type Tmax at relation distance one from said
instances of entity type Tmax loaded in step K; M) for each related
instance loaded in step L, determine other related instances of
entity types at relation distance one from said instances of entity
types loaded in step L; N) load into said current block all
instances determined in step M which are related to instances
loaded in step L; O) repeat search and loading processing like that
of step M and N as many times as necessary to load instances
related to said instances of entity type Tmax loaded in step K at
each level of relation in said group until all instances of said
group are loaded into said currentBlock; P) starting with the group
in which the entity type with the second highest connectivity
metric resides, repeat steps J through O until all instances of
said entity type with said second highest connectivity metric
and instances related thereto at all levels of relation are
loaded into a data block; and Q) repeating processing like step P
as many times as necessary for all entity types in all groups until
all instances of all entity types have been loaded into one or more
data blocks.
14. The computer-readable medium of claim 9 further bearing
computer-readable instructions which, when executed, cause a
computer to carry out step E by performing the following steps: I)
starting with the independent group which contains the entity type
with the highest connectivity metric; J) calculating N divided by K
where N is the maximum block size expressed as the maximum number
of entities and K is the number of entities in the group; K) load
N/K instances of entity type Tmax into a data block referred to as
currentBlock; L) load said currentBlock with all instances related
to said N/K instances of entity type Tmax at relation distance one
from said instances of entity type Tmax loaded in step K; M) for
each related instance loaded in step L, determine other related
instances of entity types at relation distance one from said
instances of entity types loaded in step L; N) load into said
current block all instances determined in step M which are related
to instances loaded in step L; O) repeat search and loading
processing like that of step M and N as many times as necessary to
load instances related to said instances of entity type Tmax loaded
in step K at each level of relation in said group until all
instances of said group are loaded into said currentBlock; P)
starting with the group in which the entity type with the second
highest connectivity metric resides, repeat steps J through O until
all instances of said entity type with said second highest
connectivity metric and instances related thereto at all
levels of relation are loaded into a data block; and Q) repeating
processing like step P as many times as necessary for all entity
types in all groups until all instances of all entity types have
been loaded into one or more data blocks.
15. The computer-readable medium of claim 9 further bearing
computer-readable instructions which, when executed, cause a
computer to carry out step A by performing the following steps: A1)
adding data to relationship tables that record parent-child entity
relationships between entity types in said graph so as to make the
graph non-directed such that the parent entity type of any entity
type may be determined from information in said relationship table;
A2) starting with any first entity on said non-directed graph, find
all other entities on said graph which are either parent or child
to said first entity and putting all the entities found in a first
group along with their relationships; A3) proceeding to a parent or
child entity type found in step A2, referred to hereafter as the
second entity, finding all entity types which are related to said
second entity as either parent or child, and adding the entity
types so found to said first group along with their relationships;
A4) repeating step A3 for every other parent or child entity of
said first entity and putting all found entities in said first
group along with their relationships; A5) proceeding to another
entity type node in said graph which is one of the entity types
found in steps A3 or A4, which will hereafter be referred to as the
third entity, and finding all entity types which are related to
said third entity as either parent or child, and adding said entity
types so found to said first group along with their relationships;
A6) repeating the steps of selecting an entity node, finding all
parent and child entity types of said selected entity node, and
adding the found entity types to said first group along with their
relationships until no further new entity types not previously
found can be found, and declaring said first group completed; and
A7) repeating steps A1 through A6 to create another independent
group; and A8) repeating step A7 as many times as necessary to
create as many independent groups as are necessary to include all
entity types on said graph.
16. The computer-readable medium of claim 10 further bearing
computer-readable instructions which, when executed, cause a
computer to carry out step C by adding up the normalized
cardinality for all incoming and outgoing relationships of an
entity to calculate said connectivity metric.
17. An apparatus comprising a computer programmed with an operating
system and one or more application programs which interact with
said computer and said operating system so as to control said
computer to carry out the following process: A) dividing a graph of
entity types into independent groups, each group including entity
types related to other entity types by parent-child relationships;
B) computing the cardinality of each relationship; C) computing a
connectivity metric for each entity type; D) sorting the entity
types into decreasing order of connectivity metric; E) loading
related instances of related entity types into one or more data
blocks so as to not substantially exceed a maximum data size for
each block starting with the entity type of highest connectivity
metric and working downward on the sorted list of entity types
generated in step D; F) loading into said one or more data blocks
any instances of entity types not related to any other entity
types; G) marking each instance which has been loaded into a data
block as done; and H) repeating steps E, F and G until all
instances of all entity types of all independent groups have been
loaded into one or more data blocks.
18. An apparatus comprising a computer programmed with an operating
system and one or more application programs which interact with
said computer and said operating system so as to control said
computer to carry out the following process on a collection of data
objects which can be represented by a graph of entity types which
has been divided into independent groups, each group including
entity types related to other entity types by parent-child
relationships, each entity type representing one or more instances
of the entity, each instance represented by a data object which, if
said instance has any parent or child relationship, has data
recording at least some of its relationship(s) with one or more
instances of another type, said process comprising A) computing the
cardinality of each relationship; B) computing a connectivity
metric for each entity type; C) sorting the entity types into
decreasing order of connectivity metric; D) loading related
instances of related entity types into one or more data blocks so
as to not substantially exceed a maximum data size for each block
starting with the entity type of highest connectivity metric and
working downward on the sorted list of entity types generated in
step C; E) loading into said one or more data blocks any instances
of entity types not related to any other entity types; F) marking
each instance which has been loaded into a data block as done; and
G) repeating steps D, E and F until all instances of all entity
types of all independent groups have been loaded into one or more
data blocks.
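What follows is a minimal, hypothetical Java sketch of the heuristic recited in claims 1, 2 and 8: the cardinality of each relationship is normalized by the instance-size ratio S2/S1, a connectivity metric is computed for each entity type by summing the normalized cardinalities of its incoming and outgoing relationships, and the entity types are then sorted into decreasing order of that metric before instances are loaded into data blocks. All class and member names are illustrative assumptions, not part of the claimed system.

    import java.util.*;

    // Hypothetical model of a parent-child relationship between entity types.
    class Relationship {
        EntityType parent, child;
        double cardinality;        // average number of child instances per parent
        double childInstanceSize;  // S2: data size of a child-type instance
        double parentInstanceSize; // S1: data size of a parent-type instance

        // Steps B1/B2 of claim 2: normalized cardinality = cardinality * (S2/S1).
        double normalizedCardinality() {
            return cardinality * (childInstanceSize / parentInstanceSize);
        }
    }

    class EntityType {
        String name;
        List<Relationship> incoming = new ArrayList<>();
        List<Relationship> outgoing = new ArrayList<>();

        // Claim 8: the connectivity metric is the sum of the normalized
        // cardinalities of all incoming and outgoing relationships.
        double connectivityMetric() {
            double sum = 0;
            for (Relationship r : incoming) sum += r.normalizedCardinality();
            for (Relationship r : outgoing) sum += r.normalizedCardinality();
            return sum;
        }
    }

    class BlockBuilder {
        // Step D of claim 1: sort entity types into decreasing order of
        // connectivity metric; block loading then starts from the head.
        static List<EntityType> sortByConnectivity(Collection<EntityType> types) {
            List<EntityType> sorted = new ArrayList<>(types);
            sorted.sort(Comparator.comparingDouble(EntityType::connectivityMetric).reversed());
            return sorted;
        }
    }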
Description
BACKGROUND OF THE INVENTION
[0001] It is useful for companies to know exactly what assets they
have for many different reasons, but it is difficult to know this
for large companies and governmental entities. Manually collecting
such data on a periodic basis is expensive and time consuming.
Systems have been developed by companies such as BDNA of Mountain
View, Calif. to take automated inventory of assets. IBM Tivoli is
another such system.
[0002] Sometimes, customers have IBM or BMC CMDB-structured databases (such as the IBM Tivoli Change and Configuration Management Database, or CMDB) of their assets and the attributes thereof, but prefer the way attribute data about a company's assets is collected by another automated inventory system, such as the BDNA automated inventory software provided by BDNA of Mountain View, Calif. The BDNA software stores the attribute data in a different format and with different semantics than other systems like IBM CMDB. In such situations, the customer may wish to continue to use the IBM CMDB data model but use BDNA software to collect the attribute data about the customer's assets. In such cases, it is useful to be able to extract automatically collected asset attribute data from BDNA data repositories and make that data available in other data repositories, such as those provided in asset management systems or inventory systems developed by IBM and BMC. Doing this requires a system that maps from one data model to another and makes all necessary changes in semantics, data types, class structure, inheritance relationships, etc.
[0003] The IBM Tivoli CMDB has configuration and tracking functionality that does automated, agentless discovery of the assets in use by an entity and of their configuration and application dependencies. The items discovered are called Configuration Items, or CIs for short. Wikipedia defines a Configuration Item as "a collection of objects related to the specific functionality of a larger system." Discovery information about a system is one aspect of a CI, but there is usually other information about each CI maintained in its CMDB: for example, the number of trouble tickets an administrator has logged against a computer system, the original set of applications installed on it, and how they were configured. Discovery data is used to reconcile/enforce known data about a CI against the item. For example, the discovery data may include the current list of applications found on the system, or up times collected about the system. [0004] Another source defines a Configuration Item as: [0005] " . . . any component of an information technology infrastructure that is under the control of configuration management. Configuration Items (CIs) can be individually managed and versioned [sic], and they are usually treated as self-contained units for the purposes of identification and change control. [0006] All configuration items (CIs) are uniquely identified by names, version numbers, and other attributes. CIs vary in complexity, size and type. They can range from an entire service, which may consist of hardware, software and documentation, to a single program module or a minor hardware component."
[0007] It is useful to be able to transform inventory attribute
data discovered by other automated inventory discovery systems such
as the one provided by the assignee of the present invention, BDNA,
into the IBM Tivoli CMDB data model, for the reasons given above.
In such cases, it makes sense to provide a layer of isolation and
mapping between the BDNA internal data structures and the outside
system and only expose through the layer the necessary models and
data of the BDNA system. This intermediary layer allows the BDNA
system and data structures to continue to evolve without impacting
the use of BDNA data in external systems.
[0008] The framework and functionality of the intermediate layer:
[0009] 1) provides a layer of isolation between the BDNA internal data model and what is exposed to outside sources (a hypothetical sketch of such a layer follows this list);
[0010] 2) maps out and helps resolve differences between the BDNA data model for the discovery data and the data model representation of an outside source or target system;
[0011] 3) provides runtime support for processing BDNA's discovery data into the normalized data required by the external system, in the form of a Java code snippet; and
[0012] 4) provides a consistent, scalable and manageable way of processing the BDNA model and extracting it to an outside target.
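The interface below is one way to picture that isolation layer. It is a hypothetical sketch only; none of these names are part of the actual BDNA API.

    import java.util.List;
    import java.util.Map;

    // Hypothetical facade for the intermediary layer: external systems see only
    // these operations, never the internal BDNA schema, so the internal data
    // model can evolve without breaking external integrations.
    interface ExternalIntegrationLayer {
        // Expose only the entity types the target system needs.
        List<String> exposedEntityTypes();

        // Run the extraction (e.g., a BQL report) for one exposed entity type.
        List<Map<String, Object>> extract(String entityType);

        // Apply the transformation rules that normalize the extracted attribute
        // data into the form the target system's data model requires.
        List<Map<String, Object>> transform(String entityType,
                                            List<Map<String, Object>> rows);
    }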
BRIEF DESCRIPTION OF THE DRAWINGS
[0013] FIG. 1 is one page of a multipage data model diagram
illustrating some of the Configuration Items in the IBM CMDB data
model.
[0014] FIG. 2 is a diagram showing how attribute mapping rules
are used to map discovered element data to Configuration Items in a
CMDB data structure in an m-to-n mapping.
[0015] FIG. 3 is a block diagram showing the overall architecture
of a system to transform inventory data gathered by an automated
inventory attribute gathering system into data structures suitable
for use in IBM CMDB and BMC CMDB systems.
[0016] FIG. 4 is a diagram of the schema or data structure needed
to do the type of transformations which the system of FIG. 3
performs.
[0017] FIG. 5 is a diagram of a supported object model comprising
ER Model Table Types.
[0018] FIG. 6 is a diagram illustrating the hardware and data flows
of a system to transform BDNA asset attribute data into a
destination ER model and export the transformed attribute data into
an external system data store such as IBM CMDB.
[0019] FIG. 7 is a flow diagram of the workflow of a process to
create the BQL reports and transformation rules for a specific
transformation project and use them to transform BDNA inventory
asset attribute data into CIs in the data model of a CMDB data
structure.
[0020] FIG. 8 is a block diagram of the IBM ER Model.
[0021] FIG. 9 is a flow diagram of the business method for
transformation of attribute data from the data model of the source
system to the data model of the target system.
[0022] FIG. 10 illustrates a class diagram for the object-oriented
transformation rules objects for storage of the Transformation
Rules of the ComputerSystem CI with two subtypes illustrated.
[0023] FIG. 11 is a diagram illustrating how the object oriented
transformation rules can combine-transform information from two or
more objects in the inventory attribute data extracted by the BQL
report from the source system to write a single CI in the target
system or split-transform information from a single object
extracted by the BQL report from the source system into two or more
CIs in the target system.
[0024] FIG. 12 is a diagram illustrating how BQL reports are used
to split aggregated computer system and operating system attribute
data into separate BQL reports, one for computer systems and one
for operating systems, and one for the relationships between
computer systems and operating systems.
[0025] FIG. 13 is an illustration of the overall workflow of
another embodiment of a transformation process represented by the
flowchart of FIG. 7.
[0026] FIG. 14 is a flow diagram showing how data from two
different elements discovered by the automated attribute discovery
process is combined using a BQL report into a single Configuration
Item in CMDB for the Oracle database instance.
[0027] FIG. 15 is an illustration of two elements linked together
by a containment relationship in the discovery data.
[0028] FIG. 16 is a flow diagram showing how BQL reports can also
be used to split a single BDNA report into two tables and a
relation expected by CMDB.
[0029] FIG. 17 is an example of the adapter restart.
[0030] FIG. 18 is a flow diagram of a BDNA adapter doing a
state-based transfer.
[0031] FIG. 19 shows two hierarchies of objects, one rooted at A
and one rooted at P.
[0032] FIG. 20 represents the process of exporting the data from
the BDNA CMDB ER Model using the CMDB specific adapters to external
CMDB stores.
[0033] FIG. 21 is a flow diagram of one method of dividing a graph
of entities (asset classes) into separate groups where no entity in
any group is related to any other entity in any other group.
[0034] FIG. 22 shows an example graph of the entity types for the
output data generated by the transformation process described
above. The relations between entity types (tables) are indicated by
arrows. An arrow from one entity to another indicates the
origination point is the parent entity and the destination point is
the child entity.
[0035] FIG. 23 shows the cardinality of each relationship added to
the example graph shown in FIG. 22.
[0036] FIG. 24 shows an example of a relationship between two entity tables. This illustration only shows the metadata. In this simple
output schema, two output entity tables A and B and a relationship
table showing the relationship between A and B are shown.
[0037] FIG. 25 shows example data in the tables shown in the
relation example given in FIG. 24.
[0038] FIG. 26 shows examples of self-consistent blocks that load
data to the target CMDB system. A single block containing all the
data in the individual blocks illustrated would be too large to
load into the CMDB system.
[0039] FIG. 27 shows the connectivity metric of each entity type
added to the graph shown in FIG. 23.
[0040] FIG. 28 illustrates how to divide a graph consisting of 9
nodes A, B, C, D, E, F, G, H and I into three groups G1, G2 and G3
using the preferred algorithm disclosed herein. Note there is no
edge between any two nodes that belong to two different groups.
[0041] FIG. 29 shows an example illustrating how an approach that
processes nodes of the graph in a random order may lead to
inefficiency of execution.
[0042] FIG. 30, comprised of FIGS. 30A and 30B, referred to
collectively as FIG. 30, is a flowchart of the preferred heuristic
based method to build self-consistent blocks of entities once a
graph is divided into groups by any prior art method.
[0043] FIG. 31 is a flowchart of another example of a process to
divide a graph of entity types into independent groups where no
entity type in a group is related to another entity type in another
group.
[0044] FIG. 32 is a flowchart of the details of the process of
adding the data of all related entity instances to a data block
being filled.
[0045] FIG. 33 is a diagram illustrating one example of
relationships of instances of particular entity types.
SUMMARY OF THE INVENTION
[0046] Embodiments that implement the teachings of the invention
will do mapping from one data model to another. This is done
using:
[0047] 1) Means for extracting the necessary attribute data from
the source system to represent the same assets having those
attributes in the target system. In various embodiments, this is
done using BQL reports or any other method of selecting the
attribute data about one or more assets in the source system needed
to make up instances in one or more classes representing the same
type of assets in the target system. Typically, this is done using a
computer programmed to run BQL reports, but dedicated hardware
circuits could also be used.
[0048] 2) Means for transforming the attribute data from the format
it is in for the data model of the source system to the format of
the data model for the target system. In the preferred embodiment,
this is done using a transformation engine which executes
transformation rules. These rules are written by hand by an analyst
who understands the difference between the source system data model
and the target system data model and then writes computer programs
that control a computer to transform the attribute data from the
source system into attribute data having the proper format for the
target system. The transformed attribute data is then stored
temporarily in an ER model of the target system.
[0049] 3) Means for exporting the transformed attribute data to the
target system. In the preferred embodiment, this is implemented
with an export adapter that is conversant with the application
programmatic interface of the target system and which functions to
make the appropriate function calls and supply the appropriate
arguments from the ER model data to properly store the ER model
data in the target system. The export adapters are basically
drivers for the target system.
[0050] The BQL reports are generated by software running on a
computer which controls the computer to select the attribute data
needed from the source system to make up each Configuration Item
(CI or class or type) in the target system. The BQL reports are
typically computer programs which control a computer to extract the
source system attribute data and store it in a store coupled to the
transformation engine. A transformation engine is typically
implemented by executing transformation rules programs on a
computer. The transformation rules are written by an analyst who understands the differences in semantics, data types and units of
measure between the source system and the target system. The export
adapters are typically software programs which are executed on a
computer, the software controlling the computer being conversant
with the application programmatic interface (API) of the target
system and controlling the computer to export the transformed
attribute data into the target system.
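A hypothetical sketch of such an export adapter interface follows; the method names and signatures are invented for illustration and are not the API of any actual CMDB product.

    import java.util.Map;

    // Hypothetical export adapter: in effect a driver that is conversant with
    // one target system's application programmatic interface (API).
    interface ExportAdapter {
        // Open a session with the target CMDB; connection details are invented.
        void connect(String targetUrl, String user, String password);

        // Push one transformed CI's attribute data from the ER model into the
        // target CMDB by making the appropriate API calls.
        void exportConfigurationItem(String ciType, Map<String, Object> attributes);

        void close();
    }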
[0051] In the preferred embodiment, the transformation rules are organized in an object-oriented format. Because the objects in the target system data model are organized into parent and subtype objects, i.e., CIs and subtype CIs, the transformation rules can be organized the same way. The preferred embodiment method involves identifying, for each CI with subtype CIs in the target system, which attributes are common, in the sense that all the subtype CIs inherit those attributes from the parent CI. Transformation rules for those common attributes are then written and stored in an object which is the parent of the subtype objects. The subtype objects store the transformation rules which are unique to the attributes of the corresponding subtype objects in the target system data model.
[0052] When the attributes of a subtype object in the target system
data model are to be populated, the transformation rules of the
parent object are used to transform the attribute data of the
corresponding parent object in the source system data model into
the data format of the target system data model's parent object.
The transformed attribute data that is common to the parent object (CI)
is then exported to the target system data model and used to
populate the parent object (CI) and the inherited attributes of all
the subtype or child objects (subtype CIs). The transformation
rules for the attributes which are unique to each subtype object
(subtype CI) are then used to transform the attribute data of the
subtype objects in the source data system into the data format of
attribute data of the subtype objects in the target data system.
The transformed attribute data for each subtype object is then
exported into the appropriate subtype objects of the target
system.
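A hypothetical Java sketch of this object-oriented rule organization is given below; the class and attribute names are assumptions chosen to mirror the ComputerSystem example, not the actual rule objects of the preferred embodiment.

    import java.util.HashMap;
    import java.util.Map;

    // Parent rule object: holds the transformation rules for the attributes
    // that every ComputerSystem subtype CI inherits from the parent CI.
    class ComputerSystemRules {
        Map<String, Object> transform(Map<String, Object> source) {
            Map<String, Object> ci = new HashMap<>();
            ci.put("Manufacturer", source.get("manufacturer"));
            // Semantic rename: source "ClockSpeed" becomes target "CPUSpeed".
            ci.put("CPUSpeed", source.get("ClockSpeed"));
            return ci;
        }
    }

    // Subtype rule object: inherits the common rules and adds rules only for
    // the attributes unique to this subtype CI.
    class SunSPARCComputerSystemRules extends ComputerSystemRules {
        @Override
        Map<String, Object> transform(Map<String, Object> source) {
            Map<String, Object> ci = super.transform(source); // inherited rules
            ci.put("DomainName", source.get("sunDomain"));     // subtype-only rule
            return ci;
        }
    }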
DETAILED DESCRIPTION OF THE VARIOUS EMBODIMENTS
[0053] The process of exporting inventory attribute data developed
by systems such as those marketed by BDNA (hereafter BDNA
discovered data) into the databases or data structures of other systems fundamentally involves a mapping process to resolve
differences between the data structure (data model) of the BDNA
discovered data and the data structure or data model of a target
system.
[0054] FIG. 1 is one page of a multipage data model diagram
illustrating some of the Configuration Items in the IBM CMDB data
model. This is a conceptual model only in the form of a class
diagram and is not how the data is actually stored in their
database. A CMDB is a common store where all the information about
the IT assets of an entity are stored, and it has a data structure
or data model which is independent of the data structure of
discovery data discovered by discovery tools such as the BDNA
discovery tool. A data model is a class diagram which defines the
classes of objects, each object's attributes and genus-species
relationships, i.e., the subtypes or species of each genus or type
(class) of object and the containment relationships. A containment relationship is a relationship such as an operating system being installed on a computer. A genus-species relationship is, for example, computer system as the genus and Sun SPARCstation as a species of computer system. A database is an implementation of a data model like FIG. 1 with specific tables. The characteristics that define a data model are the class structure and the relationships between the classes. The names and attributes of each class and its subtypes are defined, along with the attributes specific to each subtype; which classes are subtypes of other classes is defined, as is which classes of asset types are installed on which other classes of asset types. How the instances of devices within the class structure are stored is irrelevant to the data model itself.
[0055] Vendors that define CMDB data models like IBM generally
market discovery tools which configure the discovered attribute
data about assets in a way compatible with that vendor's CMDB. But
sometimes, customers want to use other discovery tools that are not
configured to generate discovery data in a data format compatible
with a CMDB data model the customer wishes to use. In such a case,
a transformation system to convert the discovery data into the data
structure used in the CMDB is useful.
[0056] Each block in FIG. 1 represents a Configuration Item or CI
in the data model. A CI is a type of an asset such as a computer or
an operating system, etc. A CI is basically a class object with a
listing of attributes. For example, there is a CI 11 for
ComputerSystem which has the attributes listed in the box 11 such
as architecture, CPU speed, CPU type, manufacturer, memory size,
ROM version, serial number, etc. The data model diagram shows a
classic object-oriented class structure where various CI types have
subtypes which inherit the attributes of the parent class but which have unique attributes of their own. For example, the ComputerSystem CI
11 has the various subtypes named in the blocks such as blocks 13
and 15 representing subtypes of computer systems for
SunSPARCVirtualComputerSystem and ZSeriesComputerSystem,
respectively. Each of these is a separate species of the generic
ComputerSystem type. The lines with arrowheads, such as line 17,
indicate these type and subtype relationships. The CI box on the
arrowhead end of line 17 represents the genus CI or the common
attributes that all the subtype CI boxes (on the "feathers" end of
the arrow) will inherit. Each of the subtype CIs will have all the
attributes of the parent type and may have its own "species"
attributes unique to it. These particular subtypes represent the
leaf level of the class diagram tree, and those types of objects
will be instantiated with instances of particular assets found in
inventory which have the attribute set of that type and subtype.
Some of the CIs or types are abstract, such as the LogicalElement CI
23 at the top of FIG. 1. This CI type will never be instantiated
with actual instances but it serves as the top of the class diagram
tree and is generic to all the subtypes.
[0057] The containment relationships in the data model of FIG. 1
are represented by lines connecting the CI boxes, of which lines 19
and 21 are examples. Line 19 indicates that the ComputerSystem CI
11 is the parent of OperatingSystem CI 9 and is also the parent of
FileSystem CI 23. These lines 19 and 21, and the cardinalities
printed on them, mean that the computer system may have any number
of operating systems installed on it and any number of file systems
installed on it.
[0058] In order to transform attribute data collected by an
automated discovery system such as BDNA to data stored in a CMDB,
it is necessary to gather all the attributes needed for each CI and
transform that data if necessary to conform it to the data types of
the attributes of the CIs. This may require unit conversion, data
type conversion and recognition of semantic differences so that
attributes called one thing in the source data are stored in an
attribute storage memory location called something else in the
target system but which means the same thing. For example, a CI for
a computer system may have an attribute that measures the size of
the hard disk in megabytes but the automated asset discovery system
stores such an attribute in units of kilobytes. As another example,
the BDNA automated asset discovery system may store the attribute
CPU speed as a floating point number such as 126.4 MHz and the
target system may store the CPU speed as an integer number of MHz.
As another example a CI in the target system may label the CPU
speed attribute "CPUSpeed" whereas the BDNA software may label the
same attribute "ClockSpeed". They both mean the same thing and are
the same attribute.
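By way of illustration only, a Java snippet performing the three conversions just described (kilobytes to megabytes, floating-point MHz to an integer, and the ClockSpeed-to-CPUSpeed rename) might look like the following; the attribute names follow the examples in the text, and everything else is assumed.

    import java.util.Map;

    // Hypothetical attribute conversions for the examples given above.
    class AttributeConversions {
        // Unit conversion: the source stores disk size in kilobytes while the
        // target CI expects megabytes.
        static long diskSizeMegabytes(long diskSizeKilobytes) {
            return diskSizeKilobytes / 1024;
        }

        // Data type conversion: the source stores CPU speed as a floating point
        // number (e.g., 126.4 MHz); the target stores an integer number of MHz.
        static int cpuSpeedMhz(double clockSpeedMhz) {
            return (int) Math.round(clockSpeedMhz);
        }

        // Semantic mapping: the source attribute "ClockSpeed" and the target
        // attribute "CPUSpeed" mean the same thing.
        static void mapClockSpeed(Map<String, Object> source,
                                  Map<String, Object> target) {
            target.put("CPUSpeed", source.get("ClockSpeed"));
        }
    }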
[0059] FIG. 2 is a diagram showing how attribute mapping rules are
used to map discovered element data to Configuration Items (CIs) in
a CMDB data structure in an m-to-n mapping. In FIG. 2, BDNA
discovered data in the form of three different discovered elements
E1, E2 and E3 are input to a process 10 which uses mapping rules to
create data structures of the desired output type. The elements E1,
E2 and E3 could be hardware assets or software assets or any other
assets on the network of the entity using the BDNA discovery
software. These elements may have different names than the CIs to
which they are to be transformed and the listing of attributes of
the elements may be different than the CI target, but basically,
they are the same thing as the target CI or CIs into which the
element's attribute data are to be transformed. In the particular
example of FIG. 2, the target system is a CMDB data structure, but
IBM Tivoli or other data structures might also be target systems.
Sometimes, the attributes of multiple elements are needed to make
up a lesser number of CIs. That is the purpose of the M-to-N
mapping shown in the example of FIG. 2 where the attributes of
three elements are transformed into the attributes of two CIs, CI1
and CI2.
[0060] One example of how to do such a conversion is:
[0061] 1) Use the BVL/BQL functionality of BDNA software to combine
multiple discovered elements to a single entity.
[0062] a) This is done by collecting all attributes needed to create a
particular Configuration Item type in CMDB in a BQL report (BQL
stands for BDNA Query Language) which is then used as an input of
data to an attribute mapping process.
[0063] For example, a ComputerSystem CI type in CMDB may require
attribute data from Network, Host and Operating system discovered
elements, as related together by the container relationship in BDNA
data structures of discovered elements. Several such BQL reports
may have to be defined depending upon how many different
Configuration Item types are to be generated. FIG. 14 is a flow
diagram showing how data from two different elements discovered by
the automated attribute discovery process is combined using a BQL
report into a single Configuration Item in CMDB for the Oracle
database instance. The Configuration Item in the CMDB to be built
is box 27 and is a CI for an Oracle database. It requires
information both about the operating system of the host and the
Oracle Database instance. The attribute data about the operating
system is represented by block 29. The attribute data about the
Oracle Database Instance is represented by block 31. A BQL report
33 is generated which contains the database attributes and adds the
hostname attributes about the operating system element 29.
[0064] FIG. 16 is a flow diagram showing how BQL reports can also
be used to split a single BDNA report into two tables and a
relation expected by CMDB. In this example, the BDNA OS report 37
contains both the IP address and the MAC address but CMDB expects
two tables and a relation. The BQL report 41 is used to generate a
CMDB OS report containing the OSid which has a relation InstalledOn
with a CMDB MAC report MACAddrid (the MAC address) which is bound
to a CMDB IP report which contains the IP address. The BQL report
generates the two separate CMDB reports and enforces proper
constraints.
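Hypothetically, the split that FIG. 16 describes could be expressed as SQL statements like the following (wrapped in Java for context; every table and column name is invented for the example):

    // Hypothetical SQL realizing the FIG. 16 split: one discovered OS report
    // feeds two CMDB tables plus a relation table.
    class SplitReportExample {
        static final String OS_TABLE =
            "INSERT INTO cmdb_os (os_id) SELECT os_id FROM bdna_os_report";
        static final String MAC_TABLE =
            "INSERT INTO cmdb_mac (mac_addr_id) SELECT mac_addr FROM bdna_os_report";
        static final String RELATION =
            "INSERT INTO installed_on (os_id, mac_addr_id) " +
            "SELECT os_id, mac_addr FROM bdna_os_report";
    }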
[0065] When the BDNA discovery software discovers attribute data,
the values are stored in a special schema called the transactional
store that is very generic and suitable for inserting/appending new
data but not suitable for querying data (or interpreting data in a
meaningful way). A language called BQL was therefore developed to
generate reports that are easy to interpret from the transactional
store. These reports have meaningful column names that reflect the
attributes being discovered. Each report ends up stored in a
database table.
[0066] BQL is a prior art query-language reporting mechanism that is executable and which extracts attribute data and stores it
in one place. The BQL reports allow attribute data from different
inventory assets to be collected and stored in one place such as a
particular row of a table. In the transformation process described
here, the BQL reports are used to collect all the attribute data
needed for a particular CI from the attribute data of one or more
assets in the inventory data and store it in one place for the
transformation rules to work on. That one place is represented as
box 7 of FIG. 3, labeled the input schema based on BQL reports. An
alternative would be to use specialized SQL statements to generate
such tables.
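As a purely hypothetical illustration of that alternative, the SQL below (again wrapped in Java) joins per-asset attribute tables into a single report-style table, in the spirit of the Oracle database example of FIG. 14; all table and column names are invented.

    // Hypothetical SQL alternative to a BQL report: gather all the attributes
    // needed for one CI type into a single table.
    class SqlReportExample {
        static final String BUILD_REPORT =
            "CREATE TABLE cmdb_oracle_report AS " +
            "SELECT db.instance_id, db.instance_label, db.num_sessions, " +
            "       os.hostname " +                  // attribute from the OS element
            "FROM all_databases db " +
            "JOIN all_os os ON db.os_id = os.os_id"; // containment foreign key
    }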
[0067] b) Relationships between entities will, in some embodiments,
be specified by reports. For example, the relationships between
ComputerSystems and Databases (parent-child) may be specified as:
[0068] i.) an M-to-N relationship: a separate report that just contains the parent-child link; [0069] ii.) a 1-to-N relationship: the database report will, in some embodiments, contain a column for OS_ID (which is in effect a foreign key to an all_os table). FIG. 15 is an illustration of two elements linked together by a containment relationship in the discovery data. In this case, the relationship between the Operating System report 33 and the Database Report 35 is a containment relationship. The os_id attribute is the common attribute which links the two reports together, in the sense that it is a foreign key to the All_OS Report 33 and it contains the ID of the OS that contains the database. A hypothetical sketch of these two relationship shapes follows.
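The two relationship shapes can be sketched, hypothetically, as the following table definitions (names invented for illustration):

    // Hypothetical DDL for the two relationship representations listed above.
    class RelationshipShapes {
        // i) M-to-N: a separate link table holding only the parent-child pairs.
        static final String LINK_TABLE =
            "CREATE TABLE os_db_link (os_id NUMBER, db_id NUMBER)";

        // ii) 1-to-N: the child report carries a foreign-key column (OS_ID)
        // pointing at the containing operating system in the all_os table.
        static final String CHILD_WITH_FK =
            "CREATE TABLE database_report (db_id NUMBER, db_label VARCHAR2(64), " +
            "os_id NUMBER REFERENCES all_os(os_id))";
    }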
[0070] c) Use entity transformation rules to map a single entity to
possibly multiple CIs. In some embodiments, the BVL/BQL language
may be able to implement these transformations since BQL supports
user defined PL/SQL functions.
[0071] FIG. 3 is a block diagram of the overall architecture of a
system to convert data from a BDNA inventory data repository into
records in database entries in the data format of IBM CMDB or BMC
CMDB.
[0072] Block 12 represents a set of specific BQL queries which are
specific to the project's source and target data models. The BQL
reports are a set of reports necessary to transform m-to-n mapping
of BDNA types into 1-to-n mapping of BQL reports. A BQL report is a database table where the main idea is to store all the required
information for a CI in one table no matter how the attribute data
was discovered. For example, at discovery time, discovery data may
have been stored as several different elements. BQL reports contain
data extracted from the BDNA inventory data to make up a complete
specific CI in the CMDB data structure. An example of part of the
All Operating System report (only five out of a large number of OS
attributes are displayed) is given in the BQL report table
below.
TABLE-US-00001
OPERATINGSYSTEM_ID  OPERATINGSYSTEM_LABEL  SERIALNUMBER       TOTALMEMORY  VERSION
102122068           osiris.bdnacorp.com    B0ZJP81            294000640    5.2.3790
102122080           qawinl.bdnacorp.com    1MN3081            4294000640   5.2.3790
102122090           hydra.bdnacorp.com                        804757504    5.2.3790
102122096           xseries3.bdnacorp.com  00:14:5E:7E:2B:F2  4249919488   Red Hat Enterprise 4
102122098           xseries4.bdnacorp.com  00:14:5E:7E:2B:EE  4249899008   Red Hat Enterprise 4
102122102           titan.bdnacorp.com     00:02:B3:95:73:95  3189473280   Red Hat Enterprise 4
102122104           avocado.bdnacorp.com   00:14:22:09:30:E2  2124529664   Red Hat Enterprise 4
[0073] An example of the Database Report for the Oracle database
instance is included below.
TABLE-US-00002
ORACLEINSTANCE_ID  ORACLEINSTANCE_LABEL  ORACLEINSTANCE_TYPE         NUMSESSIONS  EDITION     SCHEMACOUNT  OPERATINGSYSTEM_ID
102129509          ora92                 Oracle Instance on UNIX     2            Standard    39           102122173
102129483          ora10g                Oracle Instance on UNIX     21           Enterprise  28           102122146
102129472          ora10g                Oracle Instance on UNIX     10           Standard    9            102122144
102129461          ora92                 Oracle Instance on UNIX     26           Standard    9            102122136
102129444          ora10g                Oracle Instance on UNIX     32           Enterprise  9            102122134
102129433          ora92                 Oracle Instance on UNIX     1            Standard    5            102122132
102129416          ora92                 Oracle Instance on UNIX     1            Standard    4            102122130
102133888          ORCL                  Oracle Instance on Windows                            0            102123967
[0074] Note that all columns of the database report are related to database instances except for the last column, OPERATINGSYSTEM_ID, which is the foreign key to the containing operating system.
[0075] These BQL Reports contain the data required to make up CIs
of a specific CI type in the CMDB data structure.
[0076] Block 14 represents the data repository of automatically
discovered inventory data regarding the attributes of the devices
and software discovered in the discovery process. This data
repository 14 is the source of the inventory data which is being
transformed. Block 16 represents the entity transformation rules
which are used to transform data entities from BDNA data format to
CMDB format. The rules are vendor specific since each vendor has a
different data schema (data structure). In other words, the
transformation rules to transform BDNA data entities into CIs for
IBM CMDB data structures will be different than transformation
rules to transform BDNA data entities into BMC CMDB data
structures. These rules would be different than transformation
rules to transform IBM CMDB CIs into BDNA elements in the base
tables created by the BDNA system.
[0077] All the mapping/transformation rules to transform attributes
from the input schema to whatever is the output data schema are
specified in block 16. There is one transformation rule for each CI
type. Each transformation rule is handwritten by an analyst who
understands the differences between the data structure of the CI in
the target data model and the data structures of the elements and
attribute data in the inventory data stored in the source system
(which may be automatically discovered or stored in the source
system base tables or data repository by hand). The transformation
rules make the necessary data units conversion, data type
conversion and semantic translation in the sense of placing the
processed data in the appropriate field or fields of the target
data model's data structure despite the fact that the data is
called something else in the source data structure.
[0078] Block 18 represents the entity transformation engine. The
entity transformation engine is a program running on a computer
which receives all the input data from the inventory data
repository in data format 1 (the BDNA inventory data or whatever
other input data format is being used). That inventory data is
transformed by the entity transformation engine using the entity
transformation rules specified in block 16 as the other input. A
specific example of a transformation rule process is given
below.
Example of Transformation Rule
[0079] 1) The discovery process assigns a device type value to a
Computer System in a specific format (e.g., device.loadBalancer,
device.storage, etc.). Different CMDB vendors have different
formats for storing such an attribute. Here is a snippet of a
transformation rule that converts the value of this particular
attribute.
TABLE-US-00003 [0079]
<xsiTargetAttributeComputed name="csType">
  <xsiAttributeAlias name="deviceType" sourceAttributeName="deviceType"/>
  <xsiCode> <![CDATA[
    if (deviceType.equals("device.loadBalancer")) return "LoadBalancer";
    else if (deviceType.equals("device.storage")) return "StorageDevice";
    else if (deviceType.equals("device.system")) return "ComputerSystem";
    else if (deviceType.equals("device.storage.SAN.StorageArray")) return "SAN Storage Device";
    ...
  ]]> </xsiCode>
</xsiTargetAttributeComputed>
[0080] 2) The discovery system finds the total physical memory of a
Computer System in bytes. The CMDB system requires the attribute
values to be in kilobytes. Here is a snippet of the transformation
rule that does the conversion:
TABLE-US-00004 [0080]
<xsiTargetAttributeComputed name="TotalPhysicalMemory">
  <xsiAttributeAlias name="memory" sourceAttributeName="totalmemory"/>
  <xsiCode> <![CDATA[
    // bytes to kbytes
    if (memory == null) { return null; }
    return memory / 1024;
  ]]> </xsiCode>
</xsiTargetAttributeComputed>
[0081] 3) Simple name transformation: The discovery system names the
attribute representing the Operating System domain oscomputerdomain,
but the CMDB system requires the attribute name to be Workgroup. The
following transformation rule snippet does the name conversion:
<xsiTargetAttributeMapped name="Workgroup" sourceAttributeName="oscomputerdomain"/>
[0082] 4) The CMDB
system requires an attribute called URI (Uniform Resource
Identifier). For example, the user directory /home/joe on the machine
joe.bdnacorp.com has a URI represented in the given format:
file://joe.bdnacorp.com//home/joe. The discovery process never
computes the URI attribute. Instead, a discovered element of type
root.types.resource.software.OperatingSystem has the hostName
attribute, whereas a related discovered element of type
root.types.resource.storageAllocation.fileSystem has a PhysicalName
attribute with value /home/joe. The discovery process finds the
elements with the corresponding values and, since the two elements
are related (by a containment relationship, the fileSystem element
being contained in the OperatingSystem element), also stores the
relation information in the database. A BQL report is built to bring
the two attributes (for each such discovered instance) together.
Such a report may contain several other attributes such as fileSize,
fileFormat, etc. A transformation rule to build the URI attribute
takes the above two attributes (hostName and PhysicalName) as input
and computes the URI attribute. The relevant snippet of a
transformation rule that builds the URI attribute is as follows:
TABLE-US-00005 [0082]
<xsiTargetAttributeComputed name="URI">
  <xsiAttributeAlias name="realFile" sourceAttributeName="PhysicalName"/>
  <xsiAttributeAlias name="host" sourceAttributeName="hostName"/>
  <xsiCode> <![CDATA[
    return "file://" + host + "/" + realFile;
  ]]> </xsiCode>
</xsiTargetAttributeComputed>
[0083] In the specific example of FIG. 3, the input data in BDNA
data format is pulled from data repository 14 and the
transformation rules specific to a BDNA to IBM CMDB data format
transformation are applied to convert the input data into data in a
CMDB ER Model which has data structures compatible with
Configuration Items for the destination CMDB.
[0084] Block 20 represents the output schema CIs in the example
given of the target data format being the CMDB ER Model or output
data entities in whatever the target data format is. The ER model
is a data model which is a staging area where the attribute data is
temporarily stored which has been transformed by operation of the
transformation rules engine to the data format required by the
target data model. This temporarily stored data awaits export by
the export adapter 56. FIG. 8 is a block diagram of the data
structure of the IBM ER Model. Each box in FIG. 8 is a table, and
the list of attributes in the box is a listing of the names of the
columns in the table. In other words, each attribute name listed in
the box is the name of one column in the table having the name
listed in the box. The IBM ER model is compatible with the data
structure of the IBM CMDB data model, so the data structure which
implements the IBM CMDB data model would comprise tables for each
CI type. Each table would have one column per attribute, and each
row would be populated with the attribute values of one instance of
that asset type found in the BDNA inventory database tables. For
example, a particular laser printer may have only four attributes:
manufacturer, model, serial number and IP address. The printer CI
in the IBM CMDB would have subtype tables for different subtypes
such as laser, inkjet, etc. In the data structure that stores the
data of the IBM CMDB data model, the laser printer subtype table
would have one row dedicated to the laser printer mentioned above.
It would have four columns labelled manufacturer, model, serial
number and IP address, and each column position on the row
dedicated to the particular printer would be populated by the
attribute value, e.g. HP, 801, HP2013458769, 10.10.10.1. Thus, the
tables in FIG. 8 would basically be transferred to the data
structure of the external system representing the IBM CMDB data
model. Most of the instances would be recorded in tables for the
subtypes, but any instance which could not be classified in one of
the subtypes would be stored in a table for instances of the parent
type CI.
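By way of illustration only, here is a minimal Java sketch of such a
subtype table row (the class name and attribute keys are
hypothetical, not part of the actual BDNA or IBM CMDB schemas):

// Hypothetical sketch: one row of a laser printer subtype table,
// one map key per table column. The values are the illustrative
// ones from the example above, not a real schema.
import java.util.LinkedHashMap;
import java.util.Map;

public class LaserPrinterRow {
    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("manufacturer", "HP");
        row.put("model", "801");
        row.put("serialNumber", "HP2013458769");
        row.put("ipAddress", "10.10.10.1");
        System.out.println(row); // {manufacturer=HP, model=801, ...}
    }
}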
[0085] Block 22 represents adapter interface mechanisms to process
BDNA data in a well defined, consistent, restartable and high
performance manner. The adapter interface 22 performs the following
functions:
[0086] 1) defining BDNA data to be exported;
[0087] 2) uniquely identifying entities and relations within a
project;
[0088] 3) providing a dynamic, pluggable interface for external
adapters;
[0089] 4) keeping track of the progress of the exporting process
and allowing restarts;
A restart is the process of partially redoing a transfer of
information between BDNA and an external CMDB. A restart might be
necessary because the transfer can take a long time and the
process might stop in the middle because of unforeseen events such
as the network or database failing or running out of disk space. In
a restart case, the BDNA transformation engine allows the adapter to
continue/restart from the last committed transaction instead of
having to start the export process from the beginning all over
again. FIG. 17 is an example of the adapter restart.
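A minimal sketch of the restart idea follows (Java; the interfaces
and method names are hypothetical and are not the actual BDNA
adapter API):

// Hypothetical sketch of a restartable export: resume from the last
// committed block instead of re-exporting everything from scratch.
public class RestartableExport {
    interface BlockSource { int blockCount(); Object block(int i); }
    interface Target { void store(Object block); }
    interface Checkpoint { int lastCommitted(); void commit(int i); }

    static void run(BlockSource src, Target target, Checkpoint cp) {
        // Start right after the last block known to have been committed.
        for (int i = cp.lastCommitted() + 1; i < src.blockCount(); i++) {
            target.store(src.block(i)); // may fail mid-way (network, disk, ...)
            cp.commit(i);               // durable record of progress
        }
    }
}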
[0090] 5) providing common mechanisms to log errors and warning
conditions.
[0091] Errors and warning conditions can occur during any step of
the transformation. They can indicate a problem with the project
definition, a problem with the BDNA data set exported, or that the
transformation process encountered an out of resource condition.
Examples of project definition problems: "Invalid rule definition",
"rule reference missing rule definition" or "Invalid BQL report
name". Examples of BDNA data set problems: "Foreign key constraint
violation on relation installedOn". Examples of out of resource
conditions: "Unable to reach BDNA database", "User tablespace full
on BDNA database", "File system out of disk space while the adapter
was writing an XML book".
[0092] 6) dividing up the data to be exported so that it can be
processed in blocks of reasonable size.
[0093] The adapters 22 are software which drive Adapter Interface
circuitry that is coupled to one or more data repositories 24 and
26 where the Configuration Items are stored. In this example, the
CIs are stored in an IBM CMDB data schema and a BMC CMDB data
schema. The mechanism to store data in an IBM CMDB is very
different from the mechanism to store data in a BMC CMDB. Each of
the IBM CMDB and the BMC CMDB will have its own adapter interface
circuitry which is designed to follow the appropriate procedures to
store data in the target CMDB for which the adapter is designed.
Basically, each export adapter is a driver for a specific CMDB
which takes data tables out of the ER model for the target CMDB and
invokes the proper procedures to store the data in the target CMDB
in the appropriate place and the appropriate fashion. For example,
for the IBM CMDB, the export data has to be put in XML book format
and then exported. For a BMC CMDB, the export adapter is a JDBC
connector which connects the ER model to the data store where the
BMC CMDB is being stored. The export adapter for each target CMDB
knows the API for that target in that it knows what function calls
to make and which arguments to supply for each function call.
[0094] Target data repositories 24 and 26 are databases in memory
arrays which store the IBM and BMC format data structures. It is
these data structures to which the ER model data is transferred by
the export adapters.
Modes of Data Transfer
[0095] There are two modes of data transfer from the automated
asset discovery systems to the CMDB data structures.
[0096] 1. Stateless Transfer: in this type of transfer, all the
data collected by the automated asset discovery system is assumed
to be new. The result is that the system attempts to create every
CI defined by the IBM CMDB. The distinction between stateless and
state based transfers concerns what happens when information about
the same CI is added to or removed from the CMDB but with different
data collected multiple times. A stateless transfer from BDNA to
CMDB means only insert operations are done into the target CMDB
without trying to find out if that particular CI already exists. A
state based transfer requires the BDNA adapter to query the target
CMDB to find affected CIs and either do an insert, update, or
delete depending on the state of the remote CI. It is up to the
CMDB Reconciliation Engine to resolve, for each CI the system
attempts to create, whether the CI already exists in the CMDB
(based upon the NamingRules). A reconciliation engine is part of
the target CMDB. It uses unique naming rules on the CMDB system to
keep unique copies of each CI. Naming rules are mechanisms for
uniquely identifying CIs. A computer system might be identified by
a combination of its domain name "foo.bdnacorp.com" and its active
IP addresses: {192.168.1.160, 10.10.10.1}. They are used when
trying to find whether a particular CI already exists in the target
CMDB system when doing state based transfers. This approach can be
implemented without having a CMDB Driver component.
[0097] 2. State Based Transfer: in this type of transfer, data that
was previously exported is cached. The cached data is used to
compute the differences (the delta) between the cached data and the
new data. Based upon these differences, instructions are generated
such as create, modify, delete, etc. Assuming most of the data does
not change, the time to process the differences will be small. A
CMDB driver implements the state based transfer. FIG. 18 is a flow
diagram of a BDNA adapter doing a state-based transfer. State based
transfers are more complicated because the adapter must match BDNA
data with the state of the target CMDB.
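A minimal sketch of the delta computation behind a state based
transfer might look as follows (Java; all names are hypothetical):

import java.util.*;

public class StateBasedDelta {
    // Hypothetical sketch: diff cached (previously exported) CI
    // attribute maps against newly collected ones to generate
    // create/modify/delete instructions; unchanged CIs generate
    // nothing, which keeps the delta small.
    static List<String> delta(Map<String, Map<String, String>> cached,
                              Map<String, Map<String, String>> fresh) {
        List<String> ops = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : fresh.entrySet()) {
            Map<String, String> old = cached.get(e.getKey());
            if (old == null) ops.add("create " + e.getKey());
            else if (!old.equals(e.getValue())) ops.add("modify " + e.getKey());
        }
        for (String id : cached.keySet())
            if (!fresh.containsKey(id)) ops.add("delete " + id);
        return ops;
    }
}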
[0098] FIG. 3 is a diagram of the schema or data structure needed
to do the type of transformations which the system of FIG. 2
performs. The XSI_Project table 30 is a table which contains
definitions of a project object specifying rules reference, input
entities references, output entities references, relations between
output references, and adapter configuration. The adapter
configuration is any specific information required by the adapter
to run. For example, a specific vendor's adapter may need specific
connection properties or some such detail to run. Such vendor
specific adapter details are provided as adapter configuration.
[0099] Xsi_xform_rule_ref 32 is a table containing definition of
transformation rule references used by the project. Each row of
this table identifies a single transformation rule. Rules can
operate on CI's or relationships between CI's. Both CI's and
relations have unique entity ID's. Each rule identifies its entity
source and entity target. An xsi_entity_source or xsi_entity_target
is the unique ID associated with either a relation or CI.
[0100] Xsi_adapter 34 is a table which defines the output adapter
and its associated entity and relation outputs. The xsi_adapter and
xsi_adapter_output tables define information used by the
transformation engine to configure, find the subset of the model
used by the adapter, and run the adapter when a user requests a
transformation. Each project can define an adapter in the
xsi_adapter table with information to load the adapter and which
configuration it should use for this particular project. The
xsi_adapter_output table defines which CI's and relations are part
of this project and should be sent to the adapter when the
transformation is executed. These references are needed because
projects can share entities and relations. For example, the BMC
adapter project has CI tables: T1, T2, T3, and the IBM adapter
project has CI tables T2, T3, T4.
[0101] In this example the BDNA External tables would create
T1,T2,T3,T4, but the BMC adapter would only use the first three
tables while IBM's adapter would only output the last three. Given
a particular table representing a CI, not all of its columns need
to be exported to the target CMDB. Some of the invisible columns
might only be used while processing the transformation rules and
are not used when exporting to the target CMDB.
[0102] Xsi_adapter_output 36 is a table which contains information
about the set of entity and relation outputs to be read by the
adapter.
[0103] Xsi_base_type_ref 38 is a table which defines identities,
entities and relations used by a project. An entity is a table that
is part of ER model defined and built with BQL. Each defined CI and
the defined relationships between CI's form a CMDB specific ER
Model. The BDNA transformation engine represents each CI as a table
with a unique ID for each CI instance, and a column corresponding
to each CI's attribute. Relations betweens CI's are also
implemented as a table with two columns sourceId, targetId. This
generic implementation of the ER Model can be reused for multiple
CMDB specific ER Models. It loosely corresponds to a particular
type in the ER model. A relation associates two families of type: a
source type and a target type. We only need to define a relation
between base types, and it also applies to all subtypes of each CI.
For example the relation "OperatingSystem runningOn ComputerSystem"
is the only one needed to accommodate the instances "Windows1
runningOn CS1", "Linux2 runningOn CS2", "Solaris3 runningOn
CS3".
[0104] The need for the Identity table results from the fact that
the output data model that we want to populate is object oriented.
The ability to have several subtypes of a base type requires us to
store the subtype data separately from each other. But consider two
such hierarchies shown in FIG. 19. FIG. 19 shows two hierarchies of
objects, one rooted at A and one rooted at P. Trying to specify
relations between objects of hierarchy rooted at A with objects of
hierarchy rooted at P becomes difficult because one has to consider
all possible combinations of relations. As a result, we store the
identities of all objects belonging to every subtype rooted at A in
one Identity table and similarly the identities of all objects
belonging to every subtype rooted at P in another Identity table.
The relations between the two sets of objects refer to the Identity
tables instead of the tables storing the actual data. This is one
of the important reasons for using the Identity tables. In
addition, these tables are used to keep track of objects as they
are output to the CMDB using the adapters (to remember which ones
have already been processed).
[0105] Identity objects are project specific tables created to
identify types and all associated subtypes for each type. A type is
a CI definition, for example Operating System. A subtype is Unix.
Another subtype is Windows. For example:
OperatingSystem--Unix--Linux would specify a particular subtype.
Each relation must associate its source and target with identity
tables. The identity tables are used when outputting the ER Model
for: identity mapping; final type instantiation; and, keeping track
of identity read state. In the ER Model, the identity of each CI
instance is separated from its attribute values in one embodiment.
Identity tables are used to uniquely identify each CI instance and
to keep state for the CI instance in the project (for example,
whether a project adapter has exported a CI instance to its CMDB).
Identity tables have a fixed format that ER Modelers must obey when
creating them.
[0106] The above defined schema permits modeling a full-fledged
object model which includes entities (an identity/entity pair) and
relations between entities (an association between two entities
identified by an entity source ID and an entity target ID).
[0107] FIG. 4 illustrates a data structure of how the
transformation rules are internally stored. It is a diagram of a
supported object model comprising ER Model Table Types. The
transformation process involves a BQL report or reports,
transformation rules and an output table. The BQL report
definition, transformation rule definition and output table
definition have to be stored somewhere, and FIG. 4 represents one
embodiment of a data structure to store these definitions and other
data needed for each transformation project. Table 38 stores all
the definitions for the BQL report, transformation rules and output
table.
[0108] Table 30 identifies each project and groups all the
definitions and transformation rules for that project together. The
transformation rules for each project are stored in table 38. In
fact, table 38 stores all the rules for all the projects as well as
all entities and all relations between CI types for all projects.
Which relations exist for each project are indicated in table 39.
The entities which exist for each project are listed in table 41.
Table 32 contains information which indicates which transformation
rules are used for each project. In other words, table 38 stores
the relationship information (such as the "installed on"
relationship) between CI classes illustrated in FIG. 1 as lines
between the CI type boxes for all projects, and table 39 indicates
which relations are used on each project. Table 34 stores the
information needed for the export adapters. For example, table 34
stores information about which code the export adapter for each
project needs to run, which classes need to be run, and what tables
are involved when the export adapter is run. Table 36 stores
information on which attributes are actually needed for each CI in
case more attribute data has been collected than is needed for a
particular CI.
[0109] Table 30 stores identity objects in the model which are used
to keep track of a unique project per entity and whether the entity
has been read by the adapter during output. Each identity has a
corresponding entity associated with it. Entities represent the
actual inventory attribute information to be exported which was
collected by the automated inventory attribute discovery system
such as the BDNA software. The entities associated with identities
can have different types. For example, a
ComputerSystemIdentity can be associated with entities of different
types ComputerSystemWindows, ComputerSystemLinux,
ComputerSystemSolaris, etc. The possible subtypes associated with
an identity are defined in the xsi_entity_type_ref object table 42.
The adapter interface API uses the project data to validate and
enforce constraints when processing a project.
[0110] Relation objects define a directed association between two
identities and their specific subtypes in a project. Each
xsi_relation_type_ref that is part of a project defines a
`dependency` attribute which can be one of:
source|target|mutual|none. The information is used by the adapter
to validate relations between CI's. For example, the relation:
"OperatingSystem runsOn ComputerSystem" should be defined with
dependency=`mutual`. This allows the adapter framework to check to
make sure that all OperatingSystem CI's have a relation to some
computerSystem. This checking would be validation of a constraint,
e.g., each operating system must be installed on a computer system.
For example, the relation "installedOn" would exist between
identities ComputerSystemIdentity and OperatingSystemIdentity. An
actual instance example might be: source scld(1)/WinCS(1) is
associated with target osId(2)/WinOs(2). Like identities, relations
have a unique project identifier and read flag state which are used
to track what objects are left to be read in a project. A project
is essentially one conversion at one time of BDNA inventory data to
IBM data. Essentially, a project is the definition of how to
convert from BDNA to CMDB models and the current state of the
conversion. You can load the project, start, or restart the
conversion. But you can only have one active conversion at a time
per project.
[0111] Another attribute defined per relation type is "dependency".
As mentioned above, the dependency attribute can take the values
source, target, mutual or none, and it is used to enforce
constraints between the relation and its entities. In the case of the
"installedOn" relation, the dependency is defined as "source" which
means that there should be no Operating System entity that is not
installed on a computer system. For example, "WinXP installedOn
ComputerAbe2". WinXP is the source, ComputerAbe2 is the target. To
BDNA this means during discovery we found that ComputerAbe2 has
Windows XP installed on it. Part of the validation we do while
processing the model makes sure that every OperatingSystem also has
a relation to a computer system since BDNA cannot discover
`uninstalled` operating systems. On the CMDB side, you could have
many OperatingSystem disks sitting in a warehouse and not installed
in any computer system. The adapter interface will flag entities in
the model that break the dependency constraint and each adapter
using the interface can use the information to log problems or
report errors.
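A minimal sketch of such a dependency validation (Java; the names
are hypothetical and this is not the actual adapter interface
code):

import java.util.*;

public class DependencyCheck {
    // Hypothetical sketch: flag every source entity (e.g., an
    // OperatingSystem) that has no relation instance (e.g.,
    // installedOn) to any target entity, i.e. breaks dependency="source".
    static List<Long> violations(Set<Long> sourceIds, List<long[]> relations) {
        Set<Long> related = new HashSet<>();
        for (long[] r : relations) related.add(r[0]); // r = {sourceId, targetId}
        List<Long> bad = new ArrayList<>();
        for (long id : sourceIds)
            if (!related.contains(id)) bad.add(id);
        return bad; // entities to log as problems or report as errors
    }
}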
[0112] FIG. 5 represents a typical CMDB extraction configuration.
After loading a project, the first steps are to issue a command to
take the current BDNA data, transform it into the project specific
CMDB ER Model, and then run the adapter or other tools to send it
to its CMDB system. These are the steps performed during an
extraction. BDNA defined the BQL language for defining extraction
projects. The transformation engine can load projects defined in
that language and execute the steps above to run the project.
[0113] Block 44 represents a CMDB project having one or more BQL
reports that define the data that must be extracted from the BDNA
inventory data for each CI to be generated. Block 44 also contains
the transform rules and the export adapter configuration file.
Adapter configuration is used to control the process of exporting
from BDNA to a CMDB system. FIG. 20 represents the process of
exporting the data from the BDNA CMDB ER Model using the CMDB
specific adapters to external CMDB stores. The Model Export Block
Paging block 160 represents a process to break the export process
up into manageable size blocks while keeping all related data
together. Each configuration is specific to the adapter and its
external CMDB system. In the case of IBM's CMDB adapter you can
configure, for example, the location of the generated CMDB books
and how big each CMDB book should be.
[0115] The BDNA transformation framework 161 in FIG. 20 provides
support for: plugging in different adapters based on the type of
CMDB export; querying the export project definition; defining what
part of the CMDB Model is exportable and visible; maintaining the
state of the export process; and a consistent block based API to
allow dividing the export process into manageable block sizes (the
process represented by block 160 in FIG. 20).
[0115] Block 44 in FIG. 6 represents all the information associated
with a CMDB project. Before the project is loaded the information
is in XML form. When the project is running the information is in
memory.
[0116] The inventory attribute data that was automatically
discovered by the automated inventory system, such as the BDNA
software, is represented by block 46. Block 46 represents data
structures in memory that embody the base tables the BDNA software
generates in its persistent data warehouse. The base table has an
entry for every hardware and software asset discovered and all the
attributes about each asset that have been discovered during the
automated inventory process.
[0117] Step 1: BQL Process (block 48) reads BQL Reports (subset of
block 44) and uses existing BDNA data (block 46 Base Table) to
produce CMDB Report Results (block 50). The reports contain the
attributes needed for each CI.
[0118] Step 2: Xform Rules Processor (block 52) reads Xform rules
(subset of block 44) and processes each rule against CMDB Reports
Results (block 50) to produce Destination ER Model (block 54). The
transform rules processor is a computer programmed to execute
transform rules to convert the data format of data in the reports
to the format of data in a destination ER model data structure 54.
The destination ER model contains the data structures required by
CMDB. The transform rules processor also generates defined
relations in the CMDB ER model.
[0119] Step 3: Export adapter (block 56) reads Export Adapter
Configuration (subset of block 44) and Destination ER Model (block
54) to export to External Destination (block 58).
Mapping and Transformation Process Details
[0120] FIG. 7 is a flowchart of the flow of steps of a genus of
processes that can be performed to map BDNA inventory attribute
data to CMDB format data. FIG. 13 is a graphical diagram of the
workflow of one species of process represented by FIG. 7. Step 60
represents a process where a content writer for the schema mapping
defines BQL Reports that define the required attribute input data
to the transformation process where transformation rules are used
to convert the BDNA data into CMDB entity types. Reports are
defined for: 1) collecting the attributes required for the various
entities in the CMDB data model; and 2) specifying the relationship
between entities. In FIG. 13, the inventory report definitions are
represented by blocks 140. These reports define which inventory
data in the attribute data 142 is needed to make up the CI types of
the target system. Block 144 represents the BQL reports
which are transform specific definitions. Inventory reports 140 are
used to process Discovery Data into report tables used by the BDNA
Inventory UI applications. Xform specific report definitions (BQL
reports) 144 are used to reorganize the BDNA discovery data (142)
into the CI's matching the target CMDB ER model and create
appropriate CMDB specific report tables. The two are different and
exist independently of each other. The purpose of the Inventory
reports is to provide information for the BDNA Inventory UI as
efficiently as possible and therefore it is highly de-normalized
data. The purpose of Xform specific reports (BQL reports) and
tables is to match the required CMDB ER model for each CI and their
relations. The Xform specific reports and tables are highly
normalized and closely match the target CMDB schema. The CMDB
transform specific definitions 144 are needed to: group data in
ways not normally required by BDNA automated inventory attribute
data collection systems; synthesize data required by the target
CMDB system which is not collected by the BDNA automated inventory
attribute data collection system; and define ER model relationships
not required by the BDNA automated inventory attribute data
collection source system.
[0121] The BQL reports are executed by a report engine 146 which
uses the definitions in the reports to extract the attribute data
specified in the BQL reports from the discovery attribute data
stored in store 142 by the BDNA inventory system. The report engine
stores the extracted attribute data in the BDNA inventory reports
and the CMDB specific reports 150.
[0122] The BDNA inventory system collects attribute data about
computer systems and operating systems in a single central report.
CMDB systems model computer systems and operating systems as
separate CIs and a relationship between them. BQL reports are used
to do the necessary identity and data splitting transformation and
normalizing to match CMDB's data model. An example of this is shown
in FIG. 12. Block 130 represents the fully aggregated computer
system and operating system attribute data collected by the BDNA
system. A BQL report called CSExtractReport extracts just the
computer system attributes from the inventory data represented by
block 130 and stores the computer system data in a store 132 which
is used as an input to the transformation rules for the attributes
of computer systems. Another BQL report called OSExtractReport
extracts the operating system attribute data and stores it in a
store 134 which is used as an input to the transformation rules for
operating system attributes. A BQL report called
RelationExtractReport is executed to extract the relationship data
between the computer systems and the operating systems and store
it in store 136.
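A minimal sketch of this identity and data splitting (Java; the
record layout and the attribute values are hypothetical):

import java.util.*;

public class SplitTransform {
    // Hypothetical aggregated discovery row: computer system and
    // operating system attributes collected together.
    record Row(long csId, String hostName, long osId, String osName) {}

    public static void main(String[] args) {
        Row r = new Row(101, "osiris.bdnacorp.com", 1, "ExampleOS 5.2");
        // CSExtractReport output: computer system attributes only.
        Map<Long, String> cs = Map.of(r.csId(), r.hostName());
        // OSExtractReport output: operating system attributes only.
        Map<Long, String> os = Map.of(r.osId(), r.osName());
        // Relation report output: (osId, csId) pair linking the two CIs.
        long[] relation = {r.osId(), r.csId()};
        System.out.println(cs + " " + os + " " + Arrays.toString(relation));
    }
}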
[0123] Returning to the consideration of FIG. 7, in step 62, the
content writer (a person) for the schema mapping defines
transformation rules for mapping BDNA inventory attribute data in
the base tables of the BDNA persistent data warehouse to the CMDB
schema. Each transformation rule maps a single BDNA source to a
single CMDB type. There can be transformation rules mapping a
single BDNA source to multiple different CMDB types, but mapping of
multiple BDNA sources to a single CMDB type is not allowed. For
each rule, the writer of the transformation rule needs to specify:
the source; the target; and a mapping from source attributes to
target attributes. Some mappings specify value transformations
using Java code. An example of some transformation rules working to
transform names and units of measure is illustrated elsewhere
herein.
[0124] Step 64 represents the process of defining a CMDB
integration project. Projects need to be defined in some
embodiments to allow users to do different CMDB integrations from
the same schema. For example, one project may be defined for
exporting data to an IBM CMDB, another project for exporting data
to a BMC CMDB, and yet another project for importing data from an
IBM CMDB into the BDNA system.
[0125] The input needed to define a project includes: 1) name of
the project; 2) path of the directory from where to load the source
definitions; 3) path of the directory (or directories) from where
to load the transformation rules; 4) vendor name of the vendor of
the schema to which the BDNA data is to be transformed; 5)
connection details to connect to the CMDB target store (the input
source connection is based on the BDNA connection properties in
some embodiments); 6) type of data transfer (export or import); 7)
any global properties associated with the project; and 8) a
description of the project.
[0126] Step 66 represents the process of loading all XML
specifications associated with the project. There are various
components of the XML specification of a transformation project.
They are explained in more detail below. A project XML specification is the
external form of all information needed to extract BDNA discovery
data, map it to the CMDB target ER model, and transform the ER
model data out to the target CMDB system. The XML specification is
the external set of files to be loaded in a specific BDNA system
when we want to do an export to an external CMDB target.
[0127] Step 68 represents the process of loading the transformation
rules for a given project into transformation engine 152 (usually a
programmed computer as is the report engine 146). This step
represents the process of parsing the transformation rules and
putting the necessary data in database tables.
[0128] Step 70 represents the process of executing the
transformation rules in the transformation engine 152. The process
comprises the steps of: 1) checking whether all the BQL Reports are
up to date and refreshing them if necessary (before any
transformation is executed, the framework (161 in FIG. 20) checks
whether a BQL report exists and whether it is up to date compared
to the current BDNA discovery data; if no BQL report exists or the
BQL report is stale compared to the state of the discovery data,
the BQL Report is rebuilt before executing the transformation); 2)
generating output tables in a data format that matches the CMDB
schema by extracting the attribute data specified in the BQL report
from the BDNA base tables using a BQL processor; and 3) iterating
through each input source and executing the applicable
transformation rules using a transform rules processor to populate
the output tables (data structures in the Destination ER Model).
[0129] Each transformation rule can: map BDNA attribute names into
the appropriate CMDB name; do unit conversions; or combine and
merge attributes as required by each CMDB.
[0130] An example of a Configuration Item sample transformation
rule for mapping BDNA's inventory data for a host into BMC's
Computer System showing a typical transformation conversion (Name
Mapping and Unit Conversion) is given in Table 1 below.
TABLE-US-00006
TABLE 1 Example of Transformation Rule Action: Name Mapping and Unit Conversion

CMDB_OS Report Attribute | Mapping Type | BMC_ComputerSystem CI | Comments
osComputerDomain | Name mapping | Workgroup |
hostName | | HostName | BDNA does not track the next two attributes; reuse hostName.
hostName | | Description |
hostName | | ShortDescription |
serialNumber | | SerialNumber |
osComputerDomain | | Domain |
flashMemorySize | | FlashMemory |
ifThroughput | | DataRate |
operatingSystem_label | | Label | Reuse same name for next attribute.
operatingSystem_label | | BMC_Name |
operatingSystem_id | | Id |
cpu_list | Unit conversion | CpuList | BDNA collects a string, BMC expects a numeric value. For example: "i586" should be mapped to 0, "PowerPC" to 3, and "ARM" to 8.
totalMemory | | TotalPhysicalMemory | BDNA tracks total memory in megabyte units; BMC expects the value in gigabytes; divide by 1024.
hostname, nicList | Merge attributes | BDNAUniqueInfo | BMC requires a unique name for each ComputerSystem CI; generate one by combining hostname and the MAC addresses of the system.
Type_cs, hardware | | Category, type_attr, item, manufacturerName, Model | BDNA collects information about the type of computer in just two attributes. BMC requires that the data be split into 5 different attributes. The BMC_ComputerSystem_XFR rule contains Java code to parse the two attributes and generate the five expected by BMC.
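A minimal sketch of the cpu_list and totalMemory conversions
described in Table 1 (Java; the class and method names are
hypothetical, and the CPU map shows only the three example values
given in the table):

import java.util.Map;

public class BmcComputerSystemConversions {
    // Only the three example mappings from Table 1 are shown here.
    static final Map<String, Integer> CPU_TYPES =
        Map.of("i586", 0, "PowerPC", 3, "ARM", 8);

    static Integer cpuList(String bdnaCpuString) {
        return CPU_TYPES.get(bdnaCpuString); // null if unmapped
    }

    static Double totalPhysicalMemory(Double megabytes) {
        // BDNA tracks megabytes; BMC expects gigabytes.
        return megabytes == null ? null : megabytes / 1024;
    }
}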
[0131] The transformed attribute data is stored in an ER Model
store 154 to await export to the target system.
[0132] Finally, in step 72, the data in the output tables is
exported to the target CMDB using the appropriate export adapter.
An IBM CMDB requires adapter 156 to extract the BDNA data into an
intermediate XML form. If the target system is a BMC CMDB, the BMC
CMDB can be directly connected to the ER Model 154 using JDBC and
does not require an adapter.
[0133] FIG. 9 is a flowchart of a method of doing business to do
the data transformation which includes the manual steps of writing
the BQL report programs, writing the transformation rule programs,
and writing the export adapter program. Step 80 represents the
process of studying the data model of the target system to
determine the class definitions, subtype relationships and
containment relationships and to determine the semantics and data
types and units of measure of each attribute of each class of asset
and each subtype thereof and any other information needed to do the
transformation.
[0134] Block 82 represents the process of studying the data model
of the source system to determine the differences over the target
system. Things that need to be determined are such things as: 1)
which attributes are collected about each type of asset that is
within a class definition in the target system data model; and 2)
what are the differences between the data format, units and
semantics of the attribute data in the source system versus the
data format, units and semantics the attribute data would need to
be in for storage in the appropriate class defined for the target
system data model.
[0135] Block 84 represents the process of writing one or more BQL
report programs capable of controlling a computer to extract, for
every CI type in the target system, the necessary attributes for
that CI type which have been collected from the same type of asset
in the source system.
[0136] Block 86 represents the process of writing one or more
transformation rules programs which can control a computer to
change the format, units and semantics of attribute data from the
source system to the format, units and semantics compatible with
the target system.
[0137] Block 88 represents the process of writing an export adapter
which can control a computer to invoke the application programmatic
interface (API) of the target system and use said API to load data
into said target system. The export adapter is written so as to be
conversant with the application programmatic interface of the
target system in that the export adapter knows the function calls
to make and knows the arguments to supply to store data in the
target system.
[0138] Block 90 represents the process of executing the one or more
BQL reports on a computer to extract the attribute data needed from
the source system to make up the CIs of the target system.
[0139] Block 92 represents the process of executing the one or more
transformation rules programs to take the attribute data extracted
by the BQL reports and transform it to the data format of the
target system.
[0140] Block 94 represents the process of storing the transformed
attribute data in an ER data model store. The ER data model store
is typically comprised of tables having the data structure of the
tables used to implement the target system data model.
[0141] Block 96 represents the process of executing the export
adapter program on a computer to export data from the ER data model
store to the target system.
Object Oriented Transformation Rules
[0142] In the preferred embodiment, the transformation rules are
written in an object-oriented style. This means, for example, that
where a CI type such as ComputerSystem CI 11 in FIG. 1 has subtypes
which are species of the genus, there are generic transformation
rules that apply to all species or subtypes (and are inherited by
all subtypes) within the class and there are specific
transformation rules for each species or subtype within the class.
The combination of both the generic transformation rules and the
specific transformation rules for the subtypes are used to
transform the attribute data from the source system needed for the
ComputerSystem CI and all its subtypes such as Windows computers,
Sun Sparc stations, etc. In other words, for a specific species or
subtype of the parent ComputerSystem CI, the transformation rules
of the ComputerSystem CI which are common to all species are used
to transform attribute data from the source system into attribute
data of the target system for all the species or subtype CIs. To
finish the process, transformation rules specific to each
particular subtype are used to transform the attribute data from
the source system which is peculiar to the subtype into attribute
data into the data format of the target system. That transformed
data is used to populate the subtype CI instances.
[0143] The generic (for the parent CI) and specific (for the
subtype CI) transformation rules for a subtype can be executed in
any order. Transformation rules that are common to a CI type with
subtypes are stored in an object which is the parent of subtype
objects each of which store transformation rules which are unique
to the transformation of attribute data unique to the subtype, as
shown in FIGS. 10 and 11.
[0144] FIG. 10 illustrates a class diagram for the objects which
store the object-oriented transformation rules for storage of the
Transformation Rules of the ComputerSystem CI with two subtypes
illustrated. Object 100 represents a memory object with a plurality
of attributes, each with a name and a value. One of those
attributes is shown at 102 and another at 104.
[0145] Each attribute has a name which is not important and each
has a value. The value is the transformation rule string (or a
pointer thereto in some embodiments) which defines how to transform
attribute data from the source system into attribute data in the
proper format for the target system for one particular attribute of
the ComputerSystem CI or class. Each attribute in the object 100 is
a transformation rule pertaining to transformation of attribute
data instances of one named attribute in the ComputerSystem CI.
[0146] Subtype object 106 is the object with attributes which are
transformation rules for the Windows computer subtype. Attribute
108 is an attribute of object 106 which stores a string which is
the transformation rule for a particular attribute of the Windows
computer subtype.
[0147] Subtype object 110 is the object with attributes which are
transformation rules for Sun type computer systems. Object 110 has
an attribute 112 which stores a transformation rule for attribute
A20 of the Sun Sparc computer systems.
[0148] FIG. 11 is a diagram illustrating how the object oriented
transformation rules can combine-transform information from two or
more objects in the inventory attribute data extracted by the BQL
report from the source system to write a single CI in the target
system or split-transform information from a single object
extracted by the BQL report from the source system into two or more
CIs in the target system. FIG. 11 also shows a subtype object 120
having as its attributes the transformation rules 1-3 inherited
from the parent object 116, and having transformation rules 4 and 5
which are unique to the particular subtype CI.
[0149] The attribute data extracted from the source system as a BQL
report is object 114. Object 114 is the BQL Report for the
ComputerSystem CI type represented by object 118. The attributes of
object 114 are the individual attribute values which have been
extracted from the source system by the computer running the BQL
report program in the source system data format. The object
represented by oval 116 is the object storing the transformation
rules for the CI object 118 which represents a CI type which has
one subtype CI 120. The attributes of the object 116 are the
transformation rules themselves which are written to transform the
source system attribute data stored in objects 1-4 into attributes
A1, A2, A3 and A4 of the CI object 118. Object 118 represents the
target system data model object for a particular CI type which has
the attributes A1, A2, A3 and A4. The particular example shown has
transformation rule 1 transforming the data from objects 1 and 2 in
the BQL report into the data format of and populates attribute A1.
Rule 2 transforms the BQL Report object 3 into the data format of
and populates attribute A2. Rule 3 transforms the BQL Report object
4 into the data format of and populates the attributes A3 and
A4.
[0150] CI object 118 has a subtype CI object 122. This subtype CI
122 inherits attributes A1 through A4 from the parent CI 118 and
has its own attributes A5 and A6 unique to this subtype.
Transformation rules subtype object 120 inherits transformation
rules 1 through 3 from the parent transformation rules object 116
and has additional transformation rules 4 and 5. Rule 4 transforms
attribute data stored as object 5 in Windows ComputerSystem BQL
report subtype object 124 into the data format of and populates
attribute A5 of the Windows species subtype CI object 122. Rule 5
transforms attribute data stored as object 6 in BQL report subtype
object 124 into the data format of and populates attribute A6 of
the Windows subtype CI object 122.
[0151] Object 120 can but does not necessarily have to have rules
1, 2 and 3 recorded therein because of the parent-subtype
relationship. It may refer processing to implement these
transformation rules to the code that implements these rules in
parent object 116. It does not matter whether the inherited rules
are processed first and then the rules specific to the subtype are
processed.
[0152] The object-oriented transformation rules structure is useful
because in a typical system the source system and target system
data models have thousands of classes and subclasses each of which
has many attributes. Therefore, there are even more transformation
rules than there are classes and subclasses. If all the
transformation rules of a parent class had to be copied into each
subtype or child class, and there were many subtypes, each time a
transformation rule for a parent CI object attribute was changed,
it would have to be changed for all the subtype CIs also. By
storing the transformation rules common to all the subtypes of a
parent CI only in an object that stores transformation rules for
the parent CI, each time one of these transformation rules was
changed, it would have to be changed in only one location. Each
subtype CI's transformation rules would be stored in objects unique
to those subtypes. When the attributes of the subtype CI were to be
populated during a transformation process, the transformation rules
for the parent CI type would be executed and then the
transformation rules for the subtype would be executed, in no
particular order since there is no dependency between the two sets
of transformation rules or their input data.
[0153] It is more difficult to create object-oriented
transformation rule sets because it is initially difficult to
determine which transformation rules are common to all subtypes.
However, once that is done, the maintenance of the rule set is much
easier. For example, suppose the parent CI has 20 attributes and
there are 10 subtypes. If there are 20 transformation rules for the
parent, without object-oriented transformation rules, there would
have to be 2000 copies made of the transformation rules for storage
in objects that store transformation rules of the subtypes. By
having the transformation rules stored in an object-oriented data
structure, with the 20 transformation rules common to all the
subtypes stored in a parent object and the transformation rules for
each subtype stored in objects unique to the subtypes and linked to
the object storing the common transformation rules, duplication of
the common transformation rules into all the subtype objects can be
avoided.
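A minimal sketch of this inheritance structure (Java; the class and
rule names are hypothetical):

import java.util.*;

public class RuleInheritance {
    static class RuleSet {
        final RuleSet parent;                       // null for a base CI type
        final Map<String, String> rules = new LinkedHashMap<>();
        RuleSet(RuleSet parent) { this.parent = parent; }
        // Effective rules = inherited rules plus subtype-specific ones;
        // a shared rule is stored once, in the parent, and fixed once.
        Map<String, String> effective() {
            Map<String, String> all = parent == null
                ? new LinkedHashMap<>() : parent.effective();
            all.putAll(rules);
            return all;
        }
    }

    public static void main(String[] args) {
        RuleSet computerSystem = new RuleSet(null);
        computerSystem.rules.put("csType", "map deviceType -> csType");
        RuleSet windows = new RuleSet(computerSystem);
        windows.rules.put("Workgroup", "map oscomputerdomain -> Workgroup");
        System.out.println(windows.effective()); // inherited + specific
    }
}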
Mechanism to Build Self Consistent Blocks of Entities to be Loaded
into a CMDB
Goals
[0154] The goal of this embodiment is to output the data that is
collected in a schema (set of tables with relations between them)
so as to load the data in a CMDB system or any external system
using a block size which is appropriate to the target CMDB
system.
[0155] It is assumed that the data being loaded to the target
system is preprocessed so that it is stored in tables consisting of
[0156] 1. entity tables that contain instances of objects that have
attributes [0157] 2. relationship tables that contain instances of
relations, where each relation contains references to two instances
(in other words, it relates one instance to another instance).
[0158] The main issue discussed here is how to partition the
complete set of data to be loaded into smaller blocks. It is
impractical to assume that the whole model can be processed as a
single operation for a CMDB for any but the simplest toy
examples.
[0159] Various CMDB systems impose requirements on the blocks of
data that can be loaded. The following list summarizes the
requirements imposed on the data that is loaded into a CMDB system:
[0160] 1. The data being loaded needs to be broken down into blocks
of data. This is because loading the entire data as a single entity
may not be possible for systems to handle. Further, interruptions
in the loading process for very large data as a single entity, such
as might occur in a power failure, would cause waste of computer
resources in attempting to reload the entire data set again. This
requirement is imposed by the fact that the total amount of data to
be loaded can be extremely large, since a BDNA discovery run
against a large enterprise typically collects huge amounts of data.
There are
various limits imposed by systems that disallow loading of such
large amounts of data as a single operation. Typical systems
require the data to be broken down into smaller blocks of
manageable size. There is a limit on the size of the blocks that
can be loaded into the CMDB systems. This limit is called the
"maximum block size". [0161] 2. Each block of data needs to be self consistent
which means if the block contains a relationship instance (R1 that
relates Entity E1 to Entity E2), the related entities must be part
of the same block. In other words it would be illegal to send
Relation R1 and Entity E1 in one block and Entity E2 in another
block. Note that there may be situations where the same entity
needs to be sent as part of multiple blocks so as to satisfy this
requirement. This is inefficient because it involves copying the
same data into two or more different blocks. For example assume
that each block can contain up to 500 objects. Assume that one
instance E1 is related to 600 other instances. It won't be possible
to send E1 and all its related instances in one block because of
the size limitation. But it would be permissible to divide the data
such that one block has E1 with 300 related instances and another
block has a second copy of E1 along with the remaining 300 related
instances. The target CMDB has the capability to relate the two
instances of E1 that arrive in different blocks and Illustrate that
they are same. Note that self consistency of the blocks is required
because the target system may be storing the objects in its own
schema. Database schemas typically enforce referential integrity
[available in any database reference e.g., C. J. Date, An
Introduction to Database Systems, Eighth Edition, Addison Wesley,
2003.]. Such referential integrity makes sure that the data made
available in the database is consistent (i.e., it "makes sense").
For example the definition of a relation is incomplete unless you
know which objects are being related to each other. [0162] 3. The
goal of the embodiments taught herein is to load the CMDB system as
efficiently as possible. Efficient loading of the system requires:
[0163] a. Each block must be built so that it is as close to the
specified size limit as possible. There is an overhead to
processing a block. So having a very large number of small blocks
is inefficient compared to having fewer larger blocks.
[0164] However, having very large blocks also leads to
inefficiencies. There is an optimal block size that can be computed
(having blocks slightly smaller than the ideal block size is
acceptable; it is not desirable to exceed the size limit).
Computation of the optimal block size is outside the scope of this
application. The scope of this application is rather to attempt to
build blocks as close to the size limit as possible while
maintaining individual block integrity. [0165] b. The duplication of data must be
minimized for maximal efficiency. So the goal is to avoid sending
the same data multiple times in each of several different blocks as
far as possible. Typically the size limit may force the process to
copy entities, but such copying should be minimized since the
object gets loaded in the target system when the first copy is
loaded. Subsequent copies are required for consistency of the
blocks but do not add additional information to the target system.
[0166] c. Also note that the data being loaded is typically stored
in a database. As a result, the efficiency of computation needs to
be measured in terms of database operations. A sketch of a
block-building loop satisfying the constraints above is given
below.
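The following minimal Java sketch (hypothetical names; not the
actual BDNA block-building code) illustrates requirements 1 and 2
together: related instances are packed into blocks no larger than
the size limit, and the shared entity is copied into each block so
that every block stays self consistent:

import java.util.*;

public class BlockBuilder {
    // Pack one "hub" entity and its related entities into blocks of at
    // most blockSize entities (blockSize must be at least 2), duplicating
    // the hub in each block so every relation stays within one block.
    static List<List<Long>> blocks(long hub, List<Long> related, int blockSize) {
        List<List<Long>> out = new ArrayList<>();
        for (int i = 0; i < related.size(); i += blockSize - 1) {
            List<Long> block = new ArrayList<>();
            block.add(hub); // a copy of the hub travels with each block
            block.addAll(related.subList(i,
                Math.min(related.size(), i + blockSize - 1)));
            out.add(block);
        }
        return out;
    }

    public static void main(String[] args) {
        // E1 related to 600 instances, block limit 500 (example from the text):
        List<Long> rel = new ArrayList<>();
        for (long i = 2; i <= 601; i++) rel.add(i);
        System.out.println(blocks(1L, rel, 500).size()); // 2 blocks
    }
}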
BACKGROUND
[0167] This embodiment concerns the loading of data from the output
schema to the CMDB system (or any external system). It is assumed
that previously the data has been transformed into a database
schema that corresponds to the data model of the target CMDB such
as by the processes described earlier herein. The database schema
consists of: [0168] 1. Entity tables: these tables contain the
entities (assets) that correspond to CIs in the target CMDB. [0169]
2. Relationship Tables: these tables define relations between two
entities, possibly of two different types. For example, assume two
CI types, ComputerSystem and OperatingSystem. The ComputerSystem CI
type refers to the hardware that is found by the discovery process,
and the OperatingSystem CI type represents the Operating System,
e.g., Linux, Windows, etc., that is installed on a given piece of
hardware. There can be a relationship between the instances of
these two types called InstalledOn. Each instance of the
relationship identifies one instance of ComputerSystem (say CS1)
and one instance of OperatingSystem (say OS1) such that OS1 is
installed on CS1. The relation is directed, which means that the
two end points of the relation are asymmetric. For example, in the
above instance of OS1 being installed on CS1, the relationship
doesn't imply that CS1 is installed on OS1 (which is meaningless).
Note that the relationship between two types T1 and T2 could be
[0170] a. 1-1 Relation: where each instance of T1 is associated
with one and only one instance of T2 (and vice versa). [0171] b.
1-N Relation: where each instance of T1 is associated with possibly
multiple instances of T2. For example multiple Operating Systems
may be installed on the same hardware machine. [0172] c. M-N
Relation: where multiple instances of T1 may be associated with
multiple instances of T2 e.g., the relation between IP addresses
and machines--one IP address can be used by multiple machines and a
single machine can have multiple IP addresses.
[0173] Note that dividing a given set of data into smaller blocks
for various purposes has been discussed in the prior art (e.g.,
paging mechanisms used by Operating Systems; reference: Deitel,
Harvey M. (1983), An Introduction to Operating Systems,
Addison-Wesley, pp. 181, 187, ISBN 0201144735). The main difference
between the problem discussed in this application and the prior art
is that the mechanism discussed here is specific to the kind of
data that consists of entities related to each other, and the
blocks being built need to satisfy constraints that require
processing and understanding the data at a semantic level, whereas
the paging mechanisms such as those used by Operating Systems are
very generic and apply to any kind of data. Using a mechanism
similar to the one used in Operating Systems to build blocks for
the purpose of loading a CMDB may result in inconsistent blocks
that cannot be processed by the target CMDB systems. If blocks were
constructed using the arbitrary mechanisms of an Operating System's
paging mechanism, the blocks would most likely result in bad
performance of the Operating System (but the blocks would not be
considered invalid), whereas for loading a CMDB system one could
easily generate blocks that cause errors while loading into the
CMDB and would be considered invalid blocks. Such errors would
happen, e.g., if a relation instance was added to a block without
adding both the entities related by the relation.
Terminology
[0174] Note that the terms Graph, nodes and edges are not defined
here but have the usual meaning as per any Computer Science
text.
[0175] Typically in this document, the graph represents the database schema storing entities and their relationships to be output to a CMDB. The term nodes is used synonymously with the term Entity Types in the output schema, and the term edges is used synonymously with relationships between such Entity Types.
Instances are specific assets of some particular entity type. In
other words, each entity type has one or more instances
thereof.
[0176] Distance between two nodes in a graph: the distance between two nodes in a graph is the number of edges that need to be traversed to get from one node to the other. For example, in the graph in FIG. 22, the distance between nodes B and D is one because only one edge can get us from B to D. On the other hand, the distance between nodes B and E is two (we need to traverse edges B→C and C→E), and the distance between nodes B and F is three (we need to traverse edges B→C, C→E and E→F).
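The distance just defined can be computed with an ordinary breadth-first search. A minimal sketch follows, assuming the graph has been read into an adjacency map keyed by node name (all names hypothetical); on the graph of FIG. 22 it returns 1 for B to D and 2 for B to E.

    import java.util.*;

    class Distance {
        // Breadth-first search over an adjacency map; returns the number of
        // edges on a shortest path from 'from' to 'to', or -1 if unreachable.
        static int distance(Map<String, Set<String>> adj, String from, String to) {
            Map<String, Integer> dist = new HashMap<>(Map.of(from, 0));
            Deque<String> queue = new ArrayDeque<>(List.of(from));
            while (!queue.isEmpty()) {
                String node = queue.poll();
                if (node.equals(to)) return dist.get(node);
                for (String next : adj.getOrDefault(node, Set.of()))
                    if (dist.putIfAbsent(next, dist.get(node) + 1) == null)
                        queue.add(next);
            }
            return -1; // 'to' is not reachable from 'from'
        }
    }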
[0177] Project: A Project is the complete set of data and metadata
that is accumulated so as to be able to load the discovered data
into a CMDB system. Such a project includes the actual instances of
all the discovered entities, their relations, and any metadata
required for processing the data so as to enable loading the data
into a CMDB.
[0178] Group: A group is a set of entity types and relationships
that should be processed together. Any two entity types from two
different groups are not connected directly or indirectly by any
set of relationships. Given the set of entities {A,B,C,D,E} and the
relations {A→B, B→C}, the relationships and entity types can be grouped into two groups G1={A, B, C, A→B, B→C} and G2={D, E}.
[0179] Block: A block is a consistent set of entity and relation instances that can be processed for outputting to a CMDB together
as a single operation (transaction). The project configuration has
a `blockSize` which defines the maximum number of entities that can
be included in a block.
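In code form, a block might be modeled as follows. This is only a sketch: blockSize mirrors the project configuration setting named above, while the class and method names are hypothetical.

    import java.util.*;

    class Block {
        private final int blockSize;   // maximum number of entities per block
        private final List<String> entityIds = new ArrayList<>();
        private final List<String> relationIds = new ArrayList<>();

        Block(int blockSize) { this.blockSize = blockSize; }

        // True if one more entity still fits within the configured limit.
        boolean hasRoom() { return entityIds.size() < blockSize; }

        void addEntity(String id)   { entityIds.add(id); }
        void addRelation(String id) { relationIds.add(id); }
        int entityCount()           { return entityIds.size(); }
    }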
Dividing Input Entity Types into Groups
[0180] Since the main constraint while building blocks is to put related entities together in a block, any two entity types that are not connected to each other through relations can be processed independently of each other. There is no reason to put entities from two unrelated entity types in the same block, unless there is space left over in the block after all the entities needed to satisfy the constraints arising from relations have been added.
[0181] A typical graph formed by entity types and relations can be
divided into multiple groups of entity types such that no two
entity types belonging to two different groups have a relation
between them. For example FIG. 28 shows a graph that can be divided
into three groups. Group G1 is comprised of entities (asset types)
A, B, C and D: parent entity A which is parent to child entity B
which is a parent entity to entities D and C. Group G2 is comprised
of parent entity E which is parent to child entity F. There is no
relationships between any entity in group G1 and group G2. Group G3
is comprised of entities G, H and I with parent entity G being
parents to entities H and I and entity H being parent entity to
entity I. There are no relationships between any entity in group G3
and any entity in group G1 or G2, and that is why they are
separated into the groups into which they are separated.
[0182] Note that there are several mechanisms available in the
prior art that can be used to divide the graph into disconnected
groups [Introduction to Algorithms (Second Edition) by Thomas H.
Cormen, Charles E. Leiserson, Ronald L. Rivest, and Cliff Stein,
published by MIT Press and McGraw-Hill]. We do not describe any
such mechanism here, but any of these prior art methods could be
used to divide graphs into groups also. Some mechanism must be used
to create independent groups. FIG. 21 illustrates one example of
such a mechanism. Step 201 represents the process of making the
graph non directed by adding information to the relationship table
so that not only can the child entity type of any entity type be
found, but also the parent of any entity type can be found. Step
203 represents the process of creating the first group by tracing
the parent and child relationships in the relationship tables so
that all entities that are related to each other by child or parent
relationships are found and recorded as the first group. This
involves tracing all the relationships (symbolized by arrows in
FIG. 28) until dead ends are found and no new entities can be found
by tracing relationships. All entity types so found are recorded as
the first group, and a new group is formed in step 205 by repeating
the tracing process until no new entities can be found. Step 207
represents the process of repeating step 205 as many times as
necessary to create new groups until all entity types in the graph
have been processed.
[0183] The following pseudocode depicts how the computation of
groups guides the subsequent process of building the blocks (the
TransformationProject object stores the metadata including the
graph; assume that the getGroups() method of the
TransformationProject object knows how to compute the groups in the
graph associated with the transformation project).
TABLE-US-00007
public void doOutput(TransformationProject project) {
    // Compute all the independent groups in the project, then
    // build the output blocks for each group in turn.
    Set<Group> groups = project.getGroups();
    for (Group group : groups) {
        processGroup(group);
    }
}
[0184] Another example method to form the groups is shown in
flowchart form in FIG. 31. The process is basically comprised of
the steps: 1) make the graph non directed so that whichever entity
one starts with, if it has a parent, the parent can be found (step
200); 2) start with any node (entity class, each class having possibly multiple instances thereof) in the graph and find all
other entities that are either its parent or child entity and put
them into the first group (step 202); 3) once all parents and
children entities of the starting node are found, proceeding to the
next node in the graph that is related to the node just processed
and repeating the process of tracing all its parents and children
and listing them as part of the group (step 204); 4) proceeding to
another node related to the node just processed and repeating the
process of tracing all nodes related to the node being processed
and recording each node found as part of the group being formed
(step 206); and 5) repeating the process of constructing a new
group by picking a node and tracing its relationships to all other
nodes and recording each node found as part of the group until no
new node is found and declaring the recorded nodes as the new group
(step 208); and 6) repeating this process to construct other groups
till all nodes in the graph are processed (step 210).
[0185] Step 200 in FIG. 31 represents the process of making the graph non directed, which comprises going to the table where the relationships are stored and adding information to it so that each child entity has an entry identifying its parent. For example, using the graph of FIG. 28, the relationship table would look like Table 1 below before it was made non directed and would look like Table 2 below after it was made non directed.
TABLE-US-00008 TABLE 1 Directed relationships for graph of FIG. 28
Parent   Children
A        B
B        D, C
C        D
E        F
G        H, I
H        I
TABLE-US-00009 TABLE 2 Non directed relationships of graph of FIG. 28
Node     Related nodes
A        B
B        A, D, C
C        D, B
D        B, C
E        F
F        E
G        H, I
H        G, I
I        G, H
[0186] Step 202 in FIG. 31 represents the process of picking any node in a graph such as that of FIG. 28 and, using a relationship table like Table 2 for the graph, finding all the parent and child entities of the node with which the grouping process for group 1 was started. Once all parent and child entities of the starting node are found, each is listed in the group being built. For example, if the starting node were B, Table 2 would indicate that B knows C and D, but it also knows A as its parent. So A, B, C and D would be added to the first group. Going to each of nodes A, B, C and D and tracing its parents and children from Table 2 would yield no new entity nodes. This would be the result of steps 202, 204, 206 and 208, and the conclusion of step 208 would be that no new entity nodes have been found, so group 1 is complete.
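A compact sketch of this tracing loop, assuming the non-directed relationships of Table 2 have been read into an adjacency map; the class and method names are hypothetical.

    import java.util.*;

    class GroupFinder {
        // Finds the independent groups as the connected components of the
        // non-directed entity-type graph (steps 200 through 210 of FIG. 31).
        static List<Set<String>> findGroups(Map<String, Set<String>> adjacency) {
            List<Set<String>> groups = new ArrayList<>();
            Set<String> seen = new HashSet<>();
            for (String start : adjacency.keySet()) {
                if (seen.contains(start)) continue;   // already placed in a group
                Set<String> group = new HashSet<>();
                Deque<String> frontier = new ArrayDeque<>(List.of(start));
                while (!frontier.isEmpty()) {         // trace parents and children
                    String node = frontier.pop();
                    if (!seen.add(node)) continue;
                    group.add(node);
                    frontier.addAll(adjacency.getOrDefault(node, Set.of()));
                }
                groups.add(group);                    // no new node found: group done
            }
            return groups;
        }
    }

With the Table 2 data, starting from B the frontier pulls in A, C and D and then finds nothing new, reproducing group 1={A, B, C, D} from the example above.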
[0187] Step 210 represents the process of starting to construct the
next group. This is done by picking any other node in the graph
which is not one of the entity nodes in the group just completed
and repeating steps 202, 204, 206 and 208 until the next group is
completed. Step 212 represents repeating step 210 as many times as
necessary to complete other groups until all nodes in the graph
have been included in a group.
[0188] The reason that all entities related to a particular entity, either as children or parents, need to be known is so that each block being built is self-consistent without the need to duplicate entity nodes. Self-consistent means that all entities related to a particular entity are in the same block. It would be possible to
make a self-consistent block without putting all entities related
together in the same data block, but that would require duplication
which would result in inefficiency. For example, in FIG. 28, it
would be possible to make two self-consistent blocks out of group
G1 with one block comprised of instances of the B, C and D
entities. The other block would be comprised of instances of the
entities B and A. This would require duplication of the data from
the instances of entity B in both data blocks, and this would be
inefficient.
[0189] The next section discusses the details of the steps used for
computing the blocks of the appropriate size for a given group.
Heuristic-Based Method to Build Self-Consistent Blocks
[0190] In the following discussion we assume that the independent
groups have already been determined, and the size of the blocks is
predominantly determined by the number of entities in the block.
The preferred heuristic-based method to build self-consistent
blocks is illustrated in FIG. 30 in flowchart form, and makes the
following assumptions: [0191] 1. The sizes of the relations are not significant (a relation is just a listing of the parent and each child). [0192] 2. All entities are of approximately the same size.
[0193] If necessary, adjustments can be made for any discrepancies
caused by the two assumptions: An example of a situation where an
adjustment might be necessary is as follows. Suppose in FIG. 28,
entity type B is a computer system and entity type D is an
operating system and entity type C is a disk drive. D and C are
children entities of the computer system because both the operating
system and the disk drive are installed on the computer system. An
instance of a computer system entity B may have 10 K bytes of data.
data. More data is collected on operating systems, though, so suppose each instance of an operating system entity D has 100 K bytes of data. Suppose also that each instance of a disk drive has 50 K bytes of data. Suppose there are 100 instances of computer systems B. This means that loading each instance of a computer system B will cause 10 K bytes of data for the computer system to be loaded into the data block, and will also cause 100 K bytes of data for the operating system and 50 K bytes of data for the disk drive to be loaded. If this is done 100 times for all the instances of computer system entity type B, the maximum data size of the data block can be exceeded, and duplication will result in such a case. To avoid overrunning the maximum size limit, a weighting algorithm based upon connectivity is used. That weighting mechanism is described below.
[0194] 1. We can actually assume a non-zero finite size per
relation (typically all relation instances have the same size since
the basic information in a relation instance is just the identity
of the parent and child entities). The number of entities per block
can be reduced by a specific fraction that makes adjustments for
the relation instances added to the block. [0195] 2. If there are
huge differences between the sizes of the entities of different
types, an algorithm which assumes all entity instances are of
approximately the same size can "break" by exceeding the maximum
block size. To prevent that, the preferred algorithm taught here
uses a weighting algorithm based upon the relative size of the
parent and child instances. Using this weighting, the cardinality of each relationship is "normalized" by multiplying the cardinality by the weighting ratio. The resulting product is the normalized cardinality of that relationship. The normalized cardinality affects
the connectivity which affects the sort order, all of which will be
explained further below in connection with the description of the
algorithm to build the data blocks. This weighting process helps
prevent overrun of the maximum data size for the block. The process
of calculating the weighting ratio is represented by step 218 in
FIG. 30. Basically, the weighting ratio is the data size of the
child entity divided by the data size of the parent entity.
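In code terms the normalization of step 218 is a one-line computation. A sketch using the computer system and operating system sizes from the example above (all names hypothetical):

    class Weighting {
        // Normalized cardinality = relationship cardinality x (child instance
        // size / parent instance size), per step 218.
        static double normalizedCardinality(long cardinality,
                                            double childBytes, double parentBytes) {
            return cardinality * (childBytes / parentBytes);
        }

        public static void main(String[] args) {
            // 100 computer systems (10 K bytes each), each hosting one
            // operating system (100 K bytes each): cardinality 100, ratio 10.
            System.out.println(normalizedCardinality(100, 100_000, 10_000)); // 1000.0
        }
    }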
[0196] The following are the steps followed by the heuristic-based
mechanism to build self-consistent blocks of data for one group of
entity types and relationships, and are illustrated in FIG. 30,
comprised of FIGS. 30A and 30B.
[0197] The preferred algorithm to build the self-consistent blocks
of size no more than the maximum size starts with the most
connected nodes first, as determined by step 222 in FIG. 30. This
is because when one stores an entity node in a self-consistent
block for export, all entity nodes related to the entity node
assigned to the block must also be taken in the same
self-consistent block to maintain the consistency thereof. The
exception to this rule is when taking an instance of an entity and
all instances of entities related to the selected entity would
exceed the maximum data size of the block. In such a case copying
of data already loaded into the block into another block would
occur for purposes of self-consistency, and this would result in
inefficiency. This is why the algorithm starts with the most
connected entity nodes first. Taking the most connected nodes first
will result in the taking of the most data because the entity nodes
having relationships with the most connected node must also be
taken. The situation is the data analogy of picking one grape up in
a bunch and bringing the whole bunch up with it as opposed to
picking up a single grape. So to avoid exceeding the maximum block
size, the most connected node is taken first and the amount of data
taken is calculated after taking all the related entity nodes also.
If there is still room in the data block, the next most connected
entity node is then taken.
[0198] We use the example graph shown in FIG. 22 throughout the
following description of the process of FIG. 30 to illustrate the
preferred method. The first step is to divide the graph into groups
of related entities as symbolized by step 214. This can be done
using any algorithm including prior art algorithms. One example of
how to do this is shown in FIG. 31. The process to build the data
blocks from the groups starts with step 216. [0199] 1. Step
216--Compute Relationship Cardinality: Cardinality means the number
of instances of each relationship. In the database implementation,
each relationship is represented by a table with a row for each
instance of such a relationship. For example, if entity B is a
computer system and entity D is an operating system and entity C is
a hard disk, and there are 98 computer systems in a company, each
having an operating system of entity type D, the number of rows in
the relationship table showing particular operating systems
installed in particular computer systems would be 98. The
cardinality of that relationship would be 98. It is simple to
compute the total number of rows in each relationship table in step
216 to compute the cardinality of the relationship represented by
the relationship table. Each relationship table represents one
relationship. For the example graph of FIG. 22, the resulting
cardinality data for each relationship is shown in dashed boxes
next to the relationship lines in the graph of FIG. 23. In the
example graph shown in FIG. 23, the following cardinality values
are assumed for the relationships:
TABLE-US-00010 [0199]
No   Relationship   Count
1    A → B          1000
2    B → C          10000
3    B → D          5000
4    C → D          1000
5    C → E          2000
6    E → F          4000
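Because each relationship is stored as its own table, computing the cardinalities reduces to one row count per table. A sketch in JDBC terms, with hypothetical table names such as A_B standing in for the actual relationship tables:

    import java.sql.*;
    import java.util.*;

    class Cardinality {
        // Cardinality of a relationship = number of rows in its relationship
        // table; e.g., A_B -> 1000, B_C -> 10000 for the graph of FIG. 23.
        static Map<String, Long> computeCardinalities(Connection conn,
                                                      List<String> relationTables)
                throws SQLException {
            Map<String, Long> counts = new LinkedHashMap<>();
            for (String table : relationTables) {
                try (Statement st = conn.createStatement();
                     ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM " + table)) {
                    rs.next();
                    counts.put(table, rs.getLong(1));
                }
            }
            return counts;
        }
    }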
[0200] 2. Step 218--Compute a Weighting Ratio: This step is used in
the preferred embodiment, but may be omitted in some alternative
embodiments especially where the size of all instances of entities
is approximately the same. In the preferred embodiment, the
weighting algorithm is used to prevent overruns of the maximum data
size of a block when the size of data of instances of a child
entity far exceeds the size of data of an instance of the parent
entity of said child. The weighting ratio is simply the data size
of an instance of the child entity divided by the data size of an
instance of the parent entity. As will be described further below,
the cardinality of each relationship is multiplied by the weighting
ratio for that particular relationship so as to generate a
normalized cardinality. The normalized cardinality is used in
calculating the connectivity metric of each relationship, and the
connectivity metric is used in the sorting of step 222 to order the
entity types in descending order of connectivity. [0201] 3. Step
220--Compute Connectivity Metrics for Entity types: For each entity
type step 220 computes the connectivity metric using the normalized
cardinality. The connectivity metric is defined as the sum total of
the normalized cardinality of all relationships that are either
incoming to the entity type or outgoing from the entity type. In
embodiments where the weighting ratio is not used, calculation of
the connectivity metric of an entity type simply involves summing
the cardinalities of each of the incoming and outgoing
relationships of the entity type. In other words, in such embodiments the connectivity metric of an entity type is simply the total of the cardinalities of all its parent and child relationships. For example, suppose the weighting ratio defined by the data size of an instance of B divided by the data size of an instance of A is one. For entity type B, the relationship A→B is incoming and relationships B→C and B→D are outgoing. The sum total of the cardinality of all the relationships incoming to and outgoing from entity type B is 1000+10000+5000=16000. This value represents the connectivity metric for entity type B. If the weighting ratio were two, representing a situation where the data size of an instance of entity type B is twice as large as the data size of an instance of entity A, then the connectivity metric for entity type B would be 16000 multiplied by 2=32000. The weighting ratio would move this entity type up to a higher position in the sort list, so that it would be taken earlier than would otherwise be the case if an instance of the child entity type were the same size as an instance of the parent type. Similarly, the connectivity metrics of the remaining entity types are computed.
The table given below assumes the weighting ratio is one for each
parent-child relationship so that the connectivity metric is simply
the sum of the cardinalities of all the incoming and outgoing
relationships of an entity type.
TABLE-US-00011 [0201]
No   Entity Type   Calculation showing sum of incoming/outgoing relationship cardinality   Connectivity metric for the entity type
1    A             1000                                                                    1000
2    B             1000 + 10000 + 5000                                                     16000
3    C             10000 + 1000 + 2000                                                     13000
4    D             5000 + 1000                                                             6000
5    E             2000 + 4000                                                             6000
6    F             4000                                                                    4000
[0202] 4. Step 222--Order entity types in decreasing order of
connectivity: step 222 represents the process of sorting the entity
types in decreasing order of the connectivity metric value. For the
example above, the data after such ordering will be as follows:
[0203] a. B (16000)
[0204] b. C (13000)
[0205] c. D (6000) (note that D and E have the same connectivity metric value, so their order can be interchanged)
[0206] d. E (6000)
[0207] e. F (4000)
[0208] f. A (1000)
The reason this step is done is to help
prevent overfilling a data block by starting the process of filling
the data block with the entity with the highest connectivity
metric. The entity type with the highest connectivity metric will
take the most data with it when an instance thereof is loaded into
the data block along with all related instances. For example,
suppose a computer system entity type is the most connected entity
type because it is the parent entity type to an operating system
entity type, a disk drive entity type, a network interface card
entity type, several different application program entity types, a
removable hard drive entity type, a flat screen display entity
type, a keyboard entity type, and a pointing device entity type.
Each parent instance of a computer system, when loaded into a data
block, will take with it instances of all these different child entity types; so the more connected an entity type is, the more data its instances will take with them when they are loaded into a data block. By taking these more connected instances first when the
maximum amount of room is available in the data block, there is
less chance of overrunning the maximum data size of the block.
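Steps 220 and 222 together amount to an accumulate-and-sort over the relationship cardinalities. The following sketch (all names hypothetical; weighting ratios of one assumed, as in the table above) reproduces the ordering B, C, D, E, F, A:

    import java.util.*;

    class Connectivity {
        // Connectivity metric of an entity type = sum of the (normalized)
        // cardinalities of all its incoming and outgoing relationships.
        // With non-unity weighting ratios, multiply count by the ratio
        // for the relationship before merging.
        static List<Map.Entry<String, Long>> sortByConnectivity(
                Map<String, Long> relationCardinality,   // e.g., "A->B" -> 1000
                Map<String, List<String>> endpoints) {   // e.g., "A->B" -> [A, B]
            Map<String, Long> metric = new HashMap<>();
            relationCardinality.forEach((rel, count) -> {
                for (String type : endpoints.get(rel))   // both ends accumulate
                    metric.merge(type, count, Long::sum);
            });
            List<Map.Entry<String, Long>> sorted = new ArrayList<>(metric.entrySet());
            sorted.sort(Map.Entry.<String, Long>comparingByValue().reversed());
            return sorted;                               // B=16000, C=13000, ...
        }
    }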
[0209] 5. Step 224--Building the Block by adding Data to it: Step
224 represents the process of starting to build a self-consistent
data block. The details of the process of adding the data of all
related entity instances to a data block being filled is given in
flowchart form in FIG. 32. The relationships of instances of particular entity types are illustrated in FIG. 33. Suppose entity
type A is a network, entity type B is a computer system, entity
type D is an operating system and entity type C is a printer.
Instances A1 and A2 represent individual networks and arrow 230
represents the fact that computer system instance B1 is coupled to
network instance A1. Arrows 232 and 234 represent the fact that
computer instances B2 and B3 are coupled to network instance A2.
Arrow 236 represents the fact that operating system instance D1 is
installed on computer system B1 and so on for the other operating
system instances. Arrows 238 and 240 represent the fact that
printers C3 and C4 are both coupled to computer system B3. So for
the parental relationship between entity types A and B, there will
be a cardinality of three because the relationship table for the
relationship A is parent of B will have three rows in it, each row
containing the information represented by one of the arrows 230,
232 and 234. The connectivity of a type is simply how many
relationships, both incoming and outgoing, it has. The connectivity
metric of a type is the sum of the number of rows in each
relationship table where that type is at one end of the
relationship arrow or the other. [0210] 6. Step 226--Process the
Remaining Entities in the Group: Step 226 represents the process of
loading into the data block instances of the remaining entities in
the group that have not been included in any data block built in
step 224. Instances of entities processed in this step are
instances of entities that are not related to any other entity in
the group. The processing of such entities is simple, since they
can be grouped together in any order so as to build a block that
satisfies the size limit. [0211] 7. Step 228--Mark the Entities So
Processed "Done": Each time a relation/entity instance is added to
a block, the relation/entity is marked DONE in the database in the
table storing the corresponding information. A relation that is
marked DONE doesn't need to be processed again. An entity that is
marked DONE may have to be processed again since the same entity
may have to be added to multiple output blocks in cases where the
maximum size limit prevents all related instances from being loaded
into a data block. However, it is necessary to remember somehow
that instances of an entity have been processed before so as to
find all the "remaining" entities that still need to be processed
as described in step 226 above.
[0212] Another reason to remember which entities/relations have
been processed before is to be able to process the data after a
failure of the system (e.g., due to power failure). For example, loading a large amount of discovered data into a CMDB system can take several hours. If, after an execution of several hours, the system is forced to shut down due to a catastrophic failure such as a power outage, the mechanism of step 228 that marks the processed entities/relations as DONE will prevent the need to re-load the entities/relations that have already been loaded. [0213] 8. Step
229--Repeat the process of FIG. 30 for each group until all groups
in the graph are exhausted.
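One way to implement the DONE marking of step 228 so that it survives a crash is to keep the flag in the same database tables that store the instances. A minimal sketch follows; the done column and id column names are hypothetical.

    import java.sql.*;

    class DoneMarker {
        // Marks one instance DONE in its entity or relationship table so that
        // a restart after a failure can skip everything already loaded.
        static void markDone(Connection conn, String table, String instanceId)
                throws SQLException {
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPDATE " + table + " SET done = 1 WHERE id = ?")) {
                ps.setString(1, instanceId);
                ps.executeUpdate();
            }
        }
    }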
[0214] Referring to FIG. 32, the following steps show how to build
one or more self-consistent data blocks by incrementally adding
data of instances from a group to it:
[0215] Step 242--Start with an independent group created by any
process such as the process of FIG. 31 which contains the entity of
the highest connectivity metric. Suppose there are K entity types
in that group. Step 242 calculates the value N divided by K, where N is the maximum limit on block size expressed in terms of the number of entities that can be stored in a self-consistent data block. N is calculated by dividing the maximum block size desired (or set by the CMDB) by the average data size of an instance of an entity type, both expressed in gigabytes or some other consistent measure of data size. This calculation of N assumes all instances of entity types in the graph are of approximately the same size. A weighted average of the data sizes of instances of the entity types in the graph could also be used for computing N if the data size of instances varies substantially from one entity type to another. The weighted average would be the sum of the instance data sizes, each weighted by the number of instances having that particular data size. The average size of an entity type is the average data size of an instance of that entity type.
[0216] Once N/K is calculated, the resulting number is the number of instances of the entity type with the highest connectivity metric that are taken and loaded into the data block. As each instance is loaded, it is marked done. If the first block fills up before all instances of that entity type are taken, a new block is started using the instances of the entity type being processed which are not marked done.
[0217] As an example of the calculation of step 242, suppose the
maximum block size N in entities is calculated to be 1000 and the
group size of the group containing the most connected entity type
is 4 different entity types such as group G1 in FIG. 28. So
1000/4=250. Suppose also that in the graph of FIG. 28 among all
three groups G1, G2 and G3, entity type B has the highest
connectivity metric and entity type C has the second highest
connectivity metric. Therefore, 250 instances of entity type B will
be loaded into the first data block. Loading of instances of other
entity types related to each of these 250 instances of entity type
B may occur too as will be described in other steps the details of
which are given below. Call the entity type with the highest
connectivity metric Tmax. Assume that all the instances picked and
loaded into a data block are added to a data block called
currentBlock.
[0218] The size limit on a data block is not absolute and one can
exceed the maximum block size a little, but exceeding it by a great
deal is undesirable. And if the block size is consistently exceeded, there will be a performance penalty relative to software that does not exceed the block size.
[0219] It is possible to guarantee that the size limit will not be
exceeded by checking the amount of data stored in each block after
each instance of an entity type is loaded and all its related
instances are loaded. If the last loading operation caused the
maximum size limit to be exceeded, one can pull out from storage in
the block the last instance loaded and its related instances which
caused the block size to be exceeded, and repeat this process as many times as necessary until the maximum size limit is no longer exceeded.
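A sketch of that guarantee follows. It treats an instance plus all its related instances as one unit and checks the size before committing each unit, which has the same effect as loading the unit and then pulling it back out; all names are hypothetical.

    import java.util.*;

    class SizeGuard {
        // Loads units (an instance plus all its related instances) into a
        // block, skipping any unit that would push the block past maxBytes;
        // skipped units simply start the next block instead.
        static List<List<String>> loadWithLimit(List<List<String>> units,
                                                Map<String, Long> sizeOf,
                                                long maxBytes) {
            List<List<String>> block = new ArrayList<>();
            long used = 0;
            for (List<String> unit : units) {
                long unitSize = unit.stream().mapToLong(sizeOf::get).sum();
                if (used + unitSize > maxBytes) continue; // unit pulled back out
                block.add(unit);
                used += unitSize;
            }
            return block;
        }
    }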
[0220] Step 244--Process each relation that connects related entity
types at distance 1 from entity type Tmax so as to load the first
level of instances related to the instances of entity type Tmax
just added in step 242 to currentBlock. What this means is that the
relationship table is consulted for the relationships that entity
type Tmax has and the entity types that are only one level away
from Tmax are determined. For example, in the graph of FIG. 22,
entity types A, D and C are all one level away from Tmax entity
type B, but entity types E and F are more than one relationship
level away from Tmax entity type B. So step 244 represents, in this
example, the process of loading into currentBlock instances of
entity types A, D and C which are related to the 250 instances of
entity type B just loaded. A running count of the amount of data in
the instances loaded into each data block is kept in most
embodiments so as to know when the size limit of the data block has
been reached.
[0221] So in other words, if B is the entity type picked by step 242, then the entity types at distance 1 are A, D and C. All instances of entity types A, D, and C that are related to instances of entity B loaded into currentBlock through relations A→B, B→C and B→D respectively are added to the current block.
[0222] Step 246--Repeat step 244 to load into currentBlock related instances of entity types at distance 2 (one more than the previous step) from entity type Tmax. In the graph of FIG. 22, the entity types at distance 2 include only E. So in this example, step 246 represents the process of loading into currentBlock all instances of entity type E that are related, via relation C→E, to instances of entity type C already loaded into currentBlock. Note that in this example of the graph of FIG. 22, such instances can only be of entity type E, but in some other graph where entity type C is parent to two different entity types, say E and G, the instances loaded in step 246 would be instances of entity types E and G which are related at distance one to the instances of entity type C already loaded into currentBlock.
[0223] The reason the instances at relation distance two from the instances of entity type Tmax are not loaded into currentBlock during step 244 is that the identities of the instances at distance two from the instances of Tmax are not present in the relationship tables showing relationships of other entity types to entity type Tmax. In the preferred embodiment, learning the identities of the entity types at relation distance two from entity type Tmax requires consultation of the relationship tables showing all incoming and outgoing relationships of the entity types at relation distance one from entity type Tmax. This determination is made in step 246, and the instances of the entity types at distance two which are related to the instances at relation distance one from entity type Tmax previously loaded into currentBlock are then loaded into currentBlock.
[0224] Step 248--Repeat steps 244 and 246 to load all instances of
entity types related to entity type Tmax into current block,
expanding the level of the search by one relation level each time
until the complete set of entity types related to entity type Tmax
in this group is exhausted. In other words, keep searching for
related entity instances, expanding the search by one level each
time using the relation tables until no more related entities in
the group can be found. Each time an instance is loaded, it is
marked done and each time all instances of a relation are loaded,
the relation is marked done, as taught in step 228 of FIG. 30B.
[0225] Step 250--Repeat the process for all instances of all entity
types in descending order of connectivity metric until the entire
group is loaded into one or more blocks. Each time a data block is
filled, the process starts again on the same group and is repeated
until all instances from the group have been loaded into a data
block and marked done. Each time a new data block is started, if
there are instances of the Tmax entity type still not marked "done", the process starts with those instances by taking N/K of them and then repeating the expanding search process described above
with respect to steps 244 through 250. When all instances of entity
type Tmax have been loaded in a data block and marked done, the
process continues starting with instances of the entity type with
the next highest connectivity metric. The process continues in
decreasing order of connectivity metric until instances of all
entity types in the group have been loaded and marked done. The
process then proceeds to steps 226, 228 and 229 in FIG. 30 to load
all "singleton" instances of entity types not related to any other
entity type, mark each instance done after it is loaded and repeat
the process for all other groups until all instances of all groups
have been loaded into one or more data blocks.
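Condensing steps 242 through 250, the following sketch shows the overall shape of the block-building loop for one group. It simplifies the size handling to a plain entity count, and all names (relatedTo, instancesOf, etc.) are hypothetical stand-ins for lookups against the instance and relationship tables.

    import java.util.*;
    import java.util.function.Function;

    class BlockBuilder {
        // Builds the blocks for one group: seed with N/K instances of the
        // most connected entity type, then absorb related instances one
        // relation level at a time until nothing new is found.
        static List<Set<String>> buildBlocks(
                List<String> typesByConnectivityDesc,    // output of step 222's sort
                Map<String, List<String>> instancesOf,   // entity type -> instances
                Function<String, Set<String>> relatedTo, // instance -> related instances
                int n, int k) {
            List<Set<String>> blocks = new ArrayList<>();
            Set<String> done = new HashSet<>();
            int seedCount = Math.max(1, n / k);          // step 242: N/K seeds
            for (String type : typesByConnectivityDesc) {
                List<String> pending = new ArrayList<>(instancesOf.get(type));
                pending.removeAll(done);
                while (!pending.isEmpty()) {
                    Set<String> block = new HashSet<>();
                    Deque<String> frontier = new ArrayDeque<>(
                            pending.subList(0, Math.min(seedCount, pending.size())));
                    // Steps 244-248: expand the search one relation level at a time.
                    while (!frontier.isEmpty() && block.size() < n) {
                        String instance = frontier.poll();
                        if (!block.add(instance)) continue; // already in this block
                        done.add(instance);                 // step 228: mark it DONE
                        frontier.addAll(relatedTo.apply(instance));
                    }
                    blocks.add(block);
                    pending.removeAll(block);               // step 250: continue group
                }
            }
            return blocks;
        }
    }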
[0226] Each data block is self-consistent because all related
instances will be loaded in it, and copying of data multiple times
will occur if necessary to keep the data blocks self-consistent.
But because of the heuristic nature of the process, copying of the same data into more than one block is minimized if it occurs at all, thereby making maximally efficient use of computer resources.
Best Case and Worst Case Situations for the Proposed Mechanism
[0227] The heuristic based mechanism provided works best when the
distribution of the edges between the nodes is uniform. For
example, assume a 1-N relation A.fwdarw.B such that A has 1000
entities and B has 2000 entities and there are 2000 instances in
the relation. The distribution is considered ideally uniform when each entity in A is connected to approximately 2 entities in B, i.e., each entity of A has two outgoing edges to B. The distribution would be highly non-uniform if, e.g., one instance of A had all 2000 edges (and the remaining instances had none).
[0228] Note that in the ideal situation, if all the relations were
1-1, the above algorithm will pull out N/K entities of each type.
Since there are K entity types, the resulting block will consist of
N entities, which is the maximum allowed block size. Also there
will be no duplication of data between different blocks.
[0229] Let us consider the worst case where one instance of A (say
A1) has 2000 edges. Assume each block could have 100 entities.
There is no way to build a single block that includes all related
entities that include A1. One way to build the blocks would be to
replicate A1 in 20 blocks where each block includes 100 entities
from B (strictly speaking the blocks will be of size 101--but let
us ignore the slight overflow in the block size). Note that having to copy A1 into multiple blocks is a source of inefficiency, since A1 has to be communicated to and loaded into the system twenty times.
[0230] This example illustrates an extreme case for purposes of explanation, but in general the goal is to avoid duplication of entities into multiple blocks. The proposed mechanism achieves this goal to a very large extent, especially if the distribution of edges between the nodes is uniform.
Efficiency of Loading Data into a CMDB Achieved by the Current
Approach
[0231] Note that performance comparison of the proposed approach
with existing approaches (available in prior art) has not been
presented because the authors are not aware of any relevant prior
art that approaches this problem. The reason for this is primarily
due to the fact that to our knowledge, discovery tools available as
prior art do not discover significantly large amount of data so as
to make the problem of loading the data into a CMDB a significant
issue. For example, when the BDNA team tried to inquire from
Vendors about their load testing of their systems, the Vendor had
tried load testing using 10,000 CIs. The discovery system of BDNA
easily discovers asset data that is multiple order of magnitude
more than the said number (for large enterprises number of assets
including hardware and software assets that potentially translate
into CIs can be easily as large as several millions).
[0232] In an example scenario tried with a particular CMDB vendor, around 76,000 CIs and 47,000 relation instances were loaded into a CMDB system, which took about 7 hours. We do not present very precise figures because the performance depends on several factors such as the kind of hardware machine used for running the CMDB, the effect of network load, the target CMDB system, etc. (different vendors perform differently). As a result we have presented approximate results based on a few runs of the dataset that we implemented. Also note that doing such performance studies requires a significant amount of resources, making it difficult to do such research. Moreover, the performance impact can be readily analyzed without doing actual performance studies (which makes such studies less important).
[0233] The important fact to note is that if a significant number of CIs are duplicated during the loading of a CMDB, the loading of the data into the CMDB can take extra time running into several hours. If customers need to load the CMDB data on a regular basis (for example, weekly), such performance has a significant impact on the usability of the system.
[0234] To discuss the efficiency of the presented approach, we describe two alternative approaches that have drawbacks compared to the presented approach.
Process One Relation at a Time
[0235] This approach takes one arbitrary relation at a time and
processes entities related by the given relationship. If an entity
type E1 is connected to two other entity types E2 and E3 by two
separate relationships R1 and R2, entities from type E1 will be
communicated twice--once while processing relation R1 and again
while processing R2. Essentially, an entity becomes part of as many
blocks as the relations that it forms part of. Note that if no
entity type in the graph was connected to more than one other
entity type, this approach will perform as well as the presented
approach. However, if entity types are connected to more than one other entity type, this approach requires significant duplication of entities in blocks. For example, if on average each entity type were related to two other entity types, this approach would send twice as many entities to the CMDB as the presented approach, causing 100% extra overhead.
Incrementally Grow a Block with Unordered Nodes
[0236] This approach does not impose any specific order on the nodes, as the approach presented in this application does. Assume the maximum block size is N entities and there are K entity types in the group. This approach takes the graph as provided and starts
with any entity type and adds N/K entities of the said entity type
to the current block being built. It further takes all entities
that are related to the entities in the current block at distance 1
and continues to add more and more entities by increasing the
distance by 1 each time. The building of the current block needs to
stop if adding more entities causes the block size to exceed N.
Note that the drawback of this approach is that, since the entity types are not processed in any particular order, the number of related entities being added at each step becomes unpredictable.
For example consider the graph shown in FIG. 29.
[0237] The graph has 4 entity types A, B, C, D and three relationships A→B, B→C, C→D. Assume the cardinalities of the relations are as follows: A→B is 1,000, B→C is 2,000, and C→D is 4,000. Also assume that each entity of type A is related to two entities of type B, each entity of type B is related to two entities of type C, and each entity of type C is related to two entities of type D. Since nodes are picked in a random order, it is possible that the entity types are picked in the order A, B, C, D. Assume that the block size specified for the project (the value of N) is 400 entities. Since the number of entity types in the group (the value of K) is 4, the value of N/K is 100. If 100 nodes of entity type A are picked, they are likely to bring in 200 entities of type B, which will further bring in 400 entities of type C, which will further try to bring in 800 entities of type D. Since the block size limit is 400, the block accordingly will consist of 100 entities of type A, 200 entities of type B and 100 entities of type C (we can bring in only a subset of the related entities of type C). Note that such a block will result in duplication of several entities in subsequent blocks. Since only 1/4th of the entities of type C related to entities of type B were used, the remaining 3/4th of the entities of type B (150 entities) must be duplicated in subsequent blocks (at least). Furthermore, since none of the entities of type D were included in the block, the corresponding related entities of type C (100 entities) need to be included in subsequent blocks as well to get the entities of type D. So in a block of 400 entities, 250 entities are duplicated, causing approximately 250/400*100, i.e., about 60% extra overhead.
[0238] On the other hand, let us compare the performance based on the best mode approach discussed in this application. The connectivity metrics for the various entity types are A=1000, B=1000+2000=3000, C=2000+4000=6000, D=4000. By sorting the entity types in decreasing order of the connectivity metric we get the list C, D, B, A (we refer to this list as L). As explained above, the value of the maximum block size (N) is 400 and the number of entity types in the group (K) is 4. The value of N/K is 400/4=100. The first entity type in the list L is picked, which is C. After adding 100 entities of type C, the next step picks all related entities of types B and D that are at distance 1. For the 100 entities of type C, there are 50 entities of type B and 200 entities of type D. The next step picks entities at distance 2, i.e., entities of type A related to the block built so far. Since the block built so far contains 50 entities of type B, there are only 25 entities of type A. The resulting block consists of 100 C's, 200 D's, 50 B's and 25 A's, for a total of 375 entities. The block was constructed within the required limit. Note that the important feature of the block so constructed is that no entities need to be duplicated in subsequent blocks, since all related entities have been included in the block.
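The arithmetic of this comparison is easy to check mechanically; a short sketch with the fan-outs of FIG. 29 hard-coded:

    class FanoutCheck {
        public static void main(String[] args) {
            int n = 400, k = 4;
            int c = n / k;  // 100 seed entities of type C (highest connectivity)
            int d = 2 * c;  // each C relates to 2 D's -> 200 D's
            int b = c / 2;  // each B relates to 2 C's -> 50 B's
            int a = b / 2;  // each A relates to 2 B's -> 25 A's
            System.out.println(c + d + b + a); // 375, within the limit of 400
        }
    }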
[0239] Although the invention has been described in terms of the
preferred and alternative embodiments described herein, those
skilled in the art will appreciate other alternative embodiments
which are within the spirit and scope of the invention disclosed
herein. All such alternative embodiments are intended to be
included within the scope of the appended claims.
* * * * *