U.S. patent application number 13/480279 was filed with the patent office on 2013-11-28 for systems and methods for generating a common data model for relational and object oriented databases.
This patent application is currently assigned to SAP AG. The applicant listed for this patent is Steffen Geissinger, Tobias Friedrich Zahn. Invention is credited to Steffen Geissinger, Tobias Friedrich Zahn.
Application Number | 20130318128 13/480279 |
Document ID | / |
Family ID | 49555886 |
Filed Date | 2013-11-28 |
United States Patent
Application |
20130318128 |
Kind Code |
A1 |
Zahn; Tobias Friedrich ; et
al. |
November 28, 2013 |
Systems and Methods for Generating a Common Data Model for
Relational and Object Oriented Databases
Abstract
In one embodiment, a computer-implemented method comprises
receiving a query in a controller. A data store stores data in a
relational database. The database comprises a plurality of fields
and a plurality of data elements. Each field is associated with a
portion of the data elements. The database is arranged as at least
one table, and the query identifies data of an object to be stored
in a data set. The method further comprises determining, by the
controller, whether the relational database includes a field
associated with a class of the object; and if the relational
database does not include a field associated with the class of the
object, adding a column to the relational database, the added
column having a field corresponding to the class. The method
further comprises adding the data of the object to the relational
database.
Inventors: |
Zahn; Tobias Friedrich;
(Stuttgart, DE) ; Geissinger; Steffen; (Walldorf,
DE) |
|
Applicant: |
Name |
City |
State |
Country |
Type |
Zahn; Tobias Friedrich
Geissinger; Steffen |
Stuttgart
Walldorf |
|
DE
DE |
|
|
Assignee: |
SAP AG
Walldorf
DE
|
Family ID: |
49555886 |
Appl. No.: |
13/480279 |
Filed: |
May 24, 2012 |
Current U.S.
Class: |
707/803 ;
707/E17.005; 707/E17.045 |
Current CPC
Class: |
G06F 16/211 20190101;
G06F 16/289 20190101; G06F 16/284 20190101 |
Class at
Publication: |
707/803 ;
707/E17.045; 707/E17.005 |
International
Class: |
G06F 17/30 20060101
G06F017/30 |
Claims
1. A computer-implemented method comprising: receiving a query, in
a controller, wherein a data store stores data in a relational
database, the database comprising a plurality of fields and a
plurality of data elements, wherein each field is associated with a
portion of the data elements, wherein the database is arranged as
at least one table, and wherein the query identifies data of an
object to be stored in a data set; determining, by a data modeling
engine, whether the relational database includes a field associated
with a class of the object; when the relational database does not
include a field associated with the class of the object as
determined by the data modeling engine, the data modeling engine
adding a column to the relational database, the added column having
a field corresponding to the class; and adding, by the data
modeling engine, the data of the object to the relational
database.
2. The method of claim 1 wherein the at least one table is a single
table including all objects.
3. The method of claim 1 wherein the at least one table includes a
first table and at least one second table for each sub-class of a
class of the first table.
4. The method of claim 1 wherein the at least one table includes a
main table and at least one partition table for each sub-class of a
class of the main table.
5. The method of claim 4 wherein the at least one table includes a
partition table for each sub-class of said at least one partition
table.
6. The method of claim 4 wherein the at least one table includes a
partition table for each sub-class of said at least one partition
table, said partition table includes columns of said at least one
partition table.
7. The method of claim 6 wherein the partition table for each
sub-class of said at least one partition table and includes at
least one additional column.
8. The method of claim 1 further comprising: determining whether a
first table of the at least one table has a size that exceeds a
threshold; and generating at least one partition table for each
sub-class of the first table if the size of the first table exceeds
the threshold.
9. The method of claim 1 further comprising generating a bit vector
indicative of whether an element of the table belongs to the
class.
10. The method of claim 1 further comprising compressing columns of
the at least one table.
11. The method of claim 1 further comprising indexing the at least
one table.
12. The method of claim 1 wherein the added column is based on the
inheritance of the class.
13. The method of claim 1 wherein the database is arranged as at
least one table in a column store format.
14. A computer readable medium embodying a computer program for
performing a method, said method comprising: receiving a query, in
a controller, wherein a data store stores data in a relational
database, the database comprising a plurality of fields and a
plurality of data elements, wherein each field is associated with a
portion of the data elements, wherein the database is arranged as
at least one table, and wherein the query identifies data of an
object to be stored in a data set; determining, by a data modeling
engine, whether the relational database includes a field associated
with a class of the object; when the relational database does not
include a field associated with the class of the object as
determined by the data modeling engine, the data modeling engine
adding a column to the relational database, the added column having
a field corresponding to the class; and adding, by the data
modeling engine, the data of the object to the relational
database.
15. The computer readable medium of claim 14 wherein the at least
one table is a single table including all objects.
16. The computer readable medium of claim 14 wherein the at least
one table includes a main table and at least one partition table
for each sub-class of a class of the main table.
17. A computer system comprising: one or more processors; a
controller, the controller receiving a query, wherein a data store
stores data in a relational database, the database comprising a
plurality of fields and a plurality of data elements, wherein each
field is associated with a portion of the data elements, wherein
the database is arranged as at least one table, and wherein the
query identifies data of an object to be stored in a data set; a
data modeling engine determining whether the relational database
includes a field associated with a class of the object; the data
modeling engine, when the relational database does not include a
field associated with the class of the object as determined by the
data modeling engine, adding a column to the relational database,
the added column having a field corresponding to the class; and the
data modeling engine adding the data of the object to the
relational database.
18. The computer system of claim 17 wherein the at least one table
is a single table including all objects.
19. The computer system of claim 17 wherein the at least one table
includes a main table and at least one partition table for each
sub-class of a class of the main table.
20. The computer system of claim 17 wherein the at least one table
includes a partition table for each sub-class of said at least one
partition table, said partition table includes columns of said at
least one partition table.
Description
BACKGROUND
[0001] The present invention relates to computing, and in
particular, to systems and methods for generating a common data
model for relational and object-oriented databases.
[0002] Most commercially used programming languages are
object-oriented languages or have similar features. Many programs
written in such languages make heavy use of databases and could not
function without them. However, most common database systems are
relational databases. Although mapping data stored in objects to
relational database systems is widely discussed, such mapping has
performance issues or a negative influence on software design.
[0003] One problem associated with using object databases that it
is not always possible to use them because object databases tend to
be slow and are not widely accepted. It is generally desirable to
use multiple databases of different types. However, object
databases typically are not compatible with existing relational
database software. Consequently, there exists a need for improved
systems and methods for using relational and object-oriented
databases. The present invention addresses this problem, and
others, by providing systems and methods for generating a common
data model for relational and object-oriented databases.
SUMMARY
[0004] Embodiments of the present invention include systems and
methods for performing direct reporting access to transactional
data stores. In one embodiment, the present invention includes a
computer-implemented method comprising receiving a query, in a
controller, wherein a data store stores data in a relational
database. The database comprises a plurality of fields and a
plurality of data elements. Each field is associated with a portion
of the data elements. The database is arranged as at least one
table, and the query identifies data of an object to be stored in a
data set. The method further includes determining, by the
controller, whether the relational database includes a field
associated with a class of the object; and if the relational
database does not include a field associated with the class of the
object, adding a column to the relational database, the added
column having a field corresponding to the class. The method
further includes adding the data of the object to the relational
database.
[0005] In one embodiment, the at least one table is a single table
including all objects.
[0006] In one embodiment, the at least one table includes a main
table and at least one partition table for each sub-class of a
class of the main table.
[0007] In one embodiment, the at least one table includes a
partition table for each sub-class of said at least one partition
table.
[0008] In one embodiment, the at least one table includes a
partition table for each sub-class of said at least one partition
table, said partition table includes columns of said at least one
partition table.
[0009] In one embodiment, the partition table for each sub-class of
said at least one partition table and includes at least one
additional column
[0010] In one embodiment, the method further includes determining
whether a first table of the at least one table has a size that
exceeds a threshold; and generating at least one partition table
for each sub-class of the first table if the size of the first
table exceeds the threshold.
[0011] In one embodiment the method further includes generating a
bit vector indicative of whether an element of the table belongs to
the class.
[0012] In one embodiment, the method further includes compressing
columns of the at least one table.
[0013] In one embodiment, the method further includes indexing the
at least one table.
[0014] In one embodiment, the added column is based on the
inheritance of the class.
[0015] In one embodiment, the database is arranged as at least one
table in a column store format.
[0016] In another embodiment, the present invention includes a
computer readable medium embodying a computer program for
performing a method and embodiments described above.
[0017] In another embodiment, the present invention includes a
computer system comprising one or more processors implementing the
techniques described herein. For example, the system includes a
controller that receives a query. A data store stores data in a
relational database. The database comprises a plurality of fields
and a plurality of data elements. Each field is associated with a
portion of the data elements. The database is arranged as at least
one table. The query identifies data of an object to be stored in a
data set.
[0018] The controller determining whether the relational database
includes a field associated with a class of the object. If the
relational database does not include a field associated with the
class of the object, the controller adds a column to the relational
database, the added column having a field corresponding to the
class. The controller adds the data of the object to the relational
database.
[0019] The following detailed description and accompanying drawings
provide a better understanding of the nature and advantages of the
present invention.
BRIEF DESCRIPTION OF THE DRAWINGS
[0020] FIG. 1 is a schematic representation of a system for
generating a common data model for relational and object-oriented
databases according to an embodiment of the present invention.
[0021] FIG. 2 illustrates an example table for a single table data
store according to an embodiment of the present invention.
[0022] FIG. 3 illustrates example tables for a partitioned table
data store according to an embodiment of the present invention.
[0023] FIG. 4 illustrates a process for generating a relational
database according to an embodiment of the present invention.
[0024] FIG. 5 illustrates a process generating multiple partition
tables for a relational database according to an embodiment of the
present invention.
[0025] FIG. 6 illustrates a process generating a table of a
relational database according to an embodiment of the present
invention.
[0026] FIG. 7 illustrates hardware used to implement embodiments of
the present invention.
DETAILED DESCRIPTION
[0027] Described herein are techniques for generating a common data
model for relational and object-oriented databases. The
apparatuses, methods, and techniques described below may be
implemented as a computer program (software) executing on one or
more computers. The computer program may further be stored on a
computer readable medium. The computer readable medium may include
instructions for performing the processes described below. In the
following description, for purposes of explanation, numerous
examples and specific details are set forth in order to provide a
thorough understanding of the present invention. It will be
evident, however, to one skilled in the art that the present
invention as defined by the claims may include some or all of the
features in these examples alone or in combination with other
features described below, and may further include modifications and
equivalents of the features and concepts described herein.
[0028] FIG. 1 is a schematic representation of a system 100 for
generating a common data model for relational and object-oriented
databases according to an embodiment of the present invention.
System 100 includes a user or other interface 105, a data store
108, and a data modeling system 112. In the following description,
the term "data store" is used interchangeably with "database." Data
store 108 may comprise one or more data stores. Data modeling
system 112 comprises a relational database 120, an object database
121, an index database 122, a relational database model 124, a data
modeling engine 125 and a controller 130.
[0029] Information is conveyed between user interface 105, data
store 108, and data modeling system 112, along data flow paths 132,
133, and 134. For example, data modeling system 112 accesses the
contents of database 108 over data flow path 134 when combining
data.
[0030] Relational database 120 and object database 121 are sets of
data that are stored in database 108 and accessed by data modeling
system 112. Object database 121 stores data based on objects. The
objects may be defined by a class, sub-classes, and inheritance
between classes and sub-classes. Index database 122 comprises an
index of relational database 120 and object database 121 for
facilitating searching. Data modeling engine 125 executes a process
or algorithm that analyzes data from relational database 120,
object database 121 and index database 122 and generates relational
database model 124 based on the analysis. Data modeling engine 125
analyzes various properties of the databases for generating the
relational database model 124. In some embodiments, relational
database model 124 is arranged in a column store format. In some
embodiments, relational database model 124 is arranged in a row
store format.
[0031] User or other interface 105 is a collection of one or more
data input/output devices for interacting with a human user or with
another data processing system to receive and output data. For
example, interface 105 can be a presentation system, one or more
software applications, or a data communications gateway, for
example. Data flow path 132 is data communicated over interface 105
that retrieves data from or causes a change to data stored in
database 108. Such changes include the insertion, deletion, or
modification of all or a portion of the contents of database 108.
Data output over interface 105 can present the results of data
processing activities in system 100. For example, data flow path
133 can convey the results of queries or other operations performed
on data modeling system 112 for presentation on a monitor or a data
communications gateway.
[0032] Data store 108 is a collection of information that is stored
at one or more data machine readable storage devices (e.g., data
stores). Data store 108 may be a single data store or multiple data
stores, which may be coupled to one or more software applications
for storing application data. Data store 108 may store data as a
plurality of data records. Each data record comprises a plurality
of data elements (e.g., fields of a record). Data store 108 may
include different structures and their relations (e.g., data store
tables, data records, fields, and foreign key relations).
Additionally, different structures and fields may include data
types, descriptions, or other metadata, for example, which may be
different for different data records. Data flow path 134 conveys
information describing changes to data stored in data store 108
between data modeling system 112 and data store 108 Such changes
include the insertion, deletion, and modification of all or a
portion of the contents of one or more data stores.
[0033] Data modeling system 112 is a collection of data processing
activities (e.g., one or more data analysis programs or methods)
performed in accordance with the logic of a set of machine-readable
instructions. The data processing activities can include running
queries on the contents of data store 108. The results of such
queries can be aggregated to yield an aggregated result set. A
query is a request for information. A result set is a set of
information that answers a query. An aggregated result set is a set
of information from a data store that answers a query, such as from
data store 108. The set of information in an aggregated result set
can be, for example, a union of the results of independent queries
on two or more data sets. The aggregated result sets can be
conveyed to interface 105 over data flow path 133. Interface 105
can, in turn, render the aggregated result sets over an output
device for a human or other user or to other systems. This output
of aggregated result sets drawn from data modeling system 112,
based on data from data store 108, allows system 100 to accurately
portray the combination of different data sets.
[0034] Queries from the data combination engine 125 or the user
interface 105 may be received by controller 130. Controller 130 may
be a component on the same system as a data store or part of a
different system and may be implemented in hardware, software, or
as a combination of hardware and software, for example. Controller
130 receives a query from data combination engine 125 and generates
one or more requests based on the received query depending on the
data stores 108 and data sets that are to be accessed. Data store
108 transforms the request from controller 130 into a query syntax
(e.g., SQL) compatible with the data store, and the query may
specify specific tables and fields to be read from the data
store.
[0035] Controller 130 receives data from data store 108. In
responding to the query from data combination engine 125,
controller 130 may aggregate the data of the data sets from data
store 108. The aggregation may be implemented with a join
operation, for example. Finally, controller 130 returns the
aggregated data to data combination engine 125 in response to the
query.
[0036] FIG. 2 illustrates a single table 200 in accordance with one
embodiment of the present invention. Table 200 stores data for one
class and all sub-classes of the class. In the illustrative
example, the class is employee. Sub-classes are, for example,
department, title and union status. Table 200 includes columns for
all fields of all objects. If one field is only present in one
class, the column is there for all the other classes as well. In
those cases, the field is unused and set to null. In some
embodiments, data modeling engine 125 compresses the unused values.
For example, only one employee (Tim Thomas) shown in the list is a
member of the union. The values for all other shown employees are
null. The column of union status may be compressed.
[0037] In some embodiments, data modeling engine 125 uses index
database 122 for searching for a selected object type. In some
embodiments, table 200 further includes a column that includes
information about the object type of a data entry for each class of
the object stored in table 200. In some embodiments, the column is
a bit-vector. The bit-vector indicates whether the row of table 200
belongs to the class of the vector. Data modeling engine 125
performs a logical AND of the bit vector with the corresponding
class column. Data modeling engine 125 uses the bit-vector to
accelerate the search on specific object types, or to sort table
200 for compression. In some embodiments, data modeling engine 125
generates a one bit vector for each transaction during
transactional processing. The one bit vector defines whether the
row corresponding to the vector is seen by the transaction. This
provides a consistent view during transactional processing, because
data modeling engine 125 may insert new rows into table 200 during
parallel transactions, which may not be visible for already running
transactions. Data modeling engine 125 determines the visible rows
by performing a logical AND operation with the consistent view bit
vector of the transaction and the bit vectors of the class.
[0038] Data modeling engine 125 may sort table 200 according to the
class hierarchy resulting in all objects of the same class are next
to each other in table 200. This sorting allows higher compression
rates using block compression. In this instance, data modeling
engine 125 compresses the null values for the higher
compression.
[0039] FIG. 3 illustrates a table 300 that is partitioned into
tables based on sub-classes. Main table 300 is a main table that
represents the super-class for all other objects of object database
121. In the illustrative example of FIG. 3, main table 300 includes
three additional partitions. Partition tables are generated for
each sub-class of a class in the main table. Although the
partitions are described based on class, other attributes of the
objects may be used for partitioning. Each partition has a
different number of columns A partition has always the same columns
as its super-class partition and might add its own columns In the
illustrative example of FIG. 3, a partition table 310 and a
partition table 320 are partitions for sub-classes of the class
represented as main table 300. A partition table 321 is a partition
for a sub-class of partition table 320.
[0040] Partition table 310, partition table 320, and partition
table 321 include the columns 311 of main table 300. Partition
table 310 includes an additional column 312. Partition table 320
includes additional columns 322, which in the illustrative example
is two different columns Partition table 321 includes columns 311
and columns 322 of partition table 320, and further includes an
additional column 323. Data modeling engine 125 may generate
partition table 310, partition table 320 and partition table 321
based on the inheritance of the classes and sub-classes of main
table 300.
[0041] In some embodiments, the partition tables are generated
based on the inheritance of a class. In some embodiments, a
partition tables may be generated from more than one class or
sub-class.
[0042] FIG. 4 illustrates a process for generating relational
database model 124 according to an embodiment of the present
invention. At 402, data modeling system 112 receives an object
database 121. At 404, data modeling engine 125 determines whether
the size of relational database model 124 created from object
database 121 exceeds a threshold. If the size exceeds the
threshold, at 408, data modeling engine 125 generates a partitioned
database as described below in conjunction with FIG. 5. Otherwise,
at 410, data modeling engine 125 generates a single table such as
table 200. Data modeling engine 125 may generate the table 200
using the process described below in conjunction with FIG. 6. In
some embodiments, data modeling engine 125 does not determine the
size of the database model 124 at 406, and generates a single table
at 410 regardless of the size of relational database model 124. In
some embodiments, at 412, data modeling engine 125 generates
vectors for the class. In some embodiments, at 414, data modeling
engine 125 creates or updates index database 122.
[0043] FIG. 5 illustrates a process for generating relational
database model 124 includes a multiple table according to an
embodiment of the present invention. In the process of FIG. 5, data
modeling engine 125 generates a main table and iteratively checks
whether the table has any sub-classes and continues generating
partition tables at lower levels. Data modeling engine 125 may
generate the table 300 and the partition tables 310, 320 and 321
using the process described below in conjunction with FIG. 6. When
a table does not have a sub-class, data modeling engine 125 moves
up a level and checks the partition table. Data modeling engine 125
moves up levels as partitions are finished for the table. In an
illustrative example, FIG. 5 is described for the multiple table of
FIG. 3. Although FIG. 5, is described for a partition for each
class, in some embodiments, the partition tables are generated
based on the inheritance of a class. In some embodiments, a
partition tables may be generated from more than one class or
sub-class. In some embodiments, data modeling engine 125 generates
a standard table instead of a partition table.
[0044] At 502, data modeling engine 125 generates a main table
(main table 300). At 504, data modeling engine 125 determines
whether the current table being evaluated (table 300 in the first
iteration) has any sub-classes. If so, at 506, data modeling engine
125 generates a partition table (table 310 in the first instance)
based on the sub-class. At 508, data modeling engine 125 determines
whether the partition table generated at 506 as a sub-class. In the
illustrative example, table 310 does not have any sub-classes. At
510, data modeling engine 125 increases the level of the partition
table under evaluation (now table 300) and at 504 determines
whether the table 300 has another sub-class. In this example, table
300 has a second sub-class, and at 506, data modeling engine 125
generates a sub-level partition table 320. At 508, data modeling
engine 125 determines that table 320 has a sub-class, and at 506
generates table 321. At 508, data modeling engine 125 determines
that table 321 has no sub-classes, and at 510 increases a level. At
504, data modeling engine 125 determines that table 320 has no
further sub-classes. At 512, data modeling engine 125 determines
that the level is not the last level. At 510, data modeling engine
125 increases the level to table 300. At 504, data modeling engine
125 determines that table 300 has no further sub-classes, and at
512, determines table 300 is the last level. At 512, data modeling
engine 125 ends the generation of the table and returns to 408
(FIG. 4).
[0045] In some embodiments of FIG. 5, a hybrid partitioning may be
performed. The partition tables may be partitioned when the
partitioned table exceeds a threshold size. In some embodiments of
FIG. 5, data modeling engine 125 may search for objects of a class
and, within the same query, search for objects of a super classes,
sub-classes or both.
[0046] FIG. 6 illustrates a process generating a table of a
relational database according to an embodiment of the present
invention. At 602, data modeling engine 125 determines whether
relational database 124 includes a field associated with a class of
the object. If, at 602, relational database 124 does not include a
field associated with the class of the object, at 604, data
modeling engine 125 adds a column to relational database 124. The
added column has a field corresponding to the class. In some
embodiments, data modeling engine 125 stores the data for the field
in metadata fields associated with the database or in index
database 122. At 606, data modeling engine 125 adds the data of the
object to relational database 124.
[0047] FIG. 7 illustrates hardware used to implement embodiments of
the present invention. An example computer system 710 is
illustrated in FIG. 7. Computer system 710 includes a bus 705 or
other communication mechanism for communicating information, and
one or more processors 701 coupled with bus 705 for processing
information. Computer system 710 also includes a memory 702 coupled
to bus 705 for storing information and instructions to be executed
by processor 701, including information and instructions for
performing the techniques described above, for example. This memory
may also be used for storing variables or other intermediate
information during execution of instructions to be executed by
processor 701. Possible implementations of this memory may be, but
are not limited to, random access memory (RAM), read only memory
(ROM), or both. A machine readable storage device 703 is also
provided for storing information and instructions. Common forms of
storage devices include, for example, a non-transitory
electromagnetic medium such as a hard drive, a magnetic disk, an
optical disk, a CD-ROM, a DVD, a flash memory, a USB memory card,
or any other medium from which a computer can read. Storage device
703 may include source code, binary code, or software files for
performing the techniques above, for example. Storage device 703
and memory 702 are both examples of computer readable mediums.
[0048] Computer system 710 may be coupled via bus 705 to a display
712, such as a cathode ray tube (CRT) or liquid crystal display
(LCD), for displaying information to a computer user. An input
device 711 such as a keyboard and/or mouse is coupled to bus 705
for communicating information and command selections from the user
to processor 701. The combination of these components allows the
user to communicate with the system, and may include, for example,
user interface 105. In some systems, bus 705 may be divided into
multiple specialized buses.
[0049] Computer system 710 also includes a network interface 704
coupled with bus 705. Network interface 704 may provide two-way
data communication between computer system 710 and the local
network 720, for example. The network interface 704 may be a
digital subscriber line (DSL) or a modem to provide data
communication connection over a telephone line, for example.
Another example of the network interface is a local area network
(LAN) card to provide a data communication connection to a
compatible LAN. Wireless links are another example. In any such
implementation, network interface 704 sends and receives
electrical, electromagnetic, or optical signals that carry digital
data streams representing various types of information.
[0050] Computer system 710 can send and receive information,
including messages or other interface actions, through the network
interface 704 across a local network 720, an Intranet, or the
Internet 730. For a local network, computer system 710 may
communicate with a plurality of other computer machines, such as
server 715. Accordingly, computer system 710 and server computer
systems represented by server 715 may be programmed with processes
described herein. In the Internet example, software components or
services may reside on multiple different computer systems 710 or
servers 731-735 across the network. Some or all of the processes
described above may be implemented on one or more servers, for
example. Specifically, data store 108 and data combination system
112 might be located on different computer systems 710 or one or
more servers 715 and 731-735, for example. A server 731 may
transmit actions or messages from one component, through Internet
730, local network 720, and network interface 704 to a component on
computer system 710. The software components and processes
described above may be implemented on any computer system and send
and/or receive information across a network, for example.
[0051] The above description illustrates various embodiments of the
present invention along with examples of how aspects of the present
invention may be implemented. The above examples and embodiments
should not be deemed to be the only embodiments, and are presented
to illustrate the flexibility and advantages of the present
invention as defined by the following claims. Based on the above
disclosure and the following claims, other arrangements,
embodiments, implementations and equivalents will be evident to
those skilled in the art and may be employed without departing from
the spirit and scope of the invention as defined by the claims.
* * * * *