U.S. patent number RE46,537 [Application Number 14/877,992] was granted by the patent office on 2017-09-05 for querying an object-oriented data hierarchy using relational query languages.
This patent grant is currently assigned to Semmle Limited. The grantee listed for this patent is Semmle Limited. Invention is credited to Oege de Moor, Elnar Hajiyev, Mathieu Verbaere.
United States Patent RE46,537
Verbaere, et al.
September 5, 2017
Please see images for: (Certificate of Correction)
Querying an object-oriented data hierarchy using relational query
languages
Abstract
A novel system, computer readable storage medium and method for
creating re-usable queries over complex data including hierarchies,
trees and graphs is described. This is achieved by an
object-oriented query language, where a class is a logical property
of a data item, and inheritance is implication between such
properties. Virtual method calls execute all relevant method
implementations in most specific classes. Expressions can be
multi-valued, thus avoiding the need for naming intermediate
results. All constructs closely follow the syntax of mainstream
object-oriented languages like Java.
Inventors: Verbaere; Mathieu (Oxford, GB), de Moor; Oege (Oxford, GB), Hajiyev; Elnar (Oxford, GB)
Applicant:
Name | City | State | Country | Type
Semmle Limited | Oxford | N/A | GB |
Assignee: Semmle Limited (N/A)
Family ID: 40845382
Appl. No.: 14/877,992
Filed: October 8, 2015
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number | Issue Date
12349761 | Apr 3, 2012 | 8150866 |
61019376 | Jan 7, 2008 | |
Reissue of: 13423766 | Mar 19, 2012 | 8554782 | Oct 8, 2013
Current U.S. Class: 1/1
Current CPC Class: G06F 16/243 (20190101); G06F 16/2425 (20190101)
Current International Class: G06F 17/30 (20060101)
Field of Search: 707/759,760,761,762,763
References Cited
Other References
Oege de Moor et al., "Keynote Address: .QL for Source Code
Analysis", 7th IEEE International Working Conference on Source Code
Analysis and Manipulation, IEEE Computer Society, Sep. 30, 2007.
Cited by applicant.
Oege de Moor et al., ".QL: Object-Oriented Queries Made Easy",
GTTSE, Jul. 2-7, 2007, Braga, Portugal. Cited by applicant.
Primary Examiner: Peikari; Behzad
Attorney, Agent or Firm: Fish & Richardson P.C.
Parent Case Text
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is .[.based on and claims priority from.]. .Iadd.a
continuation of .Iaddend.U.S. patent application Ser. No.
12/349,761 filed on Jan. 7, 2009, which is based on and claims
priority from U.S. Provisional Patent Application No. 61/019,376,
filed on Jan. 7, 2008, the entire disclosure of each of the
above-referenced patent applications is hereby incorporated by
reference in its entirety.
Claims
What is claimed is:
1. A process .[.for creating re-usable queries over complex data,
the process.]. comprising: .[.defining a re-usable
object-oriented.]. .Iadd.receiving queries defined in a first
.Iaddend.query language .[.with.]. .Iadd.in which logical
properties are represented by .Iaddend.object-oriented classes
.[.treated as logical properties.]., .[.wherein each logical
property is defined by one or more specified classes of at least
one complex data item.]. .Iadd.and in which inheritance
relationships between the classes represent relationships between
the logical properties, and wherein the queries include requests
for data items having requested logical properties.Iaddend.;
.[.relating inheritance, used to form new class with the specified
classes, to the logical properties along with a plurality of
virtual method calls that execute all applicable method
implementations in at least one of the specified classes;.]. and
.Iadd.in response to the queries requesting data items having
logical properties encoded as methods of particular classes,
.Iaddend.dispatching on the .[.logical properties.].
.Iadd.particular classes .Iaddend.to choose .[.at least one.].
.Iadd.one or more .Iaddend.most specific .[.implementation.].
.Iadd.implementations .Iaddend.of .[.an operation.]. .Iadd.the
methods.Iaddend., wherein a most specific implementation .Iadd.of a
method .Iaddend.is in a class where no .[.subclasses define a same
method.]. .Iadd.other classes that inherit from the class include
an implementation of the method.Iaddend., wherein the dispatching
on the .[.logical properties.]. .Iadd.particular classes
.Iaddend.further includes: .[.defining.]. .Iadd.identifying
.Iaddend.a named predicate p as predicate $p(T_1\,x_1, \ldots, T_n\,x_n)$
{formula} for each one of the .[.applicable
method.]. .Iadd.most specific .Iaddend.implementations, wherein the
named predicate p has variables $x_1, \ldots, x_n$ so as to
restrict a range of a relation, which contains tuples
$(x_1, \ldots, x_n)$ where each $x_i$ has a type .[.T.].
.Iadd.$T_i$.Iaddend.; testing a receiver against each
.[.relevant.]. type and choosing .[.the.]. applicable method
implementations .Iadd.from the most specific
implementations.Iaddend., wherein testing a type of the receiver
.[.is achieved by.]. .Iadd.includes .Iaddend.invoking a named
constructor for each class defining a .[.method of a relevant
signature.]. .Iadd.most specific implementation.Iaddend.,
regardless of static types; .[.implementing queries by.].
translating .[.actual or possibly recursive object-oriented.].
.Iadd.the .Iaddend.queries to a .Iadd.second .Iaddend.query
language using for each method and constructor of each class, a
first special variable that holds a value that is a member of the
class, and for each method that is not a predicate, a second
special variable that holds a result of a method; and .[.rending.].
.Iadd.rendering the .Iaddend.queries concise by at least using
multi-valued operations to avoid naming intermediate results in
nested method calls.
2. The process as set forth in claim 1, wherein .Iadd.translating
.Iaddend.the .[.implementing.]. queries includes mapping each
constructor to a named characteristic predicate of its class, which
tests whether a value belongs to the class, by taking a logical
conjunction of characteristic predicates of all superclasses and a
property specified in the constructor.
3. The process as set forth in claim .[.1.]. .Iadd.2.Iaddend.,
wherein .Iadd.translating .Iaddend.the .[.implementing.]. queries
includes mapping instanceof tests and cast expressions to calls of
.[.such.]. .Iadd.each .Iaddend.characteristic predicates.
4. The process as set forth in claim 1, wherein .Iadd.translating
.Iaddend.the .[.implementing.]. queries includes mapping
expressions including nested method calls to a conjunction of
predicate calls, with a new named variable denoting an intermediate
result of each method call.
5. The process as set forth in claim 1, wherein .[.the implementing
queries by.]. translating .[.actual or possibly recursive
object-oriented.]. .Iadd.the .Iaddend.queries to the .Iadd.second
.Iaddend.query language produces queries in Datalog and wherein at
least one search engine is targeted by translating a same
intermediate Datalog to different executable formats.
6. The process as set forth in claim 1, wherein .Iadd.rendering
.Iaddend.the queries .[.are rendered.]. concise .[.by.].
.Iadd.includes .Iaddend.at least one of: using a * notation and a +
notation on multi-valued operations to indicate a chained
application of such operations; .[.and.]. .Iadd.or .Iaddend. using
an aggregate notation with variables, a range condition and an
expression (agg(vars|condition|expr)).
7. The process as set forth in claim 1, wherein .[.complex data.].
.Iadd.at least one of the data items .Iaddend.is stored in a
.[.standard relational database.]. .Iadd.data storage.Iaddend., and
.[.column types are used to build a hierarchy of object-oriented
classes on top of an existing relational database, by annotating an
existing relational schema.]. .Iadd.relationships between the
logical properties are identified in a data format definition for
the data storage storing the data item.Iaddend..
8. The process as set forth in claim 1, wherein at least one
.Iadd.of the data items is a .Iaddend.complex data item .[.is at.].
.Iadd.and includes .Iaddend.least one of: a data item in a
hierarchy; a tree; .[.and.]. .Iadd.or .Iaddend. a graph.
9. The process as set forth in claim 1, wherein .[.an
object-oriented query language is in any form depending on a domain
of object-oriented query language use, including.].
.Iadd.instructions in the first query language are represented in
.Iaddend.at least one of: a textual form; a graphical form; a
query-by-example .[.user interface.]. .Iadd.form.Iaddend.;
.[.and.]. .Iadd.or an .Iaddend. embedded .[.in a more general.].
.Iadd.form within another .Iaddend.programming language.
10. The process as set forth in claim .[.1.]. .Iadd.7.Iaddend.,
wherein .[.complex data is stored in a plurality of datasource
formats including.]. .Iadd.the data storage includes .Iaddend.at
least one of: a relational database; a web service; a web page;
.[.and.]. .Iadd.or .Iaddend. a .Iadd.structured data .Iaddend.file
.[.on hard disk and column types are used to build a hierarchy of
object-oriented classes on top of such data formats, by annotating
each relevant data format schema.]..
11. A computer program product .[.for creating re-usable queries
over complex data, the computer program product.]. comprising: a
non-transitory computer readable storage medium, the non-transitory
computer readable storage medium storing software programming
instructions capable of performing with a processor programming
code to carry out: .[.defining a re-usable object-oriented.].
.Iadd.receiving queries defined in a first .Iaddend.query language
.[.with.]. .Iadd.in which logical properties are represented by
.Iaddend.object-oriented classes .[.treated as logical
properties.]., .[.wherein each logical property is defined by one
or more specified classes of at least one complex data item.].
.Iadd.and in which inheritance relationships between the classes
represent relationships between the logical properties, and wherein
the queries include requests for data items having requested
logical properties.Iaddend.; .[.relating inheritance, used to form
new class with the specified classes, to the logical properties
along with a plurality of virtual method calls that execute all
applicable method implementations in at least one of the specified
classes;.]. and .Iadd.in response to the queries requesting data
items having logical properties encoded as methods of particular
classes, .Iaddend.dispatching on the .[.logical properties.].
.Iadd.particular classes .Iaddend.to choose .[.at least one.].
.Iadd.one or more .Iaddend.most specific .[.implementation.].
.Iadd.implementations .Iaddend.of .[.an operation.]. .Iadd.the
methods.Iaddend., wherein a most specific implementation .Iadd.of a
method .Iaddend.is in a class where no .[.subclasses define a same
method.]. .Iadd.other classes that inherit from the class include
an implementation of the method.Iaddend., wherein the dispatching
on the .[.logical properties.]. .Iadd.particular classes
.Iaddend.further includes: .[.defining.]. .Iadd.identifying
.Iaddend.a named predicate p as predicate $p(T_1\,x_1, \ldots, T_n\,x_n)$
{formula} for each one of the .[.applicable method.]. .Iadd.most
specific .Iaddend.implementations, wherein the named predicate p
has variables $x_1, \ldots, x_n$ so as to restrict a range of a
relation, which contains tuples $(x_1, \ldots, x_n)$ where each $x_i$ has
a type $T_i$; testing a receiver against each .[.relevant.]. type and
choosing .[.the.]. applicable method implementations .Iadd.from the
most specific implementations.Iaddend., wherein testing a type of
the receiver .[.is achieved by.]. .Iadd.includes .Iaddend.invoking
a named constructor for each class defining a .[.method of a
relevant signature.]. .Iadd.most specific implementation.Iaddend.,
regardless of static types; .[.implementing queries by.].
translating .[.actual or possibly recursive object-oriented.].
.Iadd.the .Iaddend.queries to a .Iadd.second .Iaddend.query
language using for each method and constructor of each class, a
first special variable that holds a value that is a member of the
class, and for each method that is not a predicate, a second
special variable that holds a result of a method; and .[.rending.].
.Iadd.rendering the .Iaddend.queries concise by at least using
multi-valued operations to avoid naming intermediate results in
nested method calls.
12. The computer program product of claim 11, wherein
.Iadd.translating .Iaddend.the .[.implementing.]. queries includes
mapping each constructor to a named characteristic predicate of its
class, which tests whether a value belongs to the class, by taking
a logical conjunction of characteristic predicates of all
superclasses and a property specified in the constructor.
13. The computer program product of claim .[.11.].
.Iadd.12.Iaddend., wherein .Iadd.translating .Iaddend.the
.[.implementing.]. queries includes mapping instanceof tests and
cast expressions to calls of .[.such.]. .Iadd.each
.Iaddend.characteristic .[.predicates.].
.Iadd.predicate.Iaddend..
14. The computer program product of claim 11, wherein
.Iadd.translating .Iaddend.the .[.implementing.]. queries includes
mapping expressions including nested method calls to a conjunction
of predicate calls, with a new named variable denoting an
intermediate result of each method call.
15. The computer program product of claim 11, wherein .[.the
implementing queries by.]. translating .[.actual or possibly
recursive object-oriented.]. .Iadd.the .Iaddend.queries to the
.Iadd.second .Iaddend.query language produces queries in Datalog
and wherein at least one search engine is targeted by translating a
same intermediate Datalog to different executable formats.
16. An information processing system .[.for creating re-usable
queries over complex data, the system.]. comprising: a computer
memory capable of storing machine instructions; and a processor in
communication with said computer memory, said processor configured
to access the memory.[., the processor performing a re-usable
object-oriented query language defined with object-oriented classes
treated as logical properties, wherein each logical property is
defined by one or more specified classes of at least one complex
data item; a new class formed by relating inheritance with the
specified classes, to the logical properties along with a plurality
of virtual method calls that execute all applicable method
implementations in at least one of the specified classes; and.].
.Iadd.and execute the machine instructions to perform operations
comprising: .Iaddend. .Iadd.receiving queries defined in a first
query language in which logical properties are represented by
object-oriented classes, and in which inheritance relationships
between the classes represent relationships between the logical
properties, and wherein the queries include requests for data items
having requested logical properties; .Iaddend. .[.means for.].
.Iadd.in response to the queries requesting data items having
logical properties encoded as methods of particular classes,
.Iaddend.dispatching on the .[.logical properties.].
.Iadd.particular classes .Iaddend.to choose .[.at least.]. one
.Iadd.or more .Iaddend.of a most specific .[.implementation.].
.Iadd.implementations .Iaddend.of .[.an operation.]. .Iadd.the
methods.Iaddend., wherein a most specific implementation .Iadd.of a
method .Iaddend.is in class where no .[.subclasses define a same
method.]. .Iadd.other classes that inherit from the class include
an implementation of the method.Iaddend., wherein the dispatching
on the .[.logical properties.]. .Iadd.particular classes
.Iaddend.further includes: .[.means for defining.].
.Iadd.identifying .Iaddend.a named predicate p as predicate
$p(T_1\,x_1, \ldots, T_n\,x_n)$ {formula} for each one of
the .[.applicable method.]. .Iadd.most specific
.Iaddend.implementations, wherein the named predicate p has
variables $x_1, \ldots, x_n$ so as to restrict a range of a
relation, which contains tuples $(x_1, \ldots, x_n)$ where
each $x_i$ has a type $T_i$; .[.means for.]. testing a receiver
against each .[.relevant.]. type and choosing .[.the.]. applicable
method implementations .Iadd.from the most specific
implementations.Iaddend., wherein testing a type of the receiver
.[.is achieved by.]. .Iadd.includes .Iaddend.invoking a named
constructor for each class defining a .[.method of a relevant
signature.]. .Iadd.most specific implementation.Iaddend.,
regardless of static types; .[.means for implementing queries by.].
translating .[.actual or possibly recursive object-oriented.].
.Iadd.the .Iaddend.queries to a .Iadd.second .Iaddend.query
language using for each method and constructor of each class, a
first special variable that holds a value that is a member of the
class, and for each method that is not a predicate, a second
special variable that holds a result of a method; and .[.means for
rending.]. .Iadd.rendering the .Iaddend.queries concise by at least
using multi-valued operations to avoid naming intermediate results
in nested method calls.
17. The system of claim 16, wherein .Iadd.translating .Iaddend.the
.[.implementing.]. queries includes mapping each constructor to a
named characteristic predicate of its class, which tests whether a
value belongs to the class, by taking a logical conjunction of
characteristic predicates of all superclasses and a property
specified in the constructor.
18. The system of claim .[.16.]. .Iadd.17.Iaddend., wherein
.Iadd.translating .Iaddend.the .[.implementing.]. queries includes
mapping instanceof tests and cast expressions to calls of
.[.such.]. .Iadd.each .Iaddend.characteristic predicates.
19. The system of claim 16, wherein .Iadd.translating .Iaddend.the
.[.implementing.]. queries includes mapping expressions including
nested method calls to a conjunction of predicate calls, with a new
named variable denoting an intermediate result of each method
call.
20. The system of claim 16, wherein .[.the implementing queries
by.]. translating .[.actual or possibly recursive
object-oriented.]. .Iadd.the .Iaddend.queries to the .Iadd.second
.Iaddend.query language produces queries in Datalog and wherein at
least one search engine is targeted by translating a same
intermediate Datalog to different executable formats.
Description
FIELD OF THE INVENTION
The present invention relates generally to information retrieval,
in particular the way electronically stored data is accessed via
queries that are formulated in a programming language. Such a
special programming language for queries is commonly called a query
language, and is the usual means for creating queries over
data.
BACKGROUND OF THE INVENTION
Those skilled in the art know that SQL (Structured Query Language)
is the most popular query language when the data has been stored in
a relational database. Another example of a query language is
XQuery, for data that has been stored in XML format.
The principal use for such query languages has traditionally been
as part of a larger software system, where the application program
issues queries (in SQL or XQuery) to an information retrieval
component. In this usage scenario, queries are rarely written by
hand; instead they are generated by the software system itself.
Where they are written by hand, the author is typically an expert
who is well-trained in query technologies.
Because of this usage within larger systems by experts, it has been
possible to simultaneously design queries and the format of the
data to be searched. If a query is awkward to express, or
inefficient, the representation of the data can be adapted to
circumvent these problems.
The above assumptions are, however, no longer valid in certain
recent applications, such as a query interface to a wiki site (i.e.,
a site that allows users to freely create and edit Web page content
using any Web browser; a wiki supports hyperlinks and has a simple
text syntax for creating new pages and crosslinks between internal
pages on the fly). In such applications, many ad hoc queries are
written by non-expert users, and the design of the data
representation cannot be changed to facilitate queries. Since these
ad hoc queries are written by people and not by systems, it is
important to be able to draw on libraries of existing queries, so
that awkward details of the data representation can be encapsulated
as common operations, allowing the query author to pose questions in
the vocabulary of the problem domain that is familiar to the author.
Accordingly, what is desired, and which has not until now been
developed, is a method and apparatus, embodied as a query language,
that allows the construction of re-usable queries, so that
non-experts can phrase questions in the vocabulary of the problem
domain. Furthermore, queries in such a language should be concise
and easy to read. Finally, the language should be close in syntax to
mainstream programming languages, so that it is easy to learn for
those who already have some programming experience.
SUMMARY OF THE INVENTION
The present invention provides a means for creating re-usable
queries over complex data, in particular hierarchies, trees and
graphs. This is achieved by an object-oriented query language,
where a class is a logical property of a data item, and inheritance
is implication between such properties. Virtual method calls
execute all relevant method implementations in most specific
classes. Expressions can be multi-valued, thus avoiding the need
for naming intermediate results. Methods can be recursive, thus
enabling queries over recursive data. All constructs closely follow
the syntax of mainstream object-oriented languages like Java.
In one embodiment, the present invention provides a system, a
computer readable storage medium and a method for creating
re-usable queries over complex data. The method includes defining a
re-usable object-oriented query language with object-oriented
classes treated as logical properties, wherein each logical
property is defined by one or more specified classes of at least
one complex data item. Inheritance, which is used to form a new
class from the specified classes, is related to the logical
properties, along with a plurality of virtual method calls that
execute all applicable method implementations in at least one of the
specified classes. Dispatching is used on the logical properties to
choose at least one most specific implementation of an operation,
wherein a most specific implementation is in a class where no
subclasses define the same method.
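As an illustration only, the dispatch rule described above can be sketched in Python rather than in the query language of the invention. The classes Employee, Manager and Cyclist, the chair colours, and the helper names QClass and dispatch are hypothetical; only the method name getChairColor is borrowed from the figure descriptions. The sketch assumes that an implementation applies to a receiver when the receiver satisfies the class's property and no subclass that the receiver also satisfies defines the same method.

```python
class QClass:
    """A class modelled as a logical property (hypothetical helper)."""

    def __init__(self, name, supers, prop, methods=None):
        self.name = name
        self.supers = supers          # direct superclasses
        self.prop = prop              # property stated by this class itself
        self.methods = methods or {}  # method name -> function(value) -> set of results

    def contains(self, value):
        # Inheritance as implication: a member must also satisfy every superclass.
        return all(s.contains(value) for s in self.supers) and self.prop(value)

    def is_subclass_of(self, other):
        return self is other or any(s.is_subclass_of(other) for s in self.supers)


def dispatch(classes, method, receiver):
    """Run every most specific implementation of `method` that applies to `receiver`."""
    applicable = [c for c in classes
                  if method in c.methods and c.contains(receiver)]
    # Drop an implementation when a proper subclass that also applies defines the method.
    most_specific = [c for c in applicable
                     if not any(d is not c and d.is_subclass_of(c) for d in applicable)]
    results = set()
    for c in most_specific:
        results |= c.methods[method](receiver)
    return results


# Hypothetical hierarchy and data, for illustration only.
employee = QClass("Employee", [], lambda v: True,
                  {"getChairColor": lambda v: {"grey"}})
manager = QClass("Manager", [employee], lambda v: v["reports"] > 0,
                 {"getChairColor": lambda v: {"black"}})
cyclist = QClass("Cyclist", [employee], lambda v: v["cycles"],
                 {"getChairColor": lambda v: {"green"}})
classes = [employee, manager, cyclist]

alice = {"reports": 3, "cycles": True}   # satisfies Manager and Cyclist
bob = {"reports": 0, "cycles": False}    # satisfies only Employee
```

Because the receiver alice satisfies two unrelated subclasses, the call returns the results of both implementations, which is how a virtual call can be multi-valued; bob falls back to the Employee implementation.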
The method in another embodiment includes implementing queries by
translating possibly recursive object-oriented queries to pure
Datalog using for each method and constructor of each class, a
first special variable that holds a value that is a member of the
class, and for each method that is not a predicate, a second
special variable that holds a result of a method.
Further in this embodiment, the dispatching on the logical
properties further includes: defining a named predicate p as
predicate $p(T_1\,x_1, \ldots, T_n\,x_n)$ {formula} for each one of the
applicable method implementations, wherein the named predicate p
has variables $x_1, \ldots, x_n$ so as to restrict a range of a
relation, which contains tuples $(x_1, \ldots, x_n)$ where each $x_i$ has
a type $T_i$; and testing a receiver against each relevant type and
choosing the applicable method implementations, wherein testing a
type of the receiver is achieved by invoking a named constructor
for each class defining a method of a relevant signature,
regardless of static types.
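The way a typed named predicate restricts the range of a relation can be sketched as follows. This is a minimal Python illustration, not the implementation of the invention: types are assumed to be membership tests, and the names typed_predicate, extent, is_str, is_int and shorter_than are hypothetical.

```python
from itertools import product


def typed_predicate(types, formula):
    """Model p(T1 x1, ..., Tn xn) {formula}: a tuple is in the relation only if
    each x_i passes the membership test T_i and the formula holds."""
    def p(*xs):
        if len(xs) != len(types):
            raise TypeError("arity mismatch")
        # The type tests run first, so the formula only sees well-typed tuples.
        return all(t(x) for t, x in zip(types, xs)) and formula(*xs)
    return p


def extent(p, universe, arity):
    """Enumerate the tuples of the relation over a finite universe of values."""
    return {xs for xs in product(universe, repeat=arity) if p(*xs)}


# Hypothetical types and predicate, for illustration only.
is_str = lambda v: isinstance(v, str)
is_int = lambda v: isinstance(v, int)
shorter_than = typed_predicate([is_str, is_int], lambda s, n: len(s) < n)

universe = ["ab", "abcd", 3, 5]
```

Over this universe the relation contains only pairs whose first component is a string and whose second is an integer, which is the range restriction the parameter types provide.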
Still further in this embodiment, the method comprises at least
one of: (a) mapping each constructor to a named characteristic
predicate of its class, which tests whether a value belongs to the
class, by taking a logical conjunction of characteristic predicates
of all superclasses and a property specified in the constructor;
(b) mapping instanceof tests and cast expressions to calls of such
characteristic predicates; and (c) mapping expressions including
nested method calls to a conjunction of predicate calls, with a new
named variable denoting an intermediate result of each method
call.
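The three mappings (a) to (c) can be sketched as a small translator that emits Datalog-like text. This is an illustrative Python sketch under stated assumptions: the clause syntax, the variable names this and v1, v2, ..., the encoding of a method as a binary predicate over receiver and result, and all class and method names (Manager, Employee, manages, getManager, getName) are hypothetical and not taken from the specification.

```python
import itertools


def characteristic_clause(cls, supers, prop):
    """(a) Constructor -> characteristic predicate: the conjunction of the
    superclasses' characteristic predicates and the constructor's own property."""
    body = [f"{s}(this)" for s in supers] + [prop]
    return f"{cls}(this) :- " + ", ".join(body) + "."


def instanceof_atom(value, cls):
    """(b) An instanceof test or a cast becomes a call of the characteristic predicate."""
    return f"{cls}({value})"


def flatten_calls(expr, fresh, atoms):
    """(c) Nested method calls -> conjunction of predicate calls.
    `expr` is a variable name or a pair (receiver_expr, method_name); each call
    gets a new named variable that holds its intermediate result."""
    if isinstance(expr, str):
        return expr
    receiver, method = expr
    this_var = flatten_calls(receiver, fresh, atoms)
    result_var = f"v{next(fresh)}"
    atoms.append(f"{method}({this_var}, {result_var})")
    return result_var


manager_clause = characteristic_clause("Manager", ["Employee"], "manages(this, _)")

# e.getManager().getName() written as nested pairs.
atoms = []
last = flatten_calls((("e", "getManager"), "getName"), itertools.count(1), atoms)
conjunction = ", ".join(atoms)
```

Here the nested expression is turned into two predicate calls joined by conjunction, with v1 naming the intermediate result that the query author did not have to name.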
Still further in this embodiment, at least one search engine is
targeted by translating a same intermediate Datalog to different
executable formats.
In another embodiment, the queries are rendered concise by at least
one of: (a) using multi-valued operations to avoid naming
intermediate results; (b) using a * notation and a + notation on
multi-valued operations to indicate a chained application of such
operations; and (c) using an aggregate notation with variables, a
range condition and an expression (agg(vars|condition|expr)).
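The three devices for conciseness can be approximated in Python with set-valued functions. This is a sketch under assumptions, not the evaluation strategy of the invention: multi-valued operations are modelled as functions returning sets, the + and * notations as one-or-more and zero-or-more chained applications, and the aggregate as a fold over a filtered range. The names apply_multi, plus, star and agg and the reporting and salary data are hypothetical.

```python
def apply_multi(op, values):
    """Apply a multi-valued operation to every value and union the results."""
    out = set()
    for v in values:
        out |= op(v)
    return out


def plus(op, start):
    """op+ : all results reachable by one or more chained applications."""
    seen = set()
    frontier = apply_multi(op, {start})
    while frontier:
        seen |= frontier
        frontier = apply_multi(op, frontier) - seen  # only expand new values
    return seen


def star(op, start):
    """op* : zero or more chained applications, so the start value is included."""
    return {start} | plus(op, start)


def agg(fn, rng, condition, expr):
    """agg(vars | condition | expr): fold `expr` over range values meeting `condition`."""
    return fn(expr(v) for v in rng if condition(v))


# Hypothetical reporting structure and salaries, for illustration only.
reports = {"ann": {"bob", "cy"}, "bob": {"dee"}, "cy": set(), "dee": set()}
salary = {"ann": 100, "bob": 60, "cy": 50, "dee": 40}
get_reports = lambda e: reports.get(e, set())

# Nested multi-valued call: reports of reports, with no named intermediate result.
second_level = apply_multi(get_reports, apply_multi(get_reports, {"ann"}))
total = agg(sum, plus(get_reports, "ann"),
            lambda e: salary[e] >= 50, lambda e: salary[e])
```

The nested call composes directly because each operation accepts and returns a set of values, so the intermediate set of direct reports never needs a name.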
In another embodiment, the present invention stores complex data in
a standard relational database, and column types are used to build
a hierarchy of object-oriented classes on top of an existing
relational database, by annotating an existing relational
schema.
In this embodiment, the complex data can be at least one of: (a) a
data item in a hierarchy; (b) a tree; and (c) a graph.
In another embodiment, an object-oriented query language can be in
any form depending on a domain of its use, including at least one
of: (a) a textual form; (b) a graphical form; (c) a query-by-example
user interface; and (d) embedded in a more general programming
language.
In still another embodiment, the complex data is stored in a
plurality of datasource formats including at least one of: (a) a
relational database; (b) a web service; (c) a web page; and (d) a
file on hard disk; and column types are used to build a
hierarchy of object-oriented classes on top of such data formats,
by annotating each relevant data format schema.
The foregoing and other features and advantages of the present
invention will be apparent from the following more particular
description of the preferred embodiments of the invention, as
illustrated in the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
The subject matter, which is regarded as the invention, is
particularly pointed out and distinctly claimed in the claims at
the conclusion of the specification. The foregoing and other
features, and advantages of the invention will be apparent from the
following detailed description taken in conjunction with the
accompanying drawings in which:
FIG. 1 is a diagram illustrating how multiple users can employ a
library of queries that are re-usable in that they can be invoked
multiple times from different queries.
FIG. 2 is a drawing of a complete system embodying the
invention.
FIG. 3 shows a sample inheritance hierarchy for creating re-usable
queries on employee data in a company.
FIG. 4 shows a Venn diagram of the values that satisfy the
characteristic properties of those classes.
FIG. 5 is a summary of the values returned by a sample virtual
method named getChairColor.
FIG. 6 depicts an algorithm for virtual method dispatch,
exemplified on the virtual method named getChairColor.
FIG. 7 illustrates the different sublanguages of the proposed query
language, which are used in defining the meaning of the new
features that enable the creation of re-usable queries.
FIG. 8 illustrates how column types (annotations on a normal
relational schema) are used as the starting point of building a .QL
class hierarchy.
FIG. 9 illustrates how multi-valued methods are evaluated.
FIG. 10 is a block diagram of a computer system with multiple
datasources which can be searched from a single .QL query,
according to the present invention.
.[.FIGS. 11A-11E is a block diagram giving an overview of the
components of the present invention. Solid arrows indicate the flow
between components, while dashed arrows indicate additional levels
of detail for these components..].
.Iadd.FIG. 11A is a flow chart depicting an example method for
creating concise and reusable queries..Iaddend.
.Iadd.FIG. 11B is a flow chart depicting an example method for
translating constructors and methods..Iaddend.
.Iadd.FIG. 11C is a flow chart depicting an example method for
rendering queries concise..Iaddend.
.Iadd.FIG. 11D is a diagram showing examples of forms of complex
data, forms of an object-oriented query language, and forms of
storage of complex data..Iaddend.
.Iadd.FIG. 11E is a flow chart depicting an example method for
executing the result of the translation shown in FIG. 11A and FIG.
11B..Iaddend.
.Iadd.Solid arrows indicate the flow between the steps of each
example method, while dashed arrows indicate additional levels of
detail for these steps..Iaddend.
FIG. 12 is a block diagram of a computer system useful for
implementing the software steps of the present invention.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
It should be understood that these embodiments are only examples of
the many advantageous uses of the innovative teachings herein. In
general, statements made in the specification of the present
application do not necessarily limit any of the various claimed
inventions. Moreover, some statements may apply to some inventive
features but not to others. In general, unless otherwise indicated,
singular elements may be in the plural and vice versa with no loss
of generality.
The objective of the invention is depicted in FIG. 1: the ability
to construct a library 101 of re-usable queries in an
object-oriented query language. These re-usable queries in the
library can be invoked from other queries 102, 103 and 104, where
those queries are authored by multiple users 105, 106 and 107, all
of whom can share the same library of re-usable queries.
An example flow diagram of the invention is shown in FIGS. 11A-11E.
This diagram is intended as an overview of the invention, showing
its essential components and the relation to other figures. The
concepts introduced in the brief description of FIGS. 11A-11E will
be further explained (with many concrete examples) afterwards.
Concise and re-usable queries over complex data are created 1101 in
three steps: first 1102, by defining an object-oriented (OO) query
language where classes are logical properties; second 1103, by
relating the notion of inheritance to those logical properties as
well as virtual method dispatch; and third 1104, by dispatching on
those logical properties to choose a most-specific implementation of
a given method from the available classes. Each of these three steps
(1102, 1103 and 1104) will be further elaborated below; they are
also further illustrated in other figures: 1102 is elaborated in
FIG. 3, 1103 in FIG. 4, and 1104 in FIG. 5.
The dashed lines in FIGS. 11A-11E indicate further levels of
detail. In particular, queries are implemented 1105 by: 1106
translating constructors and methods to clauses in Datalog, a
traditional query language in theoretical database research. That
is, 1107 for each constructor, method and predicate, special
variables are introduced for the current class, and (where
necessary) also for the result. The dispatching of virtual methods
1108 is achieved by 1109 defining a named predicate for each method
implementation, and 1110 testing a receiver of a virtual method
call against relevant classes by invoking a named constructor. The
translation to Datalog 1105 is further explained later, and it is
also illustrated by FIG. 2; 1106 is illustrated by FIG. 7 and 1110
by FIG. 8.
The translation of constructors and methods 1111 is achieved by
three mappings: first 1112, each constructor is mapped to a named
characteristic predicate of its class. Second 1113, instanceof
tests and cast expressions are mapped to calls of such
characteristic predicates. Third 1114, expressions including nested
method calls are mapped to a conjunction of predicate calls, with a
new named variable denoting an intermediate result of each method
call. We shall revisit 1112, 1113 and 1114 when further describing
the mapping to Datalog.
The concise and re-usable queries in 1101 are rendered concise by
1115: first 1116, the introduction 1117 of multi-valued operations
to avoid naming intermediate results and 1118 a * notation and a +
notation on such multi-valued operations to indicate their chained
application. A second device for rendering queries concise is an
aggregate notation 1119 with variables, a range condition and an
expression (agg(vars|condition|expr)). Multi-valued operations are
further discussed below via multiple concrete examples; they are
also further illustrated in FIG. 9.
The concise and re-usable queries in 1101 can be applied 1120 to
complex data that is stored in a standard relational database, where
column types are used to build a hierarchy of object-oriented classes
on top of the existing relational database by annotating its existing
relational schema. This is further explained below via detailed
examples, and also in FIG. 8.
The complex data in 1101 consists of 1121 at least one complex data
item that is one of a hierarchy 1124 (for instance the organization
of a company), a tree 1125 (for instance the representation of the
syntax of a computer program) or a graph 1126 (for instance of
connections in a social network). Further examples of complex data
items will be discussed below.
The object-oriented query language of 1101 can be 1122 in any form
depending on the domain of its use: textual form 1127 (the most
common way of presenting queries), graphical form 1128 (which may
be more suited in applications to business intelligence),
query-by-example 1129 (common when non-experts wish to run a query
similar to a previous one), or embedded in a more general
programming language 1130 (examples of such general programming
languages are Java and C++).
The concise and re-usable queries can be applied 1123 to complex
data that is stored in a plurality of datasource formats, and
column types are used to build a hierarchy of object-oriented
classes by annotating an existing schema of the datasource format.
The datasource format can be one of a relational database 1131, a
web service 1132, a web page 1133, or a file on hard disk 1134.
This is further illustrated in FIG. 10.
The result of the translation 1105 can be executed 1135 by
translating a single intermediate Datalog program to different
executable formats 1136. This concludes the description of the
overview of the invention as depicted in FIGS. 11A-11E, and we now
proceed to further elaborate the concepts introduced in FIGS.
11A-11E.
For brevity, we shall name the novel object-oriented query language
.QL. The structure of the presentation is as follows: first we show
how in .QL, a traditional database schema is annotated to enable
the construction of a class hierarchy. Here we chose a typical toy
example found in textbooks on databases, namely a company database
involving employees, departments and so on. Next, we show how a
hierarchy of concepts can be constructed on top of that annotated
schema. Finally, we proceed to describe in detail how .QL may be
realized by a translation of its novel features to a more
traditional query language.
Annotating a Database Schema with Types
A fictitious company named Fuzzles Inc. has a database for its
company structure, and the schema for that database is shown below.
It is a normal SQL schema, except that each field has been
annotated with a so-called column type. In the schema below, column
types are indicated by italic font. As shown in FIG. 8, these
column types 803 are a unique feature of .QL, allowing programmers
to build class hierarchies 804 on top of a schema 802 in a normal
relational database 801.
TABLE-US-00001
  employee(int ssn: @person, varchar(100) name: string ref)
  department(int did: @department, varchar(100) name: string ref)
  hassubsidiary(int parent: @department ref, int child: @department ref)
  worksin(int ssn: @person ref, int did: @department ref)
  manages(int ssn: @person ref, int did: @department ref)
  parttime(int ssn: @person ref, float percentage: float ref)
  cyclist(int ssn: @person ref, int cycle: @cycle)
  driver(int ssn: @person ref, int car: @car)
To take one example, the `ssn` field of the `employee` table has
been annotated with @person. This in fact serves as a declaration
of the type @person, defining it to be the set of values that
occurs in that column in the database. By contrast, in the
`worksin` relation, the `ssn` field is annotated with @person ref,
expressing the fact that there it is just a reference to the
@person type. It is important to distinguish column types from the
normal representation types (like varchar (100)) that are part of
the normal database schema. In the above schema, representation
types are in normal font and column types are in italics.
It may appear that column types are similar to the integrity
constraints found in traditional databases, where the programmer
can indicate that a field refers to another field. Column types
encode that same information (the ssn field of worksin refers to
the @person type, which are precisely the values in the ssn field
of employee). However, column types have another role, namely to
express that fields do not share values. For example, the did field
of the department table is an integer (according to its
representation type), and the ssn field of employee is also an
integer. However, since the two fields have different column types,
the same integer cannot appear both as a value of did and a value
of ssn.
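The two roles of column types described above can be sketched in Python. This is only an illustration under stated assumptions: tables are held as lists of tuples, and the sample rows and the helper refs_ok are hypothetical, not part of .QL.

```python
# Hypothetical sample rows; the values are illustrative only.
employee = [(101, "John Smith"), (102, "Ann Jones")]
department = [(1, "Products"), (2, "Sales")]
worksin = [(101, 1), (102, 2)]

# A declaring annotation such as "ssn: @person" defines the type as the
# set of values that occur in that column.
person = {ssn for ssn, _ in employee}
department_type = {did for did, _ in department}


def refs_ok(rows, column, column_type):
    """A '... ref' annotation: every value in the column belongs to the type."""
    return all(row[column] in column_type for row in rows)


# worksin(ssn: @person ref, did: @department ref)
worksin_ok = refs_ok(worksin, 0, person) and refs_ok(worksin, 1, department_type)

# Distinct column types share no values, although both are stored as integers.
disjoint = person.isdisjoint(department_type)
```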
Building Classes
Shown now is how the vocabulary of a problem domain (in this case
the company structure) can be encapsulated as a hierarchy of
concepts, and we call these concepts "classes". By creating such a
hierarchy, we enable non-experts to ask questions about the data in
familiar terms. The use of the word "classes" draws on an analogy
with other fields of computer programming, where it has proved
economical to build libraries of common operations, which can be
reused as-is, or adapted to new requirements, without having to
reconstruct many similar queries from scratch. The most successful
paradigm for achieving such re-use is object-oriented programming.
Examples of object-oriented programming libraries are the Java
collection classes, and the C++ Standard Template Library (STL). A
simple combination of object-oriented programming and query
languages has however long been known to be impossible (Jeffrey D.
Ullman: A Comparison between Deductive and Object-Oriented Database
Systems. In: Claude Delobel, Michael Kifer, Yoshifumi Masunaga
(Eds.): Deductive and Object-Oriented Databases, Second
International Conference, DOOD'91, Munich, Germany, Dec. 16-18,
1991, Proceedings. Lecture Notes in Computer Science 566 Springer
1991, ISBN 3-540-55015-1: 263-277. The aforementioned reference is
hereby incorporated by reference in its entirety). However, despite
that impossibility, we find it useful to use the terminology of
object-oriented programming in this context, as it is familiar to
those skilled in the art, and the analogy may thus be helpful in
creating an intuition about the semantics as used herein.
A Class for Employees
An employee is any person employed by Fuzzles Inc. Within Fuzzles,
an employee's status is indicated by the colored pattern of her
chair. The default is a grey pattern. A chair can however have more
than one color, as we shall see shortly. Here is a class for
defining employees, with a method for reporting the name of an
employee, the department she works in, and the color of her chair,
and finally a toString method for producing a string:
TABLE-US-00002
  class Employee extends @person {
    string getName( ) { employee(this,result) }
    Department getWorksInDepartment( ) { worksin(this,result) }
    string getChairColor( ) { result="grey" }
    string toString( ) { result=this.getName( ) }
  }
In general, a .QL class is a logical property. Here the logical
property is particularly simple, as it is just that this occurs as
a value in the @person column type. The body of a method is a
relation between two special variables named this and result. If
there are method parameters, those can also be used in the method
body. Note that while we use the syntax and terminology of
object-oriented programming in the presentation of .QL classes, the
actual meaning is radically different from traditional
object-oriented programming languages.
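To build intuition for "a class is a logical property" and "a method body is a relation between this and result", the Employee class can be modelled in Python as a membership test plus set-valued functions. This is a minimal sketch, assuming hypothetical sample data and function names; it is not how a .QL engine is implemented.

```python
# Hypothetical extensional data: employee(ssn, name) and worksin(ssn, did).
employee = {(101, "John Smith"), (102, "Ann Jones")}
worksin = {(101, 1), (102, 2)}


def is_employee(this):
    """Characteristic property of Employee: 'this' occurs in the @person column."""
    return any(ssn == this for ssn, _ in employee)


def get_name(this):
    """getName as a relation: the set of every 'result' related to 'this'."""
    return {result for ssn, result in employee if ssn == this}


def get_works_in_department(this):
    return {result for ssn, result in worksin if ssn == this}


def get_chair_color(this):
    """The body result="grey" holds for every member of the class."""
    return {"grey"} if is_employee(this) else set()


names = get_name(101)
nobody = get_name(999)  # no tuple relates 999 to a result, so the set is empty
```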
Given the above Employee class, one can write a select statement to
report all employees named `John`, with their department and chair
colors:
TABLE-US-00003
  from Employee e
  where e.getName( ).matches("John %")
  select e, e.getWorksInDepartment( ), e.getChairColor( )
A Class for Cyclists
In an attempt to encourage eco-friendly behavior, employees who
cycle to work get a green chair. That way, all their colleagues can
see that they make a special effort. As a class, Cyclist extends
Employee, and overrides its getChairColor method:
TABLE-US-00004
  class Cyclist extends Employee {
    Cyclist( ) { cyclist(this,_) }
    string getChairColor( ) { result="green" }
  }
As mentioned earlier, a class in .QL is a logical property. Here
that logical property is that this is an Employee who furthermore
occurs in the cyclist relation. The employee part of the property
is expressed by saying that Cyclist extends Employee, whereas the
cyclist part is specified in the constructor Cyclist( ). The
underscore (_) indicates a don't-care value.
Again, we note the discrepancy between traditional object-oriented
programming and the notion of `extends` in .QL: in .QL `extends` is
a form of logical conjunction, and therefore a semantic property.
By contrast, in a language like Java, inheritance is a syntactic
construct, and not a logical operation.
When a method call e.getChairColor( ) is evaluated, the .QL engine
determines what the most specific classes are for e, and it uses
the definition of getChairColor from those classes. This process is
detailed in FIG. 6. As we shall see shortly, it is possible to have
more than one most specific class for a value.
Modelling Departments
Next, we define a class for departments. Within Fuzzles Inc.,
departments can have subsidiaries, which can have subsidiaries of
their own, and so on. A department has a direct size (everyone who
works just at that level), and also a total size, which is the sum
of the direct sizes of all its subdepartments including itself:
TABLE-US-00005
  class Department extends @department {
    int getDirectSize( ) { result = count(Employee e | worksin(e,this)) }
    int getSize( ) {
      result = sum(Department d | this.getAChild*( ) = d | d.getDirectSize( ))
    }
    Department getAChild( ) { hassubsidiary(this,result) }
    Department getParent( ) { hassubsidiary(result,this) }
    Manager getManager( ) { manages(result,this) }
    string toString( ) { department(this,result) }
    string getName( ) { result=this.toString( ) }
  }
The getAChild method may return multiple results, one for each
subsidiary of this department. The method getDirectSize presents a
first example of an aggregate, namely to count the number of
employees who work in this department. Next, in computing the total
size, we apply the getAChild method in a chained fashion, thus
finding all subsidiaries of the current department. To be more
precise, this.getAChild*( ) returns the results of evaluating just
this, or this.getAChild( ), or this.getAChild( ).getAChild( ), . . .
and so on. We take the sum of the direct size of all departments d
obtained in this fashion.
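The chained application getAChild*( ) and the two size aggregates can be sketched in Python as follows. The department names and rows are hypothetical, and the worklist loop is just one possible way to compute a reflexive transitive closure.

```python
# Hypothetical data: hassubsidiary(parent, child) and worksin(ssn, department).
hassubsidiary = {("Products", "child1"), ("Products", "child2"), ("child1", "child11")}
worksin = {(1, "Products"), (2, "child1"), (3, "child11"), (4, "child11")}


def get_a_child(dept):
    """Multi-valued: one result per direct subsidiary."""
    return {child for parent, child in hassubsidiary if parent == dept}


def get_a_child_star(dept):
    """dept itself, its children, their children, and so on."""
    seen, todo = {dept}, [dept]
    while todo:
        for child in get_a_child(todo.pop()):
            if child not in seen:
                seen.add(child)
                todo.append(child)
    return seen


def get_direct_size(dept):
    """count(Employee e | worksin(e, this))"""
    return sum(1 for _, d in worksin if d == dept)


def get_size(dept):
    """sum(Department d | this.getAChild*() = d | d.getDirectSize())"""
    return sum(get_direct_size(d) for d in get_a_child_star(dept))
```

With this data, get_size("Products") adds the direct sizes of Products, child1, child2 and child11.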
A query to find all subsidiaries below a given department named
"Products" might read as follows:
TABLE-US-00006
  from Department products
  where products.getName( ) = "Products"
  select products.getAChild*( )
To get all the managers who work for those subsidiaries, you can
simply write
TABLE-US-00007
  from Department products
  where products.getName( ) = "Products"
  select products.getAChild*( ).getManager( )
Note that products.getAChild*( ).getManager( ) may return many
results. Such expressions are sometimes said to be `many-valued`.
The evaluation of this expression 901 is illustrated in FIG. 9. The
department 902 named by the string 905 "Products" has subsidiary
departments 903, namely children 906, 907 and 908. In turn child1
906 has children child11 909 and child12 910. Managers 904
are depicted at the bottom of FIG. 9. Child12 has manager manager1
911 and manager2 912. Many-valued expressions can also be used in
conditions. For instance, to find all managers (in `Products` or
its subsidiaries) whose name ends with the string `son`, you could
write
TABLE-US-00008
  from Department products, Manager m
  where products.getName( ) = "Products"
    and products.getAChild*( ).getManager( ) = m
    and m.getName( ).matches("%son")
  select m
Many-valued expressions help write more concise queries because
there is no need to name intermediate results. For instance, in the
above query, in prior art query languages we would have to
explicitly name the result of products.getAChild*( ).
Classes for Various Kinds of Employee
We now turn our attention to building abstractions for different
kinds of employees. First, a part-timer is an employee who works
part time. For part-timers, it makes sense to introduce an extra
method that returns the percentage they work for:
TABLE-US-00009
  class Parttimer extends Employee {
    Parttimer( ) { parttime(this,_) }
    float getPercentage( ) { parttime(this,result) }
  }
A manager is a special kind of employee, who happens to manage a
department. The number of reports for a manager is the sum of the
sizes of the departments he manages; it is possible for a manager
to be in charge of multiple departments at once. Managers have blue
chairs.
TABLE-US-00010
  class Manager extends Employee {
    Manager( ) { manages(this,_) }
    Department getManagesDepartment( ) { manages(this,result) }
    string getChairColor( ) { result="blue" }
    int getNumberOfReports( ) {
      result= sum(Department d | d = this.getManagesDepartment( ) | d.getSize( ))
    }
  }
It is interesting to consider what happens when a manager cycles to
work. In that case there will be two most specific classes that
define the method getChairColor: the Cyclist class (where it returns
green) and the above Manager class (where it returns blue). In such
cases the .QL engine returns both results, and indeed in Fuzzles
Inc., cycling managers have chairs in an attractive striped pattern
of both blue and green.
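The selection of most specific classes, including the case where there is more than one, can be sketched in Python. This is a simplified model under stated assumptions: every class below defines getChairColor, the class table and sample persons are hypothetical, and the real engine works on translated Datalog rather than Python objects.

```python
# Hypothetical facts: person 7 is a manager who also cycles; 8 is a plain employee.
persons = {7, 8}
cyclist = {7}
manages = {7}

# Each class: (direct superclasses, own characteristic test, chair color it defines).
classes = {
    "Employee": ((), lambda x: x in persons, "grey"),
    "Cyclist": (("Employee",), lambda x: x in cyclist, "green"),
    "Manager": (("Employee",), lambda x: x in manages, "blue"),
}


def ancestors(name):
    """All proper superclasses of a class."""
    result = set()
    for parent in classes[name][0]:
        result |= {parent} | ancestors(parent)
    return result


def member(x, name):
    """'extends' is conjunction: own test and every superclass test must hold."""
    return classes[name][1](x) and all(classes[a][1](x) for a in ancestors(name))


def get_chair_color(x):
    candidates = {c for c in classes if member(x, c)}
    # A candidate is most specific if it is not a superclass of another candidate.
    most_specific = {
        c for c in candidates if not any(c in ancestors(d) for d in candidates)
    }
    # All most specific implementations contribute a result.
    return {classes[c][2] for c in most_specific}
```

For the cycling manager 7, both Cyclist and Manager are most specific, so two colors are returned; for 8 only the Employee default applies.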
A manager has the title `vice president` when he has more than 10
reports below him. The chairs of vice-presidents are red:
TABLE-US-00011
  class VP extends Manager {
    VP( ) { this.getNumberOfReports( ) > 10 }
    string getChairColor( ) { result="red" }
  }
To illustrate the fine points of virtual method dispatch in .QL,
and a few of the characteristics that distinguish it from prior
art, let us return to our earlier query
TABLE-US-00012
  from Employee e
  where e.getName( ).matches("John %")
  select e, e.getWorksInDepartment( ), e.getChairColor( )
When there is a vice president named John who is a cyclist, he will
appear twice in the results, once with a red chair and once with a
green chair: in Fuzzles Inc, eco-friendly VPs have chairs that are
striped in both red and green. That same person will still be
listed twice in the result of the subtly different query
TABLE-US-00013
  from Manager e
  where e.getName( ).matches("John %")
  select e, e.getWorksInDepartment( ), e.getChairColor( )
This is a crucial difference with all earlier proposals for
object-oriented query languages that aim to query hierarchies,
trees and graphs. Prior work such as that of Serge Abiteboul, Georg
Lausen, Heinz Uphoff and Emmanuel Waller (Methods and Rules. In:
Peter Buneman and Sushil Jajodia (Eds.), Proceedings of the 1993 ACM
SIGMOD International Conference on Management of Data, pages 32-41.
ACM Press, 1993) is typical, in that it considers only method
definitions in subtypes of the static receiver. For the second
query, that means that the definition of getChairColor in the
Cyclist class would not be considered. The difference is important,
because that implies that in these prior art query languages, when
writing a new query, we must understand the whole class hierarchy
in the library--which makes the creation of re-usable queries much
harder.
The previous query can also be written using an instanceof test for
guaranteeing an expression to be of a specific type:
TABLE-US-00014
  from Employee e
  where e.getName( ).matches("John %") and e instanceof Manager
  select e, e.getWorksInDepartment( ), e.getChairColor( )
Similarly to mainstream object-oriented languages again, it is
possible to cast an expression to a specific type in order to call
a method that is defined for that type only. For example:
TABLE-US-00015
  from Employee e
  select e, ((Manager)e).getManagesDepartment( )
Note that such a cast triggers no error for employees who are not
managers, but instead guarantees, like with instanceof, that all
the returned employees are managers. It is therefore again an
example where the analogy with existing object-oriented programming
languages is useful for building intuition, but the technical
details demonstrate it is not a straightforward combination of
existing ideas.
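The filtering behavior of a cast can be illustrated with a small Python sketch. The data and function names are hypothetical; the point is only that a cast acts as a membership condition on the result set rather than raising an error.

```python
# Hypothetical data: three employees, two of whom manage a department.
persons = {1, 2, 3}
manages = {(2, "Sales"), (3, "Products")}


def is_manager(x):
    """Characteristic test for Manager."""
    return any(ssn == x for ssn, _ in manages)


def get_manages_department(x):
    return {dept for ssn, dept in manages if ssn == x}


# from Employee e select e, ((Manager)e).getManagesDepartment()
# The cast contributes the condition is_manager(e); employee 1 is silently dropped.
rows = {
    (e, dept)
    for e in persons
    if is_manager(e)
    for dept in get_manages_department(e)
}
```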
In one embodiment, the class definitions may extend multiple
classes at once. For a manager that works part time, it makes sense
to compute the feasibility metric, i.e. the ratio of time worked
divided by the number of reports times 100:
TABLE-US-00016
  class ParttimeManager extends Parttimer, Manager {
    string getChairColor( ) { result="purple" }
    float getFeasibility( ) {
      result = (this.getPercentage( ) / this.getNumberOfReports( )) * 100
    }
  }
Note the use of multiple inheritance, which is just logical `and`:
a ParttimeManager is both a Parttimer and a Manager.
In FIG. 3, the complete class hierarchy for the above example is
shown. Each class 301, 302, 303, 304, 305, 306, and 307 corresponds
to a characteristic logical property, and we could therefore identify
a class with the set of values that satisfies that property. FIG. 4
shows a Venn diagram of those sets for the classes 401, 402, 403,
404, and 405 defined above. Finally, FIG. 5 shows the result of running
the chair color query. The process of selecting a most-specific
implementation of a method (here getChairColor) is shown in FIG.
6.
Implementing .QL
As described above, the .QL query language provides a
convenient and expressive formalism in which to write queries over
complex data. We discussed the object-oriented features of .QL,
which allow complex queries to be packaged up and reused in a
highly flexible fashion. These features are essential to build up a
library of queries that corresponds to the vocabulary of a problem
domain. Now an example of how to implement .QL is discussed. We
first describe the intermediate language used for .QL queries, a
deductive query language known as Datalog. We then sketch the
translation of .QL programs into Datalog. It is well-known how
Datalog itself can be implemented (on top of traditional database
systems, to find data on the web, or in other sources of
semi-structured data), so details of how Datalog itself may be
executed are omitted.
Datalog
.QL is a significant extension of a simple form of logic
programming known as Datalog, originally designed as an expressive
language for database queries. All .QL programs can be translated
into Datalog, and the language draws on the clear semantics and
efficient implementation strategies for Datalog. In this section we
describe the Datalog language before outlining how .QL programs may
be translated into Datalog. Datalog is essentially a subset of .QL,
and as such we shall be using .QL syntax for Datalog programs. The
various sublanguages of .QL and Datalog are depicted in FIG. 7: the
object-oriented query language .QL 701, Datalog with various
extensions 702 and pure Datalog 703.
Predicates
A Datalog program is a set of predicates defining logical
relations. These predicates may be recursive, which in particular
allows the transitive closure operations to be implemented. A
Datalog predicate definition is of the form:
  predicate p(T_1 x_1, . . . , T_n x_n){formula}
This defines a named predicate p with variables x_1, . . . ,
x_n. In a departure from classical Datalog each variable is
given a type. These restrict the range of the relation, which only
contains tuples (x_1, . . . , x_n) where each x_i has
the type T_i.
The body of a Datalog predicate is a logical formula over the
variables defined in the head of the clause. These formulas can be
built up as follows:
TABLE-US-00017
  formula ::= predicate(variable, . . . , variable)
            | test(variable, . . . , variable)
            | variable = expr
            | not(formula)
            | formula or formula
            | formula and formula
            | exists(Type variable | formula)
That is, a formula is built up from uses of predicates through the
standard logical operations of negation, disjunction, conjunction
and existential quantification. In addition to predicates, tests
are allowed in Datalog programs. A test is distinct from a
predicate in that it can only be used to test whether results are
valid, not generate results. An example of a test is a regular
expression match. The test matches("C %",X) is intended to match
all strings beginning with "C". Evidently such a test cannot be
used to generate strings, as there are infinitely many possible
results, but may constrain possible values for X. In contrast, a
predicate such as employee(SSN,NAME) may generate values--in this
case, the variables SSN and NAME are bound to each pair of elements
for which person SSN has name NAME.
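The distinction between a predicate, which generates bindings, and a test, which only checks them, can be sketched in Python. The sample rows are hypothetical, and the treatment of "%" as a wildcard for any run of characters is an assumption made for this illustration.

```python
import re

# Hypothetical extensional predicate employee(SSN, NAME).
employee = {(1, "Carol"), (2, "Dave"), (3, "Chris")}


def matches(pattern, s):
    """A test: it can only check a string it is given, never enumerate strings.

    '%' is assumed to stand for any sequence of characters.
    """
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, s) is not None


# The predicate generates candidate values for NAME; the test then filters them.
c_names = {name for _, name in employee if matches("C%", name)}
```

Without the generating predicate there would be nothing finite to iterate over, which is why the test alone cannot produce results.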
Arguments to predicates are simply variables in Datalog, but
expressions allow the computation of arbitrary values. Expressions
are introduced through formulas such as X=Y+1 defining the value of
a variable, and include all arithmetic and string operators. In
addition, expressions allow aggregates to be introduced:
TABLE-US-00018
  expr ::= variable
         | constant
         | expr + expr
         | expr * expr
         | . . .
         | aggregate
Our definition of Datalog differs from usual presentations of the
language in several respects. The first difference is largely
inessential. While we allow arbitrary use of logical operators in
formulas, most presentations require Datalog predicates to be in
disjunctive normal form, where disjunction can only appear at the
top level of a predicate and the only negated formulas are
individual predicates. However, any formula may be converted to
disjunctive normal form, so this does not represent a major
departure from pure Datalog. Expressions, on the other hand, are
crucial in increasing the expressiveness of the language. In pure
Datalog expressions are not allowed, and this extension to pure
Datalog is non-trivial, with an impact on the semantics of the
language.
Datalog Programs
A Datalog program contains three parts:
A query. This is just a Datalog predicate defining the relation
that we wish to compute.
A set of user-defined, or intensional predicates. These predicates
represent user-defined relations to be computed to evaluate the
query.
A set of extensional predicates. These represent the elements
stored in the database to be queried.
The general structure of a Datalog program therefore mirrors that
of a .QL program. The query predicate corresponds to the query in a
.QL program, while classes and methods may be translated to
intensional predicates. Finally, in the context of program queries
the extensional predicates define the information that is stored
about the program. Examples were given in the sections above
beginning with database annotations.
Semantics and Recursion
The semantics of Datalog programs are very straightforward, in
particular in comparison to other forms of logic programming such
as Prolog. A key property is that termination of Datalog queries is
not an issue. The simplicity of the semantics of Datalog programs
(and by implication of .QL programs) is an important factor in its
choice as an intermediate query language, as it is straightforward
to generate Datalog code. It is worth exploring the semantics in a
little more detail, however, as a few issues crop up when assigning
meaning to arbitrary Datalog programs.
For our purposes, the meaning of a Datalog program is that each
predicate defines a relation, or set of tuples, between its
arguments. Other, more general, interpretations of Datalog programs
are possible such as those described in Allen van Gelder and
Kenneth A. Ross and John S. Schlipf. The Well-Founded Semantics For
General Logic Programs. Journal of the ACM, volume 38, number 3,
pages 620-650, 1991, the teachings of which are hereby incorporated
by reference in their entirety. An important feature is that these
relations should be finite, so that they may be represented
explicitly in a database or in memory. It is customary to enforce
this through range restriction, that is to say ensuring that each
variable that is an argument to a predicate should be restricted to
a finite set. In our case, this is largely straightforward, as each
variable is typed. Column types such as @person or @department
restrict variables to certain kinds of information already in the
database, in this case the sets of classes or reference types in
the program. As there can only be finitely many of these, any
variable with such a type is automatically restricted. However,
primitive types such as int are more troublesome. Indeed it is easy
to write a predicate involving such variables that defines an
infinite relation: predicate p(int X,int Y){X=Y}
This predicate contains all pairs (X,X), where X is an integer,
which is infinite and therefore disallowed. For that reason, the
type system of .QL ensures that any variable of primitive type is
always constrained by a predicate, restricting its range to a
finite set.
In the absence of recursion, the semantics of a Datalog program is
very straightforward. The program can be evaluated bottom-up,
starting with the extensional predicates, and working up to the
query. Each relation, necessarily finite by range-restriction, can
be computed from the relations it depends on by simple logical
operations, and so the results of the query can be found.
The situation is more interesting in the presence of recursion.
Unlike other logic programs in which evaluation of a recursive
predicate may fail to terminate, in Datalog the meaning of a
recursive predicate is simply given by the least fixed point of the
recursive equation it defines. As an example, consider the
recursive predicate predicate p(int X,int Y){q(X,Y) or (p(X,Z) and
q(Z,Y))}
where q denotes (say) the relation {(1,2), (2,3), (3,4)}. Then p
denotes the solution of the relation equation P = q ∪ P;q, in
which ; stands for relational composition. This is just the
transitive closure of q, so the relation p is simply
p={(1,2),(1,3),(1,4),(2,3),(2,4),(3,4)}
This least fixed point interpretation of Datalog programs makes it
easy to find the value of any predicate. For instance, consider
predicate p(int X){p (X)}
This predicate would be nonterminating as a Prolog program.
However, in Datalog this is just the least solution of the equation
P=P. As every relation satisfies this equation, the result is just
the empty relation.
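The least fixed point reading of both examples can be reproduced with a short Python sketch. The iteration from the empty relation is a naive strategy chosen for clarity; the function names are hypothetical and no claim is made that an actual engine evaluates recursion this way.

```python
q = {(1, 2), (2, 3), (3, 4)}


def step(p):
    """One application of the right-hand side of P = q ∪ P;q."""
    composed = {(x, y) for (x, z) in p for (z2, y) in q if z == z2}
    return q | composed


def least_fixed_point(f):
    """Iterate f from the empty relation until nothing changes."""
    current = set()
    while True:
        nxt = f(current)
        if nxt == current:
            return current
        current = nxt


# Transitive closure of q.
p = least_fixed_point(step)

# predicate p(int X){p(X)} corresponds to the equation P = P, whose
# iteration from the empty relation stops immediately.
trivial = least_fixed_point(lambda r: r)
```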
More precisely, the meaning of a Datalog program can be defined as
follows. First, break the program up into components, where each
component represents a recursive cycle between predicates
(formally, a strongly-connected component in the call graph).
Evaluation proceeds bottom-up, starting with extensional predicates
and computing each layer as a least fixed point as above.
There are two technical restrictions to the use of recursion in
Datalog. The first is known as stratification, and is necessary to
deal with negation properly. The problem can be illustrated by this
simple example: predicate p(@person X){not(p(X))}
What should this predicate mean? It is defined as its own
complement, so a person lies in p iff she does not lie in p.
There is no relation satisfying this property, so we cannot assign
a simple relational interpretation to this program. To avoid this
issue, we only consider `stratified` Datalog. In this fragment of
Datalog, negation cannot be used inside a recursive cycle. That is,
a cycle through mutually recursive predicates cannot include
negation. This is not a problem in practice, and stratification is
not a substantial obstacle to expressiveness.
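A stratification check can be sketched in Python as a search for a negated call that lies on a recursive cycle. The dependency-graph encoding below (a dict from predicate name to a list of (callee, negated) pairs) is an assumption made for this illustration.

```python
def reachable(graph, start):
    """All predicates reachable from 'start' by one or more calls."""
    seen, todo = set(), [start]
    while todo:
        node = todo.pop()
        for callee, _negated in graph.get(node, []):
            if callee not in seen:
                seen.add(callee)
                todo.append(callee)
    return seen


def is_stratified(graph):
    """False if some negated call caller -> callee can lead back to caller."""
    for caller, calls in graph.items():
        for callee, negated in calls:
            if negated and caller in ({callee} | reachable(graph, callee)):
                return False
    return True


# predicate p(@person X){not(p(X))}: negation inside a recursive cycle.
self_negating = {"p": [("p", True)]}

# p negates q, and q is recursive only through positive calls.
layered = {"p": [("q", True)], "q": [("q", False)]}

# Mutual recursion that passes through a negation.
mutual = {"p": [("q", False)], "q": [("p", True)]}
```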
A similar problem is posed by our use of expressions, which does
not lie in the scope of classical Datalog. While expressions
increase the power of the language, their interaction with
recursion is problematic. For instance, consider the following:
predicate p(int Y){Y=0 or (Y=Z+1 and p(Z))}
Clearly 0 lies in p. Therefore 1 must also lie in p from the
recursive clause, and in this manner every number n lies in p. The
use of expressions in recursive calls may therefore lead to
infinite relations, and thus nontermination. In .QL this may also
lead to nonterminating queries, and so care must be used when using
arithmetic expressions in recursive calls--if, as in the above
example, the expression can create new values for each recursive
call, then the query may be non-terminating.
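The way an arithmetic expression in a recursive call keeps producing new values can be observed with a bounded iteration in Python. The round limit of 50 is arbitrary and exists only so that the sketch terminates; it is not part of the language semantics.

```python
def step(p):
    """Right-hand side of the recursive definition: Y = 0, or Y = Z + 1 with Z in p."""
    return {0} | {z + 1 for z in p}


def iterate(f, max_rounds):
    """Naive iteration from the empty set, giving up after max_rounds."""
    current = set()
    for _ in range(max_rounds):
        nxt = f(current)
        if nxt == current:
            return current, True
        current = nxt
    return current, False


# Each round adds exactly one new number, so no fixed point is reached.
values, converged = iterate(step, 50)
```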
Translating .QL
The precise semantics of .QL programs are defined by their
translation into Datalog programs. The overall structure of .QL
programs mirrors that of Datalog programs. In particular, the query
in a .QL program is translated into a Datalog query, while methods
and classless predicates are translated to Datalog intensional
predicates.
Translating Queries
The general form of a .QL query (ignoring order by clauses, which
merely amount to a post-processing step) is:
TABLE-US-00019
  from T_1 x_1, T_2 x_2, . . . , T_n x_n
  where formula
  select e_1, e_2, . . . , e_k
where each e_i is an expression, and each x_i is a declared
variable of type T_i.
It is straightforward to translate this to a Datalog query, which
is just a standard predicate. The resulting relation has k
parameters (one for each selected expression), and so the query
predicate has k parameters. The variables x_1 through x_n
can be introduced as local variables, defined by an existential
quantifier. As a result, the Datalog translation of the above
query, omitting types, is:
TABLE-US-00020
  predicate p(res_1, res_2, . . . , res_k) {
    exists (T_1 x_1, T_2 x_2, . . . , T_n x_n |
      formula_2 and res_1 = e_1 and res_2 = e_2 and . . . and res_k = e_k )
  }
where formula_2 is obtained from formula by translating away
all non-Datalog features of .QL, and in particular method calls, as
described below.
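The shape of this query translation can be sketched as a small Python text generator. The function translate_query and its argument layout are hypothetical, and the sketch assumes the where-formula and the selected expressions have already had their .QL-specific features translated away.

```python
def translate_query(decls, formula, selects, name="p"):
    """Build the text of a Datalog query predicate.

    decls:   list of (type, variable) pairs from the 'from' clause
    formula: the already-translated 'where' formula, as text
    selects: list of already-translated 'select' expressions, as text
    """
    results = ["res%d" % (i + 1) for i in range(len(selects))]
    bound = ", ".join("%s %s" % (t, v) for t, v in decls)
    equalities = " and ".join(
        "%s = %s" % (r, e) for r, e in zip(results, selects)
    )
    body = "%s and %s" % (formula, equalities) if equalities else formula
    return "predicate %s(%s) { exists(%s | %s) }" % (
        name, ", ".join(results), bound, body
    )


datalog = translate_query(
    [("Employee", "e"), ("string", "c")],
    "Employee_getChairColor(e,c)",
    ["e", "c"],
)
```

One result parameter is created per selected expression, and the declared variables become existentially quantified locals.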
Translating Classes
Classes are translated into individual Datalog predicates,
representing constructors, methods and class predicates. In most
cases the translation is straightforward, the key aspect being the
translation of method calls.
A .QL method is merely a particular kind of Datalog predicate
involving two special variables--this and result. The this variable
holds the value that is a member of the class, while the result
variable holds the result of the method. As an example, consider
the following method to compute a string representation of an
employee:
TABLE-US-00021
  class Employee extends @person {
    . . .
    string toString( ) { result="employee:"+this.getName( ) }
  }
This is translated into the following Datalog predicate:
TABLE-US-00022
  predicate Employee_toString(Employee this, string result) {
    exists(string name |
      Employee_getName(this,name) and result="employee:"+name)
  }
The translation of parameterless methods extends to methods taking
an arbitrary number of parameters, in which case the two parameters
this and result are simply added to the list of parameters. Apart
from the translation of method inheritance, which we will describe
shortly, there are few differences between the body of the method
and the body of the generated predicate. Class predicates are
similar, but as predicates do not return a value, the result
variable is not used.
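The idea that a method is simply a relation over (this, result) can be sketched in Python as follows. This is an illustrative model only: the sample data and the function names `employee_to_string` and `call_to_string` are assumptions, not part of the described system.

```python
# Hypothetical data: the Employee class and its getName method, both
# represented as relations (sets of tuples).
employee = {1, 2}
employee_get_name = {(1, "alice"), (2, "bob")}   # (this, result)

def employee_to_string():
    """Relation (this, result) mirroring the Employee_toString predicate."""
    return {
        (this, "employee:" + name)
        for this in employee
        for (receiver, name) in employee_get_name   # exists(string name | ...)
        if receiver == this
    }

def call_to_string(this):
    """A call e.toString() becomes a lookup on the generated relation."""
    return {result for (t, result) in employee_to_string() if t == this}

print(call_to_string(1))
```

A receiver outside the class simply yields no results, since no tuple of the relation matches it.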
The above example highlights one of the crucial advantages of .QL
methods over Datalog predicates, in addition to extensibility. In
Datalog, it is necessary to name each intermediate result, as is
the case with the name variable in the above example, which had to
be declared with exists. In contrast, methods returning (many)
values allow queries to be written in a much more concise and
readable manner.
Finally, constructors are simply translated to Datalog predicates
denoting the character of each class. For instance, consider the
definition of the Manager class discussed earlier:
    class Manager extends Employee {
        Manager() { manages(this, _) }
        ...
    }
The constructor for this class is translated into a predicate
defining precisely those persons who are Managers. These are the
Employees that additionally occur in the first column of the
manages relation in the database:
    predicate Manager(Employee this) { manages(this, _) }
When a class inherits from multiple classes, the translation is a
little more complicated. Consider the class
    class ParttimeManager extends Parttimer, Manager { ... }
This class extends both Parttimer and Manager, and thus a person is
a ParttimeManager exactly when she is both a Parttimer and a
Manager. This is encoded in the generated characteristic predicate
for ParttimeManager:
    predicate ParttimeManager(Manager this) { Parttimer(this) }
Despite the fact that ParttimeManager does not define a
constructor, it restricts the range of values that it encompasses
by inheritance, and thus this characteristic predicate must be
generated.
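The two characteristic predicates above can be modelled as membership tests, as in the following Python sketch. The sample relations are hypothetical, and the extent of Parttimer is simply assumed to be given, since its definition is not shown here.

```python
# Hypothetical extensional relations from the database.
employee = {"ann", "bob", "cy"}
parttimer = {"bob", "cy"}                     # assumed extent of Parttimer
manages = {("ann", "bob"), ("bob", "cy")}     # (manager, report)

def is_manager(this):
    """predicate Manager(Employee this) { manages(this, _) }"""
    return this in employee and any(m == this for (m, _) in manages)

def is_parttime_manager(this):
    """predicate ParttimeManager(Manager this) { Parttimer(this) }"""
    return is_manager(this) and this in parttimer

managers = {p for p in employee if is_manager(p)}
parttime_managers = {p for p in employee if is_parttime_manager(p)}
print(managers, parttime_managers)
```

The type of the this parameter contributes one conjunct (membership in the superclass) and the body contributes the other, so multiple inheritance amounts to intersection.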
Characteristic predicates are also used for the translation of
casts and instanceof tests. To guarantee an expression e to be of a
specific type T, a call to the characteristic predicate of T is
generated with the expression e passed as its single argument.
Translating Method Calls
In the above, we have described the translation of methods into
Datalog predicates with extra arguments this and result, and
informally shown some method calls translated into calls to the
generated predicates. In our examples, the translation was
straightforward, as the type of the receiver was known, and so it
was immediately apparent which predicate should be called. However,
as .QL uses virtual dispatch, the method that is actually used
depends on the value it is invoked on, and the previous translation
scheme cannot work in general.
To illustrate the translation of method dispatch in .QL, consider
the following class hierarchy:
    class All {
        All() { this=1 or this=2 or this=3 or this=4 }
        string foo() { result = "A" }
    }
    class OneOrTwo extends All {
        OneOrTwo() { this=1 or this=2 }
        string foo() { result = "B" }
    }
    class TwoOrThree extends All {
        TwoOrThree() { this=2 or this=3 }
        string foo() { result = "C" }
    }
As we have seen previously, each of the implementations of foo is
translated into a Datalog predicate:
    predicate All_foo(All this, string result) { result = "A" }
    predicate OneOrTwo_foo(OneOrTwo this, string result) { result = "B" }
    predicate TwoOrThree_foo(TwoOrThree this, string result) { result = "C" }
However, when a call to the foo method is encountered, the
appropriate methods must be chosen, depending on the value of the
receiver of the call. .QL method dispatch selects the most specific
methods, of which there may be several due to overlapping classes,
and returns results from all most specific methods. Only the most
specific methods are considered, so that a method is not included
if it is overridden by a matching method.
This virtual dispatch mechanism is implemented by defining a
dispatch predicate for each method, testing the receiver against
the relevant types and choosing appropriate methods. Testing the
type of the receiver is achieved by invoking the characteristic
predicate for each possible class, leading to the following
dispatch predicate for foo:
    predicate Dispatch_foo(All this, string result) {
        OneOrTwo_foo(this, result) or
        TwoOrThree_foo(this, result) or
        (not(OneOrTwo(this)) and not(TwoOrThree(this)) and
            All_foo(this, result))
    }
Let us examine this dispatch predicate a little more closely. The
parameter this is given type All, as this is the most general
possible type in this case. The body of the predicate consists of
three possibly overlapping cases. In the first case, the foo method
from OneOrTwo is called. Note that this only applies when this has
type OneOrTwo, due to the type of the this parameter in OneOrTwo_foo.
As OneOrTwo does not have any subclasses, its foo method cannot be
overridden, and whenever it is applicable it is necessarily the
most specific. The second case is symmetrical, considering the
class TwoOrThree. These two cases overlap when this=2, and so
the method can return several results. Finally, the third case is
the default case. If this did not match either of the specific
classes OneOrTwo or TwoOrThree, the default implementation in All
is chosen.
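To make the three cases concrete, the dispatch predicate can be transliterated into Python, with each method returning the set of results it produces for a given receiver. This is a sketch for illustration only; the Python function names are assumptions that mirror the generated predicate names.

```python
# Characteristic predicates; a subclass also requires membership in All.
def is_all(this):
    return this in {1, 2, 3, 4}

def is_one_or_two(this):
    return is_all(this) and this in {1, 2}

def is_two_or_three(this):
    return is_all(this) and this in {2, 3}

# Each implementation of foo only applies to members of its own class.
def all_foo(this):
    return {"A"} if is_all(this) else set()

def one_or_two_foo(this):
    return {"B"} if is_one_or_two(this) else set()

def two_or_three_foo(this):
    return {"C"} if is_two_or_three(this) else set()

def dispatch_foo(this):
    """Union of the three possibly overlapping cases of Dispatch_foo."""
    results = one_or_two_foo(this) | two_or_three_foo(this)
    if not is_one_or_two(this) and not is_two_or_three(this):
        results |= all_foo(this)      # default case
    return results

for value in (1, 2, 3, 4):
    print(value, sorted(dispatch_foo(value)))
```

For the receiver 2 both specific cases apply, so the call yields two results; for 4 only the default case applies.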
Suppose now that we extend the example by adding two further class
definitions, namely
    class OnlyTwo extends OneOrTwo, TwoOrThree {
        string foo() { result = "D" }
    }
    class AnotherTwo extends All {
        AnotherTwo() { this = 2 }
        string foo() { result = "E" }
    }
In this new hierarchy, we added two classes with exactly the same
characteristic predicate. This changes method dispatch whenever
this=2, as the newly introduced methods are more specific than
previous methods for this case. To extend the previous example with
these new classes, we simply lift out the new implementations of
foo:
    predicate OnlyTwo_foo(OnlyTwo this, string result) { result = "D" }
    predicate AnotherTwo_foo(AnotherTwo this, string result) { result = "E" }
and change the dispatch predicate accordingly:
    predicate Dispatch_foo(All this, string result) {
        OnlyTwo_foo(this, result) or
        AnotherTwo_foo(this, result) or
        (not(OnlyTwo(this)) and OneOrTwo_foo(this, result)) or
        (not(OnlyTwo(this)) and TwoOrThree_foo(this, result)) or
        (not(OneOrTwo(this)) and not(TwoOrThree(this)) and
            not(AnotherTwo(this)) and All_foo(this, result))
    }
The only change, apart from the introduction of cases for the two
new classes, is that the existing cases for OneOrTwo, TwoOrThree
and All must be amended to check whether the method is indeed the
most specific one.
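The rule behind both dispatch predicates can also be stated generically: collect every class whose characteristic predicate holds for the receiver, discard any class that has a matching subclass, and return the results of the remaining implementations. The Python sketch below is one possible reading of that rule, not the translation scheme itself. The table layout and function names are assumptions, and it relies on the simplification that every class in the hierarchy defines foo.

```python
# name -> (direct superclasses, own constructor condition, result of foo)
CLASSES = {
    "All":        ((), lambda v: v in {1, 2, 3, 4}, "A"),
    "OneOrTwo":   (("All",), lambda v: v in {1, 2}, "B"),
    "TwoOrThree": (("All",), lambda v: v in {2, 3}, "C"),
    "OnlyTwo":    (("OneOrTwo", "TwoOrThree"), lambda v: True, "D"),
    "AnotherTwo": (("All",), lambda v: v == 2, "E"),
}

def ancestors(name):
    """All strict superclasses of a class, transitively."""
    seen = set()
    stack = list(CLASSES[name][0])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(CLASSES[parent][0])
    return seen

def member(name, value):
    """Characteristic predicate: own condition plus every superclass's."""
    parents, condition, _ = CLASSES[name]
    return condition(value) and all(member(p, value) for p in parents)

def dispatch_foo(value):
    """Results of foo from the most specific classes matching the receiver."""
    matching = {c for c in CLASSES if member(c, value)}
    overridden = set()
    for c in matching:
        overridden |= ancestors(c)    # a matching subclass overrides these
    return {CLASSES[c][2] for c in matching - overridden}

for value in (1, 2, 3, 4):
    print(value, sorted(dispatch_foo(value)))
```

For the receiver 2, the classes OnlyTwo and AnotherTwo are both most specific, so two results are returned and the implementations in OneOrTwo, TwoOrThree and All are all skipped.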
Finally, a method call adorned with the chaining operator + or * is
translated to a call to a predicate encoding the transitive
closure, or reflexive transitive closure respectively, of the
original method call. For instance, d.getAChild+() translates to a
call to:
    predicate getAChildTC(Department X, Department Y) {
        Department_getAChild(X, Y) or
        exists(Department Z |
            getAChildTC(X, Z) and Department_getAChild(Z, Y))
    }
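The recursive predicate denotes the least fixpoint of its two rules. A minimal Python sketch of that fixpoint, using naive iteration over a hypothetical department relation, is given below; the function names and sample data are assumptions for illustration.

```python
def transitive_closure(step):
    """Least fixpoint of: TC(x, y) if step(x, y), or TC(x, z) and step(z, y)."""
    closure = set(step)
    while True:
        derived = {
            (x, y)
            for (x, z) in closure
            for (z2, y) in step
            if z == z2
        }
        if derived <= closure:        # nothing new: fixpoint reached
            return closure
        closure |= derived

def reflexive_transitive_closure(step, domain):
    """Closure for the * operator: additionally relates each element to itself."""
    return transitive_closure(step) | {(d, d) for d in domain}

# Hypothetical Department_getAChild relation: (parent, child).
get_a_child = {("hq", "eng"), ("eng", "tools"), ("hq", "sales")}
departments = {"hq", "eng", "tools", "sales"}

tc = transitive_closure(get_a_child)
rtc = reflexive_transitive_closure(get_a_child, departments)
print(sorted(tc))
```

Here the pair ("hq", "tools") is derived even though it is not in the original relation, because "tools" is a child of a child of "hq".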
Embodiments
One particular way the invention can be realized is depicted in
FIG. 2. A user 201 writes a query 203, using the library 202 of
re-usable queries that is created via classes in .QL. The query is
translated into a Datalog intermediate representation 205, and at
that level optimizations are applied. The Datalog is then converted
into SQL 206. Next, the SQL is run on a relational database 207,
producing the answer 204 and reporting it back to the user.
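The last two stages of this pipeline, conversion to SQL and execution on a relational database, might look as follows for the Manager characteristic predicate. This is a sketch under stated assumptions: the table and column names are hypothetical, the SQL is written by hand rather than produced by any actual translator, and an in-memory SQLite database stands in for database 207.

```python
import sqlite3

# Hypothetical schema and data standing in for the relational database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE manages (manager INTEGER, report INTEGER);
    INSERT INTO employee VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO manages VALUES (1, 2), (1, 3);
""")

# Datalog intermediate form:
#   predicate Manager(Employee this) { manages(this, _) }
# One possible SQL rendering: a join, with DISTINCT because the
# don't-care column may match several rows.
MANAGER_SQL = """
    SELECT DISTINCT e.name
    FROM employee AS e
    JOIN manages AS m ON m.manager = e.id
    ORDER BY e.name
"""

answer = [row[0] for row in conn.execute(MANAGER_SQL)]
print(answer)
```

The don't-care argument of manages becomes a column that is joined on but never selected.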
Other possible realizations are obtained by replacing SQL and the
database in FIG. 2 by other search engines. As an example, FIG. 10
is a block diagram of a computer system with multiple datasources
1006, 1007, and 1008 which can be searched by user 1001 from a
single .QL query 1002, 1003, 1004, and 1005, possibly simultaneously,
to return answer 1009. Note in particular the differences between
FIG. 2 and FIG. 10: the single database 207 has been replaced by
multiple datasources 1006, 1007 and 1008. This illustrates an
important aspect of the present invention, i.e., the capability to
simultaneously search across multiple data sources, regardless of the
format the data is stored in.
Examples of such other search engines are a search engine for XML
documents, a text-based search engine, an API for accessing web
services, an API for querying information on a social networking
site, any other search engine for social networking sites, an
online patent database search engine, a search engine for financial
data, enterprise search engines, tools for software asset
management (which store software assets in some searchable format),
and so on.
Non-Limiting Hardware Examples
Overall, the present invention can be realized in hardware or a
combination of hardware and software. The processing system
according to a preferred embodiment of the present invention can be
realized in a centralized fashion in one computer system, or in a
distributed fashion where different elements are spread across
several interconnected computer systems and image acquisition
sub-systems. Any kind of computer system--or other apparatus
adapted for carrying out the methods described herein--is suited. A
typical combination of hardware and software is a general-purpose
computer system with a computer program that, when loaded and
executed, controls the computer system such that it carries out the
methods described herein.
An embodiment of the processing portion of the present invention
can also be embedded in a computer program product, which comprises
all the features enabling the implementation of the methods
described herein, and which, when loaded in a computer system, is
able to carry out these methods. Computer program means or computer
programs in the present context mean any expression, in any
language, code or notation, of a set of instructions intended to
cause a system having an information processing capability to
perform a particular function either directly or after either or
both of the following: a) conversion to another language, code, or
notation; and b) reproduction in a different material form.
FIG. 12 is a block diagram of an information processing system such
as a computer system useful for implementing the software steps of
the present invention. Computer system 1200 includes a display
interface 1208 that forwards graphics, text, and other data from
the communication infrastructure 1202 (or from a frame buffer not
shown) for display on the display unit 1210. Computer system 1200
also includes a main memory 1206, preferably random access memory
(RAM), and optionally includes a secondary memory 1212. The
secondary memory 1212 includes, for example, a hard disk drive 1214
and/or a removable storage drive 1216, representing a floppy disk
drive, a magnetic tape drive, an optical disk drive, etc. The
removable storage drive 1216 reads from and/or writes to a
removable storage unit 1218 in a manner well known to those having
ordinary skill in the art. Removable storage unit 1218 represents
a CD, DVD, magnetic tape, optical disk, etc. which is read by and
written to by removable storage drive 1216. As will be appreciated,
the removable storage unit 1218 includes a computer usable storage
medium having stored therein computer software and/or data. The
terms "computer program medium," "computer usable medium," and
"computer readable medium" are used to generally refer to media
such as main memory 1206 and secondary memory 1212, removable
storage drive 1216, a hard disk installed in hard disk drive 1214,
and signals.
Computer system 1200 also optionally includes a communications
interface 1224. Communications interface 1224 allows software and
data to be transferred between computer system 1200 and external
devices. Examples of communications interface 1224 include a modem,
a network interface (such as an Ethernet card), a communications
port, a PCMCIA slot and card, etc. Software and data transferred
via communications interface 1224 are in the form of signals which
may be, for example, electronic, electromagnetic, optical, or other
signals capable of being received by communications interface 1224.
These signals are provided to communications interface 1224 via a
communications path (i.e., channel) 1226. This channel 1226 carries
signals and is implemented using wire or cable, fiber optics, a
phone line, a cellular phone link, an RF link, and/or other
communications channels.
Although specific embodiments of the invention have been disclosed,
those having ordinary skill in the art will understand that changes
can be made to the specific embodiments without departing from the
spirit and scope of the invention. The scope of the invention is
not to be restricted, therefore, to the specific embodiments.
Furthermore, it is intended that the appended claims cover any and
all such applications, modifications, and embodiments within the
scope of the present invention.