U.S. patent application number 11/615115 was filed with the patent office on 2006-12-22 and published on 2008-06-26 for English-language translation of exact interpretations of keyword queries.
This patent application is currently assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION. Invention is credited to Prasad M. Deshpande, Rajasekar Krishnamurthy, Sriram Raghavan, Shivakumar Vaithyanathan, Huaiyu Zhu.
Application Number | 20080154853 11/615115 |
Document ID | / |
Family ID | 39544343 |
Filed Date | 2006-12-22 |
United States Patent
Application |
20080154853 |
Kind Code |
A1 |
Deshpande; Prasad M.; et al. |
June 26, 2008 |
ENGLISH-LANGUAGE TRANSLATION OF EXACT INTERPRETATIONS OF KEYWORD
QUERIES
Abstract
The present invention relates to a methodology to translate
exact interpretations of keyword queries into meaningful and
grammatically correct plain-language queries in order to convey the
meaning of these interpretations to the initiator of the search.
The method includes the steps of generating at least one
grammatically valid plain-language sentence interpretation for a
keyword query, wherein the generated sentence is based upon differing
matching elements, and presenting the at least one grammatically valid
plain-language sentence interpretation for the keyword query to a
keyword query system user for the user's review.
Inventors: |
Deshpande; Prasad M.;
(Sunnyvale, CA) ; Krishnamurthy; Rajasekar;
(Mountain View, CA) ; Raghavan; Sriram; (San Jose,
CA) ; Vaithyanathan; Shivakumar; (Sunnyvale, CA)
; Zhu; Huaiyu; (Union City, CA) |
Correspondence
Address: |
CANTOR COLBURN LLP - IBM TUSCON DIVISION
20 Church Street, 22nd Floor
Hartford
CT
06103
US
|
Assignee: |
INTERNATIONAL BUSINESS MACHINES
CORPORATION
Armonk
NY
|
Family ID: |
39544343 |
Appl. No.: |
11/615115 |
Filed: |
December 22, 2006 |
Current U.S.
Class: |
1/1 ;
707/999.003; 707/E17.014; 707/E17.136 |
Current CPC
Class: |
G06F 16/9032
20190101 |
Class at
Publication: |
707/3 ;
707/E17.014 |
International
Class: |
G06F 7/10 20060101
G06F007/10; G06F 17/30 20060101 G06F017/30 |
Claims
1-17. (canceled)
18. A method for translating an interpretation of a keyword query
into a grammatically correct plain-language query statement, the
method comprising: acquiring at least one keyword to perform a
keyword query search upon; semantically interpreting the acquired
keyword, further including building a translation index to
determine matching elements, wherein matching elements are derived
from information comprising type names, attribute names, and atomic
attributes values that are associated with a specific keyword;
merging the matching elements in the event that differing keywords
comprise a same matching element and type alias; providing a clause
template for the customization of a plain-language sentence clause,
wherein the plain-language sentence clause is based upon the
matching elements that are selected for customization; generating
at least one plain-language sentence clause; determining if the
plain-language sentence clauses can be merged, wherein the
determination is based upon the attributes matched for a given type
element; specifying the plain-language sentence clauses that are to
be merged, the plain-language sentence clause mergers being based
upon the attributes matched for a given type element; merging the
plain-language sentence clauses; generating at least one
grammatically valid plain-language sentence for the keyword query
from the generated plain-language sentence clauses, wherein the
grammatically valid plain-language sentence is based upon differing
matching elements; presenting the at least one grammatically valid
plain-language sentence for the keyword query to a keyword query
system user for the user's review; providing a template for the
overall structure of the at least one grammatically valid
plain-language sentence; wherein the template comprises at least
one placeholder for the information that is contained within a
plain-language sentence clause; wherein the template includes a
plurality of templates, the templates are hierarchical in
structure, the templates being configured to generate clauses, and
sub-clauses that are comprised within the clauses, the clauses and
sub-clauses of the template being used to construct plain-language
sentences; wherein the plain-language sentence clauses are
classified as consecutively numbered types; wherein the templates
can be optionally labeled as having the capability of being merged,
in the event that the templates are labeled as having the
capability to be merged, then the clauses that correspond to the
templates are thereafter merged.
Description
BACKGROUND OF THE INVENTION
[0001] Field of the Invention
[0002] This invention relates to the field of information retrieval
techniques, in particular to the English language translation of
exact interpretations of keyword queries.
[0003] Description of Background
[0004] Before our invention, keyword searching was the most
important paradigm for Information Retrieval (IR). Conventionally,
an Avatar Semantic Search was accomplished by generating precise
queries from a keyword query that was based upon a domain-specific
system type. For a given keyword query, several possible
interpretations of the keyword query may be produced within a
search. Semantic optimizers using semantic knowledge and heuristics
operate to prune keyword query interpretations, wherein the
remaining keyword query interpretations are utilized to assist in
the keyword search. In structure, keyword query interpretations are
X-Path expressions, thus displaying the keyword query
interpretations directly to a user is of little value since the
interpretations cannot be easily understood and reviewed by the
user. Therefore, there exists a need for an approach for displaying
plain-language interpretations of X-Path expressions for review to
the initiator of an Avatar Semantic Search.
SUMMARY OF THE INVENTION
[0005] Aspects of the present invention relate to a methodology for
the translation of exact interpretations of keyword queries into
meaningful and grammatically correct plain-language queries in
order to convey the meaning of these interpretations to the
initiator of the keyword search.
[0006] The shortcomings of the prior art are overcome and
additional advantages are provided through the provision of a
method for translating an interpretation of a keyword query into a
grammatically correct plain-language query, the method comprising
the steps of acquiring at least one keyword to perform a keyword
query search upon, semantically interpreting the acquired keyword,
further including the step of building a translation index to
determine matching elements, wherein matching elements are derived
from information comprising type names, attribute names, and atomic
attributes values that are associated with a specific keyword.
[0007] The method further comprises the steps of merging the
matching elements in the event that differing keywords comprise the
same matching element and type alias, providing a clause template
for the customization of a plain-language sentence clause, wherein
the plain-language sentence clause is based upon the matching
elements that are selected for customization, and generating at
least one plain-language sentence clause, and determining if the
plain-language sentence clauses can be merged, wherein the
determination is based upon the matches on the attribute paths for
a given type element. Further, the method comprises the steps of
specifying the plain-language sentence clauses that are to be
merged, the plain-language sentence clause mergers being based on
the attribute paths for a given matching type element, and merging
the plain-language sentence clauses. Further, the method comprises
a language for specifying custom templates for generating clauses
and sentences.
[0008] Yet further, the method comprises the steps of generating at
least one grammatically valid plain-language sentence
interpretation for the keyword query from the generated
plain-language sentence clauses, wherein the grammatically valid
plain-language sentence is based upon differing matching elements,
and presenting at least one grammatically valid plain-language
sentence interpretation for the keyword query to a keyword query
system user for the user's review.
[0009] System and computer program products corresponding to the
above-summarized methods are also described and claimed herein.
[0010] Additional features and advantages are realized through the
techniques of the present invention. Other embodiments and aspects
of the invention are described in detail herein and are considered
a part of the claimed invention. For a better understanding of the
invention with advantages and features, refer to the description
and to the drawings.
[0011] As a result of the summarized invention, technically we have
achieved a solution that assists in the translation of
interpretations of keyword queries into meaningful and
grammatically correct plain-language queries, the meaning of these
interpretations thereafter being displayed to the initiator of the
search.
BRIEF DESCRIPTION OF THE DRAWINGS
[0012] The matter that is regarded as the invention is particularly
pointed out and distinctly claimed in the claims at the conclusion
of the specification. The foregoing and other objects, features,
and advantages of the invention are apparent from the following
detailed description taken in conjunction with the accompanying
drawings in which:
[0013] FIG. 1 illustrates one example of a flow diagram
illustrating aspects of the methodology that relates to the present
invention.
[0014] FIG. 2 illustrates one example of a flow diagram detailing
aspects of a clause merge operation.
[0015] FIG. 3 illustrates one example of a flow diagram detailing
aspects of a sentence generation customization operation.
[0016] The detailed description explains the preferred embodiments
of the invention, together with advantages and features, by way of
example with reference to the drawings.
DETAILED DESCRIPTION OF THE INVENTION
[0017] One or more exemplary embodiments of the invention are
described below in detail. The disclosed embodiments are intended
to be illustrative only since numerous modifications and variations
therein will be apparent to those of ordinary skill in the art.
[0018] Document collections often have valuable structured
information that is associated with each document that is present
within the collection. Traditional information retrieval (IR)
models used in keyword searching employ text-centric
representations of queries and documents (e.g. term vectors, bag of
index terms, etc.). As a result, such IR models are incapable of
effectively utilizing structured metadata as part of keyword
retrieval operations. To address the mismatch between the need for
a simple keyword-based search interface, and the need for complex
queries to exploit structured data, Avatar Semantic Search
operations employ the concept of query interpretation. In
particular, Avatar Semantic Searching enumerates several possible
interpretations of a keyword query and expresses each
interpretation as a complex query over the underlying collection of
documents.
[0019] Conventionally, query interpretation is the process of
generating a set of precise queries over a data set, one for each
possible interpretation of a given keyword query. An interpretation
for a keyword assigns specific semantics for the particular
keyword. By assigning specific semantics to each keyword in the
query, very precise interpretations for the query are subsequently
produced. Thus, given a keyword query, a system generates a set of
interpretations for that query.
[0020] Turning now to the drawings in greater detail, FIG. 1 shows
a flow diagram detailing aspects of the present keyword translation
methodology. At step 105, the party desiring the keyword search
supplies the keyword(s) that will form the basis of the search. At
step 110, the keyword search is initiated, and at step 115, a
clause is generated for each keyword match that occurs within the
query. Next, at step 120, the clauses generated for word matches
that have occurred within the search are combined into a single
clause. Lastly, at step 125, the clauses from the type match, path
match, and value match occurrences in the search are combined with
the keyword match clause to form a plain-language interpretation of
the keyword query search.
[0022] As an example, let us consider a keyword search over a body
of email documents. Given the task of looking for the telephone
number of an individual named Philip by locating an email message
in which the number is mentioned, a natural user query would be
`Philip telephone`. In the absence of any structured data, a
traditional IR engine would return documents that contain the
tokens `Philip` and `telephone` (ignoring synonym expansion,
stemming, etc.). Now assume that in addition to the actual text,
each document is automatically associated with four structured
attributes corresponding to the email headers: from, date, to, and
subject. Additionally, consider that the following text analysis
engines (TAEs) are executed over the entire email corpus:
[0023] 1. Entity recognition engines to extract names of persons
and organizations.
[0024] 2. Pattern recognition engines to extract telephone numbers
and URLs.
[0025] 3. Signature identifier to process email signatures and
extract persons, companies, websites, numbers, etc. from the text
of the signature.
[0026] In order to figure out possible interpretations for any
keyword, the system builds a translation index. The translation
index is a keyword-matching engine built over the set of all type
names (e.g., Email, Person, Telephone, . . . ), attribute names
(firstname, number, . . . ), and atomic attribute values (Philip,
pdf, 408, . . . ). This index allows us to restrict the potential
space of semantic interpretations for each keyword. Given a
keyword, the translation index returns a set of one or more
matching elements (types, paths, or values) from the semantic
catalog. Within aspects of the present invention, type matches are
based on type names, path matches are based on attribute names, and
value matches are based on the atomic attribute values. For
instance, given the keyword `telephone`, the translation index may
return a type match [type Telephone], and a path match [path
Signature.phone]. Similarly, given the keyword `Philip`, the
translation index may return one or more of the following value
matches: [val Person.name], [val Signature.person.name], [val
Email.from], and [val Email.to]. Notice how the type and path
matches are dependent only on the type system, while the value
matches are actually dependent on the data.
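By way of illustration only, the lookup behavior of a translation index of this kind might be sketched as follows. The catalog contents, the function names `build_translation_index` and `lookup`, and the direct association of the term `telephone` with the path Signature.phone are assumptions made for the example; the sketch does not represent the Avatar implementation.

```python
from collections import defaultdict

# Hypothetical semantic catalog: (kind, element, term the keyword is matched on).
# The association of "telephone" with Signature.phone is assumed for illustration.
CATALOG = [
    ("type", "Telephone", "telephone"),
    ("path", "Signature.phone", "telephone"),
    ("val", "Person.name", "philip"),
    ("val", "Signature.person.name", "philip"),
    ("val", "Email.from", "philip"),
    ("val", "Email.to", "philip"),
]


def build_translation_index(catalog):
    """Map each lowercased term to the (kind, element) pairs it matches."""
    index = defaultdict(list)
    for kind, element, term in catalog:
        index[term.lower()].append((kind, element))
    return index


def lookup(index, keyword):
    """Return the matching elements for a keyword, or an empty list."""
    return list(index.get(keyword.lower(), []))


index = build_translation_index(CATALOG)
print(lookup(index, "telephone"))
print(lookup(index, "Philip"))
```

In this sketch the type and path entries would be fixed by the type system, whereas the value entries would have to be rebuilt whenever the underlying data changes.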
[0027] During the query interpretation stage, each token in the
query is probed against the translation index to enumerate all
possible semantic interpretations. In our case, this step results
in:
Philip=>
[0028] (1) [val Email.from]
[0029] (2) [val Signature.person.name]
[0030] (3) [val Email.to]
[0031] (4) [word Email.body]
Telephone=>
[0032] (1) [type Telephone]
[0033] (2) [path Signature.phone]
[0034] (3) [word Email.body]
[0035] The fact that a token can be simply treated as a keyword is
reflected by the match [word Email.body] on the original document
text. Queries are generated by taking all of the possible
combinations of matches for each keyword. Some sample queries are
given below. The query labels below are designed to reflect the
interpretations used for each keyword.
[0036] q1;1 retrieve emails from Philip containing a telephone
number
[0037] q2;2 retrieve emails containing Philip's signature with a
telephone number
[0038] q3;1 retrieve emails sent to Philip containing a telephone
number
[0039] q1;3 retrieve emails from Philip containing the keyword
Telephone
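The enumeration of all combinations of matches can be sketched as a Cartesian product over the per-keyword match lists. This is a minimal illustration only: the function name `enumerate_interpretations` is hypothetical, the label format follows the sample labels above, and no pruning by a semantic optimizer is modeled.

```python
from itertools import product

# Candidate matches per keyword, in the order numbered in the lists above.
MATCHES = {
    "Philip": [
        "val Email.from",
        "val Signature.person.name",
        "val Email.to",
        "word Email.body",
    ],
    "Telephone": [
        "type Telephone",
        "path Signature.phone",
        "word Email.body",
    ],
}


def enumerate_interpretations(matches):
    """Return {label: {keyword: match}} for every combination of matches."""
    keywords = list(matches)
    numbered = [list(enumerate(matches[k], start=1)) for k in keywords]
    result = {}
    for combo in product(*numbered):
        # Label such as "q1;3": first match for keyword 1, third for keyword 2.
        label = "q" + ";".join(str(number) for number, _ in combo)
        result[label] = {k: match for k, (_, match) in zip(keywords, combo)}
    return result


interpretations = enumerate_interpretations(MATCHES)
print(len(interpretations))
print(interpretations["q1;3"])
```

With four candidate matches for the first keyword and three for the second, the sketch yields twelve interpretations before any pruning.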
[0040] Each of these query label interpretations corresponds to a
precise query over the data set. These precise queries are
evaluated, and the results of the evaluation are presented to the
user. Each interpretation of a query represents the particular
semantics for that query. It is very useful to display to the user
the semantics that the system is using, so that the user can see
the correlation between the results and the particular
interpretation. One way to display the semantics is to show the
precise query corresponding to the interpretation to the user.
However, the precise query is expressed in the Avatar query
language, and this language may prove to be difficult for the user
to understand without first having an understanding of the Avatar
object model and query language. An alternative approach to
informing the user of the relationship between the results and an
interpretation is to generate an English language equivalent for
the query interpretation, and display the English language
equivalent to the user. Such an interpretation will be easy for any
user to understand, and the user can also straightforwardly compare
the different interpretations, selecting the interpretation that
accurately captures what they intended for the query. For example,
see the English language interpretations of the query `Philip
telephone` as listed above.
[0041] The problem that this invention solves can be described as
follows:
[0042] 1. Given a set of keywords and their semantic
interpretations, generate a grammatically valid English sentence to
represent the interpretation.
[0043] 2. The sentence generation should be easily customizable so
that specific clauses can be generated for different types and
matches.
Generating Clauses:
[0044] The present invention provides solutions for generating a
clause for each match, and combines these clauses into a meaningful
sentence. There are four types of possible matches:
[0045] 1. Type match (type k T)--this indicates that the keyword k
matches the name of a type T in the system. For example, the
keyword `Telephone` generates a type match (type `Telephone`
Telephone).
[0046] 2. Path match (path k T.a.b)--this indicates that a keyword
k matches the name of an attribute path `a.b` for type T. Since the
type system is hierarchical, attributes can be other types. We use
a dot notation to denote a chain of attributes. For example, the
keyword `Telephone` generates a path match (path `Telephone`
Signature.phone).
[0047] 3. Value match (value k T.a.b)--this indicates that a
keyword k matches one of the values taken by an attribute path `a.b`
for type T in the body. For example, the keyword `Philip` generates
a value match (value `Philip` Signature.person.name) since there is
an instance of Signature in the body that has a person with name
`Philip`.
[0048] 4. Word match (word k)--this indicates that k is to be treated
simply as a keyword to match against the document. For example, the
keyword `Philip` generates a word match (word `Philip`).
[0049] For each kind of match, we have a default clause that gets
generated:
[0050] 1. Type match (type k T): the clause generated is either `a
T` or `an T` depending on the first letter of T. For example, (type
`Telephone` Telephone) generates `a Telephone`.
[0051] 2. Path match (path k T.a.b.c): the clause generated is
`a/an T having a/an a with a/an b with a/an c`. For example, (path
`Telephone` Signature.phone) generates the clause `a Signature
having a phone`.
[0052] 3. Value match (value k T.a.b.c): the clause generated is
`a/an T having a/an a with a/an b with a/an c containing k`. For
example, (value `Philip` Signature.person.name) generates the
clause "a Signature having a person with a name containing
`Philip`".
[0053] 4. Word match (word k): the clause generated is `k`. For
example, (word `Philip`) generates the clause `"Philip"`
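The four default clauses above can be sketched as a single function. This is an illustrative sketch under stated assumptions: the names `article`, `chain`, and `default_clause` are hypothetical, the article is chosen from the first letter only, and keywords are quoted with backquotes to follow the quoting used in the examples.

```python
def article(noun):
    """Pick 'a' or 'an' from the first letter of the noun."""
    return "an" if noun[0].lower() in "aeiou" else "a"


def chain(path):
    """Turn 'T.a.b.c' into 'a T having an a with a b with a c'."""
    head, *attrs = path.split(".")
    text = f"{article(head)} {head}"
    for position, attr in enumerate(attrs):
        joiner = "having" if position == 0 else "with"
        text += f" {joiner} {article(attr)} {attr}"
    return text


def default_clause(kind, keyword, path=None):
    """Generate the default clause for one match."""
    if kind == "type":
        return f"{article(path)} {path}"
    if kind == "path":
        return chain(path)
    if kind == "value":
        return f"{chain(path)} containing `{keyword}`"
    if kind == "word":
        return f"`{keyword}`"
    raise ValueError(f"unknown match kind: {kind}")


print(default_clause("type", "Telephone", "Telephone"))
print(default_clause("path", "Telephone", "Signature.phone"))
print(default_clause("value", "Philip", "Signature.person.name"))
print(default_clause("word", "Philip"))
```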
Combining the Clauses
[0054] The clauses generated from the matches are put together in a
sentence. With aspects of the present invention, the construction
of a valid sentence from clauses is based upon the grammatical
rules for the English language; however, the present methodology
can be adapted to conform to the grammatical rules of languages
other than English. In the present implementation, since the
sentence is of a very specific form, we can construct it in a more
direct manner.
[0055] Let Ck1, Ck2 . . . Ckm be the clauses from the word matches.
First, these clauses are put together into a single clause Ck=`the
keyword/s Ck1, Ck2 . . . Ckm`. For example, if there are two word
clauses `Philip` and `Telephone`, the combined clause Ck is "the
keywords `Philip` and `Telephone`".
[0056] Let C1, C2 . . . Cn be the clauses generated from type,
path, and value matches, together with the combined word clause Ck.
The final sentence will be of the form: "Retrieve documents that
contain C1, C2 . . . Cn". For example, consider an interpretation of
the keyword query `Philip telephone` that includes the matches (path
`Telephone` Signature.phone) and (word `Philip`). The clauses
generated are `a Signature having a phone` and "the keyword
`Philip`". Putting these together, we get the final sentence:
"Retrieve documents that contain a Signature having a phone and the
keyword `Philip`".
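The combination step can be sketched as follows. The helper names `join_list`, `combine_word_clauses`, and `build_sentence` are hypothetical, and the punctuation used for three or more clauses (commas with a final "and") is an assumption, since the examples in the text show only two clauses.

```python
def join_list(items):
    """Join clauses as 'x', 'x and y', or 'x, y and z' (assumed punctuation)."""
    if len(items) == 1:
        return items[0]
    return ", ".join(items[:-1]) + " and " + items[-1]


def combine_word_clauses(word_clauses):
    """Build the single clause Ck from the word-match clauses."""
    noun = "keyword" if len(word_clauses) == 1 else "keywords"
    return f"the {noun} {join_list(word_clauses)}"


def build_sentence(other_clauses, word_clauses):
    """Combine type/path/value clauses with the combined word clause."""
    clauses = list(other_clauses)
    if word_clauses:
        clauses.append(combine_word_clauses(word_clauses))
    if not clauses:
        raise ValueError("at least one clause is required")
    return "Retrieve documents that contain " + join_list(clauses)


print(combine_word_clauses(["`Philip`", "`Telephone`"]))
print(build_sentence(["a Signature having a phone"], ["`Philip`"]))
```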
Handling Type Merge
[0057] In some interpretations, the different keywords might match
the same type. For example, (value `Philip` Signature.person.name)
and (path `Telephone` Signature.phone) refer to the same type
Signature. In this event there are two possibilities: either the
two matches might refer to different signature instances, or they
refer to the same signature instance. The semantics of the two
choices are different. In one case, we are looking for emails that
contain a signature having a person with name `Philip`, and a
signature (may be same or different) having a phone number. In the
single instance case, we are looking for emails that contain a
signature having a person with name `Philip` and a phone number.
The process of having different matches for a type refer to the
same instance is called type merging (See FIG. 2). These two
choices are considered as separate interpretations and are
generated by the system using type merge. As shown at step 205, an
initial determination is made to assign a type instance to each
match.
[0058] The information about the type instance for any match is
also stored in an interpretation using a type alias. If the alias
for two matches is the same, they refer to the same instance (step
210). Adding type alias to our notation, the two choices are:
[0059] 1. (value `Philip` Signature.person.name s1), (path
`Telephone` Signature.phone s2) where the matches refer to
different instances of Signature s1 and s2.
[0060] 2. (value `Philip` Signature.person.name s1), (path
`Telephone` Signature.phone s1) where the matches refer to the
same instance of Signature s1.
[0061] To generate an appropriate English representation for an
interpretation with type merge, we first group matches by their
type alias. For example,
[0062] 1. If the matches are (value `Philip` Signature.person.name
s1), (path `Telephone` Signature.phone s2), we have two groups:
s1: {(value `Philip` Signature.person.name s1)} and s2: {(path
`Telephone` Signature.phone s2)}.
[0063] 2. If the matches are (value `Philip` Signature.person.name
s1), (path `Telephone` Signature.phone s1), we have a single
group s1: {(value `Philip` Signature.person.name s1), (path
`Telephone` Signature.phone s1)}.
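The grouping of matches by type alias can be sketched as follows. The tuple layout (kind, keyword, path, alias) and the function name `group_by_alias` are assumptions chosen for the example.

```python
from collections import defaultdict


def group_by_alias(matches):
    """Group (kind, keyword, path, alias) matches by their type alias."""
    groups = defaultdict(list)
    for match in matches:
        alias = match[3]
        groups[alias].append(match)
    return dict(groups)


# Different Signature instances: two groups.
separate = [
    ("value", "Philip", "Signature.person.name", "s1"),
    ("path", "Telephone", "Signature.phone", "s2"),
]
# Same Signature instance (type merge): one group.
merged = [
    ("value", "Philip", "Signature.person.name", "s1"),
    ("path", "Telephone", "Signature.phone", "s1"),
]

print(group_by_alias(separate))
print(group_by_alias(merged))
```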
[0064] Type merge affects the way clauses are generated for
matches. Type merge is not applicable for a type match, since the
system automatically prunes multiple type matches to the same type.
Type merge is also not applicable for a word match, since word
matching is for the document content and not any particular type
instance. Let us now revisit the clause generation for path and
value matches. Type merge implies a clause merge on the generated
English clause. Rather than generating a clause for each match, we
generate a clause for each group when matches are grouped by the
type alias (step 215). The clause for a group has the type
mentioned once and has a sub-clause for each different match in the
group. Consider these examples:
1. Only Path Matches
[0065] After grouping by type aliasing, consider a group that
contains
[0066] t1: {(path K1 T.a.b.c t1), (path K2 T.e.f t1)}
[0067] The clause generated is `a/an T having a/an a with a/an b
with a/an c and a/an e with a/an f`.
[0068] For example, the clause for the interpretation with the
group s1: {(path `Philip` Signature.person.name s1), (path
`Telephone` Signature.phone s1)} will be `a Signature having a
person with a name and a phone`.
2. Only Value Matches
[0069] The different value matches might refer to the same path or
different paths on the type. To handle these cases, we do a further
grouping by the path used in the value matches.
[0070] A> different paths
[0071] t1: {(value K1 T.a.b.c t1), (value K2 T.e.f t1)}
[0072] The clause generated is `a/an T having a/an a with a/an b
with a/an c containing K1 and a/an e with a/an f containing K2`.
B> common path a.b.c
[0073] t1: {(value K1 T.a.b.c t1), (value K2 T.a.b.c t1)}
[0074] The clause generated is `a/an T having a/an a with a/an b
with a/an c containing K1 and K2`.
[0075] For example, the clause for the interpretation with the
group s1: {(value `Philip` Signature.person.name s1), (value
`Thomas` Signature.person.name s1)} will be "a Signature having a
person with a name containing `Philip` and `Thomas`"
3. Both Path and Value Matches
[0076] We combine the steps described in 1 and 2. Consider a group
that contains:
[0077] t1: {(path K1 T.a.b.c t1), (value K2 T.e.f t1) (value K3
T.e.f t1)}
[0078] The clause generated is `a/an T having a/an a with a/an b
with a/an c and a/an e with a/an f containing K2 and K3`.
[0079] For example, the clause for the interpretation with the
group s1: {(value `Philip` Signature.person.name s1), (path
`Telephone` Signature.phone s1)} will be "a Signature having a phone
and a person with a name containing `Philip`".
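The three cases above can be sketched with one function that handles a single alias group. The sketch assumes that every match in the group shares the same type, that path sub-clauses are emitted before value sub-clauses (as in case 3), and that value matches on a common path share one "containing" phrase; the names `attr_chain` and `merged_clause` are hypothetical.

```python
def article(noun):
    """Pick 'a' or 'an' from the first letter of the noun."""
    return "an" if noun[0].lower() in "aeiou" else "a"


def attr_chain(attrs):
    """Turn ['person', 'name'] into 'a person with a name'."""
    return " with ".join(f"{article(attr)} {attr}" for attr in attrs)


def merged_clause(group):
    """Build one clause for (kind, keyword, path, alias) matches sharing an alias."""
    type_name = group[0][2].split(".")[0]
    path_subs = []
    values_by_path = {}  # attribute path -> quoted keywords, in insertion order
    for kind, keyword, path, _alias in group:
        attrs = tuple(path.split(".")[1:])
        if kind == "path":
            path_subs.append(attr_chain(attrs))
        elif kind == "value":
            values_by_path.setdefault(attrs, []).append(f"`{keyword}`")
        else:
            raise ValueError("type merge applies only to path and value matches")
    value_subs = [
        f"{attr_chain(attrs)} containing {' and '.join(keywords)}"
        for attrs, keywords in values_by_path.items()
    ]
    subs = " and ".join(path_subs + value_subs)
    return f"{article(type_name)} {type_name} having {subs}"


group = [
    ("value", "Philip", "Signature.person.name", "s1"),
    ("path", "Telephone", "Signature.phone", "s1"),
]
print(merged_clause(group))
```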
Customizing the Sentence Generation
[0080] The algorithm presented until now treats all types
uniformly, and generates clauses for them based on type and
attribute names. However, very often users want to customize the
plain-language English sentence that is generated. The sentence is
more readable if customized clauses are generated for certain types
and their matches. For example, rather than saying:
"Signature having a person with a name containing `Philip`", one
can say "Philip's Signature".
[0081] We have defined a template-based algorithm for allowing
these customizations (See FIG. 3). At step 305, the user can
provide a clause template for the types and matches that she wants
to customize. At step 310, the custom clauses are generated from
these templates. A design issue to consider is the level of
sentence customization that can be allowed. For example, a given
type T can have multiple attributes (and consequently attribute
paths). Due to type merge, we may have multiple paths matching for
the same type instance. To be very general, we will need to be able
to specify a clause for matches on each subset of attributes for a
type. Consider the type Signature that has the attributes
person.name and phone. In the instance of the match (value `Philip`
Signature.person.name s1), we want to generate the clause "Philip's
Signature", and for the match (path `Telephone` Signature.phone
s2), we want to generate the clause "Signature having a phone
number".
[0082] In the event that the two types are merged, the ideal clause
to be generated is "Philip's Signature having his phone number".
There is no obvious way to generate this from the two individual
clauses specified by the user. The user has to specify this merged
clause explicitly to be used in case there is a match on both
person.name and phone for a given instance of Signature. Specifying
a clause
for each subset of attributes leads to an exponential blowup in the
number of clause templates that can be specified. As a tradeoff,
users are allowed to specify templates for each path separately and
also determine if these templates can be merged. If merging is
allowed (step 315), our algorithm will merge the clauses
automatically (step 320). The details of templates and algorithms
utilized within aspects of the invention are explained below.
Template Specification:
[0083] A template is a string that comprises embedded processing
instructions and placeholders. The placeholders and instructions
are specified within the characters "<<" and ">>".
Templates are arranged hierarchically, and further a template is
provided for an overall sentence. Within aspects of the present
invention templates have placeholders for clauses, wherein each
clause is generated using a template. A clause can have sub-clauses
depending on the match type. An example of a simple sentence
template is "Retrieve all emails <<CLAUSE0>><<CLAUSE1>>". This
template has two placeholders <<CLAUSE0>> and <<CLAUSE1>>. The
constructs allowed in templates are described below:
[0084] <<CLAUSEX>>: This is a placeholder for a clause
of type X. Clauses can be of different types that are numbered as
0, 1 . . . n. A clause of type X will be inserted at the location
of <<CLAUSEX>>. Having clauses of different types
enables us to enforce positional constraints on where different
clauses occur in the final sentence.
[0085] <<TRIPLE: s1: s2: s3>>: This is a processing
instruction and provides a mechanism for generating different
strings depending on the position of the clause. For example, let T
be a template that has the instruction <<TRIPLE: s1: s2: s3>>, and E
be the enclosing template, i.e. T generates a
clause that is inserted into E. The semantics of these templates
are represented as:
[0086] a> If T is the first clause to be inserted into E, then
the TRIPLE generates s1 in T
[0087] b> If T is the last but not the first clause to be
inserted in E, then the TRIPLE generates s3 in T
[0088] c> If T is neither the first nor the last clause to be
inserted in E, then the TRIPLE generates s2 in T
[0089] For example, let the template for type match on Signature be
T1="<<TRIPLE: that contain:,: and >> a signature". The
template for type match on Phone is T2="<<TRIPLE: that
contain:,: and >> a phone number". T1 and T2 are clauses of
type 0. The enclosing template is the sentence template E="Retrieve
all emails<<CLAUSE0>>". If the interpretation has two
type matches, first on Signature and the second on Phone, then
applying the semantics of TRIPLE, the first clause generated is
"that contain a signature" and the second clause is "and a phone
number". Substituting these in the enclosing template E, we get
"Retrieve all emails that contain a signature and a phone number".
The TRIPLE allows us to generate "that contain" in one case and
"and" in the other case depending on where the clause will be placed
in the sentence.
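The TRIPLE semantics can be sketched with a regular-expression substitution. This is an illustrative sketch only: the names `expand_triple` and `fill_sentence` are hypothetical, whitespace inside each alternative is treated literally (so the example templates carry their own leading spaces and omit the space before the closing delimiter), and an alternative may not itself contain a colon.

```python
import re

# <<TRIPLE:s1:s2:s3>> with three colon-separated alternatives.
TRIPLE = re.compile(r"<<TRIPLE:(.*?):(.*?):(.*?)>>")


def expand_triple(template, position, count):
    """Expand TRIPLE for the clause at 0-based position out of count clauses."""
    def pick(match):
        first, middle, last = match.groups()
        if position == 0:
            return first   # first clause inserted into the enclosing template
        if position == count - 1:
            return last    # last, but not first
        return middle      # neither first nor last
    return TRIPLE.sub(pick, template)


def fill_sentence(sentence_template, clause_templates):
    """Expand every type-0 clause and substitute the result for <<CLAUSE0>>."""
    count = len(clause_templates)
    clauses = [
        expand_triple(template, position, count)
        for position, template in enumerate(clause_templates)
    ]
    return sentence_template.replace("<<CLAUSE0>>", "".join(clauses))


T1 = "<<TRIPLE: that contain:,: and>> a signature"
T2 = "<<TRIPLE: that contain:,: and>> a phone number"
E = "Retrieve all emails<<CLAUSE0>>"
print(fill_sentence(E, [T1, T2]))
```

Because a single clause is both first and last, the sketch gives precedence to the first-clause rule, which follows rule b> above ("last but not the first").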
[0090] <<K>> This is a placeholder for a value in a
word match.
[0091] <<V>> This is a placeholder for a value in a
value match.
[0092] <<SET: Var>> This sets a Boolean variable called
Var to true.
[0093] <<CHKRST: Var: s1>> This checks the status of
the variable Var. If Var is set, the string s1 is generated in the
clause and Var is reset to false. Otherwise, nothing is generated
and this instruction has no effect.
[0094] SET and CHKRST give more fine-grained control over the strings
to generate and might be useful in cases where TRIPLE is not
sufficient. This template specification language is powerful enough
to handle a great assortment of linguistic cases.
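The SET and CHKRST instructions can be sketched as a left-to-right pass over a sequence of clause templates that share one set of Boolean variables. The function name `render`, the variable name `NEEDS_AND`, the sample templates, and the assumption that variables persist across the clauses of one sentence are all hypothetical choices made for this example.

```python
import re

# Matches <<SET: Var>> and <<CHKRST: Var: s1>>.
INSTRUCTION = re.compile(r"<<(SET|CHKRST):\s*(\w+)(?::(.*?))?>>")


def render(templates):
    """Process templates in order, sharing Boolean variables between them."""
    flags = {}

    def run(match):
        op, var, text = match.group(1), match.group(2), match.group(3)
        if op == "SET":
            flags[var] = True
            return ""
        # CHKRST: emit the string once if the variable is set, then reset it.
        if flags.get(var):
            flags[var] = False
            return text or ""
        return ""

    return [INSTRUCTION.sub(run, template) for template in templates]


clauses = render([
    "a signature<<SET: NEEDS_AND>>",
    "<<CHKRST: NEEDS_AND: and >>a phone number",
    "<<CHKRST: NEEDS_AND: and >>a URL",
])
print(clauses)
```

In this sketch the third template emits nothing for its CHKRST instruction, because the second template has already reset the variable.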
[0095] Next we will describe what templates need to be specified
for the different cases:
[0096] Sentence template: This is the overall template of the
sentence. This will have placeholders for <<CLAUSEX>>
to indicate where the clauses of different types are to be
inserted.
EXAMPLE
[0097] 1. Sentence Template="Retrieve all emails
<<CLAUSE0>><<CLAUSE1>>"
[0098] 2. Type match template: For each type, we specify:
[0099] a> a template that generates the clause for a match on
that type. This clause will be substituted into the sentence
template.
[0100] b> the type of the clause generated.
[0101] We will refer to these templates as Type Match Template.
Example: for type `Telephone`:
[0102] Type Match template="<<TRIPLE: that contain:,: and
>> a phone number" type=1
[0103] 3. Path and Value matches: Path and value matches are
affected by type merges. So the templates for them are comprised of
multiple parts that allow generation of merged clauses.
[0104] I> First for each type, we specify:
[0105] a> a template that generates the type part of the clause
for a path or value match.
[0106] b> the type of the clause generated.
[0107] We will refer to these templates as Path/Value Match Type
Templates. Example: for type `signature`:
[0108] Path/Value Match Type template="<<TRIPLE: that
contain:,:
and>><<CLAUSE0>>signature<<CLAUSE1>>"
[0109] type=1
[0110] II> For each path, for both value and path matches we
specify:
[0111] a> a template that generates a sub-clause that gets
inserted into the type template.
[0112] b> the type of the clause generated.
[0113] c> mergeable flag--indicates if this clause can be merged
with other clauses as a result of type merge. For some custom
clauses, this merging might not make sense, so this flag is set to
false. Note that if this flag is false, the clause template should
typically contain the type part as well, since it is not obtained
by merging with the Path/Value Match Type Template.
[0114] We will refer to these templates as Path/Value Match Path
Templates. Example: for type `Signature` and path
`person.name`:
[0115] Value Match Path template="<<CLAUSE>>'s "
[0116] type=0
[0117] mergeable=true
[0118] Example: for type `Signature` and path `phone`:
[0119] Path Match Path template="<<TRIPLE: having:,:
and>> a phone number"
[0120] type=1
[0121] mergeable=true
[0122] III> For each path, we also specify a value match template
that is applicable for value matches:
[0123] a> a template that generates the value clause to be
inserted into the path clause generated by the Path Match
template.
[0124] We will refer to these templates as Value Match Value
templates. Example: for type `Signature` and path
`person.name`:
[0125] Value Match Value template=<<TRIPLE::,:
and>><<V>>
[0126] 4. Word matches: we specify
[0127] a> a template to generate the keyword clause that will be
inserted into the sentence template.
[0128] b> the type of the clause generated.
[0129] We refer to this template as Word Match template. Example: a
keyword template could be,
[0130] Word Match template=<<TRIPLE: that contain:,:
and>><<K>>
[0131] type=1
[0132] Consider an interpretation that has the matches:
[0133] (value `Philip` Signature.person.name s1)
[0134] (path `Telephone` Signature.phone s1)
[0135] Note that the types have been merged.
[0136] a> the Value Match Value template for
signature.person.name is "<<TRIPLE::,: and>>
<<V>>" For the value `Philip` this resolves to,
"Philip". The TRIPLE generates an empty string since this is the
first value in the enclosing template.
[0137] b> the Value Match Path template for value match of
Signature.person.name is "<<CLAUSE>>'s" with type=0 and
mergeable=true. Substituting the value clause, this resolves to
"Philip's ". This is a clause of type 0.
[0138] c> the Path Match Path template for path match on
Signature.phone is "<<TRIPLE: having:,: and >> a phone
number" with type=1 and mergeable=true. Since this is the first
clause of type 1, this resolves to "having a phone number". This is
a clause of type 1.
[0139] d> the Path/Value Match Type template for path and value
matches for Signature is "<<TRIPLE: that contain:,:
and>><<CLAUSE0>>signature<<CLAUSE1>>"
with type=1. Substituting the clauses generated in steps b and c in
their appropriate places and resolving the TRIPLE, we get "that
contain Philip's signature having a phone number". We could do
this since the clauses generated in b and c are both mergeable. This is
a clause of type 1.
[0140] e> finally, substituting this into the sentence template
"Retrieve all
emails<<CLAUSE0>><<CLAUSE1>>", we get the
final sentence "Retrieve all emails that contain Philip's signature
having a phone number".
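Steps a through e can be traced with a short Python sketch. It is offered only as an illustration under stated assumptions: the helper names (triple, tidy, fill), the whitespace and apostrophe clean-up, and the choice to pass the clause position explicitly are introduced here and are not part of the described system. The type template uses the lowercase literal "signature", as in the Path/Value Match Type template example.

```python
import re

TRIPLE_RE = re.compile(r"<<TRIPLE:(.*?):(.*?):(.*?)>>")


def triple(template: str, index: int, total: int) -> str:
    """Pick s1, s2 or s3 depending on the clause position (first, middle, last)."""
    def pick(match: re.Match) -> str:
        s1, s2, s3 = (g.strip() for g in match.groups())
        return s1 if index == 0 else (s3 if index == total - 1 else s2)
    return TRIPLE_RE.sub(pick, template)


def tidy(text: str) -> str:
    """Assumed clean-up: collapse whitespace, close up commas and possessives."""
    return " ".join(text.split()).replace(" ,", ",").replace(" 's", "'s")


def fill(template: str, **slots) -> str:
    """Replace each <<NAME>> placeholder with its clauses (possibly none)."""
    for name, clauses in slots.items():
        template = template.replace(f"<<{name}>>", " " + " ".join(clauses) + " ")
    return tidy(template)


# a> Value Match Value template; first value, so the TRIPLE yields an empty string.
value = fill(triple("<<TRIPLE::,: and>> <<V>>", 0, 1), V=["Philip"])

# b> Value Match Path template (type 0, mergeable).
type0 = fill("<<CLAUSE>>'s ", CLAUSE=[value])

# c> Path Match Path template (type 1, mergeable); first clause of type 1.
type1 = tidy(triple("<<TRIPLE: having:,: and >> a phone number", 0, 1))

# d> Path/Value Match Type template (type 1) with the merged sub-clauses.
type_clause = fill(
    triple("<<TRIPLE: that contain:,: and>><<CLAUSE0>>signature<<CLAUSE1>>", 0, 1),
    CLAUSE0=[type0],
    CLAUSE1=[type1],
)

# e> Sentence template; there is no type-0 clause at the sentence level.
sentence = fill(
    "Retrieve all emails<<CLAUSE0>><<CLAUSE1>>",
    CLAUSE0=[],
    CLAUSE1=[type_clause],
)
print(sentence)
```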
[0141] Thus the template based sentence generation methodologies of
the present invention allow for the straightforward customization
of generated English sentences. If customization for a type or path
is not needed, then the user does not have to specify the type or
path. In these cases, the system will automatically use default
templates that will generate sentences as described initially. In
the signature example, with default templates the system will
generate:
[0142] "Retrieve documents that contain a signature having a person
with name containing `Philip` and a phone".
[0143] The capabilities of the present invention can be implemented
in software, firmware, hardware or some combination thereof.
[0144] As one example, one or more aspects of the present invention
can be included in an article of manufacture (e.g., one or more
computer program products) having, for instance, computer usable
media. The media has embodied therein, for instance, computer
readable program code means for providing and facilitating the
capabilities of the present invention. The article of manufacture
can be included as a part of a computer system or sold
separately.
[0145] Additionally, at least one program storage device readable
by a machine, tangibly embodying at least one program of
instructions executable by the machine to perform the capabilities
of the present invention can be provided.
[0146] The flow diagrams depicted herein are just examples. There
may be many variations to these diagrams or the steps (or
operations) described therein without departing from the spirit of
the invention. For instance, the steps may be performed in a
differing order, or steps may be added, deleted or modified. All of
these variations are considered a part of the claimed
invention.
[0147] While the preferred embodiment of the invention has been
described, it will be understood that those skilled in the art,
both now and in the future, may make various improvements and
enhancements which fall within the scope of the claims which
follow. These claims should be construed to maintain the proper
protection for the invention first described.
* * * * *