U.S. patent application number 16/257800 was filed with the patent office on 2019-01-25 and published on 2019-07-25 for method and apparatus for aggregating with information generalization.
The applicant listed for this patent is ARRIA DATA2TEXT LIMITED. Invention is credited to William Anthony Bradshaw, Ehud Baruch Reiter.
Publication Number | 20190228077 |
Application Number | 16/257800 |
Family ID | 54367978 |
Filed Date | 2019-01-25 |
![](/patent/app/20190228077/US20190228077A1-20190725-D00000.png)
![](/patent/app/20190228077/US20190228077A1-20190725-D00001.png)
![](/patent/app/20190228077/US20190228077A1-20190725-D00002.png)
![](/patent/app/20190228077/US20190228077A1-20190725-D00003.png)
![](/patent/app/20190228077/US20190228077A1-20190725-D00004.png)
![](/patent/app/20190228077/US20190228077A1-20190725-D00005.png)
United States Patent Application | 20190228077 |
Kind Code | A1 |
Inventors | Bradshaw; William Anthony; et al. |
Publication Date | July 25, 2019 |
Method And Apparatus For Aggregating With Information
Generalization
Abstract
Methods, apparatuses, and computer program products are
described herein that are configured to perform aggregation of
phrase specifications. In some example embodiments, a method is
provided that comprises identifying two or more generalized phrase
specifications. In some example embodiments, the two or more
generalized phrase specifications contain at least one aggregatable
constituent. The method of this embodiment may also include
generating an aggregated phrase specification from the two or more
generalized phrase specifications. In some example embodiments, the
aggregated phrase specification comprises a combined noun phrase
generated from the aggregatable constituents and one or more
additional constituents based on a determined level of
generalization.
Inventors: | Bradshaw; William Anthony (Aberdeen, GB); Reiter; Ehud Baruch (Aberdeen, GB) |
Applicant: |
Name | City | State | Country | Type |
ARRIA DATA2TEXT LIMITED | Aberdeen | | GB | |
Family ID: | 54367978 |
Appl. No.: | 16/257800 |
Filed: | January 25, 2019 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number | Continuation Application Number |
15421925 | Feb 1, 2017 | 10216728 | 16257800 |
14702325 | May 1, 2015 | 9600471 | 15421925 |
PCT/US2012/063343 | Nov 2, 2012 | | 14702325 |
Current U.S. Class: | 1/1 |
Current CPC Class: | G06F 40/289 20200101; G06F 40/56 20200101; G06F 40/205 20200101; G06F 40/40 20200101 |
International Class: | G06F 17/28 20060101 G06F017/28; G06F 17/27 20060101 G06F017/27 |
Claims
1-30. (canceled)
31. A computer-implemented method for generating language by
transforming raw input data that is at least partially expressed in
a non-linguistic format into a format that can be expressed
linguistically in a textual output, the method comprising:
identifying one or more phrase specifications generated based on
the raw input data, wherein each of the one or more phrase
specifications comprises at least one aggregatable constituent;
generating an aggregated phrase specification from one or more
generalized phrase specifications, wherein the aggregated phrase
specification comprises one or more of a combined noun phrase
generated from the at least one aggregatable constituent and one or
more additional constituents based on a determined level of
generalization; and generating the textual output based at least in
part on the aggregated phrase specification, such that the textual
output is displayable via a user interface.
32. The method of claim 31, further comprising: generating the one
or more generalized phrase specifications based on the one or more
phrase specifications.
33. The method of claim 32, wherein generating the one or more
generalized phrase specifications comprises: determining one or
more constituents that are removable in one or more phrase
specifications; and removing the one or more constituents that are
removable.
34. The method of claim 32, further comprising: identifying a
domain model based on the plurality of phrase specifications,
wherein the domain model includes at least one domain rule.
35. The method of claim 34, wherein generating the one or more
generalized phrase specifications comprises: determining one or
more constituents that are removable in one or more phrase
specifications based at least in part on the at least one domain
rule; and removing the one or more constituents that are
removable.
36. The method of claim 32, wherein generating the one or more
generalized phrase specifications comprises: determining one or
more constituents that are generalizable in one or more phrase
specifications; determining a generalized constituent for at least
one of the one or more generalizable constituents; and replacing
the one or more generalizable constituents with the generalized
constituent.
37. The method of claim 36, further comprising: removing all of one
or more constituents that are identified as removable from the one
or more phrase specifications prior to determining the one or more
constituents that are generalizable in the one or more phrase
specifications.
38. The method of claim 36, wherein the generalized constituent is
a most generalized constituent in a predefined constituent
listing.
39. The method of claim 31, further comprising: identifying two or
more generalized phrase specifications, wherein the two or more
generalized phrase specifications contain at least one aggregatable
constituent; and generating the aggregated phrase specification
from the two or more generalized phrase specifications, wherein the
aggregated phrase specification comprises one or more of a combined
noun phrase generated from the at least one aggregatable
constituent and one or more additional constituents based on a
determined level of generalization.
40. The method of claim 39, further comprising: determining that
the two or more generalized phrase specifications are still
identified as aggregatable with one of a constituent that was
removed or a constituent that is less generalized than a
generalized constituent; and populating the aggregated phrase
specification with at least one of a generalized constituent, a removed
constituent, or the constituent that is less generalized than the
generalized constituent based on a predefined constituent
listing.
41. An apparatus for generating language by transforming raw input
data that is at least partially expressed in a non-linguistic
format into a format that can be expressed linguistically in a
textual output, the apparatus comprising at least one processor and
at least one memory including computer program code, the at least
one memory and the computer program code configured to, with the at
least one processor, cause the apparatus to: identify one or more
phrase specifications generated based on the raw input data,
wherein each of the one or more phrase specifications comprises at
least one aggregatable constituent; generate an aggregated phrase
specification from one or more generalized phrase specifications,
wherein the aggregated phrase specification comprises one or more
of a combined noun phrase generated from the at least one
aggregatable constituent and one or more additional constituents
based on a determined level of generalization; and generate the
textual output based at least in part on the aggregated phrase
specification, such that the textual output is displayable via a
user interface.
42. The apparatus of claim 41, wherein the at least one memory and
the computer program code configured to, with the at least one
processor, further cause the apparatus to: generate the one or more
generalized phrase specifications based on the one or more phrase
specifications.
43. The apparatus of claim 42, wherein generating the one or more
generalized phrase specifications comprises: determining one or
more constituents that are removable in one or more phrase
specifications; and removing the one or more constituents that are
removable.
44. The apparatus of claim 42, wherein the at least one memory and
the computer program code configured to, with the at least one
processor, further cause the apparatus to: identify a domain model
based on the plurality of phrase specifications, wherein the domain
model includes at least one domain rule.
45. The apparatus of claim 44, wherein generating the one or more
generalized phrase specifications comprises: determining one or
more constituents that are removable in one or more phrase
specifications based at least in part on the at least one domain
rule; and removing the one or more constituents that are
removable.
46. The apparatus of claim 42, wherein generating the one or more
generalized phrase specifications comprises: determining one or
more constituents that are generalizable in one or more phrase
specifications; determining a generalized constituent for at least
one of the one or more generalizable constituents; and replacing
the one or more generalizable constituents with the generalized
constituent.
47. The apparatus of claim 46, wherein the at least one memory and
the computer program code configured to, with the at least one
processor, further cause the apparatus to: remove all of one or
more constituents that are identified as removable from the one or
more phrase specifications prior to determining the one or more
constituents that are generalizable in the one or more phrase
specifications.
48. The apparatus of claim 46, wherein the generalized constituent
is a most generalized constituent in a predefined constituent
listing.
49. The apparatus of claim 41, wherein the at least one memory and
the computer program code configured to, with the at least one
processor, further cause the apparatus to: identify two or more
generalized phrase specifications, wherein the two or more
generalized phrase specifications contain at least one aggregatable
constituent; and generate the aggregated phrase specification from
the two or more generalized phrase specifications, wherein the
aggregated phrase specification comprises one or more of a combined
noun phrase generated from the at least one aggregatable
constituent and one or more additional constituents based on a
determined level of generalization.
50. The apparatus of claim 49, wherein the at least one memory and
the computer program code configured to, with the at least one
processor, further cause the apparatus to: determine that the two
or more generalized phrase specifications are still identified as
aggregatable with one of a constituent that was removed or a
constituent that is less generalized than a generalized
constituent; and populate the aggregated phrase specification with
at least one of a generalized constituent, a removed constituent, or the
constituent that is less generalized than the generalized
constituent based on a predefined constituent listing.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation of U.S. application Ser.
No. 15/421,925, titled "METHOD AND APPARATUS FOR AGGREGATING WITH
INFORMATION GENERALIZATION," filed Feb. 1, 2017, which is a
continuation of U.S. application Ser. No. 14/702,325 filed May 1,
2015, now U.S. Pat. No. 9,600,471, which is a continuation of and
claims priority to International Application No. PCT/US2012/063343,
filed Nov. 2, 2012, the contents of which are incorporated herein
by reference in their entirety.
TECHNOLOGICAL FIELD
[0002] Embodiments of the present invention relate generally to
natural language generation technologies and, more particularly,
relate to a method, apparatus, and computer program product for
aggregating phrase specifications.
BACKGROUND
[0003] In some examples, a natural language generation (NLG) system
is configured to transform raw input data that is expressed in a
non-linguistic format into a format that can be expressed
linguistically, such as through the use of natural language. For
example, raw input data may take the form of a value of a stock
market index over time and, as such, the raw input data may include
data that is suggestive of a time, a duration, a value and/or the
like. Therefore, an NLG system may be configured to input the raw
input data and output text that linguistically describes the value
of the stock market index; for example, "Securities markets rose
steadily through most of the morning, before sliding downhill late
in the day."
[0004] Data that is input into an NLG system may be provided in, for
example, a recurrent formal structure. The recurrent formal
structure may comprise a plurality of individual fields and defined
relationships between the plurality of individual fields. For
example, the input data may be contained in a spreadsheet or
database, presented in a tabulated log message or other defined
structure, encoded in a `knowledge representation` such as the
resource description framework (RDF) triples that make up the
Semantic Web and/or the like. In some examples, the data may
include numerical content, symbolic content or the like. Symbolic
content may include, but is not limited to, alphanumeric and other
non-numeric character sequences in any character encoding, used to
represent arbitrary elements of information. In some examples, the
output of the NLG system is text in a natural language (e.g.
English, Japanese or Swahili), but may also be in the form of
synthesized speech.
BRIEF SUMMARY
[0005] Methods, apparatuses, and computer program products are
described herein that are configured to perform aggregation of
phrase specifications. In some example embodiments, a method is
provided that comprises identifying two or more generalized phrase
specifications. In some example embodiments, the two or more
generalized phrase specifications contain at least one aggregatable
constituent. The method of this embodiment may also include
generating an aggregated phrase specification from the two or more
generalized phrase specifications. In some example embodiments, the
aggregated phrase specification comprises a specification for a
combined noun phrase generated from the aggregatable constituents
and one or more additional constituents based on a determined level
of generalization.
[0006] In further example embodiments, an apparatus is provided
that includes at least one processor and at least one memory
including computer program code with the at least one memory and
the computer program code being configured, with the at least one
processor, to cause the apparatus to at least identify two or more
generalized phrase specifications. In some example embodiments, the
two or more generalized phrase specifications contain at least one
aggregatable constituent. The at least one memory and computer
program code may also be configured to, with the at least one
processor, cause the apparatus to generate an aggregated phrase
specification from the two or more generalized phrase
specifications. In some example embodiments, the aggregated phrase
specification comprises at least one of a combined noun phrase
generated from the at least one aggregatable constituent and one
or more additional constituents based on a determined level of
generalization.
[0007] In yet further example embodiments, a computer program
product may be provided that includes at least one non-transitory
computer-readable storage medium having computer-readable program
instructions stored therein with the computer-readable program
instructions including program instructions configured to identify
two or more generalized phrase specifications. In some example
embodiments, the two or more generalized phrase specifications
contain at least one aggregatable constituent. The
computer-readable program instructions may also include program
instructions configured to generate an aggregated phrase
specification from the two or more generalized phrase
specifications. In some example embodiments, the aggregated phrase
specification comprises at least one of a combined noun phrase
generated from the at least one aggregatable constituent and one
or more additional constituents based on a determined level of
generalization.
[0008] In yet further example embodiments, an apparatus is provided
that includes means for identifying two or more generalized phrase
specifications. In some example embodiments, the two or more
generalized phrase specifications contain at least one aggregatable
constituent. The apparatus of this embodiment may also include
means for generating an aggregated phrase specification from the
two or more generalized phrase specifications. In some example
embodiments, the aggregated phrase specification comprises at least
one of a combined noun phrase generated from the at least one
aggregatable constituent and one or more additional constituents
based on a determined level of generalization.
BRIEF DESCRIPTION OF THE DRAWINGS
[0009] Having thus described embodiments of the invention in
general terms, reference will now be made to the accompanying
drawings, which are not necessarily drawn to scale, and
wherein:
[0010] FIG. 1 is a schematic representation of a natural language
generation environment that may benefit from some example
embodiments of the present invention;
[0011] FIG. 2 illustrates an example flow diagram that may be
performed by an aggregator in accordance with some example
embodiments of the present invention;
[0012] FIG. 3 illustrates a block diagram of an apparatus that
embodies a natural language generation environment having an
aggregator in accordance with some example embodiments of the
present invention; and
[0013] FIGS. 4-5 illustrate flowcharts that may be performed by an
aggregator in accordance with some example embodiments of the
present invention.
DETAILED DESCRIPTION
[0014] Example embodiments will now be described more fully
hereinafter with reference to the accompanying drawings, in which
some, but not all, embodiments are shown. Indeed, the embodiments
may take many different forms and should not be construed as
limited to the embodiments set forth herein; rather, these
embodiments are provided so that this disclosure will satisfy
applicable legal requirements. Like reference numerals refer to
like elements throughout. The terms "data," "content,"
"information," and similar terms may be used interchangeably,
according to some example embodiments, to refer to data capable of
being transmitted, received, operated on, and/or stored. Moreover,
the term "exemplary", as may be used herein, is not provided to
convey any qualitative assessment, but instead merely to convey an
illustration of an example. Thus, use of any such terms should not
be taken to limit the spirit and scope of embodiments of the
present invention.
[0015] In language, words, phrases, sentences or the like may be
aggregated to enhance readability. For example, instead of "Stocks
retreated from a broad advance yesterday. Stocks closed mixed", an
aggregated sentence may recite: "Stocks retreated from a broad
advance yesterday and closed mixed." As can be seen from this
example, the latter sentence is more readable and flows much more
naturally. By way of further example, "Pressure is stable" and
"Temperature is stable" can be aggregated into a more readable
sentence: "Pressure and temperature are stable". However, in some
examples, the complexity of the words, phrases, sentences or the
like may hinder the ability of a natural language generation
system to aggregate them; for example, sentences with detailed
numeric values, such as "Pressure is stable at 20 psi" and
"Temperature is stable at 30 C", cannot be aggregated directly
because their remaining constituents differ.
[0016] As such, some example embodiments that are described herein
are configured to aggregate phrase specifications by generalizing
their respective properties or constituents, such as the detailed
numeric values in the example above (e.g. 20 psi and 30 C). For
example, in an instance in which 20 psi and 30 C are both within
normal operating ranges, a vague descriptor that generalizes the
value of 20 psi and 30 C, such as "within their normal range" or
"within a standard operating range", would enable aggregation of
sentences that would otherwise not be aggregatable. Thus by
generalizing the numeric values, a resultant aggregated sentence
may be: "Pressure and temperature are within their normal
range".
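By way of non-limiting illustration, the generalization of numeric values described above may be sketched as follows. The sketch assumes hypothetical normal operating ranges; the names `NORMAL_RANGES` and `generalize_value` and the pressure bounds are illustrative assumptions and do not appear in the disclosure.

```python
# Hypothetical normal operating ranges; the bounds are illustrative assumptions.
NORMAL_RANGES = {"pressure": (15.0, 25.0), "temperature": (25.0, 35.0)}


def generalize_value(entity: str, value: float) -> str:
    """Replace a detailed numeric value with a vague range descriptor."""
    low, high = NORMAL_RANGES[entity]
    if value < low:
        return "below the normal range"
    if value > high:
        return "above the normal range"
    return "within the normal range"


readings = {"pressure": 20.0, "temperature": 30.0}
descriptors = {entity: generalize_value(entity, v) for entity, v in readings.items()}

# Both readings map to the same descriptor, so the two sentences that carry
# them no longer differ in their numeric constituent.
can_aggregate = len(set(descriptors.values())) == 1
```

In this sketch, two readings become candidates for aggregation only when their generalized descriptors coincide; a reading outside its range would keep the sentences distinct.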
[0017] The methods, apparatus and computer program products, as
described herein, are configured to aggregate one or more phrase
specifications. A phrase specification is a specification of the
content of a linguistic constituent such as a sentence. Such
representations of content include, but are not limited to, meaning
text theory (e.g. SemR, DSyntR and/or SSyntR), lambda calculus
representations of semantics, case frames, messages,
pre-constructed surface form fragments and/or the like. As such, in
some example embodiments, one or more phrase specifications and a
domain-specific function which specifies allowable generalizations
(e.g. generalizations of constituents within the phrase
specification that do not significantly reduce utility of an output
text from an end-user's perspective) of those one or more phrase
specifications may be identified. In some example embodiments, the
one or more phrase specifications may be aggregated based on the
generalization and/or removal of one or more constituents (e.g. a
coherent subpart of a phrase specification, such as, but not
limited to, a property within a message, an argument to a
predicate, a syntactic subconstituent within a larger syntactic
element, a role within a case frame and/or the like) within the phrase
specification. After generalization, the one or more generalized
phrase specifications may be compared and those generalized phrase
specifications of the one or more generalized phrase specifications
that can be aggregated (e.g. are identical but for an aggregatable
constituent) are placed into groups. For each group of phrase
specifications, a level of generalization may then be determined
that still enables the group of phrase specifications to be
aggregated. In some example embodiments, the aggregatable
constituents may be combined or otherwise merged to create an
aggregated phrase specification. In some example embodiments, the
aggregated phrase specification may contain one or more additional
constituents based on the determined level of generalization.
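By way of non-limiting illustration, the merging of aggregatable constituents into a combined noun phrase may be sketched as follows. The sketch assumes that a phrase specification is modeled as a flat dictionary and that the aggregatable constituent is stored under a hypothetical key `subject`; both are simplifying assumptions rather than the representation used by any particular embodiment.

```python
def combine_noun_phrase(entities: list[str]) -> str:
    """Join entity names into a coordinated noun phrase, e.g. 'A, B and C'."""
    if len(entities) == 1:
        return entities[0]
    return ", ".join(entities[:-1]) + " and " + entities[-1]


def aggregate(group: list[dict]) -> dict:
    """Merge phrase specifications that differ only in their 'subject'."""
    rest = {k: v for k, v in group[0].items() if k != "subject"}
    for spec in group[1:]:
        if {k: v for k, v in spec.items() if k != "subject"} != rest:
            raise ValueError("phrase specifications are not aggregatable")
    combined = combine_noun_phrase([spec["subject"] for spec in group])
    return {"subject": combined, **rest}


specs = [
    {"subject": "pressure", "state": "stable"},
    {"subject": "temperature", "state": "stable"},
]
aggregated = aggregate(specs)
```

Here the aggregated phrase specification carries the combined noun phrase together with the constituents that the group members share.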
[0018] FIG. 1 is an example block diagram of example components of
an example natural language generation environment 100. In some
example embodiments, the natural language generation environment
100 comprises a natural language generation system 102, message
store 104, a domain model 106 and/or linguistic resources 108. The
natural language generation system 102 may take the form of, for
example, a code module, a component, circuitry and/or the like. The
components of the natural language generation environment 100 are
configured to provide various logic (e.g. code, instructions,
functions, routines and/or the like) and/or services related to the
natural language generation system, the microplanner and a
referring expression generation system.
[0019] A message store 104 or knowledge pool is configured to store
one or more messages that are accessible by the natural language
generation system 102. Messages are one example of a phrase
specification described herein and are language independent data
structures that correspond to informational elements in a text
and/or collect together underlying data, referred to as properties,
arguments or slots, which can be presented within a fragment of
natural language such as a phrase or sentence. Messages may be
represented in various ways; for example, each property may consist
of a named attribute and its corresponding value; these values may
recursively consist of sets of named attributes and their values,
and each message may belong to one of a set of predefined types.
The concepts and relationships that make up messages may be drawn
from an ontology (e.g. a domain model 106) that formally represents
knowledge about the application scenario. In some examples, the
domain model 106 is a representation of information about a
particular domain and specifies how information about a domain is
communicated in language. For example, a domain model may contain
an ontology that specifies the kinds of objects, instances,
concepts and/or the like that may exist in the domain in concrete
or abstract form, properties that may be predicated of the objects,
concepts and the like, relationships that may hold between the
objects, concepts and the like, and representations of any specific
knowledge that is required to function in the particular domain.
The domain model 106 may also contain a set of rules for
generalization, removal and/or aggregation of phrase specifications
that are generated based on a corpus analysis, domain analysis or
the like.
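By way of non-limiting illustration, a message of the kind described above (a predefined type plus named attributes whose values may themselves be sets of named attributes) may be sketched as follows. The class name `Message`, the type names in `MESSAGE_TYPES` and the attribute names are hypothetical and are used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical set of predefined message types.
MESSAGE_TYPES = {"StateMessage", "TrendMessage"}


@dataclass
class Message:
    """Language-independent message: a type plus named attributes and values."""
    message_type: str
    properties: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.message_type not in MESSAGE_TYPES:
            raise ValueError(f"unknown message type: {self.message_type}")


msg = Message(
    message_type="StateMessage",
    properties={
        "entity": "pressure",
        "state": "stable",
        # A value may itself be a set of named attributes and values.
        "value": {"magnitude": 20, "unit": "psi"},
    },
)
```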
[0020] In some example embodiments, a natural language generation
system, such as natural language generation system 102, is
configured to generate words, phrases, sentences, text or the like
which may take the form of a natural language text. The natural
language generation system 102 comprises a document planner 112, a
microplanner 114 and/or a realizer 116. The natural language
generation system 102 may also be in data communication with the
message store 104, the domain model 106 and/or the linguistic
resources 108. In some examples, the linguistic resources include,
but are not limited to, text schemas, aggregation rules, reference
rules, lexicalization rules and/or grammar rules that may be used
by one or more of the document planner 112, the microplanner 114
and/or the realizer 116. Other natural language generation systems
may be used in some example embodiments, such as a natural language
generation system as described in Building Natural Language
Generation Systems by Ehud Reiter and Robert Dale, Cambridge
University Press (2000), which is incorporated by reference in its
entirety herein.
[0021] The document planner 112 is configured to input one or more
messages from the message store 104. The document planner 112 may
comprise a content determination process that is configured to
select the messages, such as the messages that contain a
representation of the data that is to be output via a natural
language text. The document planner 112 may also comprise a
structuring process that determines the order of messages to be
included in a text. In some example embodiments, the document
planner 112 may access one or more text schemas for the purposes of
content determination and document structuring. The output of the
document planner 112 may be a tree-structured object or other data
structure that is referred to as a document plan. In an instance in
which a tree-structured object is chosen for the document plan, the
leaf nodes of the tree may contain the messages, and the
intermediate nodes of the tree structure object may be configured
to indicate how the subordinate nodes are related (e.g.
elaboration, consequence, contrast, sequence and/or the like) to
each other.
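By way of non-limiting illustration, a tree-structured document plan may be sketched as follows, with leaf nodes holding messages and intermediate nodes naming the relation between their subordinate nodes. The class names `Leaf` and `Relation` are hypothetical, and plain strings stand in for messages.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class Leaf:
    message: str  # stands in for a message from the message store


@dataclass
class Relation:
    relation: str  # e.g. "sequence", "contrast", "elaboration"
    children: list[Union["Relation", Leaf]] = field(default_factory=list)


def messages_in_order(node: Union[Relation, Leaf]) -> list[str]:
    """Collect leaf messages by depth-first, left-to-right traversal."""
    if isinstance(node, Leaf):
        return [node.message]
    result: list[str] = []
    for child in node.children:
        result.extend(messages_in_order(child))
    return result


plan = Relation("sequence", [
    Leaf("markets rose in the morning"),
    Relation("contrast", [Leaf("markets fell late in the day")]),
])
ordered = messages_in_order(plan)
```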
[0022] The microplanner 114 is configured to construct a
realization specification based on the document plan output from
the document planner 112, such that the document plan may be
expressed in natural language. In some example embodiments, the
microplanner 114 may convert one or more messages into a text
specification by performing aggregation, lexicalization and
referring expression generation. A text specification is a
specification of the content of a linguistic constituent such as a
sentence and contains a set of instructions for a realizer, such as
realizer 116, to produce a grammatically well-formed text. The
output of the microplanner 114, in some example embodiments, is a
tree-structured realization specification whose leaf-nodes are text
specifications, and whose internal nodes express rhetorical
relations between the leaf nodes. The microplanner 114 and the
aggregator 120 are further described with reference to FIG. 2.
[0023] A realizer 116 is configured to traverse a text
specification output by the microplanner 114 to express the text
specification in natural language. The realization process that is
applied to each text specification makes use of a grammar (e.g. the
grammar of the linguistic resources 108) which specifies the valid
syntactic structures in the language and further provides a way of
mapping from phrase specifications into the corresponding natural
language sentences. The output of the process is, in some example
embodiments, a natural language text.
[0024] FIG. 2 illustrates an example flow diagram that may be
performed by a microplanner 114, an aggregator 120 and/or the like
in accordance with some example embodiments of the present
invention. In some example embodiments, the microplanner 114 may
cause the aggregator 120 to input or the aggregator 120 may
otherwise input one or more phrase specifications. The aggregator
120 may then identify or otherwise determine a constituent in the
one or more phrase specifications that is aggregatable. In some
example embodiments, the aggregatable constituent may refer to an
entity, such as heart rate, respiration rate, temperature, pressure
and/or the like. Alternatively or additionally, a phrase
specification may contain multiple aggregatable constituents and,
as such, the use of aggregatable constituent herein should not be
considered as limiting the disclosure to a single aggregatable
constituent in a phrase specification.
[0025] A phrase specification may also have one or more
constituents that are generalizable or removable. Constituents that
are generalizable or removable may be defined by the domain model
106 for a particular domain and/or may be identified based on a
corpus analysis, business rules, user settings and/or the like. For
example, a particular value, such as a temperature, may be
generalized by a range such as "below the normal range", "in the
normal range" or "above the normal range" in some domains, but in
other domains such a generalization may be improper. In further
example embodiments, the domain model 106 may contain a generalized
constituent list which provides a list of alternative generalized
constituents for a given generalizable constituent. The domain
model 106 may also define the various levels of generalization for
each generalizable constituent. For example, the domain model may
identify "within a normal range" as the most generalized
constituent; whereas, other more specific generalizations may be
available, such as "between 25 C and 35 C". Alternatively or
additionally, the microplanner 114, the aggregator 120 or the like
may receive or otherwise determine, via a reordering flag, whether
the one or more phrase specifications can be reordered for the
purposes of aggregation.
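By way of non-limiting illustration, a generalized constituent list with levels of generalization may be sketched as follows. The sketch assumes that the alternatives for a generalizable constituent are stored in order from least generalized to most generalized; the names `GENERALIZED_CONSTITUENTS`, `most_generalized` and `at_level` and the key `temperature_value` are hypothetical.

```python
# Hypothetical generalized constituent list: alternatives for one generalizable
# constituent, ordered from least generalized to most generalized.
GENERALIZED_CONSTITUENTS = {
    "temperature_value": ["between 25 C and 35 C", "within a normal range"],
}


def most_generalized(constituent: str) -> str:
    """Return the alternative at the highest level of generalization."""
    return GENERALIZED_CONSTITUENTS[constituent][-1]


def at_level(constituent: str, level: int) -> str:
    """Return the alternative at a level, clamped to the available levels."""
    options = GENERALIZED_CONSTITUENTS[constituent]
    return options[max(0, min(level, len(options) - 1))]
```

Storing the alternatives in a single ordered list is one possible design; it lets a level of generalization be expressed as an index into the list.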
[0026] As such, and as shown in block 202, the one or more phrase
specifications may be generalized. Such a generalization may
include, but is not limited to, generalizing all of the
constituents that are marked as generalizable by the aggregator 120
and/or removing all of the constituents that are marked as
removable by the aggregator 120. In some example embodiments, the
constituents may be generalized using a generalized constituent
marked as most generalized in the generalized constituent list or
predefined constituent list. The generalized constituent list may
contain one or more constituents that may be selected by the
aggregator 120 to replace a generalizable constituent in a phrase
specification. For example, the constituent "last Sunday" may be
generalized by, from least generalized or lowest level of
generalization to most generalized or highest level of
generalization, "earlier this week", "earlier this month", or "in
the past". Alternatively or additionally, a portion of the
constituents marked as generalizable may be generalized and/or a
portion of the constituents marked as removable may be removed.
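By way of non-limiting illustration, the generalization of block 202 may be sketched as follows. The sketch assumes that a phrase specification is a flat dictionary, that removable constituents are identified by name and that generalizable constituents are identified by value; the names `REMOVABLE`, `GENERALIZED_LIST` and `generalize` and the example phrase specification are hypothetical.

```python
# Hypothetical domain rules; the constituent names are illustrative assumptions.
REMOVABLE = {"source"}
GENERALIZED_LIST = {
    # Ordered from least generalized to most generalized.
    "last Sunday": ["earlier this week", "earlier this month", "in the past"],
}


def generalize(spec: dict) -> dict:
    """Drop removable constituents and replace generalizable values with the
    most generalized alternative in the generalized constituent list."""
    out = {}
    for name, value in spec.items():
        if name in REMOVABLE:
            continue
        alternatives = GENERALIZED_LIST.get(value)
        out[name] = alternatives[-1] if alternatives else value
    return out


spec = {
    "subject": "the pump",
    "event": "was serviced",
    "time": "last Sunday",
    "source": "maintenance log",
}
generalized = generalize(spec)
```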
[0027] As is shown in block 204, a group of generalized phrase
specifications that can be aggregated together are identified by
the aggregator 120. For example, sequences of phrase specifications
(if reordering is not permitted based on the reordering flag) or
subsets of the generalized phrase specifications (if reordering is
permitted based on the reordering flag) may be identified as being
aggregatable in an instance in which the sequences or subsets of
phrase specifications are identical except for their identified
aggregatable constituent. For example, if the aggregatable
constituent of "pressure is stable within normal range" is
"pressure" and the aggregatable constituent of "temperature is
stable within normal range" is "temperature", then the aggregator
120 may determine that the remaining constituents, namely "is
stable within normal range" and "is stable within normal range" are
identical and thus may determine the phrase specifications are
aggregatable. Alternatively or additionally, phrase specifications
may also be aggregated based on an indication in the domain model
106, business rules, a user setting and/or the like.
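One possible way to identify such groups is sketched below. It
assumes, purely for illustration, that a phrase specification is a
dictionary of role names to constituents and that the aggregatable
constituent is stored under the role "subject"; the names rest_of and
find_groups, and the "flow" specification, are hypothetical.

```python
def rest_of(spec, aggregatable_role="subject"):
    """Everything in a phrase specification except its aggregatable constituent."""
    return tuple(sorted((r, c) for r, c in spec.items() if r != aggregatable_role))


def find_groups(specs, reordering_permitted):
    """Group specifications that are identical but for the aggregatable constituent."""
    if reordering_permitted:
        # Any subset may form a group: bucket by the remaining constituents.
        buckets = {}
        for spec in specs:
            buckets.setdefault(rest_of(spec), []).append(spec)
        groups = list(buckets.values())
    else:
        # Only consecutive sequences may form a group.
        groups = []
        for spec in specs:
            if groups and rest_of(groups[-1][-1]) == rest_of(spec):
                groups[-1].append(spec)
            else:
                groups.append([spec])
    # A group needs at least two members to be worth aggregating.
    return [g for g in groups if len(g) > 1]


specs = [
    {"subject": "pressure", "verb": "is stable", "range": "within normal range"},
    {"subject": "flow", "verb": "is unstable"},  # hypothetical extra input
    {"subject": "temperature", "verb": "is stable", "range": "within normal range"},
]
print(len(find_groups(specs, reordering_permitted=True)))   # 1
print(len(find_groups(specs, reordering_permitted=False)))  # 0
```

With reordering permitted, "pressure" and "temperature" form a group
even though another specification sits between them; without it, only
adjacent specifications are compared.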
[0028] In some example embodiments, the one or more phrase
specifications are generalized to a highest level of generalization
at block 202 to identify groups of phrase specifications that can
be aggregated. Once those groups of phrase specifications are
identified, then at block 206, the level of generalization may be
reduced or otherwise lowered so long as the group of phrase
specifications can still be aggregated. For example, constituents
may be added back that were removed so long as the group of phrase
specifications can still be aggregated. As is shown in block 206, a
level of generalization that permits the group of phrase
specifications to still be aggregated is determined by the
aggregator 120. In some example embodiments, the constituents that
were removed at block 202 may be added back to the phrase
specifications in the group providing the phrase specifications in
the group are still aggregatable. In some example embodiments, a
generalized constituent may be added back to the phrase
specification instead of the removed constituent if the generalized
constituent enables the group of phrase specifications to still be
aggregatable whereas adding the removed constituent would render
the group of phrase specifications no longer aggregatable.
Alternatively or additionally, less generalized constituents, as
defined by the generalized constituent listing, may replace the
generalized constituents providing the phrase specifications in the
group are still aggregatable.
[0029] Alternatively or additionally, other methods of
generalization may be used by the aggregator 120. For example, the
aggregator 120 may incrementally generalize one or more phrase
specifications until the one or more phrase specifications are
aggregatable. Alternatively, the aggregator 120 may determine
multiple levels of generalization for each phrase specification and
aggregate the phrase specifications based on the lowest level of
generalization, and/or the like.
[0030] At block 208, an aggregated phrase specification is
generated. In some example embodiments, the aggregated phrase
specification may contain a combination of the constituents, such
as a combined noun phrase, that contains the identified
aggregatable constituents and further contains one or more
additional constituents based on the determined level of
generalization. For example, the aggregated phrase specification
may contain the combined noun phrase and the one or more
generalized constituents but may otherwise be a copy of a phrase
specification of the one or more phrase specifications in the group
of phrase specifications. At block 210 the aggregated phrase
specification may be output by the aggregator 120 to the
microplanner and/or realizer for use in generating an output
text.
[0031] By way of example and with reference to FIG. 2, the
aggregator 120 may input one or more phrase specifications (shown
as sentences in this example), such as "heart rate was stable at 72
yesterday", "mean blood pressure was unstable yesterday with mean
value 95" and "respiratory rate was stable at 16 yesterday". In
order to generalize the one or more phrase specifications, those
constituents that are generalizable or removable may be identified.
For example, "at 72" in "heart rate was stable at 72 yesterday" and
"at 16" in "respiratory rate was stable at 16 yesterday" may be
marked as generalizable based on the domain model, business rules,
a user setting and/or the like. In some examples, both "at 72" and
"at 16" may be generalized as "within the normal range" based on a
generalizable constituent listing in the domain model. In some
examples, "yesterday" in both "heart rate was stable at 72
yesterday" and "respiratory rate was stable at 16 yesterday" may be
marked as removable. Both "with mean value 95" and "yesterday" may
also be marked as removable in "mean blood pressure was unstable
yesterday with mean value 95". Those constituents marked as
removable may be indicated as such by the domain model, business
rules, a user setting and/or the like.
[0032] As such, the one or more phrase specifications may be
generalized by removing each of the removable constituents and by
replacing each of the generalizable constituents with generalized
constituents. The one or more generalized phrase specifications may
then contain: "heart rate was stable within normal range", "mean
blood pressure was unstable" and "respiratory rate was stable
within normal range" in some example embodiments.
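The generalization of the three example inputs can be traced with a
short sketch. The representation used here, a list of (text, mark)
pairs per phrase specification, and the names MOST_GENERALIZED and
generalize_spec are assumptions made for illustration; the marks
themselves follow the example above.

```python
# Hypothetical listing mapping a generalizable constituent to its most
# generalized alternative.
MOST_GENERALIZED = {
    "at 72": "within normal range",
    "at 16": "within normal range",
}

heart = [("heart rate", "aggregatable"), ("was stable", None),
         ("at 72", "generalizable"), ("yesterday", "removable")]
pressure = [("mean blood pressure", "aggregatable"), ("was unstable", None),
            ("yesterday", "removable"), ("with mean value 95", "removable")]
respiratory = [("respiratory rate", "aggregatable"), ("was stable", None),
               ("at 16", "generalizable"), ("yesterday", "removable")]


def generalize_spec(spec):
    """Drop removable constituents and replace generalizable ones."""
    kept = []
    for text, mark in spec:
        if mark == "removable":
            continue
        if mark == "generalizable":
            text = MOST_GENERALIZED.get(text, text)
        kept.append(text)
    return " ".join(kept)


for spec in (heart, pressure, respiratory):
    print(generalize_spec(spec))
# heart rate was stable within normal range
# mean blood pressure was unstable
# respiratory rate was stable within normal range
```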
[0033] A group of generalized phrase specifications may then be
identified. A group of generalized phrase specifications may
include those phrase specifications that can be aggregated (e.g.
phrase specifications that are identical but for the aggregatable
constituent). In an instance in which reordering is permitted,
"heart rate was stable within normal range" and "respiratory rate
was stable within normal range" may be determined as aggregatable
because they are identical but for the aggregatable constituents
"heart rate" and "respiratory rate" and thus form a group. "Mean
blood pressure was unstable" is not aggregatable with the other
phrase specifications based on the constituent "was unstable".
Reordering would be necessary in this example, because the original
input had "heart rate was stable within normal range" as the first
phrase specification, "mean blood pressure was unstable" as the
second phrase specification and "respiratory rate was stable within
normal range" as the third specification. As such, "heart rate
was stable within normal range" and "respiratory rate was stable
within normal range" would be reordered. In an instance in which
reordering is not permitted, these phrase specifications would
not be aggregatable.
[0034] Once a group of phrase specifications consisting of "heart
rate was stable within normal range" and "respiratory rate was
stable within normal range" is determined to be aggregatable, those
phrase specifications within the group are analyzed to determine
the level of generalization that would still enable the phrase
specifications within the group to be aggregated. For example, the
constituent "yesterday" was removed from both phrase specifications
and, as such, the addition of the constituent "yesterday" back to
the phrase specifications would still enable the phrase
specifications to be aggregated because each of the phrase
specifications in the group would remain identical but for the
aggregatable constituent. Whereas, there may not be a more specific
way to express the constituents "at 72" and "at 16" in a similar
manner and, as such, the generalization "within the normal range"
may represent the lowest level of generalization that is available
for these phrase specifications. Consequently, the phrase
specifications to be aggregated may include "heart rate was stable
within normal range yesterday" and "respiratory rate was stable
within normal range yesterday".
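Adding a removed constituent back can be sketched as a check that
every member of the group carried the same value for that constituent
before removal. The dictionary representation, the role names and the
function restore_removed are hypothetical and serve only to
illustrate the step.

```python
def restore_removed(originals, generalized, removable_roles):
    """Add removed constituents back where every group member agrees."""
    restored = [dict(spec) for spec in generalized]
    for role in removable_roles:
        values = {original.get(role) for original in originals}
        # The group stays aggregatable only if all members share one value.
        if len(values) == 1 and None not in values:
            for spec in restored:
                spec[role] = originals[0][role]
    return restored


originals = [
    {"subject": "heart rate", "verb": "was stable",
     "value": "at 72", "time": "yesterday"},
    {"subject": "respiratory rate", "verb": "was stable",
     "value": "at 16", "time": "yesterday"},
]
generalized = [
    {"subject": "heart rate", "verb": "was stable",
     "value": "within normal range"},
    {"subject": "respiratory rate", "verb": "was stable",
     "value": "within normal range"},
]
for spec in restore_removed(originals, generalized, ["time"]):
    print(" ".join(spec.values()))
# heart rate was stable within normal range yesterday
# respiratory rate was stable within normal range yesterday
```

Had the two inputs carried different time constituents, the sketch
would leave the constituent out, because adding it back would make
the specifications differ in more than the aggregatable constituent.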
[0035] The aggregatable constituents, "heart rate" and "respiratory
rate" may be combined to form combined noun phrase "heart rate and
respiratory rate". In some examples, the aggregator 120 may
generate the noun phrase "heart and respiratory rate". "Heart and
respiratory rate" may then be combined with or otherwise
instantiated in an aggregated phrase specification with the
remaining constituents in a phrase specification of the group of
phrase specifications. The aggregated phrase specification is
configured to contain those constituents of the phrase
specification of the group of phrase specifications based on the
determined level of generalization (e.g. "were stable within normal
range yesterday"). As such, the resultant aggregated phrase
specification contains "heart and respiratory rate were stable
within normal range yesterday". Therefore, an output text may
include the aggregated phrase specification "heart and respiratory
rate were stable within normal range yesterday" and any unchanged
(e.g. not aggregated) phrase specifications in original form (e.g.
not generalized), such as "mean blood pressure was unstable
yesterday with mean value 95".
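The formation of the combined noun phrase and of the aggregated
phrase specification can be sketched as follows. The shared-head
heuristic, the naive "was" to "were" substitution and the names
combine_noun_phrases and aggregate are assumptions for illustration;
in practice number agreement and similar choices may be left to the
microplanner or realizer.

```python
def combine_noun_phrases(phrases):
    """Join noun phrases with "and", factoring out a shared final word."""
    heads = {p.split()[-1] for p in phrases}
    modifiers = [" ".join(p.split()[:-1]) for p in phrases]
    if len(heads) == 1 and all(modifiers):
        return " and ".join(modifiers) + " " + heads.pop()
    return " and ".join(phrases)


def aggregate(group):
    """Copy one specification of the group and insert the combined noun phrase."""
    merged = dict(group[0])
    merged["subject"] = combine_noun_phrases([s["subject"] for s in group])
    # Naive number agreement, for illustration only.
    merged["verb"] = merged["verb"].replace("was ", "were ", 1)
    return merged


group = [
    {"subject": "heart rate", "verb": "was stable",
     "value": "within normal range", "time": "yesterday"},
    {"subject": "respiratory rate", "verb": "was stable",
     "value": "within normal range", "time": "yesterday"},
]
print(" ".join(aggregate(group).values()))
# heart and respiratory rate were stable within normal range yesterday
```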
[0036] FIG. 3 is an example block diagram of an example computing
device for practicing embodiments of an example aggregator. In
particular, FIG. 3 shows a computing system 300 that may be
utilized to implement a natural language generation environment
having a natural language generation system 102 including, in some
examples, a document planner 112, a microplanner 114 having an
aggregator 120 and/or a realizer 116. One or more general purpose
or special purpose computing systems/devices may be used to
implement the natural language generation system 102. In addition,
the computing system 300 may comprise one or more distinct
computing systems/devices and may span distributed locations. In
some example embodiments, the natural language generation system
102 may be configured to operate remotely via the network 316. In
other example embodiments, a pre-processing module or other module
that requires heavy computational load may be configured to perform
that computational load and thus may be on a remote device or
server. For example, the realizer 116 may be accessed remotely. As
such, the natural language generation environment may be operable
remotely, such as via a cloud source, may be operable on a client
device that embodies at least a portion of the one or more
blocks, and/or the like. Furthermore, each block shown may
represent one or more such blocks as appropriate to a specific
example embodiment. In some cases one or more of the blocks may be
combined with other blocks. Also, the natural language generation
system 102 may be implemented in software, hardware, firmware, or
in some combination to achieve the capabilities described
herein.
[0037] In the example embodiment shown, computing system 300
comprises a computer memory ("memory") 302, a display 304, one or
more processors 306, input/output devices 308 (e.g., keyboard,
mouse, CRT or LCD display, touch screen, gesture sensing device
and/or the like), other computer-readable media 310, and
communications interface 312. The processor 306 may, for example,
be embodied as various means including one or more microprocessors
with accompanying digital signal processor(s), one or more
processor(s) without an accompanying digital signal processor, one
or more coprocessors, one or more multi-core processors, one or
more controllers, processing circuitry, one or more computers,
various other processing elements including integrated circuits
such as, for example, an application-specific integrated circuit
(ASIC) or field-programmable gate array (FPGA), or some combination
thereof. Accordingly, although illustrated in FIG. 3 as a single
processor, in some embodiments the processor 306 comprises a
plurality of processors. The plurality of processors may be in
operative communication with each other and may be collectively
configured to perform one or more functionalities of the reference
system as described herein.
[0038] The natural language generation system 102 is shown residing
in memory 302. The memory 302 may comprise, for example, transitory
and/or non-transitory memory, such as volatile memory, non-volatile
memory, or some combination thereof. Although illustrated in FIG. 3
as a single memory, the memory 302 may comprise a plurality of
memories. The plurality of memories may be embodied on a single
computing device or may be distributed across a plurality of
computing devices collectively configured to function as the
natural language system, the microplanner and/or the reference
system. In various example embodiments, the memory 302 may
comprise, for example, a hard disk, random access memory, cache
memory, flash memory, a compact disc read only memory (CD-ROM),
digital versatile disc read only memory (DVD-ROM), an optical disc,
circuitry configured to store information, or some combination
thereof.
[0039] In other embodiments, some portion of the contents, some or
all of the components of the natural language generation system 102
may be stored on and/or transmitted over the other
computer-readable media 310. The components of the natural language
generation system 102 preferably execute on one or more processors
306 and are configured to enable operation of an aggregator, as
described herein.
[0040] Alternatively or additionally, other code or programs 314
(e.g., an administrative interface, a Web server, and the like) and
potentially other data repositories, such as other data sources,
also reside in the memory 302, and preferably execute on one or
more processors 306. Of note, one or more of the components in FIG.
3 may not be present in any specific implementation. For example,
some embodiments may not provide other computer readable media 310
or a display 304.
[0041] The natural language generation system 102 is further
configured to provide functions such as those described with
reference to FIG. 1. The natural language generation system 102 may
interact with the network 316, via the communications interface
312, with remote data sources 318 (e.g. remote reference data,
remote performance data, remote aggregation data, remote knowledge
pools and/or the like), third-party content providers 320 and/or
client devices 322. The network 316 may be any combination of media
(e.g., twisted pair, coaxial, fiber optic, radio frequency),
hardware (e.g., routers, switches, repeaters, transceivers), and
protocols (e.g., TCP/IP, UDP, Ethernet, Wi-Fi, WiMAX, Bluetooth)
that facilitate communication between remotely situated humans
and/or devices. In some instances the network 316 may take the form
of the internet or may be embodied by a cellular network such as an
LTE based network. In this regard, the communications interface 312
may be capable of operating with one or more air interface
standards, communication protocols, modulation types, access types,
and/or the like. The client devices 322 include desktop computing
systems, notebook computers, mobile phones, smart phones, personal
digital assistants, tablets and/or the like.
[0042] In an example embodiment, components/modules of the natural
language generation system 102 are implemented using standard
programming techniques. For example, the natural language
generation system 102 may be implemented as a "native" executable
running on the processor 306, along with one or more static or
dynamic libraries. In other embodiments, the natural language
generation system 102 may be implemented as instructions processed
by a virtual machine that executes as one of the other programs
314. In general, a range of programming languages known in the art
may be employed for implementing such example embodiments,
including representative implementations of various programming
language paradigms, including but not limited to, object-oriented
(e.g., Java, C++, C#, Visual Basic.NET, Smalltalk, and the like),
functional (e.g., ML, Lisp, Scheme, and the like), procedural
(e.g., C, Pascal, Ada, Modula, and the like), scripting (e.g.,
Perl, Ruby, Python, JavaScript, VBScript, and the like), and
declarative (e.g., SQL, Prolog, and the like).
[0043] The embodiments described above may also use synchronous or
asynchronous client-server computing techniques. Also, the various
components may be implemented using more monolithic programming
techniques, for example, as an executable running on a single
processor computer system, or alternatively decomposed using a
variety of structuring techniques, including but not limited to,
multiprogramming, multithreading, client-server, or peer-to-peer,
running on one or more computer systems each having one or more
processors. Some embodiments may execute concurrently and
asynchronously, and communicate using message passing techniques.
Equivalent synchronous embodiments are also supported. Also, other
functions could be implemented and/or performed by each
component/module, and in different orders, and by different
components/modules, yet still achieve the described functions.
[0044] In addition, programming interfaces to the data stored as
part of the natural language generation system 102 can be made
available by mechanisms such as application programming
interfaces (API) (e.g. C, C++, C#, and Java); libraries for
accessing files, databases, or other data repositories;
markup languages such as XML; or Web servers, FTP
servers, or other types of servers providing access to stored data.
The message store 104, the domain model 106 and/or the linguistic
resources 108 may be implemented as one or more database systems,
file systems, or any other technique for storing such information,
or any combination of the above, including implementations using
distributed computing techniques. Alternatively or additionally,
the message store 104, the domain model 106 and/or the linguistic
resources 108 may be local data stores but may also be configured
to access data from the remote data sources 318.
[0045] Different configurations and locations of programs and data
are contemplated for use with techniques described herein. A
variety of distributed computing techniques are appropriate for
implementing the components of the illustrated embodiments in a
distributed manner including but not limited to TCP/IP sockets,
RPC, RMI, HTTP, Web Services (XML-RPC, JAX-RPC, SOAP, and the
like). Other variations are possible. Also, other functionality
could be provided by each component/module, or existing
functionality could be distributed amongst the components/modules
in different ways, yet still achieve the functions described
herein.
[0046] Furthermore, in some embodiments, some or all of the
components of the natural language generation system 102 may be
implemented or provided in other manners, such as at least
partially in firmware and/or hardware, including, but not limited
to one or more ASICs, standard integrated circuits, controllers
executing appropriate instructions, and including microcontrollers
and/or embedded controllers, FPGAs, complex programmable logic
devices ("CPLDs"), and the like. Some or all of the system
components and/or data structures may also be stored as contents
(e.g., as executable or other machine-readable software
instructions or structured data) on a computer-readable medium so
as to enable or configure the computer-readable medium and/or one
or more associated computing systems or devices to execute or
otherwise use or provide the contents to perform at least some of
the described techniques. Some or all of the system components and
data structures may also be stored as data signals (e.g., by being
encoded as part of a carrier wave or included as part of an analog
or digital propagated signal) on a variety of computer-readable
transmission mediums, which are then transmitted, including across
wireless-based and wired/cable-based mediums, and may take a
variety of forms (e.g., as part of a single or multiplexed analog
signal, or as multiple discrete digital packets or frames). Such
computer program products may also take other forms in other
embodiments. Accordingly, embodiments of this disclosure may be
practiced with other computer system configurations.
[0047] FIG. 4 is a flowchart illustrating an example method
performed in accordance with some example embodiments described
herein. As is shown in operation 402, an apparatus may include
means, such as the microplanner 114, the aggregator 120, the
processor 306, or the like, for identifying a constituent in one or
more phrase specifications as aggregatable. As is shown in decision
operation 404, an apparatus may include means, such as the
microplanner 114, the aggregator 120, the processor 306, or the
like, for determining whether two or more of the received phrase
specifications contain a constituent that is aggregatable. If not,
then as is shown in operation 406, an apparatus may include means,
such as the microplanner 114, the aggregator 120, the processor
306, or the like, for outputting the one or more received phrase
specifications.
[0048] If there are two or more phrase specifications that contain
a constituent that is aggregatable, then the phrase specifications
may be generalized in operations 408 and 410 to create one or more
generalized phrase specifications. A phrase specification may be
generalized by identifying constituents in the phrase specification
that are either generalizable or removable. As is shown in
operation 408, an apparatus may include means, such as the
microplanner 114, the aggregator 120, the processor 306, or the
like, for removing all constituents identified as removable. As is
shown in operation 410, an apparatus may include means, such as the
microplanner 114, the aggregator 120, the processor 306, or the
like, for replacing all constituents identified as generalizable
with a most generalized constituent from a generalized constituent
listing.
[0049] As is shown in operation 412, an apparatus may include
means, such as the microplanner 114, the aggregator 120, the
processor 306, or the like, for causing phrase specifications that
can be aggregated, based on one or more removed or generalized
constituents, to be grouped into phrase specification groups and
stored in a data structure ListPhraseSpecGroups. In some example
embodiments, the two or more generalized phrase specifications are
identified as aggregatable in an instance in which each of the two
or more generalized phrase specifications are identical but for the
aggregatable constituents in each of the two or more generalized
phrase specifications.
[0050] Operations 414-424, in some example embodiments, are
configured to generate aggregated phrase specifications for each of
the groups of phrase specifications. As is shown in operation 414,
an apparatus may include means, such as the microplanner 114, the
aggregator 120, the processor 306, or the like, for setting a data
structure PhraseSpecGroup to a first group of phrase specifications
in ListPhraseSpecGroups.
[0051] As is shown in operation 416, an apparatus may include
means, such as the microplanner 114, the aggregator 120, the
processor 306, or the like, for generating an aggregated phrase
specification based on at least one phrase specification in
PhraseSpecGroup. As is shown in operation 418, an apparatus may
include means, such as the microplanner 114, the aggregator 120,
the processor 306, or the like, for populating the aggregated
phrase specification with a combined noun phrase or other
aggregation of the constituents that are identified as aggregatable
constituents in the phrase specifications in PhraseSpecGroup. As is
shown in operation 420, an apparatus may include means, such as the
microplanner 114, the aggregator 120, the processor 306, or the
like, for populating the aggregated phrase specification with one
or more constituents based on a determined level of generalization.
Populating the aggregated phrase specification with one or more
constituents based on a determined level of generalization is
further described with reference to FIG. 5.
[0052] As is shown in decision operation 422, an apparatus may
include means, such as the microplanner 114, the aggregator 120,
the processor 306, or the like, for determining whether there are
additional groups of phrase specifications in ListPhraseSpecGroups.
If so, then as is shown in operation 424, an apparatus may include
means, such as the microplanner 114, the aggregator 120, the
processor 306, or the like, for setting PhraseSpecGroup to the next
group of phrase specifications in ListPhraseSpecGroups. The process
then loops back to operation 416. If there is no additional
group of phrase specifications in ListPhraseSpecGroups, then, as is
shown in operation 426, an apparatus may include means, such as the
microplanner 114, the aggregator 120, the processor 306, or the
like, for outputting one or more aggregated phrase specifications
and/or one or more phrase specifications that were not
aggregated.
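The overall flow of FIG. 4 can be summarized in a compact sketch. It
is an illustration under stated assumptions rather than the method
itself: phrase specifications are lists of (text, mark) pairs, every
specification has exactly one constituent marked aggregatable,
reordering is assumed to be permitted, the level lowering of
operation 420 is omitted, and number agreement is left to the
realizer. The names MOST_GENERALIZED, generalize, remainder, subject
and aggregate_all are hypothetical.

```python
MOST_GENERALIZED = {"at 72": "within normal range",
                    "at 16": "within normal range"}  # hypothetical listing


def generalize(spec):
    """Operations 408 and 410: drop removable, replace generalizable."""
    out = []
    for text, mark in spec:
        if mark == "removable":
            continue
        if mark == "generalizable":
            text = MOST_GENERALIZED.get(text, text)
        out.append((text, mark))
    return out


def remainder(spec):
    return tuple(text for text, mark in spec if mark != "aggregatable")


def subject(spec):
    return next(text for text, mark in spec if mark == "aggregatable")


def aggregate_all(specs):
    # Operations 402-406: nothing to aggregate with fewer than two inputs.
    if len(specs) < 2:
        return [" ".join(text for text, _ in spec) for spec in specs]
    # Operation 412: build ListPhraseSpecGroups keyed by the remainder.
    list_phrase_spec_groups = {}
    for original in specs:
        generalized = generalize(original)
        key = remainder(generalized)
        list_phrase_spec_groups.setdefault(key, []).append((original, generalized))
    output = []
    # Operations 414-424: one pass per PhraseSpecGroup.
    for rest, members in list_phrase_spec_groups.items():
        if len(members) < 2:
            # Not aggregated: output the specification in its original form.
            output.append(" ".join(text for text, _ in members[0][0]))
            continue
        noun_phrase = " and ".join(subject(g) for _, g in members)
        output.append(" ".join((noun_phrase,) + rest))
    return output  # Operation 426


specs = [
    [("heart rate", "aggregatable"), ("was stable", None),
     ("at 72", "generalizable"), ("yesterday", "removable")],
    [("mean blood pressure", "aggregatable"), ("was unstable", None),
     ("yesterday", "removable"), ("with mean value 95", "removable")],
    [("respiratory rate", "aggregatable"), ("was stable", None),
     ("at 16", "generalizable"), ("yesterday", "removable")],
]
for sentence in aggregate_all(specs):
    print(sentence)
```

The printed subject is plural while the verb is still "was stable";
correcting that agreement is outside the scope of this sketch.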
[0053] FIG. 5 is a flowchart illustrating an example method of
populating the aggregated phrase specification with one or more
constituents based on a determined level of generalization
performed in accordance with some example embodiments described
herein. As is shown in decision operation 502, an apparatus may
include means, such as the microplanner 114, the aggregator 120,
the processor 306, or the like, for determining whether a
constituent was removed in operation 408 from a phrase
specification in PhraseSpecGroup. If not, then the process
continues to decision operation 508.
[0054] If a constituent was removed in operation 408, then, as is
shown in decision operation 504, an apparatus may include means,
such as the microplanner 114, the aggregator 120, the processor
306, or the like, for determining whether the phrase specifications
in PhraseSpecGroup would still be aggregatable with the removed
constituent or a generalized version of the removed constituent. If
not, then the process continues to decision operation 508.
[0055] If the phrase specifications in PhraseSpecGroup would still
be aggregatable with the removed constituent or a generalized
version of the removed constituent, then, as is shown in operation
506, an apparatus may include means, such as the microplanner 114,
the aggregator 120, the processor 306, or the like, for populating
the aggregated phrase specification with the removed constituent or
a generalized constituent of the removed constituent provided that
it is consistent with the other phrase specifications in
PhraseSpecGroup and, as such, the PhraseSpecGroup is still
aggregatable.
[0056] As is shown in decision operation 508, an apparatus may
include means, such as the microplanner 114, the aggregator 120,
the processor 306, or the like, for determining whether a
constituent was generalized in operation 410 from a phrase
specification in PhraseSpecGroup. If not, then the process
ends.
[0057] If a constituent was generalized in operation 410, then, as
is shown in decision operation 510, an apparatus may include means,
such as the microplanner 114, the aggregator 120, the processor
306, or the like, for determining whether the phrase specifications
in PhraseSpecGroup would still be aggregatable with a less
generalized version of the generalized constituent. If not, then
the process ends.
[0058] If the phrase specifications in PhraseSpecGroup would still
be aggregatable with a less generalized version of the generalized
constituent, then, as is shown in operation 512, an apparatus may
include means, such as the microplanner 114, the aggregator 120,
the processor 306, or the like, for populating the aggregated
phrase specification with another generalized constituent, such as
a less generalized constituent, from the constituent listing
provided that it is consistent with the other phrase specifications
in PhraseSpecGroup and, as such, the PhraseSpecGroup is still
aggregatable. In some example embodiments and provided one or more
generalized constituents would still enable the PhraseSpecGroup to
be aggregatable, the aggregator 120 is configured to select the
least generalized constituent or the constituent that is closest to
the original constituent. Alternatively or additionally, a
generalized constituent may be generated based on a predefined
constituent listing that is defined by the domain model and is
configured to provide constituents from least general to most
general.
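Operations 508-512 can be read as stepping down from the most
generalized constituent while every phrase specification in
PhraseSpecGroup can still use the same alternative. The sketch below
assumes each group member has its own ordered list of alternatives,
from least general to most general; the function name
least_generalized_shared and the second list, for a hypothetical
constituent "two weeks ago", are illustrative only.

```python
def least_generalized_shared(ladders):
    """Pick the least generalized alternative available to every member.

    Each ladder lists one member's alternatives from least general to
    most general. Returns None if not even the most generalized
    alternative of the first ladder is shared.
    """
    chosen = None
    # Start at the most generalized level and step down while the
    # alternative is still offered by every other ladder.
    for candidate in reversed(ladders[0]):
        if all(candidate in ladder for ladder in ladders[1:]):
            chosen = candidate
        else:
            break
    return chosen


ladders = [
    # Alternatives for "last Sunday".
    ["earlier this week", "earlier this month", "in the past"],
    # Alternatives for a hypothetical constituent "two weeks ago".
    ["earlier this month", "in the past"],
]
print(least_generalized_shared(ladders))  # earlier this month
```

Stopping at the first level that is not shared keeps the group
aggregatable while staying as close as possible to the original
constituents.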
[0059] FIGS. 2 and 4-5 illustrate example flowcharts of the
operations performed by an apparatus, such as computing system 300
of FIG. 3, in accordance with example embodiments of the present
invention. It will be understood that each block of the flowcharts,
and combinations of blocks in the flowcharts, may be implemented by
various means, such as hardware, firmware, one or more processors,
circuitry and/or other devices associated with execution of
software including one or more computer program instructions. For
example, one or more of the procedures described above may be
embodied by computer program instructions. In this regard, the
computer program instructions which embody the procedures described
above may be stored by a memory 302 of an apparatus employing an
embodiment of the present invention and executed by a processor 306
in the apparatus. As will be appreciated, any such computer program
instructions may be loaded onto a computer or other programmable
apparatus (e.g., hardware) to produce a machine, such that the
resulting computer or other programmable apparatus provides for
implementation of the functions specified in the flowcharts'
block(s). These computer program instructions may also be stored in
a non-transitory computer-readable storage memory that may direct a
computer or other programmable apparatus to function in a
particular manner, such that the instructions stored in the
computer-readable storage memory produce an article of manufacture,
the execution of which implements the function specified in the
flowcharts' block(s). The computer program instructions may also be
loaded onto a computer or other programmable apparatus to cause a
series of operations to be performed on the computer or other
programmable apparatus to produce a computer-implemented process
such that the instructions which execute on the computer or other
programmable apparatus provide operations for implementing the
functions specified in the flowcharts' block(s). As such, the
operations of FIGS. 2 and 4-5, when executed, convert a computer or
processing circuitry into a particular machine configured to
perform an example embodiment of the present invention.
Accordingly, the operations of FIGS. 2 and 4-5 define an algorithm
for configuring a computer or processor, to perform an example
embodiment. In some cases, a general purpose computer may be
provided with an instance of the processor which performs the
algorithm of FIGS. 2 and 4-5 to transform the general purpose
computer into a particular machine configured to perform an example
embodiment.
[0060] Accordingly, blocks of the flowchart support combinations of
means for performing the specified functions and combinations of
operations for performing the specified functions. It will also be
understood that one or more blocks of the flowcharts, and
combinations of blocks in the flowcharts, can be implemented by
special purpose hardware-based computer systems which perform the
specified functions, or combinations of special purpose hardware
and computer instructions.
[0061] In some example embodiments, certain ones of the operations
herein may be modified or further amplified as described herein.
Moreover, in some embodiments additional optional operations may
also be included. It should be appreciated that each of the
modifications, optional additions or amplifications described
herein may be included with the operations herein either alone or
in combination with any others among the features described
herein.
[0062] Many modifications and other embodiments of the inventions
set forth herein will come to mind to one skilled in the art to
which these inventions pertain having the benefit of the teachings
presented in the foregoing descriptions and the associated
drawings. Therefore, it is to be understood that the inventions are
not to be limited to the specific embodiments disclosed and that
modifications and other embodiments are intended to be included
within the scope of the appended claims. Moreover, although the
foregoing descriptions and the associated drawings describe example
embodiments in the context of certain example combinations of
elements and/or functions, it should be appreciated that different
combinations of elements and/or functions may be provided by
alternative embodiments without departing from the scope of the
appended claims. In this regard, for example, different
combinations of elements and/or functions than those explicitly
described above are also contemplated as may be set forth in some
of the appended claims. Although specific terms are employed
herein, they are used in a generic and descriptive sense only and
not for purposes of limitation.
* * * * *