U.S. patent application number 17/750559 was filed with the patent office on 2022-05-23 and published on 2022-09-08 for applied artificial intelligence technology for interactively using narrative analytics to focus and control visualizations of data.
The applicant listed for this patent is Narrative Science Inc. Invention is credited to Lawrence A. Birnbaum, Bo He, Kathryn McCarthy Hughes, Mauro Eduardo Ignacio Mujica-Parodi, III, Daniel Joseph Platt, Alexander Rudolf Sippel.
Application Number | 20220284195 17/750559 |
Document ID | / |
Family ID | 1000006351366 |
Filed Date | 2022-05-23 |
United States Patent
Application |
20220284195 |
Kind Code |
A1 |
Platt; Daniel Joseph; et al. |
September 8, 2022 |
Applied Artificial Intelligence Technology for Interactively Using
Narrative Analytics to Focus and Control Visualizations of Data
Abstract
To provide users with more flexibility for focusing and
controlling visualizations of data, the inventors disclose new data
structures and artificial intelligence logic that can be utilized
in conjunction with notional specifications of focus criteria for
visualizations. In an example embodiment, the inventors disclose
technology that can be used to generate data structures that
represent notional characteristics of the visualization data which
in turn can be tied to specific elements of the visualization data
to support interactive focusing of visualizations in notional terms
that correspond to interesting aspects of the data. This allows a
user to specify using notional criteria how a visualization should
be focused on specific elements without needing to know in advance
what those specific elements are.
Inventors: |
Platt; Daniel Joseph;
(Chicago, IL) ; Sippel; Alexander Rudolf;
(Chicago, IL) ; Mujica-Parodi, III; Mauro Eduardo
Ignacio; (Chicago, IL) ; Hughes; Kathryn
McCarthy; (Chicago, IL) ; He; Bo; (Chicago,
IL) ; Birnbaum; Lawrence A.; (Evanston, IL) |
|
Applicant: |
Name | City | State | Country | Type
Narrative Science Inc. | Chicago | IL | US |
Family ID: |
1000006351366 |
Appl. No.: |
17/750559 |
Filed: |
May 23, 2022 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number
15666151 | Aug 1, 2017 | 11341338
17750559 | |
62382063 | Aug 31, 2016 |
Current U.S.
Class: |
1/1 |
Current CPC
Class: |
G06T 11/206 20130101;
G06F 40/40 20200101; G06F 40/103 20200101; G06N 5/04 20130101; G06N
3/08 20130101; G06F 3/0482 20130101; G06T 2200/24 20130101; G06F
40/169 20200101; G06F 3/04847 20130101; G06F 40/289 20200101; G06F
3/0484 20130101; G06N 5/02 20130101 |
International
Class: |
G06F 40/40 20060101
G06F040/40; G06F 40/103 20060101 G06F040/103; G06F 40/169 20060101
G06F040/169; G06F 40/289 20060101 G06F040/289; G06N 5/04 20060101
G06N005/04; G06N 5/02 20060101 G06N005/02 |
Claims
1. A system for applying artificial intelligence technology to
focus and control visualizations of visualization data, wherein the
visualization data comprises a plurality of elements, the system
comprising: a processor configured to: translate a specification of
notionally expressed focus criteria regarding a first visualization
of the visualization data into a focus configuration, wherein the
translation includes an analysis of the visualization data based on
processing logic to identify a specific element of the
visualization data that satisfies the notionally expressed focus
criteria; and generate (1) a second visualization that is a focused
version of the first visualization or (2) a focused narrative based
on the focus configuration, wherein the second visualization or
focused narrative addresses the identified specific element of the
visualization data; wherein the processor is configured to perform
the translation and generation without the processor knowing which
of the elements of the visualization data qualifies as the specific
element according to the notionally expressed focus criteria prior
to the translation.
2. The system of claim 1 wherein the processor is further
configured to provide an interface through which a user specifies
the notionally expressed focus criteria in notional terms.
3. The system of claim 2 wherein the notional terms comprise a
plurality of characteristics that an element of the visualization
data must satisfy to be identified as the specific element to be
used as part of the focus configuration.
4. The system of claim 3 wherein the processor is configured to, as
part of the analysis, process data within or derived from the
visualization data with respect to the characteristics to identify
the specific element that satisfies the characteristics.
5. The system of claim 2 wherein the interface comprises a
graphical user interface (GUI).
6. The system of claim 1 wherein the processor comprises a
plurality of processors.
7. The system of claim 6 wherein the processors include a first
processor that is part of a narrative generation platform and a
second processor that is part of a visualization platform, wherein
the first processor is configured to generate the focused
narrative, and wherein the second processor is configured to
generate the second visualization.
8. The system of claim 7 wherein the first processor is configured
to (1) perform the translate operation in response to notionally
expressed focus criteria received via an interface with a
visualization platform and (2) communicate the focus configuration
to the second processor via the interface with the visualization
platform.
9. The system of claim 1 wherein the processor is further
configured to map the focus configuration to a story configuration,
the story configuration for use to automatically generate the
focused narrative.
10. The system of claim 9 wherein the processor is further
configured to (1) instantiate the mapped story configuration based
on the visualization data and (2) generate the focused narrative
based on the instantiated story configuration.
11. The system of claim 1 wherein the elements comprise a plurality
of entities which correspond to a dimension of the first
visualization; wherein the visualization data further comprises a
plurality of data values which correspond to a measure of the first
visualization that is applicable to the dimension; and wherein the
notionally expressed focus criteria comprise a specification of a
metric by which to analyze the data values of the measure for the
dimension to determine which of the entities qualifies as the
specific element.
12. The system of claim 11 wherein the notionally expressed focus
criteria further comprise a specification of a rank group and a
number of items; wherein the specified rank group defines a rank
criterion by which to rank the entities according to the data
values of the measure for the dimension; and wherein the specified
number of items defines how many of the ranked entities are to be
selected as the specific element according to the rank
criterion.
13. The system of claim 11 wherein the visualization data further
comprises a plurality of data values which correspond to a
plurality of measures of the first visualization that are
applicable to the dimension; and wherein the notionally expressed
focus criteria further comprise a specification of a measure to
define which of the data values are to be analyzed according to the
specified metric to determine which of the entities qualifies as
the specific element.
14. The system of claim 11 wherein the processor, as part of the
translation, is further configured to (1) instantiate code for
analyzing the data values of the measure for the dimension
according to the metric and (2) execute the instantiated code with
respect to the visualization data to identify the entity to serve
as the specific element.
15. The system of claim 1 wherein the elements comprise a plurality
of entities which correspond to a plurality of dimensions of the
first visualization; wherein the visualization data further
comprises a plurality of data values which correspond to a
plurality of measures of the first visualization that are
applicable to the dimensions; wherein the notionally expressed
focus criteria comprise a specification of a dimension, a measure,
and a metric; and wherein the specified metric defines how the data
values of the specified measure for the specified dimension are to
be analyzed to determine which of the entities qualifies as the
specific element.
16. The system of claim 15 wherein the notionally expressed focus
criteria further comprise a specification of a rank group and a
number of items; wherein the specified rank group defines a rank
criterion by which to rank the entities according to the data
values of the specified measure for the specified dimension; and
wherein the specified number of items defines how many of the
ranked entities are to be selected as the specific element
according to the rank criterion.
17. The system of claim 1 wherein the elements comprise a plurality
of measures that are applicable to a dimension of the first
visualization; wherein the visualization data further comprises a
plurality of data values which correspond to the measures; wherein
the notionally expressed focus criteria indicate a request to focus
on a driver of a measure presented by the first visualization;
wherein the processor, as part of the analysis, is further
configured to (1) access additional data associated with the
visualization data, wherein the additional data defines a plurality
of driver relationships between a plurality of the measures of the
visualization data and (2) based on the accessed additional data,
identify at least one of the measures of the visualization data as
a driver of the measure presented by the first visualization,
wherein the specific element comprises the identified at least one
measure; and wherein the focused narrative comprises a driver
explanation for the measure presented by the first visualization in
terms of the identified at least one measure.
18. The system of claim 1 wherein the specific element comprises a
plurality of specific elements of the visualization data.
19. A method for applying artificial intelligence technology to
focus and control visualizations of data, wherein the visualization
data comprises a plurality of elements, the method comprising:
translating a specification of notionally expressed focus criteria
regarding a first visualization of the visualization data into a
focus configuration, wherein the translating step includes
analyzing the visualization data based on processing logic to
identify a specific element of the visualization data that
satisfies the notionally expressed focus criteria; and generating
(1) a second visualization that is a focused version of the first
visualization or (2) a focused narrative based on the focus
configuration, wherein the second visualization or focused
narrative addresses the identified specific element of the
visualization data; wherein the translating and generating steps
are performed by a processor without the processor knowing which of
the elements of the visualization data qualifies as the specific
element according to the notionally expressed focus criteria prior
to the translating step.
20. A computer program product for applying artificial intelligence
technology to focus and control visualizations of data, wherein the
visualization data comprises a plurality of elements, the computer
program product comprising: a plurality of processor-executable
instructions resident on a non-transitory computer-readable storage
medium, wherein the instructions are configured for execution by a
processor to cause the processor to: translate a specification of
notionally expressed focus criteria regarding a first visualization
of the visualization data into a focus configuration, wherein the
translation includes an analysis of the visualization data based on
processing logic to identify a specific element of the
visualization data that satisfies the notionally expressed focus
criteria, wherein the processor does not know which of the elements
of the visualization data qualifies as the specific element
according to the notionally expressed focus criteria prior to
execution of the translate instructions; and generate (1) a second
visualization that is a focused version of the first visualization
or (2) a focused narrative based on the focus configuration,
wherein the second visualization or focused narrative addresses the
identified specific element of the visualization data.
Description
CROSS-REFERENCE AND PRIORITY CLAIM TO RELATED PATENT
APPLICATIONS
[0001] This patent application is a continuation of U.S. patent
application Ser. No. 15/666,151, filed Aug. 1, 2017, and entitled
"Applied Artificial Intelligence Technology for Interactively Using
Narrative Analytics to Focus and Control Visualizations of Data",
now U.S. Pat. No. ______, which claims priority to U.S. provisional
patent application Ser. No. 62/382,063, filed Aug. 31, 2016, and
entitled "Applied Artificial Intelligence Technology for
Interactively Using Narrative Analytics to Focus and Control
Visualizations of Data", the entire disclosures of each of which
are incorporated herein by reference.
[0002] This patent application is also related to (1) U.S. patent
application Ser. No. 15/666,168, filed Aug. 1, 2017, and entitled
"Applied Artificial Intelligence Technology for Evaluating Drivers
of Data Presented in Visualizations", now U.S. Pat. No. 11,144,838,
and (2) U.S. patent application Ser. No. 15/666,192, filed Aug. 1,
2017, and entitled "Applied Artificial Intelligence Technology for
Selective Control over Narrative Generation from Visualizations of
Data", now U.S. Pat. No. 10,853,583, the entire disclosures of each
of which are incorporated herein by reference.
INTRODUCTION
[0003] Many data visualization software programs enable users to
select particular aspects or components of the data presented in a
given visualization in order to produce a second, alternative
visualization, where this second visualization highlights or
presents only those selected aspects/components. Such functionality
in the visualization software provides users with some degree of an
ability to focus a visualization in an interactive manner. For
example,
[0004] FIG. 1 shows an example visualization that plots the number
of wins by major league baseball team for each season from 1980
through 2015. The visualization is presented in the form of a line
chart, and more specifically a multi-line chart. The vertical axis
corresponds to wins while the horizontal axis corresponds to years
(seasons), and each line on the chart corresponds to a different
baseball team. As can be seen by FIG. 1, this multi-line chart is
extremely dense in terms of information conveyed, and so rather
difficult to understand; a user may therefore want to focus in on
certain aspects of this visualization in order to understand it
better.
[0005] To provide users with such a focus capability, some
visualization software platforms provide users with an ability to
select specific elements of the visualization. A variety of
mechanisms may be utilized to provide such a selection capability
(e.g., point and click selections, drop-down menus, modal dialog
boxes, buttons, checkboxes, etc.). For example, with reference to
FIG. 1, a user may want to focus the visualization on a subset of
specific teams. To do so, the visualization software performs the
process flow of FIG. 2A. At step 200, the software receives a
selection indicative of an entity corresponding to one or more of
the lines of the line chart. For example, this can be a selection
of three teams. Then, at step 202, the software selects the data
elements within the data upon which the visualization is based that
correspond(s) to the selection(s) received at step 200. Once again,
continuing with the example where three teams are selected, the
software would select the win totals across the subject seasons for
the three selected teams. Then, at step 204, the software re-draws
the line chart in a manner that limits the presented data to only
the selected data elements (e.g., the lines for the selected teams).
FIG. 2B depicts an example visualization that would result from
step 204 after a user has selected three specific teams--the
Atlanta Braves, Detroit Tigers, and New York Mets--for the focused
presentation.
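By way of illustration only, the conventional flow of steps 200, 202, and 204 can be sketched in Python as follows. The team names and win totals below are hypothetical stand-ins, not values taken from the actual FIG. 1 data:

```python
# Sketch of the conventional focusing flow of FIG. 2A. The user must
# name the entities of interest in advance (step 200); the software
# then keeps only the matching data elements (step 202) and re-draws
# the chart from the reduced data (step 204).

def focus_on_selected_teams(visualization_data, selected_teams):
    """Step 202: keep only the data elements for the user's selections."""
    return {team: wins for team, wins in visualization_data.items()
            if team in selected_teams}

# Hypothetical win totals per season for a handful of teams.
data = {
    "Atlanta Braves": [80, 88, 96],
    "Detroit Tigers": [84, 79, 74],
    "New York Mets":  [67, 71, 90],
    "Chicago Cubs":   [75, 83, 97],
}

# Step 200: the user directly selects already-known entities by name.
focused = focus_on_selected_teams(
    data, {"Atlanta Braves", "Detroit Tigers", "New York Mets"})

# Step 204 would re-draw the line chart from `focused`, which now
# holds only the three selected teams' series.
```

The point to note is that the selection set must be supplied by the user up front, by name.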
[0006] Inherent in this conventional approach to interactively
focusing visualizations is that the user must be able to directly
specify which specific elements of the visualization, and
corresponding data, should be the focus of the focusing efforts, in
terms of the data and metadata presented by the visualization
system itself. This in turn requires that the user know which
specific elements of the visualization, expressed in these terms,
should be the focus. That is, continuing with the above example,
the user needed to know in advance that the Atlanta Braves, Detroit
Tigers, and New York Mets were the entities that were to be the
subjects of the focus. Presumably, this would be based on knowledge
possessed by the user that there was something potentially
interesting about these three teams that made them worthy of the
focusing effort. The inventors believe this constraint is a
significant shortcoming of conventional focusing capabilities of
data visualization systems. That is, the ability to focus a
visualization on interesting aspects of the visualization via
conventional software relies on either prior knowledge by the user
about the specific data elements being visualized or the
recognition of a specific element of the visualization itself that
is worthy of focus (e.g., selecting the lines showing win peaks for
teams in 1998 and 2001 that are higher than the peaks for other
seasons).
[0007] For this reason, the aspects or components of data that a
user may select in these visualization systems are those which are
already manifest within the system--specifically, those data or
metadata which comprise the initial visualization with respect to
which the user is making a selection (or selections) in order to
produce a second visualization focusing on just those aspects or
components of the data. These include such elements as specific
entities or subsets of entities along the x-axis (independent
variable) of a bar chart; specific measures (if there is more than
one) along the y-axis (dependent variable); specific intervals
along the x-axis of a line chart; specific lines in multiple-line
charts; etc. In other words, the elements of the data manifest as
specific entities in the system, to which a user might refer via
some selection process, are limited to those which comprise the raw
data or metadata used to construct the initial visualization in the
first place.
[0008] In sum, the focus criteria made available by conventional
systems are criteria already known and explicitly represented
within the visualization data (such as specific teams on the line
chart of FIGS. 1 and 2B)--specifically, again, those data or
metadata which comprise the initial visualization with respect to
which the user is making a selection (or selections) in order to
produce a second visualization focusing on just those aspects or
components of the data.
[0009] However, the inventors believe that there are many
interesting aspects of many instances of base visualization data
that are hidden within that data. Unfortunately, conventional
visualization systems are unable to provide users with an automated
means for discovering these interesting aspects of the data that
are worth focusing on, and then specifying what to focus on in
terms of these interesting aspects.
[0010] As a solution to this technical problem in the art, the
inventors disclose that new data structures and artificial
intelligence logic can be utilized in conjunction with
visualization systems that support the use of notional
specifications of focus criteria. That is, continuing with the
example of FIG. 1, rather than instructing the visualization
software to "Focus on Specific Team A" within the set of teams
already manifest in the visualization data (by which we mean both
the data being visualized and the metadata that describe those data
and the visualization itself), the inventors have invented an
approach that defines the focus criteria in terms of
characteristics of the visualization data that may not yet be
known. By way of example with reference to FIG. 1, this focusing
effort could be "Focus on the Team with the Highest Average Number
of Wins per Season". The actual entity with the highest average
number of wins per season is not known by the system ahead of time,
but the inventors disclose applied computer technology that allows
for a notional specification of such an entity that leads to its
actual identification within the visualization data, which in turn
can drive a more focused visualization.
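As a hedged sketch of this idea (in Python, with invented team names and values), resolving the notional criterion "the team with the highest average number of wins per season" amounts to running an analytic over the visualization data in order to bind the notional reference to an actual entity:

```python
# Resolving a notional focus criterion: the specific entity is not
# known in advance; it is identified by running an analytic over the
# visualization data.

def resolve_highest_average(series_by_entity):
    """Return the entity whose series has the highest mean value."""
    return max(series_by_entity,
               key=lambda e: sum(series_by_entity[e]) / len(series_by_entity[e]))

# Hypothetical wins-per-season data; the names are illustrative.
data = {
    "Team A": [90, 95, 88],
    "Team B": [70, 102, 85],
    "Team C": [81, 80, 79],
}

target = resolve_highest_average(data)  # notional reference -> actual entity
focused = {target: data[target]}        # data for the focused visualization
```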
[0011] As disclosed in commonly-owned U.S. patent application Ser.
No. 15/253,385, entitled "Applied Artificial Intelligence
Technology for Using Narrative Analytics to Automatically Generate
Narratives from Visualization Data", filed Aug. 31, 2016, and U.S.
provisional patent application Ser. No. 62/249,813, entitled
"Automatic Generation of Narratives to Accompany Visualizations",
filed Nov. 2, 2015 (the entire disclosures of both of which are
incorporated herein by reference), narrative analytics can be used
in combination with visualization data in order to carry out the
automatic generation of narrative text that provides natural
language explanations of that visualization data. Thus, in example
embodiments, captions can be automatically generated to accompany a
visualization that summarize and explain the important aspects of
that visualization in a natural language format. For example, FIG.
3 shows the visualization of FIG. 1 paired with a narrative 300
produced via the technology described in the above-referenced and
incorporated '385 patent application. This narrative 300 serves to
summarize and explain important or interesting aspects of the
visualization.
[0012] The inventors disclose that such narrative analytics
technology can be used to generate data structures that represent
notional characteristics of the visualization data which in turn
can be tied to specific elements of the visualization data to
support interactive focusing of visualizations in notional terms
that correspond to interesting aspects of the data, as described
above.
[0013] The operation of this narrative analytics technology applies
narrative analytics to the raw data in order to produce derived
features or categories (e.g., aggregate elements such as ranked
lists or clusters, aggregate measures such as means, medians,
ranges, or measures of volatility, and individual elements of
interest such as outliers, etc.) that play a role in determining
the appropriate data and characterizations--including in particular
derived features, values, or categories themselves--that should be
included in the resulting narrative. These narrative analytics are
the analytic processes and methods specified in (or by) the system
due to their potential relevance to the type of narrative the
system has been constructed or configured to produce.
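One illustrative way to compute the kinds of derived features named above (aggregate measures, a ranked list, and individual outlier elements) is sketched below; the entity names, values, and the two-standard-deviation outlier threshold are assumptions made for this sketch, not part of the disclosure:

```python
# Deriving features from raw series data: aggregate measures (means,
# overall range), an aggregate element (a ranked list), and individual
# elements of interest (outliers beyond two standard deviations).

from statistics import mean, stdev

def derive_features(series_by_entity):
    means = {e: mean(v) for e, v in series_by_entity.items()}
    ranked = sorted(means, key=means.get, reverse=True)  # ranked list
    all_values = [x for v in series_by_entity.values() for x in v]
    mu, sigma = mean(all_values), stdev(all_values)
    outliers = [x for x in all_values if abs(x - mu) > 2 * sigma]
    return {
        "means": means,                               # aggregate measures
        "ranked": ranked,                             # aggregate element
        "range": (min(all_values), max(all_values)),  # aggregate measure
        "outliers": outliers,                         # elements of interest
    }

features = derive_features({"A": [60, 62, 61], "B": [64, 63, 95], "C": [59, 58, 60]})
```

Each derived feature (e.g., the top of the ranked list, or an outlier value) is an explicitly represented entity that a narrative, or a focus selection, can then refer to.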
[0014] In regards to the issues presented by the interactive
construction of more focused visualizations, the inventors note
that, in light of the discussion above, the results of these
analytics--which are constructed and explicitly represented as
entities within the narrative generation system--constitute
entities, both notional and actual, above and beyond those
representing the raw data and metadata that comprise the initial
visualization itself. Accordingly, the inventors disclose that
these explicitly represented notional entities are available for
presentation to users for selection (via whatever interface
mechanism is preferred) as focus criteria--thereby enabling the
construction and presentation of more focused visualizations (as
well as more focused narratives) specified in terms of these
derived features, not just in terms of the raw data and metadata
comprising the initial visualization itself.
[0015] Moreover, in example embodiments, the entities representing
these derived features or categories can be represented (and
presented to the user) in entirely notional terms--that is, they
can represent objects or entities with specific, relevant
properties, even if the user (or, for that matter, the system
itself) doesn't yet know the actual objects or entities which have
those properties. For example, if there is an analytic that
computes the percentage increase in some metric over some interval,
and additionally, one that ranks actual (input) entities according
to this percentage increase, then it is possible to present to the
user the possibility of selecting, to produce a more specific and
focused visualization (and narrative), the following notional
entity: "the (input) entity with the greatest percentage increase
in the metric over the interval"--whether or not we know (yet) the
actual entity that fits that description. Indeed, by combining this
description of the notional entity with appropriate metadata from
the initial visualization (concerning, e.g., the nature of the
metric and/or of the entities in question) it is possible to
present the user with the option of selecting, for a more focused
view and narrative: "the company with the greatest percentage
increase in revenue over the third quarter." In either case, this
will of course turn out to be some specific company, Company A. But
the point is that the user can select this company, and the data
about it, simply by selecting the notional reference described in
terms of the result of this analytic--i.e., not solely by using the
name of the company as manifest in the raw data or metadata
comprising the visualization. This is the difference between being
able to say, "Tell me about the three teams with the best records,"
and "Tell me about the Yankees, the White Sox, and the Red Sox."
You can specify the former without knowing which teams actually fit
the bill--in fact the point of the resulting narrative and
visualization will be, in part, to inform you of which teams
actually fit that description. This enables the user to focus the
subsequent visualization (and narrative) by reference to the
narratively interesting aspects or components of the data, in
purely functional terms.
[0016] Within the context of an example embodiment, the inventors
disclose a narrative focus filter that integrates narrative
generation technology with visualization systems to focus on the
narratively salient elements of the data, as computed by relevant
analytics. The narrative focus filter simplifies a complex
visualization by not only using a natural language generation
system to produce an insightful narrative about the visualization,
but by focusing the visualization on a subset of the data based on
what is most important to the user, in narrative terms, as
described above.
[0017] Further still, in an example embodiment, a narrative is
embedded into an existing visualization platform. With such an
example embodiment, the interface object presenting the narrative
to the user of that platform can be adapted to provide an
experience whereby the user is able to focus on a view of the
visualization that provides him or her with the most insight. In
this sense, the narrative analytics (and in some cases the
narrative itself), are what actually provide the selections and
filters utilized by the user to determine the focus; the system
then translates these into the specific entities actually used by
the visualization platform. The result is a seamless integration in
which entities that comprise important elements of the narrative
accompanying a given visualization can be used to manipulate the
visualization itself. Again, it should be noted that the necessary
narrative analytics can be performed, and these entities can be
made available for such purpose, whether or not a narrative is
actually generated or supplied to the user to accompany the
visualization.
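Purely as a hedged sketch, this translation step can be pictured as a small routine that turns notional focus criteria (a metric, a rank group, and a number of items, loosely modeled on the focus criteria discussed in connection with FIG. 8C) into the concrete entity names a visualization platform's filter mechanism expects; the field names and the small metric set below are assumptions made for illustration, not the actual schema:

```python
# Hypothetical translation of notional focus criteria into a focus
# configuration of concrete entity names. The fields "metric",
# "rank_group", and "number_of_items" are illustrative assumptions.

METRICS = {
    "highest average": lambda v: sum(v) / len(v),
    "highest final value": lambda v: v[-1],
}

def translate_focus_criteria(criteria, series_by_entity):
    score = METRICS[criteria["metric"]]
    ranked = sorted(series_by_entity,
                    key=lambda e: score(series_by_entity[e]),
                    reverse=(criteria["rank_group"] == "top"))
    selected = ranked[:criteria["number_of_items"]]
    # The configuration now names concrete entities that the
    # visualization platform's filter mechanism can act on directly.
    return {"filter_entities": selected}

criteria = {"metric": "highest average", "rank_group": "top", "number_of_items": 2}
config = translate_focus_criteria(criteria, {"A": [1, 2], "B": [5, 6], "C": [3, 4]})
```

The user specifies only the notional criteria; the concrete entity names in the resulting configuration are discovered by the analytic.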
[0018] The use of automatic narrative generation, or its component
technologies such as narrative analytics, linked with a
visualization system, can also provide other opportunities to
augment a user's ability to focus a visualization, beyond the use
of notional entities that may have interesting aspects as described
above. In particular, such an approach can make it possible for a
user to interactively investigate the drivers (causes or inputs) of
particular data presented in the visualization; or to directly
select specific analytics to be both presented in the visualization
and utilized in constructing the accompanying narrative.
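As one hedged illustration of driver investigation, each candidate driver series can be scored against the target measure, here with a Pearson correlation. Correlation is a simple stand-in chosen for this sketch; the text does not prescribe this particular statistic, and the measure names and values are invented:

```python
# Illustrative ranking of candidate "drivers" of a visualized measure:
# each candidate series is scored by the strength of its Pearson
# correlation with the target series.

from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def rank_driver_candidates(target, candidates):
    """Rank candidate measures by absolute correlation with the target."""
    scores = {name: pearson(series, target) for name, series in candidates.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

revenue = [10, 12, 15, 14, 18]              # measure shown in the visualization
drivers = rank_driver_candidates(revenue, {
    "ad_spend":  [1, 2, 3, 2.5, 4],         # tracks revenue closely
    "headcount": [50, 50, 51, 50, 50],      # nearly flat
})
```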
[0019] These and other features and advantages of the present
invention will be described hereinafter to those having ordinary
skill in the art.
BRIEF DESCRIPTION OF THE DRAWINGS
[0020] FIG. 1 depicts an example screenshot of a visualization
generated from a set of visualization data.
[0021] FIG. 2A depicts a process flow for a conventional manner of
focusing a visualization.
[0022] FIG. 2B depicts an example screenshot of a focused
visualization.
[0023] FIG. 3 depicts an example screenshot of a visualization
paired with an automatically generated narrative text that explains
and characterizes the visualization.
[0024] FIG. 4 depicts an example process flow for focusing
visualizations with the aid of narrative analytics.
[0025] FIG. 5 depicts an example system diagram for focusing
visualizations with the aid of narrative analytics.
[0026] FIG. 6 depicts an example screenshot where the visualization
and narrative of FIG. 3 are included with a focus filter button
that is selectable by a user to initiate the process of focusing
the visualization and/or narrative in a user-defined manner.
[0027] FIG. 7A depicts an example process flow for the logic used
to determine focus criteria candidates with respect to a
visualization.
[0028] FIG. 7B depicts an example set of visualization data.
[0029] FIG. 7C depicts an example mapping of visualization types to
generalized focus options.
[0030] FIG. 7D depicts an example set of generalized focus options
that are associated with a visualization type.
[0031] FIG. 7E depicts examples of models for metric options from a
set of generalized focus options.
[0032] FIG. 8A depicts an example screenshot of a visualization that
includes a focus filter menu.
[0033] FIG. 8B depicts an example screenshot of a visualization that
includes a focus filter menu with a drop-down menu of focus
options.
[0034] FIG. 8C depicts an example focus criteria data
structure.
[0035] FIG. 9A depicts an example process flow for the logic used
to generate a focus configuration for use with a visualization.
[0036] FIG. 9B depicts an example focus configuration data
structure.
[0037] FIG. 10 depicts an example screenshot showing a
visualization that has been focused based on input received through
a focus filter menu.
[0038] FIGS. 11A-C depict how focused narratives can be generated as
part of the visualization focusing effort.
[0039] FIG. 12 depicts another example screenshot of a
visualization generated from a set of visualization data.
[0040] FIG. 13 depicts an example data ecosystem that shows
relationships that exist between visualization data and other data
stored in a system.
[0041] FIG. 14 depicts an example process flow for evaluating
potential drivers with respect to a measure shown by a
visualization.
[0042] FIG. 15A depicts an example menu for specifying a
relationship between a visualization measure and a driver candidate
to be evaluated.
[0043] FIG. 15B depicts an example screenshot of a visualization
generated from a set of visualization data that is paired with an
automatically generated narrative text that explains the results of
evaluating the potential driver relationship specified by FIG.
15A.
[0044] FIG. 16A depicts an example menu for specifying multiple
relationships between a visualization measure and multiple driver
candidates to be evaluated.
[0045] FIG. 16B depicts an example screenshot of a visualization
generated from a set of visualization data that is paired with an
automatically generated narrative text that explains the results of
evaluating the potential driver relationships specified by FIG.
16A.
[0046] FIG. 17A depicts an example outside measures data
structure.
[0047] FIG. 17B depicts an example outside series data
structure.
[0048] FIGS. 18-23 depict examples of how analytics can be
selectively enabled and disabled in an interactive manner.
[0049] FIGS. 24-27 depict examples of how thresholds for analytics
can be interactively controlled.
[0050] FIGS. 28-31 depict examples of how predictive analytics can
be selectively enabled and disabled.
[0051] FIGS. 32-38 depict examples of how the formatting of
resultant narrative stories can be controlled.
DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS
[0052] FIG. 4 depicts an example process flow for focusing
visualizations with the aid of narrative analytics. The process
flow of FIG. 4 can be executed by a processor in cooperation with
computer memory. For ease of illustration, FIG. 5 depicts an
example system for executing the process flow of FIG. 4. The system
of FIG. 5 includes a visualization platform 500 and narrative
analytics 510. The visualization platform provides the
visualization data 502 that serves as the source for a
visualization produced by the visualization platform. The narrative
analytics 510 can access this visualization data 502 through an
application programming interface (API) or other suitable data
access/sharing techniques. Examples of visualization platforms that
may be suitable for use with the system of FIG. 5 include the Qlik
visualization platform and the Tableau visualization platform. An
example of a narrative analytics platform 510 that is suitable for
use with the system of FIG. 5 is the Quill narrative generation
platform available from Narrative Science Inc. of Chicago, Ill.
Additional details regarding how narrative analytics platform 510
can be designed and deployed are described in U.S. patent
application Ser. No. 14/521,264, entitled "Automatic Generation of
Narratives from Data Using Communication Goals and Narrative
Analytics," filed Oct. 22, 2014, U.S. patent application Ser. Nos.
14/211,444 and 14/570,834, and U.S. Pat. Nos. 8,355,903, 8,374,848,
8,630,844, 8,688,434, 8,775,161, 8,843,363, 8,886,520, 8,892,417,
9,576,009, 9,697,197, and 9,697,492, the entire disclosures of each
of which are incorporated herein by reference.
[0053] The system of FIG. 5 may employ multiple processors such as
a first processor serving as the visualization platform 500 and a
second processor executing the narrative analytics 510, although
this need not be the case. In an example multiple processor
environment, the various processors can be resident in networked
computer systems.
[0054] The process flow of FIG. 4 describes a technical advance in
how a visualization of data can be focused around a notional entity
or characteristic even though a user may be unaware of which
specific aspects of the visualization correspond to that entity or
characteristic. To accomplish this task, a number of new data
structures and new processing logic are employed. A data structure
is a physical manifestation of information organized within a
computing system. Examples of data structures may include data
files, records, tables, arrays, trees, objects, and the like. The
process flow of FIG. 4 can be triggered in any of a number of ways.
For example, as shown by FIG. 6, a "focus" button 600 or the like
can be included in a visualization interface. In response to user
selection of this button, the process flow of FIG. 4 can be
executed.
[0055] To begin the focusing effort, the processor at step 400
determines focus criteria candidates based on narratively relevant
characteristics of the visualization data. As shown by FIG. 5, this
step can be performed by logic 504. The focus criteria candidates
are options that provide the user with choices regarding how the
visualization is to be focused. At step 400, the system may
generate a list of narratively relevant and useful notional
entities, represented or described in appropriate and intuitive
terms--i.e., at a higher level than those provided by the raw data
or metadata themselves--for presentation to the user. Such a list
can be generated in a number of ways.
[0056] For example, under a first approach, for each type of story
potentially generated by the system, or more specifically, for each
set of narrative analytics utilized in each such story type, a
developer or configurer can examine the set of derived features or
entities that might be computed in the course of generating a story
of that type. These features or entities will be represented
internally by the system as variables or notional entities. The
developer or configurer can then determine appropriate language for
describing each of these entities to a user, and provide these
terms or phrases to the system, each linked to the corresponding
internal variable or notional entity, for presentation to the user
via a selection mechanism (e.g., a menu) when a story of that type
is generated or would be potentially relevant (typically, within
this context, to accompany an initial visualization).
[0057] As another example, under a second approach, for each type
of story potentially generated by the system, a separate process
may traverse the configuration or code for generating a story of
that type, in order to automatically determine the derived features
or entities that might be computed in the course of generating a
story of that type. These can be determined, for example, by
examining the derivations used to compute such features by the
system, which must reference them in order to provide them with
specific values during analysis and story generation.
Alternatively, in other implementations, they may be explicitly
declared as the relevant notional variables for a content block or
other element of an outline or configuration representing the
rhetorical structure of the story to be generated. However
identified, the variables or notional entities that are used to
represent those derived features or entities may then be added to
the list of narratively relevant entities to be presented to the
user for possible selection. They may be included in this list
automatically; or they may be presented to a developer or
configurer for inclusion on the list. The appropriate language for
describing these notional entities for presentation to the user may
be automatically copied from the corresponding blueprints or other
data structures used to represent the linguistic expressions to be
utilized by a narrative generation engine in referring to these
entities in actual stories; or such language may be determined
explicitly by the developer or configurer.
[0058] FIG. 7A depicts an example embodiment for step 400. At step
700 of FIG. 7A, the processor processes the visualization data to
determine a visualization type, story type or set of appropriate
narrative analytics corresponding to that visualization data. The
above-referenced and incorporated '385 patent application describes
examples of how this determination can be made. FIG. 7B depicts an
example data structure for visualization data 710 in relation to
the example screenshot of FIG. 6. The visualization data 710
includes not only specific data values for the data elements
presented on the line charts of FIG. 6 but also metadata about
those data values and the nature of the visualization itself. For
example, the data and metadata may include an identification of a
chart type (e.g., a line chart), a name for the measure being
plotted (e.g., wins), a data array of values for the measure, names
for the chart dimensions (e.g., years/seasons and teams), among
other forms of data and metadata. It should be understood that this
data and metadata can be organized in the data structure 710 in any
of a number of formats. FIG. 7B is merely an example for the
purposes of illustration. The appropriate visualization type, story
type and/or set of narrative analytics can be determined at step
700 based on various fields of the data and metadata within
visualization data 710. For example, the visualization data
structure 710 may include metadata that identifies a chart type for
the visualization (e.g., line chart) and other fields may indicate
other characteristics of the chart (such as the number of
entities/lines for the line chart and whether the line chart is a
time series), which in turn indicate the appropriate visualization
type, story type and/or set of narrative analytics.
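For illustration, a visualization data structure in the spirit of data structure 710 and the visualization type determination of step 700 might be sketched as follows; all field names and sample values here are hypothetical assumptions for the purposes of illustration, not the actual schema of any particular visualization platform:

```python
# Hypothetical sketch of visualization data 710 (FIG. 7B): specific data
# values plus metadata about the values and the visualization itself.
visualization_data = {
    "chart_type": "line_chart",
    "measure": "Wins",                    # name of the measure being plotted
    "dimensions": ["Year", "Teams"],      # names of the chart dimensions
    "is_time_series": True,
    "series": {                           # data arrays of measure values
        "Baltimore Orioles": [67, 59, 94],
        "Kansas City Royals": [97, 85, 76],
        "New York Yankees": [103, 95, 89],
    },
}

def determine_visualization_type(vis_data):
    """Step 700: map data/metadata fields to a visualization type."""
    if vis_data["chart_type"] == "line_chart":
        if len(vis_data["series"]) > 1 and vis_data["is_time_series"]:
            return "multi_entity_time_series_line_chart"
        return "single_entity_line_chart"
    return vis_data["chart_type"]
```

As the paragraph above notes, the same fields (chart type, number of entities, time series flag) drive the determination; the dictionary layout shown is merely one possible organization.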
[0059] Next, at step 702, the processor selects general focus
options based on the determined story type or set of narrative
analytics. The system may include data that maps visualization
types or story types to general focus options to support step 702.
For example, as shown by FIG. 7C, a table 720 may associate
different visualization types 722 with different sets of general
focus options 724. Thus, if step 700 results in a determination
that the visualization type is a multi-entity line chart that
includes a time series, step 702 can select the set of general
focus options that are associated with this visualization type by
table 720. It should be understood that while the example of FIG.
7C uses visualization type 722 as the basis for association with
general focus options 724, the data structure could also use a
story type or set of narrative analytics found at step 700 as the
basis for selecting associated general focus options 724. Thus, by
way of example, data structure 720 can associate the story type
appropriate for a multi-entity line chart that includes a time
series with a set of general focus options 724. Also, while FIG. 7C
shows a table 720 being used for the associations with sets of
general focus options, it should be understood that other
techniques for association could be used, such as a rules-based
approach or others.
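A hedged sketch of an association such as table 720 follows; the key names, option lists, and the choice of a dictionary rather than a table or rule base are illustrative assumptions:

```python
# Hypothetical analogue of table 720 (FIG. 7C): visualization types (or
# story types) mapped to sets of general focus options for step 702.
GENERAL_FOCUS_OPTIONS = {
    "multi_entity_time_series_line_chart": {
        "expression": ("Focus on <NUMBER OF ITEMS> <RANK GROUP> "
                       "<DIMENSION NAME(S)> by the <METRIC> of the "
                       "<MEASURE> values"),
        "rank_group_options": ["Highest", "Lowest", "Median", "Most Average"],
        "metric_options": ["Starting Value", "Ending Value", "Average Value",
                           "Median Value", "Percent Change", "Absolute Change",
                           "Volatility"],
    },
    # ... entries for other visualization types or story types
}

def select_general_focus_options(vis_type):
    """Step 702: look up the general focus options for the determined type."""
    return GENERAL_FOCUS_OPTIONS[vis_type]
```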
[0060] FIG. 7D depicts an example set of general focus options 724
that could be associated with a visualization type or story type
such as that appropriate for or corresponding to a multi-entity
time series line chart. These options 724 can be organized as a
data structure that includes the following as a generalized focus
expression 730: "Focus on <NUMBER OF ITEMS> <RANK
GROUP> <DIMENSION NAME(S)> by the <METRIC> of the
<MEASURE> values" as shown by FIG. 7D. This focus expression
730 is characterized as "general" because it has not yet been
provided with specific parameters as to the features or aspects of
the data or metadata that are to be the subject of the focus
effort. An example of a specific focus expression corresponding to
this general expression would be: "Focus on the 3 highest ranking
teams by the starting value of the wins values". Another example of
a specific focus expression would be "Focus on the 2 lowest ranking
teams by the average value of the wins values".
[0061] The tokens in this expression (e.g., <NUMBER OF
ITEMS>) are variables whose values will help specify the focus
filter to ultimately be applied to the visualization data. These
variables can be parameterized to specific values in response to
user input and/or automated data processing.
[0062] The variables <NUMBER OF ITEMS> 732 and <RANK
GROUP> 734 define a rank criterion for focusing and an
associated volume criterion for the rank criterion for the focusing
effort. The data structure 724 can include a specification of
options for these variables such as the set {1, 2, 3, . . . } for
the <NUMBER OF ITEMS> variable 732 and the set {Highest,
Lowest, Median, Most Average, . . . } for the <RANK GROUP>
variable 734. A user can then select from among these options to
define specific values for these variables.
[0063] The variable <DIMENSION NAME(S)> 736 can be
specifically parameterized based on the visualization data 710 that
is the subject of the focusing effort. The processor can select the
option(s) for this variable based on the data and metadata within
the visualization data 710. For example, with respect to the
example of FIG. 7B, it can be seen that the visualization data
includes two dimensions--Year and Teams. These can therefore be the
options that are used to determine the specific value(s) of the
<DIMENSION NAME(S)> variable 736 in this instance.
[0064] The variable <METRIC> 738 can be used to refer to or
denote a metric by which the measure values will be evaluated as
part of the focusing effort. The data structure 724 can include a
specification of options for the value of this metric variable such
as the set {Starting Value, Ending Value, Average Value, Median
Value, Percent Change, Absolute Change, Volatility, . . . }, as
shown by FIG. 7D. Furthermore, each of these metrics options can be
associated with a corresponding model that defines how that metric
is to be computed, as shown by FIG. 7E. For example, the Average
Value metric option can be associated with an Average Value Model
750 that defines how an average value will be computed in terms of
various input and output parameters. Similarly, the Volatility
metric option can be associated with a Volatility Model 752 that
defines how volatility will be computed in terms of various input
and output parameters. It should be understood that some metric
options may be assumed by default to apply to certain elements of
the time series (e.g., the Percent Change metric being computed by
default as the percent change between the first and last measure of
the time series). However, if desired, a practitioner could include
additional features whereby a user or automated process can further
constrain the elements of the time series for which (or over which)
metrics such as "percent change", "absolute change", etc., are to
be computed (e.g., a user-defined time span).
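By way of an illustrative sketch, the association between metric options and their corresponding models (FIG. 7E) might be represented as a registry of computable functions. The particular formulas shown, such as population standard deviation standing in for the Volatility Model 752, are assumptions for illustration; a practitioner may define these models differently:

```python
# Hypothetical registry associating each <METRIC> option with a model
# that computes it over a series of measure values (cf. FIG. 7E).
def average_value(values):
    """Average Value Model (cf. model 750): mean of the measure values."""
    return sum(values) / len(values)

def volatility(values):
    """One plausible Volatility Model (cf. model 752): population
    standard deviation of the measure values."""
    mean = average_value(values)
    return (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5

METRIC_MODELS = {
    "Starting Value": lambda values: values[0],
    "Ending Value": lambda values: values[-1],
    "Average Value": average_value,
    # Percent Change computed by default between first and last measures:
    "Percent Change": lambda values: 100.0 * (values[-1] - values[0]) / values[0],
    "Volatility": volatility,
}
```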
[0065] The variable <MEASURE> 740 can be parameterized based
on the visualization data 710 that is the subject of the focusing
effort. The processor can select the measure based on the data and
metadata within the visualization data 710. For example, with
respect to the example of FIG. 7B, it can be seen that the
visualization data includes "wins" as the measure, and this measure
can therefore be used as the value for the <MEASURE> variable
740.
[0066] Returning to the process flow of FIG. 7A, at step 704, the
processor specifies the focus option variables that are defined as
a function of the visualization data. For example, at step 704, the
processor parses the visualization data 710 to select the options
for the <DIMENSION NAME(S)> variable 736 and the identity of
the <MEASURE> variable 740 as discussed above. However, it
should be understood that in other embodiments, more or fewer
variables could be defined as a function of the visualization data.
For example, the set of options for the value of the <NUMBER OF
ITEMS> variable 732 need not be a predefined list as shown by
FIG. 7D and could instead be defined dynamically as a function of a
count of how many dimension elements are present in the
visualization data (e.g., a count of the number of teams in the
example of FIG. 7B).
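A minimal sketch of step 704, assuming hypothetical field names for the visualization data, might look as follows; note how the options for the <NUMBER OF ITEMS> variable can be built dynamically from a count of dimension elements rather than from a predefined list:

```python
# Illustrative sketch of step 704: deriving the data-driven focus option
# variables from the visualization data. Field names are assumptions.
def data_driven_focus_options(vis_data):
    num_entities = len(vis_data["series"])
    return {
        "DIMENSION NAME(S)": vis_data["dimensions"],   # e.g., ["Year", "Teams"]
        "MEASURE": vis_data["measure"],                # e.g., "Wins"
        # Options built dynamically from a count of dimension elements:
        "NUMBER OF ITEMS": list(range(1, num_entities + 1)),
    }
```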
[0067] At the conclusion of step 704, the processor has a defined
set of focus criteria candidates to use as part of the focusing
effort. Returning to FIGS. 4 and 5, the processor can next present
the determined focus criteria candidates to a user for selection
(step 402). By way of example, this presentation can occur through
a focus filter menu 506 as shown by FIG. 5. This focus filter menu
506 can be a structured interactive menu that includes a plurality
of fields through which a user can choose from among the various
options for various focus variables. FIG. 8A shows an example focus
filter menu 800 where the menu includes fields 802, 804, and 806
through which a user can define the values to be used for the
<NUMBER OF ITEMS> variable 732, <RANK GROUP> variable
734, and <METRIC> variable 738 respectively. FIG. 8B shows
how drop down menus (see 820) can be included with the focus filter
menu 800 to provide the user with a list of the available options
for the value of each variable determined as a result of the
process flow of FIG. 7A.
[0068] At step 404, the processor receives user input that defines
selections for the focus criteria candidates available through menu
506. The system now has a specific set of focus criteria to use for
its focusing effort. FIG. 8C shows an example data structure that
contains the focus criteria defined in response to the user input
at step 404. The example focus criteria data structure 830
expresses a focus statement of "Focus on the 3 highest ranking
teams by the starting value of the wins values". It should be
understood that other formats could be used for the focus criteria
data structure if desired by a practitioner.
[0069] At step 406, the processor uses the focus criteria in
combination with the visualization data to identify data and
metadata elements within the visualization data that satisfy the
focus criteria. Logic 508 shown by FIG. 5 can perform this step.
For step 406, the narratively relevant notional entities (derived
features or entities) are computed in specific terms of the case at
hand--i.e., the actual entities filling those roles are determined.
For instance, in an example concerning volatility, the level of
volatility is assessed for each of the series displayed in the
visualization; and then the (in this case) three series with the
highest level of volatility are identified in the specific terms
utilized by the visualization system (e.g., via the metadata labels
denoting these specific entities). These computations may already
have been performed by the narrative generation system, or they may
be performed on demand when the user selects the notional entity or
entities in question to focus on. These computations in a sense
carry out a "translation" process, in which a selection by the user
of entities expressed in narratively relevant notional terms (or
criteria) is transformed into a selection in terms recognizable by
the visualization platform itself, i.e., from among the elements of
the raw data or meta-data comprising the current visualization.
[0070] FIG. 9A shows an example process flow for this logic 508. At
step 900 of FIG. 9A, the processor instantiates data structures and
code for computing the derived features needed for (or specified
by) the focus criteria. Continuing with the previous example of
visualization data, if the focus criteria involved finding the 3
teams with the highest volatility in win totals, step 900 can
include instantiating data and code for computing the win totals
volatility for each team in the visualization data, which would
include accessing Volatility Model 752 and mapping the input
parameter(s) for the volatility model to specific data elements of
the visualization data (e.g., the wins data array) and mapping the
output parameter(s) for the volatility model to data structures
that hold volatility values. Next, at step 902, the processor
computes the necessary data components for the focus criteria based
on the instantiated data structures and code. Continuing with the
example, at step 902, the processor would use the instantiated
volatility model 752 to compute a volatility value for each team in
the visualization data. Then, at step 904, the processor applies
other focus criteria to the computed data components to determine a
focus configuration to use for the focusing effort. For example, if
the focus criteria specified finding the 3 teams with the highest
volatility in win totals, step 904 would operate to sort or rank
the computed volatility values for each team and then select the 3
teams with the highest volatility values. The result of step 904
would thus be a focus configuration data structure that identifies
the specific elements within the visualization data that satisfy
the focus criteria. FIG. 9B shows an example of such a focus
configuration data structure 910. In the example of FIG. 9B, the
focus criteria was "Focus on the 3 highest ranking teams by the
starting value of the wins values". It can be seen in FIG. 9B that
the operation of the process flow of FIG. 9A resulted in the
identification of the "Baltimore Orioles", "Kansas City Royals",
and "New York Yankees" as the teams that satisfied these focus
criteria.
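The logic of steps 900-904 can be sketched as follows; this is a minimal illustration under assumed data shapes, not the actual implementation, with the metric model supplied as a plain function standing in for the instantiated model of step 900:

```python
# Illustrative sketch of logic 508 (FIG. 9A): compute the chosen metric
# for each entity, rank the results, and emit a focus configuration that
# names entities in the visualization platform's own metadata terms.
def build_focus_configuration(series, criteria, metric_models):
    metric_fn = metric_models[criteria["metric"]]        # step 900: instantiate
    scores = {name: metric_fn(vals)                      # step 902: compute
              for name, vals in series.items()}
    reverse = criteria["rank_group"] == "Highest"        # step 904: rank and
    ranked = sorted(scores, key=scores.get, reverse=reverse)   # select
    return {"dimension": criteria["dimension_name"],
            "selected_entities": ranked[:criteria["number_of_items"]]}

# Example: focus on the 2 highest ranking teams by starting value.
series = {"Team A": [90, 70], "Team B": [60, 95], "Team C": [80, 80]}
criteria = {"metric": "Starting Value", "rank_group": "Highest",
            "number_of_items": 2, "dimension_name": "Teams"}
config = build_focus_configuration(
    series, criteria, {"Starting Value": lambda v: v[0]})
# config == {"dimension": "Teams", "selected_entities": ["Team A", "Team C"]}
```

The returned structure plays the role of the focus configuration data structure 910 of FIG. 9B, identifying the specific elements within the visualization data that satisfy the focus criteria.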
[0071] Using this focus configuration data, a call can be made to
the visualization platform via a selection API of the visualization
platform. When that call and the focus configuration data are
received, the visualization platform triggers the selection defined
in the focus configuration data and the visualization is updated to
reflect the selection of the focused entities via mechanisms within
the visualization platform itself (step 408). FIG. 10 depicts an
example screenshot showing how the visualization of FIG. 1 can be
focused on the three teams that exhibited the highest volatility in
win totals.
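The handoff of step 408 might be sketched as follows; the payload shape and field names are assumptions, as real platforms such as Qlik and Tableau each define their own selection APIs:

```python
# Hedged sketch of step 408: serializing the focus configuration data
# for a call to a visualization platform's selection API. The payload
# shape shown here is a hypothetical example, not any platform's schema.
import json

def focus_request_payload(focus_config):
    return json.dumps({
        "action": "select",
        "field": focus_config["dimension"],
        "values": focus_config["selected_entities"],
    })
```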
[0072] Thus, in contrast to conventional visualization focusing
techniques, example embodiments of the invention use innovative new
data structures and associated processing logic as discussed above
to provide users with a capability to specify focus criteria in
entirely notional terms without a requirement that the specific
focus criteria be known in advance. The processing logic leverages
these new data structures to translate the notional specification
of focus criteria into specific components of visualization data
that are to be the subject of the focusing effort. The inventors
believe this is a significant improvement over conventional
visualization focusing technology that requires a user to select
specific existing elements of a visualization as the subjects of
the focusing effort (e.g., selecting a specific line or team on the
chart shown by FIG. 1).
Focused Visualizations and Focused Narratives:
[0073] In additional example embodiments, the inventors further
disclose that the focus criteria, focus configuration data, and/or
focused visualization data can also be used to automatically
generate a focused narrative for pairing with the focused
visualization. FIG. 10 shows an example of such a focused narrative
1000. FIG. 11A shows an example process flow that corresponds to
the process flow of FIG. 4 with an added step for invoking a
narrative generation platform to generate the focused narrative
(step 1150). While FIG. 11A shows step 1150 occurring after the
focused visualization is generated, it should be understood that
step 1150 could occur before the focused visualization is generated
and/or be performed in parallel with the FIG. 4 process flow. The
above-referenced and incorporated '385 patent application describes
technology regarding how narrative analytics can be used to
automatically generate narrative texts that summarize and explain
visualizations. The inventors further disclose that the narrative
generation technology described in the above-referenced and
incorporated '385 patent application can be extended to leverage
the focusing efforts to also provide an appropriately focused
narrative that can accompany the focused visualization.
[0074] While the narrative generation platform that performs step
1150 can be a highly flexible platform capable of generating
multiple types of narrative stories using a common platform that
operates on parameterized story configurations (examples of which
are described in several of the above-referenced and incorporated
patents and patent applications), it should be understood that the narrative
generation platform need not necessarily employ such a modular and
flexible approach to narrative generation. For example, as
discussed in the above-referenced and incorporated '385 patent
application, the narrative generation platform may include a number
of separate software programs that are coded to generate specific
types of stories, and the process flow of FIG. 11A can decide which
of these software programs will be used to generate a story with
respect to a given visualization type/story type under
consideration for the focusing effort. Appendix 1 also discusses
other designs for the narrative generation platform that could be
used. That being said, the inventors believe that the use of a
highly modular and parameterized narrative generation platform in
combination with focusing efforts will be particularly adept and
robust when generating focused narrative text that explains a
focused visualization in greater detail. As such, the descriptions
below will elaborate on step 1150 where a specific story type is
expressed, using some specification language, as a configuration
for a configurable narrative generation platform or engine, which
can then produce narratives of the appropriate type as specified by
that configuration.
[0075] Focused narrative generation can be triggered in any of a
number of ways. For example, the focused narrative generation can
use focused visualization data that is provided by the
visualization platform in response to its processing of the focus
configuration data (see the example process flow of FIG. 11B whose
operation is described in greater detail in the above-referenced
and incorporated '385 patent application and where step 1100 uses
the focused visualization data as its starting point). As another
example, if the focusing operation involved mapping a visualization
to a story configuration for a narrative generation system, this
mapped story configuration can be used as the starting point for
narrative generation (see FIG. 11C). Under this approach, the
narrative generation system can be directly invoked with the
appropriate data and metadata, as determined by the focus
configuration, in parallel with those data and metadata being
supplied to the visualization system to produce a focused
visualization. Under either exemplary approach, since the type of
data and metadata, including the type of the visualization, remain
the same, the same story type can be utilized in generating the
focused narrative as was utilized in generating the narrative for
the original visualization, except now parameterized with appropriately
restricted data, as described in the above-referenced and
incorporated '385 patent application.
[0076] Thus, it can be seen that the narrative analytics technology
described herein can be used to generate focused visualizations
and/or focused narratives that are responsive to a user's notional
specification of focus criteria. For example, a first practitioner
might find it desirable to use the disclosed narrative generation
technology to generate focused visualizations without any pairing
with a focused narrative to accompany the focused visualization,
while a second practitioner might find it desirable to use the
disclosed narrative generation technology to generate both a
focused visualization and a focused narrative that accompanies the
focused visualization. Still further, a third practitioner might
find it desirable to use the disclosed narrative generation
technology to generate a focused narrative that accompanies a
visualization but where the visualization itself is not updated in
a focused manner.
Driver Evaluation as the Interaction with a Visualization:
[0077] In additional example embodiments, the inventors further
disclose a system that can support driver evaluation when a user
interacts with a visualization. As explained in the
above-referenced and incorporated U.S. Pat. No. 9,576,009, many
measures depicted by data visualizations exhibit values that are
driven by other measures, i.e., the values of the depicted measures
may be determined by these other measures. These other measures can be
referred to as "drivers". For example, "drivers" for a revenue
measure may include measures such as "units sold" and "price per
unit". The inventors believe that there is a need in the art for
technology that uses drivers as the criteria by which to focus a
narrative that accompanies a visualization of data.
[0078] FIG. 12 depicts an example visualization that includes a
line chart showing the average attendance for major league baseball
teams for each season from 1980 through 2015. The example
screenshot of FIG. 12 also includes narrative text 1200 that
accompanies the visualization. This narrative text, which can be
automatically generated via the technology disclosed in the
above-referenced and incorporated '385 patent application, explains
and summarizes various aspects of the average attendance line
chart. The inventors believe that there is a strong need in the art
for tools that would help users evaluate potential drivers of the
measure shown by the line chart, or of other types of
visualizations. In the context of the example of FIG. 12, what
drivers may influence average attendance at major league baseball
games? Is it home runs (where more home runs might be hypothesized
to lead to more fan excitement which leads to improved attendance)?
Is it the number of hits (under a similar assumption as home runs
but where hits are perhaps not deemed as exciting by fans as home
runs)? Some other driver?
[0079] To support the evaluation of potential drivers as focus
criteria, the system can leverage a larger ecosystem of data. In an
example embodiment, the user can select one or more measures that
could potentially drive the performance of a given subject measure
in a visualization. The measures selected as potential drivers can
be data or metadata that are manifest within the visualization
system and/or narrative analytics system but that are not
necessarily reflected in the source visualization data used for the
subject visualization presented to the user. An example of this can
be seen in FIG. 13.
[0080] FIG. 13 shows example visualization data 1300 that
corresponds to the visualization seen in FIG. 12. This
visualization data 1300 can live within a larger data ecosystem
where other data structures and data elements stored in the system
memory for either or both the visualization platform and narrative
analytics are associated in some manner with data elements of the
visualization data 1300. For example, the visualization data 1300
may live within a larger ecosystem of baseball data with data
structures 1302 for various teams, data structures 1304 for various
games, etc. Each data structure 1302, 1304 may include data
elements that describe the teams and games. The data elements
within this data ecosystem may serve as potential drivers for the
visualization data's subject measure (in this case, average
attendance).
[0081] FIG. 14 shows an example process flow for evaluating
potential drivers with respect to a measure shown by a
visualization in order to generate a visualization narrative that
is focused on the potential drivers.
[0082] At step 1400, a processor processes the visualization data
and associated data within the data ecosystem to determine the
subject visualization measure and driver candidate options for that
visualization measure. This step can be performed by first
extracting the subject visualization measure from the visualization
data and then by selecting the data elements within the ecosystem
that bear some form of a relationship with the subject measure. For
example, as shown in FIG. 13, "Average Attendance" can be selected
as the subject visualization measure, and data elements such as
"Hits", "Walks", Home Runs", and others may be selected as driver
candidate options at step 1400. In an example embodiment, an
extension can query the visualization platform for all the other
measures currently manifest within the memory of the application.
This results in a set of possible "outside measures". The names of
these measures will be what the user sees in the user interface
when selecting driver options. The raw data values of these
measures will be used when performing regression analysis to assess
the nature of the relationship between the potential driver and the
visualization measure. FIG. 17A shows an example array that could
represent a driver for evaluation (or set of possible drivers if
multiple drivers were specified).
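For illustration, the following is a minimal sketch of the kind of driver candidate selection step 1400 describes: gathering the "outside measures" in the data ecosystem and excluding the subject measure itself. The field names and the helper function are assumptions introduced for illustration; the patent does not specify a concrete data format beyond the array of FIG. 17A.

```python
# Hypothetical sketch of step 1400: select every measure in the data
# ecosystem other than the subject visualization measure as a driver
# candidate option. All names and values here are illustrative.

def select_driver_candidates(ecosystem_measures, subject_measure):
    """Return every ecosystem measure except the subject measure itself."""
    return [m for m in ecosystem_measures if m["name"] != subject_measure]

ecosystem_measures = [
    {"name": "Average Attendance", "values": [30123, 31456, 29870]},
    {"name": "Hits", "values": [1400, 1452, 1388]},
    {"name": "Walks", "values": [510, 498, 530]},
    {"name": "Home Runs", "values": [152, 171, 148]},
]

candidates = select_driver_candidates(ecosystem_measures, "Average Attendance")
print([c["name"] for c in candidates])
```

The names of the remaining candidates are what would populate the driver-selection menu presented to the user at step 1402.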
[0083] At step 1402, the selected driver candidate options are
presented to a user. FIG. 15A shows an example of how this step can
be performed. A user interface menu can be presented on a user
computer, and this interface can be configured to guide the user
toward defining a driver candidate 1504 for a subject measure 1502.
A button 1506 can be provided that responds to user selection by
presenting a drop down menu or the like that lists the driver
candidate options selected at step 1400. From this list, a user can
select which of the driver candidate options are to be used as
driver candidates in generating the accompanying narrative.
[0084] At step 1404, the processor receives user input
corresponding to a selection of a driver candidate option to define
a specification of a relationship between the subject visualization
measure and a driver candidate. In the example of FIG. 15A, the
user has selected "Home Runs" as the driver candidate, which
specifies a relationship where "Average Attendance" could be
impacted by "Home Runs". FIG. 16A shows another example user
interface menu where multiple driver relationships have been
specified.
[0085] At step 1406, the processor maps the specified driver
relationship(s) to a driver story configuration within the
narrative analytics system. For example, the specified driver
relationship(s) may be mapped to an "Evaluate Driver" story
configuration. This story configuration can include a specification
of processing logic that is executable to perform linear
regression(s) of the independent variable(s) (the selected driver
candidate(s)) versus the dependent variable (the subject
visualization measure) in order to quantify a potential correlation
between the driver candidate(s) and the subject visualization
measure, as well as characterization logic (angles) that assesses the
results of these correlations in narrative terms to be included in
the final result. Also, as mentioned above, in an embodiment where
the narrative generation platform employs special-purpose, story
type-specific adaptations of the narrative generation software
rather than a configurable general purpose platform, it should be
understood that step 1406 may operate to map the specified driver
relationship to the appropriate special-purpose narrative
generation adaptation rather than to a story configuration.
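A minimal sketch of the mapping performed at step 1406 follows. The registry keys, configuration contents, and function names are assumptions made for illustration; the patent describes the "Evaluate Driver" story configuration but does not define a concrete API for it.

```python
# Illustrative-only sketch of step 1406: mapping a specified driver
# relationship to a driver story configuration. Names are hypothetical.

STORY_CONFIGURATIONS = {
    "evaluate_driver": {
        "analytics": ["linear_regression"],
        "angles": [
            "strong_relationship",
            "medium_relationship",
            "weak_relationship",
        ],
    },
}

def map_relationship_to_config(relationship):
    """Map a measure/driver pairing to the Evaluate Driver configuration."""
    if relationship.get("measure") and relationship.get("driver"):
        return "evaluate_driver", STORY_CONFIGURATIONS["evaluate_driver"]
    raise ValueError("incomplete driver relationship specification")

name, config = map_relationship_to_config(
    {"measure": "Average Attendance", "driver": "Home Runs"})
print(name)
```

In an embodiment with special-purpose adaptations rather than a configurable platform, the same lookup would instead return the appropriate narrative generation adaptation.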
[0086] At step 1408, the processor collects data associated with
the driver candidate(s). This data collection can be referred to as
an "outside series". The "outside series" data are kept distinct
and treated differently from the subject measure because the
visualization and accompanying narrative are still focused on the
subject measure. The additional data from these outside series can
be used for regression analysis when evaluating potential driver
status, but may not necessarily comprise a major focus of the
narrative. FIG. 17B shows an example "outside series" data
structure.
[0087] Steps 1410 through 1416 then operate to render narrative
text from a story configuration as described in connection with the
above-referenced and incorporated patents and patent
applications.
[0088] At step 1410, the processor instantiates the driver story
configuration based on the visualization data and the specified
driver candidate relationship(s) (using technology as described in
the above-referenced and incorporated '385 patent application and
the other above-referenced and incorporated patents and patent
applications).
[0089] Then, at step 1412, the processor computes the necessary
data components for the instantiated driver story configuration.
Continuing with the example where the driver story configuration is
geared around evaluating how average attendance could be impacted
by home runs, this step may include computing the number of home
runs for each season and computing linear regression parameters
for use in assessing the potential influence of each season's home
run totals on each season's average attendance. This regression
analysis uses the raw values of the subject measure (e.g., average
attendance) against the raw values of any of the driver candidates
(e.g., see FIG. 17B). The regression analytic looks, first, to
assess the overall relationship between all of the driver
candidates against the subject measure.
[0090] At step 1414, the processor creates an instantiated
narrative outline based on the instantiated driver story
configuration and the computed data components. This step may
include evaluating various angles that may be included in the
driver story configuration (such as angles that assess whether a
given characterization of the driver candidate is accurate--e.g.,
an angle corresponding to a "strong relationship between driver and
measure" that tests whether the correlation between the measure and
driver is above a threshold based on the computed data components;
and similar angles corresponding to weak or medium relationships,
etc.). In an example drawn from FIG. 16A, this outline is defined
by the driver story configuration and may exhibit a structure such
as a first sentence in a section pertaining to drivers that
explains an overall summary of the driver analysis. Subsequent
parts of the section pertaining to drivers can then explain which
of the driver candidates had the strongest relationship to the
subject measure and provide supporting evidence. Still further,
other portions of the section pertaining to drivers can call out
other results of the calculations that may be relevant to the
user's understanding of the relationships of the drivers to the
subject metric.
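The angle tests of step 1414 can be sketched as a simple mapping from a computed correlation to a narrative characterization. The specific cutoff values are invented for illustration; the patent says only that the correlation is tested against a threshold.

```python
# Hedged sketch of the angles in step 1414: each angle tests whether a
# characterization of the driver relationship is accurate. Cutoffs are
# illustrative assumptions.

def characterize_relationship(r):
    """Map a correlation coefficient to a narrative characterization."""
    strength = abs(r)
    if strength >= 0.8:
        return "strong relationship between driver and measure"
    if strength >= 0.5:
        return "medium relationship between driver and measure"
    return "weak relationship between driver and measure"

print(characterize_relationship(0.93))
```

The angle that tests true for a given driver candidate determines which characterization appears in the corresponding section of the narrative outline.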
[0091] At step 1416, the processor renders narrative text based on
the narrative outline using NLG techniques. The rendered narrative
text can explain the nature of the relationship between the
visualization measure and the driver candidate. FIG. 15B depicts an
example of such narrative text 1510 for the example case where the
number of home runs is evaluated as a potential driver of average
MLB attendance by season.
[0092] FIG. 16A depicts an example selection menu for specifying
potential driver relationships where the user has provided input
for evaluating multiple potential drivers relative to the
visualization measure. In an example such as this, step 1406 would
operate to map the specified relationships to a driver story
configuration that is designed to evaluate and discuss each
specified relationship. FIG. 16B shows an example narrative text
1610 that could be produced as a result of the process flow of FIG.
14 applied to the specified relationships of FIG. 16A. This
narrative text 1610 represents the natural language expression of
the outline discussed above in relation to the FIG. 16A example,
namely a first sentence that is a summary of the driver analysis:
[0093] "HRs per Team, Walks per Game, and Hits per Game each had a
strong relationship with the value of attendance . . . " a second
section that discusses each driver candidate: [0094] " . . . but
HRs per Team had the most definitive relationship. Specifically,
when HRs per Team increased by 1, attendance increased 10,035.
There may be other factors at play, but there is evidence of a very
strong relationship." and a third section that presents some
additional relevant information relating to the driver analysis:
[0095] "Keep in mind that the uneven predictive power of Hits per
Game, HRs per Team, Walks per Game on attendance (non-normal
residuals) suggests that there may be other factors at play besides
those three."
Interactive Analytics:
[0096] Another interactive feature of the system enables users to
both enable/disable a particular analytic package from being run in
the creation of a narrative as well as to customize when (and how)
the results of a particular analytic are expressed in a
narrative.
[0097] Specifically, when generating a narrative to accompany a
particular visualization, the user will be presented with the
available analytics potentially utilized in generating the type of
story that would be appropriate to accompany that visualization.
For example, when a user is generating a narrative relating to a
single line chart, as shown in FIG. 18, he or she can be
presented with the available analytics for the story type relevant
to such a chart.
[0098] In this example, this story type will have the analytic
packages shown by FIG. 19 available. "Segments" in FIG. 19 refers
to analytics that would be used to generate content about
interesting parts of a time series (e.g., peaks, jumps, troughs).
The narrative will utilize these analytics by default. For example,
the narrative can include content about the longest segment of a
continuing trend (see FIG. 20).
[0099] When the data payload is sent to the narrative API, it
contains information about whether or not to utilize this analytic.
In this case, it is `enabled` (see FIG. 21 showing an example
portion of a data payload structure).
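The following is an illustrative guess at the shape of the payload portion shown in FIG. 21: a per-analytic `enabled` flag that the narrative generation system consults before running each analytic package. The field names are assumptions, since the actual schema appears only in the figure.

```python
# Hypothetical sketch of the FIG. 21 payload portion: per-analytic
# enable/disable flags sent to the narrative API. Field names are assumed.
import json

payload = {
    "analytics": {
        "segments": {"enabled": True},
        "trend_line": {"enabled": True},
    }
}

def enabled_analytics(payload):
    """List the analytic packages the narrative should utilize."""
    return [name for name, cfg in payload["analytics"].items() if cfg["enabled"]]

print(json.dumps(enabled_analytics(payload)))
```

Disabling `Segments` as in FIG. 22 would amount to setting `payload["analytics"]["segments"]["enabled"]` to `False`, after which the segment content of FIG. 20 would be omitted.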
[0100] However, a user could choose to `disable` any of these
analytic packages. The example of FIG. 22 shows a case where
`Segments` is disabled. The content shown in FIG. 20 will,
therefore, not appear in the accompanying narrative. FIG. 23 shows
an example narrative generated when `Segments` is disabled.
Additional Analytics Configurations--Inclusion Thresholds:
[0101] In addition to enabling/disabling a particular analytic, a
user can control the conditions under which the results of that
analytic will be examined and conveyed in the accompanying
narrative. Here we present an example of what a user sees when he
or she digs into a particular analytic (segment analytics, as
described above) in order to achieve this level of control.
[0102] Specifically, the user can apply a `threshold` that must be
met in order for the results of segment analytics to be discussed
in the story. This is a percent change that must be met for the
corresponding content to appear in the narrative, controlled by the
user in this case via a slider (although any appropriate interface
mechanism might be used), as illustrated in FIG. 24.
[0103] In this (perhaps somewhat extreme) example, the user has
indicated (as illustrated in FIG. 25) that he or she does not want
content about segment analytics to appear in the resulting
narrative unless the percent change is higher than 200%. When a
user applies this sort of customization, the data payload sent to
the narrative API would contain information about this threshold
(see FIG. 26 showing an example portion of such a data payload
structure where the `inclusion threshold` is set to `2` (i.e.,
200%)).
[0104] The narrative generation system will then use this
information to determine whether or not to discuss the results of a
particular analytic in the narrative. For example, a narrative
generation system (such as the Quill system commercialized by
Narrative Science) can employ a `derivation` that determines
whether a particular segment meets the specified criteria (see FIG.
27).
[0105] This derivation is used by the system in assessing the most
interesting segment of a common trend in order to determine whether
its overall percent change meets the threshold of 200%. In this
case, it does not. Therefore, content discussing the results of
this analytic will not be included in the narrative.
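The derivation of paragraph [0104] can be sketched as a single predicate: does the most interesting segment's percent change clear the user's inclusion threshold? The function and parameter names are assumptions for illustration; only the threshold encoding (`2` meaning 200%) comes from the FIG. 26 example.

```python
# Sketch of the FIG. 27 derivation: test whether a segment's percent
# change clears the user-specified inclusion threshold. Names are assumed.

def segment_meets_threshold(start_value, end_value, inclusion_threshold):
    """True if the segment's absolute percent change exceeds the threshold."""
    pct_change = (end_value - start_value) / start_value
    return abs(pct_change) > inclusion_threshold

# A segment that grows 50% does not clear a 200% (threshold value 2)
# inclusion threshold, so segment content would be omitted.
print(segment_meets_threshold(100.0, 150.0, 2.0))
```

When the predicate is false, as here, the narrative generation system suppresses the segment content entirely.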
Additional Analytics Configurations--Predictive Analytics:
[0106] Continuing with our original example, another set of
analytics potentially relevant in narratives appropriate for
continuous data, e.g., time series, as typically conveyed in line
charts, concerns trend lines. For this "Trend Line" analytic, a
user can similarly set a threshold to be applied based on the
confidence level of the actual trend line as supported by the data.
For example, if a general trend can only be identified at the 85%
confidence level, it will not be discussed in the narrative if the
inclusion threshold is set at 95%. That is, the general trend must
be identified with at least 95% confidence to be included. A user
would set this threshold as illustrated in FIG.
28:
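The trend-line inclusion test of paragraph [0106] reduces to a comparison between the fitted trend's confidence and the user's threshold. The function name is an assumption; the 85%-versus-95% values come from the example in the text.

```python
# Illustrative sketch of the "Trend Line" inclusion test: content about the
# trend is included only if the trend's confidence meets the user threshold.

def include_trend_content(trend_confidence, confidence_threshold):
    """Include trend-line content only when confidence meets the threshold."""
    return trend_confidence >= confidence_threshold

# The example from the text: an 85% trend against a 95% threshold.
print(include_trend_content(0.85, 0.95))
```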
[0107] Also pertaining to the results of the "Trend Line" analytic,
a user can choose to have the narrative include content about the
predicted future movement of the data series (assuming the data fit
a statistically significant trend line), as well as specifying how
far forward to issue a prediction (see FIG. 29). When a user
supplies this configuration, the payload sent to the narrative API
when invoking the system to generate a narrative includes this
information as well (see FIG. 30 showing an example portion of such
a data payload structure). With this information included, the
narrative system is directed to include content about the
prediction; after all calculations have been carried out, the
resulting statistical information about the prediction is included
in the data and utilized in generating the resulting narrative as
illustrated by FIG. 31.
Additional Analytics Configurations--Formatting:
[0108] In addition to supplying thresholds for the inclusion of
content in a narrative, a user can also set thresholds for applying
appropriate formatting to portions of the narrative. For example,
with appropriate thresholds set, a narrative could be generated to
appear as shown in FIG. 32.
[0109] As can be seen, in this narrative the text describing
positive changes is highlighted in green, while the text describing
negative changes is highlighted in red. In order to achieve this
effect, a user first applies the formatting rules for `good
changes` and `bad changes`. These choices are then used when
generating the narrative once a user chooses to enable formatting,
as illustrated in FIG. 33.
[0110] When a user is customizing his or her analytics, he or she
can choose to apply this sort of formatting to the content, and
then set the percentage change that must be met in order to apply
the formatting, as illustrated in FIG. 34.
[0111] When this configuration is supplied by a user, the data
payload sent to the narrative API when invoking the narrative
generation system contains the appropriate information about the
user's preferences, as shown by FIG. 35.
[0112] A narrative generation system can then use this information
to determine how to style content when it is rendered in HTML. For
example, the derivation shown in FIG. 36 is used to determine
whether certain content should be formatted in this way. If the
test is true, appropriate HTML tags will be applied to this
content, as shown by FIG. 37. When the HTML is outputted by our
system, the front end display application knows to respect the
custom tag to apply the custom formatting chosen by the user, as
illustrated in FIG. 38.
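The formatting pipeline of paragraphs [0110] through [0112] can be sketched as a derivation that tests the percentage change against the formatting threshold and, when the test is true, wraps the content in a custom tag for the front end to style. The tag names and threshold parameter are assumptions introduced for illustration, not the actual Quill output format.

```python
# Illustrative sketch of the formatting derivation: wrap content in a
# good-change or bad-change tag when the change clears the user's
# formatting threshold. Tag names and the threshold value are assumed.

def format_change(text, pct_change, formatting_threshold=0.1):
    """Apply a custom formatting tag when the change is large enough."""
    if abs(pct_change) < formatting_threshold:
        return text  # below threshold: no custom formatting applied
    tag = "good-change" if pct_change > 0 else "bad-change"
    return f"<{tag}>{text}</{tag}>"

print(format_change("rose 25%", 0.25))
```

The front end display application would then translate the custom tag into the user's chosen styling, such as green highlighting for good changes and red for bad changes.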
[0113] While the invention has been described above in relation to
its example embodiments, various modifications may be made thereto
that still fall within the invention's scope. Such modifications to
the invention will be recognizable upon review of the teachings
herein.
* * * * *