U.S. patent application number 12/210253 was filed with the patent office on 2008-09-15 and published on 2009-03-26 for system, method, and computer program product for data mining and automatically generating hypotheses from data repositories.
Invention is credited to Anthony Prestigiacomo, Vijay V. Raghavan, Ying Xie.
Application Number | 20090083208 12/210253 |
Family ID | 38510270 |
Filed Date | 2008-09-15 |
United States Patent Application | 20090083208 |
Kind Code | A1 |
Raghavan; Vijay V. ; et al. | March 26, 2009 |
SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR DATA MINING AND
AUTOMATICALLY GENERATING HYPOTHESES FROM DATA REPOSITORIES
Abstract
Various embodiments of the present invention provide systems,
methods, and computer programs for generating a hypothesis.
Specifically, some method embodiments include steps for accessing a
system for extracting relationships and determining a relationship
rule defining a relationship among a plurality of phrases and a
plurality of concepts stored in the system for extracting
relationships. Such embodiments further provide steps for parsing a
plurality of documents in a data repository according to the
relationship rule and generating a hypothesis comprising a
previously unknown combination of phrases and concepts being at
least partially determined from the parsed plurality of documents.
Various embodiments also provide a step for presenting the
hypothesis to a user so as to indicate the previously unknown
combination.
Inventors: |
Raghavan; Vijay V. (Lafayette, LA); Xie; Ying (Kennesaw, GA);
Prestigiacomo; Anthony (Baton Rouge, LA) |
Correspondence
Address: |
ALSTON & BIRD LLP
BANK OF AMERICA PLAZA, 101 SOUTH TRYON STREET, SUITE 4000
CHARLOTTE
NC
28280-4000
US
|
Family ID: |
38510270 |
Appl. No.: |
12/210253 |
Filed: |
September 15, 2008 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
PCT/US2007/063983 | Mar 14, 2007 | |
12210253 | | |
60782935 | Mar 15, 2006 | |
Current U.S.
Class: |
706/47 |
Current CPC
Class: |
G06F 2216/03 20130101;
G16H 50/70 20180101; Y02A 90/10 20180101; G06F 16/30 20190101; G16H
70/60 20180101; G06F 16/35 20190101; G06N 5/02 20130101 |
Class at
Publication: |
706/47 |
International
Class: |
G06N 5/02 20060101
G06N005/02 |
Claims
1. A method for generating a hypothesis, the method comprising:
accessing a system for extracting relationships, the system for
extracting relationships comprising a plurality of phrases and a
plurality of concepts; determining a relationship rule defining a
relationship among at least a portion of the plurality of phrases
and at least a portion of the plurality of concepts; parsing a
plurality of documents in a data repository according to the
relationship rule, the plurality of documents each comprising at
least a portion of one of the plurality of phrases and the
plurality of concepts; generating a hypothesis comprising a
previously unknown combination, the previously unknown combination
including one of at least one of the plurality of phrases and at
least one of the plurality of concepts, the previously unknown
combination being at least partially determined from the parsed
plurality of documents; and presenting the hypothesis so as to
indicate the previously unknown combination.
2. A method according to claim 1, wherein the relationship rule is
selected from the group consisting of: an assignment of at least
one of the plurality of phrases to at least one of the plurality of
concepts; an assignment of at least one of the plurality of phrases
to a relationship identifier, the relationship identifier linking a
first one of the plurality of concepts to a second one of the
plurality of concepts; an assignment of at least one of the
plurality of concepts to a semantic category; an arrangement of at
least a portion of the plurality of concepts in a hierarchical
relationship, wherein a first one of the portion of concepts
comprises a child concept and a second one of the portion of
concepts comprises a parent concept; and combinations thereof.
3. A method according to claim 1, wherein at least a portion of the
plurality of documents comprises at least one of a first concept, a
second concept, and a third concept, and wherein the parsing step
further comprises: detecting a first relationship between the first
and second concepts; detecting a second relationship between the
second and third concepts; detecting a third relationship between
the first and third concepts; and determining a potential chain
relationship among the first, second, and third concepts at least
partially from the detected first, second, and third relationships;
and wherein the generating step further comprises generating a
chain hypothesis comprising the previously unknown combination of
the first, second, and third concepts.
4. A method according to claim 1, wherein at least a portion of the
plurality of documents comprises at least one of a first concept, a
second concept, and a plurality of linking concepts, and wherein
the parsing step further comprises: detecting a first relationship
between the first concept and a first portion of the plurality of
linking concepts; detecting a second relationship between the
second concept and a second portion of the plurality of linking
concepts; and determining a potential substitution relationship
between the first concept and the second concept at least partially
from the detected first and second relationships and a number of
overlapping concepts present in both the first portion and the
second portion of the plurality of linking concepts; and wherein
the generating step further comprises generating a substitution
hypothesis comprising the previously unknown combination of at
least one of the first and second concepts with a portion of the
plurality of linking concepts not present in the number of
overlapping concepts.
5. A method according to claim 4, wherein the parsing step further
comprises determining a strength of the potential substitution
relationship between the first and second concepts based at least
in part on the number of overlapping concepts present in both the
first portion and the second portion of the plurality of linking
concepts.
6. A method according to claim 1, wherein at least a portion of the
plurality of documents comprises at least one of a first concept, a
second concept, and a third concept, and wherein the parsing step
further comprises: detecting a first relationship between the first
concept and the second concept; detecting a second relationship
between the second concept and the third concept; and determining a
potential pairwise relationship between the first concept and the
third concept at least partially from the detected first and second
relationships; and wherein the generating step further comprises
generating a pairwise hypothesis comprising the previously unknown
combination of the first and third concepts.
7. A method according to claim 6, wherein the parsing step further
comprises assessing a strength of the potential relationship
between the first and third concepts at least partially from a
known secondary relationship between the first and third
concepts.
8. A method according to claim 7, wherein the known secondary
relationship comprises a common semantic category including both
the first and third concepts.
9. A method according to claim 8, wherein the relationship rule
comprises the common semantic category.
10. A method according to claim 1, further comprising: identifying
a portion of the plurality of documents in the data repository
associated with a user; creating a user profile based at least in
part on the identified documents, the user profile being indicative
of a user information need; and modifying the hypothesis in
response to the user profile such that the modified hypothesis at
least partially corresponds to the user information need.
11. A method according to claim 10, wherein the user profile
comprises at least one semantic category and wherein the method
further comprises filtering the presented hypothesis such that the
previously unknown combination includes only at least one phrase
and at least one concept corresponding substantially to the at
least one semantic category.
12. A method according to claim 1, wherein presenting the
hypothesis comprises presenting a display to a user comprising a
visual representation of the previously unknown combination
including one of at least one of the plurality of phrases and at
least one of the plurality of concepts.
13. A method according to claim 12, wherein the visual
representation comprises an interactive icon configured to be
selectable by the user, the interactive icon being further
configured to modify the display when selected by the user.
14. A method according to claim 1, wherein the system for
extracting relationships is selected from the group consisting of:
a vocabulary database corresponding to a selected subject area; a
predetermined lexicon; a semantic network; a metathesaurus; and
combinations thereof.
15. A method according to claim 1, wherein the data repository is
selected from the group consisting of: a biomedical literature
database; a medical records database; a chemical literature
database; a computer science literature database; a physics
literature database; a legal literature database; a psychology
literature database; a social science literature database; a news
periodical database; a business journal database; and combinations
thereof.
16. A method according to claim 1, further comprising storing the
determined relationship rule for later or repeated use in the
subsequent parsing step.
17. A method according to claim 1, further comprising verifying the
hypothesis using at least one independent resource.
18. A computer program product for generating a hypothesis based on
a plurality of documents in a data repository in a manner that
reduces the burden on the data repository, said computer program
product comprising a computer-readable storage medium having
computer-readable program code portions stored therein, the
computer-readable program code portions comprising: a first set of
computer instructions for accessing a system for extracting
relationships, the system for extracting relationships comprising a
plurality of phrases and a plurality of concepts; a second set of
computer instructions for determining a relationship rule defining
a relationship among at least a portion of the plurality of phrases
and at least a portion of the plurality of concepts; a third set of
computer instructions for parsing the plurality of documents in the
data repository according to the relationship rule, the plurality
of documents each comprising at least a portion of one of the
plurality of phrases and the plurality of concepts; a fourth set of
computer instructions for generating a hypothesis comprising a
previously unknown combination, the previously unknown combination
including one of at least one of the plurality of phrases and at
least one of the plurality of concepts, the previously unknown
combination being at least partially determined from the parsed
plurality of documents; and a fifth set of computer instructions
for presenting the hypothesis so as to indicate the previously
unknown combination.
19. A computer program product according to claim 18, wherein the
relationship rule is selected from the group consisting of: an
assignment of at least one of the plurality of phrases to at least
one of the plurality of concepts; an assignment of at least one of
the plurality of phrases to a relationship identifier, the
relationship identifier linking a first one of the plurality of
concepts to a second one of the plurality of concepts; an
assignment of at least one of the plurality of concepts to a
semantic category; an arrangement of at least a portion of the
plurality of concepts in a hierarchical relationship, wherein a
first one of the portion of concepts comprises a child concept and
a second one of the portion of concepts comprises a parent concept;
and combinations thereof.
20. A computer program product according to claim 18, wherein at
least a portion of the plurality of documents comprises at least
one of a first concept, a second concept, and a third concept, and
wherein the third set of computer instructions for parsing further
comprises: a sixth set of computer instructions for detecting a
first relationship between the first and second concepts; a seventh
set of computer instructions for detecting a second relationship
between the second and third concepts; an eighth set of computer
instructions for detecting a third relationship between the first
and third concepts; and a ninth set of computer instructions for
determining a potential chain relationship among the first, second,
and third concepts at least partially from the detected first,
second, and third relationships; and wherein the fourth set of
computer instructions for generating further comprises a tenth set
of computer instructions for generating a chain hypothesis
comprising the previously unknown combination of the first, second,
and third concepts.
21. A computer program product according to claim 18, wherein at
least a portion of the plurality of documents comprises at least
one of a first concept, a second concept, and a plurality of
linking concepts, and wherein the third set of computer
instructions for parsing further comprises: an eleventh set of
computer instructions for detecting a first relationship between
the first concept and a first portion of the plurality of linking
concepts; a twelfth set of computer instructions for detecting a
second relationship between the second concept and a second portion
of the plurality of linking concepts; and a thirteenth set of
computer instructions for determining a potential substitution
relationship between the first concept and the second concept at
least partially from the detected first and second relationships
and a number of overlapping concepts present in both the first
portion and the second portion of the plurality of linking
concepts; and wherein the fourth set of computer instructions for
generating further comprises a fourteenth set of computer
instructions for generating a substitution hypothesis comprising
the previously unknown combination of at least one of the first and
second concepts with a portion of the plurality of linking concepts
not present in the number of overlapping concepts.
22. A computer program product according to claim 21, wherein the
third set of computer instructions for parsing further comprises a
fifteenth set of computer instructions for determining a strength
of the potential substitution relationship between the first and
second concepts based at least in part on the number of overlapping
concepts present in both the first portion and the second portion of
the plurality of linking concepts.
23. A computer program product according to claim 18, wherein at
least a portion of the plurality of documents comprises at least
one of a first concept, a second concept, and a third concept, and
wherein the third set of computer instructions for parsing further
comprises: a sixteenth set of computer instructions for detecting a
first relationship between the first concept and the second
concept; a seventeenth set of computer instructions for detecting a
second relationship between the second concept and the third
concept; and an eighteenth set of computer instructions for
determining a potential pairwise relationship between the first
concept and the third concept at least partially from the detected
first and second relationships; and wherein the fourth set of
computer instructions for generating further comprises a nineteenth
set of computer instructions for generating a pairwise hypothesis
comprising the previously unknown combination of the first and
third concepts.
24. A computer program product according to claim 23, wherein the
third set of computer instructions for parsing further comprises a
twentieth set of computer instructions for assessing a strength of
the potential relationship between the first and third concepts at
least partially from a known secondary relationship between the
first and third concepts.
25. A computer program product according to claim 24, wherein the
known secondary relationship comprises a common semantic category
including both the first and third concepts.
26. A computer program product according to claim 25, wherein the
relationship rule comprises the common semantic category.
27. A computer program product according to claim 18, further
comprising: a twenty-first set of computer instructions for
identifying a portion of the plurality of documents in the data
repository associated with a user; a twenty-second set of computer
instructions for creating a user profile based at least in part on
the identified documents, the user profile being indicative of a
user information need; and a twenty-third set of computer
instructions for modifying the hypothesis in response to the user
profile such that the modified hypothesis at least partially
corresponds to the user information need.
28. A computer program product according to claim 27, wherein the
user profile comprises at least one semantic category, the computer
program product further comprising a twenty-fourth set of computer
instructions for filtering the presented hypothesis such that the
previously unknown combination includes only at least one phrase
and at least one concept corresponding substantially to the at
least one semantic category.
29. A computer program product according to claim 18, wherein the fifth
set of computer instructions for presenting the hypothesis
comprises a twenty-fifth set of computer instructions for
presenting a display to a user comprising a visual representation
of the previously unknown combination including one of at least one
of the plurality of phrases and at least one of the plurality of
concepts.
30. A computer program product according to claim 29, wherein the
visual representation comprises an interactive icon configured to
be selectable by the user, the interactive icon being further
configured to modify the display when selected by the user.
31. A computer program product according to claim 18, wherein the
system for extracting relationships is selected from the group
consisting of: a vocabulary database corresponding to a selected
subject area; a predetermined lexicon; a semantic network; a
semantic database; a metathesaurus; and combinations thereof.
32. A computer program product according to claim 18, wherein the
data repository is selected from the group consisting of: a
biomedical literature database; a medical records database; a
chemical literature database; a computer science literature
database; a physics literature database; a legal literature
database; a psychology literature database; a social science
literature database; a news periodical database; a business journal
database; and combinations thereof.
33. A computer program product according to claim 18, further
comprising a twenty-sixth set of computer instructions for storing
the determined relationship rule for later or repeated use in the
subsequent parsing step.
34. A computer program product according to claim 18, further
comprising a twenty-seventh set of computer instructions for
verifying the hypothesis using at least one independent
resource.
35. A system for mining information from a data repository
comprising a plurality of documents to produce a hypothesis, the
system comprising: a system for extracting relationships comprising
a plurality of phrases and a plurality of concepts; a host
computing element in communication with said system for extracting
relationships for accessing said system for extracting
relationships; wherein said host computing element determines a
relationship rule defining a relationship among at least a portion
of the plurality of phrases and at least a portion of the plurality
of concepts; wherein said host computing element parses the
plurality of documents in a data repository according to the
relationship rule, the plurality of documents each comprising at
least a portion of one of the plurality of phrases and the
plurality of concepts; and wherein said host computing element
generates the hypothesis comprising a previously unknown
combination, the previously unknown combination including one of at
least one of the plurality of phrases and at least one of the
plurality of concepts, the previously unknown combination being at
least partially determined from the parsed plurality of documents;
and a user interface in communication with said host computing
element, said user interface configured for presenting the
hypothesis so as to indicate the previously unknown
combination.
36. A system according to claim 35, wherein said host computing
element determines a relationship rule selected from the group
consisting of: an assignment of at least one of the plurality of
phrases to at least one of the plurality of concepts; an assignment
of at least one of the plurality of phrases to a relationship
identifier, the relationship identifier linking a first one of the
plurality of concepts to a second one of the plurality of concepts;
an assignment of at least one of the plurality of concepts to a
semantic category; an arrangement of at least a portion of the
plurality of concepts in a hierarchical relationship, wherein a
first one of the portion of concepts comprises a child concept and
a second one of the portion of concepts comprises a parent concept;
and combinations thereof.
37. A system according to claim 35, wherein said host computing
element identifies a portion of the plurality of documents in the
data repository associated with a user; wherein said host computing
element creates a user profile based at least in part on the
identified documents, the user profile being indicative of a user
information need; and wherein said host computing element modifies
the hypothesis in response to the user profile such that the
modified hypothesis at least partially corresponds to the user
information need.
38. A system according to claim 37, wherein the user profile
comprises at least one semantic category and wherein said host
computing element filters the presented hypothesis such that the
previously unknown combination includes only at least one phrase
and at least one concept corresponding substantially to the at
least one semantic category.
39. A system according to claim 35, wherein said user interface
presents the hypothesis as a display to a user comprising a visual
representation of the previously unknown combination including one
of at least one of the plurality of phrases and at least one of the
plurality of concepts.
40. A system according to claim 39, wherein said user interface
presents the visual representation comprising an interactive icon
configured to be selectable by the user, the interactive icon being
further configured to modify the display when selected by the
user.
41. A system according to claim 35, wherein said system for
extracting relationships is selected from the group consisting of:
a vocabulary database corresponding to a selected subject area; a
predetermined lexicon; a semantic network; a semantic database; a
metathesaurus; and combinations thereof.
42. A system according to claim 35, wherein said host computing
element is in communication with a data repository selected from
the group consisting of: a biomedical literature database; a
medical records database; a chemical literature database; a
computer science literature database; a physics literature
database; a legal literature database; a psychology literature
database; a social science literature database; a news periodical
database; a business journal database; and combinations
thereof.
43. A system according to claim 35, further comprising a memory
device in communication with said host computing element, said
memory device configured for storing the determined relationship
rule for later or repeated use in the subsequent parsing step.
44. A system according to claim 35, further comprising an
independent resource in communication with said host computing
device, said independent resource configured for verifying the
generated hypothesis.
Description
CROSS-REFERENCE
[0001] This application is a continuation of co-pending
International Application No. PCT/US2007/063983, filed Mar. 14,
2007, the contents of which are incorporated by reference in their
entirety, and which claims priority to U.S. Patent Application Ser.
No. 60/782,935, filed Mar. 15, 2006.
FIELD OF THE INVENTION
[0002] Various embodiments of the present invention relate
generally to the field of query generation, information retrieval,
and data mining with respect to data repositories (such as
literature and/or record databases, for example).
BACKGROUND
[0003] The large volume of scientific literature provides a goldmine
for the extraction of useful knowledge and information in support
of practical decision-making as well as academic research. However,
many of the currently available search engines querying various
data repositories offer very limited searching, indexing, and
categorizing functionality and fall short of the capability to
fully explore and utilize such data resources. As an example,
Medical Literature Analysis and Retrieval System Online ("MEDLINE"),
the U.S. National Library of Medicine's (NLM) premier
bibliographic database, contains citation information for and
references to approximately 13 million journal articles in the life
sciences, with a concentration on biomedicine. Each year the
exponentially increasing amount of biomedical literature in the
MEDLINE database poses tremendous challenges to the ultimate users
of that database, typically scientific researchers. Currently a
small number of academic papers have proposed and discussed the
idea of generating hypotheses from biomedical literature in
databases like MEDLINE in a systematic way so as to facilitate
biomedical researchers' discovery and even possibly suggest
potential research directions. However, existing work in this area
has focused only on generating one type of hypothesis, namely, "a
potential pairwise relation", which does not fully represent most
patterns and rules embedded in the document corpus.
[0004] Furthermore, existing querying and/or discovery processes as
discussed in these papers are usually conducted in a "retrieval
mode" which necessarily implies that users must know what knowledge
and information they need so that they can provide at least one
concept of their search interest to initiate the discovery process.
In many cases, however, users may not know how to express their
knowledge and information needs or even may not realize and/or
appreciate an existing information need. For instance, a given
biomedical researcher may never be independently motivated to
research a relation between a certain gene and a certain disease
that as a matter of fact may be predicted from existing
relationships within several recent publications. In addition,
different types of users always have different knowledge and
information needs based on their respective backgrounds and/or
profiles, even if they issue the same query to the same database.
For example, for a query of "Diabetes" to MEDLINE, a biomedical
researcher may want to acquire some potential research directions
for this disease, a medical practitioner may wish to keep current
on state-of-the-art diagnosis progress, and a patient may want to
ensure that the treatment plan prescribed by her physician is
reasonable in light of current treatment options. In summary, each
user brings different levels of expertise and different interests
to a given query of a given database. Currently available query
systems do not address this issue.
[0005] In light of the above, a need exists for an improved method,
system and computer program product for automatically generating
different types of hypotheses from data repositories. There is a
further need for automatic analysis of a user's scope of interest
and effective delivery of hypotheses, information and knowledge
that match the user's interests and information needs.
BRIEF SUMMARY
[0006] The needs outlined above are met by the present invention
which, in various embodiments, provides systems and methods that
overcome many of the technical problems discussed above, as well as
other technical problems, with regard to the generation and display
of potential hypotheses based on written works selected from a
database. Specifically in one embodiment, the invention provides a
method and computer program product for generating a hypothesis. In
some embodiments, the method and/or computer program product may
comprise accessing a system for extracting relationships, wherein
the system for extracting relationships comprises a plurality of
phrases and a plurality of concepts. In various embodiments, the
system for extracting relationships may include, but is not limited
to: a vocabulary database corresponding to a selected subject area;
a predetermined lexicon; a semantic network; a semantic database; a
metathesaurus; and combinations of such systems.
[0007] The method and/or computer program product also comprises
determining a relationship rule defining a relationship among at
least a portion of the plurality of phrases and at least a portion
of the plurality of concepts. In some embodiments, the determined
relationship rule may include, but is not limited to: an assignment
of at least one of the plurality of phrases to at least one of the
plurality of concepts; an assignment of at least one of the
plurality of phrases to a relationship identifier, the relationship
identifier linking a first one of the plurality of concepts to a
second one of the plurality of concepts; an assignment of at least
one of the plurality of concepts to a semantic category; an
arrangement of at least a portion of the plurality of concepts in a
hierarchical relationship, wherein a first one of the portion of
concepts comprises a child concept and a second one of the portion
of concepts comprises a parent concept. Some embodiments may
further comprise a step for storing the determined relationship
rule for later or repeated use in a subsequent parsing step as
described further herein.
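By way of illustration only, the four kinds of relationship rule listed above could be held in a small lookup structure such as the following sketch. The class name, field names, and sample entries are hypothetical placeholders chosen for this example and do not appear in the application; an actual embodiment may store the rules quite differently.

```python
from dataclasses import dataclass, field


@dataclass
class RelationshipRules:
    """Hypothetical container for the four rule types described above."""
    # phrase -> concept (keys are stored in lower case)
    phrase_to_concept: dict = field(default_factory=dict)
    # phrase -> relationship identifier linking two concepts
    phrase_to_relation: dict = field(default_factory=dict)
    # concept -> semantic category
    concept_to_category: dict = field(default_factory=dict)
    # child concept -> parent concept (hierarchical relationship)
    child_to_parent: dict = field(default_factory=dict)

    def concept_for(self, phrase):
        """Return the concept assigned to a phrase, or None if unassigned."""
        return self.phrase_to_concept.get(phrase.lower())

    def ancestors(self, concept):
        """Walk the hierarchy upward and return the parents, nearest first."""
        chain = []
        seen = {concept}
        while concept in self.child_to_parent:
            concept = self.child_to_parent[concept]
            if concept in seen:  # guard against a cyclic hierarchy
                break
            seen.add(concept)
            chain.append(concept)
        return chain


# Placeholder entries, invented purely for illustration.
rules = RelationshipRules(
    phrase_to_concept={"drug x": "DrugX", "marker y": "MarkerY"},
    phrase_to_relation={"lowers": "DECREASES"},
    concept_to_category={"DrugX": "Chemical", "MarkerY": "Finding"},
    child_to_parent={"DrugX": "DrugClassY", "DrugClassY": "Chemical"},
)
```

Keeping the rules in one object makes it straightforward to store them once and reuse them in a later parsing step, as the paragraph above contemplates.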
[0008] The method and/or computer program product may also comprise
parsing a plurality of documents in a data repository according to
the relationship rule, wherein the plurality of documents each
comprise at least a portion of one of the plurality of phrases and
the plurality of concepts. In various embodiments, the data
repository may include, but is not limited to: a biomedical
literature database; a medical records database; a chemical
literature database; a computer science literature database; a
physics literature database; a legal literature database; a
psychology literature database; a social science literature
database; a news periodical database; a business journal database;
and combinations of such data repositories.
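As a minimal sketch of the parsing step, assuming the relationship rule is a simple phrase-to-concept assignment, each document could be reduced to the set of concepts whose phrases it mentions. The function name and the word-boundary matching strategy are assumptions made for this example; the application does not prescribe a particular matching technique.

```python
import re


def extract_concepts(text, phrase_to_concept):
    """Return the set of concepts whose assigned phrases occur in the text.

    phrase_to_concept is a hypothetical mapping of phrase -> concept.
    Matching is case-insensitive and anchored on word boundaries.
    """
    found = set()
    lowered = text.lower()
    for phrase, concept in phrase_to_concept.items():
        pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
        if re.search(pattern, lowered):
            found.add(concept)
    return found


# Placeholder rule and document, invented purely for illustration.
example_rule = {"drug x": "DrugX", "marker y": "MarkerY", "gene z": "GeneZ"}
example_concepts = extract_concepts("Drug X lowers marker Y.", example_rule)
```

Running this over every document in a repository yields one concept set per document, a convenient input for relationship detection.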
[0009] The method and/or computer program product embodiments may
also comprise steps for generating a hypothesis comprising a
previously unknown combination, wherein the previously unknown
combination includes one of at least one of the plurality of
phrases and at least one of the plurality of concepts. The
previously unknown combination may be at least partially determined
from the parsed plurality of documents.
[0010] In some embodiments, at least a portion of the plurality of
documents may comprise at least one of a first concept, a second
concept, and a third concept. According to some such embodiments,
the parsing step described herein may further comprise: detecting a
first relationship between the first and second concepts; detecting
a second relationship between the second and third concepts;
detecting a third relationship between the first and third
concepts; and determining a potential chain relationship among the
first, second, and third concepts at least partially from the
detected first, second, and third relationships. Furthermore,
according to some such embodiments, the step for generating the
hypothesis may also comprise generating a chain hypothesis
comprising the previously unknown combination of the first, second,
and third concepts.
[0011] In some additional embodiments, at least a portion of the
plurality of documents may comprise at least one of a first
concept, a second concept, and a plurality of linking concepts.
According to some such embodiments, the parsing step may further
comprise: detecting a first relationship between the first concept
and a first portion of the plurality of linking concepts; detecting
a second relationship between the second concept and a second
portion of the plurality of linking concepts; and determining a
potential substitution relationship between the first concept and
the second concept at least partially from the detected first and
second relationships and a number of overlapping concepts present
in both the first portion and the second portion of the plurality
of linking concepts. Furthermore, in some such embodiments, the
step for generating the hypothesis may further comprise generating
a substitution hypothesis comprising the previously unknown
combination of at least one of the first and second concepts with a
portion of the plurality of linking concepts not present in the
number of overlapping concepts. Furthermore, in some such
embodiments, the parsing step may further comprise determining a
strength of the potential substitution relationship between the
first and second concepts based at least in part on the number of
concepts present in both the first portion and the second portion of
the plurality of linking concepts.
[0012] Furthermore, in some other method and/or computer program
embodiments, at least a portion of the plurality of documents
comprises at least one of a first concept, a second concept, and a
third concept. In some such embodiments, the parsing step may
further comprise: detecting a first relationship between the first
concept and the second concept; detecting a second relationship
between the second concept and the third concept; and determining a
potential pairwise relationship between the first concept and the
third concept at least partially from the detected first and second
relationships. In some such embodiments, the step for generating
the hypothesis may further comprise generating a pairwise
hypothesis comprising the previously unknown combination of the
first and third concepts. Furthermore, in some such embodiments,
the parsing step may further comprise assessing a strength of the
potential relationship between the first and third concepts at
least partially from a known secondary relationship between the
first and third concepts. In various embodiments, the known
secondary relationship may comprise a common semantic category
including both the first and third concepts. Furthermore, in some
such embodiments, the relationship rule generated in the
determining step may comprise the common semantic category used to
assess the strength of the potential pairwise relationship between
the first and third concepts.
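By way of non-limiting illustration only, the pairwise case
summarized above may be sketched as follows. The sketch assumes that
the detected relationships are already available as a collection of
concept pairs and that semantic categories are held in a dictionary;
the function name, parameter names, and data layout are hypothetical
and are not taken from the described embodiments.

```python
from itertools import combinations


def pairwise_hypotheses(known_pairs, semantic_category):
    """Propose (first, third) pairs linked through a shared second concept.

    known_pairs: iterable of 2-tuples of concepts with a detected relationship.
    semantic_category: dict mapping a concept to its semantic category.
    Both inputs are illustrative assumptions.
    """
    known = {frozenset(pair) for pair in known_pairs}

    # Build an undirected neighbor map from the detected relationships.
    neighbors = {}
    for a, b in known_pairs:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    results = {}
    for second, linked in neighbors.items():
        for first, third in combinations(sorted(linked), 2):
            if frozenset((first, third)) in known:
                continue  # already related, so not a previously unknown pair
            cat_first = semantic_category.get(first)
            same = cat_first is not None and cat_first == semantic_category.get(third)
            entry = results.setdefault(
                (first, third), {"links": set(), "same_category": same}
            )
            entry["links"].add(second)
    return results
```

Here the `same_category` flag stands in for the known secondary
relationship used to assess strength; a fuller treatment might weight
the result by the number of linking concepts as well.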
[0013] Various method and/or computer program products may also
comprise presenting the hypothesis so as to indicate the previously
unknown combination. In some such embodiments, the step for
presenting the hypothesis may comprise presenting a display to a
user comprising a visual representation of the previously unknown
combination including one of at least one of the plurality of
phrases and at least one of the plurality of concepts. Furthermore,
according to some embodiments, the visual representation presented
in the display comprises an interactive icon configured to be
selectable by the user. According to such embodiments, the
interactive icon may be further configured to modify the display
when selected by the user.
[0014] Various method and/or computer program product embodiments
of the present invention may also comprise various steps for
optimizing the generated hypothesis to meet the information needs
of a particular user. For example, some embodiments may comprise
steps for identifying a portion of the plurality of documents in
the data repository associated with the user, and creating a user
profile based at least in part on the identified documents. The
created user profile may be indicative of a user information need.
Some such embodiments may further comprise a step for modifying the
hypothesis in response to the user profile such that the modified
hypothesis at least partially corresponds to the user information
need. In some such embodiments, the created user profile may
comprise at least one semantic category and the method and/or
computer program product may further comprise a step for filtering
the presented hypothesis such that the previously unknown
combination includes only at least one phrase and at least one
concept corresponding substantially to the at least one semantic
category present in the user profile.
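By way of non-limiting illustration only, the profile creation and
filtering steps summarized above may be sketched as follows. The
sketch assumes that each document associated with the user has
already been reduced to a set of concepts and that each hypothesis is
a tuple of concepts; all names are hypothetical.

```python
def build_profile(user_documents, concept_category):
    """Collect the semantic categories of concepts found in the user's documents.

    user_documents: iterable of sets of concepts (one set per document).
    concept_category: dict mapping a concept to its semantic category.
    """
    profile = set()
    for concepts in user_documents:
        for concept in concepts:
            category = concept_category.get(concept)
            if category is not None:
                profile.add(category)
    return profile


def filter_hypotheses(hypotheses, profile, concept_category):
    """Keep only hypotheses whose concepts all fall within the profile categories."""
    return [
        hypothesis
        for hypothesis in hypotheses
        if all(concept_category.get(concept) in profile for concept in hypothesis)
    ]
```

A stricter or looser matching policy (for example, requiring only one
concept to match) could be substituted; the all-concepts policy shown
is an assumption made for the sketch.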
[0015] Various embodiments of the present invention may also
provide systems for mining information from a data repository
comprising a plurality of documents to produce a hypothesis. The
data repository may include, but is not limited to: a biomedical
literature database; a medical records database; a chemical
literature database; a computer science literature database; a
physics literature database; a legal literature database; a
psychology literature database; a social science literature
database; a news periodical database; a business journal database;
and a combination of such databases.
[0016] The system comprises a system for extracting relationships
comprising a plurality of phrases and a plurality of concepts. The
system for extracting relationships may include, but is not limited
to: a vocabulary database corresponding to a selected subject area;
a predetermined lexicon; a semantic network; a metathesaurus; and
combinations of such systems for extracting relationships. Various
system embodiments further comprise a host computing element in
communication with the system for extracting relationships for
accessing the system. The host computing element is configured for
determining a relationship rule defining a relationship among at
least a portion of the plurality of phrases and at least a portion
of the plurality of concepts. The host computing element may be
configured for determining a relationship rule that includes, but
is not limited to: an assignment of at least one of the plurality
of phrases to at least one of the plurality of concepts; an
assignment of at least one of the plurality of phrases to a
relationship identifier, wherein the relationship identifier links
a first one of the plurality of concepts to a second one of the
plurality of concepts; an assignment of at least one of the
plurality of concepts to a semantic category; an arrangement of at
least a portion of the plurality of concepts in a hierarchical
relationship, wherein a first one of the portion of concepts
comprises a child concept and a second one of the portion of
concepts comprises a parent concept; and/or a combination of such
relationship rules. Some system embodiments may further comprise a
memory device in communication with the host computing element,
wherein the memory device is configured for storing the determined
relationship rule for later or repeated use in a subsequent parsing
step.
[0017] Furthermore, the host computing element may also be
configured for parsing the plurality of documents in a literature
database according to the relationship rule, wherein the plurality
of documents each comprises at least a portion of one of the
plurality of phrases and the plurality of concepts. Furthermore,
the host computing element is configured for generating the
hypothesis comprising a previously unknown combination including
one of at least one of the plurality of phrases and at least one of
the plurality of concepts. The previously unknown combination
generated by the host computing element may be at least partially
determined from the parsed plurality of documents. Furthermore,
some system embodiments may also comprise a user interface in
communication with the host computing element, wherein the user
interface is configured for presenting the hypothesis so as to
indicate the previously unknown combination. In some system
embodiments, the user interface may present the hypothesis as a
display to a user comprising a visual representation of the
previously unknown combination including one of at least one of the
plurality of phrases and at least one of the plurality of concepts.
In some such system embodiments, the user interface may present the
visual representation as an interactive icon configured to be
selectable by the user. The interactive icon may be further
configured to modify the display when selected by the user.
[0018] In some system embodiments, the host computing element may
also be configured for customizing and/or optimizing the presented
hypothesis for a particular user. For example, in some embodiments,
the host computing element may identify a portion of the plurality
of documents in the data repository associated with a user and
thereby create a user profile based at least in part on the
identified documents. The user profile created by the host
computing element in such embodiments may be indicative of a user
information need. The created user profile may also comprise at
least one semantic category and the host computing element may
therefore filter the presented hypothesis such that the previously
unknown combination includes only at least one phrase and at least
one concept corresponding substantially to the at least one
semantic category. Furthermore, in some system embodiments, the
host computing element may modify the hypothesis in response to the
user profile such that the modified hypothesis at least partially
corresponds to the user information need.
[0019] Thus the systems, methods, and computer program products for
generating and displaying potential hypotheses based on written
works selected from a database, as described in the embodiments of
the present invention, provide many advantages that may include,
but are not limited to: providing a conceptual research system
configured for mining raw materials from the large amounts of
literature in a given data repository to generate potential
hypotheses for future directed research; providing a research
system and method capable of uncovering previously unknown and/or
unappreciated combinations of concepts and/or phrases in a data
repository; providing a conceptual research system capable of
defining a user profile that is indicative of a particular user's
information needs and modifying a proposed conceptual research
hypothesis based at least in part on the defined user profile; and
providing a conceptual research system that is configurable for
mining usable data (and generating proposed hypotheses) in a
variety of different types of data repositories.
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
[0020] In the description below, reference is made to the
accompanying drawings, which are not necessarily drawn to scale,
and wherein:
[0021] FIG. 1 provides a non-limiting schematic overview of the
structure and components of a system and method for automatically
generating different types of hypotheses from biomedical literature
according to one embodiment of the present invention;
[0022] FIGS. 2A-D illustrate non-limiting data structures and
relationship rules resulting from a generating step, according to
one embodiment of the present invention;
[0023] FIGS. 3A-B illustrate non-limiting data structures and
previously unknown combinations resulting from a document parsing
step, according to one embodiment of the present invention;
[0024] FIGS. 4A-C illustrate non-limiting schematics of algorithms
for generating three different types of hypotheses, according to
one embodiment of the present invention;
[0025] FIG. 5 illustrates a process of providing personalized
discovery results to match different users' information and
knowledge needs that may be implemented according to one embodiment
of the present invention;
[0026] FIGS. 6A-C illustrate non-limiting schematics of a display
comprising a visual representation of the previously unknown
combination including one of at least one of the plurality of
phrases and at least one of the plurality of concepts, according to
one embodiment of the present invention;
[0027] FIGS. 7A-B illustrate a non-limiting schematic of a host
computing element and system useable for implementing various
embodiments of the present invention; and
[0028] FIG. 8 illustrates a non-limiting schematic of a
hypothesis-generating system, according to one embodiment of the
present invention, including a hypothesis verification module in
communication therewith.
DETAILED DESCRIPTION OF THE INVENTION
[0029] The present inventions will now be described with reference
to the accompanying drawings, in which some, but not all
embodiments of the inventions are shown. Indeed, these inventions
may be embodied in many different forms and should not be construed
as limited to the embodiments set forth herein. Rather, these
embodiments are provided so that this disclosure will satisfy
applicable legal requirements. Like numbers refer to like elements
throughout.
[0030] As shown in FIGS. 1-5, 6A-C, various embodiments of the
present invention provide an improved system, method, and computer
program product for automatically and/or systematically generating
different types of hypotheses from data repositories 20.
Specifically, the embodiments as presented in the non-limiting
figures are configured for generating three types of hypotheses:
(1) potential pairwise relations (see FIG. 1, element 136, for
example); (2) potential chain relations (see FIG. 1, element 132,
for example); and (3) potential substitution relations (see FIG. 1,
element 134, for example). Importantly, the hypotheses generated by
the various embodiments described herein include previously unknown
combinations of concepts and/or phrases that may be at least
partially determined from a parsed plurality of documents within a
data repository 20.
[0031] Many of the exemplary embodiments described herein relate
generally to the generation of hypotheses related to biomedical
literature and/or research such that the various embodiments
described herein may be capable of achieving the technical effect
of producing proposed hypotheses that may lead to breakthroughs in
the application of certain combinations of drugs to certain
diseases or disease states. It should be understood, however, that
the various embodiments described herein may be used to parse
and/or mine other types of data repositories 20 for potentially
groundbreaking research topics. For example, the various
embodiments herein may be configured for parsing and/or analyzing
documents found in data repositories 20 that may include, but are
not limited to: biomedical literature databases; medical records
databases; chemical literature databases; computer science
literature databases; physics literature databases; legal
literature databases; psychology literature databases; social
science literature databases; news periodical databases; business
journal databases; and combinations of such databases. The term
"document" as used herein may include, but is not limited to:
published journal articles; text strings (such as, for example, a
physician's comments in a medical record entry); file records (such
as a particular medical record); resumes and/or curriculum vitae; a
thesis; a numerical string of data; a patent document (including,
for example, issued patents, patent applications, and
publicly-available patent prosecution documents); online journal
articles; internet web pages; material safety data sheets;
pharmaceutical and/or chemical data sheets; advertisements;
reported court cases and/or administrative proceedings; news
articles; letters; and combinations of such materials.
[0032] It should be further understood that the generated
hypotheses may be implied by patterns embedded in the document
corpus of such data repositories such that appropriate relationship
rules (as described further herein) may be determined and
subsequently applied to the data repository in a substantially
automatic "mining mode" to generate hypotheses that may be
completely beyond the expectation of a system user.
[0033] In accordance with another embodiment, the present invention
analyzes various semantic relations among the concepts involved in
the identified hypotheses and provides visualization of these
relations in an intuitive way. Particular documents in support of
each of these relations may be identified to the system users for
their further research. In addition, specific search results can be
customized for particular researchers based on their specified or
potential interests. In operation, a given researcher's interests
are identified by automatically analyzing any prior publications or
papers related to this researcher. Furthermore, in some
embodiments, search results are verified using an independent
resource.
[0034] As shown in FIG. 1, various embodiments of the present
invention may provide a method for generating a hypothesis (such
as, for example, potential chain hypotheses 132, potential
substitution hypotheses 134, and/or potential pairwise hypotheses
136). The method may comprise, for example, step 110 for accessing
a system for extracting relationships 10 comprising a plurality of
phrases and a plurality of concepts and determining a relationship
rule (see elements 111, 112, 113, 114, for example) defining a
relationship among at least a portion of the plurality of phrases
and at least a portion of the plurality of concepts. The system for
extracting relationships 10 may include, but is not limited to: a
vocabulary database corresponding to a selected subject area; a
predetermined lexicon; a semantic network; a metathesaurus; and/or
combinations of such systems. For example, in some embodiments,
wherein the method is used to parse biomedical literature in search
of potential research hypotheses, the system for extracting
relationships 10 may comprise the Unified Medical Language System
(UMLS), which may further comprise component databases including,
but not limited to: the Metathesaurus®; the Semantic Network;
and/or the SPECIALIST lexicon. It should be understood that the
Metathesaurus® may comprise a large vocabulary database
containing information about biomedical and health-related
concepts, their various names, and the various relationships among
them. The Semantic Network provides a substantially consistent
categorization (i.e. a "Semantic Type") of all concepts represented
in the UMLS Metathesaurus® and defines a set of relationships
that may hold between the various semantic types. Furthermore, the
SPECIALIST lexicon is a general English lexicon comprising a
variety of biomedical terminology. The system for extracting
relationships 10 may also, in some embodiments, define a
hierarchical structure among the various phrases and/or concepts
included therein. For example, in embodiments wherein the system
for extracting relationships 10 comprises the UMLS, the system for
extracting relationships 10 may further comprise MeSH, which
provides a controlled vocabulary thesaurus configured for arranging
descriptors (terms and/or phrases, for example) in a hierarchical
structure. For example, MeSH may comprise descriptors organized in
a plurality of categories that may include, but are not limited to:
(A) anatomic terms; (B) organisms; (C) diseases; (D) drugs and/or
chemicals; and combinations of such descriptors. Each category of
descriptors in MeSH may be further subdivided into a variety of
subcategories.
[0035] Referring to FIG. 1, the relationship rule generated in step
110 may include, but is not limited to: an assignment of at least
one of the plurality of phrases to at least one of the plurality of
concepts (see element 112 and FIG. 2A); an assignment of at least
one of the plurality of phrases to a relationship identifier (see
element 114 and FIG. 2B, wherein the relationship identifier may
link a first one of the plurality of concepts to a second one of
the plurality of concepts); an assignment of at least one of the
plurality of concepts to a semantic category (see element 111 and
FIG. 2C, for example); an arrangement of at least a portion of the
plurality of concepts in a hierarchical relationship (see element
113 and FIG. 2D for example), wherein a first one of the portion of
concepts comprises a child concept and a second one of the portion
of concepts comprises a parent concept; and/or combinations of such
relationship rules.
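By way of non-limiting illustration only, the four kinds of
relationship rule listed above may be held in simple lookup
structures such as the following. The class name, field names, and
the `ancestors` helper are hypothetical; the element numbers in the
comments merely indicate which rule each field is intended to mirror.

```python
from dataclasses import dataclass, field


@dataclass
class RelationshipRules:
    """Illustrative container for the four rule types (names are assumptions)."""

    phrase_to_concept: dict = field(default_factory=dict)    # cf. element 112
    phrase_to_relation: dict = field(default_factory=dict)   # cf. element 114
    concept_to_category: dict = field(default_factory=dict)  # cf. element 111
    child_to_parent: dict = field(default_factory=dict)      # cf. element 113

    def ancestors(self, concept):
        """Return the parent concepts of `concept`, nearest first."""
        chain = []
        seen = {concept}
        while concept in self.child_to_parent:
            concept = self.child_to_parent[concept]
            if concept in seen:  # guard against a malformed, cyclic hierarchy
                break
            seen.add(concept)
            chain.append(concept)
        return chain
```

Keeping each rule type as a separate mapping makes it straightforward
to store the rules and reuse them in a later parsing pass.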
[0036] Referring to FIGS. 2A-2D, one or more of the relationship
rules generated in step 110 may, in some embodiments, be presented
and/or stored in a tabular data structure. The tables shown in FIGS.
2A-2D may be presented to a user in some embodiments, to supplement
the presentation of the hypothesis (see step 140 and FIGS. 6A-6C,
for example) so as to indicate the relationship rule or rules
underlying the previously unknown combination and corresponding
hypothesis. In other embodiments, the tabulated relationship rules
shown, for example, in FIGS. 2A-2D may serve as modular data
structures that may be stored in a memory device (see elements 722,
724 and 728 of FIG. 7B) for later or repeated use in a subsequent
parsing step 120. Thus, in some such embodiments, the stored
relationship rules (111, 112, 113, 114) may be maintained as a
pre-computed set of relationship rules that may be used to
efficiently parse (step 120) a plurality of documents stored in a
particular data repository 20 to which the relationship rules may
most likely apply in order to generate (step 130) and present (step
140) potential hypotheses to a user comprising previously unknown
combinations of phrases and concepts. For example, various methods
of the present invention may provide "conceptual research" services
that may provide access to the stored relationship rules 111, 112,
113, 114 (see also, FIGS. 2A-2D, for example) to a user or client
that may be in communication with a host computer 700 (see FIG. 7A,
for example) configured for performing the various method and/or
computer program steps outlined in FIG. 1. Such users may thus
define and access stored relationship rules that apply to various
systems for extracting relationships 10 and/or data repositories 20
that may be pertinent to their own research areas of interest. For
example, a biomedical researcher may "subscribe" to a relationship
rule service that may provide efficient document parsing 120 and
hypothesis generation 130 services for a biomedical data repository
(such as, for example, MEDLINE). Similarly, an attorney or law
professor might subscribe to a relationship rule database (stored
for example, in various memory devices 722, 724 and 728 of a host
computing element 700) that pertains more particularly to a data
repository 20 comprising case reporters and/or legal journals such
that the system and/or method embodiments of the present invention
may more efficiently parse 120 the documents stored therein to
generate 130 a proposed legal hypothesis gleaned from the semantic
relationships outlined in the stored relationship rules 111, 112,
113, 114 and as specifically applied to the appropriate data
repository 20.
[0037] As shown in FIG. 1, various method embodiments may further
comprise step 120 for parsing a plurality of documents in a data
repository 20 according to one or more of the generated
relationship rules (111, 112, 113, and 114). The plurality of
documents in the data repository 20 may each comprise at least a
portion of one of the plurality of phrases and the plurality of
concepts. As used herein, the term "parsing" refers generally to
the breaking down of various documents into component key phrases
and/or concepts that may be comparable to the phrases and/or
concepts present in a compatible system for extracting
relationships 10. For example, a system for extracting
relationships 10 such as the UMLS may be used as the raw material
for generating various relationship rules (111, 112, 113, and 114)
that may be used to parse a UMLS-compatible data repository (such
as MEDLINE or another compatible database of biomedical
literature).
[0038] The parsing step 120 may comprise performing various
quantitative and/or qualitative operations on the component key
phrases and/or concepts. As shown generally in FIG. 1, the parsing
step 120 may result in products that may include, but are not
limited to: concept-concept relationships 122 (generated, for
example, when a threshold number of documents within the data
repository 20 tie two or more concepts together); and
concept-document relationships 124 (which may map concepts and/or
phrases of interest to those documents in which they appear at a
frequency that exceeds a selected frequency). It should be understood that
threshold numbers and/or frequencies of concepts as described
herein may be selected by a user (using, for example, a host
computing element 700 as described further herein with respect to
FIGS. 7A and 7B). In other embodiments, threshold numbers and/or
frequencies of concepts as described herein may be pre-computed
and/or pre-assigned and stored in a memory device 722, 724, 728
associated with a host computing element 700 (as described further
herein with respect to FIGS. 7A and 7B).
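By way of non-limiting illustration only, the two products of the
parsing step described above may be computed along the following
lines. The sketch assumes that documents are plain text strings keyed
by an identifier and uses naive, case-insensitive substring matching
in place of any particular phrase recognizer; the function name,
parameter names, and default thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations


def parse_documents(documents, phrase_to_concept, doc_threshold=2, freq_threshold=0):
    """Return (concept_concept, concept_docs) for a small document collection.

    documents: dict mapping a document id to its text.
    phrase_to_concept: dict mapping a phrase to the concept it denotes.
    A concept is tied to a document when its phrases occur more than
    freq_threshold times; two concepts are tied together when at least
    doc_threshold documents contain both.
    """
    concept_docs = {}        # concept -> set of document ids
    pair_counts = Counter()  # (concept, concept) -> number of shared documents

    for doc_id, text in documents.items():
        lowered = text.lower()
        counts = Counter()
        for phrase, concept in phrase_to_concept.items():
            counts[concept] += lowered.count(phrase.lower())  # naive matching
        present = sorted(c for c, n in counts.items() if n > freq_threshold)
        for concept in present:
            concept_docs.setdefault(concept, set()).add(doc_id)
        for pair in combinations(present, 2):
            pair_counts[pair] += 1

    concept_concept = {pair for pair, n in pair_counts.items() if n >= doc_threshold}
    return concept_concept, concept_docs
```

Both thresholds are exposed as parameters so that they may be
selected by a user or pre-assigned, consistent with the paragraph
above.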
[0039] Various method embodiments may further comprise step 130 for
generating a hypothesis (that may include, but is not limited to:
potential chain hypotheses 132, potential substitution hypotheses
134, and/or potential pairwise hypotheses 136). The hypotheses
generated in step 130 may comprise a previously unknown combination
including one of at least one of the plurality of phrases and at
least one of the plurality of concepts and may thus be used as the
basis for "conceptual research" wherein a researcher is presented
with a potential hypothesis that suggests and/or identifies a
research topic or direction that has not been addressed in previous
research (as documented by the documents in the data repository
20). As described herein, the previously unknown combination
(embodied in the generated hypothesis) may be at least partially
determined from the parsed (see step 120, for example) plurality of
documents present in the data repository 20.
[0040] In some embodiments, a "chain" relationship may be
established in the parsing step 120 among three or more previously
unrelated phrases and/or concepts. For example, at least a portion
of the plurality of documents present in the data repository 20 may
comprise at least one of a first concept, a second concept, and a
third concept. In some such embodiments, the parsing step 120
(utilizing one or more previously-identified and/or stored
relationship rules (see elements 111, 112, 113, 114, for example))
may comprise: (1) detecting a first relationship between the first
and second concepts; (2) detecting a second relationship between
the second and third concepts; (3) detecting a third relationship
between the first and third concepts; and (4) determining a
potential chain relationship 132 among the first, second, and third
concepts at least partially from the detected first, second, and
third relationships. In such embodiments, the generating step 130
may further comprise generating a chain hypothesis 132 comprising
the previously unknown combination of the first, second, and third
concepts in a "chain" combination.
[0041] For example, such a "chain" relationship may be established
among three medical concepts (such as three therapeutic compounds
(A, B, C) belonging to the same general class of drugs (as
indicated, for example, by a relationship rule comprising an
assignment of at least one of the plurality of concepts to a
semantic category outlining the class of drug (see relationship
rule 111, in FIG. 1, for example))). The parsing step 120 may
indicate that therapeutic compounds A and B have a strong pairwise
relationship (see element 122 tying the concepts A and B
together). The parsing step 120 may further indicate that
therapeutic compounds B and C, and A and C, likewise have strong pairwise
relationships (see element 122 indicating an assessment of the
strength of a pairwise relationship between two particular
concepts). The parsing step 120 may further indicate that no
particular document studies all three therapeutic compounds (A, B
and C) together (as indicated, for example, by element 124, which
comprises an evaluation of the relationships between one or more
concepts and each document in the data repository 20). In such an
example, step 130 may comprise generating a potential chain
hypothesis 132 reporting the previously unknown combination of the
therapeutic compounds A, B and C. The hypothesis may, in some
embodiments, tie the chain hypothesis to a particular semantic type
(see element 111, for example) corresponding to a particular
disease state that may respond most favorably to treatment with the
combination of therapeutic compounds A, B and C as indicated by the
documents in the data repository 20.
[0042] FIG. 4A shows a detailed depiction of various exemplary
subroutines that may be used to accomplish the hypothesis
generating step 130 for a potential chain hypothesis 132 comprising
concepts A, B and C. For example, given a set of interesting
semantic types (IS) in the system for extracting relationships 10,
the generating step 130 may comprise advancing through each concept
A in a particular set of concept-concept relationships 122 and
determining: (1) if concept A is a specific concept (according to a
concept-hierarchical relationship rule 113, for example); and (2)
if concept A's semantic type (according to a concept-semantic type
relationship rule 111, for example) belongs in the input set of
interesting semantic types IS. If the answer to inquiry (2) is
positive, the generating step 130 may further comprise retrieving
concept A's related concepts (by querying the concept-concept
relationships 122, for example), and denoting these results as
A-relates. Then, for
each concept (B) in the group of A-relates, the generating step 130
may comprise determining: (1) if concept B's semantic type (by
consulting relationship rule 111, for example) is the same as
concept A's semantic type; and (2) if B is a specific concept
(according to a concept-hierarchical relationship rule 113, for
example). If both (1) and (2) are positive, then the generating
step 130 may further comprise retrieving concept B's related
concepts (according to the concept-concept relationships 122, for
example) and
denoting these results as B-relates. In order to complete the
"chain" of concepts A, B and C, the generating step 130 may further
comprise, for each concept C among the B-relates, determining if
concept C's semantic type (according to relationship rule 111, for
example) is the same as concept A's. If so, then the generating
step 130 may further comprise determining if concept C is a
specific concept (according to a concept-hierarchical relationship
rule 113, for example) and consulting the concept-document
relationships 124 in the pertinent data repository 20 (uncovered,
for example, in the parsing step 120) to retrieve the
number of documents in the data repository 20 that contain some
mention of concepts A, B and C in combination. If the retrieved
number of documents is less than some selected threshold level
(that may be pre-defined as a threshold for a "previously unknown"
and/or "previously unappreciated" chain combination), then the
generating step 130 may provide an output comprising a proposed
chain hypothesis 132 that may be presented to a user, for example,
in step 140.
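
The chain-generation loop described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name, the dictionary and set layouts standing in for relationship rules 111, 113, 122 and the concept-document relationships 124, and the exclusion of chains that repeat a concept are all assumptions made for the example.

```python
def chain_hypotheses(related, semantic_type, specific, docs_with,
                     interesting_types, threshold):
    """Return (A, B, C, n_docs) chains mentioned together in fewer than
    `threshold` documents. All argument layouts are hypothetical."""
    def docs(concept):
        return docs_with.get(concept, set())

    hypotheses = []
    for a in sorted(related):
        # A must be a specific concept (rule 113) of an interesting type (rule 111).
        if a not in specific or semantic_type[a] not in interesting_types:
            continue
        for b in sorted(related[a]):                  # A-relates (rule 122)
            if b == a or b not in specific:
                continue
            if semantic_type[b] != semantic_type[a]:
                continue
            for c in sorted(related.get(b, ())):      # B-relates (rule 122)
                # Assumption: a chain must not revisit A or B.
                if c in (a, b) or c not in specific:
                    continue
                if semantic_type[c] != semantic_type[a]:
                    continue
                # Documents that mention A, B and C in combination (element 124).
                n_docs = len(docs(a) & docs(b) & docs(c))
                if n_docs < threshold:
                    hypotheses.append((a, b, c, n_docs))
    return hypotheses


related = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
semantic_type = {"A": "drug", "B": "drug", "C": "drug"}
specific = {"A", "B", "C"}
docs_with = {"A": {1, 2}, "B": {2, 3}, "C": {3, 4}}
print(chain_hypotheses(related, semantic_type, specific, docs_with,
                       {"drug"}, threshold=1))
```

With a threshold of 1, only chains whose three concepts never appear together in a single document are reported.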
[0043] According to some embodiments, a "substitution" relationship
may be established in the parsing step 120 among two or more
previously unrelated phrases and/or concepts. For example, at least
a portion of the plurality of documents present in the data
repository 20 may comprise at least one of a first concept, a
second concept, and a plurality of linking concepts. In some such
embodiments, the parsing step 120 (utilizing one or more
previously-identified relationship rules (see elements 111, 112,
113, 114, for example)) may comprise: (1) detecting a first
relationship between the first concept and a first portion of the
plurality of linking concepts; (2) detecting a second relationship
between the second concept and a second portion of the plurality
of linking concepts; and (3) determining a potential substitution
relationship 134 between the first concept and the second concept
at least partially from the detected first and second relationships
and a number of overlapping concepts present in both the first
portion and the second portion of the plurality of linking
concepts. In such embodiments, the generating step 130 may further
comprise generating a substitution hypothesis 134 comprising the
previously unknown combination of at least one of the first and
second concepts with a portion of the plurality of linking concepts
not present in the number of overlapping concepts. In some such
method embodiments, the parsing step 120 may further comprise
determining a strength of the potential relationship between the
first and second concepts in the proposed substitution hypothesis
134 based at least in part on the number of concepts present in
both the first portion and the second portion of the plurality of
linking concepts.
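
The overlap-based strength measure described above reduces to simple set arithmetic. The sketch below is illustrative only; the function name and the use of Python sets for the first and second portions of the linking concepts are assumptions.

```python
def overlap_strength(links_first, links_second):
    """Compare the linking concepts of two concepts.

    Returns (strength, first_only, second_only), where strength is the
    number of linking concepts present in both portions and the two
    remaining sets hold the linking concepts outside the overlap.
    """
    overlap = links_first & links_second
    return len(overlap), links_first - overlap, links_second - overlap


strength, first_only, second_only = overlap_strength(
    {"X1", "X2", "X3", "Y"},   # linking concepts related to the first concept
    {"X1", "X2", "X3"},        # linking concepts related to the second concept
)
print(strength, first_only, second_only)
```

In this reading, the non-overlapping linking concepts (here, Y) are the candidates for a previously unknown combination with the other concept.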
[0044] For example, such a substitution relationship may be
established among: (1) a pair of medical concepts (such as two
therapeutic compounds (A and B)); (2) a list of component compounds
present in both therapeutic compounds A and B (X1, X2, X3, . . . ,
Xm); and (3) a disease or condition (Y) that is reported as
responding positively to treatment with therapeutic compound A. The
parsing step 120 may first comprise applying a relationship rule
comprising an assignment of therapeutic compounds A and B to a
semantic category outlining the common class of drug (see
relationship rule 111, in FIG. 1, for example). The parsing step
120 may further indicate that therapeutic compound A has a strong
relationship with the list of component compounds (X1, X2, X3,
. . . , Xm) (see, for example, element 112 (tying phrases
indicative of the component compounds (X1, X2, X3, . . . , Xm) to a
concept ID indicative of compound A)). The parsing step 120 may
further indicate that
therapeutic compound B also has a strong relationship with the list
of component compounds (X1, X2, X3, . . . , Xm) (see, for example
element 112 (tying phrases indicative of the component compounds
(X1, X2, X3, . . . , Xm) to a concept ID indicative of compound
B)). The parsing step 120 may further indicate that the data
repository 20 generally indicates that therapeutic compound A
correlates strongly to disease or condition Y (as indicated, for
example by element 122 which comprises an evaluation of the
relationships between one or more concepts present in the data
repository 20). In such an example, step 130 may comprise
generating a potential substitution hypothesis 134 reporting the
previously unknown combination of the therapeutic compound B with
the disease condition Y (i.e. therapeutic compound B may be
reported as a potential substitute for therapeutic compound A in
the treatment of disease or condition Y). The hypothesis may, in
some embodiments, tie the chain hypothesis to a particular semantic
type (see element 111, for example) corresponding to a particular
disease state that may respond most favorably to treatment with the
combination of therapeutic compounds A, B and C as indicated by the
documents in the data repository 20. A "strength" or potential
probative value of the potential substitution hypothesis 134 may be
evaluated quantitatively in some embodiments by measuring the value
of "m" (i.e. the number of phrases (X) tying the therapeutic
compounds A and B together).
[0045] FIG. 4B shows a detailed depiction of various exemplary
subroutines that may be used to accomplish the hypothesis
generating step 130 for a potential substitution hypothesis 134.
Given a set of interesting semantic types (IS) in the system for
extracting relationships 10 and a predefined set of interesting
relations (IR), the generating step 130 may comprise advancing
through each of the concept-concept relationships defined, for
example, between concept A and other concepts via the relationship
rule 122. The generating step 130 may further comprise (1)
determining if concept A is a specific concept (according to a
concept-hierarchical relationship rule 113, for example); and (2)
determining the semantic type of concept A (via the
concept-semantic type relationship rule 111, for example). If the
determined semantic type of concept A falls within the given IS,
then the generating step 130 may further comprise retrieving
various concepts related to concept A (by consulting the
concept-concept relationship rule 122, for example) and denoting
these concepts as A-relates. For each concept B in the set of
A-relates, the generating step 130 may further comprise determining
if B is a specific concept (according to a concept-hierarchical
relationship rule 113, for example) and retrieving concepts related
to concept B (by consulting the concept-concept relationship rule
122, for example) and denoting these concepts as B-relates.
Furthermore, for each concept C in the set of B-relates, the
generating step 130 may further comprise determining if C's
semantic type is the same as that for A (by consulting the
concept-semantic type relationship rule 111, for example) and
analyzing the concept-concept relationship rule 122 to obtain a set
of concepts related to both concept A and concept C. The generating
step 130 may further comprise determining if the number of obtained
concepts is greater than some selected threshold; and analyzing the
concept-concept relationship rule 122 to obtain a list of concepts
that are related to concept A but that are unrelated to concept C
and denoting the obtained concepts as group A_not_C. Furthermore,
the generating step 130 may further comprise determining, for each
concept X in A_not_C, if the relationship between A and X is
present in the defined set of interesting relations (IR) and
reporting a potential substitution hypothesis 134 comprising
concept C as a potential substitution for concept A in relation to
concept X.
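
One way to read the substitution subroutine described above is sketched below. It is a hedged illustration, not the method of FIG. 4B itself: the function name, the nested-dictionary form of the concept-concept relationships 122 (concept to related concept to relation name), the parameter names, and the exclusion of the case where C equals A are assumptions.

```python
def substitution_hypotheses(relations, semantic_type, specific,
                            interesting_types, interesting_relations,
                            min_shared):
    """Return a set of (C, A, X) triples: C may substitute for A in
    relation to X. Data layouts here are hypothetical."""
    hypotheses = set()
    for a in sorted(relations):
        # A must be specific (rule 113) with a semantic type in IS (rule 111).
        if a not in specific or semantic_type.get(a) not in interesting_types:
            continue
        a_relates = relations[a]                        # A-relates (rule 122)
        for b in a_relates:
            if b not in specific:
                continue
            for c in relations.get(b, {}):              # B-relates (rule 122)
                # Assumption: A is not proposed as a substitute for itself.
                if c == a or semantic_type.get(c) != semantic_type[a]:
                    continue
                c_relates = relations.get(c, {})
                shared = a_relates.keys() & c_relates.keys()
                if len(shared) <= min_shared:           # overlap threshold
                    continue
                # Concepts related to A but not to C (group A_not_C).
                a_not_c = a_relates.keys() - c_relates.keys() - {c}
                for x in a_not_c:
                    if a_relates[x] in interesting_relations:   # IR check
                        hypotheses.add((c, a, x))
    return hypotheses


parts = ("X1", "X2", "X3")
relations = {
    "A": {**{x: "contains" for x in parts}, "Y": "treats"},
    "C": {x: "contains" for x in parts},
    "Y": {"A": "treats"},
    **{x: {"A": "contains", "C": "contains"} for x in parts},
}
semantic_type = {"A": "drug", "C": "drug", "Y": "disease",
                 **{x: "chemical" for x in parts}}
print(substitution_hypotheses(relations, semantic_type, set(relations),
                              {"drug"}, {"treats"}, min_shared=2))
```

Here compound C shares three component compounds with compound A, so C is reported as a possible substitute for A in relation to condition Y.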
[0046] According to some other embodiments, a "pairwise"
relationship may be established by the parsing step 120 among two
or more previously unrelated phrases and/or concepts. For example,
at least a portion of the plurality of documents present in the
data repository 20 may comprise at least one of a first concept, a
second concept, and a third concept. In some such embodiments, the
parsing step 120 (utilizing one or more previously-identified
relationship rules (see elements 111, 112, 113, 114, for example))
may comprise: (1) detecting a first relationship between the first
concept and the second concept; (2) detecting a second relationship
between the second concept and the third concept; and (3)
determining a potential pairwise relationship between the first
concept and the third concept at least partially from the detected
first and second relationships. In such embodiments, the generating
step 130 may further comprise generating a pairwise hypothesis 136
comprising the previously unknown combination of the first and
third concepts.
[0047] According to some such "pairwise" hypothesis embodiments,
the parsing step 120 may further comprise assessing a strength of
the potential relationship between the first and third concepts at
least partially from a known secondary relationship between the
first and third concepts. For example, in some embodiments, the
known secondary relationship may comprise a common semantic
category including both the first and third concepts (as indicated,
for example, by a concept-semantic type relationship rule 111
(and/or another relationship rule type), that may be a product of
the determining step 110 (as shown generally in FIG. 1)).
[0048] For example, a pairwise hypothesis 136 may be generated in
step 130 by first determining that a strong relationship exists
between concept A (i.e. a first therapeutic compound A) and concept
X (a disease or condition X). This may be accomplished, for
example, using the concept-concept relationship output 122 of an
initial parsing step 120. In order to complete the generation of a
potential pairwise hypothesis 136, step 130 may further comprise
determining the existence of a strong relationship (via the
concept-concept relationship output 122, for example) between the
disease or condition X and the therapeutic compound B. According to
some such embodiments, the generating step 130 may further comprise
detecting an interesting secondary relationship between concepts A
and B (i.e. detecting if therapeutic compounds A and B are in the
same or similar semantic category (see element 111, FIG. 1, for
example)). If the parsing step 120 indicates that no current
literature in the data repository 20 addresses a potential pairwise
relationship between concepts A and B (such as the potential for
treatment of particular conditions using some combination of
therapeutic compounds A and B, for example), step 130 may further
comprise reporting the pair A and B as a potential pairwise
hypothesis 136.
[0049] FIG. 4C shows a detailed depiction of various exemplary
subroutines that may be used to accomplish the hypothesis
generating step 130 for a potential pairwise hypothesis 136. Given
a set of interesting semantic type pairs (ISP) and/or a set of
interesting relations (IR) in the system for extracting
relationships 10, the generating step 130 may comprise analyzing
each concept A in relation to the concept-concept relationship rule
122 and determining if concept A is a specific concept (according
to a concept-hierarchical relationship rule 113, for example). The
generating step 130 may further comprise: (1) consulting the
concept-semantic type relationship rule 111 to determine if concept
A's semantic type is present in at least one of the pairs making up
the interesting semantic type pairs (ISP); and (2) retrieving
concepts related to concept A (according to the concept-concept
relationship rule 122, for example) and designating the retrieved
concepts as A-relates. Furthermore, for each concept B in the
A-relates, the generating step 130 may further comprise determining
if B is a specific concept (according to a concept-hierarchical
relationship rule 113, for example); and retrieving concept B's
related concepts (according to the concept-concept relationship
rule 122, for example) and designating the retrieved concepts as
B-relates. Furthermore, as shown in FIG. 4C, for each concept C in
the B-relates category, the generating step 130 may further
comprise determining if concept C is a specific concept (according
to a concept-hierarchical relationship rule 113, for example); and
determining if concept C's semantic type (according to the
concept-semantic type relationship rule 111, for example) paired
with concept A's semantic type is present in the selected ISP. In
such embodiments, the generating step 130 further comprises
consulting the concept-document relationship rule 124 to obtain a
number of documents within the data repository 20 that each
contains both concept A and concept C. The generating step 130 may
further comprise determining if the number of documents is less
than some threshold and, if so, reporting the resulting potential
pairwise hypothesis 136 comprising, for example, the previously
unknown and/or unappreciated combination of concept A and concept
C.
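
The pairwise subroutine described above might be sketched as follows. The function name, the set and dictionary layouts standing in for rules 111, 113, 122 and 124, and the treatment of each interesting semantic type pair as unordered are assumptions made for illustration only.

```python
def pairwise_hypotheses(related, semantic_type, specific, docs_with,
                        interesting_pairs, threshold):
    """Return {(A, C): n_docs} for pairs co-occurring in fewer than
    `threshold` documents. All data layouts are hypothetical."""
    # Assumption: an interesting semantic type pair (ISP) is unordered.
    pairs = {frozenset(p) for p in interesting_pairs}
    types_in_pairs = set().union(*pairs)

    found = {}
    for a in sorted(related):
        # A must be specific (rule 113) and its type must appear in some pair.
        if a not in specific or semantic_type[a] not in types_in_pairs:
            continue
        for b in related[a]:                          # A-relates (rule 122)
            if b not in specific:
                continue
            for c in related.get(b, ()):              # B-relates (rule 122)
                if c == a or c not in specific:
                    continue
                pair = frozenset((semantic_type[a], semantic_type[c]))
                if pair not in pairs:
                    continue
                # Documents containing both A and C (rule 124).
                n_docs = len(docs_with.get(a, set()) & docs_with.get(c, set()))
                if n_docs < threshold:
                    found[(a, c)] = n_docs
    return found


related = {"A": {"X"}, "X": {"A", "C"}, "C": {"X"}}
semantic_type = {"A": "drug", "X": "disease", "C": "drug"}
docs_with = {"A": {1, 2}, "X": {1, 2, 3}, "C": {3}}
print(pairwise_hypotheses(related, semantic_type, {"A", "X", "C"},
                          docs_with, [("drug", "drug")], threshold=1))
```

Because the sketch visits every concept as a starting point, a symmetric relationship table yields each pair in both orders; a caller could deduplicate if that is not wanted.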
[0050] Referring again to FIG. 1, the method embodiments may
further comprise step 140 for presenting the hypothesis as a visual
output 145 (see, for example, the display outputs shown generally
in FIGS. 6A-6C) so as to indicate the previously unknown
combination to a user and/or presenting the previously unknown
combination to a downstream process (such as, for example, a user
analysis 150 and subsequent filtering step 160 as shown in FIG. 5).
Various embodiments of the present invention may therefore provide
the important technical effect of presenting complex hypotheses
132, 134, 136 produced by the generating step 130 in a simplified,
yet multi-layered interactive display (such as those displays
shown, for example, in FIGS. 6A-6C).
[0051] As shown generally in FIGS. 7A and 7B, various system
embodiments of the present invention may comprise a user interface
704 (such as a monitor or other display device) in communication
with a host computing element 700 and configured for presenting the
hypothesis so as to indicate the previously unknown combination to
a user. In some embodiments, the user interface 704 may present one
or more of the hypotheses 132, 134, 136 as a display to a user
comprising a primary visual representation 610 of the previously
unknown combination including one of at least one of the plurality
of phrases and at least one of the plurality of concepts (as shown
for example in the display shown generally in FIGS. 6A-6C). In some
embodiments, the user interface 704 may also present the primary
visual representation 610 comprising an interactive icon 615
configured to be selectable by the user. The interactive icon 615
may be further configured to modify the display when selected by
the user (i.e. selection of the interactive icon 615 in the primary
visual representation may elicit the display of a secondary visual
representation 620 that may indicate, for example, a plurality of
relationships linking Phrase 2 and Phrase 3 (as shown, for example
in FIG. 6A)).
[0052] Referring to FIGS. 6A-6C, the step 140 for presenting the
hypothesis as a visual output 145 may comprise presenting a display
to a user comprising a primary visual representation 610 of the
previously unknown combination including one of at least one of the
plurality of phrases and at least one of the plurality of concepts.
The primary visual representation 610 may comprise, in some
embodiments, an interactive icon 615 (such as a linking element)
configured to be selectable by the user (i.e. via a mouse click for
example). The interactive icon 615 may be further configured to
modify the display (by calling up a secondary visual representation
620, for example) when selected by the user. The secondary visual
representation 620 may also comprise an interactive icon 625 that,
when selected by a user, may call up a tertiary visual
representation 630 comprising further detail regarding the
previously unknown combination of phrases and/or concepts that make
up the potential hypothesis.
[0053] FIG. 6A shows primary 610, secondary 620, and tertiary 630
visual representations of a previously unknown combination
comprising a potential chain hypothesis combining Phrases 1-3 via
three separate interactive linking icons 615. As shown in FIG. 6A,
if a user selects the interactive icon 615 disposed between Phrase
2 and Phrase 3, step 140 may further comprise displaying a
secondary visual representation 620 comprising the various
relationships (defined, for example, by one or more relationship
rules 111, 112, 113, 114, for example) underlying the combination
of Phrases 2 and 3 in the previously unknown chain hypothesis
comprising Phrases 1-3. As described above, the secondary visual
representation 620 may also comprise an interactive icon 625
corresponding to at least one of the linking relationships. The
interactive icon 625, when selected by a user, may be configured to
call up the tertiary visual representation 630 comprising, for
example, a list of documents from the data repository 20 that were
parsed (see step 120, for example) to generate the potential chain
hypothesis 132 shown in the primary visual representation 610.
Thus, by navigating the various interactive visual representations
610, 620, 630 produced in accordance with step 140, a user may
deconstruct and/or uncover the various logical steps used by the
various system, method, and computer program product embodiments of
the present invention to generate the hypothesis. This interactive
display feature may be especially useful for allowing a user to
assess the quality and/or basis for a particular proposed
hypothesis.
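
The three-level drill-down described above can be modeled with a small nested data structure. This sketch is an assumption-laden illustration: the class names, field names, and the mapping from relation names to document identifiers are hypothetical and are not taken from FIGS. 6A-6C.

```python
from dataclasses import dataclass, field


@dataclass
class LinkIcon:
    """Hypothetical stand-in for an interactive icon 615 joining two phrases."""
    left: str
    right: str
    # Assumed layout: relation name -> IDs of the supporting documents.
    evidence: dict = field(default_factory=dict)

    def secondary(self):
        """Relationships underlying the link (secondary representation 620)."""
        return sorted(self.evidence)

    def tertiary(self, relation):
        """Documents behind one relationship (tertiary representation 630)."""
        return list(self.evidence.get(relation, []))


@dataclass
class HypothesisView:
    """Primary representation 610: phrases joined by link icons."""
    phrases: list
    links: list

    def select(self, left, right):
        """Return the link icon between two phrases, as if the user clicked it."""
        for link in self.links:
            if (link.left, link.right) == (left, right):
                return link
        raise KeyError((left, right))


view = HypothesisView(
    ["Phrase 1", "Phrase 2", "Phrase 3"],
    [LinkIcon("Phrase 1", "Phrase 2", {"treats": ["doc-1"]}),
     LinkIcon("Phrase 2", "Phrase 3",
              {"causes": ["doc-7", "doc-9"], "affects": ["doc-2"]})],
)
link = view.select("Phrase 2", "Phrase 3")
print(link.secondary(), link.tertiary("causes"))
```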
[0054] Similarly, FIGS. 6B and 6C show a progression of primary
610, secondary 620, and tertiary 630 visual representations for
presenting potential substitution hypotheses 134 and potential
pairwise hypotheses 136, respectively. Each of the primary and
secondary visual representations 610, 620 also comprises
interactive icons 615, 625 allowing a user to uncover the various
relationship rules
111, 112, 113, 114 and ultimately, the very documentary evidence
gleaned from one or more data repositories 20, that serve as the
basis for the potential hypothesis presented according to various
embodiments of the present invention.
[0055] As shown in FIGS. 1 and 5, various method embodiments may
further comprise step 150 for identifying a portion of the
plurality of documents in the data repository 20 associated with a
user and creating a user profile 155 based at least in part on the
identified documents. The user profile 155 may be indicative of a
user information need. Some such embodiments may further comprise
step 160 for modifying the hypothesis in response to the user
profile 155 such that the modified hypothesis at least partially
corresponds to the user information need. Step 160 for modifying
the hypothesis may, in some embodiments, utilize one or more of the
various steps 110, 120, 130 described herein for determining
relationship rules 111, 112, 113, 114, parsing documents from a
data repository 20, and/or generating hypotheses 132, 134, 136 in
order to perform a user analysis to identify a portion of the
plurality of documents in the data repository 20 associated with a
user.
[0056] For example, and referring generally to FIG. 5, step 150 may
comprise subroutines that may include, but are not limited to: step
151 for searching a user's online publications (which may, in some
embodiments, be stored in the data repository 20); step 152 for
parsing the searched publications uncovered in step 151 and
indexing various phrases therein by a concept identifier (using,
for example a phrase-concept relationship rule 112 that may be
generated in step 110); step 153 for mapping concepts in the
searched publications uncovered in step 151 to a particular
semantic type (using, for example, a concept-semantic type
relationship rule 111 that may be generated in step 110); and step
154 for ranking the top k semantic types so as to produce the user
profile 155 (wherein k may comprise an adjustable parameter).
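
Steps 152 through 154 can be sketched as a small counting routine. The function name, the plain substring matching used to find phrases, and the dictionary forms of the phrase-concept rule 112 and concept-semantic type rule 111 are simplifying assumptions; a real parser would likely tokenize and normalize the text.

```python
from collections import Counter


def build_user_profile(publications, phrase_to_concept, concept_to_type, k):
    """Return the k most frequent semantic types in a user's publications."""
    counts = Counter()
    for text in publications:
        lowered = text.lower()
        for phrase, concept in phrase_to_concept.items():
            # Step 152: index phrases by concept identifier (rule 112).
            hits = lowered.count(phrase)
            if hits:
                # Step 153: map the concept to its semantic type (rule 111).
                counts[concept_to_type[concept]] += hits
    # Step 154: rank and keep the top k semantic types.
    return [semantic_type for semantic_type, _ in counts.most_common(k)]


publications = ["Aspirin reduces fever.",
                "Ibuprofen and aspirin both treat headache."]
phrase_to_concept = {"aspirin": "C1", "ibuprofen": "C2",
                     "fever": "C3", "headache": "C4"}
concept_to_type = {"C1": "drug", "C2": "drug",
                   "C3": "symptom", "C4": "symptom"}
print(build_user_profile(publications, phrase_to_concept, concept_to_type, k=1))
```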
[0057] In some embodiments (as shown, for example in FIG. 5), the
user profile 155 may comprise at least one semantic category (as
defined, for example, by a concept-semantic category relationship
rule 111 that may serve as an input to the user analysis step 150
for identifying a portion of the plurality of documents in the data
repository 20 associated with a user). According to such
embodiments, various methods may further comprise step 160 for
filtering one or more of the presented hypotheses 132, 134, 136
such that the previously unknown combination includes only at least
one phrase and at least one concept corresponding substantially to
the at least one semantic category highlighted in the user profile
155.
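
The filtering of step 160 might look like the sketch below. It adopts one possible reading of the paragraph above, namely that a hypothesis is kept only when every concept in it belongs to a semantic category in the user profile; that reading, the function name, and the tuple form of a hypothesis are assumptions.

```python
def filter_hypotheses(hypotheses, concept_to_type, profile_types):
    """Keep hypotheses whose concepts all fall within the profile's categories."""
    allowed = set(profile_types)
    return [
        hypothesis
        for hypothesis in hypotheses
        if all(concept_to_type.get(concept) in allowed for concept in hypothesis)
    ]


concept_to_type = {"A": "drug", "C": "drug", "G": "gene"}
hypotheses = [("A", "C"), ("A", "G")]
print(filter_hypotheses(hypotheses, concept_to_type, ["drug"]))
```

A looser variant could keep a hypothesis when any one of its concepts matches the profile; the source text does not settle which is intended.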
[0058] Thus, various system, method, and/or computer program
products of the present invention may tailor the "conceptual
research" results returned, for example, as part of the presented
hypotheses 132, 134, 136 to meet a user information need that may
be ascertained by analyzing a user's publications and/or previous
search patterns. As described further herein, various system
embodiments of the present invention may comprise a host computing
element 700 including one or more memory devices 722, 724, 728
configured for storing a user profile 155 such that each user
(identified, for example, by a unique user ID and/or password) may
log on to a host computing element 700 so as to utilize the
conceptual research capabilities of the various embodiments
described more fully herein.
[0059] Some method embodiments may further comprise a step for
verifying one or more generated hypotheses 132, 134, 136 using at
least one independent resource. For example, in some method
embodiments, the generated hypotheses 132, 134, 136 may be verified
using an independent resource 800 as shown generally in FIG. 8. For
example, an independent resource 800 (such as a "Verifying Support
System" or other verification module for example) may be in
communication with a host computing element 700 that may be
responsible for performing the parsing 120 and/or generating steps
130 described herein. The independent resource 800 may also be in
communication with the data repository 20 so as to be capable of
evaluating one or more documents contained therein, using tools
and/or subroutines that may include, but are not limited to: a data
analysis engine 802, a natural language processing (NLP) engine
804, a data "mining" engine 806, and/or an existing search engine
808. These tools may comprise "off-the-shelf" search engines or
other publicly available search tools that may be used by the
independent resource 800 to verify the hypothesis 132, 134, 136
(for example) generated by the host computing element 700 (using,
for example, the various modules and/or steps shown in FIG. 1).
"Verification" of the hypothesis may include, but is not limited
to: assessing a potential "breakthrough" value of the hypothesis;
assessing the probability of the hypothesis providing a viable
research direction and/or research focus; and ensuring that the
hypothesis meets the potential information needs of a particular
audience (such as, for example, those information needs embodied in
a user profile 155 as shown in FIG. 5).
[0060] Some embodiments of the present invention further provide a
system for mining information from a data repository 20 comprising
a plurality of documents to produce a hypothesis (see elements 132,
134, 136 of FIG. 1, for example). Referring generally to FIG. 1,
the system may comprise a system for extracting relationships 10
comprising a plurality of phrases and a plurality of concepts. The
system for extracting relationships 10 may include, but is not
limited to: a vocabulary database corresponding to a selected
subject area; a predetermined lexicon; a semantic network; a
metathesaurus; and combinations of such databases.
[0061] Various system embodiments may further comprise a host
computing element 700 (see FIG. 7, for example) in communication
with the system for extracting relationships 10 for accessing the
system for extracting relationships 10. According to such
embodiments, the host computing element 700 may determine a
relationship rule 111, 112, 113, 114 defining a relationship among
at least a portion of the plurality of phrases and at least a
portion of the plurality of concepts. As described herein with
respect to the various method and/or computer program product
embodiments of the present invention, the various relationship
rules 111, 112, 113, 114 determined by the host computing element
700 may include, but are not limited to: an assignment of at least
one of the plurality of phrases to at least one of the plurality of
concepts (see element 112 and FIG. 2A); an assignment of at least
one of the plurality of phrases to a relationship identifier (see
element 114 and FIG. 2B, wherein the relationship identifier may
link a first one of the plurality of concepts to a second one of
the plurality of concepts); an assignment of at least one of the
plurality of concepts to a semantic category (see element 111 and
FIG. 2C, for example); an arrangement of at least a portion of the
plurality of concepts in a hierarchical relationship (see element
113 and FIG. 2D for example), wherein a first one of the portion of
concepts comprises a child concept and a second one of the portion
of concepts comprises a parent concept; and/or combinations of such
relationship rules.
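
The four kinds of relationship rule listed above could be held in a single container such as the one sketched here. The class name, field names, and the `ancestors` helper are hypothetical; the sketch only illustrates how the assignments of elements 111 through 114 might be stored and queried.

```python
from dataclasses import dataclass, field


@dataclass
class RelationshipRules:
    """Illustrative container for relationship rules 111, 112, 113 and 114."""
    phrase_to_concept: dict = field(default_factory=dict)    # element 112
    phrase_to_relation: dict = field(default_factory=dict)   # element 114
    concept_to_category: dict = field(default_factory=dict)  # element 111
    child_to_parent: dict = field(default_factory=dict)      # element 113

    def ancestors(self, concept):
        """Walk the hierarchy from a child concept up through its parents."""
        chain, seen = [], {concept}
        while concept in self.child_to_parent:
            concept = self.child_to_parent[concept]
            if concept in seen:      # guard against a malformed cyclic hierarchy
                break
            seen.add(concept)
            chain.append(concept)
        return chain


rules = RelationshipRules(
    phrase_to_concept={"acetylsalicylic acid": "aspirin"},
    phrase_to_relation={"is used to treat": "treats"},
    concept_to_category={"aspirin": "drug"},
    child_to_parent={"aspirin": "nsaid", "nsaid": "analgesic"},
)
print(rules.ancestors("aspirin"))
```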
[0062] In some embodiments, as shown generally in FIGS. 7A-7B, the
system and/or host computing element 700 thereof may comprise one
or more memory devices 722, 724, 728 in communication with and/or
integrated with the host computing element 700. According to some
such embodiments, the memory device or devices 722, 724, 728 may be
configured for storing one or more of the determined relationship
rules 111, 112, 113, 114 for later or repeated use in a subsequent
parsing step (see step 120, FIG. 1, for example). Thus, the memory
devices 722, 724, 728 may allow the host computing element 700 to
serve as a "warehouse" of relationships comprising conceptual links
determined in part from the system for extracting relationships 10
corresponding to a particular data repository 20. For example, in
biomedical "conceptual research" embodiments of the present
invention, the memory devices 722, 724, 728 may be configured for
storing relationship rules 111, 112, 113, 114 determined from an
analysis of the Unified Medical Language System (UMLS) (or a
similar system for extracting relationships 10) for use in later
and/or repeated parsing (see step 120) of a biomedical data
repository 20 (such as MEDLINE, for example). Importantly, the
memory devices 722, 724, 728 may allow for the conservation of
computing power and/or time by "pre-computing" and storing
commonly-used and/or re-used relationship rules 111, 112, 113, 114
that may be necessary for parsing 120 and hypothesis generation 130
steps (as described in further detail herein) that may be later
performed by the host computing element 700.
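
The "pre-computing" idea described above amounts to building the relationship rules once and reloading them on later runs. The sketch below assumes a JSON file as the store and a caller-supplied `build_rules` function; both, along with the function name, are illustrative choices rather than details from the source.

```python
import json
from pathlib import Path


def load_or_build_rules(cache_path, build_rules):
    """Return stored relationship rules, computing and saving them only once."""
    path = Path(cache_path)
    if path.exists():
        # Reuse rules computed on an earlier run instead of rebuilding them.
        return json.loads(path.read_text(encoding="utf-8"))
    rules = build_rules()   # the expensive step, e.g. analyzing a vocabulary source
    path.write_text(json.dumps(rules), encoding="utf-8")
    return rules
```

A caller would pass the same path on every run, so only the first run pays the cost of deriving the rules.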
[0063] Referring again to FIG. 1, the host computing element 700
may also be in communication (via a wired and/or wireless network
connection, for example) with a data repository 20. System
embodiments of the present invention may be configured for use with
various types of data repositories 20 in a variety of subject
areas. For example, the host computing element 700 may be in
communication with a data repository 20 that may include, but is
not limited to: a biomedical literature database (such as MEDLINE,
for example); a medical records database; a chemical literature
database; a computer science literature database; a physics
literature database; a legal literature database; a psychology
literature database; a social science literature database; a news
periodical database; a business journal database; and combinations
of such databases. In such embodiments, the host computing element
700 may be further configured for parsing (see step 120) the
plurality of documents in the data repository 20 according to one
or more of the determined relationship rules 111, 112, 113, 114
described herein. Each of the plurality of documents stored in the
data repository 20 may comprise at least a portion of one of the
plurality of phrases and the plurality of concepts such that the
relationship rules 111, 112, 113, and 114 may be directly
compatible with the terms, phrases, and/or concepts present in the
documents.
[0064] Furthermore, in some system embodiments, the host computing
element 700 may be further configured for performing step 130 for
generating one or more potential hypotheses 132, 134, 136
comprising a previously unknown combination of at least one of the
plurality of phrases and at least one of the plurality of concepts.
As described herein, the previously unknown combination may be at
least partially determined from the parsed plurality of documents
present in the data repository 20.
[0065] As shown generally in FIGS. 7A and 7B, various system
embodiments may further comprise a user interface 704 (such as a
display device, for example) in communication with the host
computing element 700. The user interface 704 may be configured for
presenting the hypothesis (as a visual indication or display, for
example, as shown generally in FIGS. 6A-6C) so as to indicate the
previously unknown combination of at least one of the plurality of
phrases and at least one of the plurality of concepts. In some
system embodiments, the user interface 704 may be configured for
presenting one or more of the generated hypotheses 132, 134, 136 as
a display (see FIGS. 6A-6C, for example) to a user comprising a
primary visual representation 610 of the previously unknown
combination including one of at least one of the plurality of
phrases and at least one of the plurality of concepts. Referring
generally to FIG. 6A, in some such embodiments, the user interface
704 may be further configured for presenting a primary visual
representation 610 comprising an interactive icon 615 configured to
be selectable by the user (i.e. via a mouse click or other user
input received via one or more input devices 706, 708 in
communication with and/or integrated with the host computing
element 700). As described herein, the interactive icons 615, 625 may
be further configured to modify the display when selected by the
user. For example, selection of the interactive icon 615 in the
primary visual representation 610 of the presented hypothesis may
call up and/or modify the display to present a secondary visual
representation 620 comprising the underlying relationship rules
and/or linking relations used to generate (in step 130, for
example) one or more of the presented hypotheses 132, 134, 136. As
described herein, the secondary visual representation 620 may
further comprise a second interactive icon 625 that, when selected
by a user, may call up a tertiary visual representation 630
presenting, for example, at least a portion of the plurality of
parsed documents from the data repository 20 that may have been
used to generate one or more of the presented hypotheses 132, 134,
136. Thus, the host computer 700 and the associated user interface
704 elements may allow a user to fully examine the presented
hypotheses 132, 134, 136 and the logic and/or relationships
underlying the proposed hypotheses. This transparency may allow a
user to become comfortable with the various system embodiments of
the present invention and to more readily rely on the various
"conceptual research" capabilities afforded thereby.
[0066] Referring to FIG. 5, in some system embodiments, the host
computing element 700 may be further configured for tailoring
and/or modifying one or more of the presented hypotheses 132, 134,
136 to meet the particular information needs of an identified user.
In some such embodiments, the host computing element 700 may
perform step 150 for identifying a portion (see element 30, for
example, indicating the user's available online publications) of
the plurality of documents in the data repository 20 associated
with a user. The host computing element 700 may be further
configured for creating a user profile 155 based at least in part
on the identified documents. As described herein, the user profile
155 may be indicative of a user information need (such as, for
example, a particular user's field of study and/or area of
expertise). As shown in FIGS. 1 and 5, the host computing element
700 may be configured for performing step 160 for filtering the
results returned from the hypothesis generating step 130 (i.e. one
or more potential hypotheses 132, 134, 136). As shown in FIG. 5,
the host computing element 700 may be configured for receiving
inputs comprising the potential hypotheses 132, 134, 136 and the
user profile 155 and modifying one or more of the potential
hypotheses 132, 134, 136 in response to the user profile 155 such
that the modified hypothesis at least partially corresponds to the
user information need. In some system embodiments, the user profile
155 generated by the host computing element 700 may comprise at
least one semantic category (as defined, for example, by a
concept-semantic category relationship rule 111 that may serve as
an input to the user analysis step 150 for identifying a portion of
the plurality of documents in the data repository 20 associated
with a user). According to such embodiments, the host computing
element 700 may be further configured for performing step 160 for
filtering one or more of the presented hypotheses 132, 134, 136
such that the previously unknown combination includes only at least
one phrase and at least one concept corresponding substantially to
the at least one semantic category highlighted in the user profile
155.
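The profiling step 150 and filtering step 160 described above can be
illustrated with a minimal sketch. It is not the claimed implementation:
the Hypothesis structure, the function names, the top_n cutoff, and the
use of a plain dictionary to stand in for the concept-semantic category
relationship rule 111 are all assumptions made for illustration. For
brevity the sketch checks only the concepts of each hypothesis against
the profile, not the phrases.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """Hypothetical container for a candidate phrase/concept combination."""
    phrases: tuple
    concepts: tuple


def build_user_profile(user_documents, concept_to_category, top_n=2):
    """Sketch of step 150: find the user's dominant semantic categories.

    user_documents: iterable of concept lists, one list per document
        associated with the user (assumed already parsed).
    concept_to_category: dict standing in for relationship rule 111.
    top_n: assumed cutoff for how many categories the profile keeps.
    """
    counts = Counter(
        concept_to_category[concept]
        for doc in user_documents
        for concept in doc
        if concept in concept_to_category
    )
    return {category for category, _ in counts.most_common(top_n)}


def filter_hypotheses(hypotheses, profile, concept_to_category):
    """Sketch of step 160: keep hypotheses whose concepts all fall
    within the semantic categories recorded in the user profile."""
    kept = []
    for hypothesis in hypotheses:
        categories = {concept_to_category.get(c) for c in hypothesis.concepts}
        # Unmapped concepts yield None, which is never in the profile.
        if categories and categories <= profile:
            kept.append(hypothesis)
    return kept
```

In this sketch a hypothesis is discarded as soon as any one of its
concepts maps outside the profile; a looser policy (for example,
requiring only one matching concept) would be an equally plausible
reading of "corresponding substantially."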
[0067] As shown in FIG. 8, some system embodiments may further
comprise an independent resource 800 (such as a "Verifying Support
System," for example) in communication with the host computing
element 700. The independent resource 800 may also be in
communication with the data repository 20 so as to be capable of
evaluating one or more documents contained therein, using tools
that may include, but are not limited to: a data analysis engine
802, a natural language processing (NLP) engine 804, a data
"mining" engine 806, and/or an existing search engine 808. These
tools may comprise "off-the-shelf" search engines or other publicly
available search tools that may be used by the independent resource
800 to verify the hypothesis 132, 134, 136 (for example) generated
by the host computing element 700 (using, for example, the various
modules and/or steps shown in FIG. 1). "Verification" of the
hypothesis may include, but is not limited to: assessing a
potential "breakthrough" value of the hypothesis; assessing the
probability of the hypothesis providing a viable research direction
and/or research focus; and ensuring that the hypothesis meets the
potential information needs of a particular audience (such as, for
example, those information needs embodied in a user profile 155 as
shown in FIG. 5).
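One way to picture how an independent resource might combine several
verification criteria is the sketch below. It is an assumption-laden
illustration only: the VerificationReport class, the verify function,
and the idea of expressing each criterion as a callable returning a
true/false result are hypothetical, and the source does not specify how
any individual criterion is actually computed.

```python
from dataclasses import dataclass, field


@dataclass
class VerificationReport:
    """Hypothetical record of which verification checks a hypothesis passed."""
    hypothesis: str
    results: dict = field(default_factory=dict)

    @property
    def passed(self):
        # True only if every recorded check succeeded.
        return all(self.results.values())


def verify(hypothesis, checks):
    """Run each named check against the hypothesis.

    checks: dict mapping a criterion name to a callable that accepts the
        hypothesis and returns a truthy or falsy value. Each callable is
        a placeholder for a tool such as an NLP or search engine.
    """
    report = VerificationReport(hypothesis)
    for name, check in checks.items():
        report.results[name] = bool(check(hypothesis))
    return report
```

Keeping each criterion as a separate callable mirrors the description
of the independent resource 800 as drawing on several interchangeable
tools, so that one tool could be swapped without touching the others.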
[0068] FIGS. 7A-7B illustrate an exemplary host computing element
700 useable for implementing some embodiments of the present
invention. In particular, FIG. 7A illustrates an example of a host
computing element 700 configured as a computer device in which some
embodiments may be utilized. As illustrated in FIG. 7A, the host
computing element 700 may comprise a system unit 702, output
devices such as display device 704 and printer 710, and input
devices such as keyboard 708 and mouse 706. The host computing
element 700 receives data for processing by the manipulation of
input devices 708 and 706 or directly from fixed or removable media
storage devices such as CD disk 712 and/or via network connection
interfaces (not illustrated). The host computing element 700 then
processes data and presents resulting output data via output
devices such as display device 704, printer 710, fixed or removable
media storage devices like disk 712 or network connection
interfaces. It should be appreciated that the computer device used
for implementing the preferred embodiment can be any sort of
computer system (e.g., personal computer (laptop/desktop), network
computer, server computer, or any other type of computer).
[0069] Referring now to FIG. 7B, there is depicted a high-level
block diagram of the components of a host computing element 700
such as that illustrated by FIG. 7A. System unit 702 includes a
processing device such as processor 720 in communication with a
main memory device 722 (which may include various types of cache,
random access memory (RAM), or other high-speed dynamic storage
devices) via a local or system bus 714 or other communication means
for communicating data between such devices. The main memory
device 722 may be capable of storing data as well as instructions
to be executed by processor 720 and may be used to store temporary
variables or other intermediate information during execution of
instructions by processor 720. The host computing element 700 may
also comprise a read only memory (ROM) and/or other static storage
devices 724 coupled to local bus 714 for storing static information
(such as one or more relationship rules 111, 112, 113, 114, for
example) and instructions for processor 720. The system unit 702 of
the host computing element 700 also features an expansion bus 716
providing communication between various devices and devices
attached to the system bus 714 via the bus bridge 718. A data
storage device 728, such as a magnetic disk 712 or optical disk
such as a CD-ROM and its corresponding drive may be coupled to the
host computing element 700 for storing data and instructions via
expansion bus 716. The host computing element 700 may also, in some
embodiments, be coupled via expansion bus 716 to a user interface
704 (or other display device), such as a cathode ray tube (CRT) or
a liquid crystal display (LCD), for displaying data to a computer
user such as the generated hypotheses 132, 134, 136 and associated
visual representations thereof (see FIGS. 6A-6C, as described
further herein). In some embodiments, the system may further
comprise an alphanumeric input device 708, including alphanumeric
and other keys, coupled to bus 716 for communicating information
and/or command selections to processor 720. Other types of user
input devices may also be provided as a component of and/or in
communication with the host computing element 700. Such input
devices may include cursor control device 706, such as a
conventional mouse, trackball, or cursor direction keys for
communicating direction information and command selection to
processor 720 and for controlling cursor movement on the user
interface 704. Such cursor control devices 706 may be especially
useful in allowing a user to select an interactive icon 615, 625
presented in one or more of the primary 610 and secondary 620
visual representations of the hypotheses 132, 134, 136 (as
depicted, for example, in FIGS. 6A-6C).
[0070] A communication device 726 may also be coupled to and/or in
communication with the bus 716 for accessing remote computers or
servers via the Internet or other network. Such remote computers
and/or servers may house, for example, one or more systems for
extracting relationships 10 and/or data repositories 20. The
communication device 726 may include, but is not limited to: a
modem; a network interface card; and/or other interface devices,
such as those used for interfacing with Ethernet, Token-ring, or
other types of networks. In any event, in this manner, the host
computing element 700 may be coupled to and/or in communication
with a number of servers via a network infrastructure. The
communication device 726 may enable one or more users to
selectively access the host computing element 700 so as to take
advantage of the relationship rules 111, 112, 113, 114 and/or
generated hypotheses 132, 134, 136 that may be generated according
to the various embodiments of the present invention.
[0071] In addition to providing systems and methods, the present
invention also provides computer program products for performing
the operations described above. The computer program products have
a computer readable storage medium having computer readable program
code embodied in the medium. With reference to FIG. 7B, the
computer readable storage medium may be part of the memory device
722, and may implement the computer readable program code to perform
the above discussed operations.
[0072] In this regard, FIGS. 1, 4A-4C, and 5 are block diagram,
flowchart and control flow illustrations of methods, systems and
program products according to exemplary embodiments of the
invention. It will be understood that each block or step of the
block diagram, flowchart and control flow illustrations, and
combinations of blocks in the block diagram, flowchart and control
flow illustrations, can be implemented by computer program
instructions. These computer program instructions may be loaded
onto a computer or other programmable apparatus to produce a
machine, such that the instructions which execute on the computer
or other programmable apparatus are capable of implementing the
functions specified in the block diagram, flowchart or control flow
block(s) or step(s). These computer program instructions may also
be stored in a computer-readable memory that can direct a computer
or other programmable apparatus to function in a particular manner,
such that the instructions stored in the computer-readable memory
produce an article of manufacture including instructions which
implement the function specified in the block diagram, flowchart or
control flow block(s) or step(s). The computer program instructions
may also be loaded onto a computer or other programmable apparatus
to cause a series of operational steps to be performed on the
computer or other programmable apparatus to produce a computer
implemented process such that the instructions which execute on the
computer or other programmable apparatus provide steps for
implementing the functions specified in the block diagram,
flowchart or control flow block(s) or step(s).
[0073] Accordingly, blocks or steps of the block diagram, flowchart
or control flow illustrations support combinations of steps for
performing the specified functions, and program instructions for
performing the specified functions. It will also be understood that
each block or step of the block diagram, flowchart or control flow
illustrations, and combinations of blocks or steps in the block
diagram, flowchart or control flow illustrations, can be
implemented by special purpose hardware-based computer systems
which perform the specified functions or steps, or combinations of
special purpose hardware and computer instructions.
[0074] Many modifications and other embodiments of the invention
will come to mind to one skilled in the art to which this invention
pertains having the benefit of the teachings presented in the
foregoing descriptions and the associated drawings. Therefore, it
is to be understood that the invention is not to be limited to the
specific embodiments disclosed and that modifications and other
embodiments are intended to be included within the scope of the
appended exemplary inventive concepts. Although specific terms are
employed herein, they are used in a generic and descriptive sense
only and not for purposes of limitation.
* * * * *