U.S. patent application number 15/141562 was filed with the patent office on 2016-04-28 and published on 2016-08-25 for method for entity-driven alerts based on disambiguated features.
The applicant listed for this patent is QBASE, LLC. Invention is credited to Scott LIGHTNER, Franz WECKESSER.
Application Number | 15/141562 |
Publication Number | 20160246794 |
Family ID | 53265486 |
Filed Date | 2016-04-28
United States Patent Application | 20160246794 |
Kind Code | A1 |
LIGHTNER; Scott; et al. | August 25, 2016 |
METHOD FOR ENTITY-DRIVEN ALERTS BASED ON DISAMBIGUATED FEATURES
Abstract
A method for entity-driven alerts based on disambiguated features
is disclosed. According to an embodiment, the disclosed method may
refer to entity-driven alerts based on trending or new knowledge of
a disambiguated feature. The alerts may be sent to a user when new
knowledge is discovered about the disambiguated feature, when a new
association (such as new related features, facts, quotations, or
topic IDs, among others) is found with the feature of interest,
and/or when new trending changes emerge about the feature of
interest. According to various embodiments, the method for
entity-driven alerts based on disambiguated features may reduce the
number of false positives resulting from a normal search query,
which, in turn, may increase the efficiency of monitoring, allowing
for a broadened universe of alerts.
Inventors: | LIGHTNER; Scott (Leesburg, VA); WECKESSER; Franz (Spring Valley, OH) |
Applicant: |
Name | City | State | Country | Type |
QBASE, LLC | Reston | VA | US | |
Family ID: | 53265486 |
Appl. No.: | 15/141562 |
Filed: | April 28, 2016 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
14558121 | Dec 2, 2014 | 9336280 |
15141562 | | |
61910773 | Dec 2, 2013 | |
Current U.S. Class: | 1/1 |
Current CPC Class: | G06F 16/355 20190101; G06F 16/24578 20190101 |
International Class: | G06F 17/30 20060101 G06F017/30 |
Claims
1. A method comprising: accessing, by a server, an electronic
document stored in an in-memory database; extracting, by the
server, a first feature from content data of the electronic
document having one or more features; disambiguating, by the
server, the first feature from the content data; comparing, by the
server, the first feature to a second feature stored in the
in-memory database; in response to the first feature matching the
second feature, comparing, by the server, the first feature to a
third feature stored in the in-memory database; in response to the
first feature not matching the third feature, determining, by the
server, if the first feature is representative of a knowledge new
to the in-memory database; in response of the first feature being
representative of the knowledge, updating, by the server, the
in-memory database with the first feature; generating, by the
server, a message informative of the first feature based on the
updating; and sending, by the server, the message to a client.
2. The method of claim 1, wherein the first feature is stored in a
first data structure of the in-memory database, and the second
feature is stored in a second data structure of the in-memory
database.
3. The method of claim 2, wherein the first data structure is
distinct from the second data structure.
4. The method of claim 1, further comprising: assigning, by the
server, a score to the first feature, wherein the score is
indicative of a level of confidence associated with a degree of
disambiguation based on the disambiguating, wherein the first
feature matches the second feature based on the score; storing, by
the server, the score in the in-memory database such that the score
is associated with the first feature; granting, by the server, a
read access for the score to the client.
5. The method of claim 1, wherein the updating is based on a
distance in text from a link location in the electronic document,
wherein the distance is based on a closeness in text to the link
location.
6. The method of claim 1, wherein the knowledge is indicative of an
association new to the in-memory database, wherein the association
is between the first feature with a fourth feature stored in the
in-memory database.
7. The method of claim 1, wherein the message is a first message,
and further comprising: associating, by the server, the first
feature with a plurality of documents stored in the in-memory
database; determining, by the server, a quantity of the documents;
accessing, by the server, a threshold stored in the in-memory
database, wherein the threshold is set via the client; determining,
by the server, if the quantity meets or exceeds the threshold; in
response to the quantity meeting or exceeding the threshold,
generating, by the server, a second message informative of at least
one of the meeting or the exceeding; and sending, by the server,
the second message to the client.
8. The method of claim 7, wherein the threshold comprises a daily
average number of the documents associated with the first feature
in the in-memory database.
9. The method of claim 1, wherein the disambiguating comprises
linking, by the server, the first feature to a fourth feature
stored in the in-memory database, wherein the in-memory database
stores a plurality of co-occurring features obtained from a
plurality of electronic documents comprising the electronic
document.
10. The method of claim 9, wherein the linking is dynamic based on
a predetermined factor.
11. A method comprising: accessing, by a server, an electronic
document stored in an in-memory database; extracting, by the
server, a first feature from the electronic document;
disambiguating, by a server, the first feature; comparing, by the
server, the first feature to a second feature stored in the
in-memory database; in response to the first feature matching the
second feature, comparing, by the server, the first feature to a
third feature stored in the in-memory database; in response to the
first feature not matching the third feature, determining, by the
server, if the first feature is representative of an association
new to the in-memory database, wherein the association is between
the first feature with a fourth feature stored in the in-memory
database; in response of the first feature being representative of
the association, updating, by the server, the in-memory database
with the first feature; generating, by the server, a message
informative of the first feature based on the updating; and
sending, by the server, the message to a client.
12. The method of claim 11, wherein the first feature is stored in
a first data structure in the in-memory database and the second
feature is stored in a second data structure in the in-memory
database.
13. The method of claim 12, wherein the first data structure is
distinct from the second data structure.
14. The method of claim 11, further comprising: assigning, by the
server, a score to the first feature, wherein the score is
indicative of a level of confidence associated with a degree of
disambiguation based on the disambiguating, wherein the first
feature matches the second feature based on the score; storing, by
the server, the score in the in-memory database such that the score
is associated with the first feature; granting, by the server, a
read access for the score to the client.
15. The method of claim 11, wherein the updating is based on a
distance in text from a link location in the electronic document,
wherein the distance is based on a closeness in text to the link
location.
16. The method of claim 11, wherein the message is a first message,
and further comprising: associating, by the server, the first
feature with a plurality of documents stored in the in-memory
database; determining, by the server, a quantity of the documents;
accessing, by the server, a threshold stored in the in-memory
database, wherein the threshold is set via the client; determining,
by the server, if the quantity meets or exceeds the threshold; in
response to the quantity meeting or exceeding the threshold,
generating, by the server, a second message informative of at least
one of the meeting or the exceeding; and sending, by the server,
the second message to the client.
17. The method of claim 16, wherein the threshold comprises a daily
average number of the documents associated with the first feature
in the in-memory database.
18. The method of claim 11, wherein the disambiguating comprises
linking, by the server, the first feature to a fourth feature
stored in the in-memory database, wherein the in-memory database
stores a plurality of co-occurring features obtained from a
plurality of electronic documents comprising the electronic
document.
19. The method of claim 18, wherein the linking is dynamic based on
a predetermined factor.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation of U.S. Non-Provisional
patent application Ser. No. 14/558,121, entitled "Method For Entity
Driven Alerts Based On Disambiguated Features," filed on Dec. 2,
2014, which claims a benefit of priority to U.S. Provisional
Application No. 61/910,773, entitled "Method For Entity-Driven
Alerts Based on Disambiguated Features," filed on Dec. 2, 2013, all
of which are fully incorporated herein by reference for all
purposes.
[0002] This application is related to U.S. application Ser. No.
14/558,254, entitled "Design And Implementation Of Clustered
In-Memory Database," filed Dec. 2, 2014; and U.S. application Ser.
No. 14/558,179, entitled "Alerting System Based On Newly
Disambiguated Features," filed Dec. 2, 2014; each of which is
hereby incorporated by reference in its entirety.
FIELD OF THE DISCLOSURE
[0003] The present disclosure relates in general to databases; and,
more particularly, to data management systems and alerting
systems.
BACKGROUND
[0004] A well-designed meta-analysis can provide valuable
information for researchers, policy-makers, or data analysts in
general. These users face an overwhelming amount of information,
even in narrow areas of interest. In response, search engines
designed to send alerts are frequently employed on large volumes of
information. However, there are many critical caveats in processing
and interpreting such large amounts of information, and thus many
ways in which meta-analyses can yield misleading information. To
further reduce information overload, users may only want to be
alerted when new trends emerge about an entity.
[0005] Searching for information about entities (e.g., people,
locations, organizations) in a large number of documents, including
sources such as a network, may often be ambiguous, which may lead
to imprecise text processing functions, imprecise association of
features during knowledge extraction, and, thus, imprecise data
analysis. Therefore, alerts based on keywords may be problematic
because references to named entities are ambiguous, and many
off-topic alerts may be provided in the search results. In
addition, people may not want to be alerted about everything
related to an entity, but only when new knowledge (new information)
about the entity is available.
[0006] Keyword search may not solve these problems, as keywords
alone do not readily support this kind of filtering.
[0007] Therefore, there is still a need for tailored alerts that
follow certain criteria to reduce results with misleading
information or false positives and to increase the efficiency of
monitoring, allowing for a broadened universe of alerts.
SUMMARY
[0008] An aspect of the present disclosure is a method for
entity-driven alerts based on disambiguated features. The method
may include a news feed, an entity disambiguation module, and an
alert database including one or more software modules.
[0009] A system for disambiguating features may include one or more
modules, such as one or more feature extraction modules, one or
more disambiguation modules, one or more scoring modules, and one
or more linking modules. Embodiments of a method for disambiguating
features may improve the accuracy of entity disambiguation beyond
what may be achieved without considering document linking. Taking
account of document linkage may allow better disambiguation by
considering document and entity relationships implied by links.
Additionally, the method for disambiguating features may be based
on topics. Topic-based disambiguation may allow one or more
features/entities of interest occurring in a document to be
disambiguated by extracting meaningful context from the document
(topics, entities, events, sentiment, and other features), and by
linking the co-occurrences of the extracted features (topics,
entities, etc.) using the knowledge base of co-occurring features.
[0010] The components within alert database (AD) may vary according
to the type of alert the user wants to receive. The AD may have at
least the components discussed below.
[0011] According to various embodiments, the AD may have a user
identifier to which the alerts are to be sent; a collection of
disambiguated features from which the user may select the feature
the user wants to monitor; an alert specification describing the
type of alert the user wants to receive; and a known-knowledge base
in which known knowledge about the feature of interest may be
stored. Any suitable method may be employed for the user to
communicate to the system which feature is of interest. According
to other embodiments, the AD may include other components, such as
a module that keeps a record of the number (or volume) and average
of documents related to the feature of interest, in case the type
of alert the user chooses is based on emerging trends about the
feature of interest.
[0012] Another aspect of the present disclosure may be an alerting
system based on new knowledge discovered about a feature of
interest, where an alert may be sent to a user when new information
or new knowledge (for instance, new topics or frequently
co-occurring entities) about the feature of interest is
discovered.
[0013] Another aspect of the present disclosure may be an alerting
system based on new associations between a feature and the feature
of interest, where an alert may be sent to a user when new types of
association are found between features and the feature of
interest.
[0014] Another aspect of the present disclosure may be an alerting
system based on new trends emerging about a feature of interest,
where an alert may be sent to a user when detecting new trending
changes on number of occurrences for the feature of interest.
Trending changes may include changes in the number of documents
(considered as the number of documents mentioned per day/week,
depending on the user specifications), changes in the average of
the number of documents per day, and changes in the number of
occurrences, among others.
[0015] By using entity disambiguation for the alert systems,
documents may be accurately determined to be associated with the
entity of interest, allowing the systems to alert users when new
information about a feature is available, but only when it is about
the correct feature of interest; i.e., the disclosed method
eliminates alerts on documents that mention a different feature
with the same name.
[0016] According to various embodiments, the method for
entity-driven alerts based on disambiguated features may reduce the
number of false positives resulting from state-of-the-art search
queries. This, in turn, may increase the efficiency of monitoring,
allowing for a broadened universe of alerts.
[0017] In one embodiment, a computer-implemented method comprises
disambiguating, by a disambiguation computer, a document feature
from an electronic document by way of extracting, by a feature
extraction computer, the document feature from the electronic
document, and linking, by a linking computer, the extracted
document feature to one or more document features stored in a
knowledge database of co-occurring document features of a plurality
of electronic documents; assigning, by a scoring computer, to the
disambiguated document feature a confidence score indicative of a
level of confidence associated with a degree of disambiguation of
the document feature; and adding, by an in-memory database
computer, the disambiguated document feature to the knowledge
database of co-occurring document features when the disambiguated
document feature matches a document feature of interest in an alert
database based at least in part on the confidence score.
[0018] In another embodiment, a system comprises a disambiguation
computer configured to disambiguate a document feature from an
electronic document by being further configured to extract a
document feature from an electronic document, and link the
extracted document feature to one or more document features stored
in a knowledge database of co-occurring document features of a
plurality of electronic documents; a scoring computer configured to
assign to the disambiguated document feature a confidence score
indicative of a level of confidence associated with a degree of
disambiguation of the document feature; and an in-memory database
computer configured to add the disambiguated document feature to
the knowledge database of co-occurring document features when the
disambiguated document feature matches a document feature of
interest in an alert database based at least in part on the
confidence score.
[0019] Numerous other aspects, features and benefits of the present
disclosure may be made apparent from the following detailed
description taken together with the drawing figures.
BRIEF DESCRIPTION OF THE DRAWINGS
[0020] The present disclosure can be better understood by referring
to the following figures. The components in the figures are not
necessarily to scale, emphasis instead being placed upon
illustrating the principles of the disclosure. In the figures,
reference numerals designate corresponding parts throughout the
different views.
[0021] FIG. 1 is a diagram of a system for disambiguating features,
according to an embodiment.
[0022] FIG. 2 is a flowchart for an alerting method based on new
knowledge discovered about a feature of interest, according to an
embodiment.
[0023] FIG. 3 is a flowchart for an alerting method based on new
associations with a feature of interest, according to an
embodiment.
[0024] FIG. 4 is a flowchart for an alerting method based on new
trends emerging about a feature of interest, according to an
embodiment.
DEFINITIONS
[0025] As used here, the following terms may have the following
definitions:
[0026] "Database" refers to any system including any combination of
clusters and modules suitable for storing one or more collections
and suitable to process one or more queries.
[0027] "Document" refers to a discrete electronic representation of
information having a start and end.
[0028] "Corpus" refers to a collection of one or more
documents.
[0029] "Feature" refers to any information which is at least
partially derived from a document.
[0030] "Feature attribute" refers to metadata associated with a
feature; for example, location of a feature in a document,
confidence score, among others.
[0031] "Feature extraction" refers to information processing
methods for extracting information such as names, places, and
organizations.
[0032] "Fact" refers to objective relationships between
features.
[0033] "Knowledge Base" refers to a base containing
features/entities.
[0034] "Live corpus", or "Document Stream", refers to a corpus that
is constantly fed as new documents are uploaded into a network.
[0035] "Memory" refers to any hardware component suitable for
storing information and retrieving said information at a
sufficiently high speed.
[0036] "In-Memory Database", or "MEMDB", refers to a database in
which all records are stored in memory.
[0037] "Module" refers to a computer hardware or software
component suitable for carrying out one or more tasks.
[0038] "Link on-the-fly module" refers to any linking module that
performs data linkage as data is requested from the system rather
than as data is added to the system.
[0039] "Topic" refers to a set of thematic information which is at
least partially derived from a corpus.
[0040] "Topic Model" refers to a hypothetical description of a
complex entity or process.
[0041] "Query" refers to a request to retrieve information from one
or more suitable databases.
DETAILED DESCRIPTION
[0042] The present disclosure is here described in detail with
reference to embodiments illustrated in the drawings, which form a
part hereof. Other embodiments may be used and/or other changes may
be made without departing from the spirit or scope of the present
disclosure. The illustrative embodiments described in the detailed
description are not meant to be limiting of the subject matter
presented here.
[0043] The present disclosure describes a method for entity-driven
alerts based on disambiguated features. According to various
embodiments, the disclosed method for entity-driven alerts may be
based on different filters based on criteria specified by a user,
who is interested in receiving information about a feature of
interest. The criteria may include restrictions, such as new
knowledge of a disambiguated feature, new associations with the
disambiguated feature, or new trends about the disambiguated
feature, among others.
[0044] According to various embodiments, the disambiguated features
on which the method for entity-driven alerts is based may be
disambiguated by any of a plurality of suitable methods. According
to one embodiment, a system for disambiguating features may include
multiple computer modules, such as one or more feature extraction
modules, one or more disambiguation modules, one or more scoring
modules, and one or more linking modules. The method for
disambiguating features may improve the accuracy of entity
disambiguation beyond what may be achieved without considering
document linking. Taking account of document linkage may allow
better disambiguation by considering document and entity
relationships implied by links.
[0045] According to various embodiments, the types of features
extracted by the method for disambiguating features may include
topic IDs, with multiple modules employed to combine extracted
entities. The topics may be machine generated (not human generated)
and thus may be derived directly from a corpus.
[0046] According to one embodiment, the disclosed method may
identify topic relatedness of new and existing topic IDs employing
one or more disambiguating modules including one or more
disambiguating algorithms, forming a normalized set of topic
IDs.
[0047] According to various embodiments, the disclosed method may
include a construction of a knowledge base to extract meaningful
context from each document in a massive corpus using multiple topic
models with differing levels of granularity to classify documents
to topics, feature and entity extraction, event extraction, fact
extraction, and sentiment extraction, among others.
[0048] Topic-based disambiguation may allow one or more
features/entities of interest occurring in a document to be
disambiguated by extracting meaningful context from the document
(topics, entities, events, sentiment, and other features), and by
linking the co-occurrences of the extracted features (topics,
entities, etc.) using the knowledge base of co-occurring features.
[0049] Thus, the disclosed method may have an improved accuracy of
feature disambiguation by establishing more accurate relationships
between entities and documents, by considering only the entities
which occur closer in text to the link location in the source
document. This may increase the possibility of deriving useful
relationships from long documents having many entities, which would
complicate a typical entity disambiguation algorithm by introducing
a large number of irrelevant co-occurrences. Similarly, the method
may potentially handle documents that have occurrences of entities
with different disambiguations. The disambiguation algorithm may
generate different associations for different features.
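The proximity restriction described above may be sketched as follows. This is a minimal illustration only, assuming that each entity mention carries a character offset and that closeness is measured as an absolute character distance from the link location; the names `Mention`, `entities_near_link`, and `max_distance` are hypothetical and do not appear in the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """A hypothetical entity mention with its character offset in the document."""
    entity: str
    offset: int


def entities_near_link(mentions, link_offset, max_distance):
    """Keep only mentions within max_distance characters of the link location.

    The distance measure (absolute character offset) is an assumption made
    for illustration; the disclosure does not fix a particular measure.
    """
    return [m for m in mentions if abs(m.offset - link_offset) <= max_distance]


mentions = [
    Mention("QBASE", 40),
    Mention("Reston", 120),
    Mention("Leesburg", 5000),
]
# Only entities close to the link at offset 100 are kept for disambiguation.
nearby = entities_near_link(mentions, link_offset=100, max_distance=200)
print([m.entity for m in nearby])
```

In a long document, filtering in this way may keep distant, unrelated entities from contributing irrelevant co-occurrences.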
[0050] The novelty of the alert mechanism can be based on
"disambiguated features" that a user can specify around an already
existing and disambiguated feature of interest (which can be any
feature, such as an entity or a topic) in the knowledge base.
Whereas conventional alert systems may be based on keyword search,
here a "disambiguated feature" guides the alert, providing better
relevance and precision. The alert mechanism described herein can
provide a way to detect and communicate emerging trends related to
a "feature of interest," new associations to the "feature of
interest," and new knowledge discovered about the "feature of
interest," beyond just a group of documents that mention the
"feature of interest." The methods can also provide a system-wide
knowledge base update process, supported by a dynamic on-the-fly
linking mechanism in an in-memory database and driven by an
individual user's alert query. Thus, this feature can provide a
framework to support collaborative knowledge sharing among
different users in a given system establishment.
[0051] System for Disambiguating Features
[0052] FIG. 1 is a block diagram of a system 100 for disambiguating
features, according to an embodiment. In the system 100 for
disambiguating features a new document 102 is input into the
system, such as into a feature extraction module 104, which
performs feature extraction from the document 102. The new document
102 may be fed from any suitable source, such as a massive corpus
or live corpus of documents that may have a continuous input of
documents, e.g. from an internet or network connection 106
(NC).
[0053] One or more feature recognition and extraction algorithms
may be employed by the feature extraction module 104 to analyze the
document 102. A score may be assigned to each extracted feature.
The score may indicate the level of certainty of the feature being
correctly extracted and linked with the correct attributes.
Additionally, during feature extraction by the module 104, one or
more primary features may be identified from document 102. Each
primary feature may have been associated with a set of feature
attributes and one or more secondary features (like proximity
cluster of co-occurring features like entities).
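One possible in-memory shape for an extracted feature, with its score, feature attributes, and secondary features, may be sketched as follows. The class name `ExtractedFeature`, its field names, and the assumed score range of 0 to 1 are illustrative assumptions, not part of the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractedFeature:
    """Hypothetical record for one feature extracted from a document."""
    name: str
    feature_type: str   # e.g. "person", "organization", "topic"
    score: float        # extraction confidence; a 0..1 range is assumed here
    attributes: dict = field(default_factory=dict)  # feature attributes (metadata)
    secondary: list = field(default_factory=list)   # co-occurring secondary features


# A primary feature with one attribute and one secondary feature.
primary = ExtractedFeature(
    name="John Smith",
    feature_type="person",
    score=0.92,
    attributes={"offset": 132},
    secondary=[ExtractedFeature("Acme Corp", "organization", 0.81)],
)
print(primary.name, [s.name for s in primary.secondary])
```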
[0054] During the process of disambiguation, the system may be
constantly receiving new knowledge, updated by users 108, that is
not pre-linked in a static way; thus, the number of documents to be
evaluated may keep increasing without bound. This may be handled
through the use of MEMDB module 110. The MEMDB module 110 may allow
a faster disambiguation process, and may allow Link On-the-Fly
(OTF) processing through link OTF module 112, which makes it
possible to obtain the latest information that is going to
contribute to MEMDB 110. The disclosed link OTF module 112 may be
capable of constantly evaluating, scoring, linking, and clustering
a feed of information.
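The idea of linking on the fly, that is, linking as data is requested rather than as data is added, may be sketched as follows. This is a simplified illustration, assuming a plain dictionary as a stand-in for the in-memory database and co-occurrence counting as the linking step; the class `OnTheFlyLinker` and its methods are hypothetical.

```python
from collections import defaultdict


class OnTheFlyLinker:
    """Hypothetical stand-in for a link OTF module over an in-memory store."""

    def __init__(self):
        self._features_by_doc = {}  # doc_id -> set of feature names

    def add_document(self, doc_id, features):
        # No links are computed at insertion time; features are only stored.
        self._features_by_doc[doc_id] = set(features)

    def linked_features(self, feature):
        """At query time, count the features that co-occur with `feature`."""
        counts = defaultdict(int)
        for features in self._features_by_doc.values():
            if feature in features:
                for other in features - {feature}:
                    counts[other] += 1
        return dict(counts)


linker = OnTheFlyLinker()
linker.add_document("d1", ["A", "B"])
first = linker.linked_features("A")
linker.add_document("d2", ["A", "B", "C"])
second = linker.linked_features("A")  # reflects d2 without any re-linking pass
print(first, second)
```

Because links are derived when requested, a query issued after a new document arrives may already reflect that document.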
[0055] Any suitable method for linking the features may be
employed. Such a method may essentially use a weighted model to
determine which feature types are most important and therefore
carry more weight and, based on confidence scores, to determine the
confidence level of feature extraction by feature extraction module
104 and the confidence level of feature disambiguation by feature
disambiguation module 114 with regard to the correct features.
Consequently, the correct feature may go into the resulting cluster
of features. As more nodes work in parallel, the process may become
more efficient. The result of the aforementioned processes may be
output as one or more newly disambiguated features 116.
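A weighted model of this kind may be sketched as follows. The weights in `TYPE_WEIGHTS`, the scoring rule (weight multiplied by confidence, summed over shared features), and the function names are assumptions chosen for illustration; the disclosure does not specify concrete weights or a formula.

```python
# Hypothetical per-type weights; the disclosure gives no concrete values.
TYPE_WEIGHTS = {"topic": 1.0, "organization": 2.0, "location": 1.5}


def link_score(doc_features, candidate_features):
    """Sum weight * confidence over features shared with a candidate.

    doc_features: dict mapping (type, name) -> extraction confidence.
    candidate_features: set of (type, name) known to co-occur with the candidate.
    """
    total = 0.0
    for (ftype, name), confidence in doc_features.items():
        if (ftype, name) in candidate_features:
            total += TYPE_WEIGHTS.get(ftype, 1.0) * confidence
    return total


def best_candidate(doc_features, knowledge_base):
    """Return the candidate ID with the highest score, or None if nothing overlaps."""
    scored = {cid: link_score(doc_features, feats)
              for cid, feats in knowledge_base.items()}
    best = max(scored, key=scored.get, default=None)
    return best if best is not None and scored[best] > 0 else None


doc = {("organization", "QBASE"): 0.9, ("location", "Reston"): 0.8}
kb = {
    "smith#1": {("organization", "QBASE"), ("location", "Reston")},
    "smith#2": {("location", "Reston")},
}
print(best_candidate(doc, kb))
```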
[0056] By using feature disambiguation modules, documents may be
accurately determined to be associated with the entity of interest,
which may allow the system to alert users when new information
about a feature is available, but only when it is about the correct
feature of interest.
[0057] After feature disambiguation 114 of new document 102 has
been performed, the extracted new features may be included in MEMDB
110 to pass through link OTF module 112, where the features may be
compared and linked, and an ID of disambiguated feature 116 may be
returned to a user as a result of a query. In addition to the ID,
the resulting feature cluster defining the disambiguated feature
116 may optionally be returned.
[0058] Once features are disambiguated, the number of alerts to be
sent to a user may be further reduced by letting the user specify
additional restrictions.
[0059] Disambiguated features 116 may then be included in an Alert
Database (AD). Components within AD may vary according to the type
of alert the user wants to receive. The AD may have at least the
following components.
TABLE 1. Alert database.
User ID | Feature of Interest | Alert Specifications | Known-Knowledge Base
[0060] According to various embodiments, the AD may have a user
identifier to which the alerts are to be sent; a collection of
disambiguated features from which the user may select the feature
the user wants to monitor; an alert specification describing the
type of alert the user wants to receive; and a known-knowledge base
in which known knowledge about the feature of interest may be
stored. Any suitable method may be employed for the user to
communicate to the system which feature is of interest.
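The four components of the AD may be represented, for example, as one record per user and feature of interest. The following is a minimal sketch; the names `AlertRecord` and `AlertDatabase`, the string-valued alert specifications, and the use of a set for the known-knowledge base are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AlertRecord:
    """One hypothetical AD row: user, feature of interest, spec, known knowledge."""
    user_id: str
    feature_of_interest: str   # ID of a disambiguated feature
    alert_specification: str   # e.g. "new_knowledge", "new_association", "trend"
    known_knowledge: set = field(default_factory=set)


class AlertDatabase:
    """Hypothetical index of alert records keyed by feature of interest."""

    def __init__(self):
        self._records = {}  # feature ID -> list of AlertRecord

    def subscribe(self, record):
        self._records.setdefault(record.feature_of_interest, []).append(record)

    def records_for(self, feature_id):
        # Return a copy so callers cannot modify the index by accident.
        return list(self._records.get(feature_id, []))


ad = AlertDatabase()
ad.subscribe(AlertRecord("user-1", "smith#1", "new_knowledge"))
ad.subscribe(AlertRecord("user-2", "smith#1", "trend"))
print([r.user_id for r in ad.records_for("smith#1")])
```

Keying the records by feature ID may make it cheap to find every user to notify once a newly disambiguated feature arrives.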
[0061] According to various embodiments, a feature of interest may
include a person, a phone number, a place, a company, among
others.
[0062] The types of alert the user may select from may include
alerts by e-mail, phone number, or any other type of contact point
through which the system may reach the user.
[0063] According to one embodiment, the known-knowledge base may be
stored in another MEMDB. The known-knowledge base may have any
suitable structure, which may be processed by any suitable
algorithms, and may hold content such as associated topics,
proximity clusters of other features (like entities or events), a
derived prominence factor (based on simple frequency counts or on a
weighted association with global events and importance
automatically captured via a large time-bound corpus), and
temporally linked events, among others. Additionally, the
known-knowledge base may include restrictions on which the alerts
are to be based, according to the user specifications. Knowledge
within the known-knowledge base may have any suitable
representation, such as incremental graphs, among others.
[0064] According to an embodiment of the present disclosure,
restrictions within known-knowledge base may include a selection or
criteria that may be here classified as new knowledge about a
feature of interest, new association with the feature of interest,
and new trends emerging about the feature of interest, among
others.
[0065] According to other embodiments, the AD may include other
components, such as a module that keeps a record of the number (or
volume) and average of documents related to the feature of
interest, in case the type of alert the user chooses is based on
emerging trends about the feature of interest.
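A trend check built on such a record of document counts may be sketched as follows, using the daily average of documents as the threshold, as recited in the claims. The function name `trend_alert` and the `factor` multiplier are hypothetical; the multiplier is an added assumption that lets a user require more than the plain average.

```python
def trend_alert(daily_counts, today_count, factor=1.0):
    """Return True when today's count meets or exceeds factor * daily average.

    daily_counts: past per-day document counts for the feature of interest.
    factor: hypothetical multiplier on the average (1.0 = the plain average).
    """
    if not daily_counts:
        return False  # no history yet, so no trend can be established
    average = sum(daily_counts) / len(daily_counts)
    return today_count >= factor * average


history = [4, 6, 5]  # average of 5 documents per day
print(trend_alert(history, 5))             # meets the average
print(trend_alert(history, 4))             # below the average
print(trend_alert(history, 9, factor=2))   # below twice the average
```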
[0066] Alert when New Knowledge is Discovered about the Feature of
Interest
[0067] After disambiguated feature 116 is obtained, disambiguated
feature 116 may be sent to AD to be compared with the feature of
interest previously selected by the user. If disambiguated features
116 match with the feature of interest, disambiguated features 116
are included within known-knowledge base.
[0068] FIG. 2 is a flowchart of alerting method 200 based on new
knowledge discovered about a feature of interest, according to one
embodiment.
[0069] According to one embodiment, in step 202, when a new
document is input into the system, the document is processed by a
disambiguation module where features within the document are
extracted 104 and disambiguated 114. In step 204, disambiguated
feature 116 may be subsequently sent to the Alert Database to be
compared, in step 206, with the existing knowledge included within
the known-knowledge base in order to determine the relationship
between disambiguated features 116 and the feature of interest. In
step 208, if disambiguated feature 116 does not match the feature
of interest, then the process may end 210.
[0070] If disambiguated features 116 match, step 208, the feature
of interest, disambiguated features 116 are compared, in step 212,
with the knowledge within known-knowledge base to determine if
there is a match between the new features and the already extracted
features that form part of the known-knowledge base, for example,
if an alert has already been sent about that knowledge. If there is
no new knowledge, step 214, the process may end, step 216. If, in
step 214, new knowledge is found, the known-knowledge base is
updated, step 218, and an alert is sent, step 220, to the user by
the specified notification method, for example email or mobile
device messaging, among others.
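The flow of alerting method 200 can be sketched as follows. This is a
minimal sketch under stated assumptions: the function name
alert_on_new_knowledge, the (feature ID, fact) pair format, the use of a
plain set as the known-knowledge base, and the notify callback are all
hypothetical and stand in for whatever the disambiguation module and
Alert Database actually provide.

```python
def alert_on_new_knowledge(doc_features, feature_of_interest, known_facts, notify):
    """Hypothetical sketch of steps 206-220 for one processed document.

    doc_features: iterable of (feature_id, fact) pairs (assumed format).
    known_facts:  set of facts already alerted on (assumed representation).
    notify:       callable that delivers one alert message (assumed).
    Returns the list of facts for which alerts were sent.
    """
    # Step 208: keep only facts attached to the feature of interest.
    facts = [fact for fid, fact in doc_features if fid == feature_of_interest]
    if not facts:
        return []  # step 210: no match, the process ends

    # Steps 212/214: drop duplicates, then drop facts already known.
    new_facts = [f for f in dict.fromkeys(facts) if f not in known_facts]
    if not new_facts:
        return []  # step 216: nothing new, the process ends

    known_facts.update(new_facts)  # step 218: update the knowledge base
    for fact in new_facts:
        notify(f"New knowledge about {feature_of_interest}: {fact}")  # step 220
    return new_facts


sent = []
known = {"appears in sports magazines"}
doc = [
    ("john_doe#football", "appears in an economics magazine"),
    ("jane_roe#senator", "gave a speech"),
]
first = alert_on_new_knowledge(doc, "john_doe#football", known, sent.append)
second = alert_on_new_knowledge(doc, "john_doe#football", known, sent.append)
```

Processing the same document a second time sends nothing, because the
fact was added to the known set on the first pass.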
[0071] Alert when the Feature of Interest has a New Association to
New Features
[0072] After disambiguated feature 116 is obtained, disambiguated
feature 116 may be sent to AD to be compared with the feature of
interest previously selected by the user. If disambiguated features
116 match with the feature of interest, disambiguated features 116
are included within known-knowledge base.
[0073] FIG. 3 is a flowchart of alerting method 300 based on new
association discovered about the feature of interest, according to
one embodiment.
[0074] According to one embodiment, when a new document is input,
step 302, into the system, the document is processed by a
disambiguation module where features within the document are
extracted 104 and disambiguated 114. Disambiguated feature 116 may
be subsequently sent, step 304, to the Alert Database to be
compared, step 306, with the existing knowledge included within the
known-knowledge base in order to determine if there is an
association between disambiguated features 116 and the feature of
interest. If disambiguated feature 116 does not match, step 308,
the feature of interest, then the process ends in step 310.
[0075] If disambiguated features 116 match, in step 308, the
feature of interest, disambiguated features 116 are compared, step
312, with the knowledge within known-knowledge base to determine if
there is a match between the new features and the already extracted
features that form part of the known-knowledge base, for example,
if an alert has already been sent about that knowledge. If there is
no new association, step 314, the process ends, step 316. If a new
association is found in step 314, the known-knowledge base is
updated in step 318 and an alert is sent in step 320 to the user by
the specified notification method, for example email or mobile
device messaging, among others.
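Alerting method 300 can be sketched in a similar way. The sketch below
assumes, purely for illustration, that an association means two
disambiguated features occurring in the same document; the function
name alert_on_new_association, the feature identifiers, and the notify
callback are hypothetical.

```python
def alert_on_new_association(doc_feature_ids, feature_of_interest,
                             known_associations, notify):
    """Hypothetical sketch of steps 306-320 for one processed document.

    Assumption: co-occurrence in a document counts as an association.
    known_associations: set of feature IDs already associated (assumed).
    Returns the set of newly associated feature IDs.
    """
    ids = set(doc_feature_ids)
    if feature_of_interest not in ids:
        return set()  # steps 308/310: no match, the process ends

    # Steps 312/314: other features in the document not yet associated.
    new = (ids - {feature_of_interest}) - known_associations
    if not new:
        return set()  # step 316: no new association, the process ends

    known_associations.update(new)  # step 318: update the knowledge base
    for other in sorted(new):
        notify(f"{feature_of_interest} is newly associated with {other}")  # step 320
    return new


sent = []
known = {"music_concerts", "company:Re"}
doc = ["john_doe#musician", "company:Fa", "music_concerts"]
found = alert_on_new_association(doc, "john_doe#musician", known, sent.append)
again = alert_on_new_association(doc, "john_doe#musician", known, sent.append)
```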
[0076] Alert when New Trends About the Feature of Interest
Emerge
[0077] After disambiguated feature 116 is obtained, disambiguated
feature 116 may be sent to AD to be compared with the feature of
interest previously selected by the user. If disambiguated features
116 match with the feature of interest, disambiguated features 116
are included within known-knowledge base.
[0078] FIG. 4 is a flowchart of alerting method 400 based on new
trends emerging about the feature of interest, according to one
embodiment.
[0079] According to one embodiment, when a new document is input,
in step 402, into the system, the document is processed by a
disambiguation module where features within the document are
extracted 104 and disambiguated 114. Disambiguated feature 116 may
be subsequently sent, in step 404, to the Alert Database to be
compared, in step 406, with the existing knowledge included within
the known-knowledge base in order to determine if there is an
association between disambiguated features 116 and the feature of
interest. If in step 408, the disambiguated feature 116 does not
match the feature of interest, then the process may end, step
410.
[0080] If disambiguated features 116 match, step 408, the feature
of interest, the documents including disambiguated features 116 of
interest are counted and an indicator of the total number of such
documents is stored in the Alert Database to check, in step 412,
if the volume of documents about the feature of interest is greater
than the daily average. If the volume of documents is not greater,
step 414, than the average, the process may end, step 416. If the
volume of documents is greater, step 414, than the average, the
known-knowledge base is updated, step 418, and an alert is sent, in
step 420, to the user by the specified notification method, for
example email or mobile device messaging, among others.
[0081] According to one embodiment, the volume may be considered as
the number of documents mentioning the feature of interest per day,
week, or month, among others, depending on the user specifications.
[0082] According to other embodiments, the volume may be considered
as the number of occurrences of the feature of interest.
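The volume check of alerting method 400 can be sketched as a comparison
of the current period's count against the mean of earlier periods. This
is a minimal sketch under stated assumptions: the function name
alert_on_trend, the list-based history, and the choice to record every
period's count (so that later averages include it) are hypothetical and
not taken from the disclosure.

```python
from statistics import mean


def alert_on_trend(history, current_count, feature_of_interest, notify):
    """Hypothetical sketch of steps 412-420 for one period (e.g. one day).

    history:       list of counts for earlier periods (assumed storage).
    current_count: documents (or occurrences) seen in the current period.
    Returns True if an alert was sent.
    """
    # Steps 412/414: is the current volume greater than the average so far?
    trending = bool(history) and current_count > mean(history)

    # Assumption: every period's count is recorded for future averages.
    history.append(current_count)

    if trending:
        notify(f"Volume for {feature_of_interest} rose to {current_count}")  # step 420
    return trending


sent = []
counts = [50, 50]
day_three = alert_on_trend(counts, 80, "john_doe#activist", sent.append)
day_four = alert_on_trend(counts, 55, "john_doe#activist", sent.append)
```

Here the third period (80) exceeds the prior average of 50, so an alert
is sent; the fourth period (55) is below the updated average of 60, so
none is sent. Whether the period is a day, week, or month is left to the
caller.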
[0083] According to various embodiments, method for entity-driven
alerts based on disambiguated features may reduce the number of
false positives resulting from state-of-the-art search queries.
This, in turn, may increase the efficiency of monitoring, allowing
for a broadened universe of alerts.
[0084] Example #1 is an embodiment of alerting method 200, where a
user is interested in finding new knowledge about John Doe, the
football player. In this embodiment, the known knowledge is that
John Doe appears in sports magazines; but, after a method for
disambiguating features is applied to a new document input in step
202, the same John Doe is found to appear in an economics magazine.
As the known-knowledge base has no record of this John Doe
appearing in economics magazines, an alert is sent, in step 220, to
the user.
[0085] Example #2 is an embodiment of alerting method 300, where a
user is interested in finding new associations with John Doe, the
musician. In this embodiment, the known knowledge is that John Doe
has been associated with music concerts and with a music company
named "Re"; but, after a disambiguation method is applied to a new
document input in step 302, the same John Doe is found to be
associated with a music company named "Fa." As the known-knowledge
base has no record of this John Doe's association with "Fa," an
alert is sent, in step 320, to the user.
[0086] Example #3 is an embodiment of alerting method 400, where a
user is interested in keeping track of the trend changes in
documents about John Doe, an environmental activist. In this
embodiment, the known knowledge is that the average number of
documents mentioning the same John Doe is 50; but, after applying a
disambiguation method and employing the AD to monitor the total
mentions of the same John Doe per day, the average of mentions on
the third day of monitoring has been calculated to be 80. As the
average of mentions on the third day is greater than the average of
mentions during the past days, an alert is sent, in step 420, to
the user.
[0087] The foregoing method descriptions and the process flow
diagrams are provided merely as illustrative examples and are not
intended to require or imply that the steps of the various
embodiments must be performed in the order presented. As will be
appreciated by one of skill in the art, the steps in the foregoing
embodiments may be performed in any order. Words such as "then,"
"next," etc. are not intended to limit the order of the steps;
these words are simply used to guide the reader through the
description of the methods. Although process flow diagrams may
describe the operations as a sequential process, many of the
operations can be performed in parallel or concurrently. In
addition, the order of the operations may be re-arranged. A process
may correspond to a method, a function, a procedure, a subroutine,
a subprogram, etc. When a process corresponds to a function, its
termination may correspond to a return of the function to the
calling function or the main function.
[0088] The various illustrative logical blocks, modules, circuits,
and algorithm steps described in connection with the embodiments
disclosed here may be implemented as electronic hardware, computer
software, or combinations of both. To clearly illustrate this
interchangeability of hardware and software, various illustrative
components, blocks, modules, circuits, and steps have been
described above generally in terms of their functionality. Whether
such functionality is implemented as hardware or software depends
upon the particular application and design constraints imposed on
the overall system. Skilled artisans may implement the described
functionality in varying ways for each particular application, but
such implementation decisions should not be interpreted as causing
a departure from the scope of the present invention.
[0089] Embodiments implemented in computer software may be
implemented in software, firmware, middleware, microcode, hardware
description languages, or any combination thereof. A code segment
or machine-executable instructions may represent a procedure, a
function, a subprogram, a program, a routine, a subroutine, a
module, a software package, a class, or any combination of
instructions, data structures, or program statements. A code
segment may be coupled to another code segment or a hardware
circuit by passing and/or receiving information, data, arguments,
parameters, or memory contents. Information, arguments, parameters,
data, etc. may be passed, forwarded, or transmitted via any
suitable means including memory sharing, message passing, token
passing, network transmission, etc.
[0090] The actual software code or specialized control hardware
used to implement these systems and methods is not limiting of the
invention. Thus, the operation and behavior of the systems and
methods were described without reference to the specific software
code, it being understood that software and control hardware can be
designed to implement the systems and methods based on the
description here.
[0091] When implemented in software, the functions may be stored as
one or more instructions or code on a non-transitory
computer-readable or processor-readable storage medium. The steps
of a method or algorithm disclosed here may be embodied in a
processor-executable software module which may reside on a
computer-readable or processor-readable storage medium.
Non-transitory computer-readable or processor-readable media
include both computer storage media and tangible storage media
that facilitate transfer of a computer program from one place to
another. Non-transitory processor-readable storage media may be
any available media that may be accessed by a computer. By way of
example, and not limitation, such non-transitory processor-readable
media may comprise RAM, ROM, EEPROM, CD-ROM or other optical disk
storage, magnetic disk storage or other magnetic storage devices,
or any other tangible storage medium that may be used to store
desired program code in the form of instructions or data structures
and that may be accessed by a computer or processor. Disk and disc,
as used here, include compact disc (CD), laser disc, optical disc,
digital versatile disc (DVD), floppy disk, and Blu-ray disc, where
disks usually reproduce data magnetically, while discs reproduce
data optically with lasers. Combinations of the above should also
be included within the scope of computer-readable media.
Additionally, the operations of a method or algorithm may reside as
one or any combination or set of codes and/or instructions on a
non-transitory processor-readable medium and/or computer-readable
medium, which may be incorporated into a computer program
product.
[0092] The preceding description of the disclosed embodiments is
provided to enable any person skilled in the art to make or use the
present invention. Various modifications to these embodiments will
be readily apparent to those skilled in the art, and the generic
principles defined here may be applied to other embodiments without
departing from the spirit or scope of the invention. Thus, the
present invention is not intended to be limited to the embodiments
shown here but is to be accorded the widest scope consistent with
the following claims and the principles and novel features
disclosed here.
* * * * *