U.S. patent number 11,100,065 [Application Number 16/165,242] was granted by the patent office on 2021-08-24 for tools and techniques for extracting knowledge from unstructured data retrieved from personal data sources.
This patent grant is currently assigned to SALESFORCE.COM, INC. The grantee listed for this patent is salesforce.com, inc. Invention is credited to Thierry Donneau-Golencer, Corey Hulen, William Scott Mark, Kenneth C. Nitz, Rajan Singh, Madhu Yarlagadda.
United States Patent 11,100,065
Donneau-Golencer, et al.
August 24, 2021
Tools and techniques for extracting knowledge from unstructured data retrieved from personal data sources
Abstract
A system may include multiple personal data sources and a
machine-implemented data extractor and correlator configured to
retrieve personal data from at least one of the personal data
sources. The data extractor and correlator may extract information
from unstructured data within the retrieved personal data and
correlate the extracted information with previously stored
structured data to generate additional structured data. The system
may also include a storage device configured to store the
previously stored structured data and the additional structured
data. A natural language query module may be configured to receive
a natural language query from a user and provide a response to the
natural language query based at least in part on one or both of the
previously stored structured data and the additional structured
data.
Inventors: Donneau-Golencer; Thierry (Menlo Park, CA), Singh; Rajan (San Jose, CA), Yarlagadda; Madhu (Los Altos, CA), Hulen; Corey (Menlo Park, CA), Nitz; Kenneth C. (Redwood City, CA), Mark; William Scott (San Mateo, CA)
Applicant:
Name | City | State | Country | Type
salesforce.com, inc. | San Francisco | CA | US |
Assignee: SALESFORCE.COM, INC. (San Francisco, CA)
Family ID: 48173484
Appl. No.: 16/165,242
Filed: October 19, 2018
Prior Publication Data
Document Identifier | Publication Date
US 20190050432 A1 | Feb 14, 2019
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number | Issue Date
15234871 | Aug 11, 2016 | 10140322 |
13287983 | Nov 2, 2011 | 9443007 |
Current U.S. Class: 1/1
Current CPC Class: G06F 16/24522 (20190101); G06F 16/22 (20190101); G06F 16/3329 (20190101); G06F 16/337 (20190101)
Current International Class: G06F 7/00 (20060101); G06F 16/22 (20190101); G06F 16/335 (20190101); G06F 16/332 (20190101); G06F 16/2452 (20190101)
References Cited
Other References
Heidorn, "Natural Language Dialogue for Managing an On-Line Calendar", Proceedings of the 1978 Annual Conference, ACM, 1978, pp. 45-52. cited by applicant.
Modi, et al., "CMRadar: A Personal Assistant Agent for Calendar Management", Department of Computer Science, Carnegie Mellon University, Springer-Verlag Berlin Heidelberg, 2005, pp. 169-181. cited by applicant.
Schwabe Williamson & Wyatt, PC Listing of Related cases; Oct. 24, 2018; 2 pages. cited by applicant.
"Google Plus Users", Google+Ripples; Oct. 31, 2011; 3 pages. cited by applicant.
Primary Examiner: Cheema; Azam M
Attorney, Agent or Firm: Schwabe Williamson & Wyatt
Parent Case Text
The application is a divisional application of U.S. patent
application Ser. No. 15/234,871, filed Aug. 11, 2016, which is a
continuation patent application of U.S. patent application Ser. No.
13/287,983, filed Nov. 2, 2011, now U.S. Pat. No. 9,443,007, issued
Sep. 13, 2016, all of which are herein incorporated by reference in
their entirety.
Claims
The invention claimed is:
1. A database system for updating user profiles, comprising: a
processor to: store in a knowledge store user profile data in the
user profiles; retrieve personal data from personal data sources;
correlate the personal data with the user profile data to generate
additional user profile data, the processor further to: retrieve
unstructured data from different user applications, correlate the
unstructured data from the different user applications with the
user profile data in the user profiles to generate at least some of
the additional user profile data; update the user profiles with the
additional user profile data, wherein at least some of the user
profiles are associated with different user roles for a same user;
receive, by a query module, queries submitted by users of the
database system, and generate responses to the queries based at
least on structured data; and interact, by a feedback module, with
the knowledge store and the personal data sources to update the
user profiles when the personal data from the personal data sources
changes, such that the user profiles continually update as more
queries are conducted, wherein the query module interacts with the
feedback module so that the responses generated by the query module
are adjusted based on information provided by the feedback
module.
2. The database system of claim 1, further comprising the processor
to generate new user profiles based on the correlation of the
personal data with user profile data.
3. The database system of claim 1, wherein the user profile data
comprises structured data and the personal data includes
unstructured data.
4. The database system of claim 1, further comprising the processor
to retrieve at least some of the personal data from email messages
and calendar items.
5. The database system of claim 4, further comprising the processor
to retrieve at least some of the personal data from objects in a
customer relationship management (CRM) application.
6. The database system of claim 1, wherein the personal data
sources comprise at least two of a group consisting of: an email
message, a calendar item, a customer relationship management (CRM)
application object, an address book entry, a tweet, and a blog
entry.
7. A computer-implemented method for updating user profiles,
comprising a processor: storing in a knowledge store user profile
data in the user profiles; retrieving personal data from personal
data sources; correlating the personal data with the user profile
data to generate additional user profile data by: retrieving
unstructured data from different user applications, correlating the
unstructured data from the different user applications with the
user profile data in the user profiles to generate at least some of
the additional user profile data; updating the user profiles with
the additional user profile data, wherein at least some of the user
profiles are associated with different user roles for a same user;
receiving, by a query module, queries submitted by users of the
database system, and generate responses to the queries based at
least on structured data; and interacting, by a feedback module,
with the knowledge store and the personal data sources to update
the user profiles when the personal data from the personal data
sources changes, such that the user profiles continually update as
more queries are conducted, wherein the query module interacts with
the feedback module so that the responses generated by the query
module are adjusted based on information provided by the feedback
module.
8. The computer-implemented method of claim 7, further comprising
the processor: generating new user profiles based on the
correlation of the personal data with user profile data.
9. The computer-implemented method of claim 7, wherein the user
profile data comprises structured data and the personal data
includes unstructured data.
10. The computer-implemented method of claim 7, further comprising
the processor: retrieving at least some of the personal data from
email messages and calendar items.
11. The computer-implemented method of claim 10, further comprising
the processor: retrieving at least some of the personal data from
objects in a customer relationship management (CRM)
application.
12. The computer-implemented method of claim 8, wherein the
personal data sources comprise at least two of a group consisting
of: an email message, a calendar item, a customer relationship
management (CRM) application object, an address book entry, a
tweet, and a blog entry.
13. A non-transitory computer-readable medium comprising
instructions for a computer program to: update user profiles, the
instructions operable to: store in a knowledge store user profile
data in the user profiles; retrieve personal data from personal
data sources; correlate the personal data with the user profile
data to generate additional user profile data by: retrieving
unstructured data from different user applications, correlating the
unstructured data from the different user applications with the
user profile data in the user profiles to generate at least some of
the additional user profile data; update the user profiles with the
additional user profile data, wherein at least some of the user
profiles are associated with different user roles for a same user;
receive, by a query module, queries submitted by users of the
database system, and generate responses to the queries based at
least on the structured data; and interact, by a feedback module,
with the knowledge store and the personal data sources to update
the user profiles when the personal data from the personal data
sources changes, such that the user profiles continually update as
more queries are conducted, wherein the query module interacts with
the feedback module so that the responses generated by the query
module are adjusted based on information provided by the feedback
module.
14. The non-transitory computer-readable medium of claim 13,
further comprising instructions operable to: generate new user
profiles based on the correlation of the personal data with user
profile data.
15. The non-transitory computer-readable medium of claim 13,
wherein the user profile data comprises structured data and the
personal data includes unstructured data.
16. The non-transitory computer-readable medium of claim 13,
further comprising instructions operable to: retrieve at least some
of the personal data from email messages and calendar items.
17. The non-transitory computer-readable medium of claim 16,
further comprising instructions operable to: retrieve at least some
of the personal data from objects in a customer relationship
management (CRM) application.
18. The non-transitory computer-readable medium of claim 13,
wherein the personal data sources comprise at least two of a group
consisting of: an email message, a calendar item, a customer
relationship management (CRM) application object, an address book
entry, a tweet, and a blog entry.
Description
BACKGROUND
The modern abundance of personal data from sources such as email,
contacts, and documents cannot be overstated. Indeed, there is a
significant lack of, and an ever-growing need for, greater
abilities to process such data in meaningful ways so as to provide
a user with opportunities to do more than mere keyword searches or
similar actions. Current systems offer limited use of information
within personal and public data and generally provide a user with
little more than typical search engine functionality.
There remains a need for a way to address these and other problems
associated with the prior art. More particularly, there remains a
need for greater leveraging of personal data for a user,
particularly with regard to unstructured data.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram illustrating an example of a networked
system in which embodiments of the disclosed technology may be
implemented.
FIG. 2 is a block diagram illustrating an example of a system
implementing an adaptive ontology controller in accordance with
certain embodiments of the disclosed technology.
FIG. 3 is a block diagram illustrating an example of a system
implementing a data extractor and correlator in accordance with
certain embodiments of the disclosed technology.
FIG. 4 is a block diagram illustrating an example of a system
implementing a user profile module in accordance with certain
embodiments of the disclosed technology.
FIG. 5 is a block diagram illustrating an example of a system
implementing a feedback module in accordance with certain
embodiments of the disclosed technology.
FIG. 6 is a flowchart illustrating an example of a
machine-implemented method in accordance with certain embodiments
of the disclosed technology.
FIG. 7 is a flowchart illustrating an example of another
machine-implemented method in accordance with certain embodiments
of the disclosed technology.
FIG. 8 is a block diagram illustrating an example of a system
involving structured data and unstructured data retrieved from
multiple data sources in accordance with certain embodiments of the
disclosed technology.
FIG. 9 illustrates an example that shows possible relationships
between the word "Apple" and various types of entities that may be
defined.
DETAILED DESCRIPTION
The disclosed technology relates generally to data processing,
query processing, and more particularly but not exclusively to
systems and methods for processing document and text data. For
example, knowledge may be harvested from unstructured data and
subsequently relied on or used to provide a user with meaningful
information that ties together multiple pieces of data from any of
a number of personal data sources and, in some embodiments, public
data sources.
FIG. 1 is a block diagram illustrating an example of a networked
system 100 in which embodiments of the disclosed technology may be
implemented. In the example, the system 100 includes a network 102
such as the Internet, an intranet, a home network, or any
combination thereof. Traditional computing devices such as a
desktop computer 104 and laptop computer 106 may connect to the
network 102 to communicate with each other or with other devices
connected to the network.
The networked system 100 also includes three mobile electronic
devices 108-112. Two of the mobile electronic devices, 108 and 110,
are mobile communications devices such as cellular telephones or
smart phones. The third mobile electronic device, 112, is a
handheld device such as a personal data assistant (PDA) or tablet
device.
The networked system 100 also includes a storage device 114, which
may be a central database or repository, a local data store, or a
remote storage device, for example. The storage device 114 may be
accessible to any or all of the other devices 104-112, subject to
limitations or restrictions by the devices 104-112, a third party,
or the storage device 114 itself. The storage device 114 may be
used to store some or all of the personal data that is accessed
and/or used by any of the computers 104 and 106 or mobile
electronic devices 108-112. In situations involving public data,
the storage device 114 may also store any or all of the public data
accessed and/or used by any of the computers 104 and 106 or mobile
electronic devices 108-112.
FIG. 2 illustrates an example of a system 200 implementing an
adaptive ontology controller (AOC) in accordance with certain
embodiments of the disclosed technology. A knowledge worker may
interact with the system by way of a user interface 202 such as the
desktop computer 104 of FIG. 1. A query processor 204 may receive
input from the user, such as queries or requests, via the user
interface 202 and provide the user input to a knowledge extractor
and learning engine (KELE) 206.
The AOC 208 is part of the KELE 206, which includes various other
subsystems such as an intent identification module 210, a learning
module 212, a concept expansion module 214, a deep analysis and
reasoning module 216, and various user data sources 218 that
provide personal data and information. The AOC 208 is configured to
interact with a knowledge store 220, such as the storage device 114
of FIG. 1.
FIG. 3 is a block diagram illustrating an example of a system 300
implementing a machine-implemented data extractor and correlator
302 in accordance with certain embodiments of the disclosed
technology. In the example, the data extractor and correlator 302
is configured to retrieve personal data from any of a number of
personal data sources 304A-n. The personal data sources 304A-n may
include, but are not limited to, an email message, a calendar item,
a customer relationship management (CRM) application object, an
address book entry, a tweet, a blog entry, a file, a folder, a
presentation, and a document.
The system 300 also includes a knowledge store 306 configured to
store knowledge, generally in the form of structured data. As used
herein, the term structured data generally refers to data or
information that is identifiable because it is organized in a
structure. Structured data is typically searchable by data type
within content, readily understood by computing devices, and
efficiently organized for human readers. Structured data as
described herein can generally be used to identify a person, place,
or item involved with a particular field or industry, e.g., sales.
Such structured data typically includes, but is not limited to,
fields in a CRM application, such as contact information, account
name, contact name, invoice number, and phone number.
Structured data is usually organized in such a way that it is
readily and often easily searchable, presentable, or useable by an
application or user. In contrast, the term unstructured data as
used herein generally refers to data that has no identifiable
structure. Unstructured data may include content that is similar or
even identical to corresponding structured data but is not
organized in such a way that it is readily or easily searchable,
presentable, or useable by an application or user. Whereas data
corresponding to a "sender" field in an email message is usually
structured data, for example, the typical freeform text of the
email body is generally unstructured data.
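As a hedged illustration of this distinction, the following Python sketch parses a made-up email message with the standard library email module. The header fields stand in for structured data and the body stands in for unstructured data. The sample message and the variable names are assumptions made for illustration and are not part of the disclosure.

```python
from email import message_from_string

# Hypothetical sample message, invented for illustration only.
RAW_MESSAGE = (
    "From: Bruce Thomas <bruce.t@zen.com>\n"
    "To: sales@example.com\n"
    "Subject: Updated contact details\n"
    "\n"
    "Hi, you can also reach me at 555-0199 after 5 pm.\n"
)

message = message_from_string(RAW_MESSAGE)

# Structured data: named header fields that can be looked up directly.
structured = {
    "sender": message["From"],
    "recipient": message["To"],
    "subject": message["Subject"],
}

# Unstructured data: freeform body text with no identifiable fields.
unstructured = message.get_payload()
```

The phone number in the body is present in the data but cannot be looked up by field name, which is the gap the data extractor and correlator is described as addressing.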
The data extractor and correlator 302 is configured to retrieve
personal data from at least one of the personal data sources
304A-n. For example, the data extractor and correlator 302 may be
configured to retrieve all incoming email messages subject to a
filter, e.g., all email messages from a certain sender or
originator. Alternatively or in addition thereto, the data
extractor and correlator 302 may retrieve all documents created by
or edited by the user. A functional or actual filter may be used to
specify that only certain documents, e.g., documents pertaining to
sales involving the user, are to be retrieved by the data extractor
and correlator 302.
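One way to picture filtered retrieval is a predicate applied to each candidate item. The sketch below is a minimal example under that assumption; the Message class, the retrieve function, and the from_sender helper are hypothetical names introduced here and do not come from the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Message:
    sender: str
    subject: str
    body: str


def retrieve(messages: Iterable[Message], keep: Callable[[Message], bool]) -> List[Message]:
    """Return only the messages accepted by the filter predicate."""
    return [message for message in messages if keep(message)]


def from_sender(address: str) -> Callable[[Message], bool]:
    """Build a filter that accepts messages from one sender (case-insensitive)."""
    wanted = address.lower()
    return lambda message: message.sender.lower() == wanted


# Hypothetical inbox contents.
inbox = [
    Message("bruce.t@zen.com", "Quote request", "Please send a quote."),
    Message("news@example.com", "Weekly digest", "Top stories this week."),
    Message("Bruce.T@zen.com", "Follow-up", "Any update on the quote?"),
]
retrieved = retrieve(inbox, from_sender("bruce.t@zen.com"))
```

A document filter, such as one limited to sales-related documents, could be expressed as another predicate passed to the same retrieve function.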
The data extractor and correlator 302 is further configured to
extract information from unstructured data within the retrieved
personal data. For example, an email message retrieved by the
data extractor and correlator 302 may contain unstructured data
such as freeform text in the subject or body of the message. In
such a situation, the data extractor and correlator 302 may extract
certain words, terms, or phrases, such as contact information or
sales-related information, from the unstructured data within the
message.
The data extractor and correlator 302 is further configured to
correlate the extracted information with previously stored
structured data, e.g., stored in the knowledge store 306, to
generate additional structured data. For example, consider a
situation in which the data extractor and correlator 302 extracts
additional information, e.g., a secondary phone number extracted
from the body of an email message, that pertains to a sales contact
having information, e.g., a name and a primary phone number, that
is already stored in the knowledge store 306. The extracted
information (secondary phone number) will be correlated with the
previously stored structured data (existing name and primary phone
number) to generate additional structured data (secondary phone
number added to or associated with the existing contact).
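The secondary phone number example can be sketched as follows. This is a simplified illustration, not the disclosed implementation: the in-memory dictionary standing in for the knowledge store 306, the seven-digit phone pattern, and the function names are all assumptions.

```python
import re
from typing import Dict, List

# Hypothetical knowledge store: contact name -> structured fields.
knowledge_store: Dict[str, Dict[str, List[str]]] = {
    "Bruce Thomas": {"phones": ["555-0100"]},
}

# Assumed phone format for this sketch (NNN-NNNN); real data would need more.
PHONE_PATTERN = re.compile(r"\b\d{3}-\d{4}\b")


def extract_phone_numbers(text: str) -> List[str]:
    """Extract phone-like strings from unstructured text."""
    return PHONE_PATTERN.findall(text)


def correlate(store: Dict[str, Dict[str, List[str]]], contact_name: str, phones: List[str]) -> List[str]:
    """Attach unseen phone numbers to an existing contact and return the additions."""
    record = store.get(contact_name)
    if record is None:
        return []  # nothing previously stored to correlate with
    added = []
    for phone in phones:
        if phone not in record["phones"]:
            record["phones"].append(phone)
            added.append(phone)
    return added


body = "Bruce here. My office line is 555-0100, but try 555-0199 first."
added = correlate(knowledge_store, "Bruce Thomas", extract_phone_numbers(body))
```

Only the number that was not already stored is reported as additional structured data; the existing primary number is left unchanged.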
The knowledge store 306 is configured to store additional
structured data as well as previously stored structured data. The
data extractor and correlator 302 thus provides output in the form
of enriched knowledge that may be stored within the knowledge store
306 and used in subsequent queries or applications by the user or
other users or even other applications. For example, in the
situation described above, a subsequent query by a user involving
the sales contact may provide the secondary phone number without
the user needing to perform an additional or more detailed search
for the information.
Table 1 provides an example of different types of structured data
that may be extracted from various types of personal data
sources.
TABLE 1
Personal Data Source Type | Extracted Structured Data
Email | From, to, signature, threaded email conversations, subject field, date, time stamp
Calendar | Location, time, invitees, attendees, recurrence, time zone
CRM | Account, contact, case, opportunity, partners, contact, approval, asset, campaign, lead
Address Book | Name, Company, Title, email, phone, fax, web url, IM ID, Chat ID, mobile number
Documents and document stores | Last modified time, meta data, header, footer, copy right information, title, author, shared access list
Table 2 provides an example illustrating how the data extractor and
correlator 302 of FIG. 3 may analyze and correlate structured data
and convert it into enriched knowledge.
TABLE 2
Structured Data: Bruce Thomas <bruce.t@zen.com>
Enriched Knowledge:
First Name: Bruce
Last Name: Thomas
Possible Org: Zen Inc (common email domains like yahoo.com, msn.com, gmail.com are excluded). Zen.com is used to collect information about the organization.
Company or Org Type: Machine tool and manufacturing industry. Extracted from Zen.com web site.
Group Members: Information extracted based on all the individuals Bruce Thomas interacts with using the email ID Bruce Thomas.
Information Co-relation and Consolidation: All email addresses, phone numbers and other information is co-related and consolidated.
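The first rows of Table 2 can be approximated with a short Python sketch. It derives a first name, a last name, and a possible organization domain from an address string, skipping common email domains. The enrich function and its output keys are hypothetical, and looking up the organization type from a web site is outside the scope of this sketch.

```python
from email.utils import parseaddr

# Common consumer email domains named in Table 2; these do not identify an organization.
COMMON_DOMAINS = {"yahoo.com", "msn.com", "gmail.com"}


def enrich(address: str) -> dict:
    """Derive simple enriched fields from a 'Name <user@domain>' string."""
    display_name, email_address = parseaddr(address)
    parts = display_name.split()
    domain = email_address.rpartition("@")[2].lower()
    possible_org = domain if domain and domain not in COMMON_DOMAINS else None
    return {
        "first_name": parts[0] if parts else None,
        "last_name": parts[-1] if len(parts) > 1 else None,
        "email": email_address,
        "possible_org_domain": possible_org,
    }


result = enrich("Bruce Thomas <bruce.t@zen.com>")
```

For an address at a common domain, the sketch leaves the possible organization empty rather than guessing.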
Certain embodiments of the system 300 of FIG. 3 further include a
machine-implemented document harvester configured to retrieve one
or more documents from at least one of the personal data sources
304A-n. Such embodiments may further include a machine-implemented
document indexer configured to index a plurality of documents
harvested by the document harvester from the personal data sources
304A-n.
A document harvester and indexer may be used to process and index
documents including files, e.g., word processing files, spreadsheet
files, presentation files, individual slides in presentation files,
etc., calendar events, to do lists, notes, emails, email
attachments, and web pages. These documents may be retrieved
locally from a user's computer and/or remotely from network
storage, e.g., a server that stores documents produced by a
plurality of users, as well as from the Web, e.g., from web pages
via Web application programming interfaces (APIs). The documents
may also be tagged and/or clustered.
As documents are harvested, a word popularity dictionary may be
created. Word popularity generally refers to a global dictionary
containing high frequency words and weights. When a new document is
harvested, for example, keywords that do not exist in the
dictionary may be added. Stemming may be applied to obtain root
words and text may be converted to lowercase. As a user interacts
with the system by sending email, visiting web pages etc., the
weights in the dictionary can be constantly updated. Keywords in
frequently-accessed documents may be given higher weights while
keywords in less-important documents may be given lower weights.
Consequently, an up-to-date and accurate model of the user's
universe and behavior may be effectively constructed.
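A word popularity dictionary of this kind might be sketched as below. The description does not specify a weighting formula or a stemming algorithm, so both are assumptions here: the weight is a running sum of a per-document importance value, and naive_stem is a crude suffix stripper standing in for a real stemmer.

```python
import re
from collections import defaultdict


def naive_stem(word: str) -> str:
    """Crude suffix stripper used as a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


class WordPopularity:
    """Global dictionary mapping root words to accumulated weights."""

    def __init__(self) -> None:
        self.weights = defaultdict(float)

    def add_document(self, text: str, importance: float = 1.0) -> None:
        # Lowercase the text, split it into words, stem each word, and add
        # the document's importance to that word's weight. Unseen words are
        # added to the dictionary automatically.
        for token in re.findall(r"[a-z]+", text.lower()):
            self.weights[naive_stem(token)] += importance


popularity = WordPopularity()
# A frequently accessed document gets a higher importance than a minor one.
popularity.add_document("Invoices and invoice totals", importance=2.0)
popularity.add_document("Meeting notes", importance=0.5)
```

Calling add_document again as the user sends email or visits pages would keep the weights current, in the spirit of the constant updating described above.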
FIG. 3 includes a natural language query module 307 that may be
used to generate responses to natural language queries submitted by
users to the system 300. The natural language query module 307 may
access structured information stored by the knowledge store 306
and, in some embodiments, the natural language query module 307 may
also interface directly with the data extractor and correlator 302.
The responses generated by the natural language query module 307 to
be provided to the user are based at least in part on the
structured information within the knowledge store 306. For example,
if a user submits a query pertaining to a sales lead whose
information is stored within the knowledge store 306, the natural
language query module 307 may automatically generate a response
that contains certain information, such as contact information,
that pertains to the sales lead.
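The sales lead example can be pictured with the following sketch. It is deliberately simplistic: matching a stored contact name inside the query text is an assumed stand-in for natural language understanding, and the answer function and the store layout are hypothetical.

```python
from typing import Dict


def answer(query: str, knowledge_store: Dict[str, Dict[str, str]]) -> str:
    """Respond with stored details for the first contact named in the query."""
    lowered = query.lower()
    for name, record in knowledge_store.items():
        if name.lower() in lowered:
            details = ", ".join(f"{key}: {value}" for key, value in sorted(record.items()))
            return f"{name} ({details})"
    return "No matching contact found."


# Hypothetical structured data held by the knowledge store.
store = {"Bruce Thomas": {"phone": "555-0100", "email": "bruce.t@zen.com"}}
response = answer("What is Bruce Thomas's phone number?", store)
```

The response is built only from structured data already in the store, which mirrors the statement that responses are based at least in part on that structured information.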
FIG. 4 is a block diagram illustrating an example of a system 400
that includes a machine-implemented user profile module 408 in
accordance with certain embodiments of the disclosed technology.
Such embodiments are particularly beneficial for applications that
aim to adapt to a user by better tailoring to his or her specific
needs and preferences.
In the example, the user profile module 408 is configured to
interact with any number of user profiles 410A-n. Each user profile
may correspond to one or more users. Also, any given user may be
associated with multiple user profiles. For example, each user
profile may correspond to a certain role, e.g., sales coordinator,
that may be assigned to or associated with multiple users. Multiple
user profiles 410A-n may correspond to a user's particular
situation. For example, a user may have one user profile 410A for
work-related items and a second user profile 410B for home-related
items. Alternatively or in addition thereto, a user may have one or
more profiles that correspond to activities with friends, one or
more profiles that correspond to family activities, and one or more
profiles that correspond to business-related events.
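The many-to-many relationship between users and profiles can be sketched with a small data structure. The class and method names below are assumptions for illustration; the profile identifiers simply reuse the reference numerals from FIG. 4.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class UserProfile:
    profile_id: str
    role: str
    data: dict = field(default_factory=dict)


class UserProfileModule:
    """Tracks which profiles are associated with which users."""

    def __init__(self) -> None:
        self._profiles_by_user: Dict[str, List[UserProfile]] = defaultdict(list)

    def assign(self, user: str, profile: UserProfile) -> None:
        self._profiles_by_user[user].append(profile)

    def profiles_for(self, user: str, role: Optional[str] = None) -> List[UserProfile]:
        """Return a user's profiles, optionally limited to one role."""
        profiles = self._profiles_by_user.get(user, [])
        return [p for p in profiles if role is None or p.role == role]


module = UserProfileModule()
work = UserProfile("410A", "work")
home = UserProfile("410B", "home")
module.assign("alice", work)   # one user, several profiles
module.assign("alice", home)
module.assign("bob", work)     # one profile shared by several users
```

Here the same work profile object is shared by two users, while one user holds separate work and home profiles.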
The user profile module 408 may interact with a knowledge store 406
such as the knowledge store 306 of FIG. 3, for example. The user
profile module 408 may also interact with any of a number of user
applications 412A-n such as a sales-oriented application, for
example. In certain embodiments, a user's experience with a certain
user application may be influenced or even driven by one or more of
the user profiles 410A-n. For example, if the user is interacting
with user application 412B concerning a particular sale involving
information associated with user profile 410C, the user profile
module 408 may direct the user application 412B to proactively
provide certain information, e.g., certain contact information
stored within the knowledge store 406, to the user.
In certain embodiments, the user profile module 408 may interact
with one or more public data sources 414. For example, a personal
corpus or web data often do not provide enough information to build
or update a user profile that is detailed or accurate enough for
certain applications. In these embodiments, the user profile module
408 may proactively seek or passively receive public information
pertaining to a contact whose information is stored by the
knowledge store 406. If the new public information is different
than the previously stored information, the user profile module 408
may direct the knowledge store 406 and/or one or more of the user
profiles 410A-n to update the corresponding information
accordingly.
FIG. 4 also includes a natural language query module 407, such as
the natural language query module 307 of FIG. 3, that may be used
to generate responses to natural language queries submitted by
users to the system 400. The natural language query module 407 may
access structured information stored by the knowledge store 406
and, in some embodiments, the natural language query module 407 may
also interface directly with the user profile module 408. The
responses generated by the natural language query module 407 to be
provided to the user are based at least in part on the structured
information within the knowledge store 406. In certain embodiments,
the response is also based on one or more of the user profiles
410A-n. For example, if the query pertains to information stored in
user profile 410B, the natural language query module 407 may obtain
the information by way of the user profile module 408 and generate
a response incorporating that information.
FIG. 5 is a block diagram illustrating an example of a system 500
implementing a feedback module 516 in accordance with certain
embodiments of the disclosed technology. In the example, the system
500 includes a user profile module 508, such as the user profile
module 408 of FIG. 4, configured to interact with one or more user
profiles 510A-n, such as the user profiles 410A-n of FIG. 4. The
user profile module 508 is also configured to interact with a
knowledge store 506 such as the knowledge store 306 of FIG. 3, for
example.
The feedback module 516 may interact with one or both of the user
profile module 508 and the knowledge store 506. In certain
embodiments, the feedback module 516 may interact with one or more
public data sources 514 and may cause the user profile module 508 to
alter or update one or more of the user profiles 510A-n based on
interactions with the public data source(s) 514. In certain
embodiments, the feedback module 516 may interact directly with a
user associated with one of the user profiles 510A-n. Alternatively
or in addition thereto, the feedback module 516 may interact
directly with one or more user applications 512A-n, such as the
user applications 412A-n of FIG. 4.
Consider a situation in which user profile 510B involves a
particular sales contact whose contact information just changed and
is broadcast via the public data source 514. The feedback module
516 may direct the user profile module 508 to update one or more of
the user profiles 510A-n with the new public information concerning
the sales contact. The user profiles 510A-n can be continually
updated and enriched, in an increasingly refined manner, as more
searches are conducted. For example, suggestions provided to a
user based on his or her user profile(s) may be increasingly
relevant as time goes on.
In embodiments where the feedback module 516 interacts with one or
more user applications 512A-n, the feedback module 516 may be
triggered to direct the user profile module 508 to update one or
more of the user profiles 510A-n responsive to the interaction with
the user application(s) 512A-n. For example, if the feedback module
516 detects a user updating a contact mailing address in user
application 512B, the feedback module 516 may direct the user
profile module 508 to update any of the user profiles 510A-n that
include a mailing address for the contact.
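The mailing address example might look like the following sketch, in which a detected change is propagated to every profile that already holds that field for the contact. The nested dictionary layout and the function name are assumptions, not the disclosed design.

```python
from typing import Dict, List

# Hypothetical layout: profile id -> contact name -> field name -> value.
Profiles = Dict[str, Dict[str, Dict[str, str]]]


def propagate_contact_change(profiles: Profiles, contact: str, field_name: str, new_value: str) -> List[str]:
    """Update each profile that already stores this field for the contact."""
    updated = []
    for profile_id, contacts in profiles.items():
        record = contacts.get(contact)
        if record is not None and field_name in record:
            if record[field_name] != new_value:
                record[field_name] = new_value
                updated.append(profile_id)
    return updated


profiles: Profiles = {
    "510A": {"Bruce Thomas": {"mailing_address": "1 Old Rd"}},
    "510B": {"Bruce Thomas": {"phone": "555-0100"}},
    "510C": {"Bruce Thomas": {"mailing_address": "1 Old Rd"}},
}
updated = propagate_contact_change(profiles, "Bruce Thomas", "mailing_address", "2 New Ave")
```

Profile 510B is left alone because it holds no mailing address for the contact, matching the condition stated in the example.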
FIG. 5 also includes a natural language query module 507, such as
the natural language query module 307 of FIG. 3, that may be used
to generate responses to natural language queries submitted by
users to the system 500. The natural language query module 507 may
access structured information stored by the knowledge store 506
and, in some embodiments, the natural language query module 507 may
also interface directly with the feedback module 516. The responses
generated by the natural language query module 507 to be provided
to the user are based at least in part on the structured
information within the knowledge store 506 and, in some
embodiments, may be adjusted based on information provided by the
feedback module 516. For example, a response to the natural
language query may take into account pertinent information from
user profile 510B (by way of the user profile module 508)
responsive to an indication from the feedback module 516 that the
pertinent information has changed, e.g., due to an event that has
occurred or is occurring at the public data source 514.
FIG. 6 is a flowchart illustrating an example of a
machine-implemented method 600 in accordance with certain
embodiments of the disclosed technology. At 602, data is retrieved
from one or more data sources. For example, a machine-implemented
data extractor and correlator, such as the data extractor and
correlator 302 of FIG. 3, may retrieve personal data from one or
more personal data sources, such as the personal data sources
304A-n of FIG. 3.
At 604, information is extracted from unstructured data within the
data retrieved at 602. For example, a data extractor and
correlator, such as the data extractor and correlator 302 of FIG.
3, may extract information pertaining to a sales order such as one
or both of an invoice number and a contact name. Such information
may be unstructured in that it is neither organized in a structured
manner nor readily classifiable or useable without modification or
organizing. For example, the information may be a free-text piece
of data such as the body of an email message.
The information extraction performed at 604 may be accomplished by
breaking at least one sentence into subject, verb, and object
(SVO), extracting phrases that link a subject to an object,
extracting at least one word in close proximity to an identified
feature or service, extracting at least one word in close proximity
to a known quality, or any combination thereof. Features with
certain quality or derived quality ratings may be tagged for
reviews, for example. Also, structures that approximate concepts
from documents with and without prior semantic understanding may be
constructed.
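One of the techniques listed above, extracting words in close proximity to an identified feature, might be sketched as shown below. The function name words_near, the tokenization rule, and the window size are assumptions made for illustration. Subject-verb-object decomposition would typically require a syntactic parser and is not shown.

```python
import re


def words_near(text, feature, window=2):
    """Return the words within `window` tokens of each occurrence of `feature`."""
    tokens = re.findall(r"[A-Za-z0-9']+", text.lower())
    target = feature.lower()
    nearby = []
    for i, tok in enumerate(tokens):
        if tok == target:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            # Keep every token in the window except the feature itself.
            nearby.extend(
                t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i
            )
    return nearby


review = "The battery life is excellent but the screen is dim."
print(words_near(review, "battery"))
print(words_near(review, "screen", window=1))
```

A fixed token window is the simplest notion of proximity. An implementation could equally use sentence boundaries or parse-tree distance.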
At 606, some or all of the extracted information is correlated with
previously stored structured data to generate additional structured
data. For example, a data extractor and correlator, such as the
data extractor and correlator 302 of FIG. 3, may correlate the
invoice number and/or contact name discussed above with an existing
order and/or contact having associated structured data stored
within a knowledge store, such as the knowledge store 306 of FIG.
3. The knowledge store may store both the additional structured
information and the previously stored structured data as indicated
at 608.
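The correlation at 606 might be sketched as follows, assuming invoice numbers of the hypothetical form INV-1042 and a knowledge store modeled as a plain dictionary. The names knowledge_store, INVOICE_RE, and correlate are illustrative and do not appear in the disclosure.

```python
import re

# Hypothetical previously stored structured data, keyed by invoice number.
knowledge_store = {
    "INV-1042": {"order_id": 17, "contact": "Rob Cramer"},
}

# Assumed invoice format; a real system would learn or configure this.
INVOICE_RE = re.compile(r"\bINV-\d+\b")


def correlate(email_body, store):
    """Link invoice numbers found in free text to existing stored orders."""
    additional = []
    for invoice in INVOICE_RE.findall(email_body):
        record = store.get(invoice)
        if record is not None:
            # The new record is the "additional structured data".
            additional.append(
                {"invoice": invoice, "order_id": record["order_id"], "source": "email"}
            )
    return additional


links = correlate("Please confirm INV-1042 and INV-9999 by Friday.", knowledge_store)
print(links)
```

Invoice numbers with no match in the store are simply skipped in this sketch. An implementation might instead queue them for later correlation.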
In certain embodiments, the retrieved data includes supplemental
structured data, e.g., structured data that has not yet been stored
within a knowledge store. In these situations, the data extractor
and correlator may correlate the supplemental structured data with
one or both of the previously stored structured data and the
additional structured data to generate further structured data that
may be stored by the knowledge store.
Certain embodiments may include retrieving public data from one or
more public data sources. In these embodiments, a data extractor
and correlator may extract public information from unstructured
data within the retrieved public data and correlate the extracted
public information with previously stored structured data to
generate further additional structured data that may be stored by
the knowledge store.
In certain embodiments, a user profile, such as the user profiles
410A-n of FIG. 4, may be generated based at least in part on one or
both of the previously stored structured data and the additional
structured data, as indicated at 610. Alternatively or in addition
thereto, an existing user profile may be updated based at least in
part on one or both of the previously stored structured data and
the additional structured data, as indicated at 612. Generation and
modification of user profiles may be performed by a user profile
module, such as the user profile module 408 of FIG. 4.
At 614, a natural language query is received from a user. For
example, a user wishing to research a particular sales lead may
provide the following query: "has there been any recent progress
with sales lead XYZ Manufacturing?" The system then generates a
response to the natural language query received at 614, as
indicated at 616. The response is based at least in part on one or
both of the previously stored structured data and the additional
structured data. For example, if the stored structured data contains
information pertaining to XYZ Manufacturing, the generated response
may provide that information to the user.
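As a deliberately simplified sketch of steps 614 and 616, the example below answers a query by looking for known entity names inside the query text. The function answer_query, the store contents, and the entity "Acme Corp" are hypothetical. Actual natural language understanding would involve far more than substring matching.

```python
def answer_query(query, store):
    """Return stored facts for every known entity named in the query."""
    q = query.lower()
    return {name: facts for name, facts in store.items() if name.lower() in q}


# Hypothetical structured data held by the knowledge store.
store = {
    "XYZ Manufacturing": ["Invoice INV-1042 sent", "Meeting scheduled"],
    "Acme Corp": ["No recent activity"],
}

response = answer_query(
    "Has there been any recent progress with sales lead XYZ Manufacturing?",
    store,
)
print(response)
```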
FIG. 7 is a flowchart illustrating an example of another
machine-implemented method 700 in accordance with certain
embodiments of the disclosed technology. At 702, raw content is
extracted from retrieved data, such as the personal and/or public
data retrieved at 602 of FIG. 6.
At 704, semantic analysis is performed on the raw content. For
example, a semantic analysis module may be configured to determine
semantic information based on unstructured data within the
retrieved data. A data extractor and correlator, such as the data
extractor and correlator 302 of FIG. 3, may be configured to
correlate the extracted information with previously stored
structured data based at least in part on the semantic
information.
Certain embodiments include performing a search of one or more data
sources based on results of the semantic analysis performed at 704.
Such embodiments may include performing a search of one or more
personal data sources, as indicated by 706, or performing a search
of one or more public data sources, as indicated by 708.
At 710, an additional analysis is performed based at least in part
on the results of the search performed at either 706 or 708. In
certain embodiments, a user profile, such as the user profiles
410A-n of FIG. 4, may be updated, e.g., by a user profile module,
based on one or both of the results of the search performed at
either 706 or 708 and the additional analysis performed at 710. The
additional analysis performed at 710 may include an inference
analysis, a topic analysis, information tagging, information
clustering, or some combination thereof. Probabilistic links may
also be created based on the additional analysis. Over time, topics
may be augmented, merged, deleted, or split depending on the
analysis. Also, sub-topics may be created based on the
analysis.
FIG. 8 is a block diagram illustrating an example of a system 800
involving structured data 805A and unstructured data 805B retrieved
from multiple data sources in accordance with certain embodiments
of the disclosed technology. In the example, the data sources
include multiple personal data sources 804A-E: an email message
804A, a calendar item 804B, an address book object 804C, an
application-specific document 804D, and a CRM application object
804E.
In the example, the data sources 804A-E collectively yield five
pieces of structured data 805A that may be retrieved, for example,
by a data extractor and correlator: sales contact information (name
and email address), account name, contact name, invoice number, and
phone number. The data sources 804A-E also provide various pieces
of unstructured data 805B: two proper names (person and company),
meeting time, two invoice numbers, a phone number, meeting-specific
information, and sales-specific information.
Certain information, e.g., invoice numbers, from the unstructured
data 805B may be correlated with the structured data 805A. Such
correlation may include identifying, extracting, or building at
least one relationship between the extracted information and
previously stored structured data. For example, one or more
features identified within the extracted information may be tagged
or otherwise marked for subsequent operations. Parts of speech
analysis may also be performed and then enriched by relationship
determinations.
In certain embodiments, correlation and relationship building may
include categorizing one or more portions of the unstructured data.
Portions of the unstructured data 805B may each be broken into
subject, verb, and object (SVO). Phrases linking a subject to an
object may be extracted. A determination may be made as to certain
words in close proximity to an identified feature or service, known
quality, or any combination thereof.
Consider an example in which the word "Apple" in unstructured
free-flowing data could have multiple meanings. The word could
refer to a fruit, the name of a company, the name of a person, etc.
Relationships may be established to decipher the meaning of certain
unstructured data. A possible association to each of these entities
can be created and those that have a high probability based on
parts of speech analysis and entity relationship identification,
for example, may be strengthened. In the present example, an
inference may be made that "Apple" is a company based on learning
from a prior structured data analysis. Table 3 provides an
indication as to the relationships for "Apple" that may be
identified from structured data. In the example, the word "Apple"
occurs along with "Apple Thomas" and "Rob Cramer."
TABLE 3

Structured Data           Relationship Identification
Apple in Apple Thomas     First name of a person
Apple                     Company or business entity name
Apple Thomas              Name of a person
Rob Cramer                Name of a person
FIG. 9 illustrates an example 900 that shows the possible
relationships that may be defined between the word "Apple" 902, as
discovered in unstructured data, and the various types of entities,
e.g., a person 904, a fruit 906, or a company 908, as discussed
above. As indicated visually in the figure by the different
thicknesses of the connecting lines, the word "Apple" as identified
in the unstructured data has been determined to refer to a company
name and not to a fruit or a person based on the relative strength
of the determined relationship therebetween.
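The relative relationship strengths described above might be modeled as in the following sketch. Each candidate entity type starts with an equal association, evidence strengthens some of them, and the strengths are normalized so they can be compared. The function disambiguate and all numeric weights are assumptions chosen for illustration, not values from the disclosure.

```python
def disambiguate(candidates, evidence):
    """Strengthen candidate associations with evidence and normalize them."""
    scores = dict(candidates)
    for entity_type, weight in evidence:
        if entity_type in scores:
            scores[entity_type] += weight
    total = sum(scores.values())
    strengths = {k: v / total for k, v in scores.items()}
    best = max(strengths, key=strengths.get)
    return best, strengths


# Equal initial association of "Apple" to each possible entity type.
candidates = {"person": 1.0, "fruit": 1.0, "company": 1.0}

# Hypothetical evidence, e.g. from part-of-speech analysis and from
# relationships already identified in structured data.
evidence = [("company", 2.0), ("person", 0.5), ("company", 1.5)]

best, strengths = disambiguate(candidates, evidence)
print(best, strengths)
```

The normalized strengths play the role of the line thicknesses in the figure: the thickest line marks the selected interpretation.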
In certain embodiments, one or more patterns may be extracted or
determined from structured data, such as previously stored
structured data, to create pattern fingerprints. Patterns may
subsequently be extracted from the unstructured data using these
pattern fingerprints. For example, structured data may be used to
construct a pattern fingerprint knowledge base, and the pattern
fingerprint knowledge may then be used to recognize similar items
in unstructured data and establish their relationships with various
entities. For example, fingerprint data patterns can be learned and
determined for sales and/or contact-related attributes such as
invoice number, P.O. number, and phone number. These learnings may be
applied to identify similar patterns in unstructured data and
identify additional relationships between entities.
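One possible reading of a pattern fingerprint is the character "shape" of a value, with letters and digits replaced by placeholders. The sketch below learns shapes from structured values and then tags tokens in free text that share a known shape. The shape encoding, the function names fingerprint, learn, and recognize, and the sample values are assumptions for illustration. The disclosure does not specify how fingerprints are represented.

```python
import re
from collections import defaultdict


def fingerprint(value):
    """Reduce a value to its shape: letters become A, digits become 9."""
    shape = re.sub(r"[A-Za-z]", "A", value)
    return re.sub(r"\d", "9", shape)


def learn(structured_rows):
    """Map each observed fingerprint to the attribute names that produced it."""
    known = defaultdict(set)
    for attribute, value in structured_rows:
        known[fingerprint(value)].add(attribute)
    return known


def recognize(text, known):
    """Tag whitespace-delimited tokens whose fingerprint was seen before."""
    hits = []
    for token in text.split():
        token = token.strip(".,;:()")
        attrs = known.get(fingerprint(token))
        if attrs:
            hits.append((token, sorted(attrs)))
    return hits


# Hypothetical structured data: (attribute name, value) pairs.
rows = [
    ("invoice_number", "INV-1042"),
    ("phone_number", "650-555-0100"),
    ("po_number", "PO-77310"),
]
known = learn(rows)
found = recognize("Re: INV-2210, call 408-555-0199 about PO-88121.", known)
print(found)
```

A shape-only fingerprint is coarse, since unrelated values can share a shape. An implementation would likely combine it with surrounding context before establishing a relationship.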
Certain implementations of the disclosed technology may include
personalized searching capabilities and features, personalized
content delivery, personalized advertisement delivery, intelligence
gathering and analysis, and automated augmentation of knowledge
bases.
Embodiments of the disclosed technology may be implemented as
machine-directed methods or physical devices. Accordingly, certain
implementations may take the form of an entirely-hardware
embodiment, an entirely-software embodiment, or an embodiment
combining both hardware and software aspects. For example, some or
all of the components for any given embodiment may be
computer-implemented components.
Having described and illustrated the principles of the invention
with reference to illustrated embodiments, it will be recognized
that the illustrated embodiments may be modified in arrangement and
detail without departing from such principles, and may be combined
in any desired manner. And although the foregoing discussion has
focused on particular embodiments, other configurations are
contemplated. In particular, even though expressions such as
"according to an embodiment of the invention" or the like are used
herein, these phrases are meant to generally reference embodiment
possibilities, and are not intended to limit the invention to
particular embodiment configurations. As used herein, these terms
may reference the same or different embodiments that are combinable
into other embodiments.
Consequently, in view of the wide variety of permutations to the
embodiments described herein, this detailed description and
accompanying material is intended to be illustrative only, and
should not be taken as limiting the scope of the invention. What is
claimed as the invention, therefore, is all such modifications as
may come within the scope and spirit of the following claims and
equivalents thereto.
* * * * *