U.S. patent application number 15/595762 was filed with the patent office on 2017-05-15 and published on 2017-08-31 for a search-based recommendation engine.
The applicant listed for this patent is Intelligent Language, LLC. Invention is credited to Athena Ann Smyros, Constantine John Smyros.
Application Number | 20170249317 15/595762 |
Family ID | 58670649 |
Filed Date | 2017-05-15 |
United States Patent Application | 20170249317 |
Kind Code | A1 |
Smyros; Athena Ann; et al. | August 31, 2017 |
SEARCH-BASED RECOMMENDATION ENGINE
Abstract
The embodiments determine the recommendations for a search term
and its criteria, whereby a threshold is used for accepting a
result, whether it is a document, message, file, or any other form
of communication. The input may be part of a larger repository, and
there is no restriction on how many documents constitute the
returned recommendation set.
Inventors: | Smyros; Athena Ann (Richardson, TX); Smyros; Constantine John (Richardson, TX) |
Applicant: |
Name | City | State | Country | Type |
Intelligent Language, LLC | Richardson | TX | US | |
Family ID: | 58670649 |
Appl. No.: | 15/595762 |
Filed: | May 15, 2017 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
14465524 | Aug 21, 2014 | 9652499 |
15595762 | | |
61868091 | Aug 21, 2013 | |
Current U.S. Class: | 1/1 |
Current CPC Class: | G06F 16/3334 20190101; G06F 16/2455 20190101; G06F 16/93 20190101; G06F 16/332 20190101; G06F 16/24578 20190101 |
International Class: | G06F 17/30 20060101 G06F017/30 |
Claims
1. A computer program product stored on a non-transitory
computer-readable medium having computer program logic recorded
thereon for producing a recommendation report based on search
criteria, comprising: code for receiving a search request, the
search request comprising a plurality of sentences; code for
deriving a plurality of search terms from grammatical analysis of
the plurality of sentences; code for calculating an importance
rating for each of the plurality of search terms; code for
determining a match level for one or more of the plurality of
search terms in a document in a repository, the match level being
based on the importance rating of the one or more of the plurality
of search terms; and code for identifying a search result based
upon the match level for the search terms.
2. The computer program product of claim 1, wherein the search
request comprises a document.
3. The computer program product of claim 1, wherein the search
request comprises a file.
4. The computer program product of claim 1, wherein the search
request comprises a grammatical sentence.
5. The computer program product of claim 1, wherein the search
result comprises a rank based on a frequency of the search terms in
the document.
6. The computer program product of claim 1, wherein the search
result comprises an indication of an exact match of the search
terms in the document.
7. The computer program product of claim 1, wherein the search
result comprises an indication of a synonym of the search terms in
the document.
8. The computer program product of claim 1, further comprising:
removing one or more repository members from the repository prior
to determining the match level for the search terms in the
document.
9. The computer program product of claim 1, further comprising:
code for determining a first coverage range for each search term in
the search request, wherein calculating the importance rating is
based at least in part on the first coverage range.
10. The computer program product of claim 9, wherein the first
coverage range indicates a frequency that each search term is used
in the plurality of sentences.
11. A computer program product stored on a non-transitory
computer-readable medium having computer program logic recorded
thereon for producing a recommendation report based on search
criteria, comprising: code for receiving a search request, the
search request comprising one or more sentences; code for
determining a first search term and a second search term from the
one or more sentences; code for calculating a first importance
rating for the first search term in the search request and a second
importance rating for the second search term in the search request;
and code for determining a first match level for the first search
term and a second match level for the second search term in a
document in a repository, the first match level being based on the
first importance rating and the second match level being based on
the second importance rating.
12. The computer program product of claim 11, further comprising:
code for determining a first coverage range for the first search
term in the search request and a second coverage range for the
second search term in the search request, the first coverage range
indicating a first frequency that the first search term is used in
the one or more sentences, the second coverage range indicating a
second frequency that the second search term is used in the one or
more sentences, wherein the first match level is based on the first
coverage range and the second match level is based on the second
coverage range.
13. The computer program product of claim 12, wherein calculating
the first importance rating comprises determining a number of words
in the first coverage range compared to a number of words in the
search request.
14. The computer program product of claim 11, wherein code for
calculating the first importance rating uses deterministic
grammatical analysis.
15. The computer program product of claim 11, wherein the code for
determining the first match level is based upon locating an exact
match of the first search term.
16. The computer program product of claim 11, wherein the code for
determining the first match level is based upon locating similar
terms as the first search term.
17. The computer program product of claim 11, wherein the code for
determining the first match level comprises code for determining an
importance of the first search term in the document.
18. The computer program product of claim 11, wherein calculating
the first importance rating is determined at least in part by
determination of whether the first search term is a topic or a
sub-topic in the search request.
19. The computer program product of claim 11, wherein the one or
more sentences comprises a plurality of sentences.
20. The computer program product of claim 11, wherein the code for
determining the first match level and the second match level is
performed for a plurality of documents.
Description
RELATED APPLICATIONS
[0001] This application is a continuation application of U.S.
patent application Ser. No. 14/465,524, entitled "Search-Based
Recommendation Engine," filed on Aug. 21, 2014, which claims
priority to U.S. Provisional Application No. 61/868,091,
"Recommendation Engine", filed Aug. 21, 2013, which applications
are hereby incorporated herein by reference.
BACKGROUND
[0002] Currently, a myriad of communication devices are being
rapidly introduced that need to interact with natural language in
an unstructured manner. Communication systems are finding it
difficult to keep pace with the introduction of devices as well as
the growth of information.
BRIEF DESCRIPTION OF THE DRAWINGS
[0003] The accompanying drawings are incorporated in and are a part
of this specification. Understanding that these drawings illustrate
only typical embodiments of the invention and are not therefore to
be considered to be limiting of its scope, the invention will be
described and explained more fully through the use of these
accompanying drawings in which:
[0004] FIG. 1 illustrates an example of an Omission Detection
Process that is usable with the embodiments described herein; and
[0005] FIG. 2 depicts a block diagram of a computer system which is
adapted to use the present invention.
DETAILED DESCRIPTION OF THE INVENTION
[0006] The embodiments described herein determine the
recommendations for a search term and its criteria, whereby a
threshold is used for accepting a result, whether it is a document,
message, file, or any other form of communication. The input may be
part of a larger repository, and there is no restriction on how
many documents constitute the returned recommendation set. A
threshold can be defined for both numeric and non-numeric values
that are found in a normal search term. For instance, if the search
term is "find a cow that produces 10 gallons of milk a day", a
recommendation engine could be employed when a value such as 9.99
or 9.1 is acceptable. Each value is very close to the threshold but
is not an exact match, and need not be one that would numerically
round up to 10.
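By way of illustration only, the numeric case above might be sketched as follows. The function name, the `tolerance` parameter, and the choice to treat any value at or above the target as a match are assumptions made for this sketch and are not taken from the embodiments.

```python
def classify_numeric(value: float, target: float, tolerance: float) -> str:
    """Classify a candidate value against a numeric target from a search term.

    `tolerance` is a hypothetical setting: how far below the target a value
    may fall and still be recommended.
    """
    if value >= target:
        return "match"          # assumption: meeting or exceeding the target matches
    if target - value <= tolerance:
        return "recommend"      # close to the threshold, but not a match
    return "reject"


# "find a cow that produces 10 gallons of milk a day", tolerance of 1 gallon
results = {v: classify_numeric(v, target=10.0, tolerance=1.0)
           for v in (10.0, 9.99, 9.1, 7.5)}
print(results)
```

With a tolerance of one gallon, both 9.99 and 9.1 fall into the recommended band, while 7.5 is rejected.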
[0007] Another use of a recommendation engine is the ability to
determine language-based thresholds that do not equate to a
numerical term. For instance, a document may contain a set of
characteristics that can be used to separate one product from
another, as in the search term "prefer a hair dryer with a diffuser
and a curler option". The repository is then analyzed for content
that relates to those characteristics, which allows the system to
recommend a set of products with characteristics similar to the
input even if they are not an exact match (i.e., a word-for-word
match as in a Boolean search).
[0008] In other implementations, a threshold or set of thresholds
is used to match a set of search terms, such as those generated
from a focus message or text stream, and knowing what constitutes a
close match may be useful for the viewer of search results. In yet
another implementation, the ability to match one or more groups of
search terms, such as those grouped by date, topic, location, or
another grouping variable, may be of interest to the searcher and
can be reported with each repository member that exhibits these
characteristics. A key feature of these types of implementations is
that the focus can be analyzed in real time without prior knowledge
of what types of focus will be brought into the system.
[0009] A search term that is usable with the embodiments may
comprise any number of term units that make up a text stream for a
given search application. There is no limitation on the number of
search terms. The search term itself may be a document, file,
message, etc. For most implementations there is no need for an
arbitrary limit on the length of the search term, such as the
number of words that make up the search term. In addition, the
search term may or may not be fully grammatical in nature, meaning
it can be a sentence, paragraph, or other language-based input as
well as a bucket of words.
[0010] Criteria include items used by the search engine to
determine the usability of the search, such as using documents
before/after a certain date, topics that should be
included/excluded within the search, an authority or author list
that should be used to return the results, etc. Any such criteria
may also be directly expressed within a language-based search
request. For instance, it would be possible to derive the topics
used within a search term paragraph so that only those topics are
considered for the remaining search terms. For example, the topic
"husky" within the paragraph generates a list of search terms
related to it. The result then pulls up only the dog and not people
with a deep voice.
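As a non-limiting sketch, criteria of this kind might be applied as a filter over repository members. The `Member` record, its fields, and the `passes_criteria` helper are hypothetical names introduced for this example, and topic labels are assumed to have been assigned to each member already.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Member:
    """A hypothetical repository member carrying the metadata the criteria need."""
    title: str
    published: date
    topics: set = field(default_factory=set)


def passes_criteria(member, after=None, include_topics=None, exclude_topics=None):
    """Return True if the member satisfies every criterion that was supplied."""
    if after is not None and member.published <= after:
        return False
    if include_topics and not (member.topics & include_topics):
        return False
    if exclude_topics and (member.topics & exclude_topics):
        return False
    return True


repository = [
    Member("Caring for a husky", date(2016, 3, 1), {"dog"}),
    Member("Singers with a husky voice", date(2016, 5, 9), {"voice"}),
    Member("Sled dog breeds", date(2009, 1, 15), {"dog"}),
]
kept = [m.title for m in repository
        if passes_criteria(m, after=date(2010, 1, 1), include_topics={"dog"})]
print(kept)
```

Here the topic criterion removes the "husky voice" member and the date criterion removes the older member.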
[0011] To start the system, the repository and the search criteria,
including the search term and any additional criteria, must be
available. The search criteria 101 may be passed to the system,
calculated based on the search term itself, or calculated based on
the rating used for search results. At a minimum, the
characteristics of the search term and the repository are needed in
order for the search criteria to be discovered. In some cases,
criteria may be shipped with a search term; this requires the
system to be able to determine the features of the criteria that
affect the rating. For instance, if a search result is ranked based
on the frequency of the term in a message, the system needs to know
whether only an exact match is counted in the rankings or whether
similar terms, such as synonyms, can be included as part of the
search results.
[0012] A system may also require filtering to remove repository
members before any search terms are analyzed. Also, depending on
the implementation, it may be necessary to calculate the criteria
using language terms. For example, consider the search term "the
new family dog should be at least two years old". The search
criteria in this case are based on the comparison "at least two
years old", and they apply to the family dog, the subject of the
sentence. When such search term analysis is required, the
implementation should contain a grammatical analyzer of some kind
in order to calculate such information, an example of which is
shown in U.S. application Ser. No. 13/625,784, entitled "NATURAL
LANGUAGE DETERMINER", filed 24 Sep. 2012, Attorney Docket No.
001-P010, the disclosure of which is hereby incorporated herein by
reference in its entirety.
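For illustration only, a comparison such as "at least two years old" might be pulled out of a search term with a simple pattern match. This regular-expression sketch is a stand-in and is not the grammatical analyzer referenced above; it does not identify the subject of the sentence, and the lookup tables and function name are assumptions.

```python
import re

# Small hypothetical lookup tables; a real analyzer would cover far more.
WORD_NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
COMPARATORS = {"at least": ">=", "at most": "<=", "more than": ">", "less than": "<"}
PATTERN = re.compile(r"\b(at least|at most|more than|less than)\s+(\w+)\s+(\w+)")


def extract_comparison(text: str):
    """Return (operator, value, unit) for the first comparison found, else None."""
    m = PATTERN.search(text.lower())
    if m is None:
        return None
    phrase, amount, unit = m.groups()
    value = int(amount) if amount.isdigit() else WORD_NUMBERS.get(amount)
    if value is None:
        return None
    return COMPARATORS[phrase], value, unit


criterion = extract_comparison("the new family dog should be at least two years old")
print(criterion)
```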
[0013] At this point, the range of search term(s) in focus can be
determined 102. If a general search is used, the range of search
terms can be all the terms of a particular language or languages;
the importance is then based on language-based measures. There are
many ways to generate search terms. A general method is outlined in
U.S. application Ser. No. 13/402,775, entitled "SYSTEMS AND METHODS
UTILIZING A SEARCH ENGINE", filed 22 Feb. 2012, Attorney Docket No.
001-P002Cl, the disclosure of which is hereby incorporated herein
by reference in its entirety. Other implementations may be based on
a specific language, and some may be restricted to a specific range
of possible uses for the search implementation.
[0014] Another way is when a focus document is being used, and
search terms are derived from such a document. This means that
there are features of the document that can be measured to
determine how important one search term is over another. Common
measures include frequency, grammar function, headings, etc., and
these can be examined in terms of the rankings and calculated for
what is considered important 103. For instance, if a single search
term contains a single object, then the ranking of that search term
is based on the importance of the object in a repository member. If
multiple search terms are used within a particular implementation,
then the relation between the search terms, such as how they are
related to each other using an external measure like a topic, can
be used to determine the importance factors.
[0015] Other relations are possible if the search terms are derived
from a focus document, such as determining requirements matching or
determining what set of files should be used to solve a particular
problem, as in the medical field where a diagnosis is being
searched against a repository of medical experiences. The output of
103 is the basis for determining the number of search terms within
the focus. Those terms that were found to be important comprise the
search term list. The inclusion of any criteria into the search
term list can be done based on implementation requirements; if an
implementation cannot recognize topics when analyzing the search
term and criteria, for instance, then the topics should be included
as search terms and any analysis based on the documents would have
to be done by non-grammatical means. If, however, the
implementation can support such grammatical analysis, then these
would generally be used as a discriminator and not as a general
search term based on the focus.
[0016] If any search terms are generated from a focus item or set
of items 104, then a variety of optional tasks can be performed.
First, the coverage range 105 can be determined. A coverage range
indicates an interval: the number of terms (such as words, numbers,
symbols, etc.) that are related to a unique term set. As an
example, a topic may be discussed for five paragraphs in an input.
A coverage range may include any number of characteristics, such as
the frequency of the individual search term within the focus item
and the locations where it is found, for example when it appears in
one particular area of a text stream or document and not in others.
[0017] Other coverage measures include the paragraphs that the
search term is in, the sentences that the search term is in, etc.
For more grammatical measures, the objects covered under a specific
subject are those that either modify the subject in some way or
indicate a characteristic of the subject, which places them in the
subject's coverage range. Topics and other information that have an
associated interval may also be used to determine a coverage range.
The coverage range within an item set needs to be established for
each item found to be related to a more important term, and it is
usually measured in words.
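As one possible sketch of the coverage measures described above, the frequency of a term, the sentences in which it appears, and the number of words in those sentences might be collected as follows. The sentence splitter and the returned field names are assumptions of this example; the grammatical and topical measures are not modeled.

```python
import re


def coverage_range(text: str, term: str) -> dict:
    """Collect simple coverage data for one search term in a focus item."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    term = term.lower()
    hits, frequency, words_covered = [], 0, 0
    for index, sentence in enumerate(sentences):
        words = re.findall(r"[a-z0-9']+", sentence.lower())
        count = words.count(term)
        if count:
            hits.append(index)            # location: which sentences contain the term
            frequency += count            # how often the term occurs
            words_covered += len(words)   # size of the covered interval, in words
    return {"frequency": frequency, "sentences": hits, "words_covered": words_covered}


focus = "The dog barked. The dog ran home. A cat slept."
cov = coverage_range(focus, "dog")
print(cov)
```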
[0018] Once the coverage range is known, the optional calculation
to determine coverage importance 106 may be used. The importance of
the range may be determined by looking at the number of words that
are in the range relative to the entire size of the item set, where
they occur, and whether the grammatical relations indicate that the
coverage range is important, such as when it is related to a main
topic in the document rather than to a smaller subtopic in a
document or message.
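A minimal sketch of such a calculation, under the assumption that coverage importance is the share of words in the range scaled by a weight for the grammatical relation, is given below. The weights and the function name are hypothetical, and the position of the words within the item is not modeled.

```python
# Assumed weights: a range tied to a main topic counts fully, a subtopic half.
RELATION_WEIGHTS = {"main_topic": 1.0, "subtopic": 0.5}


def coverage_importance(words_in_range: int, total_words: int,
                        relation: str = "subtopic") -> float:
    """Share of the item set covered by the range, scaled by its relation."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return (words_in_range / total_words) * RELATION_WEIGHTS[relation]


main = coverage_importance(7, 10, "main_topic")
sub = coverage_importance(7, 10, "subtopic")
print(main, sub)
```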
[0019] Then, the importance rating for each search term, in each
range of the focus item in which the search term is found, can
optionally be calculated 107. Importance ratings are needed when
there are multiple search terms and they are in some language
construction, like a sentence, and not a simple bag of words. For
instance, a search term that is related to the subject of a
sentence will have a different rating than a search term that is a
direct object. A rating can be linear or can weight one instance,
such as a subject, more heavily than another, such as the direct
object.
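By way of example, role-based importance ratings might be computed as below. The grammatical roles are assumed to have been supplied by a separate analyzer, and the particular weights are illustrative values rather than values from the embodiments.

```python
# Illustrative weights only: a subject is weighted above a direct object.
ROLE_WEIGHTS = {"subject": 3.0, "direct_object": 2.0, "modifier": 1.0}


def importance_ratings(term_roles):
    """Map each search term to a rating normalized so the ratings sum to 1.

    term_roles: list of (term, role) pairs produced by a grammatical analyzer.
    """
    raw = {}
    for term, role in term_roles:
        raw[term] = raw.get(term, 0.0) + ROLE_WEIGHTS.get(role, 0.5)
    total = sum(raw.values())
    return {term: weight / total for term, weight in raw.items()}


ratings = importance_ratings(
    [("dog", "subject"), ("ball", "direct_object"), ("red", "modifier")]
)
print(ratings)
```

A linear rating would instead give every role the same weight, so that each term contributes equally.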
[0020] If the result of 104 is negative, then the importance rating
for each possible search term is calculated 108. In some cases,
this is a language-based calculation, based on some division based
on grammar, or it may be based on the reason for the search, as
when an implementation is being used by a particular device that is
only interested in search terms that it can respond to and will
ignore the rest. If only a single search term is possible, then any
values that can be applied to the search term, including language
measures, can be used. For instance, if a search term is being used
to operate a device, such as finding all the CDs that are available
in a CD jukebox at a point in time, then the search term is limited
to possible titles, and may not include other features of a
language that are not used to generate a title.
[0021] Regardless of the use of a focus item or a larger set of
possible search terms based on language constraints, the
calculation to find matching threshold levels is performed next
109. The threshold levels are based on the initial search criteria,
coverage, and importance ratings, which together determine how the
final search ranking is formed and at what level an appropriate
match is deemed possible.
[0022] In some cases, this may also be affected by the types of
search allowed. If an exact search is required, then the threshold
of a match may be the frequency of how many times the exact phrase
"dog store" for instance is found in a repository member. In other
cases, where similarities of search terms are allowed, such as
synonyms, then the threshold may be affected by the closeness of
the similarity and how many words in the search term have a
similarity used to find the term within a repository member.
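The two search types above might be sketched as follows, purely as an illustration. The synonym table, the `similar_weight` value, and both function names are hypothetical; the embodiments do not prescribe how similarity is scored.

```python
# Hypothetical synonym table standing in for whatever similarity source is used.
SYNONYMS = {"dog": {"canine", "puppy"}, "store": {"shop", "market"}}


def exact_phrase_frequency(phrase: str, text: str) -> int:
    """Count whole-word occurrences of the exact phrase in the text."""
    words = text.lower().split()
    target = phrase.lower().split()
    n = len(target)
    return sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)


def similarity_score(phrase: str, text: str, similar_weight: float = 0.5) -> float:
    """Average per-word match: 1.0 if exact, similar_weight if a synonym, else 0."""
    terms = phrase.lower().split()
    if not terms:
        return 0.0
    words = set(text.lower().split())
    total = 0.0
    for term in terms:
        if term in words:
            total += 1.0
        elif SYNONYMS.get(term, set()) & words:
            total += similar_weight
    return total / len(terms)


text = "the dog store on main street is the best dog store and puppy shop in town"
exact_count = exact_phrase_frequency("dog store", text)
loose = similarity_score("dog store", "a puppy shop in town")
print(exact_count, loose)
```

In the second call both words are found only through similar forms, so the score is lower than it would be for an exact occurrence.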
[0023] Still other ratings may be based on focus item usage, and
therefore, the rating may be the number of search terms that are
found in the repository member, their importance ratings, and the
completeness of the search terms found. For instance, if the search
term set allowed similarity measures, and some of the uses of the
search term in the repository member only contained the similar
forms, then that item may be rated differently. If the importance
rating of one search term from a focus item is higher than that of
another, then the presence or absence of the higher-rated search
term will affect the search rating more, and this needs to be taken
into account when calculating the threshold level. The threshold
level itself is then a set of formulas that indicate what the
acceptable return should look like.
[0024] Up to this point, the emphasis has been on the input to
properly characterize the search terms so that a recommendation
calculation may be made. Now, a search can be run 110 for each
repository member. The repository member may be from a previous
search (if the search engine is separate from the system) or the
system performs one search and calculates the recommendation if
there is sufficient information to warrant the calculation. For
each repository member, each search term is plugged into a
threshold formula set to determine the total match for the searched
repository member 111.
[0025] If there is a total match, meaning that all the terms were
in the repository member and had the correct information ratings,
then a recommendation for the repository member is not required
112.
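One way to picture the threshold formula set is as an importance-weighted sum over the search terms. The sketch below assumes that each term has a match strength between 0 and 1 for the repository member; the function names and the weighting scheme are assumptions, not the formulas of the embodiments.

```python
def match_level(importance: dict, found: dict) -> float:
    """Importance-weighted share of the search terms present in a member.

    importance: search term -> importance rating.
    found: search term -> match strength in [0, 1] for this member.
    """
    total = sum(importance.values())
    if total == 0:
        return 0.0
    return sum(r * found.get(t, 0.0) for t, r in importance.items()) / total


def is_total_match(importance: dict, found: dict) -> bool:
    """True when every search term is fully matched in the member."""
    return all(found.get(t, 0.0) == 1.0 for t in importance)


importance = {"dryer": 0.5, "diffuser": 0.3, "curler": 0.2}
partial = {"dryer": 1.0, "diffuser": 1.0}
level = match_level(importance, partial)
print(level, is_total_match(importance, partial))
```

Because "curler" has the lowest rating, its absence lowers the match level less than the absence of "dryer" would.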
[0026] However, if there is not a perfect match, then the threshold
level may be recalculated in the optional function to get the match
threshold level 113. This means that characteristics about the
repository member may alter the search rankings and therefore may
alter the threshold level for a particular repository member. These
may include document type, such as a word-processing document
versus a spreadsheet, and other such values.
[0027] For any searched document, the importance and coverage data
may be measured 114 if the implementation requires it. This refers
to establishing the search term's placement in the search document
for each instance, then determining its coverage and importance, as
was done for the search term when it is part of a focus item. This
can include language features, grammatical features, statistical
features, and other such features that determine the importance of
the search term(s) within the repository member, independent of
their importance within the focus item that generated the search
terms, for instance.
[0028] With this information, the repository member can have its
recommendation threshold determined 115. The threshold takes into
account all the calculated match threshold levels as indicated in
109, solved using the information that has been obtained from the
repository member. This is normally a single value or set of values
that can be ranked in some linear order to determine what the
cutoff level is for a recommended search; a recommendation implies
that some information in the search term(s) is not responsive
within the repository member. For instance, all the terms with high
importance ratings are in the repository member with similarly high
importance ratings, but few of the terms with lower importance
ratings are found.
[0029] If the coverage of the important terms is also sufficiently
high, then it is possible that the threshold has been met for a
recommendation. If the calculations indicate that the threshold has
been met 116, then the recommendation can be reported 117. The
reporting may be in the form of text, may be presented in graphical
form, or may be presented in a multimedia form including speech.
The reporting may contain the features that caused the repository
member to pass, such as which search term(s) were found to be
responsive and their importance. If the repository member does not
meet either the initial threshold or the recommendation threshold,
then the next repository member is considered 118.
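The flow from running the search through reporting might be sketched as a single loop, shown here only as an illustration. Exact keyword presence is used as the match test, the optional per-member recalculations (113 to 115) are omitted, and the repository contents, ratings, and cutoff are invented for the example.

```python
def recommendation_report(repository: dict, importance: dict,
                          recommend_threshold: float) -> list:
    """Walk the repository and report total matches and recommendations.

    repository: hypothetical mapping of member name -> text.
    importance: search term -> importance rating (summing to 1 here).
    recommend_threshold: assumed cutoff on the summed ratings of found terms.
    """
    report = []
    for name, text in repository.items():                # run the search (110)
        words = set(text.lower().split())
        found = sorted(t for t in importance if t in words)
        score = sum(importance[t] for t in found)        # total match level (111)
        if len(found) == len(importance):
            report.append((name, "total match", found))  # no recommendation needed (112)
        elif score >= recommend_threshold:               # threshold met (116)
            report.append((name, "recommended", found))  # report with responsive terms (117)
        # otherwise move on to the next repository member (118)
    return report


repository = {
    "A": "hair dryer with diffuser and curler",
    "B": "hair dryer with diffuser",
    "C": "curler set",
}
importance = {"dryer": 0.5, "diffuser": 0.3, "curler": 0.2}
report = recommendation_report(repository, importance, recommend_threshold=0.7)
print(report)
```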
[0030] The use of a recommendation engine is commonly associated
with marketing and sales functions within an enterprise so that if
a product closely matches but does not meet all the specifications
of a request, the product can still be referred to the customer. It
may be that the price is lower, it has a better rebate, it has a
better characteristic than the search term originally indicated, or
it contains a feature with better parameters that the customer is
actually more interested in, even at the exclusion of other
features. A fixed threshold will not capture all such variations,
which could cause the customer to miss making a buy decision.
[0031] Another closely related use of thresholds is when a
requirements document is being used as the search term, and there
is unlikely to be any repository member that matches the entire
requirements search term list. In this case, the importance of each
search term is critical for the recommendation engine, since a
fixed threshold based on preset rules would require presetting the
rules for every requirements document. This is especially so when
grammatical analysis is not done for the initial search and the
system is using the results of another engine's search; the
recommendation engine then needs the importance of each search term
to generate an accurate reflection of the most important search
terms that should be recommended, even if some minor ones are
missing.
[0032] Another example of a use of a recommendation engine is in
planning a project. If a project is defined by one or more scope
documents, each scope document can be analyzed to locate the search
criteria, the search terms, and the importance factors. Once these
have been obtained, and if there are search terms generated from
the input, then the coverage range can be found for each search
term in the current scope document. In this example, a topic is
located for each search term in the focus, so that the search term
can be placed in the correct context by using the topic to remove
search results from a repository that do not belong. This is
helpful when a planning document contains common terms like work,
project, and others that require more information for the
recommendation to be meaningful. Then, importance calculations take
place.
[0033] While it is usually considered a better match when
everything matches exactly, in most real-world applications there
is never going to be a perfect match for every requirement. As a
result, determining what is considered more important helps the
system evaluate a repository member. Once the importance ratings
have been established, the final step is to determine the threshold
based on a search type. In this example, an object search type is
employed that has a threshold that considers a search term to be an
object and considers a match to be a substantially similar term,
even if it is not worded the same (for example, "Jack's house" and
"the house that Jack lives in" would be considered similar enough
to meet the threshold).
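As a rough illustration of an object search type, two wordings might be compared by the overlap of their content words. The stopword list, the possessive handling, and the overlap measure are all assumptions of this sketch, which stands in for a grammatical comparison of objects.

```python
import re

# Small assumed stopword list for the example.
STOPWORDS = {"the", "that", "in", "a", "an", "of"}


def content_words(phrase: str) -> set:
    """Lowercase the phrase, drop possessive endings and stopwords."""
    words = re.findall(r"[a-z]+", phrase.lower().replace("'s", ""))
    return {w for w in words if w not in STOPWORDS}


def object_similarity(a: str, b: str) -> float:
    """Overlap coefficient: shared content words over the smaller word set."""
    wa, wb = content_words(a), content_words(b)
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / min(len(wa), len(wb))


same = object_similarity("Jack's house", "the house that Jack lives in")
different = object_similarity("Jack's house", "Jill's car")
print(same, different)
```

With a threshold of, say, 0.8 on this measure, the two wordings of Jack's house would be treated as the same object while the unrelated phrase would not.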
[0034] With this information, the recommendation engine now has
enough information to examine each repository member that
constitutes the range of documents that comprise the repository. In
this example, there has been no search run first, so the
recommendation engine works with the search engine to remove
documents that do not contain enough information to be considered
even remotely a match. In this case, a repository member is
analyzed by the search engine to see if there is enough similarity
between the focus search terms and the repository member search
terms, using a keyword filter in this case. If no terms were found
in the repository member, then the next repository member is
considered. Depending on the threshold set for search, any
repository member that comes close but is not an exact match with
the focus is passed to the recommendation engine. If the
recommendation engine determines that the repository member
contains enough important search terms, based on their importance
ratings as allowed by the threshold (such as the wording example
given above), then the repository member is considered to be in the
recommended state and can be passed to the user. If a repository
member does not contain enough important information, then it is
not in the recommended state and is not passed to the user.
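The keyword filter stage of this example might look like the following sketch. The `min_hits` parameter and the whitespace tokenization are assumptions; members that survive the filter would then be handed to the recommendation engine for the importance-based decision.

```python
def keyword_prefilter(repository: dict, focus_terms: set, min_hits: int = 1) -> dict:
    """Keep only members that share at least `min_hits` terms with the focus.

    Returns a mapping of member name -> the focus terms found in that member.
    """
    kept = {}
    for name, text in repository.items():
        hits = focus_terms & set(text.lower().split())
        if len(hits) >= min_hits:
            kept[name] = hits
    return kept


scope_terms = {"bridge", "plan"}
repository = {
    "doc1": "project plan for bridge work",
    "doc2": "recipe for soup",
}
candidates = keyword_prefilter(repository, scope_terms)
print(candidates)
```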
[0035] FIG. 2 illustrates computer system 200 adapted to use the
present invention. Central processing unit (CPU) 201 is coupled to
system bus 202. The CPU 201 may be any general purpose CPU, such as
an Intel Pentium processor. However, the present invention is not
restricted by the architecture of CPU 201 as long as CPU 201
supports the operations as described herein. Bus 202 is coupled to
random access memory (RAM) 203, which may be SRAM, DRAM, or SDRAM.
ROM 204 is also coupled to bus 202, which may be PROM, EPROM, or
EEPROM. RAM 203 and ROM 204 hold user and system data and programs
as is well known in the art.
[0036] Bus 202 is also coupled to input/output (I/O) controller
205, communications adapter 211, user interface 208, and display
card 209. The I/O controller 205 connects storage devices 206, such
as one or more of flash memory, a hard drive, a CD drive, a floppy
disk drive, and a tape drive, to the computer system.
Communications adapter 211 is adapted to couple the computer system
200 to a network 212, which may be one or more of a telephone
network, a local-area (LAN) and/or a wide-area (WAN) network, an
Ethernet network, and/or the Internet. User interface 208 couples
user input devices, such as keyboard 213 and pointing device 207,
to the computer system 200. The display card 209 is driven by CPU
201 to control the display on display device 210.
[0037] Note that any of the functions described herein may be
implemented in hardware, software, and/or firmware, and/or any
combination thereof. When implemented in software, the elements of
the present invention are essentially the code segments to perform
the necessary tasks. The program or code segments can be stored in
a computer readable medium. The "computer readable medium" may
include any physical medium that can store or transfer information.
Examples of the computer readable medium include an electronic
circuit, a semiconductor memory device, a ROM, a flash memory, an
erasable ROM (EROM), a floppy diskette, a compact disk (CD-ROM), an
optical disk, a hard disk, a fiber optic medium, etc. The code
segments may be downloaded via computer networks such as the
Internet, Intranet, etc.
[0038] Embodiments described herein operate on or with any network
attached storage (NAS), storage area network (SAN), blade server
storage, rack server storage, jukebox storage, cloud, storage
mechanism, flash storage, solid-state drive, magnetic disk, read
only memory (ROM), random access memory (RAM), or any conceivable
computing device including scanners, embedded devices, mobile,
desktop, server, etc. Such devices may comprise one or more of: a
computer, a laptop computer, a personal computer, a personal data
assistant, a camera, a phone, a cell phone, mobile phone, a
computer server, a media server, music player, a game box, a smart
phone, a data storage device, measuring device, handheld scanner, a
scanning device, a barcode reader, a POS device, digital assistant,
desk phone, IP phone, solid-state memory device, tablet, and a
memory card.
* * * * *