U.S. patent application number 14/601837, filed on 2015-01-21 and published on 2015-05-14 as publication number 20150134574, relates to self-learning methods for automatically generating a summary of a document, knowledge extraction and contextual mapping.
The applicant listed for this patent is Syed YASIN. The invention is credited to Syed YASIN.
Application Number: 14/601837
Publication Number: 20150134574
Kind Code: A1
Family ID: 43901157
Inventor: YASIN; Syed
Publication Date: May 14, 2015
United States Patent Application
SELF-LEARNING METHODS FOR AUTOMATICALLY GENERATING A SUMMARY OF A
DOCUMENT, KNOWLEDGE EXTRACTION AND CONTEXTUAL MAPPING
Abstract
Advanced Machine Learning or Unsupervised Machine Learning
techniques are provided that relate to self-learning processes by
which a machine generates a sensible automated summary, extracts
knowledge, and extracts contextually related Topics, along with the
justification that explains "why they are related", automatically,
without any human intervention or guidance (backing ontologies)
during the process. Such processes also relate to generating a
360-Degree Contextual Result (360-DCR) using Auto-Summary,
Knowledge Extraction and Contextual Mapping.
Inventors: YASIN; Syed (Bangalore, IN)

Applicant:
Name | City | State | Country | Type
YASIN; Syed | Bangalore | | IN |

Family ID: 43901157
Appl. No.: 14/601837
Filed: January 21, 2015
Related U.S. Patent Documents

Application Number | Filing Date | Patent Number
13575478 | Jul 26, 2012 | 8977540
PCT/IB11/50409 | Jan 31, 2011 |
14601837 | |
Current U.S. Class: 706/11
Current CPC Class: G06F 16/93 20190101; G06F 16/345 20190101; G06N 20/00 20190101; G06F 16/285 20190101; G06N 7/00 20130101
Class at Publication: 706/11
International Class: G06N 99/00 20060101 G06N099/00; G06F 17/30 20060101 G06F017/30; G06N 7/00 20060101 G06N007/00
Foreign Application Data

Date | Code | Application Number
Feb 3, 2010 | IN | 267/CHE/2010
Claims
1. A self-learning method for automatically generating a summary of
a document without human intervention, said method comprising acts
of: extracting Important Words (IW) of the document based on the
incremental order of their occurrence; listing the IW's extracted
in the order of highest Word Group (WG), wherein the highest word
group is the combination of the maximum number of words that go
together as one word; and for each IW, starting in the order of
highest word group, analyzing every sentence in the document to
determine the presence of the IW and thereafter extracting all the
sentences having the corresponding IW as important sentences (IS),
after eliminating redundancies, to generate the auto-summary for
the document.
2. The method as claimed in claim 1, wherein identifying the WG and
IW comprises using word articles and punctuation marks from the
document.
3. A self-learning method for automatically extracting knowledge
from a given set of documents without human intervention, said
method comprising acts of: extracting Important Words (IW) and
their corresponding Important Sentences (IS), and Topics (T), of
the documents in a predetermined order; eliminating duplicates of
each extracted IW and its corresponding sentences; and clustering
the IS's and Topics (T) in the list based on the extracted IW's as
a "Contextual-Topical Cluster" and a "Knowledge-Cluster" to extract
knowledge and related contextual Topics from the given set of
documents.
4. The method as claimed in claim 3, wherein defining a topic for
each document is done by comparing each IW in the document with its
file name and Title name; if any of the IW's matches, then that IW
is defined as a Topic.
5. The method as claimed in claim 4, wherein the IW with the
highest frequency of occurrence in the document is defined as the
topic of the document.
6. The method as claimed in claim 3, wherein the duplicates of each
extracted IW and its corresponding sentences are eliminated using a
hashing technique.
7. A self-learning method for automatically displaying 360-degree
Contextual Search Results without human intervention, said method
comprising acts of: generating a Topic (T), Important-Words (IW),
Important Sentences (IS) and an Auto-Summary (SY) for a given
document; storing the generated Auto-Summary as a field value
during indexing, along with the corresponding Topic and Content of
the document, in a Master-Index; extracting a Topic List (TL) by
processing the Master-Index and thereafter generating 360-degree
Contextual Mapping (360-DCM) into a 360-DCM cluster; extracting
Knowledge from the document into a Knowledge Extraction (KE)
cluster; and analyzing a user query to identify a Topic in the TL
and the corresponding 360-DCM cluster to return related Topics
along with the relationship map, wherein the Master-Index returns
search results along with an auto-summary for each result, and the
KE cluster returns relevant knowledge for the search query to
display 360-degree Contextual Search Results.
8. The method as claimed in claim 7, wherein generating the
auto-summary comprises acts of: extracting Important Words (IW) of
the document based on the incremental order of their occurrence;
listing the IW's extracted in the order of highest Word Group (WG);
splitting the document into sentences and storing the sentences in
sequential order as an Array of Sentences (AS); for each IW,
starting in the order of highest word group, analyzing every
sentence in the AS to determine the presence of the IW and
thereafter extracting all the sentences having the IW as important
sentences (IS) to eliminate redundancies; and removing the
extracted sentences from the AS and the corresponding IW from the
list of IW's to generate the auto-summary for the document.
9. The method as claimed in claim 7, wherein extracting Knowledge
from the document into the Knowledge Extraction (KE) cluster
comprises acts of: extracting Important Words (IW) and their
corresponding Important Sentences (IS), and Topics (T), of the
documents in a predetermined order; eliminating duplicates of each
extracted IW and its corresponding sentences; and clustering the
IS's and Topics (T) in the list based on the extracted IW's or the
Topic as a "Contextual-Topical Cluster" and a "Knowledge-Cluster"
to extract knowledge and related contextual Topics from the given
set of documents.
10. The method as claimed in claim 7, wherein generating the
360-degree Contextual Mapping comprises acts of: indexing one or
more documents; storing the topics identified for each document in
a predetermined order as a Topical List (TL) during the indexing
process and removing duplicate topics from the TL; extracting a
predefined number of results for each Topic in the TL by searching
one Topic at a time in the index; extracting the corresponding
Topic and Content for each extracted result and storing them in a
predetermined order as a Result-List (RL) in temporary storage;
analyzing the RL for each topic to extract Related Topics;
analyzing the corresponding document Content of the Topic to
extract "why they are related" phrases from the content; and
clustering the resultant "Related Topics" along with their
respective sentences to generate the 360-degree Contextual
Mapping.
Description
CROSS-REFERENCE TO THE RELATED APPLICATIONS
[0001] This application is a divisional application of U.S. patent
application Ser. No. 13/575,478 filed Jul. 26, 2012, which is a
U.S. National Stage Application claiming the benefit of prior filed
International Application No. PCT/IB2011/050409 filed Jan. 31,
2011, in which the International Application claims a priority date
of Feb. 3, 2010 based on prior filed Indian Application No.
267/CHE/2010, the entire contents of which are incorporated herein
by reference.
TECHNICAL FIELD
[0002] Embodiments of the present disclosure relate to Advanced
Machine Learning or Unsupervised Machine Learning techniques. More
particularly, embodiments relate to a self-learning process by
which a machine generates a sensible automated summary, extracts
knowledge, and extracts contextually related Topics, along with the
justification that explains "why they are related", automatically,
without any human intervention or guidance (backing ontologies)
during the process.
BACKGROUND
[0003] No search engine today brings the justification or
description answering "Why this relation?" while presenting
contextual Topics or Search Refinements for a user query during the
process of search. Users wonder: why or how is this Topic related?
Also, knowledge representation, as opposed to mere information
retrieval based on user queries, is the key to the next generation
of search. This algorithm brings in a 360-degree contextual
knowledge representation, apart from being capable of answering
specific questions.
[0004] Currently, most search engines perform mere keyword-based
information extraction using relevance algorithms. There is a huge
demand for overall, or 360-degree, contextual knowledge
representation in the search industry, which is the future of
search. In a nutshell, we have built, and are in the process of
building, a revelation of a 3rd & 4th generation search engine.
[0005] In light of the foregoing discussion, there is a need for a
method to solve the above mentioned shortcomings in the search
industry.
SUMMARY
[0006] The shortcomings of the prior art are overcome and
additional advantages are provided through the provision of a
method and a system as described in the description.
[0007] The present disclosure overcomes the limitations of existing
techniques by providing Advanced Machine Learning or Unsupervised
Machine Learning techniques, which use a mathematical approach to
identify and extract knowledge concepts in a given set of documents
(unstructured data). This approach does not necessarily need
training data to help make decisions when building the 360-degree
contextual map, but rather has the ability to learn statistically
from the data itself. Given a set of natural documents, web pages
or anything similar, the algorithm is elegant enough to organize
the knowledge concepts automatically without any human guidance
during the process.
[0008] In one embodiment, the technology disclosed in the present
disclosure provides a method/process that is elegant enough to
sensibly build an Auto-Summary of a given document completely
automatically (self-learning), using the important words identified
from the document.
[0009] In one embodiment, the technology disclosed in the present
disclosure is a novel and inventive Text-Analytics framework, which
extracts knowledge completely automatically from information
indexed or processed (self-learning). The solution proposed in the
present disclosure brings in 360-degree contextual results, which
are highly effective from a usability perspective.
[0010] In one embodiment, the present technology or process can be
used in Search Engines, both Web Search and Enterprise Search. It
can also be used in online business (AdWords, AdSense) and in the
summarization of documents, web pages, etc. More importantly, LPeSr
brings 360-degree contextual mapping of knowledge and contextual
clusters.
[0011] The foregoing summary is illustrative only and is not
intended to be in any way limiting. In addition to the illustrative
aspects, embodiments, and features described above, further
aspects, embodiments, and features will become apparent by
reference to the drawings and the following detailed
description.
BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS
[0012] The novel features and characteristics of the disclosure are
set forth in the appended claims. The embodiments of the disclosure
itself, however, as well as a preferred mode of use, further
objectives and advantages thereof, will best be understood by
reference to the following detailed description of an illustrative
embodiment when read in conjunction with the accompanying drawings.
One or more embodiments are now described, by way of example only,
with reference to the accompanying drawings wherein like reference
numerals represent like elements and in which:
[0013] FIG. 1 is a flowchart illustrating a methodology to generate
an automated summary of the document and to extract knowledge
concepts from the set of given documents, in accordance with an
exemplary embodiment.
[0014] FIG. 2 is a flowchart illustrating a methodology to generate
an automated summary of the document, in accordance with an
exemplary embodiment.
[0015] FIG. 3 is a flowchart illustrating a methodology to generate
Knowledge-Extraction (KE) based on Text-Analytics of a given set of
documents, in accordance with an exemplary embodiment.
[0016] FIG. 4 is a flowchart illustrating a methodology to generate
360-Degree Contextual Mapping (360-DCM) Cluster, in accordance with
an exemplary embodiment.
[0017] FIG. 5 is a flowchart illustrating a methodology to generate
360-Degree Contextual Results (360-DCR), in accordance with an
exemplary embodiment.
[0018] FIG. 6 is an exemplary snapshot of a web page highlighting
the auto-summary created by the present technology for given
documents and/or web pages.
[0019] FIG. 7 is an exemplary snapshot of a web page highlighting
the Knowledge-Extraction (KE) created by the present technology for
given documents and/or web pages.
[0020] FIG. 8 is an exemplary snapshot of a web page displaying an
exploded view of the contextually related topic link that gives
information on why and how the selected topics are contextually
related.
[0021] FIG. 9 is an exemplary snapshot of a web page displaying
360-Degree Contextual Results (360-DCR) for the selected Topic, in
accordance with an exemplary embodiment.
[0022] The figures depict embodiments of the disclosure for
purposes of illustration only. One skilled in the art will readily
recognize from the following description that alternative
embodiments of the structures and methods illustrated herein may be
employed without departing from the principles of the disclosure
described herein.
DETAILED DESCRIPTION
[0023] The foregoing has broadly outlined the features and
technical advantages of the present disclosure in order that the
detailed description of the disclosure that follows may be better
understood. Additional features and advantages of the disclosure
will be described hereinafter which form the subject of the claims
of the disclosure. It should be appreciated by those skilled in the
art that the conception and specific embodiment disclosed may be
readily utilized as a basis for modifying or designing other
structures for carrying out the same purposes of the present
disclosure. It should also be realized by those skilled in the art
that such equivalent constructions do not depart from the spirit
and scope of the disclosure as set forth in the appended claims.
The novel features which are believed to be characteristic of the
disclosure, both as to its organization and method of operation,
together with further objects and advantages will be better
understood from the following description when considered in
connection with the accompanying figures. It is to be expressly
understood, however, that each of the figures is provided for the
purpose of illustration and description only and is not intended as
a definition of the limits of the present disclosure.
[0024] Exemplary embodiments of the present disclosure relate to
Latent Precis Extraction and Synecdoche Representation (LPeSr), a
self-learning methodology/process designed to extract important
words automatically from a given document or web page. It then
builds an automated, machine-generated summary that is not mere
chopping-off or truncation of paragraphs, but a genuinely sensible
summary that ensures the important sentences are captured; hence
the word "Precis", or "a summary". Based on this first level of
processing, the method then maps the overall extracted information
into "Knowledge Entities" using a linguistic framework built on
Natural Language Processing (NLP). During the final stage, the
method brings in the description of the contextually related
clusters, explaining "Why this relation?". In short, a "360-degree
contextual representation of Knowledge" or "360-degree contextual
map" is achieved, as opposed to mere information retrieval; hence
the word "Synecdoche" (a part of something that is used to refer to
the whole thing).
[0025] The aforesaid features are explained in detail herein below
with the help of examples for better understanding. However, these
examples should not be construed as a limitation on the scope of
the present technology.
[0026] Referring now to FIG. 1, which illustrates a high-level
snapshot of the steps used in generating an automated summary of a
document and extracting knowledge concepts from a given set of
documents. The basic step starts with processing the documents or
web pages using corresponding parsers to extract the textual
information. The textual content within the document is the input
to the algorithm. The instant methodology then processes the
content of each document to extract the important words within it
automatically. A standard procedure of finding the frequency of
each word in the document can be used to extract the high-frequency
words after filtering the word articles from the document; the
present technology instead makes use of an elegant technique that
is considerably more advanced and efficient in terms of quality.
These words make real sense, from a practical standpoint, for use
in generating the automated summary of the document and extracting
knowledge concepts from the given set of documents.
[0027] As seen from FIG. 1, the end result of the instant
technology disclosed in the present disclosure is to generate a
sensible auto-summary of the document and to extract key concepts,
or knowledge concepts, from the summarization process. The
following sections therefore get into the details of the core
aspects of the auto-summarization process and the extraction of
knowledge concepts.
[0028] Let's quickly look at the technique used as a base for the
extraction of Important Words (IW) with High-Frequency Occurrence.
In one embodiment, the words that go together are identified first;
for example, "Saudi Arabia" is one word although made up of two
words. This is termed "Saudi Arabia" = m(2), meaning two words
making sense as one word. Therefore, in a given document the method
finds such words that go together in the incremental order of their
occurrences, which means it first finds the highest order, like
m(5), then m(4), then m(3), and so on. The general way to define
this is "m(n)", where "n" is the number of words in the group that
make one word as a whole, and "m" is a constant, as it is a
representation of the word as a whole.
[0029] In one embodiment, the procedure applied for extracting the
high-frequency words of the document is as follows: [0030] a. m(n),
m(n-1), m(n-2), m(n-3) . . . m(n-p) [0031] b. Given the above, "n"
could be any positive integer; typically, the value of "n" is 4 or
5 in practical scenarios, and "n-p" is always 2.
[0032] For example, suppose the document content is something like:
[0033] "The Kingdom of Saudi Arabia, commonly known as Saudi Arabia
is the largest Arab country of the Middle East. It is bordered by
Jordan and Iraq on the north and northeast, Kuwait, Qatar and the
United Arab Emirates on the east, Oman on the southeast, and Yemen
on the south. It is also connected to Bahrain by the King Fahd
Causeway . . . "
[0034] To keep the explanation simple, the value of "n" is
considered to be "2" (value of p = 0); therefore, the words that go
together are identified by the following procedure:
[0035] Consider the first two words in the string, i.e. "The
Kingdom". Since "The" is a word article, this combination is
skipped. The next combination, "Kingdom of", is skipped too because
"of" is a word article, as is "of Saudi". The next combination,
"Saudi Arabia", seems to make sense to a human but does not yet
make sense to a machine; however, since it contains no word article
or punctuation mark, it is recorded. Assuming this to be a valid
word, the method processes the other words similarly until it finds
the same combination again; if it is found, then "Saudi Arabia"
becomes a valid word of combination 2. It follows that the
probability of such a combination depends upon at least one
co-occurrence of the word.
[0036] Wherever such a combination is found in the contents of the
document, the method extracts the word, records its frequency of
occurrence, and replaces it with a void or null value. Having
mentioned this, let's now assume that the initial value of n = 4;
then the first four-word combination is considered and compared
with the consecutive combination, e.g. "The Kingdom of Saudi" &
"Kingdom of Saudi Arabia" are compared, and so on. Once the
document reaches its end, the value of n becomes n-1, which is 3.
The process is repeated on the same document, now also keeping in
mind the void or null values that might have replaced any valid
combination of 4-word groups. This whole process is repeated until
the value of p = 0.
[0037] Now, the only leftover words are obviously single words;
they are processed to find the highest-frequency words. The
combination of Group(4), Group(3), Group(2) and Group(1) words
gives a set of valid words that make up the document; in other
words, these words are important to the document.
[0038] Therefore, in a practical scenario, if a given document has
about 1500 to 2000 words, then given a very comprehensive list to
eliminate "common words", or simply "Word-Articles" (is, it, the,
do, should, . . . ), the words in the document are reduced by more
than 50%. A huge list of these common words for English, and for
other languages, is freely available (we use one such list and
fine-tune it if any words are missed).
[0039] In one embodiment, these common words are filtered out only
during the last stage of the process, i.e. during the processing of
single words. Initially, for Group (4), Group (3) and Group (2), we
very much use this common-word list to figure out "Word-Groups"
(WG), as explained earlier. Once the method is through the
procedure of extracting WG's and single words, they typically come
down to 15 to 20 words for a document of about 2000 words. Now,
these words are used in a controlled way to develop a sensible
automated summary for the given document.
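The word-group procedure described above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `extract_word_groups`, the tiny stop-list, and the co-occurrence threshold of 2 are all choices made for the sketch.

```python
import re
from collections import Counter

# A tiny stop-list standing in for the "word articles" list; the disclosure
# assumes a much larger, freely available common-word list.
WORD_ARTICLES = {"the", "of", "is", "it", "by", "and", "on", "to", "a", "as"}

def extract_word_groups(text, n_max=4):
    """Find word groups m(n) down to m(2) that recur in the text, then
    high-frequency single words, roughly as the disclosure sketches."""
    tokens = re.findall(r"[A-Za-z]+", text)
    important = Counter()
    for n in range(n_max, 1, -1):              # m(n), m(n-1), ..., m(2)
        grams = Counter()
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # skip windows touching a word article or an already-voided slot
            if any(w == "" or w.lower() in WORD_ARTICLES for w in gram):
                continue
            grams[" ".join(gram)] += 1
        for gram, freq in grams.items():
            if freq >= 2:                      # at least one co-occurrence
                important[gram] = freq
                tokens = _void(tokens, gram.split())  # replace with null
    # leftover single words: high-frequency non-article words
    singles = Counter(w for w in tokens if w and w.lower() not in WORD_ARTICLES)
    important.update({w: f for w, f in singles.items() if f >= 2})
    return important

def _void(tokens, gram):
    """Null out every occurrence of gram in the token stream."""
    out, n, i = list(tokens), len(gram), 0
    while i <= len(out) - n:
        if out[i:i + n] == gram:
            out[i:i + n] = [""] * n
            i += n
        else:
            i += 1
    return out
```

Run on the "Saudi Arabia" sample paragraph with n_max = 2, the sketch records "Saudi Arabia" as a valid 2-word group because it co-occurs, while one-off bigrams are discarded.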
[0040] In one embodiment, the following hypothesis is made, on the
basis of which the machine/methodology judges the logic for
developing an automated summary & knowledge extraction for a given
document:
[0041] 1. Every sensible document has a meaning, a message that it
illustrates to its readers.
[0042] 2. There are a set of words that are important, around which
"common words" or "word-articles" attach to make sentences that
describe these IW. Refer to the paragraph again:
[0043] "The Kingdom of Saudi Arabia, commonly known as Saudi Arabia
is the largest Arab country of the Middle East. It is bordered by
Jordan and Iraq on the north and northeast, Kuwait, Qatar and the
United Arab Emirates on the east, Oman on the southeast, and Yemen
on the south. It is also connected to Bahrain by the King Fahd
Causeway . . . "
[0044] The highlighted (bold) words are common words. Now, if we
analyze the above paragraph, point 2 becomes clear.
[0045] 3. The important words and common words together form
sentences; typically, sentences and paragraphs are the building
blocks of a document.
[0046] 4. Each sentence is separated by a "period" (.) symbol.
[0047] 5. Therefore, every sentence that contains the IW's becomes
important.
[0048] 6. Also, every other sentence that is close to this
particular sentence can also be strongly assumed to be important.
[0049] 7. Therefore, we just need a technique to elegantly extract
these important sentences and join them to make a very sensible
summary.
[0050] In one embodiment, the step-by-step process used in the
generation of the Auto-Summary (Precis) is explained here in
detail. Firstly, the Important Words (IW's) are extracted using the
technique explained hereinabove. The next step is to list the IW's
extracted in the order of highest WG's. The given document is split
into sentences, using the period symbol as the mark for the split.
All of these sentences are stored in a storage medium in sequential
order; the storage medium includes, but is not limited to, an
array, a database or any other suitable medium. Let's call this the
"Array of Sentences", or "AS". For each IW, starting in the order
of highest WG's, every sentence in the AS is analyzed to find
whether said IW is present. If the IW is found, that particular
sentence is extracted; let's call it "S1". This sentence is then
removed from the AS, so the AS becomes (AS - S1). The corresponding
IW is also removed from the list of IW's. For example, if there are
10 IW's (IW1, IW2, IW3, . . . IW10), we would now have 9 IW's (IW2,
IW3, IW4, . . . IW10). The reason why we might want to remove an IW
after a match is found is to avoid repetitions or redundancies
during the final stage of auto-summary creation.
[0051] In one embodiment, the above process is repeated for all
other consecutive IW's. As a result, for say about 10 IW's, the
process would have extracted about 10 matching sentences (S1, S2,
S3 . . . S10). The combination of these forms a very sensible
summary of the given document in the real world; each sentence,
however, is appropriately separated by consecutive periods (about 4
to 5) to give an obvious indication that these sentences are
snippets of the document that have been joined. Therefore, the
resultant auto-summary (SY) → (S1, S2, S3 . . . S10) is a summation
of Important-Sentences (IS's) = Auto-Summary of the document, as
illustrated in FIG. 2. FIG. 6 is an exemplary snapshot of a web
page highlighting the auto-summary created by the present
technology for given documents and/or web pages.
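The auto-summary loop just described (match an IW, pull the sentence, shrink both the AS and the IW list) can be sketched as follows. The function name and the sentence-splitting regex are assumptions for illustration; the joining with consecutive periods follows the description above.

```python
import re

def auto_summary(text, important_words):
    """Build the auto-summary (SY): for each IW, highest word group
    first, pull the first remaining sentence containing it, then remove
    both the sentence and the IW to avoid redundancy."""
    # split on the period symbol, keeping sequential order (the AS array)
    sentences = [s.strip() for s in re.split(r"\.\s+", text) if s.strip()]
    # highest word group first: groups with more words sort earlier
    iw_list = sorted(important_words, key=lambda w: -len(w.split()))
    extracted = []
    for iw in iw_list:
        for s in sentences:
            if iw.lower() in s.lower():
                extracted.append(s)
                sentences.remove(s)   # AS becomes (AS - S1)
                break                 # the IW is consumed after one match
    # join the snippets with consecutive periods to mark them as excerpts
    return " .... ".join(extracted)
```

For a three-sentence document and the IW's ["Saudi Arabia", "Bahrain"], the sketch returns the two matching sentences joined as snippets, skipping the sentence that matches no IW.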
[0052] In another embodiment, the present disclosure provides
details about Knowledge-Extraction (KE) based on Text-Analytics of
a given set of documents. Based on the hypothesis mentioned
earlier, we now look into the process/methodology used for
extracting knowledge from a given set of documents.
[0053] It is understood that for a given n number of documents,
each document would have its corresponding IW's and Auto-Summary
(SY). Firstly, a Topic is defined for each document based on its
IW's and the combination of filename & Title name. This is done by
comparing each IW in a document with its filename & Title name; if
any of the IW's matches, then that IW is defined as the Topic, else
the IW with the highest frequency is defined as the Topic.
Sometimes the Topic can simply be the Title name, if it is
consistent across the given documents (if we manually analyze and
are content with the Title being the Topic, then we simply use the
Title as the Topic; in most cases this is true, but it involves
manual intervention).
[0054] Therefore, we now have, for each Document (D1), its
corresponding Topic (T1), Important-Words (IW's), its corresponding
Auto-Summary (SY1) and, more importantly, the Important-Sentences
(IS's) that were extracted to build the auto-summary, i.e. D1 = T1,
IW's, SY1, IS's. Therefore, for a given set of n documents, we
would have their corresponding Topics, Auto-Summaries,
Important-Words and Important-Sentences.
[0055] Now the data is analyzed statistically to extract
Knowledge-Clusters and corresponding Topic-Clusters that are
contextually related. For the given data set of n documents, the
list of IW's and their corresponding Important-Sentences (IS's) and
Topics (T's) is extracted in order (into an array, database or any
other suitable medium). Now, each IW and its corresponding sentence
is hashed to generate a hash code (H), which will be an integer
number; the hash code is associated with the IW's in the list in
the same order. In one embodiment, hashing is primarily used to
eliminate any redundancies during the KE process: since the same
hash code is generated for the same sentence, duplicates can be
removed, i.e. IW + IS = hash code, and no two hash codes in the
list will be the same after filtering duplicates. Hashing is used
only to facilitate the removal of redundancies; any other technique
can also be used as an alternative.
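The hash-based de-duplication step can be sketched as below, using Python's built-in `hash` on the (IW, IS) pair as a stand-in for whatever hashing technique an implementation would choose; the function name is an assumption for illustration.

```python
def dedupe_pairs(pairs):
    """Remove duplicate (IW, IS) pairs: the same sentence for the same
    important word hashes to the same code, so only the first survives."""
    seen = set()
    out = []
    for iw, sentence in pairs:
        code = hash((iw, sentence))   # IW + IS -> hash code H
        if code not in seen:
            seen.add(code)
            out.append((iw, sentence))
    return out
```

After this pass, no two retained pairs share a hash code, matching the "no two hash codes in the list would be the same" property described above.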
[0056] Once the duplicates are eliminated, the IS's & Topics (T's)
are grouped, or clustered, in the list based on the IW's.
Therefore, for each IW there would be "m" Important-Sentences
(IS's) & "m" Topics that might have been extracted. The Topics
within each IW cluster can be defined as contextually related:
based on the hypothesis mentioned earlier, if the sentence
containing the IW is important, then obviously the Topics of the
relative IS's from other documents for the same IW shall be
contextually related. Now we have two clusters for a given IW:
"Contextual-Topical Clusters" and "Knowledge-Clusters". The
aforesaid process is clearly illustrated in FIG. 3. In another
embodiment, it is possible to group based on Topic instead of
grouping based on IW; this depends on the scenario & requirement
that one is trying to address. FIG. 7 is an exemplary snapshot of a
web page highlighting the Knowledge-Extraction (KE) created by the
present technology for given documents and/or web pages.
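The grouping into the two clusters can be sketched as a simple grouping pass over (IW, IS, Topic) records; `build_clusters` and the record layout are assumptions for the sketch, not the patented data structures.

```python
from collections import defaultdict

def build_clusters(records):
    """Group (IW, IS, Topic) records by IW into a Knowledge-Cluster
    (the sentences) and a Contextual-Topical Cluster (the topics)."""
    knowledge = defaultdict(list)   # IW -> Important-Sentences (IS's)
    topical = defaultdict(set)      # IW -> contextually related Topics
    for iw, sentence, topic in records:
        knowledge[iw].append(sentence)
        topical[iw].add(topic)
    return knowledge, topical
```

Grouping by Topic instead of by IW, as the alternative embodiment suggests, would just swap the grouping key in the same loop.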
360-Degree Contextual Mapping (Synecdoche)
[0057] The basic philosophy of the present disclosure is to bring
"Knowledge, as opposed to mere Information" during the process of
retrieving results. Although the process is very laborious and
involves huge computation, the end results are simply amazing. The
next generation of search is definitely going to be in this
direction. Let's understand how this is achieved.
[0058] Note that, based on the techniques explained earlier, we
have certain attributes associated with each document after the
first two levels of processing: Topic (T), Important-Words (IW's),
Auto-Summary (SY) and Important-Sentences (IS's). To achieve
360-Degree Contextual-Mapping (360-DCM), the Topics (T's) & the
Data or Content (C) of the document itself play a vital role. There
is a certain way in which this is achieved; please refer to the
explanation below.
[0059] A given set of "n" documents has corresponding Topics (Tn)
and Content (Cn) associated with it. These documents are indexed,
or rather processed, in a very specific way. While running an index
on the documents, two separate values or fields are stored in the
index: the Topic of the document and the Document Content. During
the indexing process, the Topic identified for each document is
stored in a storage medium in a predetermined order; the storage
medium includes, but is not limited to, an array, a database or any
other suitable medium. Let's call this the "Topical-List" or "TL".
Since the TL may contain duplicates, filtering these duplicates
results in a list of non-redundant Topics. For each Topic in the
TL, the search index is queried one Topic at a time, and a
predefined set of results is extracted. The predefined threshold on
the number of results depends upon the size of the data or index;
typically, the first 50 to 150 results could be extracted. For each
result, the corresponding Topic and Content are extracted and
stored in a predetermined order. Let's call this the "Result-List"
or "RL".
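The TL and RL construction can be sketched against a toy in-memory index of (Topic, Content) records. A real system would query a search index such as a Master-Index; here a list comprehension stands in for the index hit, and the function name and `top_k` parameter are assumptions for the sketch.

```python
def build_topic_and_result_lists(index, top_k=50):
    """Build the Topical-List (TL) from an index of (topic, content)
    records, then, for each topic, pull up to top_k matching results
    into its Result-List (RL)."""
    # TL: topics in indexing order, duplicates filtered out
    tl, seen = [], set()
    for topic, _ in index:
        if topic not in seen:
            seen.add(topic)
            tl.append(topic)
    # RL per topic: every indexed document whose content mentions it
    rl = {}
    for topic in tl:
        hits = [(t, c) for t, c in index if topic.lower() in c.lower()]
        rl[topic] = hits[:top_k]
    return tl, rl
```

The `top_k` cut-off plays the role of the predefined threshold (the "first 50 to 150 results") described above.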
[0060] Let's assume that the Topic from the TL that hit the Index is
"Obama"; we get the corresponding RL. Now, two things are analyzed.
Firstly, if the RL contains Topics that match each other (for
example, if there are at least two occurrences of, say, the Topic
"Hillary"), those Topics are extracted as being related to the Topic
"Obama". Secondly, the corresponding Document Content of those
Topics is analyzed to find the sentences that contain the word
"Obama" (the same sentence-splitting technique used in the
auto-summary creation process is used here to locate this important
word). Refer to FIG. 4. Preferably, those sentences that contain
both the word "Obama" and the Topic of the document are selected.
For example, if the Topic of the document from the RL being
processed is, say, "Hillary", then the sentence extracted would be,
say, "Hillary Diane Rodham Clinton is the 67th United States
Secretary of State, serving in the administration of President
Barack Obama". This kind of structure supports the "Why this
Relationship?" or "How is this Topic Related?" functionality.
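The two analyses above can be sketched in Python as follows. This is an illustrative simplification, not the disclosed implementation: the sentence split is a plain period-based split, whereas the disclosure reuses its own auto-summary splitting technique, and the function names are hypothetical.

```python
from collections import Counter
import re

def related_topics(rl, min_occurrences=2):
    """First analysis: Topics occurring at least twice among the RL
    results are extracted as related to the query Topic."""
    counts = Counter(topic for topic, _ in rl)
    return [t for t, n in counts.items() if n >= min_occurrences]

def justification_sentences(rl, query_topic):
    """Second analysis: prefer sentences that contain both the query
    Topic (e.g. "Obama") and the result document's own Topic."""
    out = {}
    for topic, content in rl:
        for sent in re.split(r"(?<=[.!?])\s+", content):
            if (query_topic.lower() in sent.lower()
                    and topic.lower() in sent.lower()):
                out.setdefault(topic, []).append(sent.strip())
    return out
```

The sentences collected this way are what back the "Why this Relationship?" explanation shown to the user.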
[0061] In one embodiment, the aforesaid process is repeated until
all the Topics in the TL are exhausted. The resultant "Related
Topics", along with their respective sentences that justify the
relationships, are stored in a cluster. For example, the cluster
includes, but is not limited to, an array, a database or any other
suitable medium. This is represented as the "360-Degree
Contextual-Map" (360-DCM) of a given Topic, which describes "a part
of something that is used to refer to the whole thing", which is
nothing but a Synecdoche Representation.
[0062] This functionality is extremely helpful during search: the
user gets the required information along with contextually related
Topics and an explanation of "Why or How are these Related?". As an
example, if the user query is "Heart Attack", then apart from the
regular results the query hits the 360-DCM, and if there is a
cluster for the Topic "Heart Attack", the Related Topics it might
return could be Thrombolytic Therapy, Coronary Artery Spasm,
Atherosclerosis and Unstable Angina. Apart from this, if the user
clicks on the link "Why or How are these Related?", then the
following information would be displayed:
[0063] Thrombolytic Therapy
Those who die from heart attacks generally die within 1 hour from
the initial onset of symptoms and sometimes before they get to the
hospital. For a person having an acute heart attack, tPA works by
dissolving a major clot quickly. The clot is most likely blocking
one of the coronary arteries that normally allows blood and oxygen
to get to the heart muscle.
[0064] health.allrefer.com/health/thrombolytic-therapy . . .
[0065] Coronary Artery Spasm
Coronary artery spasm is a temporary, sudden narrowing of one of
the coronary arteries (the arteries that supply blood to the
heart). In many people, coronary artery spasm may occur without any
other heart risk factors (such as smoking, diabetes, high blood
pressure, and high cholesterol). If the spasm lasts long enough, it
may even cause a heart attack. Treatment: The goal of treatment is
to control chest pain and prevent a heart attack.
[0066] www.nlm.nih.gov/medlineplus/ency/ . . .
[0067] Atherosclerosis
If the coronary arteries become narrow, blood flow to the heart can
slow down or stop. This can cause chest pain (stable angina),
shortness of breath, heart attack, and other symptoms. This is a
common cause of heart attack and stroke. If the clot moves into an
artery in the heart, lungs, or brain, it can cause a stroke, heart
attack, or pulmonary embolism.
[0068] www.nlm.nih.gov/medlineplus/ency/ . . .
[0069] Unstable Angina
Unstable angina is a condition in which your heart doesn't get
enough blood flow and oxygen. It is a prelude to a heart attack.
This causes arteries to become less flexible and narrow, which
interrupts blood flow to the heart, causing chest pain. The chest
pain: occurs without cause (for example, it wakes you up from
sleep); lasts longer than 15-20 minutes; responds poorly to a
medicine called nitroglycerin; and may occur along with a drop in
blood pressure or significant shortness of breath. People with
unstable angina are at increased risk of having a heart attack.
[0070] www.nlm.nih.gov/medlineplus/ency/ . . .
[0071] An exemplary snapshot of the web page displaying an exploded
view of the contextually related topic link, which gives information
on why and how the selected topics are contextually related, is
illustrated in FIG. 8.
Summing-Up all the Features to Display 360-Degree Contextual
Results
[0072] The generation of Auto-Summary (Precis), Text-Analytics,
Knowledge Extraction and 360-Degree Contextual Mapping (Synecdoche)
techniques are explained in detail above. Using all the above steps,
the process analyzes and indexes the data in such a way that it
facilitates the retrieval of search results that portray "360-Degree
Contextual Results" for the search query, as illustrated in FIG. 5.
[0073] As seen from FIG. 5, all the processes are collated together
to bring in the 360-DCR; this is achieved in the following way:
[0074] For the given data, the corresponding Topic (T),
Important-Words (IW's), Important-Sentences (IS's) and Auto-Summary
(SY) are generated for each document as explained earlier. The
Auto-Summary is stored as a field value during indexing, along with
the corresponding Topic and Content of the document; we call this
the Master-Index, and this Index is used for displaying search
results. Since the auto-summary is a field value, every result will
have a summary of the entire document, which helps the user get a
quick overview of each result without actually having to visit the
content page. While processing the Master-Index, the Topical-List is
extracted. The TL is later run against the Master-Index to extract
the 360-DCM clusters, as explained earlier. Knowledge is extracted
into KE clusters, as explained earlier.
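The Master-Index construction can be sketched as follows. This is a hedged illustration only: the index is modelled as a list of records, and `generate_summary` is a trivial placeholder for the auto-summary process described earlier; both names are assumptions, not from the disclosure.

```python
import re

def generate_summary(content, max_sentences=2):
    """Placeholder: first sentences stand in for the real auto-summary."""
    sents = re.split(r"(?<=[.!?])\s+", content)
    return " ".join(sents[:max_sentences])

def build_master_index(docs):
    """docs: iterable of (topic, content) pairs. Each record stores the
    Topic, Content and Auto-Summary as fields, so every search result
    carries a document summary without a visit to the content page."""
    return [{"topic": t, "content": c, "summary": generate_summary(c)}
            for t, c in docs]
```

A production system would store these fields in a real search index rather than a Python list; the point is only that the summary travels with each record as a stored field.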
[0075] For a given user query, the process analyzes it to see if
such a Topic exists in the TL; if so, the corresponding cluster from
the 360-DCM clusters returns related Topics along with the
relationship map. The Master-Index returns search results along with
an auto-summary for each result. The KE cluster is analyzed to see
if such an Important-Word (IW) exists; if so, the relevant Knowledge
gathered about the search query is highlighted. Therefore, in a
nutshell, the solution is three-fold: the end-users get Information
along with a sensible summary of each document, they get Knowledge
pertaining to their query, and, last but not least, they also get
contextually related Topics listed with the relationship map, which
in itself is a separate result set that is nothing but an advanced
level of bringing in Query-Expansion-based results as part of the
contextual results.
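The three-fold query-time combination can be sketched as follows. The data structures (a list of Master-Index records, plus dicts keyed by Topic for the 360-DCM and KE clusters) are assumptions chosen for illustration, not the disclosed storage format.

```python
def contextual_results(query, master_index, dcm_clusters, ke_clusters):
    """Collate the three result sets for one query into a 360-DCR."""
    # 1. Search results, each carrying its stored auto-summary field.
    results = [r for r in master_index
               if query.lower() in r["content"].lower()]
    # 2. Related Topics with their relationship map, if the Topic clustered.
    related = dcm_clusters.get(query, {})
    # 3. Knowledge gathered for this query, if it matches an Important-Word.
    knowledge = ke_clusters.get(query, [])
    return {"results": results,
            "related_topics": related,
            "knowledge": knowledge}
```

The returned structure mirrors the three-fold solution described above: information with summaries, knowledge, and contextually related Topics with their relationship map.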
[0076] Hence, for the given query the system brings in information
about it along with relevant Knowledge and contextually related
topics and their relationship map, which gives the user more than
mere results. An exemplary snapshot of the web page displaying
360-Degree Contextual Results (360-DCR) for the selected Topic, in
accordance with an exemplary embodiment, is illustrated in FIG. 9.
[0077] The present disclosure is not to be limited in terms of the
particular embodiments described in this application, which are
intended as illustrations of various aspects. Many modifications
and variations can be made without departing from its spirit and
scope, as will be apparent to those skilled in the art.
Functionally equivalent methods and devices within the scope of the
disclosure, in addition to those enumerated herein, will be
apparent to those skilled in the art from the foregoing
descriptions. Such modifications and variations are intended to
fall within the scope of the appended claims. The present
disclosure is to be limited only by the terms of the appended
claims, along with the full scope of equivalents to which such
claims are entitled. It is also to be understood that the
terminology used herein is for the purpose of describing particular
embodiments only, and is not intended to be limiting.
[0078] With respect to the use of substantially any plural and/or
singular terms herein, those having skill in the art can translate
from the plural to the singular and/or from the singular to the
plural as is appropriate to the context and/or application. The
various singular/plural permutations may be expressly set forth
herein for sake of clarity.
[0079] In addition, where features or aspects of the disclosure are
described in terms of Markush groups, those skilled in the art will
recognize that the disclosure is also thereby described in terms of
any individual member or subgroup of members of the Markush
group.
[0080] While various aspects and embodiments have been disclosed
herein, other aspects and embodiments will be apparent to those
skilled in the art. The various aspects and embodiments disclosed
herein are for purposes of illustration and are not intended to be
limiting, with the true scope and spirit being indicated by the
following claims.
* * * * *