Mining Broad Hidden Query Aspects From User Search Sessions

Punera; Kunal ;   et al.

Patent Application Summary

U.S. patent application number 15/052725 was filed with the patent office on 2016-06-16 for mining broad hidden query aspects from user search sessions. The applicant listed for this patent is Yahoo! Inc. Invention is credited to Deepayan Chakrabarti, Kunal Punera, Xuanhui Wang.

Application Number 20160171082 15/052725
Document ID /
Family ID 42232196
Filed Date 2016-06-16

United States Patent Application 20160171082
Kind Code A1
Punera; Kunal ;   et al. June 16, 2016

MINING BROAD HIDDEN QUERY ASPECTS FROM USER SEARCH SESSIONS

Abstract

An optimization-based framework is utilized to extract broad query aspects from query reformulations performed by users in historical user session logs. Objective functions are optimized to yield query aspects. At run-time, the best broad but unspecified query aspects relevant to any user query are presented along with the results of the run time query.


Inventors: Punera; Kunal; (Santa Clara, CA) ; Chakrabarti; Deepayan; (Mountain View, CA) ; Wang; Xuanhui; (Urbana, IL)
Applicant:
Name          City        State   Country   Type
Yahoo! Inc.   Sunnyvale   CA      US
Family ID: 42232196
Appl. No.: 15/052725
Filed: February 24, 2016

Related U.S. Patent Documents

Application Number   Filing Date    Patent Number
12332187             Dec 10, 2008   9305051
15052725

Current U.S. Class: 707/706
Current CPC Class: G06F 16/2465 20190101; G06F 16/285 20190101; G06F 16/9535 20190101; G06F 16/2425 20190101
International Class: G06F 17/30 20060101 G06F017/30

Claims



1. A computer-implemented method for providing search results, comprising: analyzing search logs for (i) a first query comprising a first search term, followed by (ii) a second query comprising the first search term and a qualifier not initially specified in the first query; determining k aspects of the qualifier; receiving an original query at run time; and providing in response to the original query at least one of the k aspects along with results of the original query.

2. The method of claim 1, wherein determining k aspects of the qualifier comprises clustering the first search term and qualifier.

3. The method of claim 2, wherein determining k aspects of the qualifier further comprises selecting from clusters resulting from the clustering.

4. The method of claim 2, wherein determining k aspects of the qualifier further comprises an inter cluster move of an aspect from a first cluster to a second cluster.

5. The method of claim 1, wherein determining k aspects of the qualifier comprises applying modified star clustering.

6. The method of claim 1, wherein determining k aspects of the qualifier comprises applying k means clustering.

7. A computerized searching system configured to: analyze search logs for (i) a first query comprising a first search term, followed by (ii) a second query comprising the first search term and a qualifier not initially specified in the first query; determine k aspects of the qualifier; receive an original query at run time; and provide in response to the original query at least one of the k aspects along with results of the original query.

8. The system of claim 7, wherein determining k aspects of the qualifier comprises clustering the first search term and qualifier.

9. The system of claim 8, wherein determining k aspects of the qualifier further comprises selecting from clusters resulting from the clustering.

10. The system of claim 8, wherein determining k aspects of the qualifier further comprises an inter cluster move of an aspect from a first cluster to a second cluster.

11. The system of claim 7, wherein determining k aspects of the qualifier comprises applying modified star clustering.

12. The system of claim 7, wherein determining k aspects of the qualifier comprises applying k means clustering.

13. The system of claim 7, wherein the original query comprises the first search term.

14. At least one computer readable storage medium having computer program instructions stored thereon that are arranged to perform the following operations: analyzing search logs for (i) a first query comprising a first search term, followed by (ii) a second query comprising the first search term and a qualifier not initially specified in the first query; determining k aspects of the qualifier; receiving an original query at run time; and providing in response to the original query at least one of the k aspects along with results of the original query.

15. The computer readable storage medium of claim 14, wherein determining k aspects of the qualifier comprises clustering the first search term and qualifier.

16. The computer readable storage medium of claim 15, wherein determining k aspects of the qualifier further comprises selecting from clusters resulting from the clustering.

17. The computer readable storage medium of claim 15, wherein determining k aspects of the qualifier further comprises an inter cluster move of an aspect from a first cluster to a second cluster.

18. The computer readable storage medium of claim 14, wherein determining k aspects of the qualifier comprises applying modified star clustering.

19. The computer readable storage medium of claim 14, wherein determining k aspects of the qualifier comprises applying k means clustering.

20. The computer readable storage medium of claim 14, wherein the original query comprises the first search term.
Description



RELATED APPLICATIONS

[0001] This application is a divisional application and claims priority from application Ser. No. 12/332,187, Attorney Docket No. YAH1P188, entitled "Mining Broad Hidden Query Aspects from User Search Sessions," by Punera et al., filed on Dec. 10, 2008, which is incorporated herein by reference in its entirety for all purposes.

BACKGROUND OF THE INVENTION

[0002] This invention relates generally to search engines and queries.

[0003] The World Wide Web has grown dramatically over the last few years, and search engines have become the primary mode of discovering and accessing web content for a large fraction of users. However, even though users employ search engines for critical information access tasks, they are remarkably laconic in describing their information needs. This behavior might result from many factors. Users often use search engines to research unfamiliar topics. Hence, they might skip important details in search queries because they aren't aware of them or haven't built up the correct vocabulary yet. In other cases users neglect to add certain terms to queries because they believe the terms are obvious from the context or because they aren't aware of other ambiguous senses of their incomplete queries. Search engines themselves might reinforce this behavior by not properly taking into account the extra information when users do provide long descriptive queries.

SUMMARY OF THE INVENTION

[0004] A further understanding of the nature and advantages of the present invention may be realized by reference to the remaining portions of the specification and the drawings.

[0005] Embodiments of the invention find query aspects that, although not specified by the user, may be what the user had in mind. Embodiments suggest these query aspects and, in some instances, run the query with the unspecified aspects. The aspects are tailored to be sufficiently broad to apply to many different queries while being specific enough to accurately describe the hidden intent of the user.

[0006] Embodiments employ an optimization-based framework to extract broad query aspects from query reformulations performed by users in historical user session logs. Objective functions are optimized to yield query aspects.

[0007] One aspect relates to a computer-implemented method for providing search results. The method comprises analyzing search logs for query reformulations, extracting query reformulations from the analysis of the search logs, clustering the extracted query reformulations into clusters, selecting a group of the clustered extracted query reformulations, selecting clustered query reformulations from among the group of clustered extracted query reformulations so as to maximize a similarity measure, and presenting the clustered extracted query reformulations along with the results of a search.

[0008] Another aspect relates to a computerized searching system. The system is configured to analyze search logs for (i) a first query by a user comprising a first search term, followed by (ii) a second query comprising the first search term and a qualifier not initially specified in the first query. The system is further configured to determine k aspects of the qualifier, receive an original query at run time, and present to the user in response to the original query at least one of the k aspects along with results of the original query.

BRIEF DESCRIPTION OF THE DRAWINGS

[0009] FIG. 1 is a flow chart of offline steps embodiments may utilize.

[0010] FIG. 2 is a flow chart of online steps embodiments may utilize.

[0011] FIGS. 3A and 3B are graphs illustrating the performance of different embodiments as compared to a baseline.

[0012] FIG. 4 is a simplified diagram of a computing environment in which embodiments of the invention may be implemented.

[0013] A further understanding of the nature and advantages of the present invention may be realized by reference to the remaining portions of the specification and the drawings.

DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS

[0014] Reference will now be made in detail to specific embodiments of the invention including the best modes contemplated by the inventors for carrying out the invention. Examples of these specific embodiments are illustrated in the accompanying drawings. While the invention is described in conjunction with these specific embodiments, it will be understood that it is not intended to limit the invention to the described embodiments. On the contrary, it is intended to cover alternatives, modifications, and equivalents as may be included within the spirit and scope of the invention as defined by the appended claims. In the following description, specific details are set forth in order to provide a thorough understanding of the present invention. The present invention may be practiced without some or all of these specific details. In addition, well known features may not have been described in detail to avoid unnecessarily obscuring the invention.

[0015] Query aspects may include query qualifiers (i.e., terms added to queries during reformulations). In certain embodiments of the invention, these reformulations are monitored and logged on a regular basis, before a particular search of interest. Embodiments find such aspects and, upon receiving any original query at run time, cover the query qualifiers with some number of aspects, which are then presented to the user along with results of the original query. Actions taken before a current or new search is undertaken are referred to as "offline," whereas actions taken to return search results for a new search may be referred to as "online" or "run time."

[0016] FIG. 1 is a flow chart illustrating offline activities. While such steps generally occur offline, or prior to run time of a current search, it should be understood that in some embodiments one or more of the steps may occur at run time.

[0017] In step 102, the system searches logs for query reformulations. For all or a subset of the query reformulations found, the system extracts and stores the query reformulation and optionally other information relating to the reformulations in step 106. In one embodiment, only a subset of query reformulations that exceed a threshold are utilized. For example, a threshold of query reformulations that result in a user click may be utilized. The threshold will of course vary depending on user traffic and the particular search engine and related databases, but in one example, only query reformulations resulting in more than about four to five hundred clicks and associated views of a page/site per month would be utilized.
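As a rough illustration of steps 102 and 106, the following Python sketch scans sessions for a query followed by a longer query that contains the same terms plus a qualifier. The session format, the `monthly_clicks` lookup, the function name, and the default `min_clicks` value are assumptions made for this example; the description above states only that a click-based threshold may be used.

```python
from collections import defaultdict


def extract_reformulations(sessions, monthly_clicks, min_clicks=400):
    """Collect qualifiers added to a query within a session.

    sessions: list of sessions, each a list of query strings in time order
              (hypothetical log format).
    monthly_clicks: dict mapping a lower-cased query to its clicks per month
              (hypothetical lookup).
    Returns {first_query: {qualifier: number_of_sessions_observed}}.
    """
    qualifiers = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        # Look at each query and the query that immediately follows it.
        for first, second in zip(session, session[1:]):
            first_terms = first.lower().split()
            second_terms = second.lower().split()
            extra = [t for t in second_terms if t not in first_terms]
            keeps_first = all(t in second_terms for t in first_terms)
            # A reformulation keeps every original term and adds a qualifier.
            if not extra or not keeps_first:
                continue
            # Assumed threshold: drop reformulations with too few clicks.
            if monthly_clicks.get(second.lower(), 0) < min_clicks:
                continue
            qualifiers[first.lower()][" ".join(extra)] += 1
    return {q: dict(counts) for q, counts in qualifiers.items()}


sessions = [
    ["canon eos", "canon eos reviews"],
    ["canon eos", "canon eos price"],
    ["canon eos", "nikon d90"],
    ["paris", "paris hotels", "paris hotels cheap"],
]
monthly_clicks = {
    "canon eos reviews": 900,
    "canon eos price": 120,
    "paris hotels": 450,
    "paris hotels cheap": 500,
}
print(extract_reformulations(sessions, monthly_clicks))
```

In this toy log, "canon eos price" is dropped by the click threshold and "nikon d90" is not treated as a reformulation because it does not retain the original terms.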

[0018] Next, in step 110 the system clusters the extracted reformulations. Modified star clustering is one of many methods that may be employed by embodiments of the invention in order to pick the set A of N query aspects. The aim is to build the set A such that, when the best k aspects are picked for each query, the total similarity between the query qualifiers and the corresponding k aspects per query is maximized. The modified star clustering procedure is given in the table below.

Algorithm 2 Modified Star Clustering

    input: set of qualifiers (the union over q ∈ Q of the qualifiers of q), qualifier frequencies L(υ) for every qualifier υ, threshold σ, N
    Create a graph whose vertices are the set of qualifiers and whose edge set is ε = {(i, j) | cosSim(i, j) > σ}
    n ← 0, Left ← the set of qualifiers, A ← ∅
    while n < N and Left ≠ ∅ do
        hub ← argmax_{υ ∈ Left} L(υ)
        spokes ← {i | (hub, i) ∈ ε}
        star ← {hub} ∪ spokes
        A ← A ∪ {star}
        Left ← Left \ star
        n ← n + 1
    end while
    output: set A of at most N query aspects
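The listing above can be sketched in Python roughly as follows. The representation of each qualifier as a sparse term-weight dictionary, the helper `cos_sim`, and the alphabetical tie-breaking among equally frequent hubs are assumptions of this example; the text specifies only the cosine-similarity threshold and the frequency-ordered choice of hubs.

```python
import math


def cos_sim(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def modified_star_clustering(vectors, freq, sigma, max_aspects):
    """vectors: {qualifier: sparse vector}; freq: {qualifier: frequency L}."""
    names = sorted(vectors)
    # Edge set: pairs of qualifiers whose cosine similarity exceeds sigma.
    neighbors = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cos_sim(vectors[a], vectors[b]) > sigma:
                neighbors[a].add(b)
                neighbors[b].add(a)
    left = set(names)
    aspects = []
    while len(aspects) < max_aspects and left:
        # Hub: the most frequent qualifier not yet covered by any star.
        hub = max(sorted(left), key=lambda name: freq[name])
        # As in the listing, spokes are all graph neighbors of the hub,
        # so stars may overlap with earlier stars.
        star = {hub} | neighbors[hub]
        aspects.append(star)
        left -= star
    return aspects


vectors = {
    "reviews": {"canon eos": 1.0, "nikon d90": 1.0},
    "ratings": {"canon eos": 1.0, "nikon d90": 1.0, "ipod": 1.0},
    "hotels": {"paris": 1.0, "rome": 1.0},
    "lodging": {"paris": 1.0},
}
freq = {"reviews": 50, "ratings": 20, "hotels": 40, "lodging": 5}
print(modified_star_clustering(vectors, freq, sigma=0.5, max_aspects=10))
```

Because hubs are chosen greedily until no qualifiers remain, `max_aspects` acts only as an upper bound on the number of clusters returned.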

[0019] Further details on the star clustering process can be found in a paper by Javed A. Aslam, Ekaterina Pelekhov, and Daniela Rus, entitled "The Star Clustering Algorithm for Static and Dynamic Information Organization," published in the Journal of Graph Algorithms and Applications, 8(1), 2004, hereby incorporated by reference in its entirety. Any other clustering technique may be employed, although the modified star technique is preferred. One advantage of the modified star method is that it does not require specification of how many clusters are desired. Other examples of clustering techniques that may be employed include original star, K-means, expectation maximization ("EM"), and Metis.

[0020] In step 114, the system makes an inter cluster (local) move to maximize the number of user queries covered with the facet clusters that have been created. An embodiment of the local search technique associated with the inter cluster move is described in the table below.

Algorithm 4 Local-Search

    input: set of queries Q, set of qualifiers (the union over q ∈ Q of the qualifiers of q), maximum number of query aspects N
    Initialize the set of query aspects A to the output of Algorithm 2 with at most N aspects
    Compute the best k query aspects from A for each original query using Algorithm 1
    repeat
        reselectK = "no"
        move ← Best-Local-Move(Q, A, reselectK)
        if move = ∅ then
            reselectK = "yes"
            move ← Best-Local-Move(Q, A, reselectK)
        end if
        if move ≠ ∅ then
            Update A according to move
            if reselectK = "yes" then
                Recompute the best k query aspects from the new A for each original query using Algorithm 1
            end if
        end if
    until move = ∅
    output: set A of at most N query aspects
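The control flow of the Local-Search listing can be sketched in Python as shown below. Best-Local-Move is not defined in this text, so it is passed in as a callable; the names `best_local_move`, `apply_move`, and `pick_best_k`, the extra `assignment` argument, and the use of `None` for "no move" are all assumptions of this skeleton, which illustrates only the loop structure.

```python
def local_search(queries, initial_aspects, best_local_move, apply_move,
                 pick_best_k):
    """Skeleton of the local-search loop.

    best_local_move(queries, aspects, assignment, reselect_k) returns a move
    or None; apply_move(aspects, move) returns the updated aspects;
    pick_best_k(queries, aspects) returns the per-query choice of k aspects.
    All three are hypothetical callables supplied by the caller.
    """
    aspects = initial_aspects
    assignment = pick_best_k(queries, aspects)
    while True:
        # First look for a move that keeps the current per-query choices.
        reselect_k = False
        move = best_local_move(queries, aspects, assignment, reselect_k)
        if move is None:
            # Fall back to moves evaluated with re-selection of the k aspects.
            reselect_k = True
            move = best_local_move(queries, aspects, assignment, reselect_k)
        if move is None:
            break  # no improving move of either kind remains
        aspects = apply_move(aspects, move)
        if reselect_k:
            assignment = pick_best_k(queries, aspects)
    return aspects


# Toy usage with scripted moves of the form (qualifier, source, destination).
scripted_moves = iter([("a", 0, 1), None, ("b", 1, 0), None, None])
calls = []
recomputed = []


def scripted_best_move(queries, aspects, assignment, reselect_k):
    calls.append(reselect_k)
    return next(scripted_moves)


def move_between_clusters(aspects, move):
    item, src, dst = move
    updated = [set(cluster) for cluster in aspects]
    updated[src].discard(item)
    updated[dst].add(item)
    return updated


def count_recompute(queries, aspects):
    recomputed.append(1)
    return {}


final_aspects = local_search(["q"], [{"a", "x"}, {"b"}], scripted_best_move,
                             move_between_clusters, count_recompute)
print(final_aspects)
```

Trying the cheaper move type first and recomputing the per-query aspects only after a "reselect" move mirrors the order of operations in the listing.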

[0021] Then in step 118 the system picks a subset of clusters from step 114. The number of clusters chosen and the methodology of choosing the clusters may vary. In one embodiment the top 50-150 clusters are chosen, preferably the top 100.

[0022] FIG. 2 is a flow chart of online steps embodiments may utilize. In step 202, the system will receive a search query from a user. Then in step 206, the system will pick k aspects. In one embodiment this is done according to the pick-k process described below. Of course it should be understood that this may be done in numerous other ways.

The Pick-k Process

[0023] Given a set A of query aspects and a query q, the method picks k aspects $a_1, \ldots, a_k \in A$ so as to maximize the similarity measure $F(l(q), \bigcup_{i=1}^{k} a_i)$. Embodiments maximize any similarity function of the form

$$S(X, Y_1, \ldots, Y_k) = \frac{f_0(X) + \sum_i f(X, Y_i)}{g_0(X) + \sum_i g(X, Y_i)}, \qquad (5)$$

where X and $Y_i$ are vectors in some finite-dimensional space, the functions $g_0(\cdot)$ and $g(\cdot)$ are non-negative, X is fixed from the start, and the $Y_i$ vectors must be picked from a set Y.

Algorithm 1 Pick-k

    input: k, vector X, set of vectors γ
    α ← f_0(X)/k, β ← g_0(X)/k, Y ← ∅, n ← k
    while n > 0 do
        M ← { arg max_i (f(X, Y_i) + α/n) / (g(X, Y_i) + β/n) }
        If |M| > n, then keep any n elements in M and throw away the rest
        Y ← Y ∪ (∪_{m ∈ M} Y_m)
        α ← α + Σ_{m ∈ M} f(X, Y_m)
        β ← β + Σ_{m ∈ M} g(X, Y_m)
        n ← n − |M|
    end while
    output: picked elements Y ⊂ γ
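A minimal Python sketch of the Pick-k listing follows, keeping the initialization of α and β as printed. The restriction of the arg max to candidates not yet picked, the early exit when candidates run out, and the toy choice of f, g, f_0, and g_0 (set overlap and set size) are assumptions of this example and are not specified in the text.

```python
def pick_k(k, x, candidates, f0, g0, f, g):
    """Greedily pick up to k candidates following the Pick-k listing.

    Assumes every denominator g(x, y) + beta / n is strictly positive.
    """
    alpha = f0(x) / k
    beta = g0(x) / k
    picked = []
    remaining = list(range(len(candidates)))
    n = k
    while n > 0 and remaining:
        def score(i):
            y = candidates[i]
            return (f(x, y) + alpha / n) / (g(x, y) + beta / n)

        best = max(score(i) for i in remaining)
        # M: all maximizers, truncated to at most n elements.
        m = [i for i in remaining if score(i) == best][:n]
        picked.extend(m)
        alpha += sum(f(x, candidates[i]) for i in m)
        beta += sum(g(x, candidates[i]) for i in m)
        remaining = [i for i in remaining if i not in m]
        n -= len(m)
    return [candidates[i] for i in picked]


# Toy instance: X is the set of qualifiers seen for a query and each
# candidate aspect is a set of qualifiers (hypothetical data).
query_qualifiers = {"reviews", "ratings", "price"}
aspect_candidates = [
    {"reviews", "ratings"},
    {"hotels", "lodging"},
    {"price", "cost", "cheap"},
]
chosen = pick_k(
    2,
    query_qualifiers,
    aspect_candidates,
    f0=lambda x: 0.0,               # no fixed numerator term
    g0=lambda x: float(len(x)),     # fixed denominator term |X|
    f=lambda x, y: float(len(x & y)),
    g=lambda x, y: float(len(y)),
)
print(chosen)
```

With these toy functions the first pick is the aspect with the largest overlap relative to its size, and the second pick accounts for the numerator and denominator already accumulated in α and β.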

[0024] Then in step 210 the system will provide the k query aspects alongside the search results. In other words, it will cause a client computer to display the query aspects alongside the query results.

[0025] FIGS. 3A and 3B are graphs illustrating the performance of different embodiments (of selecting and presenting broad hidden query aspects) as compared to a baseline. FIG. 3A illustrates a performance comparison of the embodiments based on one broad aspect, that is k=1, whereas FIG. 3B illustrates a performance comparison of the embodiments based on three broad aspects, that is k=3. Bar 300 represents the baseline. Bar 302 represents an embodiment that employs original star clustering (ORGSTAR), without the local (inter cluster) move of step 114 described above. Bar 304 represents an embodiment that employs modified star clustering in step 110 (MODSTAR) given in Algorithm 2, but without the local (inter cluster) move of step 114 described above and given in Algorithm 4 (LOCSEARCH). Bar 306 represents an embodiment that employs modified star clustering in step 110, together with the pick K algorithm of step 206. Bar 308 represents an embodiment that employs modified star clustering in step 110, together with the local (inter cluster) move of step 114, and the pick K algorithm of step 206.

[0026] Searches in accordance with embodiments of the invention may be performed in some centralized manner. This is represented in FIG. 4 by server 408 and data store 410 which, as will be understood, may correspond to multiple distributed devices and data stores. The invention may also be practiced in a wide variety of network environments including, for example, TCP/IP-based networks, telecommunications networks, wireless networks, public networks, private networks, various combinations of these, etc. Such networks, as well as the potentially distributed nature of some implementations, are represented by network 412.

[0027] In addition, the computer program instructions with which embodiments of the invention are implemented may be stored in any type of tangible computer-readable media, and may be executed according to a variety of computing models including a client/server model, a peer-to-peer model, on a stand-alone computing device, or according to a distributed computing model in which various of the functionalities described herein may be effected or employed at different locations.

[0028] The above described embodiments have several advantages and are distinct from prior methods. For example, the extraction of broad aspects from query logs, and their use in query refinement, have several advantages over prior query suggestion methods. The first advantage has to do with the discovery and use of broad aspects as opposed to query suggestions. The broad nature of the query aspects ensures that enough data is available to reliably construct these aspects and predict when they apply to user queries. This is in contrast to query suggestions, which are often applicable only to specific queries and hence are learned from a significantly smaller amount of data. The availability of more data for analysis also implies that the technique avoids presenting the user with redundant query refinement options, as is often the case with query suggestions. Since by definition there are fewer broad aspects of queries than query suggestions, they can be better maintained without the need for manual intervention.

[0029] The second and more important advantage is more subtle, and concerns the way users navigate the search results page. It has been shown in user eye-tracking studies, as well as by modeling user clicking behavior, that users scan search result pages extremely quickly and don't make a complete determination of the relevance of results before clicking. Users therefore acclimate to repetitive features in the search results page and use them to make clicking decisions. For example, the bolded words in the title of a result indicate to users that the title matched the query very closely, while an indented search result indicates to the user that this search result is somehow related to the previous one. When users are exposed to query suggestions, which by definition are specialized to the current query, they have to carefully read the suggested queries in order to decide whether to click on them. Since users scan result pages very fast, they often skip the suggested queries as irrelevant content. When a limited number of broad aspects of queries, for example "Reviews and Ratings," are used as options for refinement, the user needs less attention to interpret the aspects when they are presented.

[0030] While the invention has been particularly shown and described with reference to specific embodiments thereof, it will be understood by those skilled in the art that changes in the form and details of the disclosed embodiments may be made without departing from the spirit or scope of the invention.

[0031] In addition, although various advantages, aspects, and objects of the present invention have been discussed herein with reference to various embodiments, it will be understood that the scope of the invention should not be limited by reference to such advantages, aspects, and objects. Rather, the scope of the invention should be determined with reference to the appended claims.

* * * * *

