U.S. patent application number 15/052725 was filed with the patent office on 2016-02-24 and published on 2016-06-16 for mining broad hidden query aspects from user search sessions.
The applicant listed for this patent is Yahoo! Inc. Invention is credited to Deepayan Chakrabarti, Kunal Punera, Xuanhui Wang.
Application Number | 20160171082 15/052725 |
Document ID | / |
Family ID | 42232196 |
Filed Date | 2016-02-24 |
United States Patent Application | 20160171082 |
Kind Code | A1 |
Punera; Kunal; et al. | June 16, 2016 |
MINING BROAD HIDDEN QUERY ASPECTS FROM USER SEARCH SESSIONS
Abstract
An optimization-based framework is utilized to extract broad
query aspects from query reformulations performed by users in
historical user session logs. Objective functions are optimized to
yield query aspects. At run-time, the best broad but unspecified
query aspects relevant to any user query are presented along with
the results of the run time query.
Inventors: | Punera; Kunal (Santa Clara, CA); Chakrabarti; Deepayan (Mountain View, CA); Wang; Xuanhui (Urbana, IL) |
Applicant: |
Name | City | State | Country | Type |
Yahoo! Inc. | Sunnyvale | CA | US | |
Family ID: | 42232196 |
Appl. No.: | 15/052725 |
Filed: | February 24, 2016 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
12332187 | Dec 10, 2008 | 9305051 |
15052725 | | |
Current U.S. Class: | 707/706 |
Current CPC Class: | G06F 16/2465 20190101; G06F 16/285 20190101; G06F 16/9535 20190101; G06F 16/2425 20190101 |
International Class: | G06F 17/30 20060101 G06F017/30 |
Claims
1. A computer-implemented method for providing search results,
comprising: analyzing search logs for (i) a first query comprising
a first search term, followed by (ii) a second query comprising the
first search term and a qualifier not initially specified in the
first query; determining k aspects of the qualifier; receiving an
original query at run time; and providing in response to the
original query at least one of the k aspects along with results of
the original query.
2. The method of claim 1, wherein determining k aspects of the
qualifier comprises clustering the first search term and
qualifier.
3. The method of claim 2, wherein determining k aspects of the
qualifier further comprises selecting from clusters resulting from
the clustering.
4. The method of claim 2, wherein determining k aspects of the
qualifier further comprises an inter cluster move of an aspect from
a first cluster to a second cluster.
5. The method of claim 1, wherein determining k aspects of the
qualifier comprises applying modified star clustering.
6. The method of claim 1, wherein determining k aspects of the
qualifier comprises applying k means clustering.
7. A computerized searching system configured to: analyze search
logs for (i) a first query comprising a first search term, followed
by (ii) a second query comprising the first search term and a
qualifier not initially specified in the first query; determine k
aspects of the qualifier; receive an original query at run time;
and providing in response to the original query at least one of the
k aspects along with results of the original query.
8. The system of claim 7, wherein determining k aspects of the
qualifier comprises clustering the first search term and
qualifier.
9. The system of claim 8, wherein determining k aspects of the
qualifier further comprises selecting from clusters resulting from
the clustering.
10. The system of claim 8, wherein determining k aspects of the
qualifier further comprises an inter cluster move of an aspect from
a first cluster to a second cluster.
11. The system of claim 7, wherein determining k aspects of the
qualifier comprises applying modified star clustering.
12. The system of claim 7, wherein determining k aspects of the
qualifier comprises applying k means clustering.
13. The system of claim 7, wherein the original query comprises the
first search term.
14. At least one computer readable storage medium having computer
program instructions stored thereon that are arranged to perform
the following operations: analyzing search logs for (i) a first
query comprising a first search term, followed by (ii) a second
query comprising the first search term and a qualifier not
initially specified in the first query; determining k aspects of
the qualifier; receiving an original query at run time; and
providing in response to the original query at least one of the k
aspects along with results of the original query.
15. The computer readable storage medium of claim 14, wherein
determining k aspects of the qualifier comprises clustering the
first search term and qualifier.
16. The computer readable storage medium of claim 15, wherein
determining k aspects of the qualifier further comprises selecting
from clusters resulting from the clustering.
17. The computer readable storage medium of claim 15, wherein
determining k aspects of the qualifier further comprises an inter
cluster move of an aspect from a first cluster to a second
cluster.
18. The computer readable storage medium of claim 14, wherein
determining k aspects of the qualifier comprises applying modified
star clustering.
19. The computer readable storage medium of claim 14, wherein
determining k aspects of the qualifier comprises applying k means
clustering.
20. The computer readable storage medium of claim 14, wherein the
original query comprises the first search term.
Description
RELATED APPLICATIONS
[0001] This application is a divisional application and claims
priority from application Ser. No. 12/332,187, Attorney Docket No.
YAH1P188, entitled "Mining Broad Hidden Query Aspects from User
Search Sessions," by Punera et al., filed on Dec. 10, 2008, which is
incorporated herein by reference in its entirety for all
purposes.
BACKGROUND OF THE INVENTION
[0002] This invention relates generally to search engines and
queries.
[0003] The World Wide Web has grown dramatically over the last few
years and search engines have become the primary mode of
discovering and accessing web content for a large fraction of the
users. However, even though the users employ search engines for
critical information access tasks, they are remarkably laconic in
describing their information needs. This behavior might be an
outgrowth of many factors. Users often use search engines for
performing research on unfamiliar topics. Hence, they might skip
important details in search queries because they aren't aware of
them or haven't built up the correct vocabulary yet. In some other
cases users neglect to add certain terms to queries because they
believe the terms are obvious from the context or they aren't aware
of other ambiguous senses of their incomplete queries. Search
engines themselves might reinforce this behavior by not properly
taking into account the extra information when the users do provide
long descriptive queries.
SUMMARY OF THE INVENTION
[0004] A further understanding of the nature and advantages of the
present invention may be realized by reference to the remaining
portions of the specification and the drawings.
[0005] Embodiments of the invention find query aspects that,
although not specified by the user, may be what the user had in
mind. The embodiments suggest these query aspects and in some
instances run the query with the unspecified aspects. The aspects
are tailored to be sufficiently broad to apply to many different
queries while being specific enough to accurately describe the
hidden intent of the user.
[0006] Embodiments employ an optimization-based framework to
extract broad query aspects from query reformulations performed by
users in historical user session logs. Objective functions are
optimized to yield query aspects.
[0007] One aspect relates to a computer-implemented method for
providing search results. The method comprises analyzing search
logs for query reformulations, extracting query reformulations from
the analysis of the search logs, clustering the extracted query
reformulations into clusters, selecting a group of the clustered
extracted query reformulations, selecting clustered query
reformulations from among the group of clustered extracted query
reformulations so as to maximize a similarity measure, and
presenting the clustered extracted query reformulations along with
the results of a search.
[0008] Another aspect relates to a computerized searching system.
The system is configured to analyze search logs for (i) a first
query by a user comprising a first search term, followed by (ii) a
second query comprising the first search term and a qualifier not
initially specified in the first query. The system is further
configured to determine k aspects of the qualifier, receive an
original query at run time, and present to the user in response to
the original query at least one of the k aspects along with results
of the original query.
BRIEF DESCRIPTION OF THE DRAWINGS
[0009] FIG. 1 is a flow chart of offline steps embodiments may
utilize.
[0010] FIG. 2 is a flow chart of online steps embodiments may
utilize.
[0011] FIGS. 3A and 3B are graphs illustrating the performance of
different embodiments as compared to a baseline.
[0012] FIG. 4 is a simplified diagram of a computing environment in
which embodiments of the invention may be implemented.
[0013] A further understanding of the nature and advantages of the
present invention may be realized by reference to the remaining
portions of the specification and the drawings.
DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS
[0014] Reference will now be made in detail to specific embodiments
of the invention including the best modes contemplated by the
inventors for carrying out the invention. Examples of these
specific embodiments are illustrated in the accompanying drawings.
While the invention is described in conjunction with these specific
embodiments, it will be understood that it is not intended to limit
the invention to the described embodiments. On the contrary, it is
intended to cover alternatives, modifications, and equivalents as
may be included within the spirit and scope of the invention as
defined by the appended claims. In the following description,
specific details are set forth in order to provide a thorough
understanding of the present invention. The present invention may
be practiced without some or all of these specific details. In
addition, well known features may not have been described in detail
to avoid unnecessarily obscuring the invention.
[0015] Query aspects may include query qualifiers (i.e., terms
added to queries during reformulations). In certain embodiments of
the invention, these reformulations are monitored and logged on a
regular basis, at a time before a particular search of interest.
Embodiments find such aspects so that, upon receiving any original
query at run time, the query qualifiers can be covered by some
number of aspects, which are then presented to the user along with
results of the original query. Actions taken before a current or
new search is undertaken are referred to as "offline," whereas
actions taking place to return search results for a new search may
be referred to as "online" or as "run time."
[0016] FIG. 1 is a flow chart illustrating offline activities.
While such steps generally occur offline, or prior to run time of a
current search, it should be understood that in some embodiments
one or more of the steps may occur at run time.
[0017] In step 102, the system searches logs for query
reformulations. For all or a subset of the query reformulations
found, the system extracts and stores the query reformulation and
optionally other information relating to the reformulations in step
106. In one embodiment, only a subset of query reformulations that
exceed a threshold are utilized. For example, a threshold of query
reformulations that result in a user click may be utilized. The
threshold will of course vary depending on user traffic and the
particular search engine and related databases, but in one example,
only query reformulations resulting in more than about four to five
hundred clicks and associated views of a page/site per month would
be utilized.
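By way of illustration only, steps 102 and 106 might be sketched in Python as follows. The log layout (each session an ordered list of (query, clicks) pairs), the function name extract_qualifiers, the whitespace tokenization, and the default threshold of 400 clicks are assumptions made for this sketch and are not taken from the specification.

```python
from collections import Counter


def extract_qualifiers(sessions, min_clicks=400):
    """Return {(first_query, qualifier): clicks} for frequent reformulations.

    `sessions` is a list of sessions; each session is an ordered list of
    (query_text, clicks) tuples. This layout is an assumption of the sketch.
    """
    clicks_per_reformulation = Counter()
    for session in sessions:
        # Compare each query with the one issued immediately after it.
        for (first, _), (second, clicks) in zip(session, session[1:]):
            first_terms = first.lower().split()
            second_terms = second.lower().split()
            # A reformulation keeps every term of the first query and
            # adds at least one term not initially specified.
            if first_terms and set(first_terms) < set(second_terms):
                qualifier = " ".join(
                    t for t in second_terms if t not in first_terms
                )
                key = (" ".join(first_terms), qualifier)
                clicks_per_reformulation[key] += clicks
    # Keep only reformulations whose click count exceeds the threshold.
    return {
        pair: n for pair, n in clicks_per_reformulation.items()
        if n > min_clicks
    }
```

In this sketch the qualifier is simply the set of added terms in their original order; a production system would likely normalize terms and aggregate over a fixed time window such as a month.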
[0018] Next, in step 110 the system clusters the extracted
reformulations. Modified star clustering is one of many methods
that may be employed by embodiments of the invention in order to
pick the set A of N query aspects. The aim is to build the set A
such that, with the best k aspects picked for each query, the total
similarity between the query qualifiers and the corresponding k
aspects per query is maximized. The process is given in the table
below.
Algorithm 2 Modified Star Clustering
1: input: set of qualifiers V = ∪_{q∈Q} l(q), qualifier frequencies L(υ) ∀ υ ∈ V, threshold σ, N
2: Create a graph G = (V, ε) where V is the set of qualifiers, and ε = {(i, j) | cosSim(i, j) > σ}
3: n ← 0, Left ← V, A ← {∅}
4: while n < N and Left ≠ {∅} do
5:   hub ← argmax_{υ∈Left} L(υ)
6:   spokes ← {i | (hub, i) ∈ ε}
7:   star ← {hub} ∪ spokes
8:   A ← A ∪ {star}
9:   Left ← Left \ star
10:  n ← n + 1
11: end while
12: output: set A of at most N query aspects
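The listed steps might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the representation of each qualifier as a sparse vector (a dictionary of feature weights), the function names, and the tie-breaking rule for equally frequent hubs are not given in the specification.

```python
import math


def cos_sim(u, v):
    """Cosine similarity of two sparse vectors stored as {feature: weight}."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def modified_star_clustering(vectors, freq, sigma, max_aspects):
    """Build at most `max_aspects` stars, following the listed steps.

    vectors: {qualifier: sparse vector} (assumed representation)
    freq:    {qualifier: frequency}, i.e. L(v) in the listing
    sigma:   cosine-similarity threshold that defines an edge
    """
    left = set(vectors)
    aspects = []
    while len(aspects) < max_aspects and left:
        # Hub: most frequent qualifier not yet clustered.
        # Ties are broken by name (an assumption) to stay deterministic.
        hub = max(left, key=lambda q: (freq[q], q))
        # Spokes: every qualifier joined to the hub by an edge. As in the
        # listing, this ranges over the whole graph, so stars may overlap.
        spokes = {
            q for q in vectors
            if q != hub and cos_sim(vectors[hub], vectors[q]) > sigma
        }
        star = {hub} | spokes
        aspects.append(star)
        left -= star
    return aspects
```

Edges are evaluated lazily here rather than by building the graph up front; the resulting stars are the same, and the lazy form avoids storing all pairwise similarities.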
[0019] Further details on the modified star clustering process can
be found in a paper by Javed A. Aslam, Ekaterina Pelekhov, and
Daniela Rus, entitled "The Star Clustering Algorithm for Static and
Dynamic Information Organization," published in the Journal of
Graph Algorithms and Applications, 8(1), 2004, hereby incorporated
by reference in the entirety. Any other clustering technique may be
employed, although the modified star technique is preferred. One
advantage of the modified star method is that it does not require
specification of how many clusters are desired. Other examples of
clustering techniques that may be employed include, for example,
original star, K-means, expectation maximization ("EM") or
Metis.
[0020] In step 114, the system makes an inter cluster (local) move
to maximize the number of user queries covered with the facet
clusters that have been created. An embodiment of the local search
technique associated with the inter cluster move is described in
the table below.
Algorithm 4 Local-Search
1: input: set of queries Q, set of qualifiers V = ∪_{q∈Q} l(q), maximum number of query aspects N
2: Initialize the set of query aspects A to the output of Algorithm 2 with at most N aspects
3: Compute the best k query aspects from A for each original query using Algorithm 1
4: repeat
5:   reselectK = "no"
6:   move ← Best-Local-Move(Q, A, reselectK)
7:   if move = ∅ then
8:     reselectK = "yes"
9:     move ← Best-Local-Move(Q, A, reselectK)
10:  end if
11:  if move ≠ ∅ then
12:    Update A according to move
13:    if reselectK = "yes" then
14:      Recompute the best k query aspects from the new A for each original query using Algorithm 1
15:    end if
16:  end if
17: until move = ∅
18: output: set A of at most N query aspects
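The control flow of the local search might be sketched in Python as follows. Best-Local-Move is not defined in this description, so the sketch takes it as a caller-supplied function; best_local_move and assign_best_k are hypothetical stand-ins, and the convention that a move is a function mapping the current aspect set to a new one is likewise an assumption.

```python
def local_search(queries, aspects, best_local_move, assign_best_k):
    """Repeat local moves until none is found, following the listed steps.

    best_local_move(queries, aspects, assignment, reselect_k) returns either
    None (no move found) or a function mapping the current aspects to
    updated aspects. assign_best_k(queries, aspects) returns the best k
    aspects per query. Both callables are hypothetical stand-ins.
    """
    # Step 3: initial best-k assignment for every original query.
    assignment = assign_best_k(queries, aspects)
    while True:
        # Steps 5-6: first look for a move that keeps the assignment fixed.
        reselect_k = False
        move = best_local_move(queries, aspects, assignment, reselect_k)
        if move is None:
            # Steps 8-9: retry, allowing the best k aspects to be reselected.
            reselect_k = True
            move = best_local_move(queries, aspects, assignment, reselect_k)
        if move is None:
            # Step 17: no move under either setting, so stop.
            return aspects
        # Step 12: update A according to the move.
        aspects = move(aspects)
        if reselect_k:
            # Step 14: recompute the best k aspects from the new A.
            assignment = assign_best_k(queries, aspects)
```

Trying the cheaper search first, with the per-query assignment held fixed, means the full reassignment is recomputed only when no such move remains.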
[0021] Then in step 118 the system picks a subset of clusters from
step 114. The number of clusters chosen and methodology of choosing
the clusters may vary. In one embodiment the top 50-150 clusters are
chosen, preferably the top 100.
[0022] FIG. 2 is a flow chart of online steps embodiments may
utilize. In step 202, the system will receive a search query from a
user. Then in step 206, the system will pick k aspects. In one
embodiment this is done according to the pick-k process described
below. Of course it should be understood that this may be done in
numerous other ways.
The Pick-k Process
[0023] Given a set A of query aspects, and a query q, the method
picks k aspects $a_1, \ldots, a_k \in A$ so as to maximize the
similarity measure $F(l(q), \cup_{i=1}^{k} a_i)$.
Embodiments maximize any similarity function of the form

$$S(X, Y_1, \ldots, Y_k) = \frac{f_0(X) + \sum_i f(X, Y_i)}{g_0(X) + \sum_i g(X, Y_i)}, \qquad (5)$$

where $X$ and $Y_i$ are vectors in some finite-dimensional space, the
functions $g_0(\cdot)$ and $g(\cdot)$ are non-negative, $X$ is fixed from
the start, and the $Y_i$ vectors must be picked from a set Y.
Algorithm 1 Pick-k
1: input: k, vector X, set of vectors γ
2: α ← f_0(X)/k, β ← g_0(X)/k, Y ← {∅}, n ← k
3: while n > 0 do
4:   M ← {argmax_i (f(X, Y_i) + α/n) / (g(X, Y_i) + β/n)}
5:   If |M| > n, then keep any n elements in M and throw away the rest
6:   Y ← Y ∪ (∪_{m∈M} Y_m)
7:   α ← α + Σ_{m∈M} f(X, Y_m)
8:   β ← β + Σ_{m∈M} g(X, Y_m)
9:   n ← n - |M|
10: end while
11: output: picked elements Y ⊂ γ
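The Pick-k listing might be sketched in Python as follows. The sketch follows the listing as printed, including the initial division of f_0(X) and g_0(X) by k. Two details are assumptions added here because the listing does not state them: candidates already picked are excluded from later rounds, and the loop stops early if the candidates run out. The function and parameter names are illustrative only.

```python
def pick_k(k, x, candidates, f, g, f0, g0):
    """Greedily pick up to k candidates, returning their indices in order.

    f, g score a (x, candidate) pair; f0, g0 score x alone. g and g0 are
    assumed non-negative, and every denominator is assumed positive.
    """
    # Step 2: initial values as printed in the listing.
    alpha, beta = f0(x) / k, g0(x) / k
    picked = []
    n = k
    remaining = list(range(len(candidates)))
    while n > 0 and remaining:
        # Step 4: score each remaining candidate with the running totals
        # alpha and beta spread over the n picks still to be made.
        scores = {
            i: (f(x, candidates[i]) + alpha / n)
            / (g(x, candidates[i]) + beta / n)
            for i in remaining
        }
        best = max(scores.values())
        # Step 5: M holds all candidates tied for the best score,
        # truncated to at most n elements.
        m = [i for i in remaining if scores[i] == best][:n]
        picked.extend(m)
        # Steps 7-8: fold the picked candidates into the running totals.
        alpha += sum(f(x, candidates[i]) for i in m)
        beta += sum(g(x, candidates[i]) for i in m)
        # Assumption: a candidate is picked at most once.
        remaining = [i for i in remaining if i not in m]
        n -= len(m)
    return picked
```

For example, with f taken as a dot product and g constant, the sketch reduces to choosing the k candidates with the largest dot product against X; other choices of f and g give other measures of the form (5).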
[0024] Then in step 210 the system will provide the k query aspects
alongside the search results. In other words, it will cause a
client computer to display the query aspects alongside the query
results.
[0025] FIGS. 3A and 3B are graphs illustrating the performance of
different embodiments (of selecting and presenting broad hidden
query aspects) as compared to a baseline. FIG. 3A illustrates a
performance comparison of the embodiments based on one broad
aspect, that is k=1, whereas FIG. 3B illustrates a performance
comparison of the embodiments based on three broad aspects, that is
k=3. Bar 300 represents the baseline. Bar 302 represents an
embodiment that employs original star clustering (ORGSTAR), without
the local (inter cluster) move of step 114 described above. Bar 304
represents an embodiment that employs modified star clustering in
step 110 (MODSTAR) given in Algorithm 2, but without the local
(inter cluster) move of step 114 described above and given in
Algorithm 4 (LOCSEARCH). Bar 306 represents an embodiment that
employs modified star clustering in step 110, together with the
Pick-k process of step 206. Bar 308 represents an embodiment that
employs modified star clustering in step 110, together with the
local (inter cluster) move of step 114, and the Pick-k process of
step 206.
[0026] Searches in accordance with embodiments of the invention may
be processed in some centralized manner. This is represented in FIG. 4 by server
408 and data store 410 which, as will be understood, may correspond
to multiple distributed devices and data stores. The invention may
also be practiced in a wide variety of network environments
including, for example, TCP/IP-based networks, telecommunications
networks, wireless networks, public networks, private networks,
various combinations of these, etc. Such networks, as well as the
potentially distributed nature of some implementations, are
represented by network 412.
[0027] In addition, the computer program instructions with which
embodiments of the invention are implemented may be stored in any
type of tangible computer-readable media, and may be executed
according to a variety of computing models including a
client/server model, a peer-to-peer model, on a stand-alone
computing device, or according to a distributed computing model in
which various of the functionalities described herein may be
effected or employed at different locations.
[0028] The above described embodiments have several advantages and
are distinct from prior methods. For example, the extraction of
broad aspects from query logs, and their use in query refinement,
have several advantages over prior query suggestion methods. The
first advantage has to do with the discovery and use of broad
aspects and query suggestions. The broad nature of the query
aspects ensures that enough data is available to reliably construct
these aspects and predict when they apply to user queries. This is
in contrast to query suggestions, which are often applicable to
specific queries and hence learned from a significantly smaller amount
of data. The availability of more data for analysis also implies
that the technique avoids presenting the user with redundant query
refinement options, as is often the case with query suggestions.
Since by definition there are fewer broad aspects of queries than
query suggestions, they can be better maintained without the need
for manual intervention.
[0029] The second, and more important, advantage is more subtle, and
concerns the way users navigate the search results page. It has
been shown in user eye-tracking studies as well as by modeling user
clicking behavior that users scan search result pages extremely
quickly and don't make a complete determination of the relevance of
results before clicking. Users therefore acclimate to repetitive
features in the search results page and use them to make clicking
decisions. For example, the bolded words in the title of the result
indicate to users that the title matched the query very closely,
while the indented search result indicates to the user that this
search result is somehow related to the previous one. When users
are exposed to query suggestions, which by definition are
specialized to the current query, they have to carefully read the
suggested queries in order to decide whether to click on them.
Since the users scan result pages very fast, they often skip the
suggested queries as irrelevant content. When a limited number
of broad aspects of queries are used as options for refinement, the
user needs less attention to interpret the aspects, for example
"Reviews and Ratings," when they are presented.
[0030] While the invention has been particularly shown and
described with reference to specific embodiments thereof, it will
be understood by those skilled in the art that changes in the form
and details of the disclosed embodiments may be made without
departing from the spirit or scope of the invention.
[0031] In addition, although various advantages, aspects, and
objects of the present invention have been discussed herein with
reference to various embodiments, it will be understood that the
scope of the invention should not be limited by reference to such
advantages, aspects, and objects. Rather, the scope of the
invention should be determined with reference to the appended
claims.
* * * * *