U.S. patent application number 15/231628, for ranking advertisements with pseudo-relevance feedback and translation models, was filed with the patent office on 2016-08-08 and published on 2016-12-01.
The applicant listed for this patent is Yahoo! Inc. The invention is credited to Vanessa Murdock.
Publication Number: 20160350797
Application Number: 15/231628
Document ID: /
Family ID: 41118649
Publication Date: 2016-12-01

United States Patent Application 20160350797
Kind Code: A1
Inventor: Murdock, Vanessa
Publication Date: December 1, 2016
RANKING ADVERTISEMENTS WITH PSEUDO-RELEVANCE FEEDBACK AND TRANSLATION MODELS
Abstract
Methods, computer products, and systems for selecting
advertisements in response to an internet query are provided. The
method provides for receiving an internet query that includes query
terms, retrieving and then ranking a first set of advertisements in
response to the internet query using a query likelihood model. The
method then selects sampling terms using pseudo-relevance feedback
and translation models, the internet query, and the first set of ad
materials obtained using the query likelihood model. The sampling
terms are chosen from a distribution of terms from the terms in the
first set of ad materials, and the pseudo-relevance feedback model
is used to select a term in the distribution of terms based on a
probability. The use of translation models enhances the topicality
of the results because the distribution words selected are related
to the terms in the original query as indicated by their
translation probabilities.
Inventors: Murdock, Vanessa (Barcelona, ES)

Applicant:
| Name | City | State | Country | Type |
| Yahoo! Inc. | Sunnyvale | CA | US | |

Family ID: 41118649
Appl. No.: 15/231628
Filed: August 8, 2016
Related U.S. Patent Documents

| Application Number | Filing Date | Patent Number |
| 12059098 | Mar 31, 2008 | 9411886 |
| 15231628 | | |
Current U.S. Class: 1/1
Current CPC Class: G06Q 30/02 20130101; G06Q 30/0244 20130101; G06F 16/24578 20190101; G06F 16/951 20190101; G06N 7/005 20130101; G06Q 30/0277 20130101
International Class: G06Q 30/02 20060101 G06Q030/02; G06N 7/00 20060101 G06N007/00; G06F 17/30 20060101 G06F017/30
Claims
1. A method to select advertisements in response to an internet
query, the method comprising: receiving an internet query (Q), the
internet query including query terms (q); retrieving information
related to the received internet query including information about
a set of internet documents; ranking the set of internet documents
by a probability that the query was generated by a sampling of
terms from each of the set of internet documents; selecting a top
ranked number (N) of the ranked set of internet documents;
constructing a model of the selected top ranked number (N) of the
ranked set of internet documents; selecting content terms from the
model; ranking a first set of ad materials (S) in response to the
model, S including a plurality of ad materials (s); and outputting
advertisements based on the first set of ad materials for display
in a portion of an internet browser.
2. The method as recited in claim 1, further comprising, generating
an expanded query including the internet query and the selected
content terms from the model, retrieving a second set of ad
materials using the expanded query, ranking the second set of ad
materials, and outputting advertisements based on the second set of
ad materials.
3. The method as recited in claim 2, wherein ranking the second set
of ad materials includes ranking the second set of ad materials
according to a likelihood the second set of ad materials generated
the expanded query.
4. The method as recited in claim 1, wherein ranking the set of
internet documents by the probability that the query was generated
by the sampling from each of the set of internet documents includes
smoothing the ranking of the set of internet documents to give a
non-zero probability to any unseen query terms.
5. The method as recited in claim 1, wherein selecting the content
terms from the model includes, selecting a translation table, using
the selected translation table to estimate a probability of a query
term given a term from a vocabulary directly, and choosing a number
of content terms with a highest estimated probability.
6. The method as recited in claim 5, wherein estimating the
probability of the query term given the term from the vocabulary
directly further includes, calculating the probability p(q|w) based
on the translation probability t(q|w), calculating a probability of
the internet query Q given the term w (p(Q|w)) as a product for all
q of all the probabilities p(q|w), including making p(q|w) equal to
the probability t(q|w), calculating a probability of the term w
(p(w)) as the sum for all s in S of probabilities of w given the ad
materials s (p(w|s)) times a probability of the ad materials s
(p(s)), and calculating the probability p(w|Q) as proportional to
the probability p(w) times the probability p(Q|w).
7. The method as recited in claim 6, wherein selecting content
terms further includes calculating a probability of query term q
given ad materials s (p(q|s)) as equal to the number of times q
occurs in the ad materials s divided by a number of terms in the ad
materials s, wherein calculating the probability p(q|w) further
includes making p(q|w) equal to the sum for all s of the
probability t(q|w) times the probability p(q|s) times the
probability p(w|s) times the probability p(s) divided by the
probability p(w).
8. The method as recited in claim 1, wherein the second set of ad
materials is selected from a group consisting of, a content match
ad for the internet query, and a sponsored search ad for the
internet query.
9. The method as recited in claim 1, wherein the ranking the set of
internet documents by the probability that the query was generated
by the sampling of terms from each of the set of internet documents
includes ranking the set of internet documents by the probability
that the query was generated by a random sampling of terms from
each of the set of internet documents.
10. A non-transitory computer readable medium having program
instructions for selecting advertisements in response to an
internet query, comprising: program instructions for receiving an
internet query (Q), the internet query including query terms (q);
program instructions for retrieving information related to the
received internet query including information about a set of
internet documents; program instructions for ranking the set of
internet documents by a probability that the query was generated by
a sampling of terms from each of the set of internet documents;
program instructions for selecting a top ranked number (N) of the
ranked set of internet documents; program instructions for
constructing a model of the selected top ranked number (N) of the
ranked set of internet documents; program instructions for
selecting content terms from the model; program instructions for
ranking a first set of ad materials (S) in response to the model, S
including a plurality of ad materials (s); and program instructions
for outputting advertisements based on the first set of ad
materials for display in a portion of an internet browser.
11. The non-transitory computer readable medium having program
instructions as recited in claim 10, further comprising, program
instructions for generating an expanded query including the
internet query and the selected content terms from the model,
program instructions for retrieving a second set of ad materials
using the expanded query, program instructions for ranking the
second set of ad materials, and program instructions for outputting
advertisements based on the second set of ad materials.
12. The non-transitory computer readable medium having program
instructions as recited in claim 11, wherein ranking the second set
of ad materials includes ranking the second set of ad materials
according to a likelihood the second set of ad materials generated
the expanded query.
13. The non-transitory computer readable medium having program
instructions as recited in claim 10, wherein ranking the set of
internet documents by the probability that the query was generated
by the sampling from each of the set of internet documents includes
smoothing the ranking of the set of internet documents to give a
non-zero probability to any unseen query terms.
14. The non-transitory computer readable medium having program
instructions as recited in claim 10, wherein the program
instructions for selecting the content terms from the model
includes, program instructions for selecting a translation table,
program instructions for using the selected translation table to
estimate a probability of a query term given a term from a
vocabulary directly, and program instructions for choosing a number
of content terms with a highest estimated probability.
15. The non-transitory computer readable medium having program
instructions as recited in claim 14, wherein the program
instructions for estimating the probability of the query term given
the term from the vocabulary directly further includes, program
instructions for calculating the probability p(q|w) based on the
translation probability t(q|w), program instructions for
calculating a probability of the internet query Q given the term w
(p(Q|w)) as a product for all q of all the probabilities p(q|w),
including making p(q|w) equal to the probability t(q|w), program
instructions for calculating a probability of the term w (p(w)) as
the sum for all s in S of probabilities of w given the ad materials
s (p(w|s)) times a probability of the ad materials s (p(s)), and
program instructions for calculating the probability p(w|Q) as
proportional to the probability p(w) times the probability
p(Q|w).
16. The non-transitory computer readable medium having program
instructions as recited in claim 15, wherein program instructions
for selecting content terms further includes program instructions
for calculating a probability of query term q given ad materials s
(p(q|s)) as equal to the number of times q occurs in the ad
materials s divided by a number of terms in the ad materials s,
wherein program instructions for calculating the probability p(q|w)
further includes program instructions for making p(q|w) equal to
the sum for all s of the probability t(q|w) times the probability
p(q|s) times the probability p(w|s) times the probability p(s)
divided by the probability p(w).
17. The non-transitory computer readable medium having program
instructions as recited in claim 11, wherein the second set of ad
materials is selected from a group consisting of, a content match
ad for the internet query, and a sponsored search ad for the
internet query.
18. The non-transitory computer readable medium having program
instructions as recited in claim 11, wherein the program
instructions for ranking the set of internet documents by the
probability that the query was generated by the sampling of terms
from each of the set of internet documents includes program
instructions for ranking the set of internet documents by the
probability that the query was generated by a random sampling of
terms from each of the set of internet documents.
19. A method of outputting advertisements in response to an
internet query, the method comprising: receiving an internet query,
the internet query including query terms; retrieving information
related to the received internet query including information about
a set of internet documents; ranking the set of internet documents
by a probability that the query was generated by a random sampling
from each of the set of internet documents including smoothing the
ranking of the set of internet documents to give a non-zero
probability to any unseen query terms; selecting a top ranked
number (N) of the ranked set of internet documents; constructing a
model of the selected top ranked number (N) of the ranked set of
internet documents; selecting content terms from the model; ranking
a first set of ad materials (S) in response to the model, S
including a plurality of ad materials (s); generating an expanded
query including the internet query and the selected content terms
from the model; retrieving a second set of ad materials using the
expanded query; ranking the second set of ad materials according to
a probability the second set of ad materials generated the expanded
query; and outputting advertisements based on the second set of ad
materials.
Description
CLAIM OF PRIORITY
[0001] This application is a Continuation Application under 35 USC § 120 of U.S. application Ser. No. 12/059,098, filed on Mar.
31, 2008. The disclosure of the above-identified patent application
is incorporated in its entirety herein by reference.
BACKGROUND
[0002] 1. Field of the Invention
[0003] The present invention relates to methods and systems for
selecting internet advertisements in response to an internet query,
and more particularly, methods and systems for generating expanded
search queries to improve the relevance of the advertisements
found.
[0004] 2. Description of the Related Art
[0005] Internet advertising methods retrieve ads for use in a
sponsored search or content match system. Both types of advertising
systems use a similar representation of the ad. That is to say, in
both systems an advertisement is represented by a set of keywords,
a title, a short description, and a URL, which when clicked takes
the user to the advertiser's web page. Typically the user is shown
the title, the description and the URL.
[0006] Sponsored search presents advertisements in the search
results in response to a user's query to a search engine. Typically
the ads are short and textual in nature, and appear at the top of
the search results or to the side. The keywords are typically
matched to the user query, and when an ad is found whose keywords
match the user query, the ad is shown to the user.
[0007] In a content match system, the ads are placed in a web page
based on the content of the web page itself. The system extracts a
set of key terms from the web page to represent its content, and
then matches the key terms from the web page to the keywords
associated with the advertisement. Ads are dynamically placed in a
web page as a function of the expected revenue to be generated, and
the similarity between the web page and the ad keywords.
[0008] Both sponsored search and content match systems rely on
sentence retrieval technology to retrieve ad candidates to be shown
to the user. In sponsored search the sentence retrieval is in
response to a user query. In content match the sentence retrieval
is in response to a set of key terms that represent the topic of
the web page, but the same retrieval technology can be applied to
both systems.
[0009] The quality of sentence retrieval in response to a query
term depends on numerous factors, such as the number of terms in
the query, the specificity of the query, the number of potential
meanings for query terms, the quality of the retrieval mechanisms,
the amount of time available for retrieval, etc. Some of the
applications for sentence retrieval include question answering,
result abstracts related to internet URLs (Uniform Resource
Locators), and selection of advertising based on the provided
query.
[0010] Online advertising systems operate on short textual
descriptions of the ad, typically including a title, a description
of one or two sentences in length, a set of keywords, and a search
context. For example, the search context can be either a Web page
in the case of contextual advertising, or a query in the case of a
sponsored search. In this document we refer to ad materials as the
set of title, description, and keywords that comprise an Internet
advertisement. The term advertisement refers to the subset of these
materials that is shown to a user in the search interface.
[0011] Pseudo-relevance feedback has been shown to be effective for
document retrieval, but it has had mixed results when applied to
the retrieval of short texts such as sentences. The term
pseudo-relevance feedback is related to relevance feedback, where
feedback on the relevance of the results from a first search is
given to the system by a user in order to do another search that
relates to the documents with the better scores. Some systems use
the "more like this" button to implement relevance feedback.
Pseudo-relevance feedback relates to simulating relevance feedback
by the system before performing another focused search.
[0012] However, pseudo-relevance feedback is not very effective
when retrieving short texts, such as advertisements. Advertisements
are sensitive to expansion because the term frequency distribution
is relatively flat, and even a small number of noisy expansion
terms may be completely off the intended topic for the original
query.
[0013] It is in this context that embodiments of the invention
arise.
SUMMARY
[0014] Embodiments of the present invention provide methods,
computer products, and systems for selecting advertisements in
response to an internet query. One method includes using
pseudo-relevance feedback and translation models to perform a
second internet search with an expanded query that adds related
terms to the original internet query terms.
[0015] It should be appreciated that the present invention can be
implemented in numerous ways, such as a process, an apparatus, a
system, a device or a method on a computer readable medium. Several
inventive embodiments of the present invention are described
below.
[0016] In one embodiment, a method for selecting advertisements in
response to an internet query is provided. The method includes
receiving an internet query with at least one query term,
retrieving and then ranking a first set of ad materials in response
to the internet query using a query likelihood model. The method
continues by generating a distribution of words from the first set
of ad materials and then selecting sampling words from the
distribution using pseudo-relevance feedback and translation
models, the internet query, and the first set of ad materials
obtained using the query likelihood model. The sampling words are
chosen from a distribution of words from the words in the first set
of ad materials, and the pseudo-relevance feedback model is used to
select words (w) in the distribution of words based on a
probability that word w generates query term q (p(q|w)). The
translation model is used to calculate a translation probability
that w translates into q (t(q|w)), and then t(q|w) is utilized to
evaluate the probability p(q|w). The method then retrieves and
ranks a second set of ad materials using an expanded query formed
by adding the selected distribution words to the original internet
query. Advertisements based on the second set of ad materials are
then presented to the user. The use of translation models enhances
the topicality of the results because the distribution words
selected are related to the terms in the original query as
indicated by their translation probabilities.
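The two-pass flow described in this embodiment can be sketched as follows. This is an illustrative outline, not the patent's implementation: the toy ad materials and translation table are invented, and simple term overlap stands in for the full query likelihood model.

```python
def rank_by_query_likelihood(query, ads):
    # First-pass ranking stand-in: order ad materials by how many
    # query terms they contain (a proxy for the query-likelihood model).
    return sorted(ads, key=lambda ad: len(set(query) & set(ad)), reverse=True)

def select_expansion_words(query, top_ads, t_table, m=2):
    # Score every word in the top-ranked ads by its translation
    # probability t(q|w) summed over the query terms; keep the m best.
    vocab = {w for ad in top_ads for w in ad}
    scored = {w: sum(t_table.get((q, w), 0.0) for q in query) for w in vocab}
    return sorted(scored, key=scored.get, reverse=True)[:m]

def rank_ads(query, ads, t_table, n=2, m=2):
    # Pass 1: retrieve and rank; Pass 2: re-rank with the expanded query.
    first = rank_by_query_likelihood(query, ads)
    expansion = select_expansion_words(query, first[:n], t_table, m)
    return rank_by_query_likelihood(list(query) + expansion, ads)

# Hypothetical ad materials and translation table t(q|w).
ads = [["cheap", "flights", "madrid"],
       ["madrid", "airfare", "deals"],
       ["garden", "tools", "sale"]]
t_table = {("flights", "airfare"): 0.4, ("flights", "madrid"): 0.2}
ranked = rank_ads(["flights"], ads, t_table)
```

In a deployed system the first-pass ranking would use the query likelihood model and t(q|w) would come from a translation table learned prior to ranking, as the embodiments describe.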
[0017] In another embodiment, a system for selecting advertisements
in response to an internet query is provided. The system includes a
search server, an ad server, a translation server and a display.
The search server receives internet queries and the ad server
generates ad materials. Given an internet query, the ad server
retrieves and ranks a first set of ad materials in response to the
internet query using a query likelihood model. The translation
server receives the first set of ad materials and selects a
plurality of distribution words using pseudo-relevance feedback and
translation models, the internet query, and the first set of ad
materials. Once the translation server creates the distribution
words, the ad server retrieves and ranks a second set of ad
materials using an expanded query that includes the original
internet query plus the selected distribution words. The display is
used to present advertisements based on the second set of ad
materials to the user.
[0018] Other aspects of the invention will become apparent from the
following detailed description, taken in conjunction with the
accompanying drawings, illustrating by way of example the
principles of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
[0019] The invention may best be understood by reference to the
following description taken in conjunction with the accompanying
drawings in which:
[0020] FIG. 1 describes a simplified schematic diagram of a network
system for implementing embodiments of the present invention.
[0021] FIG. 2 shows samples of translation tables used in one
embodiment of the invention.
[0022] FIG. 3 shows the flow of an algorithm to rank advertisements
in accordance with one embodiment of the invention.
[0023] FIG. 4 describes the details of the flow to select
distribution words using pseudo-relevance feedback and translation
models according to one embodiment using model R1.
[0024] FIG. 5 describes the details of the flow to select
distribution words using pseudo-relevance feedback and translation
models according to one embodiment using model R2.
DETAILED DESCRIPTION
[0025] The following embodiments describe methods, computer
products, and systems for selecting advertisements in response to
an internet query. The method uses pseudo-relevance feedback and
translation models to expand the original query with related terms
to retrieve ad materials. The use of translation models enhances
the topicality of the results because the distribution words
selected are related to the terms in the original query as
indicated by their translation probabilities.
[0026] In the following description, numerous specific details are
set forth in order to provide a thorough understanding of the
present invention. It will be obvious, however, to one skilled in
the art, that the present invention may be practiced without some
or all of these specific details. In other instances, well known
process operations have not been described in detail in order not
to unnecessarily obscure the present invention.
[0027] Sentence Retrieval is the task of retrieving a relevant
sentence in response to a query, a question, or a reference
sentence. Sentence retrieval is used in the areas of question
answering, getting an abstract or snippet for a document, selecting
advertisements based on the content of the found sentences,
selecting an advertisement, such as a sponsored search ad, based on the query,
query, novelty detection, etc. The techniques used for sentence
retrieval are usually the same techniques used for document
retrieval. However, document retrieval differs from sentence
retrieval and some of the approaches used for document retrieval do
not perform well for sentence retrieval. The term sentence used
herein for the description of the invention should be interpreted
as a general term that can refer to any part of ad materials, such
as a description, a title, or a set of key terms, or to any
combination of the parts in ad materials.
[0028] FIG. 1 describes a simplified schematic diagram of a network
system for implementing embodiments of the present invention.
Internet 110 is used to interconnect users with servers. Users 118
access the Internet 110 via a variety of devices, such as PCs
104, laptops 106, mobile phones 108, etc. These are merely
examples, and any other device used to access Internet 110 can be
used to implement embodiments of this invention. For example, the
devices may be wired or wireless. In one embodiment, a browser 102
is executed on a device, and the graphical user interface is
presented on a display. The browser 102 provides the functionality
for accessing the Internet.
[0029] In accordance with one embodiment, search server 114
provides search services to Internet users. Typically, search
server 114 retrieves information related to a query provided by a
user. The information provided can include references to related
documents, including a title, an abstract or snippet summarizing
content, a URL, size and format of the document, pointers to
similar documents or cached versions, translations, etc. In
addition, search server 114 can provide answers to questions,
related query suggestions, additional services offered to
internet users, etc. Ad server 112 generates ad materials based on
the internet query, such as advertisements based on the found
documents or the query, and sponsored search ads.
[0030] Translation server 116 provides translation services. In one
embodiment, translation server 116 receives information about a set
of internet documents from search server 114 and creates a
distribution of terms from the documents. The translation server then
uses translation models to select a set of terms from the
distribution of terms based on the translation probabilities of the
terms in the query that generated the set of internet
documents.
[0031] Although three different servers are described by way of
example, the person skilled in the art will appreciate that
multiple configurations are possible by combining several servers
into one system, by having distributed systems where a single
function can be accomplished by a plurality of different servers
scattered across the Internet, or by caching information from the
different databases at the different servers to accelerate the
processing of information.
[0032] Query likelihood for document retrieval ranks documents by
the probability that the query was generated by the same
distribution of terms the document is from. When query likelihood
is used to retrieve sentences, the term "document" is replaced by
the term "sentence", and the sentence is part of a document.
$$P(s \mid Q) \propto P(s) \prod_{i=1}^{|Q|} P(q_i \mid s) \qquad (1)$$
[0033] Where Q is the query, |Q| is the number of terms in the
query, q_i is the i-th term in the query, and s is a
document (or sentence). Words that appear in the query and do not
appear in the document have a probability of zero, which results in
a zero probability of the query having been generated by the
document. To resolve this problem, smoothing is used to give a
non-zero probability to unseen words.
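A minimal sketch of query-likelihood scoring with smoothing, assuming Jelinek-Mercer interpolation against a collection model (one common choice; the patent does not fix a particular smoothing method) and invented toy data:

```python
import math
from collections import Counter

def query_likelihood(query_terms, sentence_terms, collection_counts,
                     collection_size, lam=0.5):
    """Log-probability that the sentence generated the query.

    Interpolating the sentence model with a background (collection)
    model gives a non-zero probability to query terms unseen in the
    sentence, as paragraph [0033] requires.
    """
    counts = Counter(sentence_terms)
    length = len(sentence_terms)
    log_p = 0.0
    for q in query_terms:
        p_sent = counts[q] / length if length else 0.0
        p_bg = collection_counts.get(q, 0) / collection_size
        log_p += math.log(lam * p_sent + (1 - lam) * p_bg + 1e-12)
    return log_p

# Rank a small set of "sentences" (ad materials) for a query.
docs = [["cheap", "flights", "to", "madrid"],
        ["hotel", "deals", "in", "madrid"],
        ["used", "car", "sales"]]
collection = Counter(t for d in docs for t in d)
size = sum(collection.values())
query = ["flights", "madrid"]
ranked = sorted(docs,
                key=lambda d: query_likelihood(query, d, collection, size),
                reverse=True)
```

The interpolation weight `lam` is a tunable parameter; the tiny epsilon only guards the logarithm against a zero collection probability.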
[0034] In the query likelihood approach, the words in a document
are assumed to have a multinomial distribution. Documents are
ranked according to the probability that the query was generated by
random sampling from the document. Documents are sufficiently long
to be distinguishable in terms of their word probability
distributions. Sentences, which have considerably fewer words than
documents, may be too short to accurately estimate the probability
distributions of the words, or to compare those distributions to
each other.
[0035] Relevance feedback collects terms from known relevant
documents or clusters of related terms, and uses these terms in
place of the original query. Pseudo-relevance feedback also
replaces the query with terms from documents, but whereas in
relevance feedback the documents or term clusters have been judged
relevant by a person (for instance, the user), in pseudo-relevance
feedback the documents are automatically retrieved and assumed--but
not guaranteed--to be relevant.
[0036] Relevance models perform an initial query likelihood
retrieval, using the original query. A model is constructed from
the top N retrieved documents, and m content terms are sampled from
the distribution of terms in the model. This set of sampled terms
serves as a distribution of query terms, and the documents are
re-ranked according to the likelihood they generated the new
distribution of query terms. In another embodiment, the m terms are
added to the query terms before re-ranking the documents. N and m
are parameters to be tuned.
[0037] A relevant document will contain many of the terms in the
expanded query, because the expansion terms co-occurred frequently
in the top N documents. Terms in the expanded query that are not in
the document will get the background probability, but for a
relevant document the scores for the matching terms will dominate
the scores from the non-matching terms. If the query is expanded
with a few spurious terms from a different topic, there may be
documents that are retrieved because of these terms, but their
scores will be lower than the scores of the relevant documents
because there will be fewer spurious matching terms.
[0038] A relevant sentence, by contrast, will match a few terms
from the expanded query, and is unlikely to contain multiples of
the same term. If the query is expanded with a few spurious terms
unrelated to the topic of the query, a non-relevant sentence that
matches the spurious terms will match as many terms in the query as
a relevant sentence. Furthermore, since the sentence has so few
terms to begin with, most of the terms in the expanded query will
receive the background score, causing the scores of relevant and
non-relevant sentences to be similar. Further, relevance models
are designed to capture a model of the topic of the document. By
their nature relevance models capture the notion of relevance
required by document retrieval: the notion of topicality. For many
sentence retrieval tasks, topicality is not sufficient for
relevance. For this reason, in addition to the reasons outlined
above, relevance models are not ideal for sentence retrieval.
[0039] Pseudo-relevance models produce a query expansion for ad hoc
document retrieval by creating a distribution of terms from the set
of relevant documents for a given query. A pseudo-relevance model
estimates the set of relevant documents by doing an initial
retrieval, and then creating a unigram language model of the terms
in the retrieved documents. The model then creates a new query by
sampling from the distribution, according to:
$$P(w \mid Q) = k\,P(w)\,P(Q \mid w) \qquad (2)$$
[0040] Where Q is a query, w is a term in the vocabulary, k is a
constant, and P(Q|w) is estimated as the product over each term in
the query of probability of the query term q given a term w in the
vocabulary:
$$P(Q \mid w) = \prod_{q \in Q} P(q \mid w) \qquad (3)$$
[0041] The probability of a word in the query, given a word from
the vocabulary, is estimated over the working set of "relevant"
documents D from the initial retrieval pass:
$$P(q \mid w) = \sum_{d \in D} \frac{P(q \mid d)\,P(w \mid d)\,P(d)}{P(w)} \qquad (4)$$
[0042] Where the probability of a query term (or word from the
vocabulary) given a document, is the frequency of the term in the
document divided by the total number of terms in the document:
$$P(w \mid d) = \frac{\#(w; d)}{|d|} \qquad (5)$$
[0043] P(w) is the prior probability of a word in the
vocabulary:
$$P(w) = \sum_{d \in D} P(w \mid d)\,P(d) \qquad (6)$$
[0044] The number of documents from which to sample, and the number
of terms to be sampled are parameters to be set empirically. In
theory, a query could be expanded with the entire vocabulary of the
set of "relevant" documents because the model assumes that there is
some distribution of terms from which relevant documents are
sampled. Thus the distribution of terms in a relevant document will
be closer to the distribution of terms in the expanded query than
the distribution of terms in a non-relevant document. The frequency
of a term in the document is an indicator of its topic, as terms
that represent the topic of the document are more likely to have a
higher frequency, and off-topic terms generally have a lower
frequency.
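Equations (2) through (6) can be combined into one short scoring routine. The sketch below assumes a uniform document prior P(d) and invented toy documents; it is illustrative, not the patent's code:

```python
from collections import Counter

def expansion_scores(query, docs):
    """Score vocabulary terms w by P(w|Q) proportional to P(w) * prod_q P(q|w),
    following equations (2)-(6), with a uniform document prior P(d)."""
    p_d = 1.0 / len(docs)
    models = []                      # P(term | d) for each pseudo-relevant doc
    for d in docs:
        c = Counter(d)
        n = len(d)
        models.append({t: c[t] / n for t in c})   # equation (5)
    vocab = set(t for m in models for t in m)
    p_w = {w: sum(m.get(w, 0.0) * p_d for m in models) for w in vocab}  # eq (6)
    scores = {}
    for w in vocab:
        p_q_w = 1.0
        for q in query:              # equation (4), multiplied per eq (3)
            num = sum(m.get(q, 0.0) * m.get(w, 0.0) * p_d for m in models)
            p_q_w *= num / p_w[w]
        scores[w] = p_w[w] * p_q_w   # equation (2), up to the constant k
    return scores

docs = [["madrid", "flights", "cheap", "airfare"],
        ["madrid", "flights", "hotels"],
        ["flights", "airfare", "deals"]]
scores = expansion_scores(["flights"], docs)
best = max(scores, key=scores.get)
```

As the surrounding text predicts, terms co-occurring with the query across the pseudo-relevant set (here "madrid" and "airfare") outscore terms that appear only once with it.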
[0045] The problem for sentence retrieval arises in equation (5),
where the expansion terms are weighted by their frequency in the
sentence. Sentences typically have one instance of a content term.
Term frequency is not a reliable indicator of topicality, because
each term in a sentence has a frequency of approximately 1 divided
by the number of terms in the sentence. The higher-frequency terms
in sentences are most likely stop words--frequent words that carry
little content, such as "of" and "the". Embodiments of the
invention provide for a family of models, named the Model R family,
that weight expansion terms according to their probability of being
related to the original query terms, rather than by their frequency
in the sentence.
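The flat distribution described above can be seen directly: in a
typical short sentence every content term occurs exactly once and
receives the same weight 1/|s| under equation (5), while the only
higher-frequency term is a stop word. The example sentence is an
illustrative assumption.

```python
from collections import Counter

sentence = "the best of the cheap flights to rome".split()
tf = Counter(sentence)
# Equation (5) applied to a sentence: each content term gets 1/8,
# while the stop word "the" receives the highest weight.
weights = {w: c / len(sentence) for w, c in tf.items()}
```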
[0046] Machine translation has its foundations in statistical
speech recognition. Translation models are trained on a parallel
corpus of sentences in the source language paired with sentences in
the target language. In sentence retrieval, the source (S) is the
sentence to be ranked, and the observation or target (Q) is the
query. Translation as a concept is not intuitive when discussing
two sentences that are in the same language. Translation models for
sentence retrieval treat the two sentences as if they are in
completely different languages, and learn a translation table
assuming the two vocabularies are distinct.
[0047] FIG. 2 shows samples of translation tables used in one
embodiment of the invention. The examples in columns one and two
are from a translation table learned from a parallel corpus of
questions and answer sentences. The third and fourth columns are
from a translation table learned from a parallel corpus
artificially constructed from terms sampled according to their
mutual information with the document, paired with sentences from
the document.
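One natural in-memory representation of such a translation table is
a nested mapping from a vocabulary word w to a distribution over
query terms q. The entries below are illustrative assumptions, not
values from the learned tables of FIG. 2.

```python
# t(q|w): for each vocabulary word w, a distribution over query terms.
translation_table = {
    "flight": {"flights": 0.5, "airfare": 0.3, "travel": 0.2},
    "hotel": {"hotel": 0.6, "lodging": 0.25, "rome": 0.15},
}

def t(q, w, table):
    """Look up t(q|w); pairs absent from the table get probability 0."""
    return table.get(w, {}).get(q, 0.0)
```

Because the two vocabularies are treated as distinct, a word may
well "translate" into itself with high probability, as "hotel" does
here.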
[0048] Using translation models generalizes a variety of data, and
there is no dependency on preprocessing, such as parsing or
tagging. A major difference between machine translation and
sentence retrieval is that machine translation assumes there is
little, if any, overlap in the vocabularies of the two languages.
In sentence retrieval, the overlap between the two vocabularies is
considered by the retrieval algorithm. In one embodiment of the
invention, the translation probabilities are taken from a
translation table learned prior to ranking.
[0049] FIG. 3 shows the flow of an algorithm to rank advertisements
in accordance with one embodiment of the invention. The method
begins by using query likelihood (operations 302-306) to retrieve
and select ad materials related to an internet query. In operation
302 the query
(Q) is received by search server 114 of FIG. 1. The query Q
includes one or more query terms (q). Ad server 112 then retrieves
ad materials relevant for the given query. In operation 306, the
retrieved ad materials are ranked and a first set (S) of ad
materials (s) is formed by selecting the top most relevant ad
materials previously retrieved. Then, a distribution of the terms
in the first set of ad materials is obtained in operation 308.
[0050] In operation 310, the top distribution words are selected
using pseudo-relevance feedback and translation models, the query,
and the first set of ad materials. Two embodiments for performing
operation 310 are described below with respect to FIGS. 4 and 5,
describing models R1 and R2 respectively. In operation 312 an
extended query is formed by adding the selected distribution words,
also known as sampling words, to the original query terms. In
another embodiment, only the selected distribution words are used
for the expanded query. The new set of ad materials is ranked in
operation 314, and then advertisements based on the new set of ad
materials are displayed in operation 316. The advertisements are
based on the ad materials found and the advertisements displayed
may include any combination of title, description and keywords from
the ad materials, where the title, description and keywords may be
shown in full or in part. In one embodiment, the results are
presented in an internet browser window.
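The overall flow of FIG. 3 can be sketched as follows. The scorer
and the expansion-term selector are passed in as parameters; both
names, and the cutoffs k and m, are assumptions for illustration
(operation 310 would plug in model R1 or R2).

```python
def rank_ads(query_terms, ads, score_ad, select_expansion_terms,
             k=10, m=5):
    """Sketch of the flow of FIG. 3 over tokenized ad materials."""
    # Operations 302-306: retrieve and keep the top-k ad materials S.
    first_pass = sorted(ads, key=lambda ad: score_ad(query_terms, ad),
                        reverse=True)
    working_set = first_pass[:k]
    # Operations 308-310: select the top-m distribution words from S.
    expansion = select_expansion_terms(query_terms, working_set)[:m]
    # Operation 312: form the extended query with the sampling words.
    expanded_query = list(query_terms) + expansion
    # Operation 314: re-rank the ad materials with the extended query.
    return sorted(ads, key=lambda ad: score_ad(expanded_query, ad),
                  reverse=True)
```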
[0051] R1 is the first model in the model R family. In model R1,
the distribution terms are weighted according to translation
probabilities. Previous pseudo-relevance models sample expansion
terms from a working set that offers higher-quality terms and a
wider vocabulary, but those models suffer because the terms are
sampled according to a flat frequency distribution. In equations
(3) and (4), the probability of a term in the query given a word
from the vocabulary is estimated as the probability of the query
term given the document, weighted by the frequency of the
vocabulary term in the document. In model R1, a translation table
is used to estimate the
probability of a query term given a term from the vocabulary
directly. Model R1 then includes the following formulas:
P(w|Q) = k P(w) P(Q|w)    (7)

P(Q|w) = Π_{q ∈ Q} P(q|w)    (8)

P(q|w) = t(q|w)    (9)

P(w) = Σ_{s ∈ S} P(w|s) P(s)    (10)
[0052] Where t(q|w) is estimated using a translation model, thus
replacing the unigram term frequency distribution in formula (3).
In this model, only related terms that appear in the translation
table will have a non-zero probability, and only terms that appear
in the working set of sentences that also are associated with the
query terms via the translation table are considered for expansion.
Furthermore, the expansion terms are weighted by their translation
probability with query terms.
[0053] FIG. 4 describes the details of the flow to select
distribution words using pseudo-relevance feedback and translation
models according to one embodiment using model R1. In operation
402, a translation table is selected. A plurality of translation
tables can be available, such as for example the ones described
with respect to FIG. 2. In operation 404, a first word w is
selected from the distribution of terms in S to start the process
of selecting the best distribution words for the extended
query.
[0054] In operation 406, formula (9) is evaluated, that is the
probability that word w generates q (p(q|w)) is equal to the
translation probability that w translates into q (t(q|w)). Formula
(8) is evaluated in operation 408, that is the probability of a
query Q given word w is calculated as the product for all q of the
probability p(q|w). Operation 410 evaluates formula (10) by
calculating the probability of word w (p(w)) as the sum for all ad
materials s in S of the probability of w given s (p(w|s)) times the
probability of the ad materials s (p(s)).
[0055] Formula (7), used to select the extended query, is evaluated
in operation 412 by calculating the probability of word w given a
query Q (p(w|Q)) as proportional to the probability p(w) times the
probability p(Q|w). In operation 414, the method checks whether
there are more distribution words to appraise. If there are more
words, operation 418 selects a new distribution word and operations
406-412 are performed for the new word. If there are no more words
in operation 414, the method proceeds to operation 416, where a
group of top words are selected from the distribution words
according to their appraised p(w|Q). The method then continues to
operation 312 as seen in FIG. 3.
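The flow of FIG. 4 amounts to scoring every word in the
distribution with equations (7) through (10) and keeping the top
group. A minimal sketch, in which the helper names, the toy working
set, and the uniform prior P(s) are assumptions:

```python
def score_r1(w, query_terms, working_set, t_table, p_s):
    """Model R1: p(w|Q) up to the constant k of equation (7)."""
    # Equation (10): P(w) over the working set S of ad materials.
    p_w = sum(s.count(w) / len(s) * p_s[i]
              for i, s in enumerate(working_set))
    # Equations (8)-(9): P(Q|w) as a product of translation
    # probabilities t(q|w).
    p_q_given_w = 1.0
    for q in query_terms:
        p_q_given_w *= t_table.get(w, {}).get(q, 0.0)
    return p_w * p_q_given_w

def top_expansion_words(query_terms, working_set, t_table, p_s, m):
    """Operations 404-418: appraise every distribution word, then
    keep the top-m words (operation 416)."""
    vocab = {w for s in working_set for w in s}
    ranked = sorted(vocab, reverse=True,
                    key=lambda w: score_r1(w, query_terms, working_set,
                                           t_table, p_s))
    return ranked[:m]
```

As equation (9) implies, any word not paired with every query term
in the translation table scores zero and drops out of the
expansion.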
[0056] R2 is the second model in the model R family. In model R2,
the unigram distribution is weighted by the translation probability.
One issue with using translation tables is that the probabilities
are learned from a parallel corpus. As a parallel corpus is
somewhat noisy, the translation probabilities might be somewhat
unreliable. Instead of using the translation probabilities
directly, translation probabilities are used to weight the estimate
of P(q|w). Model R2 includes the following formulas:
P(w|Q) = k P(w) P(Q|w)    (11)

P(Q|w) = Π_{q ∈ Q} P(q|w)    (12)

P(q|w) = Σ_{s ∈ S} t(q|w) P(q|s) P(w|s) P(s) / P(w)    (13)

P(w) = Σ_{s ∈ S} P(w|s) P(s)    (14)

P(q|s) = #(q; s) / |s|    (15)
[0057] Where t(q|w) is estimated using a translation model. In this
embodiment, the unigram term frequency is weighted by the
translation probability with the query term. The effect is to
discount high frequency terms that are not related to the query
terms, and to increase the variance in the contribution of content
terms.
[0058] FIG. 5 describes the details of the flow to select
distribution words using pseudo-relevance feedback and translation
models according to one embodiment using model R2. In operation
502, a translation table is selected. In operation 504, a first
word w is selected from the distribution of terms in S to start the
process of selecting the best distribution words for the extended
query. In operation 505, formula (15) is evaluated by calculating
the probability of a query term q given ad materials s as the
number of times q occurs in s divided by the number of terms in ad
materials s. In operation 506, formula (13) is evaluated, that is,
the probability that word w generates q (p(q|w)) is equal to the
sum for all ad materials s in S of the translation probability that
w translates into q (t(q|w)) times p(q|s) times p(w|s) times p(s)
divided by the probability of w (p(w)).
[0059] Formula (12) is evaluated in operation 508, that is, the
probability of a query Q given word w is calculated as the product
for all query terms, q of the probability p(q|w). Operation 510
evaluates formula (14) by calculating the probability of word w
(p(w)) as the sum for all ad materials s in S of the probability of
w given s (p(w|s)) times the probability of the ad materials s
(p(s)).
[0060] Formula (11), used to select the extended query, is evaluated
in operation 512 by calculating the probability of word w given a
query Q (p(w|Q)) as proportional to the probability p(w) times the
probability p(Q|w). In operation 514, the method checks whether
there are more distribution words to appraise. If there are more
words, operation 518 selects a new distribution word and operations
505-512 are performed for the new word. If there are no more words
in operation 514, the method proceeds to operation 516, where a
group of top words are selected from the distribution words
according to their appraised p(w|Q). The method then continues to
operation 312 as seen in FIG. 3.
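The flow of FIG. 5 can be sketched in the same style as model R1
above; the helper names and the toy inputs are assumptions for
illustration.

```python
def score_r2(w, query_terms, working_set, t_table, p_s):
    """Model R2: p(w|Q) up to the constant k of equation (11)."""
    def p_term_s(term, s):
        # Equation (15): frequency of the term in ad materials s.
        return s.count(term) / len(s)
    # Equation (14): P(w) over the working set S.
    p_w = sum(p_term_s(w, s) * p_s[i]
              for i, s in enumerate(working_set))
    if p_w == 0.0:
        return 0.0
    # Equations (12)-(13): product over query terms of the unigram
    # estimate weighted by the translation probability t(q|w).
    p_q_product = 1.0
    for q in query_terms:
        t_qw = t_table.get(w, {}).get(q, 0.0)
        p_q_product *= sum(t_qw * p_term_s(q, s) * p_term_s(w, s) * p_s[i]
                           for i, s in enumerate(working_set)) / p_w
    return p_w * p_q_product
```

Because t(q|w) multiplies the frequency estimate rather than
replacing it, a frequent term unrelated to the query is discounted
toward zero, which matches the stated effect of model R2.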
[0061] Embodiments of the present invention may be practiced with
various computer system configurations including hand-held devices,
microprocessor systems, microprocessor-based or programmable
consumer electronics, minicomputers, mainframe computers and the
like. The invention can also be practiced in distributed computing
environments where tasks are performed by remote processing devices
that are linked through a wire-based or wireless network.
[0062] With the above embodiments in mind, it should be understood
that the invention can employ various computer-implemented
operations involving data stored in computer systems. These
operations are those requiring physical manipulation of physical
quantities. Usually, though not necessarily, these quantities take
the form of electrical or magnetic signals capable of being stored,
transferred, combined, compared and otherwise manipulated.
[0063] Any of the operations described herein that form part of the
invention are useful machine operations. The invention also relates
to a device or an apparatus for performing these operations. The
apparatus can be specially constructed for the required purpose, or
the apparatus can be a general-purpose computer selectively
activated or configured by a computer program stored in the
computer. In particular, various general-purpose machines can be
used with computer programs written in accordance with the
teachings herein, or it may be more convenient to construct a more
specialized apparatus to perform the required operations.
[0064] The invention can also be embodied as computer readable code
on a computer readable medium. The computer readable medium is any
data storage device that can store data, which can thereafter be
read by a computer system. Examples of the computer readable medium
include hard drives, network attached storage (NAS), read-only
memory, random-access memory, CD-ROMs, CD-Rs, CD-RWs, magnetic
tapes and other optical and non-optical data storage devices. The
computer readable medium can also be distributed over a
network-coupled computer system so that the computer readable code
is stored and executed in a distributed fashion.
[0065] Although the method operations were described in a specific
order, it should be understood that other housekeeping operations
may be performed in between operations, or operations may be
adjusted so that they occur at slightly different times, or may be
distributed in a system which allows the occurrence of the
processing operations at various intervals associated with the
processing, as long as the processing of the overlay operations is
performed in the desired way.
[0066] Although the foregoing invention has been described in some
detail for purposes of clarity of understanding, it will be
apparent that certain changes and modifications can be practiced
within the scope of the appended claims. Accordingly, the present
embodiments are to be considered as illustrative and not
restrictive, and the invention is not to be limited to the details
given herein, but may be modified within the scope and equivalents
of the appended claims.
* * * * *