U.S. patent application number 12/163974 was filed with the patent office on June 27, 2008, and published on July 2, 2009, for a system and method for advertisement delivery optimization. Invention is credited to Dominic Antonelli, Jonathan Burgstone, Justin Fiedler, Heston Liebowitz, Jeremy Schiff, Sharam Shirazi, and Neil Warren.
United States Patent Application 20090172730
Kind Code: A1
SCHIFF, Jeremy; et al.
July 2, 2009
SYSTEM AND METHOD FOR ADVERTISEMENT DELIVERY OPTIMIZATION
Abstract
Systems and methods for advertisement delivery optimization are
disclosed. In one aspect, embodiments of the present disclosure
include a method, which may be implemented on a system, of
identifying multimedia content associated with a web-user. One
embodiment can include, analyzing the multimedia content to
retrieve a set of descriptors, comparing the set of descriptors
with metadata of a plurality of advertisements, selecting a
candidate pool of advertisements from the plurality of
advertisements based on relevancy indicated by the comparison, and
presenting at least a portion of the candidate pool of
advertisements to the web-user.
Inventors: SCHIFF, Jeremy (Berkeley, CA); Fiedler, Justin (Whitefish Bay, WI); Antonelli, Dominic (Berkeley, CA); Liebowitz, Heston (Emeryville, CA); Warren, Neil (Berkeley, CA); Burgstone, Jonathan (San Francisco, CA); Shirazi, Sharam (Atherton, CA)
Correspondence Address:
PERKINS COIE LLP
P.O. BOX 1208
SEATTLE, WA 98111-1208
US
Family ID: 40799631
Appl. No.: 12/163974
Filed: June 27, 2008
Related U.S. Patent Documents

Application Number | Filing Date | Patent Number
61009358 | Dec 27, 2007 |
61043039 | Apr 7, 2008 |
Current U.S. Class: 725/34
Current CPC Class: G06Q 30/0273 20130101; G06Q 30/0244 20130101; G06Q 30/02 20130101
Class at Publication: 725/34
International Class: H04N 7/10 20060101 H04N007/10
Claims
1. A method of advertisement optimization, comprising: identifying
multimedia content associated with a web-user; analyzing the
multimedia content to retrieve a set of descriptors; wherein the
multimedia content comprises image content; comparing the set of
descriptors with metadata of a plurality of advertisements;
selecting a candidate pool of advertisements from the plurality of
advertisements based on relevancy indicated by the comparison; and
presenting at least a portion of the candidate pool of
advertisements to the web-user.
2. The method of claim 1, further comprising: identifying a
non-candidate pool of advertisements from the plurality of
advertisements based on lack of relevancy indicated by the
comparison; and preventing at least a portion of the non-candidate
pool of advertisements from being presented to the web-user.
3. The method of claim 1, further comprising, assigning a unique
identifier to the multimedia content to associate the multimedia
content with at least one advertisement of the candidate pool of
advertisements.
4. The method of claim 3, further comprising: identifying the
multimedia content associated with a second web-user; retrieving
the unique identifier assigned to the multimedia content;
identifying the at least one advertisement associated with the
multimedia content based on the unique identifier; and presenting
the at least one advertisement of the candidate pool of
advertisements to the second web-user.
5. The method of claim 4, further comprising, assigning unique
identifiers to a plurality of multimedia content to associate each
of the plurality of multimedia content with a relevant
advertisement.
6. The method of claim 1, wherein the multimedia comprises, one or
more of, image content, audio content, animated content, video
content, hypermedia, and interactive multimedia.
7. The method of claim 6, wherein the multimedia further comprises,
textual content.
8. The method of claim 1, further comprising, performing object
detection to analyze the multimedia content.
9. The method of claim 8, wherein the object detection comprises
face detection.
10. The method of claim 1, further comprising, performing text
recognition to analyze the multimedia content.
11. The method of claim 10, further comprising, performing near-by
text analysis.
12. The method of claim 1, further comprising, performing category
classification on the multimedia content.
13. The method of claim 12, further comprising, identifying a
predetermined set of categories into which multimedia content can
be categorized.
14. The method of claim 1, further comprising, performing speech
recognition to analyze the multimedia content.
15. The method of claim 1, further comprising, tracking and
recording a history of the click-through rates associated with the
at least a portion of the candidate pool of advertisements.
16. The method of claim 15, further comprising, further refining
the candidate pool of advertisements based on the history of the
click-through rates.
17. The method of claim 1, wherein, the set of descriptors
comprises a brand indicator.
18. The method of claim 17, further comprising, selecting the
candidate pool of advertisements based on the brand indicator.
19. A method of advertisement delivery optimization in a web-based
photograph-sharing environment, comprising: identifying a
photograph that a user is viewing; analyzing the photograph to
classify the photograph as associated with one or more categories
of a set of predetermined categories; selecting a candidate pool of
advertisements from a plurality of advertisements based on
classification into the one or more categories; and presenting at
least a portion of the candidate pool of advertisements to the
user.
20. The method of claim 19, wherein the analyzing comprises:
extracting a set of image features from the photograph; computing
statistical parameters for at least a portion of the image features
of the set of image features; using the statistical parameters as
variables in a set of predetermined models; and identifying the one or
more categories with which the photograph is associable.
21. The method of claim 20, wherein, the one or more categories are
qualitatively identified.
22. The method of claim 20, wherein, the determining further
comprises, computing probability values that the photograph is
associated with the one or more categories.
23. The method of claim 22, further comprising, modifying order of
presenting the at least a portion of the candidate pool of
advertisements based on the probability values.
24. The method of claim 20, wherein the set of predetermined models
are generated via performing a machine learning process.
25. The method of claim 24, wherein the performing of the machine
learning process comprises: extracting training image features
from a set of training images associated with a particular category
of the set of predetermined categories; computing statistical
parameters for at least a portion of the training image features of
the set of image features; generating a set of descriptors
characteristic of images of the particular category; and generating
the particular set of predetermined models that correspond to the
particular category based on the set of descriptors.
26. The method of claim 24, further comprising, generating a set of
predetermined models for each of the set of predetermined
categories.
27. The method of claim 24, wherein, the training image features
comprise one or more of, a color feature, a texture feature, a shape
feature, and frequency content.
28. The method of claim 24, wherein, the predetermined categories
are user-modifiable.
29. The method of claim 24, further comprising, selecting a
predetermined number of training images.
30. A content awareness system, comprising: a multimedia content
data repository to store multimedia content; an advertisement data
repository to store a pool of advertisements; a machine learning
module communicatively coupled to the multimedia content data
repository; wherein, in operation, the machine learning module
analyzes a set of training multimedia content from the multimedia
database; a multimedia content analyzer module communicatively
coupled to the multimedia content data repository and the machine
learning module; wherein, in operation, the multimedia content
analyzer module analyzes multimedia content from the multimedia
data repository to generate a set of descriptors; and an
advertisement optimizer module communicatively coupled to the
multimedia content analyzer module; wherein, in operation, the
advertisement optimizer module selects a set of advertisements
based on the set of descriptors.
31. The system of claim 30, wherein the machine learning module
comprises a feature extraction module.
32. The system of claim 31, wherein the feature extraction module
is an image feature extraction module.
33. The system of claim 30, wherein the multimedia content database
comprises an image content repository.
34. The system of claim 33, wherein the image content database
comprises a photograph repository.
35. The system of claim 30, wherein the set of training multimedia
content comprises a set of training images.
36. The system of claim 30, further comprising, an image
classification module, wherein, in operation, the image
classification module receives images from the image content data
repository and classifies the image content as associated with one
or more categories.
37. The system of claim 30, further comprising, a tracking
module.
38. The system of claim 31, further comprising, a user data
repository to store user data.
39. A system, comprising: means for, identifying multimedia content
associated with a web-user; means for, analyzing the multimedia
content to retrieve a set of descriptors; wherein the multimedia
content comprises image content; means for, comparing the set of
descriptors with metadata of a plurality of advertisements; means
for, selecting a candidate pool of advertisements from the
plurality of advertisements based on relevancy indicated by the
comparison; and means for, presenting at least a portion of the
candidate pool of advertisements to the web-user.
40. A method of advertisement delivery optimization in a web-based
media-sharing environment, comprising: identifying a digital image
that a user is viewing; analyzing the digital image to classify
the digital image as associated with one or more categories of a
set of predetermined categories; selecting a candidate pool of
advertisements from a plurality of advertisements based on
classification into the one or more categories; and presenting at
least a portion of the candidate pool of advertisements to the
user.
41. The method of claim 40, further comprising, detecting presence
of pornographic content in the digital image.
42. The method of claim 40, further comprising, detecting presence
of pornographic content in an advertisement and further
categorizing the advertisement in a non-candidate pool of
advertisements.
43. The method of claim 42, further comprising, identifying a
second set of predetermined categories which the digital image is
not associated with.
44. The method of claim 42, further comprising, tracking behavior
of the user.
45. The method of claim 44, wherein, the tracking of the behavior
of the user comprises identifying a conversion rate.
46. The method of claim 44, wherein, the tracking of the behavior
of the user comprises tracking click-through rate.
47. A method of advertisement delivery optimization, comprising:
identifying user behavior; selecting a candidate pool of
advertisements from a plurality of advertisements based on the user
provided content; and presenting at least a portion of the
candidate pool of advertisements to the user.
48. The method of claim 47, wherein, the identifying the user
behavior comprises tracking a click-through rate.
49. The method of claim 47, wherein, the tracking of the behavior
of the user comprises identifying a conversion rate.
50. The method of claim 47, wherein, the identifying the user
behavior comprises detecting user provided content.
Description
CLAIM OF PRIORITY
[0001] This application claims priority to U.S. Provisional Patent
Application No. 61/009,358 entitled "Method, System, and Apparatus
For Automated Digital Media Classification", which was filed on
Dec. 27, 2007, the contents of which are expressly incorporated by
reference herein.
[0002] This application claims further priority to U.S. Provisional
Patent Application No. 61/043,039 entitled "Method, System, and
Apparatus For Optimizing Online Advertisements Using Analysis of
Rich Media Content", which was filed on Apr. 7, 2008, the contents
of which are expressly incorporated by reference herein.
TECHNICAL FIELD
[0003] The present disclosure relates generally to advertisement
delivery optimization and in particular to optimizing advertisement
delivery via analysis of multimedia content.
BACKGROUND
[0004] One way of determining which advertisements should be placed
on which pages is based on matches with textual data. However,
text-based identification and matching limits the relevancy of ad
placements. Since the same words can be used in different contexts,
depending on the identified text, the advertisement identifier may
need to parse through tens of thousands of advertisements which may
be relevant. Also, in many contexts, particularly on social-network
and photo sharing sites which rely on tagging, there is
insufficient text for accurately identifying advertisements.
Furthermore, since many related concepts or ideas do not
necessarily have the same identifying keywords, some search results
or recommendations which are relevant may be missed by the
traditional text-based identification means.
BRIEF DESCRIPTION OF THE DRAWINGS
[0005] FIG. 1 illustrates a block diagram of a plurality of client
devices able to communicate with a plurality of content providers
and a server that optimizes advertisement delivery via a network,
according to one embodiment.
[0006] FIG. 2 depicts a block diagram of the components of a host
server for advertisement delivery optimization, according to one
embodiment.
[0007] FIG. 3A depicts a block diagram illustrating a database for
storing data used for advertisement delivery optimization,
according to one embodiment.
[0008] FIG. 3B depicts a block diagram of a database for storing
multimedia content, user data, and advertisement depository,
according to one embodiment.
[0009] FIG. 4A illustrates an example screenshot of a graphical
user interface displaying images of cell phones being viewed by a
user and the advertisement thus presented, according to one
embodiment.
[0010] FIG. 4B illustrates an example screenshot of a graphical
user interface displaying images of digital cameras being viewed by
a user and the advertisement thus presented, according to one
embodiment.
[0011] FIG. 4C illustrates an example screenshot of a graphical
user interface displaying a photograph being viewed by a user and
the advertisements thus presented, according to one embodiment.
[0012] FIG. 5 illustrates a diagrammatic representation of the
process for using multimedia content for advertisement selection,
according to one embodiment.
[0013] FIG. 6A illustrates a diagrammatic representation of the
process of the machine learning process for image classification,
according to one embodiment.
[0014] FIG. 6B illustrates a diagrammatic representation of the
process of iteratively altering the set of features presented to
the learning algorithm in order to improve accuracy and speed of
image classification, according to one embodiment.
[0015] FIG. 7A depicts a flow diagram illustrating a process of
selecting candidate and non-candidate pool of advertisements based
on identified multimedia content, according to one embodiment.
[0016] FIG. 7B depicts a flow diagram illustrating a process of
using identifiers for multimedia content to identify associated
advertisements, according to one embodiment.
[0017] FIG. 8A depicts a flow diagram illustrating a process for
selecting candidate pool of advertisements based on category
classification of a photograph, according to one embodiment.
[0018] FIG. 8B depicts a flow diagram illustrating a process for
category classification of a photograph utilizing a machine
learning process, according to one embodiment.
[0019] FIG. 9A depicts a flow diagram illustrating a process of
machine learning to generate predetermined models to represent
functions that can receive as input, characteristics of an image to
determine its category, according to one embodiment.
[0020] FIG. 9B depicts a flow diagram illustrating a process for
classifying images, according to one embodiment.
DETAILED DESCRIPTION
[0021] The following description and drawings are illustrative and
are not to be construed as limiting. Numerous specific details are
described to provide a thorough understanding of the disclosure.
However, in certain instances, well-known or conventional details
are not described in order to avoid obscuring the description.
References to one or an embodiment in the present disclosure can
be, but not necessarily are, references to the same embodiment;
and, such references mean at least one of the embodiments.
[0022] Reference in this specification to "one embodiment" or "an
embodiment" means that a particular feature, structure, or
characteristic described in connection with the embodiment is
included in at least one embodiment of the disclosure. The
appearances of the phrase "in one embodiment" in various places in
the specification are not necessarily all referring to the same
embodiment, nor are separate or alternative embodiments mutually
exclusive of other embodiments. Moreover, various features are
described which may be exhibited by some embodiments and not by
others. Similarly, various requirements are described which may be
requirements for some embodiments but not other embodiments.
[0023] The terms used in this specification generally have their
ordinary meanings in the art, within the context of the disclosure,
and in the specific context where each term is used. Certain terms
that are used to describe the disclosure are discussed below, or
elsewhere in the specification, to provide additional guidance to
the practitioner regarding the description of the disclosure. For
convenience, certain terms may be highlighted, for example using
italics and/or quotation marks. The use of highlighting has no
influence on the scope and meaning of a term; the scope and meaning
of a term is the same, in the same context, whether or not it is
highlighted. It will be appreciated that the same thing can be said
in more than one way.
[0024] Consequently, alternative language and synonyms may be used
for any one or more of the terms discussed herein, nor is any
special significance to be placed upon whether or not a term is
elaborated or discussed herein. Synonyms for certain terms are
provided. A recital of one or more synonyms does not exclude the
use of other synonyms. The use of examples anywhere in this
specification including examples of any terms discussed herein is
illustrative only, and is not intended to further limit the scope
and meaning of the disclosure or of any exemplified term. Likewise,
the disclosure is not limited to various embodiments given in this
specification.
[0025] Without intent to further limit the scope of the disclosure,
examples of instruments, apparatus, methods and their related
results according to the embodiments of the present disclosure are
given below. Note that titles or subtitles may be used in the
examples for convenience of a reader, which in no way should limit
the scope of the disclosure. Unless otherwise defined, all
technical and scientific terms used herein have the same meaning as
commonly understood by one of ordinary skill in the art to which
this disclosure pertains. In the case of conflict, the present
document, including definitions, will control.
[0026] Embodiments of the present disclosure include systems and
methods for advertisement delivery optimization in a web-based
environment.
[0027] One aspect of the present disclosure includes identifying
and presenting advertisements to a user in a web-based environment
based on the presence of multimedia (e.g., rich media, rich
content). In some embodiments, targeted advertisement services
(e.g., local or remote) are provided to content providers that
represent web-publishers. The web-publishers (e.g., PhotoBucket)
generally have at their disposal, a large number of advertisements
from which they can select and place. Services and functionalities
can be provided to the web-publishers to assist them in optimally
placing the advertisements on each page.
[0028] In one embodiment, images hosted by the web-publisher for
display on the website and metadata associated with a set of
advertisements available to the web-publisher for placement on the
website are analyzed. For example, the set of images is analyzed
and can be categorized as associable with one or more of the set of
predetermined image categories. Thus, based on the analysis, an
advertisement suitable for placement on a webpage of the website
can be identified, in some instances, for a fee.
[0029] Alternatively, targeted advertisement services can be
provided to the advertisement database (inventory) of a 3rd party
advertising company. For example, a 3rd party advertisement company
(e.g., Tribal Fusion) provides advertisements to different
web-publishers. The services and functionalities can be applied to
the advertisement database of the 3rd party advertising company
such that the advertisements provided by the 3rd party
advertisement company to each customer can be optimized, for
example, using the information provided by performing image
classification on the ads and/or on multimedia content existing on
the webpage.
[0030] In one embodiment, a set of images associated with a set of
advertisements provided by the third-party advertisement company
and customer data of the plurality of customers are analyzed. Thus,
an advertisement suitable for a customer of the plurality of
customers can be selected based on the analysis. In one embodiment,
text-based information related to the set of images or the customer
data is analyzed. In general, the text-based information can
include metadata and tags.
[0031] In one embodiment, a set of predetermined image categories
are selected for use in identifying the advertisement. The
web-publisher or 3rd party advertisement company may select the
predetermined image categories for use in identifying the
advertisement. The web-publisher or 3rd party advertisement company can
also select a subset of the set of predetermined image categories for use
in identifying the advertisement. In one embodiment, additional
categories of the set of predetermined image categories for use in
identifying the advertisement can be provided to the web-publisher, in
some instances, for a fee.
[0032] One embodiment further includes, generating a set of custom
image categories for the web-publisher. Additionally the set of
predetermined image categories are adaptive and modifiable.
[0033] Advertisements include any sort of promotional content such
as flyers, coupons, e-coupons, delivered via any means including
but not limited to, web-based delivery (e.g., email, banner, flash
media, multimedia, etc.), telephonic delivery (e.g., via cell
phone, via SMS text, via land-line), physical delivery (e.g., via
mail, banner display, etc.).
[0034] Note that although embodiments of the present disclosure are
described with reference to web-based multimedia, it is
contemplated that the novel techniques are applicable to multimedia
(e.g., images, pictures, videos, text, etc.) existing in a physical
environment/surroundings or multimedia presented on a display unit
(e.g., cable television, videos, DVDs, etc.) which may or may not
be connected to a network (e.g., the Internet).
[0035] In general, multimedia includes, in addition to textual
content, audio content, animated content, video content,
hypermedia, and/or interactive multimedia, and online advertisement
can be optimized based on any combination of the aforementioned
types of multimedia.
[0036] Multimedia content (e.g., movie content, audio content, type
of background music, speech content, image content, flash-media,
etc.) can be determined to gauge user interest such that online
advertisements of improved relevancy can be presented to the user.
For example, if a user is viewing a friend's ski vacation pictures,
advertisements relevant to Tahoe ski resorts or ski rentals can be
identified and presented to the user. If the user is determined to
have a preference for trance music, advertisements and/or content
promoting various trance music D.J.s can be identified and
presented to the user.
[0037] Speech recognition can be employed to identify the content
of music, a dialogue, and/or video content. These techniques can
further be used in conjunction with detection of textual data
(e.g., keywords) to obtain information about the user. Content
descriptors can be generated for each type of multimedia to
identify a candidate pool of ads most relevant for the user. In
addition, user data (e.g., user profile information including but
not limited to, age, demographic data, geographical data, etc.) are
in some instances used to further refine the candidate pool of ads
of potential interest to the user. In addition, general knowledge
about users may be detected and compiled. Implicit knowledge about
users may be determined based on their detected activities in a
web-environment. For example, visitors of an online wine shop could
be assumed to be people who like wine or are otherwise related to
the wine industry.
[0038] One aspect of the present disclosure includes generating
content descriptors (e.g., content information, content data,
and/or content metadata, etc.) from the multimedia that is present
in an online environment. The content descriptors may represent
different types of attributes of multimedia. For example,
descriptors for images (e.g., photographs, artwork, paintings,
sketches, hand-written documents, etc.) can include, by way of
example but not limitation, color indicators, frequency content
indicators, texture indicators, category indicators, shape
indicators, etc.
[0039] Descriptors for audio content (e.g., speech, sound, music,
etc.) can include, by way of example but not limitation, topic of
speech, type of sound, tone of sound, frequency content, frequency
distribution, genre of music, beat, instrument, etc. Descriptors
for video content, for example, can include, color, video quality,
category of video, characters in the video, etc. These descriptors
can be used alone or in conjunction to identify a candidate pool of
advertisements relevant to the user. Descriptors for image content
can also be referred to as "features" or "image features".
[0040] One aspect of the present disclosure includes using image
classification for generating descriptors (e.g., image features) to
identify relevant ads. The image classification process is able to
identify one or more image categories that the image (e.g.,
photograph) can be associated with. For example, an image (e.g.,
drawing, painting, or photograph) of a baby in a ski cap can be
associated with both a ski photo and a baby photo. By classifying
images into topic categories, the system can identify topics of
interest to the user and further select advertisements based on the
identified topics of interest. This process typically includes a
learning phase, which learns a model given a broad array of images
in each category, and then a classification phase, which determines
the category of some new media. This process typically includes
decomposing each image used in the learning step into a set of
features. Then, the set of features and an image category are input
into a statistical learning algorithm, which builds a model for
classifying an image. When a new image arrives, a system will
compute the same features of the image, and use the predetermined
model to determine the image category.
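By way of illustration only, this two-phase process can be sketched in Python roughly as follows, assuming a scikit-learn classifier stands in for the statistical learning algorithm and a hypothetical extract_features() helper stands in for the image decomposition step:

    # Illustrative sketch of the two-phase learn/classify flow described above.
    # Assumes scikit-learn and NumPy; extract_features() is a hypothetical
    # helper that turns an image array into a fixed-length feature vector.
    import numpy as np
    from sklearn.svm import SVC

    def extract_features(image):
        # Hypothetical decomposition: a coarse RGB color histogram.
        hist, _ = np.histogramdd(image.reshape(-1, 3), bins=(4, 4, 4),
                                 range=((0, 256),) * 3)
        return hist.ravel() / (image.shape[0] * image.shape[1])

    def learn_model(training_images, category_labels):
        # Learning phase: decompose each training image into features and
        # fit a statistical model that maps features to a category.
        X = np.array([extract_features(img) for img in training_images])
        model = SVC(kernel="rbf", probability=True)
        model.fit(X, category_labels)
        return model

    def classify(model, new_image):
        # Classification phase: compute the same features and apply the model.
        return model.predict([extract_features(new_image)])[0]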
[0041] Conventional approaches to this problem generally vary along
two major dimensions: 1) how the image is decomposed into features, and
2) what learning algorithm is used.
[0042] For image decomposition, approaches include using the image
pixels directly, constructing statistics over the entire image such
as by building color histograms (possibly using other colorspaces
such as HSV, LAB, or YCrCb), or histograms of image transformations
such as gradients, Gabor Features, Gaussian Edge Detectors,
Discrete Fourier Transforms, and many more. Other approaches
segment the image into pieces, such as the top left quarter, the
top right quarter, etc. and compute image features for each of
these sub-regions, and then concatenate all of these features
together. This particular approach is referred to as a feature
pyramid. These sub-regions can also overlap, for instance the
top-left quarter, the top-right quarter, the top-middle (having the
same width and height as the top-left or top-right quarter), etc.
We could then compute features such as a color histogram for each
of these sub-regions. Also, interest point detectors can be used to
just compute statistics about patches at points of interest, and
ignoring the rest of the image. For example, a feature could be a
histogram of the green component of 3.times.3 patch of pixels at
locations where a corner detector exceeds some specific threshold.
An important technique is to merge all of these techniques
together, allowing us to use spatial, color, interest point, and
image-transform features all together, to provide more robust
results. For instance, we could use an interest point detector on
the top-left quarter of the hue component (after transforming the
image to HSV space) of the image. We could then compute a Gabor
feature on the 3.times.3 patch of the points determined as
interesting. Other procedures such as image normalization can be
used as a pre-processing step to improve robustness. For other
applications such as sound, Discrete Fourier Transforms (DFTs) and
Discrete Cosine Transforms (DCTs) are common choices for such a
decomposition.
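As an illustrative sketch of the feature-pyramid idea, the following Python fragment (assuming NumPy and an H x W x 3 image array; the choice of sub-regions and bin counts is illustrative) computes a color histogram for several possibly overlapping sub-regions and concatenates them:

    # Minimal sketch of the "feature pyramid" decomposition described above.
    import numpy as np

    def color_histogram(region, bins=8):
        # Per-channel histogram over the sub-region, normalized by pixel count.
        feats = []
        for c in range(region.shape[2]):
            h, _ = np.histogram(region[:, :, c], bins=bins, range=(0, 256))
            feats.append(h / region[:, :, c].size)
        return np.concatenate(feats)

    def pyramid_features(image):
        h, w = image.shape[:2]
        half_h, half_w = h // 2, w // 2
        regions = [
            image,                                              # whole image
            image[:half_h, :half_w], image[:half_h, half_w:],   # top-left, top-right
            image[half_h:, :half_w], image[half_h:, half_w:],   # bottom-left, bottom-right
            image[:half_h, w // 4: w // 4 + half_w],            # overlapping top-middle
        ]
        return np.concatenate([color_histogram(r) for r in regions])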
[0043] For the learning, many methods are available such as
K-nearest neighbors (KNN), Support Vector Machines (SVMs), Adaptive
Boosting (Adaboost), Neural Networks, Bayesian Learning, etc. Often
there are also parameters which must be chosen for the system. For
K-nearest neighbors, the distance function, the averaging function,
and the value of K are all options. For SVMs, what kernel is used
can have a large effect on performance. The machine learning
process for image classification is described with further
reference to FIG. 6 and FIG. 9.
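For example, under the assumption that scikit-learn stands in for the learning step, the learner and parameter choices above (K, distance function, and vote weighting for KNN; kernel choice for SVMs) might be configured as follows:

    # Sketch of configuring the learners and parameters mentioned above,
    # using scikit-learn as an assumed stand-in for the learning step.
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC

    # K-nearest neighbors: K, the distance metric, and the vote weighting
    # (a simple form of "averaging function") are all tunable choices.
    knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean", weights="distance")

    # SVM: the kernel (and its parameters) largely determines speed/accuracy.
    svm_linear = SVC(kernel="linear")
    svm_rbf = SVC(kernel="rbf", C=1.0, gamma="scale")

    # Either learner exposes the same interface:
    #   knn.fit(X_train, y_train); knn.predict(X_new)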
[0044] FIG. 1 illustrates a block diagram of a plurality of client
devices 104A-N able to communicate with a plurality of content
providers 108A-N, 110 and a server 100 that optimizes advertisement
delivery via a network, according to one embodiment.
[0045] The plurality of client devices 104A-N and content providers
108A-N, 110 can be any system and/or device, and/or any combination
of devices/systems that is able to establish a connection with
another device, a server and/or other systems. The client devices
104A-N and content providers 108A-N, 110 typically include display
or other output functionalities to present data exchanged between
the devices to a user. For example, the client devices and content
providers can be, but are not limited to, a server desktop, a
desktop computer, a computer cluster, a mobile computing device
such as a notebook, a laptop computer, a handheld computer, a
mobile phone, a smart phone, a PDA, a Blackberry device, a Treo,
and/or an iPhone, etc. In one embodiment, the client devices 104A-N
and content providers 108A-N, 110 are coupled to a network 106. In
some embodiments, the modules may be directly connected to one
another.
[0046] The network 106, over which the client devices 104A-N and
content providers 108A-N, 110 communicate, may be a telephonic
network, an open network, such as the Internet, or a private
network, such as an intranet and/or the extranet. For example, the
Internet can provide file transfer, remote log in, email, news,
RSS, and other services through any known or convenient protocol,
such as, but not limited to, the TCP/IP protocol, Open System
Interconnections (OSI), FTP, UPnP, iSCSI, NFS, ISDN, PDH, RS-232,
SDH, SONET, etc.
[0047] The network 106 can be any collection of distinct networks
operating wholly or partially in conjunction to provide
connectivity to the client devices, host server, and/or the content
providers 108A-N, 110 and may appear as one or more networks to the
serviced systems and devices. In one embodiment, communications to
and from the client devices 104A-N and content providers 108A-N,
110 can be achieved by, an open network, such as the Internet, or a
private network, such as an intranet and/or the extranet. In one
embodiment, communications can be achieved by a secure
communications protocol, such as secure sockets layer (SSL), or
transport layer security (TLS).
[0048] In addition, communications can be achieved via one or more
wireless networks, such as, but not limited to, one or more of a
Local Area Network (LAN), Wireless Local Area Network (WLAN), a
Personal area network (PAN), a Campus area network (CAN), a
Metropolitan area network (MAN), a Wide area network (WAN), a
Wireless wide area network (WWAN), Global System for Mobile
Communications (GSM), Personal Communications Service (PCS),
Digital Advanced Mobile Phone Service (D-Amps), Bluetooth, Wi-Fi,
Fixed Wireless Data, 2G, 2.5G, 3G networks, enhanced data rates for
GSM evolution (EDGE), General packet radio service (GPRS), enhanced
GPRS, messaging protocols such as, TCP/IP, SMS, MMS, extensible
messaging and presence protocol (XMPP), real time messaging
protocol (RTMP), instant messaging and presence protocol (IMPP),
instant messaging, USSD, IRC, or any other wireless data networks
or messaging protocols.
[0049] The content providers 108A-N are generally advertisers or
content promoters who wish to have their advertisements optimally
delivered to end users (e.g., users of client devices 102A-N).
[0050] The database 132 can store software, descriptive data,
multimedia, user data, system information, drivers, and/or any
other data item utilized by other components of the host server 100
and/or the content providers 108 for operation. The database 132
may be managed by a database management system (DBMS), for example
but not limited to, Oracle, DB2, Microsoft Access, Microsoft SQL
Server, PostgreSQL, MySQL, FileMaker, etc.
[0051] The database 132 can be implemented via object-oriented
technology and/or via text files, and can be managed by a
distributed database management system, an object-oriented database
management system (OODBMS) (e.g., ConceptBase, FastDB Main Memory
Database Management System, JDOInstruments, ObjectDB, etc.), an
object-relational database management system (ORDBMS) (e.g.,
Informix, OpenLink Virtuoso, VMDS, etc.), a file system, and/or any
other convenient or known database management package. An example
set of data to be stored in the database 132 is illustrated in FIG.
3A-3B.
[0052] The host server 100 is, in some embodiments, able to
communicate with client devices 102A-N and content providers 108A-N
via the network 106. In addition, the host server 100 is able to
retrieve data from the database 132. In some embodiments, the host
server 100 is able to assimilate data obtained from the content
providers 108A-N and/or client devices 102A-N to provide enhanced
or optimized advertisement delivery.
[0053] The communications that the host server 100 establishes with
the client-end devices can be multi-way and via one or more
different protocols. Any number of communications sessions may be
established prior to providing optimized advertisement delivery.
Each session may involve multiple users communicating via the same
or different protocols. The host server 100 communicates with the
participating parties or entities (e.g., client devices, end users,
advertisers, content providers, databases, etc.), in series and/or
in parallel to obtain the necessary information from the users to
identify relevant advertisements. This process is described
in detail with further reference to FIGS. 5-9.
[0054] In addition, the host server 100 can establish communication
sessions with the database 132 to identify additional information
about the users, such as, but not limited to subscription
information, historical information, click-through history, user
preferences (explicit or implicit), and/or any other useful
information which may indicate a user's likes and dislikes.
[0055] FIG. 2 depicts a block diagram of the components of a host
server 200 for advertisement delivery optimization, according to
one embodiment.
[0056] The host server 200 includes a network interface 202, a
communications module 204, a multimedia content analyzer module
206, an image classification module 208, a machine learning module
210, a click-through rate tracker 212, and/or an advertisement
optimizer module 214. Additional or fewer modules can be included
without deviating from the novel art of this disclosure. In
addition, each module in the example of FIG. 2 can include any
number and combination of sub-modules, and systems, implemented
with any combination of hardware and/or software modules. The host
server 200 may be communicatively coupled to the user database 222,
the multimedia content database 224, and/or the advertisement
database 226 as illustrated in FIG. 2. In some embodiments, the
user database 222, the multimedia content database 224, and/or the
advertisement database 226 are partially or wholly internal to the
host server 200. The databases are described with further reference
to FIG. 3A-B.
[0057] The host server 200, although illustrated as comprised of
distributed components (physically distributed and/or functionally
distributed), could be implemented as a collective element. In some
embodiments, some or all of the modules, and/or the functions
represented by each of the modules can be combined in any
convenient or known manner. Furthermore, the functions represented
by the modules can be implemented individually or in any
combination thereof, partially or wholly, in hardware, software, or
a combination of hardware and software.
[0058] In the example of FIG. 2, the network interface 202 can be
one or more networking devices that enable the host server 200 to
mediate data in a network with an entity that is external to the
host server, through any known and/or convenient communications
protocol supported by the host and the external entity. The network
interface 202 can include one or more of a network adaptor card, a
wireless network interface card, a router, an access point, a
wireless router, a switch, a multilayer switch, a protocol
converter, a gateway, a bridge, a bridge router, a hub, a digital
media receiver, and/or a repeater.
[0059] A firewall, can, in some embodiments, be included to govern
and/or manage permission to access/proxy data in a computer
network, and track varying levels of trust between different
machines and/or applications. The firewall can be any number of
modules having any combination of hardware and/or software
components able to enforce a predetermined set of access rights
between a particular set of machines and applications, machines and
machines, and/or applications and applications, for example, to
regulate the flow of traffic and resource sharing between these
varying entities. The firewall may additionally manage and/or have
access to an access control list which details permissions
including for example, the access and operation rights of an object
by an individual, a machine, and/or an application, and the
circumstances under which the permission rights stand.
[0060] Other network security functions performed or included in
the functions of the firewall, can be, for example, but are not
limited to, intrusion-prevention, intrusion detection,
next-generation firewall, personal firewall, etc. without deviating
from the novel art of this disclosure. In some embodiments, the
functionalities of the network interface 202 and the firewall are
partially or wholly combined and the functions of which can be
implemented in any combination of software and/or hardware, in part
or in whole.
[0061] The network interface 202 can include, or be communicatively
coupled to, a communications module or a combination of communications
modules to manage one-way, two-way, and/or multi-way communication
sessions over a plurality of communications protocols.
[0062] In the example of FIG. 2, the host server 200 includes the
communications module 204 or a combination of communications
modules communicatively coupled to the network interface 202 to
manage one-way, two-way, and/or multi-way communication sessions
over a plurality of communications protocols.
[0063] Since the communications module 204 is typically compatible
with receiving and/or interpreting data originating from various
communication protocols, the communications module 204 is able to
establish parallel and/or serial communication sessions with end
users and content promoters (e.g., advertisers). The communications
module is also able to communicate with one or more of the user
database 222, the multimedia content database 224, and/or the
advertisement database 226.
[0064] Thus, in some embodiments, the communications module 204
receives data and information relevant to providing advertisers
with optimized online advertisement delivery services and providing
end users with ads of improved relevancy. In addition, the
communications module 204 communicates with the end user devices to
deliver the identified ads based on multimedia content present in a
web-based environment. The data/information received may have
originated from various protocols and may be in various formats,
including, but not limited to, multimedia content including, text,
audio, speech, image, video, hypermedia, etc.
[0065] The multimedia content analyzer module 206 can be any
combination of software agents and/or hardware modules able to
detect, identify, and/or analyze multimedia content present in a
web-based environment.
[0066] The multimedia content analyzer module 206 (hereinafter
referred to as "analyzer module") can detect the presence of
multimedia content (e.g., rich media, audio, text, hypermedia, and
the like types of media) in a web-based environment
that a user is currently viewing or otherwise interacting with. For
example, the analyzer module 206 detects and tracks any music
tracks that the user may be listening to over the web. Furthermore,
the analyzer module 206 may also detect any image content (e.g.,
pictures, artwork, photographs, abstract images, and the like) that
the user may be viewing, browsing through,
and/or searching for. Additionally, presence of textual content,
hypermedia, flash media, and/or interactive media may also be
detected and identified as such.
[0067] In one embodiment, the analyzer module 206 distinguishes
between different types of multimedia and identifies the detected
content as text, audio, hypermedia, and the like types of media or
a combination thereof. Once multimedia has been identified as a
particular type of media, one or more algorithms and/or processes
are applied for analysis.
[0068] In one embodiment, the analyzer module 206 analyzes the
detected multimedia content to obtain features of the content that
may be useful for providing optimized advertisement services via
identifying ads with higher relevancy to user interests or
preferences. These features may be used by the analyzer module 206
to obtain additional information about the multimedia. For example,
a general topic/category of the multimedia may be determined, a
genre of music, a topic of dialogue, an image category, an image
type, etc. The features can additionally be used by the image
classification module 208 or the machine learning module 210 to
determine additional information about the multimedia content.
[0069] In one embodiment, the multimedia content analyzer module
206 includes a feature extractor for extracting multimedia
features. Features of audio content include by way of example but
not limitation, audio spectrum (time and frequency domain data),
type of sound (e.g., human voice, synthetic sound, a cappella,
instrumental, female voice, male voice), type of music (e.g.,
hip-hop, classical, vocal, jazz, etc.), pitch, loudness, Discrete
Fourier Transforms (DFTs) and Discrete Cosine Transforms (DCTs),
etc. Features of image content can include, for example, color
components, edges, distribution of color components, texture,
shape, frequency content (e.g., 2D Fourier Transform), etc.
[0070] In particular, image feature extraction generally occurs in an
image decomposition process. Feature extraction techniques
can further include for example, using the image pixels directly,
constructing statistics over the entire image such as by building
color histograms (possibly using other colorspaces such as HSV,
LAB, or YCrCb), or histograms of image transformations such as
gradients, Gabor Features, Gaussian Edge Detectors, Discrete
Fourier Transforms, and many more. Other approaches segment the
image into pieces, such as the top left quarter, the top right
quarter, etc. and compute image features for each of these
sub-regions, and then concatenate all of these features together.
This particular approach is referred to as a feature pyramid. These
sub-regions can also overlap, for instance the top-left quarter,
the top-right quarter, the top-middle (having the same width and
height as the top-left or top-right quarter), etc. We could then
compute features such as a color histogram for each of these
sub-regions. Also, interest point detectors can be used to compute
statistics only about patches at points of interest, ignoring the rest
of the image. For example, a feature could be a histogram of the green
component of a 3×3 patch of pixels at locations where a corner detector
exceeds some specific threshold. An important technique is to merge all
of these techniques together, allowing us to use spatial, color,
interest point, and image-transform features all together, to provide
more robust results. For instance, we could use an interest point
detector on the top-left quarter of the hue component (after
transforming the image to HSV space) of the image. We could then
compute a Gabor feature on the 3×3 patch around each of the points
determined to be interesting. Other procedures such as image
normalization can be used as a pre-processing step to improve
robustness.
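As a rough sketch of the interest-point example above, the following fragment (assuming OpenCV and NumPy; the corner detector, patch size, and thresholds are illustrative) builds a histogram of the green component of 3x3 patches centered on detected corners:

    # Sketch of the interest-point feature described above. Assumes cv2/NumPy;
    # the detector and its parameters are illustrative choices.
    import cv2
    import numpy as np

    def corner_patch_green_histogram(image_bgr, max_corners=100, bins=8):
        gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
        corners = cv2.goodFeaturesToTrack(gray, maxCorners=max_corners,
                                          qualityLevel=0.01, minDistance=5)
        green = image_bgr[:, :, 1]  # OpenCV stores images as BGR
        values = []
        if corners is not None:
            for x, y in corners.reshape(-1, 2).astype(int):
                patch = green[max(y - 1, 0): y + 2, max(x - 1, 0): x + 2]  # 3x3 patch
                values.extend(patch.ravel().tolist())
        hist, _ = np.histogram(values, bins=bins, range=(0, 256))
        return hist / max(len(values), 1)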
[0071] Note that alternate types of multimedia are supported by the
analyzer module. Additional, fewer, or different features associated
with alternate, additional, or fewer types of multimedia are
contemplated, are compatible with the host server/analyzer module 206,
and are considered within the novel art of the techniques described
herein.
[0072] In one embodiment, statistical parameters of the detected
features are computed. For example, the set of features and an
image category are input into a statistical learning algorithm,
which builds a model for classifying an image. In one embodiment,
the machine learning module 210 builds a model for classifying an
image based on the statistical parameters. The statistical
parameters may be computed by the analyzer module 206 or the
machine learning module 210. When a new image arrives, for example,
the image classification module 208 computes the features of the
image, and uses the model built by the machine learning module 210
to determine the image category.
[0073] One embodiment of the host server 200 includes the machine
learning module 210. The machine learning module 210 can be any
combination of software agents and/or hardware modules able to
identify, receive, and/or compute statistical parameters for
features of multimedia and build a statistical model from machine
learning to identify specific attributes of multimedia content. For
example, statistical parameters for image features computed by the
analyzer module 206 can be used for machine learning to build a
model for image classification into topic categories.
[0074] The machine learning module 210 generally employs one or
more learning algorithms for model building to achieve image
classification, including but not limited to, K-nearest neighbors
(KNN), Support Vector Machines (SVMs), Adaptive Boosting
(Adaboost), Neural Networks, Bayesian Learning, etc. Often there
are also some parameters which are chosen for the learning. For
example, for K-nearest neighbors, the distance function, averaging
function, and the value of K are adjustable parameters. In some
situations, decreasing the value of K may speed things up and
increasing K can improve the accuracy. In general, the accuracy
improvement increases with K but generally saturates after a
certain point and in some instances, may decrease with further
increases of K. The distance functions also generally affect speed
and/or accuracy. For SVMs, the choice of kernel can be used to
tweak performance. Different kernels can have different speeds, and
generally there is a tradeoff between speed and accuracy. For some
kernels, there are also parameters which can be set, which affect
robustness.
[0075] In one embodiment, the machine learning module 210 generates
predetermined models for classifying images based on a hierarchical
method. The hierarchical method fuses multiple binary classifiers
into a k-classifier. In general, in a binary classifier, the more
likely of two classes can be determined and thus selected. Thus,
for multi-class classification, fusing of multiple binary
classifiers can be performed.
[0076] In one embodiment, the k-classifier is built from results of
a generalization accuracy computation. The generalization accuracy
computation is, in one embodiment, determined from performing
machine learning on a learning data set and recording the accuracy
on a verification data set. For example, the training data set can
be split into a learning data set and a verification data set.
Machine learning can then be performed on the learning data set to
generate a model. The accuracy of the model can then be determined
by applying the model to the verification data set. By performing a
learning process and a verification process, a probability for each
pair of categories can be generated.
[0077] Alternatively, the training data set can be split in
different ways. For example, in a k-fold cross validation process,
the training data is split into k sets where (k-1) sets are used as
the training data set and a single set is used as the verification
data set. Subsequently, a different (k-1) is used for training and
a different set is used as the verification data set, and so on.
The generalization accuracy for the k-fold cross validation method
can then be determined by averaging over the different selections
of the (k-1) training data set and the single verification data
set.
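A minimal sketch of this generalization-accuracy computation for one pair of categories, assuming scikit-learn and an illustrative SVM as the binary classifier, might look like the following, where each fold in turn serves as the verification data set:

    # Sketch of k-fold generalization accuracy for one pair of categories.
    # X: NumPy array of feature vectors; y: binary labels for the two categories.
    import numpy as np
    from sklearn.model_selection import KFold
    from sklearn.svm import SVC

    def pairwise_generalization_accuracy(X, y, k=5):
        accuracies = []
        for train_idx, verify_idx in KFold(n_splits=k, shuffle=True).split(X):
            # Learn on (k-1) folds, record accuracy on the held-out fold.
            model = SVC(kernel="rbf").fit(X[train_idx], y[train_idx])
            accuracies.append(model.score(X[verify_idx], y[verify_idx]))
        return float(np.mean(accuracies))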
[0078] The probability for each pair of categories obtained from
the combination of the learning and verification process can then,
in one embodiment, be used to construct a `tree` to facilitate the
multi-classification process. A number of tree construction methods
can be used and are contemplated and are not limited to those
discussed herein.
[0079] For example, the tree construction can occur from the
bottom-up or from the top-down. One example of the bottom-up scheme
is to take the two categories with the least generalization accuracy
and merge them to create a new category (sometimes also referred
to as a `meta-category`). The creation of the new category also
implicitly creates a new node with the two original categories as
child-categories. Subsequently, in one embodiment, the machine
learning process can be performed for the situation where the two
categories are treated as one category. The new accuracies for the
newly constructed model based on having merged categories are
determined.
[0080] This process can be repeated iteratively until two categories
remain, between which the better-fitting one can be determined, yielding
a categorization that proceeds in the inverse direction. Classification
of new data then starts at the top node of the tree: the model from the
last construction step determines which of the two top-level
meta-categories the test data is more likely to belong to, and the same
process is repeated down the tree until a single category remains. Since
the leaves of the tree are categories rather than meta-categories, the
remaining category is the category determined to be most likely
associable with the data in question.
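An illustrative sketch of this bottom-up construction is given below; pairwise_accuracy(a, b) is an assumed callable (for example, built from the cross-validation sketch above, with a merged meta-category treated as a single category), and the tree is represented as nested tuples:

    # Sketch of the bottom-up tree construction: repeatedly merge the pair of
    # (meta-)categories that are hardest to tell apart.
    from itertools import combinations

    def build_category_tree(categories, pairwise_accuracy):
        nodes = list(categories)
        while len(nodes) > 1:
            # Pair with the lowest generalization accuracy is merged first.
            a, b = min(combinations(nodes, 2), key=lambda p: pairwise_accuracy(*p))
            nodes.remove(a)
            nodes.remove(b)
            nodes.append((a, b))  # new meta-category node with two children
        return nodes[0]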
[0081] In alternative embodiments, a tree may be force-balanced by
combining k/2 categories to generate a first meta-category and
repeating the process. The k/2 categories can be selected, in one
embodiment, based on the greedy process, where the pairs with the
lowest generalization accuracies are selected. Subsequently, the
next pair can be selected based on an optimization approach. For
example, the pairings can be selected to minimize the sum of the
square of the generalization accuracies.
[0082] One embodiment of the host server 200 includes the image
classification module 208. The image classification module 208 can
be any combination of software agents and/or hardware modules able
to identify image features and compute the statistical attributes
of the image parameters. The image classification module 208, when
in operation, is communicatively coupled to the machine learning
module 210 such that the statistical attributes are used as
variables in models that have been built in the machine learning
module 210 to classify images.
[0083] In one embodiment, the machine learning module 210 updates
(e.g., refines, enhances) the models based on the output of the
image classification module 208. Since the output of the image
classification module 208 provides information related to accuracy
of the classification, validation images, such as images of known
categories that were not used to train the machine learning module
210, are submitted to the image classification module 208 to
evaluate the performance of the models on the validation set. The
output of the classification module 208 indicating the accuracy of
the model can then be used to update learning parameters, or the
features used by the machine learning module 210. To update
learning parameters, different values can be used to determine how
accuracy is affected. For example, the parameters can be increased
from an initial value to find the optimal value. In general, the
accuracy increases as the value of a learning parameter increases, and
will sometimes saturate or decrease with further increases in the value
of the learning parameter. In some situations, different values can be
tried randomly to identify an optimal value that yields high accuracy.
In one embodiment, learning parameters and the resulting accuracy can
be paired to speed up the search process for the optimal parameter
value, for example, by performing a binary search.
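A simple sketch of such a parameter search, where train_and_validate(value) is an assumed callable that trains a model with the given parameter value and returns its accuracy on the validation set, might be:

    # Sketch of the parameter search above: increase a learning parameter and
    # keep the value with the best validation accuracy, stopping at saturation.
    def search_parameter(train_and_validate, values):
        best_value, best_accuracy = None, -1.0
        for value in values:             # e.g., values = [1, 2, 4, 8, 16] for K
            accuracy = train_and_validate(value)
            if accuracy <= best_accuracy:
                break                    # accuracy saturated or decreased; stop
            best_value, best_accuracy = value, accuracy
        return best_value, best_accuracy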
[0084] In one embodiment, the image features used in building the
models are updated based on evaluation of model performance. Many
different image features can be selected to build a model for a
particular image category. The feature selection process is, in one
embodiment, implemented with the `greedy formulation`, where one
feature with the lowest error is initially selected and combined
with a second feature that yields the least error, etc. Thus, at
each step, the additional feature added to the model is the feature
that most decreases the error relative to the previous iteration, in
which that feature had not yet been incorporated into the model. Any
number of image features may be added. In alternative embodiments, all
features may be used initially and features may then be removed
one-by-one, at each step removing the feature whose removal leaves the
classifier with the highest remaining accuracy. Another method
includes classifying each feature
independently, and then selecting a predetermined number of
features. Potentially, heuristics about which features are similar
can be inserted, so they are not all selected (e.g., a gradient
feature may be preferred over another color feature if multiple
color features have already been selected).
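An illustrative sketch of the greedy formulation, where evaluate(features) is an assumed callable that trains a model on the given feature subset and returns its validation accuracy, might be:

    # Sketch of greedy forward feature selection: repeatedly add whichever
    # remaining feature most improves validation accuracy, up to a threshold.
    def greedy_feature_selection(all_features, evaluate, target_accuracy=0.80):
        selected, accuracy = [], 0.0
        remaining = list(all_features)
        while remaining and accuracy < target_accuracy:
            best = max(remaining, key=lambda f: evaluate(selected + [f]))
            new_accuracy = evaluate(selected + [best])
            if new_accuracy <= accuracy:
                break                    # no remaining feature reduces the error
            selected.append(best)
            remaining.remove(best)
            accuracy = new_accuracy
        return selected, accuracy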
[0085] In one embodiment, image features are added until the
accuracy exceeds a predetermined threshold. For example, if the
accuracy threshold is 80%, features are added until the computed
accuracy with validation by the classification module 208 is at
least 80%. Alternatively, predetermined numbers of image features
may be added to build the image model for a particular image
category.
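A minimal sketch of this `greedy formulation`, assuming a hypothetical accuracy_with function that trains and validates a classifier on a given feature subset, might look as follows; selection stops when the accuracy threshold of paragraph [0085] is reached or when no remaining feature improves the model.

    # Greedy forward feature selection with a validation-accuracy threshold.
    def greedy_select(features, accuracy_with, threshold=0.80):
        selected, remaining, best_acc = [], list(features), 0.0
        while remaining and best_acc < threshold:
            # Evaluate each remaining feature combined with those selected so far.
            scored = {f: accuracy_with(selected + [f]) for f in remaining}
            candidate = max(scored, key=scored.get)
            if scored[candidate] <= best_acc:    # no remaining feature helps
                break
            selected.append(candidate)
            remaining.remove(candidate)
            best_acc = scored[candidate]
        return selected, best_acc

The same loop can be run in reverse, starting from all features and removing one at a time, to realize the alternative embodiment described above.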
[0086] Note that the `greedy formulation` can be similarly applied
to features of other types of multimedia (e.g., audio, video,
hypermedia, text, etc.) for building a classification model for
various types of multimedia. This application is contemplated and
also considered to be within the novel art of this disclosure.
[0087] In one embodiment, the learning algorithms employed in the
machine learning module 210 are selected and updated based on the
`greedy formulation` process. In addition, learner parameters
(e.g., Kernel selection for SVM) can be similarly updated and
selected. Therefore, the coupling of the image classification
module 208 and the machine learning module 210 allows the host
system 200 to automatically re-configure itself as needed, since
multiple parameters are adjustable on demand to achieve a certain
accuracy threshold.
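By way of illustration, learner parameters such as the SVM kernel can be selected by comparing validation accuracies; the sketch below uses scikit-learn and its digits dataset purely as stand-ins, since the disclosure does not prescribe a particular toolkit or dataset.

    # Select an SVM kernel by cross-validated accuracy (illustrative only).
    from sklearn.datasets import load_digits
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = load_digits(return_X_y=True)       # stand-in for extracted image features
    scores = {kernel: cross_val_score(SVC(kernel=kernel), X, y, cv=3).mean()
              for kernel in ("linear", "rbf", "poly")}
    best_kernel = max(scores, key=scores.get)
    print(scores, best_kernel)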
[0088] In some embodiments, alternate search algorithms, including
optimization formulations, linear programming, genetic algorithms,
and/or simulated annealing approaches, are used in lieu of, or in
conjunction with, the greedy formulation to optimize the
classification or machine learning processes.
[0089] Additionally, to ensure robustness when detecting images of
varying scales, orientations, etc., the training images can be
slightly altered to create additional images for the purposes of
building the models. For example, new images, which are slightly
tilted, slightly rotated, and/or scaled versions (e.g., thumbnail
versions) can be generated as additional training images. Also, to
ensure sensitivity to certain effects such as border effects, new
images with borders can be created for training purposes. In one
embodiment, border effects can be removed as part of the
pre-processing.
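As a sketch of such augmentation, slightly rotated, scaled, and bordered variants of a training image might be generated as follows; Pillow is used for illustration, and photo.jpg is a hypothetical input file.

    # Generate additional training images by slightly altering an original.
    from PIL import Image, ImageOps

    original = Image.open("photo.jpg")
    augmented = [
        original.rotate(5, expand=True),                               # slightly tilted
        original.rotate(-5, expand=True),
        original.resize((original.width // 4, original.height // 4)),  # thumbnail-scale version
        ImageOps.expand(original, border=10, fill="white"),            # border effect
    ]
    for i, image in enumerate(augmented):
        image.save(f"photo_aug_{i}.png")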
[0090] One embodiment of the host server 200 includes the
click-through rate tracker 212. The click-through rate tracker 212
can be any combination of software agents and/or hardware modules
able to track the click-through history. For example, the
click-through rate tracker 212 tracks click-through history
associated with an end user and the click-through rate associated
with particular advertisements. When advertisements are selected
based on identified multimedia content, their click-through rates
can be tracked to further refine the candidate pool of
advertisements associated with the identified multimedia. The
click-through rate tracker 212, when in operation, communicates
with the advertisement optimizer module 214 to provide information
related to popularity of the advertisements selected. The
advertisement optimizer module 214 can optionally refine the
candidate pool of ads based on the click-through rates.
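A minimal sketch of how click-through data might be tracked and used to refine a candidate pool is shown below; the data structures, threshold, and minimum-impression count are illustrative assumptions rather than details from the disclosure.

    # Track impressions and clicks per ad, and drop persistently low-CTR ads.
    from collections import defaultdict

    impressions = defaultdict(int)
    clicks = defaultdict(int)

    def record_impression(ad_id):
        impressions[ad_id] += 1

    def record_click(ad_id):
        clicks[ad_id] += 1

    def refine_candidate_pool(candidate_pool, min_ctr=0.01, min_impressions=100):
        refined = []
        for ad_id in candidate_pool:
            shown = impressions[ad_id]
            ctr = clicks[ad_id] / shown if shown else 0.0
            # Keep ads that are still new or that perform above the CTR threshold.
            if shown < min_impressions or ctr >= min_ctr:
                refined.append(ad_id)
        return refined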
[0091] One embodiment of the host server 200 includes the
advertisement optimizer module 214. The advertisement optimizer
module 214 can be any combination of software agents and/or
hardware modules able to identify a candidate pool and/or a
non-candidate pool of advertisements based on descriptors of
multimedia content. The descriptors may be received from the image
classification module 208 or the machine learning module 210.
[0092] In one embodiment, the advertisement optimizer module 214
communicates with the advertisement database 226 to extract
metadata. The metadata can then be compared with the multimedia
descriptors such that relevant advertisements can be identified and
delivered to a user. In addition, non-relevant advertisements may
also be identified such that they are not presented to a user.
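For illustration, one simple way to compare multimedia descriptors with advertisement metadata is keyword overlap (Jaccard similarity); the descriptors and advertisements in the sketch below are hypothetical, and other similarity measures could equally be used.

    # Score ads by overlap between content descriptors and ad metadata keywords.
    def relevancy(descriptors, ad_metadata):
        d, m = set(descriptors), set(ad_metadata)
        return len(d & m) / len(d | m) if d | m else 0.0

    descriptors = {"cell", "phone", "mobile"}
    ads = {
        "ad-1": {"cell", "phone", "accessories"},
        "ad-2": {"ski", "resort"},
    }
    scores = {ad_id: relevancy(descriptors, meta) for ad_id, meta in ads.items()}
    print(scores)   # ad-1 scores higher; ad-2 would fall into the non-candidate pool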
[0093] FIG. 3A depicts a block diagram illustrating a database 332
for storing data used for advertisement delivery optimization,
according to one embodiment.
[0094] In the example of FIG. 3A, the database 332 can store
information about multimedia content, users, and advertisements. In
one embodiment, the database 332 includes a multimedia content
database 322, a user database 324, and an advertisement database
326. The databases 322, 324, and 326 may be partially or wholly
external to the database 332.
[0095] FIG. 3B depicts a block diagram of a multimedia content
database 322, a user database 324, and an advertisement database
326, according to one embodiment.
[0096] In the example of FIG. 3B, the multimedia content database
322 can store multimedia content and/or information (e.g.,
descriptors) about multimedia content.
[0097] For example, the types of multimedia content include image,
audio, video, textual, animated, hypermedia, and/or interactive
multimedia. Various descriptors (e.g., descriptors obtained from
extracting features from the media) associated with the multimedia
may be stored in the database 322 as well. In one embodiment, the
image content includes photographs, including but not limited to
photographs associated with a particular user. The photographs may
be organized into one or more albums. Therefore, the photographs in
an album can be analyzed and a descriptor can be generated. In one
embodiment, user data about the particular user is compiled based
on the descriptor and an advertisement suitable for targeting the
particular user can be identified. In addition, a type or category
of advertisement suitable for targeting the particular user can be
identified. The descriptor can be generated from at least one
sub-descriptor associated with a photograph of the collection of
photographs.
[0098] In general, an album includes any physical or web-based
collection of photographs (e.g., digital photographs or physical
pictures).
[0099] The user database 324 can store user data. For example, user
data can include descriptive data of personal information such as,
but not limited to, a first name and last name of the user, a
valid email ID, a unique user name, age, occupation, location,
education, ethnicity, race, etc. The user information further
includes interest information, which may include, but is not
limited to, activities, hobbies, professional information, photos,
etc.
[0100] The advertisement database 326 can store advertisements
and/or advertisement data (e.g., advertisement metadata). The
advertisement metadata may be used for identifying ads with
increased relevancy, for example, via comparison with descriptors
retrieved from multimedia content present in a web-based
environment.
[0101] FIG. 4A illustrates an example screenshot of a graphical
user interface 400 displaying images of cell phones 402A-402D being
viewed by a user and the advertisement 404 thus presented,
according to one embodiment.
[0102] In order to provide a user with advertisements more relevant
to pages they are viewing, thus increasing the effectiveness of the
advertisements, the system analyzes multimedia content in a
web-based environment to identify a candidate pool of
advertisements. In the example screenshot of FIG. 4A, a collection
of cell phones 402A-402D, with links to reviews and a more in-depth
description, is displayed on the screen.
[0103] The system then analyzes those images 402A-402D present on
the webpage and retrieves a set of descriptors that characterize
the content being viewed as cell phones. For example, the text
`phone`, `cell`, or even brand names could be used as descriptors of
this image. Descriptors of the image can also include image
features such as color distribution, texture distribution, color
content, 2DFT, edges, and/or shapes, etc.
[0104] The system then compares the descriptors to metadata
associated with a pool of advertisements available for placement. A
candidate pool of advertisements is selected from the
advertisements available for placement based on how relevant the
advertisements are to the multimedia content 402 being accessed.
For example, advertisements for cell phones, cell phone
accessories, or cell service providers could be placed in the
candidate pool because they are relevant to the content being
viewed by the user. With the candidate pool of advertisements
selected, the system then presents at least some of that candidate
pool to the user.
[0105] In the example of FIG. 4A, an advertisement 404 of a Sony
Ericsson cell phone is presented to the user. In some embodiments,
advertisements that are not relevant to the descriptors found by
analyzing multimedia content can be placed in a non-candidate pool.
The advertisements in this non-candidate pool are generally
prevented from being displayed to the user.
[0106] FIG. 4B illustrates an example screenshot of a graphical
user interface 420 displaying images of digital cameras 422A-422F
being viewed by a user and the advertisement 424 thus presented,
according to one embodiment.
[0107] Similar to the example illustrated in FIG. 4A, the
multimedia content is detected, analyzed and a candidate pool of
advertisements is selected based on the multimedia content in the
web-based environment. Here, the system detects the images
422A-422F being viewed by the user as cameras and an advertisement
424 related to cameras is selected and displayed to the user.
[0108] In one embodiment, a unique identifier is assigned to
multimedia content (e.g., images 422) that has been analyzed and
accordingly categorized, rather than repeating the analysis. This
unique identifier enables the system to associate the multimedia
content with an advertisement, or a candidate pool of
advertisements, without having to re-analyze the same image. This
may be useful for frequently accessed multimedia content (e.g., a
popular song, a popular video clip on YouTube, etc.) since
computing and time resources can be conserved. Thus, when the same
multimedia content is identified, the system references the unique
identifier and is able to automatically identify the set of
advertisements that have been previously identified as relevant to
the multimedia content. However, in some embodiments, images can be
analyzed on-demand in real time as they are detected as being
viewed or otherwise accessed.
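A minimal sketch of such an identifier-based lookup, using a content hash as the unique identifier and a hypothetical analyze function standing in for the full descriptor-extraction and ad-matching pipeline, might look as follows.

    # Cache the ad-matching result per piece of content so it is analyzed only once.
    import hashlib

    ad_cache = {}   # unique identifier -> previously selected candidate ads

    def ads_for(content_bytes, analyze):
        uid = hashlib.sha256(content_bytes).hexdigest()
        if uid not in ad_cache:          # analyze only content not seen before
            ad_cache[uid] = analyze(content_bytes)
        return ad_cache[uid]             # otherwise reuse the stored result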
[0109] FIG. 4C illustrates an example screenshot of a graphical
user interface 440 displaying a photograph 442 being viewed by a
user and the advertisements 444-448 thus presented, according to
one embodiment.
[0110] The photograph 442 of a person skiing can be analyzed in a
manner similar to the examples of FIG. 4A-4B. For example, image
features (e.g., color, shades, frequency content, spatial color
distribution, spatial frequency distribution, spatial texture
distribution, texture, shapes, edges, etc.) can be extracted from
the photograph 442 to be analyzed. In one embodiment, statistical
parameters of the image features can be computed and used as
variables in predetermined models. The predetermined models include
one or more representations of functions that determine the
classification of an image based on a set of input variables, in
this case, statistical parameters. The comparison can, in one
embodiment, facilitate identification of the category with which
the photograph 442 can be associated.
[0111] For example, in a color image (not shown), a predominance of
white and blue in a photograph may indicate that the photograph is
a skiing picture, and a candidate pool of advertisements is created
from which several are displayed to the user.
[0112] While the above examples show images being analyzed, other
types of multimedia content, including but not limited to, textual
content, audio content, video content, animated content,
interactive multimedia, and hypermedia, can similarly be analyzed.
Other methods of analyzing multimedia content can be used to create
a candidate pool of relevant advertisements. For example, face
detection and recognition, object detection, text recognition, and
nearby-text analysis can be used to further refine the candidate
pool of advertisements.
[0113] Additionally, in one embodiment, advertisement selection
based on multimedia content analysis enables the system to provide
brand protection services. For example, if the user were accessing
content related to a particular brand of product, advertisements
particular to that same brand can then be selected for the
candidate pool. Brands may be used in analysis of the multimedia
content being accessed. For example, if a Coca-Cola.RTM. logo is
detected in an image, there may be an increased possibility that
the image depicts a Coke bottle or can. In one
embodiment, the candidate pool can be narrowed by removing
advertisements which are not of the particular brand being viewed
in the multimedia content.
[0114] In a further embodiment, inappropriate/unsuitable content
(e.g., inappropriate images, nudity, adult content, pornographic
videos or images) can be detected in advertisements or multimedia
content. For example, if a particular advertiser does not want
advertisements displayed adjacent to inappropriate content, the
advertisements are not selected when inappropriate content is
identified. Additionally, some web-sites do not want advertisements
with inappropriate/adult content displayed.
[0115] FIG. 5 illustrates a diagrammatic representation of the
process for using multimedia content for advertisement selection,
according to one embodiment.
[0116] The content of the multimedia (or rich media) is, in one
embodiment, determined by an automatic analysis process. The
content can then be represented by one or more content descriptors
which are then used to identify an optimized pool of ads.
[0117] FIG. 6A illustrates a diagrammatic representation of the
process of the machine learning phase and the classification phase
for image classification, according to one embodiment.
[0118] Although the flow for image classification is illustrated,
this approach can be applied to other forms of multimedia. In one
embodiment, the learning phase uses a set of predetermined images
associated with a particular category (e.g., class) and extracts
image features from the predetermined images. The image features
are used with the learner (e.g., in a machine learning process) to
produce a model for the particular category. Then, in the
classification phase, image features are computed for an
unclassified image, and compared with developed models to classify
the image of interest.
[0119] FIG. 6B illustrates a diagrammatic representation of the
process of image classification based on feature extraction with
automatic optimization via adjustable feature selections, according
to one embodiment.
[0120] The example of FIG. 6B provides an enhancement to the
process shown in FIG. 6A. In one embodiment, the output of the
classification process is used for feature selection in the machine
learning process. For example, validation images of known
categories are analyzed and classified based on constructed models.
Since categories of validation images are known, the accuracy of
the models can be determined. Therefore, the results of the
validation process are, in one embodiment, used to refine the models.
For example, the output of the classifier (e.g., indicative of the
accuracy of the classification model) can be used to select
learning algorithms, learning parameters and/or image features to
improve the classification accuracy (e.g., to reach a certain
predetermined threshold). The model optimization process is
described with further reference to FIG. 2.
[0121] FIG. 7A depicts a flow diagram illustrating a process of
selecting candidate and non-candidate pool of advertisements based
on identified multimedia content, according to one embodiment.
[0122] In process 702, multimedia content associated with a
web-user is identified. Association can be gained by virtue of the
user viewing, browsing, searching, listening to, or otherwise
interacting with the multimedia content. In process 704, the
multimedia content is analyzed. In particular, features may be
identified from the multimedia content to facilitate analysis. In
process 706, descriptors are identified and/or retrieved from the
multimedia content. In process 708, the descriptors are compared
with advertisement metadata. A process of using identifiers for
multimedia content to identify associated advertisements is
described with further reference to FIG. 7B. Advertisement metadata
may be provided by the content promoter (e.g., advertiser) or
automatically identified from advertisements.
[0123] In process 710, a candidate pool of advertisements based on
relevancy indicated by the comparison is selected. The relevancy
may be indicated qualitatively or quantitatively. For example, the
candidate pool of advertisements may be the set with relevancy
scores that exceed a certain threshold. Alternatively, the
candidate pool of advertisements may be the predetermined number of
ads (e.g., top 100, top 500, top 1000, etc.) with the highest
relevancy scores.
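For illustration, process 710 can be sketched as selecting either all advertisements whose relevancy score exceeds a threshold or the top-N highest scoring advertisements; the scores below are hypothetical and assumed to come from the comparison in process 708.

    # Select the candidate pool by relevancy threshold or by top-N score.
    def candidate_pool(relevancy_scores, threshold=None, top_n=None):
        ranked = sorted(relevancy_scores.items(), key=lambda kv: kv[1], reverse=True)
        if threshold is not None:
            return [ad for ad, score in ranked if score >= threshold]
        return [ad for ad, _ in ranked[:top_n]]

    scores = {"ad-1": 0.92, "ad-2": 0.15, "ad-3": 0.67}
    print(candidate_pool(scores, threshold=0.5))   # ['ad-1', 'ad-3']
    print(candidate_pool(scores, top_n=1))         # ['ad-1']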
[0124] The candidate pool of advertisements may optionally be
further refined before it is presented to the web-user in
process 712. In process 714, a non-candidate pool of advertisements
is optionally identified based on lack of relevancy indicated by the
comparison. The non-candidate pool of advertisements is generally
recognized by low relevancy scores (e.g., scores below a particular
threshold) or as a predetermined number of the lowest scoring ads.
In process 716, at least a portion of the
non-candidate advertisements are prevented from being presented to
the web-user.
[0125] FIG. 7B depicts a flow diagram illustrating a process of
using identifiers for multimedia content to identify associated
advertisements, according to one embodiment.
[0126] In process 732, a unique identifier is assigned to the
multimedia content. In process 734, the multimedia content is
associated with the advertisement that has been identified as being
relevant. The advertisement may have been identified based on the
process described in FIG. 7A. The unique identifier enables the
system to utilize a look-up table such that the same multimedia
would not be analyzed twice. If the same content is detected, the
associated advertisements can be identified without having to
re-analyze the content. This also has the advantage that the time
at which the media is processed is decoupled from the time at which
the ad is served, allowing the media to be analyzed in
batch.
[0127] For example, in process 736, the same multimedia associated
with a second web-user is identified. The identifier previously
assigned to the multimedia content can thus be
retrieved. In process 738, the advertisement associated with the
multimedia content is identified based on the unique identifier. In
process 740, the advertisement is presented to the web-user.
[0128] FIG. 8A depicts a flow diagram illustrating a process for
selecting a candidate pool of advertisements based on category
classification of a photograph, according to one embodiment.
[0129] In process 802, a photograph which a user is viewing is
identified. In process 804, the photograph is analyzed to classify
the photograph as being associated with a category. The image
classification process can be further described with reference to
the example of FIG. 8B. In process 806, a candidate pool of
advertisements is selected based on classification into the one or
more categories. The image categories may be linked to
advertisements of particular categories. In one embodiment,
advertisements are linked to a particular image category by
tracking click-through history of users viewing images of the
particular category. In process 808, at least a portion of the
candidate pool of advertisements is presented to the user.
[0130] In one embodiment, probability values that the photograph is
associated with the one or more categories are computed. Thus, the
order of presenting the at least a portion of the candidate pool of
advertisements can be determined based on the probability
values.
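A minimal sketch of ordering the presented advertisements by category probability might look as follows; the categories, probabilities, and category-to-advertisement mapping are hypothetical.

    # Present ads for the most probable categories first.
    category_probability = {"skiing": 0.7, "hiking": 0.2, "beach": 0.1}
    ads_by_category = {"skiing": ["ski-ad-1"], "hiking": ["hike-ad-1"], "beach": ["beach-ad-1"]}

    ordered_ads = [ad
                   for category, _ in sorted(category_probability.items(),
                                             key=lambda kv: kv[1], reverse=True)
                   for ad in ads_by_category[category]]
    print(ordered_ads)   # ['ski-ad-1', 'hike-ad-1', 'beach-ad-1']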
[0131] FIG. 8B depicts a flow diagram illustrating a process for
category classification of a photograph utilizing a machine
learning process, according to one embodiment.
[0132] In process 832, image features are extracted from the
photograph. In process 834, statistical parameters of the image
features are computed. In process 836, the statistical parameters
are used as variables in the predetermined models. In one
embodiment, the predetermined models are generated via performing
machine learning. The machine learning process is illustrated with
further reference to FIG. 9A. In process 838, the category that the
photograph is associable with is identified.
[0133] FIG. 9A depicts a flow diagram illustrating a process of
machine learning to generate predetermined models to represent
functions that can receive, as input, characteristics of an image to
determine its category, according to one embodiment.
[0134] In process 902, training image features are extracted from
training images associated with a particular category (or, class).
Training image features, similar to image features, include by way
of example but not limitation, color, texture, shape, edges,
corners, frequency content, spatial distribution, size of features,
etc. Some, additional, or all of these features can be extracted.
The features that are extracted may be determined on a case-by-case
basis or may be specific to image categories. In general, the
selected features are adjustable and modifiable as needed.
[0135] In process 904, statistical parameters for the training
image features are computed. In process 906, descriptors (e.g.,
text descriptor, numerical descriptor, vector parameters, or more
sophisticated data structures such as, a tree, a hash table,
matrices, etc.) characteristic of images of the particular category
are generated based on the statistical parameters. The descriptors
(e.g., each entry of a vector or matrix) can be generated from the
statistical parameters in one or more ways. In one embodiment, the
statistical parameters can be concatenated to produce a vector. In
general, the order and the number of the statistical parameters and
the descriptors should be consistent.
[0136] For example, in an image including red, green, and blue
features, each of which can range from 0-255, a histogram having a
predetermined number of measures for each of the features can be
constructed. For example, a 2-measure histogram can be generated
for the red feature such that pixel values below 122 are counted
for the first measure and the pixel values above 122 are counted as
the second measure in the histogram. A 2-measure histogram can
similarly be constructed for the green feature.
[0137] Based on the red and green features, a vector with four
parameters can be generated. The first two descriptors are
determined from the 2-measure histogram for the red feature and
the second two descriptors may be determined from the 2-measure
histogram for the green feature. In one embodiment, the histogram
entries are normalized to one. For example, the histogram for the
red feature and the green feature can be individually normalized to
one. Alternatively, the sum of the histogram for the red and green
features can be adjusted to sum to one.
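For illustration, the 2-measure histogram descriptor of paragraphs [0136]-[0137] can be sketched as follows on a tiny synthetic image; numpy is used only for convenience, and the pixel values are hypothetical.

    # Build a 4-element descriptor from 2-measure histograms of red and green.
    import numpy as np

    def two_measure_histogram(channel, split=122):
        low = np.count_nonzero(channel < split)      # first measure
        high = channel.size - low                    # second measure
        hist = np.array([low, high], dtype=float)
        return hist / hist.sum()                     # normalize each histogram to one

    # Hypothetical 4x4 red and green channels with values in 0-255.
    red = np.array([[10, 200, 30, 250], [5, 180, 90, 240],
                    [15, 210, 60, 230], [20, 190, 70, 220]])
    green = np.array([[100, 120, 130, 140], [90, 110, 125, 135],
                      [80, 115, 128, 150], [70, 118, 124, 160]])

    descriptor = np.concatenate([two_measure_histogram(red),
                                 two_measure_histogram(green)])
    print(descriptor)   # a 4-element vector, two entries per color feature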
[0138] In process 908, the particular set of predetermined models
that correspond to the particular category is generated. In process
910, a set of predetermined models is generated for each of the
predetermined categories. The image classification process based on
machine learning is described with further reference to the example
of FIG. 9B.
[0139] FIG. 9B depicts a flow diagram illustrating a process for
classifying images, according to one embodiment.
[0140] In process 932, an image to be classified is received. In
process 934, the image features of the image are identified. In
process 936, the statistical parameters of the image features are
computed. In process 938, the statistical parameters are used with the
predetermined models. In one embodiment, one or more predetermined
models receives the statistical parameters of a particular image
and determines the classification.
[0141] In process 940, the topic categories that the image is
associable with are determined. In process 942, weighting values
are optionally assigned to each topic category. The weight values
can indicate the likelihood that the image belongs to each of the
topic categories.
[0142] Unless the context clearly requires otherwise, throughout
the description and the claims, the words "comprise," "comprising,"
and the like are to be construed in an inclusive sense, as opposed
to an exclusive or exhaustive sense; that is to say, in the sense
of "including, but not limited to." As used herein, the terms
"connected," "coupled," or any variant thereof, means any
connection or coupling, either direct or indirect, between two or
more elements; the coupling or connection between the elements can
be physical, logical, or a combination thereof. Additionally, the
words "herein," "above," "below," and words of similar import, when
used in this application, shall refer to this application as a
whole and not to any particular portions of this application. Where
the context permits, words in the above Detailed Description using
the singular or plural number may also include the plural or
singular number respectively. The word "or," in reference to a list
of two or more items, covers all of the following interpretations
of the word: any of the items in the list, all of the items in the
list, and any combination of the items in the list.
[0143] The above detailed description of embodiments of the
disclosure is not intended to be exhaustive or to limit the
teachings to the precise form disclosed above. While specific
embodiments of, and examples for, the disclosure are described
above for illustrative purposes, various equivalent modifications
are possible within the scope of the disclosure, as those skilled
in the relevant art will recognize. For example, while processes or
blocks are presented in a given order, alternative embodiments may
perform routines having steps, or employ systems having blocks, in
a different order, and some processes or blocks may be deleted,
moved, added, subdivided, combined, and/or modified to provide
alternative or subcombinations. Each of these processes or blocks
may be implemented in a variety of different ways. Also, while
processes or blocks are at times shown as being performed in
series, these processes or blocks may instead be performed in
parallel, or may be performed at different times. Further, any
specific numbers noted herein are only examples: alternative
implementations may employ differing values or ranges.
[0144] The teachings of the disclosure provided herein can be
applied to other methods, devices, and/or systems, not necessarily
to those described above. The elements and acts of the various
embodiments described above can be combined to provide further
embodiments.
[0145] Any patents and applications and other references noted
above, including any that may be listed in accompanying filing
papers, are incorporated herein by reference. Aspects of the
disclosure can be modified, if necessary, to employ the systems,
functions, and concepts of the various references described above
to provide yet further embodiments of the disclosure.
[0146] These and other changes can be made to the disclosure in
light of the above Detailed Description. While the above
description describes certain embodiments of the disclosure, and
describes the best mode contemplated, no matter how detailed the
above appears in text, the teachings can be practiced in many ways.
Details of the device may vary considerably in its implementation
details, while still being encompassed by the subject matter
disclosed herein. As noted above, particular terminology used when
describing certain features or aspects of the disclosure should not
be taken to imply that the terminology is being redefined herein to
be restricted to any specific characteristics, features, or aspects
of the disclosure with which that terminology is associated.
[0147] In general, the terms used in the following claims should
not be construed to limit the disclosure to the specific
embodiments disclosed in the specification, unless the above
Detailed Description section explicitly defines such terms.
Accordingly, the actual scope of the disclosure encompasses not
only the disclosed embodiments, but also all equivalent ways of
practicing or implementing the disclosure under the claims.
[0148] While certain aspects of the disclosure are presented below
in certain claim forms, the inventors contemplate the various
aspects of the disclosure in any number of claim forms.
Accordingly, the inventors reserve the right to add additional
claims after filing the application to pursue such additional claim
forms for other aspects of the disclosure.
* * * * *