U.S. patent application number 14/707440 was filed with the patent office on 2015-05-08 and published on 2015-08-27 for systems and methods for using video metadata to associate advertisements therewith.
The applicant listed for this patent is TiVo Inc. Invention is credited to Matthew G. Berry, Schuyler E. Eckstrom, Albert L. Segars, Benjamin J. Weinberger.
Application Number | 20150245111 14/707440 |
Family ID | 40524088 |
Filed Date | 2015-05-08 |
United States Patent
Application |
20150245111 |
Kind Code |
A1 |
Berry; Matthew G.; et al. |
August 27, 2015 |
SYSTEMS AND METHODS FOR USING VIDEO METADATA TO ASSOCIATE
ADVERTISEMENTS THEREWITH
Abstract
A system for using metadata from a video signal to associate
advertisements therewith, comprising (i) a segmentation system to
divide the video signal into video clips, (ii) a digitizing system
for digitizing the video clips, (iii) a feature extraction system
for extracting audio and video features from each video clip,
associating each audio feature with respective video clips,
associating each video feature with respective video clips, and
saving the audio and video features into an associated metadata
file, (iv) a web interface to the feature extraction system for
receiving the video clips, and (v) a database, wherein video
signals and associated metadata files are stored and indexed,
wherein the associated metadata file is provided when a video
player requests the corresponding video signal, enabling selection
of a relevant advertisement for presentment in conjunction with
respective video clips based on the associated audio and video
features of the respective video clip.
Inventors: |
Berry; Matthew G. (Raleigh, NC)
Weinberger; Benjamin J. (Durham, NC)
Eckstrom; Schuyler E. (Beaufort, SC)
Segars; Albert L. (Beaufort, SC) |
Applicant: |
Name | City | State | Country | Type |
TiVo Inc. | Alviso | CA | US | |
Family ID: |
40524088 |
Appl. No.: |
14/707440 |
Filed: |
May 8, 2015 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
12206622 | Sep 8, 2008 | |
14707440 | | |
60970593 | Sep 7, 2007 | |
Current U.S.
Class: |
725/34 |
Current CPC
Class: |
H04N 21/26603 20130101;
H04H 60/61 20130101; H04N 21/2668 20130101; H04N 21/84 20130101;
H04N 7/162 20130101; H04N 21/845 20130101; G11B 27/105 20130101;
H04H 60/59 20130101; H04H 60/37 20130101; H04N 9/8205 20130101;
H04N 21/8453 20130101; H04N 21/4622 20130101; H04N 21/44008
20130101; H04N 9/8715 20130101; G06F 16/40 20190101; H04H 60/07
20130101; G06F 16/71 20190101; H04H 60/27 20130101; H04H 60/73
20130101; H04N 21/23418 20130101; G06F 16/58 20190101; H04N 9/8227
20130101; H04N 21/835 20130101; H04N 21/478 20130101; H04N 21/4394
20130101; G11B 27/322 20130101; G11B 27/28 20130101; H04N 21/4331
20130101; H04N 21/434 20130101; H04N 21/8456 20130101; H04H 60/58
20130101; G06Q 30/0277 20130101; G06Q 30/02 20130101; H04N 21/2353
20130101; H04N 21/254 20130101; H04N 21/812 20130101; H04N 21/435
20130101 |
International
Class: |
H04N 21/81 20060101
H04N021/81; G06Q 30/02 20060101 G06Q030/02; G11B 27/10 20060101
G11B027/10; G11B 27/28 20060101 G11B027/28; G11B 27/32 20060101
G11B027/32; H04H 60/07 20060101 H04H060/07; H04H 60/27 20060101
H04H060/27; H04H 60/37 20060101 H04H060/37; H04H 60/58 20060101
H04H060/58; H04H 60/59 20060101 H04H060/59; H04H 60/61 20060101
H04H060/61; H04H 60/73 20060101 H04H060/73; H04N 7/16 20060101
H04N007/16; H04N 9/82 20060101 H04N009/82; H04N 9/87 20060101
H04N009/87; H04N 21/234 20060101 H04N021/234; H04N 21/266 20060101
H04N021/266; H04N 21/434 20060101 H04N021/434; H04N 21/435 20060101
H04N021/435; H04N 21/439 20060101 H04N021/439; H04N 21/462 20060101
H04N021/462; H04N 21/478 20060101 H04N021/478; H04N 21/835 20060101
H04N021/835; H04N 21/84 20060101 H04N021/84; H04N 21/845 20060101
H04N021/845; H04N 21/235 20060101 H04N021/235; G06F 17/30 20060101
G06F017/30 |
Claims
1. A method, comprising: receiving end user characteristics
metadata from a video display device associated with an end user;
receiving a time-based metadata file providing time-coded
information associated with a video program to enable selection of
appropriate advertisements for presentment at specific time-code
locations of the video program; receiving in real-time, an actual
time-code location of the video program as the video program is
being viewed on the video display device by the end user; selecting
in real-time, an advertisement based on the end user
characteristics metadata, the time-based metadata file, and the
actual time-code location as the video program is being viewed;
providing the advertisement as the video program is being viewed,
over the Internet to the video display device, for real-time
presentment of the advertisement on the video display device at the
actual time-code location of the video program, the real-time
presentment of the advertisement comprising presentment of the
advertisement simultaneous and in non-interference with the video
program.
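The real-time selection step recited in claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the `Segment` and `Ad` types, the keyword-overlap score, and the representation of end user characteristics as a set of interests are all hypothetical and do not come from the application.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Segment:
    """One time-coded entry of a (hypothetical) time-based metadata file."""
    start: float  # start time code, in seconds
    stop: float   # stop time code, in seconds
    features: set = field(default_factory=set)  # identified audio/video features


@dataclass
class Ad:
    """A candidate advertisement with illustrative targeting keywords."""
    name: str
    keywords: set


def segment_at(segments, time_code: float) -> Optional[Segment]:
    """Return the segment whose [start, stop) interval contains the time code."""
    for seg in segments:
        if seg.start <= time_code < seg.stop:
            return seg
    return None


def select_ad(ads, segments, user_interests: set, time_code: float) -> Optional[Ad]:
    """Pick the ad that best matches the current segment and the end user.

    The scoring is an assumption: overlap with the segment's features counts
    double, overlap with the user's interests counts once. Returns None when
    no segment covers the time code or no ad scores above zero.
    """
    seg = segment_at(segments, time_code)
    if seg is None:
        return None
    best, best_score = None, 0
    for ad in ads:
        score = 2 * len(ad.keywords & seg.features) + len(ad.keywords & user_interests)
        if score > best_score:
            best, best_score = ad, score
    return best


# Illustrative data only.
segments = [
    Segment(0.0, 30.0, {"beach", "surfing"}),
    Segment(30.0, 75.0, {"car", "city"}),
]
ads = [Ad("surfboards", {"surfing", "beach"}), Ad("sedan", {"car", "commute"})]

# The player reports time code 42.0 while the program is being viewed.
chosen = select_ad(ads, segments, user_interests={"commute"}, time_code=42.0)
print(chosen.name if chosen else "no ad")
```

Calling `select_ad` again each time the player reports a new time code is what makes the selection follow the program as it is viewed.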
2. The method of claim 1, where the time-based metadata file
includes at least identified extracted audio features, identified
extracted video features, and a context for each video segment of a
plurality of video segments comprising the video program.
3. The method of claim 1, where the time-based metadata file
includes identified extracted audio features generated by
extracting audio features from the video program, identifying
extracted audio features from the video program using audio
processing, each of the audio features including corresponding
start and stop time codes.
4. The method of claim 1, where the time-based metadata file
includes identified extracted video features generated by
extracting video features from the video program, identifying
extracted video features from the video program using visual
processing, each of the video features including corresponding
start and stop time codes.
5. The method of claim 1, where the time-based metadata file
includes a context for each video segment of a plurality of video
segments comprising the video program, the context for each video
segment of the plurality of video segments automatically determined
from analysis of identified extracted audio and video features
generated at least in part by extracting audio and video features
from the video program.
6. The method of claim 1, where the time-based metadata file
includes identified extracted video features that represent one or
more of actors, characters, animals, objects, geographic locations,
background, setting, theme, events, and scenes.
7. The method of claim 1, where the time-based metadata file
includes identified extracted audio features that represent one or
more of words, speeches, dialogue, music, discrete sounds, and
background noise.
8. The method of claim 1, where the advertisement correlates to at
least one of: (a) extracted video features obtained from the
time-based metadata file that appears in the video program during
the respective time code location or (b) extracted audio features
obtained from the time-based metadata file that occurs in the video
program during the respective time code location.
9. The method of claim 1, where the time-based metadata file
comprises one or more of: (a) identification of the video program;
(b) a file name; (c) a digital signature; (d) the time-coded
duration of the video program; (e) a keyword list; (f) a time-coded
transcript; and (g) one or more segments of the video program, each
segment having a corresponding start and stop time and each segment
identifying video features and audio features associated with each
said segment.
10. The method of claim 1, wherein the advertisement comprises
background information, detailed information about the video
program, detailed information about actors, scenes, events, or
locations that appear in the video program, or related video
programs.
11. One or more non-transitory machine-readable media storing
instructions which, when executed by one or more processors, cause
the one or more processors to perform: receiving end user
characteristics metadata from a video display device associated
with an end user; receiving a time-based metadata file providing
time-coded information associated with a video program to enable
selection of appropriate advertisements for presentment at specific
time-code locations of the video program; receiving in real-time,
an actual time-code location of the video program as the video
program is being viewed on the video display device by the end
user; selecting in real-time, an advertisement based on the end
user characteristics metadata, the time-based metadata file, and
the actual time-code location as the video program is being viewed;
providing the advertisement as the video program is being viewed,
over the Internet to the video display device, for real-time
presentment of the advertisement on the video display device at the
actual time-code location of the video program, the real-time
presentment of the advertisement comprising presentment of the
advertisement simultaneous and in non-interference with the video
program.
12. The media of claim 11, where the time-based metadata file
includes at least identified extracted audio features, identified
extracted video features, and a context for each video segment of a
plurality of video segments comprising the video program.
13. The media of claim 11, where the time-based metadata file
includes identified extracted audio features generated by
extracting audio features from the video program, identifying
extracted audio features from the video program using audio
processing, each of the audio features including corresponding
start and stop time codes.
14. The media of claim 11, where the time-based metadata file
includes identified extracted video features generated by
extracting video features from the video program, identifying
extracted video features from the video program using visual
processing, each of the video features including corresponding
start and stop time codes.
15. The media of claim 11, where the time-based metadata file
includes a context for each video segment of a plurality of video
segments comprising the video program, the context for each video
segment of the plurality of video segments automatically determined
from analysis of identified extracted audio and video features
generated at least in part by extracting audio and video features
from the video program.
16. The media of claim 11, where the time-based metadata file
includes identified extracted video features that represent one or
more of actors, characters, animals, objects, geographic locations,
background, setting, theme, events, and scenes.
17. The media of claim 11, where the time-based metadata file
includes identified extracted audio features that represent one or
more of words, speeches, dialogue, music, discrete sounds, and
background noise.
18. The media of claim 11, where the advertisement correlates to at
least one of: (a) extracted video features obtained from the
time-based metadata file that appears in the video program during
the respective time code location or (b) extracted audio features
obtained from the time-based metadata file that occurs in the video
program during the respective time code location.
19. The media of claim 11, where the time-based metadata file
comprises one or more of: (a) identification of the video program;
(b) a file name; (c) a digital signature; (d) the time-coded
duration of the video program; (e) a keyword list; (f) a time-coded
transcript; and (g) one or more segments of the video program, each
segment having a corresponding start and stop time and each segment
identifying video features and audio features associated with each
said segment.
20. The media of claim 11, wherein the advertisement comprises
background information, detailed information about the video
program, detailed information about actors, scenes, events, or
locations that appear in the video program, or related video
programs.
21. An apparatus, comprising: a subsystem, implemented at least in
part in hardware, that receives end user characteristics metadata
from a video display device associated with an end user; a
subsystem, implemented at least in part in hardware, that receives
a time-based metadata file providing time-coded information
associated with a video program to enable selection of appropriate
advertisements for presentment at specific time-code locations of
the video program; a subsystem, implemented at least in part in
hardware, that receives in real-time, an actual time-code location
of the video program as the video program is being viewed on the
video display device by the end user; a subsystem, implemented at
least in part in hardware, that selects in real-time, an
advertisement based on the end user characteristics metadata, the
time-based metadata file, and the actual time-code location as the
video program is being viewed; a subsystem, implemented at least in
part in hardware, that provides the advertisement as the video
program is being viewed, over the Internet to the video display
device, for real-time presentment of the advertisement on the video
display device at the actual time-code location of the video
program, the real-time presentment of the advertisement comprising
presentment of the advertisement simultaneous and in
non-interference with the video program.
22. The apparatus of claim 21, where the time-based metadata file
includes at least identified extracted audio features, identified
extracted video features, and a context for each video segment of a
plurality of video segments comprising the video program.
23. The apparatus of claim 21, where the time-based metadata file
includes identified extracted audio features generated by
extracting audio features from the video program, identifying
extracted audio features from the video program using audio
processing, each of the audio features including corresponding
start and stop time codes.
24. The apparatus of claim 21, where the time-based metadata file
includes identified extracted video features generated by
extracting video features from the video program, identifying
extracted video features from the video program using visual
processing, each of the video features including corresponding
start and stop time codes.
25. The apparatus of claim 21, where the time-based metadata file
includes a context for each video segment of a plurality of video
segments comprising the video program, the context for each video
segment of the plurality of video segments automatically determined
from analysis of identified extracted audio and video features
generated at least in part by extracting audio and video features
from the video program.
26. The apparatus of claim 21, where the time-based metadata file
includes identified extracted video features that represent one or
more of actors, characters, animals, objects, geographic locations,
background, setting, theme, events, and scenes.
27. The apparatus of claim 21, where the time-based metadata file
includes identified extracted audio features that represent one or
more of words, speeches, dialogue, music, discrete sounds, and
background noise.
28. The apparatus of claim 21, where the advertisement correlates
to at least one of: (a) extracted video features obtained from the
time-based metadata file that appears in the video program during
the respective time code location or (b) extracted audio features
obtained from the time-based metadata file that occurs in the video
program during the respective time code location.
29. The apparatus of claim 21, where the time-based metadata file
comprises one or more of: (a) identification of the video program;
(b) a file name; (c) a digital signature; (d) the time-coded
duration of the video program; (e) a keyword list; (f) a time-coded
transcript; and (g) one or more segments of the video program, each
segment having a corresponding start and stop time and each segment
identifying video features and audio features associated with each
said segment.
30. The apparatus of claim 21, wherein the advertisement comprises
background information, detailed information about the video
program, detailed information about actors, scenes, events, or
locations that appear in the video program, or related video
programs.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is a Continuation of U.S. patent
application Ser. No. 12/206,622, filed Sep. 8, 2008, which claims
the benefit under 35 U.S.C. § 119(e) of U.S. Provisional Patent
Application Ser. No. 60/970,593, entitled "Systems and Methods for
Using Video Metadata to Associate Advertisements Therewith," filed
Sep. 7, 2007. The entire contents of the above mentioned
applications are hereby incorporated by reference for all purposes
as if fully set forth herein. The applicant(s) hereby rescind any
disclaimer of claim scope in the parent application(s) or the
prosecution history thereof and advise the USPTO that the claims in
this application may be broader than any claim in the parent
application.
TECHNICAL FIELD
[0002] The present invention relates generally to targeted
advertisements and, more particularly, to methods and systems for
delivering targeted advertisements in association with a video
program based on metadata associated with the video program.
BACKGROUND
[0003] An advertisement promotes the goods, services,
organizations, ideas, etc. of an organization or company via a
medium. Traditional advertisements were made on printed materials
and were available on pamphlets, flyers, billboards, posters,
newspapers, and magazines. As electronic technology developed,
commercials were incorporated into multimedia content, such as
radio, television, and movies and were typically presented as an
interruption of the primary content, occurring either before the
primary content or at intervals during the primary content. Today,
advertisements are placed within television programs and movies
through product placements and are available on the Internet and on
electronically stored content (e.g., DVDs), such as in commercials,
trailers, and in promotions on DVDs.
[0004] Traditional advertisements have typically targeted general
audiences. Such advertisements can be tailored somewhat to the
audience likely to be watching a movie, television program or show
or event, or radio station or program based on the general content
of the program or show and based on the likely demographic of the
audience who would be expected to watch such program or show. The
Internet provides advertisers with a more specific targeted
audience and, hence, higher potential return on their advertisement
expenses. For example, because each computer contains potentially
trackable and usable information about user(s) of that computer
(e.g., through the use of cookies, location information, language
settings, and prior web sites accessed), Internet websites are able
to use such information to generate banner or pop-up advertisements
that are based on some information available about potential users
of each computer. In yet another example, Internet search engine
sites are able to "sell" the terms or keywords used by an Internet
searcher to present targeted advertisements that have been
associated with specific keywords or search terms. Such
advertisements are presented in pop-up windows, banner
advertisement windows, or as "sponsored" links to websites that
have requested and paid for prominent placements on the search
results screen for specific keywords or search terms. An Internet
user that searches "keywords" is more likely than a member of the
general public to be a potential customer of goods or services
associated with such keywords.
[0005] With the continuing advance of technology, bandwidth, and
availability of broadband access, online video viewing is becoming
increasingly popular and promises to become even more prevalent
with the continuing expansion and use of IPTV and video on demand.
Unlike static or substantially-static content (text, photographs)
that is typically available on a webpage, that gets updated only
periodically (more frequently for a news webpage and much less
frequently for a standard company webpage), and that sustains a
particular viewer for only a brief amount of time, commercial
videos over the Internet provide an opportunity to capture a
viewing audience for a substantially longer amount of time.
However, audiences that are used to watching movies and television
on DVDs or off of a DVR are unwilling to view conventional
advertisements that interrupt the flow of the video stream.
[0006] For these and many other reasons, there is a need for a
technology platform that is able to provide and display
advertisements that are targeted to the specific audience and that
are tied to specific programming being viewed. There is a need for
methods and systems that enable such advertisements to be viewed
selectively and simultaneously with the primary content in such a
way that does not interfere with the primary content. There are yet
further needs for methods and systems that provide real-time
advertisements for the viewer regardless of whether the viewer is
accessing the content from off of the Internet or from a DVD or
similar electronic media storage if the display device has access
to the Internet.
[0007] Therefore, it is apparent that a heretofore unaddressed need
exists in the art to address the aforementioned deficiencies and
inadequacies.
SUMMARY
[0008] The present invention, in one aspect, relates to a method
for using metadata from a video signal to associate advertisements
therewith. In one embodiment, the method includes (i) segmenting
the video signal into a plurality of video clips, (ii) extracting
audio and video features from a video signal, (iii) digitizing the
plurality of video clips, (iv) identifying extracted audio features
within respective digitized video clips using audio processing,
wherein each audio feature is associated with the respective
digitized video clip, (v) identifying extracted video features
within respective digitized video clips using visual processing,
wherein each video feature is associated with the respective
digitized video clip, (vi) saving the associated audio features and
associated video features in a metadata file, (vii) associating the
metadata file with the video signal, (viii) storing the metadata
file in a database, and (ix) providing the associated metadata file
when a video player requests the corresponding video signal. The
associated metadata file enables selection of a relevant
advertisement for presentment in conjunction with each respective
digitized video clip of the corresponding video signal based on the
associated audio features and the associated video features of the
respective digitized video clip.
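The segmenting, feature-association, and metadata-saving steps of this method can be sketched as below. This is only an illustrative outline: fixed-length clips, the dictionary layout of the metadata, and the two stub extractors are assumptions made for the example, and the digitizing and database steps are omitted.

```python
def segment(duration, clip_length):
    """Split [0, duration) into consecutive (start, stop) clips.

    Fixed-length clips are an assumption; the text does not say how
    clip boundaries are chosen.
    """
    if clip_length <= 0:
        raise ValueError("clip_length must be positive")
    clips = []
    start = 0.0
    while start < duration:
        stop = min(start + clip_length, duration)
        clips.append((start, stop))
        start = stop
    return clips


def build_metadata(video_id, duration, clip_length, audio_extractor, video_extractor):
    """Associate identified audio and video features with each clip.

    The extractors are caller-supplied callables taking (start, stop) and
    returning a list of feature labels; real audio or visual processing
    would sit behind them.
    """
    entries = []
    for start, stop in segment(duration, clip_length):
        entries.append({
            "start": start,
            "stop": stop,
            "audio_features": audio_extractor(start, stop),
            "video_features": video_extractor(start, stop),
        })
    # The returned mapping stands in for the metadata file associated
    # with the video signal.
    return {"video_id": video_id, "segments": entries}


# Hypothetical stand-ins for audio and visual processing.
def fake_audio(start, stop):
    return ["music"] if start < 60 else ["dialogue"]


def fake_video(start, stop):
    return ["beach"]


metadata = build_metadata("demo-video", 150.0, 60.0, fake_audio, fake_video)
for entry in metadata["segments"]:
    print(entry["start"], entry["stop"], entry["audio_features"], entry["video_features"])
```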
[0009] The video features include at least one of (i) one or more
people, (ii) one or more characters, (iii) one or more animals,
(iv) one or more objects, (v) one or more geographic locations,
(vi) background, (vii) one or more scenes, or a combination of these
features. In one embodiment, these video features are extracted by
a visual processing system of the feature extraction system. In
another embodiment, the method includes the step of identifying and
recognizing one or more objects from the video signal by an object
classification system of the feature extraction system. In yet
another embodiment, the method includes the step of identifying and
recognizing one or more scenes from the video signal by a scene
classification system of the feature extraction system. In yet
another embodiment, the method includes a combination of both
steps.
[0010] In one embodiment, the video signal may contain an accompanying
audio signal. Audio features of the audio signal include at least
one of (i) a list of one or more words, (ii) speeches by one or
more people, (iii) dialogue by one or more people, (iv) music, (v)
background sound, and a combination of these audio features. In
another embodiment, the method further includes the steps of: (i)
identifying and recognizing one or more background sounds from the
audio signal by using a sound classification system of the feature
extraction system, (ii) identifying and recognizing one or more music
segments from the audio signal by using a music classification
system of the feature extraction system, and (iii) identifying and
recognizing human speech, dialogues, one or more words, one or more
phrases by using a speech recognition system of the feature
extraction system. In yet another embodiment, the method further
includes the steps of: (i) collecting audio features of the audio
signal by using an audio signal recognition system of the feature
extraction system, and (ii) saving the collected audio features in
the metadata file.
[0011] In one embodiment, the metadata file is an XML file. The
metadata file contains one or more of (i) video identification
information, (ii) a file name, (iii) a digital signature, (iv) the
length of the video signal, (v) a keyword list, (vi) a time-coded
transcript, (vii) one or more segments with a corresponding start
and stop time, (viii) one or more contents, (ix) one or more
characters, (x) one or more animals, (xi) one or more objects, and
(xii) a list of vocabulary.
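As a rough illustration of an XML metadata file of this kind, the sketch below writes and reads a small file with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`video_metadata`, `segment`, `start`, `stop`, and so on) are assumptions chosen for the example; the actual layout is the one shown in FIGS. 5A and 5B, which is not reproduced here.

```python
import xml.etree.ElementTree as ET


def build_metadata_xml(video_id, file_name, length_seconds, keywords, segments):
    """Serialize a few of the listed fields into an XML string.

    `segments` is a list of (start, stop, objects) tuples, with times in seconds.
    """
    root = ET.Element("video_metadata")
    ET.SubElement(root, "video_id").text = video_id
    ET.SubElement(root, "file_name").text = file_name
    ET.SubElement(root, "length").text = str(length_seconds)
    keyword_list = ET.SubElement(root, "keywords")
    for word in keywords:
        ET.SubElement(keyword_list, "keyword").text = word
    segment_list = ET.SubElement(root, "segments")
    for start, stop, objects in segments:
        seg = ET.SubElement(segment_list, "segment", start=str(start), stop=str(stop))
        for obj in objects:
            ET.SubElement(seg, "object").text = obj
    return ET.tostring(root, encoding="unicode")


def objects_at(xml_text, time_code):
    """Return the objects listed for the segment containing the time code."""
    root = ET.fromstring(xml_text)
    found = []
    for seg in root.iter("segment"):
        if float(seg.get("start")) <= time_code < float(seg.get("stop")):
            found.extend(obj.text for obj in seg.findall("object"))
    return found


# Illustrative values only.
xml_text = build_metadata_xml(
    "vid-001",
    "vid-001.mpg",
    120,
    ["beach", "car"],
    [(0, 60, ["surfboard"]), (60, 120, ["car", "dog"])],
)
print(xml_text)
print(objects_at(xml_text, 75.0))
```

Keeping start and stop times on each segment is what lets a consumer of the file look up features by time code rather than scanning the whole program.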
[0012] In another aspect, the present invention relates to a system
for using metadata from a video signal to associate advertisements
therewith. In one embodiment, the system has (i) a segmentation
system for dividing the video signal into a plurality of video
clips, (ii) a digitizing system for digitizing the plurality of
video clips, (iii) a feature extraction system for extracting audio
features and video features from each digitized video clip,
associating each audio feature with at least one digitized video
clip, associating each video feature with at least one digitized
video clip, and saving the audio features and video features into a
metadata file associated with the video signal, (iv) a web
interface to the feature extraction system for receiving the
digitized video clips, and (v) a database accessible by a third
party user, wherein video signals and associated metadata files are
stored and indexed with a unique filename for each video signal in
the database and its corresponding video signal. The associated
metadata file is provided when a video player requests the
corresponding video signal, and enables selection of a relevant
advertisement for presentment in conjunction with each respective
digitized video clip of the corresponding video signal based on the
associated audio features and the associated video features of the
respective digitized video clip.
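The database behavior described here, indexing each video signal and its metadata file under a unique filename and returning the metadata when a player requests the video, might look like the following in-memory sketch. The `VideoDatabase` class and its method names are hypothetical; a deployed system would presumably use a persistent store.

```python
class VideoDatabase:
    """In-memory stand-in for the database of video signals and metadata files."""

    def __init__(self):
        # Maps a unique filename to (video_signal, metadata_file).
        self._records = {}

    def store(self, filename, video_signal, metadata_file):
        """Index a video signal and its metadata file under a unique filename."""
        if filename in self._records:
            raise KeyError(f"filename already indexed: {filename}")
        self._records[filename] = (video_signal, metadata_file)

    def request(self, filename):
        """Return (video_signal, metadata_file), as when a player requests a video."""
        return self._records[filename]


# Illustrative use: the strings stand in for real video data and a real metadata file.
db = VideoDatabase()
db.store("vid-001.mpg", "<video bytes>", "<metadata xml>")
video_signal, metadata_file = db.request("vid-001.mpg")
print(metadata_file)
```

Returning the metadata together with the video is the point of the design: whatever selects the advertisement receives the time-coded features at the moment playback is requested.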
[0013] In one embodiment, the video features comprise at least one
of (i) one or more people, (ii) one or more characters, (iii) one
or more animals, (iv) one or more objects, (v) one or more
geographic locations, (vi) background, (vii) one or more scenes,
and (viii) any combination thereof. In another embodiment, the
video signal includes an accompanying audio signal.
[0014] In another embodiment, the audio features of the audio
signal comprise one or more of (i) a list of one or more words,
(ii) speeches by one or more people, (iii) dialogue by one or more
people, (iv) music, (v) background sound, and (vi) any combination
thereof. In one feature, the feature extraction system further
comprises an audio signal recognition (ASR) system to identify and
recognize the audio features of the video signal, and a visual
processing system to identify and recognize the visual features of
the video signal. In another feature, the visual processing system
further comprises an object classification system to identify and
recognize one or more objects from the video signal, and a scene
classification system to identify and recognize one or more scenes
from the video signal. In yet a further feature, the audio signal
recognition system further comprises a sound classification system
to identify and recognize one or more background sounds from the
audio signal, and a music classification system to identify and
recognize one or more music segments from the audio signal, and a
speech recognition system to identify and recognize human speech,
dialogues, one or more words, one or more phrases. In another
feature, the metadata file comprises one or more of video
identification information, a file name, a digital signature, the
length of the video signal, a keyword list, a time-coded
transcript, one or more segments with a corresponding start and
stop time, one or more contents, one or more characters, one or
more pets, one or more objects, and a list of vocabulary.
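The nesting of subsystems described in this paragraph can be pictured with a small composition sketch. All class names, field names, and the `bytes`-in, list-out classifier signature are assumptions for illustration; the lambdas at the end are placeholders for real classifiers, which the text does not specify.

```python
from dataclasses import dataclass
from typing import Callable

# A classifier takes raw data and returns a list of recognized labels (assumed shape).
Classifier = Callable[[bytes], list]


@dataclass
class AudioSignalRecognition:
    """Groups the sound, music, and speech recognizers."""
    sound: Classifier
    music: Classifier
    speech: Classifier

    def recognize(self, audio: bytes) -> dict:
        return {
            "background_sounds": self.sound(audio),
            "music": self.music(audio),
            "speech": self.speech(audio),
        }


@dataclass
class VisualProcessing:
    """Groups the object and scene classifiers."""
    objects: Classifier
    scenes: Classifier

    def recognize(self, frames: bytes) -> dict:
        return {"objects": self.objects(frames), "scenes": self.scenes(frames)}


@dataclass
class FeatureExtractionSystem:
    """Combines audio signal recognition with visual processing."""
    asr: AudioSignalRecognition
    visual: VisualProcessing

    def extract(self, audio: bytes, frames: bytes) -> dict:
        return {
            "audio": self.asr.recognize(audio),
            "video": self.visual.recognize(frames),
        }


# Placeholder classifiers returning fixed labels.
system = FeatureExtractionSystem(
    asr=AudioSignalRecognition(
        sound=lambda a: ["waves"],
        music=lambda a: [],
        speech=lambda a: ["hello"],
    ),
    visual=VisualProcessing(
        objects=lambda f: ["surfboard"],
        scenes=lambda f: ["beach"],
    ),
)
features = system.extract(b"audio", b"frames")
print(features)
```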
[0015] These and other aspects of the present invention will become
apparent from the following description of the preferred embodiment
taken in conjunction with the following drawings, although
variations and modifications therein may be effected without
departing from the spirit and scope of the novel concepts of the
disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
[0016] The accompanying drawings illustrate one or more embodiments
of the invention and, together with the written description, serve
to explain the principles of the invention. Wherever possible, the
same reference numbers are used throughout the drawings to refer to
the same or like elements of an embodiment, and wherein:
[0017] FIG. 1A illustrates a first embodiment of an advertisement
placement system of the present invention;
[0018] FIG. 1B illustrates a second embodiment of an advertisement
placement system of the present invention;
[0019] FIG. 1C illustrates a third embodiment of an advertisement
placement system of the present invention;
[0020] FIG. 1D illustrates a fourth embodiment of an advertisement
placement system of the present invention;
[0021] FIG. 1E illustrates a fifth embodiment of an advertisement
placement system of the present invention;
[0022] FIG. 1F illustrates a sixth embodiment of an advertisement
placement system of the present invention;
[0023] FIG. 2 illustrates one representative display screen for
viewing a video program and advertisements associated therewith
based on underlying time-coded metadata;
[0024] FIG. 3 illustrates a high level intake system for receiving
video files and generating underlying time-coded metadata;
[0025] FIG. 4 illustrates a more detailed flow chart describing the
extraction of metadata from a video signal.
[0026] FIGS. 5A and 5B illustrate one exemplary metadata file
generated and used within the present invention.
DETAILED DESCRIPTION
[0027] The present invention is more particularly described in the
following examples that are intended as illustrative only since
numerous modifications and variations therein will be apparent to
those skilled in the art. Various embodiments of the invention are
now described in detail. Referring to the drawings, like numbers
indicate like components throughout the views. As used in the
description herein and throughout the claims that follow, the
meaning of "a", "an", and "the" includes plural reference unless
the context clearly dictates otherwise. Also, as used in the
description herein and throughout the claims that follow, the
meaning of "in" includes "in" and "on" unless the context clearly
dictates otherwise.
[0028] The terms used in this specification generally have their
ordinary meanings in the art, within the context of the invention,
and in the specific context where each term is used.
[0029] Certain terms that are used to describe the invention are
discussed below, or elsewhere in the specification, to provide
additional guidance to the practitioner in describing the apparatus
and methods of the invention and how to make and use them. For
convenience, certain terms may be highlighted, for example using
italics and/or quotation marks. The use of highlighting has no
influence on the scope and meaning of a term; the scope and meaning
of a term is the same, in the same context, whether or not it is
highlighted. It will be appreciated that the same thing can be said
in more than one way. Consequently, alternative language and
synonyms may be used for any one or more of the terms discussed
herein, nor is any special significance to be placed upon whether
or not a term is elaborated or discussed herein. Synonyms for
certain terms are provided. A recital of one or more synonyms does
not exclude the use of other synonyms. The use of examples anywhere
in this specification, including examples of any terms discussed
herein, is illustrative only, and in no way limits the scope and
meaning of the invention or of any exemplified term. Likewise, the
invention is not limited to various embodiments given in this
specification. Furthermore, subtitles may be used to help a reader
read through the specification; the usage of subtitles, however,
has no influence on the scope of the invention.
[0030] As used herein, a video program refers to any multimedia
content, such as a movie, a television program, an event, a video,
an advertisement, a broadcast, or the like that a user would be
interested in viewing online or in recorded format.
[0031] Turning now to FIG. 1A, a first preferred embodiment of an
advertisement placement system 100A based primarily upon time-coded
metadata associated with an underlying video program displayed
therewith is illustrated. In this first embodiment, it is
contemplated that the video program will be viewed in a
Video-on-Demand (VOD) or video streaming context from a video
provider 110 and that the underlying metadata file associated with
the video program, once created, is maintained by the video
provider 110. This system 100A includes one or more video storage
databases 115 of the video provider 110 and a video server 113 that
provides video programs in VOD or video streaming format over a
computer network, such as the Internet for example, to a viewer 150
(or end user).
[0032] Before a specific video is provided to the viewer 150, a
video file 120 associated with the video program 121 is preferably
provided to a metadata generator 130. The video file 120 has or
includes a unique file name or other video identifier (designated
herein by the variable VID). As will be described in greater detail
hereinafter, the metadata generator 130 receives the video file 120
and, using a metadata processor 133, creates or generates a
time-coded metadata file 125 associated with the corresponding
video file 120 and underlying video program 121. As shown in FIG.
1A, this metadata file 125 is stored in a database 135 of the
metadata generator 130 but also provided back to the video provider
110 and associated with the corresponding video file 120 and
underlying video program 121 in video storage databases 115.
[0033] When a request 140 for VOD or video streaming of the video
program 121 associated with the video file 120 is received from a
video display device 155 (such as a computer, Internet or
interactive TV, or similar video playback or viewing device) of the
viewer 150, the video provider 110 begins providing access to the
video program 121 in conventional fashion (i.e., this assumes all
communication and billing parameters are already or previously
satisfied; such communication and billing parameters being beyond
the scope of the present invention but within the scope and
understanding of those skilled in the art). Simultaneously or
substantially simultaneously with the start of the video streaming,
the metadata file 125 associated with the video file 120 is
provided to an advertisement distributor 160, which uses an
advertisement server 163 to process the metadata file 125 to
selectively identify, from its database 165 of potential
advertisements, one or more advertisements that are appropriate to
provide in conjunction with the video program 121 and,
specifically, with each discrete segment of the video program 121
based on its time-coded metadata. The selected advertisement
file(s) 175 are then provided to the video display device 155 of
the viewer 150. The metadata file 125 may be provided in whole to
the advertisement distributor 160 or it may be parsed and provided
in piecemeal or "as needed" fashion to the advertisement
distributor 160.
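By way of illustration only, the per-segment selection performed by
the advertisement server 163 might be sketched as follows in Python.
The specification does not say how an advertisement is matched to a
segment; the keyword-overlap rule, the Segment and Ad shapes, and the
select_ads name are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One time-coded segment of the metadata file (hypothetical shape)."""
    start: float          # seconds from the start of the video program
    stop: float
    keywords: frozenset   # audio/video features found in the segment


@dataclass
class Ad:
    """One candidate advertisement (hypothetical shape)."""
    ad_id: str
    keywords: frozenset


def select_ads(segments, ads):
    """Return {(start, stop): ad_id or None}.

    For each segment, choose the ad sharing the most keywords with it.
    Ties keep the ad listed first; a segment with no overlap gets None.
    """
    schedule = {}
    for seg in segments:
        best_id, best_score = None, 0
        for ad in ads:
            score = len(seg.keywords & ad.keywords)
            if score > best_score:
                best_id, best_score = ad.ad_id, score
        schedule[(seg.start, seg.stop)] = best_id
    return schedule


segments = [
    Segment(0.0, 30.0, frozenset({"beach", "surfboard"})),
    Segment(30.0, 75.0, frozenset({"car", "highway"})),
    Segment(75.0, 90.0, frozenset({"rain"})),
]
ads = [
    Ad("surf-shop", frozenset({"surfboard", "beach", "wetsuit"})),
    Ad("auto-dealer", frozenset({"car", "truck"})),
]
schedule = select_ads(segments, ads)
```

Under these assumptions, each discrete segment receives at most one
advertisement, and a segment whose features match nothing in the
database is simply left without one.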
[0034] Preferably, as shown in FIG. 2, the video display 200 of the
video display device 155 is configured to receive and display
various types of advertisements in conjunction with the actual
video display. Such advertisements are preferably displayed in
manners that do not interrupt or delay viewing of the requested
video, as would a conventional commercial shown on broadcast
television. For example, such advertisements may be shown as
conventional banner ads that appear (i) in an optional vertical
side window or panel 205 or (ii) in an optional horizontal window
or panel 210 that do not interfere with the main video display area
225. The video display 200 may also include conventional header
areas and menu control areas 215, 220. Obviously, the placement and
purpose of each of the windows and panels of the video display 200
are within the purview of those skilled in the art. In addition,
although the advertisements can be displayed as banner ads, it is
also possible and expected that alternative advertisements, such as
interstitial ads, bug ads, or hyperlinks that can be opened or
accessed by the viewer 150, may be used alternatively or in
conjunction with the banner advertisements. Such additional
advertisements may be placed within the main video display area
225, such as in the lower portion 230 of the main video display
area 225. In addition, it should also be understood that while
"advertisements" are being used generally to define the information
that may be displayed around and during the video playback, it is
also possible and expected that other information associated with
the video playback, such as background information, more detailed
information about the video program, the actors in the video
program, scenes, events or locations that appear in the video
program, related videos or information, and the like, can be
displayed, advertised, or linked during the playback and tied to
the current time-code of the video as it is being viewed. Such
advertisements can be text, still graphics, videos, audio,
hyperlinks, or the like. In some embodiments, the advertisements
merely display information. In other embodiments, the
advertisements include a hyperlink that, when activated, pauses the
primary video and allows the user to view or access the
advertisement or other additional information.
[0035] Although not shown in FIG. 1A, as the video program 121 is
being viewed by the viewer 150, the actual time-code of the video
program 121 is provided either from the video provider 110 or,
preferably, from the video display device 155 itself. This
time-code information associated with the actual viewing of the
video program 121 either is provided in real time to the
advertisement distributor 160 so that appropriate advertisement
files 175 can be provided back to the video display device 155 in
real time or, alternatively, is provided in advance to the video
display device 155 for caching and later access at the appropriate
time, based on the time-code location of the video program 121 as
it is being viewed.
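As a minimal sketch of the caching alternative, the video display
device could hold the advance-delivered advertisements in a table
sorted by segment start time and look up the current time-code during
playback. The AdCache class, its entry layout, and the use of a binary
search are assumptions of this sketch, not details taken from the
specification.

```python
from bisect import bisect_right


class AdCache:
    """Ads cached on the display device, keyed by segment start time."""

    def __init__(self, entries):
        # entries: iterable of (start_seconds, stop_seconds, ad_id)
        self._entries = sorted(entries)
        self._starts = [entry[0] for entry in self._entries]

    def ad_at(self, timecode):
        """Return the ad_id whose segment contains `timecode`, else None."""
        # Find the last segment that starts at or before the time-code.
        i = bisect_right(self._starts, timecode) - 1
        if i < 0:
            return None
        start, stop, ad_id = self._entries[i]
        return ad_id if start <= timecode < stop else None


cache = AdCache([(0.0, 30.0, "surf-shop"), (30.0, 75.0, "auto-dealer")])
current_ad = cache.ad_at(12.5)
```

A lookup of this kind needs no network access at playback time, which
is the practical difference from the real-time alternative.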
[0036] In an optional embodiment of that shown in FIG. 1A, user or
video display device characteristics 185 are obtainable from the
video display device 155 and may be provided to the advertisement
distributor 160. Such user or video display device characteristics
185 typically include location, age, gender, interests, Internet
websites visited, and other similar demographic data that may be
obtained from cookies or similar tracking information. The
advertisement distributor 160 utilizes the video display device
characteristics 185 to generate advertisement files 175 that are
further targeted and customized for the viewer 150. The
advertisement files 175 are provided back to the video display
device 155 for display during viewing of the video program 121.
Such targeted advertisements are still shown and synchronized with
the time-coded video program 121; however, the user data 185
enables the advertisement server 163 to select more accurately
between one or more potentially valid advertisements that could be
associated with the video program for a particular time-coded
segment.
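The refinement described above, in which user data 185 is used to
choose among advertisements that are already valid for a segment, can
be sketched as follows. The candidate and user dictionaries, the
"targets" field, and the count-of-matches scoring rule are
hypothetical and serve only to illustrate the idea.

```python
def refine_by_user(candidates, user):
    """Pick one ad from candidates that are already valid for a segment.

    candidates: list of dicts with 'ad_id' and an optional 'targets'
                dict mapping a characteristic to its accepted values.
    user:       dict mapping a characteristic to the viewer's value
                (for example, data obtained from cookies).
    The ad matching the most user characteristics wins; ties keep the
    earlier candidate. Returns None if there are no candidates.
    """
    def score(ad):
        targets = ad.get("targets", {})
        return sum(1 for key, accepted in targets.items()
                   if user.get(key) in accepted)

    return max(candidates, key=score, default=None)


candidates = [
    {"ad_id": "generic-soda"},
    {"ad_id": "coastal-surf-shop",
     "targets": {"location": {"NC", "SC"}, "age_band": {"18-34"}}},
]
chosen = refine_by_user(candidates, {"location": "NC", "age_band": "18-34"})
fallback = refine_by_user(candidates, {})
```

When no user data is available, the sketch falls back to the first
valid candidate, so the advertisement remains tied to the segment
either way.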
[0037] Turning now to FIG. 1B, a second preferred embodiment of an
advertisement placement system 100B based primarily upon time-coded
metadata associated with an underlying video program displayed
therewith is illustrated. In this embodiment, it is contemplated
that the video program is still viewed in a Video-on-Demand (VOD)
or video streaming context from a video provider 110; however,
unlike the first embodiment, in this scenario, the underlying
metadata file 125 associated with the video program 121 is sent to
the video display device 155 along with the video program 121. The
video display device 155 (or at least the video player
system/software installed on the video display device) then sends
the metadata file 125 (or parsed segments thereof at appropriate
time intervals) to the advertisement distributor 160 so that
appropriate advertisement file(s) 175 are returned to the video
display device 155. Again, optionally, user data 185 may be
provided from the video display device 155 to the advertisement
distributor 160
to enable the advertisement server 163 to select more accurately
between one or more potentially valid advertisements that could be
associated with the video program for a particular time-coded
segment.
[0038] Turning now to FIG. 1C, a third preferred embodiment of an
advertisement placement system 100C based primarily upon time-coded
metadata associated with an underlying video program displayed
therewith is illustrated. In this embodiment, it is contemplated
that the video program is also viewed in a Video-on-Demand (VOD) or
video streaming context from a video provider 110; however, unlike
the first and second embodiments, in this scenario, the underlying
metadata file 125 associated with the video program 121, once
created, is not provided back to the video provider 110. This
embodiment is similar in most respects to the first embodiment;
however, when a request for video 140 is received from the viewer
150, the video provider 110 sends a request 145 for metadata file
125 associated with the video program 121. This request 145 is
either sent directly to the metadata generator 130 (as shown) or
(as not shown) to the advertisement distributor 160 first, which
then requests the same from the metadata generator 130. The
metadata generator 130 then retrieves the appropriate time-coded
metadata file 125 from its database 135 and provides it to the
advertisement distributor 160. The remaining aspects, variations,
and alternatives of this embodiment are similar to those discussed
in association with the first embodiment.
[0039] Turning now to FIG. 1D, a fourth embodiment is illustrated,
which is another variation of the embodiment shown in FIG. 1C.
Again, the metadata file 125 is maintained by the metadata
generator 130, but upon receipt of a request 145, this time from
the video display device 155, the metadata file 125 is provided to
the video display device 155 and then provided, preferably on a
parsed or "as needed" basis, to the advertisement distributor 160.
In another
slight alternative arrangement, in response to the request 145 (or
series of requests containing the video ID and time code location)
from the video display device 155, the metadata generator 130 may
provide the "as needed" portion of the metadata file 125 to the
advertisement distributor 160 corresponding to the video segment
being viewed by the viewer 150.
[0040] Turning now to FIG. 1E, a fifth preferred embodiment of an
advertisement placement system 100E based primarily upon time-coded
metadata associated with an underlying video program displayed
therewith is illustrated. In this embodiment, it is contemplated
that the video program is actually provided or sold to the viewer
150 on a DVD 117 or similar storage medium, or is provided as a
file download (not shown) (as opposed to a mere video streaming in
which the file is not actually downloaded) for later playback. This
embodiment is similar to the first embodiment to the extent that
the underlying metadata file associated with the video program,
once created, is maintained by the video provider 110. This system
100E includes one or more video storage databases 115 of the video
provider 110 and a video manager 116 that communicates with the
metadata generator 130, the advertisement distributor 160, and the
video storage databases 115, and which manages the production of
stored video programs 117 for distribution in DVD format or for
download or the like.
[0041] Similar to the first embodiment, before a stored video
program 117 is created and made available to a viewer 150, a video
file 120 associated with the stored video program 117 is preferably
provided to the metadata generator 130. The video file 120 has or
includes a unique file name or other video identifier (designated
herein by the variable VID). As will be described in greater detail
hereinafter, the metadata generator 130 receives the video file 120
and, using a metadata processor 133, creates or generates a
time-coded metadata file 125 associated with the corresponding
video file 120 and underlying video program 121. This metadata file
125 is stored in a database 135 of the metadata generator 130 but
is also provided back to the video provider 110 and associated with
the corresponding video file 120 in video storage databases
115.
[0042] As part of the process for creating a stored video program
117, the metadata file 125 associated with the video file 120 is
provided to the advertisement distributor 160, which uses an
advertisement server 163 to process the metadata file 125 to
selectively identify, from its database 165 of potential
advertisements, one or more advertisements that are appropriate to
provide in conjunction with the stored video program 117 and,
specifically, with each discrete segment of the stored video
program 117 based on its time-coded metadata. The selected
advertisement file(s) 175 are then provided back to the video
provider 110, which incorporates the advertisement files 175
directly on the stored video program 117 along with the actual
video file 120. In this manner, the stored video program 117 has
all necessary and desired advertisement files 175 built into the
stored video program 117 and plays advertisements during viewing of
the video in situations in which the video display device 155 does
not (whether intentionally, unintentionally, because of
incompatibility, or for whatever reason) have real-time access to
the Internet to obtain real-time advertisements associated with the
video. The remaining aspects,
variations, and alternatives of this embodiment are similar to
those discussed in association with the first embodiment.
[0043] Turning now to FIG. 1F, a sixth preferred embodiment of an
advertisement placement system 100F based primarily upon time-coded
metadata associated with an underlying video program displayed
therewith is illustrated. This embodiment is similar to the fifth
embodiment; however, it is contemplated that the video display
device 155 has access to the Internet and, thus, is able to obtain
real-time advertisement files 175 from advertisement distributor
160. This arrangement is preferred to the fifth embodiment since
advertisements associated with the video program are not fixed and
unchangeable on the stored video program 117 media. Instead, over
time and with each viewing of the stored video program 117, the
viewer 150, potentially, has a new advertisement experience.
[0044] For this reason, it is desirable to have the time-coded
metadata file 125 actually stored on the stored video program 117
along with the video file 120 so that when the video program is
actually being viewed by the viewer 150 on the video display device
155, the video display device 155 initiates a communication with
the advertisement distributor 160 to provide the time-coded
metadata file 125 and to receive back appropriate advertisement
file(s) 175. Again, in an alternative arrangement, it may be
desirable for the viewer 150 to provide or for the advertisement
distributor 160 to have user or video display device
characteristics 185 (as described in greater detail previously) so
that the advertisement files 175 associated with the time-coded
metadata of the stored video program 117 are tailored and targeted
slightly more at the viewer 150, but still associated with the
appropriate segment of the video program.
[0045] In an additional, alternative embodiment (not shown), the
embodiments shown in FIGS. 1E and 1F are combined to store a base
set of advertisement files 175 on the stored video program 117. The
advertisement files 175 are for situations in which the video
display device 155 is "offline" and does not have access to the
Internet. The video display device 155 (or the software associated
with the stored video program 117) is configured to interact in
real time with the advertisement distributor 160 to obtain current
and up-to-date advertisement files 175 when it is actually able to
access the Internet and communicate with the advertisement
distributor 160. In such a scenario, the more up-to-date
advertisement files 175 are shown during the video playback if they
are available. If they are not available, the pre-stored, base
advertisement files 175 are used.
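The online/offline behavior of this combined embodiment amounts to a
simple fallback, sketched below. The fetch_current callable is a
hypothetical stand-in for a request to the advertisement distributor
160, and the assumption that it raises OSError when the device is
offline is made only for this sketch.

```python
def ads_for_playback(video_id, base_ads, fetch_current):
    """Return current ads if the distributor is reachable, else base ads.

    base_ads:      ads pre-stored with the stored video program.
    fetch_current: callable(video_id) -> list of ads; hypothetical.
                   Assumed to raise OSError when the device is offline.
    """
    try:
        current = fetch_current(video_id)
    except OSError:
        return base_ads
    # An empty answer is treated the same as "not available".
    return current or base_ads


def offline(video_id):
    raise ConnectionError("no network")   # ConnectionError is an OSError


def online(video_id):
    return ["fresh-ad-1", "fresh-ad-2"]


base = ["base-ad"]
shown_offline = ads_for_playback("VID-001", base, offline)
shown_online = ads_for_playback("VID-001", base, online)
```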
[0046] It should also be understood that there are many other
alternative arrangements and variations of how and where various
files are stored and provided. The embodiments shown in FIGS. 1A
through 1F represent just some of the more likely arrangements and
components involved. Additionally, there may be multiple additional
parties involved such that the roles and responsibilities for
providing and receiving files, for processing files, and for
exchanging and storing data can be handled by different parties or
components. For example, there may be two separate parties or
components used to generate time-coded metadata files and to store
and provide such time-coded metadata files to third parties upon
request. Likewise, the video provider may want to act as the
intermediary for the advertisement distributor so that the video
display device never interacts directly with a specified
advertisement distributor. This can be controlled more easily in
the video streaming context, since the links to the advertisement
distributor can be dynamically changed over time to point to the
preferred or desired advertisement distributor associated with the
video provider. For the stored video program embodiments, it may be
desirable to have advertisement links that go back through the
video provider--this would enable the video provider to update and
change the advertisement distributor used over longer periods of
time and prevent such links, hard-coded onto the stored video
program, from becoming obsolete or broken.
[0047] FIGS. 3 through 5B provide more detailed explanations of the
creation of time-coded metadata files associated with underlying
video programs. Turning first to FIG. 3, a high
level view 300 of the intake process for creating a time-coded
metadata file 125 is described. The metadata generator 130 receives
the video file 120, which has or includes a unique file name or
other video identifier (designated herein by the variable VID). The
video file 120 and identifier are stored initially in a SOAP
database 310. Preferably, the video file 120 is received in .mp4
(MPEG 4) format or, if not, is converted to such (or similar)
format, as may be changed or updated from time to time. A hash of
this file is run to generate a unique "video signature" and is
checked against the existing video signatures stored in the
database 135 to determine if a time-coded metadata file already
exists for the video file 120 received. If so, the appropriate
time-coded metadata file 125 is provided to any requesting party.
If the file does not exist, the video file in .mp4 format is then
provided to an audio processor 320 and to a video processor 330.
Once the audio and video have been parsed and used to identify
underlying metadata of the video program, the time-coded metadata
file is stored in database 135 and is available for distribution or
use, as described in association with the embodiments of FIGS. 1A
through 1F.
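The "video signature" check can be illustrated with a short Python
sketch. The specification does not name a hash function; SHA-256 is
used here purely as an assumption, and the dictionary db_135 is a
hypothetical in-memory stand-in for database 135.

```python
import hashlib
import io


def video_signature(stream, chunk_size=1 << 20):
    """Hash a video file in chunks and return a hex digest."""
    digest = hashlib.sha256()   # assumed algorithm, not from the source
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def lookup(stream, metadata_by_signature):
    """Return (signature, stored metadata or None if not yet known)."""
    signature = video_signature(stream)
    return signature, metadata_by_signature.get(signature)


db_135 = {}                                   # signature -> metadata
first = io.BytesIO(b"example video bytes")
sig, found = lookup(first, db_135)            # not yet known
db_135[sig] = {"segments": []}                # stored after processing
again = io.BytesIO(b"example video bytes")
sig_again, found_again = lookup(again, db_135)
```

Because the signature depends only on the file contents, the same
video received under a different file name resolves to the metadata
file that was already generated for it.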
[0048] FIG. 4 illustrates, in more detail, the steps 400 performed
by the metadata processor 133 when a video file 120 is received for
intake and processing. First, the video identifier is obtained from
the video provider (step 410). As stated previously, this
identifier may simply be the file name for the video file or it may
be the title and year of the video file or something similar. Based
on this identifier, it is possible to determine whether this
particular video had been previously processed (step 412). If the
video had already been processed previously, the database storing
such time-coded metadata file is updated and cross-referenced with
the identifier and existing video signature (step 430), then the
process jumps to step 480 and determines whether another video
needs to be processed or not. If the video had not been processed
previously, at least based on its identifier, the system then
downloads or receives the full video file from the video provider
for further processing (step 415). As part of step 415, the video
file is converted to .mp4 (MPEG 4) format, if it is not already in
such format, and a hash or "video signature" of the .mp4 version of
the video is created. Based on this video signature, it is possible
to determine whether this particular video had been previously
processed, even if the video identifier did not match a
previously-known identifier (step 420). If the video had already
been processed previously, the database storing such time-coded
metadata files is updated and cross-referenced with the additional
identifier and video signature, if necessary (step 430), then the
process jumps to step 480 and determines whether another video
needs to be processed or not. If the video had not been processed
previously, it is then submitted to an audio processor (step 440)
for audio capture and separation and speech recognition (among
other things) and a video processor (step 450) for classification
and analysis (among other things). The resulting metadata is
compiled, tied to or associated with the underlying timecode of the
video program, and stored in the metadata database (step 470). The
process then determines whether there is another video to be
processed (step 480). If so, the process 400 starts over. If not,
the process 400 ends.
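The decision points of process 400 for a single video can be
summarized in the following sketch. The intake function, the
two-dictionary database layout, the fetch_video and extract callables,
and the choice of SHA-256 are all hypothetical; only the step numbers
in the comments come from the description above.

```python
import hashlib


def intake(video_id, fetch_video, db, extract):
    """One pass of the intake flow for a single video.

    db:          dict with 'by_id' (identifier -> signature) and
                 'by_signature' (signature -> metadata); a hypothetical
                 in-memory stand-in for the metadata database.
    fetch_video: callable(video_id) -> bytes of the converted file.
    extract:     callable(bytes) -> metadata; stands in for the audio
                 and video processors (steps 440 and 450).
    Returns 'known-id', 'known-signature', or 'processed'.
    """
    # Step 412: has this identifier been seen before?
    if video_id in db["by_id"]:
        return "known-id"                     # step 430, then step 480

    # Step 415: obtain the file and compute its signature.
    data = fetch_video(video_id)
    signature = hashlib.sha256(data).hexdigest()

    # Step 420: same content under a different identifier?
    if signature in db["by_signature"]:
        db["by_id"][video_id] = signature     # step 430: cross-reference
        return "known-signature"

    # Steps 440 through 470: extract metadata and store it.
    db["by_signature"][signature] = extract(data)
    db["by_id"][video_id] = signature
    return "processed"


db = {"by_id": {}, "by_signature": {}}
files = {"movie_2007.mp4": b"frames", "copy.mp4": b"frames"}
results = [
    intake(name, files.__getitem__, db, lambda data: {"length": len(data)})
    for name in ["movie_2007.mp4", "copy.mp4", "movie_2007.mp4"]
]
```

In this example the second file has a new name but identical
contents, so it is caught by the signature check rather than by the
identifier check.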
[0049] Generally, when a video program is received or converted to
.mp4 format, an underlying time-code exists or is established for
the video program. All audio and video metadata identified or
extracted from the video program by the metadata processor 133 is
then tied to or associated with specific points or regions within
the time code. Initially, key identifiers for the video program are
determined and identified. These include all characters who appear
in the video program, key or recurring scene locations, key props
and objects, key terms, etc. The key identifiers are typically
audio features and/or video features, and are extracted from the
video signal. Then, the video portion of the video program is
parsed and divided into "short clips" or discrete segments. Such
segments can be specified by a predetermined time frame, but can
alternatively be identified based on information within the video
signal, such as, for example, a change of camera shots, angles,
scene change, scene break or the like. It should also be noted that
different video segments can be defined by different predetermined
time frames.
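Both ways of dividing the video program, by a predetermined time
frame or by information within the video signal, can be sketched as
follows. The per-sample difference scores passed to scene_segments are
a hypothetical input; the specification does not describe how shot or
scene changes are detected, and the threshold rule is an assumption.

```python
def fixed_segments(duration, frame):
    """Split [0, duration) into segments of at most `frame` seconds."""
    if frame <= 0:
        raise ValueError("frame must be positive")
    duration = float(duration)
    bounds = []
    start = 0.0
    while start < duration:
        stop = min(start + frame, duration)
        bounds.append((start, stop))
        start = stop
    return bounds


def scene_segments(duration, differences, threshold, step=1.0):
    """Split wherever the sampled difference exceeds `threshold`.

    differences[i] is a hypothetical dissimilarity score between the
    samples taken at i*step and (i+1)*step seconds.
    """
    cuts = [(i + 1) * step for i, d in enumerate(differences)
            if d > threshold and (i + 1) * step < duration]
    edges = [0.0] + cuts + [float(duration)]
    return list(zip(edges[:-1], edges[1:]))


by_time = fixed_segments(70, 30)
by_scene = scene_segments(5, [0.1, 0.9, 0.2, 0.8], threshold=0.5)
```

Either function yields a list of (start, stop) pairs on the time-code
timeline, so later steps need not know which method produced the
segments.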
[0050] Once the video signal is divided, each segmented video
clip is digitized. The breaks between segments are
identified and tied to the time-code timeline. Next, the metadata
processor 133 runs a language and speech recognition process
through the entire video and associates all of the dialogue and
background audio with the appropriate video segments and
time-codes. Next, characters within the video signal are associated
with each of the dialogue entries. Finally, the metadata processor
133 runs a number of visual processing programs to identify
characters, objects, and scenes within each segment of the video
program. Each identified audio feature is thus associated with at
least one segmented video clip. Similarly, each identified video
feature is also associated with at least one segmented video
clip.
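The association of dialogue with video segments by time-code can be
illustrated as follows. The tuple layouts and the attach_dialogue name
are assumptions of this sketch; speech recognition itself is outside
its scope and is represented only by its output.

```python
def attach_dialogue(segments, dialogue):
    """Group dialogue entries under the segment containing their time.

    segments: list of (start, stop) tuples, sorted and non-overlapping.
    dialogue: list of (timecode, speaker, text) tuples, as might be
              produced by a speech recognition step (hypothetical).
    Returns {(start, stop): [(timecode, speaker, text), ...]}.
    Entries that fall outside every segment are dropped.
    """
    grouped = {seg: [] for seg in segments}
    for entry in dialogue:
        timecode = entry[0]
        for start, stop in segments:
            if start <= timecode < stop:
                grouped[(start, stop)].append(entry)
                break
    return grouped


segments = [(0.0, 30.0), (30.0, 75.0)]
dialogue = [
    (4.2, "Ann", "Grab your board."),
    (31.0, "Ben", "Take the highway."),
    (44.5, "Ann", "Turn left here."),
]
grouped = attach_dialogue(segments, dialogue)
```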
[0051] The associated metadata file enables selection of a relevant
advertisement for presentment in conjunction with each respective
digitized video clip of the corresponding video signal based on the
associated audio features and the associated video features of the
respective digitized video clip. Those of skill in the art will
readily appreciate that presentment is typically implemented by a
visual display device, but may also include email, file delivery,
and other delivery methods.
[0052] The video features identified by the visual processor
include at least one of (i) people, (ii) characters, (iii) animals,
(iv) objects, (v) geographic locations, (vi) background, (vii)
scenes, or a combination of any of these features. Preferably,
these video features are extracted by a visual processing system of
the feature extraction system. In one embodiment, the method
includes the step of identifying and recognizing one or more
objects from the video signal by an object classification system of
the feature extraction system. In another embodiment, the method
includes the step of identifying and recognizing one or more scenes
from the video signal by a scene classification system of the
feature extraction system. In yet another embodiment, the method
includes a combination of both steps.
[0053] Audio features of the audio signal include at least one of
(i) a list of one or more words, (ii) speeches by one or more
people, (iii) dialogue by one or more people, (iv) music, (v)
background sound, or a combination of these audio features. The
method further includes the steps of: (i) identifying and
recognizing one or more background sounds from the audio signal by
using a sound classification system of the feature extraction
system, (ii) identifying and recognizing one or more music segments
from the audio signal by using a music classification system of the
feature extraction system, and (iii) identifying and recognizing
human speech, dialogues, one or more words, or one or more phrases
by using a speech recognition system of the feature extraction
system. The method further includes the steps of: (i) collecting
audio features of the audio signal by using an audio signal
recognition system of the feature extraction system, and (ii)
saving the collected audio features in the metadata file.
[0054] Preferably, the metadata file is in XML format. An exemplary
portion of a time-coded metadata file, in XML format, is
illustrated in FIGS. 5A and 5B. The metadata file contains one or
more of (i) video identification information, (ii) a file name,
(iii) a digital signature, (iv) the length of the video signal, (v)
a keyword list, (vi) a time-coded transcript, (vii) one or more
segments with a corresponding start and stop time, (viii) one or
more contents, (ix) one or more characters, (x) one or more
animals, (xi) one or more objects, and (xii) a list of
vocabulary.
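A time-coded metadata file of this general kind could be assembled
with the Python standard library as sketched below. The element and
attribute names (video, fileName, signature, length, segment,
character, object) are hypothetical and are not taken from FIGS. 5A
and 5B; only a few of the listed fields are shown.

```python
import xml.etree.ElementTree as ET


def build_metadata(vid, file_name, signature, length, segments):
    """Build a time-coded metadata document as an XML string.

    segments: list of dicts with 'start' and 'stop' (seconds) and
              optional lists under 'characters' and 'objects'.
    """
    root = ET.Element("video", {"id": vid})
    ET.SubElement(root, "fileName").text = file_name
    ET.SubElement(root, "signature").text = signature
    ET.SubElement(root, "length").text = str(length)
    for seg in segments:
        node = ET.SubElement(root, "segment",
                             {"start": str(seg["start"]),
                              "stop": str(seg["stop"])})
        for name in seg.get("characters", []):
            ET.SubElement(node, "character").text = name
        for name in seg.get("objects", []):
            ET.SubElement(node, "object").text = name
    return ET.tostring(root, encoding="unicode")


xml_text = build_metadata(
    "VID-001", "movie.mp4", "abc123", 90,
    [{"start": 0, "stop": 30, "characters": ["Ann"], "objects": ["car"]}],
)
parsed = ET.fromstring(xml_text)
```

Parsing the string back, as in the last line, is one way for an
advertisement server to read the start and stop times and the
features of each segment.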
[0055] The foregoing description of the exemplary embodiments of
the invention has been presented only for the purposes of
illustration and description and is not intended to be exhaustive
or to limit the invention to the precise forms disclosed. Many
modifications and variations are possible in light of the above
teaching.
[0056] The embodiments were chosen and described in order to
explain the principles of the invention and their practical
application so as to enable others skilled in the art to utilize
the invention and various embodiments and with various
modifications as are suited to the particular use contemplated.
Alternative embodiments will become apparent to those skilled in
the art to which the present invention pertains without departing
from its spirit and scope. Accordingly, the scope of the present
invention is defined by the appended claims rather than the
foregoing description and the exemplary embodiments described
therein.
* * * * *