U.S. patent application number 10/303,045 was filed with the patent office on 2002-11-25 and published on 2003-06-26 for alternate delivery mechanisms of customized video streaming content to devices not meant for receiving video.
Invention is credited to Begeja, Lee; Gibbon, David Crawford; Liu, Zhu; Markowitz, Robert Edward; Renger, Bernard Simon; Shahraray, Behzad; Zamchick, Gary Lee.
Application Number: 20030120748 (Ser. No. 10/303,045)
Document ID: /
Family ID: 27364714
Filed Date: 2002-11-25
Published: 2003-06-26

United States Patent Application 20030120748
Kind Code: A1
Begeja, Lee; et al.
June 26, 2003
Alternate delivery mechanisms of customized video streaming content
to devices not meant for receiving video
Abstract
In one exemplary embodiment, the invention relates to a system
and method for delivering content, including: reading profile data
related to a user; automatically identifying a portion of at least
one source video stream based on relevance to the profile data; and
transforming the identified portion of the at least one source
video stream into a destination media, wherein the destination
media does not comprise a video stream.
Inventors: Begeja, Lee (Gillette, NJ); Gibbon, David Crawford (Lincroft, NJ); Liu, Zhu (Marlboro, NJ); Markowitz, Robert Edward (Glen Rock, NJ); Renger, Bernard Simon (New Providence, NJ); Shahraray, Behzad (Freehold, NJ); Zamchick, Gary Lee (Tenafly, NJ)

Correspondence Address:
COOLEY GODWARD LLP
One Freedom Square
Reston Town Center
11951 Freedom Drive
Reston, VA 20190-5656
US
Family ID: 27364714
Appl. No.: 10/303,045
Filed: November 25, 2002
Related U.S. Patent Documents

Application Number | Filing Date | Patent Number
10/303,045 | Nov 25, 2002 |
10/034,679 | Dec 28, 2001 |
60/282,204 | Apr 6, 2001 |
60/296,436 | Jun 6, 2001 |
Current U.S. Class: 709/217; 348/E7.071
Current CPC Class: H04N 21/4828 (2013.01); H04N 21/632 (2013.01); H04N 7/17318 (2013.01); H04N 21/812 (2013.01); H04N 21/25891 (2013.01); H04N 21/482 (2013.01); H04N 21/26603 (2013.01)
Class at Publication: 709/217
International Class: G06F 015/16
Claims
What is claimed is:
1. A method for delivering content, comprising: reading profile
data related to a user; automatically identifying a portion of at
least one source video stream based on relevance to the profile
data; and transforming the identified portion of the at least one
source video stream into a destination media, wherein the
destination media does not comprise a video stream.
2. The method of claim 1, wherein the profile data is updated.
3. The method of claim 1, wherein the profile data comprises
topical information.
4. The method of claim 1, wherein automatically identifying
comprises determining a start of the identified portion of the at
least one source video stream and an end of the identified portion
of the at least one source video stream.
5. The method of claim 1, wherein transforming comprises sampling
the identified portion of the at least one source video stream, and
the destination media comprises at least one image.
6. The method of claim 1, wherein transforming comprises extracting
information from the identified portion of the at least one source
video stream to yield closed caption text.
7. The method of claim 6, wherein transforming further comprises
processing the closed caption text for at least one of error
correction and language translation.
8. The method of claim 6, wherein transforming further comprises a
text-to-speech conversion of the closed caption text into an audio
stream.
9. The method of claim 8, wherein transforming comprises storing
the audio stream as a sound file.
10. The method of claim 1, wherein transforming comprises
demultiplexing the at least one source video stream to yield an
audio stream.
11. The method of claim 10, wherein transforming further comprises
speech recognition processing of the audio stream to yield a text
file.
12. The method of claim 11, wherein transforming further comprises
processing the text file for at least one of error correction and
language translation.
13. The method of claim 11, wherein transforming is tailored to a
class of destination device.
14. The method of claim 1, further comprising delivering the
destination media to at least one destination device.
15. The method of claim 14, wherein delivering the destination
media comprises running an interactive voice response system in
response to instructions from the user.
16. The method of claim 15, wherein delivering the destination
media comprises loading a voice mailbox.
17. The method of claim 16, wherein delivering the destination
media comprises playing the destination media to the user in
response to at least one of DTMF and voice instruction.
18. The method of claim 15, wherein delivering the destination
media includes generating VXML and storing the VXML on a
server.
19. The method of claim 18, wherein delivering the destination
media further includes receiving a call from the destination device
in a VXML gateway.
20. The method of claim 19, wherein delivering the destination
media further includes fetching a URL from the server and receiving
the generated VXML in the VXML gateway.
21. The method of claim 14, wherein the at least one destination
device comprises at least one of a wired telephone, a wireless
telephone, a smart phone, a facsimile machine, a personal digital
assistant, a pager, a radio, and an electronic picture frame.
22. The method of claim 14, wherein the destination media is
delivered in near real-time.
23. The method of claim 14, wherein delivering the destination
media comprises storing the destination media in a server prior to
delivering the destination media to the at least one destination
device.
24. The method of claim 14, wherein delivering the destination
media comprises storing the destination media to the at least one
destination device.
25. The method of claim 14, wherein delivering the destination
media is performed according to at least one of a predetermined
time and a predetermined time interval.
26. The method of claim 14, wherein delivering the destination
media is event-triggered.
27. A method for delivering content, comprising: reading profile
data related to a user; step for automatically identifying a
portion of at least one source video stream based on relevance to
the profile data; and step for transforming the identified portion
of the at least one source video stream into a destination media,
wherein the destination media does not comprise a video stream.
28. The method of claim 27, further comprising step for delivering
the destination media to at least one destination device.
29. A system for delivering content, comprising: means for reading
profile data related to a user; means for automatically identifying
a portion of at least one source video stream based on relevance to
the profile data; and means for transforming the identified portion
of the at least one source video stream into a destination media,
wherein the destination media does not comprise a video stream.
30. The system of claim 29, further comprising means for delivering
the destination media to at least one destination device.
31. A system for delivering content, comprising: a server
configured to read profile data related to a user, automatically
identify a portion of at least one source video stream based on
relevance to the profile data, and transform the identified portion
of the at least one source video stream into a destination media,
wherein the destination media does not comprise a video stream; and
an interface to a destination device coupled to the server and
configured to receive the destination media.
32. A system for delivering content, comprising: a server
configured to read profile data related to a user, automatically
identify a portion of at least one source video stream based on
relevance to the profile data, and transform the identified portion
of the at least one source video stream into an audio file; and an
interface to a voice mailbox, wherein the voice mailbox is
configured to receive the audio file from the server and play the
audio file in response to at least one of DTMF and voice
instruction from the user.
33. A system for delivering content, comprising: a server
configured to read profile data related to a user, automatically
identify a portion of at least one source video stream based on
relevance to the profile data, transform the identified portion of
the at least one source video stream into an audio file, store the
audio file, and generate VXML related to the stored audio file; and
a VXML gateway, wherein the VXML gateway is coupled to the server
and configured to receive the generated VXML.
34. The system of claim 33, further comprising an interface to at
least one destination device coupled to the VXML gateway, and
wherein the VXML gateway is configured to deliver the audio file to
the at least one destination device via an interactive voice
response system.
35. A system for delivering content, comprising: a server
configured to read profile data related to a user, automatically
identify a portion of at least one source video stream based on
relevance to the profile data, transform the identified portion of
the at least one source video stream into an audio stream, store
the audio stream, and generate VXML related to the stored audio
stream; and a VXML gateway, wherein the VXML gateway is coupled to
the server and configured to receive the generated VXML.
36. The system of claim 35, further comprising an interface to at
least one destination device coupled to the VXML gateway, and
wherein the VXML gateway is configured to deliver the audio stream
to the at least one destination device via an interactive voice
response system.
37. A method for conveying information derived from a source video
stream to a user comprising: searching for at least one portion of
the source video stream based on preferences of the user; selecting
at least one delivery medium based on at least one of the user's
destination devices; and transforming the at least one portion of
the source video stream into the at least one delivery medium.
38. The method of claim 37, further comprising transmitting the at
least one transformed portion of the source video stream to the at
least one of the user's destination devices.
Description
[0001] This application is a continuation-in-part of nonprovisional
application Ser. No. 10/034,679, which was filed on Dec. 28, 2001,
and claims priority to provisional application No. 60/282,204,
which was filed Apr. 6, 2001, and to provisional application
60/296,436, which was filed Jun. 6, 2001, all of which are hereby
incorporated by reference in their entireties.
BACKGROUND OF THE INVENTION
[0002] 1. Field of the Invention
[0003] The invention relates to the delivery of multimedia assets
to a user. More specifically, the invention relates to a method and
system for transforming streaming video content for delivery to
devices not equipped to receive data in video format.
[0004] 2. Description of the Related Art
[0005] FIG. 6 illustrates a table of representative portable and
non-portable destination devices according to one embodiment of the
invention. For each representative device, FIG. 6 indicates whether
the device typically provides the capability for a user to be
presented with text, audio, image, or video media.
[0006] For instance, a typical wired telephone is configured only
to receive audio; however, it is common for cellular or other
wireless telephones to also be equipped for receipt of textual
information, subject to subscription agreements between the user
and a network service provider. As used herein, a smart phone, as
referred to in FIG. 6, may be a more capable device such as a
PalmPhone.TM. or PocketPC Phone (hybrid devices functioning both as
a personal digital assistant and a telephone), a Web-enabled phone
having Wireless Application Protocol (WAP), an I-mode phone (phones
having protocols tailored for access to compatible Web sites), or
other hybrid telephones.
[0007] Facsimile machines, pagers, one or two-way radios, and
personal computers are well-known destination devices.
[0008] Personal Digital Assistants (PDAs) have evolved into a
range of products; FIG. 6 contemplates PDAs with network
communication capabilities. An electronic picture frame, as used
herein, refers to a special class of computers having a network
communication capability, and adapted, typically with a large
high-resolution display, to function as a digital photo display
device. Ceiva's Digital Photo Receiver is an example of an
electronic picture frame. A tablet PC is a type of notebook-sized
personal computer where a user makes inputs via a digital pen and
input panel.
[0009] FIG. 6 thus refers to a wide range of potential destination
devices, many of which are adapted to mobile application
environments. The destination devices of FIG. 6 present various
disadvantages in terms of the type of media that they are capable
of presenting to a user. For example, only selected device types
are capable of receiving and presenting video streams to a user.
Moreover, even within those few device types, only selected models
have such capability. Video broadcasts, however, represent an
abundant source of information. Thus, even where advances are made
in searching video stream sources, there exists a need for systems
and methods to transform video stream content for delivery to
destination devices not equipped to receive streaming video
media.
[0010] The foregoing description of the known art is hereby applied
to the detailed description of the invention to the extent that
such disclosure enables one to practice the invention, or for other
reasons.
SUMMARY OF THE INVENTION
[0011] In one exemplary embodiment, the invention relates to a
method for delivering content, including: reading profile data
related to a user; automatically identifying a portion of at least
one source video stream based on relevance to the profile data; and
transforming the identified portion of the at least one source
video stream into a destination media, wherein the destination
media does not comprise a video stream.
[0012] In another embodiment, the invention provides a system for
delivering content, having: a server configured to read profile
data related to a user, automatically identify a portion of at
least one source video stream based on relevance to the profile
data, and transform the identified portion of the at least one
source video stream into a destination media, wherein the
destination media does not comprise a video stream; and an
interface to a destination device coupled to the server and
configured to receive the destination media.
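The summarized method lends itself to a short sketch: read profile data, identify a relevant portion of a source stream, and transform that portion into a non-video destination media (here, plain text). The sketch is illustrative only; the data structures, names, and matching rule are assumptions and not taken from the disclosure.

```python
# Illustrative sketch of the summarized method. A "portion" of a source
# video stream is modeled by its time span and associated caption text;
# identification is a simple keyword match against profile data, and the
# transformation produces a text destination media (not a video stream).
from dataclasses import dataclass


@dataclass
class Portion:
    start_s: float  # start offset within the source stream, in seconds
    end_s: float    # end offset, in seconds
    caption: str    # caption text covering this portion


def identify_portions(captions, profile_keywords):
    """Return caption segments whose text matches any profile keyword."""
    matches = []
    for seg in captions:
        text = seg.caption.lower()
        if any(kw.lower() in text for kw in profile_keywords):
            matches.append(seg)
    return matches


def transform_to_text(portions):
    """Transform the identified portions into a text destination media."""
    return "\n".join(f"[{p.start_s:.0f}-{p.end_s:.0f}s] {p.caption}"
                     for p in portions)


captions = [
    Portion(0.0, 12.0, "Market update: stocks rise on earnings"),
    Portion(12.0, 30.0, "The President discussed the new trade policy"),
]
media = transform_to_text(identify_portions(captions, ["president"]))
```

Only the second segment matches the profile keyword, so the resulting destination media contains that portion alone.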
[0013] The features and advantages of the invention will become
apparent from the following drawings and description.
BRIEF DESCRIPTION OF THE DRAWINGS
[0014] The invention is described with reference to the
accompanying drawings. In the drawings, like reference numbers
indicate identical or functionally similar elements. Additionally,
the left-most digit of a reference number identifies the drawing in
which the reference number first appears.
[0015] FIG. 1 demonstrates an exemplary methodology for media
processing according to one embodiment of the invention.
[0016] FIG. 2 illustrates an architecture for implementing an
exemplary embodiment of the invention.
[0017] FIG. 3 demonstrates a more specific hardware architecture
according to another exemplary embodiment of the invention.
[0018] FIG. 4 is an exemplary page view of a page viewed by a user
utilizing a client according to one embodiment of the
invention.
[0019] FIG. 5 demonstrates a page view showing a content retrieval
page according to the exemplary embodiment shown in FIG. 4.
[0020] FIG. 6 illustrates a table of representative destination
devices according to one embodiment of the invention.
[0021] FIG. 7 is a flow diagram illustrating transformation of
video source data according to one embodiment of the invention.
[0022] FIG. 8A is a system diagram illustrating a functional
architecture according to one embodiment of the invention.
[0023] FIG. 8B is a flow diagram illustrating a method for
delivering video source content according to one embodiment of the
invention.
[0024] FIG. 9A is a system diagram illustrating a functional
architecture according to one embodiment of the invention.
[0025] FIG. 9B is a flow diagram illustrating a method for
delivering video source content according to one embodiment of the
invention.
DETAILED DESCRIPTION
[0026] While the invention is described below with respect to
various exemplary embodiments, the invention is not limited to only
those embodiments that are disclosed. Other embodiments can be
implemented by those skilled in the art without departing from the
spirit and scope of the invention.
[0027] The invention solves the above-discussed problems and
provides a personalized, customizable multimedia delivery service
that is convenient and easy to use. In one embodiment of the
invention, the service works by recording all of the video streams
of appropriate source and interest to a target audience. For
example, the service may record content from a collection of (or a
particular one of) sports or news channels on television. In
another example, the service may record content related to training
videos, presentations or executive meetings in a business, school
or other particularized environment. Recording may occur as the
content is originally being broadcast (i.e., live), afterwards from
recorded media, or even before the content is broadcast to its
intended audience.
[0028] Once the content is captured and recorded, it can be
segmented, analyzed and/or classified, and thereafter stored on a
platform. For example, the content can be broken down into its
component parts, such as video, audio and/or text. The text can
include, for example, closed caption text associated with the
original transmission, text generated from an audio portion by
speech recognition software, or a transcription of the audio
portion created before or after the transmission. In the latter
case, it becomes possible to utilize the invention in conjunction
with executive speeches, conferences, corporate training, business
TV, advertising, and many other sources of video which do not
typically have available an associated textual basis for searching
the video.
[0029] Having obtained or generated the text, it can then be used
as a basis for searching the multimedia content. In particular, the
text provides the basis for an exemplary methodology for overcoming
the above-identified problems associated with searching video in
the prior art. That is, if a user wishes to search the stored
content for video segments relevant to the President of the United
States discussing a particular topic, then the President's name and
the associated topic can be searched for within the text associated
with the video segments. Whenever the President's name and the
associated topic are located, an algorithm can be used to determine
which portion of an entire video file actually pertains to the
desired content and should therefore be extracted for delivery to
the user. Thus, if a video file comprises an entire news broadcast
about a number of subjects, the user will receive only those
portions of the broadcast, if any, that pertain to the President
and the particular topic desired. For example, this could include
segments in which the President talks about the topic, or segments
in which another talks about the topic and the President's
position.
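The search described above can be sketched in simplified form: given time-stamped caption words, locate places where the President's name and the topic term co-occur within a time window, and report a clip span around each hit. The window size, function names, and co-occurrence rule are assumptions made for illustration, not the algorithm of the disclosure.

```python
# Simplified sketch of identifying a pertinent video portion from
# time-stamped caption words. A clip is reported wherever the two search
# terms occur within window_s seconds of each other.
def find_clips(words, name, topic, window_s=30.0):
    """words: list of (timestamp_seconds, word). Returns (start, end) spans."""
    hits = {name.lower(): [], topic.lower(): []}
    for t, w in words:
        w = w.lower().strip(".,")
        if w in hits:
            hits[w].append(t)
    clips = []
    for tn in hits[name.lower()]:
        for tt in hits[topic.lower()]:
            if abs(tn - tt) <= window_s:
                start, end = min(tn, tt), max(tn, tt)
                # Pad the span so the clip covers surrounding context.
                clips.append((max(0.0, start - window_s), end + window_s))
    return clips
```

For example, if "President" appears at 5 s and "economy" at 12 s, the sketch reports a single padded clip span covering both mentions.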
[0030] Once the pertinent segments of the broadcast have been
appropriately extracted, for a given user, they can be stitched
together for continuous delivery to that user. In this way, for
example, the segments can be streamed to the user as a means of
providing an easy-to-use delivery methodology for the user, and as
a means of conserving bandwidth. Users can view the delivered
multimedia asset in its entirety, skip between the assets, or view
only portions of the assets, as they desire. Moreover, a user can
have access to portions of the original video file that occurred
immediately before or after the extracted segments; for example,
the user could choose to watch the entire original video file. Such
access can be granted by including a "more" or "complete" button in
a user interface.
[0031] In one embodiment of the invention, a profile of the user is
stored which specifies criteria for searching available multimedia
assets. The criteria may include, for example, key words and/or
phrases, a source(s) of the content, etc. The profile can be set
directly by the user via interaction with an appropriately designed
graphical user interface (GUI). When such a profile is available,
the invention is capable of automatically searching the available
assets on a periodic basis, and thereafter extracting, combining
and delivering the compiled assets (or segments thereof, regardless
of their original source) to the user. In one embodiment, the
invention can be utilized such that a service platform assisting in
implementing the invention notifies the user whenever new
multimedia assets consistent with the user's profile have been
prepared. In another embodiment, the invention may automatically
deliver multimedia assets in accordance with a user's profile
according to a predetermined schedule, such as hourly or daily.
Alternatively, the invention may notify the user of the presence of
desired video clips, rather than actually deliver those clips.
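The profile-driven matching described in this embodiment can be sketched as follows; the field names and matching rule are illustrative assumptions, not the stored-profile format of the disclosure.

```python
# Hedged sketch of comparing newly acquired assets against a stored user
# profile. A profile holds keywords and, optionally, permitted content
# sources; matching assets would be queued for delivery or notification.
from dataclasses import dataclass, field


@dataclass
class Profile:
    keywords: list
    sources: list = field(default_factory=list)  # empty list = any source


def matches_profile(asset, profile):
    """asset: dict with 'source' and 'text' keys describing the content."""
    if profile.sources and asset["source"] not in profile.sources:
        return False
    text = asset["text"].lower()
    return any(kw.lower() in text for kw in profile.keywords)


profile = Profile(keywords=["merger"], sources=["news-1"])
assets = [
    {"source": "news-1", "text": "Talks of a merger continue"},
    {"source": "sports-1", "text": "A merger of two leagues"},
]
queued = [a for a in assets if matches_profile(a, profile)]
```

Here only the first asset is queued: the second matches the keyword but comes from a source the profile excludes.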
[0032] The assets can be classified and indexed on-the-fly as they
are received. In this way, the assets can be compared against the
user's profile virtually in real-time, so that results can be
provided to the user (and the user can be notified) whenever they
become available. Furthermore, a user can provide criteria for a
search or searches beyond those set in the user's profile.
[0033] The identified assets can be delivered to the user in a
variety of manners. For example, delivery may occur via cable or
satellite television, or directly to a personal computer. The
invention can be practiced via a plurality of platforms and
networks. For example, the invention may be practiced over the
Internet to reach a large consumer audience, or it may be practiced
over an Intranet to reach a highly targeted business or industry
target.
[0034] In one embodiment, the invention allows video streaming of
identified video clips. Video streaming (i.e., allowing the viewing
of a video clip as it is downloaded rather than only after it is
downloaded, which speeds the viewing process and largely obviates
the need for video storage at the user location) is a
communications technique that is growing in popularity with the
increasing availability of both video players (especially for use
with personal computers) and bandwidth to the average consumer.
However, no conventional service allows users to accurately and
quickly find desired clips for playing, and none provides a ready
means for providers to profit from the video streams that are
provided.
[0035] When streaming the identified video clips, users may receive
only those video clips identified by a search executed on the
user's behalf. However, if a user desires, he or she may also
choose to view an entire program from which the clip(s) was
extracted. A user may also be allowed to choose some or all of the
video clips for long-term storage, whereby the clip(s) can be
archived for later use. In one embodiment, the user may store the
clips at a local computer, and thereafter make the clips available
to other users connected via a peer-to-peer network.
[0036] In another embodiment, the invention allows improved
video-on-demand (VOD). VOD is typically defined in the
cable/satellite television arena as the ability to request
programming at any time and to have VCR-like controls over the
content being streamed to the TV. The invention adds value to
conventional VOD by allowing the user to demand video more
accurately and completely.
[0037] An extension to VOD is personal video recorder (PVR)
technology, which allows even more control over TV programs being
viewed. Current PVR implementations are offered by TiVo and
ReplayTV, and allow users great flexibility in storing programs for
later viewing and/or manipulation in viewing (e.g., skipping over
commercials in a television program). The invention provides a
searching tool for allowing users to find interesting programs,
even from a variety of channel sources, to thereafter be recorded
and viewed using PVR technology.
[0038] Moreover, whereas conventional PVR records only entire
programs based on a user's directions, the invention permits the
recording of only those portions of programs that the user desires.
In this regard, the invention contemplates recording the desired
portions either by doing so directly from the program, or by
recording the entire program locally and then utilizing only those
portions of the program desired by the user.
[0039] Having described various exemplary embodiments of the
invention, it should be noted that the terms "video file," "video
input," "video," "video program" or any similar term refers
generically to any analog or digital video information, including
any content associated therewith, such as multimedia content,
closed caption text, etc. The terms "clip," "video clip,"
"electronic clip" or "eClip" should be understood to refer to any
subsection of a video program that is selected based on a user
search criterion. Also, the terms "extracting," "parsing,"
"removing," "accessing" or any similar term with respect to a video
file refers to the use of a selected portion of the video file.
Such use may include literal removal (permanent or temporary) from
the context of a larger file, copying of the selected portion for
external use, or any other method for utilizing the selected
portion.
[0040] Based on the above-described features of the invention, a
user may accurately, completely and promptly receive multimedia
assets that he or she finds interesting, and may conveniently
exploit the received assets in a manner best-suited to that
user.
[0041] FIG. 1 demonstrates an exemplary methodology for media
processing in a digital video library (DVL) according to one
embodiment of the invention. Such media processing is used in
implementing the invention at a user level, by capturing,
segmenting and classifying multimedia assets for later use and
manipulation. It should be noted that the media processing
implementation of FIG. 1 and discussion of associated concepts are
provided in greater detail in the following documents, which are
hereby incorporated herein by reference: Shahraray B., "Scene
Change Detection and Content-Based Sampling of Video Sequences,"
Proc. SPIE 2419, Digital Video Compression: Algorithms and
Technologies, pp. 2-13, February 1995; Shahraray B., Cox R.,
Haskell B., LeCun Y., Rabiner L., "Multimedia Processing for
Advanced Communications Services", in Multimedia Communications, F.
De Natale and S. Pupolin Editors, pp. 510-523, Springer-Verlag,
1999; Gibbon D., "Generating Hypermedia Documents from
Transcriptions of Television Programs Using Parallel Text
Alignment," in Handbook of Internet and Multimedia Systems and
Applications, Borko Furht Editor, CRC Press 1998; Shahraray B.
"Multimedia Information Retrieval Using Pictorial Transcripts," in
Handbook of Multimedia Computing, Borko Furht Editor, CRC Press
1998; and Huang Q., Liu Z., Rosenberg A., Gibbon D., Shahraray B.,
"Automated Generation of News Content Hierarchy By Integrating
Audio, Video, and Text Information," Proc. IEEE International
Conference On Acoustics, Speech, and Signal Processing ICASSP'99,
pp. 3025-3028, Phoenix, Ariz., May 1999.
[0042] In FIG. 1, multimedia assets including video 105, associated
text captions 110 and corresponding audio portions 115 are imported
into the system for processing. Content-based sampling engine 135
receives the video 105 and segments it into individual shots or
video frames; this information will be combined with information
extracted from the other components of the video program to enable
the extraction of individual stories (i.e., video segments related
to a particular topic or topics), as will be described.
Additionally, this process serves two purposes: first, it allows a
representative image for a particular story, segment or clip to be
selected by engine 160; and second, it allows boundaries around the
story, segment or clip to be set by engine 155.
[0043] A database 120 of linguistic rules is used by linguistic
analysis engine 140 to combine the caption information 110 with the
segmented video within engines 155 and 160, to thereby assist in
the functionality of those two engines. Similarly, information
within model databases 125 and 130 is used by acoustic
classification engine 145 and program identification engine 150 to
provide segmentation/identification of commercials and programs,
respectively. Once the multimedia asset(s) have been captured,
segmented and classified as described above, they can be stored
thereafter in DVL database 165.
[0044] All of the information from engines 135-150 is utilized in
engines 155 and 160 to discern a length of a particular video story
or clip that will be associated with each topic. In particular, for
example, multimodal story segmentation algorithms such as those
described in "Automated Generation of News Content Hierarchy By
Integrating Audio, Video, and Text Information" (above) can be used
to determine an appropriate length of a video clip to be associated
with a particular topic. Similarly, the algorithm can be used in
conjunction with the user profile to either compare the profile
information to newly-acquired content on-the-fly, or to similarly
determine an appropriate length for a video clip to be associated
with a particular portion of the user profile.
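The boundary-setting idea of engines 155 and 160 can be reduced to a toy sketch: cues from different modalities (here, shot-change times and caption topic-shift times) are combined, and a story boundary is declared where they agree. The tolerance and combination rule are assumptions for illustration only; the cited multimodal segmentation algorithms are considerably more sophisticated.

```python
# Toy sketch of multimodal boundary setting: a shot change that coincides
# (within tol_s seconds) with a caption topic shift is taken as evidence
# of a story boundary; agreement across modalities filters spurious cues.
def story_boundaries(shot_changes, topic_shifts, tol_s=2.0):
    """Return shot-change times that coincide with a topic shift."""
    return [t for t in shot_changes
            if any(abs(t - s) <= tol_s for s in topic_shifts)]
```

For instance, shot changes at 10 s, 55 s, and 90 s combined with a topic shift at 54.5 s would yield a single story boundary at 55 s.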
[0045] As referred to above, textual information used to identify
clips of interest can be derived, for example, from closed caption
text that accompanies most television programs. Real-time closed
captioning typically lags behind the audio and video by a variable
amount of time from about 1 to 10 seconds. To take this factor into
account, the embodiment of FIG. 1 is capable of using speech
processing to generate very accurate word timestamps.
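The lag compensation described in this paragraph can be sketched as re-stamping caption words with times recovered by speech processing; the data shapes below are illustrative assumptions.

```python
# Sketch of compensating for closed-caption lag: each caption word arrives
# with a lagged timestamp; when speech processing has recognized the same
# word, the caption word is re-stamped with the accurate recognized time.
def align_captions(caption_words, recognized):
    """caption_words: [(lagged_time_s, word)]; recognized: {word: true_time_s}.
    Returns caption words re-stamped with recognized times when available."""
    aligned = []
    for t, w in caption_words:
        aligned.append((recognized.get(w.lower(), t), w))
    return aligned
```

A caption word time-stamped at 13 s but recognized in the audio at 5.2 s would thus be shifted back by the roughly 8-second captioning lag.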
[0046] When closed caption text is not available, a large
vocabulary automatic speech recognition system can be used to
generate a transcript of the audio track. While the accuracy of the
automatically generated transcripts is below that of closed
captions, they provide a reasonable alternative for identifying
clips of interest with reduced, but acceptable, accuracy.
Alternatively, a parallel text alignment algorithm can be used to
import high quality off-line transcripts of the program when they
are or become available.
[0047] FIG. 2 illustrates an architecture for implementing an
exemplary embodiment of the invention. It should be noted that the
architectural elements discussed below can be deployed to a user
and/or provider of multimedia assets in whole or in part, and
therefore each element interfaces with one another and external
components using standard, conventional interfaces.
[0048] In FIG. 2, Video Capture/Media Analysis component 205
records and compresses broadcast TV programming. Also at component
205, various functions can be performed on the content such as
scene change detection, audio analysis, and compression. These
video files are shipped to the Video Storage database 210 from
which they will be served when the video is streamed to the client
250.
[0049] Associated metadata is shipped to the Metadata database 215.
Note that thumbnail images are included as part of the metadata, as
well as terms and/or phrases associated with a clip(s) for
categorizing the clip(s) within a topical subset. Typically, this
video capture/media analysis process need not occur in real time.
However, there is no reason why it could not occur in real time if
an operator so desires and wishes to devote sufficient
computational resources. In any case, it is not necessary to wait
until a show is completed before indexing and searching that
show.
[0050] Video Server 220 responds to clip requests and makes the
video content available to the client 250. For example, the video
server 220 may download the video clips in whole or in part, stream
the clips (e.g., via MPEG4 ASF or MPEG2) to the client 250 or
generate the clip metadata discussed above (such as terms and/or
phrases associated with a clip for categorizing the clip within a
topical subset).
[0051] DVL Server 225 handles query requests (such as how many
clips are available, which shows have clips, etc.) and/or clip
content requests (requests for metadata that describes clip content,
including a "clip pointer" to the video content). Thus, it handles
multimedia search (such as search over closed caption text) and
determines the start and stop times of the clips, which are
designated with "clip pointers," as just mentioned.
[0052] eClips server 230 handles client requests for web pages
related to a service for providing eClips. eClips server 230
utilizes Perl Common Gateway Interface (CGI) scripts that the
client navigates in order to perform the functions of the eClips
service. For example, the scripts deal with login/registration
related pages, home page, profile related pages, archive related
pages, player pages, and administration related pages. Player
scripts can be launched in a separate window. Each CGI request from
the client 250 will return HTML with HTML DIVs, JavaScript, and CSS
style sheets. The DIVs and CSS style sheets are used to position
the various elements of the page. DHTML is used to dynamically load
DIV content on the fly (for instance, a list of shows in an instant
search pulldown performed by a user).
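Purely for illustration, the kind of page such a CGI script returns can be sketched as follows. The sketch is in Python rather than Perl, and the element names and variable names are hypothetical; the point is simply an HTML response combining positioned DIVs with a script block that DHTML code can load dynamically.

```python
def render_player_page(topics):
    """Build an HTML response of the kind a player CGI script might
    return: one DIV per topic plus a JavaScript variable block that
    client-side DHTML can consult."""
    divs = "\n".join(
        '<div id="topic-%d" class="topic">%s (%d clips)</div>' % (i, name, count)
        for i, (name, count) in enumerate(topics)
    )
    script = "var topicCounts = [%s];" % ", ".join(str(c) for _, c in topics)
    return "<html><head><script>%s</script></head><body>%s</body></html>" % (script, divs)
```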
[0053] In FIG. 2, three databases 235, 240 and 245 are shown as
Extensible Markup Language (XML) databases. Thus, Perl scripts can
be utilized to access (i.e., read from and/or write to) these
databases via XML. Specifically, these three databases include show
database 235, which contains information about recorded broadcasts,
Profile database 245, which contains personal search terms and/or
phrases, and Archive database 240, which contains saved clip
information (e.g., entire clips or simply clip pointers).
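As an illustrative sketch only, reading a record from such an XML profile database might look like the following. The patent does not specify a schema, so the element and attribute names here are assumptions.

```python
import xml.etree.ElementTree as ET

PROFILE_XML = """
<profile user="alice">
  <topic name="sports">
    <keyword>football</keyword>
    <keyword>baseball</keyword>
    <keyword>hockey</keyword>
  </topic>
</profile>
"""

def load_profile(xml_text):
    """Read a profile record into a dict of topic -> keyword list
    (hypothetical schema, for illustration only)."""
    root = ET.fromstring(xml_text)
    return {
        topic.get("name"): [kw.text for kw in topic.findall("keyword")]
        for topic in root.findall("topic")
    }
```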
[0054] eClips Client 250, in one embodiment, includes JavaScript
that each Perl script embeds in the HTML returned from
the eClips server 230. It is through the JavaScript that the client
250 interacts with the DVL server 225 to determine the desired
content and through JavaScript that the client initiates the
streaming content with the video server 220. The JavaScript also
accesses (reads) the Show and Profile XML files in those
databases.
[0055] The Video Server 220 may have a separate IP host name, and
should support HTTP streaming. The DVL and eClips servers 225 and
230 may have the same IP host name, and may be collocated within a
single machine.
[0056] In FIG. 2, the key interactions that cause video to be
streamed to the client 250 are demonstrated. In a home page view, a
user has logged in already and should see a list of topics
determined by their profile, as well as the number of clips for
each topic. An example of a topic could be "sports" and the keyword
string associated with this topic could be football, baseball,
hockey. The keyword string is used to search the CC text (in this
case, clips that have any of these terms will be valid).
[0057] When the home page is loaded, JavaScript will send a CGI
query to DVL server 225, which generates an XML response. The XML
is parsed into JavaScript variables on the client using the XML
document object model (DOM). The CGI query and XML response are
implemented as part of the DVL system and act as a layer above an
Index Server, which, as part of the DVL server 225, performs text
indexing of the video clips (as discussed above) that allows the
user to locate a desired clip. The XML response will include the
number of clips found for each topic. It is with these query
responses that the home page knows which topics have hits and can
activate the links to play the content.
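Purely as a sketch, parsing such an XML query response into per-topic clip counts, and deciding which topic links to activate, might look like the following. The response schema is illustrative; the patent does not specify one.

```python
import xml.etree.ElementTree as ET

def parse_clip_counts(xml_response):
    """Parse a hypothetical DVL server response into {topic: clip count}."""
    root = ET.fromstring(xml_response)
    return {t.get("name"): int(t.get("clips")) for t in root.findall("topic")}

def active_topics(counts):
    """Only topics with at least one hit get a live link on the home page."""
    return sorted(name for name, n in counts.items() if n > 0)
```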
[0058] These JavaScript links, when clicked, can launch the player
page in a separate window. When the player page is loaded,
essentially the same JavaScript can be used to recalculate the
number of clips for each topic. In principle, this could be changed
to calculate this only once and to pass this on to the player
script thereafter. The JavaScript may also run a query to get the
list of shows with clips for a particular topic. The JavaScript
then loops through all the shows with hits and queries the DVL
server via the separate CGI script to get the clip information
needed to play the clip. This information is also returned via XML
and parsed via the JavaScript. The JavaScript loads various DIVs
that depend on this information, such as the search term hit found
in the CC text, the CC text itself, and a thumbnail. Finally, the player page
JavaScript starts the media player with the first clip using a
pointer (start time) to the video. It should be noted that, in one
embodiment of the invention, the just-described process is almost
completely automated, so that dynamic clip extraction occurs when a
clip is selected, and a show automatically starts and will play
completely through if not interrupted by the user.
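The loop just described, in which clip information is gathered and playback begins at the first clip's pointer, can be sketched for illustration as ordering the returned clip records into a playback queue. The field names are assumptions, not taken from the patent.

```python
def build_playlist(clips):
    """Order clip records into an automatic playback queue: the player
    starts at each clip's start-time pointer and, absent user
    interruption, plays through in sequence."""
    queue = sorted(clips, key=lambda c: (c["show"], c["start"]))
    return [(c["show"], c["start"], c["stop"]) for c in queue]
```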
[0059] In the architecture shown in FIG. 2, eClips client 250 may
reside on, for example, a user's home or business computer, a
personal digital assistant (PDA), or a set-top box on a user's
television set. Client 250 interacts with eClips server 230 as
discussed above to provide the user with an interface for viewing
and utilizing the video clips. Client 250 can be written to
contain, for example, a JavaScript object that contains profile
results (eClips object). A user using eClips client 250 running on
a PC may access stored clips through a network, such as the
Internet or a locally defined Intranet.
[0060] In one embodiment, the user defines a search criterion,
either through an "instant search" feature or within a user
profile. When multiple clips are found matching the user search,
the clips can be stitched together and streamed to the user as one
continuous program. In another embodiment, eClips server
periodically searches for clips matching a given user's profile,
and makes the clips available to the user, perhaps by notifying the
user via email of the availability of the clips.
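The stitching of matched clips into one continuous program can be sketched, purely for illustration, as mapping each clip's (start, stop) interval in its source video onto a single output timeline. Real stitching would occur at the stream level; the structure below is hypothetical.

```python
def stitch_clips(clips):
    """Concatenate matched clips into one continuous program by
    assigning each clip an offset on a single output timeline.
    Each input is a (source, start, stop) triple in seconds."""
    program, offset = [], 0.0
    for source, start, stop in clips:
        duration = stop - start
        program.append({"source": source, "start": start,
                        "program_offset": offset, "duration": duration})
        offset += duration
    return program, offset  # offset is the total running time
```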
[0061] The architecture shown in FIG. 2 allows for video to be
stored and displayed in several formats including MPEG2 (e.g., for
digital television and video on demand) and MPEG4 (e.g., for
streaming video on the Internet). As mentioned above, the video may
be stored for later use by the user; in particular, a user may
archive some or all of the received video and thereafter permit
searching and uploading of the video from storage by other members
of a peer-to-peer computer network.
[0062] FIG. 3 demonstrates a more specific hardware architecture
according to another exemplary embodiment of the invention. In FIG.
3, video feeds 310 are received through various sources (such as
television channels CNN, ESPN and CNBC) at Video Capture/Media
Analysis component 205 within Video Distribution Center 305.
Component 205 receives the feeds and forwards captured/analyzed
results to video server 220 and/or DVL/eClips server 225/230 within
cable Headend 325. In FIG. 3, video analysis portion 315 is
illustrated within component 205, although it should be understood
from FIG. 2 and the associated discussion above that component 205
may perform other media analysis such as audio analysis. The
DVL/eClips servers 225/230 operate as described above in
conjunction with FIG. 2 to deliver, using, for example, Hybrid
Fiber Coaxial (HFC) connections, all or part of the video feeds to
routing hub 330, and then through fiber node 340 to cable modem 350
located within user home 355. Additional marketing and advertising
(such as a commercial placed between every third clip stitched
together) could be tied into the video stream in one embodiment of
the invention at the Headend from providers 320 such as Double
Click.
[0063] Within user home 355, the feed is received at cable modem 350
and passed via a high speed data (HSD) line to a PC 360 running
eClips client 250. Alternatively, the feed could be sent to Set top box 370 atop
TV 380, where Set top box 370 runs eClips client 250. In the
example where the video clips are received via cable modem 350, the
service can be streamed as high speed data (HSD) through a cable
modem as MPEG4 video. When the video is received via Set top box
370, it can be delivered as MPEG2 over video on demand (VOD)
channels that could be set up in advance for a service providing
the invention.
[0064] FIG. 4 is an exemplary page view of a page viewed by a user
utilizing an eClips client according to one embodiment of the
invention. In FIG. 4, for example, the user might see page view 400
just after logging in to a system implementing the invention. In
page view 400, section 405 demonstrates the results of a profile
search performed for the user on a given day, or over some other
pre-defined period, according to the previously stored profile of
that user. In section 405, clips are listed both by topic and by
number of clips related to that topic. In section 405, the user
therefore has the option of viewing one or more of the clips
related to a particular topic.
[0065] Section 405 also identifies a source for the criteria used
to select the various topical clips. More specifically, on a
profile page, a user can select default sources (shows) which will
be searched based on the user's profile; this is referred to as a
"Main" list, and would restrict any profile topic that has the Main
option to search only those shows selected on the profile page. On
a topic editor page, where a user is allowed to add or modify
topics for searching, the user can specify this Main list, or can
make Custom selections that are only valid for a particular search
topic. In section 405, the user has selected the latter option, and
so a "source" is shown as Custom.
[0066] In section 410, the user additionally has the option of
entering new search terms and/or phrases not related to his or her
current profile, whereby the invention searches a clips database
via DVL server as described above with respect to FIG. 2. Section
415 indicates the media sources which will be searched for the
terms or phrases entered in section 410.
[0067] Also, in page view 400, button 420, "Play all clips," allows
a user to view all currently available clips with one click. The
user can add a new topic using button 425. The user can return to a
home page by clicking on button 430 (although this option is only
valid when the user is on a page different from the home page 400
itself), access his profile via button 435 and access an archive of
previously saved clips via button 440. Finally, a user can log out
of the service using button 445.
[0068] FIG. 5 demonstrates a page view 500 showing a content
retrieval page according to the exemplary embodiment shown in FIG.
4. In section 505, still frames of the beginning of each clip
(i.e., thumbnails) within a topic can be viewed by the user.
Section 505 can be controlled by section 515, which allows the user
to select a topic of clips to be shown, as well as section 520,
which allows a user to select a portion of the clips from that
topic that will be played. With buttons 560 and 565, a user may
clear or select all of the clips being shown within a particular
topic.
[0069] When one or more of these clips is chosen for viewing by the
user, that clip is shown in section 510. Section 510 can be
controlled by buttons 525-550, which allow a user to skip to a
previous clip with button 525, stop the clip with button 530, play
the clip with button 535, skip the clip with button 540, switch to
a new topic of clips with button 545 or view footage after the
selected clip(s) with button 550. Note that section 510 may also
include advertisements 555, and may display a time remaining for a
currently playing clip, a source of the clip, and a date and time
the clip was originally broadcast.
[0070] In one exemplary embodiment of the invention, page 500 will
play all of the clips currently available in a predetermined order
(e.g., reverse chronological order, by source of content, etc.) if
the user does not choose a specific topic or clip. Button 570 is
activated when a user wants to view the clip(s) available; i.e., as
shown in view 500. Button 575 allows the user to send (e.g., email)
the clip(s) to another user, and button 580 allows the user to save
the clip(s) to an archive (i.e., the archive accessed by button 440
in FIG. 4).
[0071] Having discussed various exemplary embodiments of the
invention and associated features thereof, as well as potential
uses of the invention, the following provides a more detailed
summary of application categories in which the invention is of
use.
[0072] Generally speaking, because the invention can capture
content from nearly any multimedia source and then use standard
streaming media to deliver the appropriate associated clips, it is
nearly limitless in the markets and industries that it can
support.
[0073] As a practical matter, the invention can be packaged to
address different market segments. Therefore, it should be assumed
that the target markets and applications supported could fall into,
for example, any or all of the Consumer, Business-to-Consumer or
Business-to-Business Marketplaces. The following discussion
summarizes some exemplary application categories.
[0074] First, as a consumer offering, the invention can be provided
as an extension to standard television programming. In this model,
an ISP, Cable Programming Provider, Web Portal Provider, etc., may
allow consumers to sign up for this service, or the set of features
provided by the invention can be provided as a premium
subscription.
[0075] In the consumer service model, a consumer would enter a set
of keywords and/or phrases in the profile. In addition, as part of
the preferences selected in the profile the user may determine that
only specific content sources should be monitored. As the user
profile is created or changed it would be updated in the user
profile database. As video content is captured in the system, the
user profile database is matched against the closed caption text.
As an example, a consumer may be interested in sports but only want
to see the specific "play of the day." In this scenario, the
consumer would enter the key words "play of the day" and then
identify in the profile the specific content sources (channels or
programs) that should be recorded/analyzed by the invention. For
example, the consumer could choose channels that play sports games
or report on sports news. When the consumer returns from work that
evening, he or she would access a site or channel providing the
service. This consumer would then see all of the clips of programs
that matched the keywords "play of the day," meaning that this
consumer would see in one session all of the content and clips
matching that set of words.
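The matching of the user profile database against closed caption text, as in the "play of the day" example above, can be sketched for illustration as follows. A clip is valid for a topic if any of the topic's keywords or phrases appears in the caption text (OR semantics, as in the "sports" example earlier); the function and field names are hypothetical.

```python
def match_profile(cc_text, profile):
    """Return the profile topics for which the closed caption text
    contains at least one of the topic's keywords or phrases."""
    text = cc_text.lower()
    return [topic for topic, keywords in profile.items()
            if any(kw.lower() in text for kw in keywords)]
```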
[0076] As another example, in a Business-to-Consumer offering, the
invention can be provided as an extension to standard television
programming. In this case, both the programming and its sponsorship
would be different from the consumer model above. For example, a
corporate sponsor or numerous corporate sponsors may offer specific
types of content, or may offer an assemblage of content overlaid
with advertising sponsorship. The sponsorship would be evident in
the advertising that would be embedded in the player or in the
content, since the invention is modular in design and allows for
customization.
[0077] In the Business-to-Consumer service model, a consumer would
enter a set of keywords in the profile. As the user profile is
created or changed it would be updated in the user profile
database. Because this model and the content provided would be
underwritten by corporate sponsorship, the content provided may be
limited to a proprietary set of content. As an example, if CNN were
the sponsor of the service, all of the content provided may be
limited to CNN's own broadcasts. In addition, it may be very
evident to the consumer that the service is brought to them by CNN
in that the CNN logo may be embedded in the user interface, or may
be embedded in the content itself.
[0078] Next, as a Business-to-Business offering, the invention can
be used in intra-company applications as well as extra-company
applications. The applications supported include, as just a few
examples: Business TV, Advertising, Executive Announcements,
Financial News, Training, Competitive Information Services,
Industry Conferences, etc. In essence, the invention can be used as
a tool to assist employees in retrieving and viewing specific
portions of content on demand.
[0079] In this Business-to-Business service model, a user would
enter a set of keywords in the profile that would be updated in the
user profile database. In this case, the content captured will be
dependent upon the business audience using the service.
[0080] In an intra-business application, the user may wish to
combine sources from within the business and sources outside of the
business. As an example a user may wish to see all clips dealing
with the category "Virtual Private Networks." In this example, a
business may have planned a new advertising campaign talking about
"Virtual Private Networks" and have an advertisement available to
its internal personnel. At the same time, there may be an internal
training class that has been recorded and is available internally
in which a section talks about "Virtual Private Networks." Again,
this could be another content option captured by the invention.
Also, one of this company's competitors may have provided a talk at
an industry conference the day before about their solution for the
"Virtual Private Network" area. As with the other content options,
this too could be captured and available as a content option
through the invention. Therefore, when our user begins a session
using the invention and looks under the term "Virtual Private
Networks," there could be numerous clips available from multiple
sources (internal and external) to provide this user with a
complete multimedia view of "Virtual Private Networks".
[0081] As an extra-business tool, the invention can provide
businesses, their suppliers, their best customers, and all other
members of communities of interests with specific targeted content
clips that strengthen the relationships. These may include (but not
be limited to) product details, new announcements, public relations
messages, etc.
[0082] As further examples of applications of the invention, the
following represent industry applications which may benefit from
use of the invention.
[0083] In the financial industry, financial information can be
available for both professionals and potential clients to receive
late-breaking information on stocks, companies and the global
markets. The information can be from a variety of sources such as
Financial News Network, Bloomberg, CNN, etc. and allow users to
identify key areas of interest and to continually be up to
date.
[0084] In the advertising/announcements industry, advertisers would
be able to target their ads to consumers based on people's
preferences as expressed in their profiles. This is potentially a
win/win situation because people would not be getting any more ads
but they would be seeing more things that interest them.
Advertisers could charge more for this targeted approach and
thereby pay for any costs associated with the invention.
[0085] Similarly, large companies run TV advertisements for a
multitude of products, services, target markets, etc. These
companies could benefit by housing these commercials on an on-line
database that can be accessible to their marketing staff, the
advertising agencies, and clients interested in seeing particular
commercials that used specific words or product names. The
invention can then allow these commercials to be easily searched
and accessed.
[0086] In the entertainment industry, the movie industry can use
the invention to easily scan through archives of old and new movie
footage that can be digitized and stored in a central repository.
Sports highlights can be made available for particular games or
events. Networks could maintain a library of indexed TV shows
(e.g., PBS) where users can search for a particular
episode/topic.
[0087] In the travel industry, searches can be done on new
information in the travel industry such as airlines, causes of
delays, etc. In addition, the invention can be used to provide key
clips from specific resorts and other potential vacation
destinations.
[0088] In the distance learning/education industry, a large variety
of courses could be stored on-line. In many circumstances, a user
may want to only see the salient points on a specific topic of
interest. The invention can then play a key role in providing
support to the user for access and retrieval of the key needed
information.
[0089] For conferences and trade events, the invention can be an
information dissemination tool for finding the latest information
quickly when videos are captured of talks and demonstrations in key
events.
[0090] One embodiment of the invention relates to current bandwidth
shortages and limitations, which sometimes limit the prompt and
effective provisioning of streaming video and other media. For
example, Internet users, particularly home Internet users, often do
not have access to high-speed data rates such as those found in
cable and/or fiber-optic transmissions. As a result, such users
often experience a significant delay between the time a video
stream is selected and the time the stream actually begins to play.
This delay time may be additionally and/or further exacerbated by
the need to buffer an initial portion of the video stream locally,
so that the video stream will play smoothly once it does begin to
play. These shortcomings of conventional streaming techniques may
therefore also affect the provisioning of eClips servers according
to the invention, as has already been described.
[0091] In order to alleviate the need for a user of the eClips
service or other media streaming service to wait in front of a
blank screen while the media prepares to play, the invention
provides relevant information to the user during such a potential
wait time, thereby providing entertainment, advertising or other
services and reducing the apparent wait time until playing begins.
For example, with respect to the eClips service described above, a
user receiving a customized media presentation might have
information relevant to the subject matter of the presentation
automatically downloaded from a DVL/eClips server during an
off-time (such as late at night). The relevant information can be
determined based on, for example, a user profile set up as part of
the eClips service and in a manner similar to that described above
for formulating the customized media presentation itself. The
information might also be content previously obtained and stored
locally by the user but not yet viewed. This way, the information can be made
available on the user's local hard drive, and can therefore be
played immediately upon selection of a particular media stream,
during the time when the media stream is being delivered and/or
buffered for viewing. While the information is being displayed, the
viewer may choose to see it in its entirety before the selected
video stream begins. In
another embodiment, however, the user may discontinue viewing the
local information as soon as the primary stream becomes
available.
[0092] Relevant information that might be embedded into a media
stream being delivered as just described might include, for
example, information about the subject matter of the stream or
information related thereto, such as advertising for related
products or services. Additional possibilities for embedding into
the media stream include graphics, games, text, pictures and other
types of known media assets. The invention might operate through
the use of multiple media players, perhaps displaying only one
instance of a particular video stream. For example, if the
particular video stream is selected for viewing on a certain media
player, the invention might automatically (or optionally) open a
second media player for playing the locally stored information to
be displayed prior to the playing of the primary video stream.
Moreover, the invention might display multiple pieces of relevant
information, so that the user may choose what to view during the
wait time for the primary stream. For example, a number of video
thumbscreens or video shots might be displayed from which the user
can choose for viewing. Software at the user's local system may be
operable to set forth a criterion and/or timing according to which
information to be embedded is located, stored and/or displayed.
Alternatively, this functionality may be enabled at a server
location, for example, using the DVL/eClips server discussed
above.
[0093] In another embodiment, the invention embeds locally stored
media into a video or other media stream to be presented to the
viewer, so that the user avoids any wait time in viewing the
selected stream that may occur due to bandwidth shortages or other
system considerations. The locally stored media may be relevant to
the content of the primary stream, so that the user does not have
to wait an undue amount of time to view information about a desired
topic.
[0094] Although large multimedia files often must be delivered via
broadband communication links, the fact that the invention extracts
exactly what the user is interested in makes it possible to deliver
downloadable content to portable devices efficiently. The content
can include video clips as discussed primarily above, or can be
limited to still frames and text (or just text) if
bandwidth/storage does not permit full motion video with audio.
Hybrid schemes are also contemplated in which some of the content
includes video, but other (e.g., perhaps older, or repeated similar
stories from multiple sources) clips only include audio, or include
only still images and/or text. In this regard, multimedia analysis
techniques can be used to determine if stories are about the same
topic, or contain the same video material. Because the invention is
capable of using standard access and delivery methods, it can be
employed in virtually any home or industry application where
delivery of multimedia assets is desired.
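The multimedia analysis mentioned above, determining whether two stories cover the same topic, can be sketched in a very rough form as word overlap (Jaccard similarity) between transcripts. This is an illustrative stand-in only; the threshold is an assumption, and a real system would also compare the video material itself.

```python
def same_story(transcript_a, transcript_b, threshold=0.5):
    """Rough duplicate-story test: Jaccard similarity of the word
    sets of two transcripts, compared against a chosen threshold."""
    a, b = set(transcript_a.lower().split()), set(transcript_b.lower().split())
    if not a or not b:
        return False
    return len(a & b) / len(a | b) >= threshold
```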
[0095] FIG. 7 is a flow diagram illustrating transformation of
video source data 710 according to one embodiment of the invention.
FIG. 7 illustrates several alternative transformation paths to
deliver content of video source 710 to destination devices 775. As
used herein, video source data 710 may be live streaming video,
delayed streaming video, or stored video data.
[0096] Sampling function 715 processes video source data 710 to
produce static images 720. In an embodiment where video source data
710 is streaming video, capture process 723 produces a video file
725 from video source data 710. Static images 720 or video files
725 are then delivered to destination devices 775.
[0097] FIG. 7 illustrates that demultiplexing process 745 processes
video source data 710 to obtain or produce audio stream 750. The
flowchart shows that there are at least four options for the
delivery of audio stream 750. First, audio stream 750 can be
delivered to destination devices 775 directly. Second, capture
process 753 can create sound file 755 from audio stream 750 for
eventual delivery to destination devices 775 via link 780. Third,
speech recognition process 760 can process audio stream 750 to
produce text 765. Text 765 can then be delivered to destination
devices 775. Fourth, process 768 can further process text 765 to
provide for correction of errors generated by the speech
recognition process 760, or may, either in the alternative or in
combination, translate text 765 to another language to produce
processed text 770. Processed text 770 can then be delivered to
destination devices 775.
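Selecting among the FIG. 7 transformation paths according to what a destination device can handle can be sketched as follows, for illustration only. The capability flags and path names are hypothetical labels for the processes described above.

```python
def choose_path(device):
    """Pick a transformation path based on device capabilities:
    video-capable devices get a captured video file, audio-capable
    devices get the demultiplexed audio stream, and text-only
    devices get speech recognition plus text processing."""
    if device.get("video"):
        return ["capture", "video_file"]
    if device.get("audio"):
        return ["demultiplex", "audio_stream"]
    return ["demultiplex", "speech_recognition", "text_processing"]
```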
[0098] In addition, FIG. 7 illustrates that extraction process 728
generates Closed Caption Text (CCT) 730 from video source data 710.
Process 733 corrects for errors in CCT 730, provides language
translation, and/or performs other translations to generate
processed CCT 735. Processed CCT 735 may be delivered directly to
destination devices 775. In the alternative, text-to-speech process
740 operates on either CCT 730 or processed CCT 735 to produce
audio stream 750, with at least all transformation paths available
as described above with regard to audio stream 750 for eventual
delivery to destination devices 775.
[0099] Destination devices 775 may be or include, for example, any
of the representative devices referred to in FIG. 6 and described
in the background section of this specification. A user's choice of
destination device will affect the manner in which the user will
select and navigate delivered content.
[0100] A user may use a single destination device, or the user may
use multiple devices in combination to receive delivered content.
For instance, a particular user may utilize a facsimile to receive
an image 720 and a wireless telephone to receive an audio stream
750. Where multiple destination devices are used, and where the
media delivered to the multiple destination devices are related,
the delivered content may be associated using tags or other
identifiers that allow a user to align the content received on
multiple devices. For example, audio stream 750 received on a
wireless telephone may be associated to images 720 sent to a
facsimile with reference to a page number of the facsimile
transmission.
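The association of related content across multiple destination devices, as in the facsimile example above, can be sketched as attaching a shared tag to each delivered segment. The tagging scheme shown is illustrative; the patent does not prescribe one.

```python
def tag_audio_segments(segments, pages):
    """Tag each audio segment with the fax page it accompanies, so a
    listener can align what is heard with the page in hand."""
    tagged = []
    for seg, page in zip(segments, pages):
        tagged.append({"segment": seg, "tag": "page-%d" % page})
    return tagged
```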
[0101] As a general matter, not all transformations described with
reference to FIG. 7 will need to be performed in delivering content
from a source video to a user. Content may be tailored to a target
destination device 775 according to known alerting or notification
utilities that communicate a class of destination device to a
content provider at or near a time of delivery. In the alternative,
or in combination, content may be tailored according to a
predetermined user profile.
[0102] Transformed content may be delivered to destination devices
775 according to alternative timing schemes. For example, CCT 730,
processed CCT 735, audio stream 750, text 765, processed text 770
may be delivered in near real-time (e.g., where content delivery is
delayed only by processing and communication overhead). In other
embodiments, transformed content is stored for later delivery.
Moreover, the timing for delivery of stored content may be
according to a predetermined schedule, such as a set time of day.
In addition, or in the alternative, content can be delivered
according to a set interval of time, such as every hour or other
fixed period of time. The predetermined schedule may be specified
in a user's profile data. In addition, or in the alternative, the
delivery of near real-time and/or stored content may be
event-triggered. For instance, a user profile may specify that
breaking headline news, special reports, and/or severe weather
warnings trigger near real-time delivery of content separate from,
or together with, related stored content.
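The three timing schemes above (event-triggered, scheduled, and fixed-interval delivery) can be combined in a simple decision sketch, offered for illustration only; the parameters and the hour-based granularity are assumptions.

```python
def should_deliver(item, now_hour, scheduled_hour, interval_hours, last_sent_hour):
    """Decide whether to send a transformed item: event-triggered
    items (e.g., severe weather) go in near real time; otherwise
    delivery waits for the set time of day or the fixed interval."""
    if item.get("breaking"):
        return True
    if now_hour == scheduled_hour:
        return True
    return now_hour - last_sent_hour >= interval_hours
```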
[0103] Sample process 715, demultiplexing process 745, extraction
process 728, text-to-speech process 740, speech recognition process
760, capture processes 723 and 753, and processes 733 and 768 may
be performed on a server or other network-based host computer
having access to video source data 710. Specific embodiments for
delivering audio stream 750 or sound file 755 to destination
devices 775 are provided with reference to FIGS. 8A-9B below.
[0104] FIG. 8A is a system diagram illustrating a functional
architecture according to one embodiment of the invention. As shown
therein, server 810 is coupled to user profile data 825 and is
further coupled to voice mailbox 815 via automatic load path 830.
Although the functions of server 810 and voice mailbox 815 are
distinct, persons skilled in the art will appreciate that server
810 and voice mailbox 815 may optionally be hosted on the same
computer. One or more destination devices 820 are coupled to the
voice mailbox 815 via data retrieval path 835, and may optionally
be coupled to server 810 via link 840.
[0105] The user profile data 825 may be or include, for example,
user identifiers, topics of interest to the user, and other
information. The user profile data 825 is loaded and periodically
updated from destination device 820 or another device to a database
accessible by server 810.
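A minimal sketch of user profile data 825 and its periodic update follows; the record schema and merge rule are assumptions for illustration only.

```python
# Sketch (hypothetical schema): user profile data 825 as a record of a
# user identifier and topics of interest, periodically updated from
# destination device 820 into a store accessible by server 810.

def update_profile(store, user_id, topics):
    """Merge newly reported topics into the stored profile for user_id."""
    profile = store.setdefault(user_id, {"user_id": user_id, "topics": []})
    for topic in topics:
        if topic not in profile["topics"]:
            profile["topics"].append(topic)
    return profile


store = {}
update_profile(store, "u42", ["finance", "weather"])
update_profile(store, "u42", ["weather", "sports"])  # later periodic update
```

Duplicate topics reported in a later update are ignored, so the stored profile grows only with genuinely new interests.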
[0106] In one embodiment, the system of FIG. 8A is configured to
perform the functions described with reference to FIG. 8B.
[0107] FIG. 8B is a flow diagram illustrating a method for
delivering video source content according to one embodiment of the
invention. As shown therein, server 810 reads previously stored
profile data 825 in step 850. Server 810 then identifies
information in video source data 710 relevant to topics in the user
profile data 825 in step 855. Server 810 transforms the relevant
video source data 710 into an audio stream 750 or a sound file 755
in step 860. As described above, demultiplexing process 745 can
create audio stream 750 from video source data 710, and capture
process 753 can create sound file 755 from the audio stream 750. In
one embodiment, server 810 streams audio stream 750 to voice
mailbox 815 in step 865. In an alternative embodiment, server 810
loads sound file 755 to voice mailbox 815 in step 865. Server 810
plays the transformed information (i.e., audio stream 750 or sound
file 755) from voice mailbox 815 in step 870.
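The FIG. 8B flow can be sketched end to end; the helper functions, segment fields, and the stand-in transformation below are hypothetical, since the actual demultiplexing process 745 and capture process 753 operate on video source data.

```python
# Sketch (hypothetical helpers): the FIG. 8B method -- read profile data
# (step 850), identify relevant segments of the video source (step 855),
# transform them to sound files (step 860), and load the voice mailbox
# (step 865) for playback (step 870).

def identify_relevant(source_segments, topics):
    """Step 855: keep segments whose topic matches the user profile."""
    return [s for s in source_segments if s["topic"] in topics]


def transform_to_audio(segment):
    """Step 860: stand-in for demultiplexing/capture; names a sound file."""
    return f"{segment['id']}.wav"


def deliver(source_segments, profile, mailbox):
    """Steps 850-865: select, transform, and load into the voice mailbox."""
    for seg in identify_relevant(source_segments, profile["topics"]):
        mailbox.append(transform_to_audio(seg))
    return mailbox


profile = {"topics": ["weather"]}
source = [{"id": "clip1", "topic": "weather"},
          {"id": "clip2", "topic": "sports"}]
mailbox = deliver(source, profile, [])
```

Only the weather clip reaches the mailbox; the sports clip is filtered out in step 855.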
[0108] In an alternative embodiment, delivery of information to
destination device 820 is under the local control of voice mailbox
815. Where destination device 820 is a wireless phone, voice
mailbox 815 may receive Dual-Tone Multi-Frequency (DTMF) signals
and/or voice commands to effect the delivery of audio stream 750 or
sound file 755 according to user input.
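The local DTMF control of paragraph [0108] can be sketched as a key-to-command dispatch; the particular digit bindings are hypothetical, as the description does not specify them.

```python
# Sketch (hypothetical key bindings): voice mailbox 815 mapping received
# DTMF digits to playback commands for audio stream 750 or sound file 755.

DTMF_COMMANDS = {
    "1": "play",
    "2": "pause",
    "3": "skip",
    "7": "delete",
}


def handle_dtmf(digit):
    """Map a received DTMF digit to a playback command; ignore others."""
    return DTMF_COMMANDS.get(digit, "noop")
```

Voice commands could feed the same dispatch once recognized, keeping delivery under the mailbox's local control.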
[0109] Where multiple media formats are delivered, server 810 may
also send media to destination device 820 via link 840. For
example, where destination device 820 is a smart phone, a user may
simultaneously receive an audio stream 750 via voice mailbox 815
and an image 720 via link 840.
[0110] FIG. 9A is a system diagram illustrating a functional
architecture according to one embodiment of the invention. The
architecture in FIG. 9A is an alternative approach to the
architecture in FIG. 8A for delivery of audio stream 750 and/or
sound file 755. As shown in FIG. 9A, a Web server 910, having
access to user profile data 905, and including Voice Extensible
Mark-up Language (VXML) generator 915, is coupled to VXML gateway
920. VXML gateway 920 includes Interactive Voice Response (IVR)
system 925 and is coupled to client 930. Client 930 may be a wired,
wireless, or smart telephone, for example, having DTMF and speech
input capability 935. The description of profile data 825 above is
applicable to profile data 905. In one embodiment, the system of
FIG. 9A is configured to perform the process shown in FIG. 9B.
[0111] FIG. 9B is a flow diagram illustrating a method for
delivering video source content according to one embodiment of the
invention. As shown therein, Web server 910 reads previously stored
profile data 905 in step 940. Web server 910 then identifies
information in video source data 710 relevant to the user profile
data 905 in step 945, and transforms the relevant information in step 950.
Server 910 optionally stores the transformed information in step
955.
[0112] Gateway 920 receives a call from client 930, for example at
a toll free number, in step 960. The VXML gateway 920 has a table
that correlates the toll free number with a particular Uniform
Resource Locator (URL) related to a particular application. VXML
gateway 920 fetches the corresponding URL on Web Server 910 in step
965, and Web server 910 generates VXML based on code derived from
the corresponding URL in step 970. Accordingly, when the VXML
gateway 920 runs IVR system 925 to deliver the transformed
information to client 930 in step 980, the application, greeting,
and content of the IVR session may be tailored according to the
incoming call in step 960.
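The gateway's correlation of a toll-free number to a URL (step 965) and the generation of VXML (step 970) can be sketched as follows; the table contents, URL, and minimal VoiceXML document are illustrative assumptions, not the described implementation.

```python
# Sketch (hypothetical table and markup): VXML gateway 920 correlates the
# dialed toll-free number with an application URL, and Web server 910
# generates VoiceXML to drive the IVR session (steps 960-980).

TOLLFREE_TO_URL = {
    "18005551234": "http://example.invalid/news-app",  # hypothetical entry
}


def generate_vxml(prompt):
    """Step 970: minimal VoiceXML wrapping a prompt for the IVR session."""
    return (
        '<vxml version="2.0"><form><block>'
        f"<prompt>{prompt}</prompt>"
        "</block></form></vxml>"
    )


def handle_call(number):
    """Steps 960-970: look up the application URL for the dialed number
    and produce VXML tailored to that incoming call."""
    url = TOLLFREE_TO_URL.get(number)
    if url is None:
        return generate_vxml("Application not found.")
    return generate_vxml(f"Welcome. Fetching content from {url}.")
```

Because the dialed number selects the URL, the greeting and content of the resulting IVR session are tailored to the incoming call, as described above.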
[0113] In one embodiment, step 980 is, or includes, the delivery of
an audio stream 750; in another embodiment, step 980 is, or
includes, delivery of a sound file 755 to be played by client
930.
[0114] In conclusion, a service for providing personalized
multimedia assets such as electronic clips from video programs,
based upon personal profiles, has been presented. In one
embodiment, it uses text to ascertain the appropriate clips to
extract and then assembles these clips into a single session. Thus,
users see only the specific portions of the videos that they desire.
Therefore, users do not have to undertake the arduous task of
manually finding desired video segments, and further do not have to
manually select the specified videos one at a time. Rather, the
invention generates all of the desired content automatically.
Moreover, one embodiment of the invention provides an improved
system and method for delivering video content to destination
devices not adapted to receive streaming video.
[0115] While this invention has been described in various
explanatory embodiments, other embodiments and variations can be
effected by a person of ordinary skill in the art without departing
from the scope of the invention.
* * * * *