U.S. patent application number 11/820478 was filed with the patent office on 2007-06-18 and published on 2008-03-20 for system, method and apparatus of video processing and applications.
Invention is credited to John D. Ralston, Steven E. Saunders.
Application Number | 20080072261 11/820478 |
Document ID | / |
Family ID | 38834076 |
Filed Date | 2007-06-18 |
United States Patent Application | 20080072261 |
Kind Code | A1 |
Ralston; John D.; et al. | March 20, 2008 |
System, method and apparatus of video processing and applications
Abstract
Systems, methods, and apparatuses of providing and processing
video data for delivery to mobile devices.
Inventors: | Ralston; John D.; (Portola Valley, CA); Saunders; Steven E.; (Cupertino, CA) |
Correspondence Address: | GREENBERG TRAURIG, LLP (SV); IP DOCKETING, 2450 COLORADO AVENUE, SUITE 400E, SANTA MONICA, CA 90404, US |
Family ID: | 38834076 |
Appl. No.: | 11/820478 |
Filed: | June 18, 2007 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
60814383 | Jun 16, 2006 | |
Current U.S. Class: | 725/62 |
Current CPC Class: | H04L 65/602 20130101; G11B 27/036 20130101; G11B 27/034 20130101 |
Class at Publication: | 725/062 |
International Class: | H04N 7/16 20060101 H04N007/16 |
Claims
1. A method, comprising: processing video data; and providing
processed video for mobile delivery.
Description
CLAIM OF PRIORITY
[0001] This application claims priority to U.S. Patent Application
No. 60/814,383 entitled "Video Processing and Applications Server",
which was filed on Jun. 16, 2006, the contents of which are
expressly incorporated by reference herein.
CROSS-REFERENCE TO RELATED PATENT APPLICATION
[0002] This application is related to a copending U.S. patent
application Ser. No. 11/357,661, entitled "MOBILE IMAGING
APPLICATION, DEVICE ARCHITECTURE, SERVICE PLATFORM ARCHITECTURE AND
SERVICES", filed 16 Feb. 2006 with the same assignee as the present
disclosure. The applicants of that application are also applicants
of this application. The disclosure of the above identified
copending application is incorporated in its entirety herein by
reference.
[0003] This application is related to a copending U.S. patent
application Ser. No. 11/232,165, entitled "COMPRESSION RATE CONTROL
SYSTEM AND METHOD WITH VARIABLE SUBBAND PROCESSING", filed 20 Sep.
2005 with the same assignee as the present disclosure. The
applicants of the above applications are also applicants of this
application. The disclosure of the above identified copending
applications is incorporated in its entirety herein by
reference.
[0004] This application is further related to a copending U.S.
patent application Ser. No. 11/232,726, entitled "MULTIPLE
TECHNIQUE ENTROPY CODING SYSTEM AND METHOD", filed 21 Sep. 2005
with the same assignee as the present disclosure. The applicants of
the above applications are also applicants of this application. The
disclosure of the above identified copending applications is
incorporated in its entirety herein by reference.
[0005] This application is further related to a copending U.S.
patent application Ser. No. 11/232,725 entitled "PERMUTATION
PROCRASTINATION", filed 21 Sep. 2005 with the same assignee as the
present disclosure. The applicants of the above applications are
also applicants of this application. The disclosure of the above
identified copending applications is incorporated in its entirety
herein by reference.
[0006] This application is further related to a copending U.S.
patent application Ser. No. 11/249,561 entitled "MOBILE IMAGING
APPLICATION, DEVICE ARCHITECTURE, SERVICE PLATFORM ARCHITECTURE",
filed 12 Oct. 2005 with the same assignee as the present
disclosure. The applicants of the above applications are also
applicants of this application. The disclosure of the above
identified copending applications is incorporated in its entirety
herein by reference.
[0007] This application is further related to a copending U.S.
patent application Ser. No. 11/250,797 entitled "VIDEO MONITORING
APPLICATION, DEVICE ARCHITECTURE, AND SYSTEM ARCHITECTURE", filed
13 Oct. 2005 with the same assignee as the present disclosure. The
applicants of the above applications are also applicants of this
application. The disclosure of the above identified copending
applications is incorporated in its entirety herein by
reference.
[0008] Some references, which may include patents, patent
applications and various publications, are cited and discussed in
the description of this disclosure. The citation and/or discussion
of such references is provided merely to clarify the description of
the present disclosure and is not an admission that any such
reference is "prior art" to the disclosure described herein. All
references cited and discussed in this specification are
incorporated herein by reference in their entireties and to the
same extent as if each reference was individually incorporated by
reference.
TECHNICAL FIELD
[0009] The present disclosure relates generally to a system,
apparatus, and method of video processing and applications.
SUMMARY
[0010] Directly digitized images and video are resource intensive;
thus, images and video can be compressed for storage, transmission,
and other uses. For example, compression can be characterized by a
three-stage process: transform, quantize, and entropy-code. Most
image and video compressors share this basic architecture, with
variations.
[0011] The purpose of the transform stage in a video compressor is to
gather the energy or information of the source picture into as compact a
form as possible by taking advantage of local similarities and
patterns in the picture or sequence of pictures. Compressors
typically compress different inputs with different compression
levels. For example, compressors may be designed to work well on
"typical" inputs and ignore their failure to compress "random" or
"pathological" inputs. Many image compression and video compression
methods, such as MPEG-2 and MPEG-4, use the discrete cosine
transform (DCT) as the transform stage.
[0012] Quantization may discard information after the transform
stage; therefore, in some instances, the reconstructed decompressed
image may not be an exact reproduction of the original. Entropy
coding is generally a lossless process: this process takes the
information remaining after quantization and codes it so that it
can be reproduced in the decoder. Thus the design decisions about
what information to discard are not affected by the following
entropy-coding stage.
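By way of a non-limiting illustration, the three-stage process described above can be sketched in Python as follows. This is a minimal toy and not the codec of the present disclosure: the function names (`dct_1d`, `quantize`, `entropy_encode`, `entropy_decode`), the quantization step of 16, the sample pixel row, and the use of `zlib` as a stand-in for an entropy coder are all assumptions made for the example.

```python
import math
import struct
import zlib


def dct_1d(samples):
    """Transform stage: an unnormalized DCT-II of a list of samples."""
    n = len(samples)
    return [
        sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(samples))
        for k in range(n)
    ]


def quantize(coefficients, step):
    """Quantize stage: the only lossy step; small coefficients collapse to 0."""
    return [int(round(c / step)) for c in coefficients]


def entropy_encode(levels):
    """Entropy-code stage: lossless packing (zlib stands in for a real entropy coder)."""
    raw = struct.pack(f"<{len(levels)}i", *levels)
    return zlib.compress(raw)


def entropy_decode(blob, count):
    """Inverse of entropy_encode; returns exactly the quantized levels."""
    return list(struct.unpack(f"<{count}i", zlib.decompress(blob)))


row = [52, 55, 61, 66, 70, 61, 64, 73]  # one row of pixel values (made up)
levels = quantize(dct_1d(row), step=16)
blob = entropy_encode(levels)
restored = entropy_decode(blob, len(levels))
```

In this sketch `restored` equals `levels` exactly, which reflects the point made above: information is discarded only in `quantize`, so the choice of what to discard can be made independently of the entropy-coding stage that follows.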
[0013] DCT-based video compression/decompression (codec)
techniques were, in some instances, developed for broadcasting and
streaming of studio-generated video content, and are associated with
the encoding of video content in a studio environment where, for
example, high-complexity encoders can be run on computer
workstations. Such computationally complex encoders enable
computationally simple and relatively inexpensive decoders (players)
to be installed in consumer playback devices.
[0014] However, as depicted in FIG. 1, this asymmetry in
encode/decode technologies may make it difficult to support
the compression of full television-sized video content using the
processor capacity available in mobile multimedia devices, such as
camcorder phones, in which video messages are captured and
compressed in real time in the mobile device itself, as well as
played back. As a result, video in mobile devices may be limited to
much smaller sizes and much lower frame rates than in other
consumer products, as depicted in FIG. 2.
[0015] Video editing with DCT-based techniques and other video
processing applications may require full or partial decoding of
compressed video input data prior to editing or other processing of
the fully or partially decoded video, followed by compression of
the edited or otherwise processed video data for output and
subsequent distribution. Therefore, the computational complexity of
DCT-based video editing and other processing applications may
exceed the computational capacity of many standard server computers
based on general-purpose personal computer (PC) central processing
units (CPUs).
[0016] Rather, video editing and other processing applications
typically utilize specialized video applications server computers,
in which video processing may be carried out using a combination of
specialized data processing elements, including, but not restricted
to: digital signal processors (DSPs), application specific
integrated circuits (ASICs), multimedia processors, and
reconfigurable processing devices (RPDs). The number, cost, and
power consumption of such specialized video data processing
elements lead to much higher cost and power consumption for
specialized video servers, in comparison to standard server
computers based on general purpose PC CPUs. However, the commercial
deployment of emerging mobile video services requires that such
video editing and other processing be provided for large numbers of
concurrent service subscribers, and that the costs of deploying and
maintaining the corresponding video applications servers be as low
as possible.
[0017] Various embodiments of the present disclosure may include one
or more video processing and other processes, including one or more
of the following:
[0018] Compression;
[0019] Full or partial decompression;
[0020] Editing of fully or partially decompressed video, including,
but not limited to, cutting, trimming, inserting transitions,
re-ordering, adjusting exposure, compensating for backlighting,
compensating for limited low light sensitivity of the camera imaging
element (typically, a CMOS, CCD or similar element), compensating for
distortions coming from the camera module's lenses, compensating for
camera jitter occurring during video recording, modifying image
background, and fixing red-eye;
[0021] Transcoding, including conversions between the video format of
the present disclosure and other standards-based and/or proprietary
video formats;
[0022] Transrating, including modification of video compression
level, bit rate, frame rate, image size, and compressed format for
playback compatibility between different devices and screen sizes;
[0023] Tagging and embedding meta data for video search applications;
[0024] Digital watermarking for security and rights management;
[0025] Video storing (including in a data base), searching,
retrieving;
[0026] Recognition, measurement, and classification of image and
video content, including music beats, video cuts, scene change,
point-of-view change, exposure and contrast properties, rate of
motion, direction and coherence of motion, lighting (sun vs.
fluorescent etc.), faces, red eyes, stock scenes, and watermarks.
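As one hedged illustration of the transrating function listed above, the following Python sketch selects output parameters that fit a receiving device. The `VideoProfile` type, the `transrate_target` function, the even-dimension rule, and all numeric values are hypothetical and are not taken from the present disclosure; the sketch only chooses target parameters and does not re-encode any video.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoProfile:
    width: int
    height: int
    fps: float
    bitrate_kbps: int


def transrate_target(source, device_max):
    """Pick output parameters that fit the device without upscaling the source."""
    # Scale both dimensions by one factor so the aspect ratio is preserved.
    scale = min(1.0, device_max.width / source.width, device_max.height / source.height)
    # Keep dimensions even, which many codecs expect (an assumption here).
    width = max(2, int(source.width * scale) // 2 * 2)
    height = max(2, int(source.height * scale) // 2 * 2)
    return VideoProfile(
        width=width,
        height=height,
        fps=min(source.fps, device_max.fps),
        bitrate_kbps=min(source.bitrate_kbps, device_max.bitrate_kbps),
    )


source = VideoProfile(640, 480, 30.0, 1500)   # hypothetical captured clip
handset = VideoProfile(320, 240, 15.0, 256)   # hypothetical receiving handset limits
target = transrate_target(source, handset)
```

Here `target` describes the image size, frame rate, and bit rate that a transrating step would aim for when preparing the clip for the smaller device.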
[0027] In some embodiments of the present disclosure, video
applications designed to run on video applications servers and
support various combinations of the video processing functions
listed above may include, but are not limited to, one or more of:
[0028] Compression;
[0029] Decompression;
[0030] Editing, including cutting, trimming, inserting transitions,
adjusting exposure, correcting for backlighting, fixing red-eye,
synchronizing to beat of soundtrack, inserting stock titles and
scenes, applying templates, correcting for camera motion, improving
composition;
[0031] Transcoding, including conversions between the video format of
the present disclosure and other commonly-deployed standards-based
and proprietary video formats;
[0032] Transrating, including modification of video compression
level, bit rate, frame rate, image size, and compressed format for
playback compatibility between different devices and screen sizes;
[0033] Tagging and embedding meta data for video search indexing, or
other editing applications;
[0034] Video storing (including in a data base), searching,
retrieving;
[0035] Digital rights management (DRM);
[0036] RSS (Really Simple Syndication) applications to broadcast
user created video to other subscribers through a feed. RSS
applications may include an aggregator and a feed reader, and may
allow user-created video to be viewed on computers and hand-held
devices;
[0037] Recognition, measurement, and classification of image and
video content.
[0038] In some embodiments, video services supported by
combinations of the video applications listed above running on
video applications servers and supporting various combinations of
the video processing functions listed above may include, but are
not limited to, one or more of the following:
[0039] Video messaging, sharing, and blogging: non-real-time, i.e.
store and forward, including via RSS feeds;
[0040] Video IMS: instant messaging services over IP networks--real
time video transmission and streaming;
[0041] Video calling: real time over IP, ATM, or circuit-switched
networks;
[0042] Video mail, analogous to voice mail, i.e. leave a video mail
if the party being called does not answer their phone;
[0043] Video conferencing, for example peer-to-peer between multiple
parties;
[0044] Manual or automated editing, on handset or on a network- or
web-based server, of video clips captured on handset;
[0045] On-line video storage, albums, blogs, etc.;
[0046] Sharing of captured/edited/stored video clips and albums;
[0047] Managing access, defining who has access and when,
discovering who has seen or requested the material;
[0048] Tagging; database storage, searching, and retrieving;
previewing, downloading (soft copy), ordering hard copy (DVD) of
video;
[0049] Personal multi-media market place services, including:
[0050] Preview, share, buy, sell "soft" copies (download) or "hard"
copies (DVD);
[0051] Media "tagging" for indexing, RSS feeds;
[0052] Interfaces to existing online market places (e.g., eBay,
Google, Yahoo, Microsoft, other portals);
[0053] Comparison, contrast, juxtaposition with material purchased,
from friends, and from public sources.
[0054] In some embodiments, video systems to deploy one or more of
the video services identified above supported by combinations of
the video applications listed above running on video applications
servers and supporting various combinations of the video processing
functions described above may include, but are not limited to, one
or more of the following:
[0055] Circuit-switched mobile cellular network, fixed wireless
network, landline telephone network, landline cable network,
landline security network, or satellite network;
[0056] IP-based mobile cellular network, mobile mesh network, mobile
ad-hoc network, fixed wireless network, landline telephone network,
landline data network, or satellite network;
[0057] Converged fixed/mobile wireless networks;
[0058] Other wireless or wireline data networks: ATM, etc.
[0059] Some embodiments of the present disclosure may include
methods, devices, applications, systems, and services for one or
more of the following: video image recording, transmitting,
storing, editing, processing, transcoding, searching, retrieving,
sharing, distributing, and marketing, including mobile devices and
video processing/applications servers, corresponding mobile device
and video processing/applications server architectures, service
platform architectures, and methods and services for transmitting,
storing, editing, processing, transcoding, searching, retrieving,
sharing, distributing, and marketing still images and video images
over wireless and wired networks and systems, and viewing them on
display-enabled devices, as well as network and other system
services in relation to the foregoing.
[0060] Embodiments of the present disclosure further comprise image
recording and processing techniques, and corresponding improvements
in the architectures of mobile devices, video
processing/applications servers, and service platforms. The present
disclosure further includes end-to-end functionality and performance
of mobile video services. These may be enabled by passing
information, such as anti-shake camera motion compensation
information, captured in the imager module in a mobile device, to
one or more of: a subsequent video codec in the handset devices, a
video processing applications server in the mobile network, and/or
a receiving video playback device.
[0061] Such information can then be used to further reduce the
computational requirements of the video codec, for example by
providing additional motion compensation information that may
otherwise be extracted by the video codec from the input video
data. Such information can also be used to further compensate for
camera motion, which occurs during video capture in the mobile
device, during editing and further video processing that is
subsequently carried out in a video processing applications server
in the mobile network. Such information can also be used to
recreate the effects of camera motion, which may have been
previously removed via video preprocessing in the camera module in
the mobile device, during editing and further video processing that
is subsequently carried out in a video processing applications
server in the mobile network, and/or in a receiving video playback
device.
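The use of camera motion information described above can be sketched, under stated assumptions, as follows. The disclosure does not specify how such information is represented; this Python example assumes, purely for illustration, that the imager module reports one integer pixel offset `(dx, dy)` per frame. The function names `shift_frame`, `stabilize`, and `recreate_motion` are hypothetical.

```python
def shift_frame(frame, dx, dy, fill=0):
    """Return a copy of a 2-D frame translated by (dx, dy) pixels."""
    height, width = len(frame), len(frame[0])
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            src_x, src_y = x - dx, y - dy
            if 0 <= src_x < width and 0 <= src_y < height:
                out[y][x] = frame[src_y][src_x]
    return out


def stabilize(frames, motion):
    """Undo the recorded camera offset of each frame."""
    return [shift_frame(f, -dx, -dy) for f, (dx, dy) in zip(frames, motion)]


def recreate_motion(stable_frames, motion):
    """Re-apply the recorded offsets to frames that were previously stabilized."""
    return [shift_frame(f, dx, dy) for f, (dx, dy) in zip(stable_frames, motion)]


# A 4x4 scene with one bright pixel, and hypothetical per-frame offsets.
scene = [[0] * 4 for _ in range(4)]
scene[1][1] = 255
motion = [(0, 0), (1, 0), (0, 1)]

captured = recreate_motion([scene] * 3, motion)  # what a shaking camera would record
stabilized = stabilize(captured, motion)         # compensation using the metadata
```

Because the offsets travel with the video, a downstream server or playback device in this sketch can either remove the motion or restore it without estimating it again from the pixel data, which is the kind of saving in computation the paragraph above describes.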
[0062] Aspects of the present disclosure may further comprise one or
more of the following:
[0063] 1. Software video codecs/camcorder device applications for
compressing and/or decompressing video or still images;
[0064] 2. Software video processing applications for compression,
decompression, editing, transcoding, tagging and embedding metadata
for search applications, storing, databasing, searching, retrieving,
and distributing video;
[0065] 3. Infrastructure products, methods and processes, including
mobile multimedia service (MMS) infrastructure server computers and
applications, for deploying video messaging and sharing services in
conjunction with software video codec/camcorder applications for
mobile handsets as well as software processing applications;
[0066] 4. Methods, processes and business processes for
establishing, enabling, distributing and operating innovative MMS
services, including mobile video messaging, sharing, and blogging;
video streaming and video calling; and personal media producer
services that support creation and marketing of video content
created by mobile users on mobile devices.
BRIEF DESCRIPTION OF FIGURES
[0067] FIG. 1 depicts video codec computational requirements,
according to one embodiment.
[0068] FIG. 2 depicts video image size limitations in mobile
devices and services, according to one embodiment.
[0069] FIG. 3 depicts a mobile imaging service platform
architecture, according to one embodiment.
[0070] FIG. 4 depicts a mobile imaging handset architecture,
according to one embodiment.
[0071] FIG. 5 depicts a video processing and applications server
functional block diagram, according to one embodiment.
[0072] FIG. 6 depicts a video processing and applications server
architecture, according to one embodiment.
[0073] FIG. 7 depicts a distributed video editing system
architecture, according to one embodiment.
[0074] FIG. 8 depicts a comparison of video codec technologies,
according to one embodiment.
[0075] FIG. 9 depicts reduced video codec computational
requirements, according to one embodiment.
[0076] FIG. 10 depicts an improved mobile imaging handset
architecture, according to one embodiment.
[0077] FIG. 11 depicts an improved video processing and
applications server architecture, according to one embodiment.
[0078] FIG. 12 depicts an alternative improved video processing and
applications server architecture, according to one embodiment.
[0079] FIG. 13 depicts an all-software implementation of an
improved video processing and applications server architecture,
according to one embodiment.
[0080] FIG. 14 depicts an all-hardware implementation of an
improved video processing and applications server architecture,
according to one embodiment.
[0081] FIG. 15 depicts a hybrid software and hardware
implementation of an improved video processing and applications
server architecture, according to one embodiment.
[0082] FIG. 16 depicts an improved distributed video editing system
architecture, according to one embodiment.
[0083] FIG. 17 depicts an improved mobile imaging service platform
architecture, according to one embodiment.
[0084] FIG. 18 depicts a self-decoding video MMS that eliminates
the need for transcoding and allows existing video
processing/applications servers to process the video format,
according to one embodiment.
[0085] FIG. 19 depicts OTN upgrade of deployed video processing and
applications server, according to one embodiment.
[0086] FIG. 20 depicts reduction in complexity, cost, and number of
video editing servers required to deploy media producer services,
according to one embodiment.
[0087] FIG. 21 depicts the functional elements of an improved video
messaging/sharing/calling platform, according to one
embodiment.
[0088] FIG. 22 depicts faster, lower cost development and
deployment of higher quality multimedia handsets & services,
according to one embodiment.
[0089] FIG. 23 depicts applications to broadband multimedia devices
and services, according to one embodiment.
[0090] FIG. 24a depicts an example embodiment of a video editing
system interacting with a user and a system administrator.
[0091] FIG. 24b depicts an example embodiment of a VESClient to
communicate with the TIP via an SSP.
[0092] FIG. 25 depicts an example embodiment of a process flow of a
VESClient to communicate with the TIP via an SSP.
[0093] FIG. 26 depicts an example embodiment of another process
flow of a VESClient to communicate with the TIP via an SSP.
[0094] FIG. 27 depicts an example embodiment of a process flow of a
VESClient to communicate with a database via an SSP.
[0095] FIG. 28 depicts an example embodiment of a process flow of a
receiving PC to communicate with a website.
[0096] FIG. 29 depicts an example embodiment of a process flow of a
template editor.
[0097] FIG. 30 depicts an example embodiment of another process
flow of a template editor.
[0098] FIG. 31 depicts an example embodiment of a screenshot.
[0099] FIG. 32 depicts an example embodiment of a screenshot.
[0100] FIG. 33a depicts an example embodiment of a screenshot.
[0101] FIG. 33b depicts an example embodiment of a screenshot.
[0102] FIG. 34 depicts an example embodiment of a screenshot.
[0103] FIG. 35 depicts an example embodiment of a screenshot.
[0104] FIG. 36 depicts an example embodiment of a screenshot.
[0105] FIG. 37 depicts an example embodiment of a screenshot.
[0106] FIG. 38 depicts an example embodiment of a screenshot.
[0107] FIG. 39 depicts an example embodiment of a screenshot.
[0108] FIG. 40 depicts an example embodiment of a screenshot.
[0109] FIG. 41 depicts an example embodiment of a screenshot.
[0110] FIG. 42 depicts an example embodiment of a screenshot.
[0111] FIG. 43 depicts an example embodiment of a screenshot.
[0112] FIG. 44 depicts an example embodiment of a screenshot.
[0113] FIG. 45 depicts an example embodiment of a screenshot.
[0114] FIG. 46 depicts an example embodiment of a screenshot.
[0115] FIG. 47 depicts an example embodiment of a screenshot.
[0116] FIG. 48 depicts an example embodiment of a screenshot.
[0117] FIG. 49 depicts an example embodiment of a screenshot.
[0118] FIG. 50 depicts an example embodiment of a screenshot.
[0119] FIG. 51 depicts an example embodiment of a screenshot.
[0120] FIG. 52 depicts an example embodiment of a screenshot.
[0121] FIG. 53 depicts an example embodiment of a screenshot.
[0122] FIG. 54 depicts an example embodiment of a screenshot.
[0123] FIG. 55 depicts an example embodiment of a screenshot.
DETAILED DESCRIPTION
[0124] The following description and drawings are illustrative and
are not to be construed as limiting. Numerous specific details are
described to provide a thorough understanding of the disclosure.
However, in certain instances, well-known or conventional details
are not described in order to avoid obscuring the description.
References to one or an embodiment in the present disclosure can
be, but not necessarily are, references to the same embodiment;
and, such references mean at least one.
[0125] Reference in this specification to "one embodiment" or "an
embodiment" means that a particular feature, structure, or
characteristic described in connection with the embodiment is
included in at least one embodiment of the disclosure. The
appearances of the phrase "in one embodiment" in various places in
the specification are not necessarily all referring to the same
embodiment, nor are separate or alternative embodiments mutually
exclusive of other embodiments. Moreover, various features are
described which may be exhibited by some embodiments and not by
others. Similarly, various requirements are described which may be
requirements for some embodiments but not other embodiments.
Image Processing
[0126] A wavelet transform may comprise the repeated application of
wavelet filter pairs to a set of data, either in one dimension or
in more than one. For still image compression, a 2-D wavelet
transform (horizontal and vertical) can be utilized. Video codecs
according to the present disclosure may use a 3-D wavelet transform
(horizontal, vertical, and temporal). A symmetrical 3-D wavelet-based
video compression/decompression (codec) device may be used to
reduce the computational complexity and power consumption in mobile
devices well below those required for DCT-based codecs, as well as
to enable simultaneous support for processing still images and
video images in a single codec.
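As a non-limiting sketch of "repeated application of wavelet filter pairs", the following Python example uses the Haar pair, chosen here only because it is the simplest wavelet; the disclosure does not state which filters its codec uses. The function names and sample data are hypothetical. The same pair is applied along one spatial axis and, in `temporal_step`, along the time axis between co-located pixels of two frames, which is how a separable 3-D transform extends the 2-D case.

```python
import math

S = 1 / math.sqrt(2)  # normalization for the Haar low-pass/high-pass pair


def haar_step(signal):
    """Apply the Haar filter pair once: returns (low-pass, high-pass) halves."""
    low = [(a + b) * S for a, b in zip(signal[0::2], signal[1::2])]
    high = [(a - b) * S for a, b in zip(signal[0::2], signal[1::2])]
    return low, high


def haar_inverse_step(low, high):
    """Rebuild the even-length signal from one (low, high) pair."""
    out = []
    for l, h in zip(low, high):
        out += [(l + h) * S, (l - h) * S]
    return out


def haar_forward(signal, levels):
    """Repeated application: re-filter the low-pass band `levels` times."""
    bands = []
    low = list(signal)
    for _ in range(levels):
        low, high = haar_step(low)
        bands.append(high)
    return low, bands


def haar_inverse(low, bands):
    """Undo haar_forward by inverting the levels in reverse order."""
    for high in reversed(bands):
        low = haar_inverse_step(low, high)
    return low


def temporal_step(frame_a, frame_b):
    """Apply the same pair along time: co-located pixels of two frames."""
    low = [[(a + b) * S for a, b in zip(ra, rb)] for ra, rb in zip(frame_a, frame_b)]
    high = [[(a - b) * S for a, b in zip(ra, rb)] for ra, rb in zip(frame_a, frame_b)]
    return low, high


signal = [4.0, 6.0, 10.0, 12.0, 8.0, 8.0, 2.0, 0.0]
low, bands = haar_forward(signal, levels=3)
rebuilt = haar_inverse(low, bands)

frame = [[1.0, 2.0], [3.0, 4.0]]
t_low, t_high = temporal_step(frame, frame)  # two identical frames
```

When two consecutive frames are identical, every temporal high-pass value in this sketch is zero, so only the low-pass frame carries information. The forward and inverse steps also use the same few additions and multiplications, which gives a sense of why a wavelet codec can be computationally symmetrical between encoding and decoding.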
[0127] Simultaneous support for still images and video images in a
single codec may eliminate or reduce the need for separate MPEG
(video) and JPEG (still image) codecs, or greatly enhance
compression performance and hence storage efficiency with respect
to, for example, Motion JPEG codecs. A symmetrical 3-D wavelet-based
video processing device is used to reduce the computational
complexity and power consumption in, and to increase the number of
concurrent mobile subscribers that can be supported by, MMS
infrastructure equipment utilized to support automated or manual
editing of user-created video, as well as database storage, search,
and retrieval of user-created video.
Mobile Imaging Services and Service Platform Architecture
[0128] Aspects of one embodiment provide a new generation of
innovative MMS video services, including mobile video messaging,
sharing, and blogging; video streaming and video calling; and
personal "media producer" services that support creation and
marketing of video content created by mobile users on mobile
devices. Components of a mobile imaging service platform
architecture according to aspects of the present disclosure (see
FIG. 3) may include one or more of:
[0129] Mobile Handsets;
[0130] Mobile Base stations (BTS);
[0131] Base station Controller/Radio Network Controller (BSC/RNC);
Mobile Switching Center (MSC);
[0132] Gateway Service Node (GSN);
[0133] Mobile Multimedia Service Controller (MMSC).
[0134] Typical functions included in the MMSC according to aspects
of the present disclosure (see FIG. 3) may include one or more of:
[0135] Video gateway;
[0136] Telco server;
[0137] MMS applications server;
[0138] Storage server.
[0139] The video gateway in an MMSC, according to aspects of the
present disclosure, may serve to transcode between the different
video formats that are supported by the imaging service platform.
Transcoding is also utilized by wireless operators to support
different voice codecs used in mobile telephone networks, and the
corresponding voice transcoders can be integrated into the RNC.
[0140] Upgrading such a mobile imaging service platform with the
architecture shown in FIG. 3 may include deploying new handsets,
and manually adding new hardware to the MMSC video gateway. In some
mobile video messaging and sharing applications, cost and
complexity associated with transcoding may be eliminated. One
aspect of the current disclosure is the ability to embed a software
decoder with each transmitted video stream, enabling "self-playing"
functionality on common handset and PC video players.
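One way to picture a transmitted stream that carries its own software decoder is the following Python sketch. The message layout (a 4-byte tag followed by two big-endian 32-bit lengths), the tag value `SPV1`, and the function names are all hypothetical and are offered only as an assumption-laden illustration; the disclosure does not define a container format here.

```python
import struct

MAGIC = b"SPV1"  # hypothetical 4-byte tag, not a real format


def pack_self_playing(decoder_bytes, video_bytes):
    """Bundle a decoder and a compressed stream into one message body."""
    header = MAGIC + struct.pack(">II", len(decoder_bytes), len(video_bytes))
    return header + decoder_bytes + video_bytes


def unpack_self_playing(message):
    """Split a bundled message back into (decoder_bytes, video_bytes)."""
    if message[:4] != MAGIC:
        raise ValueError("not a self-playing message")
    decoder_len, video_len = struct.unpack(">II", message[4:12])
    body = message[12:]
    if len(body) != decoder_len + video_len:
        raise ValueError("truncated or padded message")
    return body[:decoder_len], body[decoder_len:]


message = pack_self_playing(b"<decoder program>", b"<compressed video>")
decoder, video = unpack_self_playing(message)
```

In such an arrangement the receiving handset or PC would hand `video` to the bundled `decoder` rather than to a pre-installed one, so the network would not need to transcode the clip into a format the receiver already supports.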
[0141] The MMS applications servers in an MMSC may support
applications such as automated or manual editing of user-created
video, as well as database storage, search, and retrieval of
user-created video. The computational complexity associated with
implementing such video editing functions and other processing
applications with DCT-based video exceeds the computational
capacity of many standard server computers based on general-purpose
personal computer (PC) central processing units (CPUs).
[0142] The commercial deployment of potential new mobile video
services may require that video editing and other processing be
provided for large numbers of concurrent service subscribers, and
that the costs of deploying and maintaining the corresponding video
applications servers be as low as possible, according to aspects of
the present disclosure. Upgrading MMSC infrastructure is also
costly if new or specialized hardware is required. A software
applications and service platform would be preferable in order to
enable automated over-the-air (OTA) software upgrade of handsets,
over-the-network (OTN) software upgrade of MMSC video gateways, and
support for mobile video applications using standard PCs and
servers.
[0143] Aspects of the present disclosure comprise new methods,
services and systems relating to innovative capture, compression,
transmission, editing, storing and sharing video content associated
with mobile devices. Aspects of the present disclosure may apply to
telecom (both wireless and wireline providers) and Internet, cable
and other data and multimedia operators including fixed and mobile
wireless service providers. Aspects of the present disclosure may
provide for richer content, higher bandwidth usage and higher
average revenue per user (ARPU).
[0144] Mobile multimedia services (MMS), according to aspects of the
present disclosure, include innovative video messaging, sharing,
blogging, and personal "media producer" applications that enable a
target audience to communicate personal information. Mobile image
messaging and sharing may require the addition of digital camera
functionality (still images) and/or camcorder functionality (video
images) to mobile handsets, so that subscribers can both capture
(encode) video messages that they wish to send, and play back
(decode) video messages that they receive.
[0145] According to aspects of the present disclosure, mobile
devices may be enabled to evolve into integrated consumer
multimedia entertainment platforms. A substantial investment in
industry has been directed toward technologies and platforms that
enable re-packaged broadcast television programming (such as news
clips, sports highlights, and special "mobisodes" of popular TV
programs) and other studio-generated video content (such as film
previews and music videos) to be transmitted to and viewed on
mobile devices. In this case, the mobile subscriber is
exploited as a new class of video consumer. However, this
case largely utilizes video content that has been compressed in
large broadcast enterprise servers.
[0146] However, according to aspects of the present disclosure,
mobile operators worldwide also gain significant new opportunities
to support their subscribers as media producers (as enabled by
aspects of the present disclosure), rather than just media
consumers. As enabled by aspects of the present disclosure, the
ability to capture and share photographs and video on mobile
devices with the same quality as stand-alone digital cameras and
camcorders is a technical cornerstone for such new services,
together with the deployment and convergence of higher speed
cellular and fixed wireless data networks.
[0147] Aspects of the present disclosure further include enabling
significant reductions in the development cost and retail price of
both camcorder phones and video messaging/sharing infrastructure
equipment, which may be key to large scale commercial adoption of
such devices and related mobile multimedia/data services, in both
mature and emerging markets.
[0148] Mobile image messaging/sharing services and applications may
be limited to capturing and transmitting much smaller-size and
lower-frame-rate video images than those typically captured and
displayed on other multimedia devices (see FIG. 2), such as TVs,
personal computers, digital video camcorders, and personal media
players. Mobile image messaging services and applications capable
of supporting VGA (or larger) video at a frame rate of 30 fps or
higher, as provided by aspects of the present disclosure, would be
preferable.
[0149] Aspects of the present disclosure further comprise a
software mobile imaging applications service platform that may
include one or more of the following capabilities:
[0150] 1. support automated over-the-air (OTA) software upgrade of
deployed handsets;
[0151] 2. support automated over-the-network (OTN) software upgrade
of deployed MMSCs;
[0152] 3. support the deployment of mobile video applications and
services using standard PCs and servers;
[0153] 4. enable larger numbers of concurrent mobile video service
subscribers to be supported by a smaller number of servers;
[0154] 5. support the deployment of mobile video applications and
services without the need for video transcoding in the handset or
network;
[0155] 6. enable mobile video devices, applications, and services
that support capturing and transmitting full-size and
full-frame-rate video images similar to those typically captured
and displayed on other consumer multimedia devices such as digital
camcorders and TVs.
[0156] Java implementations of the mobile handset and MMS server
applications, according to aspects of the present disclosure, may
be used for handset/network robustness against viruses, worms, and
other "attacks", allowing mobile network operators to provide the
quality and reliability of service required by national regulators,
in one embodiment.
Mobile Imaging Handset Architecture
[0157] In embodiments of the present disclosure, the addition of
digital camcorder functionality to mobile handsets is generally
associated with adding the following functions, either in hardware,
software, or as a combination of hardware and software (see FIG.
4):
[0158] imager array (typically an array of CMOS or CCD pixels),
with corresponding pre-amplifiers and analog-to-digital (A/D)
signal conversion circuitry;
[0159] image processing functions such as pre-processing,
encoding/decoding (codec), and post-processing;
[0160] buffering of processed images for non-real-time transmission
or real-time streaming over wireless or wireline networks;
[0161] one or more image display screens;
[0162] local image storage on built-in or removable memory.
[0163] Using codecs based on DCT transforms, such as MPEG-4,
commercially available imaging-enabled mobile handsets are limited
to capturing smaller-size and lower-frame-rate video images than
those typically captured and displayed on other multimedia devices,
such as TVs, personal computers, digital video camcorders, and
personal media players. These latter devices typically
capture/display video images in VGA format (640×480 pixels)
or larger, at a display rate of 30 frames-per-second (fps) or
higher, whereas commercially available imaging-enabled mobile
handsets may be limited to capturing video images in, for example,
QVGA format (320×240 pixels), QCIF format (176×144
pixels) or smaller, at a display rate of, for example, 15 fps or
lower (see, e.g., FIG. 2).
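The gap between these formats can be made concrete by comparing raw pixel throughput. The following is a minimal sketch that uses only the resolutions and frame rates quoted above; the dictionary and function names are illustrative and not part of the disclosure.

```python
# Raw pixel throughput for the formats named in the text.
# Each entry is (width, height, frames per second).
FORMATS = {
    "VGA @ 30 fps": (640, 480, 30),
    "QVGA @ 15 fps": (320, 240, 15),
    "QCIF @ 15 fps": (176, 144, 15),
}


def pixels_per_second(width, height, fps):
    """Number of pixels a codec must process each second."""
    return width * height * fps


RATES = {name: pixels_per_second(*spec) for name, spec in FORMATS.items()}


def ratio_to_vga(name):
    """How many times more pixels per second VGA @ 30 fps requires."""
    return RATES["VGA @ 30 fps"] / RATES[name]
```

Under these figures, VGA at 30 fps requires 8 times the pixel throughput of QVGA at 15 fps and roughly 24 times that of QCIF at 15 fps, before any per-pixel codec cost is considered.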
[0164] This reduced video capture capability may typically be due
to the large computational requirements, processor power
consumption, and buffer memory required to complete the number,
type, and sequence of computational steps associated with video
compression/decompression using DCT transforms.
[0165] Using commercially available video codec and microprocessor
technologies leads to very complex, power-hungry, and expensive
architectures for mobile imaging handsets that target capture of
VGA (or larger) video at a frame rate of 30 fps or higher. Such
handset architectures utilize codecs having a combination of both
software programs and hardware accelerators running on a
combination of reduced instruction set computer (RISC) processors, digital
signal processors (DSPs), application-specific integrated circuits
(ASICs), and reconfigurable processing devices (RPDs), together
with larger buffer memory blocks (typical memory capacity of 1
Mbyte or more).
[0166] These codec functions may be implemented using such RISC
processors, DSPs, ASICs, multimedia processors, and RPDs as
separate integrated circuits (ICs), or may combine one or more of
the RISC processors, DSPs, ASICs, multimedia processors, and RPDs
integrated together in a system-in-a-package (SIP) or
system-on-a-chip (SoC).
[0167] Codec functions running on RISC processors or DSPs can be
software routines, with the advantage that they can be modified in
order to correct programming errors or upgrade functionality. The
disadvantage of implementing certain complex, repetitive codec
functions as software is that the resulting overall processor
resource and power consumption requirements typically exceed those
available in mobile communications devices. Codec functions running
on ASICs and multimedia processors are typically fixed hardware
implementations of complex, repetitive computational steps, with,
typically, the advantage that specially tailored hardware
acceleration can substantially reduce the overall power consumption
of the codec.
[0168] The disadvantages of implementing certain codec functions in
fixed hardware include longer and more expensive design cycles, the
risk of expensive product recalls in the case where errors are
found in the fixed silicon implementation, and the inability to
upgrade fixed silicon functions in deployed products in the case
where newly developed features are to be added to the imaging
application. Codec functions running on RPDs are typically routines
that utilize both hardware acceleration and the ability to add or
modify functionality in final mobile imaging handset products.
[0169] An imaging application that reduces or eliminates complex,
repetitive codec functions so as to enable mobile imaging handsets
capable of capturing VGA (or larger) video at a frame rate of 30
fps with an all-software architecture would be preferable, in order
to simplify the above architecture and enable handset costs
compatible with high-volume commercial deployment.
[0170] Multimedia handsets are required not only to support picture
and video messaging capabilities, but also a variety of additional
multimedia capabilities (voice, music, graphics) and a variety of
fixed and mobile wireless access modes, including but not limited
to 2.5G and 3G cellular access, WiBro, HSDPA, WiFi, wireless LAN,
and Bluetooth. The complexity and risk involved in developing,
deploying, and supporting such products makes over-the-air (OTA)
distribution and management of many functions and applications
highly beneficial, in order to more efficiently deploy new
revenue-generating services and applications, and to avoid costly
product recalls.
[0171] A SW imaging application would be preferable to enable OTA
distribution and management of the imaging application by handset
manufacturers, mobile operators, and other MMS service providers.
The present disclosure addresses these objectives.
[0172] Aspects of the present disclosure include one or more of:
[0173] 1. Enabling mobile video devices, applications, and services
that support capturing and transmitting full-size and
full-frame-rate video images similar to those typically captured
and displayed on other consumer multimedia devices such as digital
camcorders and TVs;
[0174] 2. Supporting automated over-the-air (OTA) software upgrade
of deployed handsets.
[0175] Java implementations of the mobile handset application,
according to aspects of the present disclosure, may be used for
handset/network robustness against viruses, worms, and other
"attacks", allowing mobile network operators to provide the quality
and reliability of service required by national regulators, in one
embodiment.
Video Processing and Applications Server Architecture
[0176] In one embodiment, MMS video services include mobile video
messaging, sharing, and blogging; video streaming and video
calling; and personal "media producer" services that support
creation and marketing of video content created by mobile users on
mobile devices, automated video editing, video "post-production",
and other video processing applications provided on mobile handsets
and/or in MMSC application servers. However, the commercial
deployment of such capabilities requires providing video editing
and other processing for large numbers of concurrent service
subscribers, and requires that the costs of deploying and
maintaining the corresponding video applications servers be as low
as possible.
[0177] FIG. 5 depicts a functional block diagram for a video
processing and applications server, according to one embodiment.
The video input data is typically in a compressed format, and can
be fully or partially decoded before implementing video processing
algorithms on the video input data. The processed video can be
compressed again for output and subsequent transmission and
distribution. The video processing functions may include one or
more of:
[0178] Compression;
[0179] Decompression;
[0180] Editing, including sequence edits such as cuts and
transitions, and image content edits such as color correction,
fades, and jitter removal;
[0181] Post-production, such as adding titles or incorporating
chosen cuts of the video into a production template;
[0182] Transcoding, including conversions between the present
wavelet format and other commonly-deployed standards-based and
proprietary video formats;
[0183] Transrating, including changes to compression level, bit
rate, frame rate, image size, and compressed format for playback
compatibility between different devices and screen sizes;
[0184] Tagging and embedding metadata for search applications;
[0185] Storing, including storing in a database, searching, and
retrieving;
[0186] Managing the content, including provenance, ownership, and
permissions, and auditing compliance with license restrictions.
[0187] The computational complexity of DCT-based video editing and
other processing applications may exceed the computational capacity
of server computers based on general-purpose personal computer (PC)
central processing units (CPUs). In some situations, video editing
and other processing applications utilize specialized video
applications server computers, in which video processing may be
carried out using a combination of specialized data processing
elements, including, but not restricted to: digital signal
processors (DSPs), application specific integrated circuits
(ASICs), multimedia processors, and reconfigurable processing
devices (RPDs).
[0188] FIG. 6 depicts a representative video processing and
applications server architecture to provide the computational
requirements of DCT-based video editing and other processing
applications, according to one embodiment. The number, cost, and
power consumption of the specialized video data processing elements
leads to higher cost and power consumption for specialized video
servers, in comparison to standard server computers based on
general purpose PC CPUs.
[0189] Aspects of the present disclosure may further include a
system having one or more of the following characteristics:
[0190] 1. Reducing computational complexity for video encode,
decode, and editing;
[0191] 2. Allowing mobile video applications to run on low-cost,
low-power PC CPUs, rather than specialized, expensive, power-hungry
DSPs or ASICs;
[0192] 3. Enabling fewer, less expensive, PC-based servers to
replace a larger number of specialized video application servers,
reducing deployment and operational cost per subscriber;
[0193] 4. Allowing a substantial increase in the number of
concurrent mobile subscribers that can be supported by each video
application server;
[0194] 5. Supporting automated over-the-network (OTN) software
upgrade of deployed MMSC video application servers;
[0195] 6. Supporting the deployment of mobile video applications
and services without the need for video transcoding in the handset
or network.
Video Editing, Archiving, and Retrieval Systems
[0196] In one embodiment, MMS video services such as mobile video
messaging, sharing, and blogging; video streaming and video
calling; and personal "media producer" services that support
creation and marketing of video content created by mobile users on
mobile devices provide one or more of automated video editing,
video "post-production", and other video processing applications on
mobile handsets and/or in MMSC application servers.
[0197] Video production is a distributed process, with resources
physically distributed over several sites. For example, in the
broadcast industry, broadcasters outsource specific production and
post-production phases to specialized studios or upcoming virtual
studios. Aspects of the current disclosure further comprise
embodiments that simplify and accelerate the deployment of
distributed virtual studio applications for mobile personal "media
producer" services.
[0198] FIG. 7 illustrates the functions and elements of a
distributed video editing system for broadcast applications,
including elements that support video archival and retrieval
functions, according to one embodiment. Such systems are designed
with the goal of providing commercial broadcasters with a complete
solution for distributed video post-production, which integrates
archival, retrieval, and editing functionalities.
[0199] In one embodiment, the system includes an archive server, an
editing server, a catalog server, and a client station for the end
user. Aspects of the current disclosure further comprise
simplifying and accelerating the design and commercial deployment
of similar distributed virtual studio systems that can support
mobile personal "media producer" services, rather than just
commercial broadcast services.
[0200] The archive server in FIG. 7 stores videos at both low and
high bit-rates, and offers video streaming and file transfer
services, according to one embodiment. The catalog server can host
a database where video clips' metadata are stored and indexed. The
client station allows users to perform archival and retrieval
operations, as well as video editing using existing material at low
bit-rate. The editing list created by the user is then processed by
the editing server and applied to the corresponding high bit-rate
material, in order to produce the ready-to-broadcast final
video.
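The workflow described above, in which an edit list authored against low bit-rate material is later applied to the high bit-rate material, can be sketched as follows. This is a minimal illustration under stated assumptions: the `Cut` record, the `conform` function, and the representation of the archive as a dictionary of frame lists are hypothetical and are not taken from the disclosure. Cut points are stored in seconds so that they do not depend on the frame rate of the low bit-rate version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cut:
    """One entry of a hypothetical edit list."""
    clip_id: str    # identifier shared by the low and high bit-rate versions
    start_s: float  # in point, in seconds
    end_s: float    # out point, in seconds


def conform(edit_list, high_rate_archive, fps):
    """Apply an edit list to the high bit-rate frames.

    high_rate_archive maps clip_id to a list of frames; fps is the
    frame rate of the high bit-rate material.
    """
    output = []
    for cut in edit_list:
        frames = high_rate_archive[cut.clip_id]
        first = round(cut.start_s * fps)
        last = round(cut.end_s * fps)
        output.extend(frames[first:last])
    return output
```

In this sketch the client station only ever produces the small edit list; the frame data itself is touched once, on the editing server.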
[0201] The catalog server automatically fetches the low bit-rate
version of each new clip in the video archive and preprocesses it,
in order to extract metadata. A video clip can be decomposed into
smaller segments, by detecting the transition between shots and by
analyzing motion properties. For each shot, still images
(keyframes) can be extracted for display purposes, and to enable an
automatic image indexing approach. Camera and camera lens motion
(e.g., pan, tilt, zoom, stationary) properties can be computed from
the motion vectors. These preprocessing steps are performed on the
low bit-rate stream (typically MPEG-1), without decompression.
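One way camera motion properties might be derived from motion vectors is sketched below. This is an illustrative heuristic only, not the method of the system in FIG. 7: the function name, the vector representation (block position relative to the image center plus displacement), and the threshold are all assumptions. A common displacement is read as pan or tilt, and a residual motion pointing away from or toward the center is read as zoom.

```python
def classify_camera_motion(vectors, threshold=1.0):
    """Label camera motion from block motion vectors.

    vectors: list of (x, y, dx, dy), with (x, y) the block position
    relative to the image center and (dx, dy) its displacement.
    Returns a list drawn from "pan", "tilt", "zoom", or ["stationary"].
    """
    n = len(vectors)
    mean_dx = sum(v[2] for v in vectors) / n
    mean_dy = sum(v[3] for v in vectors) / n

    # Radial component of the motion that remains after removing the
    # common translation; positive values point away from the center.
    radial_sum, count = 0.0, 0
    for x, y, dx, dy in vectors:
        norm = (x * x + y * y) ** 0.5
        if norm > 0:
            radial_sum += ((dx - mean_dx) * x + (dy - mean_dy) * y) / norm
            count += 1
    mean_radial = radial_sum / count if count else 0.0

    labels = []
    if abs(mean_dx) > threshold:
        labels.append("pan")
    if abs(mean_dy) > threshold:
        labels.append("tilt")
    if abs(mean_radial) > threshold:
        labels.append("zoom")
    return labels or ["stationary"]
```

Because the motion vectors are already present in the compressed stream, a classifier of this kind needs no pixel data, which is consistent with the statement that preprocessing runs without decompression.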
[0202] The archival tool in FIG. 7 allows the documentalist to
visualize/edit the results of the clip preprocessing algorithm, and
to enter additional textual annotation, according to one
embodiment.
[0203] Graphical user interfaces enable, for example, a journalist
or a program director to retrieve video material from the archive,
using the available metadata from the catalog server. Once the
items are selected, it is possible to export them to the editing
tool. The retrieval tool in FIG. 7 allows one to query the database
using textual and visual information, in one embodiment. Textual
queries address specific fields entered during the archival
process. Visual queries address metadata extracted during the
preprocessing phase. The user specifies an example image, and
defines the desired type of camera motion.
[0204] Embodiments of the present disclosure further comprise:
[0205] 1. Reduced computational complexity for video encode,
decode, and editing, in one embodiment.
[0206] 2. Performing video editing steps or operations in the
wavelet transformed domain, thereby saving both the computation of
inverse wavelet transforms and forward wavelet transforms, and also
saving computation by accessing and modifying fewer data items than
would be required for the conventional operations on pixel data
values, in one embodiment.
[0207] 3. Allowing mobile video applications to run on low-cost,
low-power PC CPUs, rather than specialized, expensive, power-hungry
DSPs or ASICs, in one embodiment.
[0208] 4. Replacing a larger number of specialized video
application servers with fewer, less expensive, PC-based servers,
reducing deployment and operational cost per subscriber, in one
embodiment.
[0209] 5. Reducing the computing power required to execute all of
the real-time functions of an online editing system, and allowing a
software implementation of these real-time editing functions, in
one embodiment.
[0210] 6. Enabling end user support for automated video editing and
other processing via a SW client on mobile handsets, personal media
players, laptop computers, and personal computers, in addition to
end-user workstations, in one embodiment.
[0211] 7. Enabling further improvements in the end-to-end
functionality and performance of mobile video services by passing
information, such as anti-shake camera motion compensation
information, that can be captured in the imager module in a mobile
device, to one or more of: a subsequent video codec in the handset
devices, a video processing applications server in the mobile
network, and/or a receiving video playback device.
[0212] The video editing services, for example the "automated video
editing system", can also be used to compensate for or correct one
or more of: limited low-light sensitivity of the camera's imaging
element (typically a CMOS, CCD, or similar device), distortions
coming from the camera module's lenses, and camera jitter occurring
during video recording.
[0213] Such information can then be used to further reduce the
computational requirements of the video codec, for example by
providing additional motion compensation information that may
otherwise be extracted by the video codec from the input video
data, in one embodiment. Such information can also be used to
further compensate for camera motion, which occurs during video
capture in the mobile device, during editing and further video
processing that is subsequently carried out in a video processing
applications server in the mobile network. Such information can
also be used to recreate the effects of camera motion, which may
have been previously removed via video preprocessing in the camera
module in the mobile device, during editing and further video
processing that is subsequently carried out in a video processing
applications server in the mobile network, and/or in a receiving
video playback device, according to one embodiment.
[0214] With the embodiments of the present disclosure, mobile video
services are being launched into a market that now associates video
with home cinema quality broadcast (e.g., full size image formats
such as VGA at 30 frames per second). Furthermore, processing of
such large volumes of data using conventional DCT-based codecs exceeds the computing resources
and battery power available for real-time video capture (encoding)
in mobile handsets.
[0215] In some situations, encoding of video content for broadcast
and streaming applications may be performed in a studio
environment, where high-complexity encoders can be run on computer
workstations. Since video messages are captured in real time in the
handset itself, they are limited to much smaller sizes and much
lower frame rates.
[0216] Embodiments of the present disclosure, include lower
complexity imaging applications (e.g., video codec client for
mobile handsets, video editing and processing applications for MMS
application servers) that can be implemented as an application in
mobile handsets and MMS application servers, to reduce the
complexity of the handset architecture and the complexity of the
mobile imaging service platform architecture.
[0217] According to embodiments of the present disclosure, a video
codec solution reduces or eliminates baseband processor and video
accelerator costs and requirements in multimedia handsets. Combined
with the ability to install the codec post-production via OTA
download, this all-SW solution substantially reduces the
complexity, risk, and cost of both handset development and video
messaging service architecture and deployment. Reduced camcorder
phone development time and increased product platform flexibility
provide further camcorder phone cost reductions.
[0218] SW video transcoders and editing, storing, searching,
retrieval applications according to the present disclosure enable
automated over-the-network (OTN) upgrade of deployed MMS control
(MMSC) infrastructure, as well as the use of standard PCs and
servers to run such applications. Additionally, the wavelet
transcoders of the present disclosure provide carriers with complete
interoperability between the wavelet video format and other
standards-based and proprietary video formats. Embodiments of the
present disclosure further include a software decoder to be
embedded with each transmitted video stream, enabling
"self-playing" functionality on common handset and PC video
players, and eliminating the cost and complexity of transcoding
altogether.
[0219] In one embodiment, the video platform allows rapid
deployment of new MMS services. Some embodiments of
the present disclosure also leverage processing speed and video
production accuracy not available with other existing technologies.
Such new MMS services are themselves aspects of the current
disclosure.
[0220] The present disclosure's wavelet codecs are also unique in
their ability to efficiently process both still images and video,
and can thus replace separate MPEG and JPEG codecs with a single
lower-cost and lower-power solution that can simultaneously support
both mobile picture-mail and video-messaging services. Embodiments
of the present disclosure further comprise improving the
end-to-end functionality and performance of mobile video services,
by sharing information, such as anti-shake camera motion
compensation information, that is captured in the imager module in
a mobile device, with a subsequent video codec in the handset
devices, a video processing applications server in the mobile
network, and/or a receiving video playback device.
Improved Wavelet-Based Image Processing
[0221] Aspects of the present disclosure further utilize 3-D
wavelet transforms in video compression/decompression (codec)
devices, for example, with much lower computational complexity than
DCT-based codecs.
[0222] FIG. 8 provides a comparison of the relative computational
requirements of a traditional DCT encoder technology and exemplary
technologies of the present disclosure, according to one
embodiment. The application of a wavelet transform stage also
enables design of quantization and entropy-coding stages with
greatly reduced computational complexity.
[0223] FIG. 9 depicts the reduction in video codec computational
requirements enabled by aspects of the present disclosure,
according to one embodiment.
[0224] In some embodiments, wavelet codecs (e.g., 3D wavelet
codecs) may further provide, for mobile imaging applications,
devices, and services, one or more of the following:
[0225] Symmetric, low-complexity video encoding and decoding;
[0226] Lower processor power requirements for both SW and HW codec
implementations;
[0227] Software encoding and decoding of VGA (or larger) video at a
frame rate of 30 fps (or higher) with processor requirements
compatible with existing commercial mobile handsets, both as native
code and as a Java application;
[0228] Lower gate-count ASIC cores for SoC integration;
[0229] Lower buffer memory requirements;
[0230] A single codec that supports both still images (~JPEG) and
video (~MPEG);
[0231] Simplified video editing (cuts, inserts, text overlays) due
to shorter group of pictures (GOP);
[0232] Simplified synchronization with voice codecs, due to shorter
GOP;
[0233] Low latency for enhanced video streaming, due to shorter
GOP;
[0234] Fine grain scalability for adaptive rate control,
multicasting, and joint source-channel coding;
[0235] Low-complexity performance scaling to emerging HDTV video
formats;
[0236] Compact SW decoder (<40 kB) that can be integrated with each
transmitted video stream to enable "self playing" video messages
compatible with common handset and PC video players.
[0237] In some embodiments, application of wavelet transforms
utilizes short dyadic integer filter coefficients in the lifting
structure. For example, the Haar, 2-6, and 5-3 wavelets and
variations of them can be used.
[0238] In one embodiment, the Lifting Scheme computation algorithm
can be used. For example, these filters are computed using the
Lifting Scheme, which enables in-place computation. This decreases
use of registers and temporary RAM locations, and keeps references
local for highly efficient use of caches.
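The in-place property described above can be illustrated with one level of the widely used reversible integer 5-3 wavelet computed through lifting. This is a minimal sketch, not the disclosed implementation: the function names, the restriction to even-length input, and the symmetric handling of the signal ends are assumptions made for the example. The filter uses only additions and binary shifts, and each step overwrites the sample it has just consumed, so no temporary array is needed.

```python
def forward_53(x):
    """One level of the integer 5-3 wavelet, computed in place on list x.

    After the call, even positions hold low-pass samples and odd
    positions hold high-pass (detail) samples.
    """
    n = len(x)
    assert n >= 2 and n % 2 == 0, "sketch assumes an even-length signal"
    # Predict step: each odd sample becomes its difference from the
    # average of its two even neighbours (mirrored at the right edge).
    for i in range(1, n, 2):
        right = x[i + 1] if i + 1 < n else x[i - 1]
        x[i] -= (x[i - 1] + right) >> 1
    # Update step: each even sample is adjusted by the neighbouring
    # details (mirrored at the left edge) to preserve the local mean.
    for i in range(0, n, 2):
        left = x[i - 1] if i > 0 else x[i + 1]
        x[i] += (left + x[i + 1] + 2) >> 2
    return x


def inverse_53(x):
    """Exact inverse of forward_53: undo the lifting steps in reverse order."""
    n = len(x)
    for i in range(0, n, 2):
        left = x[i - 1] if i > 0 else x[i + 1]
        x[i] -= (left + x[i + 1] + 2) >> 2
    for i in range(1, n, 2):
        right = x[i + 1] if i + 1 < n else x[i - 1]
        x[i] += (x[i - 1] + right) >> 1
    return x
```

Because the inverse simply replays the same integer operations with the signs reversed, reconstruction is exact regardless of rounding, which is one reason lifting suits integer-only processors.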
[0239] In one embodiment, wavelet transforms in pyramid form with
customized pyramid structure can be used. For example, some
embodiments of the present disclosure further include computing
each level of the wavelet transform sequence on half of the data
resulting from the previous wavelet level, so that the total
computation is almost independent of the number of levels. In one
embodiment, the pyramid is customized to leverage the advantages of
the Lifting Scheme above and further economize on register usage
and cache memory bandwidth.
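The claim that total computation is almost independent of the number of levels follows from a geometric series: if the first level processes N samples, the levels together process N + N/2 + N/4 + ..., which is always less than 2N. The sketch below demonstrates this with an in-place integer Haar pyramid that counts the samples touched at each level. The function name and the choice of the Haar filter are assumptions for illustration; the customized pyramid of the disclosure is not specified here.

```python
def haar_pyramid(x, levels):
    """In-place integer Haar pyramid on list x.

    Each level operates only on the low-pass samples left by the
    previous level, which sit at a stride that doubles every level.
    Returns the number of samples touched at each level.
    """
    n = len(x)
    touched = []
    stride = 1
    for _ in range(levels):
        count = 0
        for i in range(0, n - stride, 2 * stride):
            j = i + stride
            x[j] -= x[i]        # detail: difference of the pair
            x[i] += x[j] >> 1   # low-pass: floor of the pair average
            count += 2
        touched.append(count)
        stride *= 2
    return touched
```

For a 16-sample signal and four levels, the levels touch 16, 8, 4, and 2 samples, 30 in total, so the three extra levels cost less than the first level alone.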
[0240] In one embodiment, block structure can be utilized. For
example, the present disclosure divides the picture into rectangular
blocks and processes each block separately from the others, thus
keeping memory references local and allowing an entire transform
pyramid to be computed with data that remains in the processor cache,
saving a significant amount of data movement within most
processors. The present block structure may be beneficial in HW
embodiments as it avoids the requirement for large intermediate
storage capacity in the signal flow.
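The block structure can be sketched as a loop that extracts one rectangular block at a time, runs the whole transform on it, and writes the result back before moving on. The helper names and the representation of the picture as a list of rows are assumptions for the example, and the block dimensions are left as parameters because the disclosure does not state them in this passage.

```python
def iter_blocks(height, width, block_h, block_w):
    """Yield (top, left, h, w) for each block, clipping at the picture edges."""
    for top in range(0, height, block_h):
        for left in range(0, width, block_w):
            yield (top, left,
                   min(block_h, height - top),
                   min(block_w, width - left))


def process_blockwise(image, block_h, block_w, transform):
    """Apply transform to each block independently, in place.

    image is a list of rows; transform takes and returns a list of
    rows of the same shape as the block it is given.
    """
    height, width = len(image), len(image[0])
    for top, left, h, w in iter_blocks(height, width, block_h, block_w):
        block = [row[left:left + w] for row in image[top:top + h]]
        result = transform(block)
        for r in range(h):
            image[top + r][left:left + w] = result[r]
    return image
```

With a hypothetical 16-by-16 block, a VGA picture splits into 1,200 blocks of 256 samples each, small enough that an entire transform pyramid for one block can stay in a typical processor cache.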
[0241] In one embodiment, block boundary filters can be used: the
present disclosure uses modified filter computations at the
boundaries of each block that avoid sharp artifacts as set out in
U.S. patent application Ser. No. 10/418,363, incorporated herein by
reference.
[0242] In one embodiment, chroma temporal removal can be used: for
example, using a single field of chroma for a GOP as set out in
U.S. patent application Ser. No. 10/447,514, incorporated herein by
reference.
[0243] In one embodiment, temporal compression using 3D wavelets
can be used. Instead of the motion search and motion compensation
used by MPEG-style codecs, certain embodiments of the present
disclosure compute a field-to-field temporal wavelet transform,
which is much less expensive to compute. Short integer filters with
the Lifting Scheme are also used, in one aspect.
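A field-to-field temporal transform can be illustrated with the shortest integer filter, the Haar wavelet, applied along the time axis to two co-sited fields. This is a sketch under assumptions: the function names are hypothetical, each field is represented as a flat list of samples, and the disclosure may use a different short filter.

```python
def temporal_forward(field_a, field_b):
    """Integer Haar lifting across time for two fields of equal size.

    Returns (low, high): low approximates the average of the two
    fields, high holds their sample-by-sample difference.
    """
    low, high = [], []
    for a, b in zip(field_a, field_b):
        d = b - a                  # temporal detail
        low.append(a + (d >> 1))   # temporal average (floor)
        high.append(d)
    return low, high


def temporal_inverse(low, high):
    """Exactly recover both fields from the temporal subbands."""
    field_a = [s - (d >> 1) for s, d in zip(low, high)]
    field_b = [a + d for a, d in zip(field_a, high)]
    return field_a, field_b
```

The cost is one subtraction, one shift, and one addition per sample pair, with no search. Where the scene does not change between fields, the high band is all zeros and compresses to almost nothing.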
[0244] In one embodiment, the dyadic quantization algorithm can be
used: In certain embodiments of the present disclosure, the
quantization step of the compression process is accomplished using
a binary shift operation uniformly over a range of coefficient
locations.
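Quantization by binary shift can be sketched as follows. The function names, the sign-preserving treatment of negative coefficients, and the format of the shift plan are assumptions made for the example. Each entry of the plan applies one shift uniformly over a range of coefficient locations, so no per-coefficient division or table lookup is needed.

```python
def quantize(coeffs, shift):
    """Divide coefficient magnitudes by 2**shift using a shift, keeping the sign."""
    return [(c >> shift) if c >= 0 else -((-c) >> shift) for c in coeffs]


def dequantize(values, shift):
    """Approximate inverse: scale magnitudes back up by 2**shift."""
    return [(v << shift) if v >= 0 else -((-v) << shift) for v in values]


def quantize_ranges(coeffs, plan):
    """Apply one shift per range; plan is a list of (start, stop, shift)."""
    out = list(coeffs)
    for start, stop, shift in plan:
        out[start:stop] = quantize(out[start:stop], shift)
    return out
```

A plan such as `[(0, 1, 0), (1, 3, 1), (3, 6, 3)]` would leave the first coefficient untouched and quantize the later, finer-detail ranges progressively more coarsely.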
[0245] In one embodiment, the piling algorithm can be used. For
example, in some embodiments of the present disclosure, the amount
of data to be handled by the subsequent entropy coder is reduced by
first performing run-of-zeros conversion. In certain embodiments, the
methods and disclosures disclosed in U.S. patent application Ser.
No. 10/447,455 incorporated herein by reference are utilized for
counting runs of zeros on parallel processing architectures.
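The effect of run-of-zeros conversion can be shown with a simple sequential sketch. This is not the piling algorithm of the referenced application, which addresses parallel processing architectures; the function names and the pair format are assumptions for illustration only.

```python
def zero_runs_encode(coeffs):
    """Convert coefficients to (zeros_before, value) pairs.

    A trailing run of zeros is stored as (run_length, None).
    """
    pairs, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    if run:
        pairs.append((run, None))
    return pairs


def zero_runs_decode(pairs):
    """Rebuild the original coefficient list from the pairs."""
    out = []
    for run, value in pairs:
        out.extend([0] * run)
        if value is not None:
            out.append(value)
    return out
```

After quantization most fine-detail coefficients are zero, so the number of pairs handed to the entropy coder is typically far smaller than the number of coefficients.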
[0246] In one embodiment, cycle-efficient entropy coding can be
used. For example, the entropy coding step of the compression
process can be accomplished using techniques that combine the
traditional table lookup with direct computation on the input
symbol. Because the symbol distribution has been characterized,
such simple entropy coders as Rice-Golomb or exp-Golomb or Dyadic
Monotonic can be used. The choice of entropy coder details
depends on the processor platform capabilities. The methods
disclosed in U.S. patent application Ser. No. 10/447,467
incorporated herein by reference, and U.S. patent application Ser.
No. 11/232,726 incorporated herein by reference, may be
utilized.
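As an example of an entropy code obtained by direct computation on the input symbol, the following sketch implements the standard order-0 exp-Golomb code. It is offered only as an illustration of one of the coder families named above, not as the method of the referenced applications; the function names and the use of bit strings instead of a packed bit buffer are assumptions chosen for readability.

```python
def signed_to_unsigned(s):
    """Map 0, 1, -1, 2, -2, ... to 0, 1, 2, 3, 4, ... so small magnitudes get short codes."""
    return 2 * s - 1 if s > 0 else -2 * s


def unsigned_to_signed(u):
    """Inverse of signed_to_unsigned."""
    return (u + 1) // 2 if u % 2 else -(u // 2)


def exp_golomb_encode(v):
    """Order-0 exp-Golomb code of a non-negative integer, as a bit string."""
    code = bin(v + 1)[2:]                  # binary form of v + 1
    return "0" * (len(code) - 1) + code    # prefix of len - 1 zeros


def exp_golomb_decode(bits, pos=0):
    """Decode one value starting at pos; return (value, next_pos)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bits[pos + zeros:end], 2) - 1, end
```

The encoder needs no code table at all: the codeword is derived from the symbol with a bit-length computation and a shift, which is the kind of operation that maps well onto simple processors.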
[0247] Aspects of the present disclosure also enable video editing
processes or operations to be accomplished in the wavelet
transformed domain, thereby saving both the computation of inverse
wavelet transforms and forward wavelet transforms, and also saving
computation by accessing and modifying fewer data items than would
be required for the conventional operations on pixel data values.
Examples of such video editing processes accomplished in the
wavelet transform domain with reduced computation further include,
but are not limited to:
[0248] 1. Fade to Black
[0249] In one embodiment, starting with wavelet transform data
(coefficients rather than pixel data values), decrease the Luma DC
coefficient of a picture (or of each block if there are blocks) by
some amount in each time step, thus making the overall brightness
level decrease smoothly to black. The decrease stops when the DC
level has reached full black, or can be continued beyond full black
to assure that all parts of the picture have reached black. Note
that about 1/256 of the data is accessed and/or modified at each
time step, in the case of block transforms as used in Droplet's
current commercial codecs, or about 1/300,000 of the data in the
case of a non-blocked full-transform wavelet implementation.
[0250] 2. Fade to White
[0251] This is similar to Fade to Black except that the DC
coefficients are increased progressively toward the full-brightness
level, according to one embodiment.
[0252] 3. Blur Out
[0253] Starting with the same wavelet transform data coefficients,
at each successive time step we replace the next coefficient in
order of fine-to-coarse spatial detail, in one embodiment. This
order corresponds to the "reverse zigzag scan" order of
coefficients in JPEG and MPEG encoding. At the final time step we
replace the DC coefficient with the value representing middle gray.
The replacements are done within each block of the picture, in the
case of block transforms. This process has the effect of blurring
the image until no information remains. Note that about 1/256 of
the data is accessed and/or modified at each time step.
[0254] 4. Cheshire Fade (Fade to Fine Detail)
[0255] Starting with the same wavelet transform data coefficients,
at the first time step we replace the DC coefficient with the value
representing middle gray, in one embodiment. At each successive
time step we replace the next coefficient in order of
coarse-to-fine spatial detail. This order corresponds to the
"zigzag scan" order of coefficients in JPEG and MPEG encoding. The
replacement is done within each block of the picture, in the case
of block transforms. Note that about 1/256 of the data is accessed
and/or modified at each time step.
[0256] 5. Color Correction
[0257] Starting with the same wavelet transform data coefficients,
we modify the Chroma DC coefficients only. This has the effect of
modifying color balance across the entire image, and only needs to
access and modify about 1/256 or fewer of the data elements.
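The five editing operations above can be sketched on a hypothetical data layout in which each block is a list of wavelet coefficients ordered from coarse to fine, with the DC coefficient at index 0. The layout, the function names, the DC levels for black, middle gray, and white, and the choice of zero as the replacement value for non-DC coefficients are all assumptions made for the example. Every function touches exactly one coefficient per block per time step, which is where the figure of about 1/256 comes from if a block holds 256 coefficients.

```python
# Assumed DC levels for an 8-bit picture; real codecs may scale DC differently.
BLACK_DC, MID_GRAY_DC, WHITE_DC = 0, 128, 255


def fade_step(luma_blocks, delta):
    """One time step of a fade: shift each luma DC by delta, clamped to range.

    A negative delta fades toward black, a positive delta toward white.
    """
    for block in luma_blocks:
        block[0] = max(BLACK_DC, min(WHITE_DC, block[0] + delta))


def blur_out_step(blocks, step):
    """Blur Out: step 0 clears the finest coefficient, and the final
    step (step == len(block) - 1) sets DC to middle gray."""
    for block in blocks:
        index = len(block) - 1 - step
        block[index] = MID_GRAY_DC if index == 0 else 0


def cheshire_step(blocks, step):
    """Cheshire Fade: step 0 sets DC to middle gray, and later steps
    clear coefficients in coarse-to-fine order."""
    for block in blocks:
        block[step] = MID_GRAY_DC if step == 0 else 0


def color_correct(chroma_blocks, offset):
    """Color Correction: shift the chroma DC of every block."""
    for block in chroma_blocks:
        block[0] += offset
```

None of these functions performs an inverse or forward transform, so the cost of an edit is set by the number of blocks and not by the number of pixels.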
Improved Mobile Imaging Handset Architecture
[0258] FIG. 10 depicts a mobile imaging handset architecture
enabled by aspects of the present disclosure, according to one
embodiment.
Improved Video Processing and Applications Server Architecture
[0259] FIG. 11 depicts a video processing and applications server
architecture, in which separate line cards containing specialized
data processing elements, including, but not restricted to: digital
signal processors (DSPs), application specific integrated circuits
(ASICs), multimedia processors, and reconfigurable processing
devices (RPDs), are replaced by general-purpose personal computer
(PC) central processing units (CPUs), according to one
embodiment.
[0260] FIG. 12 depicts a video processing and applications server
architecture, in which wavelet-based SW video transcoders and
editing, storing, searching, retrieval applications according to
the present disclosure replace DCT-based video processing running
on digital signal processors (DSPs), application specific
integrated circuits (ASICs), multimedia processors, and
reconfigurable processing devices (RPDs), according to one
embodiment.
[0261] Various embodiments of the present disclosure provide
enhancements to the MMS applications server architecture. For
example, several implementation options can be considered for the
SW wavelet-based video processing and applications in the improved
video processing and applications server (see FIG. 13). The imaging
application can be installed via OTN download to the multimedia
processing section of the server. The imaging application can also
be installed during manufacturing, at point-of-sale, or during
installation, to the multimedia processing section of the server.
Additional implementation options are also possible.
[0262] According to aspects of the present disclosure, performance
of the video processing and applications server may be improved,
and costs and power consumption may be reduced, by accelerating
some computational elements via HW-based processing resources in
order to take advantage of ongoing advances in mobile device
computational HW (ASIC, DSP, RPD) and integration technologies
(SoC, SIP). Several all-HW options can be implemented for
integrating these hardware-based processing resources in the server
(see FIG. 14).
[0263] As shown in FIG. 15, hybrid architectures offered by aspects
of the present disclosure for the video processing applications may
offer enhancements by implementing some computationally intensive,
repetitive, fixed functions in HW, and implementing in SW those
functions for which post-manufacturing and post-installation
modification may be desirable or required, according to one
embodiment.
Improved Video Editing, Archiving, and Retrieval System
[0264] FIG. 16 illustrates the functions and elements of a
distributed video editing system for broadcast applications,
including elements that support video archival and retrieval
functions, according to one embodiment. Aspects of the current
disclosure simplify and accelerate the design and deployment of
such distributed virtual studio systems that can support mobile
personal "media producer" services, rather than just commercial
broadcast services. Embodiments of the present disclosure further
include one or more of:
[0265] 1. Reduced computational complexity for video encode, decode,
and editing;
[0266] 2. Performing video editing steps or operations in the wavelet
transformed domain, thereby saving both the computation of inverse
wavelet transforms and forward wavelet transforms, and also saving
computation by accessing and modifying fewer data items than would be
required for the conventional operations on pixel data values;
[0267] 3. Allows mobile video applications to run on low-cost,
low-power PC CPUs, rather than specialized, expensive, power-hungry
DSPs or ASICs;
[0268] 4. Fewer, less expensive, PC-based servers can replace a larger
number of specialized video application servers, reducing deployment
& operational cost per subscriber;
[0269] 5. Reduces the computing power required to execute all of the
real-time functions of an online editing system, and allows a
software implementation of these real-time editing functions;
[0270] 6. Enables end user support for automated video editing and
other processing via a SW client on mobile handsets, personal media
players, laptop computers, and personal computers, in addition to
end-user workstations;
[0271] 7. Enables end-to-end functionality and performance of mobile
video services. These are
enabled by passing information, such as anti-shake camera motion
compensation information, that is captured in the imager module in
a mobile device, to one or more of: a subsequent video codec in the
handset devices, a video processing applications server in the
mobile network, and/or a receiving video playback device. Such
information can then be used to further reduce the computational
requirements of the video codec, for example by providing
additional motion compensation information that must otherwise be
extracted by the video codec from the input video data.
[0272] Such information can also be used to further compensate for
camera motion, which occurs during video capture in the mobile
device, during editing and further video processing that is
subsequently carried out in a video processing applications server
in the mobile network. Such information can also be used to
recreate the effects of camera motion, which may have been
previously removed via video preprocessing in the camera module in
the mobile device, during editing and further video processing that
is subsequently carried out in a video processing applications
server in the mobile network, and/or in a receiving video playback
device, in one embodiment.
Improved Mobile Imaging Service Platform Architecture
[0273] Components of an improved mobile imaging service platform
architecture according to embodiments of the present disclosure
(see FIG. 17) include one or more of:
[0274] Mobile Handsets
[0275] Mobile Base stations (BTS)
[0276] Base station Controller/Radio Network Controller (BSC/RNC)
[0277] Mobile Switching Center (MSC)
[0278] Gateway Service Node (GSN)
[0279] Mobile Multimedia Service Controller (MMSC)
[0280] Imaging Service Download Server
[0281] Functions included in the MMSC (see FIG. 17) include one or
more of:
[0282] Video Gateway
[0283] Telco Server
[0284] MMS Applications server
[0285] Storage Server
[0286] Embodiments of the present disclosure include processes for
deploying the improved imaging service platform, including one or
more of:
Process 1.
[0287] Signal network that video editing/processing applications
are available for updating deployed MMSCs. The update can be
installed via automated OTN deployment or via manual
procedures;
Process 2.
[0288] Install and configure video editing/processing SW
applications via automated OTN deployment or via manual procedures
(see FIG. 17);
Process 3.
[0289] Signal subscriber handset that Mobile Video Imaging
Application is available for download and installation;
Process 4.
[0290] If accepted by subscriber, and transaction settlement is
completed successfully, download and install Mobile Video Imaging
Application;
Process 5.
[0291] Signal network that handset upgrade is complete. Activate
service and related applications. Update subscriber monthly billing
records to reflect new charges for Mobile Video Imaging
Application;
[0292] According to one embodiment of the present disclosure, FIG.
18 shows "self-decoding" video MMS functionality achieved by
integrating the SW decoder with the transmitted video stream, which
eliminates the need for transcoding and allows existing video
processing/applications servers to process the video format
provided by the current disclosure.
[0293] According to aspects of the current disclosure, FIG. 19
depicts OTN upgrade of deployed video processing and applications
server.
[0294] According to one embodiment of the present disclosure, FIG.
20 shows the reduction in complexity, cost, and number of video
application servers required to deploy media producer services such
as automated or manual editing of user-created video, as well as
database storage, search, and retrieval of user-created video.
[0295] According to one embodiment of the present disclosure, FIG.
21 shows the functional processes of a video
messaging/sharing/calling platform incorporating the improved
wavelet-based codec/camcorder application, and improved video
editing/processing, and database storage, search, and
retrieval.
[0296] According to one embodiment of the present disclosure, FIG.
22 shows the benefits in terms of faster, lower cost development
and deployment of higher quality multimedia handsets &
services, including the ability to deploy an innovative personal
multi-media market place platform in which users can preview,
share, buy, and sell "soft" copies (download) or "hard" copies
(DVD) of user-created audio/video content. The present disclosure
also allows for more efficient video "tagging" for database
indexing and network (RSS) feeds, and supports interfaces to
existing web-based market places such as E-bay, Google, Yahoo,
Microsoft, and other portals.
[0297] According to one embodiment of the present disclosure, FIG.
23 shows applications of the above video messaging/sharing/calling
platform incorporating the improved wavelet-based codec/camcorder
application and improved video editing/processing and database
storage, search, and retrieval, to deploy new video services on
fixed wireless, mobile wireless, and wireline networks, as well as
"converged" networks combining elements of fixed wireless, mobile
wireless, and wireline architectures.
[0298] The present disclosure, with its wavelet-based mobile video
imaging application, handset architecture, and service platform
architecture achieves the goal of higher mobile video image
quality, lower handset cost and complexity, and reduced service
deployment costs, in one embodiment.
[0299] The imaging solution of the present disclosure substantially
reduces processor costs and requirements in video editing servers.
Combined with the ability to install and upgrade the video editing
application post-production via OTN download, this SW solution can
substantially reduce the complexity, risk, and cost of video
messaging and sharing service deployment.
[0300] The present disclosure provides mobile operators with the
first mobile video messaging and sharing platform that delivers the
video quality and service deployment costs required for mass-market
adoption by consumer and enterprise customers. The present
disclosure provides a SW camcorder phone application capable of
real-time capture of full (VGA)-size images (640x480 pixels)
at 30 frames per second (fps), using only a single standard RISC
processor already incorporated in the vast majority of multimedia
handsets. For mobile carriers, the present disclosure's
low-complexity video processing and distribution technologies are
integrated into a powerful software platform that enables turnkey
deployment using existing mobile handsets and mobile Multimedia
Messaging Service (MMS) infrastructure.
[0301] Complementing the above SW mobile camcorder application, the
present disclosure's content management platform provides carriers
with modules for integrating compressed images and videos,
according to the present technology, together with sounds and text
into complete mobile multimedia messages and "ring-tones", along
with on-the-fly editing, thumbnail previews, multimedia mailboxes,
on-line repository, sharing, and marketing services, and
subscription management, according to one embodiment.
[0302] A typical video data segment may also include or
incorporate other types of data. Such other data may include audio
data captured concurrently with the video. It may also include
other data including metadata comprising time of capture, location
information (derived from GPS, mobile cell tower location, from
scene recognition from camera image data, wireless transmitter
(e.g., WIFI) identification, etc.), user identification, mobile
device identification, user added information (including user
responses to service queries, including video service queries,
titling, naming, later added annotation audio data).
[0303] The data may also include metadata derived from the video
and audio data being captured or derived from the process of video
capture. It may also include metadata derived from additional
sensor devices, for example jitter data derived from a gyroscope or
angular rate sensor. This additional data can be used in various
ways in the video editing, storing, search, retrieval, location
identification, integration with advertising, video offerings to
camera users and other services described in this application.
[0304] In one embodiment, video delivery is based on the location
of a user. For example, mobile devices (e.g., a cellular phone, a
BlackBerry, etc.) may include GPS tracking functionalities; thus the
location of the mobile device user can be identified for delivering
geographic specific video data to the user.
[0305] The location based videos can be of scenic tours as
determined by the user's location. For example, a tourist in Austria
who is unfamiliar with Vienna may request information specific to
Vienna from a mobile device. Streaming video of a live tour may be
delivered to the user, for example, upon request, and/or
automatically based on user settings. Similarly, video templates
that are geographic specific can be suggested and/or provided to
users based on an identified geographic location of the user.
[0306] Such dynamic template suggestions can be automatically
provided or provided upon request based on user settings. For
example, if the user is shooting videos in New York City,
templates of, for example, the Empire State Building, the Statue of
Liberty, etc. may be provided to the user.
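A minimal sketch of such location-based template suggestion follows, assuming a small in-memory catalog of templates tagged with approximate landmark coordinates and a great-circle (haversine) distance test. The catalog contents, the 25 km default radius, and the names `TEMPLATE_CATALOG` and `suggest_templates` are hypothetical and not part of the disclosure.

```python
import math

# Hypothetical catalog: (template name, approximate latitude, longitude).
TEMPLATE_CATALOG = [
    ("Empire State Building", 40.7484, -73.9857),
    ("Statue of Liberty", 40.6892, -74.0445),
    ("Vienna State Opera", 48.2030, 16.3690),
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in degrees."""
    earth_radius_km = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def suggest_templates(lat, lon, radius_km=25.0):
    """Return template names within radius_km of the user, nearest first."""
    hits = [(haversine_km(lat, lon, t_lat, t_lon), name)
            for name, t_lat, t_lon in TEMPLATE_CATALOG]
    return [name for dist, name in sorted(hits) if dist <= radius_km]
```

For a user located in midtown Manhattan, the two New York templates would be suggested and the Vienna template would not.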
[0307] In addition, enhancements for videos could be provided based
on a geographic location of the user. For example, pre-recorded
videos could be provided on demand or automatically, of a current
location of a user, for enhancing the videos taken by a user. In
one embodiment, targeted advertising could be provided based on, for
example, user data.
[0308] The user data can include geographical data, age data, and
subscription data. User data may be gathered from various sources,
such as information provided by the user, billing information,
subscription information, real-time gathered information (e.g.,
call records, geographic location of the user, etc.).
[0309] Such user data may be utilized to determine user preferences
and hobbies for example to deliver targeted advertisements. In some
embodiments, service fee offsets can be provided for advertisements
that are viewed.
[0310] One or more embodiments of the present disclosure can be
embodied in a system, for example, in an exemplary embodiment, a
video editing service system (e.g., an automated video editing
service system), described below.
Automated Video Editing Service (AVES) System
[0311] An example of a preferred embodiment of the present
disclosure may comprise an automated video editing service (AVES).
FIG. 24a shows a possible schematic overview of components that may
comprise the makeup of AVES, in accordance with the present
disclosure.
[0312] In one embodiment, the video editing service provides video
editing services, in response to a user request, generated for
example, via a portable device with video capturing
functionalities. The video editing can be provided to multiple
users simultaneously via a routing system that distributes requests
to several video processors.
[0313] In one embodiment, a router tracks the operation status of
one or more video processors such that new processing tasks are
routed to video processors that are not in operation, or routed to
a processor that has a smaller queue of tasks. Thus, multiple video
processing tasks can be conducted simultaneously.
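The routing rule described above may be sketched as a least-loaded dispatcher: an idle processor has an empty queue and is therefore always preferred, and otherwise the processor with the smallest queue is chosen. The `Router` class and its method names are hypothetical and serve only to illustrate the rule.

```python
class Router:
    """Tracks per-processor task queues and routes new tasks to the least loaded."""

    def __init__(self, processor_ids):
        # processor id -> list of pending tasks (empty list means idle)
        self.queues = {pid: [] for pid in processor_ids}

    def route(self, task):
        """Assign task to the processor with the shortest queue and return its id."""
        pid = min(self.queues, key=lambda p: len(self.queues[p]))
        self.queues[pid].append(task)
        return pid

    def complete(self, pid):
        """Record that the processor finished its oldest task."""
        return self.queues[pid].pop(0)
```

Ties are broken by the order in which processors were registered, which is an arbitrary choice made for this sketch.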
[0314] In one embodiment, the video editing service provides one or
more templates to a user via a mobile device. The one or more
templates can be pre-stored on the mobile device. In some
embodiments, the templates can be downloaded via the mobile device,
either for example, based on user request, or automatically based
on user settings, etc. In some embodiments, the templates can be
provided based on user-specific data, the data can be collected via
one of many processes, such as, for example, GPS functionalities,
triangulation data obtained via towers, user subscription data,
etc. In one embodiment, the templates are provided to users for a
fee.
[0315] A request for video edit may include a video clip and a
template, as chosen by the user. The video may be clipped by the
user prior to sending the request. Upon receiving the request for
video edit, the video server processes the request and performs the
edits specified by the user. The edits may comprise using the video
edits associated with the template chosen by the user.
[0316] At least a portion of the edited video can be sent back to
the user such that real-time reviewing of the edits is facilitated.
Upon user indication of approval of edits, the full length version
of the edited video can be sent back to the mobile device. In one
embodiment, the edited video can be sent to multiple recipients to
be received via a mobile device, at the request of the user. In
some embodiments, the edited video can be offered to multiple
users, based on user settings, for example, to receive videos
relating to a particular subject matter.
[0317] With reference to FIG. 24a, AVES may comprise a Video
Editing Service Client (VESClient) 2410. The VESClient may comprise
a mobile application that may run on any designated mobile
operating system. Preferably, the VESClient may comprise an
application that interfaces the AVES to send and edit video that is
encoded. In accordance with an embodiment of this invention, the
VESClient may connect with AVES via WiFi. It should be noted that
any known method or any other method developed in the future may be
used to connect the VESClient with AVES.
[0318] The VESClient may comprise many features. The features
comprise, but are not limited to, one or more of, a title screen
(which may comprise options such as, for example, send/edit video,
My Friends, My Studio, My Videos), ability to select one or more
videos, support AVI containers with MP3 or WMA audio tracks,
ability to trim (i.e., crop a video to a certain length) video,
ability to preview trimmed (i.e., cropped) video, ability to redo
trimming, ability to title video, show activated templates, splice
multiple videos together, send video and/or audio sequences to
AVES, preview edited video resultant file (in some embodiments,
this preview may start within 10 seconds of video upload
commencement), display recipient list based on the user's contacts
(which may be managed through a website), functions to receive
input of an intended recipient's phone number and/or email address
directly, and show a summary of the last N videos sent from
VESClient to AVES (where N can be any integer). In one embodiment,
a wavelet codec (e.g., a 3D wavelet codec) can be used for video
compression.
[0319] In one embodiment, the AVES includes one or more Services
Switch Points (SSP) 2420. FIG. 24b depicts an exemplary embodiment
of the SSP in connection with N VESClients and X TPs (where N and X
are integers that may or may not be equal to one another). The SSP
may comprise a switch that handles incoming client connections and
assigns them to a Template Processor that is available for video
editing. The SSP may perform load balancing and may be able to form
a distributed network in order to scale the number of concurrent
VESClients that can be connected at any time, to the template
processor or an array of template processors, for example.
[0320] The SSP may comprise many features. The features may
comprise, but are not limited to, one or more of: handling
connections from one or more VESClients and one or more Template
Processors to one or more SSPs. The SSP can be configured by
specifying parameter values in a configuration file.
[0321] In one embodiment, the system can accept a user login and
determine if the login is legal in the AVES database. In one
embodiment, the system can determine if there is an available
Template Processor to process a VESClient edit request. In one
embodiment, the system can communicate a busy status to a VESClient
based on a determined Template Processor status. In one embodiment,
the system may be able to get contacts and/or template information
from the AVES database and send this information to VESClient. In
one embodiment, the system receives editing requests from VESClient
and stores this information into a database. In one embodiment, the
system is further able to merge template and editing information,
and to receive data chunks from VESClient and forward the data
chunks to a Template Processor.
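The assignment, busy-status, and chunk-forwarding behavior described above may be sketched as follows. This is an illustrative in-memory model only; the class `ServicesSwitchPoint`, its method names, and the tuple-based replies are assumptions and do not reflect any particular wire protocol.

```python
class ServicesSwitchPoint:
    """Assigns each client to a free Template Processor (TP) or reports busy."""

    def __init__(self, tp_ids):
        self.assigned = {tp: None for tp in tp_ids}  # TP id -> client id or None
        self.inbox = {tp: [] for tp in tp_ids}       # TP id -> forwarded chunks

    def handle_edit_request(self, client_id):
        """Return ("assigned", tp) if a TP is free, otherwise ("busy", None)."""
        for tp, owner in self.assigned.items():
            if owner is None:
                self.assigned[tp] = client_id
                return ("assigned", tp)
        return ("busy", None)

    def forward_chunk(self, client_id, chunk):
        """Forward one uploaded data chunk to the TP serving this client."""
        for tp, owner in self.assigned.items():
            if owner == client_id:
                self.inbox[tp].append(chunk)
                return tp
        raise KeyError("client has no assigned Template Processor")

    def release(self, tp):
        """Mark a TP as available again once its job has finished."""
        self.assigned[tp] = None
```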
[0322] In one embodiment, the system may be able to obtain a
summary of a predetermined number of videos (e.g., the last 10
videos) uploaded by this user from the AVES database and send this
information to the VESClient. The system may further be able to
receive a preview request from VESClient and forward the request to
a Template Processor, and to receive a data stream (e.g., AVI data
stream) for preview from the Template Processor and forward it to
VESClient. The system may further receive a preview `skip` from
VESClient and notify the Template Processor. In one embodiment, the
system may further receive a preview `cancel` from VESClient and
notify the Template Processor.
[0323] In one embodiment, the system may be able to scan the AVES
database to identify new scheduled jobs, for example, to manage the
Template Processor processes (e.g., launching or destroying). In one
embodiment, the system may be able to send new video notifications
to recipients after the video is edited.
[0324] The AVES may also comprise one or more Template Processors
(TP) 2430. The TP may receive editing requests from the SSP. For
example, editing requests may be sent to the TP for processing and
the TP may also provide a scaled down preview version of the video
for streaming back to the VESClient in real-time as the edited
video is being composed. In one embodiment, the TP may further be
able to monitor the disk usage of each user to prevent users from
consuming too much storage.
[0325] The TP may comprise a software system particularly
configured to accomplish the, or a part of the, video editing
processes of the AVES. A plurality of discrete TPs may
simultaneously operate on the same hardware platform and share the
same processor or set of processors. The TPs may be configured in
an array so that the SSP can direct VESClient needs to one of the
available TPs. Thus, this architecture is highly scalable and can
be built using relatively low cost generic platforms (i.e., not
custom video editing hardware platforms) that have the plurality of
software TP engines available on each platform. Each hardware
platform may have its own SSP or, in some embodiments, an SSP on
one hardware platform can functionally operate with TPs on
different hardware platforms.
[0326] The TP may comprise many features. The features may comprise
but are not limited to, one or more of, receiving editing
information from SSP (e.g., the editing information may contain the
XML description of a video editing template, in addition to other
meta data used to generate the edited video), parsing editing
information in order to determine how to compose the resultant
video.
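As an illustrative sketch of parsing such editing information, the following reads a small XML template into an ordered list of composition steps using Python's standard `xml.etree.ElementTree` module. The element and attribute names (`template`, `slot`, `transition`, `audio`, `clips`, `weight`, and so on) are hypothetical; the disclosure states only that the template may be described in XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical template description; the schema is assumed for illustration.
SAMPLE_TEMPLATE = """<template name="birthday">
  <slot index="1" clips="1"/>
  <transition type="cross-fade" duration="1.5"/>
  <slot index="2" clips="2"/>
  <audio src="music.mp3" weight="0.4"/>
</template>"""


def parse_template(xml_text):
    """Return (template name, ordered list of composition steps)."""
    root = ET.fromstring(xml_text)
    steps = []
    for el in root:
        if el.tag == "slot":
            steps.append(("slot", int(el.get("index")), int(el.get("clips", "1"))))
        elif el.tag == "transition":
            steps.append(("transition", el.get("type"), float(el.get("duration", "1.0"))))
        elif el.tag == "audio":
            steps.append(("audio", el.get("src"), float(el.get("weight", "1.0"))))
    return root.get("name"), steps
```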
[0327] In an embodiment of the present disclosure, the uploaded
data stream (which may be AVI) can be received from SSP. In one
embodiment, the original raw data file can be stored for uploaded
videos. In addition, video and audio from an AVI file may be
de-multiplexed. Additional embodiments may include receiving edited
video and audio (the TP may be directed by information contained in
the associated template), and re-multiplexing the edited video and
audio on the server, and/or storing the edited data as an AVI
file.
[0328] One embodiment further comprises one or more of the ability
to splice videos into the beginning or end of uploaded feeds, to
splice multiple videos together, to center or stretch-to-fit still
pictures which have a different resolution than the target edited
video, to mix audio tracks, to produce some transition effects
between video segments according to the selected template or by
analyzing scene transitions (the transition effects may comprise
wipe, cross-fade, dissolve, fly, magnify, blinds, checker, and
appear, among other possibilities), to convert the video segments
to black & white or sepia, to add a time stamp, date stamp,
and/or location stamp to the video, to create a slide show from the
edited video, to produce a preview AVI stream (the video may
comprise a reduced frame size and reduced frame rate).
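The center and stretch-to-fit options for still pictures mentioned above may be sketched as a computation of the destination rectangle within the target frame. The function name `place_still` and the mode strings are assumptions made for this sketch.

```python
def place_still(src_w, src_h, dst_w, dst_h, mode):
    """Return (x, y, width, height) at which a still is drawn in the target frame.

    "stretch" scales the picture to fill the whole frame; "center" keeps the
    original size and centers it (negative offsets mean the picture is
    larger than the frame and would be cropped).
    """
    if mode == "stretch":
        return (0, 0, dst_w, dst_h)
    if mode == "center":
        return ((dst_w - src_w) // 2, (dst_h - src_h) // 2, src_w, src_h)
    raise ValueError("mode must be 'center' or 'stretch'")
```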
[0329] Embodiments of the present disclosure further comprise one
or more of the ability to receive a `Preview` request from SSP and
then send back a preview video data stream, to receive a `Preview
skip` request from SSP and then stop producing the preview stream,
to merge several media objects into an AVI according to which
template the user has selected, to apply video and audio effects
(which may be based on the user selected template), to apply color
morphing on video segments, to apply a black and white transform on
video segments, to support animation and background overlays for
video segments, and/or to insert text captions for
video segments.
[0330] The AVES may also comprise a website 2440. The website may
be used by users to see a list of their videos. For example, the
videos may be created by the user or sent to the user by other
users. The website may also be used by users to edit contact
information, activate templates, and activate media. In one
embodiment, the website may also provide the user with one or more
of the ability to invite friends to view videos, to launch the
template editor, to activate audio tracks, to display videos by
locations, to display a list of videos uploaded by or sent to the
user, purchase new templates, and allow user to maintain the
contact list. It should be noted that this is not an exhaustive
list of features available to user via the website. Other features
can be included.
[0331] The AVES may also comprise a template editor (TE) 2450. In
one embodiment, the TE is a tool used to create custom templates.
The template editor may be Flash-based and run in the user's
browser. In one embodiment, the TE may interact with AVES over the
internet, or any other type of network, such as a LAN or WAN.
[0332] In some embodiments, the TE may further provide one or more
user services, such as: create a template that may combine video
segments comprising transition components, background music, and/or
still pictures; add media into the template; insert transitions
between two video segments; mix audio tracks; preview pictures,
audio tracks, and/or video elements; set properties on pictures
(such as, for example, if the pictures need to be centered or
stretch-to-fit); set the duration for showing still pictures and
transitions; insert video slots into the template; upload the
produced templates to the AVES; preview how templates would work;
set properties for each video and audio element; set the properties
for video segments, for example: color morphing, animation,
background overlays, and/or text captions; set the weight of an
audio track relative to other tracks it overlays; and show the
user's activated templates.
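Setting the weight of an audio track relative to the tracks it overlays may be sketched as a normalized weighted sum of samples. The representation of tracks as equal-length lists of floating-point samples and the normalization by the sum of weights are assumptions chosen for illustration.

```python
def mix_tracks(tracks, weights):
    """Mix equal-length sample lists using relative per-track weights.

    Weights are normalized by their sum, so the mix stays in the same
    range as the inputs.
    """
    if not tracks or len(tracks) != len(weights):
        raise ValueError("need one weight per track")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    length = len(tracks[0])
    if any(len(t) != length for t in tracks):
        raise ValueError("tracks must have equal length")
    return [sum(w * t[i] for t, w in zip(tracks, weights)) / total
            for i in range(length)]
```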
[0333] The AVES may also comprise PHP Services 2460, which may
comprise a set of PHP files used by the Website and/or the TE to
allow access to the AVES database. These PHP files may also provide
support for the Receiving PC Application to get notifications about
when a new video has become available.
[0334] The AVES may also comprise a Receiving PC Application
(ReceivingApp). The Receiving PC application may be an application
that periodically polls the servers to see if a user has new videos
sent to them. It may be set up to require a user id and password to
login. It may also be able to be used to launch the website when
new videos arrive, and it may also be able to detect if the wavelet
codec (e.g., 3D wavelet codec) is installed. If the codec is not
present, the Receiving PC application may install it. The codec may
be obtained from the server.
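The periodic polling behavior of the Receiving PC application may be sketched as the loop below. All network and installer interactions are passed in as callables, which are hypothetical stand-ins (`fetch_new`, `on_new`, `codec_installed`, `install_codec`); checking for the codec only when new videos arrive is also an assumption of this sketch.

```python
import time


def poll_for_new_videos(fetch_new, on_new, codec_installed, install_codec,
                        interval_s=60.0, max_polls=None, sleep=time.sleep):
    """Poll the server for new videos; return the number of polls performed.

    fetch_new() returns a list of new videos (possibly empty). When videos
    arrive, the codec is installed first if it is missing, then on_new is
    called with the list. max_polls=None polls indefinitely.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        videos = fetch_new()
        if videos:
            if not codec_installed():
                install_codec()
            on_new(videos)
        polls += 1
        if max_polls is None or polls < max_polls:
            sleep(interval_s)  # wait before the next poll
    return polls
```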
[0335] The AVES may also comprise a Database 2470. The AVES
database may be configured to hold information about, among other
items, user accounts, uploaded videos, edited videos, media,
templates, and scheduled jobs from the TE. It should be noted that
the Database may be configured to store any information
desired.
Example of AVES Setup
[0336] This example illustrates an example of the architectural
setup and usage patterns of an exemplary embodiment of the present
invention.
[0337] VESClient
[0338] 1. VESClient gets server information from a local config
file and uses this information to connect to the AVES. Preferably,
the user should not have to configure the server IP address
information, as this may be handled automatically.
[0339] 2. If the user logs into AVES for the first time, he/she has
to input his/her cell phone number. Later the VESClient can use
this stored number to login automatically.
[0340] 3. Login is successful if the cell phone number is known to
AVES.
[0341] 4. After logging into AVES, AVES will return back a template
list and recipient list to the VESClient.
[0342] 5. The user may select an activated template from the
template list, select some videos in the local system, and may
optionally trim some of them. (Trimming operations may be
implemented in another sub-window, in which the user can set the
start point and end point of the selected video, and the video will
be trimmed using the two selected points.) The user can trim the
selected video repeatedly until the user is satisfied.
[0343] 6. For a given template, the user may select an equal number
of videos as the number of slots in the template. There will be an
indicator in brackets to suggest how many clips a slot should
contain.
[0344] 7. The user may also provide a title for the video before
uploading.
[0345] 8. The VESClient may upload selected trimmed videos to the
AVES.
[0346] 9. The user may preview the edited video while upload is
occurring. The user may also cancel the preview directly or skip
the preview.
[0347] 10. After uploading is finished or if the user skips
previewing, the user may decide who will receive the edited video
by selecting contacts from their contact list. The user may also
manage their contacts from the phone.
[0348] 11. After sending the edited video to receivers, VESClient
may present the last 10 videos uploaded by this user.
[0349] 12. The user may return to the Home screen within the
VESClient.
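Two of the client-side steps listed above, trimming a video between a start point and an end point (step 5) and checking that the number of selected videos matches the number of template slots (step 6), may be sketched as follows. Modeling a video as a list of frames, and the function names, are assumptions for illustration only.

```python
def trim(frames, start, end):
    """Return the frames from start (inclusive) to end (exclusive).

    The original list is left unchanged, so the user can re-trim from the
    source repeatedly until satisfied.
    """
    if not 0 <= start < end <= len(frames):
        raise ValueError("need 0 <= start < end <= number of frames")
    return frames[start:end]


def selection_matches_template(num_slots, selected_videos):
    """True when exactly one video has been selected per template slot."""
    return len(selected_videos) == num_slots
```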
[0350] SSP (Services Switch Point)
[0351] For VESClient
[0352] 1. VESClient connects to the SSP and sends its cell phone
number. SSP may check to see if the cell phone number exists in the
database.
[0353] 2. If the number does not exist, the user will be informed
that they need to signup for an account.
[0354] 3. After logging in, SSP may retrieve the user's list of
templates and contact information from the AVES database and send
them to the VESClient.
[0355] 4. SSP may then receive editing information from the
VESClient.
[0356] 5. SSP may retrieve the details for the selected template
from the AVES database. SSP may then merge the template and the
editing information and save this editing information into the AVES
database.
[0357] 6. SSP may then send the merged information to the
corresponding TP.
[0358] 7. SSP then may receive an incoming data stream from
VESClient and forward the data directly to a TP.
[0359] 8. SSP may receive a request for preview from VESClient. The
SSP may then tell the corresponding TP to send back a video data
stream.
[0360] 9. SSP then may receive the video data stream from TP and
forward it to VESClient.
[0361] 10. TP may notify SSP when it has finished the editing
process. SSP may tell VESClient that the editing process has
finished and annotate the database.
[0362] 11. SSP may receive title and recipient information from
VESClient, and store this information in the database.
[0363] 12. SSP may receive a request for history from VESClient,
and then may return the last 10 videos created by the corresponding
user.
[0364] 13. After the entire process has finished, SSP may clean up
all related information in AVES (such as temporary video files and
database entries used during the editing process).
[0365] For TE
[0366] 1. SSP may check the scheduled "Try it now" job table
regularly.
[0367] 2. If there are jobs waiting, SSP may check if there is an
available TP for the job. If there is not an available TP, the job
may be held until there is an available TP.
[0368] 3. SSP may read information about the job and retrieve the
newly created template from the database. SSP may then merge the
template and job meta-data.
[0369] 4. SSP may send the merged job information to TP.
[0370] 5. After the TP completes the process, the TP may notify
SSP. SSP may then notify the Website that the process has been
completed by updating the database.
[0371] 6. After the entire process has finished, SSP may clean up
all related information in AVES.
[0372] TP (Template Processor)
[0373] 1. TP may receive the merged template and parse it.
[0374] 2. TP may receive the uploaded video data stream and store
it as an original file.
[0375] 3. TP may edit the data stream according to the parsed
template.
[0376] 4. TP may save the result as an edited file.
[0377] 5. TP may produce a video for preview (the preview will be
stored in memory).
[0378] 6. TP may receive a preview request and then send the
preview stream back to SSP.
[0379] 7. If the TP receives a request to skip preview, then TP may
stop producing the preview data stream and delete all preview
chunks in memory.
[0380] 8. After TP has finished, TP may notify SSP.
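The preview handling in TP steps 5 through 7 can be sketched as a
small in-memory buffer. This is only an illustrative sketch: the
class name PreviewBuffer and its methods are hypothetical and do not
appear in the disclosure, and a double-ended queue is assumed as the
in-memory store.

```python
from collections import deque


class PreviewBuffer:
    """Hypothetical in-memory store for preview chunks held by a TP."""

    def __init__(self):
        self._chunks = deque()
        self._producing = True

    def add_chunk(self, chunk: bytes) -> bool:
        """Store a newly produced preview chunk; ignored after a skip."""
        if not self._producing:
            return False
        self._chunks.append(chunk)
        return True

    def next_chunk(self):
        """Return the next chunk to send back to the SSP, or None if empty."""
        return self._chunks.popleft() if self._chunks else None

    def skip(self) -> None:
        """Skip-preview request: stop producing and delete all chunks."""
        self._producing = False
        self._chunks.clear()

    def __len__(self) -> int:
        return len(self._chunks)
```

In this sketch a skip request both discards buffered chunks and
rejects chunks produced afterwards, which mirrors step 7.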
[0381] Website
[0382] The website may have different options depending on whether
the site is being accessed by a user or an administrator. The website
may allow both users and administrators to log in and log out.
[0383] For a user the website may offer the ability to register,
manage contacts, manage video (for example, see sent videos,
received videos, uploaded videos, etc.), manage media (for example,
view music, purchase music, view background images, purchase
background images, view templates, purchase templates, etc.), and
create new templates.
[0384] For an administrator the website may offer the ability to
manage media (for example, create music, delete music, create
background images, delete background images, etc.), and manage
templates (for example, create and delete templates).
[0385] TE (Template Editor)
[0386] 1. User may launch TE from the website.
[0387] 2. User may add still pictures, videos, transitions, etc. to
the time line.
[0388] 3. User may set background music (and its duration) to video
slots in the time line.
[0389] 4. After the user finishes editing the template, the user
may upload the template to AVES.
[0390] 5. After uploading the template, the user may preview the
effect by clicking `Try it now`.
[0391] 6. `Try it now` may instruct the user to select videos
already uploaded to AVES for each of the template's empty
slots.
[0392] 7. When a `Try it now` job has been completed, the user may
watch the final result from the Website.
[0393] ReceivingApp
[0394] 1. ReceivingApp may be launched when Windows starts.
[0395] 2. A login dialog box may pop up when ReceivingApp launches
(this may only occur the first time, after that the user login
information may be cached).
[0396] 3. The user enters their Droplet Id and Password to log
in.
[0397] 4. ReceivingApp then connects to the Website.
[0398] 5. ReceivingApp may periodically check to see if there are
any edited videos for the current user.
[0399] 6. If there are edited videos for the current user,
ReceivingApp may pop up a balloon notification.
[0400] 7. If the user clicks on the balloon, the Website may be
launched.
[0401] 8. The user can right click on the ReceivingApp icon in the
status bar of Windows to open the Website.
[0402] Connections
[0403] As a distributed system, AVES components may be connected in
two ways; one is based on TCP, and the other is based on HTTP.
[0404] VESClient & SSP
[0405] The connections between the VESClient and SSP may be socket
and TCP based. There may be two connections between any VESClient
and SSP. One connection is for commands, which may be based on a
private binary protocol. The other connection is for the preview
data stream.
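The disclosure does not specify the layout of the private binary
protocol used on the command connection. Purely as an illustration
of how commands could be framed on a TCP stream, the following
sketch assumes a header made of a 2-byte command identifier and a
4-byte payload length in network byte order. The header layout, the
command code CMD_LOGIN, and both function names are assumptions.

```python
import struct

# Assumed header: 2-byte command id, 4-byte payload length, big-endian.
HEADER = struct.Struct("!HI")

CMD_LOGIN = 1  # hypothetical command code


def encode_message(command: int, payload: bytes) -> bytes:
    """Prefix the payload with the assumed fixed-size header."""
    return HEADER.pack(command, len(payload)) + payload


def decode_messages(buffer: bytes):
    """Split received bytes into complete (command, payload) messages.

    Returns the decoded messages and any trailing bytes that belong
    to a message that has not fully arrived yet.
    """
    messages = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        command, length = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # incomplete message; wait for more data
        messages.append((command, buffer[offset + HEADER.size:end]))
        offset = end
    return messages, buffer[offset:]
```

A length prefix of this kind lets the receiver recover message
boundaries, which TCP itself does not preserve.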
[0406] SSP & TP
[0407] The connections between SSP and TP may be socket and TCP
based. There may be two connections between the SSP and any given
TP. One connection is for sending messages from the SSP to the TP.
The other connection is for receiving the preview data stream from
TP to SSP. These messages may be based on a private binary
protocol.
[0408] TE & PHP Services
[0409] The connections between TE and PHP Services may be HTTP
based. These connections may be based on private HTTP
protocols.
[0410] ReceivingApp & PHP Services
[0411] The connections between the ReceivingApp and PHP Services
may be HTTP based. These connections may be based on private HTTP
protocols.
[0412] Login and Upload Process
[0413] With reference to FIG. 25, the following is an example of a
workflow of an exemplary embodiment of the present invention as it
relates to the login and upload processes of the AVES.
[0414] 1. VESClient sends login message with cell phone number to
SSP.
[0415] 2. SSP checks database to see if there is a record that
matches the cell phone number.
[0416] 3. SSP checks if there is an available TP for the
VESClient.
[0417] 4. Login succeeds if the cell phone number is matched and
there is an available TP. SSP then gets template and contact
information from the database.
[0418] 5. SSP returns template and contact information back to
VESClient.
[0419] 6. User may select template, select videos, and trim
selected videos.
[0420] 7. VESClient sends edit information to SSP.
[0421] 8. SSP gets the corresponding template details for the
request from the database, merges the template with the editing
information, and saves the editing information into database.
[0422] 9. SSP sends merged template to TP.
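The login check of steps 2 through 4 and the merge of step 8 can be
sketched as follows. The dictionary shapes, field names, and return
values are hypothetical and chosen only for illustration; the
disclosure does not define the stored format of templates or editing
information.

```python
def login(phone_number, users, processors):
    """Return (ok, reason); succeed only for a known number with a free TP."""
    if phone_number not in users:
        return False, "sign-up required"
    if not any(tp["available"] for tp in processors):
        return False, "no TP available"
    return True, "ok"


def merge_template(template, edit_info):
    """Combine template slots with the clips the user chose for each slot."""
    merged = {"template_id": template["id"], "slots": []}
    for slot, clips in zip(template["slots"], edit_info["clips_per_slot"]):
        merged["slots"].append({"name": slot, "clips": clips})
    return merged


# Example use with made-up data.
users = {"14085551234": {"templates": ["birthday"]}}
ok, reason = login("14085551234", users, [{"available": True}])
merged = merge_template(
    {"id": 7, "slots": ["intro", "main"]},
    {"clips_per_slot": [[{"file": "a.3gp", "start": 0, "end": 30}], []]},
)
```

The merged structure is what the SSP would forward to the TP in step
9 under these assumptions.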
[0423] Uploading and Preview
[0424] With reference to FIG. 26, the following is an example of a
workflow of an exemplary embodiment of the present invention as it
relates to the uploading and preview processes of the AVES.
[0425] 1. SSP sends received video chunks to TP.
[0426] 2. TP stores these chunks as original video files.
[0427] 3. TP edits these chunks according to the corresponding
template.
[0428] 4. TP stores edited chunks as edited video files.
[0429] 5. TP produces chunks for preview.
[0430] 6. User may decide when preview will start. SSP receives
preview request from User and then communicates to TP to begin the
preview.
[0431] 7. TP transmits preview chunks to SSP.
[0432] 8. SSP transmits preview chunks to VESClient.
[0433] 9. VESClient displays these chunks as a video within Windows
Media Player.
[0434] 10. User may skip the preview or wait for it to end.
[0435] 11. If the user skips the preview, SSP tells TP to stop
preview.
[0436] 12. After editing is complete, TP tells SSP that editing is
finished.
[0437] 13. SSP stores necessary information into database, and
cleans up the database.
[0438] After Uploading
[0439] With reference to FIG. 27, the following is an example of a
workflow of an exemplary embodiment of the present invention as it
relates to processes of the AVES that occur after uploading.
[0440] 1. User may select recipients that will receive the edited
video.
[0441] 2. VESClient sends selected title and recipients to SSP.
[0442] 3. SSP stores these recipients into the database.
[0443] 4. VESClient requests history of last 10 files uploaded.
[0444] 5. SSP gets history from database.
[0445] 6. SSP returns history back to VESClient.
[0446] 7. VESClient displays history.
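The history request of steps 4 through 6 amounts to a query for the
most recent entries belonging to one user. The sketch below uses an
in-memory SQLite database for illustration only; the table name,
column names, and function names are assumptions, and the disclosure
does not state which database engine the AVES uses.

```python
import sqlite3


def create_schema(conn):
    """Create a hypothetical table of edited videos."""
    conn.execute(
        "CREATE TABLE videos ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "user_id TEXT NOT NULL, "
        "title TEXT NOT NULL)"
    )


def record_video(conn, user_id, title):
    """Store the title of a newly uploaded, edited video."""
    conn.execute(
        "INSERT INTO videos (user_id, title) VALUES (?, ?)", (user_id, title)
    )


def last_videos(conn, user_id, limit=10):
    """Return the titles of the user's most recent videos, newest first."""
    rows = conn.execute(
        "SELECT title FROM videos WHERE user_id = ? ORDER BY id DESC LIMIT ?",
        (user_id, limit),
    )
    return [title for (title,) in rows]
```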
[0447] Receiving
[0448] With reference to FIG. 28, the following is an example of a
workflow of an exemplary embodiment of the present invention as it
relates to the receiving process of the AVES.
[0449] 1. Login dialog box pops up when Receiving Application
launches.
[0450] 2. The user enters id and password to login.
[0451] 3. The Receiving Application queries the PHP pages.
[0452] 4. PHP pages check the database to see if there is any video
sent to the user.
[0453] 5. If login is successful, Website will be launched.
[0454] 6. If there is a new video sent to user, an indicator will
pop up.
[0455] 7. The user can click on the indicator to go to the
website.
[0456] 8. In the website the user can see the video list or a map
with the available videos.
[0457] 9. User may also click the application on the computer
desktop to go to the website.
[0458] Login, Edit, & Upload of Templates
[0459] With reference to FIG. 29, the following is an example of a
workflow of an exemplary embodiment of the present invention as it
relates to the processes of logging in, editing, and uploading
templates of the AVES.
[0460] Edit
[0461] 1. When the user is creating templates, they can insert
media elements into the templates. These elements may include a
title, still pictures, videos, audio tracks, transition effects,
etc.
[0462] 2. When the user wants to specify media resources (e.g.
picture files) for elements, TE will send a request to the PHP
Services for the available resource list.
[0463] 3. PHP Services receives the request and queries the
database to find available resources. PHP Services then sends this
list back to TE.
[0464] 4. The user can select resources from this list.
[0465] Upload Templates
[0466] 1. After the user finishes creating a template, they may
click the "Upload" button to upload the template.
[0467] 2. An uploading request will be sent to PHP Services.
[0468] 3. PHP Services receives the template script and records the
template script into the database.
[0469] Try It Now
[0470] With reference to FIG. 30, the following is an example of a
workflow of an exemplary embodiment of the present invention as it
relates to the Try It Now function for templates of the AVES.
[0471] 1. The user may try out a newly created template by clicking
the "Try It Now" button.
[0472] 2. A request is sent to the PHP Service to query for the
available videos which are on the server. The PHP Service returns
the video information list to TE.
[0473] 3. A pop-up window lists these videos. The user selects the
appropriate number of video files from the list for the slots in
the template.
[0474] 4. TE submits an editing request to PHP Service.
[0475] 5. PHP Service schedules a new job by adding this job to
database.
[0476] 6. SSP polls the database regularly to check whether there
are newly scheduled jobs. If it finds a newly scheduled job, it
will look for an available TP to execute it. If an available TP is
found, this TP will process the job; otherwise, TE will have to
wait until a TP is available. After TP finishes processing, SSP may
remove this new job and add a new record to the results table.
[0477] 7. The user may determine when the video is available by
checking the video list on the webpage.
[0478] 8. After the editing process is completed, the user may
click the corresponding link for the edited video in the video list
page. This will cause the preview to be launched.
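The polling behavior of step 6 can be sketched as a single pass over
a job table. The function name poll_jobs, the list-based tables, and
the run_job callback are hypothetical; processing is synchronous
here for simplicity, whereas a real TP would presumably run the job
asynchronously.

```python
def poll_jobs(job_table, processors, results_table, run_job):
    """One polling pass: give each waiting job to a free TP, if any.

    Jobs that find no free TP stay in job_table for the next pass.
    Finished jobs are removed and recorded in results_table.
    """
    still_waiting = []
    for job in job_table:
        tp = next((p for p in processors if p["available"]), None)
        if tp is None:
            still_waiting.append(job)  # hold the job until a TP frees up
            continue
        tp["available"] = False
        output = run_job(tp, job)      # synchronous in this sketch
        tp["available"] = True
        results_table.append({"job_id": job["id"], "output": output})
    job_table[:] = still_waiting
```

Calling poll_jobs on a timer would approximate the regular database
polling described above.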
[0479] Example of User Experience
[0480] With reference to FIGS. 31-55, the following illustrates an
example of a user's experience with a preferred embodiment of the
present invention.
[0481] FIG. 31 illustrates an example of a user navigation map in
accordance with an embodiment of the present invention.
[0482] FIG. 32 depicts an example of a title screen or home page of
the VESClient. In the embodiment depicted in FIG. 32, the home
screen has 4 buttons: Send/Edit Video, My friends, My studio, and
My videos.
[0483] Clicking on the Send/Edit Video button guides the user to
the "Video-Template Select" Page, an example of which is depicted
in FIGS. 33a and 33b. This page may allow a user to select videos
and a template. Possible descriptions of the buttons depicted in
FIGS. 33a and 33b are as follows:
[0484] Script: Select a script to use.
[0485] Video List: List all the selected videos.
[0486] Add: Add a video to the list.
[0487] Title: Add a title for the video.
[0488] Up: Move up one position.
[0489] Down: Move down one position.
[0490] Remove: Remove a video from the "Video List".
[0491] Trim Button: Starts the Video Trim screen to trim the
selected video. A trimmed video has a trim icon next to it.
[0492] Untrim Button: Reset the frame pointer to begin at 0 and end
at the last frame.
[0493] Play: Preview the video.
[0494] Possible scenarios from the screen depicted in FIGS. 33a and
33b are illustrated as follows:
[0495] 1. User may select a script.
[0496] 2. User may click the "Add" button to add a video.
[0497] 3. User may set the order of selected videos. (The user may
select a video from the "Video List" and then click the "Up" button
to move up a position. The user may also select a video from the
"Video List" and then click the "Down" button to move down a
position.)
[0498] 4. User may select a video from the "Video List" and then
click the "Remove" button to remove the video out of the "Video
List".
[0499] 5. User may select a video from the "Video List" and then
click the "Trim" button to pop up the "Video Trim" Page in order to
trim the video.
[0500] 6. User may select a trimmed video from the "Video List" and
then click the "Untrim" button to cancel the trim.
[0501] 7. User may select a video from the "Video List" and then
click the "Preview" button to preview the video.
[0502] 8. User may select a template from the "Template" drop down
list.
[0503] 9. User may return to the VESClient Home screen by clicking
the "Home" menu.
[0504] 10. After selecting videos and a template, the user may
click the "Upload" menu item. This will take the user to the
"Preview" page.
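The list operations behind the "Add", "Up", "Down", "Remove",
"Trim", and "Untrim" buttons can be sketched with two small classes.
The class names Clip and VideoList, and the use of frame indices for
trim positions, are assumptions made for illustration.

```python
class Clip:
    """A selected video with trim positions expressed as frame indices."""

    def __init__(self, name, frame_count):
        self.name = name
        self.frame_count = frame_count
        self.untrim()

    def trim(self, start, end):
        if not 0 <= start <= end < self.frame_count:
            raise ValueError("trim range outside the clip")
        self.start, self.end = start, end

    def untrim(self):
        """Reset to begin at frame 0 and end at the last frame."""
        self.start, self.end = 0, self.frame_count - 1

    @property
    def trimmed(self):
        return (self.start, self.end) != (0, self.frame_count - 1)


class VideoList:
    """Ordered list of selected clips."""

    def __init__(self):
        self.clips = []

    def add(self, clip):
        self.clips.append(clip)

    def remove(self, index):
        del self.clips[index]

    def move_up(self, index):
        if index > 0:
            c = self.clips
            c[index - 1], c[index] = c[index], c[index - 1]

    def move_down(self, index):
        if index < len(self.clips) - 1:
            c = self.clips
            c[index + 1], c[index] = c[index], c[index + 1]
```

The trimmed property corresponds to the trim icon shown next to a
trimmed video in the list.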
[0505] FIG. 34 depicts an example of the screen a user would see if
the user chose to add a video.
[0506] FIG. 35 depicts an example of the screen a user would see if
the user chose to preview a video.
[0507] FIG. 36 depicts an example of a screen a user may see if the
user chose to trim a video. When the user selects "Trim" to trim a
video, the video may begin to play. The left menu item may be
"Cancel", and the right menu item may be "Set Start". If the user
clicks "Set Start" the right menu item may change to "Set Stop".
When the video is done playing or the user clicks "Set Stop" the
video may pause and the left menu item may change to "Accept". The
Video Trim page may be divided into three pages: Set Start Pos, Set
End Pos, and Play Complete.
[0508] FIG. 37 depicts an example of the Set Start Pos page. In
this example, if the user selects the "Cancel" menu item, the user
may be returned to the Video-Template select screen. If the user
selects the "Set Start" menu item, the start trim position is set
and the right menu item may automatically change to "Set Stop".
[0509] FIG. 38 depicts an example of the Set Stop Pos page. In this
example, the user may click the "Set Stop" menu item to set the
video end position. If the user does not select the Set Stop item,
the end of the video may be set as the end position. The user may
select the "Cancel" button to replay video and to reset the start
and end positions.
[0510] FIG. 39 depicts an example of the Accept Trim page. The user
may click the "Cancel" menu item to return to the Video-Template
Select page. The user then may be able to choose to trim the video
again or play the currently trimmed selection. The user may click
the "Accept" menu item to accept the cropped video and go back to
the Video-Template Select page.
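The trim pages described with reference to FIGS. 36-39 behave like a
small state machine. The following sketch is one possible reading of
that behavior; the class name TrimSession, the state names, and the
use of frame indices are assumptions rather than part of the
disclosure.

```python
class TrimSession:
    """Hypothetical model of the Set Start, Set Stop, and Accept pages."""

    SET_START, SET_STOP, COMPLETE, CLOSED = (
        "set_start", "set_stop", "complete", "closed")

    def __init__(self, frame_count):
        self.frame_count = frame_count
        self.accepted = False
        self._reset()

    def _reset(self):
        self.state = self.SET_START
        self.start = 0
        self.end = self.frame_count - 1  # default end: last frame

    def set_start(self, frame):
        if self.state == self.SET_START:
            self.start = frame
            self.state = self.SET_STOP

    def set_stop(self, frame):
        if self.state == self.SET_STOP:
            if frame < self.start:
                raise ValueError("stop position precedes start position")
            self.end = frame
            self.state = self.COMPLETE

    def playback_finished(self):
        # Video played to the end: the end position stays at the last frame.
        if self.state in (self.SET_START, self.SET_STOP):
            self.state = self.COMPLETE

    def cancel(self):
        if self.state == self.SET_STOP:
            self._reset()             # replay and reset both positions
        else:
            self.state = self.CLOSED  # back to Video-Template Select

    def accept(self):
        if self.state == self.COMPLETE:
            self.accepted = True
            self.state = self.CLOSED
```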
[0511] If the user selects to preview a video, a series of three
pages may be displayed. FIG. 40 depicts the Uploading page, which
the VESClient may display first. The "Preview" button may be
disabled at first, but become enabled for the user to choose once a
preview is available. An advertisement may be displayed for some
period of time while the video is being uploaded.
[0512] FIG. 41 depicts screens showing that the video is being
uploaded and that a preview is "now available" while uploading
continues.
[0513] If "Cancel" is chosen, the user may be returned to the
"Video-Template Select" page. FIG. 42 depicts an example of a
screen when the user chooses to cancel an upload. The user may be
prompted to ensure that the user intends to cancel the uploading
and previewing of the video. If the user chooses to preview the
video, the "Preview" button may be changed to "Skip".
[0514] FIG. 43 depicts an example of a screen that shows the user a
preview of a video. If the user chooses to skip the preview, the
user may be directed to the "Recipients Select" page.
[0515] FIG. 44 depicts an example of a screen if the user chooses
to cancel the uploading.
[0516] FIG. 45 depicts an example of a screen in which the preview
has completed playing. The "Skip" button may change to "Done". The
user may select the "Cancel" item to cancel the upload, cancel the
preview, and return to the "Video-Template Select" page. The user
may select "Done" to go to the "Recipients Select" page.
[0517] FIG. 46 depicts examples of a screen in which a user can
select a recipient to receive a video. If the user chooses to send
a video to a recipient, the user may select recipients from the
recipients drop down list and then click the "Add" button to add
the recipients to the "Recipients List". The user may select a
recipient from the "Recipients List" and then click "Remove" button
to remove it from the "Recipients List". The user may click the
"Cancel" menu item to go back to "Video-Template Select" page.
After selecting recipients, the user may click the "Send" menu item
to send the edited video.
[0518] Descriptions of the screen items depicted in FIG. 46 are as
follows:
[0519] Send: If the upload hasn't completed, this item will be
disabled.
[0520] Cancel: Back to the Video-Template Select page.
[0521] Recipients: List all the recipients.
[0522] Recipients List: List all the selected recipients.
[0523] Send: When upload completes, this item will become
enabled.
[0524] FIG. 47 depicts an example of the Summary and History page.
In this embodiment, the Summary and History page shows the titles
of the last 10 edited videos that were sent by the user. If the
user should select the "Home" menu item, the video editing program
may be restarted and the user may be sent to the "Video-Template
Select" page. The user may also select "Exit" to exit the VESClient
application.
[0525] FIG. 48 depicts an example of a login page for the Receiving
PC application. In the example of this exemplary embodiment, the
Receiving PC application is a Microsoft Foundation Class (MFC)
application that resides in the Windows application tray.
[0526] FIG. 49 depicts an example of a screen shot if the user
login fails. A warning message will be issued and the user may
reenter the user name and password.
[0527] If the login is successful, an icon (as depicted inside the
highlighted square of FIG. 50) appears on the task bar, and the
website may be launched. Preferably the website is launched
automatically. If the user double clicks the left mouse button on
the icon, the application opens the website automatically. If the
user right clicks on the icon a menu may pop up, as depicted in
FIG. 51. The user may select "Web" to launch the website, select
(or deselect) "Auto Start" to decide if the application auto runs
within Windows, or select "Exit" to end the application.
[0528] FIG. 52 depicts a bubble that may pop up to alert the user
that a newly edited video is available. If the user left clicks on
this icon, the application may launch the website
automatically.
[0529] FIGS. 53, 54, and 55 depict different examples of screen
shots of the template editor. Examples of certain components and
features of this exemplary embodiment of the template editor page
are as follows:
[0530] Video Panel
[0531] This panel may include images, videos, and slots.
[0532] 1. Image--the list of still pictures may be downloaded from
the server. After selecting an image the actual picture data may be
downloaded from the server and shown to the user.
[0533] 2. Video--the list of videos may be downloaded from the
server.
[0534] 3. Slots--clicking on the slot button may cause a slot item
to be created.
[0535] Transition Components Panel
[0536] This panel includes different examples of transition
components (as buttons). The user may click on a transition button
and create the transition item in the video timeline.
[0537] Video Line
[0538] The video time line may consist of elements that represent
still pictures, video on the server, original video, and transition
components. The elements are ordered by time. After an element or
transition component is added to the timeline, right clicking on
the element may allow the user to modify the properties of this
video element, add background music, or delete the element. If the
user selects to add background music, an audio line may be created.
The user may be able to set the weight of the volume for each audio
element added. These weights may be used to mix overlapping audio
in the final video.
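The disclosure does not state how the volume weights are applied
when audio overlaps. One plausible reading, shown here only as a
sketch, is a weighted sum of samples normalized by the total weight.
The function name mix_samples, the normalization, and the
representation of audio as lists of floating-point samples are all
assumptions.

```python
def mix_samples(tracks, weights):
    """Mix equal-length tracks sample by sample using normalized weights."""
    if not tracks or len(tracks) != len(weights):
        raise ValueError("need one weight per track")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    length = len(tracks[0])
    if any(len(track) != length for track in tracks):
        raise ValueError("tracks must have equal length")
    return [
        sum(w * track[i] for track, w in zip(tracks, weights)) / total
        for i in range(length)
    ]


# Two overlapping tracks; the first is weighted three times as heavily.
mixed = mix_samples([[1.0, 0.0], [0.0, 1.0]], [3, 1])
```

Normalizing by the total weight keeps the mixed signal within the
range of the input samples.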
[0539] Audio Line
[0540] Each video element may have at least one audio track added
to it for this version. An example of the template editor in use is
shown below:
[0541] 1. The user launches the TE application on the Website.
[0542] 2. The user moves the mouse over the "Picture" button. (The
list of pictures on the server may be shown in a pop up
window.)
[0543] 3. The user clicks one of the pictures in the list. (A
picture element may be created on the video line.)
[0544] 4. The user clicks the transition button in the "Transition
panel". (A transition element may be created on the video
line.)
[0545] 5. The user clicks the "Slot" button. (An empty slot may be
created on the video line.)
[0546] 6. Right clicking on a slot element may cause a menu to be
shown. Selecting "Add background music" may display a list of
available music tracks to add. Background music may play to
completion, across multiple slots.
[0547] 7. The user may repeat steps 2-6.
[0548] 8. The user clicks the "Upload" button to upload the
template to the server. (This may enable the "Try it now"
button).
[0549] 9. The user clicks the "Try it now" button after uploading a
template. (A panel may be shown for the user to select his or her
previously uploaded videos for the empty slots in the template. If
the user has not previously uploaded clips for this use, AVES may
use default "try it now" clips from AVES.)
[0550] 10. If "Try it now" is executed, the user may watch the
resulting video on the Website after the TP has finished creating
it.
[0551] In one embodiment, the machine takes the exemplary form of a
computer system within which a set of instructions, for causing the
machine to perform any one or more of the methodologies discussed
herein, may be executed. In alternative embodiments, a machine
operates as a standalone device or may be connected (e.g.,
networked) to other machines. In a networked deployment, the
machine may operate in the capacity of a server or a client machine
in a client-server network environment, or as a peer machine in a
peer-to-peer (or distributed) network environment. The machine may
be a server computer, a client computer, a personal computer (PC),
a tablet PC, a set-top box (STB), a personal digital assistant
(PDA), a cellular telephone, a web appliance, a network router,
switch or bridge, or any machine capable of executing a set of
instructions (sequential or otherwise) that specify actions to be
taken by that machine.
[0552] While the machine-readable medium is shown in an exemplary
embodiment to be a single medium, the term "machine-readable
medium" should be taken to include a single medium or multiple
media (e.g., a centralized or distributed database, and/or
associated caches and servers) that store the one or more sets of
instructions. The term "machine-readable medium" shall also be
taken to include any medium that is capable of storing, encoding or
carrying a set of instructions for execution by the machine and
that cause the machine to perform any one or more of the
methodologies of the present invention. In general, the routines
executed to implement the embodiments of the disclosure, may be
implemented as part of an operating system or a specific
application, component, program, object, module or sequence of
instructions referred to as "computer programs." The computer
programs typically comprise one or more instructions set at various
times in various memory and storage devices in a computer, and
that, when read and executed by one or more processors in a
computer, cause the computer to perform operations to execute
elements involving the various aspects of the disclosure.
[0553] Moreover, while embodiments have been described in the
context of fully functioning computers and computer systems, those
skilled in the art will appreciate that the various embodiments are
capable of being distributed as a program product in a variety of
forms, and that the disclosure applies equally regardless of the
particular type of machine or computer-readable media used to
actually effect the distribution. Examples of computer-readable
media include but are not limited to recordable type media such as
volatile and non-volatile memory devices, floppy and other
removable disks, hard disk drives, optical disks (e.g., Compact
Disk Read-Only Memory (CD-ROMs), Digital Versatile Disks (DVDs),
etc.), among others, and transmission type media such as digital
and analog communication links.
[0554] Although embodiments have been described with reference to
specific exemplary embodiments, it will be evident that various
modifications and changes can be made to these embodiments.
Accordingly, the specification and drawings are to be regarded in
an illustrative sense rather than in a restrictive sense. The
foregoing specification provides a description with reference to
specific exemplary embodiments. It will be evident that various
modifications may be made thereto without departing from the
broader spirit and scope as set forth in the following claims. The
specification and drawings are, accordingly, to be regarded in an
illustrative sense rather than a restrictive sense.
* * * * *