U.S. patent application number 15/140357 was published by the patent office on 2016-11-03 for low latency and low defect media file transcoding using optimized storage, retrieval, partitioning, and delivery techniques.
This patent application is currently assigned to Box, Inc. The applicant listed for this patent is Box, Inc. Invention is credited to Bryan Huh, Tanooj Luthra, Ritik Malhotra.
Application Number | 20160323351 15/140357 |
Document ID | / |
Family ID | 57204093 |
Filed Date | 2016-04-27 |
United States Patent Application | 20160323351 |
Kind Code | A1 |
Luthra; Tanooj; et al. | November 3, 2016 |
LOW LATENCY AND LOW DEFECT MEDIA FILE TRANSCODING USING OPTIMIZED
STORAGE, RETRIEVAL, PARTITIONING, AND DELIVERY TECHNIQUES
Abstract
Systems, methods and computer program products for
high-performance, low latency start-up of large shared media files.
A method for low latency startup with low defect playback commences
upon identifying a first media file having a first format to be
converted to a second media file having a second format. A
scheduler divides the first media file into multiple partitions
separated by partition boundaries. The method continues by
converting the partitions into respective converted partitions that
comport with the second format. Determinations as to the position
of the partition boundaries are made based on measurable conditions
present at a particular moment in time. Different formats receive
different treatment based on the combination of characteristics of
the first format, characteristics of the second format, as well as
on characteristics of measurable conditions present at the moment
in time just before conversion of a segment.
Inventors: |
Luthra; Tanooj (San Diego, CA); Malhotra; Ritik (San Jose, CA);
Huh; Bryan (San Jose, CA) |
Applicant: |
Name | City | State | Country | Type |
Box, Inc. | Redwood City | CA | US | |
Assignee: |
Box, Inc.
Redwood City
CA
|
Family ID: |
57204093 |
Appl. No.: |
15/140357 |
Filed: |
April 27, 2016 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
62154658 | Apr 29, 2015 | |
Current U.S.
Class: |
1/1 |
Current CPC
Class: |
G06F 2212/154 20130101;
G06F 16/183 20190101; G06F 16/185 20190101; G06F 2212/657 20130101;
H04L 67/34 20130101; G06F 2212/463 20130101; H04L 67/1097 20130101;
G06F 16/113 20190101; G06F 16/1774 20190101; H04L 65/607 20130101;
G06F 16/1727 20190101; G06F 2212/1016 20130101; H04L 65/602
20130101; G06F 9/46 20130101; G06F 16/188 20190101; G06F 16/22
20190101; G06F 16/196 20190101; G06F 16/172 20190101; G06F 12/122
20130101; G06F 12/0891 20130101; G06F 16/23 20190101; G06F 2212/60
20130101; H04L 67/06 20130101; G06F 16/1748 20190101; H04N 19/40
20141101; G06F 12/1081 20130101; G06F 16/2443 20190101; G06F 16/182
20190101; G06F 16/9574 20190101; G06F 2212/1044 20130101; H04L
63/0428 20130101; H04L 65/80 20130101 |
International
Class: |
H04L 29/06 20060101
H04L029/06; H04N 19/40 20060101 H04N019/40 |
Claims
1. A method comprising: identifying a first media file having a
first format to be converted to a second media file having a second
format; partitioning the first media file into two or more
partitions separated by one or more partition boundaries, wherein
the one or more partition boundaries are at a determined key frame
position; converting into the second format, the two or more
partitions to respective two or more converted partitions, wherein
the respective two or more partitions are converted by respective
two or more computing devices; and assembling the respective two or
more converted partitions to comprise the second media file.
2. The method of claim 1, wherein the determined key frame position
is a closest key frame location to a respective one of the
partition boundaries.
3. The method of claim 1, further comprising delivering at least
a portion of the second media file to a requestor to be viewed on a
media player.
4. The method of claim 1, wherein the two or more partitions are
characterized by a set of progressively increasing quality
levels.
5. The method of claim 1, wherein at least one of the respective
two or more converted partitions comprise an attribute dataset.
6. The method of claim 5, wherein the attribute dataset is at least
one of, an atom, a movie atom, and a moov atom.
7. The method of claim 1, wherein steps for converting into the
second format comprise a timecode correction.
8. The method of claim 1, wherein at least one of the respective
two or more partitions corresponds to a beginning portion of the
first media file.
9. A computer readable medium, embodied in a non-transitory
computer readable medium, the non-transitory computer readable
medium having stored thereon a sequence of instructions which, when
stored in memory and executed by a processor causes the processor
to perform a set of acts, the acts comprising: identifying a first
media file having a first format to be converted to a second media
file having a second format; partitioning the first media file into
two or more partitions separated by one or more partition
boundaries, wherein the one or more partition boundaries are at a
determined key frame position; converting into the second format,
the two or more partitions to respective two or more converted
partitions, wherein the respective two or more partitions are
converted by respective two or more computing devices; and
assembling the respective two or more converted partitions to
comprise the second media file.
10. The computer readable medium of claim 9, wherein the determined
key frame position is a closest key frame location to a respective
one of the partition boundaries.
11. The computer readable medium of claim 9, further comprising
instructions which, when stored in memory and executed by the
processor causes the processor to perform acts of delivering at
least a portion of the second media file to a requestor to be
viewed on a media player.
12. The computer readable medium of claim 9, wherein the two or
more partitions are characterized by a set of progressively
increasing quality levels.
13. The computer readable medium of claim 9, wherein at least one
of the respective two or more converted partitions comprise an
attribute dataset.
14. The computer readable medium of claim 13, wherein the attribute
dataset is at least one of, an atom, a movie atom, and a moov
atom.
15. The computer readable medium of claim 9, wherein steps for
converting into the second format comprise a timecode
correction.
16. The computer readable medium of claim 9, wherein at least one
of the respective two or more partitions corresponds to a beginning
portion of the first media file.
17. A system comprising: a storage medium having stored thereon a
sequence of instructions; and a processor or processors that
execute the instructions to cause the processor or processors to
perform a set of acts, the acts comprising, identifying a first
media file having a first format to be converted to a second media
file having a second format; partitioning the first media file into
two or more partitions separated by one or more partition
boundaries, wherein the one or more partition boundaries are at a
determined key frame position; converting into the second format,
the two or more partitions to respective two or more converted
partitions, wherein the respective two or more partitions are
converted by respective two or more computing devices; and
assembling the respective two or more converted partitions to
comprise the second media file.
18. The system of claim 17, wherein the determined key frame
position is a closest key frame location to a respective one of the
partition boundaries.
19. The system of claim 17, wherein the two or more partitions are
characterized by a set of progressively increasing quality
levels.
20. The system of claim 17, wherein at least one of the respective
two or more converted partitions comprise an attribute dataset.
Description
RELATED APPLICATIONS
[0001] The present application claims the benefit of priority to
co-pending U.S. Provisional Patent Application Ser. No. 62/154,658
titled, "METHOD MECHANISM TO IMPLEMENT A VIRTUAL FILE SYSTEM FROM
REMOTE CLOUD STORAGE" (Attorney Docket No. BOX-2015-0012-US00-PRO),
filed Apr. 29, 2015, and this application claims the benefit of
priority to co-pending U.S. Provisional Patent Application Ser. No.
62/154,022, titled, "LOW LATENCY AND LOW DEFECT MEDIA FILE
TRANSCODING USING OPTIMIZED STORAGE, RETRIEVAL, PARTITIONING, AND
DELIVERY TECHNIQUES" (Attorney Docket No. BOX-2015-0013-US00-PRO),
filed Apr. 28, 2015, both of which are hereby incorporated by
reference in their entirety.
FIELD
[0002] This disclosure relates to the field of file sharing of
large media files, and more particularly to techniques for low
latency and low defect media file transcoding using optimized
storage, partitioning, and delivery techniques.
BACKGROUND
[0003] In today's "always on, always connected" world, people often
share video and other media files on multiple devices (e.g., smart
phones, tablets, laptops, etc.) for various purposes (e.g.,
collaboration, social interaction, entertainment, etc.). In some
situations, the format (e.g., encoding, container, etc.) of a
particular media file needs to be converted (e.g., transcoded) into
some other format. There are many reasons why such a conversion or
transcoding is needed. For example, a collaborator might have a
video file in a first encoding or format, and would want to
compress it so as to consume less storage space and/or consume less
transmission bandwidth when it is shared (e.g., delivered to
collaborating recipients). In many cases, a collaborator would want
to view a video as soon as it is posted; however, due to the
aforementioned reasons why such a conversion or transcoding might
be needed, the video would need to be converted before being made
available for previewing or sharing. Further transcoding may be
needed for viewing the video using the various media players
available on the various devices of the collaborators.
[0004] Legacy approaches to the problem of reducing the latency
between availability of an original media file (e.g., in a first
format) and availability of a transcoded media file (e.g., in a
second format) can be improved. In one legacy case, an original
media file is sent to an extremely high-powered computer, with the
expectation that the transcoding can complete sooner. In other
legacy cases, an original media file in a first format is divided
into equally sized partitions, and each partition is transcoded in
parallel with each other partition. While such partitioning and
parallel processing techniques serve to reduce the latency time to
a first viewing of a transcoded media file, such an approach is
naive, at least as pertains to the extent that many of the
resulting transcoded partitions exhibit defects.
[0005] What is needed is a technique or techniques to improve over
legacy and/or over other considered approaches. Some of the
approaches described in this background section are approaches that
could be pursued, but not necessarily approaches that have been
previously conceived or pursued. Therefore, unless otherwise
indicated, it should not be assumed that any of the approaches
described in this section qualify as prior art merely by virtue of
their inclusion in this section.
[0006] What is needed is a technique or techniques to reduce the
first-view latency time incurred when transcoding a media file in a
first format to a second format while reducing or eliminating
defects in the resulting transcoded media file. The problem to be
solved is rooted in technological limitations of the legacy
approaches. Improvements, in particular improved design, and
improved implementation and application of the related
technologies, are needed.
SUMMARY
[0007] The present disclosure provides improved systems, methods,
and computer program products suited to address the aforementioned
issues with legacy approaches. More specifically, the present
disclosure provides a detailed description of techniques used in
systems, methods, and in computer program products for low latency
and low defect media file transcoding using optimized partitioning.
Certain embodiments are directed to technological solutions for
exploiting parallelism when transcoding from a first format to a
second format by determining partition boundaries based on the
first format. The disclosed techniques and devices within the shown
environments as depicted in the figures provide advances in the
technical field of high-performance computing as well as advances
in the technical fields of distributed computing and distributed
storage.
[0008] Further details of aspects, objectives, and advantages of
the disclosure are described below and in the detailed description,
drawings, and claims. Both the foregoing general description of the
background and the following detailed description are exemplary and
explanatory, and are not intended to be limiting as to the scope of
the claims.
BRIEF DESCRIPTION OF THE DRAWINGS
[0009] The drawings described below are for illustration purposes
only. The drawings are not intended to limit the scope of the
present disclosure.
[0010] FIG. 1 depicts an environment for using and transcoding
media files.
[0011] FIG. 2A presents a diagram depicting a full file transcoding
approach for comparison of techniques for performing low latency
and low defect media file transcoding using optimized
partitioning.
[0012] FIG. 2B presents a diagram illustrating techniques for low
latency and low defect media file transcoding using optimized
partitioning, according to an embodiment.
[0013] FIG. 3A is a chart showing media file partitioning as
implemented in systems for low latency and low defect media file
transcoding using optimized partitioning, according to an
embodiment.
[0014] FIG. 3B1 and FIG. 3B2 are block diagrams showing media
file reformatting as implemented in systems for low latency and low
defect media file transcoding using optimized partitioning,
according to an embodiment.
[0015] FIG. 3C depicts media file reformatting as implemented in
systems for low latency and low defect media, according to an
embodiment.
[0016] FIG. 3D depicts an on-the-fly watermarking system as
implemented in systems for low latency and low defect media file
transcoding, according to an embodiment.
[0017] FIG. 4A presents a processing timeline to show full file
transcoding latency for comparison to techniques for low latency
and low defect media file transcoding using optimized partitioning,
according to an embodiment.
[0018] FIG. 4B presents a latency timeline illustrating a low
first-view latency technique used in systems implementing low
latency and low defect media file transcoding using optimized
partitioning, according to an embodiment.
[0019] FIG. 5 is a flow diagram illustrating a system for low
latency and low defect media file transcoding using optimized
partitioning, according to an embodiment.
[0020] FIG. 6A is a flow diagram illustrating caching of an initial
clip as used in systems for low latency and low defect media file
transcoding using optimized partitioning, according to an
embodiment.
[0021] FIG. 6B is a flow diagram illustrating pre-transcoding of an
initial clip as used in systems for low latency and low defect
media file transcoding using optimized partitioning, according to
an embodiment.
[0022] FIG. 6C is a flow diagram illustrating frame-by-frame
delivery of a video clip as used in systems for low latency and low
defect media file transcoding using optimized partitioning,
according to an embodiment.
[0023] FIG. 6D1 is a flow diagram illustrating playlist generation
from a video clip as used in systems for low latency and low defect
media file transcoding using optimized partitioning, according to
an embodiment.
[0024] FIG. 6D2 is a flow diagram illustrating generation of URLs
for video clips as used in systems for low latency and low defect
media file transcoding using optimized partitioning, according to
an embodiment.
[0025] FIG. 6D3 is a flow diagram illustrating generation of URLs
for video clips as used in systems for low latency and low defect
media file transcoding using optimized partitioning, according to
an embodiment.
[0026] FIG. 6D4 is a flow diagram illustrating timecode correction
techniques used when delivering video clips to viewers as used in
systems for low latency and low defect media file transcoding using
optimized partitioning, according to an embodiment.
[0027] FIG. 6E is a flow diagram illustrating techniques for
accessing media files through a custom virtual file system as used
when delivering video clips to collaborators, according to an
embodiment.
[0028] FIG. 7 depicts a system as an arrangement of computing
modules that are interconnected so as to operate cooperatively to
implement certain of the herein-disclosed embodiments.
[0029] FIG. 8A and FIG. 8B depict exemplary architectures of
components suitable for implementing embodiments of the present
disclosure, and/or for use in the herein-described
environments.
DETAILED DESCRIPTION
[0030] Some embodiments of the present disclosure address the
problem of reducing the first-view latency time incurred when
transcoding a media file in a first format to a second format,
while reducing or eliminating defects in the resulting transcoded
file and some embodiments are directed to approaches for exploiting
parallelism when transcoding from a first format to a second format
by determining partition boundaries based on the first format. More
particularly, disclosed herein and in the accompanying figures are
exemplary environments, systems, methods, and computer program
products for low latency and low defect media file transcoding
using optimized partitioning.
Overview
[0031] In today's "always on, always connected" world, people often
share video and other media files on multiple devices (e.g., smart
phones, tablets, laptops, etc.) for various purposes (e.g.,
collaboration, social interaction, etc.). In some situations, the
format (e.g., encoding, container, etc.) of a particular media file
needs to be converted (e.g., transcoded) into some other format.
However, a person may want to immediately view and/or share her media
file, yet may need to wait for the media file to be converted or
transcoded. To address the need to reduce the first-view latency
time incurred when transcoding a media file in a first format to a
second format while reducing or eliminating defects in the
resulting transcoded media file, the techniques described herein
receive and analyze an original media file to determine optimized
partitions for transcoding, and techniques described herein operate
in conjunction with cloud-based remote file storage. For example, a
custom file system can be employed and/or optimized partitions can
be based in part on the target format or formats (e.g., encoding
scheme, codec, container, etc.) and/or available computing
resources (e.g., storage, processing, communications bandwidth,
etc.). Specifically, in one or more embodiments, the partition
boundaries can be selected with respect to key frames (e.g.,
I-frames). For example, a leading edge partition boundary can be
selected to be precisely at a key frame, and a trailing edge
boundary can be adjacent to a next key frame. When the partitions
and partition boundaries have been determined, the partitions can
be assigned to computing resources for simultaneous transcoding of
the respective partitions. The transcoded media file partitions can
then be assembled into a single transcoded video file (e.g.,
container) and delivered for viewing. In some embodiments, the
partitions can include attribute datasets (e.g., moov atoms) such
that a first or beginning transcoded partition can be delivered and
viewed in advance of the availability and assemblage of the
remaining transcoded partitions, thus further reducing the first-view
latency time.
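The key frame alignment described above can be illustrated with a
minimal Python sketch. It is not the disclosed implementation: the
function names, the use of timestamps in seconds, and the choice to
snap each equal-duration cut point to its closest key frame are
assumptions made for illustration only.

```python
from bisect import bisect_left


def snap_to_keyframe(target, keyframes):
    """Return the key frame timestamp closest to target (keyframes sorted ascending)."""
    i = bisect_left(keyframes, target)
    # Only the neighbors on either side of the insertion point can be closest.
    candidates = keyframes[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda k: abs(k - target))


def keyframe_aligned_partitions(keyframes, duration, n):
    """Split [0, duration) into up to n partitions whose leading edges are key frames."""
    # Start from the naive equal-duration cut points, then move each to a key frame.
    starts = {keyframes[0]}
    for j in range(1, n):
        starts.add(snap_to_keyframe(j * duration / n, keyframes))
    ordered = sorted(starts)
    # Each partition ends where the next one begins; the last ends at the file end.
    ends = ordered[1:] + [duration]
    return list(zip(ordered, ends))


# Hypothetical key frame timestamps (seconds) for a 14-second clip.
partitions = keyframe_aligned_partitions([0.0, 2.0, 4.5, 7.0, 9.5, 12.0], 14.0, 3)
```

If two cut points snap to the same key frame, the sketch returns
fewer than n partitions rather than an empty partition.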
[0032] Various embodiments are described herein with reference to
the figures. It should be noted that the figures are not
necessarily drawn to scale and that the elements of similar
structures or functions are sometimes represented by like reference
numerals throughout the figures. It should also be noted that the
figures are only intended to facilitate the description of the
disclosed embodiments--they are not representative of an exhaustive
treatment of all possible embodiments, and they are not intended to
impute any limitation as to the scope of the claims. In addition,
an illustrated embodiment need not portray all aspects or
advantages of usage in any particular environment. An aspect or an
advantage described in conjunction with a particular embodiment is
not necessarily limited to that embodiment and can be practiced in
any other embodiments even if not so illustrated. Also, reference
throughout this specification to "some embodiments" or "other
embodiments" means that a particular feature, structure, material,
or characteristic described in connection with the embodiments is
included in at least one embodiment. Thus, the appearances of the
phrase "in some embodiments" or "in other embodiments" in various
places throughout this specification are not necessarily referring
to the same embodiment or embodiments.
DEFINITIONS
[0033] Some of the terms used in this description are defined below
for easy reference. The presented terms and their respective
definitions are not rigidly restricted to these definitions--a term
may be further defined by the term's use within this disclosure.
The term "exemplary" is used herein to mean serving as an example,
instance, or illustration. Any aspect or design described herein as
"exemplary" is not necessarily to be construed as preferred or
advantageous over other aspects or designs. Rather, use of the word
exemplary is intended to present concepts in a concrete fashion. As
used in this application and the appended claims, the term "or" is
intended to mean an inclusive "or" rather than an exclusive "or".
That is, unless specified otherwise, or is clear from the context,
"X employs A or B" is intended to mean any of the natural inclusive
permutations. That is, if X employs A, X employs B, or X employs
both A and B, then "X employs A or B" is satisfied under any of the
foregoing instances. The articles "a" and "an" as used in this
application and the appended claims should generally be construed
to mean "one or more" unless specified otherwise or is clear from
the context to be directed to a singular form.
[0034] Reference is now made in detail to certain embodiments. The
disclosed embodiments are not intended to be limiting of the
claims.
Descriptions of Exemplary Embodiments
[0035] FIG. 1 depicts an environment 100 for using and transcoding
media files. As an option, one or more instances of environment 100
or any aspect thereof may be implemented in the context of the
architecture and functionality of the embodiments described herein.
The environment 100 or any aspect thereof may be implemented in any
desired environment.
[0036] As shown, the environment 100 supports access to workspaces
(e.g., workspace 122.sub.1 and workspace 122.sub.2) by a plurality
of users (e.g., collaborators 120) through a variety of computing
devices (e.g., user devices 102). For example, the collaborators
120 can comprise a user collaborator 123, an administrator
collaborator 124, and a creator collaborator 125. In addition, for
example, the user devices 102 can comprise one or more instances of
a laptop 102.sub.1 and laptop 102.sub.5, one or more instances of a
tablet 102.sub.2, one or more instances of a smart phone 102.sub.3,
and one or more instances of a workstation (e.g., workstation
102.sub.4 and workstation 102.sub.6). As shown, the workspaces can
present to the collaborators 120 a set of documents accessible by
each collaborator (e.g., based on permissions). For example, the
workspaces can provide certain groups of the collaborators 120
access to a set of media files (e.g., with container file
extensions .mov, .mp4, .wmv, .flv, etc.) for various collaboration
activities (e.g., creating, sharing, viewing, listening, editing,
etc.).
[0037] The environment 100 further illustrates that the content (e.g.,
media files) represented in the workspaces can be managed (e.g.,
converted, transcoded, etc.) and stored on a server farm 110. For
example, the server farm 110 can be a cloud-based and/or
distributed computing and storage network(s) comprising one or more
instances of a host server 112, one or more instances of a sync
server 113, one or more instances of a notification server 114, one
or more instances of a collaboration server 116, one or more
instances of a content server 117, and one or more instances of an
origin server 118. In certain embodiments, other combinations of
computing devices and storage devices can comprise the server farm
110. The collaborators 120 interact with the workspaces to upload
media files (e.g., original media file 132) through an upload path
127 to the server farm 110. The collaborators 120 can further
interact with the workspaces to download media files (e.g.,
transcoded media file 134) through a download path 129 from the
server farm 110.
[0038] As an example, the creator collaborator 125 may have just
posted a new video (e.g., original media file 132 over the upload
path 127) that is shared with the user collaborator 123 in the
workspace 122.sub.1, and the user collaborator 123 selected the new
video for viewing on laptop 102.sub.1. However, a media player 103
on laptop 102.sub.1 and/or the associated computing resource
constraints (e.g., of laptop 102.sub.1, of download path 129, etc.)
may demand the original media file 132 be transcoded to the
transcoded media file 134 having a video playback format (e.g.,
advanced systems format (ASF) file) that is different than the
original media file 132 format (e.g., MP4). In this case, one or
more servers in the server farm 110 can perform the transcoding
using various approaches, where the choice of approach will impact
a first-view latency time 104 (e.g., the time from a request to
view to the start of viewing) and extent of viewing quality defects
experienced by the user collaborator 123 and other users. A
comparison of one approach shown in FIG. 2A to the herein disclosed
approach shown in FIG. 2B is described in the following.
[0039] FIG. 2A presents a diagram 2A00 depicting a full file
transcoding approach for comparison of techniques for performing
low latency and low defect media file transcoding using optimized
partitioning. As an option, one or more instances of diagram 2A00
or any aspect thereof may be implemented in the context of the
architecture and functionality of the embodiments described herein.
The diagram 2A00 or any aspect thereof may be implemented in any
desired environment.
[0040] One approach to transcoding a media file is shown in diagram
2A00. For example, a video file may need to be transcoded for
viewing by a user. As shown, the approach receives an original
media file (see step 202) and proceeds to process (e.g., transcode)
the entire original media file (see step 204). In this legacy full
file transcoding approach, the user desiring to view the media file
will need to wait until the entire media file is transcoded before
being able to view the transcoded media file. In some cases,
processing the original media file can comprise two steps of first
converting to an intermediate format and then converting to a
target format. A first-view latency time 221 using the approach
shown in diagram 2A00 can be improved by using high-powered
computing resources, yet the first-view latency time 221 can remain
long. For example, a 30-minute video can take 20 minutes to be
transcoded and made ready for viewing using the full file
transcoding approach shown in FIG. 2A. The first-view latency time
221 will further increase as the demanded resolution of the video
increases.
[0041] In some cases, an original media file is sent to an
extremely high-powered computer, with the expectation that the
transcoding can complete sooner. In other cases, an original media
file in a first format is divided into equally sized partitions,
and each partition is transcoded in parallel with each other
partition. For example, when an original media file in a first
format is divided into N equally sized partitions, then the time to
complete the transcoding can theoretically be reduced to a time
proportional to 1/N. While this divide by N partitioning and
parallel processing technique serves to reduce the latency time to
a first viewing of a transcoded media file, such an approach can be
improved upon, at least as pertains to the aspect that many of the
resulting transcoded partitions exhibit defects. For example, many
of the resulting transcoded partitions exhibit image distortions
brought about by generating clip boundaries according to strict
divide by N partitioning.
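To make the weakness of strict divide by N partitioning concrete, the
following Python sketch computes equal-duration boundaries and reports
which interior boundaries do not coincide with a key frame. The
function names, the tolerance parameter, and the sample timestamps are
hypothetical and are not taken from the disclosure.

```python
def equal_partitions(duration, n):
    """Strict divide-by-N boundaries (the legacy approach), in seconds."""
    return [j * duration / n for j in range(n + 1)]


def boundaries_off_keyframes(boundaries, keyframes, tolerance=1e-6):
    """Interior boundaries that do not coincide with any key frame."""
    return [
        b for b in boundaries[1:-1]
        if all(abs(b - k) > tolerance for k in keyframes)
    ]


# Hypothetical 12-second clip cut into 4 equal partitions.
boundaries = equal_partitions(12.0, 4)
misaligned = boundaries_off_keyframes(boundaries, [0.0, 2.5, 6.0, 8.0, 10.5])
```

In this example, the cuts at 3.0 and 9.0 seconds fall between key
frames, so a partition starting there would begin on a frame that
depends on data held in the preceding partition.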
[0042] One improved approach implemented in the herein disclosed
techniques for low latency and low defect media file transcoding
using optimized partitioning is described as pertains to FIG.
2B.
[0043] FIG. 2B presents a diagram 2B00 illustrating techniques for
low latency and low defect media file transcoding using optimized
partitioning. As an option, one or more instances of diagram 2B00
or any aspect thereof may be implemented in the context of the
architecture and functionality of the embodiments described herein.
The diagram 2B00 or any aspect thereof may be implemented in any
desired environment.
[0044] The approach illustrated in diagram 2B00 implements an
optimized partitioning of a media file for low latency and low
defect transcoding. Specifically, the set of steps describing such
an approach begins with receiving an original media file (see step
202) and analyzing the original media file to determine optimized
partitions for transcoding (see step 206). For example, optimized
partitions can be based in part on the target format or formats
(e.g., encoding scheme, codec, container, etc.) and/or available
computing resources (e.g., storage, processing, communications
bandwidth, etc.). In some cases the size of a partition might vary
with environmental considerations. Specifically, in one or more
embodiments, the leading-edge partition boundaries can be at
encoding key frames (e.g., I-frames). When the partitions and
partition boundaries have been determined, the partitions can be
assigned to computing resources for transcoding (see step 208). The
computing resources (e.g., server farm 110) can then transcode the
original media file partitions to respective transcoded media file
partitions (see parallel steps of step 210.sub.1, step 210.sub.2,
to step 210.sub.N). The transcoded media file partitions can then
be assembled into a single transcoded video file (e.g., container)
(see step 212).
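The assign, transcode, and assemble steps (step 208 through step 212)
can be sketched as follows. This is an illustration under stated
assumptions only: worker threads stand in for the separate servers of
the server farm, and transcode_partition is a hypothetical placeholder
rather than a call to any real transcoder.

```python
from concurrent.futures import ThreadPoolExecutor


def transcode_partition(partition):
    """Placeholder for a real transcoder call; tags the partition as converted."""
    start, end = partition
    return {"start": start, "end": end, "format": "target"}


def transcode_in_parallel(partitions, max_workers=4):
    """Transcode partitions concurrently and return results in original order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, regardless of
        # which worker finishes first, so assembly order is preserved.
        return list(pool.map(transcode_partition, partitions))


converted = transcode_in_parallel([(0.0, 4.5), (4.5, 9.5), (9.5, 14.0)])
```

Because the results come back in input order, the converted partitions
can be concatenated directly into the single transcoded file.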
[0045] The herein disclosed approach and technique presented in
diagram 2B00 has several advantages. For example, partitioning the
media file (e.g., into N partitions) for parallel transcoding
across a distributed computing system (e.g., N servers) can reduce
a first-view latency time (e.g., to approximately 1/N) as compared to
a full file transcoding approach. Further, by determining optimal
partitions and partition boundaries (e.g., aligned with key
frames), defects in the resulting transcoded file can be minimized
or eliminated. In addition, a user can start viewing the transcoded
media file when the first partition has been transcoded or the
first set of partitions have been transcoded, such that viewing can
begin before the transcoded file has been assembled. For example, a
reduced first-view latency time 222 for a 30-minute video using the
herein disclosed approach shown in diagram 2B00 can be a few
seconds (e.g., when the first partition has been transcoded and
delivered). More details regarding the partitioning of media files
are shown and described as pertains to FIG. 3A, FIG. 3B1, FIG. 3B2,
FIG. 3C and FIG. 3D.
[0046] FIG. 3A is a chart 3A00 showing media file partitioning as
implemented in systems for low latency and low defect media file
transcoding using optimized partitioning. As an option, one or more
instances of chart 3A00 or any aspect thereof may be implemented in
the context of the architecture and functionality of the
embodiments described herein. The chart 3A00 or any aspect thereof
may be implemented in any desired environment.
[0047] The chart 3A00 shows a time-based representation of an
original media file 302 in a first encoding format or a first set
of encoding formats. As shown, for example, the original media file
302 can be a video file packaged in a container that comprises a
moov atom at the end. The moov atom, also referred to as a movie
atom, is an attribute dataset comprising information about the
original media file 302 such as the timescale, duration, display
characteristics of the video, sub-atoms containing information
associated with each track in the video, and other attributes. As
shown, an original moov atom 303 is present at the end of the
original media file 302. When transcoding of the original media
file 302 is demanded, the original media file 302 can be analyzed
to determine a set of candidate partitions 304 for parallel
processing. For example, in legacy approaches, the candidate
partitions can be determined by equally dividing (e.g., into units
of time or duration) the original media file 302 by the number of
computing resources (e.g., servers) available for parallel
transcoding operations. In this case, however, the candidate
partitions 304 may have partition boundaries that result in defects
in the playback of the assembled transcoded media file. Such
defects can degrade subjective quality as perceived by a user
(e.g., blockiness, blurriness, ringing artifacts, added high
frequency content, picture outages, freezing at a particular frame
and then skipping forward by a few seconds, etc.). In many cases,
objective metrics that characterize one or more aspects of playback
quality can be computed (e.g., frame-by-frame comparison of an
original object and a transcoded object).
[0048] In one embodiment, the herein disclosed techniques can
determine a set of optimized partitions 305 for transcoding the
original media file 302 to reduce first-view latency and reduce or
eliminate transcoding defects. As shown, for example, a set of key
frame locations (e.g., key frame location 306.sub.1, key frame
location 306.sub.2, key frame location 306.sub.3, key frame
location 306.sub.4, key frame location 306.sub.5) can be used to
define the boundaries of the file partitions (e.g., P1, P2, P3, and
P4). In some embodiments, the key frame locations can be defined in
the original media file 302, and in certain embodiments, the key
frame locations can be based in part on the target transcoding
format or formats. In one or more embodiments, the candidate
partitions 304 (e.g., based on an optimal set of computing
resources to deliver low latency) can be aligned to the closest key
frame location (e.g., based on an optimal partitioning boundary to
deliver low defects). For example, key frame location 306.sub.2 is
chosen as a partition boundary over key frame location 306.sub.5 as
being closer to an instance of a candidate partition 304. In some
cases and embodiments, a set of moov atoms can be included at
various positions in one or more partitions based in part on the
known and/or expected delivery and playback method. For example,
each instance of a partition in the optimized partitions 305 is
shown to have a moov atom at the beginning of the partition (e.g.,
see moov atom 307.sub.1 in partition P1 and moov atom 307.sub.2 in
partition P2). Positioning the moov atom at the beginning of the
partitions can reduce the first-view latency by enabling the user
media player to start decoding and playing the first partition
(e.g., P1) independently of the transcoding completion status and
delivery of the other partitions (e.g., for video streaming,
progressive downloading, etc.).
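Strictly as an illustrative sketch, the alignment of candidate
partition boundaries to the closest key frame locations can be
expressed as follows. The function name, its parameters, and the use
of timestamps in seconds are assumptions made for illustration and
are not taken from the disclosure. The key frame list is assumed to
be sorted and non-empty.

```python
from bisect import bisect_left


def snap_to_key_frames(duration, num_workers, key_frames):
    """Return partition start times aligned to the nearest key frames.

    duration:    length of the media file in seconds (hypothetical input)
    num_workers: number of computing resources available for transcoding
    key_frames:  sorted, non-empty list of key frame timestamps in seconds
    """
    step = duration / num_workers
    # Candidate boundaries come from equal division of the file.
    candidates = [i * step for i in range(1, num_workers)]
    boundaries = [0.0]  # the first partition always starts at the beginning
    for candidate in candidates:
        i = bisect_left(key_frames, candidate)
        # Only the key frames on either side of the candidate can be nearest.
        neighbors = key_frames[max(i - 1, 0):i + 1]
        nearest = min(neighbors, key=lambda k: abs(k - candidate))
        # Skip a key frame that an earlier candidate already claimed.
        if nearest > boundaries[-1]:
            boundaries.append(nearest)
    return boundaries


boundaries = snap_to_key_frames(40, 4, [0, 8, 11, 19, 22, 33])
```

In this sketch, two candidates that snap to the same key frame yield
a single boundary, so fewer partitions than computing resources can
result when key frames are sparse.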
[0049] FIG. 3B1 presents a block diagram 3B100 showing media file
reformatting as implemented in systems for low latency and low
defect media file transcoding using optimized partitioning. The
techniques for media file reformatting can be practiced in any
environment.
[0050] As shown, a media file is laid out in a first format (e.g.,
the shown source format) where the file is organized into multiple
adjacent partitions. The adjacent partitions comprise playlist data
(e.g., playlist.sub.F1 322.sub.F1), video data (e.g., video extent
324), and audio data (e.g., audio extent 323). One way to convert
from a source format 342.sub.1 to a target format 352.sub.1 is to
use a multi-format transcoder where the multi-format transcoder
accepts a media file in a source format and produces a media file
in a target format. Another way to convert from certain source
formats to certain target formats is to use a non-transcoding
segment reformatter 320. Such a non-transcoding segment reformatter
320 segments a video stream into a plurality of video segments
(e.g., video segment1 326 through video segmentN 328). In some
cases, and as shown, a particular video segment may have
corresponding audio data (e.g., the soundtrack for the particular
video segment).
[0051] A non-transcoding segment reformatter 320 can combine (e.g.,
interleave) a particular video segment with corresponding audio
data. The act of combining can include producing a series of
extents (e.g., 512-byte blocks or 1 KB blocks) that can be
stored as a file. As shown, an extent includes both video data and
audio data. The combination into the extent can involve
interleaving. Interleaving can be implemented where one extent
comprises a video segment (e.g., video segment1 326, or video
segmentN 328) as well as an audio segment (e.g., audio segment1
327, or audio segmentN 329). Different interleaving techniques
interleave within an extent at different degrees of granularity.
For example, the combination of video data and audio data within
the extent can involve interleaving at a block-level degree of
granularity, or at a timecode degree of granularity, or at a
byte-by-byte or word-by-word degree of granularity.
[0052] In some embodiments, a non-transcoding segment reformatter
320 can combine video in a particular video format (e.g., in an
HTTP live streaming (HLS) format or a dynamic adaptive streaming
over HTTP (DASH) format) with corresponding audio data in a
particular audio format or encoding (e.g., as an advanced audio
coding (AAC) stream or as an MP3 stream). In some cases
interleaving can be implemented by merely moving video or audio
segments from one location (e.g., from a location in a source
format) to a location in a target format without performing
signal-level transcoding operations. In other cases interleaving can
be implemented by merely moving video segments from one location
(e.g., from a location in a source format) to a location in a
target format without performing any video signal-level transcoding
operations. Audio data can be segmented and transcoded as needed
(e.g., using the shown audio transcoder 331) to meet the
specification of the target format. For example, in situations where
the video is already being delivered in a standard H.264 encoding,
the video doesn't need to be re-transcoded; however, if the audio is
encoded using (for example) the free lossless audio codec (FLAC),
the audio might need to be transcoded into an appropriate target
format (MP3, high-efficiency advanced audio coding (HE-AAC), or
AC-3).
[0053] Some formats include a playlist. Such a playlist might
identify titles or chapters or other positions in a corresponding
stream. For example, the playlist.sub.F1 322.sub.F1 in the depicted
source format includes a series of markers (e.g., titles or
chapters or other positions). For each marker, the playlist.sub.F1
322.sub.F1 includes two pointers or offsets: (1) into the video
data extent 324, and (2) into the audio data extent 323. Strictly
as an additional example, the playlist.sub.F2 322.sub.F2 in the
depicted target format includes a series of markers (e.g., titles
or chapters or other positions) where, for each marker, the
playlist.sub.F2 322.sub.F2 includes one pointer to an extent. The
degree of interleaving is assumed or inherent or can be determined
from characteristics of the target format.
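Strictly as an illustrative sketch, segment-level interleaving
without signal-level transcoding, together with a target playlist
that holds one pointer per marker, can be expressed as follows. The
function name and the in-memory byte strings are assumptions made for
illustration; the disclosure does not prescribe a byte layout.

```python
def interleave(video_segments, audio_segments):
    """Interleave paired video and audio segments into one extent.

    Returns the combined bytes and, for each segment pair, the offset at
    which that pair starts (one pointer per marker, as in the target
    format playlist). Segment payloads are copied, never re-encoded.
    """
    if len(video_segments) != len(audio_segments):
        raise ValueError("each video segment needs a matching audio segment")
    extent = bytearray()
    offsets = []
    for video, audio in zip(video_segments, audio_segments):
        offsets.append(len(extent))  # single pointer into the extent
        extent += video
        extent += audio
    return bytes(extent), offsets


data, pointers = interleave([b"V1", b"VV2"], [b"a1", b"a2"])
```

A finer degree of granularity (e.g., block-level) would split each
segment further before copying; the pointer bookkeeping stays the
same.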
[0054] FIG. 3B2 presents a block diagram 3B200 showing a variation
of the media file reformatting as depicted in FIG. 3B1. One way to
convert from a source format 342.sub.2 to a target format 352.sub.2
is to use a multi-format transcoder where the multi-format
transcoder accepts a media file in a source format (e.g., in an
interleaved format) and produces a media file in a target format
(e.g., comprising a video extent and an audio extent).
[0055] FIG. 3C depicts a media file reformatting system 3C00 as
implemented in systems for low latency and low defect media file
transcoding. As shown, one or more media files comprise a video
portion, a respective audio portion, and a respective playlist.
Each of the shown video files is encoded and/or formatted in a
source format 342.sub.3. A non-transcoding media file reformatter
350 serves to process each video portion, its respective audio
portion, and its respective playlist so as to generate a media file
in the target format 354.sub.3.
[0056] In some embodiments the non-transcoding media file
reformatter 350 moves video data from its position in the media
file of the source format (e.g., preceding the audio portion) to a
position in the target format (e.g., following the audio portion).
As shown, the video portions of a media file (e.g., video portion
346.sub.1, video portion 346.sub.2, video portion 346.sub.N) are
moved to different positions in the media file of the target
format. Also as shown, the audio portions of a media file (e.g.,
audio portion 348.sub.1, audio portion 348.sub.2, audio portion
348.sub.N) are moved to different positions in the media file of
the target format. Playlists of media files in the source format
(e.g., playlist.sub.S1, playlist.sub.S2, . . . playlist.sub.SN) are
converted using the non-transcoding media file reformatter
350 such that the playlists of media files in the target format
(e.g., playlist.sub.T1, playlist.sub.T2, . . . playlist.sub.TN) are
adjusted so as to point to the same titles, chapters, locations,
etc. as were present in the playlists of media files in the source
format.
[0057] FIG. 3D depicts an on-the-fly watermarking system 3D00 as
implemented in systems for low latency and low defect media file
transcoding. As shown, the system includes a watermark generator
380 and a watermark applicator 381. On-the-fly watermarking is
performed as follows: A server video cache 396 holds video segments
that are made ready for delivery to a user/viewer. At some moment
before delivery to the user/viewer, a selector 398 determines a
next video segment (e.g., segment1 S.sub.1, segment2 S.sub.2,
segment3 S.sub.3, . . . , segmentN S.sub.N) to send over a network
(e.g., using sending unit 388) to a device destination prescribed
by the user/viewer.
[0058] The watermark generator 380 has access (e.g., through a data
access module 390) to a media file repository 382 that stores media
files 386 as well as media playlist files 384. The watermark
generator 380 further has access to an environmental data
repository 385. A watermark can be generated based on any
combination of data retrieved from any source. For example, a
watermark can contain a user's unique information (e.g., name or
nickname or email alias, etc.), and/or the name or identification
of the user's device, the time of day, copyright notices, logos,
etc. The watermark can be placed over the entire video and/or in a
small section, and/or move around every X frames or Y seconds, etc.
The watermark itself can also update as the video stream
progresses.
[0059] In exemplary embodiments, a watermark can be generated by
combining a segment from the server video cache with environmental
data 394 retrieved from the environmental data repository 385. More
specifically, a video segment can be watermarked by applying an
image over one or more frames in the selected video segment. The
aforementioned image might be an aspect of the requesting user,
possibly including the requesting user's userID, either based on a
plain text version of the userID, or based on an obfuscated version
of the userID. In some scenarios, the aforementioned image might
include an aspect of a time (e.g., a timestamp), and/or some
branding information (e.g., a corporate logo).
[0060] In some cases, the watermark generator 380 performs only
watermark generation so as to produce a generated watermark 393 and
passes the generated watermark to the watermark applicator 381. The
watermark applicator in turn can apply the generated watermark 393
to a video segment so as to produce watermarked selected segments
(e.g., watermarked selected segment 389.sub.1, watermarked selected
segment 389.sub.2). In some cases a watermark can be included in or
on an added frame. In such cases, the length of the video segment
is changed (e.g., by the added frame). As such, the playlist
regenerator 387 can retrieve and process a selected segment
playlist (e.g., see selected segment playlist 392.sub.1 and
selected segment playlist 392.sub.2) as well as instructions from
the watermark applicator (e.g., "added one frame immediately after
frame 0020") so as to produce a regenerated media file playlist 389
that corresponds to the media file from which the selected segment
was cached.
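Strictly as an illustrative sketch, regenerating a playlist after the
watermark applicator reports an added frame can be expressed as
follows. The mapping of marker names to frame indices is a
hypothetical playlist layout chosen for illustration.

```python
def regenerate_playlist(markers, inserted_after, added=1):
    """Shift playlist markers that fall after an inserted frame.

    markers:        mapping of marker name to frame index (hypothetical)
    inserted_after: frame index immediately after which frames were added
    added:          number of frames added at that position
    """
    return {
        name: frame + added if frame > inserted_after else frame
        for name, frame in markers.items()
    }


# Instruction from the applicator: one frame added immediately after frame 20.
original = {"title": 0, "chapter1": 20, "chapter2": 45}
regenerated = regenerate_playlist(original, inserted_after=20)
```

Markers at or before the insertion point keep their positions; only
later markers move, so each marker still points to the same content.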
[0061] In the context of watermarking video streams as well as in
other video streams, some or all of the files in the file system
may actually be located in a remote storage location (e.g., in a
collaborative cloud-based storage system). To avoid incurring
delays in the downloading and processing of that data, a
client-side video cache 397 can be implemented. As such, serving of
video segments is enhanced.
[0062] For example, consider the situation when a video file in a
file system is selected to be played on a client-local video
player, but the file is actually located across the network at
another network location (e.g., on a server). If network conditions
are perfect and download speeds are high enough, then it is quite
possible for the video to be streamed across the network without
any stalls or interruptions in the display of the video data.
However, situations often exist where video downloads still need to
perform even when the network conditions are not ideal. Consider if
the system is configured such that the video starts being displayed
as soon as portions of the video data are received at the local
client. In this situation, there are likely to be intermittent
interruptions in the video display, where a portion of the video is
played, followed by an annoying stall in video playing (as network
conditions cause delays in data downloads), followed by more of the
video being displayed as additional data is downloaded.
[0063] The present embodiment of the invention provides an improved
approach to display data in a virtual file system that
significantly reduces these intermittent interruptions. In the
present embodiment, the data is not always immediately queued for
display to the user. Instead, chunks (e.g., segments) of data are
continuously requested by a client-local module (e.g., a
client-local video player), and the data starts being displayed
only when a sufficient amount of data (e.g., a sufficient
number of segments) has been locally received to ensure smooth
display of the data. This approach therefore avoids the problems
associated with immediate playback of data, since there should
always be enough data on hand to smooth out any changes in network
conditions.
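Strictly as an illustrative sketch, deferring playback until enough
segments have been received locally can be expressed as follows. The
class name and the fixed segment-count threshold are assumptions made
for illustration; the disclosure does not state how sufficiency is
measured.

```python
from collections import deque


class SegmentBuffer:
    """Client-side buffer that withholds playback until a threshold is met."""

    def __init__(self, min_segments):
        self.min_segments = min_segments  # hypothetical sufficiency threshold
        self.queue = deque()
        self.playing = False

    def receive(self, segment):
        """Store a downloaded segment; start playback once enough are held."""
        self.queue.append(segment)
        if not self.playing and len(self.queue) >= self.min_segments:
            self.playing = True

    def next_segment(self):
        """Return the next segment to display, or None if not ready."""
        if not self.playing or not self.queue:
            return None
        return self.queue.popleft()


buffer = SegmentBuffer(min_segments=3)
for segment in ("s1", "s2"):
    buffer.receive(segment)
```

With two of three required segments received, the sketch still
returns no segment for display; playback begins with the first
segment once the third arrives.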
[0064] FIG. 3D includes one approach to implement aspects of video
segment caching. In the depicted system, a request originating from
a user device is received by the server. The requested data may
correspond to any granularity of data (e.g., block, segment,
chapter, title, etc.). For example, an entire media file or title
may be requested; or, only a range of blocks or chapter of the
media can be requested. Further, the requested data may pertain to
any type of structure or format.
[0065] FIG. 4A presents a processing timeline 4A00 to show full
file transcoding latency for comparison to techniques for low
latency and low defect media file transcoding using optimized
partitioning. As an option, one or more instances of processing
timeline 4A00 or any aspect thereof may be implemented in the
context of the architecture and functionality of the embodiments
described herein. The processing timeline 4A00 or any aspect
thereof may be implemented in any desired environment.
[0066] Processing timeline 4A00 shows an original media file01
404.sub.1 transcoded into a transcoded media file01 406.sub.1 using
a full file transcoding approach 410. As shown, such an approach
introduces a setup time 402 for a computing device or resource to
prepare for transcoding an entire media file (e.g., a two-hour
movie). The full file transcoding approach 410 then proceeds to
transcode the original media file01, requiring that a user desiring
to view the transcoded file wait until the entire file is
transcoded, thus experiencing a full file transcoding approach
first-view latency 412. For example, transcoding a one-hour video
to certain combinations of formats and resolutions can result in
the full file transcoding approach first-view latency 412 being one
to two hours. For comparison to the full file transcoding approach
410, FIG. 4B illustrates the herein disclosed techniques for
reducing the first-view latency of transcoded media files, while
minimizing or eliminating defects in the transcoded media
files.
[0067] FIG. 4B presents a latency timeline 4B00 illustrating a low
first-view latency technique used in systems implementing low
latency and low defect media file transcoding using optimized
partitioning. As an option, one or more instances of latency
timeline 4B00 or any aspect thereof may be implemented in the
context of the architecture and functionality of the embodiments
described herein. The latency timeline 4B00 or any aspect thereof
may be implemented in any desired environment.
Optimized Partitioning Approach Using Increasing Chunk Size
[0068] Latency timeline 4B00 shows an original media file01
404.sub.1 transcoded into a transcoded media file01 406.sub.1 using
an optimized partitioning approach 420 and a progressive optimized
partitioning approach 430 according to the herein disclosed
techniques for low latency and low defect media file transcoding
using optimized partitioning. Specifically, the optimized
partitioning approach 420 can determine optimized partitioning of
the original media file01 based in part on the current and/or
target format (e.g., key frame location) and/or available computing
resources. The resulting partitions are delivered to a set of
computing resources for parallel processing (e.g., transcoding) and
assembled into a file container. As shown, in some embodiments, the
transcoded file can be viewed by a user when the first partition
(e.g., P1) has been processed and delivered, resulting in an
optimized partitioning first-view latency 422. In the embodiment
implementing the progressive optimized partitioning approach 430,
the first partition is relatively small to enable a faster
transcoding processing time for an initial clip. A shorter
progressive optimized partitioning first-view latency 432 can be
implemented to improve still further over the earlier-described
latency of optimized partitioning first-view latency 422. In both
approaches shown in FIG. 4B, the first-view latencies can be
substantially shorter (e.g., seconds) than the full file
transcoding approach first-view latency 412 as shown in FIG. 4A
(e.g., hours). The initial relatively small clip enables a faster
transcoding processing time for an initial clip. Successively
larger chunks (e.g., clip size) can be determined as transcoding
proceeds through the original media file. More particularly,
although the initial, relatively small clip is transcoded and
presented to the requestor with low latency, the successively
larger clip sizes can improve overall performance of the system as
a whole. Use of progressive chunk size (e.g., even for just a few
of the chunks after the first one) facilitates delivering more and
better quality upgrades (e.g., via more and better quality levels)
at a faster rate. Consider a system that implements equal chunk
sizes of 10 seconds. In such a case, a quality upgrade can only
happen at a chunk boundary. It follows then that it would take at
least 10 seconds of playback before the quality can be upgraded.
Initially, the client starts with a default quality (e.g., a very
low one such as 360p). If the client can actually support 1080p,
which may be four quality jumps away, it will take 4 chunk times
(40 seconds) to reach that quality. By using progressive chunk
sizes initially, the amount of time it takes to upgrade to the
maximum available quality is shortened. The alternative technique
of making all the chunks small (e.g., 2 seconds each)
so as to reduce the chunk time quantum would result in a very large
number of chunks--and each individual network request (e.g.,
request for a chunk) introduces communication latency and
processing overhead, resulting in undesired slower retrieval of
data. Increasing the chunk size strikes a balance between use of
system resources and user experience.
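Strictly as an illustrative sketch, the arithmetic above can be
checked as follows, assuming one quality upgrade per chunk boundary
as in the example. The progressive schedule (a 2-second first chunk
growing by 2 seconds up to a 10-second cap) is a hypothetical choice
and is not specified in the disclosure.

```python
def progressive_schedule(total, first=2, step=2, cap=10):
    """Chunk durations (seconds) that grow from `first` up to `cap`."""
    sizes, size, covered = [], first, 0
    while covered < total:
        duration = min(size, total - covered)  # last chunk may be shorter
        sizes.append(duration)
        covered += duration
        size = min(size + step, cap)
    return sizes


def time_to_reach_quality(chunk_durations, jumps):
    """Seconds of playback before `jumps` quality upgrades can occur,
    assuming at most one upgrade per chunk boundary."""
    return sum(chunk_durations[:jumps])


equal = [10] * 12                      # 120-second video, equal 10 s chunks
progressive = progressive_schedule(120)
small = [2] * 60                       # 120-second video, all 2 s chunks

equal_wait = time_to_reach_quality(equal, 4)
progressive_wait = time_to_reach_quality(progressive, 4)
request_counts = (len(equal), len(progressive), len(small))
```

Under these assumptions, four quality jumps take 40 seconds with
equal 10-second chunks and 20 seconds with the progressive schedule,
while the progressive schedule needs 14 chunk requests for a
120-second video as compared to 60 requests when every chunk is 2
seconds.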
[0069] FIG. 5 is a flow diagram 500 illustrating a system for low
latency and low defect media file transcoding using optimized
partitioning. As an option, one or more instances of flow diagram
500 or any aspect thereof may be implemented in the context of the
architecture and functionality of the embodiments described herein.
The flow diagram 500 or any aspect thereof may be implemented in
any desired environment.
[0070] Flow diagram 500 in FIG. 5 shows one embodiment of
representative modules of, and flows through, a system for
implementing techniques for low latency and low defect media file
transcoding using optimized partitioning. Specifically, an
intermediate server 506 can comprise a partitioner module 508, a
partition workload assigner 510, and a transcoder partition
assembler 512. The intermediate server 506 can further communicate
with a cloud-computing provider 520 comprising a plurality of
workload nodes (e.g., workload node 524.sub.1, workload node
524.sub.2, workload node 524.sub.3, to workload node 524.sub.N) to
perform various distributed computing and storage operations.
Specifically, as pertains to the herein disclosed techniques, the
partitioner module 508 of the intermediate server 506 can receive
an original media file 502 from a storage facility for transcoding.
The partitioner module 508 can analyze the original media file 502
to determine optimized partitions for transcoding (e.g., based in
part on the target format or formats and/or available computing
resources at cloud computing provider 520). When the partitions and
partition boundaries have been determined, the partition workload
assigner 510 can assign the respective partitions to the workload
nodes at the cloud computing provider 520 to transcode the
partitioned original media file segments to respective transcoded
media file segments. The transcoder partition assembler 512 can
then assemble the transcoded media file segments into a transcoded
media video file 504.
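Strictly as an illustrative sketch, the assign, transcode, and
assemble flow can be expressed as follows. A local thread pool stands
in for the workload nodes, and the `transcode` callable is a
hypothetical placeholder for a real transcoding job; neither is taken
from the disclosure.

```python
from concurrent.futures import ThreadPoolExecutor


def transcode_in_parallel(partitions, transcode, max_workers=4):
    """Transcode partitions concurrently and assemble them in order.

    partitions: list of byte strings, one per partition
    transcode:  callable mapping one partition to its transcoded bytes
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so the assembled file
        # preserves playback order even if workers finish out of order.
        transcoded = list(pool.map(transcode, partitions))
    return b"".join(transcoded)


# bytes.upper is a stand-in for a real transcoding operation.
assembled = transcode_in_parallel([b"p1", b"p2", b"p3"], bytes.upper)
```

Because results are collected per partition, the first transcoded
partition could also be delivered to a viewer before assembly
completes, consistent with the low first-view latency described
herein.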
Caching
[0071] FIG. 6A is a flow diagram 6A00 illustrating caching of an
initial clip as used in systems for low latency and low defect
media file transcoding using optimized partitioning. Performance of
previewing and other delivery of media to a collaborator can be
improved by pre-transcoding an initial portion of a media file upon
initial posting. For example, and as shown in FIG. 6A, when a user
requests to view a media clip (e.g., by selecting from a workspace
media preview icon), the system checks in a cache 602 for the
presence of the media clip (or portion thereof). If the requested
media clip is already available in the cache, then the clip is
served. If the requested media clip is not yet available in the cache,
then at least an initial portion of the media can be transcoded,
stored in the cache, and served to the requestor. Various
techniques for pre-transcoding an initial portion of a media file
are shown and discussed as pertains to FIG. 6B.
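Strictly as an illustrative sketch, the cache check described above
can be expressed as follows. The dictionary-backed cache and the
`transcode_initial` callable are hypothetical stand-ins for cache 602
and for the transcoding of an initial portion.

```python
def serve_clip(media_id, cache, transcode_initial):
    """Serve a clip from the cache, transcoding the initial portion on a miss.

    cache:             dict-like store keyed by media identifier
    transcode_initial: callable that transcodes at least an initial portion
    """
    clip = cache.get(media_id)
    if clip is None:
        clip = transcode_initial(media_id)  # cache miss: transcode now
        cache[media_id] = clip              # store for later requests
    return clip


calls = []


def fake_transcode(media_id):
    """Placeholder transcoder that records each invocation."""
    calls.append(media_id)
    return f"initial-clip-{media_id}".encode()


cache = {}
first = serve_clip("m1", cache, fake_transcode)
second = serve_clip("m1", cache, fake_transcode)
```

The second request is answered from the cache, so the placeholder
transcoder runs only once for the same media identifier.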
Pre-Transcoding
[0072] FIG. 6B is a flow diagram 6B00 illustrating pre-transcoding
of an initial clip as used in systems for low latency and low
defect media file transcoding using optimized partitioning. As
shown, a creator collaborator 125 might provide a media file for
uploading to the remote collaborative cloud-based storage system.
At that time, the system can detect that the media file is new to
the system and, in accordance with some pre-transcoding techniques,
the new file can be preprocessed. Many preprocessing operations can
be performed, in particular pre-transcoding of an initial clip.
When a collaborator requests to view that media, a preprocessed
initial clip of the newly received file is available and can be
served to the requestor upon request, resulting in low-latency
delivery of the requested media and providing a good user
experience. In some embodiments, the media file can be transcoded
into multiple initial clips corresponding to various initial
lengths (e.g., 10 seconds, 20 seconds, 30 seconds, etc.) and
corresponding to various qualities (e.g., 480p, 720p, 1080p,
etc.).
Frame-by-Frame Delivery Using a Delivery Pipeline
[0073] FIG. 6C is a flow diagram 6C00 illustrating frame-by-frame
delivery of a video clip as used in systems for low latency and low
defect media file transcoding using optimized partitioning. In some
cases, an initial transcoded clip or other requested transcoded
clip might not be pre-transcoded (e.g., might not yet be stored in
the cache, and might not be stored in the remote collaborative
cloud-based storage system). For example, if the transcoder needs
to send a chunk to the client but does not have the chunk
transcoded, it will start the transcoding operation for that chunk
and, as each frame is transcoded, send the frame down to the
client, keeping the connection open for subsequent frames. If the
transcoder does have the chunk already transcoded (e.g., because it
is running with more resources than it needs and is further ahead in the
transcoding operation than the client is in the viewing operation),
then it simply sends the entire chunk down at once (e.g., without
using a frame-by-frame pipeline).
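Strictly as an illustrative sketch, the choice between frame-by-frame
delivery and whole-chunk delivery can be expressed as a generator.
The `transcoded` store and the `transcode_frames` callable are
hypothetical; an open client connection is represented by the
generator being consumed.

```python
def deliver_chunk(chunk_id, transcoded, transcode_frames):
    """Yield the pieces to send to the client for one chunk.

    transcoded:       dict of chunk id to already-transcoded chunk bytes
    transcode_frames: callable returning an iterator of transcoded frames
    """
    if chunk_id in transcoded:
        # The transcoder is ahead of the viewer: send the chunk at once.
        yield transcoded[chunk_id]
        return
    frames = []
    for frame in transcode_frames(chunk_id):
        frames.append(frame)
        yield frame  # send each frame as soon as it is transcoded
    # Keep the finished chunk so a later request can be served whole.
    transcoded[chunk_id] = b"".join(frames)


def fake_frames(chunk_id):
    """Placeholder frame source standing in for a real transcoder."""
    return iter([b"f0", b"f1", b"f2"])


store = {}
streamed = list(deliver_chunk(7, store, fake_frames))
whole = list(deliver_chunk(7, store, fake_frames))
```

The first request streams three frames individually; the second
request for the same chunk is answered with a single piece.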
[0074] In some situations a low latency preview can be facilitated
by transcoding a very small portion for initial delivery (e.g., see
the technique of FIG. 4B). In other cases, such as when sufficient
computing resources needed to transcode an entire initial clip
cannot be reserved, a technique for establishing a
frame-by-frame pipeline for transcoding and delivery can be
employed. When sufficient computing resources needed to transcode
become available, the frame-by-frame pipeline can be flushed, and
other transcoding and delivery techniques can be pursued.
Frame-by-Frame Delivery Using Automatic Pre-Generation of a
Playlist
[0075] FIG. 6D1 is a flow diagram illustrating playlist generation
from a video clip as used in systems for low latency and low defect
media file transcoding using optimized partitioning. In many
situations, a playlist (e.g., an HLS playlist) or manifest (e.g., a
DASH manifest) is delivered before delivery of a media clip or
portion thereof. In many cases it is felicitous to pre-generate a
playlist or manifest upon receipt of a request to access (e.g.,
watch) a media file. The requester can see various characteristics
of the entire media file, and a media player can present navigation
controls that are substantially accurate vis-a-vis the entire media
file. In some cases many different sized clips, possibly using
different qualities of video, can be delivered. In some such cases
the timecode of the different clips is corrected (see FIG. 6D4,
below).
Generation of URLs Used for Retrieval of Media Clips
[0076] FIG. 6D2 is a flow diagram illustrating generation of URLs
for video clips as used in systems for low latency and low defect
media file transcoding using optimized partitioning. Generating a
playlist or manifest for a clip or series of clips involves
relating a time or time range with a media file. Strictly as an
example, a playlist corresponding to a series of successively
larger chunks (e.g., such as discussed in the foregoing FIG. 4B)
can be determined as transcoding proceeds through the original
media file. The playlist can refer to the initial media file
location (e.g., by URL) for immediate playback, and a playlist can
identify the successively larger clips by referring to the media
file location of respective successively larger clips. Multiple
playlists can be generated upon presentation of a media file, and
the resulting playlists or manifests can be stored or cached for
fast retrieval.
Generation of Stateful URLs Used for Retrieval of Media Clips
[0077] FIG. 6D3 is a flow diagram illustrating generation of stateful URLs
for video clips as used in systems for low latency and low defect
media file transcoding using optimized partitioning. Generating a
playlist or manifest for a clip or series of clips involves
relating a time or time range with a media file. Strictly as an
example, a playlist corresponding to a series of successively
larger chunks (e.g., such as discussed in the foregoing FIG. 4B)
can be determined as transcoding proceeds through the original
media file. As shown, the processing proceeds as follows: (1) a
user requests video media from a workspace, (2) the video media
metadata is retrieved, and (3) a playlist is generated that
contains state-encoded URLs.
[0078] In exemplary embodiments, there is one URL generated for
each chunk in the playlist, and each URL (e.g., a state-encoded URL
entry) that is generated corresponds to an independent transcoding
job for that specific chunk. When the URL is accessed by the
player, such as after the player reads the playlist file and
requests a specific chunk, the state-encoding in the URL pertaining
to that chunk communicates state variables (e.g., variable
values) to the transcoding server, which then operates using the
state variables and other encoded information necessary to provide
the appropriate transcoded chunk. The transcoded chunk data is then
delivered to the requesting client. An example of a stateful URL
is:
TABLE-US-00001
"http://transcode-001.streem.com/1080.ts?start=20&chunkDuration=10&totalDuration=120&orientation=180&mediaId=184719719371&userId=38205818491&jobId=5112485"
[0079] The URL itself comprises information delivered to the
transcoder. In the example above, the start location (e.g.,
"start=20"), the chunk duration (e.g., "chunkDuration=10"), and
other information can be known by the transcoder, merely by
receiving and parsing the stateful URL.
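Strictly as an illustrative sketch, generating one state-encoded URL
per chunk and recovering the state on the transcoder side can be
expressed with the Python standard library. The function names and
the returned field names are assumptions; the query parameter names
and the base URL follow the example stateful URL above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def chunk_urls(base, total_duration, chunk_duration, media_id):
    """Build one state-encoded URL per chunk of a playlist."""
    urls = []
    for start in range(0, total_duration, chunk_duration):
        query = urlencode({
            "start": start,
            # The final chunk may be shorter than the nominal duration.
            "chunkDuration": min(chunk_duration, total_duration - start),
            "totalDuration": total_duration,
            "mediaId": media_id,
        })
        urls.append(f"{base}?{query}")
    return urls


def parse_stateful_url(url):
    """Recover the transcoding state carried by a stateful URL."""
    parts = urlsplit(url)
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {
        "rendition": parts.path.rsplit("/", 1)[-1],
        "start": int(params["start"]),
        "chunk_duration": int(params["chunkDuration"]),
        "total_duration": int(params["totalDuration"]),
        "media_id": params["mediaId"],
    }


example = (
    "http://transcode-001.streem.com/1080.ts?start=20&chunkDuration=10"
    "&totalDuration=120&orientation=180&mediaId=184719719371"
    "&userId=38205818491&jobId=5112485"
)
state = parse_stateful_url(example)
urls = chunk_urls("http://transcode-001.streem.com/1080.ts", 120, 10,
                  "184719719371")
```

Because each URL carries its own state, any transcoding server that
receives the request can produce the chunk without consulting
per-session state.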
Timecode Correction Techniques to Compensate for Variable
Communication Channels
[0080] FIG. 6D4 is a flow diagram illustrating timecode correction
techniques used when transcoding or delivering video clips to
viewers as used in systems for low latency and low defect media
file transcoding using optimized partitioning. As heretofore
described, the environment in which a video clip can be served
might vary widely depending on the situation (e.g., serving to a
client with a hardline Internet connection, serving to a mobile
device over a wireless channel, etc.). Moreover, the environment in
a particular situation can vary in real-time. For example, the bit
rate available to and from a mobile device might vary substantially
while the mobile device user moves between cellular
towers. More generally, many variations in the communication fabric
to and from a user device can occur at any time. Various techniques
ameliorate this situation by selecting a next clip to deliver to a
mobile device where the selected next clip is selected on the basis
of determined characteristics of the available communication
channels. For example, during periods where the communication
channel is only able to provide (for example) 500 Kbps of
bandwidth, a relatively low quality of video (e.g., 480p) can be
delivered. When the communication channel improves so as to be able
to provide (for example) higher bandwidth, then a relatively higher
quality of video (e.g., 1080p) can be delivered to the viewer.
Variations of quality vis-a-vis available bandwidth can occur
dynamically over time, and any time duration for re-measuring
available bandwidth can be relatively shorter or relatively longer.
One artifact of such dynamic selection of a particular quality of a
video clip is that the timecode of the next delivered clip needs to
be corrected, depending on the selected quality. In particular, the
metadata of the clip has to be compared with the playlist so that
the playlist will still serve for navigation (e.g., fast forward,
rewind, etc.) purposes. Further, absent timecode correction, a
succession of chunks can exhibit artifacts and glitches (e.g., image
freezing, skipping around, missing frames, etc.). In exemplary
embodiments, transcoding is performed "on-the-fly" using a "dynamic
transcoder". A dynamic transcoder starts and stops frequently based
on incoming requests. In some implementations, the starting and
stopping of the transcoder resets the timecode of the chunks it
generates. One or more timecode correction techniques are applied
to make each chunk indistinguishable from a chunk that had been
made in a single continuous transcoding job.
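The disclosure does not spell out the correction arithmetic. Strictly as one possible sketch, and assuming that a restarted transcoder emits frames whose timestamps begin again near zero, the correction can be modeled as shifting every frame of a chunk by a constant offset so that the chunk begins at its position in the playlist. The `Frame` type and the `correct_timecodes` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Frame:
    pts: float          # presentation timestamp, in seconds
    data: bytes = b""   # encoded frame payload (unused in this sketch)


def correct_timecodes(frames: List[Frame], chunk_start: float) -> List[Frame]:
    """Shift timestamps so that the first frame lands at chunk_start.

    Assumption: the frames of one chunk share a single constant offset.
    """
    if not frames:
        return []
    offset = chunk_start - frames[0].pts
    return [replace(frame, pts=frame.pts + offset) for frame in frames]


# A chunk produced after a restart, with timestamps reset to zero.
restarted = [Frame(0.0), Frame(0.5), Frame(1.0)]
corrected = correct_timecodes(restarted, chunk_start=20.0)
```

Here `chunk_start` would correspond to the "start" state variable carried in the state-encoded URL, so the corrected chunk lines up with its playlist entry.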
[0081] Strictly as one example, when a transcoding job is
interrupted, a timecode correction is needed. A transcoding job
might be interrupted whenever a request for a chunk that the
transcoder does not have transcoded at that time is received. Such
a condition can occur when the client requests a different quality
than the previous chunk and/or when the client requests a chunk
that is determined to be a forward chunk (e.g., a "seek") in the
video rather than a next chunk as would be received during
continuous playback of the video.
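Strictly as a sketch of the two conditions just described, a dynamic transcoder might decide whether an incoming request interrupts the current job as follows. The `ChunkRequest` type and `needs_restart` function are hypothetical, and treating any non-consecutive start position as a seek is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChunkRequest:
    quality: str         # e.g., "480" or "1080"
    start: int           # chunk start position, in seconds
    chunk_duration: int  # chunk length, in seconds


def needs_restart(previous: Optional[ChunkRequest],
                  current: ChunkRequest) -> bool:
    """Return True when the request interrupts the running job."""
    if previous is None:
        return True  # nothing is running yet
    if current.quality != previous.quality:
        return True  # the client switched quality levels
    expected_start = previous.start + previous.chunk_duration
    return current.start != expected_start  # a seek, not the next chunk


previous = ChunkRequest("1080", 20, 10)
next_chunk = needs_restart(previous, ChunkRequest("1080", 30, 10))
quality_switch = needs_restart(previous, ChunkRequest("480", 30, 10))
seek = needs_restart(previous, ChunkRequest("1080", 60, 10))
```

Whenever `needs_restart` returns True, the chunks produced by the restarted job are candidates for timecode correction.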
Access Techniques Using a Virtual File System
[0082] FIG. 6E is a flow diagram 6E00 illustrating techniques for
accessing media files through a custom virtual file system as used
when delivering video clips to collaborators. In certain
situations, the remote collaborative cloud-based storage systems
rely on a particular file system (e.g., NTFS, CIFS, etc.), and in
some situations, the characteristics of such a particular file
system might not map conveniently to the functional requirements of
a transcoding and delivery service. One technique to ameliorate
differences between the functional requirements of a transcoding
and delivery service and a back-end file system is to provide a
custom virtual file system, which can be used as a video prefetcher
620. The video prefetcher is used for various purposes, including:
[0083] 1. Listing the available video files on the user's remote
cloud account. The file listing presents actual, readily available
files so as to eliminate or reduce the need to write custom code to
interact with the APIs of the cloud provider.
[0084] 2. Handling the downloading of video file content (e.g.,
frames, bytes, etc.).
[0085] 3. Prefetching frames or bytes of the video file that are
most likely to be accessed in the near future by the transcoder.
This reduces network load by reducing or eliminating the need to
fetch data every time the transcoder needs new portions of the
video file. As such, throughput and latency are greatly improved.
[0086] 4. Providing a local repository (e.g., a cache) of the video
file that is being requested so that subsequent accesses to the
same part of the video file by the transcoder can be retrieved from
the local repository rather than from cloud-based storage.
[0087] The video prefetcher 620 serves to fetch the predicted next
parts of the original video that the transcoder is predicted to
process soon. This improves transcoding throughput by pipelining
the downloading and the transcoding into adjacent pipeline phases.
Further, the video prefetcher can be configured to cache recently
downloaded videos so that the transcoder doesn't need to
re-download the original when transcoding into another quality
level or when re-transcoding for another user. In exemplary
embodiments, the video prefetcher provides an abstraction layer to
other elements of the system, thus allowing the transcoder to remain
independent from all network requests. The transcoder is relieved
of tasks and operations pertaining to downloading, authentication,
identifying and transferring metadata, etc.
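Strictly as a minimal sketch of the caching and read-ahead behavior described above, a prefetcher can be modeled as a block cache placed in front of a remote fetch function. The class name `VideoPrefetcher`, the `fetch_block` callable, and the `read_ahead` parameter are hypothetical. This sketch fetches synchronously for brevity, whereas pipelining the download with the transcode, as described, would use a background worker.

```python
from typing import Callable, Dict


class VideoPrefetcher:
    """Block cache with read-ahead in front of remote storage (sketch)."""

    def __init__(self, fetch_block: Callable[[int], bytes],
                 read_ahead: int = 2):
        self._fetch_block = fetch_block   # stands in for cloud storage access
        self._read_ahead = read_ahead     # blocks to prefetch past a read
        self._cache: Dict[int, bytes] = {}
        self.remote_fetches = 0           # counts trips to remote storage

    def _load(self, index: int) -> bytes:
        if index not in self._cache:
            self._cache[index] = self._fetch_block(index)
            self.remote_fetches += 1
        return self._cache[index]

    def read(self, index: int) -> bytes:
        """Return block `index` and prefetch the blocks that follow it."""
        data = self._load(index)
        for ahead in range(index + 1, index + 1 + self._read_ahead):
            self._load(ahead)
        return data


# Hypothetical remote store: block i is the single byte with value i.
prefetcher = VideoPrefetcher(lambda i: bytes([i]), read_ahead=2)
first = prefetcher.read(0)            # fetches blocks 0, 1, and 2
fetches_after_first = prefetcher.remote_fetches
second = prefetcher.read(1)           # block 1 is cached; only block 3 is new
fetches_after_second = prefetcher.remote_fetches
```

The transcoder in this sketch only ever calls `read`, which keeps it independent of how and when the network requests are made.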
Additional Embodiments of the Disclosure
Additional Practical Application Examples
[0088] FIG. 7 depicts a system 700 as an arrangement of computing
modules that are interconnected so as to operate cooperatively to
implement certain of the herein-disclosed embodiments. The
partitioning of system 700 is merely illustrative and other
partitions are possible.
[0089] FIG. 7 depicts a block diagram of a system to perform
certain functions of a computer system. As an option, the system
700 may be implemented in the context of the architecture and
functionality of the embodiments described herein. Of course,
however, the system 700 or any operation therein may be carried out
in any desired environment. The system 700 comprises at least one
processor and at least one memory, the memory serving to store
program instructions corresponding to the operations of the system.
As shown, an operation can be implemented in whole or in part using
program instructions accessible by a module. The modules are
connected to a communication path 705, and any operation can
communicate with other operations over communication path 705. The
modules of the system can, individually or in combination, perform
method operations within system 700. Any operations performed
within system 700 may be performed in any order unless as may be
specified in the claims. The shown embodiment implements a portion
of a computer system, presented as system 700, comprising a
computer processor to execute a set of program code instructions
(see module 710) and modules for accessing memory to hold program
code instructions to perform: identifying a first media file having
a first format to be converted to a second media file having a
second format (see module 720); partitioning the first media file
into two or more partitions separated by one or more partition
boundaries, wherein the one or more partition boundaries are at a
determined key frame position (see module 730); converting into the
second format, the two or more partitions to respective two or more
converted partitions, wherein the respective two or more partitions
are converted by respective two or more computing devices (see
module 740); and assembling the respective two or more converted
partitions to comprise the second media file (see module 750).
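Strictly as an illustrative sketch of the operations of modules 730 through 750, the following snaps each nominal partition boundary to the closest key frame, converts the partitions concurrently, and assembles the results in order. All function names are hypothetical, the `convert` callable stands in for a real transcoding job, and a thread pool stands in for the respective computing devices.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor


def snap_to_key_frame(boundary, key_frames):
    """Return the key frame time closest to `boundary`.

    `key_frames` is a non-empty list of times sorted in ascending order.
    """
    i = bisect_left(key_frames, boundary)
    candidates = key_frames[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda k: abs(k - boundary))


def partition(duration, target_length, key_frames):
    """Split [0, duration] into (start, end) pairs at key frames."""
    boundaries = []
    t = target_length
    while t < duration:
        snapped = snap_to_key_frame(t, key_frames)
        # Keep boundaries strictly increasing and inside the media.
        if 0 < snapped < duration and (not boundaries
                                       or snapped > boundaries[-1]):
            boundaries.append(snapped)
        t += target_length
    edges = [0.0] + boundaries + [duration]
    return list(zip(edges[:-1], edges[1:]))


def transcode_in_parallel(duration, target_length, key_frames, convert):
    """Convert each partition concurrently; results keep partition order."""
    parts = partition(duration, target_length, key_frames)
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        return list(pool.map(lambda p: convert(*p), parts))


key_frames = [0.0, 4.0, 9.0, 16.0, 21.0, 30.0]
parts = partition(30.0, 10.0, key_frames)
assembled = transcode_in_parallel(30.0, 10.0, key_frames,
                                  lambda start, end: f"{start}-{end}")
```

With the sample key frames, the nominal boundaries at 10 and 20 seconds move to the key frames at 9 and 21 seconds, so each partition begins on a key frame and can be converted independently.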
[0090] Variations include:
[0091] Variations where the determined key frame position is a
closest key frame location to a respective one of the partition
boundaries.
[0092] Variations that comprise storing at least a portion of the
second media file into a cache.
[0093] Variations that comprise retrieving at least a portion of
the second media file from a cache.
[0094] Variations that comprise delivering at least a portion of
the second media file to a requestor to be viewed on a media
player.
[0095] Variations that comprise assigning the two or more
partitions to the respective two or more computing devices.
[0096] Variations where the two or more partitions are
characterized by a set of progressively increasing durations.
[0097] Variations where the two or more partitions are
characterized by a set of progressively decreasing durations.
[0098] Variations where the two or more partitions are
characterized by a set of progressively increasing quality levels.
[0099] Variations where the partitioning is based at least in part
on a total computing resource availability.
[0100] Variations where the partitioning is based at least in part
on at least one of, the first format and the second format.
[0101] Variations where the respective two or more computing
devices comprise a cloud-based computing system.
[0102] Variations where at least one of the respective two or more
converted partitions comprises an attribute dataset.
[0103] Variations where the attribute dataset is at least one of,
an atom, a movie atom, and a moov atom.
[0104] Variations where steps for converting into the second format
comprise a timecode correction.
[0105] Variations that comprise generating a playlist based at
least in part on the second media file.
[0106] Variations where at least some playlist entries comprise a
state-encoded URL.
[0107] Variations where the respective two or more partitions are
selected based on a length.
[0108] Variations where at least one of the respective two or more
partitions is stored before an assembling step.
[0109] Variations where at least one of the respective two or more
partitions corresponds to a beginning portion of the first media
file.
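Strictly as one example of the variation with progressively increasing durations, a schedule can start with a short first partition, so that the beginning portion of the media is converted quickly, and then grow each subsequent partition. The geometric growth, the default values, and the function name `increasing_durations` are assumptions introduced for illustration.

```python
def increasing_durations(total, first=2.0, factor=2.0):
    """Return partition durations that grow by `factor` and sum to `total`.

    Assumption: the final partition is shortened to fit the remainder.
    """
    durations = []
    remaining = total
    length = first
    while remaining > 0:
        step = min(length, remaining)
        durations.append(step)
        remaining -= step
        length *= factor
    return durations


schedule = increasing_durations(30.0)   # [2.0, 4.0, 8.0, 16.0]
```

Because the final partition is clipped to the remaining media, it can be shorter than its predecessor when the total duration does not fit the schedule exactly.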
System Architecture Overview
Additional System Architecture Examples
[0110] FIG. 8A depicts a block diagram of an instance of a computer
system 8A00 suitable for implementing embodiments of the present
disclosure. Computer system 8A00 includes a bus 806 or other
communication mechanism for communicating information. The bus
interconnects subsystems and devices such as a CPU, or a multi-core
CPU (e.g., processor 807), a system memory (e.g., main memory 808,
or an area of random access memory RAM), a non-volatile storage
device or area (e.g., ROM 809), an internal or external storage
device 810 (e.g., magnetic or optical), a data interface 833, a
communications interface 814 (e.g., PHY, MAC, Ethernet interface,
modem, etc.). The aforementioned components are shown within
processing element partition 801, however other partitions are
possible. The shown computer system 8A00 further comprises a
display 811 (e.g., CRT or LCD), various input devices 812 (e.g.,
keyboard, cursor control), and an external data repository 831.
[0111] According to an embodiment of the disclosure, computer
system 8A00 performs specific operations by processor 807 executing
one or more sequences of one or more program code instructions
contained in a memory. Such instructions (e.g., program
instructions 802.sub.1, program instructions 802.sub.2, program
instructions 802.sub.3, etc.) can be contained in or can be read
into a storage location or memory from any computer readable/usable
medium such as a static storage device or a disk drive. The
sequences can be organized to be accessed by one or more processing
entities configured to execute a single process or configured to
execute multiple concurrent processes to perform work. A processing
entity can be hardware-based (e.g., involving one or more cores) or
software-based, and/or can be formed using a combination of
hardware and software that implements logic, and/or can carry out
computations and/or processing steps using one or more processes
and/or one or more tasks and/or one or more threads or any
combination therefrom.
[0112] According to an embodiment of the disclosure, computer
system 8A00 performs specific networking operations using one or
more instances of communications interface 814. Instances of the
communications interface 814 may comprise one or more networking
ports that are configurable (e.g., pertaining to speed, protocol,
physical layer characteristics, media access characteristics, etc.)
and any particular instance of the communications interface 814 or
port thereto can be configured differently from any other
particular instance. Portions of a communication protocol can be
carried out in whole or in part by any instance of the
communications interface 814, and data (e.g., packets, data
structures, bit fields, etc.) can be positioned in storage
locations within communications interface 814, or within system
memory, and such data can be accessed (e.g., using random access
addressing, or using direct memory access DMA, etc.) by devices
such as processor 807.
[0113] The communications link 815 can be configured to transmit
(e.g., send, receive, signal, etc.) communications packets 838
comprising any organization of data items. The data items can
comprise a payload data area 837, a destination address 836 (e.g.,
a destination IP address), a source address 835 (e.g., a source IP
address), and can include various encodings or formatting of bit
fields to populate the shown packet characteristics 834. In some
cases the packet characteristics include a version identifier, a
packet or payload length, a traffic class, a flow label, etc. In
some cases the payload data area 837 comprises a data structure
that is encoded and/or formatted to fit into byte or word
boundaries of the packet.
[0114] In some embodiments, hard-wired circuitry may be used in
place of or in combination with software instructions to implement
aspects of the disclosure. Thus, embodiments of the disclosure are
not limited to any specific combination of hardware circuitry
and/or software. In embodiments, the term "logic" shall mean any
combination of software or hardware that is used to implement all
or part of the disclosure.
[0115] The term "computer readable medium" or "computer usable
medium" as used herein refers to any medium that participates in
providing instructions to processor 807 for execution. Such a
medium may take many forms including, but not limited to,
non-volatile media and volatile media. Non-volatile media includes,
for example, optical or magnetic disks such as disk drives or tape
drives. Volatile media includes dynamic memory such as a random
access memory.
[0116] Common forms of computer readable media include, for
example, floppy disk, flexible disk, hard disk, magnetic tape, or
any other magnetic medium; CD-ROM or any other optical medium;
punch cards, paper tape, or any other physical medium with patterns
of holes; RAM, PROM, EPROM, FLASH-EPROM, or any other memory chip
or cartridge, or any other non-transitory computer readable medium.
Such data can be stored, for example, in any form of external data
repository 831, which in turn can be formatted into any one or more
storage areas, and which can comprise parameterized storage 839
accessible by a key (e.g., filename, table name, block address,
offset address, etc.).
[0117] Execution of the sequences of instructions to practice
certain embodiments of the disclosure is performed by a single
instance of the computer system 8A00. According to certain
embodiments of the disclosure, two or more instances of computer
system 8A00 coupled by a communications link 815 (e.g., LAN, PSTN,
or wireless network) may perform the sequence of instructions
required to practice embodiments of the disclosure using two or
more instances of components of computer system 8A00.
[0118] The computer system 8A00 may transmit and receive messages
such as data and/or instructions organized into a data structure
(e.g., communications packets 838). The data structure can include
program instructions (e.g., application code 803), communicated
through communications link 815 and communications interface 814.
Received program code may be executed by processor 807 as it is
received and/or stored in the shown storage device or in or upon
any other non-volatile storage for later execution. Computer system
8A00 may communicate through a data interface 833 to a database 832
on an external data repository 831. Data items in a database can be
accessed using a primary key (e.g., a relational database primary
key).
[0119] The processing element partition 801 is merely one sample
partition. Other partitions can include multiple data processors,
and/or multiple communications interfaces, and/or multiple storage
devices, etc. within a partition. For example, a partition can
bound a multi-core processor (e.g., possibly including embedded or
co-located memory), or a partition can bound a computing cluster
having a plurality of computing elements, any of which computing
elements are connected directly or indirectly to a communications
link. A first partition can be configured to communicate to a
second partition. A particular first partition and particular
second partition can be congruent (e.g., in a processing element
array) or can be different (e.g., comprising disjoint sets of
components).
[0120] A module as used herein can be implemented using any mix of
any portions of the system memory and any extent of hard-wired
circuitry including hard-wired circuitry embodied as a processor
807. Some embodiments include one or more special-purpose hardware
components (e.g., power control, logic, sensors, transducers,
etc.). A module may include one or more state machines and/or
combinational logic used to implement or facilitate the performance
characteristics of low latency and low defect media file
transcoding using optimized partitioning.
[0121] Various implementations of the database 832 comprise storage
media organized to hold a series of records or files such that
individual records or files are accessed using a name or key (e.g.,
a primary key or a combination of keys and/or query clauses). Such
files or records can be organized into one or more data structures
(e.g., data structures used to implement or facilitate aspects of
low latency and low defect media file transcoding using optimized
partitioning). Such files or records can be brought into and/or
stored in volatile or non-volatile memory.
[0122] FIG. 8B depicts a block diagram of an instance of a
cloud-based environment 8B00. Such a cloud-based environment
supports access to workspaces through the execution of workspace
access code (e.g., workspace access code 853.sub.1 and workspace
access code 853.sub.2). Workspace access code can be executed on any
of the shown user devices 852 (e.g., laptop device 852.sub.4,
workstation device 852.sub.5, IP phone device 852.sub.3, tablet
device 852.sub.2, smart phone device 852.sub.1, etc.). A group of
users can form a collaborator group 858, and a collaborator group
can be comprised of any types or roles of users. For example, and
as shown, a collaborator group can comprise a user collaborator, an
administrator collaborator, a creator collaborator, etc. Any user
can use any one or more of the user devices, and such user devices
can be operated concurrently to provide multiple concurrent
sessions and/or other techniques to access workspaces through the
workspace access code.
[0123] A portion of workspace access code can reside in and be
executed on any user device. In addition, a portion of the
workspace access code can reside in and be executed on any
computing platform (e.g., computing platform 860), including in a
middleware setting. As shown, a portion of the workspace access
code (e.g., workspace access code 853.sub.3) resides in and can be
executed on one or more processing elements (e.g., processing
element 862.sub.1). The workspace access code can interface with
storage devices such as the shown networked storage 866. Workspaces
and/or any constituent files or objects, and/or any other code or
scripts or data can be stored in any one or more storage partitions
(e.g., storage partition 864.sub.1). In some
environments, a processing element includes forms of storage, such
as RAM and/or ROM and/or FLASH, and/or other forms of volatile and
non-volatile storage.
[0124] A stored workspace can be populated via an upload (e.g., an
upload from a user device to a processing element over an upload
network path 857). One or more constituents of a stored workspace
can be delivered to a particular user and/or shared with other
particular users via a download (e.g., a download from a processing
element to a user device over a download network path 859).
[0125] In the foregoing specification, the disclosure has been
described with reference to specific embodiments thereof. It will,
however, be evident that various modifications and changes may be
made thereto without departing from the broader spirit and scope of
the disclosure. For example, the above-described process flows are
described with reference to a particular ordering of process
actions. However, the ordering of many of the described process
actions may be changed without affecting the scope or operation of
the disclosure. The specification and drawings are to be regarded in an
illustrative sense rather than in a restrictive sense.
* * * * *