U.S. patent application number 17/708185 was published by the patent office on 2022-07-14 for systems and methods for determining delay of a plurality of media streams.
This patent application is currently assigned to Evertz Microsystems Ltd. The applicant listed for this patent is Evertz Microsystems Ltd. The invention is credited to Rakesh Patel and Jeff Wei.
Application Number: 17/708185
Publication Number: 20220224991
Kind Code: A1
Filed Date: 2022-03-30
Publication Date: 2022-07-14

United States Patent Application 20220224991, Kind Code A1
Wei; Jeff; et al.
July 14, 2022
SYSTEMS AND METHODS FOR DETERMINING DELAY OF A PLURALITY OF MEDIA
STREAMS
Abstract
A system and method are provided for determining delay of a
plurality of media streams. The system and method involve
generating, at a source processor, a series of source time packets;
transmitting, at the source processor, through a network, the
series of source time packets as a source packet stream;
generating, at a destination processor, a series of destination
time packets; receiving, at the destination processor, through the
network, the source packet stream; determining, at the destination
processor, a transmission time for the source packet stream based
on the source time data and the destination time data; and
determining, at the destination processor, a relative
synchronization error based on the source signature data and the
destination signature data. Each source time packet includes source
time data and source signature data. Each destination time packet
includes destination time data and destination signature data.
Inventors: Wei; Jeff (Richmond Hill, CA); Patel; Rakesh (Mississauga, CA)

Applicant: Evertz Microsystems Ltd., Burlington, CA

Assignee: Evertz Microsystems Ltd., Burlington, CA

Appl. No.: 17/708185

Filed: March 30, 2022
Related U.S. Patent Documents

Application Number   Filing Date    Patent Number
16834181             Mar 30, 2020   11323780
17708185
62829319             Apr 4, 2019

International Class: H04N 21/647 20060101 H04N021/647; H04L 43/0852 20060101 H04L043/0852; H04N 21/8547 20060101 H04N021/8547
Claims
1. A system for aligning a plurality of media streams comprising: a
source processor configured to: generate a series of source time
signals corresponding to the plurality of media streams, each
source time signal including source time data and source signature
data, wherein: the source time data corresponds to a first time
when the source time signal is generated; the source signature data
corresponds to characteristic features of the corresponding media
stream; transmit, through a network, the series of source time
signals; a destination processor configured to: generate a series
of destination time signals, each destination time signal including
destination time data and destination signature data, wherein: the
destination time data corresponds to a second time when the
destination time signal is generated; the destination signature
data corresponds to characteristic features of the corresponding
media stream; receive, through the network, the series of source
time signals; for at least one source time signal, determine a
relative synchronization error based on the corresponding source
signature data and the destination signature data; and realign the
corresponding at least one media stream to correct the relative
synchronization error.
2. The system of claim 1, wherein the destination processor is
further configured to determine a transmission time for one or more
source time signals based on the corresponding source time data and
the destination time data.
3. The system of claim 2, wherein each source time signal is
packetized, and the series of source time signals are transmitted
as a source packet stream.
4. The system of claim 2, wherein each destination time signal is
packetized.
5. The system of claim 3, wherein the source packet stream is
transmitted in-band with the plurality of media streams.
6. The system of claim 1, wherein the source packet stream is
transmitted out-of-band with the plurality of media streams.
7. The system of claim 1, wherein the source time data and the
destination time data is generated using PTP (Precision Time
Protocol).
8. The system of claim 1, wherein: the source processor is further
configured to transmit, through the network, the plurality of media
streams; the network comprises at least one processing device
configured to process at least one media stream of the plurality of
media streams; and the destination processor is further configured
to receive the plurality of media streams.
9. The system of claim 1, wherein the source time data and the
destination time data further include a clock signal.
10. The system of claim 3, wherein the source packet stream is
transmitted synchronously.
11. The system of claim 3, wherein the source packet stream is
transmitted asynchronously.
12. The system of claim 1, wherein the characteristic features
include at least one of: an average luma value, an average color
value, an average motion distance, and a contrast level.
13. The system of claim 1, wherein the characteristic features
include at least one of: an envelope of signal amplitude, an
average loudness level, a peak formant, and an average zero
crossing rate.
14. The system of claim 1, wherein the plurality of media streams
include at least one of: a video stream, an audio stream, and a
metadata stream.
15. A method for aligning a plurality of media streams comprising:
generating, at a source processor, a series of source time signals,
each source time signal including source time data and source
signature data, wherein: the source time data corresponds to a
first time when the source time signal is generated; the source
signature data corresponds to characteristic features of the
corresponding media stream; transmitting, through a network, the
series of source time signals; generating, at a destination
processor, a series of destination time signals, each destination
time signal including destination time data and destination
signature data, wherein: the destination time data corresponds to a
second time when the destination time signal is generated; the
destination signature data corresponds to characteristic features
of the corresponding media stream; receiving, at the destination
processor, through the network, the series of source time signals;
for at least one source time signal, determining a relative
synchronization error based on the corresponding source signature
data and the destination signature data; and realigning the
corresponding at least one media stream to correct the relative
synchronization error.
16. The method of claim 15, further comprising, determining, at the
destination processor, a transmission time for one or more source
time signals based on the corresponding source time data and the
destination time data.
17. The method of claim 16, wherein each source time signal is
packetized, and the series of source time signals are transmitted
as a source packet stream, and wherein each destination time signal
is packetized.
18. The method of claim 17, wherein the source packet stream is
transmitted in-band with the plurality of media streams.
19. The method of claim 17, wherein the source packet stream is
transmitted out-of-band with the plurality of media streams.
20. The method of claim 15, further comprising: transmitting, by
the source processor, through the network, the plurality of media
streams; processing, by at least one processing device comprised in
the network, at least one media stream of the plurality of media
streams; and receiving, at the destination processor, the plurality
of media streams.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation of U.S. application Ser.
No. 16/834,181 filed on Mar. 30, 2020, which claims the benefit of
U.S. Provisional Application No. 62/829,319 filed on Apr. 4, 2019,
the complete disclosures of which are incorporated herein by
reference.
FIELD
[0002] The described embodiments relate to determining delay of a
plurality of media streams, and in particular to determining
transmission times and relative synchronization errors.
BACKGROUND
[0003] Media transmission systems can route media streams from
various source devices to various downstream devices. Media streams
can contain video, audio, or metadata content. The metadata is
often referred to as vertical ancillary data (VANC) or horizontal
ancillary data (HANC). In separate elementary essence transmission
systems, each of the streams is typically a separate stream, in the
sense that the information for one stream is not embedded in
another stream. This is in contrast to SDI transmission, in which
audio and ancillary data is embedded in non-visible portions of a
video signal.
[0004] Media streams can originate from different sources and may,
as a result, be out of sync with one another. In some cases, media
streams can originate from the same source but may still be out of
sync with each other. For example, a video stream may be "running
ahead" or "running behind" a corresponding audio stream, resulting
in lip-sync errors. Furthermore, when media streams are transmitted
over a network, the media streams can travel via different network
paths, or be processed by different intermediate devices. As a
result, the media streams may arrive at a downstream device at
different times, resulting in further desynchronization.
Accordingly, it may be desirable to determine transmission times
and relative synchronization errors.
SUMMARY
[0005] In one aspect, some embodiments provide a system for
determining delay of a plurality of media streams. The system
includes a source processor and a destination processor. The source
processor is configured to generate a series of source time
packets; and transmit, through a network, the series of source time
packets as a source packet stream. Each source time packet includes
source time data and source signature data. The source time data
corresponds to a first time when the source time packet is
generated. The source signature data corresponds to characteristic
features of each of the plurality of media streams. The destination
processor is configured to generate a series of destination time
packets; receive, through the network, the source packet stream;
determine a transmission time for the source packet stream based on
the source time data and the destination time data; and determine a
relative synchronization error based on the source signature data
and the destination signature data. Each destination time packet
includes destination time data and destination signature data. The
destination time data corresponds to a second time when the
destination time packet is generated. The destination signature
data corresponds to characteristic features of each of the
plurality of media streams.
[0006] In some embodiments, the source packet stream is transmitted
in-band with the plurality of media streams.
[0007] In some embodiments, the source packet stream is transmitted
out-of-band from the plurality of media streams.
[0008] In some embodiments, the source time data and the
destination time data are generated using PTP (Precision Time
Protocol).
[0009] In some embodiments, the source processor is further
configured to transmit, through the network, the plurality of media
streams. The network includes at least one processing device
configured to process at least one media stream of the plurality of
media streams. The destination processor is further configured to
receive the plurality of media streams.
[0010] In some embodiments, the source time data and the
destination time data further include a clock signal.
[0011] In some embodiments, the source packet stream is transmitted
synchronously.
[0012] In some embodiments, the source packet stream is transmitted
asynchronously.
[0013] In some embodiments, the characteristic features include at
least one of: an average luma value, an average color value, an
average motion distance, and a contrast level.
[0014] In some embodiments, the characteristic features include at
least one of: an envelope of signal amplitude, an average loudness
level, a peak formant, and an average zero crossing rate.
[0015] In some embodiments, the plurality of media streams include
at least one of: a video stream, an audio stream, and a metadata
stream.
[0016] In one aspect, some embodiments provide a system for
determining delay of a plurality of media streams. The system
includes a source processor, a destination processor, and an
analysis processor. The source processor configured to generate a
series of source time packets; and transmit, through a network, the
series of source time packets as a source packet stream. Each
source time packet includes source time data and source signature
data. The source time data corresponds to a first time when the
source time packet is generated. The source signature data
corresponds to characteristic features of each of the plurality of
media streams. The destination processor is configured to generate
a series of destination time packets; and transmit, through the
network, the series of destination time packets as a destination
packet stream. Each destination time packet includes destination
time data and destination signature data. The destination time data
corresponds to a second time when the destination time packet is
generated. The destination signature data corresponds to
characteristic features of each of the plurality of media streams.
The analysis processor is configured to receive, through the
network, the source packet stream and the destination packet
stream; determine a transmission time for at least one of the
source packet stream and the destination packet stream based on at
least one of the source time data and the destination time data;
and determine a relative synchronization error based on the source
signature data and the destination signature data.
[0017] In one aspect, some embodiments provide a method for
determining delay of a plurality of media streams. The method
involves generating, at a source processor, a series of source time
packets; transmitting, at the source processor, through a network,
the series of source time packets as a source packet stream;
generating, at a destination processor, a series of destination
time packets; receiving, at the destination processor, through the
network, the source packet stream; determining, at the destination
processor, a transmission time for the source packet stream based
on the source time data and the destination time data; and
determining, at the destination processor, a relative
synchronization error based on the source signature data and the
destination signature data. Each source time packet includes source
time data and source signature data. The source time data
corresponds to a first time when the source time packet is
generated. The source signature data corresponds to characteristic
features of each of the plurality of media streams. Each
destination time packet includes destination time data and
destination signature data. The destination time data corresponds
to a second time when the destination time packet is generated. The
destination signature data corresponds to characteristic features
of each of the plurality of media streams.
[0018] In some embodiments, the source packet stream is transmitted
in-band with the plurality of media streams.
[0019] In some embodiments, the source packet stream is transmitted
out-of-band from the plurality of media streams.
[0020] In some embodiments, the source time data and the
destination time data are generated using PTP (Precision Time
Protocol).
[0021] In some embodiments, the source time data and the
destination time data further include a clock signal.
[0022] In some embodiments, the method further involves
transmitting, at the source processor, through the network, the
plurality of media streams; processing, at at least one processing
device in the network, at least one media stream of the plurality of
media streams; and receiving, at the destination processor, the
plurality of media streams.
[0023] In some embodiments, the source packet stream is transmitted
synchronously.
[0024] In some embodiments, the source packet stream is transmitted
asynchronously.
[0025] In some embodiments, the characteristic features include at
least one of: an average luma value, an average color value, an
average motion distance, and a contrast level.
[0026] In some embodiments, the characteristic features include at
least one of: an envelope of signal amplitude, an average loudness
level, a peak formant, and an average zero crossing rate.
[0027] In some embodiments, the plurality of media streams include
at least one of: a video stream, an audio stream, and a metadata
stream.
[0028] In one aspect, some embodiments provide a method for
determining delay of a plurality of media streams. The method
involves generating, at a source processor, a series of source time
packets; transmitting, at the source processor, through a network,
the series of source time packets as a source packet stream;
generating, at a destination processor, a series of destination
time packets; transmitting, at the destination processor, through a
network, the series of destination time packets as a destination
packet stream; receiving, at an analysis processor, through the
network, the source packet stream and the destination packet
stream; determining, at the analysis processor, a transmission time
for at least one of the source packet stream and the destination
packet stream based on at least one of the source time data and the
destination time data; and determining, at the analysis processor,
a relative synchronization error based on the source signature data
and the destination signature data. Each source time packet
includes source time data and source signature data. The source
time data corresponds to a first time when the source time packet
is generated. The source signature data corresponds to
characteristic features of each of the plurality of media streams.
Each destination time packet includes destination time data and
destination signature data. The destination time data corresponds
to a second time when the destination time packet is generated. The
destination signature data corresponds to characteristic features
of each of the plurality of media streams.
BRIEF DESCRIPTION OF THE DRAWINGS
[0029] Embodiments of the present invention will now be described
in detail with reference to the drawings, in which:
[0030] FIG. 1 is a block diagram of a system for determining delay
of a plurality of media streams, in accordance with at least one
embodiment;
[0031] FIG. 2 is a block diagram of a system for determining delay
of a plurality of media streams, in accordance with at least one
embodiment;
[0032] FIGS. 3A and 3B are illustrations of a plurality of media
streams, source time packets, and destination time packets, in
accordance with at least one embodiment;
[0033] FIG. 4 is a block diagram of a processor, in accordance with
at least one embodiment;
[0034] FIG. 5 is a block diagram of a packet, in accordance with at
least one embodiment;
[0035] FIG. 6 is a flowchart of a method for determining delay of a
plurality of media streams, in accordance with at least one
embodiment; and
[0036] FIG. 7 is a flowchart of a method for determining delay of a
plurality of media streams, in accordance with at least one
embodiment.
[0037] The drawings, described below, are provided for purposes of
illustration, and not of limitation, of the aspects and features of
various examples of embodiments described herein. For simplicity
and clarity of illustration, elements shown in the drawings have
not necessarily been drawn to scale. The dimensions of some of the
elements may be exaggerated relative to other elements for clarity.
It will be appreciated that for simplicity and clarity of
illustration, where considered appropriate, reference numerals may
be repeated among the drawings to indicate corresponding or
analogous elements or steps.
DESCRIPTION OF EXEMPLARY EMBODIMENTS
[0038] It will be appreciated that numerous specific details are
set forth in order to provide a thorough understanding of the
example embodiments described herein. However, it will be
understood by those of ordinary skill in the art that the
embodiments described herein may be practiced without these
specific details. In other instances, well-known methods,
procedures and components have not been described in detail so as
not to obscure the embodiments described herein. Furthermore, this
description and the drawings are not to be considered as limiting
the scope of the embodiments described herein in any way, but
rather as merely describing the implementation of the various
embodiments described herein.
[0039] It should be noted that terms of degree such as
"substantially", "about" and "approximately" when used herein mean
a reasonable amount of deviation of the modified term such that the
end result is not significantly changed. These terms of degree
should be construed as including a deviation of the modified term
if this deviation would not negate the meaning of the term it
modifies.
[0040] In addition, as used herein, the wording "and/or" is
intended to represent an inclusive-or. That is, "X and/or Y" is
intended to mean X or Y or both, for example. As a further example,
"X, Y, and/or Z" is intended to mean X or Y or Z or any combination
thereof.
[0041] It should be noted that the term "coupled" used herein
indicates that two elements can be directly coupled to one another
or coupled to one another through one or more intermediate
elements.
[0042] The embodiments of the systems and methods described herein
may be implemented in hardware or software, or a combination of
both. These embodiments may be implemented in computer programs
executing on programmable computers, each computer including at
least one processor, a data storage system (including volatile
memory or non-volatile memory or other data storage elements or a
combination thereof), and at least one communication interface. For
example and without limitation, the programmable computers may be a
server, network appliance, embedded device, computer expansion
module, a personal computer, laptop, personal data assistant,
cellular telephone, smart-phone device, tablet computer, a wireless
device or any other computing device capable of being configured to
carry out the methods described herein.
[0043] In some embodiments, the communication interface may be a
network communication interface. In embodiments in which elements
are combined, the communication interface may be a software
communication interface, such as those for inter-process
communication (IPC). In still other embodiments, there may be a
combination of communication interfaces implemented as hardware,
software, and combination thereof.
[0044] Program code may be applied to input data to perform the
functions described herein and to generate output information. The
output information is applied to one or more output devices, in
known fashion.
[0045] Each program may be implemented in a high level procedural
or object oriented programming and/or scripting language, or both,
to communicate with a computer system. However, the programs may be
implemented in assembly or machine language, if desired. In any
case, the language may be a compiled or interpreted language. Each
such computer program may be stored on a storage media or a device
(e.g. ROM, magnetic disk, optical disc) readable by a general or
special purpose programmable computer, for configuring and
operating the computer when the storage media or device is read by
the computer to perform the procedures described herein.
Embodiments of the system may also be considered to be implemented
as a non-transitory computer-readable storage medium, configured
with a computer program, where the storage medium so configured
causes a computer to operate in a specific and predefined manner to
perform the functions described herein.
[0046] Furthermore, the system, processes and methods of the
described embodiments are capable of being distributed in a
computer program product comprising a computer readable medium that
bears computer usable instructions for one or more processors. The
medium may be provided in various forms, including one or more
diskettes, compact disks, tapes, chips, wireline transmissions,
satellite transmissions, internet transmission or downloadings,
magnetic and electronic storage media, digital and analog signals,
and the like. The computer useable instructions may also be in
various forms, including compiled and non-compiled code.
[0047] Reference is first made to FIG. 1, which illustrates a block
diagram of system 100 for determining delay of a plurality of media
streams 110, in accordance with at least one embodiment. System 100
includes source processor 102, destination processor 104, and
network 108. Source processor 102 is connected to destination
processor 104 via network 108. Various data can be transmitted from
source processor 102 to destination processor 104 across network
108.
[0048] Source processor 102 and destination processor 104 can be
any suitable processors, controllers, digital signal processors,
graphics processing units, application specific integrated circuits
(ASICs), and/or field programmable gate arrays (FPGAs) that can
provide sufficient processing power depending on the configuration,
purposes and requirements of the system 100. In some embodiments,
source processor 102 and destination processor 104 can include more
than one processor with each processor being configured to perform
different dedicated tasks.
[0049] Source processor 102 can be connected to one or more source
devices (not shown) that generate media content. For example, the
source devices may be cameras, microphones, or other devices for
generating video, audio, or metadata content. Source processor 102
can receive media content from the source devices and generate
media streams 110. In some embodiments, source processor 102 can
receive media streams 110 from the source devices and does not
generate media streams 110. In some embodiments, source processor
102 is a source device. In some embodiments, source processor 102
can be connected to one or more other processing devices (not
shown) that transmit media streams 110 to source processor 102.
[0050] Each stream of media streams 110 can include video, audio,
or metadata content. In some embodiments, each stream includes only
one type of content. In other embodiments, each stream can include
more than one type of content. A media stream that includes video,
audio, or metadata may be referred to as a video stream, audio
stream, or metadata stream, respectively. In some embodiments, each
stream of media streams 110 is packetized. That is, the data within
each stream is formatted as a plurality of packets. Accordingly,
each media stream can include a plurality of media packets, each
video stream can include a plurality of video packets, each audio
stream can include a plurality of audio packets, and each metadata
stream can include a plurality of metadata packets. It will be
appreciated that although only three media streams 110 are shown,
there can be any number of media streams 110.
[0051] Source processor 102 can transmit media streams 110 through
network 108 to destination processor 104. In some embodiments,
media streams 110 are transmitted by source processor 102 using a
synchronous communication standard, such as SDI (Serial Digital
Interface). In other embodiments, media streams 110 are transmitted
using an asynchronous communication standard, such as IP (Internet
Protocol). In some cases, media streams 110 are transmitted in a
steady stream. In some cases, media streams 110 are transmitted
intermittently.
[0052] Network 108 can include various network paths (not shown)
through which data, such as media streams 110, can be routed. In
some embodiments, the network paths can include various switches
and intermediate processing devices. The switches can selectively
reconfigure one or more network paths to change the routing of
media streams 110. For example, the switches can route a stream
from the source processor 102, to one or more intermediate
processing devices, to the destination processor 104. The
intermediate processing devices can process one or more of media
streams 110. For example, an intermediate processing device can
process a video stream to adjust various characteristics of the
video content, such as resolution, color, contrast, brightness,
orientation, level of compression, etc. Similarly, an intermediate
processing device may process an audio stream to adjust
characteristics of the audio content, such as equalization, level
of compression, etc. An intermediate processing device may also
process a metadata stream to add new metadata, or remove or modify
existing metadata.
[0053] Destination processor 104 can receive media streams 110 from
source processor 102, through network 108. In some embodiments,
destination processor 104 can buffer one or more of media streams
110. That is, destination processor 104 can temporarily store data
from one or more streams in a memory (not shown). For example,
media streams 110 received at different times or at different rates
can be buffered by destination processor 104 for later
processing.
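Such destination-side buffering can be sketched as follows (an illustrative Python sketch; the embodiments do not prescribe any particular data structure, and the names used here are hypothetical):

```python
from collections import deque

class StreamBuffer:
    """Minimal sketch of destination-side buffering: packets arriving
    at different times or rates are held until later processing."""

    def __init__(self, capacity: int = 1024):
        # With maxlen set, the oldest packets are dropped once full.
        self._packets = deque(maxlen=capacity)

    def push(self, packet):
        self._packets.append(packet)

    def pop(self):
        # First-in, first-out; None signals an empty buffer.
        return self._packets.popleft() if self._packets else None

buf = StreamBuffer()
for pkt in ("frame-1", "frame-2", "frame-3"):
    buf.push(pkt)
print(buf.pop())  # frame-1
```

One buffer per media stream lets streams received at different rates be drained independently.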
[0054] Destination processor 104 can be connected to one or more
downstream devices (not shown). For example, destination processor
104 can be connected to a video production system. Destination
processor 104 can transmit media streams 110 to the video
production system, which can output media streams 110. For example,
a video production system can display video streams on one or more
monitors or play audio streams on one or more speakers. In some
cases, the video production system can be used to facilitate
production of a television broadcast.
[0055] In some cases, media streams 110 may be out of sync or
temporally misaligned with respect to each other when they are
received by destination processor 104. For example, a video stream
may be "running ahead" or "running behind" a corresponding audio
stream, resulting in lip-sync errors. This may be caused by the
fact that the media content was generated by different source devices.
In some cases, there may be desynchronization even where media
content was generated by the same source device. The
desynchronization may be caused by media streams 110 traveling on
different network paths or having different intermediate
processing. As will be discussed in further detail below, system
100 can determine transmission times and relative synchronization
errors.
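The idea of recovering a relative synchronization error from signature data can be sketched as follows: each side records a per-segment signature value, and the shift that best matches the destination series to the source series estimates how far the stream has drifted. This is only an illustration under assumed scalar signatures; the actual signature data and matching metric depend on the embodiment, and the function name is hypothetical.

```python
from typing import Sequence

def estimate_sync_offset(src_sigs: Sequence[float],
                         dst_sigs: Sequence[float],
                         max_offset: int = 10) -> int:
    """Return the segment shift that best aligns destination signature
    data with source signature data (positive = destination lags)."""
    best_offset, best_error = 0, float("inf")
    for offset in range(-max_offset, max_offset + 1):
        pairs = [(s, dst_sigs[i + offset])
                 for i, s in enumerate(src_sigs)
                 if 0 <= i + offset < len(dst_sigs)]
        if not pairs:
            continue
        # Mean squared difference between paired signature values.
        error = sum((s - d) ** 2 for s, d in pairs) / len(pairs)
        if error < best_error:
            best_offset, best_error = offset, error
    return best_offset

# A destination stream whose content arrives two segments late:
src = [0.1, 0.5, 0.9, 0.4, 0.7, 0.2, 0.8, 0.3]
dst = [0.0, 0.0] + src[:-2]
print(estimate_sync_offset(src, dst))  # 2
```

Computing such an offset per stream, and comparing offsets across streams, yields the relative synchronization error used to realign the streams.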
[0056] Source processor 102 can generate a series of source time
packets 112. It will be appreciated that although only three source
time packets 114 are shown, source processor 102 can generate any
number of source time packets 114. In some embodiments, source
processor 102 can generate a source time packet 114 for each
segment of a media stream. For example, for a video stream, the
segment may correspond to a video frame. Accordingly, source
processor 102 can generate a series of source time packets 112 at
the same frequency as a video frame rate of a video stream. In some
embodiments, source processor 102 can generate a source time packet
114 for each set of contemporaneous media segments. For example, a
set of media segments may include a video frame, an audio segment
cotemporaneous with the video frame, and metadata cotemporaneous
with the video frame.
[0057] Each source time packet includes source time data. Source
time data corresponds to the time when the source time packet 114
is generated. For example, source time data can include a timestamp
identifying when the source time packet 114 was generated. This
time may be referred to as a first time. In some embodiments, each
source time packet 114 is generated approximately contemporaneous
with the transmission of the source time packet 114. In some
embodiments, each source time packet 114 is generated approximately
contemporaneous with the transmission of a segment of a media
stream or with the transmission of a set of cotemporaneous media
segments. Accordingly, in some embodiments, the source time data
can correspond to the time when the source time packet 114 or a
particular segment of a media stream is transmitted.
[0058] In some embodiments, source time data can be generated using
a clock which is synchronized throughout system 100. For example,
source time data can be generated using PTP (Precision Time
Protocol). PTP can ensure that time values determined at the same
instant by different devices, possibly in different locations, agree
with each other.
[0059] Each source time packet 114 further includes source
signature data. Source signature data corresponds to characteristic
features of each of media streams 110. The characteristic features
can be used to identify a particular segment of a particular
stream. For example, for a video stream, the characteristic
features may correspond to a particular video frame. For a video
stream, the characteristic features may include an average luma
value, an average color value, an average motion distance, or a
contrast level. Similarly, for an audio stream, the characteristic
feature may include an envelope of signal amplitude, an average
loudness level, a peak formant, and an average zero crossing rate.
For a metadata stream, the characteristic feature may include a
hash value of some or all of the metadata. In some embodiments, the
characteristic features can correspond to a set of cotemporaneous
segments of media streams 110. For example, the characteristic
features can identify a video frame, an audio segment
cotemporaneous with the video frame, and metadata cotemporaneous
with the video frame.
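The per-stream characteristic features described above can be sketched in code. This is a minimal illustration under stated assumptions, not the patented method itself: the function names and the specific choice of an average-luma, zero-crossing-rate, and hash signature are drawn from the examples listed in this paragraph.

```python
import hashlib

def video_signature(frame):
    """Average luma of a video frame, given as a 2-D list of 8-bit
    luma samples (one of the example video features in the text)."""
    total = sum(sum(row) for row in frame)
    count = sum(len(row) for row in frame)
    return total / count

def audio_signature(samples):
    """Average zero-crossing rate of an audio segment (a list of
    signed PCM samples)."""
    crossings = sum(1 for a, b in zip(samples, samples[1:])
                    if (a < 0) != (b < 0))
    return crossings / max(len(samples) - 1, 1)

def metadata_signature(raw_bytes):
    """Hash of some or all of the metadata payload; SHA-256 is an
    arbitrary choice here."""
    return hashlib.sha256(raw_bytes).hexdigest()
```

Each function maps one media segment to a small value that can later be compared against the corresponding destination-side signature.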
[0060] In some embodiments, each source time packet 114 can include
additional time data, such as a clock signal, to facilitate video
network communication. For example, some video transmission
standards, such as some SDI standards, require a 90 kHz clock to be
embedded with video data, on a frame-by-frame basis. The 90 kHz
clock can be embedded in each source time packet 114 to allow each
source time packet 114 to be synchronized with each specific video
frame.
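As a hedged illustration of the 90 kHz clock mentioned above (a 90 kHz media clock is the conventional rate for video RTP timestamps), a PTP-derived time can be folded into a 32-bit tick count; the function name and wrapping behavior are assumptions:

```python
def to_90khz_ticks(ptp_seconds):
    """Fold a PTP time (in seconds) into a 32-bit tick count of a
    90 kHz clock, the rate conventionally used for video timestamps."""
    return int(ptp_seconds * 90_000) % (1 << 32)
```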
[0061] It will be appreciated that source time packets 114 can be
any data structure or collection of the various data items,
regardless of whether they are assembled or transmitted in any
particular structure. That is, a source time packet 114 may, in
some embodiments, never be assembled as a packet or
transmitted.
[0062] Source processor 102 can transmit the series of source time
packets 112 as source packet stream 116 through network 108. Source
packet stream 116 can be a packetized stream. That is, source
packet stream 116 can include data that is formatted in a plurality
of packets. Source packet stream 116 can be transmitted using a
synchronous communication standard or an asynchronous communication
standard.
[0063] Source packet stream 116 can be transmitted out-of-band from
media streams 110. That is, source packet stream 116 and media
streams 110 are transmitted in separate streams. However, in some
embodiments, source packet stream 116 is transmitted in-band with
media streams 110. That is, source packet stream 116 and media
streams 110 are transmitted in the same stream. In such
embodiments, source packet stream 116 travels along the same
network path as one or more media streams. For example, source time
packets 114 can be transmitted in the same stream as other video,
audio, or metadata packets. In another example, source time packets
114 can be embedded in a metadata packet (such as in VANC) in a
metadata stream or media stream. In some embodiments, source packet
stream 116 can be transmitted to the same IP address as media
streams 110, but to a different UDP port number.
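The out-of-band arrangement in this paragraph, same IP address as the media streams but a different UDP port, might be sketched as follows. The JSON encoding, field names, and port number are illustrative assumptions, not part of the disclosure:

```python
import json
import socket
import time

def build_source_time_packet(signature_data, now=None):
    """Assemble a source time packet: source time data (a timestamp,
    ideally PTP-derived) plus source signature data."""
    return {
        "time": now if now is not None else time.time(),
        "signatures": signature_data,
    }

def send_out_of_band(packet, media_ip, time_port=5006):
    """Send the time packet to the same IP address as the media
    streams, but on a different UDP port (port number hypothetical)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(json.dumps(packet).encode(), (media_ip, time_port))
    finally:
        sock.close()
```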
[0064] Destination processor 104 can receive, through network 108,
source packet stream 116 and media streams 110. Destination
processor 104 can generate a series of destination time packets
118. In some embodiments, destination processor 104 generates each
destination time packet 120 for each segment of a media stream. For
example, for a video stream, a segment may correspond to a video
frame. In some embodiments, destination processor 104 can generate
a destination time packet 120 for each set of contemporaneous media
segments.
[0065] Each destination time packet 120 includes destination time
data, similar to source time packets 114 and source time data.
Destination time data corresponds to the time when the destination
time packet 120 is generated. This time may be referred to as a
second time. In some embodiments, each destination time packet 120
is generated approximately contemporaneous with the reception of
each source time packet 114. In some embodiments, each destination
time packet 120 is generated approximately contemporaneous with the
reception of each segment of a media stream or each set of
cotemporaneous media segments. In some embodiments, the destination
time data is generated using PTP. In some embodiments, the
destination time data can include a clock signal.
[0066] Each destination time packet 120 also includes destination
signature data, similar to source time packets 114 and source
signature data. Destination signature data corresponds to
characteristic features of each of the media streams 110. The
characteristic features can be similar to those described for source
time packets 114. In some embodiments, the characteristic features
can correspond to a set of cotemporaneous segments of media streams
110.
[0067] It will be appreciated that destination time packets 120 may
refer to any data structure or collection of the various data
items, regardless of whether they are assembled or transmitted in
any particular structure. That is, a destination time packet 120
may, in some embodiments, never be assembled as a packet or
transmitted.
[0068] Destination processor 104 can determine a transmission time
for the source packet stream 116 based on the source time data and
the destination time data. For example, destination processor 104
can determine a difference between a first time when a source time
packet is generated and a second time when a destination time
packet is generated. The source time packet can be generated
contemporaneously with the transmission of source packet stream 116
and the destination time packet can be generated contemporaneously
with the reception of source packet stream 116. Accordingly, the
difference between the first time and the second time can indicate
a transmission time of the source packet stream 116 through network
108. In some cases, the transmission time of the source packet
stream 116 can be substantially equal to the transmission time of
one or more of media streams 110. For example, this may be the case
where source packet stream 116 travels along the same network path
as one or more of media streams 110, or where the source time
packet is generated approximately cotemporaneously with the
transmission of the one or more media streams.
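A minimal sketch of this transmission-time computation, assuming both timestamps come from PTP-synchronized clocks and that packets arrive in order; the dict representation of a time packet is an assumption:

```python
def transmission_times(source_packets, destination_packets):
    """Per-packet transit delay of the source packet stream: each
    destination timestamp (second time) minus the matching source
    timestamp (first time), assuming PTP-synchronized clocks and
    in-order arrival."""
    return [
        dst["time"] - src["time"]
        for src, dst in zip(source_packets, destination_packets)
    ]
```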
[0069] Destination processor 104 can also determine a relative
synchronization error for media streams 110. Relative
synchronization error can refer to a difference between the delays
of two or more media streams. For example, for an audio stream that
was delayed 100 ms and a video stream that was delayed 25 ms, the
relative synchronization error is 75 ms. That is, the audio stream
is running 75 ms behind the video stream. In some cases, the
relative synchronization error can be based on relative delays of media
streams 110. That is, the delays are relative to another time,
rather than absolute. For example, the delay of the streams can be
relative to the transmission time for the source packet stream 116.
That is, the delays are relative to the time when the source packet
stream 116 is received by destination processor 104.
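The arithmetic of the 100 ms / 25 ms example above, as a one-line helper; the sign convention (positive meaning the first stream runs behind the second) is an assumption:

```python
def relative_sync_error(delay_a, delay_b):
    """Difference between the delays of two streams; positive means
    stream A runs behind stream B (sign convention assumed)."""
    return delay_a - delay_b

# The example from the text: audio delayed 100 ms, video delayed 25 ms.
error_ms = relative_sync_error(100, 25)  # audio runs 75 ms behind video
```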
[0070] Destination processor 104 can determine the relative
synchronization error based on the source signature data and the
destination signature data. As discussed above, the source
signature data and destination signature data can include
characteristic features of the media streams that correspond to
particular segments of the media stream. Destination processor 104
can compare the source signature data of each source time packet
114 and destination signature data of each destination time packet
120. The comparison can be used by destination processor 104 to
locate temporal misalignments or relative synchronization errors
between media streams. The comparison of source and destination
signature data will be described in further detail below with
respect to FIGS. 3A and 3B. In some embodiments, destination
processor 104 can then realign media streams 110 to correct for the
synchronization error. In some embodiments, destination processor
104 can determine the transmission time for each of media streams
110 based on the transmission time and the relative synchronization
error.
[0071] Referring now to FIG. 2, shown therein is a block diagram of
system 200 for determining delay of a plurality of media streams,
in accordance with at least one embodiment. Similar to system 100
of FIG. 1, system 200 includes source processor 102, destination
processor 104, and network 108. However, in contrast to system 100,
system 200 further includes analysis processor 106.
[0072] Similar to system 100, source processor 102 can transmit
media streams 110 through network 108. Source processor 102 can
also generate a series of source time packets 112, where each
source time packet 114 includes source time data and source
signature data. However, in contrast with system 100, source
processor 102 transmits the series of source time packets 112 as
source packet stream 116 to analysis processor 106 (rather than to
destination processor 104). In some embodiments, source processor
102 transmits source packet stream 116 to analysis processor 106
through network 108. In some embodiments, source processor 102 can
transmit source packet stream 116 to destination processor 104 and
destination processor 104 can transmit source packet stream 116 to
analysis processor 106.
[0073] Similarly, destination processor 104 can receive media
streams 110 through network 108. Destination processor 104 can also
generate a series of destination time packets 118, where each
destination time packet 120 includes destination time data and destination
signature data. However, in contrast to system 100, destination
processor 104 transmits the series of destination time packets 118
as destination packet stream 122 to analysis processor 106. In some
embodiments, destination processor 104 transmits destination packet
stream 122 to analysis processor 106 through network 108.
[0074] Analysis processor 106 can be implemented using any suitable processors,
controllers, digital signal processors, graphics processing units,
application specific integrated circuits (ASICs), and/or field
programmable gate arrays (FPGAs) that can provide sufficient
processing power depending on the configuration, purposes and
requirements of the system 200. In some embodiments, analysis
processor 106 can include more than one processor with each
processor being configured to perform different dedicated
tasks.
[0075] Analysis processor 106 can receive source packet stream 116
from source processor 102 and receive destination packet stream 122
from destination processor 104. In some embodiments, analysis
processor 106 can receive source packet stream 116 and destination
packet stream 122 through network 108.
[0076] Analysis processor 106 can determine a transmission time for
source packet stream 116 based on source time data. For example,
analysis processor 106 can compare a first time from the source
time data indicating when a source time packet was generated with
the time at which the source time packet was received at analysis
processor 106. Since the source time packet can be generated
approximately contemporaneously with the transmission of source
packet stream 116, the difference can correspond to the
transmission time of the source time packet. Similarly, analysis
processor 106 can also determine a transmission time for
destination packet stream 122 based on destination time data.
[0077] Analysis processor 106 can also determine a relative
synchronization error, based on the source signature data and the
destination signature data, in a similar manner as described above
with respect to destination processor 104. In some cases, the
relative synchronization error is a difference between relative
delays of media streams 110, where the delays are relative to the
transmission time for source packet stream 116 or destination
packet stream 122. In some embodiments, analysis processor 106 can
realign media streams 110 to correct for the relative
synchronization error. In some embodiments, analysis processor 106
can determine a transmission time for media streams 110 based on
the transmission time for source packet stream 116 or destination
packet stream 122 and relative synchronization error.
[0078] Referring now to FIGS. 3A and 3B, shown therein is an
illustration of media streams 110, source time packets 112, and
destination time packets 118. Media streams 110 include video
stream 110v, audio stream 110a, and metadata stream 110m. Video
stream 110v includes video segments V.sub.1, V.sub.2, . . .
V.sub.n. Similarly, audio stream 110a includes audio segments
A.sub.1, A.sub.2, . . . A.sub.n and metadata stream 110m includes
metadata segments M.sub.1, M.sub.2, . . . M.sub.n. It will be
appreciated that although only three media streams are shown, there
may be any number of media streams 110.
[0079] In FIG. 3A, video stream 110v is aligned temporally with
audio stream 110a and metadata stream 110m. For example, V.sub.1
can correspond to a video frame, A.sub.1 can correspond to audio
cotemporaneous to that video frame, and M.sub.1 can correspond to
metadata cotemporaneous to the video frame. V.sub.1 is synchronized
with A.sub.1 and M.sub.1, V.sub.2 is synchronized with A.sub.2 and
M.sub.2, and V.sub.n is synchronized with A.sub.n and M.sub.n. In
this case, there is no difference in delays between video stream
110v, audio stream 110a, and metadata stream 110m. This may be the
case, for example, for media streams 110 at source processor 102 of
FIGS. 1 and 2.
[0080] In some cases, video stream 110v, audio stream 110a, and
metadata stream 110m can become misaligned or desynchronized with
respect to each other. That is, a difference in delays can develop
between one or more of video stream 110v, audio stream 110a, and
metadata stream 110m. This may be the case, for example, for the
media streams 110 at destination processor 104 of FIGS. 1 and 2.
The desynchronization can be caused, for example, when media
streams 110 are transmitted through network 108 of FIGS. 1 and
2.
[0081] In FIG. 3B, video stream 110v is now no longer synchronized
with audio stream 110a and metadata stream 110m. That is, V.sub.1
is now cotemporaneous with A.sub.2 and M.sub.0, instead of A.sub.1
and M.sub.1, V.sub.2 is now cotemporaneous with A.sub.3 and
M.sub.1, instead of A.sub.2 and M.sub.2; and V.sub.n is
cotemporaneous with A.sub.n+1 and M.sub.n-1, instead of A.sub.n and
M.sub.n. There is a difference in delays between video stream 110v,
audio stream 110a and metadata stream 110m. Audio stream 110a is
"running ahead" of video stream 110v, and metadata stream 110m is
"running behind" video stream 110v.
[0082] In order to realign or resynchronize media streams 110v,
110a, 110m, a series of source time packets 112 and destination
time packets 118 can be used. Source time packets 112 include
packets ST.sub.1, ST.sub.2, . . . ST.sub.n. Source time packets 112
can be generated, for example, by source processor 102 of FIGS. 1
and 2. A source time packet 114 is generated for each set of
cotemporaneous segments of media stream 110v, 110a, 110m. For
example, source time packet ST.sub.1 is generated for segments
V.sub.1, A.sub.1, and M.sub.1; source time packet ST.sub.2 is generated for
segments V.sub.2, A.sub.2, and M.sub.2; and source time packet
ST.sub.n is generated for segments V.sub.n, A.sub.n, and
M.sub.n.
[0083] Similarly, destination time packets 118 include packets
DT.sub.1, DT.sub.2, . . . DT.sub.n and can be generated, for
example, by destination processor 104 of FIGS. 1 and 2. A
destination time packet 120 is generated for each set of
cotemporaneous segments of media stream 110v, 110a, 110m. For
example, destination time packet DT.sub.1 is generated for segments
V.sub.1, A.sub.2, and M.sub.0; destination time packet DT.sub.2 is
generated for segments V.sub.2, A.sub.3, and M.sub.1; and destination
time packet DT.sub.n is generated for segments V.sub.n, A.sub.n+1,
and M.sub.n-1.
[0084] Each source time packet ST.sub.1, ST.sub.2, . . . ST.sub.n
includes signature data that corresponds to characteristic features
of media streams 110v, 110a, 110m. The characteristic features
correspond to the respective segments of media streams 110v, 110a,
110m. For example, source time packet ST.sub.1 includes signature
data corresponding to a characteristic feature of video segment
V.sub.1, audio segment A.sub.1, and metadata segment M.sub.1.
Similarly, source time packet ST.sub.2 includes signature data
corresponding to V.sub.2, A.sub.2, and M.sub.2, and source time
packet ST.sub.n includes signature data corresponding to V.sub.n,
A.sub.n, and M.sub.n.
[0085] Each destination time packet DT.sub.1, DT.sub.2, . . .
DT.sub.n also includes signature data that corresponds to
characteristic features of media streams 110v, 110a, 110m. The
characteristic features correspond to the respective segments of
media streams 110v, 110a, 110m. For example, destination time
packet DT.sub.1 includes signature data corresponding to
characteristic feature of video segment V.sub.1, audio segment
A.sub.2, and metadata segment M.sub.0. Similarly, destination time
packet DT.sub.2 includes signature data corresponding to V.sub.2,
A.sub.3, and M.sub.1, and destination time packet DT.sub.n includes
signature data corresponding to media segments V.sub.n, A.sub.n+1,
and M.sub.n-1.
[0086] The signature data of source time packets 112 and
destination time packets 118 can be compared to determine a
relative synchronization error of media streams 110v, 110a, 110m.
For example, the signature data of source time packet ST.sub.1
indicates that segment V.sub.1 should be aligned temporally with
segments A.sub.1 and M.sub.1. However, the signature data of
destination time packet DT.sub.1 indicates that segment V.sub.1 is
aligned temporally with segments A.sub.2 and M.sub.0. Accordingly,
a difference in relative delays between video stream 110v and audio
stream 110a can be determined based on A.sub.1 and A.sub.2.
Similarly, a relative synchronization error between video stream
110v and metadata stream 110m can be determined based on M.sub.0
and M.sub.1. Based on the relative synchronization error, media
streams 110v, 110a, and 110m can be realigned or resynchronized, so
that V.sub.1 is synchronized with A.sub.1 and M.sub.1, V.sub.2 is
synchronized with A.sub.2 and M.sub.2, and V.sub.n is synchronized
with A.sub.n and M.sub.n.
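The signature comparison of this paragraph can be sketched with toy signature values standing in for the per-segment characteristic features; exact matches and the stream layout of FIG. 3B are assumptions for illustration:

```python
def stream_offset(source_sigs, dest_sigs):
    """Offset (in segments) of a stream at the destination relative to
    the source, found by locating the destination's first segment
    signature in the source's signature sequence (assumes exact
    signature matches)."""
    return source_sigs.index(dest_sigs[0])

# FIG. 3B scenario, with toy signature values standing in for the
# per-segment characteristic features:
video_src, video_dst = ["v1", "v2", "v3"], ["v1", "v2", "v3"]
audio_src, audio_dst = ["a1", "a2", "a3"], ["a2", "a3", "a4"]

video_off = stream_offset(video_src, video_dst)
audio_off = stream_offset(audio_src, audio_dst)
relative_error = audio_off - video_off  # audio runs one segment ahead
```

Realignment then amounts to delaying the stream that runs ahead by `relative_error` segments.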
[0087] In some cases, the signature data of source time packets 112
and destination time packets 118 may be compared using
cross-correlation. For example, in some cases, the signature data
of a source time packet may not be identical with the signature
data of a destination time packet. This may be the case when
intermediate processing is performed on the media streams 110v,
110a, 110m. In such cases, the signature data of source time
packets 112 and destination time packets 118 may be
cross-correlated to determine relative synchronization errors.
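Where exact matches fail, the cross-correlation described here might look like the following sketch, which returns the lag maximizing a plain dot-product score over numeric signature sequences; the scoring choice is an assumption:

```python
def best_lag(source_sigs, dest_sigs, max_lag):
    """Return the lag (in segments) that maximizes a dot-product
    cross-correlation between the source and destination signature
    sequences; useful when intermediate processing perturbs the
    signatures so exact matching fails."""
    def corr(lag):
        return sum(
            source_sigs[i + lag] * d
            for i, d in enumerate(dest_sigs)
            if 0 <= i + lag < len(source_sigs)
        )
    return max(range(-max_lag, max_lag + 1), key=corr)
```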
[0088] Referring now to FIG. 4, shown therein is a block diagram of
a processor 402 for determining delay of a plurality of media
streams, in accordance with at least one embodiment. For example,
processor 402 may be source processor 102 or destination processor
104 of system 100 or system 200. Processor 402 includes signature
data generator 404, time data generator 406, and packet generator
408.
[0089] Processor 402 can be implemented using any suitable processors, controllers,
digital signal processors, graphics processing units, application
specific integrated circuits (ASICs), and/or field programmable
gate arrays (FPGAs) that can provide sufficient processing power
depending on the configuration, purposes and requirements of the
system. In some embodiments, processor 402 can include more than
one processor with each processor being configured to perform
different dedicated tasks.
[0090] Signature data generator 404 can receive media streams 110.
In some embodiments, processor 402 receives source signals (not
shown) and generates media streams 110 that are received by
signature data generator 404. Signature data generator 404 can generate
signature data based on media streams 110. As discussed above,
signature data corresponds to characteristic features of each of
media streams 110.
[0091] Time data generator 406 can generate time data. The time
data corresponds to a time when a packet of packets 410 is
generated. In some embodiments, packets 410 are generated
approximately contemporaneous with the transmission of packets 410
or media streams 110. In some cases, the time data can also include
a clock signal.
[0092] Packet generator 408 can generate packets 410 that include
the signature data and time data. For example, packet generator 408
can generate source time packets 112 and destination time packets
118 of systems 100 and 200. It will be appreciated that packets 410
can be any data structure or collection of the various data items,
regardless of whether they are assembled or transmitted in any
particular structure. That is, packets 410 may, in some
embodiments, never be assembled as a packet or transmitted.
[0093] Processor 402 can then transmit the generated packets 410 as
a packet stream (not shown). For example, processor 402 can transmit
the source packet stream 116 or the destination packet stream 122
of FIGS. 1 and 2. Processor 402 can also transmit the media streams
110.
[0094] Referring now to FIG. 5, shown therein is a block diagram of
a packet 502, in accordance with at least one embodiment. Packet
502 includes time data 504 and signature data 506. For example,
packet 502 may be a source time packet 114 or a destination time
packet 120 of system 100 or system 200. It will be appreciated that
packet 502 can be any data structure or collection of the various data
items, regardless of whether they are assembled or transmitted in
any particular structure. That is, packet 502 may, in some
embodiments, never be assembled as a packet or transmitted.
[0095] Time data 504 includes time stamp data 508 and clock signal
data 510. Time stamp data 508 can include data indicating a time
when packet 502 was generated. In some cases, packet 502 is
generated approximately contemporaneous with its transmission.
Clock signal data 510 can include data required by certain video
transmission standards, such as a 90 kHz clock.
[0096] Signature data 506 includes video signature data 512, audio
signature data 514, and metadata signature data 516. Signature data
can include characteristic features of particular segments of one
or more media streams.
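One possible in-memory layout for packet 502, with the FIG. 5 fields as attributes; the field names and types are illustrative assumptions, since the disclosure notes the packet need never be assembled in any particular structure:

```python
from dataclasses import dataclass

@dataclass
class TimePacket:
    """Sketch of packet 502 of FIG. 5; names and types assumed."""
    timestamp: float         # time stamp data 508: when packet was generated
    clock_90khz: int         # clock signal data 510: e.g. a 90 kHz video clock
    video_signature: float   # video signature data 512
    audio_signature: float   # audio signature data 514
    metadata_signature: str  # metadata signature data 516
```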
[0097] Referring now to FIG. 6, shown therein is a flowchart of a
method 600 for determining delay of a plurality of media streams,
in accordance with at least one embodiment. For example, method 600
can be implemented using source processor 102, destination
processor 104, and network 108 of system 100. Method 600 begins
with generating, at a source processor, a series of source time
packets at 602. For example, source processor 102 can generate a
series of source time packets 112. Each source time packet includes
source time data and source signature data. The source time data
corresponds to a first time when the source time packet is
generated. The source signature data corresponds to characteristic
features of each of the plurality of media streams.
[0098] At 604, the series of source time packets is transmitted, at
the source processor, as a source packet stream through a network.
For example, the series of source time packets 112 can be
transmitted as source packet stream 116 by source processor 102
through network 108.
[0099] At 606, a series of destination time packets is generated at
a destination processor. For example, destination processor 104 can
generate a series of destination time packets 118. Each destination
time packet includes destination time data and destination
signature data. The destination time data corresponds to a second
time when the destination time packet is generated. The destination
signature data corresponds to characteristic features of each of
the plurality of media streams.
[0100] At 608, the source packet stream is received, at the
destination processor, through the network. For example,
destination processor 104 can receive source packet stream 116.
[0101] At 610, a transmission time for the source packet stream is
determined, at the destination processor, based on the source time
data and the destination time data. For example, destination
processor 104 can determine a transmission time for source packet
stream 116 based on source time data and destination time data.
[0102] At 612, a relative synchronization error is determined, at
the destination processor, based on the source signature data and
the destination signature data. For example, destination processor
104 can determine a relative synchronization error based on the source
signature data and the destination signature data.
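Steps 602 through 612 can be condensed into a toy end-to-end sketch, assuming exact signature matches, in-order delivery, and a simple tuple representation of each generation event; all names here are assumptions:

```python
def method_600(source_events, destination_events):
    """Steps 602-612 in miniature. Each event is (ptp_time, signatures),
    where signatures maps stream name -> per-segment signature value.
    Returns (per-packet transmission times, relative sync error in
    segments between 'audio' and 'video')."""
    src = [{"time": t, "sig": s} for t, s in source_events]       # 602/604
    dst = [{"time": t, "sig": s} for t, s in destination_events]  # 606/608
    transit = [d["time"] - s["time"] for s, d in zip(src, dst)]   # 610
    # 612: offset of each stream, from where the destination's first
    # signature appears in the source's signature sequence
    def offset(stream):
        src_seq = [p["sig"][stream] for p in src]
        return src_seq.index(dst[0]["sig"][stream])
    return transit, offset("audio") - offset("video")
```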
[0103] Referring now to FIG. 7, shown therein is a flowchart of a
method 700 for determining delay of a plurality of media streams,
in accordance with at least one embodiment. For example, method 700
can be implemented using source processor 102, destination
processor 104, analysis processor 106, and network 108 of system
200. Method 700 begins with generating, at a source processor, a
series of source time packets at 702. For example, source processor
102 can generate a series of source time packets 112. Each source
time packet includes source time data and source signature data.
The source time data corresponds to a first time when the source
time packet is generated. The source signature data corresponds to
characteristic features of each of the plurality of media
streams.
[0104] At 704, the series of source time packets is transmitted, at
the source processor, as a source packet stream through a network.
For example, source processor 102 can transmit source time packets
112 as source packet stream 116.
[0105] At 706, a series of destination time packets is generated at
a destination processor. For example, destination processor 104 can
generate a series of destination time packets 118. Each destination
time packet includes destination time data and destination
signature data. The destination time data corresponds to a second
time when the destination time packet is generated. The destination
signature data corresponds to characteristic features of each of
the plurality of media streams.
[0106] At 708, the series of destination time packets is
transmitted, at the destination processor, as a destination packet
stream through the network. For example, destination processor 104
can transmit destination time packets 118 as destination packet
stream 122.
[0107] At 710, the analysis processor receives the source packet
stream and the destination packet stream. For example, analysis
processor 106 can receive source packet stream 116 and destination
packet stream 122.
[0108] At 712, the analysis processor determines a transmission
time for at least one of the source packet stream and the
destination packet stream based on at least one of the source time
data and the destination time data. For example, analysis processor
106 can determine the transmission time for at least one of source
packet stream 116 and destination packet stream 122.
[0109] At 714, the analysis processor determines a relative
synchronization error based on the source signature data and the
destination signature data. For example, analysis processor 106 can
determine a relative synchronization error.
[0110] The present invention has been described here by way of
example only. Various modifications and variations may be made to
these exemplary embodiments without departing from the spirit and
scope of the invention, which is limited only by the appended
claims.
* * * * *