U.S. patent application number 16/812185, for recovery from packet loss during transmission of compressed video streams, was filed with the patent office on 2020-03-06 and published on 2020-07-02.
The applicant listed for this patent is Sony Interactive Entertainment America LLC. Invention is credited to Roger van der Laan.
Application Number | 16/812185 |
Publication Number | 20200213625 |
Family ID | 51526949 |
Publication Date | 2020-07-02 |
United States Patent Application | 20200213625 |
Kind Code | A1 |
Inventor | van der Laan; Roger |
Publication Date | July 2, 2020 |
Recovery From Packet Loss During Transmission Of Compressed Video Streams
Abstract
A method for recovering errors in video delivered to a client
device over a network is disclosed. The method includes generating,
by a server, a sequence of video frames. The method includes
slicing, by the server, each video frame into a plurality of
slices. The method includes encoding, by the server, each of the
plurality of slices of each video frame. The encoding is configured
to produce a compressed video stream comprising an initial I-frame
followed by a plurality of P-frames, and the initial I-frame and each
of the P-frames are each defined by a respective plurality of slices.
The method includes transmitting, by the server, the compressed
video stream over the network to the client device. The method
includes receiving, by the server, a notification of data loss
detected in a slice position of a frame received at the client
device. The notification is received from the client device. The
method includes encoding, by the server, responsive to receiving
the notification at the server, a next frame with the slice
position of the next frame encoded as an I-slice and all remaining
slice positions in the next frame being encoded as P-slices.
Inventors | van der Laan; Roger (Redwood City, CA) |
Applicant | Sony Interactive Entertainment America LLC, San Mateo, CA, US |
Family ID | 51526949 |
Appl. No. | 16/812185 |
Filed | March 6, 2020 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number | Child Application |
15620808 | Jun 12, 2017 | | 16812185 |
13837541 | Mar 15, 2013 | 9681155 | 15620808 |
Current U.S. Class | 1/1 |
Current CPC Class | H04N 19/166 20141101; H04N 19/895 20141101; H04N 19/174 20141101; H04N 19/105 20141101; H04N 19/65 20141101 |
International Class | H04N 19/895 20060101 H04N019/895; H04N 19/105 20060101 H04N019/105; H04N 19/166 20060101 H04N019/166; H04N 19/174 20060101 H04N019/174; H04N 19/65 20060101 H04N019/65 |
Claims
1. A method for recovering errors in video delivered to a client
device over a network, comprising: generating, by a server, a
sequence of video frames; slicing, by the server, each video frame
into a plurality of slices; encoding, by the server, each of the
plurality of slices of each video frame, wherein the encoding is
configured to produce a compressed video stream comprising an
initial I-frame followed by a plurality of P-frames, the initial
I-frame and each of the P-frames defined by respective said
plurality of slices; transmitting, by the server, the compressed
video stream over the network to the client device; receiving, by
the server, a notification of data loss detected in a slice
position of a frame received at the client device, the notification
being received from the client device; and encoding, by the server,
responsive to receiving the notification at the server, a next
frame with the slice position of the next frame being encoded as an
I-slice and all remaining slice positions in the next frame being
encoded as P-slices.
2. The method of claim 1, wherein the compressed video stream is
transmitted as a stream of data packets.
3. The method of claim 1, wherein a single data packet comprises
data for no more than a single slice to enable the single slice to
be sent for said data loss.
4. The method of claim 1, wherein the notification is sent via a
feedback loop from the client device to the server, for use by an
encoder of the server.
5. The method of claim 1, further comprising encoding a subsequent
frame immediately after the next frame in the sequence as a
P-frame.
6. The method of claim 1, wherein the initial I-frame contains only
I-slices.
7. The method of claim 1, wherein each of the P-frames contains only
P-slices.
8. The method of claim 1, wherein the client device is configured
to decode each slice received as long as no data loss is
detected.
9. The method of claim 1, wherein, by encoding the next frame to
include the I-slice and all remaining slice positions to include
the P-slices, retransmission avoids sending a plurality of I-slices
for slices that were not associated with data loss.
10. Computer readable media having program instructions for
processing a method for recovering errors in video delivered to a
client device over a network, comprising: program instructions for
generating, by a server, a sequence of video frames; program
instructions for slicing, by the server, each video frame into a
plurality of slices; program instructions for encoding, by the
server, each of the plurality of slices of each video frame,
wherein the encoding is configured to produce a compressed video
stream comprising an initial I-frame followed by a plurality of
P-frames, the initial I-frame and each of the P-frames defined by
respective said plurality of slices; program instructions for
transmitting, by the server, the compressed video stream over the
network to the client device; program instructions for receiving,
by the server, a notification of data loss detected in a slice
position of a frame received at the client device, the notification
being received from the client device; and program instructions for
encoding, by the server, responsive to receiving the notification
at the server, a next frame with the slice position of the next
frame being encoded as an I-slice and all remaining slice positions
in the next frame being encoded as P-slices.
11. The computer readable media of claim 10, wherein a single data
packet comprises data for no more than a single slice to enable
the single slice to be sent for said data loss.
12. The computer readable media of claim 10, wherein the
notification is sent via a feedback loop from the client device to
the server, for use by an encoder of the server.
13. The computer readable media of claim 10, further comprising
encoding a subsequent frame immediately after the next frame in the
sequence as a P-frame.
14. The computer readable media of claim 10, wherein the initial
I-frame contains only I-slices.
15. The computer readable media of claim 10, wherein each of the
P-frames contains only P-slices.
16. The computer readable media of claim 10, wherein the client
device is configured to decode each slice received as long as no
data loss is detected.
17. The computer readable media of claim 10, wherein, by encoding
the next frame to include the I-slice and all remaining slice
positions to include the P-slices, retransmission avoids sending a
plurality of I-slices for slices that were not associated with data
loss.
Description
CLAIM OF PRIORITY
[0001] This application is a Divisional of U.S. application Ser.
No. 15/620,808, filed on Jun. 12, 2017, entitled "Recovery From
Packet Loss During Transmission Of Compressed Video Streams", which
is a further Divisional of U.S. application Ser. No. 13/837,541
filed Mar. 15, 2013 (U.S. Pat. No. 9,681,155, issued on Jun. 13,
2017), entitled "Recovery From Packet Loss During Transmission Of
Compressed Video Streams", which are herein incorporated by
reference.
TECHNICAL FIELD
[0002] The present disclosure relates generally to transmission of
compressed video over computer networks; more specifically, to
methods and apparatus for mitigating the effects of packet loss,
which occurs when one or more packets of digital data travelling
across a computer network fail to reach their destination
intact.
BACKGROUND
[0003] Remote hosting of online, fast-action, interactive video
games and other high-end video applications typically requires very
low latencies. For example, for twitch video games and
applications, low round-trip latency, as measured from the time a
user's control input is sent to the hosting service center to the
time that the newly generated video content appears on the screen
of the user's client device, is typically required. At higher
latencies, performance suffers noticeably. Achieving such low
latencies over the Internet or other similar networks requires the
video compressor at the hosting service to generate a packet stream
with particular characteristics such that the packet sequence
flowing through the entire path from the hosting service to the
client device is not subject to delays or excessive packet loss. In
addition, the video compressor must create a packet stream which is
sufficiently robust so that it can tolerate the inevitable packet
loss and packet reordering that occurs in normal Internet and
network transmissions.
[0004] In streaming video technologies, lost or dropped packets can
result in highly noticeable performance issues, potentially causing
the screen to completely freeze for a period of time or show other
screen-wide visual artifacts (e.g., jitter). If a lost/delayed
packet causes the loss of a key frame (i.e., I-frame), then the
decompressor on the client device will lack a reference for all of
the P-frames that follow until a new I-frame is received.
Similarly, if a P-frame is lost, that will impact the P-frames that
follow. Depending on how long it takes for the next I-frame to arrive,
this can have a significant visual impact. (As is well-known,
I-frames are the only type of frame that is not coded with
reference to any other frame. P-frames are coded predictively from
a previous I-frame or P-frame; B-frames are coded predictively
from I-frames and P-frames. In order to be properly decoded, a
B-frame associated with a group of pictures ("GOP") may need to
reference the I-frame of a next GOP. In the context of the present
disclosure, the term "I-frame" is intended to broadly refer to an
Intra-frame and its equivalents, e.g., an IDR frame in the case of
H.264.)
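The propagation effect described above can be illustrated with a small sketch. It assumes an I/P-only stream (no B-frames) in which every P-frame references only the frame immediately before it; the function name `displayable_frames` is hypothetical. A frame decodes correctly only if it is an I-frame or if its reference chain back to the last I-frame is unbroken.

```python
from typing import List, Sequence, Set


def displayable_frames(frame_types: Sequence[str], lost: Set[int]) -> List[bool]:
    """Return, for each frame, whether the decoder can reconstruct it correctly.

    frame_types: "I" or "P" per frame (hypothetical I/P-only stream in which
        each P-frame references only the immediately preceding frame).
    lost: indices of frames that did not arrive intact.
    """
    ok = []
    reference_valid = False  # nothing to predict from before the first I-frame
    for index, kind in enumerate(frame_types):
        if index in lost:
            reference_valid = False      # the prediction chain breaks here
            ok.append(False)
        elif kind == "I":
            reference_valid = True       # intra-coded: needs no reference
            ok.append(True)
        else:
            ok.append(reference_valid)   # a P-frame is only as good as its reference
    return ok


gop = ["I", "P", "P", "P", "P", "I", "P", "P"]
print(displayable_frames(gop, lost={2}))
# [True, True, False, False, False, True, True, True]
```

With a single lost P-frame at index 2, frames 2 through 4 are unusable and the picture only recovers at the next I-frame (index 5).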
[0005] A variety of mechanisms have been developed for handling
packet loss. For instance, when packet loss occurs in network
transport protocols such as Transmission Control Protocol (TCP),
any segments that have not been acknowledged are simply resent. But
the problem with such approaches is that they are often unfeasible
or impractical in streaming video technologies where it is
essential to maintain high data rates and throughput.
BRIEF DESCRIPTION OF THE DRAWINGS
[0006] Non-limiting and non-exhaustive embodiments of the present
invention are described with reference to the following figures,
wherein like reference numerals refer to like parts throughout the
various views unless otherwise specified.
[0007] FIG. 1 is an example network diagram illustrating one
embodiment for effectively dealing with packet loss.
[0008] FIG. 2 is a flow diagram illustrating an example method for
dealing with packet loss.
[0009] FIG. 3 is another example network diagram illustrating an
embodiment for handling packet loss.
[0010] FIG. 4 is yet another example network diagram illustrating
an embodiment for handling packet loss.
[0011] FIG. 5 is a flow diagram illustrating another example
process flow for handling packet loss.
[0012] FIG. 6 is still another example network diagram illustrating
an embodiment for handling packet loss.
[0013] FIG. 7 is a flow diagram illustrating an example process
flow for dealing with packet loss.
DESCRIPTION OF EXAMPLE EMBODIMENTS
[0014] In the following description, numerous specific details are
set forth in order to provide a thorough understanding of the
embodiments described. It will be apparent, however, to one having
ordinary skill in the art that the specific details may not be
needed to practice the embodiments described. In other instances,
well-known apparatus or methods have not been described in detail
in order to avoid obscuring the embodiments disclosed.
[0015] Reference throughout this specification to "one embodiment",
"an embodiment", "one example" or "an example" means that a
particular feature, structure or characteristic described in
connection with the embodiment or example is included in at least
one embodiment of the present invention. Thus, appearances of the
phrases "in one embodiment", "in an embodiment", "one example" or
"an example" in various places throughout this specification are
not necessarily all referring to the same embodiment or example.
Furthermore, the particular features, structures or characteristics
may be combined in any suitable combination and/or
sub-combinations in one or more embodiments or examples. In
addition, it is appreciated that the figures provided herewith are
for explanation purposes to persons ordinarily skilled in the
art.
[0016] In the context of the present disclosure, a video "encoder"
broadly refers to a device, circuit or algorithm (embodied in
hardware or software) that compresses (i.e., encodes) video data
using fewer bits/bytes to reduce the size of the original video
data. Data compression is also frequently referred to as source
coding, i.e., coding of data performed at the source before it is
either transmitted or stored. Conversely, a video "decoder" or
decompressor is a device, circuit or algorithm which performs the
reverse operation of an encoder, undoing the encoding to retrieve
the original (decompressed) video data.
[0017] The term "server" broadly refers to any combination of
hardware or software embodied in a computer (i.e., a processor)
designed to provide services to client devices or processes. A
server therefore can refer to one or more computer processors that
run a server operating system from computer-executable code stored
in a memory, and which is provided to the user as a virtualized or
non-virtualized server; it can also refer to any software or
dedicated hardware capable of providing computing services.
[0018] A "client device" refers to a computer device such as a PC,
desktop computer, tablet, mobile, handheld, set-top box, or any
other general purpose computer (e.g., Microsoft Windows- or
Linux-based PCs or Apple, Inc. Macintosh computers) having a wired
or wireless connection to a public network such as the Internet,
and which further includes the ability to decompress/decode
compressed packet data received over a network connection. The
client device may include either an internal or external display
device for displaying one of the many digital images (compressed or
uncompressed) which comprise a movie or video (i.e., a live or
moving picture).
[0019] A video "frame" refers to one of the many digital images
(compressed or uncompressed) which comprise a movie or video (i.e.,
a live or moving picture). When video is displayed, each frame of
the moving picture is flashed on a screen for a short time
(nowadays, usually 1/24, 1/25, 1/30 or 1/60 of a second) and then
immediately replaced by the next one. The human attribute of
persistence of vision blends the frames together, such that a viewer
perceives a live, or real-time, moving picture. A frame can be
divided up into regions of an image, which are commonly referred to
as "tiles" or "slices." For instance, in the H.264/AVC standard a
frame can be composed of a single slice or multiple slices.
[0020] In the context of the present disclosure, the term "packet
loss" refers broadly to the failure of one or more packets
travelling across a computer network to reach their destination,
or to the arrival of one or more packets at their destination
with errors.
[0021] FIG. 1 is an example network diagram illustrating one
embodiment for effectively dealing with packet loss. As shown, a
plurality of video frames 11 is compressed (coded) by an encoder 12
to produce a primary stream of compressed video data as well as a
subset stream containing key minimal data. What counts as key data
differs among embodiments and depends on the specific application. The
additional stream may be transmitted at considerably lower bitrates
than the normal stream. The subset stream, or sub-stream, is shown
by arrow 13. After encoding, the primary and subset streams are
packetized by packetizing devices 14a and 14b, respectively, before
being transmitted, substantially simultaneously, to the client over
network 16. Thus, for every slice, the hosting service not only
sends the normal video stream, but also a subset of that stream in
separate network packets.
[0022] In one embodiment, the subset stream contains only motion
vectors. In another embodiment, subset stream contains motion
vectors and residuals. Motion vectors represent the spatial
displacement or "delta" between two successive image areas (e.g.,
frame-to-frame).
[0023] In the embodiment shown, information that leads to one
possible encoding (i.e., motion vectors/residuals) is separated
from the actual encoding, and then sent downstream so that the
decoder can make the best possible use of it in the event that
packet loss occurs. In other words, the decoder may utilize the motion
vectors to construct the frame as accurately as possible. Practitioners
in the art will appreciate that the additional sub-stream may be
transmitted at considerably lower bit rates than the normal or
primary stream. This is represented in FIG. 1 by the smaller-sized
sub-stream packets 15b shown being transmitted over network 16, as
compared with the primary packet stream 15a.
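A minimal sketch of how a decoder might use the motion-vector sub-stream to approximate a lost frame follows. It assumes a grayscale frame stored as a 2-D list, fixed square blocks of 4×4 pixels, one integer motion vector per block, and clamping at the frame edges; none of these details come from the disclosure, and the function name `conceal_with_motion_vectors` is hypothetical. The optional `residuals` argument corresponds to the embodiment in which the sub-stream carries residuals as well as motion vectors.

```python
from typing import Dict, List, Optional, Tuple

Block = Tuple[int, int]  # (block_row, block_col)


def conceal_with_motion_vectors(
    prev_frame: List[List[int]],
    motion_vectors: Dict[Block, Tuple[int, int]],
    block: int = 4,
    residuals: Optional[Dict[Block, List[List[int]]]] = None,
) -> List[List[int]]:
    """Predict a lost frame from the last good frame plus sub-stream data.

    motion_vectors maps a block to (dy, dx): the offset from each pixel in
    that block to its source pixel in prev_frame. Blocks without a vector
    are copied from the co-located position. residuals, if given, holds a
    block-sized correction for selected blocks.
    """
    height, width = len(prev_frame), len(prev_frame[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            key = (y // block, x // block)
            dy, dx = motion_vectors.get(key, (0, 0))
            # Motion-compensated fetch, clamped to stay inside the frame.
            sy = min(max(y + dy, 0), height - 1)
            sx = min(max(x + dx, 0), width - 1)
            value = prev_frame[sy][sx]
            if residuals is not None and key in residuals:
                value += residuals[key][y % block][x % block]
            out[y][x] = value
    return out


previous = [list(range(8)) for _ in range(4)]   # 4 rows x 8 columns
predicted = conceal_with_motion_vectors(previous, {(0, 1): (0, -1)})
print(predicted[0])  # [0, 1, 2, 3, 3, 4, 5, 6]
```

The right-hand block is shifted by one pixel while the left-hand block, which has no vector, is copied unchanged; with vectors alone the result is a prediction, and the residual variant lets the decoder correct it.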
[0024] On the client side, both the primary stream and the
sub-stream are received at switching device 17, which may comprise
a router, switch, or other network switching device that may be
used to select between the primary and sub-stream. Switching device
17 normally selects the primary stream, which is then fed into
packet reconstruction device 18. The reconstructed packets are then
decoded by the decoder 19 to produce the reconstructed video frames
20 for display on the client device. In the event that packet loss
is detected, switching device 17 switches from the primary stream
to the sub-stream. The sub-stream information is used by decoder 19
to make a prediction of, or otherwise reconstruct, the desired
frame. Afterwards, switching device 17 switches back to the normal
or primary packet stream.
[0025] FIG. 2 is a flow diagram illustrating an example method for
dealing with packet loss in correspondence with the network diagram
of FIG. 1. The process may begin at block 24 with the arrival of a
network packet, followed by the client-side device detecting
whether the primary packet stream is corrupt. If it is not corrupt,
the primary network packet stream continues to be selected (block
24) for subsequent decoding and rendering of reconstructed video
frames on the client display device. On the other hand, if the
primary stream is corrupted by a dropped or lost packet then the
subset network packet stream is selected (block 22). The subset
data are decoded and used to reconstruct the desired frame. This
process may continue as long as the primary stream remains corrupt.
At decision block 23, once the primary stream has recovered and is
no longer corrupted, the primary network packet stream is
once again selected and the normal packets are decoded, as
described above.
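The client-side selection of FIGS. 1-2 might be sketched as below. This is an illustration under assumptions, not the disclosed implementation: each frame is carried by one primary packet and one sub-stream packet, corruption is detected with a CRC-32 carried alongside the payload (the disclosure does not say how corruption is detected), and the names `Packet`, `make_packet`, `intact` and `select_streams` are hypothetical.

```python
import zlib
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Packet:
    seq: int        # hypothetical sequence number (one packet per frame here)
    payload: bytes
    crc: int        # hypothetical CRC-32 of the payload, computed by the sender


def make_packet(seq: int, payload: bytes) -> Packet:
    return Packet(seq, payload, zlib.crc32(payload))


def intact(packet: Optional[Packet]) -> bool:
    """A packet is usable only if it arrived and its checksum still matches."""
    return packet is not None and zlib.crc32(packet.payload) == packet.crc


def select_streams(primary: List[Optional[Packet]],
                   subset: List[Optional[Packet]]) -> List[Tuple[str, bytes]]:
    """Pick, frame by frame, which stream's data goes to the decoder."""
    chosen = []
    for p, s in zip(primary, subset):
        if intact(p):
            chosen.append(("primary", p.payload))   # normal path
        elif intact(s):
            chosen.append(("subset", s.payload))    # primary corrupt: use sub-stream
        else:
            chosen.append(("none", b""))            # both lost (not covered by FIG. 2)
    return chosen


primary = [make_packet(0, b"P0"), None, Packet(2, b"X2", zlib.crc32(b"P2"))]
subset = [make_packet(0, b"s0"), make_packet(1, b"s1"), make_packet(2, b"s2")]
print([source for source, _ in select_streams(primary, subset)])
# ['primary', 'subset', 'subset']
```

The second primary packet is missing and the third arrives with a payload that no longer matches its checksum, so the sub-stream is used for both; because selection is made per frame, the client returns to the primary stream as soon as an intact primary packet arrives.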
[0026] FIG. 3 is another example network diagram illustrating an
embodiment for handling packet loss that is similar to that shown
in FIG. 1. In this example, video frames 31 are encoded as desired
by encoder 32a to produce a primary or normal video stream 33a. In
addition, a separate encoding generates a less ideal stream 33b
with each frame based on the previous frame of the primary stream
but with the quality scaled such that a slice or frame neatly fits
into a predetermined set of network packets, which could be a
single packet. The normal video stream is packetized by packetizer
34a and transmitted over network 35. The secondary stream is also
packetized for transmission over network 35. It is appreciated that
the bandwidth of the secondary stream may be much lower than
that of the normal video stream.
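[0026] states only that the secondary stream's quality is scaled so that a slice or frame fits into a predetermined set of packets. One plausible way to do that, shown here as a sketch, is a binary search for the finest quantizer whose output still fits the byte budget. The rate model `toy_size`, the 1200-byte payload, and the 0-51 quantizer range (borrowed from H.264's QP scale) are all assumptions, and `pick_quantizer` is a hypothetical name; the search relies on encoded size never increasing as the quantizer grows.

```python
from typing import Callable, Optional


def pick_quantizer(encoded_size: Callable[[int], int], budget_bytes: int,
                   q_min: int = 0, q_max: int = 51) -> Optional[int]:
    """Return the smallest quantizer (finest quality) whose output fits.

    encoded_size: hypothetical callable giving the secondary encoder's output
        size in bytes at quantizer q; assumed non-increasing in q.
    budget_bytes: packets allowed per slice/frame times payload bytes per packet.
    Returns None when even the coarsest quantizer does not fit.
    """
    if encoded_size(q_max) > budget_bytes:
        return None
    lo, hi = q_min, q_max            # invariant: quantizer hi always fits
    while lo < hi:
        mid = (lo + hi) // 2
        if encoded_size(mid) <= budget_bytes:
            hi = mid                 # fits: try a finer quantizer
        else:
            lo = mid + 1             # too big: must quantize more coarsely
    return lo


def toy_size(q: int) -> int:
    # Hypothetical rate model: output size halves every 6 quantizer steps.
    return 20000 >> (q // 6)


PACKETS, PAYLOAD_BYTES = 1, 1200     # assumed budget: one 1200-byte payload
print(pick_quantizer(toy_size, PACKETS * PAYLOAD_BYTES))  # 30
```

In a real encoder the size at a given quantizer is only known after encoding, so each probe costs an encode pass; the logarithmic search keeps the number of probes small.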
[0027] At the client-side device, the encoded primary stream frames
are reconstructed by a device 36 to replicate video stream 33a. A
switching device 37 is utilized to select between the normal video
stream and the secondary video stream. Whichever stream is
selected, the received packets are then decoded by a decoder 39 to
generate reproduced video frames 40 on the display of the client
device. As in the previous embodiment, if a packet loss is detected
on the client-side device, the secondary stream is selected by
switching device 37. When the primary stream transmission recovers,
the video stream switching device 37 switches back to the primary or
normal video stream.
[0028] It is appreciated by practitioners in the art that in order
to maintain synchronicity between the client-side decoder and the
server-side decoder (not shown), the client device may notify the
server when it switches over to the secondary video stream.
When lost or corrupted data packets have been detected and
switching device 37 has selected the secondary stream, a processor
associated with the client device utilizes the secondary video
stream to construct a lower-quality, yet accurate, representation
of the video frames.
[0029] FIG. 4 is yet another example network diagram illustrating
an embodiment for handling packet loss. In the embodiment shown,
the server-side encoder 41 stores a copy of the encoded bits per
slice/frame in an associated memory 42 (e.g., RAM, disk, etc.).
These encoded frames may then be retrieved and individually decoded
by decoder 43 for reconstruction of the client-side state. In this
manner, encoder 41 can continue to feed encoded slices/frames to
the client-side decoder 45, which is coupled to its own associated
memory 46, even after packet loss. When packet loss does occur,
decoder 45 (or a processor coupled with decoder 45) sends a
notification to encoder 41 (or a processor controlling encoder 41)
via feedback channel or loop 47. In response to the notification of
packet loss, encoder 41 can utilize decoder 43 and the stored
encoded bits/frames to determine exactly what happened at decoder
45 on the client side. In this way, the server can keep itself
aware of client state even when the client has received erroneous
transmissions.
[0030] In the example shown in FIG. 4, encoder 41 has encoded
frames F1-F7 and sent them over a transmission network to
the client device. By way of example, the first frame, F1, may
be an I-frame, followed by a sequence of P-frames that are
calculated predictively based on a difference or delta (Δ) from the
previous I-frame or P-frame. Thus, the second frame, F2, is
decoded as a delta from F1, the third frame, F3, is
decoded as a delta from F2, and so on. As shown, the third
frame, F3, is lost or corrupted. When this is detected at
decoder 45, a notification is sent to encoder 41 via feedback loop
47, notifying encoder 41 that the last good frame received was
F2. In response to the notification, server-side encoder 41
generates the eighth frame, F8, predictively as a delta from
the last good frame, F2. To do this while maintaining state
synchronicity between the server and client sides, encoder 41
constructs F8 from F2, taking into account all of the
client-side errors resulting from the loss of F3 and the
subsequent frames. Utilizing decoder 43 and the stored encoded
bits in memory 42, encoder 41 determines exactly what each of the
frames subsequent to the lost frame (e.g., F4-F8) would look like,
taking into consideration the client-side errors that have
occurred.
[0031] FIG. 5 is a flow diagram illustrating an example process
flow wherein the server-side encoder keeps encoding frames for
reconstruction of the client state following packet loss. With the
server-side encoder having transmitted an I-frame, the next frame
(e.g., P-frame) is encoded based on the previous client state.
(Block 51) At block 52, the server-side encoder sends the encoded
bits to the client device and also stores a copy of these same bits
in an associated memory. At decision block 53, the server-side
encoder queries whether a notification of packet loss has been
received from the client-side decoder. If not, the encoder
continues encoding frames based on the previous client state.
(Block 51)
[0032] On the other hand, if packet loss was detected and a
notification received by the encoder, the server calculates
the client state from the last known good state (before packet
loss) and the decode data that the client received correctly, plus
the decode data that followed the correctly received data. (Block
54) This latter decode data reflects the client errors resulting
from the packet loss. The process then continues at block 51, with
the encoder coding the next frame based on the previous client
state, with the previous client state now being that calculated
from block 54.
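The server-side mirroring of FIGS. 4-5 can be sketched with a toy model. Assumptions: a "frame" is a short list of integer pixel values, an I-frame carries the pixels and a P-frame carries per-pixel deltas, and the client conceals a lost frame by repeating its last picture (the disclosure does not specify the client's concealment rule, only that the server reproduces the client's state). The classes `ClientDecoderModel` and `MirroringEncoder` are hypothetical; the stored `sent` list stands in for memory 42 and the replay for decoder 43.

```python
from typing import List, Optional, Set, Tuple

Frame = List[int]
Encoded = Tuple[str, Frame]  # ("I", pixels) or ("P", per-pixel deltas)


class ClientDecoderModel:
    """Toy decoder shared by the client and the server-side mirror.

    Assumed concealment rule: a lost frame leaves the picture unchanged.
    """

    def __init__(self, width: int):
        self.picture: Frame = [0] * width

    def decode(self, kind: str, data: Frame, lost: bool = False) -> Frame:
        if not lost:
            if kind == "I":
                self.picture = list(data)
            else:
                self.picture = [p + d for p, d in zip(self.picture, data)]
        return list(self.picture)


class MirroringEncoder:
    """Keeps every transmitted frame (memory 42) and replays it through the
    same decoder model (decoder 43) to learn the client's actual state."""

    def __init__(self, width: int):
        self.width = width
        self.sent: List[Encoded] = []
        self.lost: Set[int] = set()
        self.client_state: Optional[Frame] = None

    def _replay(self) -> Frame:
        model = ClientDecoderModel(self.width)
        for index, (kind, data) in enumerate(self.sent):
            model.decode(kind, data, lost=index in self.lost)
        return model.picture

    def encode(self, frame: Frame) -> Encoded:
        if self.client_state is None:
            packet: Encoded = ("I", list(frame))
        else:
            # Predict from what the client is believed to display (block 51).
            packet = ("P", [f - c for f, c in zip(frame, self.client_state)])
        self.sent.append(packet)                  # store a copy (block 52)
        self.client_state = self._replay()
        return packet

    def on_loss_notification(self, index: int) -> None:
        # Recompute the client state, including its errors (block 54).
        self.lost.add(index)
        self.client_state = self._replay()


targets = [[10, 20], [11, 22], [12, 24], [13, 26], [14, 28]]
encoder, client = MirroringEncoder(width=2), ClientDecoderModel(width=2)
shown = []
for index, frame in enumerate(targets[:4]):
    kind, data = encoder.encode(frame)
    shown.append(client.decode(kind, data, lost=(index == 2)))  # frame 2 lost
encoder.on_loss_notification(2)      # feedback arrives after frame 3 was sent
kind, data = encoder.encode(targets[4])
shown.append(client.decode(kind, data))
print(shown)  # [[10, 20], [11, 22], [11, 22], [12, 24], [14, 28]]
```

Frame index 2 is lost, so the client applies the next delta to a stale picture and shows [12, 24] instead of [13, 26]. Once the notification arrives, the encoder replays its stored frames with frame 2 marked lost, learns the client's actual state, and encodes the next delta against it, so the client lands exactly on the target picture. Replaying from the first frame keeps the sketch short; per [0032], a practical encoder would start from the last known good state.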
[0033] FIG. 6 is still another example network diagram illustrating
a slice-based recovery technique for overcoming packet loss. This
embodiment may be used for video frames 61 that are divided into
two or more slices. For instance, in this example a frame slicer 62
is shown dividing a video frame into four slices, which are then
encoded by server-side encoder 63 and transmitted to the
client-side decoder 64. Decoder 64 generates reconstructed video
frames 69 for display on the client device.
[0034] In FIG. 6, six frames, each having four slices, are shown
transmitted by encoder 63. The first frame comprises four I-slices,
the second frame comprises four P-slices, the third frame comprises
four P-slices, and so on. On the client side, decoder 64 is
shown receiving frames 1-3 without incident. However, frame 4 is
shown being received with the data for the third slice, denoted by
reference numeral 65, having been lost during network transmission.
When lost data is detected by decoder 64, or by a packet loss
detection device 66, a notification is sent back to encoder 63 via
feedback channel or loop 67. In this example, the notification is
received by encoder 63 immediately following transmission of frame
5, which comprises four P-slices. Responsive to the notification of
lost data in the third slice, encoder 63 encodes the next frame
(frame 6) with slice 3 as an I-slice, as denoted by reference
numeral 68.
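The slice-type decision of FIG. 6 can be expressed as a small planner. This is a sketch: slice positions are numbered from 1 to match the figure, each reported position is refreshed exactly once, and the class name `SliceTypePlanner` and its methods are hypothetical. The usage reproduces the example in [0034], where the loss of slice 3 in frame 4 is reported only after frame 5 has been sent.

```python
from typing import Dict, List, Set


class SliceTypePlanner:
    """Chooses I or P for every slice position of each frame to be encoded."""

    def __init__(self, num_slices: int):
        self.num_slices = num_slices
        self.pending: Set[int] = set()   # positions reported lost, not yet refreshed
        self.frames_encoded = 0

    def report_loss(self, position: int) -> None:
        """Record a notification received over the feedback loop (loop 67)."""
        self.pending.add(position)

    def next_frame_types(self) -> List[str]:
        if self.frames_encoded == 0:
            types = ["I"] * self.num_slices          # initial frame: all I-slices
        else:
            types = ["I" if pos in self.pending else "P"
                     for pos in range(1, self.num_slices + 1)]
        self.pending.clear()                         # each reported loss refreshed once
        self.frames_encoded += 1
        return types


planner = SliceTypePlanner(num_slices=4)
plan: Dict[int, List[str]] = {}
for frame in range(1, 7):
    if frame == 6:
        planner.report_loss(3)   # loss in frame 4, slice 3; reported after frame 5
    plan[frame] = planner.next_frame_types()
    print(frame, " ".join(plan[frame]))
```

Frame 1 is printed as four I-slices, frames 2 through 5 as four P-slices each, and frame 6 as `P P I P`: only the damaged position pays the cost of intra coding.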
[0035] The embodiment of FIG. 6 thus performs frame repair by
repairing individual slices for which data has been lost.
Practitioners in the art will appreciate that this embodiment has
the advantage of avoiding the standard practice of ensuring that a
video stream contains a certain density of I-frames, which are very
large and costly to transmit. Instead of sending an I-frame, say,
every two seconds (as in the case of DVD or on-demand video
transmissions that lack feedback) the embodiment of FIG. 6 relies
upon P-slice transmission for frames subsequent to the initial
frame, and then implements slice-based recovery by transmitting an
I-slice at the slice position where lost slice data was detected at
the client device.
[0036] Practitioners will further appreciate that for optimal
results, a single network packet should not contain data for more
than a single slice.
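The single-slice-per-packet rule of [0036] can be sketched as follows. The disclosure says only that a packet should not contain data for more than one slice; splitting a large slice across several packets, the 1200-byte payload limit, and the `SlicePacket` header fields are assumptions, and `packetize_frame` is a hypothetical name. Keeping slices apart means that a lost packet damages exactly one slice position, which is what lets the client report a precise position.

```python
from dataclasses import dataclass
from typing import List

MAX_PAYLOAD = 1200  # assumed usable payload bytes per packet


@dataclass
class SlicePacket:
    frame: int       # frame number
    slice_pos: int   # slice position within the frame (1-based)
    part: int        # index of this packet within the slice
    parts: int       # number of packets carrying this slice
    payload: bytes


def packetize_frame(frame: int, encoded_slices: List[bytes],
                    max_payload: int = MAX_PAYLOAD) -> List[SlicePacket]:
    """Packetize a frame so that no packet carries data from two slices."""
    packets = []
    for pos, data in enumerate(encoded_slices, start=1):
        # A slice may need several packets, but every packet holds a chunk
        # of exactly one slice; an empty slice still gets one packet.
        chunks = [data[i:i + max_payload]
                  for i in range(0, len(data), max_payload)] or [b""]
        for part, chunk in enumerate(chunks):
            packets.append(SlicePacket(frame, pos, part, len(chunks), chunk))
    return packets


packets = packetize_frame(4, [b"a" * 2500, b"b" * 800, b"c" * 1200])
print([(p.slice_pos, p.part, len(p.payload)) for p in packets])
# [(1, 0, 1200), (1, 1, 1200), (1, 2, 100), (2, 0, 800), (3, 0, 1200)]
```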
[0037] FIG. 7 is a flow diagram illustrating an example process
flow for slice-based recovery from lost data packets. On the
server side, video frames are divided into N slices, where N is an
integer greater than 1. (Block 71) At decision block 72, the hosting
service queries whether it has received a notification from the
client device indicative of a lost data slice. If no such
notification has been received, it proceeds to encode each of the
slices in the frame as P-slices (block 74), which are then
transmitted to the client device (block 75). If, on the other hand,
the hosting service has received a notification that the data of a
particular slice has been lost during network transmission, the
corresponding slice in the current (i.e., next) frame is encoded as
an I-slice. (Block 73) That I-slice is then transmitted over the
network to the client device. (Block 74).
[0038] On the client side, incoming packets/slices may be checked
for data integrity. (Block 76) If data is lost for a particular
slice, the server at the hosting service center is immediately
notified. (Block 77) If no data loss is detected, then each slice
received is decoded (block 78) in order to reconstruct the video
frames for rendering on the client-side display device.
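The client-side checks of FIG. 7 (blocks 76-78) might look like the sketch below. It assumes each packet carries a hypothetical header (frame number, slice position, fragment index and fragment count), that the client knows how many slices each frame has, and it detects only missing packets; detecting corrupted payloads would additionally require a checksum. The names `SlicePacket` and `check_frame` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class SlicePacket:
    frame: int
    slice_pos: int   # 1-based slice position
    part: int        # fragment index within the slice
    parts: int       # total fragments for the slice
    payload: bytes


def check_frame(packets: Iterable[SlicePacket], frame: int,
                num_slices: int) -> Tuple[Dict[int, bytes], List[int]]:
    """Split one frame's slices into decodable ones and lost positions.

    Returns (decodable, lost): decodable maps slice position to reassembled
    bytes ready for decoding (block 78); lost lists positions to report to
    the server (block 77).
    """
    fragments: Dict[int, Dict[int, bytes]] = defaultdict(dict)
    expected: Dict[int, int] = {}
    for p in packets:
        if p.frame == frame:
            fragments[p.slice_pos][p.part] = p.payload
            expected[p.slice_pos] = p.parts
    decodable: Dict[int, bytes] = {}
    lost: List[int] = []
    for pos in range(1, num_slices + 1):
        have = fragments.get(pos, {})
        if pos in expected and set(have) == set(range(expected[pos])):
            decodable[pos] = b"".join(have[i] for i in range(expected[pos]))
        else:
            lost.append(pos)   # slice missing entirely or missing a fragment
    return decodable, lost


received = [SlicePacket(4, 1, 0, 1, b"s1"), SlicePacket(4, 2, 0, 1, b"s2"),
            SlicePacket(4, 4, 0, 1, b"s4")]   # slice 3 never arrived
decodable, lost = check_frame(received, frame=4, num_slices=4)
print(sorted(decodable), lost)  # [1, 2, 4] [3]
```

Slices 1, 2 and 4 can be decoded immediately, while position 3 is the one reported back, matching the FIG. 6 scenario.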
[0039] It should be understood that elements of the disclosed
subject matter may also be provided as a computer program product
which may include a machine-readable medium having stored thereon
instructions or code which may be used to program a computer (e.g.,
a processor or other electronic device) to perform a sequence of
operations. Alternatively, the operations may be performed by a
combination of hardware and software. The machine-readable medium
may include, but is not limited to, floppy diskettes, optical
disks, CD-ROMs, and magneto-optical disks, ROMs, RAMs, EPROMs,
EEPROMs, magnetic or optical cards, or any other type of machine-readable
medium suitable for storing electronic instructions.
[0040] The above description of illustrated example embodiments,
including what is described in the Abstract, is not intended to be
exhaustive or to be limited to the precise forms disclosed.
While specific embodiments and examples of the subject matter
are described herein for illustrative purposes, various equivalent
modifications are possible without departing from the broader
spirit and scope of the present invention.