U.S. patent number RE46,167 [Application Number 14/147,467] was granted by the patent office on 2016-09-27 for systems and methods for transmitting data over lossy networks.
This patent grant is currently assigned to Sony Corporation and Sony Electronics Inc. The grantees listed for this patent are SONY CORPORATION and SONY ELECTRONICS INC. Invention is credited to Behram DaCosta.
United States Patent |
RE46,167 |
DaCosta |
September 27, 2016 |
**Please see images for:
( Certificate of Correction ) ** |
Systems and methods for transmitting data over lossy networks
Abstract
An encoder/decoder system usable to decrease the probability of
a failed transmission over a lossy wireless network. In one
embodiment, data packets unsuccessfully sent over the wireless
network may be retransmitted a particular number of times,
depending on how important the data in the lost packet is. In
another embodiment, when a number of transmission failures is
determined to exceed a predetermined threshold, it may be signaled
that multiple reference frames should be used for encoding
predictions.
Inventors: |
DaCosta; Behram (San Jose, CA) |
Applicant: |
Name | City | State | Country | Type |
SONY CORPORATION | Minato-ku Tokyo | N/A | JP | |
SONY ELECTRONICS INC. | Park Ridge | NJ | US | |
Assignee: |
Sony Corporation (Tokyo, JP)
Sony Electronics Inc. (Park Ridge, NJ) |
Family ID: | 1000001855386 |
Appl. No.: | 14/147,467 |
Filed: | January 3, 2014 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number | Issue Date |
11197818 | Jun 29, 2010 | 7747921 |
Reissue of: | 12770464 | Apr 29, 2010 | 8429476 | Apr 23, 2013 |
Current U.S. Class: | 1/1 |
Current CPC Class: | H04L 1/1877 (20130101); H04L 1/1877 (20130101); H04L 1/0014 (20130101); H04L 2001/0098 (20130101); H04L 1/0026 (20130101); H04L 2001/0098 (20130101); H04L 1/0026 (20130101) |
Current International Class: | G08C 25/02 (20060101); H03M 13/00 (20060101); H04L 1/18 (20060101); H04L 1/14 (20060101); H04L 1/00 (20060101) |
Field of Search: | ;714/748-751 |
References Cited
U.S. Patent Documents
Foreign Patent Documents
2004203324 | Aug 2004 | AU |
09-037425 | Feb 1997 | JP |
09037245 | Feb 1997 | JP |
2001-119437 | Apr 2001 | JP |
2001119437 | Apr 2001 | JP |
2001-156782 | Jun 2001 | JP |
2001156782 | Jun 2001 | JP |
2006-165733 | Jun 2006 | JP |
2006165733 | Jun 2006 | JP |
2004075555 | Sep 2004 | WO |
WO-2004/075555 | Sep 2004 | WO |
Other References
Notice of Allowance from U.S. Appl. No. 12/770,464 mailed Oct. 24,
2011. cited by applicant .
Notice of Allowance from U.S. Appl. No. 12/770,646 mailed Dec. 20,
2012. cited by applicant .
PCT Search Report for PCT/US06/29569 mailed Sep. 26, 2007. cited by
applicant .
R. Schaefer et al.; "The emerging H.264/AVC standard"; EBU
Technical Review, Jan. 2003, p. 1-12,
http://tech.ebu.ch/docs/techreview/trev_293-schaefer.pdf.
cited by applicant .
Notice of Preliminary Rejection for Korean Patent Application No.
10-2008-7005355 mailed from Korean Intellectual Property Office on
Sep. 14, 2012. cited by applicant .
Extended European Search Report for European Application No.
06800504 mailed from the European Patent Office on Jan. 7, 2013.
cited by applicant .
Notice of Last Preliminary Rejection for Korean Patent Application
No. 10-2008-7005355 mailed from Korean Intellectual Property Office
on Dec. 5, 2012. cited by applicant.
|
Primary Examiner: Heneghan; Matthew
Attorney, Agent or Firm: Fitch, Even, Tabin & Flannery
LLP
Parent Case Text
CROSS REFERENCE TO RELATED APPLICATION
This application is a divisional of U.S. patent application Ser.
No. 11/197,818, filed on Aug. 5, 2005, which is hereby fully
incorporated by reference.
Claims
What is claimed is:
1. A method comprising: transmitting a plurality of encoded data
packets over a wireless network; receiving acknowledgment signals
for each of said plurality of encoded data packets when
successfully transmitted; and determining if a number of
transmission failures exceeds a predetermined threshold, and if so,
signaling that multiple reference frames should be used for
encoding predictions.
2. The method of claim 1, wherein said plurality of encoded data
packets are encoded according to an H.264/AVC coding standard.
3. The method of claim 2, wherein said wireless network is a lossy
wireless network.
4. The method of claim 1, further comprising determining a number
of retransmission attempts for a particular encoded data packet if
not successfully transmitted, wherein said number of retransmission
attempts is based at least in part on a type of data in said
encoded data packet.
5. The method of claim 4, wherein said number of retransmission
attempts increases when said type of data in said particular
encoded data packet is Instantaneous Decoder Refresh (IDR) frame
data.
6. The method of claim 1, wherein determining the number of
transmission failures exceeds the predetermined threshold comprises
detecting a decrease in a number of acknowledgment signals being
sent.
7. The method of claim 1, further comprising: identifying one or
more slices of said plurality of encoded data packets which were
not successfully transmitted; and ceasing to refer to said one or
more slices when encoding subsequent frames of said encoded video
data.
8. The method of claim 1, further comprising: estimating a
distortion level caused by the unsuccessful receipt of a plurality
of lost data packets transmitted over said network; and encoding
additional video data using said distortion level to avoid
referencing said plurality of lost data packets.
9. A system comprising: a wireless network; a client coupled to
said wireless network, said client including a decoder and a
receiver; a server coupled to said wireless network, said server
including an encoder and a transmitter, wherein said server is to,
transmit a plurality of encoded data packets over the wireless
network to said client; receive acknowledgment signals for each of
said plurality of encoded data packets when successfully
transmitted to said client; and determine if a number of
transmission failures exceeds a predetermined threshold, and if so,
signal that multiple reference frames should be used for encoding
predictions.
10. The system of claim 9, wherein said plurality of encoded data
packets are encoded according to an H.264/AVC coding standard.
11. The system of claim 10, wherein said wireless network is a
lossy wireless network.
12. The system of claim 9, wherein said server is to further
determine a number of retransmission attempts for a particular
encoded data packet if not successfully transmitted, wherein said
number of retransmission attempts is based at least in part on a
type of data in said encoded data packet.
13. The system of claim 12, wherein said number of retransmission
attempts increases when said type of data in said particular
encoded data packet is Instantaneous Decoder Refresh (IDR) frame
data.
14. The system of claim 9, wherein determining the number of
transmission failures exceeds the predetermined threshold comprises
detecting a decrease in a number of acknowledgment signals being
sent.
15. The system of claim 9, wherein said server is to further,
identify one or more slices of said plurality of encoded data
packets which were not successfully transmitted; and cease to refer
to said one or more slices when encoding subsequent frames of said
encoded video data.
16. The system of claim 9, wherein said server is to further,
estimate a distortion level caused by the unsuccessful receipt of a
plurality of lost data packets transmitted over said network; and
encode additional video data using said distortion level to avoid
referencing said plurality of lost data packets.
.Iadd.17. A system comprising: a server in communication with a
wireless network, the server being configured to: encode a
plurality of data packets including a plurality of frames
associated with video content; transmit the plurality of encoded
data packets to a client connected to the wireless network;
decompose each of the frames into a plurality of fields; segment
each of the fields into a plurality of slices; transmit each of the
slices associated with a first of the fields over the wireless
network to the client; and transmit each of the slices associated
with a second of the fields over the wireless network to the client
after transmission of each of the slices associated with the first
of the fields..Iaddend.
.Iadd.18. The system of claim 17, wherein the plurality of encoded
video data packets are encoded according to an H.264/AVC coding
standard..Iaddend.
.Iadd.19. The system of claim 17, wherein the wireless network is a
lossy wireless network..Iaddend.
.Iadd.20. The system of claim 17, wherein each of the frames is an
Instantaneous Decoder Refresh (IDR) frame..Iaddend.
.Iadd.21. The system of claim 20, wherein each of the fields
comprises a plurality of macroblocks, the macroblocks being
spatially separated from one another by p macroblocks..Iaddend.
.Iadd.22. The system of claim 21, wherein all macroblocks of the
IDR frame are included in the plurality of fields..Iaddend.
.Iadd.23. The system of claim 17, wherein the server is further
configured to identify macroblocks not received by the client from
the server and interpolate neighboring pixels to estimate pixels of
the macroblocks not received by the client..Iaddend.
.Iadd.24. The system of claim 17, wherein the server is further
configured to determine which of the video data packets are not to
be transmitted from the server to the client due to congestion of
the wireless network..Iaddend.
.Iadd.25. The system of claim 17, wherein the server is further
configured to transmit each of the plurality of fields on a
separate spatial channel..Iaddend.
.Iadd.26. The system of claim 17, wherein the server includes an
encoder adapted to encode the plurality of data packets including a
plurality of frames associated with video content..Iaddend.
.Iadd.27. The system of claim 17, further comprising a transmitter
adapted to transmit each of the slices associated with the first of
the fields over the wireless network to the client; and transmit
each of the slices associated with the second of the fields over
the wireless network to the client after transmission of each of
the slices associated with the first of the fields..Iaddend.
.Iadd.28. The system of claim 27, wherein the server includes the
transmitter..Iaddend.
.Iadd.29. The system of claim 27, wherein the transmitter is
separate from the server..Iaddend.
Description
FIELD OF THE INVENTION
The invention relates in general to transmitting data over lossy
networks, and in particular, to improving data transmission
performance over lossy wireless network connections.
BACKGROUND OF THE INVENTION
Typical problems associated with transmitting compressed video over
wireless networks include Quality of Service (QoS), latency and
maintaining basic image integrity. For example, if a single packet
of transmitted video data is lost, the fact that the video data is
typically compressed in the temporal domain can cause propagation
and cascading of a single artifact in one frame through multiple
successive frames. Such lossy wireless networks may operate
wireless protocols, such as those described by IEEE 802.11x and
802.15.3a, and video compression algorithms such as those described
by the AVC video standard. Lossy networks may use transmission
methods other than wireless such as, for example, HomePlug AV
Powerline Communications.
During video transmission over a lossy network, video data
(sometimes referred to as "video packets") may be lost. For
example, wireless lossy transmission mediums can be unreliable in
that the transmitted video packets may not always be received
(accurately or at all) by the wireless receiver. To counter this,
the 802.11x Media Access Control (MAC) requires that in most cases
a packet (or group of packets for 802.11e extensions to the
standard) that is received will be acknowledged to the transmitter
by sending back an "ACK" signal. Hence a missing ACK signal
normally indicates that a video packet (or packets) has been
lost.
In addition, video packet data can also be lost at the receiver.
For example, as Advanced Video Coding (AVC) encoded data tends to
be bursty, an unexpectedly large burst can overflow buffers on the
receiver-side at several locations between the 802.11x module and
the AVC decoder itself. Most packets lost in this way can be
detected by Real-Time Transport Protocol (RTP) feedback.
To alleviate some of the inherent drawbacks associated with lossy
network communication, various data recovery and error correction
features have been built into the data coding standard used. For
example, H.264/AVC is a more recently developed coding standard
which includes a Video Coding Layer (VCL) to efficiently represent
the video content, and a Network Abstraction Layer (NAL) to format
the VCL representation of the video and provide header information
in a manner appropriate for conveyance by particular transport
layers or storage media. Despite these efforts, there is a need to
improve transmission reliability and error concealment over lossy
network connections, such as 802.11x networks.
Thus, there is still an unsatisfied need for an improved system and
method for transmitting video data over networks in a manner which
decreases the probability of a failed transmission, improves the
probability of successful decoding and/or increases the quality of
error concealment on the receiver-side.
SUMMARY OF THE INVENTION
Systems and methods for transmitting data over lossy networks, such
as wireless networks, are disclosed and claimed herein. In one
embodiment a method comprises transmitting a plurality of encoded
data packets over a wireless network, receiving acknowledgment
signals for each of said plurality of encoded data packets when
successfully transmitted, and determining if a number of
transmission failures exceeds a predetermined threshold, and if
so, signaling that multiple reference frames should be used for
encoding predictions.
Other embodiments are disclosed and claimed herein.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 depicts one embodiment of a simplified system overview for
implementing one or more aspects of the invention;
FIG. 2 depicts a process for carrying out one aspect of the invention
in accordance with one embodiment;
FIG. 3 depicts a process for carrying out another aspect of the
invention, according to one embodiment;
FIG. 4 depicts a process for carrying out yet another aspect of the
invention, according to one embodiment;
FIG. 5 depicts an IDR frame according to one embodiment; and
FIG. 6 depicts an IDR frame according to another embodiment.
DETAILED DESCRIPTION OF THE EXEMPLARY EMBODIMENTS
The invention relates to a system in which data (e.g., video data)
is being transmitted at least partially wirelessly over a lossy
network originating from a server to one or more client-side
systems. In one embodiment, the server includes an encoder module
and a transmitter, while the client includes a decoder module and a
receiver.
According to one aspect of the invention, data packets sent over
the wireless network by the server which are not successfully
received may be retransmitted a particular number of times. In one
embodiment, the number of retransmission attempts is based on how
important the data in the lost packet is considered. That is, in
one embodiment, the number of retransmission attempts may be made
adaptive, and packets considered more important (e.g., those
containing IDR frames) are retransmitted a greater number of times
as compared to less important packets (e.g., those containing P
frames). In one embodiment, the aforementioned data packets may be
encoded by an H.264/AVC encoder and/or sent over an 802.11x wireless
network connection to be decoded by one or more H.264/AVC
decoders.
Another aspect of the invention is to improve the probability of
successful decoding at the client side. In one embodiment, this is
done by having a decoder signal to a corresponding encoder to use
multiple reference frames for subsequent prediction operations. In
the case of an H.264/AVC system, this allows frames, such as P
frames, that refer to data in prior frames (or in the case of B
frames also future frames) to refer to macroblocks in multiple
reference frames in order to determine the macroblocks in a current
frame.
In another embodiment, the probability of successful decoding at
the client side may be improved by identifying exactly which slices
of the original data stream have not been received at the decoder.
In one embodiment, this may be done by noting when an
acknowledgment signal is not received for a given packet and all
the packet retransmission attempts have been used. Using this
information, an encoder may then stop referring to these lost
slices/macroblocks in future coding operations. This may then limit
the propagation of errors in future decoded frames at the client
side. While in one embodiment, this may be done concurrently with
the aforementioned operations, in another embodiment it may be
subsequently performed.
Still another aspect of the invention is to estimate the distortion
of the data caused by lost data packets/slices. Using this
information, past pixels that are considered to have been
reconstructed adequately by the error concealment at the client may
continue to be referred to, while references to data not adequately
reconstructed may be avoided. In one embodiment, the
client itself may indicate the level of distortion back to the
server given that the client decoder knows exactly what error
concealment was used. This information may then be communicated
back to the server. However, in another embodiment, an estimate of
error of the received data may also be determined on the
server-side by estimating the error concealment at the client for
the given lost data packets, and the distortion may then be
estimated by comparing the reconstructed data to the original data
which is also available at the server.
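The server-side estimate described above can be sketched as follows. This is a minimal illustration only: the text does not name a distortion metric, so mean squared error is an assumption here, and the function names and the max_mse threshold are hypothetical.

```python
from typing import Sequence


def mean_squared_error(original: Sequence[int], reconstructed: Sequence[int]) -> float:
    """Mean squared error between original pixels and the pixels the server
    expects the client to show after error concealment.

    MSE is an assumed metric; the source only says the two are compared.
    """
    if len(original) != len(reconstructed):
        raise ValueError("pixel sequences must have the same length")
    if not original:
        return 0.0
    total = sum((o - r) ** 2 for o, r in zip(original, reconstructed))
    return total / len(original)


def is_safe_reference(original: Sequence[int],
                      concealed: Sequence[int],
                      max_mse: float = 25.0) -> bool:
    """Return True if the concealed data is close enough to the original to
    keep using it as a prediction reference. max_mse is illustrative."""
    return mean_squared_error(original, concealed) <= max_mse
```

An encoder could call is_safe_reference for each concealed region and skip references to regions for which it returns False.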
Still another aspect of the invention is to increase the quality of
error-concealment at the client by making use of the Flexible
Macroblock Ordering (FMO) functionality of an H.264/AVC
encoder/decoder system. That is, the reconstruction of IDR frames
(on which all successive frames in the picture depend) may be made
more robust by first decomposing each IDR frame into n fields such
that all macroblocks of the IDR frame may be included in the n
fields without duplication. Thereafter, each field may be segmented
into m slices. In one embodiment, all m slices for the first field
may be transmitted first, followed by the transmission of all m
slices for the second field, and so on until all m slices for all n
fields have been transmitted. This may be desirable since errors in
lossy network environments (e.g., 802.11x) are often bursty, where
a short burst of errors might eliminate all m slices, for example,
for a given field. That being the case, if the m slices for
different fields (e.g., 1,3,4, etc.) have been received correctly,
then the error concealment functionality of the decoder (e.g.,
H.264/AVC decoder) may interpolate neighboring pixels to estimate
the missing pixels since the missing macroblocks are spatially
surrounded by available macroblocks.
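The field and slice ordering described above can be sketched as follows. This is a simplified illustration under stated assumptions: macroblocks are identified by their raster-order index, fields are formed by round-robin assignment (one possible pattern that covers every macroblock without duplication), and all function names are hypothetical rather than taken from any H.264/AVC implementation.

```python
from typing import List


def decompose_into_fields(num_macroblocks: int, n: int) -> List[List[int]]:
    """Assign macroblock indices to n fields in round-robin order so that
    every macroblock appears in exactly one field."""
    return [list(range(k, num_macroblocks, n)) for k in range(n)]


def segment_into_slices(field: List[int], m: int) -> List[List[int]]:
    """Split one field into m contiguous slices of near-equal size."""
    size, extra = divmod(len(field), m)
    slices, start = [], 0
    for i in range(m):
        end = start + size + (1 if i < extra else 0)
        slices.append(field[start:end])
        start = end
    return slices


def transmission_order(num_macroblocks: int, n: int, m: int) -> List[List[int]]:
    """All m slices of the first field, then all m slices of the second
    field, and so on until every field has been sent."""
    order: List[List[int]] = []
    for field in decompose_into_fields(num_macroblocks, n):
        order.extend(segment_into_slices(field, m))
    return order
```

With 8 macroblocks, n = 2 and m = 2, transmission_order returns [[0, 2], [4, 6], [1, 3], [5, 7]]. If a burst wipes out both slices of the second field, every missing macroblock still has received neighbors from the first field in raster order.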
While it should be appreciated that all or some of the
aforementioned aspects of the invention may be implemented using an
H.264/AVC encoder/decoder system and/or a 802.11x wireless network
connection, it should equally be appreciated that they may also be
implemented using other similar codecs and/or lossy communication
channels.
When implemented in software, the elements of the invention are
essentially the code segments to perform the necessary tasks. The
program or code segments can be stored in a processor readable
medium or transmitted by a computer data signal embodied in a
carrier wave over a transmission medium or communication link.
H.264/AVC Overview
The H.264/AVC standard supports video coding that contains either
progressive or interlaced frames, or both mixed together in the same
sequence. Generally, a frame of video contains two interleaved
fields--a top and a bottom field. The two fields of an interlaced
frame, which are separated in time by a field period, may be coded
separately as two field pictures or together as a frame picture. A
progressive frame, on the other hand, is coded as a single frame
picture. However, it is still considered to consist of two fields
at the same instant in time.
The VCL, which will be described in more detail below, represents
the content of the video data. In contrast, the NAL formats the
data and provides header information in a manner appropriate for
conveyance by the transport layers or storage media. All data is
contained in NAL units, each of which contains an integer number of
bytes. An NAL unit specifies a generic format for use in both
packet-oriented and bitstream systems.
The VCL of the H.264/AVC standard is similar in spirit to other
standards such as MPEG-2. In short, it consists of a hybrid of
temporal and spatial prediction, in conjunction with transform
coding. Each picture of a video, which can either be a frame or a
field, is partitioned into fixed-size macroblocks that cover a
rectangular picture area of 16×16 samples of the luma
component and 8×8 samples of each of the two chroma
components. All luma and chroma samples of a macroblock are either
spatially or temporally predicted, and the resulting prediction
residual is transmitted using transform coding.
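As a worked illustration of the macroblock dimensions given above, the following sketch counts the macroblocks needed to cover a picture and the samples in one macroblock. The function names are hypothetical, and rounding partial edge macroblocks up to whole ones is an assumption about how picture sizes that are not multiples of 16 are handled.

```python
import math

MB_LUMA = 16    # luma samples per macroblock side, per the text above
MB_CHROMA = 8   # samples per side for each of the two chroma components


def macroblocks_per_picture(width: int, height: int) -> int:
    """Number of 16x16 macroblocks needed to cover a width x height picture.

    Assumption: partial macroblocks at the right and bottom edges are
    rounded up to whole macroblocks.
    """
    return math.ceil(width / MB_LUMA) * math.ceil(height / MB_LUMA)


def samples_per_macroblock() -> int:
    """One luma block plus two chroma blocks."""
    return MB_LUMA * MB_LUMA + 2 * MB_CHROMA * MB_CHROMA
```

For example, a 640 by 480 picture needs 40 × 30 = 1200 macroblocks, and each macroblock carries 256 + 2 × 64 = 384 samples.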
The macroblocks are organized in slices, which represent portions
of a given image that can be decoded independently, and the
transmission order of macroblocks in the bitstream depends on a
Macroblock Allocation Map. The H.264/AVC standard supports five
different slice-coding types. The simplest one is referred to as an
I slice, or Intra slice. In I slices, all macroblocks are coded
without referring to other pictures within the video sequence. On
the other hand, prior-coded images can be used to form a prediction
signal for macroblocks of the predictive-coded P and B slices
(where P stands for predictive and B stands for bi-predictive). The
two additional slice types are SP (switching P) and SI (switching
I), which are specified for efficient switching between bitstreams
coded at various bit-rates.
The H.264/AVC standard supports a feature called Flexible
Macroblock Ordering (FMO) in which a pattern that assigns the
macroblocks in a picture to one or several slice groups is
specified. Each slice group may then be transmitted separately.
System Architecture Overview
FIG. 1 depicts one embodiment of a system 100 for carrying out one
or more aspects of the invention. In particular, system 100
includes the capture of video content from source 110. In one
embodiment, video content 115 may be real-time content transmitted
from a real-time video source. Regardless of whether the video
content 115 is real-time content or not, it may then be provided to
server 135 which, in the embodiment of FIG. 1, is comprised of
encoder 120 and transmitter 130. In one embodiment, encoder 120
processes the video content 115 from source 110 in order to provide
encoded data 125 to some destination point via network 140. In one
embodiment, encoder 120 may encode video content 115 in accordance
with the previously-described H.264/AVC coding standard. However,
it should equally be appreciated that the invention may be used
with similar codecs other than the H.264/AVC coding standard.
Once encoded, encoded data 125 is provided to transmitter 130 for
transmission to a client 145, as shown in FIG. 1. While in one
embodiment, transmitter 130 is an 802.11x or 802.15.3a wireless
transmitter, it should equally
be appreciated that transmitter 130 may also transmit data
according to numerous other lossy protocols. Regardless of the
wireless protocol employed, encoded data 125 may then be provided
to and sent over network 140. In one embodiment encoder 120 may
encode video content 115 in accordance with the
previously-described H.264/AVC coding standard. However, it should
equally be appreciated that the invention may be used with similar
codecs other than the H.264/AVC coding standard, and with lossy
networks other than 802.11x/802.15.3a wireless networks.
Continuing to refer to FIG. 1, client 145 includes receiver 150 and
decoder 155. Encoded data 125 may be received by receiver 150,
which is in communication with network 140 (e.g., the Internet).
Receiver 150 may then provide the encoded data 125 to decoder 155.
In another embodiment, the encoded data 125 may be received and
decoded by a plurality of client-side devices (not shown).
Depending on the coding standard used by the encoder 120 (e.g.,
H.264/AVC), the decoder 155 may then perform specific decoding
operations on the encoded data 125 to provide decoded video content
160 to a connected display device 165, as shown in FIG. 1.
It should be appreciated that server 135 and client 145 may have
numerous configurations other than as depicted in FIG. 1. For
example, either or both of encoder 120 and transmitter 130 may be
separate from server 135. Similarly, either or both of decoder 155
and receiver 150 need not be integrated into client 145.
Although not depicted, it should equally be appreciated that server
135 and/or client 145 may include other components, such as a
central processing unit (CPU), which may include an arithmetic
logic unit (ALU) for performing computations, a collection of
registers for temporary storage of data and instructions, and a
control unit for controlling operation for the computer system. In
one embodiment, the CPU may be any one of the x86, Pentium™
class microprocessors as marketed by Intel™ Corporation,
microprocessors as marketed by AMD™, or the 6x86MX
microprocessor as marketed by Cyrix™ Corp. In addition, any of a
variety of other processors, including those from Sun Microsystems,
MIPS, IBM, Motorola, NEC, Cyrix, AMD, Nexgen and others may be
used. Moreover, any such CPU need not be limited to
microprocessors, but may take on other forms such as
microcontrollers, digital signal processors, reduced instruction
set computers (RISC), application specific integrated circuits, and
the like.
Other components that the server 135 and/or client 145 may include
are a random access memory, a non-volatile memory (e.g., hard disk,
floppy disk, CD-ROM, DVD-ROM, tape, high density floppy, high
capacity removable media, low capacity removable media, solid state
memory device, etc., and combinations thereof). The server 135
and/or client 145 may also include a network interface (e.g., a
network interface card, a modem interface, integrated services
digital network, etc.), and a user input device (e.g., a keyboard,
mouse, joystick and the like for enabling a user to interact with
and provide commands).
It should further be appreciated that the server 135 and/or client
145 may include system firmware, such as system BIOS, and an
operating system (e.g., DOS, Windows, Unix, Linux, Xenix, etc.) for
controlling the operation of the server 135 and/or client 145 and the
allocation of resources.
Decreasing Probability of a Failed Transmission
As mentioned above, one aspect of the invention is to be able to
decrease the probability of a failed transmission. To that end,
FIG. 2 depicts a process 200 for how the probability of a failed
transmission may be decreased in accordance with one embodiment of
the invention. With real-time wireless transmissions, it is not
always feasible to guarantee successful transmission of all data
packets. In particular, excessive redundant transmissions can lead
to unacceptable latencies. With that said, some data packets can be
considered more important than others. Hence, the number of
retransmission attempts may be a function of the importance of the
packets in question. For example, packets containing Instantaneous
Decoder Refresh (IDR) frames may be retransmitted a greater number
of times as compared to less important packets, such as those
containing predictive frames, or P frames. To that end, process 200
begins at block 210 with the transmission of a data packet. A
determination may then be made at block 220 as to whether the
packet was received or not. In one embodiment, this determination
may be based on whether the destination receiver (e.g., client 145)
provided an ACK signal back to the source (e.g., server 135).
However, it should equally be appreciated that successful packet
delivery may be determined using other means as well.
If it is determined at block 220 that the packet in question was in
fact received, then process 200 simply moves to block 230 where the
next data packet is processed. If, on the other hand, it is
determined that the packet was not properly received, then process
200 may continue to block 240 where a determination is made as to
how many retransmissions should be attempted for the given
packet. In one embodiment, the number of retransmission attempts is
based on the importance of the data contained within the given
packet. In another embodiment, the number of retransmissions is
made adaptive at all levels at which retransmission is implemented
to occur. For example, the number of retransmissions may be made
adaptive at the 802.11x MAC layer, as well as at the RTP layer if a
form of reliable or semi-reliable RTP retransmission has been
implemented.
Once the packet's number of retransmissions has been determined,
process 200 will continue to block 250 where it is determined if
the packet should be re-sent or not. If the number of
retransmission attempts equals zero, then the packet will not be
re-sent and process 200 ends. If, on the other hand, the number of
retransmission attempts is greater than zero, then process 200 will
continue to block 260 where the packet is re-sent, and then the
number of remaining retransmission attempts is reduced by 1 (block
270). Once re-sent, a determination must then be made at block 280
as to whether the re-sent packet was received this time. If so,
then process 200 simply moves to block 230 where the next data
packet is processed. If not, then process 200 moves back to block
250 where it is determined if the packet should be re-sent or not
(i.e., determine if the number of retransmission attempts equals
zero or not).
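The flow of process 200 can be sketched as follows. This is a minimal illustration, not the patented implementation: the retry budgets in RETRY_BUDGET are hypothetical values (the text only says that IDR packets get more attempts than P packets), and the send callback stands in for one transmission plus its ACK check.

```python
from typing import Callable, Tuple

# Hypothetical retry budgets per frame type; only the ordering
# (IDR more than P) comes from the text.
RETRY_BUDGET = {"IDR": 5, "P": 2}


def send_with_retries(packet_type: str, send: Callable[[], bool]) -> Tuple[bool, int]:
    """Send one packet, retransmitting up to a budget set by its importance.

    `send` performs one transmission and returns True if an ACK came back.
    Returns (delivered, retransmissions_used).
    """
    if send():                                     # blocks 210 and 220
        return True, 0
    remaining = RETRY_BUDGET.get(packet_type, 0)   # block 240
    used = 0
    while remaining > 0:                           # block 250
        delivered = send()                         # block 260: re-send
        remaining -= 1                             # block 270
        used += 1
        if delivered:                              # block 280
            return True, used
    return False, used
```

On a channel that drops the first three transmissions, an IDR packet is delivered on its third retransmission, while a P packet is abandoned after two.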
Increasing Probability of Successful Decoding
As mentioned above, another aspect of the invention is to increase
the probability of successful decoding at the client side of a
lossy transmission. To that end, FIG. 3 illustrates one embodiment
of a process 300 for improving the decoding of video content
transmitted over a lossy network (e.g., network 140). In
particular, process 300 begins with a determination of whether or
not the number of transmission failures has increased. This may be
done, for example, by detecting a decreasing number of ACKs sent by
a client side device (e.g., client 145). In one embodiment, a
predetermined threshold may be used to compare the number of ACKs
against. Once the number of ACKs (or the rate of ACKs received per
given time period) falls below the predetermined threshold, process
300 continues to one or more of blocks 310, 320 and 335. That is,
portions of process 300 may be carried out concurrently or
sequentially. For example, the portions of process 300 which
comprise blocks 310-315 may be carried out concurrently with the
operations described below with reference to blocks 320-330 and/or
blocks 335-345.
Once the number of ACKs (or the rate of ACKs received) falls below
the predetermined threshold, the client or decoder-side may signal
back to the encoder that it should use multiple reference frames
for prediction purposes. For example, in the case of an H.264/AVC
encoder, this allows P frames that refer to data in prior frames (or
in the case of B frames also future frames) to refer to macroblocks in
multiple reference frames in order to determine the macroblocks in
the current frame. Hence, if one of the reference frames is missing
due to a lossy transmission error, the current frame can still be
successfully reconstructed assuming the other reference frames are
not also lost.
Thus, by using multiple reference frames (block 315) the
probability that a lost frame will cause a cascading effect that
leads to corruption of future predicted frames can be effectively
decreased. As illustrated by the dashed progress lines leading from
block 315, the operations of blocks 310-315 may be performed either
concurrently with or sequentially with the operations described
below with reference to blocks 320-330 and 335-345.
Regardless of whether sequentially or concurrently performed, at
block 320 the server side (e.g., server 135) may identify exactly
which slices of the original video stream have not been received at
the decoder. In one embodiment, this information is available since
the server knows which data packets are lost because an ACK signal
would not have been received for the given packet at the 802.11x
MAC layer, or which RTP packets are lost at the RTP application
layer. Once the missing slices have been identified, process 300
may continue to block 325 where the encoder is notified of exactly
which slices (and hence which macroblocks) have not been received
by the decoder. Using this information, the encoder (which in one
embodiment is an H.264/AVC encoder) will stop referring to these
lost slices/macroblocks in future P and B frames (block 330). In
another embodiment, the encoder may also generate additional IDR
frames if necessary.
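The bookkeeping of blocks 320-330 can be sketched as follows, assuming the server records which slices each packet carried. The class and method names are hypothetical, and a real encoder would apply the restriction inside its motion-estimation search rather than through a simple filter.

```python
class LostSliceTracker:
    """Tracks slices never acknowledged by the client (illustrative)."""

    def __init__(self) -> None:
        self._pending = {}    # packet id -> set of slice ids it carried
        self.lost = set()     # slice ids known not to have been received

    def on_sent(self, packet_id: int, slice_ids) -> None:
        """Remember which slices a transmitted packet carried."""
        self._pending[packet_id] = set(slice_ids)

    def on_ack(self, packet_id: int) -> None:
        """An ACK arrived, so the packet's slices reached the decoder."""
        self._pending.pop(packet_id, None)

    def on_timeout(self, packet_id: int) -> None:
        """No ACK arrived: mark the packet's slices as lost (block 320)."""
        self.lost |= self._pending.pop(packet_id, set())

    def allowed_references(self, candidate_slice_ids):
        """Slices the encoder may still refer to (blocks 325-330)."""
        return [s for s in candidate_slice_ids if s not in self.lost]
```

The encoder would consult `allowed_references` when choosing prediction sources for future P and B frames.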
Either concurrently with or sequentially to the aforementioned
operations of process 300, the level of distortion caused by the
missing packets may be estimated by the server at block 335. This
may be significant since not all lost packets degrade the video
equally, especially since many decoders (e.g., H.264 decoders) employ
error concealment. In one embodiment, the client itself may
estimate the distortion of the client's video caused by the lost
packets. This may be preferable since the decoder will know exactly
how it has implemented possibly proprietary error concealment. In
this case, the level of distortion may be communicated back to the
server (block 340).
Alternatively, this back channel of communication between the
client and server may be unreliable. In that case, an estimate of
the error of the final video may also be determined at the server
by estimating the error concealment at the client for the given
lost slices/packets. The distortion may be estimated by comparing
the reconstructed video to the original video which is also
available at the server. Regardless of whether the distortion has
been estimated on the client side or on the server side, this
information may then be used by the encoder to only refer to past
pixels that are considered to have been reconstructed adequately by
the error concealment at the client (block 345).
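The comparison of the reconstructed video to the original can be sketched with a simple per-region error measure. Mean squared error and the `max_mse` acceptance threshold are assumptions made here for illustration; the description does not name a particular distortion metric. Frames are represented as lists of rows of pixel values.

```python
def mean_squared_error(original, reconstructed) -> float:
    """Average squared pixel difference between two equally sized regions."""
    total = 0
    count = 0
    for row_o, row_r in zip(original, reconstructed):
        for o, r in zip(row_o, row_r):
            total += (o - r) ** 2
            count += 1
    return total / count


def adequately_reconstructed(original, reconstructed,
                             max_mse: float) -> bool:
    """True if the concealed region is close enough to the original to
    be used as a prediction reference (hypothetical criterion)."""
    return mean_squared_error(original, reconstructed) <= max_mse
```

Under this sketch, the encoder would refer only to past regions for which `adequately_reconstructed` returns True (block 345).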
In addition to real-time video content, the process 300 of FIG. 3
may also be implemented for pre-stored streams on the server side
(such as on a PVR or at an ISP's head-end). In this case, several
versions of the same encoded stream can be pre-stored, which can
decrease the latency of producing these multiple versions of
encoded data on the fly. In other cases where the content is being
encoded in real-time, in latency-critical applications it may be
preferable to generate multiple encoded streams (e.g., H.264/AVC)
simultaneously in a manner that allows switching between them to
limit dependencies on any specific lost slices as explained
above.
Increasing Quality of Error Concealment
As mentioned above, another aspect of the invention is to increase
the quality of error concealment at the client side of a lossy
transmission. To that end, FIG. 4 illustrates one embodiment of a
process 400 for using error concealment to reduce the distortion of
the video content caused by the missing macroblocks. In the case of
an H.264/AVC encoder, Flexible Macroblock Ordering (FMO) is
available. The reconstruction of IDR frames (on which all
successive frames in the picture depend) may be made more robust
using the process 400 of FIG. 4. In particular, process 400 begins
at block 410 by decomposing each IDR frame into n fields. As will
be shown below with reference to FIG. 5, each field may comprise
macroblocks spatially separated from other macroblocks by p
macroblocks. Hence, all macroblocks of the IDR frame may be
included in the n fields without duplication. Thereafter at block
420 each field may be segmented into m slices by the NAL, where
size restrictions may be imposed by a wireless network (e.g.,
802.11x) link layer.
Once the fields have been segmented into m slices, process 400 may
continue to block 430, where all m slices for the first field may be
transmitted first, followed by the transmission of all m slices for
the second field, and so on until all m slices for all n fields
have been transmitted. This form of transmission may be
particularly useful for transmissions over a lossy network in which
errors are often bursty (e.g., 802.11x), where a short burst of
errors might eliminate all m slices for a given field. However, if
the m slices for different fields (e.g., 1,3,4, etc.) have been
received correctly, then the error concealment functionality of the
decoder (e.g., H.264/AVC decoder) will be able to interpolate
neighboring pixels to estimate the missing pixels given that the
missing macroblocks are spatially adjacent to or surrounded by
available macroblocks. This approach avoids the need to transmit
duplicate data to compensate for lost packets or slices, hence
requiring less bandwidth. In another embodiment, instead of a
single macroblock separated by n macroblocks, the single macroblock
may itself be expanded to include a group of macroblocks around the
original macroblock.
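Blocks 410-430 can be sketched as follows. The field-labeling formula, which yields n = (p+1)^2 fields, and the `max_per_slice` limit, which stands in for the link-layer size restriction by counting macroblocks instead of bytes, are assumptions made for illustration. Macroblocks are identified by their (row, column) position in the frame.

```python
from collections import defaultdict


def field_of(row: int, col: int, p: int) -> int:
    """Assumed labeling: macroblocks of one field are separated by p
    macroblocks in both the horizontal and vertical directions."""
    step = p + 1
    return (row % step) * step + (col % step)


def decompose_into_fields(mb_rows: int, mb_cols: int, p: int) -> dict:
    """Block 410: assign every macroblock to exactly one field."""
    fields = defaultdict(list)
    for r in range(mb_rows):
        for c in range(mb_cols):
            fields[field_of(r, c, p)].append((r, c))
    return dict(fields)


def segment_into_slices(macroblocks: list, max_per_slice: int) -> list:
    """Block 420: split one field into slices of bounded size."""
    return [macroblocks[i:i + max_per_slice]
            for i in range(0, len(macroblocks), max_per_slice)]


def transmission_order(mb_rows: int, mb_cols: int, p: int,
                       max_per_slice: int) -> list:
    """Block 430: all slices of the first field, then the second, etc.
    Returns (field id, slice index, macroblocks) tuples in send order."""
    fields = decompose_into_fields(mb_rows, mb_cols, p)
    order = []
    for field_id in sorted(fields):
        slices = segment_into_slices(fields[field_id], max_per_slice)
        for slice_index, mbs in enumerate(slices):
            order.append((field_id, slice_index, mbs))
    return order
```

Because the slices of a field are sent back to back, a short burst of errors tends to remove macroblocks of a single field, leaving their spatial neighbors in the other fields available for interpolation.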
In another embodiment, the encoding process 400 of FIG. 4 may also
help the buffer management and streaming stacks to determine which
packets of data can be dropped prior to wireless transmission (or
after wireless reception) should critical congestion occur at other
stages of the network. Dropping a network packet containing one of
the fields of the frame may result in better decoded video quality
at the receiver compared to randomly dropping any packet in the
original stream. It should further be appreciated that, for
specific applications in which spatial multiplexing MIMO wireless
technology is used, it is possible to optionally transmit each
field on a separate spatial channel to further improve
performance.
While the aforementioned process 400 was described in terms of IDR
frames, it should further be appreciated that it may be used with
any type of data frame containing spatial intra-frame data so long
as the available bandwidth and computational resources permit.
Referring now to FIG. 5, depicted is one embodiment of an IDR frame
500 comprised of 4 fields, where the individual macroblocks of any
one field are spatially separated from each other by p=1 macroblock
in both the horizontal and vertical directions. Field #3, for
example, is comprised of all the individual macroblocks labeled
"3," as shown in FIG. 5. In this fashion, Field #3 may be
transmitted in multiple slices/packets so as to reduce the
consequence of bursty errors by improving the error concealment
functionality on the decoder side.
FIG. 6 depicts another embodiment of an IDR frame 600. However, in
the embodiment of FIG. 6, the IDR frame 600 is comprised of a
plurality of "super-macroblocks." In this embodiment, a
super-macroblock is made up of 4 individual macroblocks all
containing data for the same field. Thus, all super-macroblocks
labelled #1, for example, are considered part of Field #1 and are
transmitted as one or more slices in sequence. Thereafter, the same
is done for Field #2 (which is correspondingly comprised of the
super-macroblocks labelled #2), then Field #3 and so on. Compared
to the macroblock based algorithm of FIG. 5, the super-macroblock
algorithm of FIG. 6 will tend to increase encoding efficiency but
decrease accuracy of reconstruction at the decoder should a field
be lost in transmission.
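The reconstruction trade-off between the two layouts can be illustrated by counting, for each macroblock of a lost field, how many of its four directly adjacent macroblocks belong to other fields and so remain available for interpolation. Both labeling functions below are assumptions: the first models a four-field layout with p=1 as in FIG. 5, and the second assumes that each super-macroblock of FIG. 6 is a 2x2 group of macroblocks.

```python
def field_fig5(row: int, col: int) -> int:
    """Assumed FIG. 5 style labeling: 4 fields, single macroblocks, p=1."""
    return (row % 2) * 2 + (col % 2) + 1


def field_fig6(row: int, col: int) -> int:
    """Assumed FIG. 6 style labeling: 4 fields of 2x2 super-macroblocks."""
    return ((row // 2) % 2) * 2 + ((col // 2) % 2) + 1


def surviving_neighbors(field_fn, rows: int, cols: int,
                        lost_field: int) -> dict:
    """Map each macroblock of the lost field to the number of its
    4-connected neighbors that belong to a different field."""
    counts = {}
    for r in range(rows):
        for c in range(cols):
            if field_fn(r, c) != lost_field:
                continue
            available = 0
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                inside = 0 <= rr < rows and 0 <= cc < cols
                if inside and field_fn(rr, cc) != lost_field:
                    available += 1
            counts[(r, c)] = available
    return counts
```

Under these assumed layouts, an interior macroblock of a lost field keeps all four neighbors in the single-macroblock layout but only two in the super-macroblock layout, which is consistent with the lower reconstruction accuracy noted above.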
While the invention has been described in connection with various
embodiments, it will be understood that the invention is capable of
further modifications. This application is intended to cover any
variations, uses or adaptations of the invention following, in
general, the principles of the invention, and including such
departures from the present disclosure as come within the known and
customary practice within the art to which the invention
pertains.
* * * * *
References