U.S. patent application number 15/372780, for a system and method of jitter buffer management, was published by the patent office on 2017-06-29 as application publication number 20170187635.
The applicant listed for this patent application is QUALCOMM Incorporated. The invention is credited to Venkatraman Atti, Venkata Subrahmanyam Chandra Sekhar Chebiyyam, Duminda Dewasurendra, Vivek Rajendran, and Subasingha Shaminda Subasingha.
United States Patent Application 20170187635, Kind Code A1
Application Number: 15/372780
Family ID: 59086906
Published: June 29, 2017
First Named Inventor: Subasingha; Subasingha Shaminda; et al.
SYSTEM AND METHOD OF JITTER BUFFER MANAGEMENT
Abstract
A method for adjusting a delay of a buffer at a receiving
terminal includes determining, at a processor, a partial frame
recovery rate of lost frames at the receiving terminal. The method
also includes adjusting the delay of the buffer based at least in
part on the partial frame recovery rate.
Inventors: Subasingha; Subasingha Shaminda (Weston, FL); Rajendran; Vivek (San Diego, CA); Dewasurendra; Duminda (San Diego, CA); Atti; Venkatraman (San Diego, CA); Chebiyyam; Venkata Subrahmanyam Chandra Sekhar (San Diego, CA)

Applicant: QUALCOMM Incorporated, San Diego, CA, US

Family ID: 59086906

Appl. No.: 15/372780

Filed: December 8, 2016
Related U.S. Patent Documents

Application Number: 62/271,994
Filing Date: Dec 28, 2015
Current U.S. Class: 1/1

Current CPC Class: H04L 47/32 (20130101); H04W 28/0236 (20130101); H04L 47/283 (20130101); H04L 43/16 (20130101); H04L 43/0817 (20130101)

International Class: H04L 12/841 (20060101); H04L 12/26 (20060101); H04W 28/02 (20060101); H04L 12/823 (20060101)
Claims
1. A method for adjusting a delay of a buffer at a receiving
terminal, the method comprising: determining, at a processor, a
partial frame recovery rate of lost frames at the receiving
terminal; and adjusting the delay of the buffer based at least in
part on the partial frame recovery rate.
2. The method of claim 1, wherein the delay is further adjusted
based on at least one of late arriving primary frames, late
arriving partial frames, jitter associated with a wireless network,
or a delay loss rate.
3. The method of claim 2, wherein the jitter is based in part on
late partial copies of lost primary frames.
4. The method of claim 2, wherein the delay loss rate is based in
part on late partial copies of primary lost frames.
5. The method of claim 1, wherein the delay is adapted by including
partial frames that arrive before a threshold amount of time after
a playout time.
6. The method of claim 5, wherein the threshold amount of time is
controlled based on a packet loss rate.
7. The method of claim 1, wherein the delay of the buffer
corresponds to a depth of the buffer.
8. The method of claim 1, further comprising comparing the partial
frame recovery rate to a first threshold, wherein adjusting the
delay of the buffer comprises increasing the delay of the buffer in
response to the partial frame recovery rate failing to satisfy the
first threshold.
9. The method of claim 1, further comprising determining a frame
erasure rate for frames received at the receiving terminal, wherein
the delay of the buffer is adjusted based on the frame erasure
rate.
10. The method of claim 9, wherein the delay is adjusted based on a
function of the frame erasure rate.
11. The method of claim 9, further comprising comparing the frame
erasure rate to a second threshold, wherein adjusting the delay of
the buffer comprises increasing the delay of the buffer in response
to the frame erasure rate satisfying the second threshold.
12. The method of claim 1, further comprising: determining a buffer
underflow rate at the receiving terminal, the buffer underflow rate
indicating a rate that frames arrive at the receiving terminal
after corresponding playout times; and increasing a depth of the
buffer upon detecting that the buffer underflow rate satisfies a
threshold.
13. The method of claim 12, further comprising, based on the buffer
underflow rate, adjusting at least one of a minimum depth of the
buffer or a maximum depth of the buffer.
14. The method of claim 1, further comprising increasing implicit
buffer adaptation at the receiving terminal if the partial frame
recovery rate fails to satisfy a threshold.
15. The method of claim 1, wherein determining the partial frame
recovery rate and adjusting the delay are performed at a speech
decoder of a mobile device or a base station.
16. An apparatus comprising: a processor; and a memory storing
instructions executable by the processor to perform operations
comprising: determining a partial frame recovery rate of lost
frames at the receiving terminal; and adjusting a delay of a buffer
based at least in part on the partial frame recovery rate.
17. The apparatus of claim 16, wherein the delay is further
adjusted based on at least one of late arriving primary frames,
late arriving partial frames, jitter associated with a wireless
network, or a delay loss rate.
18. The apparatus of claim 16, wherein the processor and the memory
are integrated into a speech decoder of a mobile device or a base
station.
19. A non-transitory computer-readable medium comprising
instructions for adjusting a delay of a buffer at a receiving
terminal, the instructions, when executed by a processor, cause the
processor to perform operations comprising: determining a partial
frame recovery rate of lost frames at the receiving terminal; and
adjusting the delay of the buffer based at least in part on the
partial frame recovery rate.
20. The non-transitory computer-readable medium of claim 19,
wherein the operations further comprise comparing the partial frame
recovery rate to a first threshold, and wherein adjusting the delay
of the buffer comprises increasing the delay of the buffer in
response to the partial frame recovery rate failing to satisfy the
first threshold.
Description
I. CROSS REFERENCE TO RELATED APPLICATIONS
[0001] The present application claims the benefit of U.S.
Provisional Patent Application No. 62/271,994, entitled "SYSTEM AND
METHOD OF JITTER BUFFER MANAGEMENT," filed Dec. 28, 2015, which is
expressly incorporated by reference herein in its entirety.
II. FIELD
[0002] The present disclosure is generally related to jitter buffer
management.
III. DESCRIPTION OF RELATED ART
[0003] Advances in technology have resulted in smaller and more
powerful computing devices. For example, there currently exist a
variety of portable personal computing devices, including wireless
computing devices, such as portable wireless telephones, personal
digital assistants (PDAs), and paging devices that are small,
lightweight, and easily carried by users. More specifically,
portable wireless telephones, such as cellular telephones and
internet protocol (IP) telephones, may communicate voice and data
packets over wireless networks. Further, many such wireless
telephones include other types of devices that are incorporated
therein. For example, a wireless telephone may also include a
digital still camera, a digital video camera, a digital recorder,
and an audio file player. Also, such wireless telephones may
process executable instructions, including software applications,
such as a web browser application, that may be used to access the
Internet. As such, these wireless telephones may include
significant computing capabilities.
[0004] Transmission of voice by digital techniques is widespread,
particularly in long distance and digital radio telephone
applications. To conserve resources, there is interest in sending
as little information over a channel as possible during a digital
voice call, while maintaining a perceived quality of reconstructed
speech. If speech is transmitted by sampling and digitizing, a data
rate on the order of sixty-four kilobits per second (kbps) may be
used to achieve a speech quality of an analog telephone. Through
the use of speech analysis, followed by coding, transmission, and
re-synthesis at a receiver, a significant reduction in the data
rate may be achieved.
[0005] Devices for compressing speech may find use in many fields
of telecommunications. An exemplary field is wireless
communications. The field of wireless communications has many
applications including, e.g., cordless telephones, paging, wireless
local loops, wireless telephony such as cellular and personal
communication service (PCS) telephone systems, mobile Internet
Protocol (IP) telephony, and satellite communication systems. A
particular application is wireless telephony for mobile
subscribers.
[0006] Various over-the-air interfaces have been developed for
wireless communication systems including, e.g., frequency division
multiple access (FDMA), time division multiple access (TDMA), code
division multiple access (CDMA), and time division-synchronous CDMA
(TD-SCDMA). In connection therewith, various domestic and
international standards have been established including, e.g.,
Advanced Mobile Phone Service (AMPS), Global System for Mobile
Communications (GSM), and Interim Standard 95 (IS-95). An exemplary
wireless telephony communication system is a code division multiple
access (CDMA) system. The IS-95 standard and its derivatives,
IS-95A, ANSI J-STD-008, and IS-95B (referred to collectively herein
as IS-95), are promulgated by the Telecommunication Industry
Association (TIA) and other well-known standards bodies to specify
the use of a CDMA over-the-air interface for cellular or PCS
telephony communication systems.
[0007] The IS-95 standard subsequently evolved into "3G" systems,
such as cdma2000 and WCDMA, which provide more capacity and high
speed packet data services. Two variations of cdma2000 are
presented by the documents IS-2000 (cdma2000 1xRTT) and IS-856
(cdma2000 1xEV-DO), which are issued by TIA. The cdma2000 1xRTT
communication system offers a peak data rate of 153 kbps whereas
the cdma2000 1xEV-DO communication system defines a set of data
rates, ranging from 38.4 kbps to 2.4 Mbps. The WCDMA standard is
embodied in 3rd Generation Partnership Project "3GPP", Document
Nos. 3G TS 25.211, 3G TS 25.212, 3G TS 25.213, and 3G TS 25.214.
The International Mobile Telecommunications Advanced (IMT-Advanced)
specification sets out "4G" standards. The IMT-Advanced
specification sets peak data rate for 4G service at 100 megabits
per second (Mbit/s) for high mobility communication (e.g., from
trains and cars) and 1 gigabit per second (Gbit/s) for low mobility
communication (e.g., from pedestrians and stationary users).
[0008] Devices that employ techniques to compress speech by
extracting parameters that relate to a model of human speech
generation are called speech coders. Speech coders may comprise an
encoder and a decoder. The encoder divides the incoming speech
signal into blocks of time or analysis frames. The duration of each
segment in time (or "frame") may be selected to be short enough
that the spectral envelope of the signal may be expected to remain
relatively stationary. For example, one frame length is twenty
milliseconds, which corresponds to 160 samples at a sampling rate
of eight kilohertz (kHz), although any frame length or sampling
rate deemed suitable for the particular application may be
used.
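The frame-length arithmetic in the paragraph above is easy to verify. The following sketch (illustrative only, not part of the application) computes the number of samples in one analysis frame:

```python
def samples_per_frame(frame_ms: float, sample_rate_hz: int) -> int:
    """Samples in one analysis frame of the given duration."""
    return round(sample_rate_hz * frame_ms / 1000)

# 20 ms at an 8 kHz sampling rate, as stated in the text
n = samples_per_frame(20, 8000)   # 160 samples
```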
[0009] The encoder analyzes the incoming speech frame to extract
certain relevant parameters, and then quantizes the parameters into
binary representation, e.g., to a set of bits or a binary data
packet. The data packets are transmitted over a communication
channel (i.e., a wired and/or wireless network connection) to a
receiver and a decoder. The decoder processes the data packets,
unquantizes the processed data packets to produce the parameters,
and resynthesizes the speech frames using the unquantized
parameters.
[0010] The function of the speech coder is to compress the
digitized speech signal into a low-bit-rate signal by removing
natural redundancies inherent in speech. The digital compression
may be achieved by representing an input speech frame with a set of
parameters and employing quantization to represent the parameters
with a set of bits. If the input speech frame has a number of bits
N_i and a data packet produced by the speech coder has a number
of bits N_o, the compression factor achieved by the speech
coder is C_r = N_i / N_o. The challenge is to retain high
voice quality of the decoded speech while achieving the target
compression factor. The performance of a speech coder depends on
(1) how well the speech model, or the combination of the analysis
and synthesis process described above, performs, and (2) how well
the parameter quantization process is performed at the target bit
rate of N_o bits per frame. The goal of the speech model is
thus to capture the essence of the speech signal, or the target
voice quality, with a small set of parameters for each frame.
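The compression factor defined above reduces to a single ratio. This sketch is illustrative only; the 64 kbps PCM rate and the 13.2 kbps coded rate are taken from elsewhere in this disclosure, and 20 ms of audio at those rates gives 1280 and 264 bits, respectively:

```python
def compression_factor(n_i: int, n_o: int) -> float:
    """C_r = N_i / N_o: input frame bits over coded packet bits."""
    return n_i / n_o

# 20 ms of 64 kbps PCM (1280 bits) coded into a 13.2 kbps packet (264 bits)
c_r = compression_factor(1280, 264)   # about 4.85
```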
[0011] Speech coders generally utilize a set of parameters
(including vectors) to describe the speech signal. A good set of
parameters provides a low system bandwidth for the reconstruction
of a perceptually accurate speech signal. Pitch, signal power,
spectral envelope (or formants), amplitude and phase spectra are
examples of the speech coding parameters.
[0012] Speech coders may be implemented as time-domain coders,
which attempt to capture the time-domain speech waveform by
employing high time-resolution processing to encode small segments
of speech (e.g., 5 millisecond (ms) sub-frames) at a time. For each
sub-frame, a high-precision representative from a codebook space is
found by means of a search algorithm. Alternatively, speech coders
may be implemented as frequency-domain coders, which attempt to
capture the short-term speech spectrum of the input speech frame
with a set of parameters (analysis) and employ a corresponding
synthesis process to recreate the speech waveform from the spectral
parameters. The parameter quantizer preserves the parameters by
representing them with stored representations of code vectors in
accordance with known quantization techniques.
[0013] One time-domain speech coder is the Code Excited Linear
Predictive (CELP) coder. In a CELP coder, the short-term
correlations, or redundancies, in the speech signal are removed by
a linear prediction (LP) analysis, which finds the coefficients of
a short-term formant filter. Applying the short-term prediction
filter to the incoming speech frame generates an LP residue signal,
which is further modeled and quantized with long-term prediction
filter parameters and a subsequent stochastic codebook. Thus, CELP
coding divides the task of encoding the time-domain speech waveform
into the separate tasks of encoding the LP short-term filter
coefficients and encoding the LP residue. Time-domain coding can be
performed at a fixed rate (i.e., using the same number of bits, N_o,
for each frame) or at a variable rate (in which different bit rates
are used for different types of frame contents). Variable-rate
coders attempt to use only the amount of bits needed to encode the
codec parameters to a level adequate to obtain a target quality.
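The LP analysis step described above can be illustrated with a minimal Levinson-Durbin recursion. This is a generic textbook sketch, not the implementation in this disclosure, and the AR(1) test signal is a contrived example chosen so the expected coefficient is known:

```python
def levinson_durbin(r, order):
    """Solve for LP coefficients a[0..order] (with a[0] = 1) from the
    autocorrelation sequence r via the Levinson-Durbin recursion."""
    a = [1.0]
    err = r[0]
    for i in range(1, order + 1):
        acc = sum(a[j] * r[i - j] for j in range(i))
        k = -acc / err
        a = [a[j] + (k * a[i - j] if 0 < j < i else 0.0) for j in range(i)] + [k]
        err *= (1.0 - k * k)
    return a

def lp_residual(x, a):
    """Apply the inverse (prediction-error) filter to x to get the LP residue."""
    p = len(a) - 1
    return [sum(a[j] * x[n - j] for j in range(min(p, n) + 1))
            for n in range(len(x))]

# Contrived AR(1) example: x[n] = 0.5 * x[n-1], driven by a unit impulse.
x = [0.5 ** n for n in range(60)]
r = [sum(x[n] * x[n + k] for n in range(len(x) - k)) for k in range(2)]
a = levinson_durbin(r, 1)      # approximately [1.0, -0.5]
res = lp_residual(x, a)        # residue is essentially the impulse: [1, 0, 0, ...]
```

The recovered residue carries only the excitation, which is what CELP goes on to model with its long-term predictor and stochastic codebook.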
[0014] Time-domain coders such as the CELP coder may rely upon a
high number of bits, N_o, per frame to preserve the accuracy of the
time-domain speech waveform. Such coders may deliver excellent
voice quality provided that the number of bits, N_o, per frame is
relatively large (e.g., 8 kbps or above). At low bit rates (e.g., 4
kbps and below), time-domain coders may fail to retain high quality
and robust performance due to the limited number of available bits.
At low bit rates, the limited codebook space clips the
waveform-matching capability of time-domain coders, which are
deployed in higher-rate commercial applications. Hence, despite
improvements over time, many CELP coding systems operating at low
bit rates suffer from perceptually significant distortion
characterized as noise.
[0015] An alternative to CELP coders at low bit rates is the "Noise
Excited Linear Predictive" (NELP) coder, which operates under
similar principles as a CELP coder. NELP coders use a filtered
pseudo-random noise signal to model speech, rather than a codebook.
Since NELP uses a simpler model for coded speech, NELP achieves a
lower bit rate than CELP. NELP may be used for compressing or
representing unvoiced speech or silence.
[0016] Coding systems that operate at rates on the order of 2.4
kbps are generally parametric in nature. That is, such coding
systems operate by transmitting parameters describing the
pitch-period and the spectral envelope (or formants) of the speech
signal at regular intervals. Illustrative of these so-called
parametric coders is the LP vocoder system.
[0017] LP vocoders model a voiced speech signal with a single pulse
per pitch period. This basic technique may be augmented to include
transmission of information about the spectral envelope, among other
things. Although LP vocoders provide reasonable performance
generally, they may introduce perceptually significant distortion,
characterized as buzz.
[0018] In recent years, coders have emerged that are hybrids of
both waveform coders and parametric coders. Illustrative of these
so-called hybrid coders is the prototype-waveform interpolation
(PWI) speech coding system. The PWI coding system may also be known
as a prototype pitch period (PPP) speech coder. A PWI coding system
provides an efficient method for coding voiced speech. The basic
concept of PWI is to extract a representative pitch cycle (the
prototype waveform) at fixed intervals, to transmit its
description, and to reconstruct the speech signal by interpolating
between the prototype waveforms. The PWI method may operate either
on the LP residual signal or the speech signal.
[0019] Electronic devices, such as wireless telephones, may send
and receive data via networks. For example, audio data may be sent
and received via a circuit-switched network (e.g., the public
switched telephone network (PSTN), a global system for mobile
communications (GSM) network, etc.) or a packet-switched network
(e.g., a voice over internet protocol (VoIP) network, a voice over
long term evolution (VoLTE) network, etc.). In a packet-switched
network, audio packets may be individually routed from a source
device to a destination device. Due to network conditions, the
audio packets may arrive out of order. The destination device may
store received packets in a jitter buffer and may rearrange the
received packets if the received packets are out-of-order.
[0020] The destination device may reconstruct data based on the
received packets. A particular packet sent by the source device may
not be received, or may be received with errors, by a destination
device. The destination device may be unable to recover all or a
portion of the data associated with the particular packet. The
destination device may reconstruct the data based on incomplete
packets. The data reconstructed based on incomplete packets may
have degraded quality that adversely impacts a user experience.
Alternatively, the destination device may request the source device
to retransmit the particular packet and may delay reconstructing
the data while waiting to receive a retransmitted packet. The delay
associated with requesting retransmission and reconstructing the
data based on a retransmitted packet may be perceptible to a user
and may result in a negative user experience.
IV. SUMMARY
[0021] According to one implementation of the present disclosure, a
method for adjusting a delay (e.g., a playout delay) of a buffer at
a receiving terminal includes determining, at a processor, a
partial frame recovery rate of lost frames at the receiving
terminal. The method also includes adjusting the delay of the
buffer based at least in part on the partial frame recovery
rate.
[0022] According to another implementation of the present
disclosure, an apparatus for adjusting a delay of a buffer at a
receiving terminal includes a processor and a memory storing
instructions that are executable by the processor to perform
operations. The operations include determining a partial frame
recovery rate of lost frames at the receiving terminal and
adjusting the delay of the buffer based at least in part on the
partial frame recovery rate.
[0023] According to another implementation of the present
disclosure, a non-transitory computer-readable medium includes
instructions for adjusting a delay of a buffer at a receiving
terminal. The instructions, when executed by a processor, cause the
processor to perform operations including determining a partial
frame recovery rate of lost frames at the receiving terminal and
adjusting the delay of the buffer based at least in part on the
partial frame recovery rate.
[0024] According to another implementation of the present
disclosure, an apparatus for adjusting a delay of a buffer at a
receiving terminal includes means for determining a partial frame
recovery rate of lost frames at the receiving terminal. The
apparatus also includes means for adjusting the delay of the buffer
based at least in part on the partial frame recovery rate.
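The adaptation summarized above can be sketched as a simple control rule: grow the jitter-buffer delay when lost frames are not being recovered from partial copies (or when the frame erasure rate is high), and otherwise shrink it toward its minimum. The threshold, step, and bound values below are hypothetical illustrations, not values from the disclosure:

```python
def adjust_buffer_delay(delay_ms: int,
                        partial_recovery_rate: float,
                        frame_erasure_rate: float,
                        recovery_threshold: float = 0.8,
                        fer_threshold: float = 0.1,
                        step_ms: int = 20,
                        min_delay_ms: int = 20,
                        max_delay_ms: int = 240) -> int:
    """Increase the buffer delay when the partial frame recovery rate fails
    its threshold or the frame erasure rate satisfies its threshold;
    otherwise decrease the delay, keeping it within [min, max]."""
    if (partial_recovery_rate < recovery_threshold
            or frame_erasure_rate >= fer_threshold):
        return min(delay_ms + step_ms, max_delay_ms)
    return max(delay_ms - step_ms, min_delay_ms)
```

For example, with a 40 ms delay, a 0.5 recovery rate, and a 0.05 erasure rate, the sketch grows the delay to 60 ms; with a 0.9 recovery rate it shrinks the delay instead.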
V. BRIEF DESCRIPTION OF THE DRAWINGS
[0025] FIG. 1 is a block diagram of a particular illustrative
implementation of a system that is operable to adjust a delay of a
buffer at a receiving terminal;
[0026] FIG. 2 is a diagram of a particular implementation of a
method of adjusting a delay of a buffer at a receiving terminal;
[0027] FIG. 3 is a diagram of another particular implementation of
a method of adjusting a delay of a buffer at a receiving
terminal;
[0028] FIG. 4 is a block diagram of a particular illustrative
implementation of a device that is operable to adjust a delay of a
buffer at a receiving terminal; and
[0029] FIG. 5 is a block diagram of a base station that is operable
to adjust a delay of a buffer.
VI. DETAILED DESCRIPTION
[0030] The principles described herein may be applied, for example,
to a headset, a handset, another audio device, or a component of a
device that is configured to use a jitter buffer. Unless expressly
limited by its context, the term "signal" is used herein to
indicate any of its ordinary meanings, including a state of a
memory location (or set of memory locations) as expressed on a
wire, bus, or other transmission medium. Unless expressly limited
by its context, the term "generating" is used herein to indicate
any of its ordinary meanings, such as computing or otherwise
producing. Unless expressly limited by its context, the term
"calculating" is used herein to indicate any of its ordinary
meanings, such as computing, evaluating, smoothing, and/or
selecting from a plurality of values. Unless expressly limited by
its context, the term "obtaining" is used to indicate any of its
ordinary meanings, such as calculating, deriving, receiving (e.g.,
from another component, block or device), and/or retrieving (e.g.,
from a memory register or an array of storage elements).
[0031] Unless expressly limited by its context, the term
"producing" is used to indicate any of its ordinary meanings, such
as calculating, generating, and/or providing. Unless expressly
limited by its context, the term "providing" is used to indicate
any of its ordinary meanings, such as calculating, generating,
and/or producing. Unless expressly limited by its context, the term
"coupled" is used to indicate a direct or indirect electrical or
physical connection. If the connection is indirect, it is well
understood by a person having ordinary skill in the art that there
may be other blocks or components between the structures being
"coupled".
[0032] The term "configuration" may be used in reference to a
method, apparatus/device, and/or system as indicated by its
particular context. Where the term "comprising" is used in the
present description and claims, it does not exclude other elements
or operations. The term "based on" (as in "A is based on B") is
used to indicate any of its ordinary meanings, including the cases
(i) "based on at least" (e.g., "A is based on at least B") and, if
appropriate in the particular context, (ii) "equal to" (e.g., "A is
equal to B"). In case (i), where "A is based on B" includes "A is
based on at least B," this may include the configuration in which A
is coupled to B. Similarly, the term "in response to" is used to indicate any
of its ordinary meanings, including "in response to at least." The
term "at least one" is used to indicate any of its ordinary
meanings, including "one or more". The term "at least two" is used
to indicate any of its ordinary meanings, including "two or
more".
[0033] The terms "apparatus" and "device" are used generically and
interchangeably unless otherwise indicated by the particular
context. Unless indicated otherwise, any disclosure of an operation
of an apparatus having a particular feature is also expressly
intended to disclose a method having an analogous feature (and vice
versa), and any disclosure of an operation of an apparatus
according to a particular configuration is also expressly intended
to disclose a method according to an analogous configuration (and
vice versa). The terms "method," "process," "procedure," and
"technique" are used generically and interchangeably unless
otherwise indicated by the particular context. The terms "element"
and "module" may be used to indicate a portion of a greater
configuration. The term "packet" may correspond to one or more
frames. Any incorporation by reference of a portion of a document
shall also be understood to incorporate definitions of terms or
variables that are referenced within the portion, where such
definitions appear elsewhere in the document, as well as any
figures referenced in the incorporated portion.
[0034] As used herein, the term "communication device" refers to an
electronic device that may be used for voice and/or data
communication over a wireless communication network. Examples of
communication devices include cellular phones, personal digital
assistants (PDAs), handheld devices, headsets, wireless modems,
laptop computers, personal computers, etc.
[0035] Referring to FIG. 1, a particular illustrative
implementation of a system operable to perform jitter buffer
management is disclosed and generally designated 100. The system
100 may include a destination device 102 (e.g., a receiving
terminal) in communication with one or more other devices (e.g., a
source device 104 or transmitting terminal) via a network 190. The
source device 104 may include or may be coupled to a microphone
146. The destination device 102 may include or may be coupled to a
speaker 142. The destination device 102 may include an analyzer 122
coupled to, or in communication with, a memory 176. The destination
device 102 may include a receiver 124, a transmitter 192, a buffer
126, a speech decoder 156, or a combination thereof.
[0036] The memory 176 may be configured to store analysis data 120.
The analysis data 120 may include packet recovery rate data 106, a
buffer depth data 110, a count of lost packets 114, frame erasure
rate (FER) data 154, a first threshold (e.g., a packet recovery
rate threshold 136), a second threshold (e.g., a FER threshold
138), other analysis data 140, or a combination thereof. The packet
recovery rate data 106 may indicate a rate at which lost packets
are "recovered" by use of partial copies. For example, if a
particular packet (or frame) transmitted by the source device 104
is lost during transmission and a subsequent packet stored in the
buffer 126 includes a partial copy of the particular packet, the
speech decoder 156 may use the subsequent packet to "recover" (or
regenerate) the particular packet and the rate at which lost
packets are recovered may increase. The buffer depth data 110 may
indicate a depth of a jitter buffer, such as the buffer 126. The
depth may be analogous to the size of the buffer 126, the delay
(e.g., the playout delay) of the buffer 126, or both. The analyzer
122 may cause the delay of the buffer 126 to be changed (e.g.,
increased or decreased) by changing a value of the buffer depth
data 110. The FER data 154 may indicate an error rate of packets
(or frames) received by the destination device 102. According to
one implementation, the error rate may be expressed as the number
of packets received with errors divided by the total number of
packets received. According to another implementation, the error
rate may be expressed as the number of packets (or frames) lost
during transmission (e.g., not received by the destination device
102) divided by the total number of packets (or frames) transmitted
by the source device 104.
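The two rates defined in this paragraph reduce to simple ratios. A minimal sketch (the function names are hypothetical, and the first FER formulation from the text is used):

```python
def frame_erasure_rate(errored_or_lost: int, total: int) -> float:
    """FER: packets received with errors (or lost) over the total, per the
    formulations in the text."""
    return errored_or_lost / total if total else 0.0

def partial_frame_recovery_rate(lost: int, recovered: int) -> float:
    """Fraction of lost packets regenerated from partial copies carried
    in subsequent packets."""
    return recovered / lost if lost else 1.0

fer = frame_erasure_rate(5, 100)            # 0.05
prr = partial_frame_recovery_rate(10, 7)    # 0.7
```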
[0037] The destination device 102 may include fewer or more
components than illustrated in FIG. 1. For example, the destination
device 102 may include one or more processors, one or more memory
units, or both. The destination device 102 may include a networked
or a distributed computing system. For example, the memory 176 may
be a networked or a distributed memory. In a particular
illustrative implementation, the destination device 102 may include
a communication device, a decoder, a smart phone, a cellular phone,
a mobile communication device, a laptop computer, a computer, a
tablet, a personal digital assistant (PDA), a set top box, a video
player, an entertainment unit, a display device, a television, a
gaming console, a music player, a radio, a digital video player, a
digital video disc (DVD) player, a tuner, a camera, a navigation
device, or a combination thereof. Such devices may include a user
interface (e.g., a touch screen, voice recognition capability, or
other user interface capabilities).
[0038] During operation, a first user 152 may be engaged in a voice
call with a second user 194. The first user 152 may use the
destination device 102 and the second user 194 may use the source
device 104 for the voice call. During the voice call, the second
user 194 may speak into the microphone 146 associated with the
source device 104. An input speech signal 130 may correspond to a
portion of a word, a word, or multiple words spoken by the second
user 194. For example, the input speech signal 130 may include
first data 164 and second data 166. The first data 164 and the
second data 166 may be pulse code modulated (PCM) data or analog
data. The source device 104 may receive the input speech signal
130, via the microphone 146, from the second user 194. In a
particular implementation, the microphone 146 may capture an audio
signal and an analog-to-digital converter (ADC) may convert the
captured audio signal from an analog waveform into a digital
waveform comprised of digital audio samples. The digital audio
samples may be processed by a digital signal processor. A gain
adjuster may adjust a gain (e.g., of the analog waveform or the
digital waveform) by increasing or decreasing an amplitude level of
an audio signal (e.g., the analog waveform or the digital
waveform). Gain adjusters may operate in either the analog or
digital domain. For example, a gain adjuster may operate in the
digital domain and may adjust the digital audio samples produced by
the analog-to-digital converter. After gain adjusting, an echo
canceller may reduce echo that may have been created by an output
of a speaker entering the microphone 146. The digital audio samples
may be "compressed" by a vocoder (a voice encoder-decoder). The
output of the echo canceller may be coupled to vocoder
pre-processing blocks, e.g., filters, noise processors, rate
converters, etc. An encoder of the vocoder may compress the digital
audio samples and form a sequence of packets (e.g., a first packet
132 and a second packet 134). Each of the sequence of packets may
include a representation of the compressed bits of the digital
audio samples. For example, the first packet 132 may be earlier in
the sequence of packets than the second packet 134. To illustrate,
the first packet 132 may include a digitized representation of the
first data 164 corresponding to a particular audio frame (e.g., an
audio frame N) and the second packet 134 may include a digitized
representation of the second data 166 corresponding to a subsequent
audio frame (e.g., an audio frame N+2). For example, the digitized
representations of the first data 164 and the second data 166 may
be in the form of a compressed bit stream.
[0039] In a particular implementation, a subsequent packet (e.g.,
the second packet 134) may also include redundant data (e.g., a
partial copy of the first packet 132) that may be used to
reconstruct a previous audio frame (e.g., the audio frame N). For
example, the second packet 134 may include a first partial copy 174
corresponding to at least a portion of the first data 164. In a
particular implementation, the redundant data (e.g., the first
partial copy 174) may correspond to a "critical" speech frame. For
example, a loss of the critical speech frame may cause a
user-perceptible degradation in audio quality of a processed speech
signal generated at the destination device 102.
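The packet layout described in the paragraphs above can be sketched as follows. This is a hypothetical illustration only: the class and field names are invented for this sketch and do not appear in the application. It shows a later packet (e.g., for audio frame N+2) that carries its own primary data plus a partial copy of an earlier frame (e.g., audio frame N).

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical packet layout: primary data for the current frame plus an
# optional partial copy of an earlier frame carried as redundant data.
@dataclass
class Packet:
    seq: int                               # sequence number of this frame
    primary: bytes                         # compressed bits of this frame
    partial_copy_of: Optional[int] = None  # sequence number of the earlier frame
    partial_copy: Optional[bytes] = None   # redundant partial data

# A packet for frame N+2 carrying a partial copy of frame N (here N = 0).
second_packet = Packet(seq=2, primary=b"\x01\x02",
                       partial_copy_of=0, partial_copy=b"\x03")
```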
[0040] In a particular implementation, the source device 104 and
the destination device 102 may operate on a constant-bit-rate
(e.g., 13.2 kilobits per second (kbps)) channel. In this
implementation, a primary frame bit-rate corresponding to primary
data (e.g., the second data 166) may be reduced (e.g., to 9.6 kbps)
to accommodate the redundant data (e.g., the first partial copy
174). For example, a remaining bit-rate (e.g., 3.6 kbps) of the
constant-bit-rate may correspond to the redundant data. In a
particular implementation, the reduction of the primary frame
bit-rate may be performed at the source device 104 depending on
characteristics of the input speech signal 130 to have reduced
impact on overall speech quality.
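The bit-rate arithmetic in the example above can be checked with a short sketch; the constant and function names here are illustrative, not from the application.

```python
# Constant-bit-rate channel from the example above: 13.2 kbps total.
CHANNEL_RATE_KBPS = 13.2

def redundant_rate(primary_kbps: float) -> float:
    """Bit rate remaining for redundant data after the primary frame."""
    assert 0.0 < primary_kbps <= CHANNEL_RATE_KBPS
    return round(CHANNEL_RATE_KBPS - primary_kbps, 3)

# Reducing the primary frame to 9.6 kbps leaves 3.6 kbps for the partial copy.
```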
[0041] The source device 104 may transmit the sequence of packets
(e.g., the first packet 132, the second packet 134, or both) to the
destination device 102 via the network 190, such as the second
packet 134 illustrated as being received at the destination device
102 from the network 190. For example, the source device 104 may
include a transceiver. The transceiver may modulate some form of
the sequence of packets (e.g., other information may be appended to
the packets 132 and 134). The transceiver may send the modulated
information over the air via an antenna.
[0042] The analyzer 122 of the destination device 102 may receive
one or more packets (e.g., the first packet 132, the second packet
134, or both) of the sequence of packets. For example, an antenna
of the destination device 102 may receive some form of incoming
packets that include the first packet 132, the second packet 134,
or both. The first packet 132, the second packet 134, or both, may
be "uncompressed" by a decoder of a vocoder at the destination
device 102. The uncompressed waveform may be referred to as
reconstructed audio samples. The reconstructed audio samples may be
post-processed by vocoder post-processing blocks and an echo
canceller may remove echo based on the reconstructed audio samples.
For the sake of clarity, the decoder of the vocoder and the vocoder
post-processing blocks may be referred to as a vocoder decoder
module. In some configurations, an output of the echo canceller may
be processed by the analyzer 122. Alternatively, in other
configurations, the output of the vocoder decoder module may be
processed by the analyzer 122.
[0043] The analyzer 122 may store the packets (e.g., the first
packet 132, the second packet 134, or both) received by the
destination device 102 in the buffer 126 (e.g., a jitter buffer).
In a particular implementation, the packets may be received
out-of-order at the destination device 102. The analyzer 122 may
reorder one or more packets in the buffer 126 if the packets are
out-of-order. One or more packets of the sequence of packets sent
by the source device 104 may not be received, or may be received
with errors, by the destination device 102. For example, a packet
(e.g., the first packet 132) may not be received by the receiver
124 due to packet loss, or may be only partially received due to
network conditions.
[0044] The analyzer 122 may determine whether a particular packet
of the sequence of packets is missing from the buffer 126. For
example, each packet in the buffer 126 may include a sequence
number. The analyzer 122 may maintain a counter (e.g., a next
sequence number) in the analysis data 120. For example, the next
sequence number may have a starting value (e.g., 0). The analyzer
122 may update (e.g., increment by 1) the next sequence number
after processing each packet corresponding to a particular input
signal (e.g., the input speech signal 130). The analyzer 122 may
reset the next sequence number to the starting value after
processing a last packet corresponding to the particular input
signal (e.g., the input speech signal 130).
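A minimal sketch of the counter logic described above, assuming a starting value of 0 as in the example; the class and method names are invented for illustration.

```python
class SequenceTracker:
    """Tracks the next expected sequence number for one input signal."""
    START = 0  # starting value from the example above

    def __init__(self) -> None:
        self.next_seq = self.START

    def packet_processed(self, is_last: bool) -> None:
        # Increment after each processed packet; reset to the starting
        # value after the last packet of the input signal.
        self.next_seq = self.START if is_last else self.next_seq + 1
```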
[0045] The analyzer 122 may determine that the buffer 126 includes
a next packet (e.g., the first packet 132) having the next sequence
number. The analyzer 122 may generate a processed speech signal
based on at least the next packet (e.g., the first packet 132). In
a particular implementation, the analyzer 122 may provide the first
packet 132 to the speech decoder 156 and the speech decoder 156 may
generate the processed speech signal. The analyzer 122 (or the
speech decoder 156) may generate the processed speech signal based
on the first packet 132 and the second packet 134. The processed
speech signal may correspond to the first data 164 of the first
packet 132 and the second data 166 of the second packet 134. The
analyzer 122 (or the speech decoder 156) may output the processed
speech signal via the speaker 142 to the first user 152. The
analyzer 122 may update (e.g., increment or reset) the next
sequence number.
[0046] The analyzer 122 may be configured to determine a partial
frame recovery rate of lost frames at the destination device 102
(e.g., the receiving terminal). As used herein, the "partial frame
recovery rate" may indicate a rate at which lost packets are
"recovered" by use of partial copies. For example, if a particular
packet (or frame) transmitted by the source device 104 is lost
during transmission and another packet (e.g., a subsequent packet
or a previous packet) stored in the buffer 126 includes a partial
copy of the particular packet, the speech decoder 156 may use the
other packet to "recover" (or regenerate) the particular packet and
the rate (e.g., the "partial frame recovery rate") at which lost
packets are recovered may increase. The analyzer 122 may retrieve
the packet recovery rate data 106 and determine the partial frame
recovery rate based on the packet recovery rate data 106. The
analyzer 122 may be configured to adjust the delay of the buffer
126 based at least on the partial frame recovery rate. To
illustrate, the analyzer 122 may compare the partial frame recovery
rate to the packet recovery rate threshold 136 (e.g., the first
threshold). If the partial frame recovery rate fails to satisfy the
packet recovery rate threshold 136, the delay of the buffer 126 may
be increased to store additional packets, and thus to increase the
partial frame recovery rate. According to one implementation, in
response to the partial frame recovery rate satisfying the packet
recovery rate threshold, the delay of the buffer 126 may be
decreased to improve latency.

[0047] The analyzer 122 may also be configured to determine a frame
erasure rate for frames received at the destination device 102. As
used herein, the "frame erasure rate" may indicate a rate at which
packets are unsuccessfully decoded at the destination device 102.
As a non-limiting example, the frame erasure rate may be expressed
as the number of frames lost during transmission (or received with
errors) that are not regenerated using redundancy data, divided by
the total number of frames transmitted by the source device 104. The
analyzer 122 may retrieve the FER data 154 and determine the frame
erasure rate based on the FER data 154. To illustrate, the analyzer
122 may compare the frame erasure rate to the FER threshold 138
(e.g., the second threshold). According to one implementation, the
FER threshold 138 may be based on an EVS specification. For
example, the FER threshold 138 may correspond to the maximum frame
erasure rate to maintain a communication session according to the
EVS specification. In response to the frame erasure rate satisfying
the FER threshold 138, the delay of the buffer 126 may be increased
to store additional packets, and thus to decrease the frame erasure
rate.
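The two threshold comparisons described above can be sketched together as a single decision function. This is an assumption-laden illustration: the function name and step size are invented, and the comparison directions are simplified from the text, which describes several implementations.

```python
def adjust_buffer_delay(delay_ms: int, recovery_rate: float, fer: float,
                        recovery_threshold: float, fer_threshold: float,
                        step_ms: int = 20) -> int:
    """Increase the buffer delay when the partial frame recovery rate is
    too low or the frame erasure rate is too high; otherwise decrease it
    to improve latency."""
    if recovery_rate < recovery_threshold or fer > fer_threshold:
        return delay_ms + step_ms   # store more packets
    return delay_ms - step_ms       # shrink the buffer, reducing latency
```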
[0048] The increase in the delay (or size) of the buffer 126 may be
based on (e.g., may be a function of) the partial frame recovery
rate. In some implementations, a level of jitter of packets (e.g.,
an amount of variation in packet arrival delays) may be used to
adjust the depth of the buffer 126. Below is a non-limiting
example of a condition that may be used to determine whether late
partial frames are used in a jitter computation. It should be
understood that the equation is not to be construed as limiting,
and other equations (or expressions) may be used in the
determination. As a non-limiting example, the condition may be
expressed as: [0049] If (arrival time of the partial frame
N<[playout_time_of_partial_frame_N+k1*(max(X, FER_rate)-X)*(1-k2*[Y-min(Y, partial_frame_recovery_rate)])]).
[0050] If the condition in the above equation is satisfied, late
partial frames may be used in the jitter computation. If the
condition is not satisfied, late partial frames are not used in the
jitter computation. If late partial frames are used in the jitter
computation, the delay is increased. According to the above
equation, k1 may be a constant, Y may correspond to a threshold
partial copy recovery rate to start the adjustment, and X may
correspond to the threshold frame erasure rate. According to one
implementation of the above equation, if
T_1<T_playout+k1*[Y-min(Y, R)], then frame N is used
for the playout delay determination, where T_1 is the arrival
time of frame N, T_playout is the playout time of frame N, Y is
the minimum required partial frame recovery rate, and R is the
current partial frame recovery rate. If R<Y, the above equation
becomes T_1<T_playout+k1*[Y-R]. Thus, if Y equals 50
percent and R equals 20 percent, the playout delay will be
increased. If Y equals 50 percent and R equals 70 percent, the
playout delay will not be increased. If
T_1<T_playout+k1*[max(X, L)-X]*[Y-min(Y, R)], where L is the
current frame erasure rate, the partial frames may be used in the
playout delay determination. If R equals 30 percent, L equals 5
percent, X equals 2 percent, and Y equals 50 percent, then the
playout delay will be increased. According to one
implementation, the delay of the buffer 126 may have a maximum
value to substantially limit effects on latency.
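As a hedged illustration, the second form of the condition above (the one using max(X, L)) might be evaluated as follows. The argument names are invented for this sketch, and k1 is an arbitrary constant chosen only for the example.

```python
def use_late_partial_frame(t_arrival: float, t_playout: float, k1: float,
                           fer_rate: float, x: float,
                           recovery_rate: float, y: float) -> bool:
    """Late-partial-frame condition: the arrival time must fall within a
    slack window past the playout time; the slack grows when erasures
    exceed X and the recovery rate falls short of Y."""
    slack = k1 * (max(x, fer_rate) - x) * (y - min(y, recovery_rate))
    return t_arrival < t_playout + slack

# Worked example from the text: R = 30%, L = 5%, X = 2%, Y = 50%.
# slack = k1 * 0.03 * 0.20 > 0, so a slightly late partial frame qualifies.
```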
[0051] The analyzer 122 may determine whether a particular packet
(e.g., the first packet 132) of the sequence of packets sent by the
source device 104 is missing from the buffer 126. For example, the
analyzer 122 may determine that the first packet 132 is missing
based on determining that the buffer 126 does not store a next
packet (e.g., the first packet 132) having the next sequence
number. To illustrate, the analyzer 122 may determine that the
first packet 132 is missing in response to determining that a
packet (e.g., the first packet 132) corresponding to the next
sequence number is not found in the buffer 126.
[0052] The analyzer 122 may determine whether a partial copy of the
first packet 132 is stored in the buffer 126 as error correction
data in another packet (e.g., the second packet 134) stored in the
buffer 126. For example, one or more fields in a header of each
packet may indicate whether the packet includes error correction
data and may indicate a corresponding packet. The analyzer 122 may
examine the particular field of one or more packets (e.g., the
second packet 134) stored in the buffer 126. For example, the
buffer 126 may store the second packet 134. A particular field in
the header of the second packet 134 may indicate that the second
packet 134 includes error correction data corresponding to the
first packet 132. For example, the particular field may indicate a
sequence number of the first packet 132. The analyzer 122 may
determine that the partial copy of the first packet 132 is stored
in the buffer 126 based on determining that the particular field of
the second packet 134 indicates the sequence number of the first
packet 132.
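The header scan described above can be sketched as follows; packets are modeled as dictionaries with a hypothetical `partial_copy_of` field standing in for the header field that indicates the corresponding packet.

```python
from typing import Optional

def find_partial_copy(buffer: list, missing_seq: int) -> Optional[dict]:
    """Return the buffered packet whose header field references the
    missing sequence number, or None if no partial copy is stored."""
    for packet in buffer:
        if packet.get("partial_copy_of") == missing_seq:
            return packet
    return None

# A buffer holding one packet that carries a partial copy of packet 1.
buffered = [{"seq": 2, "partial_copy_of": 1}]
```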
[0053] The analyzer 122 may generate a processed speech signal 116
based on at least the next packet (e.g., the second packet 134).
For example, the analyzer 122 may generate the processed speech
signal 116 based on the first partial copy 174 and the second data
166. The first partial copy 174 may include at least a portion of
the first data 164 of the first packet 132. In a particular
implementation, the first data 164 may correspond to first speech
parameters of a first speech frame. The first partial copy 174 may
include the first speech parameters. In a particular
implementation, the second data 166 may correspond to second speech
parameters of a second speech frame and the first partial copy 174
may correspond to a difference between the first speech parameters
and the second speech parameters. In this implementation, the
analyzer 122 may generate the first speech parameters based on a
sum of the second speech parameters and the first partial copy
174.
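The differential reconstruction in this implementation can be sketched as below; representing speech parameters as short lists of numbers is a simplification for illustration only.

```python
def recover_first_parameters(second_params: list, partial_diff: list) -> list:
    """Rebuild the lost frame's speech parameters as the sum of the next
    frame's parameters and the differential partial copy."""
    return [s + d for s, d in zip(second_params, partial_diff)]

# If the second frame decoded to [10, 20] and the partial copy stores the
# difference [-2, 3], the first frame's parameters come back as [8, 23].
```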
[0054] The analyzer 122 may generate the processed speech signal
116 based on the first speech parameters. It will be appreciated
that having the first partial copy 174 as error correction data in
the second packet 134 may enable generation of the processed speech
signal 116 based on the first speech parameters of the particular
speech frame even when the first packet 132 corresponding to the
particular speech frame is missing from the buffer 126.
[0055] In a particular implementation, the analyzer 122 may provide
the first partial copy 174, the second packet 134, or the first
speech parameters to the speech decoder 156 and the speech decoder
156 may generate the processed speech signal 116. The analyzer 122
(or the speech decoder 156) may output the processed speech signal
116 via the speaker 142 to the first user 152. The analyzer 122 may
update (e.g., increment or reset) the next sequence number. The
processed speech signal 116 may have a better audio quality than a
processed speech signal generated based only on the second data
166. For example, the processed speech signal 116 generated based
on the first partial copy 174 and the second data 166 may have
fewer user perceptible artifacts than the processed speech signal
generated based on the second data 166 and not based on the first
data 164 (or the first partial copy 174).
[0056] In a particular implementation, the analyzer 122 may
determine that the first packet 132 and the second packet 134 are
missing from the buffer 126. For example, the analyzer 122 may
determine that the first packet 132 is missing from the buffer 126
and that the buffer 126 does not store the partial copy of the
first packet 132 as error correction data in another packet. To
illustrate, the analyzer 122 may determine that the sequence number
of the first packet 132 is not indicated by the particular field of
any of the packets corresponding to the input speech signal 130
that are stored in the buffer 126. The analyzer 122 may update the
count of lost packets 114 based on determining that the first
packet 132 and the second packet 134 are missing from the buffer
126. In a particular implementation, the analyzer 122 may update
(e.g., increment by 1) the count of lost packets 114 to reflect
that the first packet 132 is missing from the buffer 126 and that
the buffer 126 does not store a packet (e.g., the second packet
134) that includes a partial copy of the first packet 132. The
analyzer 122 may update (e.g., increment or reset) the next
sequence number.
[0057] According to one implementation, the first packet 132 may
include a first sequence number (e.g., a first generation
timestamp) and the second packet 134 may include a second sequence
number (e.g., a second generation timestamp). The first generation
timestamp may indicate a first time at which the first packet 132
is generated by the source device 104 and the second generation
timestamp may indicate a second time at which the second packet 134
is generated by the source device 104. The first partial copy 174
may include the first sequence number (e.g., the first generation
timestamp).
[0058] Each packet that is received by the destination device 102
may be assigned a receive timestamp by the receiver 124, the
analyzer 122, or by another component of the destination device
102. For example, the second packet 134 may be assigned a second
receive timestamp. The analyzer 122 may determine a first receive
timestamp based on the second receive timestamp and may assign the
first receive timestamp to the first partial copy 174. The first
receive timestamp may be the same as or distinct from the second
receive timestamp. For example, the first receive timestamp may
indicate a first receive time that is earlier than a second receive
time indicated by the second receive timestamp. In this example,
the first receive time may correspond to an estimated time at which
the first packet 132 would have been received in a timely manner.
To illustrate, the first receive time may correspond to an
estimated receive time of the first packet 132 if the first packet
132 had not been delayed or lost.
[0059] The analyzer 122 may process a packet based on a receive
timestamp associated with the packet, the buffer delay, a buffer
timeline, and a last played packet, as described herein. The buffer
delay may correspond to a threshold time that a packet is to be
stored in the buffer 126. For example, the buffer delay may
indicate a first threshold time (e.g., 5 milliseconds). A packet
may be received at a first receive time (e.g., 1:00:00.000 PM). A
receive timestamp indicating the first receive time may be
associated with the packet. A second time (e.g., 1:00:00.005 PM)
may correspond to a sum of the first receive time indicated by the
receive timestamp and the buffer delay. The packet may be processed
at or subsequent to the second time.
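The timing rule in this paragraph reduces to a simple comparison. A minimal sketch, using milliseconds since an arbitrary epoch; the function name is invented.

```python
def ready_to_process(now_ms: int, receive_time_ms: int,
                     buffer_delay_ms: int) -> bool:
    """A packet may be processed at or after its receive time plus the
    buffer delay (the threshold time it must sit in the buffer)."""
    return now_ms >= receive_time_ms + buffer_delay_ms

# With a 5 ms buffer delay, a packet received at t = 0 ms is eligible
# at t = 5 ms but not at t = 4 ms.
```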
[0060] The buffer timeline may indicate a next packet to be
processed. For example, the buffer timeline may indicate a sequence
number of a particular packet that was most recently processed from
the buffer 126 or for which an erasure was most recently played. To
illustrate, the analyzer 122 may update the buffer timeline to
indicate a first sequence number of a packet in response to
processing the packet from the buffer 126, processing a partial
copy of the packet from the buffer 126, or playing an erasure
corresponding to the packet. In this example, the analyzer 122 may
determine a next sequence number of the next packet to be processed
based on the sequence number (e.g., the first sequence number)
indicated by the buffer timeline.
[0061] The last played packet may indicate the particular packet
that was most recently processed from the buffer 126. Processing
the particular packet from the buffer 126 may include processing
the particular packet from the buffer 126 or processing a partial
copy of the particular packet from the buffer 126. The analyzer 122
may update the last played packet to indicate a first sequence
number of a packet in response to processing the packet from the
buffer 126 or processing a partial copy of the packet from the
buffer 126.
[0062] The analyzer 122 may determine that the last played packet
indicates a previous packet that was most recently processed from
the buffer 126 by the analyzer 122. The analyzer 122 may determine
that a particular packet (e.g., the first packet 132) is subsequent
to the previous packet in the sequence of packets. The analyzer 122
may determine whether a next packet to be processed indicated by
the buffer timeline is the same as or subsequent to the first
packet 132 in the sequence of packets. The analyzer 122 may, at
approximately a first playback time, play an erasure in response to
determining that the next packet to be processed, as indicated by
the buffer timeline, is prior to the first packet 132 in the
sequence of packets.
[0063] The analyzer 122 may update the buffer timeline subsequent
to playing the erasure. For example, the buffer timeline may, prior
to the erasure being played, indicate that a first particular
packet is the next packet to be processed. The analyzer 122 may,
subsequent to playing the erasure, update the buffer timeline to
indicate that a second particular packet is the next packet to be
processed. The second particular packet may be next after the first
particular packet in the sequence of packets.
[0064] Alternatively, the analyzer 122 may, in response to
determining that the next packet to be processed indicated by the
buffer timeline is the same as or subsequent to the first packet
132 in the sequence of packets, determine whether the buffer 126
stores the first packet 132 (or the first partial copy 174). The
analyzer 122 may, in response to determining that the buffer 126
stores the first partial copy 174, determine that the first partial
copy 174 is associated with the first receive timestamp indicating
the first receive time. The analyzer 122 may, at approximately the
first playback time, process the first partial copy 174 from the
buffer 126 in response to determining that the first time is
greater than or equal to a sum of the first receive time and the
buffer delay. The buffer delay may correspond to a threshold time
that a packet is to be stored in the buffer 126. In a particular
implementation, the analyzer 122 may process the first partial copy
174 irrespective of whether the first partial copy 174 has been
stored in the buffer 126 for the threshold time. In this
implementation, the first receive time may be earlier than the
second receive time. For example, the first receive time may
correspond to an expected receive time of the first packet 132 if
the first packet 132 had been received in a timely manner. The
analyzer 122 may process the first partial copy 174 at
approximately the first playback time in response to determining
that the first packet 132 would have been stored in the buffer 126
for at least the threshold time if the first packet 132 had been
received in the timely manner. The buffer delay may include a
default value, may be based on user input from the first user 152,
or both. The analyzer 122 may adjust the buffer delay, as described
herein. The analyzer 122 may, subsequent to processing the first
partial copy 174 from the buffer 126, update the last played packet
to indicate the first packet 132 and may update the buffer timeline
to indicate a second particular packet (e.g., the second packet
134) as the next packet to be processed. The second particular
packet (e.g., the second packet 134) may be next after the first
packet 132 in the sequence of packets.
[0065] In a particular implementation, the analyzer 122 may, in
response to determining that the first packet 132 and the first
partial copy 174 are missing from the buffer 126, perform a similar
analysis on the second particular packet (e.g., the second packet
134) as performed on the first packet 132. For example, the
analyzer 122 may play an erasure in response to determining that
the next packet to be processed indicated by the buffer timeline is
prior to the second particular packet in the sequence of packets
and may update the buffer timeline subsequent to playing the
erasure. Alternatively, the analyzer 122 may, at approximately the
first playback time, process the second particular packet from the
buffer 126 in response to determining that the next packet to be
processed indicated by the buffer timeline is the same as or
subsequent to the second particular packet, that the second
particular packet or a partial copy of the second particular packet
is stored in the buffer 126, and that the first playback time is
greater than or equal to a sum of the buffer delay and a particular
receive time associated with the second particular packet.
[0066] The destination device 102 may receive the sequence of
packets (e.g., the first packet 132, the second packet 134, or
both) during a phone call. The first packet 132, the second packet
134, or both, may include speech data. The analyzer 122 may
determine or update the buffer delay, as described herein, at a
beginning of a talk spurt or at an end of the talk spurt during the
phone call. A talk spurt may correspond to a continuous segment of
speech between silent intervals during which background noise may
be heard. For example, a first talk spurt may correspond to speech
of the first user 152 and a second talk spurt may correspond to
speech of the second user 194. The first talk spurt and the second
talk spurt may be separated by a period of silence or background
noise.
[0067] The analyzer 122 may determine a previous delay loss rate.
As used herein, a "delay loss rate" is computed on a past window of
frames. If a frame arrives after its playout time, then the frame
is a "delay loss frame". The delay loss rate may be expressed as
the number of delay loss frames divided by the total number of
frames transmitted. The previous delay loss rate may correspond to
a delay loss rate determined during a previous adjustment of the
buffer delay at a first update time. The analyzer 122 may maintain
a count of delay loss packets. The count of delay loss packets may
indicate a number of packets that are received subsequent to
processing of partial copies of the packets from the buffer 126 at
corresponding playback times. The corresponding playback times may
be subsequent to the first update time. For example, the analyzer
122 may, subsequent to the first update time, process the first
partial copy 174 from the buffer 126 at a first playback time
associated with the first packet 132. The analyzer 122 may
determine that a first time corresponds to the first playback time
based on determining that one or more conditions are satisfied. For
example, the first time may correspond to the first playback time
if, at the first time, the last played packet is prior to the first
packet 132 and the first packet 132 is prior to or the same as the
next packet to be processed as indicated by the buffer timeline.
The first time may correspond to the first playback time if the
first time is greater than or equal to a sum of a receive time
associated with the first packet 132 (e.g., the first receive time
of the first partial copy 174) and the buffer delay. The first time
may correspond to the first playback time if the first packet 132
is the earliest packet in the sequence of packets that satisfies
the preceding conditions at the first time. The analyzer 122 may
update (e.g., increment) the count of delay loss packets in
response to receiving the first packet 132 subsequent to processing
the first partial copy 174.
[0068] The analyzer 122 may maintain a received packets count. For
example, the analyzer 122 may reset the received packets count
subsequent to the first update time. The analyzer 122 may update
(e.g., increment by 1) the received packets count in response to
receiving a packet (e.g., the second packet 134). The analyzer 122
may determine a second delay loss rate based on the count of delay
loss packets and the received packets count. For example, the
second delay loss rate may correspond to a measure (e.g., a ratio)
of the count of delay loss packets and the received packets count.
To illustrate, the second delay loss rate may indicate an average
number of delay loss packets (e.g., packets that are received
subsequent to processing of partial copies of the packets) during a
particular time interval. The second delay loss rate may indicate
network jitter during the particular time interval. A difference
between the previous delay loss rate and the second delay loss rate
may indicate a variation in delay of received packets. The
difference between the previous delay loss rate and the second
delay loss rate may indicate whether the average number of delay
loss packets is increasing or decreasing.
[0069] The analyzer 122 may determine a delay loss rate based on
the previous delay loss rate and the second delay loss rate. For
example, the delay loss rate may correspond to a weighted sum of
the previous delay loss rate and the second delay loss rate. The
analyzer 122 may assign a first weight (e.g., 0.75) to the previous
delay loss rate and a second weight (e.g., 0.25) to the second
delay loss rate. The first weight may be the same as or distinct
from the second weight. In a particular implementation, the first
weight may be higher than the second weight. Determining the delay
loss rate based on the weighted sum of the previous delay loss rate
and the second delay loss rate may reduce oscillation in the delay
loss rate based on temporary network conditions.
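The ratio and weighted-sum computations described above can be sketched together. The example weights 0.75 and 0.25 are from the text; the function names and the sample counts are invented for illustration.

```python
def delay_loss_rate(delay_loss_count: int, received_count: int) -> float:
    """Second delay loss rate: delay loss packets over received packets."""
    return delay_loss_count / received_count if received_count else 0.0

def smoothed_rate(previous_rate: float, current_rate: float,
                  w_prev: float = 0.75) -> float:
    """Weighted sum that damps oscillation caused by temporary network
    conditions such as packet bundling."""
    return w_prev * previous_rate + (1.0 - w_prev) * current_rate

# 2 delay losses out of 25 received packets, smoothed against a
# previous delay loss rate of 4 percent.
rate = smoothed_rate(0.04, delay_loss_rate(2, 25))
```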
[0070] For example, bundling of packets may cause a large number of
packets (e.g., 3) to arrive at the same time followed by no packet
arrivals during a subsequent interval. The second delay loss rate
may fluctuate from a first time to a second time because the second
delay loss rate determined at the first time may correspond to an
interval during which a large number of packets is received and the
second delay loss rate determined at the second time may correspond
to an interval with no packet arrivals. Determining the delay loss
rate based on the weighted sum of the previous delay loss rate and
the second delay loss rate may reduce an effect of packet bundling
on the delay loss rate.
[0071] The analyzer 122 may increase the buffer delay by an
increment amount (e.g., 20 milliseconds) in response to determining
that the delay loss rate fails to satisfy (e.g., is less than) a
target delay loss rate (e.g., 0.01). For example, the target delay
loss rate may correspond to a first percent (e.g., 1 percent) of
delay loss packets relative to received packets. The analyzer 122
may decrease the buffer delay by a decrement amount (e.g., 20
milliseconds) in response to determining that the delay loss rate
satisfies (e.g., is greater than) the target delay loss rate, that
the delay loss rate is greater than or equal to the previous delay
loss rate, or both. The decrement amount, the increment amount, the
target delay loss rate, or a combination thereof, may include
default values, may be based on user input from the first user 152,
or both. The decrement amount may be the same as or distinct from
the increment amount.
[0072] The analyzer 122 may set the buffer delay to a maximum of
the buffer delay and a delay lower limit (e.g., 20 milliseconds).
For example, the analyzer 122 may set the buffer delay to the delay
lower limit in response to determining that the buffer delay is
lower than the delay lower limit. The analyzer 122 may set the
buffer delay to a minimum of the buffer delay and a delay upper
limit (e.g., 80 milliseconds). For example, the analyzer 122 may
set the buffer delay to the delay upper limit in response to
determining that the buffer delay exceeds the delay upper limit.
The delay lower limit, the delay upper limit, or both, may be
default values, may be based on user input from the first user 152,
or both.
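The stepping and clamping rules in the two paragraphs above suggest the following update function. This is a hedged sketch using the example values from the text (20 ms steps, a 0.01 target, and 20-80 ms limits), with the comparison directions taken verbatim from the text.

```python
def update_buffer_delay(delay_ms: int, loss_rate: float, prev_rate: float,
                        target: float = 0.01, step_ms: int = 20,
                        lower_ms: int = 20, upper_ms: int = 80) -> int:
    """Step the buffer delay against the target delay loss rate, then
    clamp the result to the configured lower and upper limits."""
    if loss_rate < target:            # fails to satisfy the target rate
        delay_ms += step_ms
    elif loss_rate > target or loss_rate >= prev_rate:
        delay_ms -= step_ms
    return max(lower_ms, min(delay_ms, upper_ms))
```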
[0073] The system 100 of FIG. 1 may enable partial recovery of data
of a lost packet without retransmission of the lost packet. For
example, the analyzer 122 may dynamically adjust the delay of the
buffer 126 based on the partial frame recovery rate at the
destination device 102 and based on the frame erasure rate at the
destination device 102 to increase the likelihood that a partial
copy of a lost packet is in the buffer 126 when the speech decoder
156 attempts to decode the lost packet. The techniques described
with respect to FIG. 1 may enable the playout delay to be adapted
based on the partial frame recovery rate and based on the frame
erasure rate. Late partial frames may also be used in the playout
delay determination. For example, late partial frames may be used
to compute the jitter, the delay loss rate, or both. According to
some implementations, late partial frames within a particular
duration after a corresponding playout time may be used to compute
the jitter, the delay loss rate, or both.
[0074] Referring to FIG. 2, a particular illustrative
implementation of a method for adjusting a delay of a buffer at a
receiving terminal is disclosed and generally designated 200. In a
particular implementation, the method 200 may be performed by the
analyzer 122 of FIG. 1. FIG. 2 illustrates adjustment of the buffer
depth 110 of FIG. 1 based on the partial frame recovery rate, the
frame erasure rate, or both. For example, the adjustment (D) of the
buffer depth 110 may be a function (f) of the partial frame
recovery rate (PFRR) and the frame erasure rate (FER) (e.g.,
D=f(PFRR, FER)).
[0075] The method 200 includes receiving, by a receiver, an encoded
speech frame R(N) at time N, at 202. For example, the receiver 124
of FIG. 1 may receive a particular packet corresponding to a
particular audio frame of the input speech signal 130, as described
with reference to FIG. 1.
[0076] The method 200 also includes determining whether a next
speech frame R(N-D) is available in a buffer, at 204. For example,
the analyzer 122 may determine whether a next packet is stored in
the buffer 126, as described with reference to FIG. 1. The next
packet may have a next sequence number. In a particular
implementation, the analyzer 122 may determine the next sequence
number by incrementing a sequence number of a previously processed
packet. In an alternative implementation, the analyzer 122 may
determine the next sequence number based on a difference between a
sequence number of a most recently received packet (e.g., N) and
the buffer depth 110 (e.g., D). In this implementation, the buffer
depth 110 may indicate a maximum number of packets that are to be
stored in the buffer 126. The analyzer 122 may determine whether
the next packet (e.g., the first packet 132) corresponding to the
next sequence number is stored in the buffer 126.
[0077] The method 200 further includes, in response to determining
that the next speech frame R(N-D) is available in the jitter
buffer, at 204, providing the next speech frame R(N-D) to a speech
decoder, at 206. For example, the analyzer 122 may, in response to
determining that the next packet (e.g., the first packet 132) is
stored in the buffer 126, provide the first packet 132 to the
speech decoder 156, as described with reference to FIG. 1.
[0078] The method 200 also includes, in response to determining
that the next speech frame R(N-D) is unavailable in the jitter
buffer, at 204, determining whether a partial copy of the next
speech frame R(N-D) is available in the jitter buffer, at 208. For
example, the analyzer 122 of FIG. 1 may, in response to determining
that the first packet 132 is not stored in the buffer 126,
determine whether a partial copy of the first packet 132 is stored
in the buffer 126, as described with reference to FIG. 1. To
illustrate, the analyzer 122 may determine whether the second
packet 134 that has the first partial copy 174 is stored in the
buffer 126.
[0079] The method 200 further includes, in response to determining
that the partial copy of the next speech frame R(N-D) is available
in the jitter buffer, at 208, providing the partial copy of the
next speech frame R(N-D) to the speech decoder, at 206. For
example, the analyzer 122 of FIG. 1 may, in response to determining
that the second packet 134 is included in the buffer 126 and that
the second packet 134 includes the first partial copy 174 of the
first packet 132, provide the second packet 134 to the speech
decoder 156. In a particular implementation, the analyzer 122 may
provide the first partial copy 174 to the speech decoder 156.
[0080] The method 200 also includes, in response to determining
that the partial copy of the next speech frame R(N-D) is
unavailable in the jitter buffer, at 208, determining the "average"
partial frame recovery rate (PFRR), at 210, and determining the
"average" frame erasure rate (FER), at 220. According to some
implementations, the PFRR may be a primary determination and the
FER may be a secondary determination. For example, the
determination whether to increase the delay (e.g., the buffer
depth) may be based on a function of the PFRR or based on a
function of the PFRR and the FER. Thus, according to some
implementations, the method 200 of FIG. 2 may be modified such that
the PFRR is used as a primary (or sole) factor in the determination
whether to modify the delay and the FER is used as a secondary factor
(if at all) in the determination whether to modify the delay. The
analyzer 122 may retrieve the packet recovery rate data 106 and
determine the partial frame recovery rate based on the packet
recovery rate data 106. The analyzer 122 may also retrieve the FER
data 154 and determine the frame erasure rate based on the FER data
154.
[0081] The method 200 further includes comparing the partial frame
recovery rate to the first threshold (T1), at 214. To illustrate,
the analyzer 122 may compare the partial frame recovery rate to the
packet recovery rate threshold 136 (e.g., the first threshold). If
the partial frame recovery rate is less than the first threshold
(T1), at 214, the depth of the buffer may be increased for the next
talk spurt, at 216. To illustrate, in response to the partial frame
recovery rate failing to satisfy the packet recovery rate threshold
136, the delay of the buffer 126 may be increased to store
additional packets, and thus to increase the partial frame recovery
rate. According to one implementation, the delay of the buffer 126
may have a maximum value to substantially limit effects on latency
regardless of the determination, at 214. If the partial frame
recovery rate is not less than the first threshold (T1), at 214,
the delay of the buffer may remain constant (or may be
decremented). It should be understood that comparing the partial
frame recovery rate to the first threshold (T1), at 214, is not to
be construed as limiting, and other techniques may be used to
determine whether to increase the delay (e.g., the buffer size), as
described with respect to FIG. 3.
[0082] The method 200 further includes comparing the frame erasure
rate to the second threshold (T2), at 224. To illustrate, the
analyzer 122 may compare the frame erasure rate to the FER
threshold 138 (e.g., the second threshold). According to one
implementation, the FER threshold 138 may be based on an EVS
specification. For example, the FER threshold 138 may correspond to
the maximum frame erasure rate to maintain a communication session
according to the EVS specification. If the frame erasure rate is
greater than the second threshold (T2), at 224, the depth of the
buffer may be increased for the next talk spurt, at 226. To
illustrate, in response to the frame erasure rate satisfying the
FER threshold 138, the delay of the buffer 126 may be increased to
store additional packets, and thus to decrease the frame erasure
rate. If the frame erasure rate is not greater than the second
threshold (T2), at 224, the delay of the buffer may remain constant
(or may be decremented). According to some implementations, the
method 200 may include comparing the partial frame recovery rate to
the first threshold (T1), at 214, and comparing the frame erasure
rate to the second threshold (T2), at 224. In response to the
partial frame recovery rate failing to satisfy the first threshold
(T1) and the frame erasure rate satisfying the second threshold
(T2), the delay of the buffer may be increased, at 218. It should
be understood that comparing the frame erasure rate to the second
threshold (T2), at 224, is not to be construed as limiting, and
other techniques may be used to determine whether to increase the
delay (e.g., the buffer size), as described with respect to FIG.
3.
[0083] At 218, the depth (e.g., the size, the delay, or both) of
the buffer may be increased by an adjustment amount (D.sub.new) in
response to the partial frame recovery rate failing to satisfy the
first threshold (T1), in response to the frame erasure rate
satisfying the second threshold (T2), or both. For example, the
analyzer 122 of FIG. 1 may adjust the buffer depth 110 based on the
adjustment amount (e.g., D.sub.new). The method 200 may proceed to
202.
[0084] The method 200 of FIG. 2 may enable partial recovery of data
of a lost packet without retransmission of the lost packet. For
example, the analyzer 122 may dynamically adjust the delay of the
buffer 126 based on the partial frame recovery rate at the
destination device 102 and based on the frame erasure rate at the
destination device 102 to increase the likelihood that a partial
copy of a lost packet is in the buffer 126 when the speech decoder
156 attempts to decode the lost packet.
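The decision flow of method 200 (steps 204 through 226) might be sketched as follows, assuming a buffer that maps sequence numbers to entries carrying either a primary frame or a partial copy; the entry layout, the helper names, and the single-step depth adjustment are all illustrative assumptions, not the analyzer's actual interface:

```python
# Illustrative sketch of the method 200 flow; the buffer layout (a dict of
# sequence number -> entry) and all names are hypothetical.

def find_partial_copy(buffer, seq):
    # A later packet (e.g., the second packet 134) may carry a partial
    # copy of the missing frame `seq`.
    for entry in buffer.values():
        if entry.get("partial_of") == seq:
            return entry
    return None

def select_frame(buffer, seq, pfrr, fer, t1, t2, depth, d_new):
    """Return (payload_for_decoder, possibly_increased_depth)."""
    entry = buffer.get(seq)
    if entry is not None:
        # Step 206: the primary frame R(N-D) is in the jitter buffer.
        return entry, depth
    partial = find_partial_copy(buffer, seq)
    if partial is not None:
        # Step 208 -> 206: decode from the partial copy instead.
        return partial, depth
    # Steps 210/220: neither copy is available; consult the average rates.
    if pfrr < t1 or fer > t2:
        # Steps 216/226/218: increase the depth for the next talk spurt.
        depth += d_new
    return None, depth
```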
[0085] Referring to FIG. 3, a flow chart of a particular
illustrative implementation of a method 300 of adjusting a delay of
a buffer at a receiving terminal is shown. The method 300 may be
performed by the destination device 102 of FIG. 1.
[0086] The method 300 includes determining, at a processor, a
partial frame recovery rate of lost frames at a receiving terminal,
at 302. For example, referring to FIG. 1, the analyzer 122 may
determine the partial frame recovery rate of lost frames at the
destination device 102 (e.g., the receiving terminal). To
illustrate, the analyzer 122 may retrieve the packet recovery rate
data 106 and determine the partial frame recovery rate based on the
packet recovery rate data 106. The packet recovery rate data 106
may indicate a rate at which lost packets are "recovered" by use of
partial copies. For example, if a particular packet (or frame)
transmitted by the source device 104 is lost during transmission
and a subsequent packet stored in the buffer 126 includes a partial
copy of the particular packet, the speech decoder 156 may use the
subsequent packet to "recover" (or regenerate) the particular
packet and the rate at which lost packets are recovered may
increase.
[0087] A frame erasure rate may be determined for frames received
at the receiving terminal, at 304. For example, referring to FIG.
1, the analyzer 122 may also determine a frame erasure rate for
frames received at the destination device 102. To illustrate, the
analyzer 122 may retrieve the FER data 154 and determine the frame
erasure rate based on the FER data 154. The FER data 154 may
indicate an error rate of packets (or frames) received by the
destination device 102. The error rate may be expressed as the
number of packets received with errors divided by the total number
of packets received.
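The error-rate computation described above reduces to a single division; the function name below is illustrative, and the guard against an empty packet count is an added assumption:

```python
# The frame erasure rate computation of paragraph [0087], with
# illustrative names; a zero packet count is guarded to avoid division
# by zero.

def frame_erasure_rate(packets_with_errors, total_packets_received):
    if total_packets_received == 0:
        return 0.0
    # Packets received with errors divided by total packets received.
    return packets_with_errors / total_packets_received
```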
[0088] A delay of a buffer may be adjusted based on the partial
frame recovery rate, at 306. For example, referring to FIG. 1, the
analyzer 122 may compare the partial frame recovery rate to the
packet recovery rate threshold 136 (e.g., the first threshold). In
response to the partial frame recovery rate failing to satisfy the
packet recovery rate threshold 136, the delay of the buffer 126 may
be increased to store additional packets, and thus to increase the
partial frame recovery rate.
[0089] According to one implementation, the delay may also be
adjusted based on the frame erasure rate (in addition to the
partial frame recovery rate). For example, referring to FIG. 1, the
analyzer 122 may compare the frame erasure rate to the FER
threshold 138 (e.g., the second threshold). According to one
implementation, the FER threshold 138 may be based on an EVS
specification. For example, the FER threshold 138 may correspond to
the maximum frame erasure rate to maintain a communication session
according to the EVS specification. In response to the frame
erasure rate satisfying the FER threshold 138, the delay of the
buffer 126 may be increased to store additional packets, and thus
to decrease the frame erasure rate.
[0090] According to one implementation, the increase in the delay
(or size) of the buffer 126 may be based on (e.g., may be a
function of) the frame erasure rate and based on the partial frame
recovery rate. As a non-limiting example, the delay may be
expressed as: [0091]
playout_time_of_partial_frame_N+k1*(max(X,FER_rate)-X)*[Y-min(Y,
partial_frame_recovery_rate)].
[0092] According to the above equation, k1 may be a constant, Y may
correspond to a threshold partial copy recovery rate to start the
adjustment, and X may correspond to the threshold frame erasure
rate. According to one implementation, the delay of the buffer 126
may have a maximum value to substantially limit effects on
latency.
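The expression of paragraph [0091] can be transcribed directly; k1, X, and Y are tuning parameters whose example values below are assumptions, as is the function name. Writing it this way makes the hinge behavior explicit: the adjustment term is nonzero only when the frame erasure rate exceeds X and the partial frame recovery rate falls below Y.

```python
# Transcription of the delay expression of paragraph [0091]; k1, x, and y
# are illustrative tuning parameters.

def playout_delay(playout_time_of_partial_frame_n,
                  fer_rate, partial_frame_recovery_rate,
                  k1=1.0, x=0.1, y=0.7):
    # max(X, FER_rate) - X: zero until the frame erasure rate exceeds X.
    fer_excess = max(x, fer_rate) - x
    # Y - min(Y, PFRR): zero until the recovery rate drops below Y.
    recovery_shortfall = y - min(y, partial_frame_recovery_rate)
    return (playout_time_of_partial_frame_n
            + k1 * fer_excess * recovery_shortfall)
```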
[0093] According to one implementation, the delay of the buffer may
correspond to a depth of the buffer. For example, as the depth of
the buffer increases, the delay of the buffer may also increase.
The delay of the buffer may be based on a function of the partial
frame recovery rate, a function of the frame erasure rate, or a
function of both. According to some implementations, the delay may
be further adjusted based on late arriving partial frames in
response to a determination that the partial frame recovery rate is
below a particular threshold. The delay may also be adjusted based
on late arriving partial frames. According to one implementation,
the late arriving partial frames may be used in determining jitter
associated with a wireless network. According to one
implementation, the late arriving partial frames may be used in
determining a delay loss rate.
[0094] According to one implementation, the method 300 may include
determining whether a first frame is stored at the buffer. The
first frame may be scheduled to be decoded during a first time
period. For example, referring to FIG. 1, during a first time
period, the analyzer 122 (or the speech decoder 156) may determine
whether the first packet 132 is stored at the buffer 126. The first
packet 132 may be scheduled to be decoded during the first time
period. The method 300 may also include polling the buffer for a
second frame during the first time period in response to a
determination that the first frame is not stored at the buffer
during the first time period. The second frame may include a
partial copy of the first frame. For example, referring to FIG. 1,
the analyzer 122 (or the speech decoder 156) may poll the buffer
126 for the second packet 134 during the first time period in
response to a determination that the first packet 132 is not stored
at the buffer 126 during the first time period. The second packet
134 may include a partial copy of the first packet 132. The method
300 may also include increasing the delay of the buffer in response
to a determination that the second frame is not stored at the
buffer during the first time period. For example, referring to FIG.
1, the delay of the buffer 126 may be increased in response to a
determination that the second packet 134 is not stored at the
buffer 126 during the first time period.
[0095] According to one implementation, the method 300 may include
determining whether a partial copy of a particular frame is stored
at the buffer in response to a determination that the particular
frame is lost. For example, referring to FIG. 1, the analyzer 122
(or the speech decoder 156) may determine whether a partial copy of
the first packet 132 is stored at the buffer 126 in response to a
determination that the first packet 132 is lost. To illustrate, the
analyzer 122 may determine whether the second packet 134 (that
includes a partial copy of the first packet 132) is stored at the
buffer 126. The method 300 may also include adjusting the delay of
the buffer in response to a determination that the partial copy of
the particular frame is stored at the buffer. For example,
referring to FIG. 1, the delay of the buffer 126 may be adjusted in
response to a determination that the second packet 134 is stored at
the buffer 126. The method 300 may also include determining whether
to adjust the delay of the buffer based on the partial frame
recovery rate, the frame error rate, or both, in response to a
determination that the partial copy of the particular frame is not
stored at the buffer. For example, referring to FIG. 1, the
analyzer 122 may determine whether to adjust the delay of the
buffer 126 based on the partial frame recovery rate, the frame
error rate, or both, in response to a determination that the second
packet 134 is not stored at the buffer 126.
[0096] The partial frame recovery rate may be associated with the
delay of the buffer, and the delay of the buffer may be based at
least in part on jitter associated with (e.g., introduced by) a
wireless network (e.g., a VoLTE network or an Institute of
Electrical and Electronics Engineers (IEEE) 802.11 network) or a
delayed loss rate. According to one implementation, the jitter may
be measured. Jitter may be the distribution of the end-to-end delay
of a sequence of packets. As a non-limiting example, if the packets
arrive in a VoLTE network with a mean delay of 200 ms and a
standard deviation of 10 ms, the jitter may be determined to be
relatively low. Alternatively, if the standard deviation is
approximately 100 ms, the jitter may be determined to be relatively
high. According to another implementation, the jitter may be
determined based on the arrivals of packets out-of-sequence. As a
non-limiting example, if the out-of-sequence arrivals are greater
than a threshold (e.g., five percent), the jitter may be determined
to be relatively high. Alternatively, if the out-of-sequence
arrivals are less than the threshold, the jitter may be determined
to be relatively low. According to another implementation, the
jitter may be determined based on the delayed loss rate. For
example, if the number of packets that arrive after the
corresponding playout time is large, the jitter may be determined
to be relatively high. To illustrate, if the delayed loss rate is
less than 0.2 percent, the jitter may be determined to be relatively
low. If the delayed loss rate is greater than 0.2 percent, the
jitter may be determined to be relatively high.
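An illustrative combination of the three measurement implementations above is sketched below; the thresholds follow the text's examples (roughly 100 ms standard deviation, five percent out-of-sequence arrivals, 0.2 percent delayed loss), and the function name is hypothetical:

```python
# Hypothetical classifier over the three jitter heuristics of paragraph
# [0096]; thresholds follow the text's examples.

def jitter_is_high(delay_std_dev_ms=None, out_of_sequence_rate=None,
                   delayed_loss_rate=None):
    if delay_std_dev_ms is not None:
        # Spread of the end-to-end delay distribution (e.g., ~10 ms is
        # relatively low, ~100 ms is relatively high).
        return delay_std_dev_ms >= 100.0
    if out_of_sequence_rate is not None:
        # Fraction of packets arriving out of sequence.
        return out_of_sequence_rate > 0.05
    if delayed_loss_rate is not None:
        # Fraction of packets arriving after the corresponding playout time.
        return delayed_loss_rate > 0.002
    raise ValueError("no jitter measurement supplied")
```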
[0097] Thus, the delay of the buffer may be based on jitter
associated with a wireless network. The jitter may be based in part
on late partial copies of lost primary frames. Additionally, the
delay of the buffer may be based on a delay loss rate. The delay
loss rate may be based in part on late partial copies of primary
lost frames.
[0098] According to one implementation, the method 300 may include
measuring jitter based on primary frames, based on useful partial
frames, and based on late partial copies. The late partial copies
may be used unconditionally, if the late partial copies arrive
within 20 ms of the corresponding playout time, or if the primary
frame has not arrived for delay loss computation. According to
another implementation, the method 300 may include measuring jitter
based on primary frames, based on useful partial frames, based on
late partial copies, and based on the partial frame recovery rate.
If the partial frame recovery rate is less than a first rate
recovery threshold (e.g., 70 percent), the playout delay may be
based on a function of the primary frames, the useful partial
frames, and all of the late partial copies. If the partial frame
recovery rate is less than a second rate recovery threshold, the
playout delay may be based on a function of the primary frames, the
useful partial frames, and late partial copies that arrive within a
particular time period (e.g., 30 ms) after the playout time.
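One reading of the recovery-rate-dependent selection above is sketched below; the threshold values, the assumption that the second rate recovery threshold sits above the first, and the fallback to the windowed set are all illustrative:

```python
# Hypothetical sketch of selecting which late partial copies feed the
# jitter / playout delay computation; the thresholds and the ordering of
# the two rate recovery thresholds are assumptions.

FIRST_RATE_RECOVERY_THRESHOLD = 0.70  # example first threshold
LATE_WINDOW_MS = 30.0                 # example post-playout window

def late_partials_for_jitter(late_arrival_ms, pfrr):
    """late_arrival_ms: delays (ms) past each copy's playout time."""
    if pfrr < FIRST_RATE_RECOVERY_THRESHOLD:
        # Low recovery rate: count all late partial copies.
        return list(late_arrival_ms)
    # Otherwise count only copies within the window after playout.
    return [d for d in late_arrival_ms if d <= LATE_WINDOW_MS]
```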
[0099] According to yet another implementation, the method 300 may
include measuring jitter based on primary frames, based on useful
partial frames, based on late partial copies, based on the
partial frame recovery rate, and based on the frame erasure rate.
The playout delay may be determined (e.g., computed) based on
jitter measured from any of the above implementations.
[0100] According to some Jitter Buffer Management (JBM)
implementations, the method 300 may include determining a buffer
underflow rate (e.g., "late loss") at the receiving terminal. The
buffer underflow rate may indicate a rate that frames arrive at the
receiving terminal after corresponding playout times. To
illustrate, if the speech decoder 156 determines that a particular
packet (e.g., the first packet 132) arrives at the destination
device 102 after the playout time of the particular packet, the
buffer underflow rate increases. As used herein, the playout time
of the particular packet corresponds to a time period during which
the speech decoder 156 is configured to decode the particular packet.
The method 300 may also include increasing a depth (or delay) of
the buffer if the buffer underflow rate satisfies a threshold. For
example, if the speech decoder 156 determines that the buffer
underflow rate satisfies a particular threshold, the speech decoder
156 may increase the size of the buffer 126 (e.g., increase the
delay). Increasing the delay may enable late arriving packets to be
decoded and processed.
[0101] The method 300 may also include adjusting a minimum depth of
the buffer or a maximum depth of the buffer based on the buffer
underflow rate. For example, the buffer 126 may have a minimum depth
and a maximum depth (e.g., a maximum delay). If the buffer underflow
rate satisfies the threshold, the minimum depth (e.g., the minimum
delay) of the buffer 126 may be increased to enable late arriving
packets to be decoded. For example, the analyzer 122 may adjust the
buffer depth 110 based on the adjustment amount described with
respect to FIG. 2 to increase the minimum depth of the buffer
126.
[0102] The method 300 may also include increasing implicit buffer
adaptation at the destination device 102 to reduce occurrences of
underflows due to delayed packets. For example, the buffer
126 may attempt to provide a particular frame (Frame N) to the
speech decoder 156 for playback. Thus, the particular frame (Frame
N) may be the "next to play" frame. If the particular frame (Frame
N) is received after a playback time associated with the particular
frame (Frame N), an erasure may be provided to the speech decoder
156 for playback. If the speech decoder 156 requests another frame
to perform decoding operations and playback, the buffer 126
provides the particular frame (Frame N) (e.g., the next to play
frame) to the speech decoder 156 or a subsequent frame (Frame N+1)
(e.g., a "next to play plus one" frame) to the speech decoder 156.
In this scenario, if the particular frame (Frame N) is present in
the buffer 126, the particular frame (Frame N) is provided to the
speech decoder 156. However, if the particular frame (Frame N) is
not present, the subsequent frame (Frame N+1) is provided to the
speech decoder 156. If both frames are present, the frame having
the smaller sequence number (e.g., Frame N) is provided to the
speech decoder 156. If neither frame is present, another erasure
may be provided to the speech decoder 156. Thus, when underflows
occur, the buffer 126 may provide a frame out of the sequence of
frames from (Frame N) to (Frame N+IBA.sub.max) to the speech decoder
156.
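The implicit buffer adaptation of paragraph [0102] may be sketched as follows: on a decoder request following an underflow, the first available frame from Frame N up to Frame N+IBA.sub.max is served, and an erasure otherwise. The dict-based buffer, the ERASURE stand-in, and the function name are illustrative:

```python
# Sketch of the implicit buffer adaptation in paragraph [0102]; names and
# the dict-based buffer representation are hypothetical.

ERASURE = None  # stand-in for an erasure indication to the decoder

def next_frame_for_decoder(buffer, n, iba_max=1):
    for candidate in range(n, n + iba_max + 1):
        if candidate in buffer:
            # If both Frame N and Frame N+1 are present, the frame with
            # the smaller sequence number is served first.
            return buffer.pop(candidate)
    # Neither frame is present: provide another erasure.
    return ERASURE
```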
[0103] The method 300 of FIG. 3 may enable partial recovery of data
of a lost packet without retransmission of the lost packet. For
example, the analyzer 122 may dynamically adjust the delay of the
buffer 126 based on the partial frame recovery rate at the
destination device 102 and based on the frame erasure rate at the
destination device 102 to increase the likelihood that a partial
copy of a lost packet is in the buffer 126 when the speech decoder
156 attempts to decode the lost packet.
[0104] The method 300 of FIG. 3 may be implemented by a
field-programmable gate array (FPGA) device, an
application-specific integrated circuit (ASIC), a processing unit
such as a central processing unit (CPU), a digital signal processor
(DSP), a controller, another hardware device, firmware device, or
any combination thereof. As an example, the method 300 of FIG. 3
may be performed by a processor that executes instructions, as
described with respect to FIG. 4.
[0105] Referring to FIG. 4, a block diagram of a particular
illustrative implementation of a device (e.g., a wireless
communication device) is depicted and generally designated 400. In
various implementations, the device 400 may have more or fewer
components than illustrated in FIG. 4. In an illustrative
implementation, the device 400 may correspond to the destination
device 102, the source device 104 of FIG. 1, or both. In an
illustrative implementation, the device 400 may perform one or more
operations described with reference to FIGS. 1-3.
[0106] In a particular implementation, the device 400 includes a
processor 406 (e.g., a central processing unit (CPU)). The device
400 may include one or more additional processors 410 (e.g., one or
more digital signal processors (DSPs)). The processors 410 may
include a speech and music coder-decoder (CODEC) 408 and an echo
canceller 412. The speech and music codec 408 may include a vocoder
encoder 436, a vocoder decoder 438, or both.
[0107] The device 400 may include the memory 176 and a CODEC 434.
The memory 176 may include the analysis data 120. The device 400
may include a wireless controller 440 coupled, via a transceiver
450, to an antenna 442. In a particular implementation, the
transceiver 450 may include the receiver 124, the transmitter 192,
or both, of FIG. 1.
[0108] The device 400 may include a display 428 coupled to a
display controller 426. The speaker 142 of FIG. 1, a microphone
446, or both, may be coupled to the CODEC 434. The CODEC 434 may
include a digital-to-analog converter 402 and an analog-to-digital
converter 404. In an illustrative implementation, the microphone
446 may correspond to the microphone 146 of FIG. 1. In a particular
implementation, the CODEC 434 may receive analog signals from the
microphone 446, convert the analog signals to digital signals using
the analog-to-digital converter 404, and provide the digital
signals to the speech and music codec 408. The speech and music
codec 408 may process the digital signals. In a particular
implementation, the speech and music codec 408 may provide digital
signals to the CODEC 434. The CODEC 434 may convert the digital
signals to analog signals using the digital-to-analog converter 402
and may provide the analog signals to the speaker 142.
[0109] The device 400 may include the analyzer 122, the buffer 126,
the speech decoder 156, or a combination thereof. In a particular
implementation, the analyzer 122, the speech decoder 156, or both,
may be included in the processor 406, the processors 410, the CODEC
434, the speech and music codec 408, or a combination thereof. In a
particular implementation, the analyzer 122, the speech decoder
156, or both, may be included in the vocoder encoder 436, the
vocoder decoder 438, or both. In a particular implementation, the
speech decoder 156 may be functionally identical to the vocoder
decoder 438. The speech decoder 156 may correspond to dedicated
hardware circuitry outside the processors 410 (e.g., the DSPs).
[0110] The analyzer 122, the buffer 126, the speech decoder 156, or
a combination thereof, may be used to implement a hardware
implementation of the buffer depth adjustment techniques described
herein. Alternatively, or in addition, a software implementation
(or combined software/hardware implementation) may be implemented.
For example, the memory 176 may include instructions 456 executable
by the processors 410 or other processing unit of the device 400
(e.g., the processor 406, the CODEC 434, or both). The instructions
456 may correspond to the analyzer 122, the speech decoder 156, or
both.
[0111] In a particular implementation, the device 400 may be
included in a system-in-package or system-on-chip device 422. In a
particular implementation, the analyzer 122, the buffer 126, the
speech decoder 156, the memory 176, the processor 406, the
processors 410, the display controller 426, the CODEC 434, and the
wireless controller 440 are included in a system-in-package or
system-on-chip device 422. In a particular implementation, an input
device 430 and a power supply 444 are coupled to the system-on-chip
device 422. Moreover, in a particular implementation, as
illustrated in FIG. 4, the display 428, the input device 430, the
speaker 142, the microphone 446, the antenna 442, and the power
supply 444 are external to the system-on-chip device 422. In a
particular implementation, each of the display 428, the input
device 430, the speaker 142, the microphone 446, the antenna 442,
and the power supply 444 may be coupled to a component of the
system-on-chip device 422, such as an interface or a
controller.
[0112] The device 400 may include a mobile communication device, a
smart phone, a cellular phone, a laptop computer, a computer, a
tablet, a personal digital assistant, a display device, a
television, a gaming console, a music player, a radio, a digital
video player, a digital video disc (DVD) player, a tuner, a camera,
a navigation device, or any combination thereof.
[0113] In conjunction with the described implementations, an
apparatus may include means for determining a partial frame
recovery rate of lost frames at a receiving terminal. For example,
the means for determining the partial frame recovery rate may
include the analyzer 122 of FIG. 1, the memory 176 of FIG. 1, the
packet recovery rate data 106 of FIG. 1, the analysis data 120 of
FIG. 1, the speech decoder 156 of FIG. 1, the processor 406 of FIG.
4, the processor(s) 410 of FIG. 4, the CODEC 434 of FIG. 4, vocoder
decoder 438 of FIG. 4, or a combination thereof.
[0114] The apparatus may also include means for determining a frame
erasure rate for frames received at the receiving terminal. For
example, the means for determining the frame erasure rate may
include the analyzer 122 of FIG. 1, the memory 176 of FIG. 1, the
FER data 154 of FIG. 1, the analysis data 120 of FIG. 1, the speech
decoder 156 of FIG. 1, the processor 406 of FIG. 4, the
processor(s) 410 of FIG. 4, the CODEC 434 of FIG. 4, vocoder
decoder 438 of FIG. 4, or a combination thereof.
[0115] The apparatus may also include means for comparing the
partial frame recovery rate to a first threshold. For example, the
means for comparing the partial frame recovery rate to the first threshold
may include the analyzer 122 of FIG. 1, the memory 176 of FIG. 1,
the packet recovery rate data 106 of FIG. 1, the packet recovery
rate threshold 136 of FIG. 1, the analysis data 120 of FIG. 1, the
speech decoder 156 of FIG. 1, the processor 406 of FIG. 4, the
processor(s) 410 of FIG. 4, the CODEC 434 of FIG. 4, vocoder
decoder 438 of FIG. 4, or a combination thereof.
[0116] The apparatus may also include means for comparing the frame
erasure rate to a second threshold. For example, the means for
comparing the frame erasure rate to the second threshold may
include the analyzer 122 of FIG. 1, the memory 176 of FIG. 1, the
FER data 154 of FIG. 1, the FER threshold 138 of FIG. 1, the
analysis data 120 of FIG. 1, the speech decoder 156 of FIG. 1, the
processor 406 of FIG. 4, the processor(s) 410 of FIG. 4, the CODEC
434 of FIG. 4, vocoder decoder 438 of FIG. 4, or a combination
thereof.
[0117] The apparatus may also include means for adjusting a delay
of a buffer based on the partial frame recovery rate and based on
the frame erasure rate. For example, the means for adjusting the
delay of the buffer may include the analyzer 122 of FIG. 1, the
memory 176 of FIG. 1, the FER data 154 of FIG. 1, the packet
recovery rate threshold 136 of FIG. 1, the buffer depth data 110 of
FIG. 1, the buffer 126 of FIG. 1, the analysis data 120 of FIG. 1,
the speech decoder 156 of FIG. 1, the processor 406 of FIG. 4, the
processor(s) 410 of FIG. 4, the CODEC 434 of FIG. 4, vocoder
decoder 438 of FIG. 4, or a combination thereof.
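The two threshold comparisons and the resulting delay adjustment described above can be sketched in Python. This is a hedged illustration only: the function name, the default threshold values, and the fixed 20 ms adjustment step are assumptions for demonstration, not the claimed implementation.

```python
def adjust_buffer_delay(current_delay_ms: float,
                        partial_frame_recovery_rate: float,
                        frame_erasure_rate: float,
                        recovery_rate_threshold: float = 0.5,
                        fer_threshold: float = 0.02,
                        step_ms: float = 20.0) -> float:
    """Adjust a jitter buffer delay based on two comparisons: the
    partial frame recovery rate against a first threshold and the
    frame erasure rate against a second threshold.

    Thresholds and the 20 ms step are illustrative assumptions only.
    """
    if frame_erasure_rate > fer_threshold:
        if partial_frame_recovery_rate >= recovery_rate_threshold:
            # Many lost frames are partially recoverable, so a shorter
            # delay may suffice; shrink the buffer.
            return max(0.0, current_delay_ms - step_ms)
        # Losses are high and not recoverable; grow the buffer to
        # wait longer for late packets.
        return current_delay_ms + step_ms
    # Losses are within tolerance; leave the delay unchanged.
    return current_delay_ms
```

One design point worth noting: the erasure-rate comparison gates the recovery-rate comparison, so the delay is only adjusted at all when losses are significant.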
[0118] Referring to FIG. 5, a block diagram of a particular
illustrative example of a base station 500 is depicted. In various
implementations, the base station 500 may have more components or
fewer components than illustrated in FIG. 5. In an illustrative
example, the base station 500 includes the destination device 102
of FIG. 1. In an illustrative example, the base station 500 may
operate according to the techniques described with reference to
FIGS. 1-4.
[0119] The base station 500 may be part of a wireless communication
system. The wireless communication system may include multiple base
stations and multiple wireless devices. The wireless communication
system may be a Long Term Evolution (LTE) system, a Code Division
Multiple Access (CDMA) system, a Global System for Mobile
Communications (GSM) system, a wireless local area network (WLAN)
system, or some other wireless system. A CDMA system may implement
Wideband CDMA (WCDMA), CDMA 1X, Evolution-Data Optimized (EVDO),
Time Division Synchronous CDMA (TD-SCDMA), or some other version of
CDMA.
[0120] The wireless device may also be referred to as user
equipment (UE), a mobile station, a terminal, an access terminal, a
subscriber unit, a station, etc. The wireless device may include a
cellular phone, a smartphone, a tablet, a wireless modem, a
personal digital assistant (PDA), a handheld device, a laptop
computer, a smartbook, a netbook, a cordless phone, a
wireless local loop (WLL) station, a Bluetooth device, etc. The
wireless device may include or correspond to the device 400 of FIG.
4.
[0121] Various functions may be performed by one or more components
of the base station 500 (and/or in other components not shown),
such as sending and receiving messages and data (e.g., audio data).
In a particular example, the base station 500 includes a processor
506 (e.g., a CPU). The base station 500 includes a transcoder 510.
The transcoder 510 includes an audio CODEC 508. For example, the
transcoder 510 may include one or more components (e.g., circuitry)
configured to perform operations of the audio CODEC 508. As another
example, the transcoder 510 is configured to execute one or more
computer-readable instructions to perform the operations of the
audio CODEC 508. Although the audio CODEC 508 is illustrated as a
component of the transcoder 510, in other examples one or more
components of the audio CODEC 508 may be included in the processor
506. For example, a decoder 538 (e.g., a vocoder decoder) may be
included in a receiver data processor 564. As another example, an
encoder 536 (e.g., a vocoder encoder) may be included in a
transmission data processor 582. The audio CODEC 508 includes the
encoder 536 and the decoder 538. The decoder 538 includes the
speech decoder 156 of FIG. 1.
[0122] The transcoder 510 may function to transcode messages and
data between two or more networks. The transcoder 510 may be
configured to convert messages and audio data from a first format
(e.g., a digital format) to a second format. To illustrate, the
decoder 538 may decode encoded signals having a first format and
the encoder 536 may encode the decoded signals into encoded signals
having a second format. Additionally or alternatively, the
transcoder 510 may be configured to perform data rate adaptation.
For example, the transcoder 510 may down-convert a data rate or
up-convert the data rate without changing a format of the audio
data. To illustrate, the transcoder 510 may down-convert 64 kbit/s
signals into 16 kbit/s signals.
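The decode-then-re-encode flow of a transcoder such as the transcoder 510 can be sketched as follows. The function names and the toy 16-bit-to-8-bit PCM "formats" below are assumptions for demonstration, not the codecs or rates the transcoder actually uses; real speech codecs are far more involved.

```python
import struct
from typing import Callable, Iterable, List

def transcode(frames: Iterable[bytes],
              decode: Callable[[bytes], List[int]],
              encode: Callable[[List[int]], bytes]) -> List[bytes]:
    # Decode each frame from the first format, then re-encode the
    # decoded samples into the second format.
    return [encode(decode(frame)) for frame in frames]

# Toy formats for illustration: 16-bit little-endian PCM in,
# 8-bit PCM out, which also lowers the data rate.
def decode_pcm16(frame: bytes) -> List[int]:
    return list(struct.unpack("<%dh" % (len(frame) // 2), frame))

def encode_pcm8(samples: List[int]) -> bytes:
    # Keep only the high byte of each 16-bit sample.
    return bytes((s >> 8) & 0xFF for s in samples)
```

Because `transcode` takes the decoder and encoder as parameters, the same skeleton covers both format conversion and rate adaptation.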
[0123] The base station 500 includes a memory 532, such as a
computer-readable storage device, that includes instructions. The
instructions may include one or more instructions that are
executable by the processor 506, the transcoder 510, or a
combination thereof, to perform one or more operations described
with reference to the methods and systems of FIGS. 1-4. For
example, the instructions may cause the processor 506 to perform
operations including determining a partial frame recovery rate of
lost frames at the base station 500 and adjusting a delay of a
buffer based at least in part on the partial frame recovery rate.
The base station 500 may include multiple transmitters and
receivers (e.g., transceivers), such as a first transceiver 552 and
a second transceiver 554, coupled to an array of antennas. The
array of antennas includes a first antenna 542 and a second antenna
544. The array of antennas may be configured to wirelessly
communicate with one or more wireless devices, such as the device
400 of FIG. 4. For example, the second antenna 544 may receive a
data stream 514 (e.g., a bit stream) from a wireless device. The
data stream 514 may include messages, data (e.g., encoded speech
data), or a combination thereof.
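The partial frame recovery rate that the instructions above determine can be expressed as a simple ratio. The counts-based definition below is an assumption for illustration; the application does not bind the rate to this exact formula.

```python
def partial_frame_recovery_rate(num_partially_recovered: int,
                                num_lost: int) -> float:
    """Fraction of lost frames that could be at least partially
    recovered (e.g., from partial redundant data carried in a
    later-arriving packet). Returns 0.0 when no frames were lost.
    """
    if num_lost == 0:
        return 0.0
    return num_partially_recovered / num_lost
```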
[0124] The base station 500 includes a network connection 560, such
as a backhaul connection. The network connection 560 may be
configured to communicate with a core network or one or more base
stations of the wireless communication network. For example, the
base station 500 may receive a second data stream (e.g., messages
or audio data) from a core network via the network connection 560.
The base station 500 may process the second data stream to generate
messages or audio data and provide the messages or the audio data
to one or more wireless devices via one or more antennas of the
array of antennas or to another base station via the network
connection 560. In a particular implementation, the network
connection 560 may be a wide area network (WAN) connection, as an
illustrative, non-limiting example. In some implementations, the
core network may include or correspond to a packet backbone
network.
[0125] The base station 500 includes a media gateway 570 that is
coupled to the network connection 560 and the processor 506. The
media gateway 570 may be configured to convert between media
streams of different telecommunications technologies. For example,
the media gateway 570 may convert between different transmission
protocols, different coding schemes, or both. To illustrate, the
media gateway 570 may convert from PCM signals to Real-Time
Transport Protocol (RTP) signals, as an illustrative, non-limiting
example. The media gateway 570 may convert data between packet
switched networks (e.g., a Voice Over Internet Protocol (VoIP)
network, an IP Multimedia Subsystem (IMS), a fourth generation (4G)
wireless network, such as LTE, WiMax, and UMB, etc.), circuit
switched networks, and hybrid networks (e.g., a second generation
(2G) wireless network, such as GSM, GPRS, and EDGE, a third
generation (3G) wireless network, such as WCDMA, EV-DO, and HSPA,
etc.).
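The PCM-to-RTP conversion mentioned above amounts to prefixing each PCM frame with an RTP header. A minimal sketch, assuming the standard 12-byte header of RFC 3550 with no padding, extension, or CSRC list (payload type 0 is PCMU, i.e., G.711 mu-law, per RFC 3551):

```python
import struct

def pcm_to_rtp(pcm_payload: bytes, seq: int, timestamp: int,
               ssrc: int, payload_type: int = 0) -> bytes:
    """Wrap a PCM frame in a minimal 12-byte RTP header (RFC 3550)."""
    header = struct.pack("!BBHII",
                         0x80,                 # V=2, P=0, X=0, CC=0
                         payload_type & 0x7F,  # M=0, 7-bit payload type
                         seq & 0xFFFF,         # sequence number
                         timestamp & 0xFFFFFFFF,
                         ssrc & 0xFFFFFFFF)    # synchronization source
    return header + pcm_payload
```

For 8 kHz G.711, a 20 ms frame is 160 payload bytes, and the timestamp advances by 160 per packet.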
[0126] Additionally, the media gateway 570 includes a transcoder
and may be configured to transcode data when codecs are
incompatible. For example, the media gateway 570 may transcode
between an Adaptive Multi-Rate (AMR) codec and a G.711 codec, as an
illustrative, non-limiting example. The media gateway 570 may
include a router and a plurality of physical interfaces. In some
implementations, the media gateway 570 may also include a
controller (not shown). In a particular implementation, the media
gateway controller may be external to the media gateway 570,
external to the base station 500, or both. The media gateway
controller may control and coordinate operations of multiple media
gateways. The media gateway 570 may receive control signals from
the media gateway controller and may function to bridge between
different transmission technologies and may add service to end-user
capabilities and connections.
[0127] The base station 500 includes a demodulator 562 that is
coupled to the transceivers 552, 554, the receiver data processor
564, and the processor 506, and the receiver data processor 564 may
be coupled to the processor 506. The demodulator 562 may be
configured to demodulate modulated signals received from the
transceivers 552, 554 and to provide demodulated data to the
receiver data processor 564. The receiver data processor 564 may be
configured to extract a message or audio data from the demodulated
data and send the message or the audio data to the processor
506.
[0128] The base station 500 includes a transmission data processor
582 and a transmission multiple input-multiple output (MIMO)
processor 584. The transmission data processor 582 may be coupled
to the processor 506 and the transmission MIMO processor 584. The
transmission MIMO processor 584 may be coupled to the transceivers
552, 554 and the processor 506. In some implementations, the
transmission MIMO processor 584 may be coupled to the media gateway
570. The transmission data processor 582 may be configured to
receive the messages or the audio data from the processor 506 and
to code the messages or the audio data based on a coding scheme,
such as CDMA or orthogonal frequency-division multiplexing (OFDM),
as illustrative, non-limiting examples. The transmission data
processor 582 may provide the coded data to the transmission MIMO
processor 584.
[0129] The coded data may be multiplexed with other data, such as
pilot data, using CDMA or OFDM techniques to generate multiplexed
data. The multiplexed data may then be modulated (i.e., symbol
mapped) by the transmission data processor 582 based on a
particular modulation scheme (e.g., Binary phase-shift keying
("BPSK"), Quadrature phase-shift keying ("QPSK"), M-ary phase-shift
keying ("M-PSK"), M-ary Quadrature amplitude modulation ("M-QAM"),
etc.) to generate modulation symbols. In a particular
implementation, the coded data and other data may be modulated
using different modulation schemes. The data rate, coding, and
modulation for each data stream may be determined by instructions
executed by the processor 506.
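Symbol mapping of the kind described above can be illustrated for QPSK. The Gray-coded constellation below is one common convention, assumed here for demonstration; the actual mapping is scheme-dependent and chosen by the processor 506.

```python
import math

# Gray-coded QPSK: each bit pair selects one of four unit-energy
# constellation points (adjacent points differ in one bit).
QPSK = {
    (0, 0): complex(1, 1) / math.sqrt(2),
    (0, 1): complex(-1, 1) / math.sqrt(2),
    (1, 1): complex(-1, -1) / math.sqrt(2),
    (1, 0): complex(1, -1) / math.sqrt(2),
}

def map_qpsk(bits):
    """Map an even-length bit sequence onto QPSK modulation symbols."""
    assert len(bits) % 2 == 0, "QPSK consumes bits two at a time"
    return [QPSK[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]
```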
[0130] The transmission MIMO processor 584 may be configured to
receive the modulation symbols from the transmission data processor
582 and may further process the modulation symbols and may perform
beamforming on the data. For example, the transmission MIMO
processor 584 may apply beamforming weights to the modulation
symbols. The beamforming weights may correspond to one or more
antennas of the array of antennas from which the modulation symbols
are transmitted.
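Applying beamforming weights to modulation symbols, as described above, is a per-antenna complex multiplication. A minimal sketch, with a hypothetical two-antenna weight vector chosen purely for illustration:

```python
def apply_beamforming(symbols, weights):
    """Produce one weighted copy of the symbol stream per antenna:
    antenna k transmits weights[k] * s for each modulation symbol s,
    steering the combined transmitted wavefront.
    """
    return [[w * s for s in symbols] for w in weights]

# Two-antenna example: the second antenna applies a 90-degree phase
# shift (weight j) relative to the first.
streams = apply_beamforming([1 + 0j, 0 + 1j], [1 + 0j, 1j])
```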
[0131] During operation, the second antenna 544 of the base station
500 may receive a data stream 514. The second transceiver 554 may
receive the data stream 514 from the second antenna 544 and may
provide the data stream 514 to the demodulator 562. The demodulator
562 may demodulate modulated signals of the data stream 514 and
provide demodulated data to the receiver data processor 564. The
receiver data processor 564 may extract audio data from the
demodulated data and provide the extracted audio data to the
processor 506.
[0132] The processor 506 may provide the audio data to the
transcoder 510 for transcoding. The decoder 538 of the transcoder
510 may decode the audio data from a first format into decoded
audio data and the encoder 536 may encode the decoded audio data
into a second format. In some implementations, the encoder 536 may
encode the audio data using a higher data rate (e.g., up-convert)
or a lower data rate (e.g., down-convert) than received from the
wireless device. In other implementations, the audio data may not
be transcoded. Although transcoding (e.g., decoding and encoding)
is illustrated as being performed by a transcoder 510, the
transcoding operations (e.g., decoding and encoding) may be
performed by multiple components of the base station 500. For
example, decoding may be performed by the receiver data processor
564 and encoding may be performed by the transmission data
processor 582. In other implementations, the processor 506 may
provide the audio data to the media gateway 570 for conversion to
another transmission protocol, coding scheme, or both. The media
gateway 570 may provide the converted data to another base station
or core network via the network connection 560.
[0133] The decoder 538 may determine a partial frame recovery rate
of lost frames at the base station 500 (e.g., a receiving
terminal). The decoder 538 may also adjust a delay of a buffer
based at least in part on the partial frame recovery rate.
[0134] The transcoded audio data from the transcoder 510 may be
provided to the transmission data processor 582 for coding
according to a modulation scheme, such as OFDM, to generate the
modulation symbols. The transmission data processor 582 may provide
the modulation symbols to the transmission MIMO processor 584 for
further processing and beamforming. The transmission MIMO processor
584 may apply beamforming weights to the modulation symbols and
provide the resulting signals to one or more antennas of the array
of antennas, such as the first antenna 542 via the first
transceiver 552. Thus, the base station 500 may provide a
transcoded data stream 516, which corresponds to the data stream 514
received from the wireless device, to another wireless device. The
transcoded data stream 516 may have a different encoding format,
data rate, or both, than the data stream 514. In other
implementations, the transcoded data stream 516 may be provided to
the network connection 560 for transmission to another base station
or a core network.
[0135] Those of skill would further appreciate that the various
illustrative logical blocks, configurations, modules, circuits, and
algorithm steps described in connection with the implementations
disclosed herein may be implemented as electronic hardware,
computer software executed by a processor, or combinations of both.
Various illustrative components, blocks, configurations, modules,
circuits, and steps have been described above generally in terms of
their functionality. Whether such functionality is implemented as
hardware or processor executable instructions depends upon the
particular application and design constraints imposed on the
overall system. Skilled artisans may implement the described
functionality in varying ways for each particular application, but
such implementation decisions should not be interpreted as causing a
departure from the scope of the present disclosure.
[0136] The steps of a method or algorithm described in connection
with the implementations disclosed herein may be embodied directly
in hardware, in a software module executed by a processor, or in a
combination of the two. A software module may reside in random
access memory (RAM), flash memory, read-only memory (ROM),
programmable read-only memory (PROM), erasable programmable
read-only memory (EPROM), electrically erasable programmable
read-only memory (EEPROM), registers, hard disk, a removable disk,
a compact disc read-only memory (CD-ROM), or any other form of
non-transitory storage medium known in the art. An exemplary storage
medium is coupled to the processor such that the processor may read
information from, and write information to, the storage medium. In
the alternative, the storage medium may be integral to the
processor. The processor and the storage medium may reside in an
application-specific integrated circuit (ASIC). The ASIC may reside
in a computing device or a user terminal. In the alternative, the
processor and the storage medium may reside as discrete components
in a computing device or user terminal.
[0137] The previous description of the disclosed implementations is
provided to enable a person skilled in the art to make or use the
disclosed implementations. Various modifications to these
implementations will be readily apparent to those skilled in the
art, and the principles defined herein may be applied to other
implementations without departing from the scope of the disclosure.
Thus, the present disclosure is not intended to be limited to the
implementations shown herein and is to be accorded the widest scope
possible consistent with the principles and novel features as
defined by the following claims.
* * * * *