U.S. patent number 9,135,925 [Application Number 12/529,239] was granted by the patent office on 2015-09-15 for apparatus and method of enhancing quality of speech codec.
This patent grant is currently assigned to ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE. The grantee listed for this patent is Do-Young Kim, Byung-Sun Lee, Mi-Suk Lee. Invention is credited to Do-Young Kim, Byung-Sun Lee, Mi-Suk Lee.
United States Patent 9,135,925
Lee, et al.
September 15, 2015
Apparatus and method of enhancing quality of speech codec
Abstract
An apparatus and method of improving the quality of a speech
codec are provided. In the method, a first energy of a signal
decoded by a core codec is calculated, and a second energy of a
signal decoded by a low-band enhancement mode is calculated. Then,
when the first energy is less than a first threshold value or less
than a product of the second energy and a second threshold value, a
size of the decoded signal is scaled. Accordingly, generation of a
quantization error with respect to a silence segment is
reduced.
Inventors: Lee; Mi-Suk (Daejeon, KR), Kim; Do-Young (Daejeon, KR), Lee; Byung-Sun (Daejeon, KR)
Applicant:
Name | City | State | Country | Type
Lee; Mi-Suk | Daejeon | N/A | KR |
Kim; Do-Young | Daejeon | N/A | KR |
Lee; Byung-Sun | Daejeon | N/A | KR |
Assignee: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE (Daejeon, KR)
Family ID: 40990094
Appl. No.: 12/529,239
Filed: November 28, 2008
PCT Filed: November 28, 2008
PCT No.: PCT/KR2008/007024
371(c)(1),(2),(4) Date: August 31, 2009
PCT Pub. No.: WO2009/072777
PCT Pub. Date: June 11, 2009
Prior Publication Data
Document Identifier | Publication Date
US 20100057449 A1 | Mar 4, 2010
Foreign Application Priority Data
Dec 6, 2007 [KR] 10-2007-0126371
Jan 28, 2008 [KR] 10-2008-0008590
Current U.S. Class: 1/1
Current CPC Class: G10L 21/0208 (20130101); G10L 21/0316 (20130101); H03G 3/341 (20130101); G10L 19/24 (20130101); G11B 20/10527 (20130101); G11B 2020/10555 (20130101); G10L 21/038 (20130101)
Current International Class: G10L 21/00 (20130101); G10L 21/0316 (20130101); G11B 20/10 (20060101); H03G 3/34 (20060101); G10L 21/0208 (20130101); G10L 19/24 (20130101); G10L 21/038 (20130101)
Field of Search: 704/227,200.1,223,233,230,206,203,210,219,262,205; 381/23,119; 713/300
References Cited
U.S. Patent Documents
Foreign Patent Documents
1551516 | Oct 2012 | CN
0 655 731 | May 1995 | EP
1 475 782 | Nov 2004 | EP
07-1523395 | Jun 1995 | JP
07-193548 | Jul 1995 | JP
07-307632 | Nov 1995 | JP
08-046517 | Feb 1996 | JP
11-330977 | Nov 1999 | JP
3-437264 | Aug 2003 | JP
2004-272292 | Sep 2004 | JP
2004-302258 | Oct 2004 | JP
10-0544731 | Jan 2006 | KR
00/07178 | Feb 2000 | WO
01/30049 | Apr 2001 | WO
Other References
Roch Lefebvre, et al; "Shaping Coding Noise with Frequency-Domain
Companding", 1997 IEEE Workshop on Speech Coding for
Telecommunications Proceeding, Sep. 7-10, 1997, pp. 61-62. cited by
applicant.
International Search Report: PCT/KR2008/007024. cited by applicant.
Yusuke Hiwasaki, et al; "A G.711 Embedded Wideband Speech Coding
for VoIP Conferences", The Institute of Electronics, Information
and Communication Engineers, vol. E89-D, No. 9, Sep. 2006, pp.
2542-2552. cited by applicant.
Jongmo Sung, et al; "A draft recommendation of G.711WBE",
International Telecommunication Union, ITU-T WP3/16, Document
AC-0801-Q10-10, Jan. 2008, 68 pages. cited by applicant.
Primary Examiner: Colucci; Michael
Attorney, Agent or Firm: Ladas & Parry LLP
Claims
The invention claimed is:
1. An apparatus for improving speech signal of a speech codec
comprising, wherein the speech codec includes a low-band codec: an
energy calculation unit configured to calculate an energy of the
speech signal decoded by the low-band codec; a scaling unit
configured to scale the size of the decoded speech signal by a gain
that is less than 1; and wherein the gain is based on energies of
current and previous frames of the low-band decoded speech signal,
wherein the scaling unit determines whether to perform scaling
based on the power of a low-band and high-band decoded speech
signal, wherein the low-band decoded speech signal is a signal
decoded by a low-band codec when a low-band enhancement mode is
disabled, wherein the low-band codec uses a G.711 codec, wherein
the scaling unit performs the scaling in a silent segment, and
wherein the energy calculation unit calculates the energy of the
low-band decoded speech signal in units of frames, an energy of one
of the frames is calculated by summing energies of samples.
2. A method for improving speech signal of a speech codec
comprising, wherein the speech codec includes a low-band codec:
calculating an energy of the low-band speech signal decoded by the
low-band codec and low-band enhancement mode; scaling a size of the
decoded speech signal by a gain, when the gain is less than a
threshold value; and outputting decoded and scaled speech signal to
one or more of a transmitting unit, a memory unit and an audio
speaker unit, wherein the gain is based on energies of current and
previous frames of the low-band decoded speech signal, wherein the
scaling unit determines whether to perform scaling based on the
power of a low-band and high-band decoded speech signal, wherein
the low-band codec uses a G.711 codec, wherein the scaling is
performed in a silent segment, and wherein the calculating of the
energy calculates the energy of the decoded speech signal in units
of frames, an energy of one of the frames is calculated by summing
energies of samples.
3. A method for improving a speech signal of a speech codec
comprising: calculating an energy of the speech signal decoded by a
low-band codec; calculating an energy of the speech signal decoded
by a low-band enhancement mode; determining whether the energy
calculated for the speech signal decoded by the low-band codec is
less than a product of the energy calculated for the speech signal
decoded by the low-band enhancement mode and a predetermined
threshold value; and scaling the signal decoded by the low-band
codec.
Description
TECHNICAL FIELD
The present invention relates to a speech codec, and more
particularly, to an apparatus and method for reducing quality
degradation caused by an error in quantization of a silence segment
upon speech coding.
BACKGROUND ART
A module for compressing a speech signal is called an encoder, and
a module for decompressing a compressed speech signal is called a
decoder. The most basic speech codec is an ITU-T G.711 codec which
samples an input signal at 8 kHz and quantizes the sampled input
signal to 8 bits. Here, in order to increase quantization
efficiency, an A-law log quantizer as shown in Equation 1 or a
µ-law log quantizer as shown in Equation 2 is used.
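$$F(x) = \operatorname{sgn}(x)\begin{cases}\dfrac{A|x|}{1+\ln A}, & 0 \le |x| \le \dfrac{1}{A} \\[4pt] \dfrac{1+\ln(A|x|)}{1+\ln A}, & \dfrac{1}{A} \le |x| \le 1\end{cases} \qquad \text{(Equation 1)}$$
$$F(x) = \operatorname{sgn}(x)\,\dfrac{\ln(1+\mu|x|)}{\ln(1+\mu)} \qquad \text{(Equation 2)}$$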
A conventional log quantizer as described above applies different
quantizing intervals according to the magnitudes of input signals.
For example, a relatively narrow quantizing interval is set for a
signal having a small magnitude, that is, a signal highly likely to
be generated, and a relatively wide quantizing interval is set for
a signal having a large magnitude. Accordingly, the efficiency
of quantization is increased.
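The magnitude-dependent quantizing interval can be illustrated with a
minimal Python sketch of the continuous companding curves of Equations
1 and 2. It is an illustration only, not the segmented piecewise-linear
tables that G.711 actually specifies. The parameter values A = 87.6
and µ = 255 are the values commonly used with G.711, and the helper
names (`a_law_compress`, `mu_law_compress`, `mu_law_expand`,
`input_step`) are hypothetical names chosen for this example.

```python
import math


def a_law_compress(x: float, a: float = 87.6) -> float:
    """Continuous A-law compression of x in [-1, 1] (Equation 1)."""
    ax = abs(x)
    if ax < 1.0 / a:
        y = a * ax / (1.0 + math.log(a))
    else:
        y = (1.0 + math.log(a * ax)) / (1.0 + math.log(a))
    return math.copysign(y, x)


def mu_law_compress(x: float, mu: float = 255.0) -> float:
    """Continuous mu-law compression of x in [-1, 1] (Equation 2)."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y: float, mu: float = 255.0) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def input_step(x: float, levels: int = 256, mu: float = 255.0) -> float:
    """Approximate input-domain width of one quantizing interval near x.

    Assumes a uniform quantizer with `levels` steps applied to the
    compressed value (an illustrative model, not the G.711 tables).
    """
    delta = 2.0 / levels
    y = mu_law_compress(x, mu)
    y_next = min(y + delta, 1.0)
    return mu_law_expand(y_next, mu) - mu_law_expand(y, mu)


if __name__ == "__main__":
    for level in (0.01, 0.1, 0.5):
        print(level, input_step(level))
```

Under these assumptions, the interval near a small input such as 0.01
comes out much narrower than the interval near 0.5, which is the
behavior the paragraph above describes.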
It is well known that quantization noise is evenly distributed over
the entire bandwidth. However, according to the characteristics of
human hearing, a quantization error existing in a segment of a
signal having a large magnitude is not clearly heard as it is
buried in the signal, and a quantization error existing in a
segment of a signal having a small magnitude is easily heard as a
noise. Accordingly, not only a speech segment but also a silence
segment needs to be effectively coded because the coding of the
silence segment affects the overall performance of a codec. In
other words, noise caused by a quantization error in a silence
segment may affect the overall sound quality.
A codec may perform differently depending on the magnitude of an
input signal. In order to evaluate the performance of a speech
codec, signals having different magnitudes, for example, signals of
-16, -26, and -36 dBoV, are usually used. In other words, a codec
is evaluated on how its performance varies according to a change in
the amplitude of an input signal.
In a codec such as G.711 or G.722, noise is generated due to a
quantization error with respect to an input signal of -36 dBoV. In
particular, a quantization error generated in a silence segment of
the input signal serves as a factor in reducing the overall quality
of the codec. Results of a subjective hearing test show that a mean
opinion score (MOS) with respect to the input signal of -26 dBoV is
higher than that with respect to the input signal of -36 dBoV.
DISCLOSURE OF INVENTION
Technical Problem
The present invention provides an apparatus and method of enhancing
the quality of a speech codec, by which sound quality can be
enhanced by reducing noise caused by a quantization error in a
silence segment during speech coding so that the noise is not heard
by a listener.
Technical Solution
According to an aspect of the present invention, there is provided
a speech codec quality improving apparatus comprising: a first
energy calculation unit calculating a first energy of a signal
decoded by a core codec; and a scaling unit scaling a size of the
decoded signal when the first energy is less than a first threshold
value.
According to another aspect of the present invention, there is
provided a speech codec quality improving method comprising:
calculating a first energy of a signal decoded by a core codec; and
scaling a size of the decoded signal when the first energy is less
than a first threshold value.
Advantageous Effects
According to the present invention, the quality of a speech codec
can be improved by reducing noise generated due to a quantization
error with respect to a mute section. In particular, sound quality
can be enhanced by reducing a quantization error generated in a
mute section, that is, in a section where the input signal of a
codec has a small magnitude.
DESCRIPTION OF DRAWINGS
The above and other features and advantages of the present
invention will become more apparent by describing in detail
exemplary embodiments thereof with reference to the attached
drawings in which:
FIG. 1 illustrates a wideband extension codec using a narrowband
core codec according to an embodiment of the present invention;
FIGS. 2A and 2B illustrate spectrums of an input signal of an
encoder and an output signal of a decoder that use a
G.711 codec;
FIG. 3 illustrates a structure of a speech codec quality improving
apparatus according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating a speech codec quality improving
method according to an embodiment of the present invention; and
FIGS. 5A and 5B illustrate a spectrum of an output signal of a
decoder using a G.711 codec when a speech codec quality improving
method according to the present invention is applied and a spectrum
of the output signal of the decoder using the G.711 codec when the
speech codec quality improving method according to the present
invention is not applied.
BEST MODE
According to an aspect of the present invention, there is provided
a speech codec quality improving apparatus comprising: a first
energy calculation unit calculating a first energy of a signal
decoded by a core codec; and a scaling unit scaling a size of the
decoded signal when the first energy is less than a first threshold
value.
According to another aspect of the present invention, there is
provided a speech codec quality improving method comprising:
calculating a first energy of a signal decoded by a core codec; and
scaling a size of the decoded signal when the first energy is less
than a first threshold value.
Mode For Invention
An apparatus and method of improving the quality of a speech codec
according to the present invention will now be described more fully
with reference to the accompanying drawings, in which exemplary
embodiments of the invention are shown.
FIG. 1 illustrates a wideband extension codec using a narrowband
codec according to an embodiment of the present invention.
Referring to FIG. 1, the wideband extension codec is divided into a
transmission side 100 and a reception side 150. The transmission
side 100 includes a low-pass filter 105, a high-pass filter 110, a
narrowband core codec 115, a low-band enhancement mode 120, a
wideband extension mode 125, and a MUX 130. The reception side 150
includes a DEMUX 155, a narrowband core codec 160, a low-band
enhancement mode 165, a wideband extension mode 170, a low-pass
filter 175, and a high-pass filter 180.
A wideband input signal input to the transmission side 100 is
divided into a low-band signal and a high-band signal while passing
through the low-pass filter 105 and the high-pass filter 110,
respectively. The low-band signal is coded by the narrowband core
codec 115 and the low-band enhancement mode 120. The high-band
signal is coded by the wideband extension mode 125. The low-band
signal coded by the narrowband core codec 115 and the low-band
enhancement mode 120 and the high-band signal coded by the wideband
extension mode 125 are output as a bitstream via the MUX 130.
The low-band enhancement mode 120 codes a part of the low-band
signal that has not been expressed by the narrowband core codec
115, thereby improving the quality of a narrowband signal. In
general, the algorithm used by the low-band enhancement mode 120 is
determined according to the narrowband core codec 115. The low-band
enhancement mode 120 mainly uses an algorithm that operates in the
time domain, whereas the wideband extension mode 125 uses an
algorithm that operates in the frequency domain.
The DEMUX 155 of the reception side 150 receives the bitstream from
the transmission side 100 and outputs the bitstream to the
narrowband core codec 160, the low-band enhancement mode 165, and
the wideband extension mode 170. A determination as to whether the
low-band enhancement mode 165 and the wideband extension mode 170
operate is made according to the received bitstream.
The reception side 150 may output a wideband signal according to an
operation or non-operation of the wideband extension mode 170.
Regardless of the bandwidth of an output signal of the reception
side 150, the narrowband core codec 160 always operates. If only
the narrowband core codec 160 operates, the reception side 150 may
reproduce a basic narrowband signal. In order to reproduce a
narrowband signal of better quality, the low-band enhancement mode
165 as well as the narrowband core codec 160 needs to operate. In
addition, in order to output the wideband signal, both the
narrowband core codec 160 and the wideband extension mode 170 need
to operate. In other words, in order for the reception side 150 to
reproduce a wideband output signal, output signals of the
narrowband core codec 160 and the wideband extension mode 170 are
added together. Of course, in order to reproduce a wideband signal
of better quality, the reception side 150 adds the output signals
of the narrowband core codec 160 and the low-band enhancement mode
165 to the output signal of the wideband extension mode 170.
In the International Telecommunication Union Telecommunication
Standardization Sector (ITU-T), a standardization of a wideband
extension codec that uses a G.711 codec as a core codec and has a
structure similar to that of the wideband extension codec of FIG. 1
is in progress. In other words, a wideband extension codec based on
a G.711 codec recommended by the ITU-T uses the G.711 codec as the
narrowband core codecs 115 and 160, and can have such a structure
as illustrated in FIG. 1. However, in the case of a signal of
-36 dBoV, the G.711 codec generates noise due to a quantization
error.
FIGS. 2A and 2B illustrate spectrums of an input signal of an
encoder and an output signal of a decoder that use a
G.711 codec. FIG. 2A illustrates a spectrum of a speech signal, and
FIG. 2B illustrates a spectrum of a silence signal.
Referring to FIG. 2A, in terms of a speech signal, the spectrum of
an input signal 200 of the encoder is almost the same as that of an
output signal 210 of the decoder. However, referring to FIG. 2B, in
terms of a silence signal, the spectrum of an input signal 230 of
the encoder is different from that of an output signal 220 of the
decoder. In other words, a speech segment has a small quantization
error, whereas a silence segment has a large quantization error.
This large quantization error is heard as noise by a listener.
FIG. 3 illustrates a structure of a speech codec quality improving
apparatus according to an embodiment of the present invention.
Referring to FIG. 3, the speech codec quality improving apparatus
includes a first energy calculation unit 300, a second energy
calculation unit 310, and a scaling unit 320.
The first energy calculation unit 300 calculates an energy of a
signal decoded by a core codec (hereinafter, referred to as an
energy of a core codec). The first energy calculation unit 300
calculates the energy of the core codec in units of frames. In the
case where the G.711 codec is used as the core codec, the size of a
frame may vary according to an environment where the G.711 codec is
used. The first energy calculation unit 300 calculates the energy
of one frame by summing the energies of the samples in the
frame.
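The frame-wise energy calculation can be sketched as follows. This is
a minimal illustration that assumes the energy of a sample is its
squared amplitude and that the frame size is supplied by the caller;
the function names `frame_energy` and `frame_energies` are
hypothetical and do not appear in the specification.

```python
from typing import List, Sequence


def frame_energy(frame: Sequence[float]) -> float:
    """Energy of one frame: sum of per-sample energies (assumed s*s)."""
    return sum(s * s for s in frame)


def frame_energies(signal: Sequence[float], frame_size: int) -> List[float]:
    """Split a decoded signal into frames and return each frame's energy.

    The last frame may be shorter than frame_size.
    """
    return [
        frame_energy(signal[i:i + frame_size])
        for i in range(0, len(signal), frame_size)
    ]


if __name__ == "__main__":
    print(frame_energies([1.0, 1.0, 2.0, 2.0, 3.0], frame_size=2))
```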
When a low-band enhancement mode is in operation, the second energy
calculation unit 310 calculates an energy of a signal decoded by a
low-band enhancement mode codec (hereinafter, referred to as an
energy of an enhancement mode).
When the energy of the core codec is less than a predetermined
threshold value Thr1, the scaling unit 320 scales the size of the
signal decoded by the core codec. When the energy of the core codec
is less than a product of the energy of the enhancement mode and a
predetermined threshold Thr2, the scaling unit 320 scales the size
of the signal decoded by the core codec. The scaling unit 320 may
scale the size of the decoded signal by a constant `a` that is less
than 1. Alternatively, the scaling unit 320 may perform scaling by
multiplying the decoded signal by a gain that is less than 1 and is
proportional to a sum of an energy of a current frame (i.e., an
energy of the core codec or enhancement mode) and a previous frame
(i.e., an energy of the core codec or enhancement mode), thereby
preventing a sudden change caused by scaling. In this case, the
scaling unit 320 may calculate a size of a current scaling in
consideration of a size of a previous scaling. In other words, the
scaling unit 320 may calculate the size of the current scaling by
adding a certain rate of a gain obtained based on the energies of
the current and previous frames to a certain rate of the size of
the previous scaling. Of course, scaling may be performed in units
of samples.
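One possible reading of the gain calculation described above is
sketched below. The proportionality constant `k`, the cap `g_max`
that keeps the gain below 1, and the mixing rate `alpha` are
hypothetical parameters introduced only for illustration; the
specification states that such values are found by experimentation
and does not give them.

```python
from typing import List, Sequence


def frame_gain(e_cur: float, e_prev: float, k: float,
               g_max: float = 0.99) -> float:
    """Gain proportional to the sum of current and previous frame
    energies, capped at g_max so that it stays below 1."""
    return min(k * (e_cur + e_prev), g_max)


def smoothed_scale(gain: float, prev_scale: float,
                   alpha: float = 0.5) -> float:
    """Current scaling size: a certain rate of the new gain plus a
    certain rate of the previous scaling size."""
    return alpha * gain + (1.0 - alpha) * prev_scale


def scale_frame(frame: Sequence[float], scale: float) -> List[float]:
    """Multiply every sample of the decoded frame by the scaling size."""
    return [scale * s for s in frame]


if __name__ == "__main__":
    g = frame_gain(e_cur=10.0, e_prev=30.0, k=0.01)
    s = smoothed_scale(g, prev_scale=1.0)
    print(g, s, scale_frame([2.0, -4.0], s))
```

Because the gain grows with the frame energies, quieter frames are
attenuated more strongly, and mixing in the previous scaling size
keeps the scaling from changing abruptly between frames.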
When the majority of the energy of the decoded signal exists in a
high band, the scaling unit 320 may not perform scaling. For
example, when the energy of the signal decoded by a wideband
extension mode is greater than the energy of the core codec or
enhancement mode by at least a predetermined value, the scaling
unit 320 does not perform scaling.
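The high-band check can be sketched in a few lines. The sketch
assumes that "greater by at least a predetermined value" means an
energy difference of at least `margin`; a ratio test would be an
equally plausible reading. The names `skip_scaling` and `margin` are
hypothetical.

```python
def skip_scaling(e_high: float, e_low: float, margin: float) -> bool:
    """Return True when the wideband-extension energy exceeds the
    low-band (core codec or enhancement mode) energy by at least
    margin, in which case scaling is not performed."""
    return e_high - e_low >= margin


if __name__ == "__main__":
    print(skip_scaling(e_high=100.0, e_low=10.0, margin=50.0))
    print(skip_scaling(e_high=20.0, e_low=10.0, margin=50.0))
```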
The threshold values Thr1 and Thr2, the scaling size `a`, etc.,
which are used for scaling, may be determined by experimentation. Of
course, these values may vary according to embodiments of the
present invention.
FIG. 4 is a flowchart illustrating a speech codec quality improving
method according to an embodiment of the present invention.
Referring to FIG. 4, in operation S400, a speech codec quality
improving apparatus (hereinafter, referred to as an apparatus)
according to the present invention calculates an energy of a signal
decoded by a core codec (hereinafter, referred to as an energy of
the core codec). The size of a frame may depend on the type of
codec and an environment in which a codec is applied. As
illustrated in FIG. 1, when the present invention is applied to a
wideband extension codec using a narrowband speech codec, a
low-band enhancement mode may exist. Accordingly, in operation
S410, the apparatus determines whether the low-band enhancement
mode is in operation.
If it is determined in operation S410 that the low-band enhancement
mode is in operation, the apparatus calculates the energy of the
signal decoded by the low-band enhancement mode (hereinafter,
referred to as an energy of an enhancement mode), in operation
S430. When the energy of the core codec is less than a product of
the energy of the enhancement mode and the predetermined threshold
value Thr2 or less than the predetermined threshold value Thr1 in
operation S440, the apparatus scales the size of the signal decoded
by the core codec by the constant `a`, which is less than 1, in
operation S450.
On the other hand, if it is determined in operation S410 that the
low-band enhancement mode is not in operation, it is determined
whether the energy of the core codec is less than the predetermined
threshold value Thr1, in operation S420. If it is determined in
operation S420 that the energy of the core codec is less than the
predetermined threshold value Thr1, the apparatus scales the
decoded signal, in operation S450. Scaling is performed by
multiplying the decoded signal by a gain that is less than 1 and is
proportional to a sum of an energy of a current frame (i.e., an
energy of the core codec or enhancement mode) and a previous frame
(i.e., an energy of the core codec or enhancement mode), thereby
preventing a sudden change caused by scaling. In this case, the
size of current scaling may be calculated by adding a certain rate
of a gain obtained based on the energies of the current and
previous frames to the size of the previous scaling.
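The decision flow of operations S400 to S450 can be sketched as
follows, using the constant scaling factor `a` for simplicity. The
parameter names `thr_abs` (the threshold compared directly with the
core codec energy) and `thr_ratio` (the threshold multiplied by the
enhancement mode energy) are hypothetical, as is the convention of
passing `None` when the low-band enhancement mode is not in
operation. Sample energy is assumed to be the squared amplitude.

```python
from typing import List, Optional, Sequence


def energy(frame: Sequence[float]) -> float:
    """Frame energy as the sum of squared samples (assumption)."""
    return sum(s * s for s in frame)


def process_frame(core_frame: Sequence[float],
                  enh_frame: Optional[Sequence[float]],
                  thr_abs: float,
                  thr_ratio: float,
                  a: float) -> List[float]:
    """Scale the core-codec frame by a (< 1) when it is judged silent."""
    e_core = energy(core_frame)                       # S400
    if enh_frame is not None:                         # S410: mode active
        e_enh = energy(enh_frame)                     # S430
        silent = (e_core < thr_ratio * e_enh          # S440
                  or e_core < thr_abs)
    else:
        silent = e_core < thr_abs                     # S420
    if silent:
        return [a * s for s in core_frame]            # S450
    return list(core_frame)


if __name__ == "__main__":
    print(process_frame([0.01, -0.01], None, 0.01, 0.5, 0.5))
    print(process_frame([1.0, -1.0], None, 0.01, 0.5, 0.5))
```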
As described above, the threshold values Thr1 and Thr2, the scaling
size, etc. are previously calculated by experimentation.
FIGS. 5A and 5B illustrate a spectrum of an output signal of a
decoder using a G.711 codec when a speech codec quality improving
method according to the present invention is applied and a spectrum
of the output signal of the decoder using the G.711 codec when the
speech codec quality improving method according to the present
invention is not applied. FIG. 5A illustrates spectrums of a speech
signal, and FIG. 5B illustrates spectrums of a mute signal.
Referring to FIG. 5A, in the case of a speech signal, a spectrum of
an output signal 500 of a decoder before the speech codec quality
improving method according to the present invention is applied is
consistent with a spectrum of an output signal 510 of the decoder
after the speech codec quality improving method according to the
present invention is applied.
Referring to FIG. 5B, in the case of a mute signal, a size of an
output signal 520 of the decoder before the speech codec quality
improving method according to the present invention is applied is
greater than a size of an output signal 530 of the decoder after the
speech codec quality improving method according to the present
invention is applied. In other words, the level of the output
signal of the decoder in a mute section is decreased, leading to a
reduction in quantization error.
According to the present invention, the quality of a speech codec
can be improved by reducing noise generated due to a quantization
error with respect to a mute section. In particular, sound quality
can be enhanced by reducing a quantization error generated in a
mute section, that is, in a section where the input signal of a
codec has a small magnitude.
The invention can also be embodied as computer readable codes on a
computer readable recording medium. The computer readable recording
medium is any data storage device that can store data which can be
thereafter read by a computer system. Examples of the computer
readable recording medium include read-only memory (ROM),
random-access memory (RAM), CD-ROMs, magnetic tapes, floppy disks,
optical data storage devices, and carrier waves (such as data
transmission through the Internet). The computer readable recording
medium can also be distributed over network coupled computer
systems so that the computer readable code is stored and executed
in a distributed fashion.
While the present invention has been particularly shown and
described with reference to exemplary embodiments thereof, it will
be understood by those of ordinary skill in the art that various
changes in form and details may be made therein without departing
from the spirit and scope of the present invention as defined by
the following claims.
* * * * *