U.S. patent application number 11/912522, for an audio encoding device
and audio encoding method, was published by the patent office on
2009-03-26. This patent application is currently assigned to MATSUSHITA
ELECTRIC INDUSTRIAL CO., LTD. Invention is credited to Koji Yoshida.
United States Patent Application: 20090083041
Kind Code: A1
Inventor: Yoshida, Koji
Publication Date: March 26, 2009
AUDIO ENCODING DEVICE AND AUDIO ENCODING METHOD
Abstract
There is provided an audio encoding device capable of
effectively encoding stereo audio even when correlation between the
channels of the stereo audio is small. In the device, a monaural
signal generation unit (110) generates a monaural signal using a
first channel signal and a second channel signal contained in the
stereo signal. An encoding channel selection unit (120) selects one
of the first channel signal and the second channel signal. An
encoding unit, comprising a monaural signal encoding unit (112), a
first channel encoding unit (122), a second channel encoding unit
(124), and a switching unit (126), encodes the generated monaural
signal to obtain core layer encoded data and encodes the selected
channel signal to obtain enhancement layer encoded data corresponding
to the core layer encoded data.
Inventors: Yoshida, Koji (Kanagawa, JP)
Correspondence Address: GREENBLUM & BERNSTEIN, P.L.C., 1950 ROLAND
CLARKE PLACE, RESTON, VA 20191, US
Assignee: MATSUSHITA ELECTRIC INDUSTRIAL CO., LTD. (Osaka, JP)
Family ID: 37307977
Appl. No.: 11/912522
Filed: April 27, 2006
PCT Filed: April 27, 2006
PCT No.: PCT/JP2006/308813
371 Date: October 25, 2007
Current U.S. Class: 704/500
Current CPC Class: G10L 19/008 20130101
Class at Publication: 704/500
International Class: G10L 21/00 20060101 G10L021/00
Foreign Application Data
Date: Apr 28, 2005; Code: JP; Application Number: 2005-132366
Claims
1. A speech coding apparatus for encoding a stereo signal
comprising a first channel signal and a second channel signal, the
apparatus comprising: a monaural signal generating section that
generates a monaural signal using the first channel signal and the
second channel signal; a selecting section that selects one of the
first channel signal and the second channel signal; and a coding
section that encodes the generated monaural signal to obtain core
layer encoded data, and encodes the selected channel signal to
obtain enhancement layer encoded data corresponding to the core
layer encoded data.
2. The speech coding apparatus of claim 1, wherein: the selecting
section selects one of the first channel signal and the second
channel signal every frame; and the coding section encodes the
monaural signal and the channel signal selected every frame, every
frame.
3. The speech coding apparatus of claim 1, further comprising a
calculating section that calculates a first coding distortion
occurring when the first channel signal is selected and a second
coding distortion occurring when the second channel signal is
selected, wherein the selecting section selects the first channel
signal when the calculated first coding distortion is smaller than
the calculated second coding distortion, and selects the second
channel signal when the calculated second coding distortion is
smaller than the calculated first coding distortion.
4. The speech coding apparatus of claim 3, wherein the coding
section encodes the first channel signal and the second channel
signal to obtain first coded data and second coded data,
respectively, and outputs one of the first coded data and the
second coded data corresponding to the selected channel signal as
the enhancement layer encoded data, and comprises: an estimation
signal generating section that generates a second channel
estimation signal corresponding to the second channel using a
monaural decoded signal obtained when the coding section encodes
the monaural signal and a first channel decoded signal obtained
when the coding section encodes the first channel signal, and
generates a first channel estimation signal corresponding to the
first channel signal using the monaural decoded signal and a second
channel decoded signal obtained when the coding section encodes the
second channel signal; and a distortion calculating section that
calculates the first coding distortion based on error of the first
channel decoded signal with respect to the first channel signal and
error of the second channel estimation signal with respect to the
second channel signal, and calculates the second coding distortion
based on error of the first channel estimation signal with respect
to the first channel signal and error of the second channel
decoded signal with respect to the second channel signal.
5. The speech coding apparatus of claim 1, wherein the selecting
section comprises a calculating section that calculates a first
intra-channel correlation corresponding to the first channel signal
and a second intra-channel correlation corresponding to the second
channel signal, and selects the first channel signal when the
calculated first intra-channel correlation is greater than the
calculated second intra-channel correlation, and selects the second
channel signal when the calculated second intra-channel correlation
is greater than the calculated first intra-channel correlation.
6. The speech coding apparatus of claim 1, wherein the coding
section carries out code excited linear prediction coding of the
first channel signal using a first adaptive codebook when the first
channel signal is selected by the selecting section, obtains the
enhancement layer encoded data using code excited linear prediction
coding results and updates the first adaptive codebook using the
code excited linear prediction coding results.
7. The speech coding apparatus of claim 6, wherein the coding
section generates a second channel estimation signal corresponding
to the second channel signal using the enhancement layer encoded
data and a monaural decoded signal obtained when the monaural
signal is encoded, and updates the second adaptive codebook used in
code excited linear prediction coding of the second channel signal
using a linear prediction coding prediction residual signal for
the second channel estimation signal.
8. The speech coding apparatus of claim 7, wherein: the selecting
section correlates the first channel signal to a frame having a
subframe and selects the first channel signal; and the coding
section obtains the enhancement layer encoded data for the frame
while carrying out excitation search every subframe for the
monaural signal and the first channel signal correlated with the
frame and selected.
9. The speech coding apparatus of claim 8, wherein the coding
section updates the first adaptive codebook per subframe and
updates the second adaptive codebook per frame.
10. A mobile station apparatus comprising the speech coding
apparatus of claim 1.
11. A base station apparatus comprising the speech coding apparatus
of claim 1.
12. A speech coding method for encoding a stereo signal comprising
a first channel signal and a second channel signal, comprising the
steps of: generating a monaural signal using the first channel
signal and the second channel signal; selecting one of the first
channel signal and the second channel signal; and encoding a
generated monaural signal to obtain core layer encoded data and
encoding a selected channel signal to obtain enhancement layer
encoded data corresponding to the core layer encoded data.
Description
TECHNICAL FIELD
[0001] The present invention relates to a speech coding apparatus
and a speech coding method. More particularly, the present
invention relates to a speech coding apparatus and a speech coding
method for stereo speech.
BACKGROUND ART
[0002] As broadband transmission in mobile communication and IP
communication has become the norm and services in such
communications have diversified, speech communication of higher
sound quality and higher fidelity is in demand. For example, demand
is expected to grow for hands-free speech communication in video
telephone services, speech communication in video conferencing,
multi-point speech communication where a number of callers hold a
conversation simultaneously at a number of different locations, and
speech communication that transmits the background sound without
losing fidelity. In such cases, it is preferable to implement speech
communication using stereo speech, which offers higher fidelity than
a monaural signal and makes it possible to recognize the positions
of a number of talkers. To implement speech communication using a
stereo signal, stereo speech encoding is essential.
[0003] Further, to implement traffic control and multicast
communication in speech data communication over an IP network,
speech encoding employing a scalable configuration is preferred. A
scalable configuration is one in which the receiving side can decode
speech even from partial encoded data. Coding processing in a speech
coding scheme employing a scalable configuration is layered into a
core layer and an enhancement layer. Consequently, encoded data
generated by this coding processing includes encoded data of the
core layer and encoded data of the enhancement layer.
[0004] As a result, even when encoding and transmitting stereo
speech, it is preferable to employ a monaural-stereo scalable
configuration, where the receiving side can select between decoding
a stereo signal and decoding a monaural signal using only part of
the encoded data.
[0005] Speech coding methods employing a monaural-stereo scalable
configuration include, for example, predicting signals between
channels (abbreviated appropriately as "ch") (predicting a second
channel signal from a first channel signal, or predicting the first
channel signal from the second channel signal) using pitch
prediction between channels, that is, performing encoding utilizing
the correlation between the two channels (see Non-Patent Document 1).
Non-patent document 1: Ramprashad, S. A., "Stereophonic CELP coding
using cross channel prediction", Proc. IEEE Workshop on Speech
Coding, pp. 136-138, September 2000.
DISCLOSURE OF INVENTION
Problems to be Solved by the Invention
[0006] However, in the speech coding methods of the related art
described above, there are cases where a sufficient prediction
performance (prediction gain) cannot be obtained and coding
efficiency deteriorates when correlation between both channels is
small.
[0007] It is therefore an object of the present invention to
provide a speech coding apparatus and a speech coding method capable
of effectively coding stereo speech even when correlation between
the channels is small.
Means for Solving the Problem
[0008] The speech coding apparatus of the present invention encodes
a stereo signal comprising a first channel signal and a second
channel signal, and employs a configuration having: a monaural
signal generating section that generates a monaural signal using
the first channel signal and the second channel signal; a selecting
section that selects one of the first channel signal and the second
channel signal; and a coding section that encodes the generated
monaural signal to obtain core layer encoded data, and encodes the
selected channel signal to obtain enhancement layer encoded data
corresponding to the core layer encoded data.
[0009] The speech coding method of the present invention for
encoding a stereo signal comprising a first channel signal and a
second channel signal, includes the steps of: generating a monaural
signal using the first channel signal and the second channel
signal; selecting one of the first channel signal and the second
channel signal; and encoding a generated monaural signal to obtain
core layer encoded data and encoding a selected channel signal to
obtain enhancement layer encoded data corresponding to the core
layer encoded data.
ADVANTAGEOUS EFFECT OF THE INVENTION
[0010] The present invention can encode stereo speech effectively
when correlation between a plurality of channel signals of stereo
speech signals is low.
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] FIG. 1 is a block diagram showing a configuration of speech
coding apparatus according to Embodiment 1 of the present
invention;
[0012] FIG. 2 is a block diagram showing a configuration of speech
decoding apparatus according to Embodiment 1 of the present
invention;
[0013] FIG. 3 is a block diagram showing a configuration of speech
coding apparatus according to Embodiment 2 of the present
invention;
[0014] FIG. 4 is a block diagram showing a configuration of speech
coding apparatus according to Embodiment 3 of the present
invention;
[0015] FIG. 5 is a block diagram showing a configuration of a coding
channel selecting section according to Embodiment 3 of the present
invention;
[0016] FIG. 6 is a block diagram showing a configuration of an A-ch
coding section according to Embodiment 3 of the present
invention;
[0017] FIG. 7 is a view illustrating an example of an updating
operation for an intra-channel prediction buffer of an A-channel
according to Embodiment 3 of the present invention;
[0018] FIG. 8 is a view illustrating an example of an updating
operation for an intra-channel prediction buffer of a B-channel
according to Embodiment 3 of the present invention;
[0019] FIG. 9 is a block diagram showing a configuration of speech
coding apparatus according to Embodiment 4 of the present
invention;
[0020] FIG. 10 is a block diagram showing a configuration of an
A-ch CELP coding section according to Embodiment 4 of the present
invention;
[0021] FIG. 11 is a flowchart showing an example of an adaptive
codebook updating operation according to Embodiment 4 of the
present invention;
[0022] FIG. 12 is a view illustrating an example of an operation
for updating an A-ch adaptive codebook according to Embodiment 4 of
the present invention; and
[0023] FIG. 13 is a view illustrating an example of an operation
for updating a B-ch adaptive codebook according to Embodiment 4 of
the present invention.
BEST MODE FOR CARRYING OUT THE INVENTION
[0024] The following is a detailed description, with reference to
the appended drawings, of embodiments of the present invention
relating to speech coding with a monaural-stereo scalable
configuration.
Embodiment 1
[0025] FIG. 1 is a block diagram showing a configuration of speech
coding apparatus according to Embodiment 1 of the present
invention. Speech coding apparatus 100 of FIG. 1 is comprised of
core layer coding section 102 that is a component corresponding to
the core layer of a scalable configuration, and enhancement layer
coding section 104 that is a component corresponding to the
enhancement layer of a scalable configuration. The following is a
description assuming that each component operates in frame
units.
[0026] Core layer coding section 102 has monaural signal generating
section 110 and monaural signal coding section 112. Further,
enhancement layer coding section 104 is comprised of coding channel
selecting section 120, first ch coding section 122, second ch
coding section 124 and switching section 126.
[0027] At core layer coding section 102, monaural signal generating
section 110 generates monaural signal s_mono(n) based on the
relationship shown in equation 1 from first ch input speech signal
s_ch1(n) and second ch input speech signal s_ch2(n) (where n=0 to
NF-1; and NF is frame length) contained in a stereo input speech
signal. The stereo signal described in this embodiment is comprised
of two channel signals (i.e. a first channel signal and a second
channel signal).
[1]
s_mono(n) = ( s_ch1(n) + s_ch2(n) ) / 2   (Equation 1)
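As an illustration only (the patent provides no source code, and the
function and variable names below are hypothetical), Equation 1 amounts
to a per-sample average of the two channel signals of a frame:

    import numpy as np

    def generate_monaural(s_ch1: np.ndarray, s_ch2: np.ndarray) -> np.ndarray:
        # Equation 1: the monaural signal is the per-sample average of
        # the first channel and second channel signals.
        return (s_ch1 + s_ch2) / 2.0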
[0028] Monaural signal coding section 112 encodes monaural signal
s_mono(n) every frame. An arbitrary coding scheme may be used in
this encoding. Coded data obtained as a result of encoding monaural
signal s_mono(n) is outputted as core layer encoded data. More
specifically, core layer encoded data is multiplexed with
enhancement layer encoded data and coding channel selection
information described later and outputted from speech coding
apparatus 100 as coded transmission data.
[0029] Further, monaural signal coding section 112 decodes monaural
signal s_mono(n) and outputs the resulting monaural decoded speech
signal to first ch coding section 122 and second ch coding section
124 of enhancement layer coding section 104.
[0030] At enhancement layer coding section 104, coding channel
selecting section 120 selects an optimum channel of the first and
second channels as a channel to be subject to enhancement layer
coding based on a predetermined selection criterion using first ch
input speech signal s_ch1(n) and second ch input speech signal
s_ch2(n). The optimum channel is selected every frame. Here, the
predetermined selection criterion is a criterion for implementing
enhancement layer coding at high efficiency or high sound quality
(low coding distortion). Coding channel selecting section 120
generates coding channel selection information indicating the
selected channel. Generated coding channel selection information is
outputted to switching section 126 and is multiplexed with core
layer encoded data (described earlier) and enhancement layer
encoded data (described later).
[0031] Coding channel selecting section 120 may also use arbitrary
parameters, signals, or coding results (i.e. first ch encoded data
and second ch encoded data described later) obtained in coding
processes at first ch coding section 122 and second ch coding
section 124 rather than using first input speech signal s_ch1(n)
and second input speech signal s_ch2(n).
[0032] First ch coding section 122 encodes the first ch input
speech signal every frame using the first ch input speech signal
and the monaural decoded speech signal, and outputs first ch
encoded data obtained as a result to switching section 126.
[0033] Further, first ch coding section 122 decodes first ch
encoded data and obtains a first ch decoded speech signal. In this
embodiment, a first ch decoded speech signal obtained by first ch
coding section 122 is omitted from the drawings.
[0034] Second ch coding section 124 encodes the second ch input
speech signal every frame using the second ch input speech signal
and the monaural decoded speech signal and outputs second ch
encoded data obtained as a result to switching section 126.
[0035] Further, second ch coding section 124 decodes second ch
encoded data and obtains a second ch decoded speech signal. In this
embodiment, a second ch decoded speech signal obtained by second ch
coding section 124 is omitted from the drawings.
[0036] Switching section 126 selectively outputs one of first ch
encoded data and second ch encoded data every frame in accordance
with coding channel selection information. Outputted encoded data
is encoded data for channels selected by coding channel selecting
section 120. As a result, when the selected channel is switched
over from the first channel to the second channel, or from the
second channel to the first channel, encoded data outputted by
switching section 126 also changes from first ch encoded data to
second ch encoded data or from second ch encoded data to first ch
encoded data.
[0037] Here, a combination of monaural signal coding section 112,
first ch coding section 122, second ch coding section 124 and
switching section 126 described above constitutes a coding
section that encodes a monaural signal to obtain core layer encoded
data, encodes the selected channel signal, and obtains enhancement
layer encoded data corresponding to the core layer encoded
data.
[0038] FIG. 2 is a block diagram showing a configuration of speech
decoding apparatus capable of receiving and decoding transmitted
coded data outputted by speech coding apparatus 100 as received
coded data and obtaining a monaural decoded speech signal and a
stereo decoded speech signal. Speech decoding apparatus 150 of FIG.
2 is comprised of core layer decoding section 152 that is a
component corresponding to a core layer of a scalable
configuration, and enhancement layer decoding section 154 that is a
component corresponding to an enhancement layer of a scalable
configuration.
[0039] Core layer decoding section 152 has monaural signal decoding
section 160. Monaural signal decoding section 160 decodes core
layer encoded data contained in received coded data to obtain
monaural decoded speech signal sd_mono(n). Monaural decoded speech
signal sd_mono(n) is then outputted to a subsequent speech output
section (not shown), first ch decoding section 172, second ch
decoding section 174, first ch decoded signal generating section
176 and second ch decoded signal generating section 178.
[0040] Enhancement layer decoding section 154 is comprised of
switching section 170, first ch decoding section 172, second ch
decoding section 174, first ch decoded signal generating section
176, second ch decoded signal generating section 178, switching
section 180 and switching section 182.
[0041] Switching section 170 refers to coding channel selection
information contained in received coded data and outputs
enhancement layer encoded data contained in the received coded data
to a decoding section corresponding to the selected channel.
Specifically, when the selected channel is a first channel,
enhancement layer encoded data is outputted to first ch decoding
section 172, and, when the selected channel is a second channel,
enhancement layer encoded data is outputted to second ch decoding
section 174.
[0042] When enhancement layer encoded data is inputted from
switching section 170 to first ch decoding section 172, first ch
decoding section 172 decodes first ch decoded speech signal
sd_ch1(n) using this enhancement layer encoded data and monaural
decoded speech signal sd_mono(n) and outputs first ch decoded
speech signal sd_ch1(n) to switching section 180 and second ch
decoded signal generating section 178.
[0043] When enhancement layer encoded data is inputted from
switching section 170 to second ch decoding section 174, second ch
decoding section 174 decodes second ch decoded speech signal
sd_ch2(n) using this enhancement layer encoded data and monaural
decoded speech signal sd_mono(n) and outputs second ch decoded
speech signal sd_ch2(n) to switching section 182 and first ch
decoded signal generating section 176.
[0044] When second ch decoded speech signal sd_ch2(n) is inputted
from second ch decoding section 174, first ch decoded signal
generating section 176 generates first ch decoded speech signal
sd_ch1(n) based on the relationship shown in the following equation
2 using second ch decoded speech signal sd_ch2(n) and monaural
decoded speech signal sd_mono(n) inputted from second ch decoding
section 174. The generated first ch decoded speech signal sd_ch1(n)
is outputted to switching section 180.
[2]
sd_ch1(n) = 2 × sd_mono(n) - sd_ch2(n)   (Equation 2)
[0045] When first ch decoded speech signal sd_ch1(n) is inputted
from first ch decoding section 172, second ch decoded signal
generating section 178 generates second ch decoded speech signal
sd_ch2(n) based on the relationship shown in the following equation
3 using first ch decoded speech signal sd_ch1(n) and monaural
decoded speech signal sd_mono(n) inputted from first ch decoding
section 172. The generated second ch decoded speech signal
sd_ch2(n) is outputted to switching section 182.
[3]
sd_ch2(n) = 2 × sd_mono(n) - sd_ch1(n)   (Equation 3)
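The two generating sections are mirror images of each other, so a
single hedged sketch covers Equations 2 and 3 (hypothetical names, not
taken from the patent text):

    import numpy as np

    def reconstruct_missing_channel(sd_mono: np.ndarray,
                                    sd_known: np.ndarray) -> np.ndarray:
        # Because the monaural signal is the average of the two channels
        # (Equation 1), the channel that was not sent in the enhancement
        # layer follows from Equations 2 and 3:
        #   sd_ch1(n) = 2 * sd_mono(n) - sd_ch2(n)
        #   sd_ch2(n) = 2 * sd_mono(n) - sd_ch1(n)
        return 2.0 * sd_mono - sd_known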
[0046] Switching section 180 selectively outputs one of first ch
decoded speech signal sd_ch1(n) inputted by first ch decoding
section 172 and first ch decoded speech signal sd_ch1(n) inputted
by first ch decoded signal generating section 176 in accordance
with coding channel selection information. Specifically, when the
selected channel is the first channel, first ch decoded speech
signal sd_ch1(n) inputted by first ch decoding section 172 is
selected and outputted. On the other hand, when the selected
channel is the second channel, first ch decoded speech signal
sd_ch1(n) inputted by first ch decoded signal generating section
176 is selected and outputted.
[0047] Switching section 182 selectively outputs one of second ch
decoded speech signal sd_ch2(n) inputted by second ch decoding
section 174 and second ch decoded speech signal sd_ch2(n) inputted
by second ch decoded signal generating section 178 in accordance
with coding channel selection information. Specifically, when the
selected channel is the first channel, second ch decoded speech
signal sd_ch2(n) inputted by second ch decoded signal generating
section 178 is selected and outputted. On the other hand, when the
selected channel is the second channel, second ch decoded speech
signal sd_ch2(n) inputted by second ch decoding section 174 is
selected and outputted.
[0048] First ch decoded speech signal sd_ch1(n) outputted by
switching section 180 and second ch decoded speech signal sd_ch2(n)
outputted by switching section 182 are outputted to a subsequent
speech outputting section (not shown) as a stereo decoded speech
signal.
[0049] In this way, according to this embodiment, monaural signal
s_mono(n) generated from first ch input speech signal s_ch1(n) and
second ch input speech signal s_ch2(n) is encoded to obtain core
layer encoded data, and the input speech signal (first ch input
speech signal s_ch1(n) or second ch input speech signal s_ch2(n)) of
a channel selected from the first channel and the second channel is
encoded to obtain enhancement layer encoded data. It is therefore
possible to avoid insufficient prediction performance (prediction
gain) when correlation between a plurality of channels of a stereo
signal is small, and to encode stereo speech efficiently.
Embodiment 2
[0050] FIG. 3 is a block diagram showing a configuration of speech
coding apparatus according to Embodiment 2 of the present
invention.
[0051] Speech coding apparatus 200 of FIG. 3 has basically the same
configuration as speech coding apparatus 100 described in
Embodiment 1. Elements of this configuration described in this
embodiment that are the same as described for Embodiment 1 are
given the same reference numerals as are used in Embodiment 1 and
are not described in detail.
[0052] Further, transmitted coded data sent from speech coding
apparatus 200 can be decoded by speech decoding apparatus having
the same basic configuration as speech decoding apparatus 150
described in Embodiment 1.
[0053] Speech coding apparatus 200 is equipped with core layer
coding section 102 and enhancement layer coding section 202.
Enhancement layer coding section 202 is comprised of first ch
coding section 122, second ch coding section 124, switching section
126 and coding channel selecting section 210.
[0054] Coding channel selecting section 210 is comprised of second
ch decoded speech generating section 212, first ch decoded speech
generating section 214, first distortion calculating section 216,
second distortion calculating section 218 and coding channel
determining section 220.
[0055] Second ch decoded speech generating section 212 generates a
second ch decoded speech signal as a second ch estimation signal
based on the relationship shown in equation 1 above using a
monaural decoded speech signal obtained by monaural signal coding
section 112 and first ch decoded speech signal obtained by first ch
coding section 122. The generated second ch decoded speech signal
is then outputted to first distortion calculating section 216.
[0056] First ch decoded speech generating section 214 generates a
first ch decoded speech signal as a first ch estimation signal
based on the relationship shown in equation 1 above using a
monaural decoded speech signal obtained by monaural signal coding
section 112 and second ch decoded speech signal obtained by second
ch coding section 124. The generated first ch decoded speech signal
is then outputted to second distortion calculating section 218.
[0057] The combination of second ch decoded speech generating
section 212 and first ch decoded speech generating section 214
constitutes an estimation signal generating section.
[0058] First distortion calculating section 216 calculates first
coding distortion using a first ch decoded speech signal obtained
by first ch coding section 122 and a second ch decoded speech
signal obtained by second ch decoded speech generating section 212.
First coding distortion corresponds to coding distortion for two
channels occurring when a first channel is selected as a target
channel for enhancement layer coding. Calculated first coding
distortion is outputted to coding channel determining section
220.
[0059] Second distortion calculating section 218 calculates second
coding distortion using a second ch decoded speech signal obtained
by second ch coding section 124 and a first ch decoded speech
signal obtained by first ch decoded speech generating section 214.
Second coding distortion corresponds to coding distortion for two
channels occurring when a second channel is selected as a target
channel for coding at the enhancement layer. Calculated second
coding distortion is outputted to coding channel determining
section 220.
[0060] Here, for example, the following two methods may be used to
calculate coding distortion for two channels (first coding
distortion or second coding distortion). In one method, the error
power ratio (signal-to-coding-distortion ratio) of the decoded
speech signal of each channel (first ch decoded speech signal or
second ch decoded speech signal) with respect to the corresponding
input speech signal (first ch input speech signal or second ch input
speech signal) is averaged over the two channels, and this average
is used as the coding distortion for the two channels. In the other
method, the total of the error power over the two channels is used
as the coding distortion for the two channels.
[0061] The combination of the first distortion calculating section
216 and the second distortion calculating section 218 constitutes a
distortion calculating section. Further, the combination of this
distortion calculating section and the estimation signal generating
section described above constitutes a calculating section.
[0062] Coding channel determining section 220 compares the value of
the first coding distortion and the value of the second coding
distortion, and selects the one of the first coding distortion and
second coding distortion having the smaller value. Coding channel
determining section 220 selects a channel corresponding to the
selected coding distortion as a target channel for coding at the
enhancement layer (coding channel) and generates coding channel
selection information indicating the selected channel. More
specifically, coding channel determining section 220 selects the
first channel when first coding distortion is smaller than second
coding distortion, and selects the second channel when the second
coding distortion is smaller than the first coding distortion.
Generated coding channel selection information is outputted to
switching section 126 and is multiplexed with core layer encoded
data and enhancement layer encoded data.
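As a minimal sketch of the two distortion measures described above and
of the selection rule applied by coding channel determining section 220
(frame-aligned numpy arrays and hypothetical names are assumed; the
patent specifies no implementation):

    import numpy as np

    def total_error_power(s1, d1, s2, d2):
        # Second method: total error power over the two channels
        # (smaller is better).
        return np.sum((s1 - d1) ** 2) + np.sum((s2 - d2) ** 2)

    def mean_snr(s1, d1, s2, d2):
        # First method: two-channel average of the signal-to-coding-
        # distortion ratio (larger is better, so the comparison below
        # would be reversed when this measure is used).
        snr1 = np.sum(s1 ** 2) / np.sum((s1 - d1) ** 2)
        snr2 = np.sum(s2 ** 2) / np.sum((s2 - d2) ** 2)
        return (snr1 + snr2) / 2.0

    # dist1: two-channel distortion when the first channel is selected;
    # dist2: two-channel distortion when the second channel is selected.
    def select_coding_channel(dist1: float, dist2: float) -> int:
        return 1 if dist1 < dist2 else 2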
[0063] In this way, according to this embodiment, the magnitude of
coding distortion is used as a coding channel selection criterion,
so that it is possible to reduce coding distortion of the
enhancement layer and enable efficient stereo speech coding.
[0064] In this embodiment, a ratio or total of error power of the
decoded speech signal of each channel with respect to the
corresponding input speech signal is calculated and used as coding
distortion, but coding distortion obtained during the coding
processes at first ch coding section 122 and second ch coding
section 124 may be used instead. Further, this coding distortion may
be perceptually weighted.
Embodiment 3
[0065] FIG. 4 is a block diagram showing a configuration of speech
coding apparatus according to Embodiment 3 of the present
invention. Speech coding apparatus 300 of FIG. 4 has basically the
same configuration as speech coding apparatus 100 and 200 described
in the above embodiments. Elements of this configuration described
in this embodiment that are the same as described for the
aforementioned embodiments are given the same reference numerals as
are used in the aforementioned embodiments and are not described in
detail.
[0066] Further, transmitted coded data sent from speech coding
apparatus 300 can be decoded by speech decoding apparatus having
the same basic configuration as speech decoding apparatus 150
described in Embodiment 1.
[0067] Speech coding apparatus 300 is equipped with core layer
coding section 102 and enhancement layer coding section 302.
Enhancement layer coding section 302 is comprised of coding channel
selecting section 310, first ch coding section 312, second ch
coding section 314 and switching section 126.
[0068] As shown in FIG. 5, coding channel selecting section 310 is
comprised of first ch intra-channel correlation calculating section
320, second ch intra-channel correlation calculating section 322,
and coding channel determining section 324.
[0069] First ch intra-channel correlation calculating section 320
calculates first channel intra-channel correlation cor1 using a
normalized maximum autocorrelation factor with respect to the first
ch input speech signal.
[0070] Second ch intra-channel correlation calculating section 322
calculates second channel intra-channel correlation cor2 using a
normalized maximum autocorrelation factor with respect to the second
ch input speech signal.
[0071] Instead of the normalized maximum autocorrelation factor of
the input speech signal of each channel, the pitch prediction gain
of the input speech signal of each channel, or the maximum
autocorrelation factor or pitch prediction gain of the LPC (Linear
Prediction Coding) prediction error signal, may be used to calculate
the intra-channel correlation of each channel.
[0072] Coding channel determining section 324 compares
intra-channel correlation cor1 and cor2 and selects the one having
the higher value. Coding channel determining section 324 selects a
channel corresponding to intra-channel correlation of the selected
channel as a coding channel at the enhancement layer, and generates
coding channel selection information indicating the selected
channel. More specifically, coding channel determining section 324
selects the first channel when intra-channel correlation cor1 is
higher than intra-channel correlation cor2, and selects the second
channel when intra-channel correlation cor2 is higher than
intra-channel correlation cor1. Generated coding channel selection
information is outputted to switching section 126 and is
multiplexed with core layer encoded data and enhancement layer
encoded data.
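A minimal sketch of this selection criterion follows, under the
assumption of one common definition of the normalized maximum
autocorrelation factor (the patent does not fix the exact
normalization, and the lag range shown is an arbitrary example):

    import numpy as np

    def normalized_max_autocorr(s: np.ndarray, min_lag: int = 20,
                                max_lag: int = 147) -> float:
        # Maximum over the lag range of the normalized autocorrelation:
        # sum(s[n]*s[n-T]) / sqrt(sum(s[n]^2) * sum(s[n-T]^2)).
        best = 0.0
        for t in range(min_lag, max_lag + 1):
            num = np.dot(s[t:], s[:-t])
            den = np.sqrt(np.dot(s[t:], s[t:]) * np.dot(s[:-t], s[:-t]))
            if den > 0.0:
                best = max(best, num / den)
        return best

    # cor1 = normalized_max_autocorr(s_ch1)
    # cor2 = normalized_max_autocorr(s_ch2)
    # coding_channel = 1 if cor1 > cor2 else 2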
[0073] First ch coding section 312 and second ch coding section 314
have the same internal configuration. For ease of description, one
of first ch coding section 312 and second ch coding section 314 is
shown as "A-ch coding section 330", and its internal configuration
is described using FIG. 6. "A" of "A-ch" is 1 or 2. Further, "B" in
the drawings and in the following description is also 1 or 2.
When "A" is 1, "B" is 2, and when "A" is 2, "B" is 1.
[0074] A-ch coding section 330 is comprised of switching section
332, A-ch signal intra-channel predicting section 334, subtractors
336 and 338, A-ch prediction residual signal coding section 340,
and B-ch estimation signal generating section 342.
[0075] Switching section 332 outputs an A-ch decoded speech signal
obtained by A-ch prediction residual signal coding section 340 or
A-ch estimation signal obtained by B-ch coding section (not shown)
to A-ch signal intra-channel predicting section 334 in accordance
with coding channel selection information. Specifically, when the
selected channel is an A-channel, an A-ch decoded speech signal is
outputted to A-ch signal intra-channel predicting section 334, and
when the selected channel is a B-channel, the A-ch estimation
signal is outputted to A-ch signal intra-channel predicting section
334.
[0076] A-ch signal intra-channel predicting section 334 carries out
intra-channel prediction for the A-channel. Intra-channel
prediction is for predicting the signal of the current frame from a
signal of a past frame by utilizing correlation of signals within a
channel. An intra-channel prediction signal Sp(n) and intra-channel
predictive parameter quantized code are obtained as intra-channel
prediction results. For example, when a first-order pitch
prediction filter is used, intra-channel prediction signal Sp(n) is
calculated using the following equation 4.
[4]
Sp(n) = gp × Sin(n - T)   (Equation 4)
Here, Sin(n) is the input signal to the pitch prediction filter, T
is the lag of the pitch prediction filter, and gp is the pitch
prediction coefficient of the pitch prediction filter.
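For illustration, a sketch of Equation 4 under an assumed buffer
layout (the patent specifies only the filter equation itself):

    import numpy as np

    def pitch_predict(history: np.ndarray, frame_len: int,
                      T: int, gp: float) -> np.ndarray:
        # Equation 4: Sp(n) = gp * Sin(n - T).  history is assumed to
        # hold the intra-channel prediction buffer followed by the
        # current frame, so sample n of the current frame sits at index
        # len(history) - frame_len + n.
        offset = len(history) - frame_len
        sp = np.empty(frame_len)
        for n in range(frame_len):
            sp[n] = gp * history[offset + n - T]
        return sp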
[0077] A signal for a past frame as described above is held in an
intra-channel prediction buffer (A-ch intra-channel prediction
buffer) provided inside A-ch signal intra-channel predicting
section 334. Further, the A-ch intra-channel prediction buffer is
updated using the signal inputted by switching section 332 in order
to predict the signal for the next frame. The details of updating
the intra-channel prediction buffer are described in the
following.
[0078] Subtractor 336 subtracts the monaural decoded speech signal
from an A-ch input speech signal. Subtractor 338 subtracts
intra-channel prediction signal Sp(n) obtained as a result of
intra-channel prediction at A-ch signal intra-channel predicting
section 334 from the signal obtained by subtraction at subtractor
336. The signal obtained by subtraction at subtractor 338 (i.e. an
A-ch prediction residual signal) is outputted to A-ch prediction
residual signal coding section 340.
[0079] A-ch prediction residual signal coding section 340 encodes
the A-ch prediction residual signal using an arbitrary coding
method. Prediction residual coded data and an A-ch decoded speech
signal are obtained as a result of this encoding. Prediction
residual coded data is outputted as A-ch encoded data together with
intra-channel predictive parameter quantized code. The A-ch decoded
speech signal is outputted to B-ch estimation signal generating
section 342 and switching section 332.
[0080] B-ch estimation signal generating section 342 generates a
B-ch estimation signal, as a B-ch decoded speech signal for the case
of A-ch coding, from the A-ch decoded speech signal and
the monaural decoded speech signal. The generated B-ch estimation
signal is then outputted to a switching section (same as switching
section 332) of the B-ch coding section (not shown).
[0081] Next, a description is given of the operation of updating an
intra-channel prediction buffer. Here, the case where the A-channel
is selected by coding channel selecting section 310 is taken as an
example: an example of an operation for updating the A-channel
intra-channel prediction buffer is described using FIG. 7, and an
example of an operation for updating the B-channel intra-channel
prediction buffer is described using FIG. 8.
[0082] In the example operation shown in FIG. 7, the A-ch
intra-channel prediction buffer 351 for within the A-ch signal
intra-channel predicting section 334 is updated using an A-ch
decoded speech signal for the i-th frame (where i is an arbitrary
natural number) obtained by A-ch prediction residual signal coding
section 340 (ST101). The updated A-ch intra-channel prediction
buffer 351 can then be used in intra-channel prediction for the
(i+1)-th frame that is the next frame (ST102).
[0083] In the example operation shown in FIG. 8, an i-th frame B-ch
estimation signal is generated using an i-th frame A-ch decoded
speech signal and an i-th frame monaural decoded speech signal
(ST201). The generated B-ch estimation signal is then outputted to
a B-ch coding section (not shown) from A-ch coding section 330. At
the B-ch coding section, the B-ch estimation signal is outputted to
the B-ch signal intra-channel predicting section (the same as A-ch
signal intra-channel predicting section 334) via a switching
section (the same as switching section 332). B-ch intra-channel
prediction buffer 352 provided inside B-ch signal intra-channel
predicting section is updated using a B-ch estimation signal
(ST202). The updated B-ch intra-channel prediction buffer 352 can
then be used in intra-channel prediction for the (i+1)-th frame
(ST203).
[0084] At a certain frame, when the A-channel is selected as a
coding channel, operations other than updating of B-ch
intra-channel prediction buffer 352 are not necessary at the B-ch
coding section, and it is therefore possible to suspend coding of
the B-ch input speech signal for this frame.
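The buffer updates of FIG. 7 and FIG. 8 can be summarized in the
following sketch, where the buffers are modeled simply as lists of
frames and all names are hypothetical:

    import numpy as np

    def update_prediction_buffers(selected_ch: int,
                                  dec_selected: np.ndarray,
                                  sd_mono: np.ndarray,
                                  buf_ch1: list, buf_ch2: list) -> None:
        # FIG. 7: the selected channel's intra-channel prediction buffer
        # is updated with its decoded speech signal.  FIG. 8: the
        # unselected channel's buffer is updated with the estimation
        # signal derived from the monaural decoded signal (cf.
        # Equations 2 and 3).
        est_other = 2.0 * sd_mono - dec_selected
        if selected_ch == 1:
            buf_ch1.append(dec_selected)
            buf_ch2.append(est_other)
        else:
            buf_ch2.append(dec_selected)
            buf_ch1.append(est_other)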
[0085] According to this embodiment, the degree of intra-channel
correlation is used as a coding channel selection criterion, so
that it is possible to encode channels where intra-channel
correlation is high and improve coding efficiency using
intra-channel prediction.
[0086] Components for executing inter-channel prediction can be
added to the configuration of A-ch coding section 330. In this
case, a configuration may be adopted where, rather than inputting a
monaural decoded speech signal to subtractor 336, A-ch coding
section 330 carries out inter-channel prediction for predicting an
A-ch speech signal using a monaural decoded speech signal, and an
inter-channel prediction signal generated as a result is then
inputted to subtractor 336.
Embodiment 4
[0087] FIG. 9 is a block diagram showing a configuration of speech
coding apparatus according to Embodiment 4 of the present
invention.
[0088] Speech coding apparatus 400 of FIG. 9 has basically the same
configuration as speech coding apparatus 100, 200, and 300
described in the above embodiments. Elements of this configuration
described in this embodiment that are the same as described for the
aforementioned embodiments are given the same reference numerals as
are used in the aforementioned embodiments and are not described in
detail.
[0089] Further, transmitted coded data sent from speech coding
apparatus 400 can be decoded by speech decoding apparatus having
the same basic configuration as speech decoding apparatus 150
described in Embodiment 1.
[0090] Speech coding apparatus 400 is equipped with core layer
coding section 402 and enhancement layer coding section 404. Core
layer coding section 402 has monaural signal generating section 110
and monaural signal CELP (Code Excited Linear Prediction) coding
section 410. Enhancement layer coding section 404 is comprised of
coding channel selecting section 310, first ch CELP coding section
422, second ch CELP coding section 424 and switching section
126.
[0091] At core layer coding section 402, monaural signal CELP
coding section 410 carries out CELP coding on a monaural signal
generated by monaural signal generating section 110. Coded data
obtained as a result of this coding is outputted as core layer
encoded data. Further, a monaural excitation signal is obtained as
a result of this coding. Moreover, monaural signal CELP coding
section 410 decodes the monaural signal and outputs a monaural
decoded speech signal obtained as a result. Core layer encoded data
is multiplexed with enhancement layer encoded data and coding
channel selection information. Further, core layer encoded data, a
monaural excitation signal and a monaural decoded speech signal are
outputted to first ch CELP coding section 422 and second ch CELP
coding section 424.
[0092] At enhancement layer coding section 404, first ch CELP
coding section 422 and second ch CELP coding section 424 have the
same internal configuration. For ease of description, one of first
ch CELP coding section 422 and second ch CELP coding section 424 is
shown as "A-ch CELP coding section 430", and its internal
configuration is described using FIG. 10. As described above, "A"
of "A-ch" is 1 or 2, "B" used in the drawings and in the following
description is "1" or "2." When "A" is 1, "B" is 2, and, when "A"
is 2, "B" is 1.
[0093] A-ch CELP coding section 430 is comprised of A-ch LPC
(Linear Prediction Coding) analyzing section 431, multipliers 432,
433, 434, 435, and 436, switching section 437, A-ch adaptive
codebook 438, A-ch fixed codebook 439, adder 440, synthesis filter
441, perceptual weighting section 442, distortion minimizing
section 443, A-ch decoding section 444, B-ch estimation signal
generating section 445, A-ch LPC analyzing section 446, A-ch LPC
prediction residual signal generating section 447, and subtractor
448.
[0094] At A-ch CELP coding section 430, A-ch LPC analyzing section
431 carries out LPC analysis on the A-ch input speech signal and
quantizes the A-ch LPC parameters obtained as a result. In
quantizing the LPC parameters, A-ch LPC analyzing section 431
utilizes the fact that correlation between the A-ch LPC parameters
and the LPC parameters of the monaural signal is typically high: it
decodes the monaural signal quantized LPC parameters from the core
layer encoded data, quantizes the differential component of the A-ch
LPC parameters with respect to the decoded monaural signal quantized
LPC parameters, and thereby obtains the A-ch LPC quantized code. The
A-ch LPC quantized code is outputted to synthesis filter 441.
Further, the A-ch LPC quantized code is outputted as A-ch encoded
data together with A-ch excitation coded data described later.
Quantizing the differential component thus makes quantization of the
enhancement layer LPC parameters efficient.
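A hedged sketch of this differential LPC quantization follows; the
quantizer object is a placeholder (no particular quantizer is
specified in the text), and the LPC parameters are treated as plain
vectors:

    import numpy as np

    def quantize_lpc_differential(lpc_a: np.ndarray,
                                  lpc_mono_q: np.ndarray,
                                  quantizer):
        # Quantize only the difference between the A-ch LPC parameters
        # and the decoded monaural quantized LPC parameters, exploiting
        # their typically high correlation.
        diff = lpc_a - lpc_mono_q
        code = quantizer.encode(diff)     # A-ch LPC quantized code
        lpc_a_q = lpc_mono_q + quantizer.decode(code)
        return code, lpc_a_q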
[0095] At A-ch CELP coding section 430, A-ch excitation coded data
is obtained by coding the residual component of the A-ch excitation
signal with respect to the monaural excitation signal. This coding
is implemented using the excitation search of CELP coding.
[0096] Namely, at A-ch CELP coding section 430, an adaptive
excitation signal, a fixed excitation signal, and the monaural
excitation signal are each multiplied by a corresponding gain and
added together after gain multiplication. Closed-loop excitation
search (adaptive codebook search, fixed codebook search, and gain
search) based on distortion minimization is then carried out on the
excitation signal obtained as a result of this addition. An adaptive
codebook index (adaptive excitation index), a fixed codebook index
(fixed excitation index), and gain codes for the adaptive excitation
signal, fixed excitation signal, and monaural excitation signal are
then outputted as A-ch excitation coded data. This excitation search
is carried out every subframe obtained by dividing a frame into a
plurality of portions, whereas core layer coding, enhancement layer
coding, and coding channel selection are carried out every frame. A
detailed description of this configuration is given in the
following.
[0097] Synthesis filter 441 carries out synthesis with an LPC
synthesis filter using the A-ch LPC quantized code outputted by A-ch
LPC analyzing section 431, taking the signal outputted by adder 440
as an excitation. The synthesis signal obtained as a result of this
synthesis is then outputted to subtractor 448.
[0098] Subtractor 448 calculates an error signal by subtracting a
synthesis signal from the A-ch input speech signal. The error
signal is then outputted to perceptual weighting section 442. This
error signal corresponds to coding distortion.
[0099] Perceptual weighting section 442 applies perceptual
weighting to the coding distortion and outputs coding distortion
after weighting to distortion minimizing section 443.
[0100] Distortion minimizing section 443 then decides the adaptive
codebook index and fixed codebook index in such a manner that
coding distortion becomes a minimum, and outputs the adaptive
codebook index to A-ch adaptive codebook 438 and the fixed codebook
index to A-ch fixed codebook 439. Further, distortion minimizing
section 443 generates gains corresponding to these indexes, and,
specifically generates gain (adaptive codebook gain and fixed
codebook gain) for each of the adaptive vectors described later and
fixed vectors described later, and outputs the adaptive codebook
gain to multiplier 433 and outputs the fixed codebook gain to
multiplier 435.
[0101] Moreover, distortion minimizing section 443 generates gains
(a first adjustment gain, a second adjustment gain, and a third
adjustment gain) for adjusting the balance between the monaural
excitation signal, the adaptive vector after gain multiplication,
and the fixed vector after gain multiplication, and outputs the
first adjustment gain to multiplier 432, the second adjustment gain
to multiplier 434, and the third adjustment gain to multiplier 436.
The adjustment gains are preferably generated so as to correlate
with each other. For example, when inter-channel correlation between
the first ch input speech signal and the second ch input speech
signal is high, the three adjustment gains are generated such that
the proportion of the monaural excitation signal becomes relatively
large with respect to the proportions of the adaptive vector after
gain multiplication and the fixed vector after gain multiplication.
Conversely, when inter-channel correlation is low, the three
adjustment gains are generated such that the proportion of the
monaural excitation signal becomes relatively small with respect to
the proportions of the adaptive vector after gain multiplication and
the fixed vector after gain multiplication.
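Putting the excitation construction and the three adjustment gains
together, the candidate excitation evaluated by the closed-loop search
can be sketched as follows (hypothetical names, not the patent's
notation):

    import numpy as np

    def build_excitation(mono_exc: np.ndarray,
                         adaptive_vec: np.ndarray,
                         fixed_vec: np.ndarray,
                         g_adp: float, g_fix: float,
                         g1: float, g2: float, g3: float) -> np.ndarray:
        # Sum of the monaural excitation signal, the adaptive vector
        # after adaptive codebook gain, and the fixed vector after fixed
        # codebook gain, each scaled by one of the three adjustment
        # gains (the input to adder 440 in FIG. 10).
        return (g1 * mono_exc
                + g2 * (g_adp * adaptive_vec)
                + g3 * (g_fix * fixed_vec))

The closed-loop search would pass each candidate excitation through
the LPC synthesis filter, perceptually weight the error against the
A-ch input speech signal, and retain the indexes and gains giving
minimum weighted distortion.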
[0102] Further, distortion minimizing section 443 outputs the
adaptive codebook index, fixed codebook index, code for the
adaptive codebook gain, code for the fixed codebook gain, and codes
for the three adjustment gains, as A-ch excitation coded
data.
[0103] A-ch adaptive codebook 438 stores, in an internal buffer,
excitation vectors generated in the past and used as excitations for
synthesis filter 441. Further, A-ch adaptive codebook 438
generates one subframe of vectors from the stored excitation
vectors as adaptive vectors. Generation of adaptive vectors is
carried out based on the adaptive codebook lag (pitch lag or pitch
period) corresponding to the adaptive codebook index inputted by
distortion minimizing section 443. Generated adaptive vectors are
then outputted to multiplier 433.
[0104] The internal buffer of A-ch adaptive codebook 438 is then
updated using a signal outputted by switching section 437. The
details of this updating operation are described in the
following.
[0105] A-ch fixed codebook 439 outputs excitation vectors
corresponding to fixed codebook indexes outputted by distortion
minimizing section 443 to multiplier 435 as fixed vectors.
[0106] Multiplier 433 multiplies the adaptive vectors outputted by
A-ch adaptive codebook 438 by the adaptive codebook gain and outputs
the adaptive vectors after gain multiplication to multiplier 434.
[0107] Multiplier 435 multiplies the fixed vectors outputted by A-ch
fixed codebook 439 by the fixed codebook gain and outputs the fixed
vectors after gain multiplication to multiplier 436.
[0108] Multiplier 432 multiplies the monaural excitation signal by
the first adjustment gain and outputs the monaural excitation signal
after gain multiplication to adder 440. Multiplier 434 multiplies
the adaptive vectors outputted by multiplier 433 by the second
adjustment gain and outputs the adaptive vectors after gain
multiplication to adder 440. Multiplier 436 multiplies the fixed
vectors outputted by multiplier 435 by the third adjustment gain and
outputs the fixed vectors after gain multiplication to adder 440.
[0109] Adder 440 adds a monaural excitation signal outputted by
multiplier 432, an adaptive vector outputted by multiplier 434, and
a fixed vector outputted by multiplier 436, and outputs the signal
after addition to switching section 437 and synthesis filter 441.
Switching section 437 outputs a signal outputted by adder 440 or a
signal outputted by A-ch LPC prediction residual signal generating
section 447 to A-ch adaptive codebook 438 in accordance with coding
channel selection information. More specifically, when the selected
channel is the A-channel, a signal from adder 440 is outputted to
A-ch adaptive codebook 438, and, when the selected channel is the
B-channel, a signal from A-ch LPC prediction residual signal
generating section 447 is outputted to A-ch adaptive codebook
438.
[0110] A-ch decoding section 444 decodes the A-ch encoded data, and
outputs an A-ch decoded speech signal obtained as a result to B-ch
estimation signal generating section 445.
[0111] B-ch estimation signal generating section 445 generates a
B-ch estimation signal as a B-ch decoded speech signal for the case
of A-ch coding using the A-ch decoded speech signal and the
monaural decoded speech signal. The generated B-ch estimation
signal is then outputted to B-ch CELP coding section (not
shown).
[0112] A-ch LPC analyzing section 446 carries out LPC analysis on
the A-ch estimation signal outputted by the B-ch CELP coding
section (not shown) and outputs the A-ch LPC parameters obtained as
a result to A-ch LPC prediction residual signal generating section
447. Here, the A-ch estimation signal outputted by the B-ch CELP
coding section corresponds to the A-ch decoded speech signal
generated when the B-ch input speech signal is encoded at the B-ch
CELP coding section (i.e. in the case of B-ch coding).
[0113] A-ch LPC prediction residual signal generating section 447
generates a coded LPC prediction residual signal for the A-ch
estimation signal using the A-ch LPC parameters outputted by A-ch
LPC analyzing section 446. The generated coded LPC prediction
residual signal is outputted to switching section 437.
[0114] Next, a description is given of the operation of updating
the adaptive codebook at A-ch CELP coding section 430 and the B-ch
CELP coding section (not shown). FIG. 11 is a flowchart showing an
adaptive codebook updating operation for when channel A is selected
by coding channel selecting section 310.
[0115] The flow of the example shown here is divided into CELP
coding processing at A-ch CELP coding section 430 (ST310), update
processing of the adaptive codebook within A-ch CELP coding section
430 (ST320), and update processing of an adaptive codebook within the
B-ch CELP coding section (ST330). Further, step ST310 includes two
steps ST311 and ST312, and step ST330 includes four steps ST331,
ST332, ST333, and ST334.
[0116] First, in step ST311, LPC analysis and quantization are
carried out by A-ch LPC analyzing section 431 of A-ch CELP coding section
430. Excitation search (adaptive codebook search, fixed codebook
search, and gain search) is then carried out by a closed loop type
excitation search section mainly containing A-ch adaptive codebook
438, A-ch fixed codebook 439, multipliers 432, 433, 434, 435, and
436, adder 440, synthesis filter 441, subtractor 448, perceptual
weighting section 442, and distortion minimizing section 443
(ST312).
[0117] In step ST320, an internal buffer of A-ch adaptive codebook
438 is updated using an A-ch excitation signal obtained by the
aforementioned excitation search.
[0118] In step ST331, a B-ch estimation signal is generated by B-ch
estimation signal generating section 445 of A-ch CELP coding
section 430. The generated B-ch estimation signal is sent to B-ch
CELP coding section from A-ch CELP coding section 430. In step
ST332, LPC analysis is carried out on the B-ch estimation signal by
B-ch LPC analyzing section (the same as the A-ch LPC analyzing
section 446) of B-ch CELP coding section (not shown), so as to
obtain a B-ch LPC parameter.
[0119] In step ST333, a B-ch LPC parameter is used by a B-ch LPC
prediction residual signal generating section (same as the A-ch LPC
prediction residual signal generating section 447) of the B-ch CELP
coding section (not shown) and a coded LPC prediction residual
signal is generated for the B-ch estimation signal. This encoded
LPC prediction residual signal is outputted to a B-ch adaptive
codebook (the same as A-ch adaptive codebook 438) (not shown) via a
switching section (the same as switching section 437) of B-ch CELP
coding section (not shown). In step ST334, the internal buffer of
the B-ch adaptive codebook is updated using the coded LPC
prediction residual signal for the B-ch estimation signal.
[0120] A more detailed description is given in the following of the
operation of updating the adaptive codebook. Here, the case where
the A-channel is selected by coding channel selecting section 310 is
taken as an example: an example of an operation for updating an
internal buffer of A-ch adaptive codebook 438 is described using
FIG. 12, and an example of an operation for updating an internal
buffer of the B-channel adaptive codebook is described using FIG.
13.
[0121] In the operating example shown in FIG. 12, the internal
buffer of the A-ch adaptive codebook 438 is updated using the A-ch
excitation signal for the j-th subframe within the i-th frame
obtained by distortion minimizing section 443 (ST401). The updated
A-ch adaptive codebook 438 is used in excitation search for the
(j+1)-th subframe that is the next subframe (ST402).
[0122] In the example operation shown in FIG. 13, an i-th frame
B-ch estimation signal is generated using an i-th frame A-ch
decoded speech signal and an i-th frame monaural decoded speech
signal (ST501). The generated B-ch estimation signal is outputted
to B-ch CELP coding section from A-ch CELP coding section 430. The
B-ch encoded LPC prediction residual signal (coded LPC prediction
residual signal for the B-ch estimation signal) 451 for the i-th
frame is then generated by the B-ch LPC prediction residual signal
generating section of the B-ch CELP coding section (ST502). B-ch
coded LPC prediction residual signal 451 is outputted to B-ch
adaptive codebook 452 via the switching section of the B-ch CELP
coding section. B-ch adaptive codebook 452 is then updated by B-ch
encoded LPC prediction residual signal 451 (ST503). The updated
B-ch adaptive codebook 452 can then be used in excitation search of
the (i+1)-th frame that is the next frame (ST504).
[0123] At a certain frame, when the A-channel is selected as a
coding channel, operations other than updating of B-ch adaptive
codebook 452 are not necessary at the B-ch CELP coding section, and
it is therefore possible to suspend coding of the B-ch input speech
signal for this frame.
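The resulting update schedule (per subframe for the selected channel,
per frame for the unselected channel) can be sketched as follows, with
hypothetical names, where lpc_residual stands in for the LPC analysis
and inverse filtering of steps ST332 and ST333:

    import numpy as np

    def update_adaptive_codebooks(subframe_excitations: list,
                                  sd_mono: np.ndarray,
                                  dec_selected: np.ndarray,
                                  acb_selected: list, acb_other: list,
                                  lpc_residual) -> None:
        # ST401: the selected channel's adaptive codebook is updated
        # with the excitation signal after every subframe search.
        for exc in subframe_excitations:
            acb_selected.append(exc)
        # ST501-ST503: the unselected channel's adaptive codebook is
        # updated once per frame with the coded LPC prediction residual
        # of its estimation signal.
        est_other = 2.0 * sd_mono - dec_selected
        acb_other.append(lpc_residual(est_other))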
[0124] In this way, according to this embodiment, when speech coding
of each layer is carried out based on the CELP coding scheme, it is
possible to encode the signal of the channel whose intra-channel
correlation is higher, and to improve coding efficiency using
intra-channel prediction.
[0125] In this embodiment, a description has been given of an
example where coding channel selecting section 310 described in
Embodiment 3 is used in speech coding apparatus adopting the CELP
coding scheme, but coding channel selecting section 120 of
Embodiment 1 or coding channel selecting section 210 of Embodiment 2
may be used in place of, or together with, coding channel selecting
section 310. Each of the embodiments described above can therefore
be implemented effectively when speech coding of each layer is
carried out based on the CELP coding scheme.
[0126] Further, selection criteria other than those described above
may be used for the enhancement layer coding channel. For example,
adaptive codebook search may be carried out at both A-ch CELP coding
section 430 and the B-ch CELP coding section, and the channel
yielding the smaller coding distortion as a result may then be
selected as the coding channel.
[0127] Further, components executing inter-channel prediction can be
added to the configuration of A-ch CELP coding section 430. In this
case, a configuration may be adopted where, rather than directly
multiplying the monaural excitation signal by the first adjustment
gain, A-ch CELP coding section 430 carries out inter-channel
prediction estimating the A-ch decoded speech signal using the
monaural excitation signal and then multiplies the inter-channel
prediction signal generated as a result by the first adjustment
gain.
[0128] The above is a description of each of the embodiments of the
present invention. The speech coding apparatus and speech decoding
apparatus of each of the embodiments described above can also be
mounted on wireless communication apparatus such as wireless
communication mobile station apparatus and wireless communication
base station apparatus etc. used in mobile communication
systems.
[0129] Further, a description is given in the above embodiments of
an example of the case where the present invention is configured
using hardware but the present invention may also be implemented
using software.
[0130] Each function block employed in the description of each of
the aforementioned embodiments may typically be implemented as an
LSI constituted by an integrated circuit. These may be individual
chips or partially or totally contained on a single chip.
[0131] "LSI" is adopted here but this may also be referred to as
"IC", "system LSI", "super LSI", or "ultra LSI" depending on
differing extents of integration.
[0132] Further, the method of circuit integration is not limited to
LSI's, and implementation using dedicated circuitry or general
purpose processors is also possible. After LSI manufacture,
utilization of an FPGA (Field Programmable Gate Array) or a
reconfigurable processor where connections and settings of circuit
cells within an LSI can be reconfigured is also possible.
[0133] Further, if integrated circuit technology that replaces LSI
emerges as a result of advances in semiconductor technology or
another derivative technology, it is naturally also possible to
carry out function block integration using that technology.
Application of biotechnology is also possible.
[0134] The present application is based on Japanese patent
application No. 2005-132366, filed on Apr. 28, 2005, the entire
content of which is expressly incorporated herein by reference.
INDUSTRIAL APPLICABILITY
[0135] The present invention may also be put to use in mobile
communication systems and communication apparatus such as packet
communication systems etc. employing internet protocols.
* * * * *