U.S. patent application number 17/010827 was filed with the patent office on 2020-09-03 and published on 2021-02-25 for methods, apparatus and systems for decompressing a higher order ambisonics (HOA) signal.
This patent application is currently assigned to Dolby Laboratories Licensing Corporation. The applicant listed for this patent is Dolby Laboratories Licensing Corporation. Invention is credited to Sven Kordon, Alexander Krueger, Oliver Wuebbolt.
Application Number | 20210058729 17/010827 |
Document ID | / |
Family ID | 1000005197436 |
Publication Date | 2021-02-25 |
![](/patent/app/20210058729/US20210058729A1-20210225-D00000.png)
![](/patent/app/20210058729/US20210058729A1-20210225-D00001.png)
![](/patent/app/20210058729/US20210058729A1-20210225-D00002.png)
![](/patent/app/20210058729/US20210058729A1-20210225-D00003.png)
![](/patent/app/20210058729/US20210058729A1-20210225-D00004.png)
![](/patent/app/20210058729/US20210058729A1-20210225-D00005.png)
![](/patent/app/20210058729/US20210058729A1-20210225-D00006.png)
![](/patent/app/20210058729/US20210058729A1-20210225-D00007.png)
![](/patent/app/20210058729/US20210058729A1-20210225-D00008.png)
![](/patent/app/20210058729/US20210058729A1-20210225-D00009.png)
![](/patent/app/20210058729/US20210058729A1-20210225-D00010.png)
United States Patent Application | 20210058729 |
Kind Code | A1 |
Kordon; Sven; et al. | February 25, 2021 |
METHODS, APPARATUS AND SYSTEMS FOR DECOMPRESSING A HIGHER ORDER
AMBISONICS (HOA) SIGNAL
Abstract
A method for compressing a HOA signal being an input HOA
representation with input time frames (C(k)) of HOA coefficient
sequences comprises spatial HOA encoding of the input time frames
and subsequent perceptual encoding and source encoding. Each input
time frame is decomposed (802) into a frame of predominant sound
signals (X.sub.PS(k-1)) and a frame of an ambient HOA component
({tilde over (C)}.sub.AMB(k-1)). The ambient HOA component ({tilde
over (C)}.sub.AMB(k-1)) comprises, in a layered mode, first HOA
coefficient sequences of the input HOA representation
(c.sub.n(k-1)) in lower positions and second HOA coefficient
sequences (c.sub.AMB,n(k-1)) in remaining higher positions. The
second HOA coefficient sequences are part of an HOA representation
of a residual between the input HOA representation and the HOA
representation of the predominant sound signals.
Inventors: |
Kordon; Sven; (Wunstorf,
DE) ; Krueger; Alexander; (Burgdorf, DE) ;
Wuebbolt; Oliver; (Hannover, DE) |
|
Applicant: |
Name | City | State | Country | Type |
Dolby Laboratories Licensing Corporation | San Francisco | CA | US | |
Assignee: |
Dolby Laboratories Licensing Corporation | San Francisco | CA |
Family ID: |
1000005197436 |
Appl. No.: |
17/010827 |
Filed: |
September 3, 2020 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
16716424 | Dec 16, 2019 | 10779104 |
17010827 | | |
16429575 | Jun 3, 2019 | 10542364 |
16716424 | | |
15891606 | Feb 8, 2018 | 10334382 |
16429575 | | |
15127577 | Sep 20, 2016 | 9930464 |
PCT/EP2015/055914 | Mar 20, 2015 | |
15891606 | | |
Current U.S.
Class: |
1/1 |
Current CPC
Class: |
G10L 19/008 20130101;
H04S 2400/01 20130101; H04S 3/008 20130101; G10L 19/24 20130101;
H04S 2420/11 20130101 |
International
Class: |
H04S 3/00 20060101
H04S003/00; G10L 19/008 20060101 G10L019/008; G10L 19/24 20060101
G10L019/24 |
Foreign Application Data
Date | Code | Application Number |
Mar 21, 2014 | EP | 14305411.2 |
Claims
1. A method of decoding a compressed Higher Order Ambisonics (HOA)
representation of a sound or a soundfield, the method comprising:
receiving a bit stream containing the compressed HOA
representation; determining whether there are multiple layers
relating to the compressed HOA representation, wherein an
indication of multiple layers is signalled in a bitstream; and
decoding, based on a determination that there are multiple layers,
the compressed HOA representation from the bitstream to obtain a
sequence of decoded HOA representations, wherein a first subset of
the sequence of decoded HOA representations corresponds to a first
set of indices of the sequence of decoded HOA representations and a
second subset of the sequence of decoded HOA representations
corresponds to a second set of indices of the sequence of decoded
HOA representations, wherein the first set of indices is based on
O.sub.MIN channels, wherein, for each index in the first set of
indices, a corresponding decoded HOA representation in the first
subset is determined based on only a corresponding ambient HOA
component, and wherein the second set of indices is determined
based on at least one of the multiple layers.
2. The method of claim 1, wherein O.sub.MIN=(N.sub.MIN+1).sup.2
with N.sub.MIN.ltoreq.N, wherein N is an order of input frames of
the compressed HOA representation.
3. The method of claim 1, wherein the multiple layers include a
base layer and at least an enhancement layer.
4. The method of claim 1, wherein, for a frame k, the sequence of
decoded HOA representations is determined based on an ambient
assignment vector (v.sub.AMB,ASSIGN(k)) and a first tuple set
(ℳ.sub.DIR(k+1)) comprising an index of a directional representation
and a respective quantized direction and a second tuple set
(ℳ.sub.VEC(k+1)) comprising an index of a vector based representation
and a vector defining a directional distribution of the vector
based representation.
5. An apparatus for decoding a compressed Higher Order Ambisonics
(HOA) representation of a sound or a soundfield, the apparatus
comprising: a receiver for receiving a bit stream containing the
compressed HOA representation; and an audio decoder for decoding,
based on a determination that there are multiple layers, the
compressed HOA representation from a bitstream to obtain a sequence
of decoded HOA representations, wherein an indication of multiple
layers is signalled in the bitstream, wherein a first subset of the
sequence of decoded HOA representations corresponds to a first set
of indices of the sequence of decoded HOA representations and a
second subset of the sequence of decoded HOA representations
corresponds to a second set of indices of the sequence of decoded
HOA representations, wherein the first set of indices is based on
O.sub.MIN channels, wherein, for each index in the first set of
indices, a corresponding decoded HOA representation in the first
subset is determined based on only a corresponding ambient HOA
component, and wherein the second set of indices is determined
based on at least one of the multiple layers.
6. The apparatus of claim 5, wherein O.sub.MIN=(N.sub.MIN+1).sup.2
with N.sub.MIN.ltoreq.N, wherein N is an order of input frames of
the compressed HOA representation.
7. The apparatus of claim 5, wherein the multiple layers include a
base layer and at least an enhancement layer.
8. The apparatus of claim 5, wherein the audio decoder is further
configured to determine, for a frame k, the sequence of decoded HOA
representations based on an ambient assignment vector
(v.sub.AMB,ASSIGN(k)) and a first tuple set (ℳ.sub.DIR(k+1))
comprising an index of a directional representation and a
respective quantized direction and a second tuple set
(ℳ.sub.VEC(k+1)) comprising an index of a vector based representation
and a vector defining a directional distribution of the vector
based representation.
9. The apparatus of claim 5, wherein the audio decoder is further
configured to generate, during channel reassignment, a third set of
indices (ℐ.sub.AMB,ACT(k)) of coefficient sequences that are active
in frame k, and a second set of indices (ℐ.sub.E(k-1), ℐ.sub.D(k-1),
ℐ.sub.U(k-1)) of coefficient sequences that have to be enabled,
disabled and to remain active, respectively, in a frame (k-1).
Description
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application is a division of U.S. patent application Ser.
No. 16/716,424, filed Dec. 16, 2019, which is a division of U.S.
patent application Ser. No. 16/429,575, filed Jun. 3, 2019, now
U.S. Pat. No. 10,542,364, which is a division of U.S. patent
application Ser. No. 15/891,606, filed Feb. 8, 2018, now U.S. Pat.
No. 10,334,382, which is a division of U.S. patent application Ser.
No. 15/127,577, filed Sep. 20, 2016, now U.S. Pat. No. 9,930,464,
which is the U.S. national stage of PCT/EP2015/055914, filed Mar. 20,
2015, which claims priority to European Patent Application No.
14305411.2, filed Mar. 21, 2014, each of which is incorporated by
reference in its entirety.
FIELD OF THE INVENTION
[0002] This invention relates to a method for compressing a Higher
Order Ambisonics (HOA) signal, a method for decompressing a
compressed HOA signal, an apparatus for compressing a HOA signal,
and an apparatus for decompressing a compressed HOA signal.
BACKGROUND
[0003] Higher Order Ambisonics (HOA) offers a possibility to
represent three-dimensional sound. Other known techniques are wave
field synthesis (WFS) or channel based approaches like 22.2. In
contrast to channel based methods, however, the HOA representation
offers the advantage of being independent of a specific loudspeaker
set-up. This flexibility, however, is at the expense of a decoding
process which is required for the playback of the HOA
representation on a particular loudspeaker set-up. Compared to the
WFS approach, where the number of required loudspeakers is usually
very large, HOA may also be rendered to set-ups consisting of only
few loudspeakers. A further advantage of HOA is that the same
representation can also be employed without any modification for
binaural rendering to head-phones.
[0004] HOA is based on the representation of the so-called spatial
density of complex harmonic plane wave amplitudes by a truncated
Spherical Harmonics (SH) expansion. Each expansion coefficient is a
function of angular frequency, which can be equivalently
represented by a time domain function. Hence, without loss of
generality, the complete HOA sound field representation actually
can be assumed to consist of O time domain functions, where O
denotes the number of expansion coefficients. These time domain
functions will be equivalently referred to as HOA coefficient
sequences or as HOA channels in the following. Usually, a spherical
coordinate system is used where the x axis points to the frontal
position, the y axis points to the left, and the z axis points to
the top. A position in space $x=(r,\theta,\phi)^T$ is
represented by a radius $r>0$ (i.e. the distance to the coordinate
origin), an inclination angle $\theta\in[0,\pi]$
measured from the polar axis z and an azimuth angle
$\phi\in[0,2\pi[$ measured counter-clockwise in the x-y plane from the
x axis. Further, $(\cdot)^T$ denotes the transposition.
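The coordinate convention described above can be illustrated with a minimal Python sketch. The function name `spherical_to_cartesian` and the range checks are assumptions made for illustration only; they are not part of the application.

```python
import math


def spherical_to_cartesian(r, theta, phi):
    """Map (radius, inclination, azimuth) to Cartesian (x, y, z).

    Convention taken from the text: x points to the front, y to the left,
    z to the top; theta is measured from the z axis and phi is measured
    counter-clockwise from the x axis in the x-y plane.
    """
    if r <= 0:
        raise ValueError("radius must be positive")
    if not 0.0 <= theta <= math.pi:
        raise ValueError("inclination must lie in [0, pi]")
    if not 0.0 <= phi < 2.0 * math.pi:
        raise ValueError("azimuth must lie in [0, 2*pi)")
    x = r * math.sin(theta) * math.cos(phi)
    y = r * math.sin(theta) * math.sin(phi)
    z = r * math.cos(theta)
    return x, y, z


# Three unit-distance positions: straight ahead, to the left, overhead.
front = spherical_to_cartesian(1.0, math.pi / 2, 0.0)
left = spherical_to_cartesian(1.0, math.pi / 2, math.pi / 2)
top = spherical_to_cartesian(1.0, 0.0, 0.0)
```

With this convention, an azimuth of a quarter turn at horizontal inclination lands on the positive y axis, i.e. to the listener's left.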
[0005] A more detailed description of the HOA coding is provided in
the following. The Fourier transform of the sound pressure with
respect to time, denoted by $\mathcal{F}_t(\cdot)$, i.e.,
$$P(\omega,x)=\mathcal{F}_t(p(t,x))=\int_{-\infty}^{\infty}p(t,x)\,e^{-i\omega t}\,dt$$
with $\omega$ denoting the angular frequency and $i$
indicating the imaginary unit, may be expanded into the series of
Spherical Harmonics according to
$$P(\omega=kc_s,r,\theta,\phi)=\sum_{n=0}^{N}\sum_{m=-n}^{n}A_n^m(k)\,j_n(kr)\,S_n^m(\theta,\phi).$$
[0006] Here $c_s$ denotes the speed of sound and $k$ denotes the
angular wavenumber, which is related to the angular frequency
$\omega$ by
$$k=\frac{\omega}{c_s}.$$
Further, $j_n(\cdot)$ denote the spherical Bessel functions of the
first kind and $S_n^m(\theta,\phi)$ denote the real valued
Spherical Harmonics of order $n$ and degree $m$. The expansion
coefficients $A_n^m(k)$ only depend on the angular wavenumber
$k$. Note that it has been implicitly assumed that sound pressure is
spatially band-limited. Thus, the series is truncated with respect
to the order index $n$ at an upper limit $N$, which is called the order
of the HOA representation. If the sound field is represented by a
superposition of an infinite number of harmonic plane waves of
different angular frequencies $\omega$ and arriving from all
possible directions specified by the angle tuple $(\theta,\phi)$,
the respective plane wave complex amplitude function
$C(\omega,\theta,\phi)$ can be expressed by the following Spherical
Harmonics expansion:
$$C(\omega=kc_s,\theta,\phi)=\sum_{n=0}^{N}\sum_{m=-n}^{n}C_n^m(k)\,S_n^m(\theta,\phi),$$
where the expansion coefficients $C_n^m(k)$ are related to
the expansion coefficients $A_n^m(k)$ by
$$A_n^m(k)=i^n\,C_n^m(k).$$
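The two relations stated above, the angular wavenumber and the link between the two sets of expansion coefficients, can be sketched in a few lines of Python. The function names and the speed-of-sound value of 343 m/s are assumptions chosen for illustration; the application does not specify a value.

```python
import math

# Assumed speed of sound in air (m/s); not specified in the text.
SPEED_OF_SOUND = 343.0


def angular_wavenumber(frequency_hz, c_s=SPEED_OF_SOUND):
    """Return k = omega / c_s for a frequency given in Hz."""
    omega = 2.0 * math.pi * frequency_hz  # angular frequency
    return omega / c_s


def pressure_coefficient(n, c_nm):
    """Return A_n^m(k) = i^n * C_n^m(k) for order n."""
    return (1j ** n) * c_nm


k_1khz = angular_wavenumber(1000.0)
a_order2 = pressure_coefficient(2, 1.0)
```

Because the factor is $i^n$, it cycles through 1, i, -1, -i with increasing order and depends only on the order n, not on the degree m.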
[0007] Assuming the individual coefficients
$C_n^m(\omega=kc_s)$ to be functions of the angular
frequency $\omega$, the application of the inverse Fourier transform
(denoted by $\mathcal{F}_t^{-1}(\cdot)$) provides time domain functions
$$c_n^m(t)=\mathcal{F}_t^{-1}\!\left(C_n^m(\omega/c_s)\right)=\frac{1}{2\pi}\int_{-\infty}^{\infty}C_n^m\!\left(\frac{\omega}{c_s}\right)e^{i\omega t}\,d\omega$$
for each order $n$ and degree $m$, which can be collected in a single
vector $c(t)$ by
$$c(t)=\left[c_0^0(t)\;\; c_1^{-1}(t)\;\; c_1^0(t)\;\; c_1^1(t)\;\; c_2^{-2}(t)\;\; c_2^{-1}(t)\;\; c_2^0(t)\;\;\ldots\;\; c_N^{N-1}(t)\;\; c_N^N(t)\right]^T.$$
The position index of a time domain
function $c_n^m(t)$ within the vector $c(t)$ is given by
$n(n+1)+1+m$. The overall number of elements in the vector $c(t)$ is
given by $O=(N+1)^2$. The discrete-time versions of the functions
$c_n^m(t)$ are referred to as Ambisonic coefficient
sequences. A frame-based HOA representation is obtained by dividing
all of these sequences into frames $C(k)$ of length $B$ and frame index
$k$ as follows:
[0008] $C(k)=\left[c((kB+1)T_S)\;\; c((kB+2)T_S)\;\;\ldots\;\; c((kB+B)T_S)\right]$,
where $T_S$ denotes the sampling period. The
frame $C(k)$ itself can then be represented as a composition of its
individual rows $c_i(k)$, $i=1,\ldots,O$, as
$$C(k)=\begin{bmatrix}c_1(k)\\ c_2(k)\\ \vdots\\ c_O(k)\end{bmatrix}$$
with $c_i(k)$ denoting the frame of the Ambisonic coefficient
sequence with position index $i$. The spatial resolution of the HOA
representation improves with a growing maximum order $N$ of the
expansion. Unfortunately, the number of expansion coefficients $O$
grows quadratically with the order $N$, in particular $O=(N+1)^2$.
For example, typical HOA representations using order $N=4$ require
$O=25$ HOA (expansion) coefficients. According to these
considerations, the total bit rate for the transmission of a HOA
representation, given a desired single-channel sampling rate $f_S$ and
the number of bits $N_b$ per sample, is determined by $O\cdot f_S\cdot N_b$.
Consequently, transmitting a HOA representation of order $N=4$ with a
sampling rate of $f_S=48$ kHz employing $N_b=16$ bits per
sample results in a bit rate of 19.2 MBits/s, which is very high
for many practical applications, such as streaming. Thus,
compression of HOA representations is highly desirable.
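The bookkeeping in the preceding paragraphs (number of coefficient sequences, position index, framing and uncompressed bit rate) can be checked with a short Python sketch. All function names are hypothetical, and the sample data is made up purely to show the indexing.

```python
def num_coefficients(order):
    """O = (N + 1)^2 coefficient sequences for HOA order N."""
    return (order + 1) ** 2


def position_index(n, m):
    """1-based position of c_n^m within c(t): n(n+1) + 1 + m."""
    if n < 0 or abs(m) > n:
        raise ValueError("require n >= 0 and -n <= m <= n")
    return n * (n + 1) + 1 + m


def raw_bit_rate(order, sample_rate_hz, bits_per_sample):
    """Uncompressed bit rate O * f_S * N_b in bits per second."""
    return num_coefficients(order) * sample_rate_hz * bits_per_sample


def frame(samples, k, frame_length):
    """Return frame C(k) as a list of O rows, each of length B.

    `samples[l - 1]` holds the vector c(l * T_S), so frame k covers the
    sample indices kB + 1, ..., kB + B from the text.
    """
    columns = samples[k * frame_length:(k + 1) * frame_length]
    if len(columns) < frame_length:
        raise ValueError("not enough samples for this frame")
    # Transpose: columns are time instants, rows are coefficient sequences.
    return [list(row) for row in zip(*columns)]


# Made-up first-order data (O = 4): six time instants, frame length B = 3.
samples = [[l, 10 * l, 100 * l, 1000 * l] for l in range(1, 7)]
second_frame = frame(samples, 1, 3)
rate_order4 = raw_bit_rate(4, 48000, 16)
```

For order 4 at 48 kHz and 16 bits per sample this reproduces the 19.2 MBits/s figure quoted in the text.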
[0009] Previously, the compression of HOA sound field
representations was proposed in the European Patent applications
EP2743922A, EP2665208A and EP2800401A. These approaches have in
common that they perform a sound field analysis and decompose the
given HOA representation into a directional and a residual ambient
component.
[0010] The final compressed representation is assumed to comprise,
on the one hand, a number of quantized signals, which result from
the perceptual coding of the directional signals, and relevant
coefficient sequences of the ambient HOA component. On the other
hand, it is assumed to comprise additional side information related
to the quantized signals, which is necessary for the reconstruction
of the HOA representation from its compressed version.
[0011] Further, a similar method is described in ISO/IEC
JTC1/SC29/WG11 N14264 (Working draft 1-HOA text of MPEG-H 3D audio,
January 2014, San Jose), where the directional component is
extended to a so-called predominant sound component. Like the
directional component, the predominant sound component is assumed
to be partly represented by directional signals, i.e. monaural
signals with a corresponding direction from which they are assumed
to impinge on the listener, together with some prediction
parameters to predict portions of the original HOA representation
from the directional signals. Additionally, the predominant sound
component is supposed to be represented by so-called vector based
signals, meaning monaural signals with a corresponding vector which
defines the directional distribution of the vector based signals.
The known compressed HOA representation consists of I quantized
monaural signals and some additional side information, wherein a
fixed number O.sub.MIN out of these I quantized monaural signals
represent a spatially transformed version of the first O.sub.MIN
coefficient sequences of the ambient HOA component C.sub.AMB(k-2).
The type of the remaining I-O.sub.MIN signals can vary between
successive frames, and be either directional, vector based, empty
or representing an additional coefficient sequence of the ambient
HOA component C.sub.AMB(k-2).
[0012] A known method for compressing a HOA signal representation
with input time frames (C(k)) of HOA coefficient sequences includes
spatial HOA encoding of the input time frames and subsequent
perceptual encoding and source encoding. The spatial HOA encoding
100, as shown in FIG. 1A, comprises performing Direction and Vector
Estimation processing of the HOA signal in a Direction and Vector
Estimation block 101, wherein data comprising first tuple sets
ℳ.sub.DIR(k) for directional signals and second tuple sets
ℳ.sub.VEC(k) for vector based signals are obtained. Each of the
first tuple sets comprises an index of a directional signal and a
respective quantized direction, and each of the second tuple sets
comprises an index of a vector based signal and a vector defining
the directional distribution of the signals. A next step is
decomposing 103 each input time frame of the HOA coefficient
sequences into a frame of a plurality of predominant sound signals
X.sub.PS(k-1) and a frame of an ambient HOA component C.sub.AMB
(k-1), wherein the predominant sound signals X.sub.PS(k-1) comprise
said directional sound signals and said vector based sound signals.
The decomposing further provides prediction parameters .xi.(k-1)
and a target assignment vector v.sub.A,T(k-1). The prediction
parameters .xi.(k-1) describe how to predict portions of the HOA
signal representation from the directional signals within the
predominant sound signals X.sub.PS(k-1) so as to enrich predominant
sound HOA components, and the target assignment vector
v.sub.A,T(k-1) contains information about how to assign the
predominant sound signals to a given number I of channels.
[0013] The ambient HOA component C.sub.AMB(k-1) is modified 104
according to the information provided by the target assignment
vector v.sub.A,T(k-1), wherein it is determined which coefficient
sequences of the ambient HOA component are to be transmitted in the
given number I of channels, depending on how many channels are
occupied by predominant sound signals. A modified ambient HOA
component C.sub.M,A(k-2) and a temporally predicted modified
ambient HOA component C.sub.P,M,A(k-1) are obtained. Also a final
assignment vector v.sub.A(k-2) is obtained from information in the
target assignment vector v.sub.A,T(k-1). The predominant sound
signals X.sub.PS(k-1) obtained from the decomposing, and the
determined coefficient sequences of the modified ambient HOA
component C.sub.M,A(k-2) and of the temporally predicted modified
ambient HOA component C.sub.P,M,A(k-1) are assigned to the given
number of channels, using the information provided by the final
assignment vector v.sub.A(k-2), wherein transport signals
y.sub.i(k-2), i=1, . . . , I and predicted transport signals
y.sub.P,i(k-2), i=1, . . . , I are obtained. Then, gain control (or
normalization) is performed on the transport signals y.sub.i(k-2)
and the predicted transport signals y.sub.P,i (k-2), wherein gain
modified transport signals z.sub.i(k-2), exponents e.sub.i(k-2) and
exception flags .beta..sub.i(k-2) are obtained.
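The channel assignment idea in the paragraph above can be sketched as follows. This is only an illustrative model under stated assumptions: the target assignment vector is represented as a Python list in which `None` marks a free channel and a string label marks a channel occupied by a predominant sound signal, the first O.sub.MIN channels are assumed to always carry the first O.sub.MIN ambient coefficient sequences, and the ordering of additional ambient candidates (`candidate_order`) is a hypothetical input, since the selection criterion is not given here.

```python
def assign_ambient(target_assignment, o_min, candidate_order):
    """Fill free channels with ambient coefficient sequence indices.

    target_assignment: list of length I; None = free channel, any other
        value = channel occupied by a predominant sound signal
        (hypothetical encoding of the target assignment vector).
    o_min: number of channels that always carry ambient sequences 1..O_MIN.
    candidate_order: hypothetical priority list of ambient coefficient
        indices (1-based) to place into the remaining free channels.
    """
    final = list(target_assignment)
    for ch in range(o_min):
        if final[ch] is not None:
            raise ValueError("first O_MIN channels must stay free for ambient")
        final[ch] = ("AMB", ch + 1)
    # Only indices above O_MIN are additional ambient sequences.
    extra = iter(idx for idx in candidate_order if idx > o_min)
    for ch in range(o_min, len(final)):
        if final[ch] is None:
            idx = next(extra, None)
            if idx is None:
                break  # no more candidates: channel stays empty
            final[ch] = ("AMB", idx)
    return final


# I = 8 channels, O_MIN = 4, two channels taken by predominant sounds.
target = [None, None, None, None, "DIR", None, "VEC", None]
final_assignment = assign_ambient(target, 4, [7, 5, 9])
```

The more channels the predominant sound signals occupy, the fewer additional ambient coefficient sequences can be transported in a given frame.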
[0014] As shown in FIG. 1B, the perceptual encoding and source
encoding comprises perceptual coding of the gain modified transport
signals z.sub.i(k-2), wherein perceptually encoded transport
signals {hacek over (z)}.sub.i(k-2), i=1, . . . , I are obtained,
and encoding side
information comprising said exponents e.sub.i(k-2) and exception
flags .beta..sub.i(k-2), the first and second tuple sets
ℳ.sub.DIR(k), ℳ.sub.VEC(k), the prediction parameters .xi.(k-1) and
the final assignment vector v.sub.A(k-2), wherein encoded side
information {hacek over (.GAMMA.)}(k-2) is obtained. Finally, the
perceptually encoded transport signals {hacek over (z)}.sub.i(k-2)
and the encoded side
information are multiplexed into a bitstream.
SUMMARY OF THE INVENTION
[0015] One drawback of the proposed HOA compression method is that
it provides a monolithic (i.e. non-scalable) compressed HOA
representation. For certain applications, like broadcasting or
internet streaming, it is however desirable to be able to split the
compressed representation into a low quality base layer (BL) and a
high quality enhancement layer (EL). The base layer is supposed to
provide a low quality compressed version of the HOA representation,
which can be decoded independently of the enhancement layer. Such a
BL should typically be highly robust against transmission errors,
and be transmitted at a low data rate in order to guarantee a
certain minimum quality of the decompressed HOA representation even
under bad transmission conditions. The EL contains additional
information to improve the quality of the decompressed HOA
representation.
[0016] The present invention provides a solution for modifying
existing HOA compression methods so as to be able to provide a
compressed representation that comprises a (low quality) base layer
and a (high quality) enhancement layer. Further, the present
invention provides a solution for modifying existing HOA
decompression methods so as to be able to decode a compressed
representation that comprises at least a low quality base layer
that is compressed according to the invention.
[0017] One improvement relates to obtaining a self-contained (low
quality) base layer. According to the invention, the O.sub.MIN
channels that are supposed to contain a spatially transformed
version of the (without loss of generality) first O.sub.MIN
coefficient sequences of the ambient HOA component C.sub.AMB(k-2)
are used as the base layer. An advantage of selecting the first
O.sub.MIN channels for forming a base layer is their time-invariant
type. However, conventionally the respective signals lack any
predominant sound components, which are essential for the sound
scene. This is also clear from the conventional computation of the
ambient HOA component C.sub.AMB(k-1), which is carried out by
subtraction of the predominant sound HOA representation
C.sub.PS(k-1) from the original HOA representation C(k-1) according
to
C.sub.AMB(k-1)=C(k-1)-C.sub.PS(k-1) (1)
Therefore, one improvement of the invention relates to the addition
of such predominant sound components. According to the invention, a
solution to this problem is the inclusion of predominant sound
components at a low spatial resolution into the base layer. For
this purpose, the ambient HOA component C.sub.AMB(k-1) that is
output by a HOA Decomposition processing in the spatial HOA encoder
according to the invention is replaced by a modified version
thereof. The modified ambient HOA component comprises in the first
O.sub.MIN coefficient sequences, which are supposed to be always
transmitted in a spatially transformed form, the coefficient
sequences of the original HOA component. This improvement of the
HOA Decomposition processing can be seen as an initial operation
for making the HOA compression work in a layered mode (for example
dual layer mode). This mode provides e.g. two bit streams, or a
single bit stream that can be split up into a base layer and an
enhancement layer. Whether this mode is used is signalled by a
mode indication (e.g. a single bit) in access units of the
total bit stream.
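The layered-mode modification of the ambient HOA component described above can be sketched in Python. The sketch assumes frames are stored as lists of rows (one row per coefficient sequence); the function names are hypothetical and the numeric values are made up.

```python
def ambient_component(c, c_ps):
    """Conventional ambient component per equation (1): C - C_PS, row-wise."""
    return [[a - b for a, b in zip(row, row_ps)]
            for row, row_ps in zip(c, c_ps)]


def layered_ambient_component(c, c_ps, o_min):
    """Modified ambient component for the layered mode.

    The first O_MIN rows are the coefficient sequences of the original
    HOA frame; the remaining rows are taken from the residual C - C_PS.
    """
    residual = ambient_component(c, c_ps)
    return [list(c[i]) if i < o_min else residual[i] for i in range(len(c))]


# Made-up first-order frame (O = 4 rows, 2 samples each), O_MIN = 1.
c_frame = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
c_ps_frame = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
modified = layered_ambient_component(c_frame, c_ps_frame, 1)
```

Because the first O.sub.MIN rows are copied from the original frame rather than from the residual, they still contain the predominant sound contributions, which is what makes the base layer usable on its own.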
[0018] In one embodiment, the base layer bit stream {hacek over
(B)}.sub.BASE(k-2) only includes the perceptually encoded signals
{hacek over (z)}.sub.i(k-2), i=1, . . . , O.sub.MIN, and the corresponding coded
gain control side information, which consists of the exponents
e.sub.i(k-2) and the exception flags .beta..sub.i(k-2), i=1, . . .
, O.sub.MIN. The remaining perceptually encoded signals
{hacek over (z)}.sub.i(k-2), i=O.sub.MIN+1, . . . , O and the encoded remaining
side information are included into the enhancement layer bit
stream. In one embodiment, the base layer bit stream {hacek over
(B)}.sub.BASE(k-2) and the enhancement layer bit stream {hacek over
(B)}.sub.ENH(k-2) are then jointly transmitted instead of the
former total bit stream {hacek over (B)}(k-2).
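The split into a base layer and an enhancement layer can be sketched as a simple partition of per-channel data. This is an illustrative model, not the bit stream syntax: the dictionaries, the function name `split_into_layers` and the placeholder payloads are all assumptions.

```python
def split_into_layers(encoded_signals, exponents, exception_flags,
                      remaining_side_info, o_min):
    """Partition one frame's data into base and enhancement layers.

    Base layer: the first O_MIN encoded signals plus their gain control
    side information (exponents and exception flags).
    Enhancement layer: everything else, including the remaining side
    information (e.g. tuple sets, prediction parameters, assignment vector).
    """
    if not (len(encoded_signals) == len(exponents) == len(exception_flags)):
        raise ValueError("per-channel lists must have equal length")
    if not 0 < o_min <= len(encoded_signals):
        raise ValueError("O_MIN must lie in 1..number of channels")
    base_layer = {
        "signals": encoded_signals[:o_min],
        "exponents": exponents[:o_min],
        "exception_flags": exception_flags[:o_min],
    }
    enhancement_layer = {
        "signals": encoded_signals[o_min:],
        "exponents": exponents[o_min:],
        "exception_flags": exception_flags[o_min:],
        "side_info": remaining_side_info,
    }
    return base_layer, enhancement_layer


# Placeholder payloads for I = 6 channels with O_MIN = 4.
signals = [b"z1", b"z2", b"z3", b"z4", b"z5", b"z6"]
base, enh = split_into_layers(
    signals, [0, 1, 0, -1, 2, 0], [False] * 6,
    {"prediction": b"xi", "assignment": b"vA"}, 4)
```

A receiver that only obtains `base` has everything needed for the first O.sub.MIN channels, including their gain control data, and needs nothing from `enh`.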
[0019] In one embodiment, the present invention is directed to a
method of decoding a compressed HOA representation of a sound or a
soundfield. The method may include receiving a bit stream
containing the compressed HOA representation. The method may
further include determining whether there are multiple layers
relating to the compressed HOA representation. It may further
include decoding, based on a determination that there are multiple
layers, the compressed HOA representation from the bitstream to
obtain a sequence of decoded HOA representations. A first subset of
the sequence of decoded HOA representations may correspond to a
first set of indices and a second subset of the sequence of decoded
HOA representations may correspond to a second set of indices. The
first set of indices may be based on O.sub.MIN channels. For each
index in the first set of indices, a corresponding decoded HOA
representation in the first subset is determined based on only a
corresponding ambient HOA component. The second set of indices may
be determined based on at least one of the multiple layers.
[0020] In another embodiment, an apparatus for decoding a
compressed HOA representation of a sound or a soundfield may
comprise a receiver for receiving a bit stream containing the
compressed HOA representation. The apparatus may further comprise
an audio decoder for decoding, based on a determination that there
are multiple layers, the compressed HOA representation from the
bitstream to obtain a sequence of decoded HOA representations. As
above, a first subset of the sequence of decoded HOA
representations may correspond to a first set of indices and a
second subset of the sequence of decoded HOA representations may
correspond to a second set of indices.
[0021] The first set of indices may be based on O.sub.MIN channels.
For each index in the first set of indices, a corresponding decoded
HOA representation in the first subset may be determined based on
only a corresponding ambient HOA component. The second set of
indices may be determined based on at least one of the multiple
layers.
[0022] Advantageous embodiments of the invention are disclosed in
the dependent claims, the following description and the
figures.
BRIEF DESCRIPTION OF THE DRAWINGS
[0023] Exemplary embodiments of the invention are described with
reference to the accompanying drawings as follows:
[0024] FIGS. 1A and 1B illustrate an exemplary structure of a
conventional architecture of a HOA compressor;
[0025] FIGS. 2A and 2B illustrate an exemplary structure of a
conventional architecture of a HOA decompressor;
[0026] FIG. 3 illustrates an exemplary structure of an architecture
of a spatial HOA encoding and perceptual encoding portion of a HOA
compressor according to one embodiment of the invention;
[0027] FIG. 4 illustrates an exemplary structure of an architecture
of a source coder portion of a HOA compressor according to one
embodiment of the invention;
[0028] FIG. 5 illustrates an exemplary structure of an architecture
of a perceptual decoding and source decoding portion of a HOA
decompressor according to one embodiment of the invention;
[0029] FIG. 6 illustrates an exemplary structure of an architecture
of a spatial HOA decoding portion of a HOA decompressor according
to one embodiment of the invention;
[0030] FIG. 7 illustrates an exemplary transformation of frames
from ambient HOA signals to modified ambient HOA signals;
[0031] FIG. 8 illustrates a flow-chart of a method for compressing
a HOA signal;
[0032] FIG. 9 illustrates a flow-chart of a method for
decompressing a compressed HOA signal; and
[0033] FIG. 10 illustrates details of parts of an exemplary architecture of a
spatial HOA decoding portion of a HOA decompressor according to one
embodiment of the invention.
DETAILED DESCRIPTION OF THE INVENTION
[0034] For easier understanding, prior art solutions in FIGS. 1A,
1B and FIGS. 2A and 2B are recapitulated in the following.
[0035] FIGS. 1A and 1B show the structure of a conventional
architecture of a HOA compressor. In a method described in [4], the
directional component is extended to a so-called predominant sound
component. Like the directional component, the predominant sound
component is assumed to be partly represented by directional
signals, meaning monaural signals with a corresponding direction
from which they are assumed to impinge on the listener, together
with some prediction parameters to predict portions of the original
HOA representation from the directional signals. Additionally, the
predominant sound component is supposed to be represented by
so-called vector based signals, meaning monaural signals with a
corresponding vector which defines the directional distribution of
the vector based signals. The overall architecture of the HOA
compressor proposed in [4] is illustrated in FIGS. 1A and 1B. It can
be subdivided into a spatial HOA encoding part depicted in FIG. 1A
and a perceptual and source encoding part depicted in FIG. 1B. The
spatial HOA encoder provides a first compressed HOA representation
consisting of I signals together with side information describing
how to create an HOA representation thereof. In the perceptual and
side info source coder the mentioned I signals are perceptually
encoded and the side information is subjected to source encoding,
before multiplexing the two coded representations.
[0036] Conventionally, the spatial encoding works as follows.
[0037] In a first step, the k-th frame C(k) of the original HOA
representation is input to a Direction and Vector Estimation
processing block, which provides the tuple sets ℳ.sub.DIR(k) and
ℳ.sub.VEC(k). The tuple set ℳ.sub.DIR(k) consists of tuples of which
the first element denotes the index of a directional signal and of
which the second element denotes the respective quantized
direction. The tuple set ℳ.sub.VEC(k) consists of tuples of which
the first element indicates the index of a vector based signal and
of which the second element denotes the vector defining the
directional distribution of the signals, i.e. how the HOA
representation of the vector based signal is computed.
[0038] Using both tuple sets ℳ.sub.DIR(k) and ℳ.sub.VEC(k), the
initial HOA frame C(k) is decomposed in the HOA Decomposition into
the frame X.sub.PS(k-1) of all predominant sound (i.e. directional
and vector based) signals and the frame C.sub.AMB(k-1) of the
ambient HOA component. Note the delay 102 of one frame,
respectively, which is due to overlap add processing in order to
avoid blocking artifacts. Furthermore, the HOA Decomposition is
assumed to output some prediction parameters .xi.(k-1) describing how
to predict portions of the original HOA representation from the
directional signals in order to enrich the predominant sound HOA
component. Additionally, a target assignment vector v.sub.A,T(k-1)
containing information about the assignment of predominant sound
signals, which were determined in the HOA Decomposition processing
block, to the I available channels is provided. The affected
channels can be assumed to be occupied, meaning they are not
available to transport any coefficient sequences of the ambient HOA
component in the respective time frame.
[0039] In the Ambient Component Modification processing block, the
frame C.sub.AMB(k-1) of the ambient HOA component is modified
according to the information provided by the target assignment
vector v.sub.A,T(k-1). In particular, it is determined which
coefficient sequences of the ambient HOA component are to be
transmitted in the given I channels, depending, amongst other
aspects, on the information (contained in the target assignment
vector v.sub.A,T(k-1)) about which channels are available and not
already occupied by predominant sound signals. Additionally, a fade
in and out of coefficient sequences is performed if the indices of
the chosen coefficient sequences vary between successive
frames.
[0040] Furthermore, it is assumed that the first O.sub.MIN
coefficient sequences of the ambient HOA component C.sub.AMB(k-2)
are always chosen to be perceptually coded and to be transmitted,
where O.sub.MIN=(N.sub.MIN+1).sup.2 with N.sub.MIN.ltoreq.N being
typically a smaller order than that of the original HOA
representation. In order to de-correlate these HOA coefficient
sequences, it is proposed to transform them to directional signals
(i.e. general plane wave functions) impinging from some predefined
directions .OMEGA..sub.MIN,d, d=1, . . . , O.sub.MIN. Along with the
modified ambient HOA component C.sub.M,A(k-1), a temporally
predicted modified ambient HOA component C.sub.P,M,A(k-1) is
computed to be later used in the Gain Control processing block in
order to allow a reasonable look ahead.
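As a small illustration of the relation O.sub.MIN=(N.sub.MIN+1).sup.2 stated above, the following Python sketch computes the number of coefficient sequences belonging to a given order. The function name and the example orders are assumptions chosen for illustration only and are not part of the described method.

```python
def num_hoa_coefficients(order: int) -> int:
    """Number of HOA coefficient sequences for a given Ambisonics order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


# Hypothetical example values: full order N = 4, minimum order N_MIN = 1.
N, N_MIN = 4, 1
O = num_hoa_coefficients(N)          # (4 + 1)^2 = 25 coefficient sequences
O_MIN = num_hoa_coefficients(N_MIN)  # (1 + 1)^2 = 4 coefficient sequences
```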
[0041] The information about the modification of the ambient HOA
component is directly related to the assignment of all possible
types of signals to the available channels. The final information
about the assignment is contained in the final assignment vector
v.sub.A(k-2). In order to compute this vector, information
contained in the target assignment vector v.sub.A,T(k-1) is
exploited.
[0042] Using the information provided by the assignment vector
v.sub.A(k-2), the Channel Assignment assigns the appropriate signals
contained in X.sub.PS(k-2) and in C.sub.M,A(k-2) to the I available
channels, yielding the signals y.sub.i(k-2), i=1, . . . , I.
Further, appropriate signals contained in X.sub.PS(k-1) and in
C.sub.P,M,A(k-1) are also assigned to the I available channels,
yielding the predicted signals y.sub.P,i(k-2), i=1, . . . , I.
Each of the signals y.sub.i(k-2), i=1, . . . , I, is finally
processed by a Gain Control, where the signal gain is smoothly
modified to achieve a value range that is suitable for the
perceptual encoders. The predicted signal frames y.sub.P,i (k-2),
i=1, . . . , I, allow a kind of look ahead in order to avoid severe
gain changes between successive blocks. The gain modifications are
assumed to be reverted in the spatial decoder with the gain control
side information, consisting of the exponents e.sub.i(k-2) and the
exception flags .beta..sub.i(k-2), i=1, . . . , I.
[0043] FIGS. 2A and 2B show the structure of a conventional
architecture of a HOA decompressor, as proposed in [4].
Conventionally, HOA decompression consists of the counterparts of
the HOA compressor components, which are obviously arranged in
reverse order. It can be subdivided into a perceptual and source
decoding part depicted in FIG. 2A and a spatial HOA decoding part
depicted in FIG. 2B.
[0044] In the perceptual and side info source decoder, the bit
stream is first de-multiplexed into the perceptually coded
representation of the I signals and into the coded side information
describing how to create an HOA representation thereof.
Successively, a perceptual decoding of the I signals and a decoding
of the side information is performed. Then, the spatial HOA decoder
creates from the I signals and the side information the
reconstructed HOA representation.
[0045] Conventionally, spatial HOA decoding works as follows.
[0046] In the spatial HOA decoder, each of the perceptually decoded
signals {circumflex over (z)}.sub.i(k), i .di-elect cons. {1, . . .
, I}, is first input to an Inverse Gain Control processing block
together with the associated gain correction exponent e.sub.i(k)
and gain correction exception flag .beta..sub.i(k). The i-th
Inverse Gain Control processing provides a gain corrected signal
frame y.sub.i(k).
[0047] All of the I gain corrected signal frames y.sub.i(k), i
.di-elect cons. {1, . . . , I}, are passed together with the
assignment vector v.sub.AMB,ASSIGN(k) and the tuple sets
.sub.DIR(k+1) and .sub.VEC(k+1) to the Channel Reassignment. The
tuple sets .sub.DIR(k+1) and .sub.VEC(k+1) are defined above (for
spatial HOA encoding), and the assignment vector
v.sub.AMB,ASSIGN(k) consists of I components, which indicate for
each transmission channel if and which coefficient sequence of the
ambient HOA component it contains. In the Channel Reassignment the
gain corrected signal frames y.sub.i(k) are redistributed to
reconstruct the frame {circumflex over (X)}.sub.PS(k) of all
predominant sound signals (i.e., all directional and vector based
signals) and the frame C.sub.I,AMB(k) of an intermediate
representation of the ambient HOA component. Additionally, the set
.sub.AMB,ACT(k) of indices of coefficient sequences of the ambient
HOA component, which are active in the k-th frame, and the sets
.sub.E(k-1), .sub.D(k-1), and .sub.U(k-1) of coefficient indices of
the ambient HOA component, which have to be enabled, disabled and
to remain active in the (k-1)-th frame, are provided.
[0048] In the Predominant Sound Synthesis the HOA representation of
the predominant sound component C.sub.PS(k-1) is computed from the
frame {circumflex over (X)}.sub.PS(k) of all predominant sound
signals using the tuple set .sub.DIR(k+1) and the set .zeta.(k+1)
of prediction parameters, the tuple set .sub.VEC(k+1) and the sets
.sub.E(k-1), .sub.D(k-1), and .sub.U(k-1).
[0049] In the Ambience Synthesis, the ambient HOA component frame
C.sub.AMB(k-1) is created from the frame C.sub.I,AMB(k) of the
intermediate representation of the ambient HOA component, using the
set .sub.AMB,ACT (k) of indices of coefficient sequences of the
ambient HOA component which are active in the k-th frame. Note the
delay of one frame, which is introduced due to the synchronization
with the predominant sound HOA component.
[0050] Finally, in the HOA Composition the ambient HOA component
frame C.sub.AMB(k-1) and the frame C.sub.PS(k-1) of the predominant
sound HOA component are superposed to provide the decoded HOA frame
C(k-1).
[0051] As has become clear from the coarse description of the HOA
compression and decompression method above, the compressed
representation consists of I quantized monaural signals and some
additional side information. A fixed number O.sub.MIN out of these
I quantized monaural signals represent a spatially transformed
version of the first O.sub.MIN coefficient sequences of the ambient
HOA component C.sub.AMB(k-2). The type of the remaining I-O.sub.MIN
signals can vary between successive frames, being either
directional, vector based, empty or representing an additional
coefficient sequence of the ambient HOA component C.sub.AMB(k-2).
Taken as it is, the compressed HOA representation is meant to be
monolithic. In particular, one problem is how to split the
described representation into a low quality base layer and an
enhancement layer.
[0052] According to the disclosed invention, a candidate for a low
quality base layer is the set of O.sub.MIN channels that contain a
spatially transformed version of the first O.sub.MIN coefficient
sequences of the ambient HOA component C.sub.AMB(k-2). What makes
these (without loss of generality: first) O.sub.MIN channels a good
choice to form a low quality base layer is their time-invariant
type. However, the respective signals lack any predominant sound
components, which are essential for the sound scene. This can also
be seen in the computation of the ambient HOA component
C.sub.AMB(k-1), which is carried out by subtraction of the
predominant sound HOA representation C.sub.PS(k-1) from the
original HOA representation C(k-1) according to
C.sub.AMB(k-1)=C(k-1)-C.sub.PS(k-1) (1)
A solution to this problem is to include the predominant sound
components at a low spatial resolution into the base layer.
Proposed amendments to the HOA compression are described in the
following.
[0053] FIG. 3 shows the structure of an architecture of a spatial
HOA encoding and perceptual encoding portion of a HOA compressor
according to one embodiment of the invention.
[0054] To include also the predominant sound components at a low
spatial resolution into the base layer, the ambient HOA component
C.sub.AMB(k-1), which is output by the HOA Decomposition processing
in the spatial HOA encoder (see FIG. 1A), is replaced by a modified
version
$$\tilde{C}_{\mathrm{AMB}}(k-1) = \begin{bmatrix} \tilde{c}_{\mathrm{AMB},1}(k-1) \\ \tilde{c}_{\mathrm{AMB},2}(k-1) \\ \vdots \\ \tilde{c}_{\mathrm{AMB},O}(k-1) \end{bmatrix} \tag{2}$$
whose elements are given by
$$\tilde{c}_{\mathrm{AMB},n}(k-1) = \begin{cases} c_n(k-1) & \text{for } 1 \le n \le O_{\mathrm{MIN}} \\ c_{\mathrm{AMB},n}(k-1) & \text{for } O_{\mathrm{MIN}}+1 \le n \le O \end{cases} \tag{3}$$
[0055] In other words, the first O.sub.MIN coefficient sequences of
the ambient HOA component which are supposed to be always
transmitted in a spatially transformed form, are replaced by the
coefficient sequences of the original HOA component. The other
processing blocks of the spatial HOA encoder can remain
unchanged.
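The replacement rule of eq. (3) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: a frame is modeled as a list of rows with one row per coefficient sequence, and the function name and the toy values are hypothetical.

```python
from typing import List

Frame = List[List[float]]  # one row (list of samples) per HOA coefficient sequence


def modify_ambient_for_layered_mode(c: Frame, c_amb: Frame, o_min: int) -> Frame:
    """Build the modified ambient frame of eq. (3).

    Rows n = 1..O_MIN are taken from the original HOA frame c,
    rows n = O_MIN+1..O from the ambient frame c_amb.
    """
    if len(c) != len(c_amb):
        raise ValueError("both frames need the same number of coefficient sequences")
    if not 0 <= o_min <= len(c):
        raise ValueError("o_min must lie between 0 and O")
    # List index n-1 holds coefficient sequence n; rows are copied, not aliased.
    return [list(row) for row in c[:o_min]] + [list(row) for row in c_amb[o_min:]]


# Hypothetical toy frame: O = 4 coefficient sequences, 2 samples each, O_MIN = 1.
c = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
c_amb = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
c_amb_mod = modify_ambient_for_layered_mode(c, c_amb, o_min=1)
```

In this toy case the first row of the result equals the first row of the original frame, while the remaining rows are those of the ambient component.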
[0056] It is important to note that this change of the HOA
Decomposition processing can be seen as an initial operation making
the HOA compression work in a so-called "dual layer" or "two layer"
mode. This mode provides a bit stream that can be split up into a
low quality Base Layer and an Enhancement Layer. Whether this mode
is used can be signaled by a single bit in access units of the total
bit stream.
[0057] A possible consequent modification of the bit stream
multiplexing to provide bit streams for a base layer and an
enhancement layer is illustrated in FIGS. 3 and 4, as described
further below.
[0058] The base layer bit stream {hacek over (B)}.sub.BASE(k-2)
only includes the perceptually encoded signals .sub.i(k-2), i=1, .
. . , O.sub.MIN, and the corresponding coded gain control side
information, consisting of the exponents e.sub.i(k-2) and the
exception flags .beta..sub.i(k-2), i=1, . . . , O.sub.MIN. The
remaining perceptually encoded signals .sub.i(k-2), i=O.sub.MIN+1,
. . . , I and the encoded remaining side information are included
into the enhancement layer bit stream. The base layer and
enhancement layer bit streams {hacek over (B)}.sub.BASE(k-2) and
{hacek over (B)}.sub.ENH(k-2) are then jointly transmitted instead
of the former total bit stream {hacek over (B)}(k-2).
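The partitioning described above may be pictured with the following Python sketch. It is an assumption-laden illustration, not the bit stream syntax: the dictionary layout, the key names and the function name are hypothetical, and the actual multiplexing format is not specified here.

```python
from typing import Any, Dict, List, Tuple


def split_into_layers(
    coded_signals: List[bytes],
    exponents: List[int],
    exception_flags: List[bool],
    remaining_side_info: Dict[str, Any],
    o_min: int,
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Partition per-channel data into base layer and enhancement layer payloads."""
    num_channels = len(coded_signals)  # corresponds to I
    if len(exponents) != num_channels or len(exception_flags) != num_channels:
        raise ValueError("per-channel lists must all have length I")
    if not 0 <= o_min <= num_channels:
        raise ValueError("o_min must lie between 0 and I")
    # Base layer: first O_MIN coded signals plus their gain control side info.
    base = {
        "signals": coded_signals[:o_min],
        "exponents": exponents[:o_min],
        "exception_flags": exception_flags[:o_min],
    }
    # Enhancement layer: everything else, including the remaining side
    # information (tuple sets, prediction parameters, assignment vector).
    enhancement = {
        "signals": coded_signals[o_min:],
        "exponents": exponents[o_min:],
        "exception_flags": exception_flags[o_min:],
        "remaining_side_info": remaining_side_info,
    }
    return base, enhancement
```

A decoder that receives only the first payload would then have exactly the data needed for the first O.sub.MIN channels.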
[0059] In FIG. 3 and FIG. 4, an apparatus for compressing a HOA
signal being an input HOA representation with input time frames
(C(k)) of HOA coefficient sequences is shown. Said apparatus
comprises a spatial HOA encoding and perceptual encoding portion
for spatial HOA encoding of the input time frames and subsequent
perceptual encoding, which is shown in FIG. 3, and a source coder
portion for source encoding, which is shown in FIG. 4.
[0060] The spatial HOA encoding and perceptual encoding portion 300
comprises a Direction and Vector Estimation block 301, delay 302, a
HOA Decomposition block 303, an Ambient Component Modification
block 304, a Channel Assignment block 305, and a plurality of Gain
Control blocks 306.
[0061] The Direction and Vector Estimation block 301 is adapted for
performing Direction and Vector Estimation processing of the HOA
signal, wherein data comprising first tuple sets .sub.DIR(k) for
directional signals and second tuple sets .sub.VEC(k) for vector
based signals are obtained, each of the first tuple sets
.sub.DIR(k) comprising an index of a directional signal and a
respective quantized direction, and each of the second tuple sets
.sub.VEC(k) comprising an index of a vector based signal and a
vector defining the directional distribution of the signals.
[0062] The HOA Decomposition block 303 is adapted for decomposing
each input time frame of the HOA coefficient sequences into a frame
of a plurality of predominant sound signals X.sub.PS(k-1) and a
frame of an ambient HOA component {tilde over (C)}.sub.AMB(k-1),
wherein the predominant sound signals X.sub.PS(k-1) comprise said
directional sound signals and said vector based sound signals, and
wherein the ambient HOA component {tilde over (C)}.sub.AMB(k-1)
comprises HOA coefficient sequences representing a residual between
the input HOA representation and the HOA representation of the
predominant sound signals, and wherein the decomposing further
provides prediction parameters .xi.(k-1) and a target assignment
vector v.sub.A,T(k-1). The prediction parameters .xi.(k-1) describe
how to predict portions of the HOA signal representation from the
directional signals within the predominant sound signals
X.sub.PS(k-1) so as to enrich predominant sound HOA components, and
the target assignment vector v.sub.A,T(k-1) contains information
about how to assign the predominant sound signals to a given
number I of channels.
[0063] The Ambient Component Modification block 304 is adapted for
modifying the ambient HOA component C.sub.AMB(k-1) according to the
information provided by the target assignment vector
v.sub.A,T(k-1), wherein it is determined which coefficient
sequences of the ambient HOA component C.sub.AMB(k-1) are to be
transmitted in the given number I of channels, depending on how many
channels are occupied by predominant sound signals, and wherein a
modified ambient HOA component C.sub.M,A(k-2) and a temporally
predicted modified ambient HOA component C.sub.P,M,A(k-1) are
obtained, and wherein a final assignment vector v.sub.A(k-2) is
obtained from information in the target assignment vector v.sub.A,T
(k-1).
[0064] The Channel Assignment block 305 is adapted for assigning
the predominant sound signals X.sub.PS(k-1) obtained from the
decomposing, the determined coefficient sequences of the modified
ambient HOA component C.sub.M,A(k-2) and of the temporally
predicted modified ambient HOA component C.sub.P,M,A(k-1) to the
given number I of channels using the information provided by the
final assignment vector v.sub.A(k-2), wherein transport signals
y.sub.i(k-2), i=1, . . . , I and predicted transport signals
y.sub.P,i(k-2), i=1, . . . , I are obtained.
[0065] The plurality of Gain Control blocks 306 is adapted for
performing gain control (805) to the transport signals y.sub.i(k-2)
and the predicted transport signals y.sub.P,i(k-2), wherein gain
modified transport signals z.sub.i(k-2), exponents e.sub.i(k-2) and
exception flags .beta..sub.i(k-2) are obtained.
[0066] FIG. 4 shows the structure of an architecture of a source
coder portion of a HOA compressor according to one embodiment of
the invention. The source coder portion as shown in FIG. 4
comprises a Perceptual Coder 310, a Side Information Source Coder
block with two coders 320,330, namely a Base Layer Side Information
Source Coder 320 and an Enhancement Layer Side Information Encoder
330, and two multiplexers 340,350, namely a Base Layer Bitstream
Multiplexer 340 and an Enhancement Layer Bitstream Multiplexer 350.
The Side Information Source Coders may be in a single Side
Information Source Coder block.
[0067] The Perceptual Coder 310 is adapted for perceptually coding
806 said gain modified transport signals z.sub.i(k-2), wherein
perceptually encoded transport signals (k-2), i=1, . . . , I are
obtained.
[0068] The Side Information Source Coders 320,330 are adapted for
encoding side information comprising said exponents e.sub.i(k-2)
and exception flags .beta..sub.i(k-2), said first tuple sets
.sub.DIR(k) and second tuple sets .sub.VEC(k), said prediction
parameters .xi.(k-1) and said final assignment vector v.sub.A(k-2),
wherein encoded side information {hacek over (.GAMMA.)}(k-2) is
obtained.
[0069] The multiplexers 340,350 are adapted for multiplexing the
perceptually encoded transport signals (k-2) and the encoded side
information {hacek over (.GAMMA.)}(k-2) into a multiplexed data
stream {hacek over ({hacek over (B)})} (k-2), wherein the ambient
HOA component {tilde over (C)}.sub.AMB(k-1) obtained in the
decomposing comprises first HOA coefficient sequences of the input
HOA representation c.sub.n(k-1) in O.sub.MIN lowest positions (i.e.,
those with lowest indices) and second HOA coefficient sequences
c.sub.AMB,n(k-1) in remaining higher positions. As explained below
with respect to eq.(4)-(6), the second HOA coefficient sequences
are part of an HOA representation of a residual between the input
HOA representation and the HOA representation of the predominant
sound signals. Further, the first O.sub.MIN exponents e.sub.i(k-2),
i=1, . . . , O.sub.MIN and exception flags .beta..sub.i (k-2), i=1,
. . . , O.sub.MIN are encoded in a Base Layer Side Information
Source Coder 320, wherein encoded Base Layer side information
{hacek over (.GAMMA.)}.sub.BASE(k-2) is obtained, and wherein
O.sub.MIN=(N.sub.MIN+1).sup.2 and O=(N+1).sup.2, with
N.sub.MIN.ltoreq.N and O.sub.MIN.ltoreq.I and N.sub.MIN is a
predefined integer value. The first O.sub.MIN perceptually encoded
transport signals (k-2), i=1, . . . , O.sub.MIN and the encoded
Base Layer side information {hacek over (.GAMMA.)}.sub.BASE(k-2)
are multiplexed in a Base Layer Bitstream Multiplexer 340 (which is
one of said multiplexers), wherein a Base Layer bitstream {hacek
over (B)}.sub.BASE(k-2) is obtained. The Base Layer Side
Information Source Coder 320 is one of the Side Information Source
Coders, or it is within a Side Information Source Coder block.
[0070] The remaining I-O.sub.MIN exponents e.sub.i(k-2),
i=O.sub.MIN+1, . . . , I and exception flags .beta..sub.i(k-2),
i=O.sub.MIN+1, . . . , I, said first tuple sets .sub.DIR(k-1) and
second tuple sets .sub.VEC(k-1), said prediction parameters
.xi.(k-1) and said final assignment vector v.sub.A(k-2) are encoded
in an Enhancement Layer Side Information Encoder 330, wherein
encoded enhancement layer side information {hacek over
(.GAMMA.)}.sub.ENH(k-2) is obtained. The Enhancement Layer Side
Information Source Coder 330 is one of the Side Information Source
Coders, or is within a Side Information Source Coder block.
[0071] The remaining I-O.sub.MIN perceptually encoded transport
signals (k-2), i=O.sub.MIN+1, . . . , I and the encoded enhancement
layer side information {hacek over (.GAMMA.)}.sub.ENH(k-2) are
multiplexed in an Enhancement Layer Bitstream Multiplexer 350
(which is also one of said multiplexers), wherein an Enhancement
Layer bitstream {hacek over (B)}.sub.ENH(k-2) is obtained. Further,
a mode indication LMF.sub.E is added in a multiplexer or an
indication insertion block. The mode indication LMF.sub.E
signalizes usage of a layered mode, which is used for correct
decompression of the compressed signal.
[0072] In one embodiment, the apparatus for encoding further
comprises a mode selector adapted for selecting a mode, the mode
being indicated by the mode indication LMF.sub.E and being one of a
layered mode and a non-layered mode. In the non-layered mode, the
ambient HOA component {tilde over (C)}.sub.AMB(k-1) comprises only
HOA coefficient sequences representing a residual between the input
HOA representation and the HOA representation of the predominant
sound signals (i.e., no coefficient sequences of the input HOA
representation).
[0073] Proposed amendments of the HOA decompression are described
in the following.
[0074] In the layered mode, the modification of the ambient HOA
component C.sub.AMB(k-1) in the HOA compression is considered at
the HOA decompression by appropriately modifying the HOA
composition.
[0075] In the HOA decompressor, the demultiplexing and decoding of
the base layer and enhancement layer bit streams are performed
according to FIG. 5. The base layer bit stream {hacek over
(B)}.sub.BASE(k) is de-multiplexed into the coded representation of
the base layer side information and the perceptually encoded
signals. Subsequently, the coded representation of the base layer
side information and the perceptually encoded signals are decoded
to provide the exponents e.sub.i(k) and the exception flags on the
one hand, and the perceptually decoded signals on the other hand.
Similarly, the enhancement layer bit stream is de-multiplexed and
decoded to provide the perceptually decoded signals and the
remaining side information (see FIG. 5). With this layered mode,
the spatial HOA decoding part also has to be modified to consider
the modification of the ambient HOA component C.sub.AMB (k-1) in
the spatial HOA encoding. The modification is accomplished in the
HOA composition.
[0076] In particular, the reconstructed HOA representation
C(k-1)=C.sub.PS(k-1)+C.sub.AMB(k-1) (4)
is replaced by its modified version
$$\hat{\tilde{C}}(k-1) = \begin{bmatrix} \hat{\tilde{c}}_1(k-1) \\ \hat{\tilde{c}}_2(k-1) \\ \vdots \\ \hat{\tilde{c}}_O(k-1) \end{bmatrix} \tag{5}$$
whose elements are given by
$$\hat{\tilde{c}}_n(k-1) = \begin{cases} \hat{c}_{\mathrm{AMB},n}(k-1) & \text{for } 1 \le n \le O_{\mathrm{MIN}} \\ \hat{c}_n(k-1) & \text{for } O_{\mathrm{MIN}}+1 \le n \le O \end{cases} \tag{6}$$
[0077] That means that the predominant sound HOA component is not
added to the ambient HOA component for the first O.sub.MIN
coefficient sequences, since it is already included therein. All
other processing blocks of the HOA spatial decoder remain
unchanged.
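The difference between the composition of eq. (4) and the modified composition of eq. (6) can be sketched as follows. This is a minimal Python illustration under stated assumptions: frames are modeled as lists of rows (one per coefficient sequence), and the function names and toy values are hypothetical.

```python
from typing import List

Frame = List[List[float]]  # one row (list of samples) per HOA coefficient sequence


def compose_single_layer(c_ps: Frame, c_amb: Frame) -> Frame:
    """Eq. (4): add predominant sound and ambient components row by row."""
    if len(c_ps) != len(c_amb):
        raise ValueError("both frames need the same number of coefficient sequences")
    return [[p + a for p, a in zip(ps_row, amb_row)]
            for ps_row, amb_row in zip(c_ps, c_amb)]


def compose_layered(c_ps: Frame, c_amb: Frame, o_min: int) -> Frame:
    """Eq. (6): copy rows 1..O_MIN from the ambient component, add the rest."""
    if not 0 <= o_min <= len(c_amb):
        raise ValueError("o_min must lie between 0 and O")
    full_sum = compose_single_layer(c_ps, c_amb)
    # The first O_MIN rows already contain the predominant sound, so it is not added again.
    return [list(row) for row in c_amb[:o_min]] + full_sum[o_min:]


# Hypothetical toy frames: O = 2 coefficient sequences, 2 samples each, O_MIN = 1.
c_ps = [[0.5, 0.5], [1.0, 1.0]]
c_amb = [[2.0, 3.0], [4.0, 5.0]]
layered = compose_layered(c_ps, c_amb, o_min=1)  # [[2.0, 3.0], [5.0, 6.0]]
single = compose_single_layer(c_ps, c_amb)       # [[2.5, 3.5], [5.0, 6.0]]
```

Only the first row differs between the two results, which reflects that the predominant sound contribution is skipped for the first O.sub.MIN coefficient sequences in the layered mode.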
[0078] In the following, the HOA decompression in the presence of
only a low quality base layer bit stream {hacek over (B)}.sub.BASE(k)
is briefly considered.
[0079] The bit stream is first de-multiplexed and decoded to
provide the reconstructed signals {circumflex over (z)}.sub.i(k)
and the corresponding gain control side information, consisting of
the exponents e.sub.i(k) and the exception flags .beta..sub.i(k),
i=1, . . . , O.sub.MIN. Note that in absence of the enhancement
layer, the perceptually coded signals .sub.i(k-2), i=O.sub.MIN+1, .
. . , I, are not available. A possible way of addressing this
situation is to set the signals {circumflex over (z)}.sub.i(k),
i=O.sub.MIN+1, . . . , I, to zero, which automatically causes the
reconstructed predominant sound component C.sub.PS(k-1) to be
zero.
[0080] In a next step, in the spatial HOA decoder, the first
O.sub.MIN Inverse Gain Control processing blocks provide gain
corrected signal frames y.sub.i(k), i=1, . . . , O.sub.MIN, which
are used to construct the frame C.sub.I,AMB(k) of an intermediate
representation of the ambient HOA component by the Channel
Reassignment. Note that the set .sub.AMB,ACT(k) of indices of
coefficient sequences of the ambient HOA component, which are
active in the k-th frame, contains only the indices 1, 2, . . . ,
O.sub.MIN. In the Ambience Synthesis, the spatial transform of the
first O.sub.MIN coefficient sequences is reverted to provide the
ambient HOA component frame C.sub.AMB(k-1). Finally, the
reconstructed HOA representation is computed according to
eq.(6).
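The zero-filling option mentioned above for base-layer-only decoding can be sketched as follows. This is a hedged Python illustration: the function name, the list-based frame model and the example values are assumptions, and the subsequent inverse gain control and channel reassignment steps are not modeled.

```python
from typing import List


def zero_fill_missing_channels(base_signals: List[List[float]],
                               num_channels: int) -> List[List[float]]:
    """Return I signal frames: decoded base layer frames followed by all-zero frames."""
    if not base_signals:
        raise ValueError("at least one base layer signal is required")
    if num_channels < len(base_signals):
        raise ValueError("num_channels must be at least the number of base layer signals")
    frame_len = len(base_signals[0])
    missing = num_channels - len(base_signals)
    # Channels beyond the first O_MIN are unavailable without the enhancement layer.
    return [list(s) for s in base_signals] + [[0.0] * frame_len for _ in range(missing)]


# Hypothetical example: O_MIN = 1 decoded signal, I = 3 channels, 2 samples per frame.
signals = zero_fill_missing_channels([[0.1, 0.2]], num_channels=3)
```

With all channels beyond the first O.sub.MIN set to zero, no predominant sound signal is reconstructed, so only the first O.sub.MIN coefficient sequences contribute to the decoded frame.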
[0081] FIG. 5 and FIG. 6 show the structure of an architecture of a
HOA decompressor according to one embodiment of the invention. The
apparatus comprises a perceptual decoding and source decoding
portion as shown in FIG. 5, a spatial HOA decoding portion as shown
in FIG. 6, and a mode detector adapted for detecting a layered mode
indication LMF.sub.D indicating that the compressed HOA signal
comprises a compressed base layer bitstream {hacek over
(B)}.sub.BASE(k) and a compressed enhancement layer bitstream.
[0082] FIG. 5 shows the structure of an architecture of a
perceptual decoding and source decoding portion of a HOA
decompressor according to one embodiment of the invention. The
perceptual decoding and source decoding portion comprises a first
demultiplexer 510, a second demultiplexer 520, a Base Layer
Perceptual Decoder 540 and an Enhancement Layer Perceptual Decoder
550, a Base Layer Side Information Source Decoder 530 and an
Enhancement Layer Side Information Source Decoder 560.
[0083] The first demultiplexer 510 is adapted for demultiplexing
the compressed base layer bitstream {hacek over (B)}.sub.BASE(k),
wherein first perceptually encoded transport signals .sub.i(k),
i=1, . . . , O.sub.MIN and first encoded side information {hacek
over (.GAMMA.)}.sub.BASE(k) are obtained. The second demultiplexer
520 is adapted for demultiplexing the compressed enhancement layer
bitstream {hacek over (B)}.sub.ENH(k), wherein second perceptually
encoded transport signals .sub.i(k), i=O.sub.MIN+1, . . . , I and
second encoded side information {hacek over (.GAMMA.)}.sub.ENH(k)
are obtained.
[0084] The Base Layer Perceptual Decoder 540 and the Enhancement
Layer Perceptual Decoder 550 are adapted for perceptually decoding
904 the perceptually encoded transport signals .sub.i(k), i=1, . .
. , I, wherein perceptually decoded transport signals {circumflex
over (z)}.sub.i(k) are obtained, and wherein in the Base Layer
Perceptual Decoder 540 said first perceptually encoded transport
signals .sub.i(k), i=1, . . . , O.sub.MIN of the base layer are
decoded and first perceptually decoded transport signals
{circumflex over (z)}.sub.i(k), i=1, . . . , O.sub.MIN are
obtained. In the Enhancement Layer Perceptual Decoder 550, said
second perceptually encoded transport signals .sub.i(k),
i=O.sub.MIN+1, . . . , I of the enhancement layer are decoded and
second perceptually decoded transport signals {circumflex over
(z)}.sub.i(k), i=O.sub.MIN+1, . . . , I are obtained.
[0085] The Base Layer Side Information Source Decoder 530 is
adapted for decoding 905 the first encoded side information {hacek
over (.GAMMA.)}.sub.BASE(k), wherein first exponents e.sub.i(k),
i=1, . . . , O.sub.MIN and first exception flags .beta..sub.i (k),
i=1, . . . , O.sub.MIN are obtained.
[0086] The Enhancement Layer Side Information Source Decoder 560 is
adapted for decoding 906 the second encoded side information {hacek
over (.GAMMA.)}.sub.ENH(k), wherein second exponents e.sub.i(k),
i=O.sub.MIN+1, . . . , I and second exception flags
.beta..sub.i(k), i=O.sub.MIN+1, . . . , I are obtained, and wherein
further data are obtained. The further data comprise a first tuple
set .sub.DIR(k+1) for directional signals and a second tuple set
.sub.VEC(k+1) for vector based signals. Each tuple of the first
tuple set .sub.DIR(k+1) comprises an index of a directional signal
and a respective quantized direction, and each tuple of the second
tuple set .sub.VEC(k+1) comprises an index of a vector based signal
and a vector defining the directional distribution of the vector
based signal. Further, prediction parameters .xi.(k+1) and an
ambient assignment vector v.sub.AMB,ASSIGN(k) are obtained, wherein
the ambient assignment vector v.sub.AMB,ASSIGN(k) comprises
components that indicate for each transmission channel if and which
coefficient sequence of the ambient HOA component it contains.
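One way to picture the ambient assignment vector is the following Python sketch. The numeric encoding used here is purely an assumption for illustration (0 meaning that a channel carries no ambient coefficient sequence, a positive value n meaning that it carries coefficient sequence n); the description above does not fix such an encoding, and the function name is hypothetical.

```python
from typing import Dict, List


def ambient_channel_map(v_amb_assign: List[int]) -> Dict[int, int]:
    """Map 1-based channel index to 1-based ambient coefficient index.

    Assumed encoding (not taken from the description): 0 means the channel
    carries no ambient coefficient sequence; n > 0 means it carries sequence n.
    """
    mapping: Dict[int, int] = {}
    for channel, coeff in enumerate(v_amb_assign, start=1):
        if coeff < 0:
            raise ValueError("assignment entries must be non-negative")
        if coeff > 0:
            mapping[channel] = coeff
    return mapping


# Hypothetical vector for I = 4 channels.
mapping = ambient_channel_map([1, 2, 0, 7])  # {1: 1, 2: 2, 4: 7}
```

Under this assumed encoding, the set of active ambient coefficient indices of a frame could be obtained as `set(mapping.values())`.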
[0087] FIG. 6 shows the structure of an architecture of a spatial
HOA decoding portion of a HOA decompressor according to one
embodiment of the invention. The spatial HOA decoding portion
comprises a plurality of inverse gain control units 604, a Channel
Reassignment block 605, a Predominant Sound Synthesis block 606,
an Ambient Synthesis block 607, and a HOA Composition block
608.
[0088] The plurality of inverse gain control units 604 are adapted
for performing inverse gain control, wherein said first
perceptually decoded transport signals {circumflex over
(z)}.sub.i(k), i=1, . . . , O.sub.MIN are transformed into first
gain corrected signal frames y.sub.i(k), i=1, . . . , O.sub.MIN
according to the first exponents e.sub.i(k), i=1, . . . , O.sub.MIN
and the first exception flags .beta..sub.i(k), i=1, . . . ,
O.sub.MIN, and wherein the second perceptually decoded transport
signals {circumflex over (z)}.sub.i(k), i=O.sub.MIN+1, . . . , I are
transformed into second gain corrected signal frames y.sub.i(k),
i=O.sub.MIN+1, . . . , I according to the second exponents
e.sub.i(k), i=O.sub.MIN+1, . . . , I and the second exception flags
.beta..sub.i(k), i=O.sub.MIN+1, . . . , I.
[0089] The Channel Reassignment block 605 is adapted for
redistributing 911 the first and second gain corrected signal
frames y.sub.i(k), i=1, . . . , I to I channels, wherein frames of
predominant sound signals {circumflex over (X)}.sub.PS(k) are
reconstructed, the predominant sound signals comprising directional
signals and vector based signals, and wherein a modified ambient
HOA component {tilde over (C)}.sub.I,AMB(k) is obtained, and
wherein the assigning is made according to said ambient assignment
vector v.sub.AMB,ASSIGN(k) and to information in said first and
second tuple sets .sub.DIR(k+1), .sub.VEC(k+1).
[0090] Further, the Channel Reassignment block 605 is adapted for
generating a first set of indices .sub.AMB,ACT(k) of coefficient
sequences of the modified ambient HOA component that are active in
a frame, and a second set of indices .sub.E(k-1), .sub.D(k-1),
.sub.U(k-1) of coefficient sequences of the modified ambient HOA
component that have to be enabled, disabled and to remain active in
the (k-1).sup.th frame.
[0091] The Predominant Sound Synthesis block 606 is adapted for
synthesizing 912 a HOA representation of the predominant HOA sound
components C.sub.PS(k-1) from said predominant sound signals
{circumflex over (X)}.sub.PS(k), wherein the first and second tuple
sets .sub.DIR(k+1), .sub.VEC(k+1), the prediction parameters
.xi.(k+1) and the second set of indices .sub.E(k-1), .sub.D (k-1),
.sub.U(k-1) are used.
[0092] The Ambient Synthesis block 607 is adapted for synthesizing
913 an ambient HOA component {circumflex over ({tilde over
(C)})}.sub.AMB(k-1) from the modified ambient HOA component {tilde
over (C)}.sub.I,AMB(k), wherein an inverse spatial transform for
the first O.sub.MIN channels is made and wherein the first set of
indices .sub.AMB,ACT(k) is used, the first set of indices being
indices of coefficient sequences of the ambient HOA component that
are active in the k.sup.th frame.
[0093] If the layered mode indication LMF.sub.D indicates a layered
mode with at least two layers, the ambient HOA component comprises
in its O.sub.MIN lowest positions (i.e., those with lowest indices)
HOA coefficient sequences of the decompressed HOA signal C(k-1),
and in remaining higher positions coefficient sequences that are
part of an HOA representation of a residual. This residual is a
residual between the decompressed HOA signal C(k-1) and the HOA
representation of the predominant HOA sound components
C.sub.PS(k-1).
[0094] On the other hand, if the layered mode indication LMF.sub.D
indicates a single-layer mode, there are no HOA coefficient
sequences of the decompressed HOA signal C(k-1) comprised, and the
ambient HOA component is a residual between the decompressed HOA
signal C(k-1) and the HOA representation of the predominant sound
components C.sub.PS(k-1).
[0095] The HOA Composition block 608 is adapted for adding the HOA
representation of the predominant sound components C.sub.PS(k-1) to
the ambient HOA component {circumflex over ({tilde over
(C)})}.sub.AMB(k-1), wherein coefficients of the HOA representation
of the predominant sound signals and corresponding coefficients of
the ambient HOA component are added, and wherein the decompressed
HOA signal C'(k-1) is obtained, and wherein, if the layered mode
indication LMF.sub.D indicates a layered mode with at least two
layers, only the highest I-O.sub.MIN coefficient channels are
obtained by addition of the predominant HOA sound components
C.sub.PS(k-1) and the ambient HOA component {circumflex over
({tilde over (C)})}.sub.AMB(k-1), and the lowest O.sub.MIN
coefficient channels of the decompressed HOA signal C'(k-1) are
copied from the ambient HOA component {circumflex over ({tilde over
(C)})}.sub.AMB(k-1). On the other hand, if the layered mode
indication LMF.sub.D indicates a single-layer mode, all coefficient
channels of the decompressed HOA signal C'(k-1) are obtained by
addition of the predominant HOA sound components C.sub.PS(k-1) and
the ambient HOA component {circumflex over ({tilde over
(C)})}.sub.AMB(k-1).
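The composition rule of the HOA Composition block 608 can be illustrated by a minimal sketch. It assumes that a frame is held as a list of coefficient channels, each a list of samples; the names compose_hoa_frame, c_ps, c_amb, o_min and layered are illustrative only and do not appear in the described embodiments.

```python
from typing import List

Frame = List[List[float]]  # one row of samples per HOA coefficient channel


def compose_hoa_frame(c_ps: Frame, c_amb: Frame, o_min: int, layered: bool) -> Frame:
    """Combine predominant-sound and ambient HOA frames channel by channel.

    Hypothetical helper: `layered` stands in for the layered mode indication.
    """
    if len(c_ps) != len(c_amb):
        raise ValueError("both frames need the same number of coefficient channels")
    out: Frame = []
    for n, (ps_row, amb_row) in enumerate(zip(c_ps, c_amb)):
        if layered and n < o_min:
            # Layered mode: the lowest O_MIN channels are copied from the
            # ambient HOA component without adding the predominant part.
            out.append(list(amb_row))
        else:
            # All other channels (and every channel in single-layer mode)
            # are the sample-wise sum of both contributions.
            out.append([p + a for p, a in zip(ps_row, amb_row)])
    return out
```

With o_min set to 1 and layered set to True, only the first channel is copied and all remaining channels are summed; with layered set to False, every channel is summed.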
[0096] FIG. 7 shows transformation of frames from ambient HOA
signals to modified ambient HOA signals.
[0097] FIG. 8 shows a flow-chart of a method for compressing a HOA
signal.
[0098] The method 800 for compressing a Higher Order Ambisonics
(HOA) signal being an input HOA representation of an order N with
input time frames C(k) of HOA coefficient sequences comprises
spatial HOA encoding of the input time frames and subsequent
perceptual encoding and source encoding.
[0099] The spatial HOA encoding comprises steps of
[0100] performing Direction and Vector Estimation processing 801 of
the HOA signal in a Direction and Vector Estimation block 301,
wherein data comprising first tuple sets .sub.DIR(k) for
directional signals and second tuple sets .sub.VEC(k) for vector
based signals are obtained, each of the first tuple sets
.sub.DIR(k) comprising an index of a directional signal and a
respective quantized direction, and each of the second tuple sets
.sub.VEC(k) comprising an index of a vector based signal and a
vector defining the directional distribution of the signals,
[0101] decomposing 802 in a HOA Decomposition block 303 each input
time frame of the HOA coefficient sequences into a frame of a
plurality of predominant sound signals X.sub.PS (k-1) and a frame
of an ambient HOA component {tilde over (C)}.sub.AMB(k-1), wherein
the predominant sound signals X.sub.PS(k-1) comprise said
directional sound signals and said vector based sound signals, and
wherein the ambient HOA component {tilde over (C)}.sub.AMB(k-1)
comprises HOA coefficient sequences representing a residual between
the input HOA representation and the HOA representation of the
predominant sound signals, and wherein the decomposing 802 further
provides prediction parameters .xi.(k-1) and a target assignment
vector v.sub.A,T(k-1), the prediction parameters .xi.(k-1)
describing how to predict portions of the HOA signal representation
from the directional signals within the predominant sound signals
X.sub.PS(k-1) so as to enrich predominant sound HOA components, and
the target assignment vector v.sub.A,T(k-1) containing information
about how to assign the predominant sound signals to a given
number I of channels,
[0102] modifying 803 in an Ambient Component Modification block 304
the ambient HOA component C.sub.AMB(k-1) according to the
information provided by the target assignment vector
v.sub.A,T(k-1), wherein it is determined which coefficient
sequences of the ambient HOA component C.sub.AMB(k-1) are to be
transmitted in the given number I of channels, depending on how many
channels are occupied by predominant sound signals, and wherein a
modified ambient HOA component C.sub.M,A(k-2) and a temporally
predicted modified ambient HOA component C.sub.P,M,A(k-1) are
obtained, and wherein a final assignment vector v.sub.A(k-2) is
obtained from information in the target assignment vector
v.sub.A,T(k-1),
[0103] assigning 804 in a Channel Assignment block 305 the
predominant sound signals X.sub.PS(k-1) obtained from the
decomposing, and the determined coefficient sequences of the
modified ambient HOA component C.sub.M,A(k-2) and of the temporally
predicted modified ambient HOA component C.sub.P,M,A(k-1) to the
given number I of channels using the information provided by the
final assignment vector v.sub.A(k-2), wherein transport signals
y.sub.i(k-2), i=1, . . . , I and predicted transport signals
y.sub.P,i(k-2), i=1, . . . , I are obtained, and performing gain
control 805 to the transport signals y.sub.i(k-2) and the predicted
transport signals y.sub.P,i(k-2) in a plurality of Gain Control
blocks 306, wherein gain modified transport signals z.sub.i(k-2),
exponents e.sub.i(k-2) and exception flags .beta..sub.i(k-2) are
obtained.
[0104] The perceptual encoding and source encoding comprises steps
of perceptually coding 806 in a Perceptual Coder 310 said gain
modified transport signals z.sub.i(k-2), wherein perceptually
encoded transport signals (k-2), i=1, . . . , I are obtained,
[0105] encoding 807 in one or more Side Information Source Coders
320,330 side information comprising said exponents e.sub.i(k-2) and
exception flags .beta..sub.i(k-2), said first tuple sets
.sub.DIR(k) and second tuple sets .sub.VEC(k), prediction parameters
.xi.(k-1) and said final assignment vector v.sub.A(k-2), wherein
encoded side information {hacek over (.GAMMA.)}(k-2) is obtained;
and
[0106] multiplexing 808 the perceptually encoded transport signals
(k-2) and the encoded side information {hacek over (.GAMMA.)}(k-2),
wherein a multiplexed data stream {hacek over ({hacek over (B)})}
(k-2) is obtained.
[0107] The ambient HOA component {tilde over (C)}.sub.AMB(k-1)
obtained in the decomposing step 802 comprises first HOA
coefficient sequences of the input HOA representation c.sub.n(k-1)
in O.sub.MIN lowest positions (i.e. those with lowest indices) and
second HOA coefficient sequences c.sub.AMB,n(k-1) in remaining
higher positions. The second coefficient sequences are part of an
HOA representation of a residual between the input HOA
representation and the HOA representation of the predominant sound
signals.
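The layout of the ambient HOA component in the layered mode can be sketched as follows. The sketch assumes frames stored as lists of coefficient channels and computes the residual as a plain sample-wise difference; the names build_layered_ambient, c_in, c_ps and o_min are hypothetical and chosen for illustration only.

```python
from typing import List

Frame = List[List[float]]  # one row of samples per HOA coefficient sequence


def build_layered_ambient(c_in: Frame, c_ps: Frame, o_min: int) -> Frame:
    """Assemble a layered-mode ambient HOA frame (illustrative sketch).

    c_in : input HOA representation
    c_ps : HOA representation of the predominant sound signals
    """
    if len(c_in) != len(c_ps):
        raise ValueError("both frames need the same number of coefficient sequences")
    ambient: Frame = []
    for n, (in_row, ps_row) in enumerate(zip(c_in, c_ps)):
        if n < o_min:
            # Lowest O_MIN positions: coefficient sequences of the input
            # HOA representation itself.
            ambient.append(list(in_row))
        else:
            # Remaining higher positions: residual between the input and
            # the predominant sound representation.
            ambient.append([x - p for x, p in zip(in_row, ps_row)])
    return ambient
```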
[0108] The first O.sub.MIN exponents e.sub.i(k-2), i=1, . . . ,
O.sub.MIN and exception flags .beta..sub.i(k-2), i=1, . . . ,
O.sub.MIN are encoded in a Base Layer Side Information Source Coder
320, wherein encoded Base Layer side information {hacek over
(.GAMMA.)}.sub.BASE(k-2) is obtained, and wherein
O.sub.MIN=(N.sub.MIN+1).sup.2 and O=(N+1).sup.2, with
N.sub.MIN.ltoreq.N and O.sub.MIN.ltoreq.I and N.sub.MIN is a
predefined integer value.
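As a worked example of the relations O.sub.MIN=(N.sub.MIN+1).sup.2 and O=(N+1).sup.2, the following sketch computes both counts and checks the stated constraints. The function names and the example values (N=4, N.sub.MIN=1, I=8) are assumptions made for illustration.

```python
def hoa_coefficient_count(order: int) -> int:
    """Number of HOA coefficient sequences for a given order: (order + 1)^2."""
    if order < 0:
        raise ValueError("HOA order must be non-negative")
    return (order + 1) ** 2


def base_layer_size(n: int, n_min: int, num_channels: int) -> int:
    """Return O_MIN after checking N_MIN <= N and O_MIN <= I.

    num_channels plays the role of I, the number of transport channels.
    """
    if n_min > n:
        raise ValueError("N_MIN must not exceed N")
    o_min = hoa_coefficient_count(n_min)
    if o_min > num_channels:
        raise ValueError("O_MIN must not exceed the number of channels I")
    return o_min


# Example values (assumed): order N = 4, N_MIN = 1, I = 8 transport channels.
total = hoa_coefficient_count(4)   # 25 coefficient sequences in total
o_min = base_layer_size(4, 1, 8)   # 4 channels form the base layer
```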
[0109] The first O.sub.MIN perceptually encoded transport signals
(k-2), i=1, . . . , O.sub.MIN and the encoded Base Layer side
information {hacek over (.GAMMA.)}.sub.BASE(k-2) are multiplexed
809 in a Base Layer Bitstream Multiplexer 340, wherein a Base Layer
bitstream {hacek over (B)}.sub.BASE(k-2) is obtained.
[0110] The remaining I-O.sub.MIN exponents e.sub.i(k-2),
i=O.sub.MIN+1, . . . , I and exception flags .beta..sub.i(k-2),
i=O.sub.MIN+1, . . . , I, said first tuple sets .sub.DIR(k-1) and
second tuple sets .sub.VEC(k-1), said prediction parameters
.xi.(k-1) and said final assignment vector v.sub.A(k-2) (also shown
as v.sub.AMB,ASSIGN(k) in the Figures) are encoded in an
Enhancement Layer Side Information Encoder 330, wherein encoded
enhancement layer side information {hacek over
(.GAMMA.)}.sub.ENH(k-2) is obtained.
[0111] The remaining I-O.sub.MIN perceptually encoded transport
signals (k-2), i=O.sub.MIN+1, . . . , I and the encoded enhancement
layer side information {hacek over (.GAMMA.)}.sub.ENH(k-2) are
multiplexed 810 in an Enhancement Layer Bitstream Multiplexer 350,
wherein an Enhancement Layer bitstream {hacek over
(B)}.sub.ENH(k-2) is obtained.
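The partition of the per-channel data into a base layer and an enhancement layer can be sketched as below. This is a simplified illustration: it only models the transport signals, exponents and exception flags, whereas the enhancement layer side information described above additionally carries the tuple sets, the prediction parameters and the final assignment vector. The name split_into_layers and the dictionary layout are hypothetical.

```python
from typing import Dict, List, Tuple

Layer = Dict[str, list]


def split_into_layers(
    signals: List[List[float]],
    exponents: List[int],
    exception_flags: List[bool],
    o_min: int,
) -> Tuple[Layer, Layer]:
    """Split per-channel data into (base layer, enhancement layer)."""
    num_channels = len(signals)
    if not (len(exponents) == len(exception_flags) == num_channels):
        raise ValueError("one exponent and one exception flag per channel expected")
    if not 0 < o_min <= num_channels:
        raise ValueError("O_MIN must lie between 1 and the number of channels I")

    def pick(start: int, stop: int) -> Layer:
        # Collect everything that belongs to channels start+1 .. stop.
        return {
            "signals": signals[start:stop],
            "exponents": exponents[start:stop],
            "exception_flags": exception_flags[start:stop],
        }

    # Channels 1..O_MIN go to the base layer, channels O_MIN+1..I to the
    # enhancement layer.
    return pick(0, o_min), pick(o_min, num_channels)
```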
[0112] A mode indication is added 811 that signalizes usage of a
layered mode, as described above. The mode indication is added by
an indication insertion block or a multiplexer.
[0113] In one embodiment, the method further comprises a final step
of multiplexing the Base Layer bitstream {hacek over
(B)}.sub.BASE(k-2), Enhancement Layer bitstream {hacek over
(B)}.sub.ENH(k-2) and mode indication into a single bitstream.
In one embodiment, said dominant direction estimation is dependent
on a directional power distribution of the energetically dominant
HOA components. In one embodiment, in modifying the ambient HOA
component, a fade in and fade out of coefficient sequences is
performed if the HOA sequence indices of the chosen HOA coefficient
sequences vary between successive frames. In one embodiment, in
modifying the ambient HOA component, a partial decorrelation of the
ambient HOA component C.sub.AMB(k-1) is performed.
[0114] In one embodiment, the quantized direction comprised in the
first tuple sets .sub.DIR(k) is a dominant direction.
[0115] FIG. 9 shows a flow-chart of a method for decompressing a
compressed HOA signal.
[0116] In this embodiment of the invention, the method 900 for
decompressing a compressed HOA signal comprises perceptual decoding
and source decoding and subsequent spatial HOA decoding to obtain
output time frames C(k-1) of HOA coefficient sequences, and the
method comprises a step of detecting 901 a layered mode indication
LMF.sub.D indicating that the compressed Higher Order Ambisonics
(HOA) signal comprises a compressed base layer bitstream {hacek
over (B)}.sub.BASE(k) and a compressed enhancement layer bitstream
{hacek over (B)}.sub.ENH(k).
[0117] The perceptual decoding and source decoding comprises steps
of
[0118] demultiplexing 902 the compressed base layer bitstream
{hacek over (B)}.sub.BASE(k), wherein first perceptually encoded
transport signals .sub.i(k), i=1, . . . , O.sub.MIN and first
encoded side information {hacek over (.GAMMA.)}.sub.BASE(k) are
obtained,
[0119] demultiplexing 903 the compressed enhancement layer
bitstream {hacek over (B)}.sub.ENH(k), wherein second perceptually
encoded transport signals .sub.i(k), i=O.sub.MIN+1, . . . , I and
second encoded side information {hacek over (.GAMMA.)}.sub.ENH(k)
are obtained,
[0120] perceptually decoding 904 the perceptually encoded transport
signals .sub.i(k), i=1, . . . , I, wherein perceptually decoded
transport signals {circumflex over (z)}.sub.i(k) are obtained, and
wherein in a Base Layer Perceptual Decoder 540 said first
perceptually encoded transport signals .sub.i(k), i=1, . . . ,
O.sub.MIN of the base layer are decoded and first perceptually
decoded transport signals {circumflex over (z)}.sub.i(k), i=1, . .
. , O.sub.MIN are obtained, and wherein in an Enhancement Layer
Perceptual Decoder 550 said second perceptually encoded transport
signals .sub.i(k), i=O.sub.MIN+1, . . . , I of the enhancement
layer are decoded and second perceptually decoded transport signals
{circumflex over (z)}.sub.i(k), i=O.sub.MIN+1, . . . , I are
obtained,
[0121] decoding 905 the first encoded side information {hacek over
(.GAMMA.)}.sub.BASE(k) in a Base Layer Side Information Source
Decoder 530, wherein first exponents e.sub.i(k), i=1, . . . ,
O.sub.MIN and first exception flags .beta..sub.i(k), i=1, . . . ,
O.sub.MIN are obtained, and
[0122] decoding 906 the second encoded side information {hacek over
(.GAMMA.)}.sub.ENH(k) in an Enhancement Layer Side Information
Source Decoder 560, wherein second exponents e.sub.i(k),
i=O.sub.MIN+1, . . . , I and second exception flags
.beta..sub.i(k), i=O.sub.MIN+1, . . . , I are obtained, and wherein
further data are obtained 907, the further data comprising a first
tuple set .sub.DIR(k+1) for directional signals and a second tuple
set .sub.VEC(k+1) for vector based signals, each tuple of the first
tuple set .sub.DIR(k+1) comprising an index of a directional signal
and a respective quantized direction, and each tuple of the second
tuple set .sub.VEC(k+1) comprising an index of a vector based
signal and a vector defining the directional distribution of the
vector based signal, and further wherein prediction parameters
.xi.(k+1) 908 and an ambient assignment vector v.sub.AMB,ASSIGN(k)
909 are obtained. The ambient assignment vector v.sub.AMB,ASSIGN(k)
comprises components that indicate for each transmission channel if
and which coefficient sequence of the ambient HOA component it
contains.
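How an ambient assignment vector could drive the placement of transport channels can be sketched as follows. The encoding used here is an assumption made for illustration and is not taken from the description: an entry of 0 is taken to mean that the channel carries no ambient coefficient sequence, and a positive entry n is taken to be the 1-based index of the coefficient sequence carried. The function name place_ambient_channels is likewise hypothetical.

```python
from typing import List, Set, Tuple


def place_ambient_channels(
    signals: List[List[float]],
    v_amb_assign: List[int],
    num_coeffs: int,
) -> Tuple[List[List[float]], Set[int]]:
    """Place transport channels into an ambient HOA frame (sketch).

    Assumed convention: 0 = no ambient sequence in this channel,
    n > 0 = channel carries ambient coefficient sequence n (1-based).
    """
    if len(signals) != len(v_amb_assign):
        raise ValueError("one assignment entry per transport channel expected")
    frame_len = len(signals[0]) if signals else 0
    ambient = [[0.0] * frame_len for _ in range(num_coeffs)]
    active: Set[int] = set()
    for channel, coeff_index in zip(signals, v_amb_assign):
        if coeff_index == 0:
            continue  # channel holds a predominant sound signal or is unused
        if not 1 <= coeff_index <= num_coeffs:
            raise ValueError("assignment entry outside the valid index range")
        ambient[coeff_index - 1] = list(channel)
        active.add(coeff_index)
    return ambient, active
```

The returned set corresponds to the indices of the coefficient sequences that are active in the frame.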
[0123] The spatial HOA decoding comprises steps of
[0124] performing 910 inverse gain control, wherein said first
perceptually decoded transport signals {circumflex over
(z)}.sub.i(k), i=1, . . . , O.sub.MIN are transformed into first
gain corrected signal frames y.sub.i(k), i=1, . . . , O.sub.MIN
according to said first exponents e.sub.i(k), i=1, . . . ,
O.sub.MIN and said first exception flags .beta..sub.i(k), i=1, . .
. , O.sub.MIN, and wherein said second perceptually decoded
transport signals {circumflex over (z)}.sub.i(k), i=O.sub.MIN+1, .
. . , I are transformed into second gain corrected signal frames
y.sub.i(k), i=O.sub.MIN+1, . . . , I according to said second
exponents e.sub.i(k), i=O.sub.MIN+1, . . . , I and said second
exception flags .beta..sub.i(k), i=O.sub.MIN+1, . . . , I,
[0125] redistributing 911 in a Channel Reassignment block 605 the
first and second gain corrected signal frames y.sub.i(k), i=1, . .
. , I to I channels, wherein frames of predominant sound signals
{circumflex over (X)}.sub.PS(k) are reconstructed, the predominant
sound signals comprising directional signals and vector based
signals, and wherein a modified ambient HOA component {tilde over
(C)}.sub.I,AMB(k) is obtained, and wherein the assigning is made
according to said ambient assignment vector v.sub.AMB,ASSIGN(k) and
to information in said first and second tuple sets .sub.DIR(k+1),
.sub.VEC(k+1),
[0126] generating 911b in the Channel Reassignment block 605 a
first set of indices .sub.AMB,ACT(k) of coefficient sequences of
the modified ambient HOA component that are active in the k.sup.th
frame, and a second set of indices .sub.E(k-1), .sub.D(k-1),
.sub.U(k-1) of coefficient sequences of the modified ambient HOA
component that have to be enabled, disabled and to remain active in
the (k-1).sup.th frame,
[0127] synthesizing 912 in the Predominant Sound Synthesis block
606 a HOA representation of the predominant HOA sound components
C.sub.PS(k-1) from said predominant sound signals {circumflex over
(X)}.sub.PS(k), wherein the first and second tuple sets
.sub.DIR(k+1), .sub.VEC(k+1), the prediction parameters .xi.(k+1)
and the second set of indices .sub.E (k-1), .sub.D(k-1),
.sub.U(k-1) are used,
[0128] synthesizing 913 in the Ambient Synthesis block 607 an
ambient HOA component {circumflex over ({tilde over
(C)})}.sub.AMB(k-1) from the modified ambient HOA component {tilde
over (C)}.sub.I,AMB(k), wherein an inverse spatial transform for
the first O.sub.MIN channels is made and wherein the first set of
indices .sub.AMB,ACT(k) is used, the first set of indices being
indices of coefficient sequences of the ambient HOA component that
are active in the k.sup.th frame, wherein the ambient HOA component has
one of at least two different configurations, depending on the
layered mode indication LMF.sub.D, and
[0129] adding 914 the HOA representation of the predominant HOA
sound components C.sub.PS(k-1) and the ambient HOA component
{circumflex over ({tilde over (C)})}.sub.AMB (k-1) in a HOA
Composition block 608, wherein coefficients of the HOA
representation of the predominant sound signals and corresponding
coefficients of the ambient HOA component are added, and wherein
the decompressed HOA signal C(k-1) is obtained, and wherein the
following conditions apply:
[0130] if the layered mode indication LMF.sub.D indicates a layered
mode with at least two layers, only the highest I-O.sub.MIN
coefficient channels are obtained by addition of the predominant
HOA sound components C.sub.PS(k-1) and the ambient HOA component
{circumflex over ({tilde over (C)})}.sub.AMB(k-1), and the lowest
O.sub.MIN coefficient channels of the decompressed HOA signal
C(k-1) are copied from the ambient HOA component {circumflex over
({tilde over (C)})}.sub.AMB(k-1). Otherwise, if the layered mode
indication LMF.sub.D indicates a single-layer mode, all coefficient
channels of the decompressed HOA signal C(k-1) are obtained by
addition of the predominant HOA sound components C.sub.PS(k-1) and
the ambient HOA component {circumflex over ({tilde over
(C)})}.sub.AMB(k-1).
[0131] The configuration of the ambient HOA component in dependence
of the layered mode indication LMF.sub.D is as follows:
[0132] If the layered mode indication LMF.sub.D indicates a layered
mode with at least two layers, the ambient HOA component comprises
in its O.sub.MIN lowest positions HOA coefficient sequences of the
decompressed HOA signal C(k-1), and in remaining higher positions
coefficient sequences being part of an HOA representation of a
residual between the decompressed HOA signal C(k-1) and the HOA
representation of the predominant HOA sound components
C.sub.PS(k-1).
[0133] On the other hand, if the layered mode indication LMF.sub.D
indicates a single-layer mode, the ambient HOA component is a
residual between the decompressed HOA signal C(k-1) and the HOA
representation of the predominant HOA sound components
C.sub.PS(k-1).
[0134] In one embodiment, the compressed HOA signal representation
is in a multiplexed bitstream, and the method for decompressing the
compressed HOA signal further comprises an initial step of
demultiplexing the compressed HOA signal representation, wherein
said compressed base layer bitstream {hacek over (B)}.sub.BASE(k),
said compressed enhancement layer bitstream {hacek over
(B)}.sub.ENH(k) and said layered mode indication LMF.sub.D are
obtained.
[0135] FIG. 10 shows details of parts of an architecture of a
spatial HOA decoding portion of a HOA decompressor according to one
embodiment of the invention.
[0136] Advantageously, it is possible to decode only the BL, e.g.
if no EL is received or if the BL quality is sufficient. For this
case, signals of the EL can be set to zero at the decoder. Then,
the redistributing 911 the first and second gain corrected signal
frames y.sub.i(k), i=1, . . . , I to I channels in the Channel
Reassignment block 605 is very simple, since the frames of
predominant sound signals {circumflex over (X)}.sub.PS(k) are
empty. The second set of indices .sub.E(k-1), .sub.D(k-1),
.sub.U(k-1) of coefficient sequences of the modified ambient HOA
component that have to be enabled, disabled and to remain active in
the (k-1).sup.th frame are set to zero. The synthesizing 912 the
HOA representation of the predominant HOA sound components
C.sub.PS(k-1) from the predominant sound signals {circumflex over
(X)}.sub.PS(k) in the Predominant Sound Synthesis block 606 can
therefore be skipped, and the synthesizing 913 an ambient HOA
component {circumflex over ({tilde over (C)})}.sub.AMB(k-1) from
the modified ambient HOA component {tilde over (C)}.sub.I,AMB(k) in
the Ambient Synthesis block 607 corresponds to a conventional HOA
synthesis.
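Decoding of the base layer only can be illustrated by a minimal sketch in which the missing enhancement layer channels are replaced by zero-valued frames before the spatial HOA decoding. The function name base_layer_only_signals and the list-based frame layout are assumptions made for illustration.

```python
from typing import List


def base_layer_only_signals(
    base_signals: List[List[float]], num_channels: int
) -> List[List[float]]:
    """Extend O_MIN decoded base layer channels to I channels.

    The enhancement layer channels O_MIN+1..I are set to zero, as is
    possible when no enhancement layer is received.
    """
    if not base_signals:
        raise ValueError("at least one base layer channel is required")
    if num_channels < len(base_signals):
        raise ValueError("I must not be smaller than the number of base layer channels")
    frame_len = len(base_signals[0])
    silent = [[0.0] * frame_len for _ in range(num_channels - len(base_signals))]
    return [list(ch) for ch in base_signals] + silent
```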
[0137] The original (i.e. monolithic, non-scalable, non-layered)
mode for the HOA compression may still be useful for applications
where a low quality base layer bit stream is not required, e.g. for
file based compression. A major advantage of perceptually coding
the spatially transformed first O.sub.MIN coefficient sequences of
the ambient HOA component C.sub.AMB, which is a difference between
the original and the directional HOA representation, instead of the
spatially transformed coefficient sequences of the original HOA
component C, is that in the former case the cross correlations
between all signals to be perceptually coded are reduced. Any cross
correlations between the signals z.sub.i, i=1, . . . , I may cause
a constructive superposition of the perceptual coding noise during
the spatial decoding process, while at the same time the noise-free
HOA coefficient sequences are canceled at superposition. This
phenomenon is known as perceptual noise unmasking.
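The effect can be illustrated with a toy numerical example. It is not a model of the codec: it simply assumes two decoded signals whose noise-free parts cancel when superposed, and compares fully correlated coding noise with mutually orthogonal coding noise. All sample values and names are chosen for illustration.

```python
from typing import List


def mean_power(x: List[float]) -> float:
    """Average of the squared samples."""
    return sum(v * v for v in x) / len(x)


def superpose(a: List[float], b: List[float]) -> List[float]:
    """Sample-wise sum of two signals."""
    return [x + y for x, y in zip(a, b)]


def residual_noise_power(signal, noise_1, noise_2) -> float:
    """Power left after superposing (signal + noise_1) and (-signal + noise_2)."""
    s1 = superpose(signal, noise_1)
    s2 = superpose([-v for v in signal], noise_2)
    # The noise-free parts cancel; only the summed noise remains.
    return mean_power(superpose(s1, s2))


signal = [3.0, 1.0, -2.0, 4.0]
noise = [1.0, -1.0, 1.0, -1.0]
orthogonal_noise = [1.0, 1.0, -1.0, -1.0]  # zero cross correlation with `noise`

correlated_power = residual_noise_power(signal, noise, noise)
uncorrelated_power = residual_noise_power(signal, noise, orthogonal_noise)
```

In this toy case the correlated noise adds constructively and leaves twice the power of the orthogonal noise, while the signal that would otherwise mask it has been canceled.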
[0138] In the layered mode, there are high cross correlations
between each of the signals z.sub.i, i=1, . . . , O.sub.MIN and
also between the signals z.sub.i, i=1, . . . , O.sub.MIN and
z.sub.i, i=O.sub.MIN+1, . . . , I, because the modified coefficient
sequences of the ambient HOA component {tilde over (c)}.sub.AMB,n,
n=1, . . . , O.sub.MIN include signals of the directional HOA
component (see eq.(3)). To the contrary, this is not the case for
the original, non-layered mode. It can therefore be concluded that
the transmission robustness introduced by the layered mode may come
at the expense of compression quality. However, the reduction in
compression quality is low compared to the increase in transmission
robustness. As has been shown above, the proposed layered mode is
advantageous in at least the situations described above.
[0139] While there have been shown, described, and pointed out
fundamental novel features of the present invention as applied to
preferred embodiments thereof, it will be understood that various
omissions and substitutions and changes in the apparatus and method
described, in the form and details of the devices disclosed, and in
their operation, may be made by those skilled in the art without
departing from the spirit of the present invention. It is expressly
intended that all combinations of those elements that perform
substantially the same function in substantially the same way to
achieve the same results are within the scope of the invention.
Substitutions of elements from one described embodiment to another
are also fully intended and contemplated.
[0140] It will be understood that the present invention has been
described purely by way of example, and modifications of detail can
be made without departing from the scope of the invention.
[0141] Each feature disclosed in the description and (where
appropriate) the claims and drawings may be provided independently
or in any appropriate combination. Features may, where appropriate,
be implemented in hardware, software, or a combination of the two.
Connections may, where applicable, be implemented as wireless
connections or wired, not necessarily direct or dedicated,
connections.
[0142] Reference numerals appearing in the claims are by way of
illustration only and shall have no limiting effect on the scope of
the claims.
CITED REFERENCES
[0143] [1] EP12306569.0
[0144] [2] EP12305537.8 (published as EP2665208A)
[0145] [3] EP133005558.2
[0146] [4] ISO/IEC JTC1/SC29/WG11 N14264. Working draft 1-HOA text
of MPEG-H 3D audio, January 2014
* * * * *