U.S. patent application number 13/828761 was filed with the patent office on 2014-06-26 for method and a system for determining the geometry and/or the localization of an object.
This patent application is currently assigned to Ecole Polytechnique Federale de Lausanne EPFL. The applicant listed for this patent is ECOLE POLYTECHNIQUE FEDERALE DE LAUSANNE EPFL. Invention is credited to Ivan DOKMANIC, Yue Lu, Reza Parhizkar, Martin Vetterli, Andreas Walther.
Application Number | 20140180629 13/828761 |
Document ID | / |
Family ID | 50975639 |
Filed Date | 2014-06-26 |
United States Patent
Application |
20140180629 |
Kind Code |
A1 |
DOKMANIC; Ivan ; et
al. |
June 26, 2014 |
METHOD AND A SYSTEM FOR DETERMINING THE GEOMETRY AND/OR THE
LOCALIZATION OF AN OBJECT
Abstract
A method for determining the geometry and/or the localisation of
an object comprising the steps of: sending one or more signals by
using one transmitter; receiving by one or more receivers the
transmitted signals and the echoes of the transmitted signals as
reflected by one or more reflective surfaces; building by a
computing module a first Euclidean Distance Matrix (EDM) comprising
the mutual positions of the receivers; adding to the EDM a
new row and a new column, the new row and the new column comprising
times of arrival of said echoes, and computing its rank or distance
to an EDM; determining the geometry and/or the position of
the object based on said rank or distance.
Inventors: |
DOKMANIC; Ivan; (Lausanne,
CH) ; Parhizkar; Reza; (Ecublens, CH) ;
Walther; Andreas; (Crissier, CH) ; Vetterli;
Martin; (Grandvaux, CH) ; Lu; Yue; (Arlington,
MA) |
|
Applicant: |
Name |
City |
State |
Country |
Type |
ECOLE POLYTECHNIQUE FEDERALE DE LAUSANNE EPFL |
Lausanne |
|
CH |
|
|
Assignee: |
Ecole Polytechnique Federale de
Lausanne EPFL
Lausanne
CH
|
Family ID: |
50975639 |
Appl. No.: |
13/828761 |
Filed: |
March 14, 2013 |
Current U.S.
Class: |
702/150 ; 367/99;
702/155 |
Current CPC
Class: |
G01S 15/876 20130101;
H04S 7/305 20130101; G01S 15/42 20130101; G01S 2015/465 20130101;
G01H 7/00 20130101; G01S 15/46 20130101; H04R 29/005 20130101; G01B
5/00 20130101; G01S 7/54 20130101; G01S 7/539 20130101; G01C 15/00
20130101; G01S 17/06 20130101; H04R 1/08 20130101; G01S 15/06
20130101 |
Class at
Publication: |
702/150 ; 367/99;
702/155 |
International
Class: |
G01C 15/00 20060101
G01C015/00; G01B 5/00 20060101 G01B005/00; G01H 7/00 20060101
G01H007/00 |
Foreign Application Data
Date |
Code |
Application Number |
Dec 22, 2012 |
CH |
02935/12 |
Claims
1. A method for determining the geometry and/or the position of an
object, comprising the steps of sending one or more signals with
one transmitter; receiving by one or more receivers the transmitted
signals and echoes of the transmitted signals reflected by one or
more reflective surfaces; building with a computing module a first
Euclidean distance matrix corresponding to mutual positions of the
receivers; adding to said matrix a new row and a new column, the
new row and a new column corresponding to the time of arrivals of
at least some of said echoes, and computing the rank of the
modified matrix, or computing the distance between the modified
matrix from a true Euclidean Distance Matrix; determining the
geometry and/or the position of the object based on the computed
information.
2. The method of claim 1, wherein only first order echoes are
considered.
3. The method of claim 1, wherein only echoes received during a
predetermined time window are considered.
4. The method of claim 1, said object being a convex room, said
transmitter being a loudspeaker, each receiver being a microphone,
said geometry being a 2D geometry, the number of receivers being
3.
5. The method of claim 1, said object being a convex room, said
transmitter being a loudspeaker, each receiver being a microphone,
said geometry being a 3D geometry, the number of receivers being
higher than 4.
6. The method of claim 1, said object being a receiver, said
transmitter being a satellite, said receiver being a mobile
device.
7. The method of claim 1, comprising determining the geometry
and/or the localisation of an object comprising the step of
labelling echos.
8. The method of claim 1, comprising determining which of the peaks
of the impulse response received by each receiver correspond to
which reflective surface.
9. The method of claim 1, comprising verifying if the augmented
matrix still verify the rank property according which a EDM in Rn
has a rank at most n+2, n being an integer and positive number.
10. The method of claim 9, comprising testing at least some echoes
combination and selecting the combination for which the rank
property is satisfied.
11. The method of claim 1, comprising augmenting said EDM matrix by
a vector t formed by the TOA from the transmitter to each
receiver.
12. The method of claim 1, comprising determining the location of
the transmitter by using least-squared distance trilateration.
13. The method of claim 1, comprising multi-dimensional
scaling.
14. The method of claim 13, comprising applying a s-stress
criterion.
15. A system for determining the geometry and/or the localisation
of an object, comprising: a transmitter for sending one or more
signals; one or more receivers for receiving the transmitted
signals and the echoes of the transmitted signals as reflected by
one or more reflective surfaces; a first computing module for
building a first Euclidean Distance Matrix (EDM) corresponding to
mutual positions of the receivers; a second computing module for
adding to the EDM a new row and a new column, the new row and a new
column comprising time of arrivals of said echoes and computing its
second rank or its distance from the first EDM; a third computing
module for determining the geometry and/or the position of the
object based on said second rank or distance.
16. The system of claim 15, the first, second and third modules
being the same module.
17. The system of claim 15, the transmitter being a loudspeaker,
the receiver being a microphone, the object being a room comprising
said loudspeaker and said microphone.
18. The system of claim 15, the transmitter being a satellite, the
receiver a mobile device, the object being said mobile device.
19. A computer program product, comprising: a tangible computer
usable medium including computer usable program code for
determining the geometry and/or the localisation of an object, the
computer usable program code being used for building a first
Euclidean Distance Matrix (EDM) comprising the mutual positions of
the receivers; adding to the EDM a new row and a new column, the
new row and a new column comprising time of arrivals of echoes of
the signals transmitted by a transmitter as reflected by one or
more reflective surfaces and received by one or more receivers and
computing its rank or its distance to the first EDM; determining
the geometry and/or the position of the object based on said rank
or distance.
Description
RELATED APPLICATION
[0001] The present application claims the priority of the Swiss
patent application CH2935/12 of Dec. 22, 2012, the content of which
is hereby incorporated by reference.
FIELD OF THE INVENTION
[0002] The present invention concerns a method and a system for
determining the geometry and/or the localisation of an object, e.g.
of a wall, a room, a microphone, a loudspeaker or a person. The
invention concerns in particular the estimation of the geometry of
a room from its acoustic room impulse responses (RIR).
DESCRIPTION OF RELATED ART
[0003] The problem of estimating the geometry of a room from its
acoustic room impulse responses (RIR) can be summed up in a question:
can a person blindfolded inside a room hear the shape of the room
after having snapped his fingers? In other words can the person
reconstruct the 2-D or 3-D geometry of the room from the acoustic
room impulse response (RIR)?
[0004] Beyond the question of uniqueness, meaning that the RIR is a
unique signature of a room, the question of reconstructing the
geometry from impulse responses is interesting algorithmically.
That is, are there efficient ways to recover the room geometry from
measured impulse responses?
[0005] Finally, establishing uniqueness would lead to localization
inside a known (or unknown) room and algorithms for tracking the
trajectory of a moving source listening to the varying RIRs. Key
questions are: how many sources, how many receivers, for what room
shapes?
[0006] Several known documents have attempted to answer the
questions above. Moreover, there has recently been a renewed
interest in reconstructing the room shape from its acoustic response,
as shown by the increasing number of publications on the
subject.
[0007] Some of these documents have used the image source model in
order to cope with the signal reflections. This image source model,
along with the first and second order echoes, are described in
FIGS. 1 and 2.
[0008] FIG. 1 illustrates a room defined by the walls w1, w2 and by
other walls not represented and comprising a source or transmitter
s and a receiver r. The source can be, for example and without
limitation, a loudspeaker, and the receiver a microphone. The
walls are reflective surfaces, i.e. surfaces that reflect a signal
such that the angle at which the signal is incident on the
surface equals the angle at which it is reflected.
[0009] A first audio signal transmitted by the source s is
reflected by the wall w2. The reflected signal or echo e1 is then
received by the receiver r. Since there is a single reflection of
the transmitted signal before its reception by the receiver r, the
echo e1 is a first-order echo. A second audio signal transmitted by
the source is reflected successively by the two walls w1 and
w2: the reflected signal or echo e2 is then received by the
receiver r. Since there are two reflections of the transmitted
signal before its reception by the receiver r, the echo e2 is a
second-order echo.
[0010] The time of arrival (TOA) is defined as the travel time
from a source s to a receiver r. The audio signals e1 and e2 can
have different times of arrival (TOAs).
[0011] FIG. 2 illustrates a system comprising a room defined by
some walls (for sake of clarity only three walls are represented),
a source or transmitter s and a receiver r. The points p.sub.i and
p.sub.i+1 are the end-points of the i.sub.th-wall, n.sub.i is its
unit, outward pointing normal and {tilde over (s)}.sub.i is an
image source: in fact the signal e.sub.i received by the receiver r
could be considered as generated by the image or virtual source
{tilde over (s)}.sub.i which is the mirror image of the source s
with respect to the wall defined by the points p.sub.i and
p.sub.i+1. {tilde over (s)}.sub.i is a first generation image
source as the signal e.sub.i received by the receiver r has been
reflected once by the wall. In other words {tilde over (s)}.sub.i
is a first generation image source as e.sub.i is a first-order
echo.
[0012] {tilde over (s)}.sub.ij is the image of {tilde over
(s)}.sub.i with respect to the wall (i+1). It is therefore a second
generation image source, generating a second-order echo.
[0013] The virtual sources {tilde over (s)}.sub.i or {tilde over
(s)}.sub.ij are not real, tangible and concrete sources as the
"real" source s. In other words they are abstract objects used for
studying the signal reflections, according to the well known
image-source theory, used e.g. in optics.
[0014] The use of the reflections of a signal for the determination
of the position of the real source and/or of the shape of a room is
known from US2011317522. However, the described algorithm does not
find the source location directly, since it requires a large
number of intermediate steps and hypotheses.
[0015] In U.S. Pat. No. 7,688,678 the volume of a room is
determined by using the diffused field, i.e. without image
sources.
[0016] In GEOMETRICALLY CONSTRAINED ROOM MODELING WITH COMPACT
MICROPHONE ARRAYS, F. RIBEIRO, D. A. FLORENCIO, D. E. BA, AND C.
ZHANG, IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL
PROCESSING, VOL. 20, NO. 5, PP. 1449-1460 2012, it is necessary to
know in advance the mutual position of the microphones and of the
loudspeaker. Moreover since many impulse responses have to be
measured by putting a fake wall at different positions with respect
to the microphone array and the loudspeaker, the resulting matrix
of shifted impulse responses is very large and therefore
computationally expensive to process.
[0017] In INFERENCE OF ROOM GEOMETRY FROM ACOUSTIC IMPULSE
RESPONSES ANTONACCI, FILOS, THOMAS, HABETS, SARTI, NAYLOR, TUBARO
TO APPEAR ON IEEE TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE
PROCESSING, 2012, only the 2D geometry of a room is estimated, i.e.
the floor and the ceiling are not estimated. In this
case the discretized Hough transform is used. Moreover, the
described algorithm requires the source to be placed in
many different positions.
[0018] In F. Antonacci, A. Sarti, and S. Tubaro, "Geometric
reconstruction of the environment from its response to multiple
acoustic emissions" in Proceedings of the IEEE International
Conference on Acoustics, Speech, and Signal Processing, Dallas,
2010, pp. 2822-2825, the authors propose to move a loudspeaker
around a microphone to collect multiple impulse responses and then
estimate the distance and the angle of the reflector (a line since
they consider a 2-D case) using the tools of projective geometry.
Each source-receiver pair defines an ellipse of possible reflection
points, and the wall is estimated as the common tangents to all of
the ellipses.
[0019] M. Kuster, D. de Vries, E. M. Hulsebos, and A. Gisolf,
"Acoustic imaging in enclosed spaces: Analysis of room geometry
modifications on the impulse response", Journal of the Acoustical
Society of America, vol. 116, no. 4, pp. 2126-2137, 2004, describes
an approach based on acoustic imaging. An array
comprising many microphones is used to sample the sound field, and
wave field inversion is then employed to infer the room.
[0020] J. Filos and E. A. P. Habets, "A two-step approach to blindly
infer room geometries", in Proceedings of the International
Workshop on Acoustic Echo and Noise Control, 2010, propose to use
projective geometry tools to infer the room geometry.
[0021] S. Tervo, "Localization and tracing of early acoustic
reflections", Ph.D. thesis, Aalto University, School of Science,
Department of Media Technology, 2012, describes a method using
directive loudspeakers, and then scanning the room for reflectors.
The proposed method requires multiple emissions for the room to be
scanned completely.
[0022] Some inventors of the present invention have previously
worked on a problem of estimating the room geometry from a single
RIR in I. Dokmanic, Y. M. Lu, and M. Vetterli, "Can One Hear the
Shape of a Room: The 2-D Polygonal Case", in Proceedings of the
IEEE International Conference on Acoustics, Speech, and Signal
Processing, Prague, 2011, EPFL. The described algorithm is based on
the complete knowledge of first and second generation echo Times of
Arrivals (TOAs): however the second generation is often difficult
to obtain for practical reasons (e.g. attenuation of the signal).
Then the proposed algorithm is not applicable in practice since
without using second-order echoes a single RIR does not suffice for
reconstructing the shape of the room.
[0023] The known solutions are therefore often not applicable in
practice. They are not exact, since some approximations are
necessary (as for example in the Hough transform case). Some of
them allow only the 2D geometry of a room to be reconstructed, without
considering ceiling and floor. They also require a large number of
receivers and/or transmitters.
[0024] It is an aim of the present invention to obviate or mitigate
one or more of the aforementioned disadvantages.
BRIEF SUMMARY OF THE INVENTION
[0025] According to the invention, these aims are achieved by means
of a method for determining the geometry and/or the localisation of
an object according to claim 1, a system for determining the
geometry and/or the localisation of an object according to claim
15, a computer program product determining the geometry and/or the
localisation of an object according to claim 19.
[0026] The method according to the invention comprises the steps
of
[0027] sending one or more signals by using one transmitter
[0028] receiving by one or more receivers the transmitted signals
and the echoes of the transmitted signals as reflected by one or
more reflective surfaces
[0029] building by a computing module a first Euclidean Distance
Matrix (EDM) comprising the mutual positions of the receivers;
[0030] adding to the Euclidean Distance Matrix a new row and a new
column, the new row and a new column comprising time of arrivals of
said echoes and computing the rank of the modified matrix, or by
computing how far the modified matrix is from a true Euclidean
Distance Matrix;
[0031] determining the geometry and/or the position of the object
based on the computed information.
[0032] The first EDM (Euclidean Distance Matrix) corresponds to the
receiver setup, which is known. For example, given M receivers at
positions $r_i$, the EDM $D \in \mathbb{R}^{M \times M}$ comprises the
following elements:
$$d_{ij} = \lVert r_i - r_j \rVert_2^2, \qquad 1 \le i, j \le M$$
[0033] where $\lVert \cdot \rVert_2^2$ is the squared Euclidean
distance.
[0034] The EDM is then a symmetric matrix with non-negative
entries and a zero diagonal.
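By way of a purely illustrative sketch, which is not part of the described method, an EDM of this kind may be computed from known receiver positions as follows. The function name `edm` and the three receiver coordinates are assumptions chosen for the example only.

```python
import numpy as np

def edm(points):
    """Matrix of squared Euclidean distances between the rows of `points`."""
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - points[None, :, :]   # pairwise differences
    return np.sum(diff ** 2, axis=-1)

# Hypothetical receiver positions in the plane (arbitrary units).
receivers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
D = edm(receivers)
```

The resulting matrix is symmetric with a zero diagonal, in line with the properties stated above.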
[0035] Advantageously the proposed method performs an echo
labelling. In fact, in order to know which of the peaks in impulse
responses received by the receivers (e.g. microphones) correspond
to which reflective surface (e.g. wall of a room), instead of
relying on different derived heuristics, intrinsic properties of
point sets in Euclidean spaces are used. A particular property
easily exploited is the rank property of EDM, which says that the
EDM corresponding to a point set in R.sup.n has the rank at most
n+2. In 2-D, its rank can thus be at most 4, and in 3-D at most
5.
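The rank property can be checked numerically. The following sketch, given only as an assumption-laden illustration, draws random point sets in the plane and in space and estimates the rank of their EDMs from the singular values; the helper names, the random seed and the relative tolerance `rel_tol` are hypothetical choices.

```python
import numpy as np

def edm(points):
    """Matrix of squared Euclidean distances between the rows of `points`."""
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - points[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def numerical_rank(A, rel_tol=1e-9):
    """Number of singular values above rel_tol times the largest one."""
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

rng = np.random.default_rng(0)
rank_2d = numerical_rank(edm(rng.uniform(-5.0, 5.0, size=(8, 2))))  # n = 2
rank_3d = numerical_rank(edm(rng.uniform(-5.0, 5.0, size=(8, 3))))  # n = 3
```

Although both matrices are 8 by 8, their numerical ranks do not exceed n+2, i.e. 4 in 2-D and 5 in 3-D.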
[0036] The matrix D is augmented with a combination of M TOAs. This
corresponds to adding a new row and a new column to D. If the
augmented matrix D.sub.aug, still verifies the rank property (or
more generally, the EDM property), then the selected combination of
echoes corresponds to an image source, or equivalently, to a
reflective surface (e.g. a wall).
[0037] Even if this requires testing all the echo combinations,
in practical cases the number of combinations is quite small and
does not represent a problem: e.g. with M=4 receivers and 4 walls,
only 4.sup.4=256 combinations have to be tested. Moreover, for each
reflective surface there is only one correct combination.
[0038] The number of combinations may even be smaller, by choosing
to combine a particular echo received by one microphone only with
those echoes from other microphones that were received within a
temporal window corresponding to the size of the microphone
setup.
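One possible way to implement such a pruning is sketched below, under assumptions that go slightly beyond the text: the propagation speed is taken as unity, and the window is applied pairwise, using the fact that two echoes of the same image source cannot differ in arrival time by more than the distance between the two microphones (triangle inequality). The function name `candidate_tuples` and all numerical values are hypothetical.

```python
import itertools
import numpy as np

def candidate_tuples(echo_toas, receivers, slack=1e-12):
    """Keep only the echo combinations compatible with the array size.

    echo_toas: list of M lists, the echo TOAs picked up by each microphone.
    receivers: (M, n) array of microphone positions.
    """
    receivers = np.asarray(receivers, dtype=float)
    M = len(receivers)
    dist = np.linalg.norm(receivers[:, None] - receivers[None, :], axis=-1)
    kept = []
    for combo in itertools.product(*echo_toas):
        if all(abs(combo[i] - combo[j]) <= dist[i, j] + slack
               for i in range(M) for j in range(i + 1, M)):
            kept.append(combo)
    return kept

# Hypothetical example: three microphones, two echoes per microphone.
receivers = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
echo_toas = [[3.0, 7.2], [3.4, 7.0], [2.9, 7.5]]
kept = candidate_tuples(echo_toas, receivers)
```

In this example only two of the eight combinations survive the window test, so fewer augmented matrices need to be examined afterwards.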
[0039] The advantage of using the EDM approach is that it is exact,
not approximate (like e.g. the discrete Hough transform). It
thus provides a clear-cut criterion for identifying good combinations of echoes.
[0040] It can be applied for many signals (acoustic signals, radio
signals, UWB signals, etc.). It is very general and can be extended
to multiple sources, multiple microphones very easily (as will be
discussed here below, it can be applied to MIMO applications).
[0041] It requires only one source or transmitter. It can work with
a small number of receivers, i.e. fewer than 5.
[0042] It can be used for determining a 3D geometry of a room.
[0043] In one preferred embodiment the method considers first-order
echoes only; other echoes are not considered, and may be discarded.
Therefore the method does not rely on a knowledge of second-order
and further-order echoes, which are difficult to measure.
[0044] In one preferred embodiment the object is a convex room, the
transmitter is a loudspeaker, each receiver is a microphone, the
geometry is a 2D geometry and the number of receivers is 3. In
other words, the proposed method makes it possible to determine the 2D geometry
of a room by using one loudspeaker and only 3 microphones. The
proposed method thus uses a reduced number of receivers for accurately
determining the room's 2D geometry.
[0045] The proposed method based on EDM can be extended to
determine the 3D geometry of a room, using at least 5
receivers.
[0046] The method according to the invention can comprise the
determination of the location of the transmitter by using
least-squared distance trilateration. It can comprise
multi-dimensional scaling. It can comprise applying a s-stress
criterion.
[0047] The present invention concerns also a system for determining
the geometry and/or the localisation of an object comprising
[0048] a transmitter for sending one or more signals;
[0049] one or more receivers for receiving the transmitted signals
and the echoes of the transmitted signals as reflected by one or
more reflective surfaces;
[0050] a first computing module for building a first Euclidean
Distance Matrix (EDM) comprising the mutual positions of the
receivers, and optionally computing its rank;
[0051] a second computing module for adding to the EDM a new row
and a new column, the new row and a new column comprising time of
arrivals of said echoes and computing its rank or its distance from
the first EDM;
[0052] a third computing module for determining the geometry and/or
the position of the object based on said rank or distance.
[0053] In one preferred embodiment the first, second and third
modules are the same module.
[0054] The present invention concerns also a computer program
product for determining the geometry and/or the localisation of an
object, comprising:
a tangible computer usable medium including computer usable program
code being used for
[0055] building a first Euclidean Distance Matrix (EDM) comprising
the mutual positions of the receivers and optionally computing its
rank;
[0056] adding to the EDM a new row and a new column, the new row
and a new column comprising time of arrivals of echoes of the
signals transmitted by a transmitter as reflected by one or more
reflective surfaces and received by one or more receivers and
computing its rank or its distance from the first EDM or set of
EDMs;
[0057] determining the geometry and/or the position of the object
by comparing the first rank based on the second rank and/or on said
distance.
[0058] The present invention concerns also a computer data carrier
storing presentation content created with the described method.
BRIEF DESCRIPTION OF THE DRAWINGS
[0059] The invention will be better understood with the aid of the
description of an embodiment given by way of example and
illustrated by the figures, in which:
[0060] FIGS. 1 and 2 show a view of a room comprising a source or
transmitter and a receiver.
[0061] FIG. 3 shows a view of a room comprising a source or
transmitter and four receivers.
[0062] FIGS. 4A to 4C show the RIR received by each receiver.
[0063] FIG. 5 illustrates some possible room reconstructions due
to incorrect echo labelling.
[0064] FIG. 6 illustrates the feasible region concept.
[0065] FIG. 7 illustrates an embodiment of a system according to
the invention.
[0066] FIG. 8 illustrates an embodiment of a data processing system
in which a method in accordance with an embodiment of the present
invention, may be implemented.
DETAILED DESCRIPTION OF POSSIBLE EMBODIMENTS OF THE INVENTION
[0067] The present invention will be now described in more detail
in connection with its embodiment for determining the geometry of a
room. However, the present invention is also applicable in
connection with many other fields, as will be discussed. Moreover,
the two-dimensional case will be described first for the sake of
simplicity and illustration. The three-dimensional case then
follows easily.
[0068] The method according to the invention uses the image source
model. The idea in the image source model is that if there is a
sound source on one side of the wall, then the sound field on the
same side can be represented as a superposition of the original
sound field and the one generated by a mirror image of the source
with respect to the wall.
[0069] FIG. 2 illustrates the setup and the image source model. For
our purposes, a room is either a convex planar K-polygon or a
K-faced convex polyhedron. With the i.sub.th side of the room we
associate an outward pointing unit normal n.sub.i, and define the
normal matrix as N=(n.sub.1, . . . , n.sub.K). The reference {tilde
over (s)}.sub.i denotes the image of the source s with respect to
the i-th side.
[0070] In the case of FIG. 2 the following relation is valid
$$\tilde{s}_i = s + 2\langle p_i - s, n_i\rangle\, n_i \qquad (1)$$
where $\langle \cdot, \cdot \rangle$ denotes the inner product.
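As a minimal sketch, assuming the usual mirror relation in which the source is displaced by twice its distance to the wall along the outward unit normal, the image source may be computed as follows. The function name `image_source` and the example wall are hypothetical.

```python
import numpy as np

def image_source(s, p, n):
    """Mirror the source s across the wall through p with outward normal n."""
    s = np.asarray(s, dtype=float)
    p = np.asarray(p, dtype=float)
    n = np.asarray(n, dtype=float)
    n = n / np.linalg.norm(n)                  # make sure the normal has unit length
    return s + 2.0 * np.dot(p - s, n) * n      # s + 2 <p - s, n> n

# Hypothetical wall along the line x = 4, with outward normal (1, 0).
s_img = image_source(s=[1.0, 1.0], p=[4.0, 0.0], n=[1.0, 0.0])
```

Here the source lies 3 units from the wall, so its image lies 3 units behind the wall at (7, 1).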
[0071] In the remainder of this description we assume that the choice of
units is such that the speed of sound is unity, c=1. Adjustments
for the actual speed of sound are trivial.
[0072] By observing the impulse response and doing appropriate
computations it is possible to access the first-order echoes, but
also higher-order echoes.
[0073] FIG. 3 illustrates an array of M microphones in a 2-D room (in
this case M=4), and a loudspeaker s at an arbitrary position
in this room. The geometry enables the microphones to pick up the
first-order echoes only.
[0074] The references r1 to r4 denote the receivers along with
their positions. The same considerations apply to the source.
[0075] In general r.sub.m.epsilon.R.sup.2 and
s.epsilon.R.sup.2.
[0076] The EDM is the Euclidean Distance Matrix
corresponding to the microphone setup, which is known. In the case
of FIG. 3, the EDM $D \in \mathbb{R}^{M \times M}$ comprises the
following elements:
$$d_{ij} = \lVert r_i - r_j \rVert_2^2, \qquad 1 \le i, j \le M$$
[0077] where $\lVert \cdot \rVert_2^2$ is the squared Euclidean
distance.
[0078] The EDM is then a symmetric matrix with non-negative
entries and a zero diagonal.
[0079] If the loudspeaker s fires a pulse, each microphone (it is
assumed that all of them are in favourable positions so that they
observe echoes for all the walls) will receive the direct sound and
K first-order echoes. These echoes correspond to images of s across
the K walls. The locations of image sources are valid points of the
plane R.sup.2.
[0080] If the distances between the image sources and the
microphones are known, it is possible to reconstruct the locations
of image sources and hence the 2-D room.
[0081] In order to know which of the peaks Pi (see FIG. 4A to 4C)
in impulse responses RIR.sub.i received by the microphones
correspond to which wall (labeling problem) EDM is used. Instead of
relying on different derived heuristics, intrinsic properties of
point sets in Euclidean spaces are used. A particular property
easily exploited is the rank property of EDM, which says that the
EDM corresponding to a point set in R.sup.n has the rank at most
n+2. In 2-D, its rank can thus be at most 4, and in 3-D at most
5.
[0082] The matrix D is augmented with a combination of M TOAs. This
corresponds to adding a new row and a new column to D. If the
augmented matrix D.sub.aug still verifies the rank property (or
more generally, the EDM property), then the selected combination of
echoes corresponds to an image source, or equivalently, to a
reflective surface (e.g. a wall).
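The augmentation and the rank test can be illustrated by the following sketch, which assumes a propagation speed of 1, squared TOAs in the added row and column, and a purely hypothetical 2-D setup of four microphones and two image sources. The helper names `augment` and `singular_ratio` are not taken from the description.

```python
import numpy as np

def edm(points):
    """Matrix of squared Euclidean distances between the rows of `points`."""
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - points[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def augment(D, toas):
    """Border D with one row and one column of squared TOAs."""
    t = np.asarray(toas, dtype=float) ** 2
    M = len(t)
    D_aug = np.zeros((M + 1, M + 1))
    D_aug[:M, :M] = D
    D_aug[:M, M] = t
    D_aug[M, :M] = t
    return D_aug

def singular_ratio(A):
    """Smallest singular value divided by the largest one."""
    s = np.linalg.svd(A, compute_uv=False)
    return s[-1] / s[0]

# Hypothetical microphones and image sources (arbitrary units).
mics = np.array([[1.0, 0.8], [2.5, 1.2], [1.4, 2.8], [3.1, 2.2]])
image_a = np.array([-2.0, 3.0])
image_b = np.array([2.0, -3.0])

D = edm(mics)
toas_a = np.linalg.norm(mics - image_a, axis=1)      # all echoes from one wall
toas_mixed = toas_a.copy()
toas_mixed[0] = np.linalg.norm(mics[0] - image_b)    # one echo from another wall

ratio_consistent = singular_ratio(augment(D, toas_a))
ratio_mixed = singular_ratio(augment(D, toas_mixed))
```

For the consistent combination the 5 by 5 augmented matrix is singular up to rounding errors, whereas the mixed combination yields a clearly non-singular matrix.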
[0083] FIG. 5 illustrates some possible room reconstructions due to
incorrect echo labeling, of which only a single one (reference 10
in the Figure) satisfies the EDM criterion. The image source
location is estimated using least-squared-distance
trilateration.
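A simple linearised least-squares trilateration is sketched below as one possible variant; it is an assumption that this particular formulation is suitable here, and the estimator actually used may differ. Subtracting the squared-distance equation of the first microphone from the others removes the quadratic term and leaves a linear system in the unknown position. The function name `trilaterate` and the numerical values are hypothetical.

```python
import numpy as np

def trilaterate(receivers, dists):
    """Least-squares position estimate from distances to known receivers."""
    r = np.asarray(receivers, dtype=float)
    d2 = np.asarray(dists, dtype=float) ** 2
    # From |x - r_m|^2 - |x - r_0|^2 = d_m^2 - d_0^2 it follows that
    # 2 (r_m - r_0)^T x = |r_m|^2 - |r_0|^2 - d_m^2 + d_0^2.
    A = 2.0 * (r[1:] - r[0])
    b = np.sum(r[1:] ** 2, axis=1) - np.sum(r[0] ** 2) - d2[1:] + d2[0]
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

# Hypothetical microphones and image source; distances equal TOAs since c = 1.
mics = np.array([[1.0, 0.8], [2.5, 1.2], [1.4, 2.8], [3.1, 2.2]])
true_image = np.array([-2.0, 3.0])
dists = np.linalg.norm(mics - true_image, axis=1)
estimate = trilaterate(mics, dists)
```

With noiseless distances the estimate coincides with the image source position; with noisy TOAs the same call returns the least-squares compromise of the linearised equations.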
[0084] Characterisation of Correct TOA Vectors
[0085] Denote by $\tau_m$ the set of first-order echo TOAs
received by the m-th microphone. The matrix D is now augmented by a
vector t so that
$$D_{\mathrm{aug}} = \begin{pmatrix} D & t \\ t^T & 0 \end{pmatrix} \qquad (2)$$
[0086] where the vector t is formed by taking one TOA from each
microphone. In particular $t = (t_1^2, \ldots, t_M^2)^T$, with
$t_m \in \tau_m$. It is possible to state the following lemma:
[0087] Lemma 1.
[0088] If an M-tuple of echoes $t = \{t_1, \ldots, t_M\}$ is such
that $\operatorname{rank} D_{\mathrm{aug}} < 5$, then
$(t^T, 0)^T \in \mathcal{R}\big((D, t)^T\big)$, where $\mathcal{R}$
denotes the column space. In particular, if M=4 and
the microphones are not collinear or on a circle, we have that
$t^T D^{-1} t = 0$.
[0089] Proof.
[0090] The first part is obvious. For the second part, let t be a
vector that corresponds to a four-tuple such that
$\operatorname{rank}(D_{\mathrm{aug}}) < 5$. It is possible to write
$$D_{\mathrm{aug}} = \begin{pmatrix} D & t \\ t^T & 0 \end{pmatrix} \qquad (3)$$
[0091] If the rank of this matrix is 4 or less, it is possible to
represent the last column as a linear combination of the first four
columns. This in turn means that $\exists\, v$ such that
$$\begin{pmatrix} D \\ t^T \end{pmatrix} v = \begin{pmatrix} t \\ 0 \end{pmatrix} \qquad (4)$$
[0092] By components, one has
$$Dv = t, \qquad t^T v = 0. \qquad (5)$$
[0093] Under the assumptions of the second part, D is invertible,
and combining the two equations yields the result.
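For clarity, the combination of the two component equations can be written out explicitly. The following is only a restatement of the last step of the proof, under the stated assumption that D is invertible:

```latex
\begin{aligned}
Dv = t \;&\Longrightarrow\; v = D^{-1} t, \\
t^T v = 0 \;&\Longrightarrow\; t^T D^{-1} t = 0 .
\end{aligned}
```

The condition is a single quadratic equation in the entries of t, which is why it constrains the admissible M-tuples to a lower-dimensional set.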
[0094] Corollary 1.
[0095] Let
$$Z = \left\{ t \in \mathbb{R}^M : \operatorname{rank} \begin{pmatrix} D & t \\ t^T & 0 \end{pmatrix} < 5 \right\} \subset \mathbb{R}^M .$$
[0096] Then dim Z<M, that is, $\mu(Z)=0$, where $\mu$ is the
Lebesgue measure in $\mathbb{R}^M$.
[0097] Proof.
[0098] Immediate from Lemma 1.
[0099] This means that it is possible to test all possible M-tuples
generated from the collected RIRs, knowing that those that yield a
singular D.sub.aug correspond to image sources. With 4 walls and 4
microphones, we have 4.sup.4=256 combinations. This amounts to 256
SVDs of a 5.times.5 matrix, which can be computed very fast, so the
combinatorial aspect is not an issue. After finding the
corresponding combinations it is possible to triangulate to find the actual
locations of image sources, and from there find the walls.
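The exhaustive test can be sketched as follows for a hypothetical rectangular room of 5 by 4 units with the source at (2, 3); the image sources are simply the mirror images of the source across the four walls, and c = 1 is assumed. The microphone positions, the tolerance `rel_tol` and the helper names are assumptions made for the example.

```python
import itertools
import numpy as np

def edm(points):
    """Matrix of squared Euclidean distances between the rows of `points`."""
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - points[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def is_rank_deficient(A, rel_tol=1e-9):
    """True if the smallest singular value is negligible."""
    s = np.linalg.svd(A, compute_uv=False)
    return s[-1] <= rel_tol * s[0]

mics = np.array([[1.0, 0.8], [2.5, 1.2], [1.4, 2.8], [3.1, 2.2]])
# Mirror images of the source (2, 3) across x = 0, x = 5, y = 0 and y = 4.
images = np.array([[-2.0, 3.0], [8.0, 3.0], [2.0, -3.0], [2.0, 5.0]])

# Each microphone only sees an unlabelled, sorted list of echo TOAs.
toa_sets = [sorted(np.linalg.norm(images - r, axis=1)) for r in mics]

D = edm(mics)
tested = 0
matches = []
for combo in itertools.product(*toa_sets):
    tested += 1
    t = np.array(combo) ** 2
    D_aug = np.block([[D, t[:, None]], [t[None, :], np.zeros((1, 1))]])
    if is_rank_deficient(D_aug):
        matches.append(combo)
```

All 256 combinations are examined, and the combinations retained in `matches` are the candidates for image sources, ideally one per wall.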
[0100] The described procedure is summarized in Algorithm 1.
Algorithm 1 NOISELESS ROOM RECOVERY
Input: Times of Arrival $T_1, \ldots, T_M$
Output: Room walls
for every $\sqrt{t} \in T_1 \times \cdots \times T_M$ do
    Build the matrix $D_{\mathrm{aug}} = \begin{pmatrix} D & t \\ t^T & 0 \end{pmatrix}$,
    if $\operatorname{rank} D_{\mathrm{aug}} \le 4$, or equivalently, $[t^T, 0]^T \in \mathcal{R}([D, t]^T)$ then
        Triangulate the location of the image source corresponding to t,
        Compute the wall normal as the vector from the loudspeaker to the image source,
        Compute the distance of the wall from the loudspeaker.
    end if
end for
Reconstruct the convex room using the collected information.
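The two wall-related steps of Algorithm 1 can be sketched as follows. Since the image source is the mirror image of the loudspeaker, the wall is the perpendicular bisector of the segment joining them, so the wall lies at half the loudspeaker-to-image distance. The function name `wall_from_image` and the coordinates are hypothetical.

```python
import numpy as np

def wall_from_image(source, image):
    """Return the unit normal, the distance and one point of the wall."""
    source = np.asarray(source, dtype=float)
    image = np.asarray(image, dtype=float)
    v = image - source                  # points from the loudspeaker to the image
    length = np.linalg.norm(v)
    normal = v / length                 # outward wall normal
    distance = 0.5 * length             # wall is halfway to the image source
    point = source + distance * normal  # foot of the perpendicular on the wall
    return normal, distance, point

# Hypothetical loudspeaker at (2, 3) and image source at (8, 3).
normal, distance, point = wall_from_image([2.0, 3.0], [8.0, 3.0])
```

In this example the recovered wall is the line x = 5, located 3 units from the loudspeaker.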
[0101] Three-Dimensional Case
[0102] In the three-dimensional case, at least 5 microphones are
needed to apply the EDM method (see below for a method that enables
to use 4). Only slight adjustments are needed that reflect the
change of the ambient dimension. In fact, it is possible to
apply Algorithm 1 directly, but instead of testing whether
rank D.sub.aug.ltoreq.4, one has to test whether rank
D.sub.aug.ltoreq.5.
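The change of threshold can be illustrated with five microphones in space, again as a sketch under assumptions: c = 1, noiseless TOAs, and hypothetical microphone and image source coordinates. The augmented matrix is now 6 by 6 and a consistent combination gives a rank of at most 5.

```python
import numpy as np

def edm(points):
    """Matrix of squared Euclidean distances between the rows of `points`."""
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - points[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def numerical_rank(A, rel_tol=1e-9):
    """Number of singular values above rel_tol times the largest one."""
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

def augmented_rank(mics, toas):
    """Numerical rank of the EDM bordered by the squared TOAs."""
    t = np.asarray(toas, dtype=float) ** 2
    D_aug = np.block([[edm(mics), t[:, None]], [t[None, :], np.zeros((1, 1))]])
    return numerical_rank(D_aug)

# Hypothetical 3-D microphone array and two image sources.
mics = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0], [1.0, 1.0, 0.5]])
image_a = np.array([-3.0, 0.4, 0.7])
image_b = np.array([0.5, 6.0, 0.2])

toas_a = np.linalg.norm(mics - image_a, axis=1)
toas_mixed = toas_a.copy()
toas_mixed[0] = np.linalg.norm(mics[0] - image_b)   # echo from a different wall

rank_consistent = augmented_rank(mics, toas_a)
rank_mixed = augmented_rank(mics, toas_mixed)
```

Only the consistent combination passes the test rank D.sub.aug.ltoreq.5; the mixed one yields a full-rank 6 by 6 matrix.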
[0103] Uniqueness
[0104] The goal is to show that the probability for the described
algorithm to fail is 0. To this end, a set of "good" rooms in which
the algorithm can be applied is defined, and two theorems
about the uniqueness of the solution are then proved. Since the algorithm
relies on the knowledge of the first-order TOAs, it is required that
the microphones hear them. This defines a "good" room, which is in
fact a combination of a room geometry and the microphone
array/loudspeaker location.
[0105] Definition 1 (Feasibility).
[0106] Given a room R and a loudspeaker position s, the point
x.epsilon.R is feasible if a microphone placed at x receives all
the first-order echoes of a pulse emitted from s. The interior of
the set of all feasible points is called a feasible region.
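For a 2-D convex polygonal room, feasibility of a point can be tested with the image source model, as in the following sketch. It assumes that the polygon vertices are listed counter-clockwise and that both the source and the tested point lie strictly inside the room; the echo from a wall reaches the point if the straight line from the corresponding image source to the point crosses that wall between its two end-points. The function name `is_feasible` and the example room are hypothetical.

```python
import numpy as np

def is_feasible(vertices, source, x):
    """Check that all first-order echoes emitted at `source` reach `x`."""
    V = np.asarray(vertices, dtype=float)
    s = np.asarray(source, dtype=float)
    x = np.asarray(x, dtype=float)
    K = len(V)
    for i in range(K):
        p, q = V[i], V[(i + 1) % K]          # end-points of wall i
        e = q - p
        n = np.array([e[1], -e[0]]) / np.linalg.norm(e)   # outward unit normal
        image = s + 2.0 * np.dot(p - s, n) * n
        # Solve p + u * e = image + w * (x - image) for (u, w).
        A = np.column_stack([e, image - x])
        u, _ = np.linalg.solve(A, image - p)
        if not 0.0 <= u <= 1.0:
            return False                     # reflection point lies off the wall
    return True

# Hypothetical room with an obtuse corner at (4, 0); source near the slanted wall.
room = [[0.0, 0.0], [4.0, 0.0], [6.0, 2.0], [0.0, 2.0]]
source = [4.5, 1.0]
inside_region = is_feasible(room, source, [1.0, 1.0])
outside_region = is_feasible(room, source, [5.5, 1.8])
```

In this example the point (1, 1) receives all four first-order echoes, whereas the point (5.5, 1.8) misses the echo from the floor and is therefore not feasible.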
[0107] FIG. 6 illustrates the concept of a feasible region. With
this definition it is possible to state the first uniqueness
result.
[0108] Theorem 1.
[0109] Assume we are given a room and a source location. Assume
further that the room-loudspeaker combination generates a non-empty
feasible region and that the microphones are placed uniformly at
random in the feasible region. Then with probability 1, there is
only one room corresponding to the collected RIRs and it can be
retrieved by the Algorithm 1.
[0110] Sketch of proof. Fix any configuration of microphones
(r.sub.1, . . . , r.sub.M) such that all r.sub.m are in the
feasible region. This microphone configuration induces an M-tuple
of first-order TOAs, t.sub.0=(t.sub.1, . . . , t.sub.M).sup.T. Now
since the feasible region is open, there is some
.epsilon.=.epsilon.(r.sub.1, . . . , r.sub.M)>0 such that we can
achieve any t.epsilon.B.sub..epsilon.(t.sub.0) by adjusting the
microphone positions. To see this, observe that it is
possible to adjust each t.sub.m independently of the others by moving
the corresponding microphone.
[0111] Since this is true of any t.sub.0 one might generate, it
follows that the space of possible TOA combinations is the union of
all such open balls, and thus M-dimensional. By Corollary 1, the
dimension of the set of the M-tuples t that pass the EDM test is
smaller than M. But .mu.(A)=0 if dim(A)<M, where .mu. is the
Lebesgue measure in R.sup.M. It is possible to note that the
probability distribution introduced on ts is non singular since the
mapping t is continuous and the Jacobian of the mapping is
non-zero, so the claim follows. Alternatively by the same token it
is possible to note that the measure of all Rs that give viable ts
is zero, and directly conclude.
[0112] Remark:
[0113] A good way to think about this is that one can draw K.sup.M
samples from the non-singular (continuous) probability distribution
on the set of M-tuples t. By definition of the continuous
probability distribution, the probability to draw a sample from a
set with Lebesgue measure 0 must be 0 itself. It might appear
surprising that even if the probability to nail the correct M-tuple
is zero, one always has K correct ones. This is easy to explain by
noting that the echoes corresponding to one single wall are not
independent, but they are independent of the other echoes.
[0114] Theorem 2.
[0115] Assume you are given a fixed microphone array and a
loudspeaker position. A room is generated at random in such a
manner that the array is in the feasible region. Then with
probability 1, there is only one room corresponding to the
collected RIRs and it can be retrieved by the Algorithm 1.
[0116] The meaning of these theorems is essentially that in
whatever room one runs the algorithm so that the microphones are in
the feasible region, the solution is unique.
[0117] A Subspace Approach
[0118] The approach described in the previous section requires at
least four microphones in the 2-D case, and five microphones in the
3-D case. We now describe another approach that works with a
minimal number of microphones (minimal in the sense that one cannot
use fewer by exploiting only the first-order TOA information).
[0119] It is possible to always choose the origin of the coordinate
system so that one has
$$\sum_{m=1}^{M} \mathbf{r}_m = \mathbf{0} \tag{6}$$
[0120] with r.sub.m=(r.sub.m.sup.x, r.sub.m.sup.y).sup.T. Let
{tilde over (s)}.sub.k be the location vector of one image source
(with respect to the wall k). Then, up to a possible permutation,
one receives at each microphone the squared distance
information,
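$$y_{k,m} \overset{\text{def}}{=} \langle \tilde{\mathbf{s}}_k - \mathbf{r}_m,\ \tilde{\mathbf{s}}_k - \mathbf{r}_m \rangle = \|\tilde{\mathbf{s}}_k\|^2 - 2\langle \tilde{\mathbf{s}}_k, \mathbf{r}_m \rangle + \|\mathbf{r}_m\|^2. \tag{7}$$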
[0121] Define further
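$$\tilde{y}_{k,m} \overset{\text{def}}{=} -\tfrac{1}{2}\left(y_{k,m} - \|\mathbf{r}_m\|^2\right) = \langle \mathbf{r}_m, \tilde{\mathbf{s}}_k \rangle - \tfrac{1}{2}\|\tilde{\mathbf{s}}_k\|^2$$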
[0122] We have in vector form
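$$\begin{pmatrix} \tilde{y}_{k,1} \\ \tilde{y}_{k,2} \\ \vdots \\ \tilde{y}_{k,M} \end{pmatrix} = \begin{pmatrix} \mathbf{r}_1^T & -\tfrac{1}{2} \\ \mathbf{r}_2^T & -\tfrac{1}{2} \\ \vdots & \vdots \\ \mathbf{r}_M^T & -\tfrac{1}{2} \end{pmatrix} \begin{pmatrix} \tilde{\mathbf{s}}_k \\ \|\tilde{\mathbf{s}}_k\|^2 \end{pmatrix} \tag{8}$$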
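[0123] Denote by M the above matrix,
$$\mathbf{M} \overset{\text{def}}{=} \begin{pmatrix} \mathbf{r}_1^T & -\tfrac{1}{2} \\ \mathbf{r}_2^T & -\tfrac{1}{2} \\ \vdots & \vdots \\ \mathbf{r}_M^T & -\tfrac{1}{2} \end{pmatrix} \tag{9}$$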
[0124] and set
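$$\tilde{\mathbf{y}}_k \overset{\text{def}}{=} (\tilde{y}_{k,1}, \ldots, \tilde{y}_{k,M})^T, \qquad \tilde{\mathbf{u}}_k \overset{\text{def}}{=} (\tilde{s}_k^x, \tilde{s}_k^y, \|\tilde{\mathbf{s}}_k\|^2)^T.$$
[0125] We write the above expression (8) compactly as
$\tilde{\mathbf{y}}_k = \mathbf{M}\tilde{\mathbf{u}}_k$.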
[0126] Thanks to the condition that
$$\sum_{m=1}^{M} \mathbf{r}_m = \mathbf{0},$$
we have that
$$\mathbf{1}^T \tilde{\mathbf{y}}_k = -\frac{M}{2}\|\tilde{\mathbf{s}}_k\|^2, \tag{10}$$
i.e.
$$\|\tilde{\mathbf{s}}_k\|^2 = -\frac{2}{M}\sum_{m=1}^{M} \tilde{y}_{k,m}. \tag{11}$$
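The step from the centring condition (6) to (10) can be written out explicitly. The derivation below uses only the definition of the transformed measurements given before (8) and the condition (6); the bold-face notation for vectors is a presentational choice.

```latex
\begin{aligned}
\mathbf{1}^T \tilde{\mathbf{y}}_k
  &= \sum_{m=1}^{M} \Big( \langle \mathbf{r}_m, \tilde{\mathbf{s}}_k \rangle
     - \tfrac{1}{2}\|\tilde{\mathbf{s}}_k\|^2 \Big) \\
  &= \Big\langle \sum_{m=1}^{M} \mathbf{r}_m,\ \tilde{\mathbf{s}}_k \Big\rangle
     - \tfrac{M}{2}\|\tilde{\mathbf{s}}_k\|^2 \\
  &= -\tfrac{M}{2}\|\tilde{\mathbf{s}}_k\|^2 ,
\end{aligned}
```

where the last equality holds because the microphone positions sum to zero.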
[0127] Furthermore,
$$\tilde{\mathbf{s}}_k = \mathbf{A}\tilde{\mathbf{y}}_k, \tag{12}$$
[0128] where A is a matrix such that
$$\mathbf{A}\mathbf{M} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}. \tag{13}$$
[0129] These two conditions provide a complete characterisation of
the distance information. In practice, it is sufficient to verify
the linear constraint
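$$\tilde{\mathbf{y}}_k \in \mathcal{R}(\mathbf{M}), \tag{14}$$
[0130] where $\mathcal{R}(\mathbf{M})$, the range (column space) of the matrix M, is a proper subspace when the number of microphones satisfies M.gtoreq.4.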
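The linear test (14) can be sketched as a least-squares projection onto the range of the matrix in (9). This is a minimal illustration, assuming a centred 2-D array and noiseless squared distances; the function names, the microphone layout and the image source position are hypothetical.

```python
import numpy as np

def build_M(mics):
    """Matrix (9): one row (r_m^T, -1/2) per microphone."""
    mics = np.asarray(mics, dtype=float)
    return np.hstack([mics, -0.5 * np.ones((len(mics), 1))])

def transform(sq_dists, mics):
    """y~_{k,m} = -(y_{k,m} - ||r_m||^2) / 2 for one candidate echo combination."""
    mics = np.asarray(mics, dtype=float)
    return -0.5 * (np.asarray(sq_dists, dtype=float) - np.sum(mics ** 2, axis=1))

def range_test(mics, sq_dists):
    """Return (residual, u); the residual is ~0 when y~ lies in the range of M."""
    M = build_M(mics)
    y = transform(sq_dists, mics)
    u, *_ = np.linalg.lstsq(M, y, rcond=None)
    return float(np.linalg.norm(M @ u - y)), u

# Hypothetical centred array (positions sum to zero) and image source.
mics = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5], [0.0, -1.5]])
image_source = np.array([3.0, 2.0])
sq_good = np.sum((mics - image_source) ** 2, axis=1)
sq_bad = sq_good.copy()
sq_bad[0] += 2.0   # one echo taken from the wrong wall

res_good, u_good = range_test(mics, sq_good)
res_bad, _ = range_test(mics, sq_bad)
```

For a consistent combination the least-squares solution `u_good` holds the image source coordinates followed by its squared norm, which matches the structure of (8).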
[0131] This approach makes it possible to formulate theorems and
algorithms equivalent to the ones for the EDM formulation, with
analogous arguments. But more than that, it is possible to use the
nonlinear condition (11) to solve the problem with only 3
microphones in 2-D and 4 microphones in 3-D.
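With the minimal number of microphones the matrix in (9) is square, so the linear test is always satisfied and only the nonlinear condition remains. The sketch below checks that condition by comparing the squared norm of the candidate image source from (12) with the value given by (11). It assumes a centred, non-collinear 2-D array of 3 microphones and uses the pseudo-inverse as one possible choice of the matrix A; the function name and all numerical values are hypothetical.

```python
import numpy as np

def nonlinear_residual(mics, sq_dists):
    """Return (|  ||A y~||^2 - (-2/M) sum(y~)  |, candidate image source)."""
    mics = np.asarray(mics, dtype=float)
    n_mics = len(mics)
    M = np.hstack([mics, -0.5 * np.ones((n_mics, 1))])
    y = -0.5 * (np.asarray(sq_dists, dtype=float) - np.sum(mics ** 2, axis=1))
    A = np.linalg.pinv(M)[:2]               # A @ M = (I_2 | 0) for full column rank M
    s = A @ y                               # candidate image source, eq. (12)
    norm_sq = -2.0 / n_mics * np.sum(y)     # squared norm predicted by eq. (11)
    return float(abs(s @ s - norm_sq)), s

# Hypothetical centred array of 3 microphones and an image source at (3, 2).
mics = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
sq_good = np.sum((mics - np.array([3.0, 2.0])) ** 2, axis=1)   # [8, 10, 25]
sq_bad = np.array([8.0, 10.0, 30.0])                            # last echo mislabeled

res_good, s_good = nonlinear_residual(mics, sq_good)
res_bad, _ = nonlinear_residual(mics, sq_bad)
```

A residual close to zero indicates a combination of echoes consistent with a single image source; in the presence of noise a threshold would have to be chosen, which is not specified here.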
[0132] Theorem 3.
[0133] The minimal number of microphones required to hear the room
given that they observe the first order echoes is 3 in 2-D and 4 in
3-D.
[0134] Proof.
[0135] Construct a family of counterexamples for M=2.
[0136] Practical Considerations--Working with Uncertainties
[0137] In practice one encounters several sources of error. The
first error term comes from the uncertainty when measuring the
inter-microphone distances, that is
$$\tilde{d}_{ij} = d_{ij} + e_{ij}, \tag{15}$$
so that
$$\tilde{D} = D + E, \tag{16}$$
[0138] where E is a symmetric, zero-diagonal error matrix.
[0139] This can be dealt with by calibration, but note that the
schemes proposed in the following seem to be very stable with
respect to uncertainties in array calibration.
[0140] The second source of error comes from the effects of the
finite sampling rate and the finite precision of peak-picking
algorithms. Some of this can be alleviated by using a high sampling
rate, and better time-of-arrival estimation algorithms.
[0141] However it is better to use some kind of a distance measure
between the measured/assembled D.sub.aug and some feasible
D.sub.aug. One possible approach would be to build a heuristic
based on the singular values of D.sub.aug. Such an approach, however,
would capture only the rank requirement on the matrix. But the
requirement that D.sub.aug be an EDM brings in many more subtle
dependencies between its elements. For instance one has that
$$\left(I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^T\right) D_{\mathrm{aug}} \left(I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^T\right) \preceq 0. \tag{17}$$
[0142] Furthermore, (17) does not make it possible to specify the
ambient dimension of the point set. Imposing this constraint leads
to even more dependencies between the matrix elements, and the
resulting set of matrices is no longer a cone (it is in fact no
longer convex). Nevertheless, it is possible to use a family of
algorithms known as multidimensional scaling (MDS) to find the
closest EDM generated by a point set in a fixed ambient dimension.
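The condition (17) together with the ambient dimension constraint can be checked numerically through the eigenvalues of the doubly centred matrix. The sketch below is an illustration only: it assumes that the matrix holds squared distances, uses the standard Gram matrix obtained by scaling the product in (17) by -1/2, and relies on hypothetical names, tolerance and sample points.

```python
import numpy as np

def gram_from_edm(D):
    """Gram matrix -1/2 * J D J with J = I - (1/n) 1 1^T."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * J @ D @ J

def looks_like_edm(D, dim, tol=1e-8):
    """True if -J D J is positive semidefinite with at most `dim` positive eigenvalues."""
    eigvals = np.linalg.eigvalsh(gram_from_edm(np.asarray(D, dtype=float)))
    scale = max(np.abs(eigvals).max(), 1.0)
    is_psd = eigvals.min() >= -tol * scale
    n_positive = int(np.sum(eigvals > tol * scale))
    return bool(is_psd and n_positive <= dim)

# Hypothetical point set in 2-D and its exact squared-distance matrix.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 1.5]])
diff = points[:, None, :] - points[None, :, :]
D = np.sum(diff ** 2, axis=-1)
D_noisy = D.copy()
D_noisy[0, 1] += 0.5
D_noisy[1, 0] += 0.5

exact_ok = looks_like_edm(D, dim=2)
too_low_dim = looks_like_edm(D, dim=1)
noisy_ok = looks_like_edm(D_noisy, dim=2)
```

Such a check only gives a yes/no answer; measuring how far a noisy matrix is from the set of EDMs requires an optimization such as the one described next.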
[0143] Multidimensional Scaling
[0144] As pointed out, in the presence of noise it is not
favourable to use the rank test on D.sub.aug. A very good way (as
verified through simulations) to deal with this nuisance is to
measure how close D.sub.aug is to a true EDM. In order to measure
the distance, it is possible to use Multidimensional Scaling to
construct a point set in a given dimension (either 2-D or 3-D)
which produces the EDM "closest" to D.sub.aug.
[0145] Multidimensional Scaling (MDS) was originally proposed in
psychometrics as a method for data visualization. Many variations
have been proposed to adapt the method for sensor localization.
[0146] Here we use the s-stress criterion as proposed by
Takane, Young and de Leeuw (1977). Given an observed noisy matrix
{tilde over (D)}, the s-stress criterion is
$$s(\tilde{D}) = \min_{D} \sum_{i,j} \left(d_{i,j}^2 - \tilde{d}_{i,j}^2\right)^2 \quad \text{subject to } D \in \mathrm{EDM}^2.$$
[0147] We call s({tilde over (D)}) the score of matrix {tilde over
(D)}. By EDM.sup.2 we denote the set of EDMs with embedding
dimension 2 (produced by point sets in 2-D). In the 3-D case,
EDM.sup.2 is replaced by EDM.sup.3.
[0148] From now on, it is assumed that the target space is R.sup.2.
The 3-D adaptation is immediate. If one associates to each point in
R.sup.2 a coordinate vector x.sub.i=(x.sub.i,y.sub.i).sup.T, one
has that
$$d_{i,j}^2 = \|\mathbf{x}_i - \mathbf{x}_j\|_2^2 = (x_i - x_j)^2 + (y_i - y_j)^2.$$
[0149] Thus, the s-stress criterion can be rephrased as
$$s(\tilde{D}) = \min_{x_i, y_i \in \mathbb{R}} \sum_{i,j} \left[(x_i - x_j)^2 + (y_i - y_j)^2 - \tilde{d}_{i,j}^2\right]^2 \tag{18}$$
[0150] The objective function in (18) is not convex. However, it
has been shown to have fewer local minima than other MDS
criteria. Furthermore, it yields a meaningful definition of the
distance of a matrix from an optimal EDM.
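For a given candidate point set, the objective in (18) is straightforward to evaluate. The sketch below assumes that the observed matrix stores squared distances; the function name and the sample values are hypothetical.

```python
import numpy as np

def s_stress(points, D_tilde):
    """Objective of (18): sum over i, j of (||x_i - x_j||^2 - d~_{i,j}^2)^2."""
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - points[None, :, :]
    D = np.sum(diff ** 2, axis=-1)
    return float(np.sum((D - np.asarray(D_tilde, dtype=float)) ** 2))

# Hypothetical example: three points and their exact squared distances.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
D_exact = np.array([[0.0, 1.0, 1.0], [1.0, 0.0, 2.0], [1.0, 2.0, 0.0]])
D_noisy = D_exact.copy()
D_noisy[0, 1] = D_noisy[1, 0] = 1.5   # perturb one symmetric pair by 0.5

exact_value = s_stress(points, D_exact)
noisy_value = s_stress(points, D_noisy)
```

The score of the matrix is the minimum of this quantity over all point sets, so the value computed for one fixed point set is only an upper bound on the score.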
[0151] In order to further reduce the risk of getting trapped in the
local minima of (18), it is possible to use coordinate alternation
for finding the optimal EDM: (18) is minimized first over x.sub.i
and then over y.sub.i. Although this approach is suboptimal
compared to simultaneous minimization with respect to both
coordinates of the point i, it leads to simpler computations.
[0152] Assuming that x.sub.i has to be updated by .DELTA.x.sub.i to
give the minimum of s({tilde over (D)}), one will have
$$s(\tilde{D})_i^{(k+1)} = \sum_{j=1}^{n} \left[\left(x_i^{(k)} + \Delta x_i^{(k+1)} - x_j^{(k)}\right)^2 + \left(y_i^{(k)} - y_j^{(k)}\right)^2 - \tilde{d}_{i,j}^2\right]^2, \tag{19}$$
[0153] where (.cndot.).sup.(k) denotes the value at iteration k.
Taking the derivative of s({tilde over (D)}).sub.i.sup.(k+1) with
respect to .DELTA.x.sub.i.sup.(k+1), one will have
$$\begin{aligned}
\frac{\partial s(\tilde{D})_i^{(k+1)}}{\partial \Delta x_i^{(k+1)}} = 4\Big[\, & n\left(\Delta x_i^{(k+1)}\right)^3 + 3\sum_{j=1}^{n}\left(x_i^{(k)} - x_j^{(k)}\right)\left(\Delta x_i^{(k+1)}\right)^2 \\
& + \sum_{j=1}^{n}\left[3\left(x_i^{(k)} - x_j^{(k)}\right)^2 + \left(y_i^{(k)} - y_j^{(k)}\right)^2 - \tilde{d}_{i,j}^2\right]\Delta x_i^{(k+1)} \\
& + \sum_{j=1}^{n}\left[\left(x_i^{(k)} - x_j^{(k)}\right)^3 + \left(x_i^{(k)} - x_j^{(k)}\right)\left(y_i^{(k)} - y_j^{(k)}\right)^2 - \left(x_i^{(k)} - x_j^{(k)}\right)\tilde{d}_{i,j}^2\right]\Big].
\end{aligned} \tag{20}$$
[0154] Setting (20) to zero gives a cubic equation in
.DELTA.x.sub.i.sup.(k+1), which has at most three real solutions.
Comparing the value of s({tilde over (D)}).sub.i.sup.(k+1) for these
solutions gives the optimal value for .DELTA.x.sub.i.sup.(k+1).
[0155] The complete optimization procedure is summarized in
Algorithm 2.
TABLE-US-00002 Algorithm 2 COORDINATE ALTERNATION FOR S-STRESS OPTIMIZATION
Input: Symmetric and zero-diagonal matrix {tilde over (D)}
Output: Estimated positions x and score s({tilde over (D)})
1: Assume an initial configuration for the points x.sup.0
2: repeat
3: for i = 1 to n do
4: Keep the configuration of all points other than i fixed,
5: Update x.sub.i using the i.sup.th row of {tilde over (D)},
6: Update y.sub.i using the i.sup.th row of {tilde over (D)},
7: end for
8: until convergence or maximum number of iterations is reached.
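Algorithm 2 can be sketched as follows. This is a minimal illustration under several assumptions that are not stated in the text: the initial configuration is drawn at random, convergence is declared when the score stops decreasing, the cubic obtained from (20) is solved with a generic polynomial root finder, and a zero update is kept as a fallback candidate so that the score never increases. All names and parameter values are hypothetical.

```python
import numpy as np

def s_stress(X, D_tilde):
    """Objective (18) for point set X and observed squared distances D_tilde."""
    diff = X[:, None, :] - X[None, :, :]
    return float(np.sum((np.sum(diff ** 2, axis=-1) - D_tilde) ** 2))

def best_shift(a, c):
    """Minimise sum_j ((a_j + delta)^2 + c_j)^2 over delta using the cubic from (20)."""
    coeffs = [len(a), 3.0 * a.sum(), np.sum(3.0 * a ** 2 + c), np.sum(a ** 3 + a * c)]
    roots = np.roots(coeffs)
    real = roots[np.abs(roots.imag) < 1e-9].real
    candidates = np.append(real, 0.0)          # delta = 0 keeps the current position
    cost = [np.sum(((a + d) ** 2 + c) ** 2) for d in candidates]
    return candidates[int(np.argmin(cost))]

def coordinate_alternation(D_tilde, dim=2, max_iter=200, tol=1e-12, seed=0):
    D_tilde = np.asarray(D_tilde, dtype=float)
    n = D_tilde.shape[0]
    X = np.random.default_rng(seed).standard_normal((n, dim))   # step 1
    prev = s_stress(X, D_tilde)
    for _ in range(max_iter):                                   # step 2
        for i in range(n):                                      # step 3
            for ax in range(dim):                               # steps 5 and 6
                a = X[i, ax] - X[:, ax]
                # contribution of the other coordinates minus the i-th row of D_tilde
                c = np.sum((X[i] - X) ** 2, axis=1) - a ** 2 - D_tilde[i]
                X[i, ax] += best_shift(a, c)
        cur = s_stress(X, D_tilde)
        if prev - cur <= tol * max(prev, 1.0):                  # step 8
            break
        prev = cur
    return X, s_stress(X, D_tilde)

# Hypothetical example: exact squared distances of 5 points in 2-D.
truth = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 1.5]])
D = np.sum((truth[:, None, :] - truth[None, :, :]) ** 2, axis=-1)
X_hat, score = coordinate_alternation(D)
```

Since the objective is not convex, the returned score depends on the initial configuration; restarting from several random configurations and keeping the lowest score is one possible way to reduce this dependence.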
[0156] FIG. 8 is an embodiment of a data processing system 300 in
which an embodiment of a method of the present invention may be
implemented. The data processing system 300 of FIG. 8 may be
located and/or otherwise operate at any node of a computer network,
which may exemplarily comprise clients, servers, etc., and which is
not illustrated in the Figure. In the embodiment illustrated in FIG. 8,
data processing system 300 includes communications fabric 302,
which provides communications between processor unit 304, memory
306, persistent storage 308, communications unit 310, input/output
(I/O) unit 312, and display 314.
[0157] Processor unit 304 serves to execute instructions for
software that may be loaded into memory 306. Processor unit 304 may
be a set of one or more processors or may be a multi-processor
core, depending on the particular implementation. Further,
processor unit 304 may be implemented using one or more
heterogeneous processor systems in which a main processor is
present with secondary processors on a single chip. As another
illustrative example, the processor unit 304 may be a symmetric
multi-processor system containing multiple processors of the same
type.
[0158] In some embodiments, the memory 306 shown in FIG. 8 may be a
random access memory or any other suitable volatile or non-volatile
storage device. The persistent storage 308 may take various forms
depending on the particular implementation. For example, the
persistent storage 308 may contain one or more components or
devices. The persistent storage 308 may be a hard drive, a flash
memory, a rewritable optical disk, a rewritable magnetic tape, or
some combination of the above. The media used by the persistent
storage 308 also may be removable such as, but not limited to, a
removable hard drive.
[0159] The communications unit 310 shown in FIG. 8 provides for
communications with other data processing systems or devices. In
these examples, communications unit 310 is a network interface
card. Modems, cable modems and Ethernet cards are just a few of the
currently available types of network interface adapters.
Communications unit 310 may provide communications through the use
of either or both physical and wireless communications links.
[0160] The input/output unit 312 shown in FIG. 8 enables input and
output of data with other devices that may be connected to data
processing system 300. In some embodiments, input/output unit 312
may provide a connection for user input through a keyboard and
mouse. Further, input/output unit 312 may send output to a printer.
Display 314 provides a mechanism to display information to a
user.
[0161] Instructions for the operating system and applications or
programs are located on the persistent storage 308. These
instructions may be loaded into the memory 306 for execution by
processor unit 304. The processes of the different embodiments may
be performed by processor unit 304 using computer implemented
instructions, which may be located in a memory, such as memory 306.
These instructions are referred to as program code, computer usable
program code, or computer readable program code that may be read
and executed by a processor in processor unit 304. The program code
in the different embodiments may be embodied on different physical
or tangible computer readable media, such as memory 306 or
persistent storage 308.
[0162] Program code 316 is located in a functional form on the
computer readable media 318 that is selectively removable and may
be loaded onto or transferred to data processing system 300 for
execution by processor unit 304. Program code 316 and computer
readable media 318 form a computer program product 320 in these
examples. In one example, the computer readable media 318 may be in
a tangible form, such as, for example, an optical or magnetic disc
that is inserted or placed into a drive or other device that is
part of persistent storage 308 for transfer onto a storage device,
such as a hard drive that is part of persistent storage 308. In a
tangible form, the computer readable media 318 also may take the
form of a persistent storage, such as a hard drive, a thumb drive,
or a flash memory that is connected to data processing system 300.
The tangible form of computer readable media 318 is also referred
to as computer recordable storage media. In some instances,
computer readable media 318 may not be removable.
[0163] Alternatively, the program code 316 may be transferred to
data processing system 300 from computer readable media 318 through
a communications link to communications unit 310 and/or through a
connection to input/output unit 312. The communications link and/or
the connection may be physical or wireless in the illustrative
examples. The computer readable media also may take the form of
non-tangible media, such as communications links or wireless
transmissions containing the program code.
[0164] The different components illustrated for data processing
system 300 are not meant to provide architectural limitations to
the manner in which different embodiments may be implemented. The
different illustrative embodiments may be implemented in a data
processing system including components in addition to or in place
of those illustrated for data processing system 300. Other
components shown in FIG. 8 can be varied from the illustrative
examples shown. For example, a storage device in data processing
system 300 is any hardware apparatus that may store data. Memory
306, persistent storage 308, and computer readable media 318 are
examples of storage devices in a tangible form.
[0165] Therefore, as explained at least in connection with FIG. 8
the present invention is as well directed to a system for
determining the geometry and/or the localisation of an element, a
computer program product for determining the geometry and/or the
localisation of an element and a computer data carrier.
[0166] A further embodiment of the present invention provides a
computer data carrier storing presentation content created while
employing the methods of the present invention.
[0167] Although the present invention has been described in more
detail in connection with its embodiment for determining the
geometry of a room, the present invention finds applicability in
connection with many other fields.
[0168] The present invention can be used for determining the exact
position of a receiver r, which is a person in FIG. 7. In this
case a satellite, e.g. a GPS satellite, is the source s of a radio
signal which can be reflected by some buildings B1, B2. If the echo
e1 is not taken into account, the localisation of a mobile device r
of a person can be computed incorrectly (the mobile device r will be
considered to be located at {tilde over (r)}).
[0169] Knowing the position of the satellite s, the position of the
buildings B1, B2, etc. (this is possible e.g. by using an
electronic map) and applying the method according to the invention,
it is possible to accurately locate the mobile device r and then
the person, without any error.
[0170] An application of the method lies in neurology. Neural
activity is measured by electrodes introduced into the human or
animal brain. These electrodes pick up signals coming from multiple
neurons. Neural spike sorting aims at identifying spikes coming
from a single neuron: such identification is a labeling or
clustering problem. For finding its solution, the method according
to the invention can be applied. Clustering is done based on the
spike shape and the relative spike amplitudes at different
electrodes.
[0171] Since the human or animal tissue is homogeneous and the
electric signals are observed through the line-of-sight
propagation, the relative spike amplitudes depend on the distance
between the electrodes and the neurons. The exact amplitude pattern
depends on the electrode array geometry and on the mutual position
of the electrode array and a given neuron.
[0172] In the noiseless case, knowing the characteristics of the
propagation in the human or animal tissue, and having a sufficient
number of electrodes would uniquely identify the location of each
given neuron.
[0173] In the noisy case, the method according to the invention
makes it possible to find the likely location of each neuron, by
finding the closest EDM.
[0174] The method of the invention can also be used in
audio-forensics. For example, a person moving in a room while
talking on a phone might enable us to learn the shape of that room
based on the audio signal transmitted over the phone channel.
[0175] The method according to the invention can also be applied in
CDMA, or in general in MIMO communications. A possible application
is the accurate channel estimation. In multipath propagation (for
example indoor channels), the receiving antenna picks up the direct
signal, and a number of echoes or reflections. These reflections,
as discussed, can be modeled by image sources. It is possible then
to estimate the EDM corresponding to multiple emitting and
receiving antennas, and then include image sources. It is then
possible to estimate the locations of these image sources, and then
find the "perfect" locations of the corresponding path components
in impulse responses.
[0176] Furthermore, if the geometry or the position of the antenna
arrays changes, it is likely that the major reflections will still
be coming from the same reflectors. It is then possible to
efficiently re-estimate the channel by only learning the new
geometry of the antenna array.
[0177] Advantageously the method according to the invention can be
used to boost the signal power, as already attempted by the "RAKE"
receivers. However, such receivers try to decide where the
individual channel taps are from the estimated impulse responses.
On the contrary with the method according to the invention after
estimating the shape of the room, it is possible to have a perfect
knowledge of the image source locations and this could be used for
correctly combining the reflected signals in order to boost the
power.
[0178] The method according to the invention can be applied to a ToF
(Time of Flight) camera, where a single light pulse illuminates the
scene, and the scene depth is then computed based on the travel
time of light. On the camera side there is a pixel array where
pixels are time-resolving sensors (or there is a shutter that has
the role of time resolving). The method according to the invention
can substantially reduce the number of pixels needed by
approximating the scene with a number of planar reflectors and
finding the image source corresponding to each planar reflector by
using the EDM.
[0179] Another possible application of the method according to the
invention is the indoor sound source localization, usually
considered difficult since the reflections are difficult to predict
and they masquerade as sources.
[0180] Another set of applications is in teleconferencing and
auralization where one would, perhaps for different reasons, like
to compensate for the room influence or create an illusion that the
sound is played in a specific room. This largely consists in
compensating for the early reflections, which in turn requires
knowledge of the reflector locations. The listed techniques work
because knowing the boundary conditions makes it possible to compute
the RIR for an arbitrary source-receiver geometry inside the room.
[0181] A different field of application is in wave field synthesis:
knowing the locations of early reflections might enable the
development of more specific indoor sampling theorems.
* * * * *