U.S. patent application number 12/994716 was filed with the patent office on 2011-05-26 for data embedding system.
Invention is credited to Peter Kelly, Michael Raymond Reynolds, Christopher John Joseph Sutton.
Application Number | 20110125508 12/994716 |
Family ID | 39768060 |
Filed Date | 2011-05-26 |
United States Patent
Application |
20110125508 |
Kind Code |
A1 |
Kelly; Peter ; et
al. |
May 26, 2011 |
DATA EMBEDDING SYSTEM
Abstract
A data hiding system is described for hiding data within an
audio signal. The system can be used for watermarking, data
communications, audience surveying etc. The system hides data in an
audio signal by adding artificial echoes whose polarity varies with
the data to be hidden. In one embodiment, each data value is
represented by a positive and a negative echo having different
delays. A receiver can then remove the effects of natural echoes
and/or periodicities in the audio signal by differencing
measurements obtained at the different delays.
Inventors: |
Kelly; Peter; (Cambridge,
GB) ; Reynolds; Michael Raymond; (Cambridge, GB)
; Sutton; Christopher John Joseph; (Cambridge,
GB) |
Family ID: |
39768060 |
Appl. No.: |
12/994716 |
Filed: |
May 29, 2009 |
PCT Filed: |
May 29, 2009 |
PCT NO: |
PCT/GB2009/001354 |
371 Date: |
February 9, 2011 |
Current U.S.
Class: |
704/502 ; 381/63;
704/E19.001 |
Current CPC
Class: |
H04H 20/31 20130101;
G10L 19/018 20130101 |
Class at
Publication: |
704/502 ; 381/63;
704/E19.001 |
International
Class: |
G10L 19/00 20060101
G10L019/00 |
Foreign Application Data
Date | Code | Application Number |
May 29, 2008 | GB | PCT/GB2008/001820 |
Jul 31, 2008 | GB | 0814041.0 |
Nov 28, 2008 | GB | 0821841.4 |
Claims
1-48. (canceled)
49. A method of embedding a plurality of data values in an audio
signal, the method comprising: receiving the audio signal;
generating an echo of at least a portion of the received audio
signal; and embedding the plurality of data values in the audio
signal by combining the received audio signal with multiple
versions of the generated echo at different delays, wherein each
delay is associated with a respective data value.
50. A method according to claim 49, wherein each data value is
represented by first and second echoes having different delays and
opposite polarities and wherein each data value is represented by
third and fourth echoes, the third echo having the same polarity,
when combined with said audio signal, as said second echo and the
fourth echo having the same polarity, when combined with said audio
signal, as said first echo, wherein the first and second echoes are
generated by repeating substantially a first portion of the
received audio signal and wherein the third and fourth echoes are
generated by repeating substantially a second portion of the audio
signal which is different to the first portion repeated by the
first and second echoes.
51. A method according to claim 50, wherein said second portion is
adjacent the first portion.
52. A method according to claim 51, wherein said first echo is
combined with said audio signal at a first delay relative to said
first portion of the audio signal; said second echo is combined
with the audio signal at a second delay relative to said first
echo; said third echo is combined with said audio signal at a third
delay relative to said second portion of the audio signal; and said
fourth echo is combined with the audio signal at a fourth delay
relative to said third echo and preferably wherein said first delay
is equal to said third delay and/or said second delay is equal to
said fourth delay.
53. A method according to claim 49, wherein said delays are between
0.5 ms and 100 ms.
54. A method according to claim 49, wherein said echo of at least a
portion of the audio signal has a duration of between 20 ms and 500
ms.
55. A method according to claim 49, wherein the polarity of an echo
at a given delay is modulated in dependence upon the data value
associated with the delay.
56. A method according to claim 49, wherein said audio signal is
received as a stream of samples and wherein each echo is generated
by passing the stream of audio samples through a respective delay
line.
57. A method according to claim 49, wherein said combining step
combines the or each echo with the audio signal by adding and/or
subtracting each echo to/from the audio signal in dependence upon
the data value.
58. A method according to claim 49, wherein the or each echo
associated with each data value is faded in and out at different
times within a symbol period so that different echoes reach their
peak values at different times within the symbol period.
59. A computer executable instructions product comprising computer
implementable instructions for causing a programmable computer
device to carry out the method of claim 49.
60. An apparatus for embedding a plurality of data values in an
audio signal, the apparatus comprising: an input for receiving the
audio signal; an echo generator for generating an echo of at least
a portion of the received audio signal; and a combiner for
embedding the plurality of data values in the audio signal by
combining the received audio signal with multiple versions of the
generated echo at different delays, wherein each delay is
associated with a respective data value.
61. A method of recovering a plurality of data values embedded in
an audio signal, the method comprising: receiving an input signal
having the audio signal and multiple versions of an echo of at
least a portion of the audio signal at different delays, wherein
each delay is associated with a respective data value; processing
the received input signal to detect said echoes at the delays; and
recovering the plurality of data values from the detected
echoes.
62. A method according to claim 61, wherein each data value is
represented by first and second echoes having different delays and
opposite polarities, wherein each data value is represented by
third and fourth echoes, the third echo having the same polarity as
said second echo and the fourth echo having the same polarity as
said first echo, wherein the first and second echoes are a repeat
of substantially a first portion of the received audio signal,
wherein the third and fourth echoes are a repeat of substantially a
second portion of the audio signal which is different to the first
portion repeated by the first and second echoes and wherein said
processing processes the received input signal to detect the echoes
representing each data value.
63. A method according to claim 62, wherein said second portion is
adjacent the first portion.
64. A method according to claim 62, wherein said first echo is at a
first delay relative to said first portion of the audio signal;
said second echo is at a second delay relative to said first echo;
said third echo is at a third delay relative to said second portion
of the audio signal; and said fourth echo is at a fourth delay
relative to said third echo and preferably wherein said first delay
is equal to said third delay and/or said second delay is equal to
said fourth delay.
65. A method according to claim 61, wherein said delays are between
0.5 ms and 100 ms.
66. A method according to claim 61, wherein each echo has a
duration of between 20 ms and 500 ms.
67. A method according to claim 61, wherein the polarity of an echo
at a given delay depends upon the data value associated with the
delay and wherein said processing detects the polarity of the
echoes and determines said plurality of data values from the
detected echo polarities.
68. A method according to claim 61, wherein the or each echo
associated with each data value is faded in and out at different
times within a symbol period so that different echoes reach their
peak values at different times within the symbol period and wherein
the processing detects the echoes at a timing when an echo is
expected to be at its peak amplitude.
69. A computer executable instructions product comprising computer
implementable instructions for causing a programmable computer
device to carry out the method of claim 61.
70. An apparatus for recovering a plurality of data values embedded
in an audio signal, the apparatus comprising: an input for
receiving an input signal having the audio signal and multiple
versions of an echo of at least a portion of the audio signal at
different delays, wherein each delay is associated with a
respective data value; and a processor for processing the received
input signal to detect said echoes at the delays and for recovering
the plurality of data values from the detected echoes.
Description
[0001] The present invention relates to a system for embedding data
in an audio signal and to its subsequent recovery, which can be
used for watermarking, data communications, audience surveying etc.
The invention has particular relevance to a system for hiding data
in an audio signal by adding echoes to the audio signal and to a
system for recovering the hidden data by detecting the added
echoes.
[0002] U.S. Pat. No. 5,893,067 discloses a technique for hiding
data within an audio signal for transmission to a remote receiver.
The data is hidden in the audio signal by adding an artificial echo
to the audio signal and varying the amplitude and/or delay of the
echo in accordance with the data to be hidden.
[0003] A problem with the data hiding technique described in U.S.
Pat. No. 5,893,067 is that in most situations, natural echoes can
mask the artificial echoes making it difficult for the receiver to
be able to identify the artificial echoes and hence recover the
hidden data.
[0004] The present invention aims to provide an alternative data
hiding technique. One embodiment of the invention at least
alleviates the above problem by applying echoes of opposite
polarity to represent each data value.
[0005] According to one aspect, the present invention provides a
method of embedding a data value in an audio signal, the method
comprising: generating an echo of at least a portion of the
received audio signal; and embedding the data value in the audio
signal by combining the received audio signal with the generated
echo; wherein the data value is embedded in the audio by varying
the polarity of the echo that is combined with the audio signal in
dependence upon the data value. The inventors have found that using
polarity modulation to embed the data in the audio signal can make
the recovery of the embedded data easier in the receiver,
especially in the presence of natural echoes caused by, for example,
the acoustics of the room. The polarity modulation can be achieved
by varying the echo that is generated and/or by varying the way in
which the echo is combined with the audio signal.
[0006] In one embodiment, the generating step generates a first
echo of at least a portion of the received audio signal and a
second echo of at least a portion of the received audio signal, the
first and second echoes having first and second polarities
respectively, which polarities vary in dependence upon the data
value; and wherein the embedding step embeds the data value in the
audio signal by combining the received audio signal with the
generated first and second echoes.
[0007] Each of the echoes may be generated by repeating at least a
part of said audio signal. The first echo may be generated by
repeating a first portion of the audio signal and the second echo
may be generated by repeating a second portion of said audio
signal. Alternatively, the first and second echoes may be generated
by repeating substantially the same first portion of the audio
signal. Where the audio signal is received as a stream of samples,
the or each echo may be generated by passing the stream of audio
samples through a delay line.
[0008] In one embodiment, third and fourth echoes may be generated,
the third echo having the same polarity as said second echo and the
fourth echo having the same polarity as said first echo. In this
case, the third and fourth echoes may be generated by repeating
substantially the same second portion of the audio signal which is
different to the first portion repeated by the first and second
echoes. The second portion of the audio signal may be adjacent to
the first portion. The generating step may generate the third and
fourth echoes by inverting the polarity of a gain factor applied to
the echoes before being combined with the audio signal.
[0009] The first echo may be combined with the audio signal at a
first delay relative to the first portion of the audio signal; the
second echo may be combined with the audio signal at a second delay
relative to said first echo; the third echo may be combined with
said audio signal at a third delay relative to said second portion
of the audio signal; and the fourth echo may be combined with the
audio signal at a fourth delay relative to the third echo. The
first delay may be equal to said third delay and/or the second
delay may be equal to said fourth delay. In one embodiment, the
delays and the amplitudes of the echoes are independent of the data
value.
[0010] Preferably the first and third delays are between 0.5 ms and
100 ms and more preferably between 5 ms and 40 ms; and the second
and fourth echoes are delayed (relative to the first and third echoes
respectively) by between 0.125 ms and 3 ms and more preferably
between 0.25 ms and 1 ms, as these delays are similar to those of
natural echoes and so are less noticeable to users. In one
embodiment, the or each echo has an amplitude that is less than the
amplitude of said audio signal. Preferably the or each echo is
faded in and out to reduce obtrusiveness of the echoes to a
listener.
[0011] The first and second portions of the audio signal should be
long enough for the receiver to be able to detect the presence of
the echoes but not so long as to overly reduce the data rate that
can be communicated. The inventors have found that echoes having
durations of between 20 ms and 500 ms provide a reasonable data
rate whilst keeping data transmission errors to a minimum when
transmission occurs over an acoustic link. If transmission is over
an electrical link, then shorter echoes may be used.
[0012] The echoes may be combined with the audio signal by adding
and/or subtracting the echoes to/from the audio signal. The
polarity of each echo can therefore be controlled by controlling
the way in which each echo is combined with the audio signal.
[0013] This aspect of the invention also provides a computer
implementable instructions product comprising computer
implementable instructions for causing a programmable computer
device to carry out the method described above.
[0014] This aspect also provides an apparatus for embedding a data
value in an audio signal, the apparatus comprising: an echo
generator operable to generate an echo of at least a portion of the
received audio signal; and a combiner operable to combine the
received audio signal with the generated first and second echoes to
embed the data value in the audio signal; wherein the echo
generator and/or the combiner are arranged so that the data value
is embedded in the audio by varying the polarity of the echo that
is combined with the audio signal in dependence upon the data
value.
[0015] According to another aspect, the present invention provides
a method of recovering a data value embedded in an audio signal,
the method comprising: receiving an input signal having the audio
signal and an echo of at least part of the audio signal whose
polarity depends upon said data value; processing the received
input signal to determine the polarity of the echo; and recovering
the data value from the determined polarity.
[0016] The input signal may comprise a first echo of at least a
portion of the audio signal and a second echo of at least a portion
of the audio signal, the first and second echoes having first and
second polarities respectively, which polarities vary in dependence
upon the data value; and wherein the processing step processes the
input signal to combine the first and second echoes and to
determine the polarity of the combined echoes and wherein the
recovering step recovers the data value from the determined
polarity of the combined echoes.
[0017] In one embodiment the processing step processes the input
signal to determine a first autocorrelation measure that depends
upon the first echo and a second autocorrelation measure that
depends upon the second echo and combines the echoes by
differencing the first and second autocorrelation measures and
determines the polarity of the combined echoes by determining the
polarity of the result of the differencing step.
[0018] The first echo may be of a first portion of the audio signal
and the second echo may be of a second portion of the audio signal.
Alternatively the first and second echoes may be repeats of
substantially the same portion of the audio signal.
[0019] In one embodiment, the input signal comprises first, second,
third and fourth echoes, the first and fourth echoes having the
same polarity and the second and third echoes having the same
polarity which is opposite to the polarity of the first and fourth
echoes, wherein the processing step processes the input signal to
combine the first to fourth echoes and to determine the polarity of
the combined echoes and wherein the recovering step recovers the
data value from the determined polarity of the combined echoes.
[0020] In this embodiment, the processing step may process the
input signal to determine a first autocorrelation measure that
depends upon the first echo, a second autocorrelation measure that
depends upon the second echo, a third autocorrelation measure that
depends upon the third echo and a fourth autocorrelation measure
that depends upon the fourth echo and combines the echoes by
differencing the autocorrelation measures and determines the
polarity of the combined echoes by determining the polarity of a
result of the differencing step.
[0021] The differencing step may perform a first difference of the
first and third autocorrelation measures, a second difference of
the second and fourth autocorrelation measures, a third difference
of the result of said first difference and the result of the second
difference and wherein the polarity of the combined echoes may be
determined from the polarity of a result of the third
difference.
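The cancellation achieved by this differencing can be illustrated with a short worked example. The symbols below are illustrative assumptions and do not appear in the specification: A_1 to A_4 denote the four autocorrelation measures, s denotes the contribution of one artificial echo (its sign follows the data value), and e_a and e_b denote the contributions of natural echoes and periodicities at the two lags. The example assumes that the first delay equals the third delay and that the second delay equals the fourth delay, so that e_a is common to A_1 and A_3 and e_b is common to A_2 and A_4.

```latex
\begin{aligned}
A_1 &= +s + e_a, & A_2 &= -s + e_b, \\
A_3 &= -s + e_a, & A_4 &= +s + e_b, \\
D_1 &= A_1 - A_3 = 2s, & D_2 &= A_2 - A_4 = -2s, \\
D_3 &= D_1 - D_2 = 4s .
\end{aligned}
```

Under these assumptions the natural contributions cancel in the first and second differences, and the sign of D_3 follows the sign of s, and hence the data value.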
[0022] The first and second echoes may be repeats of substantially
the same first portion of the audio signal and the third and fourth
echoes may be repeats of substantially the same second portion of
the audio signal. Alternatively, the first and third echoes may be
repeats of substantially the same first portion of the audio signal
and the second and fourth echoes may be repeats of substantially
the same second portion of the audio signal. In one embodiment, the
or each echo is faded in and out to reduce obtrusiveness of the
echoes to a listener. In this case, the polarity of the echo may be
determined when the amplitude of the echo is at or near a
maximum.
[0023] The first echo may be delayed relative to said first portion
of the audio signal by a first delay; the second echo may be
delayed relative to the first echo by a second delay; the third
echo may be delayed relative to the second portion of the audio
signal by a third delay; and the fourth echo may be delayed
relative to the third echo by a fourth delay. The first delay may
be equal to the third delay and/or the second delay may be equal to
said fourth delay.
[0024] According to this aspect, a computer implementable
instructions product is also provided comprising computer
implementable instructions for causing a programmable computer
device to carry out the above method.
[0025] This aspect also provides an apparatus for recovering a data
value embedded in an audio signal, the apparatus comprising: an
input for receiving an input signal having the audio signal and an
echo of at least part of the audio signal whose polarity depends
upon said data value; a processor operable to process the input
signal to determine the polarity of the echo; and a data
regenerator operable to recover the data value from the determined
polarity.
[0026] These and other aspects of the invention will become
apparent to those skilled in the art from the following detailed
description of exemplary embodiments, which are described with
reference to the following drawings in which:
[0027] FIG. 1 is a block diagram illustrating the main components
of a transmitter and receiver used in an exemplary embodiment;
[0028] FIG. 2a is an impulse plot illustrating the echoes that are
added to an audio signal to encode a binary "one";
[0029] FIG. 2b is an impulse plot illustrating the echoes that are
added to an audio signal to encode a binary "zero";
[0030] FIG. 3a is an impulse plot illustrating the presence of
artificial echoes for a binary "one" after Manchester encoding and
illustrating natural echoes;
[0031] FIG. 3b is an impulse plot illustrating the presence of
artificial echoes for a binary "zero" after Manchester encoding and
illustrating natural echoes;
[0032] FIG. 4 is a block diagram illustrating in more detail the
encoding performed in the transmitter shown in FIG. 1;
[0033] FIG. 5 is a block diagram illustrating the main components
of an echo generation and shaping module forming part of the
transmitter shown in FIG. 1;
[0034] FIG. 6a illustrates a shaping and modulation function that
is applied to the echoes prior to being combined with the audio
signal when a binary "one" is to be transmitted;
[0035] FIG. 6b illustrates a shaping and modulation function that
is applied to the echoes prior to being combined with the audio
signal when a binary "zero" is to be transmitted;
[0036] FIG. 6c illustrates the way in which the shaping and
modulation function varies when two successive binary "ones" are to
be transmitted;
[0037] FIG. 6d illustrates the shaping and modulation function that
is applied when a binary "zero" is transmitted after a binary
"one";
[0038] FIG. 7 illustrates the processing performed in the receiver
shown in FIG. 1 for recovering the hidden data from the received
audio signal;
[0039] FIG. 8a is an autocorrelation plot for a typical audio
signal without artificial echoes;
[0040] FIG. 8b is an autocorrelation plot for the audio signal with
artificial echoes during a first half of a bit symbol;
[0041] FIG. 8c is an autocorrelation plot for the audio signal with
artificial echoes during the second half of the bit symbol;
[0042] FIG. 8d is a plot obtained by subtracting the
autocorrelation plot shown in FIG. 8c from the autocorrelation plot
shown in FIG. 8b;
[0043] FIG. 9 is a block diagram illustrating an alternative form
of receiver used to receive and recover the hidden data embedded in
the audio signal;
[0044] FIG. 10 is a plot illustrating the way in which an FEC error
count varies during a synchronisation process used to find the
hidden data message within the input signal; and
[0045] FIGS. 11a and 11b illustrate the processing performed
respectively by an FEC encoder and an FEC decoder in one
embodiment.
OVERVIEW
[0046] FIG. 1 is a block diagram illustrating a transmitter and
receiver system according to one embodiment in which a transmitter
1 transmits data hidden within an acoustic signal 3 to a remote
receiver 5. The transmitter 1 may form part of a television or
radio distribution network and the receiver may be a portable
device such as a cellular telephone handset that is capable of
detecting the acoustic signal 3 output by the transmitter 1.
The Transmitter
[0047] As shown in FIG. 1, the transmitter 1 includes a forward
error correction (FEC) encoder module 7, which receives and
encodes the input data to be transmitted to the remote receiver 5.
The encoded message data output from the FEC encoder module 7 is
then passed to an echo generation and shaping module 9, which also
receives an audio signal in which the encoded message data is to be
hidden. The echo generation and shaping module 9 then hides the
message data into the audio by generating echoes of the audio which
depend upon the message data to be transmitted. The generated
echoes are then combined with the original audio signal in a
combiner module 11 and the resulting modified audio signal is then
passed to a gain control module 13 for appropriate gain control.
The audio signal is then converted from a digital signal to an
analogue signal by the digital to analogue converter 15 and it is
then amplified by a driver module 17 for driving a loudspeaker 19
which generates the acoustic signal 3 having the data hidden
therein.
[0048] As will be described in more detail below, in this
embodiment, the polarity of the echoes (as opposed to their lag
and/or amplitude) is varied in order to encode the data to be
transmitted. The inventors have found that this polarity modulation
can be more robust in the presence of natural echoes and
periodicities in the audio signal. This is particularly the case
when each data value is represented by two echoes of the same
magnitude but having different lags and opposite polarities. The
polarities of the echoes representing each message bit are reversed
to distinguish between a binary zero and a binary one. This is
illustrated by the impulse plots illustrated in FIG. 2. In
particular, FIG. 2a is an impulse plot illustrating the component
signals that are present when a binary one is to be transmitted and
FIG. 2b is an impulse plot illustrating the component signals
present when a binary zero is to be transmitted. As shown in FIG.
2a, the component signals include an initial impulse 21
representing the original audio signal followed by two lower
amplitude impulses 23-1 and 23-2 representing the two echoes of the
original signal component 21 which are added to the audio signal.
As can be seen by comparing FIGS. 2a and 2b, when a binary one is
to be transmitted, a positive echo 23-1 is transmitted first
followed by a negative echo 23-2; and when transmitting a binary
zero a negative echo 23-1 is transmitted first followed by a
positive echo 23-2, although this could be reversed if desired.
[0049] As shown in FIG. 2, in this embodiment, the first echo is
added with a lag of approximately ten milliseconds and the second
echo is added 0.25 milliseconds after the first echo. This is the
same regardless of whether a binary one or a binary zero is to be
transmitted. Additionally, as represented in FIG. 2, in this
embodiment, the echoes that are added have lower amplitudes
compared with the amplitude of the original audio signal. In
particular, in this embodiment, the amplitude of the echoes is
approximately one third that of the original audio signal.
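The impulse plots of FIG. 2 can be sketched in code as follows. This is a minimal illustration only, assuming an 8 kHz sample rate, delays rounded to whole samples, and a fixed echo gain of one third with no fading; the function names `echo_impulse_response` and `apply_taps` are hypothetical and are not taken from the specification.

```python
def echo_impulse_response(bit, sample_rate_hz=8000, lag_ms=10.0,
                          gap_ms=0.25, echo_gain=1.0 / 3.0):
    """Return the taps of FIG. 2a (bit == 1) or FIG. 2b (bit == 0)."""
    first = round(sample_rate_hz * lag_ms / 1000.0)            # first echo lag in samples
    second = first + round(sample_rate_hz * gap_ms / 1000.0)   # second echo lag in samples
    sign = 1.0 if bit == 1 else -1.0   # binary one: positive echo first
    taps = [0.0] * (second + 1)
    taps[0] = 1.0                      # original audio (impulse 21)
    taps[first] = sign * echo_gain     # echo 23-1
    taps[second] = -sign * echo_gain   # echo 23-2, opposite polarity
    return taps


def apply_taps(audio, taps):
    """Convolve the audio with the taps, truncated to len(audio)."""
    out = []
    for n in range(len(audio)):
        acc = 0.0
        for k, h in enumerate(taps):
            if h != 0.0 and n - k >= 0:
                acc += h * audio[n - k]
        out.append(acc)
    return out
```

At 8 kHz the two echoes fall 80 and 82 samples after the original signal component, and swapping the data value only swaps the signs of those two taps.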
The Receiver
[0050] FIG. 1 also illustrates the main components of the receiver
5. As shown, the receiver includes a microphone 31 for detecting
the acoustic signal 3 and for converting it into a corresponding
electrical signal which is then filtered and amplified by filter
and amplification circuitry 33. The output from the filter and
amplification circuitry 33 is then digitised by an analogue to
digital converter 35 and the digital samples are then passed to an
echo detector 37. The echo detector 37 then processes the digital
samples to identify the polarities of the echoes in the received
signal. This information is then passed through a data recovery
module 39 which processes the echo information to recover the
encoded message data. This message data is then decoded by a
decoder 41 to recover the original data that was input to the FEC
encoder module 7 of the transmitter 1.
Manchester Encoding
[0051] As will be explained in more detail below, the echo detector
37 detects the echoes from the received signal by calculating the
auto-correlation of the received signal at specified delays.
However, natural echoes (e.g. room echoes) will also contribute to
the autocorrelation values thus calculated as will periodicities of
the original audio track. In order to distinguish the artificial
echoes representing the encoded data from these natural echoes, the
message data is also Manchester encoded so that a message data
value of "1" is transmitted as a "1", followed by a "0" (or vice
versa), whilst a message data value of "0" is transmitted as a "0"
followed by a "1". In this embodiment, this Manchester encoding is
performed by the echo generation and shaping module 9. Therefore,
when a message bit value of "0" is to be transmitted, for the first
half of the symbol, the first echo 23-1 is of positive polarity and
the second echo 23-2 is of negative polarity, whilst for the second
half of the symbol, the first echo 23-1 is of negative polarity and
the second echo 23-2 is of positive polarity. To transmit a message
bit value of "1", all polarities are reversed, as summarised in the
table given below.
TABLE-US-00001
            first half of symbol          second half of symbol
data value  polarity of    polarity of    polarity of    polarity of
            first echo     second echo    first echo     second echo
0           positive       negative       negative       positive
1           negative       positive       positive       negative
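The polarity schedule in the table can be expressed as a small lookup. The following is a sketch only: the names `POLARITY_TABLE` and `half_symbol_polarities` are hypothetical, and the polarities are represented as +1 and -1 for convenience.

```python
# (first echo, second echo) polarities for the first and second half of a
# symbol, copied from the table; +1 is positive and -1 is negative.
POLARITY_TABLE = {
    0: ((+1, -1), (-1, +1)),
    1: ((-1, +1), (+1, -1)),
}


def half_symbol_polarities(bits):
    """Expand message bits into one (first, second) polarity pair per half symbol."""
    schedule = []
    for bit in bits:
        first_half, second_half = POLARITY_TABLE[bit]
        schedule.append(first_half)
        schedule.append(second_half)
    return schedule
```

For example, `half_symbol_polarities([0, 1])` yields four half-symbol entries, and within every entry the two echoes have opposite polarities.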
[0052] The reason that the Manchester encoding can help to
distinguish the artificial echoes from the natural echoes is that
the natural echoes will be stable over the two half symbol periods.
Therefore, by subtracting the autocorrelations in the second half
of the symbol from autocorrelations in the first half of the symbol
(or vice versa), the effect of the natural echoes and periodicities
will cancel, whilst the autocorrelation peaks caused by the
artificial echoes will add constructively. Similarly, the reason
for using two echoes in each half symbol period is to distinguish
the artificial echoes from periodicities in the original track.
Typically, the autocorrelation of the original track will not
change significantly between these two lags (i.e. between 10 ms and
10.25 ms). Therefore, by differencing the autocorrelations at the
two lags, the effect of the periodicities is reduced and the
autocorrelation peaks caused by the two echoes add
constructively.
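The two levels of differencing described above can be sketched as follows. This is an illustrative model rather than the receiver of the embodiment: it assumes an 8 kHz sample rate (lags of 80 and 82 samples), a single symbol with known timing, abrupt polarity changes with no fading, and an unnormalised autocorrelation. The function names are hypothetical.

```python
import random


def autocorr(signal, lag, start, end):
    """Unnormalised autocorrelation at one lag over samples start+lag .. end-1."""
    return sum(signal[n] * signal[n - lag] for n in range(start + lag, end))


def embed_symbol(audio, bit, half_len, lag1, lag2, gain=1.0 / 3.0):
    """Add two opposite-polarity echoes whose signs flip in the second half symbol."""
    out = []
    for n, sample in enumerate(audio):
        sign = 1.0 if n < half_len else -1.0   # table row for data value 0
        if bit == 1:
            sign = -sign                       # all polarities reversed for 1
        e1 = audio[n - lag1] if n >= lag1 else 0.0
        e2 = audio[n - lag2] if n >= lag2 else 0.0
        out.append(sample + sign * gain * (e1 - e2))
    return out


def detect_bit(received, half_len, lag1, lag2):
    """Difference across the two lags, then across the two half symbols."""
    first = (autocorr(received, lag1, 0, half_len)
             - autocorr(received, lag2, 0, half_len))
    second = (autocorr(received, lag1, half_len, 2 * half_len)
              - autocorr(received, lag2, half_len, 2 * half_len))
    statistic = first - second   # artificial echo terms add, stable terms cancel
    return 0 if statistic > 0 else 1
```

A possible usage, with seeded noise standing in for the audio track:

```python
random.seed(0)
audio = [random.gauss(0.0, 1.0) for _ in range(1600)]
received = embed_symbol(audio, 1, 800, 80, 82)
bit = detect_bit(received, 800, 80, 82)
```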
[0053] FIGS. 3a and 3b are impulse plots showing the two half
symbols and the artificial echoes 23 that are added within each
half symbol period to represent a binary "1" and a binary "0"
respectively. FIGS. 3a and 3b also illustrate natural echoes 25-1
and 25-2 which do not change from one half period to the next.
Therefore, by subtracting the echoes in one half of the symbol
period from the corresponding echoes (i.e. those with the same lag
or delay) in the other half of the symbol period, the effect of the
natural echoes and periodicities will cancel, whilst the artificial
echoes will add constructively, thereby making it easier to detect
the hidden data.
[0054] The above description provides an overview of the encoding
and decoding techniques used in the present embodiment. A more
detailed description will now be given of the main components of
the transmitter 1 and receiver 5 to carry out the encoding and
decoding processes described above.
FEC Encoder
[0055] FIG. 4 is a block diagram illustrating the main components
of the FEC encoder module 7 used in this embodiment. As shown, the
first encoding module is a Reed-Solomon encoder module 51 which
uses a shortened (13, 6) block code to represent the input data.
The data output from the Reed-Solomon encoder 51 is then passed to
a convolutional encoder 53 which performs convolutional encoding on
the data. The data bits output from the convolutional encoder 53
are then interleaved with each other by a data interleaving module
55 to protect against errors occurring in bursts. Finally, a
synchronisation data adder module 57 adds a sequence of
synchronisation bits that will help the receiver 5 lock on to the
encoded data within the received acoustic signal 3. The output from
the synchronisation data adder module 57 represents the message
data which is then passed to the echo generation and shaping module
9 shown in FIG. 1.
Echo Generation and Shaping
[0056] FIG. 5 is a block diagram illustrating the main components
of the echo generation and shaping module 9 and the combiner module
11 shown in FIG. 1. The input audio signal is represented by the
sequence of audio samples a(n) which are applied to a 10
millisecond delay unit 61 and to the adder 63 (corresponding to the
combiner 11 shown in FIG. 1). The 10 millisecond delay unit 61
delays the input sample a(n) by 10 milliseconds which it then
outputs to a 0.25 millisecond delay unit 65 and to a subtractor 67.
The 0.25 millisecond delay unit 65 delays the audio sample output
from the 10 millisecond delay unit 61 by a further 0.25
milliseconds which it then outputs to the subtractor 67. The
subtractor 67 subtracts the 10.25 millisecond delayed sample from
the 10 millisecond delayed sample outputting the result to a
multiplier 69. The delay units and the subtractor operate each time
a new audio sample a(n) arrives. In this embodiment, the audio
sample frequency is one of 8 kHz, 32 kHz, 44.1 kHz or 48 kHz.
[0057] Therefore, as those skilled in the art will appreciate, the
10 millisecond delay unit 61, the 0.25 millisecond delay unit 65
and the subtractor 67 will generate the two echoes 23-1 and 23-2
illustrated in FIG. 2. At this stage, however, the echoes that have
been generated do not depend on the data to be transmitted. As will
be explained below, this dependency is achieved by multiplying the
echoes in the multiplier 69 with a modulation function g(n) that is
output by a lookup table 71 which is addressed by lookup table
address logic 73 in response to the current message data value. In
particular, the lookup table output g(n) changes the polarity of
the echoes in dependence upon the message data so that the echoes
with the modulated polarities can then be added back to the
original audio signal by the adder 63 to generate the
echo-modulated audio output signal.
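The signal path of FIG. 5 described above can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: the function name `add_echoes`, the rounding of each delay to a whole number of samples, and the treatment of samples before the start of the signal as zero are all assumptions.

```python
def add_echoes(audio, sample_rate, gain, d1_ms=10.0, d2_ms=10.25):
    """Add a positive echo at d1_ms and a negative echo at d2_ms.

    `gain` is a per-sample sequence standing in for the lookup table output g(n).
    """
    d1 = round(sample_rate * d1_ms / 1000.0)   # 80 samples at 8 kHz
    d2 = round(sample_rate * d2_ms / 1000.0)   # 82 samples at 8 kHz
    out = []
    for n, sample in enumerate(audio):
        first = audio[n - d1] if n >= d1 else 0.0    # output of delay unit 61
        second = audio[n - d2] if n >= d2 else 0.0   # output of delay unit 65
        echo = first - second                        # subtractor 67
        out.append(sample + gain[n] * echo)          # multiplier 69, then adder 63
    return out
```

Applied to a single impulse with a constant gain, this produces the impulse followed by one positive and one negative copy at the two delays, corresponding to the echoes 23-1 and 23-2.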
Lookup Table Output g(n)
[0058] The inventors have found that abrupt changes in the echoes
that are added can make the echoes more obtrusive to users in the
vicinity of the loudspeaker 19. Therefore, the lookup table output
g(n) is gradually increased and decreased so that the echoes are
effectively faded in and out.
[0059] Additionally, in this embodiment, the lookup table output
g(n) also performs the above described Manchester encoding of the
message data. The way in which this is achieved will now be
explained with reference to FIG. 6. In particular, FIG. 6a is a
plot illustrating the way in which the lookup table output g(n)
varies over one symbol period, when the bit value of the message
data is a binary "1". In this embodiment, the symbol period is 100
ms. As shown, during the first half of the symbol period, the
function g(n) increases from zero to a maximum value and then
decreases back to zero at the end of the first half of the symbol
period. During the second half of the symbol period, the function
g(n) is negative and increases in magnitude to a maximum negative
value and then decreases back to zero. As can be seen from FIG. 6a,
in this embodiment, the gradual increasing and decreasing of the
lookup table output g(n) is achieved by using a sinusoidal
function. Therefore, during the first half of the symbol, the
combined echoes output from the subtractor 67 will be multiplied by
a positive value and so their polarity will not be changed when
they are multiplied by g(n) in the multiplier 69. On the other
hand, during the second half of the symbol period the lookup table
output g(n) is negative and therefore, the polarities of the echoes
output from the subtractor 67 will be reversed when the echoes are
multiplied by g(n) in the multiplier 69.
[0060] As mentioned above, the artificial echoes 23 that are
generated and added to the audio signal have an amplitude which is
approximately a third that of the audio signal. In this embodiment,
the amplitude of the echoes is controlled by the output of the
lookup table g(n). As shown in FIG. 6a, the peak amplitude of the
lookup table output g(n) is a third, which means that the maximum
amplitude of the echoes which are added to the audio signal will be
a third of the amplitude of the original audio signal.
[0061] As shown in FIG. 6b, when the message data is a binary value
"0" the lookup table output g(n) is inverted compared with when the
message data has a binary value of "1". Therefore, during the first
half symbol period, the polarity of the echoes output from the
subtractor 67 will be reversed when they are multiplied by g(n) in
the multiplier 69 and during the second half of the symbol period
the polarities of the echoes output by the subtractor 67 will not
be inverted when they are multiplied by g(n) in the multiplier
69.
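A minimal sketch of the single-symbol lookup table output described above is given below. It assumes that the sinusoidal shape of FIGS. 6a and 6b is one full sine cycle per symbol period with a peak of one third; the exact curve is not given in the text, and the name `symbol_gain` is hypothetical.

```python
import math


def symbol_gain(bit, samples_per_symbol, peak=1.0 / 3.0):
    """Gain g(n) over one symbol period.

    A binary "1" is positive in the first half and negative in the second;
    a binary "0" is the inverse (Manchester encoding).
    """
    sign = 1.0 if bit == 1 else -1.0
    return [
        sign * peak * math.sin(2.0 * math.pi * n / samples_per_symbol)
        for n in range(samples_per_symbol)
    ]
```

For a 100 ms symbol sampled at 8 kHz, `samples_per_symbol` would be 800, with the positive peak at sample 200 and the negative peak at sample 600 for a binary "1".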
[0062] FIG. 6c illustrates the lookup table output g(n) over two
symbol periods when the message data to be transmitted is a binary
"1" followed by another binary "1". As shown in FIG. 6c, in this
case, the lookup table output g(n) is a simple repeat of the output
illustrated in FIG. 6a. Similarly, if successive values of the
message data are binary "0's" then the lookup table output g(n)
over the two symbol periods will be the inverse of that shown in
FIG. 6c.
[0063] However, if the message data transitions from a binary "1"
to a binary "0", then instead of using a lookup table output
function obtained by concatenating the functions shown in FIG. 6a
and FIG. 6b, the function shown in FIG. 6d is used instead. As can
be seen in FIG. 6d, when the lookup table output g(n) reaches its
peak negative value in the first symbol period, it remains at that
value until the peak would have occurred in the second symbol
period before decreasing in magnitude back to zero. Similarly, when
successive bits of the message data transition from a binary "0" to
a binary "1", the lookup table output g(n) over the two symbol
periods will be the inverse of that shown in FIG. 6d. The inventors
have found that not returning to the zero level in this way reduces
the obtrusiveness of the echo modulation scheme that is used. This
is because the human ear is more sensitive to changing echoes than
to constant echoes.
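The hold behaviour at bit transitions can be sketched as follows. This is an illustration only: it assumes a sine half-wave envelope per half symbol period, an even number of samples per half symbol, and the hypothetical name `message_gain`. Where two adjacent half symbols have the same polarity, the gain stays at its peak from the first peak to the second instead of returning to zero.

```python
import math


def message_gain(bits, samples_per_symbol, peak=1.0 / 3.0):
    """Gain g(n) for a bit sequence, holding the peak across like-polarity halves."""
    half = samples_per_symbol // 2
    quarter = half // 2
    # Manchester encoding: one polarity per half symbol period.
    signs = []
    for bit in bits:
        first = 1.0 if bit == 1 else -1.0
        signs += [first, -first]
    gain = []
    for h, sign in enumerate(signs):
        hold_in = h > 0 and signs[h - 1] == sign                # same polarity before
        hold_out = h + 1 < len(signs) and signs[h + 1] == sign  # same polarity after
        for k in range(half):
            if (k < quarter and hold_in) or (k >= quarter and hold_out):
                gain.append(sign * peak)                        # stay at the peak
            else:
                gain.append(sign * peak * math.sin(math.pi * k / half))
    return gain
```

For the sequence "1" then "0", the gain remains at the negative peak between the middle of the second half of the first symbol and the middle of the first half of the second symbol, as described for FIG. 6d. For repeated bits no hold occurs and the single-symbol shape simply repeats.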
[0064] As those skilled in the art will appreciate, the lookup
table address logic 73 is responsible for analysing the successive
bits of the message data and then looking up the appropriate part
of the lookup table 71 so that the appropriate output function g(n)
is applied to the multiplier 69.
Echo Detector
[0065] FIG. 7 is a part schematic and part block diagram
illustrating the processing performed by the echo detector 37. In
particular, FIG. 7 illustrates 100 milliseconds of an input signal
61 at the input of the echo detector 37. As those skilled in the
art will appreciate, the input signal 61 is illustrated
schematically as a continuous signal for ease of understanding but
it will be a sampled and digitised waveform.
[0066] As illustrated by window i and window j, the echo detector
37 includes two sliding windows 63-1 and 63-2 which extract
adjacent segments of the input audio signal 61-1 and 61-2, each of
length 50 milliseconds. Therefore, the two windows 63 extract
portions of the input acoustic signal 61 which correspond to the
above-described half symbol periods. As shown in FIG. 7, the
extracted portion 61-1 of the input acoustic signal is input to a
first autocorrelation unit 65-1 and the extracted portion 61-2 of
the input audio signal is input to a second autocorrelation unit
65-2. Both autocorrelation units 65 operate to determine the
autocorrelation of the corresponding portion 61-1 or 61-2 of the
input acoustic signal at 10 millisecond and 10.25 millisecond lags.
The determined autocorrelation values at lag 10.25 milliseconds from
autocorrelation units 65-1 and 65-2 are then input to a subtractor
67, that subtracts the autocorrelation value obtained from window j
from the autocorrelation value obtained from window i (or vice
versa). The result of this subtraction is then supplied to another
subtractor 69. Similarly, the autocorrelation value at lag 10
milliseconds from window i and the autocorrelation value at lag 10
milliseconds from window j are output from the autocorrelation
units 65 to the subtractor 71, that subtracts the autocorrelation
value obtained from window j from the autocorrelation value
obtained from window i (or vice versa) and feeds the result to the
subtractor 69. The subtractor 69 then subtracts the output from
subtractor 67 from the output from subtractor 71 (or vice versa).
Therefore, the output from the subtractor 69 is represented by the
following equation:
$$(A^{i}(10)-A^{j}(10))-(A^{i}(10.25)-A^{j}(10.25))$$
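The computation performed by the autocorrelation units and the three subtractors can be sketched as below. The function names are hypothetical, the lags are expressed in samples (for example 80 and 82 at an 8 kHz sample rate), and the autocorrelation is left unnormalised; these are assumptions rather than details taken from the source.

```python
def autocorr(segment, lag):
    """Unnormalised autocorrelation of one window at a single lag (in samples)."""
    return sum(segment[n] * segment[n - lag] for n in range(lag, len(segment)))


def echo_statistic(window_i, window_j, lag_a, lag_b):
    """Return (A_i(a) - A_j(a)) - (A_i(b) - A_j(b)) for the two sample lags."""
    at_a = autocorr(window_i, lag_a) - autocorr(window_j, lag_a)   # subtractor 71
    at_b = autocorr(window_i, lag_b) - autocorr(window_j, lag_b)   # subtractor 67
    return at_a - at_b                                             # subtractor 69
```

Only the two lags of interest are evaluated, so the cost per window is two sums of products rather than a full autocorrelation.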
[0067] As mentioned above, subtracting the autocorrelation values
of one half symbol period from the corresponding autocorrelation
values of the other half symbol period can reduce the effect of
natural echoes in the input acoustic signal 61. This is because
natural echoes are unlikely to change from one half symbol period
to the next and so their effect will be constant in the
autocorrelations that are calculated. Consequently, performing this
subtraction will remove this common effect. Likewise, subtracting
the autocorrelation values obtained from each half symbol period
will reduce the effect of periodicities in the original audio
signal. This is because in the 0.25 ms delay between the first echo
and the second echo in the half symbol period, the effect of the
periodicities on the autocorrelations will be approximately
constant and so this subtraction will remove this common effect.
This will now be described in more detail with reference to FIG.
8.
[0068] FIG. 8a shows an autocorrelation plot 81 obtained from a
typical audio signal without any artificial echoes. As shown, the
autocorrelation plot 81 has a peak at zero lag. However, because of
periodicities in the audio signal and because of natural echoes,
the autocorrelation plot 81 does not tail off towards zero until
about 15 milliseconds after the initial peak and exhibits local
peaks and troughs in between. Peak 82 illustrates such a local peak
that may occur as a result of a natural echo being added to the
audio signal.
[0069] FIG. 8b illustrates an autocorrelation plot 83 for the same
audio signal after a positive echo has been added at a lag of 10
milliseconds and a negative echo has been added at a lag of 12
milliseconds (rather than at 10.25 ms so that the two echoes can be
seen more clearly). As shown in FIG. 8b, as a result of the
artificial echoes, the autocorrelation plot 83 includes a peak 85
at 10 milliseconds and a peak 87 at 12 milliseconds. However, the
peak 85 is masked somewhat by the earlier peak 82 caused by a
natural echo.
[0070] FIG. 8c illustrates the autocorrelation plot 89 for the
audio signal after the echoes have been added in the second half of
the symbol period. As shown, the autocorrelation plot 89 includes a
negative peak 91 at 10 milliseconds and a positive peak 93 at 12
milliseconds.
[0071] Finally, FIG. 8d illustrates the autocorrelation plot that
is obtained by subtracting the autocorrelation plot shown in FIG.
8c from the autocorrelation plot shown in FIG. 8b. As can be seen,
the common peaks in the autocorrelation plots shown in FIGS. 8b and
8c have been removed, whilst the complementary peaks 85 and 91; and
87 and 93 have added together to create the combined peaks 95 and
97 respectively. As those skilled in the art will appreciate, it is
therefore much easier to detect the peaks 95 and 97 because their
values are much greater than the autocorrelation values at other
lags. This effect is further enhanced by subtracting the
autocorrelation value at 12 milliseconds from the autocorrelation
value at 10 milliseconds. This will effectively add the two peaks
95 and 97 together to provide an even larger peak, which can then
be detected by suitable thresholding. The value of the
corresponding data value can then be determined from the polarity
of the combined peak.
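The final decision step can be sketched as a simple threshold test on the combined peak. The mapping of a positive value to a binary "1" is an assumption that depends on the order of the subtractions, which the description leaves open, and the name `decide_bit` is hypothetical.

```python
def decide_bit(statistic, threshold):
    """Map the combined peak to a bit, or None when it is too weak to decide."""
    if abs(statistic) < threshold:
        return None                      # no echo detected with confidence
    return 1 if statistic > 0 else 0     # polarity carries the data value
```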
[0072] As those skilled in the art will appreciate, in this
embodiment, the echo detector 37 does not calculate the
autocorrelation of the input signal over all lags. It only
calculates the autocorrelation values at the lags where the
artificial echoes have been added. The plots shown in FIG. 8 show
the autocorrelation values over lags from 0 to 15 milliseconds.
These plots therefore help to illustrate the effect of natural
echoes and periodicities in the audio signal which can mask the
artificial echoes that are added to encode the data.
Synchronisation
[0073] In this embodiment, the receiver 5 knows the duration of
each half symbol period. This defines the length of the windows
63-1 and 63-2 used in the echo detector 37. However, the echo
detector 37 initially will not be synchronised with the transmitted
data. In other words, the echo detector 37 does not know where each
symbol period begins and ends or where the start of the message is
located. Therefore, in this embodiment, the echo detector 37
performs the above analysis as each new sample is received from the
analogue to digital converter 35. The output from the
subtractor 69 is then analysed by the data recovery module 39 to
determine the most likely symbol boundaries. The data recovery
module then determines the location of the start of the message by
finding the synchronisation bits that were added by the
synchronisation data adder 57. At this point, the data recovery
module 39 can start to recover the whole message from the polarity of
the autocorrelation values output from the subtractor 69.
[0074] Once synchronisation has been achieved, the echo detector 37
will typically determine the autocorrelation measurements in the
middle of each half symbol period, when the echo is expected to be
at its peak amplitude and the data recovery module 39 will
determine the bit value from the polarity of the output from the
subtractor 69. The echo detector 37 may also take measurements just
before and just after the middle of each half symbol period, to
allow the data recovery module 39 to track the synchronisation.
[0075] The message data recovered by the data recovery module 39 is
then input to the FEC decoding module 41 where the message data is
decoded (using the inverse processing of the FEC encoder 7) to
obtain the original input data that was input to the encoder 7 of
the transmitter 1.
Modifications and Alternatives
[0076] In the above embodiments, the data was hidden within an
audio signal by employing a number of echoes whose polarity varied
with the data value to be transmitted. These echoes were added to
the original audio signal after appropriate delays. As those
skilled in the art will appreciate, the echoes may be added before
the original audio signal (pre-echoes), before and after the
original audio signal or only after the original audio signal.
[0077] In the above embodiment, synchronisation bits were added to
the data that was transmitted so that the decoder can identify the
boundaries of each symbol period and the start and end of each
message. The use of such synchronisation bits significantly
increases the overall message length that has to be transmitted (in
some cases by as much as 25%). Additionally, as the decoding of
each bit is subject to noise, the matching is not perfect which can
reduce the chances of a successful synchronisation. The inventors
have realised, however, that the synchronisation bits are not
required. In particular, the inventors have realised that the FEC
decoding module 41 will have higher error rates when the echo
detector 37 is not properly synchronised with the incoming data
compared with its error rate when the echo detector is synchronised
with the incoming data. Therefore, in the embodiment illustrated in
FIG. 9, the error output generated by the FEC decoding module 41 is
used to control the synchronisation of the receiver to the incoming
data.
[0078] More specifically, in this embodiment, the echo detector 37
receives a block of samples corresponding to one or more symbol(s)
and determines the optimum time within that block of samples to
detect the echoes within the symbols. Multiple symbols may be
required when Manchester encoding is used as a Manchester encoded
"one" looks the same as a Manchester encoded "zero" with a time
shift. Therefore, it may be necessary to consider a number of
symbols to allow the symbol boundaries to be identified. The actual
determination of the optimum time within the block of samples to
detect the echoes may be determined by passing the block of samples
through a matched filter (loaded with the expected signal pattern
for one symbol period) and the time within the symbol when the
absolute output (averaged over a number of successive symbols) is
at a maximum is deemed to be the best time to sample the symbols.
For example, if there are N samples per symbol, and the block of
samples has M symbols, then the following values are
calculated:
average(0)=1/M*(x(0)+x(N)+x(2N)+ . . . )
average(1)=1/M*(x(1)+x(N+1)+x(2N+1)+ . . . )
. . .
average(N-1)=1/M*(x(N-1)+x(2N-1)+x(3N-1)+ . . . )
where x(i) is the absolute output of the matched filter for sample i.
The largest average value thus determined identifies the best time
to detect the echoes within the incoming signal during each
symbol.
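The averaging above can be sketched as follows, assuming the matched filter output is available as a flat list of samples. The function name is hypothetical, and any trailing samples that do not make up a whole symbol are ignored.

```python
def best_sample_offset(matched_output, samples_per_symbol):
    """Return (offset, averages), where offset maximises the mean absolute output."""
    n = samples_per_symbol
    m = len(matched_output) // n          # number of whole symbols, M
    averages = [
        sum(abs(matched_output[k + s * n]) for s in range(m)) / m
        for k in range(n)
    ]
    best = max(range(n), key=lambda k: averages[k])
    return best, averages
```

For example, with N = 4 and three symbols of output `[0, 1, -3, 0, 0, 2, -5, 1, 0, 0, 4, 0]`, the averages are 0, 1, 4 and 1/3, so offset 2 is selected.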
[0079] The echo detector 37 then uses the determined optimum time
to detect echoes in that symbol and in the previous N-1 symbols of
the input signal (where N is the number of symbols in the
transmitted message). The data recovery module 39 then determines,
from the detected echoes, bit value(s) for each symbol and outputs
the string of bits corresponding to the possible message to the FEC
decoding module 41. The FEC decoding module 41 then performs the
inverse processing of the FEC encoder 7 to regenerate a candidate
input data codeword, which is stored in the buffer 93. The FEC
decoding module 41 also outputs an error count indicating how many
errors are identified in the candidate codeword, which it passes to
a controller 91. In response, the controller 91 compares the error
count with a threshold value and if it is greater than the
threshold, then the controller 91 flushes the candidate codeword
from the buffer 93. The above process is then repeated for the next
received symbol in the input signal, until the controller 91
determines that the error count is below the threshold. When it is,
the controller 91 instructs the FEC decoding module 41 to accept
the candidate codeword, which it then outputs for further use in
the receiver 5. In effect, therefore, the echo detector 37, the
data recovery module 39 and the FEC decoding module 41 all operate
on a window of the input signal corresponding to the length of the
transmitted message, which window is slid over the input signal
until a point is found where the FEC error count is below a defined
threshold--indicating the identification of the full message within
the input signal.
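The sliding-window search described above can be sketched as below. The `decode` callable is a hypothetical stand-in for the FEC decoding module 41 and is assumed to return a candidate codeword together with an error count; the names and the interface are illustrative only.

```python
def find_message(bits, message_length, decode, max_errors):
    """Slide a message-length window over `bits` until the error count is low.

    Returns (start_index, codeword) for the first window whose error count is
    below `max_errors`, or None if no window qualifies.
    """
    for end in range(message_length, len(bits) + 1):
        window = bits[end - message_length:end]
        codeword, errors = decode(window)      # hypothetical decoder interface
        if errors < max_errors:
            return end - message_length, codeword   # accept the candidate
        # otherwise the candidate is discarded and the window moves on one bit
    return None
```

In this sketch the window advances by one recovered bit at a time, mirroring the repetition of the process for each newly received symbol.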
[0080] FIG. 10 is a plot illustrating the way in which the FEC
decoding module's error count 99 is expected to change as the
window 101 is slid over an input signal 103 containing a data
message 105, with the minimum appearing at symbol S_N, when the
window 101 is aligned with the data message 105 in the input signal
103. The threshold (Th) level is then set to reduce the possibility
that false minimums in the FEC error output count are considered as
possible codewords, so that (in the ideal situation) only when the
receiver 5 is properly synchronised (aligned) to the message data,
will the FEC decoding module's error count reduce below the
threshold in the manner illustrated in FIG. 10. Ideally, in this
embodiment, the FEC encoding/decoding that is used is designed to
keep the error rate of the FEC decoding module 41 high except when
the window 101 is aligned with the message data 105 in the input
signal 103. The inventors have found that this simple thresholding
technique is sufficient to identify the location of the message
data in the input signal 103. However, if more accurate detection
is required, then further consideration can be made, varying the
possible positions of the start and end of the message and looking
for the positions that give the minimum FEC error count.
[0081] The above technique is useful for finding a single message
in the input signal. Clearly, if a sequence of such data messages
is transmitted, then the synchronisation timing determined for the
first data message may be used to identify the synchronisation
timing for the next data message.
[0082] One problem identified by the inventors with the
synchronisation approach discussed above is that the FEC encoder 7
often uses cyclic codewords (for example when using Reed Solomon
block encoding) which means that a one bit shift in the codeword
can also be a valid codeword. This is problematic because it can
result in false detections of a codeword (a so called false
positive) in the input signal 103. This problem can be overcome by
reordering the bits of the codeword in the FEC encoder 7 in some
deterministic manner (for example in a pseudo random manner), and
using the inverse reordering in the FEC decoder 41. The processing
that may be performed by the FEC encoder 7 and by the FEC decoder
41 in such an embodiment is illustrated in FIGS. 11a and 11b
respectively. As shown, the FEC encoder 7 performs a cyclic
encoding of the data (in this case Reed Solomon encoding 111),
followed by a pseudo random reordering 113 of the data. The
reordered data is then convolutionally encoded 115 and then
interleaved 117 as before. Similarly, the FEC decoding module 41
initially de-interleaves 121 the data and performs convolutional
decoding 123. The FEC decoding module 41 then reverses 123 the
pseudo random data reordering performed by the FEC encoder 7 and
then performs the Reed Solomon decoding 125. As those skilled in
the art will appreciate, by performing this reordering of the data
in this way, if there is a bit shift in the message data output by
the data recovery module 39, then it is far less likely to result
in a valid codeword and so the FEC error rate output is unlikely to
trigger the false identification of a data message.
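A minimal sketch of a deterministic pseudo random reordering and its inverse is given below. The use of a seeded shuffle, the seed value and the function names are assumptions for illustration; the source only requires that the encoder and decoder share the same deterministic ordering.

```python
import random


def make_permutation(length, seed=12345):
    """Build a fixed pseudo random ordering shared by encoder and decoder."""
    order = list(range(length))
    random.Random(seed).shuffle(order)
    return order


def reorder(bits, order):
    """Encoder side: output position p takes the bit from input position order[p]."""
    return [bits[i] for i in order]


def restore(bits, order):
    """Decoder side: undo reorder() using the same ordering."""
    original = [0] * len(order)
    for position, source in enumerate(order):
        original[source] = bits[position]
    return original
```

Because a cyclic shift of the reordered bits no longer corresponds to a cyclic shift of the underlying codeword, a misaligned window is far less likely to decode cleanly.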
[0083] In the above embodiments, each data value was represented by
four echoes--two echoes in each of two half symbol periods. As
those skilled in the art will appreciate, each data value may be
represented by any number of echoes in any number of sub-symbol
periods. For example, instead of having two echoes within each half
symbol period, each data value may be represented by a single echo
in each half symbol period. In this case, the echoes in each half
symbol period would preferably be of opposite polarity so that the
same differencing technique can be used to reduce the effects of
natural echoes. Indeed, the inventors have found that in some cases
using two echoes of opposite polarity in each half symbol period
can result in some frequency components within the original audio
signal adding constructively with the echoes and some frequency
components within the original audio signal adding destructively
with the echoes. If a single artificial echo is added, then such
distortions are less evident making the hidden data less noticeable
to users in the acoustic sound that is heard.
[0084] As those skilled in the art will appreciate, representing
each data value by one or more echoes in different sub-symbol
periods, means that the echoes in each sub-symbol period will be a
repetition of a different portion of the audio signal. If there is
only one symbol period, then each data value will be represented by
echoes of the same (or substantially the same) portion of the audio
signal.
[0085] In the above embodiments, each data value was represented by
a positive and a negative echo in a first half symbol period and by
a positive and a negative echo in the second half symbol period.
The positive and negative echoes in the first half symbol period
allowed the receiver to reduce the effects of periodicities in the
original audio signal which affect the autocorrelation
measurements. The use of complementary echoes in adjacent half
symbol periods allows the receiver to reduce the effect of natural
echoes within the received audio signal, which might otherwise mask
the artificial echoes added to represent the data. As those skilled
in the art will appreciate, in other embodiments, neither or only
one of these techniques may be used.
[0086] In the above embodiment, each data value was represented by
echoes within two adjacent half symbol periods. As those skilled in
the art will appreciate, these two half symbol periods do not have
to be immediately adjacent to each other and a gap may be provided
between the two periods if required.
[0087] In the above embodiment, the echoes in each half symbol
period were of exactly the same portion of the audio signal. As
those skilled in the art will appreciate, this is not essential.
The echoes in each half symbol period may be of slightly different
portions of the audio signal. For example, one echo may miss out
some of the audio samples of the audio signal. Alternatively, the
audio signal may include different channels (for example left and
right channels for a stereo signal) and one echo may be formed from
a repetition of the left channel and the other may be formed from a
repetition of the right channel. With modern multi channel surround
sound audio the repetitions can be of any of these channels.
[0088] In the above embodiment, the echoes generated within the
transmitter were added to the original audio signal. As those
skilled in the art will appreciate, the generated echoes may be
combined with the original audio signal in other ways. For example,
the echoes may be subtracted from the audio signal. Similarly,
instead of inverting the echoes to be added to the audio (by
controlling the polarity of the function g(n)), the same result can
be achieved by changing the way in which the echoes are combined
with the audio signal. For example, one echo may be added to the
original audio signal whilst the next echo may be subtracted from
the audio signal.
[0089] In the above embodiment, the lookup table stored values for
g(n) corresponding to one or two bits of the message data (as
illustrated in FIG. 6). As those skilled in the art will
appreciate, this is not essential. For example, the lookup table
could simply store a function which increased in value and then
decreased in value. Additional circuitry could then be provided to
convert the polarity of this output as appropriate for the two half
symbol periods. In this way, the function stored in the lookup
table would only control the fading in and out of the echo and the
additional circuitry would control the polarity of the echo as
required.
[0090] In the above embodiment, the Manchester encoding was
performed by the echo generation and shaping module. As those
skilled in the art will appreciate, this Manchester encoding, if
performed, may be performed within the FEC encoding module.
[0091] As those skilled in the art will appreciate, the techniques
described above for hiding data within the audio may be applied in
advance of the transmission of the acoustic signal or they may be
applied in real time. Even in the case where the data is to be
embedded within an audio signal in real time, some of the
processing can be done in advance. For example, the FEC encoding
may be performed on the data in advance so that only the echo
generation and echo shaping is performed in real time.
[0092] In the above embodiments, specific examples have been given
of sample rates for the audio signal and symbol rates for the data
that is hidden within the audio signal. As those skilled in the art
will appreciate, these rates are not intended to be limiting and
they may be varied as required. However, in order to keep the
obtrusiveness of the added echoes to a minimum, the data rate of
the encoded data is preferably kept between one and twenty symbols
per second. This corresponds to a symbol period of between 50 ms
and 1 second. In some embodiments, a long symbol period is
beneficial because the added echoes will span across spoken words
within the audio, making it easier to hide the data echoes within
the audio. A larger symbol period also reduces audibility of the
echoes. This is because humans are more sensitive to changing
echoes than they are to static or fixed echoes. Therefore, by
having a longer symbol period, the rate of change of the echoes is
lower making the presence of the echoes less noticeable to a
user.
[0093] In the above embodiment, the data rate of the data added to
the audio signal in the transmitter was constant and was known by
the receiver. This knowledge reduces the complexity of the receiver
circuitry for locking on to the data within the received signal.
However, it is not essential to the invention and more complex
circuitry may be provided in the receiver to allow the receiver to
try different data rates until the actual data rate is determined.
Similarly, the receiver may use other techniques to synchronise
itself with the transmitted data so that it knows where the symbol
boundaries are in advance of receiving the data.
[0094] In the above embodiment, FEC encoding techniques were used
to allow the receiver to be able to correct errors in the received
data. As those skilled in the art will appreciate, such encoding
techniques are not essential to the invention. However, they are
preferred, as they help to correct errors that occur in the
transmission process over the acoustic link.
[0095] In the above embodiments, the peak amplitudes of the echoes
were all the same and were independent of the data value being
transmitted. As those skilled in the art will appreciate, the peak
amplitudes of the echoes may also be varied with data to be
transmitted if desired.
[0096] In the above embodiment, the echoes in each half symbol
period were at the same delays relative to the original audio
signal. As those skilled in the art will appreciate, this is not
essential. There may be some variation in the actual delay values
used within each half symbol period.
[0097] In the above embodiment, the second echo within each half
symbol period was generated by delaying the first echo by a further
delay value. In an alternative embodiment, each echo within each
sub-symbol period may be independently generated from the original
audio signal using an appropriate delay line.
[0098] As those skilled in the art will appreciate, various uses
can be made of the above communication system. For example, the
encoded data may be used as a watermark to protect the original
audio signal. Alternatively, the embedded data may be used to
control the receiver so that it can respond in synchronism with the
audio signal. In particular, the decoder can be programmed to
perform some action a defined time after receiving the codeword.
The time delay may be programmed into the decoder by any means and
may even be defined by data in the received codewords. When used to
perform such synchronisation, shorter symbol periods are preferred
as shorter symbol periods allow for better temporal resolution and
hence more accurate synchronisation. The data may be used for
interactive gaming applications, audience surveying, e-commerce
systems, toys and the like. The reader is referred to the
Applicant's earlier International application WO02/45273 which
describes a number of uses for this type of data hiding system.
[0099] In the above embodiment, the receiver performed
autocorrelation measurements on the input audio signal in order to
identify the locations of the echoes. As those skilled in the art
will appreciate, other techniques can be used to identify the
echoes. Some of these other techniques are described in the
Applicant's earlier PCT application PCT/GB2008/001820 and in U.S.
Pat. No. 5,893,067, the contents of which are incorporated herein
by reference. Typically, although not necessarily, the techniques
involve some form of autocorrelation of the original audio signal
or of parameters obtained from the audio signal (e.g. LPC parameters,
cepstrum parameters etc). As an alternative, a best fit approach
could be used in which an expected audio signal (with different
echo polarities) is fitted to the actual signal until a match is
found and the polarity of the echoes thus determined.
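As a rough illustration of the autocorrelation measurement referred to above, the following Python sketch adds a single echo of fixed polarity to a noise-like signal and then recovers that polarity from the sign of the autocorrelation at the echo lag. The function names, the 40-sample lag, the 0.3 echo gain and the use of random noise in place of real audio are all illustrative assumptions and are not taken from the embodiments.

```python
import random

def add_echo(signal, delay, gain):
    """Return the signal plus a delayed, scaled copy of itself.
    The sign of `gain` is the echo polarity."""
    out = list(signal)
    for n in range(delay, len(signal)):
        out[n] += gain * signal[n - delay]
    return out

def autocorr_at_lag(signal, lag):
    """Unnormalised autocorrelation of the signal at one lag."""
    return sum(signal[n] * signal[n - lag] for n in range(lag, len(signal)))

def detect_polarity(received, lag):
    """Return +1 or -1 according to the sign of the autocorrelation at `lag`."""
    return 1 if autocorr_at_lag(received, lag) > 0 else -1

# Hypothetical test signal: white noise standing in for audio.
rng = random.Random(0)
audio = [rng.gauss(0.0, 1.0) for _ in range(20000)]
positive = add_echo(audio, delay=40, gain=0.3)
negative = add_echo(audio, delay=40, gain=-0.3)

print(detect_polarity(positive, 40), detect_polarity(negative, 40))
```

Real audio is far less well behaved than white noise, which is one reason a practical receiver would combine measurements taken at more than one lag rather than rely on a single one.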
[0100] In the embodiment described above, a single transmitter was
provided together with a receiver. As those skilled in the art will
appreciate, multiple transmitters and/or multiple receivers may be
provided. Further, the components of the transmitter may be
distributed among a number of different entities. For example, the
encoding and data hiding part of the transmitter may be provided
within a head end of a television distribution system or a user's
set top box and the loudspeaker 19 may be a speaker of the user's
television set.
[0101] In the above embodiments, the echoes were directly derived
from the original audio signal. In alternative embodiments, the
echo may not include all frequency components of the audio signal.
For example, one or more of the echoes may be generated from a
portion of the audio signal after it has been filtered to remove
certain frequencies. This may be beneficial where it is found, for
example, that there is additional noise in the low frequency part
of the echoes but not in the higher frequency part. In this case,
the received signals would also be filtered to remove the lower
frequency components (for example frequencies below about 500 Hz)
so that only the higher frequency components (those above the lower
frequency components) of the audio signal and the echoes would be
present in the signals being analysed. Alternatively, in this case,
the received signal may be passed through a filter that simply
reduces the level of the lower frequency components in the received
signal compared with the higher frequency components. This will
have the effect of reducing the relevance of the noisy low
frequency part of the received signal in the subsequent decoding
process. Similarly, if it turns out that the added echoes introduce
a noticeable distortion in the higher frequencies of the composite
audio signal, then the echoes (or the signals from which they are
derived) may be low pass filtered to remove the higher
frequencies.
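The receiver-side filtering described above might be sketched as follows. This is a minimal first-order high-pass filter, assuming a 500 Hz cut-off (the figure given in the text) and a 44.1 kHz sample rate (an assumption); the filter structure and all function names are illustrative only, and a practical receiver would probably use a steeper filter.

```python
import math

def high_pass(samples, cutoff_hz, sample_rate_hz):
    """First-order high-pass filter (discretised RC network)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = rc / (rc + dt)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out

def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def tone(freq_hz, sample_rate_hz, count):
    return [math.sin(2.0 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(count)]

FS = 44100  # assumed sample rate
low = tone(50, FS, 4410)     # component well below the cut-off
high = tone(5000, FS, 4410)  # component well above the cut-off

# Ratio of output level to input level for each component.
low_ratio = rms(high_pass(low, 500, FS)) / rms(low)
high_ratio = rms(high_pass(high, 500, FS)) / rms(high)
print(low_ratio, high_ratio)
```

The low frequency component is strongly attenuated while the high frequency component passes almost unchanged, so the noisy low frequency part contributes little to any subsequent correlation.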
[0102] The division of the audio signal into separate frequency
bands can also be used to carry data on multiple channels. For
example, if the frequency band is divided into a high frequency
part and a low frequency part, then one channel may be provided by
adding echoes to the high frequency part and another channel may be
provided by adding different echoes to the low frequency part. The
use of multiple channels in this way allows frequency or temporal
diversity if the data carried in the two channels is the same; or
allows for an increased data transfer rate if each channel carries
different data. Multiple channels can also be provided where the
audio signal also contains multiple channels (used to drive
multiple speakers). In this case, one or more data channels may be
provided in the audio signal for each audio channel.
[0103] In the above embodiment, data was hidden within an audio
signal by adding echoes to the audio signal. In some situations,
the incoming audio may already contain hidden data in the form of
such echoes. In this case, the encoder could decode the existing
hidden data from the received audio signal and then use the decoded
data to clean the audio signal to remove the artificial echoes
defining this hidden data. The encoder could then add new echoes to
the thus cleaned audio signal to hide the new data in the audio
signal. In this way, the original hidden data will not interfere
with the new hidden data.
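The cleaning step described above can be sketched for the simplest possible case, namely a single echo of known delay and constant gain whose polarity has already been decoded. Under that assumption the echo can be removed by a recursive inverse filter. The function names and parameter values below are hypothetical; the echoes of the embodiments are faded in and out and polarity modulated, so a practical encoder would have to track the time-varying gain.

```python
import random

def add_echo(signal, delay, gain):
    """Return the signal plus a delayed, scaled copy of itself."""
    out = list(signal)
    for n in range(delay, len(signal)):
        out[n] += gain * signal[n - delay]
    return out

def remove_echo(received, delay, gain):
    """Invert add_echo: clean[n] = received[n] - gain * clean[n - delay]."""
    clean = []
    for n, y in enumerate(received):
        prior = clean[n - delay] if n >= delay else 0.0
        clean.append(y - gain * prior)
    return clean

rng = random.Random(1)
original = [rng.gauss(0.0, 1.0) for _ in range(5000)]

# Incoming audio that already carries a (decoded) negative echo.
incoming = add_echo(original, delay=40, gain=-0.3)

# Clean the audio, then hide new data with a fresh echo.
cleaned = remove_echo(incoming, delay=40, gain=-0.3)
re_encoded = add_echo(cleaned, delay=40, gain=0.3)

max_error = max(abs(a - b) for a, b in zip(cleaned, original))
print(max_error)
```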
[0104] In the above embodiment, the echoes were obtained by
delaying digital samples of the audio signal. As those skilled in
the art will appreciate, the echoes may be generated in the
analogue domain, using suitable analogue delay lines and analogue
circuits to perform the echo shaping and polarity modulation.
[0105] In the above embodiments, the audio signal with the embedded
data was transmitted to a receiver over an acoustic link. In an
alternative embodiment, the audio signal may be transmitted to the
receiver over an electrical wire or wireless link. In such an
embodiment, the data rates that are used may be higher, due to
lower noise levels.
[0106] In the above embodiment, one data bit was transmitted within
each symbol period. In an alternative embodiment, multiple bits may
be transmitted within each symbol period. For example, a second pair
of echoes may be added at lags of 20 ms and 20.25 ms within each
half symbol period to encode a second bit; a third pair of echoes
may be added at lags of 30 ms and 30.25 ms within each half symbol
period to encode a third bit etc. Each echo could then be faded in
and out during each half symbol period and polarity modulated in
accordance with the bit value as before. The fading in and out of
the echoes for the different bits may be the same or it may be
different for the different bits. The polarity modulation of the
different echoes will of course depend on the different bit values
to be transmitted in the symbol period. In a preferred embodiment,
the echoes for the different bits within the same half symbol
period are faded in and out at different times of the half symbol
period, so that the different echoes reach their peak amplitudes at
different times within the half symbol period. In this way, when
the echo for one bit is at its peak amplitude (or when all the
echoes for one bit are at their peak amplitudes--if there are
multiple echoes representing each bit in each half symbol period),
the echoes for the other bits will not be at their peaks. Doing
this and sampling the different echoes when they are expected to be
at their peak amplitudes, will reduce the interference between the
echoes for the different bits within the same half symbol period.
It also reduces constructive interference of the echoes that may
render the added echoes more noticeable to a listener. Looking at
this another way, this is the same as having multiple parallel data
messages, each encoded as per the embodiments described above, but
with their respective symbol periods offset in time from each other
so that the echoes for the different messages peak at different
times--thereby reducing interference between the messages if the
echoes are all sampled around the time when they are each at their
maximum amplitudes. This technique will increase the bit rate of
data transmission between the transmitter and receiver. The
additional bits may be of the same message or they may be bits of
different messages.
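The staggering of the fade envelopes described above may be illustrated as follows. The sketch assumes a raised-cosine fade and three bits per half symbol period, with each envelope circularly offset by one third of the half symbol period; the envelope shape, the 300-sample length and the function name are assumptions made purely for illustration.

```python
import math

def staggered_envelopes(num_bits, half_symbol_len):
    """One raised-cosine fade envelope per bit, each offset in time so
    that the envelopes peak at different sample positions."""
    envelopes = []
    for k in range(num_bits):
        offset = k * half_symbol_len // num_bits
        env = [0.5 * (1.0 - math.cos(2.0 * math.pi * (n - offset) / half_symbol_len))
               for n in range(half_symbol_len)]
        envelopes.append(env)
    return envelopes

envs = staggered_envelopes(num_bits=3, half_symbol_len=300)

# Sample position at which each bit's echoes reach peak amplitude.
peaks = [max(range(len(env)), key=env.__getitem__) for env in envs]

# Level of the other bits' envelopes at the moment bit 0 peaks.
others_at_peak0 = [envs[k][peaks[0]] for k in (1, 2)]
print(peaks, others_at_peak0)
```

Sampling each bit's echoes around its own peak position means the echoes of the other bits are then at a fraction of their maximum amplitude, which is the reduction in mutual interference described above.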
[0107] The inventors have found that the above described data
hiding techniques do not work as well during portions of the audio
that include single tones or multiple harmonic tones, such as would
be found in some sections of music. This is because the hidden data
becomes more obtrusive to the listener in these circumstances and
if the tones are being used as part of an automatic setup procedure
they can cause the procedure to fail. Therefore, in one embodiment,
the inventors propose to include (within the encoder) a detector
that detects the level of tonality or other characteristic of the
audio signal and, if it is highly tonal, that switches off the echo
addition circuitry. Alternatively, as this switching off of the
echoes may itself be noticeable to the user, the encoder may fade
the echoes out during periods of high tonality and then fade them
back in during periods of low tonality. In this way, the data is
only added to the audio signal when the audio signal is not highly
tonal in nature. Various techniques may be used for making this
detection. One technique for determining the level of tonality of
an audio signal (although for a different purpose) is described in
the applicant's earlier PCT application WO02/45286, the contents of
which are incorporated herein by reference. Another technique can
be found in Pan D (1995) "A Tutorial on MPEG/Audio Compression",
IEEE Multimedia Magazine, 2(2), pp. 60-74. Instead of switching off
the echo addition circuitry, the system may be arranged to adapt
the amplitude of the added echoes depending on the detected
characteristic of the audio signal. Alternatively, instead of
varying the amplitudes of the echoes in this way, the encoder may
instead or in addition vary the data rate or the symbol period in
order to reduce the obtrusiveness of the hidden data during periods
when the audio signal is highly tonal.
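By way of illustration only, a tonality detector of the general kind described above could be sketched using the spectral flatness of a short frame (the ratio of the geometric mean to the arithmetic mean of its power spectrum). Spectral flatness is a generic measure chosen here for simplicity; it is not the technique of either reference cited above. The frame length, the flatness floor of 0.2, the full echo gain of 0.3 and all function names are hypothetical.

```python
import cmath
import math
import random

def power_spectrum(frame):
    """Power in DFT bins 1 .. N/2-1 (direct DFT; adequate for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(1, n // 2):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum

def spectral_flatness(frame, eps=1e-12):
    """Close to 0 for a pure tone; substantially higher for noise-like audio."""
    power = [p + eps for p in power_spectrum(frame)]
    geometric = math.exp(sum(math.log(p) for p in power) / len(power))
    arithmetic = sum(power) / len(power)
    return geometric / arithmetic

def echo_gain(frame, full_gain=0.3, flatness_floor=0.2):
    """Scale the echo amplitude down as the frame becomes more tonal."""
    scale = min(1.0, spectral_flatness(frame) / flatness_floor)
    return full_gain * scale

N = 256
tone_frame = [math.sin(2.0 * math.pi * 8 * t / N) for t in range(N)]
rng = random.Random(1)
noise_frame = [rng.gauss(0.0, 1.0) for _ in range(N)]

print(echo_gain(tone_frame), echo_gain(noise_frame))
```

Because the gain varies continuously with the measured flatness, the echoes are faded out during tonal passages rather than being switched off abruptly.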
[0108] An embodiment was described above in which a single message
was encoded and transmitted to a remote receiver as a number of
echoes within an audio signal. In some applications, a sequence of
messages may be transmitted. These messages may be the same or they
may be different. In either case, each message may be transmitted
after a preceding message has been transmitted. Alternatively, the
end of one message may be overlapped with the start of the next
message in a predefined way (so that the receiver can regenerate
each message). This arrangement can increase the time diversity of
the transmitted messages making them less susceptible to certain
types of noise or data loss. In a further alternative, the data
from the different messages may be interleaved in a known manner
and transmitted as a single data stream to the receiver. The
receiver would then regenerate each message by de-interleaving the
bits in the data stream using knowledge of how the messages were
originally interleaved.
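A minimal sketch of such interleaving, assuming a simple round-robin scheme over messages of equal length (the scheme and the function names are assumptions; any interleaving pattern known to both ends would serve), is:

```python
def interleave(messages):
    """Round-robin interleave: bit 0 of every message, then bit 1, and so on."""
    length = len(messages[0])
    if any(len(m) != length for m in messages):
        raise ValueError("this sketch assumes messages of equal length")
    return [m[i] for i in range(length) for m in messages]

def deinterleave(stream, count):
    """Recover `count` messages from a round-robin interleaved stream."""
    return [stream[k::count] for k in range(count)]

messages = [[1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0, 0]]
stream = interleave(messages)
recovered = deinterleave(stream, len(messages))
print(stream)
print(recovered)
```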
[0109] As discussed above, Convolutional Coding is used as part of
the forward error correction (FEC) encoder. As is well known to
those skilled in the art, data encoded in this way is generally
decoded using a Viterbi decoder, which operates by constructing a
trellis of state probabilities and branch metrics. The transmitted
data is often terminated with a number of zeros to force the
encoder back to the zero state. This allows the decoder to start
decoding from a known state; however, it requires extra symbols to
be transmitted over the channel. An alternative technique is to
ensure that the trellis start and end states are identical. This
technique is referred to as tail biting and has the advantage of
not requiring any extra symbols to be transmitted. Tail biting is
used in many communications standards and, if desired, may be used
in the embodiments described above.
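The difference between zero termination and tail biting can be sketched with a small convolutional encoder. The rate 1/2, constraint length 3 code with generator polynomials 7 and 5 (octal) used below is a common textbook example and is an assumption; the code actually used in the embodiments is not restated here, and the function names are hypothetical.

```python
K = 3                # constraint length (assumed)
G = (0b111, 0b101)   # generator polynomials 7 and 5 octal (assumed)

def conv_encode(bits, state=0):
    """Rate 1/2 convolutional encoder. Returns (coded bits, final state)."""
    out = []
    for b in bits:
        reg = (b << (K - 1)) | state          # newest bit in the top position
        for g in G:
            out.append(bin(reg & g).count("1") % 2)
        state = reg >> 1                      # keep the K-1 most recent bits
    return out, state

def encode_zero_terminated(bits):
    """Append K-1 zeros so that the encoder finishes in the zero state."""
    return conv_encode(list(bits) + [0] * (K - 1), state=0)

def encode_tail_biting(bits):
    """Preload the register with the last K-1 message bits so that the
    trellis start and end states are identical."""
    _, start_state = conv_encode(bits[-(K - 1):], state=0)
    coded, end_state = conv_encode(bits, state=start_state)
    return coded, start_state, end_state

message = [1, 0, 1, 1, 0, 0, 1, 0]
zt_coded, zt_end = encode_zero_terminated(message)
tb_coded, tb_start, tb_end = encode_tail_biting(message)
print(len(zt_coded), len(tb_coded), tb_start, tb_end)
```

For this 8-bit message the zero-terminated codeword has 20 coded bits whereas the tail-biting codeword has 16, the saving being the K-1 tail bits at rate 1/2.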
[0110] The description above has described the operation of a
system for hiding data as echoes within an audio signal. The
systems described used time domain techniques to generate and add
the echoes and to detect the echoes in the received signal. As
those skilled in the art will appreciate, equivalent processing can
be performed in the frequency domain to achieve the same or similar
results.
[0111] The inventors have found that in some instances, the decoder
does not work as well when the message consists of predominantly
`zero` bits (or conversely predominantly `one` bits), since under
the encoding scheme an `all zeros` codeword segment looks the same
as a time-shifted `all ones` codeword segment. A particular example
is the `all zeros` message, which results in an `all zeros`
codeword after Reed Solomon encoding. The encoding works best when
there are approximately equal numbers of ones and zeros in the
codeword, evenly distributed throughout the codeword. This can be
achieved for the disclosed system by inverting the Reed Solomon
parity bits. This has the effect of changing the all zeroes
codeword to a mixture of zeroes and ones. This can also be achieved
by altering the initial state of the feedback shift register used
within the Reed Solomon encoder which is used to generate the
parity bits. This gives more flexibility in setting the ratio of
ones to zeroes in the codeword. Subsequent interleaving distributes
these inverted parity bits throughout the codeword. As those
skilled in the art of error detection and correction will
appreciate, these approaches to balancing the distribution of ones
and zeroes apply to any of the many FEC schemes implemented using
feedback shift registers (or Galois field arithmetic) of which Reed
Solomon is an example.
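The effect of inverting the parity bits, or of changing the initial state of the feedback shift register, can be illustrated with a much simpler systematic code than Reed Solomon. The sketch below uses an 8-bit feedback shift register (a CRC-8 with polynomial 0x07) purely as a stand-in; the polynomial, the 16-bit message length, the initial state 0xA5 and the function names are all assumptions made for illustration.

```python
POLY = 0x07  # stand-in feedback polynomial (CRC-8), not the Reed Solomon code

def lfsr_parity(bits, init=0):
    """Eight parity bits produced by a feedback shift register."""
    reg = init
    for b in bits:
        feedback = ((reg >> 7) & 1) ^ b
        reg = (reg << 1) & 0xFF
        if feedback:
            reg ^= POLY
    return [(reg >> i) & 1 for i in range(7, -1, -1)]

def encode(bits, invert_parity=True, init=0):
    """Systematic codeword: message bits followed by (optionally inverted) parity."""
    parity = lfsr_parity(bits, init)
    if invert_parity:
        parity = [p ^ 1 for p in parity]
    return list(bits) + parity

def check(codeword, invert_parity=True, init=0):
    """The receiver applies the same convention before checking the parity."""
    return encode(codeword[:-8], invert_parity, init) == list(codeword)

zeros = [0] * 16
plain = encode(zeros, invert_parity=False)    # all zeros codeword
balanced = encode(zeros, invert_parity=True)  # parity bits become ones
seeded = lfsr_parity(zeros, init=0xA5)        # non-zero initial register state
print(sum(plain), sum(balanced), seeded)
```

In both variants the decoder must apply the same inversion or initial state as the encoder, as `check` does here.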
[0112] In the above embodiments, a number of processing modules and
circuits have been described. As those skilled in the art will
appreciate, these processing modules and circuits may be provided
as hardware circuits or as software modules running within memory
of a general purpose processor. In the latter case, the software may be
provided on a storage medium such as a CD-ROM or it may be
downloaded into an appropriate programmable device on a carrier
signal over a computer network, such as the Internet. The software
may be provided in compiled form, partially compiled form or in
un-compiled form.
* * * * *