U.S. patent application number 10/500016 was filed with the patent office on 2005-10-13 for captioning system. Invention is credited to Emmens, Michael David; Hosking, Ian Michael; Jones, Aled Wynne; Kelly, Peter John; Reynolds, Michael Raymond.

United States Patent Application 20050227614
Kind Code: A1
Hosking, Ian Michael; et al.
October 13, 2005

Captioning system
Abstract
A captioning system provides captions for audio and/or video
presentations. The captioning system can be used to provide text
captions or audio descriptions of a video presentation. A user
device for the captioning system has a receiver operable to receive
the captions together with synchronisation information, and a
caption output circuit operable to output the captions at the
appropriate timings defined by the synchronisation information. The
user device is preferably a portable hand-held device such as a
mobile telephone, PDA or the like.
Inventors: Hosking, Ian Michael (Cambridge, GB); Jones, Aled Wynne (Lode, GB); Reynolds, Michael Raymond (Royston, GB); Kelly, Peter John (Cambridge, GB); Emmens, Michael David (Shap, GB)
Correspondence Address: FOGG AND ASSOCIATES, LLC, P.O. Box 581339, Minneapolis, MN 55458-1339, US
Family ID: 26246904
Appl. No.: 10/500016
Filed: April 18, 2005
PCT Filed: December 23, 2002
PCT No.: PCT/GB02/05908
Current U.S. Class: 455/3.06; 348/E7.063; 379/88.13
Current CPC Class: H04N 7/165 20130101
Class at Publication: 455/003.06; 379/088.13
International Class: H04M 011/00

Foreign Application Data

Date | Code | Application Number
Dec 24, 2001 | GB | 0130936.8
Feb 25, 2002 | GB | 0204323.0
Claims
1. A captioning system for providing captions for a presentation to
a user, the captioning system comprising: a caption store operable
to store one or more sets of captions each set being associated
with one or more presentations and each set comprising a plurality
of captions for playout at different timings during the associated
presentation; and a user device having: i) a memory operable to
receive and store at least one set of captions for a presentation
to be made to an associated user, from said caption store; ii) a
receiver operable to receive synchronisation information defining
the timing during the presentation at which each caption in the
received set of captions is to be output to the user; iii) a
caption output circuit operable to output to the associated user,
the captions in the received set of captions; and iv) a timing
controller responsive to said received synchronisation information
and operable to control said caption output circuit so that said
captions are output to said user at the timings defined by said
synchronisation information.
2. The system according to claim 1, wherein said captions include
text.
3. The system according to claim 2, wherein said captions include
text for any dialogue in the presentation.
4. The system according to claim 2, wherein said caption output
circuit is operable to output said captions to a display device
associated with the user device for display to the user.
5. The system according to claim 4, wherein said captions include
formatting information for controlling the format of the text
displayed on said display.
6. The system according to claim 4, wherein each caption includes
duration information defining the duration that the caption should
be displayed to the user.
7. The system according to claim 4, wherein each caption includes
timing information defining the time at which the caption should be
displayed to the user during the presentation.
8. The system according to claim 1, wherein said captions include
audio data and wherein said caption output circuit is operable to
output said audio data to an electro-acoustic device for converting
the audio data into corresponding acoustic signals.
9. The system according to claim 1, wherein said presentation
includes audio.
10. The system according to claim 1, wherein said presentation
includes video.
11. The system according to claim 1, wherein said presentation is a
film.
12. The system according to claim 1, wherein said caption store is
formed in a memory card which is insertable into said user device
and wherein said user device includes a reader for reading captions
from said memory card when inserted therein.
13. The system according to claim 1, wherein said caption store is
provided in a computer system and wherein said user device includes
means for communicating with said computer system.
14. The system according to claim 13, wherein said computer system
is remote from said user device and wherein said user device has an
associated communication module for communicating with said remote
computer system.
15. The system according to claim 14, wherein said user device
includes a housing and wherein said communication module is
provided within said housing.
16. The system according to claim 14, wherein said communication
module is operable to communicate with said remote computer system
using a wireless communication link.
17. The system according to claim 16, wherein said user device
comprises a mobile telephone.
18. The system according to claim 1, wherein said user device
comprises a portable computing device such as a personal digital
assistant.
19. The system according to claim 1, wherein said synchronisation
information defines expected time points for one or more
predetermined portions of the presentation.
20. The system according to claim 19, wherein said user device
comprises a monitoring circuit operable to monitor said
presentation to identify the actual time points of said one or more
predetermined portions and wherein said timing controller is
responsive to the difference between the actual timings and the
expected timings to control the outputting of the captions by said
caption output circuit.
21. The system according to claim 20, wherein said predetermined
portions of said presentation correspond to portions of audio of
the presentation and wherein said monitoring circuit includes a
microphone for sensing the audio of the presentation and a
comparator for comparing the received audio with the expected
portions of the audio defined by said synchronisation
information.
22. The system according to claim 20, wherein said user device has
an acquisition mode of operation in which an output of said
monitoring circuit is compared with said predetermined portions
defined by said synchronisation information to identify a current
position within said presentation and a tracking mode of operation
in which the output of said monitoring circuit is compared with a
current predetermined portion defined by said synchronisation
information.
23. The system according to claim 22, wherein during said tracking
mode of operation, said monitoring circuit is operable to monitor
said presentation during a predetermined time window around the
expected time point defined by said synchronisation information for
the current predetermined portion.
24. The system according to claim 1, wherein said receiver in said
user device is operable to receive said synchronisation information
from said caption store.
25. The system according to claim 1, wherein said synchronisation
information is embedded within said presentation and wherein said
user device includes a monitoring circuit operable to monitor the
presentation and to extract said synchronisation information
therefrom.
26. The system according to claim 25, wherein said synchronisation
information is embedded within the audio of said presentation.
27. The system according to claim 25, wherein said synchronisation
information comprises synchronisation codes occurring at different
timings during the presentation.
28. The system according to claim 27, wherein each synchronisation
code is unique to uniquely define the position in the
presentation.
29. The system according to claim 1, wherein said caption store
includes a plurality of sets of captions for a plurality of
different presentations.
30. The system according to claim 29, wherein said user device is
operable to capture a portion of said presentation and is operable
to transmit the captured portion to said caption store and wherein
said caption store is operable to use said captured portion of the
presentation to identify the presentation being made and to
transmit the associated set of captions for the identified
presentation to said user device.
31. The system according to claim 30, wherein said user device is
operable to process the captured portion of the presentation to
extract data characteristic of the captured portion and is operable
to transmit said characteristic data to said caption store, and
wherein said caption store is operable to use said characteristic
data to identify the presentation being made and to transmit the
associated set of captions for the identified presentation to the
user device.
32. The system according to claim 1, wherein said presentation is
given at a venue, wherein said venue is operable to provide an
activation code, wherein said user device is operable to receive
said activation code and further comprises an inhibitor for
inhibiting the operation of said caption output circuit unless said
user device has received said activation code.
33. A user device for use in a captioning system, the user device
comprising: i) a memory operable to receive and store at least one
set of captions for a presentation to be made to an associated
user, from a caption store; ii) a receiver operable to receive
synchronisation information defining the timing during the
presentation at which each caption in the received set of captions
is to be output to the user; iii) a caption output circuit operable
to output to the associated user, the captions in the received set
of captions; and iv) a timing controller responsive to said
received synchronisation information and operable to control said
caption output circuit so that said captions are output to said
user at the timings defined by said synchronisation
information.
34. A computer system for use in a captioning system, the computer
system comprising a caption store operable to store one or more
sets of captions each set being associated with one or more
presentations and each set comprising a plurality of captions for
playout at different timings during the associated presentation,
each caption having associated synchronisation information defining
the timing during the presentation at which that caption is to be
output to a user; a receiver
operable to receive a request for a set of captions from a user
device; and an output circuit operable to output the requested set
of captions and the synchronisation information to the user
device.
35. A method of manufacturing a computer readable medium storing
caption data and synchronisation data for use in a captioning
system, the method comprising: providing a set of captions that is
associated with a presentation and that comprises a plurality of
captions for playout at different timings during the associated
presentation; providing
synchronisation information defining the timing during the
presentation at which each caption in the set of captions is to be
output to a user; receiving a computer readable medium; recording
computer readable data defining said set of captions and said
synchronisation information on said computer readable medium; and
outputting the computer readable medium having the recorded caption
and synchronisation data thereon.
36. A computer readable medium storing computer executable
instructions for causing a general purpose computing device to
operate as the user device of claim 1.
37. A method of providing captions for presentation to a user, the
method comprising: storing, at a caption store, one or more sets of
captions each being associated with one or more presentations and
each comprising a plurality of captions for playout at different
timings during the associated presentation; and at a user device:
receiving and storing at least one set of captions for a
presentation to be made to an associated user from said caption
store; receiving synchronisation information defining the timing
during the presentation at which each caption in the received set
of captions is to be output to the user; outputting the captions in
the received set of captions to the associated user; and in
response to the received synchronisation information controlling
the outputting step so that said captions are output to the user at
the timings defined by the synchronisation information.
38. A captioning system for providing captions for a presentation
to a user, the captioning system comprising: a caption store
operable to store one or more sets of captions each set being
associated with one or more presentations and each set comprising
one or more captions for playout during the associated
presentation; and a user device having: i) a memory operable to
receive and store at least one set of captions for a presentation
to be made to an associated user, from said caption store; ii) a
receiver operable to receive synchronisation information defining
the timing during the presentation at which the or each caption in
the received set of captions is to be output to the user; and iii)
a caption output circuit operable to output to the associated user,
the or each caption in the received set of captions; and iv) a
timing controller responsive to said received synchronisation
information and operable to control said caption output circuit so
that the or each caption is output to said user at the timing
defined by said synchronisation information.
39. A captioning system for providing captions for a presentation
to a user, the captioning system comprising: a caption store
operable to store one or more sets of captions each set being
associated with one or more presentations and each set comprising a
plurality of captions for playout at different timings during the
associated presentation; and a user device having: i) a memory
operable to receive and store at least one set of captions for a
presentation to be made to an associated user, from said caption
store; ii) a receiver operable to receive synchronisation
information defining the timing during the presentation at which
each caption in the received set of captions is to be output to the
user; and iii) a caption output circuit operable to output to the
associated user, the captions in the received set of captions at
the timings defined by said synchronisation information.
40. A captioning system for providing captions for a presentation
to a user, the captioning system comprising: means for storing one
or more sets of captions each set being associated with one or more
presentations and each set comprising a plurality of captions for
playout at different timings during the associated presentation;
and a user device having: i) means for receiving captions from said
captions store; ii) means for receiving synchronisation information
defining the timing during the presentation at which each caption
is to be output to the user; iii) means for outputting the captions
to a user associated with the user device; and iv) means responsive
to the synchronisation information for controlling said output
means, so that said captions are output to said user at the timings
defined by said synchronisation information.
41. A computer readable medium storing caption data and
synchronisation data for a presentation, the caption data defining
a set of captions for the presentation and comprising a plurality
of captions for playout at different timings during the
presentation; and synchronisation data defining the timing during
the presentation at which each caption in the set of
captions is to be output to a user.
Description
[0001] The present invention relates to a system and method and
parts thereof for providing captions for audio or video or
multi-media presentations. The invention has particular though not
exclusive relevance to the provision of such a captioning system to
facilitate the enjoyment of the audio, video or multimedia
presentation by people with sensory disabilities.
[0002] A significant proportion of the population with hearing
difficulties benefit from captions (in the form of text) on video
images such as TV broadcasts, video tapes, DVD and films. There are
currently two types of captioning systems available for video
images--on-screen caption systems and off-screen caption systems.
In on-screen caption systems, the caption text is displayed
on-screen, where it obscures part of the image. This is a
particular problem in cinemas, where there is a reluctance to show
on-screen captions to general audiences. In off-screen caption
systems, the text is displayed on a separate screen. Whilst this
overcomes some of the problems associated with the on-screen
approach, it adds cost and complexity and has therefore so far had
poor take-up in cinemas.
[0003] In addition to text captioning systems for people with
hearing difficulties, there are also captioning systems which
provide audio captions for people with impaired eyesight. In this
type of audio captioning system, an audio description of what is
being displayed is provided to the user in a similar way to the way
in which subtitles are provided for the hard of hearing.
[0004] One aim of the present invention is to provide an
alternative captioning system for the hard of hearing or an
alternative captioning system for those with impaired eyesight. The
captioning system can also be used by those without impaired
hearing or eyesight, for example, to provide different language
captions or the lyrics for songs.
[0005] According to one aspect, the present invention provides a
captioning system comprising: a caption store for storing one or
more sets of captions each being associated with one or more
presentations and each set comprising at least one caption for
playout at different timings during the associated presentation;
and a user device having: (i) a memory for receiving and storing at
least one set of captions for a presentation from the caption
store; (ii) a receiver operable to receive synchronisation
information defining the timing during the presentation at which
each caption in the received set of captions is to be output to the
user; and (iii) a caption output circuit operable to output to the
associated user, the captions in the received set of captions at
the timings defined by the synchronisation information.
[0006] In one embodiment, the captions are text captions which are
output to the user on a display associated with the user device. In
another embodiment, the captions are audio signals which are output
to the user as acoustic signals via a loudspeaker or earphone. The
captioning system can be used, for example in cinemas, to provide
captions to people with sensory disabilities to facilitate their
understanding and enjoyment of, for example, films or other
multimedia presentations.
[0007] The user device is preferably a portable hand-held device
such as a mobile telephone or personal digital assistant, as these
are small and lightweight and most users have access to them. The
use of such a portable computing device is also preferred since it
is easy to adapt the device to operate in the above manner by
providing the device with appropriate software.
[0008] The caption store may be located in a remote server in which
case the user device is preferably a mobile telephone (or a PDA
having wireless connectivity) as this allows for the direct
connection between the user device and the remote server.
Alternatively, the caption store may be a kiosk at the venue at
which the presentation is to be made, in which case the user can
download the captions and synchronisation information when they
arrive. Alternatively, the caption store may simply be a memory
card or smart-card which the user can insert into their user device
in order to obtain the set of captions for the presentation
together with the synchronisation information.
[0009] According to another aspect, the present invention provides
a method of manufacturing a computer readable medium storing
caption data and synchronisation data for use in a captioning
system, the method comprising: providing a set of captions that is
associated with a presentation and that comprises a plurality of
captions for playout at different timings during the associated
presentation; providing
synchronisation information defining the timing during the
presentation at which each caption in the set of captions is to be
output to a user; receiving a computer readable medium; recording
computer readable data defining said set of captions and said
synchronisation information on said computer readable medium; and
outputting the computer readable medium having the recorded caption
and synchronisation data thereon.
[0010] Exemplary embodiments of the present invention will now be
described with reference to the accompanying drawings, in
which:
[0011] FIG. 1 is a schematic overview of a captioning system
embodying the present invention;
[0012] FIG. 2a is a schematic block diagram illustrating the main
components of a user telephone that is used in the captioning
system shown in FIG. 1;
[0013] FIG. 2b is a table representing the captions in a caption
file downloaded to the telephone shown in FIG. 2a from the remote
web server shown in FIG. 1;
[0014] FIG. 2c is a representation of a synchronisation file
downloaded to the mobile telephone shown in FIG. 2a from the remote
web server shown in FIG. 1;
[0015] FIG. 2d is a timing diagram illustrating the timing of
synchronisation signals and illustrating timing windows during
which the mobile telephone processes an audio signal from a
microphone thereof;
[0016] FIG. 2e is a signal diagram illustrating an exemplary audio
signal received by a microphone of the telephone shown in FIG. 2a
and the signature stream generated by a signature extractor forming
part of the mobile telephone;
[0017] FIG. 2f illustrates an output from a correlator forming part
of the mobile telephone shown in FIG. 2a, which is used to
synchronise the display of captions to the user with the film being
watched;
[0018] FIG. 2g schematically illustrates a screen shot from the
telephone illustrated in FIG. 2a showing an example caption that is
displayed to the user;
[0019] FIG. 3 is a schematic block diagram illustrating the main
components of the remote web server forming part of the captioning
system shown in FIG. 1;
[0020] FIG. 4 is a schematic block diagram illustrating the main
components of a portable user device of an alternative embodiment;
and
[0021] FIG. 5 is a schematic block diagram illustrating the main
components of the remote server used with the portable user device
shown in FIG. 4.
OVERVIEW
[0022] FIG. 1 schematically illustrates a captioning system for use
in providing text captions on a number of user devices (two of
which are shown and labelled 1-1 and 1-2) for a film being shown on
a screen 3 within a cinema 5. The captioning system also includes a
remote web server 7 which controls access by the user devices 1 to
captions stored in a captions database 9. In particular, in this
embodiment, the user device 1-1 is a mobile telephone which can
connect to the remote web server 7 via a cellular communications
base station 11, a switching centre 13 and the Internet 15 to
download captions from the captions database 9. In this embodiment,
the second user device 1-2 is a personal digital assistant (PDA)
that does not have cellular telephone transceiver circuitry. This
PDA 1-2 can, however, connect to the remote web server 7 via a
computer 17 which can connect to the Internet 15. The computer 17
may be a home computer located in the user's home 19 and may
typically include a docking station 21 for connecting the PDA 1-2
with the computer 17.
[0023] In this embodiment, the operation of the captioning system
using the mobile telephone 1-1 is slightly different to the
operation of the captioning system using the PDA 1-2. A brief
description of the operation of the captioning system using these
devices will now be given.
[0024] In this embodiment, the mobile telephone 1-1 operates to
download the captions for the film to be viewed at the start of the
film. It does this by capturing a portion of soundtrack from the
beginning of the film, generated by speakers 23-1 and 23-2, which
it processes to generate a signature that is characteristic of the
audio segment. The mobile telephone 1-1 then transmits this
signature to the remote web server 7 via the base station 11,
switching station 13 and the Internet 15. The web server 7 then
identifies the film that is about to begin from the signature and
retrieves the appropriate caption file together with an associated
synchronisation file which it transmits back to the mobile
telephone 1-1 via the Internet 15, switching centre 13 and base
station 11. After the caption file and the synchronisation file
have been received by the mobile telephone 1-1, the connection with
the base station 11 is terminated and the mobile telephone 1-1
generates and displays the appropriate captions to the user in
synchronism with the film that is shown on the screen 3. In this
embodiment, the synchronisation data in the synchronisation file
downloaded from the remote web server 7 defines the estimated
timing of subsequent audio segments within the film and the mobile
telephone 1-1 synchronises the playout of the captions by
processing the audio signal of the film and identifying the actual
timing of those subsequent audio segments in the film.
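The server-side lookup just described can be sketched as follows. This is a hypothetical Python illustration only: the application does not specify an implementation, and the match-by-overlap heuristic, the function name and the data layout are all assumptions made here for clarity.

```python
def identify_film(query_fps, film_db):
    """Sketch of the lookup performed by the remote web server 7:
    choose the film whose stored fingerprints share the most entries
    with the fingerprints captured and sent by the handset.

    film_db is assumed to map film title -> set of fingerprints;
    query_fps is the sequence of fingerprints received from the
    mobile telephone 1-1."""
    return max(film_db,
               key=lambda title: len(film_db[title] & set(query_fps)))
```

For example, with `film_db = {"Film A": {3, 7, 12, 19}, "Film B": {4, 8, 15}}`, a query of `[7, 12, 8]` overlaps "Film A" in two entries but "Film B" in only one, so "Film A" is identified.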
[0025] In this embodiment, the user of the PDA 1-2 downloads the
captions for the film while they are at home 19 using their personal
computer 17 in advance of the film being shown. In particular, in
this embodiment, the user types in the name of the film that they
are going to see into the personal computer 17 and then sends this
information to the remote web server 7 via the Internet 15. In
response, the web server 7 retrieves the appropriate caption file
and synchronisation file for the film which it downloads to the
user's personal computer 17. The personal computer 17 then stores
the caption file and the synchronisation file in the PDA 1-2 via
the docking station 21. In this embodiment, the subsequent
operation of the PDA 1-2 to synchronise the display of the captions
to the user during the film is the same as the operation of the
mobile telephone 1-1 and will not, therefore, be described
again.
[0026] Mobile Telephone
[0027] A brief description has been given above of the way in which
the mobile telephone 1-1 retrieves and subsequently plays out the
captions for a film to a user. A more detailed description will now
be given of the main components of the mobile telephone 1-1 which
are shown in block form in FIG. 2a. As shown, the mobile telephone
1-1 includes a microphone 41 for detecting the acoustic sound
signal generated by the speakers 23 in the cinema 5 and for
generating a corresponding electrical audio signal. The audio
signal from the microphone 41 is then filtered by a filter 43 to
remove frequency components that are not of interest. The filtered
audio signal is then converted into a digital signal by the
analogue to digital converter (ADC) 45 and then stored in an input
buffer 47. The audio signal written into the input buffer 47 is
then processed by a signature extractor 49 which processes the
audio to extract a signature that is characteristic of the buffered
audio. Various processing techniques can be used by the signature
extractor 49 to extract this signature. For example, the signature
extractor may carry out the processing described in WO 02/11123 in
the name of Shazam Entertainment Limited. In this system, a window
of about 15 seconds of audio is processed to identify a number of
"fingerprints" along the audio string that are representative of
the audio. These fingerprints together with timing information of
when they occur within the audio string form the above described
signature.
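The shape of such a signature can be illustrated with a deliberately crude sketch. The stand-in fingerprint below (a per-frame zero-crossing count) is an assumption for illustration only; it is not the processing of WO 02/11123, which derives fingerprints from the audio spectrum.

```python
def extract_signature(samples, sample_rate=8000, frame_len=256):
    """Illustrative stand-in for the signature extractor 49: split the
    buffered audio into fixed-length frames and record, for each frame,
    a crude fingerprint (here, the zero-crossing count) together with
    the time at which the frame starts. The resulting (fingerprint,
    time) pairs play the role of the "fingerprints together with
    timing information" described above."""
    fingerprints = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # Count sign changes between consecutive samples in the frame.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        )
        fingerprints.append((crossings, start / sample_rate))
    return fingerprints
```

A higher-pitched input crosses zero more often per frame, so the fingerprint stream distinguishes frames of different spectral content while remaining cheap enough to compute on a handset.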
[0028] As shown in FIG. 2a, the signature generated by the
signature extractor is then output to an output buffer 51 and then
transmitted to the remote web server 7 via the antenna 53, a
transmission circuit 55, a digital to analogue converter (DAC) 57
and a switch 59.
[0029] As will be described in more detail below, the remote server
7 then processes the received signature to identify the film that
is playing and to retrieve the appropriate caption file and
synchronisation file for the film. These are then downloaded back
to the mobile telephone 1-1 and passed, via the aerial 53,
reception circuit 61 and analogue to digital converter 63 to a
caption memory 65. FIG. 2b schematically illustrates the form of
the caption file 67 downloaded from the remote web server 7. As
shown, in this embodiment, the caption file 67 includes an ordered
sequence of captions (caption(1) to caption(N)) 69-1 to 69-N. The
caption file 67 also includes, for each caption, formatting
information 71-1 to 71-N that defines the font, colour, etc. of the
text to be displayed. The caption file 67 also includes, for each
caption, a time value t.sub.1 to t.sub.N which defines the time at
which the caption should be output to the user relative to the
start of the film. Finally, in this embodiment, the caption file 67
includes, for each caption 69, a duration .DELTA.t.sub.1 to
.DELTA.t.sub.N which defines the duration that the caption should
be displayed to the user.
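A minimal sketch of this caption file structure, assuming Python as an illustration (the application does not prescribe any file format, and the field and function names below are invented for clarity):

```python
from dataclasses import dataclass

@dataclass
class Caption:
    """One entry of the caption file 67 shown in FIG. 2b."""
    text: str        # caption(i): the text to be displayed
    formatting: str  # formatting information 71-i (font, colour, etc.)
    start: float     # t_i: output time relative to the start of the film
    duration: float  # delta-t_i: how long the caption stays on screen

def caption_at(caption_file, t):
    """Return the caption that should be on screen at film time t
    (seconds from the start of the film), or None if no caption is
    active at that moment."""
    for c in caption_file:
        if c.start <= t < c.start + c.duration:
            return c
    return None
```

With a caption starting at t = 10.0 s and lasting 3.0 s, a query at t = 11.0 s returns that caption, while a query at t = 15.0 s returns None.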
[0030] FIG. 2c schematically represents the data within the
synchronisation file 73 which is used in this embodiment by the
mobile telephone 1-1 to synchronise the display of the captions
with the film. As shown, the synchronisation file 73 includes a
number of signatures 75-1 to 75-M each having an associated time
value t.sub.1.sup.s to t.sub.M.sup.s identifying the time at which
the signature should occur within the audio of the film (again
calculated from the beginning of the film).
[0031] In this embodiment, the synchronisation file 73 is passed to
a control unit 81 which controls the operation of the signature
extracting unit 49 and a sliding correlator 83. The control unit 81
also controls the position of the switch 59 so that after the
caption and synchronisation files have been downloaded into the
mobile telephone 1-1, and the mobile telephone 1-1 is trying to
synchronise the output of the captions with the film, the signature
stream generated by the signature extractor 49 is passed to the
sliding correlator 83 via the output buffer 51 and the switch
59.
[0032] Initially, before the captions are output to the user, the
mobile telephone 1-1 must synchronise with the film that is
playing. This is achieved by operating the signature extractor 49
and the sliding correlator 83 in an acquisition mode, during which
the signature extractor extracts signatures from the audio received
at the microphone 41 which are then compared with the signatures 75
in the synchronisation file 73, until it identifies a match between
the received audio from the film and the signatures 75 in the
synchronisation file 73. This match identifies the current position
within the film, which is used to identify the initial caption to
be displayed to the user. At this point, the mobile telephone 1-1
enters a tracking mode during which the signature extractor 49 only
extracts signatures for the audio during predetermined time slots
(or windows) within the film corresponding to when the mobile
telephone 1-1 expects to detect the next signature in the audio
track of the film. This is illustrated in FIG. 2d which shows a
time line (representing the time line for the film) together with
the timings t.sub.1.sup.s to t.sub.M.sup.s corresponding to when
the mobile telephone 1-1 expects the signatures to occur within the
audio track of the film. FIG. 2d also shows a small time slot or
window w.sub.1 to w.sub.M around each of these time points, during
which the signature extractor 49 processes the audio signal to
generate a signature stream which it outputs to the output buffer
51.
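Purely by way of illustration, the acquisition-mode comparison described above may be sketched as follows. Signatures are modelled as short numeric vectors and the match is a normalised correlation; the embodiment does not prescribe a particular signature scheme or scoring function, so all names and data shapes here are assumptions.

```python
import numpy as np

def acquire_position(received_stream, sync_signatures, sync_times):
    """Acquisition mode: compare the signature stream extracted from the
    microphone against every signature 75 in the synchronisation file,
    at every alignment, and return the film time of the best match."""
    best_score, best_time = -1.0, None
    for sig, t in zip(sync_signatures, sync_times):
        n = len(sig)
        for offset in range(len(received_stream) - n + 1):
            window = received_stream[offset:offset + n]
            # normalised correlation score in [-1, 1]
            score = float(np.dot(window, sig) /
                          (np.linalg.norm(window) * np.linalg.norm(sig) + 1e-12))
            if score > best_score:
                best_score, best_time = score, t
    return best_time, best_score
```

The returned time identifies the current position within the film and hence the initial caption to display, after which the device can switch to the tracking mode.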
[0033] The generation of the signature stream is illustrated in
FIG. 2e which shows a portion 77 of the audio track corresponding
to one of the time windows w.sub.j and the stream 79 of signatures
generated by the signature extractor 49. In this embodiment, three
signatures (signature (i), signature (i+1) and signature (i+2)) are
generated for each processing window w. This is for illustration
purposes only. In practice, many more or fewer signatures may be
generated for each processing window w. Further, whilst in this
embodiment the signatures are generated from non-overlapping
subwindows of the processing window w, the signatures may also be
generated from overlapping subwindows. The way in which this would
be achieved will be well known to those skilled in the art and will
not be described in any further detail.
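As a sketch only, the generation of a signature stream from non-overlapping subwindows of one processing window w might look as follows. The toy signature used here (mean spectral magnitude in four bands) and the sample rate are our assumptions; the embodiment leaves the signature algorithm open.

```python
import numpy as np

def window_signature_stream(audio, window_start_s, window_len_s,
                            subwindow_len_s, rate=8000):
    """Generate the stream 79 of signatures for one processing window w,
    using non-overlapping subwindows as in FIG. 2e."""
    start = int(window_start_s * rate)
    window = audio[start:start + int(window_len_s * rate)]
    sub = int(subwindow_len_s * rate)
    stream = []
    for i in range(0, len(window) - sub + 1, sub):
        spectrum = np.abs(np.fft.rfft(window[i:i + sub]))
        # crude 4-band signature: mean magnitude in each quarter of the spectrum
        bands = np.array_split(spectrum, 4)
        stream.append(np.array([b.mean() for b in bands]))
    return stream
```

For example, a 0.3 s window with 0.1 s subwindows yields three signatures, matching the illustrative count in FIG. 2e.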
[0034] In this embodiment, between adjacent processing windows w,
the control unit 81 controls the signature extractor 49 so that it
does not process the received audio. In this way, the processing
performed by the signature extractor 49 can be kept to a
minimum.
[0035] During this tracking mode of operation, the sliding
correlator 83 is operable to correlate the generated signature
stream in output buffer 51 with the next signature 75 in the
synchronisation file 73. This correlation generates a correlation
plot such as that shown in FIG. 2f for the window of audio being
processed. As shown in FIG. 2d, in this embodiment, the windows
w.sub.j are defined so that the expected timing of the signature is
in the middle of the window. This means that the mobile telephone
1-1 expects the peak output from the sliding correlator 83 to
correspond to the middle of the processing window w. If the peak
occurs earlier or later in the window then the caption output
timing of the mobile telephone 1-1 must be adjusted to keep it in
synchronism with the film. This is illustrated in FIG. 2f which
shows the expected time of the signature t.sub.s appearing in the
middle of the window and the correlation peak occurring .delta.t
seconds before the expected time. This means that the mobile
telephone 1-1 is slightly behind the film and the output timing of
the subsequent captions must be brought forward to catch up with
the film. This is achieved by passing the .delta.t value from the
correlator 83 into a timing controller 85 which generates the
timing signal for controlling the time at which the captions are
played out to the user. As shown, the timing controller receives
its timing reference from the mobile telephone clock 87. The
generated timing signal is then passed to a caption display engine
89 which uses the timing signal to index the caption file 67 in
order to retrieve the next caption 69 for display together with the
associated duration information .DELTA.t and formatting information
71 which it then processes and outputs for display on the mobile
telephone display 91 via a frame buffer 93. The details of how the
caption 69 is generated and formatted are well known to those
skilled in the art and will not be described in any further
detail.
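The timing adjustment performed on the .delta.t value may be sketched as follows, purely for illustration. A peak arriving before the window centre means the telephone is running behind the film, so subsequent caption timings are brought forward; the function names and units are our assumptions.

```python
def playout_clock_adjustment(peak_offset_s, window_len_s):
    """Tracking mode: the correlation peak is expected at the middle of
    the processing window w. Return delta_t in seconds; positive means
    the peak came early and the caption clock must be advanced."""
    expected = window_len_s / 2.0
    return expected - peak_offset_s

def adjusted_caption_times(caption_times, peak_offset_s, window_len_s):
    """Toy model of the timing controller 85: bring all subsequent
    caption timings forward (or back) by the measured delta_t so the
    captions stay in synchronism with the film."""
    dt = playout_clock_adjustment(peak_offset_s, window_len_s)
    return [t - dt for t in caption_times]
```

For example, a peak arriving 0.3 s before the centre of a 2 s window advances all later caption timings by 0.3 s.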
[0036] FIG. 2g illustrates the form of an example caption which is
output on the display 91. FIG. 2g also shows in the right hand side
95 of the display 91 a number of user options that the user can
activate by pressing appropriate function keys on the keypad 97 of
the mobile telephone 1-1. These include a language option 99 which
allows the user to change the language of the caption 69 that is
displayed. This is possible, provided the caption file 67 includes
captions 69 in different languages. As those skilled in the art will
appreciate, this does not involve any significant processing on the
part of the mobile telephone 1-1, since all that is being changed
is the text of the caption 69 that is to be displayed at the
relevant timings. It is therefore possible to personalise the
captions for different users watching the same film. The options
also include an exit option 101 for allowing the user to exit the
captioning application being run on the mobile telephone 1-1.
[0037] Personal Digital Assistant
[0038] As mentioned above, the PDA 1-2 operates in a similar way to
the mobile telephone 1-1 except it does not include the mobile
telephone transceiver circuitry for connecting directly to the web
server 7. The main components of the PDA 1-2 are similar to those
of the mobile telephone 1-1 described above and will not,
therefore, be described again.
[0039] Remote Web Server
[0040] FIG. 3 is a schematic block diagram illustrating the main
components of the web server 7 used in this embodiment and showing
the captions database 9. As shown, the web server 7 receives input
from the Internet 15 which is either passed to a sliding correlator
121 or to a database reader 123, depending on whether or not the
input is from the mobile telephone 1-1 or from the PDA 1-2. In
particular, the signature from the mobile telephone 1-1 is input to
the sliding correlator 121 where it is compared with signature
streams of all films known to the system, which are stored in the
signature stream database 125. The results of these correlations
are then compared to identify the film that the user is about to
watch. This film ID is then passed to the database reader 123. In
response to receiving a film ID either from the sliding correlator
121 or directly from a user device (such as the PC 17 or PDA 1-2),
the database reader 123 reads the appropriate caption file 67 and
synchronisation file 73 from the captions database 9 and outputs
them to a download unit 127. The download unit 127 then downloads
the retrieved caption file 67 and synchronisation file 73 to the
requesting user device 1 via the Internet 15.
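A minimal sketch of the server-side identification step described above follows. The database layout (film ID mapped to a one-dimensional signature stream) and the normalised-correlation scoring are our assumptions; the embodiment does not fix these details.

```python
import numpy as np

def identify_film(received_signature, signature_db):
    """Sketch of the sliding correlator 121: correlate the signature sent
    by the telephone against the stored signature stream of every film in
    the signature stream database 125 and return the film ID whose
    correlation peak is strongest."""
    best_id, best_peak = None, -1.0
    n = len(received_signature)
    for film_id, stream in signature_db.items():
        for offset in range(len(stream) - n + 1):
            seg = stream[offset:offset + n]
            peak = float(np.dot(seg, received_signature) /
                         (np.linalg.norm(seg) * np.linalg.norm(received_signature) + 1e-12))
            if peak > best_peak:
                best_id, best_peak = film_id, peak
    return best_id
```

The returned film ID is then passed to the database reader 123 to retrieve the corresponding caption and synchronisation files.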
[0041] As those skilled in the art will appreciate, a captioning
system has been described above for providing text captions for a
film for display to a user. The system does not require any
modifications to the cinema or playout system, but only the
provision of a suitably adapted mobile telephone 1-1 or PDA device
1-2 or the like. In this regard, it is not essential to add any
additional hardware to the mobile telephone or the PDA, since all
of the functionality enclosed in the dashed box 94 can be performed
by an appropriate software application run within the mobile
telephone 1-1 or PDA 1-2. In this case, the appropriate software
application may be loaded at the appropriate time, e.g. when the
user enters the cinema. In the case of the mobile telephone 1-1, the
application is preferably arranged to cancel the ringer on the
telephone so that incoming calls do not disturb others in the
audience. The above captioning
system can therefore be used for any film at any time. Further,
since different captions can be downloaded for a film, the system
allows for content variation within a single screening. This
facilitates, for example, the provision of captions in multiple
languages.
[0042] Modifications and Alternative Embodiments
[0043] In the above embodiment, a captioning system was described
for providing text captions on a display of a portable user device
for allowing users with hearing disabilities to understand a film
being watched. As discussed in the introduction of this
application, the above captioning system can be modified to operate
with audio captions (e.g. audio descriptions of the film being
displayed for people with impaired eyesight). This may be done
simply by replacing the text captions 69 in the caption file 67
that is downloaded from the remote server 7 with appropriate audio
files (such as the standard .WAV or MP3 audio files) which can then
be played out to the user via an appropriate headphone or earpiece.
The synchronisation of the playout of the audio files could be the
same as for the synchronisation of the playout of the text
captions. Alternatively synchronisation can be achieved in other
ways. FIG. 4 is a block diagram illustrating the main components of
a mobile telephone that can be used in such an audio captioning
system. In FIG. 4, the same reference numerals have been used for
the same components shown in FIG. 2a and these components will not
be described again.
[0044] In this embodiment, the mobile telephone 1-1' does not
include the signature extractor 49. Instead, as illustrated in FIG.
5, the signature extractor 163 is provided in the remote web server
7'. In operation, the mobile telephone 1-1' captures part of the
audio played out at the beginning of the film and transmits this
audio through to the remote web server 7'. This audio is then
buffered in the input buffer 161 and then processed by the
signature extractor 163 to extract a signature representative of
the audio. This signature is then passed to a correlation table 165
which performs a similar function to the sliding correlator 121 and
signature stream database 125 described in the first embodiment, to
identify the ID for the film currently being played. In particular,
in this embodiment, all of the possible correlations that may have
been performed by the sliding correlator 121 and the signature
stream database 125 are carried out in advance and the results are
stored in the correlation table 165. In this way, the signature
output by the signature extractor 163 is used to index this
correlation table to generate correlation results for the different
films known to the captioning system. These correlation results are
then processed to identify the most likely film corresponding to
the received audio stream. In this embodiment, the captions
database 9 only includes the caption files 67 for the different
films, without any synchronisation files 73. In response to
receiving the film ID either from the correlation table 165 or
directly from a user device (not shown), the database reader
123 retrieves the appropriate caption file 67 which it downloads to
the user device 1 via the download unit 127.
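The idea of carrying out all possible correlations in advance can be sketched as a precomputed lookup, purely for illustration. The `quantise()` step, which must turn a raw signature into a hashable, noise-tolerant key, and the data layout are assumptions on our part.

```python
def build_correlation_table(signature_db, quantise):
    """Precompute the work of the sliding correlator: map every quantised
    signature occurring in a known film to that film's ID, so that at run
    time identification is a single lookup (the correlation table 165)."""
    table = {}
    for film_id, stream in signature_db.items():
        for sig in stream:
            table[quantise(sig)] = film_id
    return table

def lookup_film(table, signature, quantise):
    """Run-time identification: one dictionary lookup per signature."""
    return table.get(quantise(signature))
```

This trades memory for speed: the server avoids repeating the sliding correlation for every incoming request.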
[0045] Returning to FIG. 4, in this embodiment, since the mobile
telephone 1-1' does not include the signature extractor 49,
synchronisation is achieved in an alternative manner. In
particular, in this embodiment, synchronisation codes are embedded
within the audio track of the film. Therefore, after the caption
file 67 has been stored in the caption memory 65, the control
unit 81 controls the position of the switch 143 so that the
audio signal input into the input buffer 47 is passed to a data
extractor 145 which is arranged to extract the synchronisation data
that is embedded in the audio track. The extracted synchronisation
data is then passed to the timing controller 85 which controls the
timing at which the individual audio captions are played out by the
caption player 147 via the digital-to-analogue converter 149,
amplifier 151 and the headset 153.
[0046] As those skilled in the art will appreciate, various
techniques can be used to embed the synchronisation data within the
audio track. The applicant's earlier International applications WO
98/32248, WO 01/10065, PCT/GB01/05300 and PCT/GB01/05306 describe
techniques for embedding data within acoustic signals and
appropriate data extractors for subsequently extracting the
embedded data. The contents of these earlier International
applications are incorporated herein by reference.
[0047] In the above audio captioning embodiment, synchronisation
was achieved by embedding synchronisation codes within the audio
and detecting these in the mobile telephone. As those skilled in
the art will appreciate, a similar technique may be used in the
first embodiment. However, embedding audio codes within the
soundtrack of the film is not preferred, since it involves
modifying in some way the audio track of the film. Depending on the
data rates involved, this data may be audible to some viewers which
may detract from their enjoyment of the film. The first embodiment
is therefore preferred since it does not involve any modification
to the film or to the cinema infrastructure.
[0048] In embodiments where the synchronisation data is embedded
within the audio, the synchronisation codes used can either be the
same code repeated whenever synchronisation is required or it can
be a unique code at each synchronisation point. The advantage of
having a unique code at each synchronisation point is that a user
who enters the film late or who requires the captions only at
certain points (for example a user who only rarely requires the
caption) can start captioning at any point during the film.
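The advantage of unique codes can be made concrete with a trivial sketch: each recovered code directly identifies the current film position, whereas a single repeated code only signals that some synchronisation point has passed. The mapping below is illustrative only.

```python
def position_from_sync_code(code, code_times):
    """With a unique code at each synchronisation point, one recovered
    code immediately gives the current film time, so a latecomer can
    synchronise at any point. code_times maps each embedded code to its
    film time; returns None for an unknown code."""
    return code_times.get(code)
```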
[0049] In the embodiment described above with reference to FIGS. 4
and 5, the signature extraction operation was performed in the
remote web server rather than in the mobile telephone. As those
skilled in the art will appreciate, this modification can also be
made to the first embodiment described above, without the other
modifications described with reference to FIGS. 4 and 5.
[0050] In the first embodiment described above, during the tracking
mode of operation, the signature extractor only processed the audio
track during predetermined windows in the film. As those skilled in
the art will appreciate, this is not essential. The signature
extractor could operate continuously. However, such an embodiment
is not preferred since it increases the processing that the mobile
telephone has to perform which is likely to increase the power
consumption of the mobile telephone.
[0051] In the above embodiments, the mobile telephone or PDA
monitored the audio track of the film for synchronisation purposes.
As those skilled in the art will appreciate, the mobile telephone
or PDA device may be configured to monitor the video being
displayed on the film screen. However, this is currently not
preferred because it would require an image pickup device (such as
a camera) to be incorporated into the mobile telephone or PDA and
relatively sophisticated image processing hardware and software to
be able to detect the synchronisation points or codes in the video.
Further, it is not essential to detect synchronisation codes or
synchronisation points from the film itself. Another
electromagnetic or pressure wave signal may be transmitted in
synchronism with the film to provide the synchronisation points or
synchronisation codes. In this case, the user device would have to
include an appropriate electromagnetic or pressure wave receiver.
However, this embodiment is not preferred since it requires
modification to the existing cinema infrastructure and it requires
the generation of the separate synchronisation signal which is
itself synchronised to the film.
[0052] In the above embodiments, the captions and where appropriate
the synchronisation data, were downloaded to a user device from a
remote server. As those skilled in the art will appreciate, the use
of such a remote server is not essential. The caption data and the
synchronisation data may be pre-stored in memory cards or smart
cards and distributed or sold at the cinema. In this case, the user
device would preferably have an appropriate slot for receiving the
memory card or smart-card and an appropriate reader for accessing
the caption data and, if provided, the synchronisation data. The
manufacture of the cards would include the steps of providing the
memory card or smart-card and using an appropriate card writer to
write the captions and synchronisation data into the memory card or
into a memory on the smart-card. Alternatively still, the user may
already have a smart-card or memory card associated with their user
device which they simply insert into a kiosk at the cinema where
the captions and, if applicable, the synchronisation data are
written into a memory on the card.
[0053] As a further alternative, the captions and synchronisation
data may be transmitted to the user device from a transmitter
within the cinema. This transmission may be over an electromagnetic
or a pressure wave link.
[0054] In the first embodiment described above, the mobile
telephone had an acquisition mode and a subsequent tracking mode
for controlling the playout of the captions. In an alternative
embodiment, the acquisition mode may be dispensed with, provided
that the remote server can identify the current timing from the
signature received from the mobile telephone. This may be possible
in some instances. However, if the introduction of the film is
repetitive then it may not be possible for the web server to
provide an initial synchronisation.
[0055] In the first embodiment described above, the user devices
downloaded the captions and synchronisation data from a remote web
server via the internet. As those skilled in the art will
appreciate, it is not essential to download the files over the
internet. The files may be downloaded over any wide area or local
area network. The ability to download the caption files from a wide
area network is preferred since centralised databases of captions
may be provided for distribution over a wider geographic area.
[0056] In the first embodiment described above, the user downloaded
captions and synchronisation data from a remote web server.
Although not described, for security purposes, the caption file and
the synchronisation file are preferably encoded or encrypted in
some way to guard against fraudulent use of the captions.
Additionally, the caption system may be arranged so that it can
only operate in cinemas or at venues that are licensed under the
captioning system. In this case, an appropriate activation code may
be provided at the venue in order to "unlock" the captioning system
on the user device. This activation code may be provided in
human-readable form so that the user has to key the code into the user
device. Alternatively, the venue may be arranged to transmit the
code (possibly embedded in the film) to an appropriate receiver in
the user device. In either case, the captioning system software in
the user device would have an inhibitor that would inhibit the
outputting of the captions until it received the activation code.
Further, where encryption is used, the activation code may be used
as part of the key for decrypting the captions.
[0057] The above embodiments have described text captioning systems
and audio captioning systems for use in a cinema. As those skilled
in the art will appreciate, these captioning systems may be used
for providing captions for any radio, video or multi-media
presentation. They can also be used in the theatre or opera or
within the user's home.
[0058] Various captioning systems have been described above which
provide text or audio captions for an audio or a video
presentation. The captions may include extra commentary about the
audio or video presentation, such as director's comments,
explanation of complex plots, the names of actors in the film or
third party comments. The captions may also include adverts for
other products or presentations. In addition, the audio captioning
system may be used not only to provide audio descriptions of what
is happening in the film, but also to provide a translation of the
audio track for the film. In this way, each member of the audience
can listen to the film in their preferred language. The caption
system can also be used to provide karaoke captions for use with
standard audio tracks. In this case, the user would download the
lyrics and the synchronisation information which define the timing
at which the lyrics should be displayed and highlighted to the
user.
[0059] In addition to the above, the captioning system described
above may be provided to control the display of video captions. For
example, such video captions can be used to provide sign language
(either real images or computer generated images) for the audio in
the presentation being given.
[0060] In the above embodiments, the captions for the presentation
to be made were downloaded in advance for playout. In an
alternative embodiment, the captions may be downloaded from the
remote server by the user device when they are needed. For example,
the user device may download the next caption when it receives the
next synchronisation code for the next caption.
[0061] In the caption system described above, a user downloads or
receives the captions and the synchronisation information either
from a web server or locally at the venue at which the audio or
visual presentation is to be made. As those skilled in the art will
appreciate, for applications where the user has to pay to download
or playout the captions, a transaction system is preferably
provided to facilitate the collection of the monies due. In
embodiments where the captions are downloaded from a web server,
this transaction system preferably forms part of or is associated
with the web server providing the captions. In this case, the user
can provide electronic payment or payment through credit card or
the like at the time that they download the captions. This is
preferred, since it is easier to link the payment being made with
the captions and synchronisation information downloaded.
[0062] In the first embodiment described above, the ID for the film
was automatically determined from an audio signature transmitted
from the user's mobile telephone. Alternatively, instead of
transmitting the audio signature, the user can input the film ID
directly into the telephone for transmission to the remote server.
In this case, the correlation search of the signature database is
not essential.
[0063] In the first embodiment described above, the user device
processed the received audio to extract a signature characteristic
of the film that they are about to watch. The processing that is
preferred is the processing described in the Shazam Entertainment
Ltd patent mentioned above. However, as those skilled in the art
will appreciate, other types of encoding may be performed. The main
purpose of the signature extractor unit in the mobile telephone is
to compress the audio to generate data that is still representative
of the audio from which the remote server can identify the film
about to be watched. Various other compression schemes may be used.
For example, a GSM codec together with other audio compression
algorithms may be used.
[0064] In the above embodiments in which text captions are
provided, they were displayed to the user on a display of a
portable user device. Whilst this offers the simplest deployment of
the captioning system, other options are available. For example,
the user may be provided with an active or passive type
head-up-display through which the user can watch the film and on
which the captions are displayed (active) or are projected
(passive) to overlay onto the film being watched. This has the
advantage that the user does not have to watch two separate
displays. A passive type of head-up-display can be provided, for
example, by providing the user with a pair of glasses having a beam
splitter (e.g. a 45.degree. prism) on which the user can see the
cinema screen and the screen of their user device (e.g. phone or
PDA) sitting on their lap. Alternatively, instead of using a
head-up-display, a separate transparent screen may be erected in
front of the user's seat and onto which the captions are projected
by the user device or a seat-mounted projector.
[0065] In the first embodiment described above, the caption file
included a time ordered sequence of captions together with
associated formatting information and timing information. As those
skilled in the art will appreciate, it is not essential to arrange
the captions in such a time-sequential order. However, arranging
them in this way reduces the processing involved in identifying the
next caption to display. Further, it is not essential to have
formatting information in addition to the caption. The minimum
information required is the caption information. Further, it is not
essential that this be provided in a file as each of the individual
captions for the presentation may be downloaded separately.
However, the above described format for the caption file is
preferred since it is simple and can easily be created using, for
example, a spreadsheet. This simplicity also provides the potential
to create a variety of different caption content.
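The simple, spreadsheet-friendly caption file can be sketched as follows. The CSV column names (time, duration, text, format) are our assumptions; the embodiment only requires that each caption carry its timing, and preferably that the captions be time ordered.

```python
import csv
import io

def load_caption_file(csv_text):
    """Parse a caption file: a time-ordered sequence of captions, each
    with a start time, a duration (the delta-t of the first embodiment)
    and optional formatting information."""
    captions = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        captions.append({'time': float(row['time']),
                         'duration': float(row['duration']),
                         'text': row['text'],
                         'format': row.get('format', '')})
    # time ordering makes finding the next caption a simple forward scan
    assert all(a['time'] <= b['time'] for a, b in zip(captions, captions[1:]))
    return captions

def caption_at(captions, film_time):
    """Return the caption text on display at the given film time, if any."""
    for c in captions:
        if c['time'] <= film_time < c['time'] + c['duration']:
            return c['text']
    return None
```

A file of this kind could be exported directly from a spreadsheet, which is what makes authoring varied caption content straightforward.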
[0066] In embodiments where the user's mobile telephone is used to
provide the captioning, the captioning system can be made
interactive whereby the user can interact with the remote server,
for example interacting with adverts or questionnaires before the
film starts. This interaction can be implemented using, for
example, a web browser on the user device that receives URLs and
links to other information on websites.
[0067] In the first embodiment described above, text captions were
provided for the audio in the film to be watched. These captions
may include full captions, subtitles for the dialogue only or
subtitles at key parts of the plot. Similar variation may be
applied for audio captions.
* * * * *