U.S. patent application number 12/136733 was filed with the patent office on 2008-06-10 and published on 2009-12-10 for techniques for personalizing audio levels.
This patent application is currently assigned to SONY CORPORATION OF JAPAN. Invention is credited to Robert Hardacker, Steven Richman.
Application Number | 20090304205 12/136733 |
Family ID | 41400344 |
Filed Date | 2008-06-10 |
United States Patent
Application |
20090304205 |
Kind Code |
A1 |
Hardacker; Robert ; et
al. |
December 10, 2009 |
TECHNIQUES FOR PERSONALIZING AUDIO LEVELS
Abstract
Techniques for personalizing audio levels, in accordance with
embodiments of the present technology, provide different audio
volumes to different locations in a room allowing for two or more
users to enjoy the same audio content at different volumes.
Differential level and delay compensation filtering based on a
position of each of a plurality of speakers, the location of each
user and the preferred relative audio volume of each user are
utilized to produce different effective audio levels in localized
regions of a room.
Inventors: |
Hardacker; Robert;
(Escondido, CA) ; Richman; Steven; (San Diego,
CA) |
Correspondence
Address: |
SONY C/O MURABITO, HAO & BARNES LLP
TWO NORTH MARKET STREET, THIRD FLOOR
SAN JOSE
CA
95113
US
|
Assignee: |
SONY CORPORATION OF JAPAN
Tokyo
SONY ELECTRONICS, INC.
Park Ridge
NJ
|
Family ID: |
41400344 |
Appl. No.: |
12/136733 |
Filed: |
June 10, 2008 |
Current U.S.
Class: |
381/104 |
Current CPC
Class: |
H04R 2400/13 20130101;
H04S 7/303 20130101; H03G 3/301 20130101; H04S 7/302 20130101 |
Class at
Publication: |
381/104 |
International
Class: |
H03G 3/00 20060101
H03G003/00 |
Claims
1. A method of personalizing audio levels comprising outputting a
localized audio volume proximate each user based upon a location
and preferred relative audio volume of each user.
2. The method according to claim 1, wherein outputting a localized
audio volume proximate each user comprises psychoacoustic
modulating audio to produce an audio volume proximate each user
based upon the location and preferred relative audio volume of each
user.
3. The method according to claim 1, wherein outputting a localized
audio volume proximate each user comprises applying differential
level and delay compensation filtering to audio based on a position
of each of a plurality of speakers, the location of each user and
the preferred relative audio volume of each user.
4. The method according to claim 1, further comprising: receiving a
command to adjust the audio level; and adjusting the localized
audio volume proximate each user based on the location and
preferred relative audio volume of each user in response to the
command to adjust the audio level.
5. The method according to claim 4, wherein the command comprises
an audio command received from a user.
6. The method according to claim 1, further comprising: determining
the location of each user; and determining the relative audio
volume preference of each user.
7. The method according to claim 6, wherein determining the
location of each user comprises: receiving sound from each user;
determining time delay between the sound received from each user at
a plurality of microphones; and triangulating a position of each
user from the time delay between sound received from each user at
the plurality of microphones.
8. The method according to claim 6, wherein determining the
location of each user comprises: outputting sound from each of a
plurality of speakers; determining time delay between the sound
received from each speaker at a microphone proximate each user; and
triangulating a position of each user from the time delay between
sound received from each speaker at the microphone.
9. The method according to claim 6, wherein determining the
location of each user comprises: outputting a radio frequency
signal from a transmitter proximate each user; determining a time
delay between the radio frequency signal received at each of a
plurality of antennas; and triangulating a position of each user
from the time delay between the radio frequency signal received
from the transmitter at each antenna.
10. The method according to claim 6, wherein determining the
location of each user comprises: outputting an infrared signal from
a transmitter proximate each user; determining a time delay between
the infrared signal received at each of a plurality of receivers;
and triangulating a position of each user from the time delay
between the infrared signal received from the transmitter at each
receiver.
11. The method according to claim 6, wherein determining the
location of each user comprises: outputting a tone or sound in the
audible sound range or non-audible sound range from a transmitter
proximate each user; determining a time delay between the tone or
sound received at each of a plurality of receivers; and
triangulating a position of each user from the time delay between
the tone or sound received from the transmitter at each
receiver.
12. The method according to claim 6, wherein determining the
relative audio volume preference of each user comprises:
determining a preferred audio level of each user; and determining a
difference between the preferred audio level of each user.
13. The method according to claim 6, wherein determining the
relative audio volume preference for each user comprises:
determining a preferred audio level for each user for each of a
plurality of audio levels; and determining a difference between the
preferred audio level of each user for each of the plurality of
audio levels.
14. The method according to claim 6, further comprising storing the
location and relative audio volume preference of each user.
15. A method comprising: accessing mode information including a
location and a relative audio volume preference of each user; and
outputting a localized audio volume proximate each user based upon
the location and preferred relative audio volume of each user.
16. The method according to claim 15, wherein outputting a
localized audio volume proximate each user comprises psychoacoustic
modulating audio to produce an audio volume proximate each user
based upon the location and preferred relative audio volume of each
user.
17. The method according to claim 15, further comprising:
determining the location of each user; and determining the relative
audio volume preference of each user.
18. The method according to claim 15, further comprising: receiving
a command to adjust the audio level; and adjusting the localized
audio volume proximate each user based on the location and
preferred relative audio volume of each user in response to the
command to adjust the audio level.
19. The method according to claim 15, wherein the relative audio
volume preference of each user is fixed for each of a plurality of
audio levels.
20. The method according to claim 16, wherein the relative audio
volume preference of each user is specified by a response
curve.
21. A system for personalizing audio levels comprising: a plurality
of speakers; a source of audio; and a signal processor,
communicatively coupled between the source and the plurality of
speakers, for receiving the audio and a location and a relative
audio volume preference of each user and for causing the plurality
of speakers to output a localized audio volume proximate each user
based upon the location and preferred relative audio volume of each
user.
22. The system of claim 21, further comprising: a remote control
for adjusting an audio level of the source; and the signal
processor for adjusting the localized audio volume proximate each
user in response to the adjusted audio level of the source.
23. The system of claim 21, further comprising: a microphone for
receiving an audible input from a user; the signal processor
implementing voice recognition for converting the audible input to
a command to adjust an audio level of the source and for adjusting
the localized audio volume proximate each user in response to the
audible input from the user.
24. The system of claim 21, further comprising: an image sensor for
receiving a hand gesture from a user; the signal processor
implementing gesture recognition for converting the hand gesture of
the user to a command to adjust an audio level of the source and for
adjusting the localized audio volume proximate each user in
response to the hand gesture from the user.
25. The system of claim 21, further comprising: a microphone
proximate a user for receiving a sound from each of the plurality
of speakers; and the signal processor for determining the location
of each user from a time difference between receipt of the sound
from each of the plurality of speakers by the microphone.
26. The system of claim 21, further comprising a logic unit for
determining one or more volume levels preferred by a first user and
corresponding volume levels preferred by one or more additional
users and determining the relative audio volume preference of each
user from the difference between corresponding volume levels
preferred by the one or more additional users relative to the first
user.
27. A system comprising: a means for determining a location of each
of a plurality of users; a means for determining a relative audio
volume preference of each user; a means for storing the location
and relative audio volume preference of each user as one of a
plurality of modes; and a means for outputting the localized audio
volume proximate each user based upon a selected one of the
plurality of modes.
28. The system of claim 27, further comprising: a means for
receiving a command to adjust the audio level; and a means for
adjusting the localized audio volume in response to the command to
adjust the audio level.
29. The system of claim 27, wherein the relative audio volume
preference of each user is fixed for each of a plurality of audio
levels.
30. The system of claim 27, wherein the relative audio volume
preference of each user is specified by a response curve.
Description
BACKGROUND OF THE INVENTION
[0001] In the past, electronic audio and video systems included
radio, television and record players that output audio in a single
channel format. More recently, electronic audio and video systems
have expanded to include video games, CD/DVD players, streaming
audio (e.g., internet radio), MP3 devices, and the like. The audio
and video systems now typically output audio in multi-channel
formats such as stereo and surround sound. The use of multi-channel
format audio generally enhances the user's listening and viewing
experience by more closely replicating the original audio and/or
enhancing a visual perception. For example, multi-channel audio may
be used to output different instruments on different speakers to
give the listener the feeling of being in the middle of a band. In
another example, in a movie the audio track of a plane may be faded
from front to back to aid the perception of a plane flying out of
the screen and past the viewer. However, users typically perceive
the audio differently from one another.
SUMMARY OF THE INVENTION
[0002] Embodiments of the present technology are directed toward
techniques for personalizing audio levels. In one embodiment, a
method of personalizing audio levels includes determining the
location and relative audio volume preference of each user. A
localized audio volume may then be output proximate each user based
upon the location and preferred relative audio volume of each
user.
[0003] In another embodiment, a system for personalizing audio
levels includes a plurality of speakers, a source of audio and a
signal processor. The signal processor receives audio from the
source and causes the plurality of speakers to output a localized
audio volume proximate each user based upon a location and
preferred relative audio volume of each user.
BRIEF DESCRIPTION OF THE DRAWINGS
[0004] Embodiments of the present invention are illustrated by way
of example and not by way of limitation, in the figures of the
accompanying drawings and in which like reference numerals refer to
similar elements and in which:
[0005] FIG. 1 shows a flow diagram of a method of personalizing
audio levels, in accordance with one embodiment of the present
technology.
[0006] FIG. 2 shows a block diagram of an exemplary audio or
multimedia environment for providing personalized audio levels.
DETAILED DESCRIPTION OF THE INVENTION
[0007] Reference will now be made in detail to the embodiments of
the present technology, examples of which are illustrated in the
accompanying drawings. While the present technology will be
described in conjunction with these embodiments, it will be
understood that they are not intended to limit the invention to
these embodiments. On the contrary, the invention is intended to
cover alternatives, modifications and equivalents, which may be
included within the scope of the invention as defined by the
appended claims. Furthermore, in the following detailed description
of the present technology, numerous specific details are set forth
in order to provide a thorough understanding of the present
technology. However, it is understood that the present technology
may be practiced without these specific details. In other
instances, well-known methods, procedures, components, and circuits
have not been described in detail so as not to unnecessarily obscure
aspects of the present technology.
[0008] It is appreciated that each sound source, such as a
television, outputs sound within a range of continuous or discrete
volumes. In addition, the effective sound level perceived by a user
may differ from the output sound level of the sound source.
Accordingly, the term "audio level" is used herein to refer to the
output level of the sound source, the terms "audio volume" and
"listening volume" are used herein to refer to the volume perceived
by the user, and the term "audio level/volume" is used herein to
refer to the relationship between the "audio level" and the "audio
volume."
[0009] Two or more users of a multimedia system often disagree on
an appropriate listening volume. This results in a less than
satisfactory experience for one or more users. Accordingly, there
is a need for techniques for personalizing sound level/volume.
Embodiments of the present technology personalize the sound
level/volume by locating the users in a room, identifying the
relative volume preferences of the users, and processing and
delivering personalized audio levels to different users located at
known locations. Embodiments may also adjust the processing and
delivery of personalized audio levels in response to adjustments to
the generic volume. Embodiments may also save and recall the
location and relative volume preferences as one or more modes for
use during other sessions. FIG. 1 shows a method of personalizing
audio level/volume, in accordance with one embodiment of the
present technology. The method of personalizing audio level/volume
will be further described with reference to FIG. 2 which shows an
exemplary audio or multimedia environment. It is appreciated that
multimedia systems that output both audio and video are generally
more commonly in use today, and therefore the following detailed
description will refer to multimedia systems for convenience.
However, it is to be appreciated that the present technology as
described herein is equally applicable to systems that only output
audio.
[0010] The exemplary multimedia environment includes source 210 and
a plurality of speakers 212-222. The source may be a television,
cable tuner, satellite tuner, game console, CD/DVD player, VCR,
personal computer, and/or the like. In one implementation, the
audio source 210 may output two channels of audio for output on two
or more speakers. In another implementation, the audio source 210
may include four or more channels (e.g., 5.1 surround sound) for
output on four or more speakers. For instance, the speakers may
include a front left speaker 212, a right front speaker 214, a
center speaker 216, a left surround speaker 218, a right surround
speaker 220 and a subwoofer 222 for outputting a 5.1 surround sound
format of audio.
[0011] The method of personalizing audio level/volume begins with
determining the position of each of two or more users 224, 226, at
110. In one implementation, the multimedia system may include a
pair of microphones 228, 230 located at known positions. The
microphones 228, 230 receive sound from each user 224, 226. The
sound received from each user may be used to triangulate the
position of the corresponding user relative to the microphones 228,
230. For example, a microphone 228, 230 may be mounted on each side
of a television 210. During a training or setup mode, each user
224, 226 may be sequentially prompted to speak. The relative time
difference between the sound 232, 234 received at each microphone 228,
230 may be used to determine the location of the corresponding user
224 that has been prompted to speak. A signal processing system may
be used to determine the time delay between the sound from each
corresponding user received at each microphone. The time delay may
then be used along with the known position of each microphone 228,
230 to triangulate the position of the corresponding user 224.
Additional microphones placed in additional locations may be used
to determine the relative position in additional dimensions.
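As a minimal sketch of the time-delay step described above, the
following Python estimates the delay between the signals captured at
two microphones by brute-force cross-correlation and converts it to
a bearing. The function names, the 343 m/s speed of sound, and the
far-field (distant talker) approximation are assumptions made for
illustration and are not taken from the application. With only two
microphones the delay constrains the talker to a hyperbola, which is
approximated here as a single angle.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air near 20 C (assumed value)


def estimate_delay(sig_a, sig_b, sample_rate):
    """Delay in seconds of sig_b relative to sig_a.

    Uses a brute-force cross-correlation: the lag that maximizes
    sum(sig_a[i] * sig_b[i + lag]) is taken as the delay.
    """
    n = len(sig_a)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-(n - 1), n):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < len(sig_b):
                score += sig_a[i] * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate


def bearing_from_delay(delay_s, mic_spacing_m):
    """Far-field angle in radians off the broadside of the microphone pair.

    A positive delay means the sound reached microphone A first, so the
    angle points toward microphone A's side.
    """
    ratio = SPEED_OF_SOUND * delay_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # guard against measurement noise
    return math.asin(ratio)
```

A practical system would likely band-limit the signals and use an
FFT-based correlation; the nested loop is kept only for clarity.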
[0012] In another implementation, a remote controller 236 for the
system may include a short range radio frequency (RF) transmitter
and the television, set top box (STB) or the like may contain two
or more antennas. The antennas are located at known positions. When
each user 224, 226 possesses the remote controller 236, an RF
signal is emitted from the transmitter in the remote controller 236
and received by the antennas. The RF signal received at the
antennas is used by a signal processor to triangulate the position
of the corresponding users. In a similar implementation, the remote
controller 236 may include an infrared (IR) transmitter and two or
more IR receivers may be positioned at known locations in the
television, STB or the like.
[0013] In another implementation, a remote controller 236 for the
system may include a microphone. When each user 224, 226 possesses
the remote controller 236, sound (e.g., training tones) 238-246
emitted from speakers 212-220 at known fixed locations may be used
to triangulate the location of the remote controller 236, and
therefore the user 224 that possesses the remote controller 236,
relative to the speakers 212-220. The process is repeated for each
user 224, 226. A signal processing system may be used to determine
the time delay between the sound from each speaker 212-222 received
at the microphone in the remote control 236. The time delay may
then be used along with the known position of each speaker 212-222
to triangulate the position of the corresponding user 224, 226.
Typically, the relative position of the users can be determined
with sufficient accuracy by outputting sound sequentially on three
speakers in diverse fixed locations (e.g., center, left rear and
right rear speakers 216-220). In one implementation, the remote
controller 236 may include logic for determining the time delay.
Data indicating the time delay may then be sent back to a signal
processor in the television, cable tuner, satellite tuner, game
console, CD/DVD player, VCR, or personal computer which
triangulates the position of the users from the time delay and the
known locations of the speakers. In another implementation, a signal
processor in the remote controller 236 may determine the time delay
and triangulate the position of the users and return data to the
television, cable tuner, satellite tuner, game console, CD/DVD
player, VCR, or personal computer indicating the determined
position of the users. The data may be returned from the remote
controller across an RF link or IR link of the remote control.
Alternatively, the data may be returned via an NFC link, a USB
link, a memory stick physically carried back to the television,
cable tuner, satellite tuner, game console, CD/DVD player, VCR,
personal computer or the like, or by another similar communication
technique.
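The speaker-based triangulation described above may be illustrated
with the following Python sketch. It assumes, beyond what the
application states, that the emission time of each training tone is
known so that an absolute time of flight to the microphone is
available, that positions are two-dimensional, and that sound
travels at 343 m/s. The function name trilaterate is hypothetical.
The three circle equations are linearized by subtracting the first
from the other two, and the resulting 2x2 system is solved directly.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air near 20 C (assumed value)


def trilaterate(speakers, flight_times):
    """Locate a microphone from three speakers at known (x, y) positions.

    speakers: three (x, y) tuples in metres.
    flight_times: time of flight in seconds from each speaker.
    """
    (x1, y1), (x2, y2), (x3, y3) = speakers
    d1, d2, d3 = (SPEED_OF_SOUND * t for t in flight_times)

    # Subtracting circle 1 from circles 2 and 3 removes the squared
    # unknowns and leaves two linear equations in x and y.
    a11, a12 = 2 * (x1 - x2), 2 * (y1 - y2)
    a21, a22 = 2 * (x1 - x3), 2 * (y1 - y3)
    b1 = d2**2 - d1**2 - x2**2 - y2**2 + x1**2 + y1**2
    b2 = d3**2 - d1**2 - x3**2 - y3**2 + x1**2 + y1**2

    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("speakers are collinear; position is ambiguous")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - b1 * a21) / det
    return x, y
```

The collinearity check reflects why speakers in diverse locations,
such as the center and the two rear speakers, are preferable.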
[0014] In another implementation, a remote controller 236 for the
system may include a transmitter and emits a tone or sound during a
particular mode or during activation of one or more buttons on the
remote control. The tone or sound may be in the audible range
(e.g., 20 Hz-20 kHz), or may be outside the audible range (e.g.,
20-48 kHz) such as an ultrasonic tone or sound. The tone or sound
is received by a plurality of receivers (e.g., microphones) in
fixed locations that are contained in or coupled to the television,
set top box (STB) or the like. The sound or tone received from the
remote control when possessed by each user may be used by a signal
processor in the television, set top box, or the like, to
triangulate the position of the corresponding user relative to the
receivers.
[0015] In yet another implementation, a graphical user interface
(GUI) is displayed on the television 210. A user selects a relative
shape of the room, and a region corresponding to the location in
the room of each user 224, 226. The location may be selected from a
grid overlaid over the relative shape of the room displayed on the
GUI.
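As one hedged illustration of the grid selection described above,
the following Python converts a selected grid cell into room
coordinates. It assumes a rectangular room with known dimensions and
returns the center of the cell; the function name and parameters are
hypothetical and do not appear in the application.

```python
def grid_cell_to_position(row, col, rows, cols, room_width_m, room_depth_m):
    """Return the (x, y) center, in metres, of a selected grid cell.

    Assumes a rectangular room with the origin at one corner, columns
    running across the width and rows running along the depth.
    """
    if not (0 <= row < rows and 0 <= col < cols):
        raise ValueError("selected cell lies outside the grid")
    x = (col + 0.5) * room_width_m / cols
    y = (row + 0.5) * room_depth_m / rows
    return x, y
```

A finer grid gives a more precise location at the cost of a more
demanding selection step for the user.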
[0016] At 120, the relative audio volume preference for each user
224, 226 is determined. When determining the relative audio volume
preference for each user it may be best that the audio volume is
substantially constant across all the locations at which the users
may be located. A logic unit may determine one or more volume
levels preferred by a first user and corresponding volume levels
preferred by one or more additional users for each volume level
preference determined for the first user. The logic unit then
determines the relative audio volume preference from the difference
between each user's one or more preferred audio levels.
[0017] In one implementation, each user 224 is sequentially
prompted to adjust the audio level of the system to the audio
volume preferred by the user. The user 224 may select a single
preferred audio volume using the remote control 236 or using verbal
commands picked up by the microphones 228, 230. The difference in
the audio levels selected by each user 224, 226 is used to
determine the relative audio volume preference for each user. In
one implementation, the relative audio volume preference may be
fixed for all audio levels of an audio source. In such an
implementation, the audio volume difference between users is fixed
for all audio levels. In another implementation, a response curve
for the relative audio volume preference of each user may be
determined. In such an implementation, the users may be iteratively
queried to select a preferred effective audio volume for each audio
level. In addition, each user may specify a minimum and/or maximum
effective audio volume. The data concerning the preferred effective
audio volume for each audio level, minimum and/or maximum effective
audio volume is used to determine a response curve. The response
curve indicates the preferred relative audio volume of each
user.
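The fixed-difference case described above may be sketched in Python
as follows. The sketch assumes that each user has selected one
preferred audio level and that the first user serves as the
reference, as in the logic unit described earlier; the function name
and the user labels are hypothetical.

```python
def relative_preferences(preferred_levels):
    """Offset of each user's preferred level relative to the first user.

    preferred_levels: dict mapping a user label to the audio level that
    user selected during setup. Insertion order determines which user
    is treated as the reference.
    """
    users = list(preferred_levels)
    reference = preferred_levels[users[0]]
    return {user: preferred_levels[user] - reference for user in users}
```

For example, if the first user selects level 7 and the second user
selects level 5, the second user's relative preference is 2 levels
below the first user's at every audio level of the source.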
[0018] In one implementation, the location and relative audio
volume preference for each user 224, 226 is determined during a
training or setup mode. The position and relative audio volume
preference for each user may then be stored 132 as a set of mode
information, at 130. The location and relative audio volume
preference for each user determined during processes 110 and 120
may be stored 132 for use 134 in setting the localized volume and
adjusting the localized volume in response to commands to adjust
the sound level. In one implementation, the location and relative
sound volume preference may be determined and stored 132 for the
current session. In another implementation, the location and
relative sound volume preference may be stored 132 and recalled 134
for the current and subsequent session to reduce the setup time
during each session. For example, a currently determined location
and relative sound volume preferences may be stored 132 as one of a
plurality of modes. During other subsequent sessions, a given mode
may be selected 134 for use during the session. This can be
particularly useful as viewer habits often are characterized by a
given group of users located in the same spots in the room from one
session to another. For instance, when a husband and wife are
watching television, the husband may most times sit in a first
location and the wife may most times sit in a second location. When
a husband, wife and a child are watching television, the husband
may again sit in the first location, the child may most times sit
in the second location and the wife may most times sit in a third
location. Therefore, a first mode may include the location and
relative audio volume preferences for the husband and wife in the
first and second locations respectively. A second mode may include
the location and relative sound volume preferences for the husband,
child and wife in the first, second and third locations
respectively. Any number of additional modes may be created and
stored 132 for various combinations of users. Alternatively, the
location and relative audio volume preference for each user
determined at 110 and 120 may be used directly 136 to output
localized audio volume at 140.
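One possible way to hold the mode information described above is
sketched below in Python. The class names, field names, coordinate
units and in-memory storage are assumptions made for illustration;
the application does not specify a data layout or storage medium.

```python
from dataclasses import dataclass


@dataclass
class UserSetting:
    name: str               # label for the user (hypothetical)
    location: tuple         # (x, y) position in metres (assumed units)
    relative_volume: float  # offset relative to the reference user


class ModeStore:
    """Keeps named modes, each a list of per-user settings."""

    def __init__(self):
        self._modes = {}

    def store(self, mode_name, settings):
        """Save the current locations and preferences under a mode name."""
        self._modes[mode_name] = list(settings)

    def recall(self, mode_name):
        """Return the settings saved for a mode; raises KeyError if absent."""
        return self._modes[mode_name]


store = ModeStore()
store.store("couple", [
    UserSetting("husband", (1.0, 3.0), 0.0),
    UserSetting("wife", (3.0, 3.0), -2.0),
])
store.store("family", [
    UserSetting("husband", (1.0, 3.0), 0.0),
    UserSetting("child", (3.0, 3.0), -1.0),
    UserSetting("wife", (2.0, 4.0), -2.0),
])
```

Recalling a mode at the start of a session then replaces the
location and preference determination at 110 and 120.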
[0019] The audio is output based on the location and preferred
relative audio volume of each user, at 140. The volume of the audio
at the location corresponding to each user (e.g., localized audio
volume 248, 250) is output at a relative audio volume preferred by
the user. The audio source 210 includes a signal processing unit
that applies differential level and delay compensation filtering to
produce psychoacoustical perception of the audio by one or more
users 224, 226. Psychoacoustics are utilized to produce given audio
volumes in localized regions 248, 250 of a room by applying
differential level and delay compensation to the conventional audio
output. The differential level and delay compensation filtering is
based on the position of the speakers 212-222, the location of one
or more users 224, 226, and the relative audio volume preference of
the corresponding user. The signal processing unit may be
implemented by a microprocessor or a dedicated digital signal
processor (DSP). As a result, different relative audio levels for
two or more locations are produced. For example, a first location
248 may be +6 dB louder than a second location 250, regardless of
the volume level from the television. In other words, the audio
level in a localized region 248 around the first user 224 may be at
an effective level of 7 and the effective audio level in a
localized region 250 around the second user 226 may be at an
effective level of 5, when the audio level of the audio source is
set to 7.
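The application does not specify the design of the differential
level and delay compensation filter. As a rough illustration only of
how level and delay values can be derived from the speaker positions
and a listening location, the Python sketch below computes a
per-speaker time alignment and distance-based level trim for a
single location. It assumes free-field inverse-distance attenuation
and a 343 m/s speed of sound, and the function name is hypothetical.
It is a stand-in for one ingredient of such filtering and does not
by itself create two regions of different loudness.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air near 20 C (assumed value)


def compensation_for_listener(speakers, listener, offset_db=0.0):
    """Per-speaker (delay_s, gain_db) for one listening location.

    speakers: dict mapping a speaker label to its (x, y) position.
    listener: (x, y) position of the listening location.
    offset_db: extra level offset, e.g. a user's relative preference.

    Nearer speakers are delayed and attenuated so that every speaker's
    contribution arrives at the listener at the same time and at the
    same free-field level as the farthest speaker.
    """
    distances = {name: math.dist(pos, listener) for name, pos in speakers.items()}
    if min(distances.values()) <= 0.0:
        raise ValueError("listener coincides with a speaker position")
    farthest = max(distances.values())
    result = {}
    for name, d in distances.items():
        delay_s = (farthest - d) / SPEED_OF_SOUND
        gain_db = 20.0 * math.log10(d / farthest) + offset_db
        result[name] = (delay_s, gain_db)
    return result
```

Producing different volumes in two regions at once would require
solving for the speaker signals jointly across both locations, which
is beyond this sketch.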
[0020] Depending upon the response curves of the individual
listeners, the relative audio volume at the first and second
positions may also vary as the audio level of the audio source
increases or decreases. For example, the relative difference
between the first and second positions 248, 250 may be +6 dB when
the audio level output by the television is at level 4, and might
be +8 dB at an audio level of 7.
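A level-dependent relative difference of this kind may be sketched
in Python as a lookup over measured points. Linear interpolation
between the points and clamping outside the measured range are
assumptions made for illustration; the application does not state
how intermediate audio levels are handled. The function name is
hypothetical and a non-empty curve is assumed.

```python
def relative_difference_db(curve, level):
    """Relative difference in dB between two positions at a given audio level.

    curve: non-empty list of (audio_level, difference_db) points.
    level: audio level of the source.
    Values between points are linearly interpolated; values outside
    the measured range are clamped to the nearest end point.
    """
    points = sorted(curve)
    if level <= points[0][0]:
        return points[0][1]
    if level >= points[-1][0]:
        return points[-1][1]
    for (l0, d0), (l1, d1) in zip(points, points[1:]):
        if l0 <= level <= l1:
            return d0 + (d1 - d0) * (level - l0) / (l1 - l0)
```

With the two points from the example above, (4, 6.0) and (7, 8.0),
the function returns 6.0 at level 4 and 8.0 at level 7.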
[0021] At 150, a command to adjust the audio level is received.
Typically, a user 224 may use the remote control 236 to issue a
command to adjust the audio level up or down using one or more
appropriate buttons on the remote controller 236. The remote
controller 236 issues an appropriate command to the appropriate
device to adjust the audio level in response to activation of an
appropriate button by the user 224. In another implementation, a
microphone on the remote controller 236, television or the like and
a digital signal processor (DSP) implementing voice recognition may
be used to receive audible commands from the user to adjust the
levels. In yet another implementation, the user may input a command
to adjust the audio level using one or more hand gestures or any
other means for adjusting the audio level. At 160, the localized
audio volume proximate each user is adjusted based on the location
and preferred relative audio volume of each user in response to the
command to adjust the audio level. For example, one of the users
224 may use the remote controller 236 to adjust the audio level
from 7 to 9. The audio output is adjusted so that the audio volume
in the localized region 248 around the first user 224 is increased
to an effective level of 9 and the effective level in a localized
region 250 around the second user 226 is increased to an effective
level of 7. The process at 150 and 160 may be repeated to increase
or decrease the localized audio volumes in response to each
corresponding command.
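The adjustment at 160 for the fixed-difference case may be sketched
in Python as follows. The function name, the user labels and the
clamping to minimum and maximum levels are assumptions made for
illustration.

```python
def localized_levels(source_level, offsets, min_level=0, max_level=None):
    """Effective level for each user after a change to the source level.

    source_level: audio level of the source after the command.
    offsets: dict mapping a user label to that user's relative
        preference (0 for the reference user).
    min_level, max_level: optional bounds on the effective level.
    """
    levels = {}
    for user, offset in offsets.items():
        level = max(min_level, source_level + offset)
        if max_level is not None:
            level = min(max_level, level)
        levels[user] = level
    return levels
```

With offsets of 0 for the first user and -2 for the second user,
raising the source level from 7 to 9 moves the effective levels from
7 and 5 to 9 and 7, matching the example above.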
[0022] Embodiments of the present technology advantageously provide
different audio volumes to different locations in a room allowing
for two or more users to enjoy the same audio content at different
volumes. Psychoacoustics are utilized to produce different
effective audio levels in localized regions of a room based on the
location and relative sound level preferences of the current set of
users.
[0023] The foregoing descriptions of specific embodiments of the
present technology have been presented for purposes of illustration
and description. They are not intended to be exhaustive or to limit
the invention to the precise forms disclosed, and obviously many
modifications and variations are possible in light of the above
teaching. The embodiments were chosen and described in order to
best explain the principles of the present technology and its
practical application, to thereby enable others skilled in the art
to best utilize the present technology and various embodiments with
various modifications as are suited to the particular use
contemplated. It is intended that the scope of the invention be
defined by the Claims appended hereto and their equivalents.
* * * * *