U.S. patent application number 16/401981 was filed with the patent office on 2019-05-02 and published on 2020-03-05 as publication number 20200077219 for playback device calibration. The applicant listed for this patent is Sonos, Inc. Invention is credited to Timothy Sheen.

| Field | Value |
| --- | --- |
| Publication Number | 20200077219 |
| Application Number | 16/401981 |
| Family ID | 66541138 |
| Filed | 2019-05-02 |
| Published | 2020-03-05 |
![](/patent/app/20200077219/US20200077219A1-20200305-D00000.png)
![](/patent/app/20200077219/US20200077219A1-20200305-D00001.png)
![](/patent/app/20200077219/US20200077219A1-20200305-D00002.png)
![](/patent/app/20200077219/US20200077219A1-20200305-D00003.png)
![](/patent/app/20200077219/US20200077219A1-20200305-D00004.png)
![](/patent/app/20200077219/US20200077219A1-20200305-D00005.png)
![](/patent/app/20200077219/US20200077219A1-20200305-D00006.png)
![](/patent/app/20200077219/US20200077219A1-20200305-D00007.png)
![](/patent/app/20200077219/US20200077219A1-20200305-D00008.png)
![](/patent/app/20200077219/US20200077219A1-20200305-D00009.png)
United States Patent Application 20200077219
Kind Code: A1
Inventor: Sheen, Timothy
Publication Date: March 5, 2020
Playback Device Calibration
Abstract
Systems and methods for calibrating a playback device include
(i) outputting first audio content; (ii) capturing audio data
representing reflections of the first audio content within a room
in which the playback device is located; (iii) based on the
captured audio data, determining an acoustic response of the room;
(iv) connecting to a database comprising a plurality of sets of
stored audio calibration settings, each set associated with a
respective stored acoustic room response of a plurality of stored
acoustic room responses; (v) querying the database for a stored
acoustic room response that corresponds to the determined acoustic
response of the room in which the playback device is located; and
(vi) applying to the playback device a particular set of stored
audio calibration settings associated with the stored acoustic room
response that corresponds to the determined acoustic response of
the room in which the playback device is located.
Inventors: Sheen, Timothy (Brighton, MA)
Applicant: Sonos, Inc. (Santa Barbara, CA, US)
Family ID: 66541138
Appl. No.: 16/401981
Filed: May 2, 2019
Related U.S. Patent Documents

| Application Number | Filing Date | Patent Number |
| --- | --- | --- |
| 16115524 | Aug 28, 2018 | 10299061 |
| 16401981 | May 2, 2019 | |
Current U.S. Class: 1/1
Current CPC Class: G10K 11/17881 (20180101); G10K 11/17823 (20180101); G06F 3/165 (20130101); H04R 2227/007 (20130101); H04S 2400/13 (20130101); H04S 7/305 (20180101); H04S 7/301 (20130101)
International Class: H04S 7/00 (20060101); G06F 3/16 (20060101)
Claims
1. A playback device comprising: a microphone; a speaker; one or
more processors; and tangible, non-transitory, computer-readable
media storing instructions executable by the one or more processors
to cause the playback device to perform operations comprising:
outputting, via the speaker, first audio content; capturing, via
the microphone, audio data representing reflections of the first
audio content within a room in which the playback device is
located; based on the captured audio data, determining an acoustic
response of the room in which the playback device is located;
establishing a connection with a database comprising a plurality of
sets of stored audio calibration settings, each set associated with
a respective stored acoustic room response of a plurality of stored
acoustic room responses; wherein the plurality of stored acoustic
room responses are determined based on multiple media playback
systems each performing a respective acoustic room response
determination process comprising (i) outputting, via a respective
playback device within a respective room that is different from the
room in which the playback device is located, respective audio
content, (ii) while the respective playback device outputs the
respective audio content, capturing, via a microphone disposed in a
housing of the respective playback device, respective audio data
representing reflections of the respective audio content in the
respective room, and (iii) based on the respective audio data,
determining an acoustic response of the respective room; querying
the database for a stored acoustic room response that corresponds
to the determined acoustic response of the room in which the
playback device is located; responsive to the query, applying to
the playback device a particular set of stored audio calibration
settings associated with the stored acoustic room response that
corresponds to the determined acoustic response of the room in
which the playback device is located; and outputting, via the
speaker, second audio content using the particular set of audio
calibration settings associated with the stored acoustic room
response that corresponds to the determined acoustic response of
the room in which the playback device is located.
2. The playback device of claim 1, wherein each respective set of
stored audio calibration settings includes respective audio
calibration settings for offsetting one or more audio
characteristics of an associated respective stored acoustic room
response.
3. The playback device of claim 1, wherein querying the database
comprises: mapping the acoustic response of the room in which the
playback device is located to a particular stored acoustic room
response in the database that satisfies a threshold similarity to
the acoustic response of the room in which the playback device is
located.
4. The playback device of claim 1, wherein a self-response of the
playback device is pre-determined in an anechoic chamber, and
wherein determining the acoustic response of the room in which the
playback device is located comprises offsetting the self-response
of the playback device from the captured audio data representing
reflections of the first audio content.
5. The playback device of claim 1, wherein a self-response of the
microphone is pre-determined in an anechoic chamber, and wherein
determining the acoustic response of the room in which the playback
device is located comprises offsetting the self-response of the
microphone from the captured audio data representing reflections of
the first audio content.
6. The playback device of claim 1, wherein outputting, via the
speaker, the first audio content comprises gradually increasing a
volume level of the playback device while outputting the first
audio content, and wherein the operations further comprise: while
outputting the first audio content, measuring a signal-to-noise
ratio of the first audio content to environmental noise in the room
in which the playback device is located; and when the
signal-to-noise ratio exceeds a threshold value for calibration,
ceasing to increase the volume level of the playback device and
continuing to output the first audio content at the current volume
level.
7. The playback device of claim 1, wherein the first audio content
is different from the respective audio content.
8. Tangible, non-transitory, computer-readable media storing
instructions executable by one or more processors to cause a
playback device to perform operations comprising: outputting, via a
speaker of the playback device, first audio content; capturing, via
a microphone of the playback device, audio data representing
reflections of the first audio content within a room in which the
playback device is located; based on the captured audio data,
determining an acoustic response of the room in which the playback
device is located; establishing a connection with a database
comprising a plurality of sets of stored audio calibration
settings, each set associated with a respective stored acoustic
room response of a plurality of stored acoustic room responses;
wherein the plurality of stored acoustic room responses are
determined based on multiple media playback systems each performing
a respective acoustic room response determination process
comprising (i) outputting, via a respective playback device within
a respective room that is different from the room in which the
playback device is located, respective audio content, (ii) while
the respective playback device outputs the respective audio
content, capturing, via a microphone disposed in a housing of the
respective playback device, respective audio data representing
reflections of the respective audio content in the respective room,
and (iii) based on the respective audio data, determining an
acoustic response of the respective room; querying the database for
a stored acoustic room response that corresponds to the determined
acoustic response of the room in which the playback device is
located; responsive to the query, applying to the playback device a
particular set of stored audio calibration settings associated with
the stored acoustic room response that corresponds to the
determined acoustic response of the room in which the playback
device is located; and outputting, via the speaker of the playback
device, second audio content using the particular set of audio
calibration settings associated with the stored acoustic room
response that corresponds to the determined acoustic response of
the room in which the playback device is located.
9. The tangible, non-transitory, computer-readable media of claim
8, wherein each respective set of stored audio calibration settings
includes respective audio calibration settings for offsetting one
or more audio characteristics of an associated respective stored
acoustic room response.
10. The tangible, non-transitory, computer-readable media of claim
8, wherein querying the database comprises: mapping the acoustic
response of the room in which the playback device is located to a
particular stored acoustic room response in the database that
satisfies a threshold similarity to the acoustic response of the
room in which the playback device is located.
11. The tangible, non-transitory, computer-readable media of claim
8, wherein a self-response of the playback device is pre-determined
in an anechoic chamber, and wherein determining the acoustic
response of the room in which the playback device is located
comprises offsetting the self-response of the playback device from
the captured audio data representing reflections of the first audio
content.
12. The tangible, non-transitory, computer-readable media of claim
8, wherein a self-response of the microphone is pre-determined in
an anechoic chamber, and wherein determining the acoustic response
of the room in which the playback device is located comprises
offsetting the self-response of the microphone from the captured
audio data representing reflections of the first audio content.
13. The tangible, non-transitory, computer-readable media of claim
8, wherein outputting, via the speaker of the playback device, the
first audio content comprises gradually increasing a volume level
of the playback device while outputting the first audio content,
and wherein the operations further comprise: while outputting the
first audio content, measuring a signal-to-noise ratio of the first
audio content to environmental noise in the room in which the
playback device is located; and when the signal-to-noise ratio
exceeds a threshold value for calibration, ceasing to increase the
volume level of the playback device and continuing to output the
first audio content at the current volume level.
14. The tangible, non-transitory, computer-readable media of claim
8, wherein the first audio content is different from the respective
audio content.
15. A method comprising: outputting, via a speaker of a playback
device, first audio content; capturing, via a microphone of the
playback device, audio data representing reflections of the first
audio content within a room in which the playback device is
located; based on the captured audio data, determining an acoustic
response of the room in which the playback device is located;
establishing a connection with a database comprising a plurality of
sets of stored audio calibration settings, each set associated with
a respective stored acoustic room response of a plurality of stored
acoustic room responses; wherein the plurality of stored acoustic
room responses are determined based on multiple media playback
systems each performing a respective acoustic room response
determination process comprising (i) outputting, via a respective
playback device within a respective room that is different from the
room in which the playback device is located, respective audio
content, (ii) while the respective playback device outputs the
respective audio content, capturing, via a microphone disposed in a
housing of the respective playback device, respective audio data
representing reflections of the respective audio content in the
respective room, and (iii) based on the respective audio data,
determining an acoustic response of the respective room; querying
the database for a stored acoustic room response that corresponds
to the determined acoustic response of the room in which the
playback device is located; responsive to the query, applying to
the playback device a particular set of stored audio calibration
settings associated with the stored acoustic room response that
corresponds to the determined acoustic response of the room in
which the playback device is located; and outputting, via the
speaker of the playback device, second audio content using the
particular set of audio calibration settings associated with the
stored acoustic room response that corresponds to the determined
acoustic response of the room in which the playback device is
located.
16. The method of claim 15, wherein each respective set of stored
audio calibration settings includes respective audio calibration
settings for offsetting one or more audio characteristics of an
associated respective stored acoustic room response.
17. The method of claim 15, wherein querying the database
comprises: mapping the acoustic response of the room in which the
playback device is located to a particular stored acoustic room
response in the database that satisfies a threshold similarity to
the acoustic response of the room in which the playback device is
located.
18. The method of claim 15, wherein a self-response of the playback
device is pre-determined in an anechoic chamber, and wherein
determining the acoustic response of the room in which the playback
device is located comprises offsetting the self-response of the
playback device from the captured audio data representing
reflections of the first audio content.
19. The method of claim 15, wherein a self-response of the
microphone is pre-determined in an anechoic chamber, and wherein
determining the acoustic response of the room in which the playback
device is located comprises offsetting the self-response of the
microphone from the captured audio data representing reflections of
the first audio content.
20. The method of claim 15, wherein outputting, via the speaker of
the playback device, the first audio content comprises gradually
increasing a volume level of the playback device while outputting
the first audio content, and wherein the operations further
comprise: while outputting the first audio content, measuring a
signal-to-noise ratio of the first audio content to environmental
noise in the room in which the playback device is located; and when
the signal-to-noise ratio exceeds a threshold value for
calibration, ceasing to increase the volume level of the playback
device and continuing to output the first audio content at the
current volume level.
Description
CROSS REFERENCE TO RELATED APPLICATION
[0001] This application claims priority under 35 U.S.C. § 120 to, and is a continuation of, U.S. patent application Ser. No. 16/115,524, filed on Aug. 28, 2018, entitled "Playback Device Calibration," which is incorporated herein by reference in its entirety.
FIELD OF THE DISCLOSURE
[0002] The present disclosure is related to consumer goods and,
more particularly, to methods, systems, products, features,
services, and other elements directed to media playback or some
aspect thereof.
BACKGROUND
[0003] Options for accessing and listening to digital audio in an
out-loud setting were limited until 2002, when SONOS, Inc. began
development of a new type of playback system. Sonos then filed one
of its first patent applications in 2003, entitled "Method for
Synchronizing Audio Playback between Multiple Networked Devices,"
and began offering its first media playback systems for sale in
2005. The Sonos Wireless Home Sound System enables people to
experience music from many sources via one or more networked
playback devices. Through a software control application installed
on a controller (e.g., smartphone, tablet, computer, voice input
device), one can play what she wants in any room having a networked
playback device. Media content (e.g., songs, podcasts, video sound)
can be streamed to playback devices such that each room with a
playback device can play back corresponding different media
content. In addition, rooms can be grouped together for synchronous
playback of the same media content, and/or the same media content
can be heard in all rooms synchronously.
BRIEF DESCRIPTION OF THE DRAWINGS
[0004] Features, aspects, and advantages of the presently disclosed
technology may be better understood with regard to the following
description, appended claims, and accompanying drawings, as listed
below. A person skilled in the relevant art will understand that
the features shown in the drawings are for purposes of
illustrations, and variations, including different and/or
additional features and arrangements thereof, are possible.
[0005] FIG. 1A is a partial cutaway view of an environment having a
media playback system configured in accordance with aspects of the
disclosed technology.
[0006] FIG. 1B is a schematic diagram of the media playback system
of FIG. 1A and one or more networks.
[0007] FIG. 1C is a block diagram of a playback device.
[0008] FIG. 1D is a block diagram of a playback device.
[0009] FIG. 1E is a block diagram of a network microphone
device.
[0010] FIG. 1F is a block diagram of a network microphone
device.
[0011] FIG. 1G is a block diagram of a playback device.
[0012] FIG. 1H is a partially schematic diagram of a control
device.
[0013] FIG. 2A is a diagram of a playback environment within which
a playback device may be calibrated.
[0014] FIG. 2B is a block diagram of a database for storing room
response data and corresponding playback device calibration
settings.
[0015] FIG. 2C is a diagram of a playback environment within which
a playback device may be calibrated.
[0016] FIG. 3A is a flowchart of a method for populating a database
with room response data and corresponding playback device
calibration settings.
[0017] FIG. 3B is a flowchart of a method for calibrating a
playback device using a database populated with room response data
and corresponding playback device calibration settings.
[0018] The drawings are for the purpose of illustrating example
embodiments, but those of ordinary skill in the art will understand
that the technology disclosed herein is not limited to the
arrangements and/or instrumentality shown in the drawings.
DETAILED DESCRIPTION
I. Overview
[0019] Any environment has certain acoustic characteristics
("acoustics") that define how sound travels within that
environment. For instance, with a room, the size and shape of the
room, as well as objects inside that room, may define the acoustics
for that room. For example, angles of walls with respect to a
ceiling affect how sound reflects off the wall and the ceiling. As
another example, furniture positioning in the room affects how the
sound travels in the room. Various types of surfaces within the
room may also affect the acoustics of that room; hard surfaces in
the room may reflect sound, whereas soft surfaces may absorb sound.
Accordingly, calibrating a playback device within a room so that
the audio output by the playback device accounts for (e.g.,
offsets) the acoustics of that room may improve a listening
experience in the room.
[0020] U.S. Pat. No. 9,706,323 entitled, "Playback Device
Calibration," and U.S. Pat. No. 9,763,018 entitled, "Calibration of
Audio Playback Devices," which are hereby incorporated by reference
in their entirety, provide examples of calibrating playback devices
to account for the acoustics of a room.
[0021] An example calibration process for a media playback system
involves a playback device outputting audio content while in a
given environment (e.g., a room). The audio content may have
predefined spectral content, such as pink noise, a frequency sweep, or a
combination of content. Then, one or more microphone devices detect
the outputted audio content at one or more different spatial
positions in the room to facilitate determining an acoustic
response of the room (also referred to herein as a "room
response").
[0022] For example, a mobile device with a microphone, such as a
smartphone or tablet (referred to herein as a network device) may
be moved to the various locations in the room to detect the audio
content. These locations may correspond to those locations where
one or more listeners may experience audio playback during regular
use (i.e., listening) of the playback device. In this regard, the
calibration process involves a user physically moving the network
device to various locations in the room to detect the audio content
at one or more spatial positions in the room. Given that this
acoustic response involves moving the microphone to multiple
locations throughout the room, this acoustic response may also be
referred to as a "multi-location acoustic response."
[0023] Based on a multi-location acoustic response, the media
playback system may identify an audio processing algorithm. For
instance, a network device may identify an audio processing
algorithm, and transmit, to the playback device, data indicating the
identified audio processing algorithm. In some examples, the
network device identifies an audio processing algorithm that, when
applied to the playback device, results in audio content output by
the playback device having a target audio characteristic, such as a
target frequency response at one or more locations in the room. The
network device can identify the audio processing algorithm in
various ways. In one case, the network device determines the audio
processing algorithm based on the data indicating the detected
audio content. In another case, the network device sends, to a
computing device such as a server, data indicating the audio
content detected at the various locations in the room, and
receives, from the computing device, the audio processing algorithm
after the server (or another computing device connected to the
server) has determined the audio processing algorithm.
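One simple, purely illustrative form such an audio processing algorithm could take is a per-band equalizer whose gains offset the measured response toward the target frequency response; the band layout and clamping limits below are assumptions.

```python
# Illustrative sketch: per-band EQ gains that push a measured response
# toward a target curve, clamped to conservative boost/cut limits.
import numpy as np

def identify_eq(measured_db, target_db, max_boost_db=6.0, max_cut_db=-12.0):
    """Correction per band in dB: target minus measured, clamped."""
    correction = np.asarray(target_db) - np.asarray(measured_db)
    return np.clip(correction, max_cut_db, max_boost_db)

# Example: flat target against a response with a 4 dB low-frequency bump.
measured = np.array([4.0, 1.0, 0.0, -2.0])   # dB per band, low to high
print(identify_eq(measured, np.zeros(4)))    # -> [-4. -1.  0.  2.]
```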
[0024] In some circumstances, performing a calibration process such
as the one described above is not feasible or practical. For
example, a listener might not have access to a network device that
is capable of or configured for performing such a calibration
process. As another example, a listener may choose not to calibrate
the playback device because they find the process of moving the
microphone around the room inconvenient or otherwise
burdensome.
[0025] Disclosed herein are systems and methods to help address
these or other issues. In particular, a playback device in an
environment is configured to calibrate itself with respect to the
environment without using a network device to detect audio content
at various locations in the room. To do so, the playback device
leverages a database of calibration settings (e.g., audio
processing algorithms) that have been generated for other playback
devices using a calibration process, such as the process described
above. Given a sufficiently large database of calibration settings,
the database becomes statistically capable of providing a set of
calibration settings that are appropriate for calibrating the
playback device to account for the acoustic response of its
environment.
[0026] In practice, such a database is populated with calibration
settings by various playback devices performing a calibration
process similar to the process described above. Namely, the
database is populated by performing a calibration process for each
playback device of a number of playback devices that involves each
playback device outputting audio content in a room, moving a
network device to various locations in the room to determine a
multi-location acoustic response of the room, and determining the
calibration settings based on the room's multi-location acoustic
response. This process is repeated by a large number of users in a
larger number of different rooms, thereby providing a statistically
sufficient volume of different room responses and corresponding
calibration settings.
[0027] Further, a playback device may include its own microphone,
which the playback device uses to determine an acoustic response of
the room different from the multi-location acoustic response of the
room. While the playback device outputs audio content for
determining the multi-location acoustic response of the room as
described above, the playback device concurrently uses its own
microphone to detect reflections of the audio content within the
room and determines a different acoustic response of the room based
on the detected reflections (as compared with the acoustic response
determined based on the reflections detected by a network device).
This acoustic response determined by the playback device may be
referred to as a "localized acoustic response," as the acoustic
response is determined based on captured audio localized at the
playback device, rather than at multiple locations throughout the
room via the microphone of the network device. Data representing
the localized acoustic response and data representing the
calibration settings are then stored in the database and associated
with one another. As a result, the database is populated with a
number of records, each record corresponding to a respective
playback device, and each record including data representing the
respective playback device's localized acoustic response and the
respective playback device's calibration settings for the localized
acoustic response.
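A minimal sketch of this record-building step follows, assuming responses are compared as decibel magnitudes and that the device's anechoic self-response is subtracted out (in dB), consistent with the self-response offsetting recited in claims 4 and 5; the record layout and names are hypothetical.

```python
# Illustrative sketch: isolate the room's contribution by offsetting the
# device's pre-measured anechoic self-response, then store the pairing of
# localized response and calibration settings as one database record.
import numpy as np

def localized_room_response(captured_mag, self_response_mag, eps=1e-12):
    """Subtract (in dB) the playback device/microphone self-response,
    pre-determined in an anechoic chamber, from the captured response."""
    captured_db = 20 * np.log10(np.asarray(captured_mag) + eps)
    self_db = 20 * np.log10(np.asarray(self_response_mag) + eps)
    return captured_db - self_db

def store_record(db, response_db, calibration_settings):
    """Associate a localized room response with the calibration settings
    produced by the multi-location process for the same room."""
    db.append({"response": np.asarray(response_db),
               "settings": calibration_settings})
```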
[0028] Once the database is populated, media playback systems can
access the database to determine suitable calibration settings
without requiring the use of a network device to first determine a
multi-location acoustic response for a room in which the playback
device(s) of that system are located. For instance, the playback
device determines a localized acoustic response for the room by
outputting audio content in the room and using a microphone of the
playback device to detect reflections of the audio content within
the room. The playback device then queries the database to identify
a stored localized acoustic response that is substantially similar
to, or that is most similar to, the localized acoustic response
determined by the playback device. The playback device then applies
to itself the identified calibration settings that are associated
in the database with the identified localized acoustic
response.
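The query itself can be pictured as a nearest-neighbor search over the stored responses. In the sketch below, the RMS distance in dB and the similarity threshold are stand-in assumptions for whatever similarity measure the system actually applies.

```python
# Illustrative sketch of the database query: return the settings whose
# stored localized response is most similar to the measured one, or None
# if no stored response satisfies the (assumed) similarity threshold.
import numpy as np

def query_calibration(db, response_db, max_rms_distance_db=3.0):
    response_db = np.asarray(response_db)
    best, best_dist = None, float("inf")
    for record in db:
        dist = np.sqrt(np.mean((record["response"] - response_db) ** 2))
        if dist < best_dist:
            best, best_dist = record, dist
    if best is not None and best_dist <= max_rms_distance_db:
        return best["settings"]
    return None
```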
[0029] The above playback device calibration process may be
initiated at various times and/or in various ways. In some
examples, calibration of the playback device is initiated when the
playback device is being set up for the first time, when the
playback device plays music for the first time, or if the playback
device has been moved to a new location. For instance, if the
playback device is moved to a new location, calibration of the
playback device may be initiated based on a detection of the
movement or based on a user input indicating that the playback
device has moved to a new location. In some examples, calibration
of the playback device is initiated on demand via a controller
device. Further, in some examples, calibration of the playback
device is initiated periodically, or after a threshold amount of
time has elapsed after a previous calibration, in order to account
for changes to the environment of the playback device and/or
changes to the database of calibration settings.
[0030] Accordingly, in some implementations, for example, a
playback device outputs first audio content via one or more
speakers of the playback device, and the playback device captures
audio data representing reflections of the first audio content
within a room in which the playback device is located via one or
more microphones of the playback device. Based on the captured
audio data, the playback device determines an acoustic response of
the room in which the playback device is located. Further, the
playback device establishes a connection with a database populated
with a plurality of sets of stored audio calibration settings, each
set associated with a respective stored acoustic room response of a
plurality of stored acoustic room responses. The plurality of sets
of stored audio calibration settings are determined based on
multiple media playback systems each performing a respective audio
calibration process, which includes (i) outputting, via a
respective playback device within a respective room that is
different from the room in which the playback device is located,
respective audio content, (ii) while the respective playback device
outputs the respective audio content, capturing, via a microphone
of a respective network device in communication with the respective
playback device, first respective audio data representing
reflections of the respective audio content in the respective room
while the respective network device is moving from a first physical
location to a second physical location within the respective room,
and (iii) based on the first respective audio data, determining a
set of audio calibration settings for the respective playback
device. Additionally, the plurality of stored acoustic room
responses are determined based on the multiple media playback
systems each performing a respective acoustic room response
determination process, which includes (i) while the respective
playback device outputs the respective audio content, capturing,
via a microphone disposed in a housing of the respective playback
device, second respective audio data representing reflections of
the respective audio content in the respective room, and (ii) based
on the second respective audio data, determining an acoustic
response of the respective room. Once the playback device has
established a connection with the database, the playback device
queries the database for a stored acoustic room response that
corresponds to the determined acoustic response of the room in
which the playback device is located. Responsive to the query, the
playback device applies to itself a particular set of stored audio
calibration settings associated with the stored acoustic room
response that corresponds to the determined acoustic response of
the room in which the playback device is located. The playback
device then outputs, via one or more of its speakers, second audio
content using the particular set of audio calibration settings
associated with the stored acoustic room response that corresponds
to the determined acoustic response of the room in which the
playback device is located.
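Claims 6, 13, and 20 add one further detail to the output step: the playback device gradually raises its volume while measuring the signal-to-noise ratio against environmental noise, and holds the level once the ratio clears a calibration threshold. A hedged sketch, with hypothetical callbacks standing in for the device's volume control and SNR measurement:

```python
# Illustrative sketch of the claimed volume ramp: increase output level
# until the measured SNR is adequate, then hold that level. The threshold
# and step sizes are assumptions; measure_snr_db and set_volume are
# hypothetical device callbacks.
def ramp_until_snr(measure_snr_db, set_volume, snr_threshold_db=20.0,
                   start=0.1, step=0.05, max_volume=1.0):
    volume = start
    while volume <= max_volume:
        set_volume(volume)
        if measure_snr_db() >= snr_threshold_db:
            return volume          # cease increasing; keep outputting here
        volume += step
    return max_volume              # ceiling reached; calibrate at max level
```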
[0031] While some examples described herein may refer to functions
performed by given actors such as "users," "listeners," and/or
other entities, it should be understood that this is for purposes
of explanation only. The claims should not be interpreted to
require action by any such example actor unless explicitly required
by the language of the claims themselves.
[0032] In the Figures, identical reference numbers identify
generally similar, and/or identical, elements. To facilitate the
discussion of any particular element, the most significant digit or
digits of a reference number refer to the Figure in which that
element is first introduced. For example, element 110a is first
introduced and discussed with reference to FIG. 1A. Many of the
details, dimensions, angles and other features shown in the Figures
are merely illustrative of particular embodiments of the disclosed
technology. Accordingly, other embodiments can have other details,
dimensions, angles and features without departing from the spirit
or scope of the disclosure. In addition, those of ordinary skill in
the art will appreciate that further embodiments of the various
disclosed technologies can be practiced without several of the
details described below.
II. Suitable Operating Environment
[0033] FIG. 1A is a partial cutaway view of a media playback system
100 distributed in an environment 101 (e.g., a house). The media
playback system 100 comprises one or more playback devices 110
(identified individually as playback devices 110a-n), one or more
network microphone devices ("NMDs") 120 (identified individually as
NMDs 120a-c), and one or more control devices 130 (identified
individually as control devices 130a and 130b).
[0034] As used herein, the term "playback device" can generally
refer to a network device configured to receive, process, and
output data of a media playback system. For example, a playback
device can be a network device that receives and processes audio
content. In some embodiments, a playback device includes one or
more transducers or speakers powered by one or more amplifiers. In
other embodiments, however, a playback device includes one of (or
neither of) the speaker and the amplifier. For instance, a playback
device can comprise one or more amplifiers configured to drive one
or more speakers external to the playback device via a
corresponding wire or cable.
[0035] Moreover, as used herein, the term NMD (i.e., a "network
microphone device") can generally refer to a network device that is
configured for audio detection. In some embodiments, an NMD is a
stand-alone device configured primarily for audio detection. In
other embodiments, an NMD is incorporated into a playback device
(or vice versa).
[0036] The term "control device" can generally refer to a network
device configured to perform functions relevant to facilitating
user access, control, and/or configuration of the media playback
system 100.
[0037] Each of the playback devices 110 is configured to receive
audio signals or data from one or more media sources (e.g., one or
more remote servers, one or more local devices) and play back the
received audio signals or data as sound. The one or more NMDs 120
are configured to receive spoken word commands, and the one or more
control devices 130 are configured to receive user input. In
response to the received spoken word commands and/or user input,
the media playback system 100 can play back audio via one or more
of the playback devices 110. In certain embodiments, the playback
devices 110 are configured to commence playback of media content in
response to a trigger. For instance, one or more of the playback
devices 110 can be configured to play back a morning playlist upon
detection of an associated trigger condition (e.g., presence of a
user in a kitchen, detection of a coffee machine operation). In
some embodiments, for example, the media playback system 100 is
configured to play back audio from a first playback device (e.g.,
the playback device 110a) in synchrony with a second playback
device (e.g., the playback device 110b). Interactions between the
playback devices 110, NMDs 120, and/or control devices 130 of the
media playback system 100 configured in accordance with the various
embodiments of the disclosure are described in greater detail below
with respect to FIGS. 1B-1H.
[0038] In the illustrated embodiment of FIG. 1A, the environment
101 comprises a household having several rooms, spaces, and/or
playback zones, including (clockwise from upper left) a master
bathroom 101a, a master bedroom 101b, a second bedroom 101c, a
family room or den 101d, an office 101e, a living room 101f, a
dining room 101g, a kitchen 101h, and an outdoor patio 101i. While
certain embodiments and examples are described below in the context
of a home environment, the technologies described herein may be
implemented in other types of environments. In some embodiments,
for example, the media playback system 100 can be implemented in
one or more commercial settings (e.g., a restaurant, mall, airport,
hotel, a retail or other store), one or more vehicles (e.g., a
sports utility vehicle, bus, car, a ship, a boat, an airplane),
multiple environments (e.g., a combination of home and vehicle
environments), and/or another suitable environment where multi-zone
audio may be desirable.
[0039] The media playback system 100 can comprise one or more
playback zones, some of which may correspond to the rooms in the
environment 101. The media playback system 100 can be established
with one or more playback zones, after which additional zones may
be added, or removed to form, for example, the configuration shown
in FIG. 1A. Each zone may be given a name according to a different
room or space such as the office 101e, master bathroom 101a, master
bedroom 101b, the second bedroom 101c, kitchen 101h, dining room
101g, living room 101f, and/or the patio 101i. In some aspects, a
single playback zone may include multiple rooms or spaces. In
certain aspects, a single room or space may include multiple
playback zones.
[0040] In the illustrated embodiment of FIG. 1A, the master
bathroom 101a, the second bedroom 101c, the office 101e, the living
room 101f, the dining room 101g, the kitchen 101h, and the outdoor
patio 101i each include one playback device 110, and the master
bedroom 101b and the den 101d include a plurality of playback
devices 110. In the master bedroom 101b, the playback devices 110l
and 110m may be configured, for example, to play back audio content
in synchrony as individual ones of playback devices 110, as a
bonded playback zone, as a consolidated playback device, and/or any
combination thereof. Similarly, in the den 101d, the playback
devices 110h-j can be configured, for instance, to play back audio
content in synchrony as individual ones of playback devices 110, as
one or more bonded playback devices, and/or as one or more
consolidated playback devices. Additional details regarding bonded
and consolidated playback devices are described below with respect
to FIGS. 1B and 1E.
[0041] In some aspects, one or more of the playback zones in the
environment 101 may each be playing different audio content. For
instance, a user may be grilling on the patio 101i and listening to
hip hop music being played by the playback device 110c while
another user is preparing food in the kitchen 101h and listening to
classical music played by the playback device 110b. In another
example, a playback zone may play the same audio content in
synchrony with another playback zone. For instance, the user may be
in the office 101e listening to the playback device 110f playing
back the same hip hop music being played back by playback device
110c on the patio 101i. In some aspects, the playback devices 110c
and 110f play back the hip hop music in synchrony such that the
user perceives that the audio content is being played seamlessly
(or at least substantially seamlessly) while moving between
different playback zones. Additional details regarding audio
playback synchronization among playback devices and/or zones can be
found, for example, in U.S. Pat. No. 8,234,395 entitled, "System
and method for synchronizing operations among a plurality of
independently clocked digital data processing devices," which is
incorporated herein by reference in its entirety.
a. Suitable Media Playback System
[0042] FIG. 1B is a schematic diagram of the media playback system
100 and a cloud network 102. For ease of illustration, certain
devices of the media playback system 100 and the cloud network 102
are omitted from FIG. 1B. One or more communication links 103
(referred to hereinafter as "the links 103") communicatively couple
the media playback system 100 and the cloud network 102.
[0043] The links 103 can comprise, for example, one or more wired
networks, one or more wireless networks, one or more wide area
networks (WAN), one or more local area networks (LAN), one or more
personal area networks (PAN), one or more telecommunication
networks (e.g., one or more Global System for Mobiles (GSM)
networks, Code Division Multiple Access (CDMA) networks, Long-Term
Evolution (LTE) networks, 5G communication networks, and/or
other suitable data transmission protocol networks), etc. The cloud
network 102 is configured to deliver media content (e.g., audio
content, video content, photographs, social media content) to the
media playback system 100 in response to a request transmitted from
the media playback system 100 via the links 103. In some
embodiments, the cloud network 102 is further configured to receive
data (e.g., voice input data) from the media playback system 100 and
correspondingly transmit commands and/or media content to the media
playback system 100.
[0044] The cloud network 102 comprises computing devices 106
(identified separately as a first computing device 106a, a second
computing device 106b, and a third computing device 106c). The
computing devices 106 can comprise individual computers or servers,
such as, for example, a media streaming service server storing
audio and/or other media content, a voice service server, a social
media server, a media playback system control server, etc. In some
embodiments, one or more of the computing devices 106 comprise
modules of a single computer or server. In certain embodiments, one
or more of the computing devices 106 comprise one or more modules,
computers, and/or servers. Moreover, while the cloud network 102 is
described above in the context of a single cloud network, in some
embodiments the cloud network 102 comprises a plurality of cloud
networks comprising communicatively coupled computing devices.
Furthermore, while the cloud network 102 is shown in FIG. 1B as
having three of the computing devices 106, in some embodiments, the
cloud network 102 comprises fewer (or more) than three computing
devices 106.
[0045] The media playback system 100 is configured to receive media
content from the networks 102 via the links 103. The received media
content can comprise, for example, a Uniform Resource Identifier
(URI) and/or a Uniform Resource Locator (URL). For instance, in
some examples, the media playback system 100 can stream, download,
or otherwise obtain data from a URI or a URL corresponding to the
received media content. A network 104 communicatively couples the
links 103 and at least a portion of the devices (e.g., one or more
of the playback devices 110, NMDs 120, and/or control devices 130)
of the media playback system 100. The network 104 can include, for
example, a wireless network (e.g., a WiFi network, a Bluetooth network, a
Z-Wave network, a ZigBee network, and/or another suitable wireless
communication protocol network) and/or a wired network (e.g., a
network comprising Ethernet, Universal Serial Bus (USB), and/or
another suitable wired communication). As those of ordinary skill
in the art will appreciate, as used herein, "WiFi" can refer to
several different communication protocols including, for example,
Institute of Electrical and Electronics Engineers (IEEE) 802.11a,
802.11b, 802.11g, 802.11n, 802.11ac, 802.11ad, 802.11af,
802.11ah, 802.11ai, 802.11aj, 802.11aq, 802.11ax, 802.11ay, 802.15,
etc. transmitted at 2.4 Gigahertz (GHz), 5 GHz, and/or another
suitable frequency.
[0046] In some embodiments, the network 104 comprises a dedicated
communication network that the media playback system 100 uses to
transmit messages between individual devices and/or to transmit
media content to and from media content sources (e.g., one or more
of the computing devices 106). In certain embodiments, the network
104 is configured to be accessible only to devices in the media
playback system 100, thereby reducing interference and competition
with other household devices. In other embodiments, however, the
network 104 comprises an existing household communication network
(e.g., a household WiFi network). In some embodiments, the links
103 and the network 104 comprise one or more of the same networks.
In some aspects, for example, the links 103 and the network 104
comprise a telecommunication network (e.g., an LTE network, a 5G
network). Moreover, in some embodiments, the media playback system
100 is implemented without the network 104, and devices comprising
the media playback system 100 can communicate with each other, for
example, via one or more direct connections, PANs,
telecommunication networks, and/or other suitable communication
links.
[0047] In some embodiments, audio content sources may be regularly
added or removed from the media playback system 100. In some
embodiments, for example, the media playback system 100 performs an
indexing of media items when one or more media content sources are
updated, added to, and/or removed from the media playback system
100. The media playback system 100 can scan identifiable media
items in some or all folders and/or directories accessible to the
playback devices 110, and generate or update a media content
database comprising metadata (e.g., title, artist, album, track
length) and other associated information (e.g., URIs, URLs) for
each identifiable media item found. In some embodiments, for
example, the media content database is stored on one or more of the
playback devices 110, network microphone devices 120, and/or
control devices 130.
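As a rough sketch of that indexing pass, assuming a plain filesystem scan with tag reading stubbed out (the helper and its metadata fields are hypothetical):

```python
# Illustrative sketch: walk accessible directories and build a metadata
# table keyed by URI for each identifiable media item found.
import os

def index_media(roots, extensions=(".mp3", ".flac", ".m4a")):
    database = {}
    for root in roots:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                if name.lower().endswith(extensions):
                    uri = os.path.join(dirpath, name)
                    # A real indexer would read title/artist/album tags;
                    # placeholder metadata is used here.
                    database[uri] = {"title": os.path.splitext(name)[0]}
    return database
```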
[0048] In the illustrated embodiment of FIG. 1B, the playback
devices 110l and 110m comprise a group 107a. The playback devices
110l and 110m can be positioned in different rooms in a household
and be grouped together in the group 107a on a temporary or
permanent basis based on user input received at the control device
130a and/or another control device 130 in the media playback system
100. When arranged in the group 107a, the playback devices 110l and
110m can be configured to play back the same or similar audio
content in synchrony from one or more audio content sources. In
certain embodiments, for example, the group 107a comprises a bonded
zone in which the playback devices 110l and 110m comprise left
audio and right audio channels, respectively, of multi-channel
audio content, thereby producing or enhancing a stereo effect of
the audio content. In some embodiments, the group 107a includes
additional playback devices 110. In other embodiments, however, the
media playback system 100 omits the group 107a and/or other grouped
arrangements of the playback devices 110.
[0049] The media playback system 100 includes the NMDs 120a and
120d, each comprising one or more microphones configured to receive
voice utterances from a user. In the illustrated embodiment of FIG.
1B, the NMD 120a is a standalone device and the NMD 120d is
integrated into the playback device 110n. The NMD 120a, for
example, is configured to receive voice input 121 from a user 123.
In some embodiments, the NMD 120a transmits data associated with
the received voice input 121 to a voice assistant service (VAS)
configured to (i) process the received voice input data and (ii)
transmit a corresponding command to the media playback system 100.
In some aspects, for example, the computing device 106c comprises
one or more modules and/or servers of a VAS (e.g., a VAS operated
by one or more of SONOS®, AMAZON®, GOOGLE®, APPLE®,
MICROSOFT®). The computing device 106c can receive the voice
input data from the NMD 120a via the network 104 and the links 103.
In response to receiving the voice input data, the computing device
106c processes the voice input data (i.e., "Play Hey Jude by The
Beatles"), and determines that the processed voice input includes a
command to play a song (e.g., "Hey Jude"). The computing device
106c accordingly transmits commands to the media playback system
100 to play back "Hey Jude" by the Beatles from a suitable media
service (e.g., via one or more of the computing devices 106) on one
or more of the playback devices 110.
b. Suitable Playback Devices
[0050] FIG. 1C is a block diagram of the playback device 110a
comprising an input/output 111. The input/output 111 can include an
analog I/O 111a (e.g., one or more wires, cables, and/or other
suitable communication links configured to carry analog signals)
and/or a digital I/O 111b (e.g., one or more wires, cables, or
other suitable communication links configured to carry digital
signals). In some embodiments, the analog I/O 111a is an audio
line-in input connection comprising, for example, an auto-detecting
3.5 mm audio line-in connection. In some embodiments, the digital
I/O 111b comprises a Sony/Philips Digital Interface Format (S/PDIF)
communication interface and/or cable and/or a Toshiba Link
(TOSLINK) cable. In some embodiments, the digital I/O 111b
comprises a High-Definition Multimedia Interface (HDMI) interface
and/or cable. In some embodiments, the digital I/O 111b includes
one or more wireless communication links comprising, for example, a
radio frequency (RF), infrared, WiFi, Bluetooth, or another
suitable communication protocol. In certain embodiments, the analog
I/O 111a and the digital I/O 111b comprise interfaces (e.g., ports,
plugs, jacks) configured to receive connectors of cables
transmitting analog and digital signals, respectively, without
necessarily including cables.
[0051] The playback device 110a, for example, can receive media
content (e.g., audio content comprising music and/or other sounds)
from a local audio source 105 via the input/output 111 (e.g., a
cable, a wire, a PAN, a Bluetooth connection, an ad hoc wired or
wireless communication network, and/or another suitable
communication link). The local audio source 105 can comprise, for
example, a mobile device (e.g., a smartphone, a tablet, a laptop
computer) or another suitable audio component (e.g., a television,
a desktop computer, an amplifier, a phonograph, a Blu-ray player, a
memory storing digital media files). In some aspects, the local
audio source 105 includes local music libraries on a smartphone, a
computer, a networked-attached storage (NAS), and/or another
suitable device configured to store media files. In certain
embodiments, one or more of the playback devices 110, NMDs 120,
and/or control devices 130 comprise the local audio source 105. In
other embodiments, however, the media playback system omits the
local audio source 105 altogether. In some embodiments, the
playback device 110a does not include an input/output 111 and
receives all audio content via the network 104.
[0052] The playback device 110a further comprises electronics 112,
a user interface 113 (e.g., one or more buttons, knobs, dials,
touch-sensitive surfaces, displays, touchscreens), and one or more
transducers 114 (referred to hereinafter as "the transducers 114").
The electronics 112 is configured to receive audio from an audio
source (e.g., the local audio source 105) via the input/output 111,
and/or one or more of the computing devices 106a-c via the network 104
(FIG. 1B), amplify the received audio, and output the amplified
audio for playback via one or more of the transducers 114. In some
embodiments, the playback device 110a optionally includes one or
more microphones 115 (e.g., a single microphone, a plurality of
microphones, a microphone array) (hereinafter referred to as "the
microphones 115"). In certain embodiments, for example, the
playback device 110a having one or more of the optional microphones
115 can operate as an NMD configured to receive voice input from a
user and correspondingly perform one or more operations based on
the received voice input.
[0053] In the illustrated embodiment of FIG. 1C, the electronics
112 comprise one or more processors 112a (referred to hereinafter
as "the processors 112a"), memory 112b, software components 112c, a
network interface 112d, one or more audio processing components
112g (referred to hereinafter as "the audio components 112g"), one
or more audio amplifiers 112h (referred to hereinafter as "the
amplifiers 112h"), and power 112i (e.g., one or more power
supplies, power cables, power receptacles, batteries, induction
coils, Power-over Ethernet (POE) interfaces, and/or other suitable
sources of electric power). In some embodiments, the electronics
112 optionally include one or more other components 112j (e.g., one
or more sensors, video displays, touchscreens, battery charging
bases).
[0054] The processors 112a can comprise clock-driven computing
component(s) configured to process data, and the memory 112b can
comprise a computer-readable medium (e.g., a tangible,
non-transitory computer-readable medium, data storage loaded with
one or more of the software components 112c) configured to store
instructions for performing various operations and/or functions.
The processors 112a are configured to execute the instructions
stored on the memory 112b to perform one or more of the operations.
The operations can include, for example, causing the playback
device 110a to retrieve audio data from an audio source (e.g., one
or more of the computing devices 106a-c (FIG. 1B)), and/or another
one of the playback devices 110. In some embodiments, the
operations further include causing the playback device 110a to send
audio data to another one of the playback devices 110 and/or
another device (e.g., one of the NMDs 120). Certain embodiments
include operations causing the playback device 110a to pair with
another of the one or more playback devices 110 to enable a
multi-channel audio environment (e.g., a stereo pair, a bonded
zone).
[0055] The processors 112a can be further configured to perform
operations causing the playback device 110a to synchronize playback
of audio content with another of the one or more playback devices
110. As those of ordinary skill in the art will appreciate, during
synchronous playback of audio content on a plurality of playback
devices, a listener will preferably be unable to perceive
time-delay differences between playback of the audio content by the
playback device 110a and the other one or more other playback
devices 110. Additional details regarding audio playback
synchronization among playback devices can be found, for example,
in U.S. Pat. No. 8,234,395, which was incorporated by reference
above.
[0056] In some embodiments, the memory 112b is further configured
to store data associated with the playback device 110a, such as one
or more zones and/or zone groups of which the playback device 110a
is a member, audio sources accessible to the playback device 110a,
and/or a playback queue that the playback device 110a (and/or
another of the one or more playback devices) can be associated
with. The stored data can comprise one or more state variables that
are periodically updated and used to describe a state of the
playback device 110a. The memory 112b can also include data
associated with a state of one or more of the other devices (e.g.,
the playback devices 110, NMDs 120, control devices 130) of the
media playback system 100. In some aspects, for example, the state
data is shared during predetermined intervals of time (e.g., every
5 seconds, every 10 seconds, every 60 seconds) among at least a
portion of the devices of the media playback system 100, so that
one or more of the devices have the most recent data associated
with the media playback system 100.
[0057] The network interface 112d is configured to facilitate a
transmission of data between the playback device 110a and one or
more other devices on a data network such as, for example, the
links 103 and/or the network 104 (FIG. 1B). The network interface
112d is configured to transmit and receive data corresponding to
media content (e.g., audio content, video content, text,
photographs) and other signals (e.g., non-transitory signals)
comprising digital packet data including an Internet Protocol
(IP)-based source address and/or an IP-based destination address.
The network interface 112d can parse the digital packet data such
that the electronics 112 properly receives and processes the data
destined for the playback device 110a.
[0058] In the illustrated embodiment of FIG. 1C, the network
interface 112d comprises one or more wireless interfaces 112e
(referred to hereinafter as "the wireless interface 112e"). The
wireless interface 112e (e.g., a suitable interface comprising one
or more antennae) can be configured to wirelessly communicate with
one or more other devices (e.g., one or more of the other playback
devices 110, NMDs 120, and/or control devices 130) that are
communicatively coupled to the network 104 (FIG. 1B) in accordance
with a suitable wireless communication protocol (e.g., WiFi,
Bluetooth, LTE). In some embodiments, the network interface 112d
optionally includes a wired interface 112f (e.g., an interface or
receptacle configured to receive a network cable such as an
Ethernet, USB-A, USB-C, and/or Thunderbolt cable) configured to
communicate over a wired connection with other devices in
accordance with a suitable wired communication protocol. In certain
embodiments, the network interface 112d includes the wired
interface 112f and excludes the wireless interface 112e. In some
embodiments, the electronics 112 excludes the network interface
112d altogether and transmits and receives media content and/or
other data via another communication path (e.g., the input/output
111).
[0059] The audio processing components 112g are configured to process and/or
filter data comprising media content received by the electronics
112 (e.g., via the input/output 111 and/or the network interface
112d) to produce output audio signals. In some embodiments, the
audio processing components 112g comprise, for example, one or more
digital-to-analog converters (DACs), audio preprocessing components,
audio enhancement components, one or more digital signal processors
(DSPs),
and/or other suitable audio processing components, modules,
circuits, etc. In certain embodiments, one or more of the audio
processing components 112g can comprise one or more subcomponents
of the processors 112a. In some embodiments, the electronics 112
omits the audio processing components 112g. In some aspects, for
example, the processors 112a execute instructions stored on the
memory 112b to perform audio processing operations to produce the
output audio signals.
[0060] The amplifiers 112h are configured to receive and amplify
the audio output signals produced by the audio processing
components 112g and/or the processors 112a. The amplifiers 112h can
comprise electronic devices and/or components configured to amplify
audio signals to levels sufficient for driving one or more of the
transducers 114. In some embodiments, for example, the amplifiers
112h include one or more switching or class-D power amplifiers. In
other embodiments, however, the amplifiers include one or more
other types of power amplifiers (e.g., linear gain power
amplifiers, class-A amplifiers, class-B amplifiers, class-AB
amplifiers, class-C amplifiers, class-D amplifiers, class-E
amplifiers, class-F amplifiers, class-G and/or class-H amplifiers,
and/or another suitable type of power amplifier). In certain
embodiments, the amplifiers 112h comprise a suitable combination of
two or more of the foregoing types of power amplifiers. Moreover,
in some embodiments, individual ones of the amplifiers 112h
correspond to individual ones of the transducers 114. In other
embodiments, however, the electronics 112 includes a single one of
the amplifiers 112h configured to output amplified audio signals to
a plurality of the transducers 114. In some other embodiments, the
electronics 112 omits the amplifiers 112h.
[0061] The transducers 114 (e.g., one or more speakers and/or
speaker drivers) receive the amplified audio signals from the
amplifiers 112h and render or output the amplified audio signals as
sound (e.g., audible sound waves having a frequency between about
20 Hertz (Hz) and 20 kilohertz (kHz)). In some embodiments, the
transducers 114 can comprise a single transducer. In other
embodiments, however, the transducers 114 comprise a plurality of
audio transducers. In some embodiments, the transducers 114
comprise more than one type of transducer. For example, the
transducers 114 can include one or more low frequency transducers
(e.g., subwoofers, woofers), mid-range frequency transducers (e.g.,
mid-range transducers, mid-woofers), and one or more high frequency
transducers (e.g., one or more tweeters). As used herein, "low
frequency" can generally refer to audible frequencies below about
500 Hz, "mid-range frequency" can generally refer to audible
frequencies between about 500 Hz and about 2 kHz, and "high
frequency" can generally refer to audible frequencies above 2 kHz.
In certain embodiments, however, one or more of the transducers 114
comprise transducers that do not adhere to the foregoing frequency
ranges. For example, one of the transducers 114 may comprise a
mid-woofer transducer configured to output sound at frequencies
between about 200 Hz and about 5 kHz.
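By way of illustration only, the following minimal Python sketch classifies a nominal operating frequency into the approximate bands described above; the ~500 Hz and ~2 kHz boundaries are the illustrative values from this paragraph, and the function name is a hypothetical chosen for this sketch.

    def classify_band(frequency_hz):
        """Classify a frequency into the approximate bands described above;
        actual transducers need not adhere to these boundaries."""
        if frequency_hz < 500.0:
            return "low frequency"        # e.g., subwoofers, woofers
        if frequency_hz <= 2000.0:
            return "mid-range frequency"  # e.g., mid-range drivers, mid-woofers
        return "high frequency"           # e.g., tweeters

    print(classify_band(80.0))     # low frequency
    print(classify_band(1000.0))   # mid-range frequency
    print(classify_band(5000.0))   # high frequency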
[0062] By way of illustration, SONOS, Inc. presently offers (or has
offered) for sale certain playback devices including, for example,
a "SONOS ONE," "PLAY:1," "PLAY:3," "PLAY:5," "PLAYBAR," "PLAYBASE,"
"CONNECT:AMP," "CONNECT," and "SUB." Other suitable playback
devices may additionally or alternatively be used to implement the
playback devices of example embodiments disclosed herein.
Additionally, one of ordinary skill in the art will appreciate
that a playback device is not limited to the examples described
herein or to SONOS product offerings. In some embodiments, for
example, one or more playback devices 110 comprises wired or
wireless headphones (e.g., over-the-ear headphones, on-ear
headphones, in-ear earphones). In other embodiments, one or more of
the playback devices 110 comprise a docking station and/or an
interface configured to interact with a docking station for
personal mobile media playback devices. In certain embodiments, a
playback device may be integral to another device or component such
as a television, a lighting fixture, or some other device for
indoor or outdoor use. In some embodiments, a playback device omits
a user interface and/or one or more transducers. For example, FIG.
1D is a block diagram of a playback device 110p comprising the
input/output 111 and electronics 112 without the user interface 113
or transducers 114.
[0063] FIG. 1E is a block diagram of a bonded playback device 110q
comprising the playback device 110a (FIG. 1C) sonically bonded with
the playback device 110i (e.g., a subwoofer) (FIG. 1A). In the
illustrated embodiment, the playback devices 110a and 110i are
separate ones of the playback devices 110 housed in separate
enclosures. In some embodiments, however, the bonded playback
device 110q comprises a single enclosure housing both the playback
devices 110a and 110i. The bonded playback device 110q can be
configured to process and reproduce sound differently than an
unbonded playback device (e.g., the playback device 110a of FIG.
1C) and/or paired or bonded playback devices (e.g., the playback
devices 110l and 110m of FIG. 1B). In some embodiments, for
example, the playback device 110a is a full-range playback device
configured to render low frequency, mid-range frequency, and high
frequency audio content, and the playback device 110i is a
subwoofer configured to render low frequency audio content. In some
aspects, the playback device 110a, when bonded with the playback
device 110i, is configured to render only the mid-range and
high frequency components of a particular audio content, while the
playback device 110i renders the low frequency component of the
particular audio content. In some embodiments, the bonded playback
device 110q includes additional playback devices and/or another
bonded playback device.
c. Suitable Network Microphone Devices (NMDs)
[0064] FIG. 1F is a block diagram of the NMD 120a (FIGS. 1A and
1B). The NMD 120a includes one or more voice processing components
124 (hereinafter "the voice processing 124") and several components
described with respect to the playback device 110a (FIG. 1C)
including the processors 112a, the memory 112b, and the microphones
115. The NMD 120a optionally comprises other components also
included in the playback device 110a (FIG. 1C), such as the user
interface 113 and/or the transducers 114. In some embodiments, the
NMD 120a is configured as a media playback device (e.g., one or
more of the playback devices 110), and further includes, for
example, one or more of the audio processing components 112g (FIG.
1C), the amplifiers 112h, and/or other playback device components. In certain
embodiments, the NMD 120a comprises an Internet of Things (IoT)
device such as, for example, a thermostat, alarm panel, fire and/or
smoke detector, etc. In some embodiments, the NMD 120a comprises
the microphones 115, the voice processing 124, and only a portion
of the components of the electronics 112 described above with
respect to FIG. 1C. In some aspects, for example, the NMD 120a
includes the processors 112a and the memory 112b (FIG. 1C), while
omitting one or more other components of the electronics 112. In
some embodiments, the NMD 120a includes additional components
(e.g., one or more sensors, cameras, thermometers, barometers,
hygrometers).
[0065] In some embodiments, an NMD can be integrated into a
playback device. FIG. 1G is a block diagram of a playback device
110r comprising an NMD 120d. The playback device 110r can comprise
many or all of the components of the playback device 110a and
further include the microphones 115 and voice processing 124 (FIG.
1F). The playback device 110r optionally includes an integrated
control device 130c. The control device 130c can comprise, for
example, a user interface (e.g., the user interface 113 of FIG. 1B)
configured to receive user input (e.g., touch input, voice input)
without a separate control device. In other embodiments, however,
the playback device 110r receives commands from another control
device (e.g., the control device 130a of FIG. 1B).
[0066] Referring again to FIG. 1F, the microphones 115 are
configured to acquire, capture, and/or receive sound from an
environment (e.g., the environment 101 of FIG. 1A) and/or a room in
which the NMD 120a is positioned. The received sound can include,
for example, vocal utterances, audio played back by the NMD 120a
and/or another playback device, background voices, ambient sounds,
etc. The microphones 115 convert the received sound into electrical
signals to produce microphone data. The voice processing 124
receives and analyzes the microphone data to determine whether a
voice input is present in the microphone data. The voice input can
comprise, for example, an activation word followed by an utterance
including a user request. As those of ordinary skill in the art
will appreciate, an activation word is a word or other audio cue
that signifies a user voice input. For instance, in querying the
AMAZON® VAS, a user might speak the activation word "Alexa."
Other examples include "Ok, Google" for invoking the GOOGLE®
VAS and "Hey, Siri" for invoking the APPLE® VAS.
[0067] After detecting the activation word, voice processing 124
monitors the microphone data for an accompanying user request in
the voice input. The user request may include, for example, a
command to control a third-party device, such as a thermostat
(e.g., NEST® thermostat), an illumination device (e.g., a
PHILIPS HUE® lighting device), or a media playback device
(e.g., a Sonos® playback device). For example, a user might
speak the activation word "Alexa" followed by the utterance "set
the thermostat to 68 degrees" to set a temperature in a home (e.g.,
the environment 101 of FIG. 1A). The user might speak the same
activation word followed by the utterance "turn on the living room"
to turn on illumination devices in a living room area of the home.
The user may similarly speak an activation word followed by a
request to play a particular song, an album, or a playlist of music
on a playback device in the home.
d. Suitable Control Devices
[0068] FIG. 1H is a partially schematic diagram of the control
device 130a (FIGS. 1A and 1B). As used herein, the term "control
device" can be used interchangeably with "controller" or "control
system." Among other features, the control device 130a is
configured to receive user input related to the media playback
system 100 and, in response, cause one or more devices in the media
playback system 100 to perform an action(s) or operation(s)
corresponding to the user input. In the illustrated embodiment, the
control device 130a comprises a smartphone (e.g., an iPhone™, an
Android phone) on which media playback system controller
application software is installed. In some embodiments, the control
device 130a comprises, for example, a tablet (e.g., an iPad™), a
computer (e.g., a laptop computer, a desktop computer), and/or
another suitable device (e.g., a television, an automobile audio
head unit, an IoT device). In certain embodiments, the control
device 130a comprises a dedicated controller for the media playback
system 100. In other embodiments, as described above with respect
to FIG. 1G, the control device 130a is integrated into another
device in the media playback system 100 (e.g., one or more of the
playback devices 110, NMDs 120, and/or other suitable devices
configured to communicate over a network).
[0069] The control device 130a includes electronics 132, a user
interface 133, one or more speakers 134, and one or more
microphones 135. The electronics 132 comprise one or more
processors 132a (referred to hereinafter as "the processors 132a"),
a memory 132b, software components 132c, and a network interface
132d. The processors 132a can be configured to perform functions
relevant to facilitating user access, control, and configuration of
the media playback system 100. The memory 132b can comprise data
storage that can be loaded with one or more of the software
components executable by the processors 132a to perform those
functions. The software components 132c can comprise applications
and/or other executable software configured to facilitate control
of the media playback system 100. The memory 132b can be configured
to store, for example, the software components 132c, media playback
system controller application software, and/or other data
associated with the media playback system 100 and the user.
[0070] The network interface 132d is configured to facilitate
network communications between the control device 130a and one or
more other devices in the media playback system 100, and/or one or
more remote devices. In some embodiments, the network interface 132d
is configured to operate according to one or more suitable
communication industry standards (e.g., infrared, radio, wired
standards including IEEE 802.3, wireless standards including IEEE
802.11a, 802.11b, 802.11g, 802.11n, 802.11ac, 802.15, 4G, LTE). The
network interface 132d can be configured, for example, to transmit
data to and/or receive data from the playback devices 110, the NMDs
120, other ones of the control devices 130, one of the computing
devices 106 of FIG. 1B, devices comprising one or more other media
playback systems, etc. The transmitted and/or received data can
include, for example, playback device control commands, state
variables, playback zone and/or zone group configurations. For
instance, based on user input received at the user interface 133,
the network interface 132d can transmit a playback device control
command (e.g., volume control, audio playback control, audio
content selection) from the control device 130a to one or more of
the playback devices 110. The network interface 132d can also
transmit and/or receive configuration changes such as, for example,
adding/removing one or more playback devices 110 to/from a zone,
adding/removing one or more zones to/from a zone group, forming a
bonded or consolidated player, separating one or more playback
devices from a bonded or consolidated player, among others.
[0071] The user interface 133 is configured to receive user input
and can facilitate control of the media playback system 100. The
user interface 133 includes media content art 133a (e.g., album
art, lyrics, videos), a playback status indicator 133b (e.g., an
elapsed and/or remaining time indicator), media content information
region 133c, a playback control region 133d, and a zone indicator
133e. The media content information region 133c can include a
display of relevant information (e.g., title, artist, album, genre,
release year) about media content currently playing and/or media
content in a queue or playlist. The playback control region 133d
can include selectable (e.g., via touch input and/or via a cursor
or another suitable selector) icons to cause one or more playback
devices in a selected playback zone or zone group to perform
playback actions such as, for example, play or pause, fast forward,
rewind, skip to next, skip to previous, enter/exit shuffle mode,
enter/exit repeat mode, enter/exit cross fade mode, etc. The
playback control region 133d may also include selectable icons to
modify equalization settings, playback volume, and/or other
suitable playback actions. In the illustrated embodiment, the user
interface 133 comprises a display presented on a touch screen
interface of a smartphone (e.g., an iPhone™, an Android phone).
In some embodiments, however, user interfaces of varying formats,
styles, and interactive sequences may alternatively be implemented
on one or more network devices to provide comparable control access
to a media playback system.
[0072] The one or more speakers 134 (e.g., one or more transducers)
can be configured to output sound to the user of the control device
130a. In some embodiments, the one or more speakers comprise
individual transducers configured to correspondingly output low
frequencies, mid-range frequencies, and/or high frequencies. In
some aspects, for example, the control device 130a is configured as
a playback device (e.g., one of the playback devices 110).
Similarly, in some embodiments the control device 130a is
configured as an NMD (e.g., one of the NMDs 120), receiving voice
commands and other sounds via the one or more microphones 135.
[0073] The one or more microphones 135 can comprise, for example,
one or more condenser microphones, electret condenser microphones,
dynamic microphones, and/or other suitable types of microphones or
transducers. In some embodiments, two or more of the microphones
135 are arranged to capture location information of an audio source
(e.g., voice, audible sound) and/or configured to facilitate
filtering of background noise. Moreover, in certain embodiments,
the control device 130a is configured to operate as a playback device
and an NMD. In other embodiments, however, the control device 130a
omits the one or more speakers 134 and/or the one or more
microphones 135. For instance, the control device 130a may comprise
a device (e.g., a thermostat, an IoT device, a network device)
comprising a portion of the electronics 132 and the user interface
133 (e.g., a touch screen) without any speakers or microphones.
III. Example Systems and Methods for Calibrating a Playback
Device
[0074] As discussed above, in some examples, a playback device is
configured to calibrate itself to account for an acoustic response
of a room in which the playback device is located. The playback
device performs this self-calibration by leveraging a database that
is populated with calibration settings that were determined for a
number of other playback devices. In some embodiments, the
calibration settings stored in the database are determined based on
multi-location acoustic responses for the rooms of the other
playback devices.
[0075] FIG. 2A depicts an example environment for using a
multi-location acoustic response of a room to determine calibration
settings for a playback device. As shown in FIG. 2A, a playback
device 210a and a network device 230 are located in a room 201a.
The playback device 210a may be similar to any of the playback
devices 110 depicted in FIGS. 1A-1E and 1G, and the network device
230 may be similar to any of the NMDs 120 or controllers 130
depicted in FIGS. 1A-1B and 1F-1H. One or both of the playback
device 210a and the network device 230 are in communication, either
directly or indirectly, with a computing device 206. The computing
device 206 may be similar to any of the computing devices 106
depicted in FIG. 1B. For instance, the computing device 206 may be
a server located remotely from the room 201a and connected to the
playback device 210a and/or the network device 230 over a wired or
wireless communication network.
[0076] In practice, the playback device 210a outputs audio content
via one or more transducers (e.g., one or more speakers and/or
speaker drivers) of the playback device 210a. In one example, the
audio content is output using a test signal or measurement signal
representative of audio content that may be played by the playback
device 210a during regular use by a user. Accordingly, the audio
content may include content with frequencies substantially covering
a renderable frequency range of the playback device 210a or a
frequency range audible to a human. In one case, the audio content
is output using an audio signal designed specifically for use when
calibrating playback devices such as the playback device 210a being
calibrated in examples discussed herein. In another case, the audio
content is an audio track that is a favorite of a user of the
playback device 210a, or an audio track commonly played by the
playback device 210a. Other examples are also possible.
[0077] While the playback device 210a outputs the audio content,
the network device 230 moves to various locations within the room
201a. For instance, the network device 230 may move between a first
physical location and a second physical location within the room
201a. As shown in FIG. 2A, the first physical location may be the
point (a), and the second physical location may be the point (b).
While moving from the first physical location (a) to the second
physical location (b), the network device 230 may traverse
locations within the room 201a where one or more listeners may
experience audio playback during regular use of the playback device
210a. For instance, as shown, the room 201a includes a kitchen area
and a dining area, and a path 208 between the first physical
location (a) and the second physical location (b) covers locations
within the kitchen area and dining area where one or more listeners
may experience audio playback during regular use of the playback
device 210a.
[0078] In some examples, movement of the network device 230 between
the first physical location (a) and the second physical location
(b) may be performed by a user. In one case, a graphical display of
the network device 230 may provide an indication to move the
network device 230 within the room 201a. For instance, the
graphical display may display text, such as "While audio is
playing, please move the network device through locations within
the playback zone where you or others may enjoy music." Other
examples are also possible.
[0079] The network device 230 determines a multi-location acoustic
response of the room 201a. To facilitate this, while the network
device 230 is moving between physical locations within the room
201a, the network device 230 captures audio data representing
reflections of the audio content output by the playback device 210a
in the room 201a. For instance, the network device 230 may be a
mobile device with a built-in microphone (e.g., microphone(s) 115
of network microphone device 120a), and the network device 230 may
use the built-in microphone to capture the audio data representing
reflections of the audio content at multiple locations within the
room 201a.
[0080] The multi-location acoustic response is an acoustic response
of the room 201a based on the detected audio data representing
reflections of the audio content at multiple locations in the room
201a, such as at the first physical location (a) and the second
physical location (b). The multi-location acoustic response may be
represented as a spectral response, spatial response, or temporal
response, among others. The spectral response may be an indication
of how volume of audio sound captured by the microphone varies with
frequency within the room 201a. A power spectral density is an
example representation of the spectral response. The spatial
response may indicate how the volume of the audio sound captured by
the microphone varies with direction and/or spatial position in the
room 201a. The temporal response may be an indication of how audio
sound played by the playback device 210a, e.g., an impulse sound or
tone played by the playback device 210a, changes within the room
201a. The change may be characterized as a reverberation, delay,
decay, or phase change of the audio sound.
[0081] The responses may be represented in various forms. For
instance, the spatial response and temporal responses may be
represented as room averages. Additionally, or alternatively, the
multi-location acoustic response may be represented as a set of
impulse responses or bi-quad filter coefficients representative of
the acoustic response, among others. Values of the multi-location
acoustic response may be represented in vector or matrix form.
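As a non-limiting sketch of one such representation, the following Python fragment estimates a power spectral density (an example spectral response, per the discussion above) from captured audio samples and returns it as a vector. The use of scipy.signal.welch, the 48 kHz sample rate, and the white-noise stand-in for captured reflections are assumptions made for illustration.

    import numpy as np
    from scipy.signal import welch

    SAMPLE_RATE_HZ = 48_000  # assumed capture rate; actual devices may differ

    def spectral_response_vector(captured_audio):
        """Estimate a power spectral density, one example representation of
        the spectral response, and return it as a vector of values."""
        freqs, psd = welch(captured_audio, fs=SAMPLE_RATE_HZ, nperseg=4096)
        return freqs, psd  # psd is a vector indexed by frequency bin

    # White noise stands in for captured reflections in this sketch.
    rng = np.random.default_rng(0)
    freqs, response_vector = spectral_response_vector(
        rng.standard_normal(SAMPLE_RATE_HZ))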
[0082] Audio played by the playback device 210a is adjusted based
on the multi-location acoustic response of the room 201a so as to
offset or otherwise account for acoustics of the room 201a
indicated by the multi-location acoustic response. In particular,
the multi-location acoustic response is used to identify
calibration settings, which may include determining an audio
processing algorithm. U.S. Pat. No. 9,706,323, incorporated by
reference above, discloses various audio processing algorithms,
which are contemplated herein.
[0083] In some examples, determining the audio processing algorithm
involves determining an audio processing algorithm that, when
applied to the playback device 210a, causes audio content output by
the playback device 210a in the room 201a to have a target
frequency response. For instance, determining the audio processing
algorithm may involve determining frequency responses at the
multiple locations traversed by the network device while moving
within the room 201a and determining an audio processing algorithm
that adjusts the frequency responses at those locations to more
closely reflect target frequency responses. In one example, if one
or more of the determined frequency responses has a particular
audio frequency that is more attenuated than other frequencies,
then determining the audio processing algorithm may involve
determining an audio processing algorithm that increases
amplification at the particular audio frequency. Other examples are
possible as well.
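A minimal Python sketch of this idea follows: per-frequency gains that move a measured response toward a target response, so that an attenuated frequency receives a positive (boost) gain. The flat target, the dB arithmetic, and the boost cap are illustrative assumptions rather than a definitive implementation of the audio processing algorithms referenced above.

    import numpy as np

    def eq_gains_db(measured_db, target_db, max_boost_db=6.0):
        """Per-frequency gains (dB) that move a measured response toward a
        target; max_boost_db caps the correction, a safeguard assumed here."""
        gains = np.asarray(target_db) - np.asarray(measured_db)
        return np.clip(gains, -max_boost_db, max_boost_db)

    measured = np.array([-3.0, -9.0, -2.0, 0.0])  # dip at the second bin
    target = np.zeros_like(measured)              # flat target, for example
    print(eq_gains_db(measured, target))          # -> [3. 6. 2. 0.]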
[0084] In some examples, the audio processing algorithm takes the
form of a filter or equalization. The filter or equalization may be
applied by the playback device 210a (e.g., via audio processing
components 112g). Alternatively, the filter or equalization may be
applied by another playback device, the computing device 206,
and/or the network device 230, which then provides the processed
audio content to the playback device 210a for output. The filter or
equalization may be applied to audio content played by the playback
device 210a until such time that the filter or equalization is
changed or is no longer valid for the room 201a.
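As one hedged illustration of applying such a filter, the sketch below runs audio through a single bi-quad section, here a notch attenuating a hypothetical 120 Hz room mode; in practice the coefficients would come from the determined calibration settings rather than being designed on the fly, and the sample rate is assumed.

    import numpy as np
    from scipy.signal import iirnotch, tf2sos, sosfilt

    SAMPLE_RATE_HZ = 48_000  # assumed playback rate

    # One bi-quad section: a notch attenuating a hypothetical 120 Hz room
    # mode; real calibration settings would supply the coefficients.
    b, a = iirnotch(w0=120.0, Q=2.0, fs=SAMPLE_RATE_HZ)
    sos = tf2sos(b, a)  # second-order-section (bi-quad) form

    rng = np.random.default_rng(0)
    audio = rng.standard_normal(SAMPLE_RATE_HZ)  # stand-in audio content
    equalized = sosfilt(sos, audio)              # signal the device would output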
[0085] The audio processing algorithm may be stored in a database
of the computing device 206 or may be calculated dynamically. For
instance, in some examples, the network device 230 sends to the
computing device 206 the detected audio data representing
reflections of the audio content at multiple locations in the room
201a, and receives, from the computing device 206, the audio
processing algorithm after the computing device 206 has determined
the audio processing algorithm. In other examples, the network
device 230 determines the audio processing algorithm based on the
detected audio data representing reflections of the audio content
at multiple locations in the room 201a.
[0086] Further, while the network device 230 captures audio data at
multiple locations in the room 201a for determining the
multi-location acoustic response of the room 201a, the playback
device 210a concurrently captures audio data at a stationary
location for determining a localized acoustic response of the room
201a. To facilitate this, the playback device 210a may have one or
more microphones, which may be fixed in location. For example, the
one or more microphones may be co-located in or on the playback
device 210a (e.g., mounted in a housing of the playback device) or
be co-located in or on an NMD proximate to the playback device
210a. Additionally, the one or more microphones may be oriented in
one or more directions. The one or more microphones detect audio
data representing reflections of the audio content output by the
playback device 210a in the room 201a, and this detected audio data
is used to determine the localized acoustic response of the room
201a.
[0087] The localized acoustic response is an acoustic response of
the room 201a based on the detected audio data representing
reflections of the audio content at a stationary location in the
room. The stationary location may be at the one or more microphones
located on or proximate to the playback device 210a, but could also
be at the microphone of an NMD or a controller device proximate to
the playback device 210a.
[0088] The localized acoustic response may be represented as a
spectral response, spatial response, or temporal response, among
others. The spectral response may be an indication of how volume of
audio sound captured by the microphone varies with frequency within
the room 201a. A power spectral density is an example
representation of the spectral response. The spatial response may
indicate how the volume of the audio sound captured by the
microphone varies with direction and/or spatial position in the
room 201a. The temporal response may be an indication of how audio
sound played by the playback device 210a, e.g., an impulse sound or
tone played by the playback device 210a, changes within the room
201a. The change may be characterized as a reverberation, delay,
decay, or phase change of the audio sound. The spatial response and
temporal response may be represented as averages in some instances.
Additionally, or alternatively, the localized acoustic response may
be represented as a set of impulse responses or bi-quad filter
coefficients representative of the acoustic response, among others.
Values of the localized acoustic response may be represented in
vector or matrix form.
[0089] Similar to the multi-location acoustic response, the
localized acoustic response of the room 201a may be used to
determine a set of calibration settings for the playback device
210a. As such, calibration settings based on a multi-location
acoustic response are referred to herein as "multi-location
calibration settings," and calibration settings based on a
localized acoustic response are referred to herein as "localized
calibration settings." Further, like the multi-location calibration
settings, the localized calibration settings are configured to
offset or otherwise account for acoustic characteristics of the
room 201a. In some examples, the localized calibration settings,
when applied to the playback device 210a, cause audio content
output by the playback device 210a in the room 201a to have a
target frequency response. For instance, determining the localized
calibration settings may involve determining an audio processing
algorithm that adjusts a frequency response detected at or near the
playback device 210a to more closely reflect a target frequency
response. In one example, if the detected frequency response has a
particular audio frequency that is more attenuated than other
frequencies, then determining the localized calibration settings
may involve determining an audio processing algorithm that
increases amplification at the particular audio frequency. Other
examples are possible as well.
[0090] Like the multi-location calibration settings, the localized
calibration settings of the room 201a may be determined in various
ways. In one case, the playback device 210a determines the
localized acoustic response based on the detected audio data
representing audio reflections captured by the playback device 210a
within the room 201a, and then the playback device 210a determines
the localized calibration settings based on the localized acoustic
response of the room 201a. In another case, the playback device
210a sends the detected audio data to the network device 230, the
network device 230 determines the localized acoustic response based
on the detected audio data, and the network device 230 determines
the localized calibration settings based on the localized acoustic
response. In yet another case, the playback device 210a or the
network device 230 sends the detected audio data to the computing
device 206, and the computing device 206 (or another device
connected to the computing device 206) determines the localized
acoustic response and the localized calibration settings.
[0091] Once the multi-location calibration settings for the
playback device 210a and the localized acoustic response of the
room 201a are determined, this data is then provided to a computing
device, such as computing device 206, for storage in a database.
For instance, the network device 230 may send the determined
multi-location calibration settings to the computing device 206,
and the playback device 210a may send the localized acoustic
response of the room 201a to the computing device 206. In other
examples, the network device 230 or the playback device 210a sends
both the determined multi-location calibration settings and the
localized acoustic response of the room 201a to the computing
device 206. Other examples are possible as well.
[0092] FIG. 2B depicts an example database 250 for storing both the
determined multi-location calibration settings for the playback
device 210a and the localized acoustic response of the room 201a.
The database 250 may be stored on a computing device, such as
computing device 206, located remotely from the playback device
210a and/or from the network device 230, or the database 250 may be
stored on the playback device 210a and/or the network device 230.
The database 250 includes a number of records, and each record
includes data representing multi-location calibration settings
(identified as "settings 1" through "settings 5") for various
playback devices as well as room responses (identified as "response
1" through "response 5"), such as localized acoustic responses,
associated with the multi-location calibration settings. For the
purpose of illustration, the database 250 depicts only five records
(numbered 1-5), but in practice it should include many more than five
records to improve the accuracy of the calibration processes
described in further detail below.
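The records of a database such as the database 250 might be sketched as follows; the field names, the in-memory list, and the inclusion of a device-type field (discussed further below) are assumptions for illustration, not the actual schema.

    from dataclasses import dataclass

    @dataclass
    class CalibrationRecord:
        """One record of a database like database 250 (field names assumed)."""
        record_id: int
        device_type: str            # e.g., "PLAY:1"; optional, see below
        room_response: list         # localized acoustic response values
        calibration_settings: dict  # multi-location calibration settings

    database = [
        CalibrationRecord(1, "PLAY:1", [0.8, 0.5, 0.3],
                          {"eq_gains_db": [3.0, 1.0, 0.0]}),
        CalibrationRecord(2, "PLAYBAR", [0.2, 0.6, 0.9],
                          {"eq_gains_db": [0.0, -2.0, 1.5]}),
    ]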
[0093] When the computing device 206 receives data representing the
multi-location calibration settings for the playback device 210a
and data representing the localized acoustic response of the room
201a, the computing device 206 stores the received data in a record
of the database 250. As an example, the computing device 206 stores
the received data in record #1 of the database 250, such that
"response 1" includes data representing the localized acoustic
response of the room 201a, and "settings 1" includes data
representing the multi-location calibration settings for the
playback device 210a. In some cases, the database 250 also includes
data representing respective multi-location acoustic responses
associated with the localized acoustic responses and the
corresponding multi-location calibration settings. For instance, if
record #1 of database 250 corresponds to playback device 210a, then
"response 1" may include data representing both the localized
acoustic response of the room 201a and the multi-location acoustic
response of the room 201a.
[0094] As further shown, in some examples, the database 250
includes data identifying a type of a playback device associated
with each record. Playback device "type" refers to a model or
revision of a model, as well as different models that are designed
to produce similar audio output (e.g., playback devices with
similar components), among other examples. The type of the playback
device may be indicated when providing the calibration settings and
room response data to the database 250. As an example, in addition
to the network device 230 and/or the playback device 210a sending
data representing the multi-location calibration settings for the
playback device 210a and data representing the localized acoustic
response of the room 201a to the computing device 206, the network
device 230 and/or the playback device 210a also sends data
representing a type of the playback device 210a to the computing
device. Examples of playback device types offered by Sonos, Inc.
include, by way of illustration, various models of playback devices
such as a "SONOS ONE," "PLAY:1," "PLAY:3," "PLAY:5," "PLAYBAR,"
"PLAYBASE," "CONNECT:AMP," "CONNECT," and "SUB," among others.
[0095] In some examples, the data identifying the type of the
playback device additionally or alternatively includes data
identifying a configuration of the playback device. For instance,
as described above in connection with FIG. 1E, a playback device
may be a bonded or paired playback device configured to process and
reproduce sound differently than an unbonded or unpaired playback
device. Accordingly, in some examples, the data identifying the
type of the playback device 210a includes data identifying whether
the playback device 210a is in a bonded or paired
configuration.
[0096] By storing in the database 250 data identifying the type of
the playback device, the database 250 may be more quickly searched
by filtering data based on playback device type, as described in
further detail below. However, in some examples, the database 250
does not include data identifying the device type of the playback
device associated with each record.
[0097] Each record of the database 250 corresponds to a historical
playback device calibration process in which a particular playback
device was calibrated by determining calibration settings based on
a multi-location acoustic response, as described above in
connection with FIG. 2A. The calibration processes are "historical"
in the sense that they relate to multi-location calibration
settings and localized acoustic responses determined for rooms with
various types of acoustic characteristics previously determined and
stored in the database 250. As additional iterations of the
calibration process are performed, the resulting multi-location
calibration settings and localized acoustic responses may be added
to the database 250.
[0098] Other playback devices may leverage the historical
multi-location calibration settings and localized acoustic
responses stored in the database 250 in order to self-calibrate to
account for the acoustic responses of the rooms in which they are
located. In one example, a playback device determines a localized
acoustic response of a room in which the device is located, and the
playback device queries the database 250 to identify a record
having a stored acoustic response that is similar to the determined
acoustic response. The playback device then applies to itself the
multi-location calibration settings stored in the database 250 that
are associated with the identified record.
[0099] Efficacy of the applied calibration settings is influenced
by a degree of similarity between the identified stored acoustic
response in the database 250 and the determined acoustic response
for the playback device being calibrated. In particular, if the
acoustic responses are substantially similar or identical, then the
applied calibration settings are more likely to accurately offset
or otherwise account for an acoustic response of the room in which
the playback device being calibrated is located (e.g., by achieving
or approaching a target frequency response in the room, as
described above). On the other hand, if the acoustic responses are
relatively dissimilar, then the applied calibration settings are
less likely to accurately account for an acoustic response of the
room in which the playback device being calibrated is located.
Accordingly, populating the database 250 with records corresponding
to a sufficiently large number of historical calibration processes
may be desirable so as to increase the likelihood of the database
250 including acoustic response data similar to an acoustic
response of the room of the playback device presently being
calibrated.
[0100] FIG. 2C depicts an example environment in which a playback
device 210b leverages the database 250 to perform a
self-calibration process without determining a multi-location
acoustic response of its room 201b.
[0101] In one example, the self-calibration of the playback device
210b may be initiated when the playback device 210b is being set up
for the first time in the room 201b, when the playback device 210b
first outputs music or some other audio content, or if the playback
device 210b has been moved to a new location. For instance, if the
playback device 210b is moved to a new location, calibration of the
playback device 210b may be initiated based on a detection of the
movement (e.g., via a global positioning system (GPS), one or more
accelerometers, or wireless signal strength variations), or based
on a user input indicating that the playback device 210b has moved
to a new location (e.g., a change in playback zone name associated
with the playback device 210b).
[0102] In another example, calibration of the playback device 210b
may be initiated via a controller device, such as the controller
device 130a depicted in FIG. 1H. For instance, a user may access a
controller interface for the playback device 210b to initiate
calibration of the playback device 210b. In one case, the user may
access the controller interface, and select the playback device
210b (or a group of playback devices that includes the playback
device 210b) for calibration. In some cases, a calibration
interface may be provided as part of a playback device controller
interface to allow a user to initiate playback device calibration.
Other examples are also possible.
[0103] Further, in some examples, calibration of the playback
device 210b is initiated periodically, or after a threshold amount
of time has elapsed after a previous calibration, in order to
account for changes to the environment of the playback device 210b.
For instance, a user may change a layout of the room 201b (e.g., by
adding, removing, or rearranging furniture), thereby altering the
acoustic response of the room 201b. As a result, any calibration
settings applied to the playback device 210b before the room 201b
is altered may have a reduced efficacy of accounting for, or
offsetting, the altered acoustic response of the room 201b.
Initiating calibration of the playback device 210b periodically, or
after a threshold amount of time has elapsed after a previous
calibration, can help address this issue by updating the
calibration settings at a later time (i.e., after the room 201b is
altered) so that the calibration settings applied to the playback
device 210b are based on the altered acoustic response of the room
201b.
[0104] Additionally, because calibration of the playback device
210b involves accessing and retrieving calibration settings from
the database 250, as described in further detail below, initiating
calibration of the playback device 210b periodically, or after a
threshold amount of time has elapsed after a previous calibration,
may further improve a listening experience in the room 201b by
accounting for changes to the database 250. For instance, as users
continue to calibrate various playback devices in various rooms,
the database 250 continues to be updated with additional acoustic
room responses and corresponding calibration settings. As such, a
newly added acoustic response (i.e., an acoustic response that is
added to the database 250 after the playback device 210b has
already been calibrated) may more closely resemble the acoustic
response of the room 201b. Thus, by initiating calibration of the
playback device 210b periodically, or after a threshold amount of
time has elapsed after a previous calibration, the calibration
settings corresponding to the newly added acoustic response may be
applied to the playback device 210b. Accordingly, in some examples,
the playback device 210b determines that at least a threshold
amount of time has elapsed after the playback device 210b has been
calibrated, and, responsive to making such a determination, the
playback device 210b initiates a calibration process, such as the
calibration processes described below.
[0105] When performing the calibration process, the playback device
210b outputs audio content and determines a localized acoustic
response of its room 201b similarly to how playback device 210a
determined a localized acoustic response of room 201a. For
instance, the playback device 210b outputs audio content, which may
include music or one or more predefined tones, captures audio data
representing reflections of the audio content within the room 201b,
and determines the localized acoustic response based on the
captured audio data.
[0106] Causing the playback device 210b to output spectrally rich
audio content during the calibration process may yield a more
accurate localized acoustic response of the room 201b. Thus, in
examples where the audio content includes predefined tones, the
playback device 210b may output predefined tones over a range of
frequencies for determining the localized acoustic response of the
room 201b. And in examples where the audio content includes music,
such as music played during normal use of the playback device 210b,
the playback device 210b may determine the localized acoustic
response based on audio data that is captured over an extended
period of time. For instance, as the playback device 210b outputs
music, the playback device 210b may continue to capture audio data
representing reflections of the output music within the room 201b
until a threshold amount of data at a threshold amount of
frequencies is captured. Depending on the spectral content of the
output music, the playback device 210b may capture the reflected
audio data over the course of multiple songs, for instance, in
order for the playback device 210b to have captured the threshold
amount of data at the threshold amount of frequencies. In this
manner, the playback device 210b gradually learns the localized
acoustic response of the room 201b. Once a threshold confidence in
the localized acoustic response of the room 201b is met, the
playback device 210b uses the localized
acoustic response of the room 201b to determine calibration
settings for the playback device 210b, as described in further
detail below.
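One way to model this gradual learning is a per-band coverage counter, sketched below under several assumptions: the band edges, the per-chunk energy threshold, and the chunk-count threshold are all hypothetical, and white noise stands in for the captured reflections accumulated across songs.

    import numpy as np
    from scipy.signal import welch

    SAMPLE_RATE_HZ = 48_000                    # assumed
    BAND_EDGES_HZ = [20, 200, 2_000, 20_000]   # assumed bands of interest
    MIN_CHUNKS_PER_BAND = 10                   # assumed confidence threshold

    coverage = np.zeros(len(BAND_EDGES_HZ) - 1, dtype=int)

    def accumulate(chunk):
        """Count the bands in which this captured chunk carries energy."""
        freqs, psd = welch(chunk, fs=SAMPLE_RATE_HZ, nperseg=2048)
        for i, (lo, hi) in enumerate(zip(BAND_EDGES_HZ, BAND_EDGES_HZ[1:])):
            band = psd[(freqs >= lo) & (freqs < hi)]
            if band.size and band.mean() > 1e-6:  # assumed energy threshold
                coverage[i] += 1

    def response_learned():
        return bool(np.all(coverage >= MIN_CHUNKS_PER_BAND))

    rng = np.random.default_rng(0)
    while not response_learned():  # e.g., over the course of multiple songs
        accumulate(rng.standard_normal(SAMPLE_RATE_HZ))  # stand-in capture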
[0107] The playback device 210b may output the audio content at
various volume levels. For instance, if audio characteristics such
as acceptable volume ranges of the playback device 210b are known,
then the playback device 210b or a controller device, such as the
controller 130a depicted in FIG. 1H, in communication with the
playback device 210b may cause the playback device 210b to output
the audio content at a volume that falls within the acceptable
volume range of the playback device 210b. However, there may be
circumstances in which the acceptable volume range of the playback
device 210b is not known. For instance, the playback device 210b
may include an amplifier, such as the "CONNECT:AMP," offered by
Sonos, Inc., configured to output audio via connection to external
speakers with unknown audio characteristics. Without knowing the
audio characteristics of the speakers, the playback device 210b
could damage the speakers by attempting to drive them with an
excessively high electrical current.
[0108] The above issue may be addressed in various ways. For
instance, in some examples, the playback device 210b is configured
to apply a limit to the output volume or to the driver current. The
limit may be set to a conservative value that is safe for most or
virtually all speakers. In some embodiments, a user inputs into a
controller device, for instance, information identifying or
characterizing the speakers of the playback device 210b. The
information may include a manufacturer and/or model number of the
speakers, a size of the speakers, a maximum rated current or
wattage of the speakers, or any other information that could be
used to characterize the audio capabilities of the speakers. The
controller then uses the input information to set an appropriate
output volume of the playback device 210b. In some embodiments, the
playback device 210b is configured to measure an impedance curve of
the speakers, and the playback device 210b or the controller device
sets the output volume of the playback device 210b based on the
measured impedance curve.
[0109] In some embodiments, the playback device 210b varies the
volume of the audio content while the playback device 210b outputs
the audio content. In one example, the playback device 210b outputs
the audio content at a first, lower volume and increases the volume
of the audio content to a second, higher volume. The increase may
be a gradual increase over time (i.e., over a first portion of the
time period in which the playback device is outputting the audio
content).
[0110] The playback device 210b may determine when to stop
increasing the volume based on various characteristics, such as a
signal-to-noise ratio (SNR) of audio detected by the playback
device 210b while outputting the audio content. A determined
acoustic room response may be more accurate if the audio used for
determining the room response has a high SNR. Thus, in some
examples, the playback device 210b uses its microphone to capture
audio data representing the output audio content within the room
201b, and the playback device 210b determines an SNR of the
captured audio data. If the determined SNR is below a threshold
SNR, then the playback device 210b increases the volume of the
output audio content. The playback device 210b continues to
increase the volume of the output audio content until the
determined SNR exceeds the threshold SNR value. Similarly, in order
to avoid outputting excessively loud audio content, in some
embodiments the playback device 210b decreases the volume of the
output audio content responsive to determining that the SNR of the
captured audio exceeds the threshold SNR value by a predetermined
amount.
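A minimal sketch of such a control loop follows, assuming SNR is estimated in decibels from mean signal and noise powers; the threshold values, the step size, and the function names are hypothetical.

    import numpy as np

    SNR_THRESHOLD_DB = 20.0  # assumed minimum SNR for a usable response
    SNR_HEADROOM_DB = 10.0   # assumed margin above which volume is reduced
    VOLUME_STEP = 0.05       # assumed step on a 0.0-1.0 volume scale

    def estimate_snr_db(captured, noise_floor):
        """SNR in dB from mean signal power vs. mean noise power (assumed)."""
        signal_power = np.mean(np.square(captured))
        noise_power = np.mean(np.square(noise_floor)) + 1e-12
        return 10.0 * np.log10(signal_power / noise_power)

    def adjust_volume(volume, snr_db):
        """Raise volume while SNR is below the threshold; lower it when SNR
        exceeds the threshold by the headroom margin, per the text above."""
        if snr_db < SNR_THRESHOLD_DB:
            return min(1.0, volume + VOLUME_STEP)
        if snr_db > SNR_THRESHOLD_DB + SNR_HEADROOM_DB:
            return max(0.0, volume - VOLUME_STEP)
        return volume

    # Example: noisy capture relative to the output raises the volume.
    rng = np.random.default_rng(0)
    captured = 0.5 * rng.standard_normal(48_000)  # stand-in captured audio
    noise = 0.1 * rng.standard_normal(48_000)     # stand-in noise floor
    volume = adjust_volume(0.3, estimate_snr_db(captured, noise))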
[0111] While outputting the audio content, the playback device 210b
uses one or more stationary microphones, which may be disposed in
or on a housing of the playback device 210b or may be co-located in
or on an NMD proximate to the playback device 210b, to capture
audio data representing reflections of the audio content in the
room 201b. The playback device 210b then uses the captured audio
data to determine the localized acoustic response of the room 201b.
In line with the discussion above, the localized acoustic response
may include a spectral response, spatial response, or temporal
response, among others, and the localized acoustic response may be
represented in vector or matrix form.
[0112] In some embodiments, determining the localized acoustic
response of the room 201b involves accounting for a self-response
of the playback device 210b or of a microphone of the playback
device 210b, for example, by processing the captured audio data
representing reflections of the audio content in the room 201b so
that the captured audio data reduces or excludes the playback
device's native influence on the audio reflections.
[0113] In one example, the self-response of the playback device
210b is determined in an anechoic chamber, or is otherwise known
based on a self-response of a similar playback device being
determined in an anechoic chamber. In the anechoic chamber, audio
content output by the playback device 210b is inhibited from
reflecting back toward the playback device 210b, so that audio
captured by a microphone of the playback device 210b is indicative
of the self-response of the playback device 210b or of the
microphone of the playback device 210b. Knowing the self-response
of the playback device 210b or of the microphone of the playback
device 210b, the playback device 210b offsets such a self-response
from the captured audio data representing reflections of the first
audio content when determining the localized acoustic response of
the room 201b.
[0114] Once the localized acoustic response of the room 201b is
known, the playback device 210b accesses the database 250 to
determine a set of calibration settings to account for the acoustic
response of the room 201b. For example, the playback device 210b
establishes a connection with the computing device 206 and with the
database 250 of the computing device 206, and the playback device
210b queries the database 250 for a stored acoustic room response
that corresponds to the determined localized acoustic response of
the room 201b.
[0115] In some examples, querying the database 250 involves mapping
the determined localized acoustic response of the room 201b to a
particular stored acoustic room response in the database 250 that
satisfies a threshold similarity to the localized acoustic response
of the room 201b. This mapping may involve comparing values of the
localized acoustic response to values of the stored acoustic room
responses and determining which of the stored acoustic room
responses are similar to the localized acoustic response.
[0116] For example, in implementations where the acoustic responses
are represented as vectors, the mapping may involve determining
distances between the localized acoustic response vector and the
stored acoustic response vectors. In such a scenario, the stored
acoustic response vector having the smallest distance from the
localized acoustic response vector of the room 201b may be
identified as satisfying the threshold similarity. In some
examples, one or more values of the localized acoustic response of
the room 201b may be averaged and compared to corresponding
averaged values of the stored acoustic responses of the database
250. In such a scenario, the stored acoustic response having
averaged values closest to the averaged values of the localized
acoustic response vector of the room 201b may be identified as
satisfying the threshold similarity. Other examples are possible as
well.
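The vector-distance mapping described above might be sketched as a nearest-neighbor search, assuming Euclidean distance over equal-length response vectors and records stored as dictionaries; a real implementation might instead weight frequency bins or use the averaged-value comparison also described above. The optional device-type filter anticipates the per-type querying discussed below.

    import numpy as np

    def find_closest_record(localized_response, records, device_type=None):
        """Return the stored record whose room-response vector has the
        smallest Euclidean distance to the localized response."""
        query = np.asarray(localized_response, dtype=float)
        best_record, best_distance = None, np.inf
        for record in records:
            if device_type is not None and record["device_type"] != device_type:
                continue  # only compare against the same device type
            distance = np.linalg.norm(
                query - np.asarray(record["room_response"], dtype=float))
            if distance < best_distance:
                best_record, best_distance = record, distance
        return best_record, best_distance

    records = [
        {"device_type": "PLAY:1", "room_response": [0.8, 0.5, 0.3],
         "settings": "settings 1"},
        {"device_type": "PLAYBAR", "room_response": [0.2, 0.6, 0.9],
         "settings": "settings 2"},
    ]
    match, dist = find_closest_record([0.7, 0.5, 0.4], records,
                                      device_type="PLAY:1")
    print(match["settings"], round(dist, 3))  # -> settings 1 0.141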
[0117] As shown, the room 201b depicted in FIG. 2C and the room
201a depicted in FIG. 2A are similarly shaped and have similar
layouts. Further, the playback device 210b and the playback device
210a are arranged in similar positions in their respective rooms.
As such, when the localized room response determined by playback
device 210b for room 201b is compared to the room responses stored
in the database 250, the computing device 206 may determine that
the localized room response determined by playback device 210a for
room 201a has at least a threshold similarity to the localized room
response determined by playback device 210b for room 201b.
[0118] In some examples, querying the database 250 involves
querying only a portion of the database 250. For instance, as noted
above, the database 250 may identify a type or configuration of
playback device for which each record of the database 250 is
generated. Playback devices of the same type or configuration may
be more likely to have similar room responses and may be more
likely to have compatible calibration settings. Accordingly, in
some embodiments, when the playback device 210b queries the
database 250 for comparing the localized acoustic response of the
room 201b to the stored room responses of the database 250, the
playback device 210b might only compare the localized acoustic
response of the room 201b to stored room responses associated with
playback devices of the same type or configuration as the playback
device 210b.
[0119] Once a stored acoustic room response of the database 250 is
determined to be threshold similar to the localized acoustic
response of the room 201b, then the playback device 210b identifies
a set of calibration settings associated with the threshold similar
stored acoustic room response. For instance, as shown in FIG. 2B,
each stored acoustic room response is included as part of a record
that also includes a set of calibration settings designed to
account for the room response. As such, the playback device 210b
retrieves, or otherwise obtains from the database 250, the set of
calibration settings that share a record with the threshold similar
stored acoustic room response and applies the set of calibration
settings to itself.
[0120] After applying the obtained calibration settings to itself,
the playback device 210b outputs, via its one or more transducers,
second audio content using the applied calibration settings. Even
though the applied calibration settings were determined for a
different playback device calibrated in a different room, the
localized acoustic response of the room 201b is similar enough to
the stored acoustic response that the second audio content is
output in a manner that at least partially accounts for the
acoustics of the room 201b. For instance, with the applied
calibration settings, the second audio content output by the
playback device 210b may have a frequency response, at one or more
locations in the room 201b, that is closer to a target frequency
response than the first audio content.
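As an illustrative, non-normative way to quantify "closer to a target frequency response," one could score a response by its RMS deviation from the target curve. The per-band values below are fabricated for demonstration:

```python
def rms_deviation_db(measured, target):
    """Root-mean-square deviation (in dB) of a measured per-band
    response from a target response; smaller means closer to target."""
    return (sum((m - t) ** 2 for m, t in zip(measured, target)) / len(measured)) ** 0.5

# Hypothetical per-band magnitudes (dB): calibration should reduce the deviation.
target = [0.0] * 6
uncalibrated = [4.0, 2.5, -3.0, 1.5, -2.0, 5.0]
calibrated = [1.0, 0.5, -0.5, 0.5, -0.5, 1.0]
assert rms_deviation_db(calibrated, target) < rms_deviation_db(uncalibrated, target)
```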
[0121] In line with the discussion above, the playback device 210b
(or some other network device in communication with the playback
device 210b) may determine localized calibration settings based on
the localized acoustic response. Accordingly, in some examples,
before or while querying the database 250 for multi-location
calibration settings, the playback device 210b determines localized
calibration settings based on the localized acoustic response of
the room 201b and applies the determined localized calibration
settings to itself. And if the playback device 210b successfully
queries the database 250 for multi-location calibration settings by
mapping the determined localized acoustic response of the room 201b
to a particular stored acoustic room response in the database 250
as described above, then the playback device 210b transitions from
applying the localized calibration settings to applying the
multi-location calibration settings retrieved from the database
250.
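This provisional-then-upgrade behavior might be sketched as follows; the device class, lookup callable, and settings payloads are hypothetical stand-ins rather than APIs from this disclosure:

```python
class PlaybackDevice:
    """Minimal stand-in for playback device 210b."""
    def __init__(self):
        self.settings = None
    def apply(self, settings):
        self.settings = settings

def calibrate(device, localized_response, lookup, derive_localized):
    """Apply provisional localized settings first, then transition to
    multi-location settings if the database lookup finds a match."""
    device.apply(derive_localized(localized_response))
    match = lookup(localized_response)   # None when no stored response
    if match is not None:                # is threshold similar
        device.apply(match)

device = PlaybackDevice()
calibrate(device,
          localized_response=[0.0, 1.0],
          lookup=lambda r: {"eq": "multi-location"},      # stub: always matches
          derive_localized=lambda r: {"eq": "localized"})
print(device.settings)   # -> {'eq': 'multi-location'}
```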
[0122] FIG. 3A shows an example embodiment of a method 300 for
establishing a database of calibration settings for playback
devices, and FIG. 3B shows an example embodiment of a method 320
for calibrating a playback device using the database established
according to method 300. Methods 300 and 320 can be implemented by
any of the playback devices disclosed and/or described herein, or
any other playback device now known or later developed.
[0123] Various embodiments of methods 300 and 320 include one or
more operations, functions, and actions illustrated by blocks 302
through 312 and blocks 322 through 334. Although the blocks are
illustrated in sequential order, these blocks may also be performed
in parallel, and/or in a different order than the order disclosed
and described herein. Also, the various blocks may be combined into
fewer blocks, divided into additional blocks, and/or removed based
upon a desired implementation.
[0124] In addition, for the methods 300 and 320 and for other
processes and methods disclosed herein, the flowcharts show
functionality and operation of one possible implementation of some
embodiments. In this regard, each block may represent a module, a
segment, or a portion of program code, which includes one or more
instructions executable by one or more processors for implementing
specific logical functions or steps in the process. The program
code may be stored on any type of computer readable medium, such as
a storage device including a disk or hard drive. The computer
readable medium may include non-transitory computer readable media,
such as tangible media that store data for short periods of time
like register memory, processor cache, and Random Access Memory
(RAM). The computer readable medium may also include non-transitory
media for secondary or persistent long-term storage, such as read
only memory (ROM), optical or magnetic disks, and compact-disc read
only memory (CD-ROM). The computer readable media may
also be any other volatile or non-volatile storage systems. The
computer readable medium may be considered a computer readable
storage medium, for example, or a tangible storage device. In
addition, for the methods 300 and 320 and for other processes and
methods disclosed herein, each block in FIGS. 3A and 3B may
represent circuitry that is wired to perform the specific logical
functions in the process.
[0125] Method 300 involves populating a database with a plurality
of sets of stored audio calibration settings, each set associated
with a respective stored acoustic room response of a plurality of
stored acoustic room responses. The plurality of sets of stored
audio calibration settings and the plurality of stored acoustic
room responses are determined based on multiple media playback
systems each performing a respective audio calibration process and
a respective acoustic room response determination process
represented by method 300.
[0126] Method 300 begins at block 302, which involves a respective
playback device outputting respective audio content via one or more
transducers (e.g., one or more speakers and/or speaker drivers)
within a respective room. In line with the discussion above, the
audio content may include content with frequencies substantially
covering a renderable frequency range of the respective playback
device or a frequency range audible to a human. In one case, the
audio content is output using an audio signal created specifically
for use when calibrating playback devices, such as the respective
playback device. In another case, the audio content is an audio
track that is a favorite of a user of the respective playback
device, or a commonly played audio track by the respective playback
device. Other examples are also possible.
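One common calibration signal satisfying this description is an exponential (logarithmic) sine sweep spanning roughly the audible range. The sketch below is illustrative only; the disclosure does not mandate any particular signal, and all parameter values are assumptions:

```python
import math

def log_sweep(f_start=20.0, f_end=20000.0, duration=5.0, sample_rate=48000):
    """Generate an exponential sine sweep from f_start to f_end Hz,
    substantially covering a frequency range audible to a human.
    Returns a list of samples in [-1, 1]."""
    n = int(duration * sample_rate)
    k = math.log(f_end / f_start)
    return [
        math.sin(2 * math.pi * f_start * duration / k *
                 (math.exp(k * i / n) - 1.0))
        for i in range(n)
    ]
```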
[0127] At block 304, method 300 involves, while the respective
playback device outputs the respective audio content, capturing,
via a microphone of a respective mobile device in communication
with the respective playback device, first respective audio data
representing reflections of the respective audio content in the
respective room while the respective mobile device is moving from a
first physical location to a second physical location within the
respective room.
[0128] At block 306, method 300 involves, while the respective
playback device outputs the respective audio content, capturing,
via a microphone disposed in a housing of the respective playback
device, second respective audio data representing reflections of
the respective audio content in the respective room.
[0129] At block 308, method 300 involves the respective playback
device using the first respective audio data to determine a set of
audio calibration settings for the respective playback device.
[0130] At block 310, method 300 involves the respective playback
device using the second respective audio data to determine an
acoustic response of the respective room.
[0131] At block 312, method 300 involves storing in the database
the determined set of audio calibration settings for the respective
playback device as well as the determined acoustic response of the
respective room.
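Blocks 308 through 312 might be summarized by the following sketch of a record being stored; the field names and values are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class CalibrationDatabase:
    records: list = field(default_factory=list)

    def add(self, device_model, settings, room_response):
        """Store one record pairing the settings derived from the
        moving-microphone data (block 308) with the room response
        derived from the device-microphone data (block 310)."""
        self.records.append({"device_model": device_model,
                             "settings": settings,
                             "room_response": room_response})

db = CalibrationDatabase()
db.add("MODEL_X",                     # hypothetical device type/configuration
       {"eq_gains_db": [1.0, -2.0]},  # block 308: settings from mobile-mic data
       [3.0, -1.5])                   # block 310: response from device-mic data
```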
[0132] Turning now to FIG. 3B, method 320 involves a playback
device using a database that is populated with a plurality of sets
of stored audio calibration settings and associated sets of stored
acoustic room responses to calibrate the playback device so that
the audio output by the playback device accounts for the acoustics
of a room in which the playback device is located. In some
examples, the database used in method 320 is populated by a
plurality of playback devices performing method 300.
[0133] Method 320 begins at block 322, which involves a playback
device outputting first audio content via one or more transducers
(e.g., one or more speakers and/or speaker drivers) of the playback
device. In some examples, the first audio content is the same audio
content output by the respective playback device at block 302 in
method 300. However, in other examples, the first audio content is
different from the audio content output by the respective playback
device at block 302 in method 300.
[0134] In some embodiments, the playback device outputting the
first audio content involves gradually increasing a volume level of
the playback device while outputting the first audio content.
Further, in some embodiments, method 320 further involves, while
outputting the first audio content, measuring a signal-to-noise
ratio of the first audio content to environmental noise in the room
in which the playback device is located, and, when the
signal-to-noise ratio exceeds a threshold value for calibration,
ceasing to increase the volume level of the playback device and
continuing to output the first audio content at the current volume
level.
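This volume-ramping loop might look like the following sketch; the threshold, step sizes, and device callbacks are assumptions for illustration:

```python
def ramp_until_snr(measure_snr_db, set_volume, snr_threshold_db=12.0,
                   start_volume=0.1, step=0.05, max_volume=1.0):
    """Raise the playback volume in small steps until the measured
    signal-to-noise ratio clears the calibration threshold, then hold
    the current level while output continues."""
    volume = start_volume
    set_volume(volume)
    while measure_snr_db() < snr_threshold_db and volume < max_volume:
        volume = min(volume + step, max_volume)
        set_volume(volume)
    return volume  # calibration audio continues at this volume

# Stand-in callbacks: SNR improves as volume rises above a noise floor.
state = {"v": 0.0}
final = ramp_until_snr(
    measure_snr_db=lambda: 40.0 * state["v"] - 10.0,
    set_volume=lambda v: state.update(v=v))
print(final)  # volume at which the SNR threshold was first met
```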
[0135] At block 324, method 320 involves capturing, via a
microphone of the playback device, audio data representing
reflections of the first audio content within a room in which the
playback device is located. As noted above, this microphone is not
moved around the room; rather, it is disposed in or on a housing of
the playback device, or is co-located in or on an NMD proximate to
the playback device.
[0136] At block 326, method 320 involves, based on the captured
audio data, determining an acoustic response of the room in which
the playback device is located. In some embodiments, a
self-response of the playback device is pre-determined in an
anechoic chamber, and determining the acoustic response of the room
in which the playback device is located involves offsetting the
self-response of the playback device from the captured audio data
representing reflections of the first audio content. Further, in
some embodiments, a self-response of the playback device's
microphone is pre-determined in an anechoic chamber, and
determining the acoustic response of the room in which the playback
device is located involves offsetting the self-response of the
microphone from the captured audio data representing reflections of
the first audio content.
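Treating the responses as per-band magnitudes in dB (an assumption; the disclosure does not specify a representation), offsetting a pre-measured self-response reduces to a per-band subtraction, as in this sketch:

```python
def offset_self_response(measured_db, self_response_db):
    """Remove the device's pre-measured (anechoic) self-response from
    the in-room measurement, band by band, leaving an estimate of the
    room's contribution. All values are per-band magnitudes in dB."""
    return [m - s for m, s in zip(measured_db, self_response_db)]

# Hypothetical per-band magnitudes (dB):
measured = [6.0, 1.0, -4.0]    # speaker + microphone + room
self_resp = [2.0, 0.5, -1.0]   # speaker + microphone alone (anechoic)
room = offset_self_response(measured, self_resp)   # -> [4.0, 0.5, -3.0]
```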
[0137] At block 328, method 320 involves establishing a connection
with a database comprising a plurality of sets of stored audio
calibration settings, each set associated with a respective stored
acoustic room response of a plurality of stored acoustic room
responses.
[0138] In line with the discussion above, the plurality of sets of
stored audio calibration settings are determined, in some
embodiments, based on multiple media playback systems each
performing a respective audio calibration process comprising (i)
outputting, via a respective playback device within a respective
room that is different from the room in which the playback device
is located, respective audio content, (ii) while the respective
playback device outputs the respective audio content, capturing,
via a microphone of a respective mobile device in communication
with the respective playback device, first respective audio data
representing reflections of the respective audio content in the
respective room while the respective mobile device is moving from a
first physical location to a second physical location within the
respective room, and (iii) based on the first respective audio
data, determining a set of audio calibration settings for the
respective playback device.
[0139] In some embodiments, determining the set of audio
calibration settings for the respective playback device involves
(i) determining audio characteristics of the respective room based
on the first respective audio data and (ii) determining respective
audio calibration settings for the respective playback device that
offset the determined audio characteristics of the respective
room.
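One simple reading of "settings that offset the determined audio characteristics" is a per-band inverse EQ. This sketch, including the boost clamp, is an assumption for illustration rather than the disclosed method:

```python
def offsetting_settings(room_response_db, max_boost_db=6.0):
    """Derive per-band EQ gains that counteract the measured room
    characteristics: each gain is the negation of the room's deviation,
    clamped to a maximum correction to avoid overdriving the
    transducers (the clamp value is an assumption)."""
    return [max(-max_boost_db, min(max_boost_db, -dev))
            for dev in room_response_db]

print(offsetting_settings([4.0, -8.0, 1.5]))   # -> [-4.0, 6.0, -1.5]
```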
[0140] Further, in some embodiments, the plurality of stored
acoustic room responses are determined based on the multiple media
playback systems each performing a respective acoustic room
response determination process comprising (i) while the respective
playback device outputs the respective audio content, capturing,
via a microphone disposed in a housing of the respective playback
device, second respective audio data representing reflections of
the respective audio content in the respective room, and (ii) based
on the second respective audio data, determining an acoustic
response of the respective room.
[0141] At block 330, method 320 involves querying the database for
a stored acoustic room response that corresponds to the determined
acoustic response of the room in which the playback device is
located. In some embodiments, querying the database for the stored
acoustic room response involves mapping the acoustic response of
the room in which the playback device is located to a particular
stored acoustic room response in the database that satisfies a
threshold similarity to the acoustic response of the room in which
the playback device is located.
[0142] At block 332, method 320 involves, responsive to the query,
applying to the playback device a particular set of stored audio
calibration settings associated with the stored acoustic room
response that corresponds to the determined acoustic response of
the room in which the playback device is located.
[0143] At block 334, method 320 involves outputting, via the one or
more transducers of the playback device, second audio content using
the particular set of audio calibration settings associated with
the stored acoustic room response that corresponds to the
determined acoustic response of the room in which the playback
device is located.
IV. Conclusion
[0144] The above discussions relating to playback devices,
controller devices, playback zone configurations, and media content
sources provide only some examples of operating environments within
which the functions and methods described above may be implemented.
Other operating environments and configurations of media playback
systems, playback devices, and network devices not explicitly
described herein may also be applicable and suitable for
implementation of the functions and methods.
[0145] The description above discloses, among other things, various
example systems, methods, apparatus, and articles of manufacture
including, among other components, firmware and/or software
executed on hardware. It is understood that such examples are
merely illustrative and should not be considered as limiting. For
example, it is contemplated that any or all of the firmware,
hardware, and/or software aspects or components can be embodied
exclusively in hardware, exclusively in software, exclusively in
firmware, or in any combination of hardware, software, and/or
firmware. Accordingly, the examples provided are not the only ways
to implement such systems, methods, apparatus, and/or articles of
manufacture.
[0146] Additionally, references herein to "embodiment" mean that a
particular feature, structure, or characteristic described in
connection with the embodiment can be included in at least one
example embodiment of an invention. The appearances of this phrase
in various places in the specification are not necessarily all
referring to the same embodiment, nor are separate or alternative
embodiments mutually exclusive of other embodiments. As such, the
embodiments described herein, explicitly and implicitly understood
by one skilled in the art, can be combined with other
embodiments.
[0147] The specification is presented largely in terms of
illustrative environments, systems, procedures, steps, logic
blocks, processing, and other symbolic representations that
directly or indirectly resemble the operations of data processing
devices coupled to networks. These process descriptions and
representations are typically used by those skilled in the art to
most effectively convey the substance of their work to others
skilled in the art. Numerous specific details are set forth to
provide a thorough understanding of the present disclosure.
However, it is understood by those skilled in the art that certain
embodiments of the present disclosure can be practiced without
certain, specific details. In other instances, well known methods,
procedures, components, and circuitry have not been described in
detail to avoid unnecessarily obscuring aspects of the embodiments.
Accordingly, the scope of the present disclosure is defined by the
appended claims rather than the foregoing description of
embodiments.
[0148] When any of the appended claims are read to cover a purely
software and/or firmware implementation, at least one of the
elements in at least one example is hereby expressly defined to
include a tangible, non-transitory medium such as a memory, DVD,
CD, Blu-ray, and so on, storing the software and/or firmware.
* * * * *