U.S. patent application number 15/587244 was filed with the patent office on 2017-05-04 and published on 2018-07-26 for privacy control in a connected environment based on speech characteristics.
The applicant listed for this patent is Essential Products, Inc. Invention is credited to Dwipal Desai, Manuel Clair Roman, Andrew E. Rubin, Mara Clair Segal.
Application Number | 20180213396 15/587244 |
Family ID | 62906588 |
Filed Date | 2017-05-04 |
United States Patent Application | 20180213396
Kind Code | A1
Segal; Mara Clair; et al. | July 26, 2018
PRIVACY CONTROL IN A CONNECTED ENVIRONMENT BASED ON SPEECH CHARACTERISTICS
Abstract
Privacy control in a connected environment is described. An
assistant device can detect speech spoken within its environment.
The assistant device can determine characteristics of that speech
and determine privacy expectations regarding the speech based
on the characteristics. Based on the privacy expectations, the
speech can be provided to one or both of local resources of the
assistant device or a cloud server to receive a response regarding
the speech.
Inventors:
Segal; Mara Clair (San Francisco, CA)
Roman; Manuel Clair (San Francisco, CA)
Desai; Dwipal (Palo Alto, CA)
Rubin; Andrew E. (Los Altos, CA)
Applicant:
Name | City | State | Country | Type
Essential Products, Inc. | Palo Alto | CA | US |
Family ID: | 62906588
Appl. No.: | 15/587244
Filed: | May 4, 2017
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number
62448923 | Jan 20, 2017 |
62486392 | Apr 17, 2017 |
62486388 | Apr 17, 2017 |
Current U.S. Class: | 1/1
Current CPC Class: | G10L 15/1822 20130101; G10L 2015/228 20130101; H04W 4/70 20180201; G10L 2015/223 20130101; H04W 12/001 20190101; G10L 2015/088 20130101; H04L 63/04 20130101; H04L 63/00 20130101; H04L 67/10 20130101; G10L 15/08 20130101; G10L 15/22 20130101; H04W 12/02 20130101; G10L 25/78 20130101; G10L 15/02 20130101; G10L 15/30 20130101; G10L 25/51 20130101
International Class: | H04W 12/02 20060101 H04W012/02; G10L 15/22 20060101 G10L015/22; G10L 25/51 20060101 G10L025/51; G10L 15/30 20060101 G10L015/30
Claims
1. A home assistant device, comprising: a microphone; a speaker;
one or more processors; and memory storing instructions, wherein
the one or more processors are configured to execute the
instructions such that the one or more processors and memory are
configured to: detect first speech spoken by a first user of the
home assistant device using the microphone; determine first
characteristics of the first speech, the first characteristics
including one or more of content of the first speech, time of the
first speech, location of the first speech, distance from the home
assistant device to a source of the first speech, identity of the
first user providing the first speech, or audio characteristics of
the first speech; determine first privacy expectations regarding
the first speech based on the first characteristics of the first
speech; provide the first speech to a cloud server based on the
first privacy expectations corresponding to the first speech;
receive a first response from the cloud server providing a response
to the first speech; play back the first response using the
speaker; detect second speech spoken by the first user of the home
assistant device using the microphone; determine second
characteristics of the second speech, the second characteristics
including one or more of content of the second speech, time of the
second speech, location of the second speech, distance from the
home assistant device to a source of the second speech, identity of
the first user providing the second speech, or audio
characteristics of the second speech; determine second privacy
expectations regarding the second speech based on the
characteristics of the second speech, the first privacy
expectations and the second privacy expectations being different,
the second privacy expectations representing higher privacy
expectations than the first privacy expectations based on
differences between the first characteristics and the second
characteristics; provide the second speech to local resources of a
wireless network associated with the electronic device rather than
the cloud server based on the second privacy expectations; receive
a second response from the local resources providing a response to
the second speech; and play back the second response using the
speaker.
2. The home assistant device of claim 1, wherein the local
resources include one or both of hardware resources of the home
assistant device or resources of other devices communicatively
coupled with the home assistant device on the wireless network.
3. A method for privacy control in a connected environment,
comprising: detecting first speech within an environment of an
assistant device; determining, by a processor of the assistant
device, first characteristics of the first speech; determining
first privacy expectations regarding the first speech based on the
first characteristics of the first speech; and providing the first
speech to one or both of local resources of the assistant device or
a cloud server based on the first privacy expectations.
4. The method of claim 3, wherein the first characteristics
include one or more of content of the first speech, time of the
first speech, location of the first speech, distance from the home
assistant device to a source of the first speech, identity of a
user providing the first speech, or audio characteristics of the
first speech.
5. The method of claim 3, wherein the first speech is provided to
the cloud server, and the method further comprising: detecting
second speech within the environment; determining second
characteristics of the second speech, the first characteristics and
the second characteristics being different; determining second
privacy expectations regarding the second speech based on the
second characteristics of the second speech, the first privacy
expectations and the second privacy expectations being different;
and providing the second speech to the local resources based on the
second privacy expectations.
6. The method of claim 5, wherein the second privacy expectations represent
higher privacy expectations than the first privacy expectations
based on differences between the first characteristics and the
second characteristics.
7. The method of claim 5, further comprising: receiving first
response data corresponding to the first speech from the cloud
server; receiving second response data corresponding to the second
speech from the local resources; and providing a response to the
first speech and the second speech based on the first response data
received from the cloud server and the second response data
received from the local resources.
8. The method of claim 3, wherein the local resources include one
or both of hardware resources of the assistant device or resources
of other devices communicatively coupled with the assistant device
on a wireless network.
9. The method of claim 3, wherein the first speech was provided at
a first time, the method further comprising: detecting second
speech within an environment of the assistant device at a second
time after the first time; determining second characteristics of
the second speech, the first characteristics and the second
characteristics being similar; determining second privacy
expectations regarding the second speech based on the second
characteristics, the first privacy expectations and the second
privacy expectations being different based on a time difference
between the first time and the second time; and providing the
second speech to one or both of the local resources of the
assistant device or the cloud server based on the second privacy
expectations.
10. An electronic device, comprising: one or more processors; and
memory storing instructions, wherein the one or more processors are
configured to execute the instructions such that the one or more
processors and memory are configured to: detect first speech within
an environment of the electronic device; determine first
characteristics of the first speech; determine first privacy
expectations regarding the first speech based on the first
characteristics of the first speech; and provide the first speech
to one or both of local resources of the electronic device or a
cloud server based on the first privacy expectations.
11. The electronic device of claim 10, wherein the first
characteristics include one or more of content of the first
speech, time of the first speech, location of the first speech,
distance from the electronic device to a source of the first
speech, identity of a user providing the first speech, or audio
characteristics of the first speech.
12. The electronic device of claim 10, wherein the first speech is
provided to the cloud server, wherein the one or more processors
are configured to execute the instructions such that the one or
more processors and memory are configured to: detect second speech
within the environment; determine second characteristics of the
second speech, the first characteristics and the second
characteristics being different; determine second privacy
expectations regarding the second speech based on the second
characteristics of the second speech, the first privacy
expectations and the second privacy expectations being different;
and provide the second speech to the local resources based on the
second privacy expectations.
13. The electronic device of claim 12, wherein the second privacy
expectations represent higher privacy expectations than the first
privacy expectations based on differences between the first
characteristics and the second characteristics.
14. The electronic device of claim 12, wherein the one or more
processors are configured to execute the instructions such that the
one or more processors and memory are configured to: receive first
response data corresponding to the first speech from the cloud
server; receive second response data corresponding to the second
speech from the local resources; and provide a response to the
first speech and the second speech based on the first response data
received from the cloud server and the second response data
received from the local resources.
15. The electronic device of claim 10, wherein the local resources
include one or both of hardware resources of the electronic device
or resources of other devices communicatively coupled with the
electronic device on a wireless network.
16. The electronic device of claim 10, wherein the first speech was
provided at a first time, wherein the one or more processors are
configured to execute the instructions such that the one or more
processors and memory are configured to: detect second speech
within an environment of the electronic device at a second time
after the first time; determine second characteristics of the
second speech, the first characteristics and the second
characteristics being similar; determine second privacy
expectations regarding the second speech based on the second
characteristics, the first privacy expectations and the second
privacy expectations being different based on a time difference
between the first time and the second time; and provide the second
speech to one or both of the local resources of the electronic
device or the cloud server based on the second privacy
expectations.
17. A computer program product, comprising one or more
non-transitory computer-readable media having computer program
instructions stored therein, the computer program instructions
being configured such that, when executed by one or more computing
devices, the computer program instructions cause the one or more
computing devices to: detect first speech within an environment of
an electronic device; determine first characteristics of the first
speech; determine first privacy expectations regarding the first
speech based on the first characteristics of the first speech; and
provide the first speech to one or both of local resources of the
electronic device or a cloud server based on the first privacy
expectations.
18. The computer program product of claim 17, wherein the first
characteristics include one or more of content of the first
speech, time of the first speech, location of the first speech,
distance from the electronic device to a source of the first
speech, identity of a user providing the first speech, or audio
characteristics of the first speech.
19. The computer program product of claim 17, wherein the first
speech is provided to the cloud server, wherein the computer
program instructions cause the one or more computing devices to:
detect second speech within the environment; determine second
characteristics of the second speech, the first characteristics and
the second characteristics being different; determine second
privacy expectations regarding the second speech based on the
second characteristics of the second speech, the first privacy
expectations and the second privacy expectations being different;
and provide the second speech to the local resources based on the
second privacy expectations.
20. The computer program product of claim 19, wherein the second privacy
expectations represent higher privacy expectations than the first
privacy expectations based on differences between the first
characteristics and the second characteristics.
21. The computer program product of claim 19, wherein the computer
program instructions cause the one or more computing devices to:
receive first response data corresponding to the first speech from
the cloud server; receive second response data corresponding to the
second speech from the local resources; and provide a response to
the first speech and the second speech based on the first response
data received from the cloud server and the second response data
received from the local resources.
22. The computer program product of claim 17, wherein the local
resources include one or both of hardware resources of the
electronic device or resources of other devices communicatively
coupled with the electronic device on a wireless network.
23. The method of claim 3, wherein the characteristics include
content of the first speech.
24. The method of claim 3, wherein the characteristics include
distance from the home assistant device to a source of the first
speech.
25. The method of claim 3, wherein the characteristics include an
identity of a user speaking the first speech.
Description
CLAIM FOR PRIORITY
[0001] This application claims priority to U.S. Provisional Patent
Application No. 62/448,923, entitled "Privacy Control in a
Connected Environment," by Segal et al., and filed on Jan. 20,
2017. This application also claims priority to U.S. Provisional
Patent Application No. 62/486,392, entitled "Privacy Control in a
Connected Environment Based on Speech Characteristics," by Segal,
and filed on Apr. 17, 2017. This application also claims priority
to U.S. Provisional Patent Application No. 62/486,388, entitled
"Privacy Control in a Connected Environment," by Segal, and filed
on Apr. 17, 2017. The contents of the above-identified applications
are incorporated herein by reference in their entirety.
TECHNICAL FIELD
[0002] This disclosure relates to privacy control, and in
particular privacy control in a connected environment such as a
home.
BACKGROUND
[0003] The Internet of Things (IoT) allows for the internetworking
of devices to exchange data among themselves to enable
sophisticated functionality. For example, devices configured for
home automation can exchange data to allow for the control and
automation of lighting, air conditioning systems, security, etc. In
the smart home environment, this can also include home assistant
devices providing an intelligent personal assistant to respond to
speech. For example, a home assistant device can include a
microphone array to receive voice input and provide the
corresponding voice data to a server for analysis to provide an
answer to a question asked by a user. The server can provide the
answer to the home assistant, which can provide the answer as voice
output using a speaker. As such, the user and the home assistant
device can interact with each other using voice, and the
interaction can be supplemented by a server outside of the home
providing the answers. However, some users might have privacy
concerns with sending voice data to a server outside of the
home.
SUMMARY
[0004] Some of the subject matter described herein includes a home
assistant device including: a microphone; a speaker; one or more
processors; and memory storing instructions, wherein the one or
more processors are configured to execute the instructions such
that the one or more processors and memory are configured to:
detect first speech spoken by a first user of the home assistant
device using the microphone; determine first characteristics of the
first speech, the first characteristics including one or more of
content of the first speech, time of the first speech, location of
the first speech, distance from the home assistant device to a
source of the first speech, identity of the first user providing
the first speech, or audio characteristics of the first speech;
determine first privacy expectations regarding the first speech
based on the first characteristics of the first speech; provide the
first speech to a cloud server based on the first privacy
expectations corresponding to the first speech; receive a first
response from the cloud server providing a response to the first
speech; play back the first response using the speaker; detect
second speech spoken by the first user of the home assistant device
using the microphone; determine second characteristics of the
second speech, the second characteristics including one or more of
content of the second speech, time of the second speech, location
of the second speech, distance from the home assistant device to a
source of the second speech, identity of the first user providing
the second speech, or audio characteristics of the second speech;
determine second privacy expectations regarding the second speech
based on the characteristics of the second speech, the first
privacy expectations and the second privacy expectations being
different, the second privacy expectations representing higher
privacy expectations than the first privacy expectations based on
differences between the first characteristics and the second
characteristics; provide the second speech to local resources of a
wireless network associated with the electronic device rather than
the cloud server based on the second privacy expectations; receive
a second response from the local resources providing a response to
the second speech; and play back the second response using the
speaker.
[0005] In some implementations, the local resources include one or
both of hardware resources of the home assistant device or
resources of other devices communicatively coupled with the home
assistant device on the wireless network.
[0006] Some of the subject matter described herein includes a
method for privacy control in a connected environment, including:
detecting first speech within an environment of an assistant
device; determining, by a processor of the assistant device, first
characteristics of the first speech; determining first privacy
expectations regarding the first speech based on the first
characteristics of the first speech; and providing the first speech
to one or both of local resources of the assistant device or a
cloud server based on the first privacy expectations.
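The method above can be illustrated with a minimal Python sketch. It is not the implementation described in this disclosure: the names SpeechCharacteristics, privacy_expectation, and route, the keyword list, the 5-meter distance cutoff, and the numeric threshold are all hypothetical choices made for illustration. The sketch only shows the general shape of the steps: derive characteristics, score a privacy expectation from them, and pick local resources or the cloud server accordingly.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Destination(Enum):
    LOCAL = "local resources"
    CLOUD = "cloud server"


@dataclass
class SpeechCharacteristics:
    # Hypothetical container holding two of the characteristics named in the
    # text: content of the speech and distance from the device to its source.
    content: str
    distance_m: Optional[float] = None


# Hypothetical keyword list used only for this illustration.
SENSITIVE_TERMS = frozenset({"bank", "password", "medical"})


def privacy_expectation(ch: SpeechCharacteristics) -> int:
    """Return a score; a higher score means a higher privacy expectation.

    The scoring rules are assumptions, not taken from the disclosure.
    """
    score = 0
    words = set(re.findall(r"[a-z]+", ch.content.lower()))
    if words & SENSITIVE_TERMS:
        score += 2  # assumed: sensitive content raises the expectation
    if ch.distance_m is not None and ch.distance_m > 5.0:
        score += 1  # assumed: distant speech raises the expectation slightly
    return score


def route(ch: SpeechCharacteristics, threshold: int = 2) -> Destination:
    """Choose local resources when the score reaches the assumed threshold."""
    if privacy_expectation(ch) >= threshold:
        return Destination.LOCAL
    return Destination.CLOUD
```

Under these assumptions, route(SpeechCharacteristics("what is the weather today")) selects the cloud server, while route(SpeechCharacteristics("what is my bank balance")) selects local resources.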
[0007] In some implementations, the first characteristics include
one or more of content of the first speech, time of the first
speech, location of the first speech, distance from the home
assistant device to a source of the first speech, identity of a
user providing the first speech, or audio characteristics of the
first speech.
[0008] In some implementations, the first speech is provided to the
cloud server, and the method further including: detecting second
speech within the environment; determining second characteristics
of the second speech, the first characteristics and the second
characteristics being different; determining second privacy
expectations regarding the second speech based on the second
characteristics of the second speech, the first privacy
expectations and the second privacy expectations being different;
and providing the second speech to the local resources based on the
second privacy expectations.
[0009] In some implementations, the second privacy expectations
represent higher privacy expectations than the first privacy
expectations based on differences between the first characteristics
and the second characteristics.
[0010] In some implementations, the method includes: receiving
first response data corresponding to the first speech from the
cloud server; receiving second response data corresponding to the
second speech from the local resources; and providing a response to
the first speech and the second speech based on the first response
data received from the cloud server and the second response data
received from the local resources.
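One way to picture a response based on both sets of response data is sketched below in Python. The ResponseData type and the combine_responses function are hypothetical, and simple concatenation in the order the speech was received is an assumed strategy; the text does not specify how the two sets of response data are merged.

```python
from dataclasses import dataclass


@dataclass
class ResponseData:
    # Hypothetical record: where the data came from and the answer text.
    source: str  # "cloud" or "local"
    text: str


def combine_responses(first: ResponseData, second: ResponseData) -> str:
    """Join the non-empty answer texts in the order the speech was received.

    Concatenation is an assumption made for illustration only.
    """
    parts = [r.text.strip() for r in (first, second) if r.text.strip()]
    return " ".join(parts)
```

The combined string could then be handed to whatever text-to-speech path the device uses for playback.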
[0011] In some implementations, the local resources include one or
both of hardware resources of the assistant device or resources of
other devices communicatively coupled with the assistant device on
a wireless network.
[0012] In some implementations, the first speech was provided at a
first time, the method further including: detecting second speech
within an environment of the assistant device at a second time
after the first time; determining second characteristics of the
second speech, the first characteristics and the second
characteristics being similar; determining second privacy
expectations regarding the second speech based on the second
characteristics, the first privacy expectations and the second
privacy expectations being different based on a time difference
between the first time and the second time; and providing the
second speech to one or both of the local resources of the
assistant device or the cloud server based on the second privacy
expectations.
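The time-based variation can be sketched in Python as follows. The function name adjusted_expectation, the 5-minute window, and the size and direction of the adjustment are assumptions; the text states only that the two privacy expectations differ based on the time difference, not how. Here, speech that follows earlier speech within the window is treated as part of an ongoing conversation and given a raised expectation.

```python
from datetime import datetime, timedelta


def adjusted_expectation(
    base_score: int,
    first_time: datetime,
    second_time: datetime,
    window: timedelta = timedelta(minutes=5),  # assumed window
    bump: int = 1,                             # assumed adjustment
) -> int:
    """Return the privacy score for the second speech.

    base_score is the score implied by the (similar) characteristics alone.
    The rule below is hypothetical: a short gap raises the score by `bump`,
    a longer gap leaves it unchanged.
    """
    if second_time < first_time:
        raise ValueError("second_time must not precede first_time")
    if second_time - first_time <= window:
        return base_score + bump
    return base_score
```

With these assumed defaults, two utterances with the same base score receive different privacy scores depending only on how far apart they were spoken, which is the behavior the paragraph describes.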
[0013] Some of the subject matter described herein includes an
electronic device, including: one or more processors; and memory
storing instructions, wherein the one or more processors are
configured to execute the instructions such that the one or more
processors and memory are configured to: detect first speech within
an environment of the electronic device; determine first
characteristics of the first speech; determine first privacy
expectations regarding the first speech based on the first
characteristics of the first speech; and provide the first speech
to one or both of local resources of the electronic device or a
cloud server based on the first privacy expectations.
[0014] In some implementations, the first characteristics include
one or more of content of the first speech, time of the first
speech, location of the first speech, distance from the electronic
device to a source of the first speech, identity of a user
providing the first speech, or audio characteristics of the first
speech.
[0015] In some implementations, the first speech is provided to the
cloud server, wherein the one or more processors are configured to
execute the instructions such that the one or more processors and
memory are configured to: detect second speech within the
environment; determine second characteristics of the second speech,
the first characteristics and the second characteristics being
different; determine second privacy expectations regarding the
second speech based on the second characteristics of the second
speech, the first privacy expectations and the second privacy
expectations being different; and provide the second speech to the
local resources based on the second privacy expectations.
[0016] In some implementations, the second privacy expectations
represent higher privacy expectations than the first privacy
expectations based on differences between the first characteristics
and the second characteristics.
[0017] In some implementations, the one or more processors are
configured to execute the instructions such that the one or more
processors and memory are configured to: receive first response
data corresponding to the first speech from the cloud server;
receive second response data corresponding to the second speech
from the local resources; and provide a response to the first
speech and the second speech based on the first response data
received from the cloud server and the second response data
received from the local resources.
[0018] In some implementations, the local resources include one or
both of hardware resources of the electronic device or resources of
other devices communicatively coupled with the electronic device on
a wireless network.
[0019] In some implementations, the first speech was provided at a
first time, wherein the one or more processors are configured to
execute the instructions such that the one or more processors and
memory are configured to: detect second speech within an
environment of the electronic device at a second time after the
first time; determine second characteristics of the second speech,
the first characteristics and the second characteristics being
similar; determine second privacy expectations regarding the second
speech based on the second characteristics, the first privacy
expectations and the second privacy expectations being different
based on a time difference between the first time and the second
time; and provide the second speech to one or both of the local
resources of the electronic device or the cloud server based on the
second privacy expectations.
[0020] Some of the subject matter described herein includes a
computer program product, comprising one or more non-transitory
computer-readable media having computer program instructions stored
therein, the computer program instructions being configured such
that, when executed by one or more computing devices, the computer
program instructions cause the one or more computing devices to:
detect first speech within an environment of an electronic device;
determine first characteristics of the first speech; determine
first privacy expectations regarding the first speech based on the
first characteristics of the first speech; and provide the first
speech to one or both of local resources of the electronic device
or a cloud server based on the first privacy expectations.
[0021] In some implementations, the first characteristics include
one or more of content of the first speech, time of the first
speech, location of the first speech, distance from the electronic
device to a source of the first speech, identity of a user
providing the first speech, or audio characteristics of the first
speech.
[0022] In some implementations, the first speech is provided to the
cloud server, wherein the computer program instructions cause the
one or more computing devices to: detect second speech within the
environment; determine second characteristics of the second speech,
the first characteristics and the second characteristics being
different; determine second privacy expectations regarding the
second speech based on the second characteristics of the second
speech, the first privacy expectations and the second privacy
expectations being different; and provide the second speech to the
local resources based on the second privacy expectations.
[0023] In some implementations, the second privacy expectations
represent higher privacy expectations than the first privacy
expectations based on differences between the first characteristics
and the second characteristics.
[0024] In some implementations, the computer program instructions
cause the one or more computing devices to: receive first response
data corresponding to the first speech from the cloud server;
receive second response data corresponding to the second speech
from the local resources; and provide a response to the first
speech and the second speech based on the first response data
received from the cloud server and the second response data
received from the local resources.
[0025] In some implementations, the local resources include one or
both of hardware resources of the electronic device or resources of
other devices communicatively coupled with the electronic device on
a wireless network.
[0026] In some implementations, the first speech was provided at a
first time, wherein the computer program instructions cause the one
or more computing devices to: detect second speech within an
environment of the electronic device at a second time after the
first time; determine second characteristics of the second speech,
the first characteristics and the second characteristics being
similar; determine second privacy expectations regarding the second
speech based on the second characteristics, the first privacy
expectations and the second privacy expectations being different
based on a time difference between the first time and the second
time; and provide the second speech to one or both of the local
resources of the electronic device or the cloud server based on the
second privacy expectations.
[0027] Some of the subject matter described herein includes an
electronic device, including: one or more processors; and memory
storing instructions, wherein the one or more processors are
configured to execute the instructions such that the one or more
processors and memory are configured to: detect first noise within
an environment of the electronic device; determine first
characteristics of the first noise; determine first privacy
expectations regarding the first noise based on the first
characteristics of the first noise; and provide the first noise
to one or both of local resources of the electronic device or a
cloud server based on the first privacy expectations.
[0028] In some implementations, the first characteristics include
one or more of content of the first noise, time of the first noise,
location of the first noise, distance from the electronic device to
a source of the first noise, identity of a user providing the first
noise, or audio characteristics of the first noise.
[0029] In some implementations, the first noise is provided to the
cloud server, wherein the one or more processors are configured to
execute the instructions such that the one or more processors and
memory are configured to: detect second noise within the
environment; determine second characteristics of the second noise,
the first characteristics and the second characteristics being
different; determine second privacy expectations regarding the
second noise based on the second characteristics of the second
noise, the first privacy expectations and the second privacy
expectations being different; and provide the second noise to the
local resources based on the second privacy expectations.
[0030] In some implementations, the second privacy expectations
represent higher privacy expectations than the first privacy
expectations based on differences between the first characteristics
and the second characteristics.
[0031] In some implementations, the one or more processors are
configured to execute the instructions such that the one or more
processors and memory are configured to: receive first response
data corresponding to the first noise from the cloud server;
receive second response data corresponding to the second noise from
the local resources; and provide a response to the first noise and
the second noise based on the first response data received from the
cloud server and the second response data received from the local
resources.
[0032] In some implementations, the local resources include one or
both of hardware resources of the electronic device or resources of
other devices communicatively coupled with the electronic device on
a wireless network.
[0033] In some implementations, the first noise was provided at a
first time, wherein the one or more processors are configured to
execute the instructions such that the one or more processors and
memory are configured to: detect second noise within an environment
of the electronic device at a second time after the first time;
determine second characteristics of the second noise, the first
characteristics and the second characteristics being similar;
determine second privacy expectations regarding the second noise
based on the second characteristics, the first privacy expectations
and the second privacy expectations being different based on a time
difference between the first time and the second time; and provide
the second noise to one or both of the local resources of the
electronic device or the cloud server based on the second privacy
expectations.
BRIEF DESCRIPTION OF THE DRAWINGS
[0034] FIG. 1 illustrates an example of an assistant device
responding to voice input.
[0035] FIGS. 2A and 2B illustrate an example of a block diagram for
an assistant device responding to voice input.
[0036] FIG. 3 illustrates an example of an assistant device using
local resources and cloud resources to respond to voice input.
[0037] FIG. 4 illustrates an example of a block diagram for using
local resources and cloud resources to respond to voice input.
[0038] FIG. 5 illustrates an example of a block diagram of
determining privacy expectations.
[0039] FIG. 6 illustrates an example of an assistant device.
DETAILED DESCRIPTION
[0040] This disclosure describes devices and techniques for
managing privacy in an environment with connected devices. In one
example, a home assistant device can listen to speech asking a
question in its vicinity using a microphone array and provide an
audible answer to the question using a speaker. Some speech can
include a hardware activation phrase, upon detecting which the home
assistant device can record the rest of the speech subsequent to
the hardware activation phrase and provide it to a server in the
cloud via the Internet. The server in the cloud can then provide
the answer by providing results data. A second hardware activation
phrase can
result in keeping the speech within the local resources of the
home's connected environment, for example, the home assistant
device itself can try to answer the question. For example, if the
speech is "Cloud, what is today's date?" then "cloud" can be a
hardware activation phrase indicating that "what is today's date?"
can be provided to a cloud server. By contrast, if the speech is
"Local, what is today's date?" then "local" can be a hardware
activation phrase indicating that "what is today's date?" should be
kept within the local resources of the home's connected
environment. In this way, some speech can be kept locally within
the home's connected environment rather than transmitted to a
server in the cloud. This can allow users to seek answers to
questions that include content they might not want to leave their
home environment due to privacy concerns.
[0041] In another example, some speech can include a portion that
can be answered by the local resources of the home's connected
environment and another portion that can be answered by the cloud
resources. The answers from the local resources and the cloud
resources can then be analyzed and/or combined to provide an
answer. As a result, speech can be provided to one or both of the
cloud server and local resources without the use of a hardware
activation phrase.
[0042] In another example, the home assistant device can determine
who is speaking to it (e.g., based on voice recognition, video
recognition using a camera, etc.), determine the speaker's
privacy expectations, and then use the local resources, cloud
resources, or both to provide an answer based on the determined
privacy expectations. The context, content, timing, or other
characteristics of the speech can be used to determine whether
speech should be provided to the local resources, cloud resources,
or both.
[0043] In more detail, FIG. 1 illustrates an example of an
assistant device responding to voice input. In FIG. 1, home
assistant device 105 can include an intelligent home assistant
enabled by a microphone (or microphone array) to hear speech 110
and provide a response to speech 110 using one or more speakers.
For example, speech 110 can include a question and the response
provided by home assistant device 105 can be an answer to that
question provided in a voice output using the speakers.
Accordingly, the experience using home assistant device 105 can be
based on audio such as voice. However, in other implementations,
the responses provided by home assistant device 105 can also be
provided on a display screen, or a combination of both audio and
the display screen. For example, answers to spoken questions can be
textually or graphically displayed on the display screen of home
assistant device 105.
[0044] In FIG. 1, based on whether the hardware activation phrase
is local hardware activation phrase 125 or cloud hardware
activation phrase 130, speech 120 is provided to either cloud
server 115 or kept within local resources 140, for example, home
assistant device 105 itself (e.g., its own hardware and software
capabilities) or other devices within the home's wireless network
(e.g., a tablet, laptop, etc. that home assistant device 105 is at
least communicatively coupled with). Local hardware activation
phrase 125 and cloud hardware activation phrase 130 can be
different words or phrases of speech. For example, if cloud
hardware activation phrase 130 is spoken and then speech 120 is
subsequently spoken, then home assistant device 105 can determine
that cloud hardware activation phrase 130 was spoken and then
record speech 120 and provide speech 120 to cloud server 115.
Speech 120 can include a question or other type of content in which
a response from home assistant device 105 is expected or would be
useful. Cloud server 115 can analyze speech 120 (e.g., either
translated into text by home assistant device 105, or audio data
including the speech) and provide results 135b to home assistant
device 105 providing the response. Results 135b can be data such as
text that home assistant device 105 can convert into speech played
on its speakers, or results 135b can be the audio that should be
played on the speakers. As such, speech 120 can be transmitted
outside of the home's wireless network to cloud server 115 via the
Internet.
[0045] By contrast, if local hardware activation phrase 125 is
spoken, then speech 120 can be kept within local resources 140,
which can be home assistant device 105 itself, other devices within
the home's wireless network (e.g., a personal computer, laptop,
tablet, smartphone, smartwatch, etc.), or a combination of home
assistant device 105 and the other devices. For example, in FIG. 1,
speech 120 can be provided to local resources 140 and results 135a
can be provided via the speaker in a similar manner as described
above with respect to results 135b.
[0046] Some users might want to keep speech 120 within local
resources 140 rather than cloud server 115 because they might not
want sensitive content to be transmitted over the Internet to cloud
server 115 outside of the home environment. As a result, by having
two different hardware activation phrases, a user can still use
home assistant device 105 without fear of their privacy being
violated. This can also allow for users to be more comfortable
using home assistant device 105 because the user has more control
over the privacy of their speech.
[0047] FIGS. 2A and 2B illustrate an example of a block diagram for
an assistant device responding to voice input. In FIG. 2A, at block
205, a home device can receive speech. For example, in FIG. 1, home
assistant device 105 can pick up speech 110 via its microphone. In
some implementations, the speech can be recorded (e.g., saved in
memory) for analysis by a processor of home assistant device 105.
At block 210, the home device can determine which type of resource
to use based on the hardware activation phrase of speech 110. For
example, in FIG. 1, the hardware activation phrase can be local
hardware activation phrase 125 representing an intent for the
subsequent speech 120 to be contained within local resources 140 or
cloud hardware activation phrase 130 representing an acceptability
for the subsequent speech 120 to be provided to cloud server 115.
In block 215, the speech can then be provided to the resource
corresponding to the activation phrase.
[0048] If cloud hardware activation phrase 130 is spoken, then at
block 220, the speech can be received by cloud server 115 and
results based on that speech can be determined at block 225. For
example, if speech 120 included a question, then results 135b
including an answer to the question can be generated and provided
to home assistant device 105 at block 230.
[0049] If local hardware activation phrase 125 is spoken, then at
block 235 in FIG. 2B, the speech can be provided to local resources
140. Local resources 140 can receive speech 120 at block 240,
determine results based on the speech similar to block 225, and
then provide the results at block 245.
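As a non-limiting illustration, the routing of blocks 205 through
245 might be sketched in Python as follows. The sketch assumes the
speech has already been transcribed to text; the phrase values
"local" and "cloud" follow the example given earlier, and the
function name route_by_activation_phrase is hypothetical and not
part of the disclosure.

```python
from typing import Optional, Tuple

# Hypothetical phrases; the disclosure notes that users can assign their own.
LOCAL_PHRASE = "local"
CLOUD_PHRASE = "cloud"


def route_by_activation_phrase(transcript: str) -> Tuple[Optional[str], str]:
    """Return (destination, remaining speech) for a transcribed utterance.

    destination is "local", "cloud", or None when the utterance does not
    begin with either activation phrase.
    """
    words = transcript.strip().split(maxsplit=1)
    if not words:
        return None, ""
    # Compare the first word without trailing punctuation or case.
    first = words[0].strip(",.!?:;").lower()
    rest = words[1].strip() if len(words) > 1 else ""
    if first == LOCAL_PHRASE:
        return "local", rest
    if first == CLOUD_PHRASE:
        return "cloud", rest
    return None, transcript.strip()
```

In this sketch, only the speech following the activation phrase is
returned for forwarding, mirroring how speech 120 follows the
hardware activation phrase in FIG. 1.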
[0050] In some implementations, home assistant device 105 can
include an alert indicating that speech 120 in FIG. 1 is about to
be transmitted outside of the home environment to cloud server 115.
For example, a light source such as a light emitting diode (LED) of
home assistant device 105 can be turned on to indicate that speech
120 is about to be transmitted to cloud server 115. The user can
then interact with home assistant device 105, for example, by
pressing a button or using voice interaction to indicate that
speech 120 should not be transmitted to cloud server 115. In some
implementations, local resources 140 can then attempt to answer
speech 120. In some implementations, home assistant
device 105 can also indicate that speech 120 will be kept within
local resources 140 in a similar manner.
[0051] Home assistant device 105 can also be instructed to send
speech to cloud server 115 or local resources 140 based on types
of user interactions other than providing a hardware
activation phrase. For example, the user can select or touch a
button or touchscreen of home assistant device 105 to indicate that
speech should be kept within local resources 140. As another
example, the user can select an application, or "app," on a
smartphone, press a button on a remote control, press a button on a
smartwatch, etc. to indicate that speech should be kept within
local resources 140.
[0052] In some implementations, local hardware activation phrase
125 and cloud hardware activation phrase 130 can be set by the
user. For example, local hardware activation phrase 125 can be a
phrase including multiple words, a single word, a sound (e.g.,
whistling), etc. assigned by a user. The user can assign another
phrase, word, sound, etc. to cloud hardware activation phrase 130
such that they can be differentiated from each other.
[0053] Portions of speech can be provided to both cloud resources
and local resources. FIG. 3 illustrates an example of an assistant
device using local resources and cloud resources to respond to
voice input. In FIG. 3, speech 120 can be provided to home
assistant device 105 and it can determine that speech 120 includes
portion 305 that should be provided to local resources 140 and
portion 310 that should be provided to cloud server 115. That is,
speech 120 can include one portion for local resources and another
portion for cloud resources even without the use of a hardware
activation phrase. Home assistant device 105 can separate the two
portions (e.g., based on characteristics of the portions, as
discussed later herein) and provide them to the respective
resources (i.e., either local resources 140 or cloud server 115).
This yields results 315a and 315b, which are provided to home
assistant device 105. Home assistant device 105 can use both
results 315a and
315b to provide an answer to speech 120. For example, both can be
combined to provide an answer. That is, results from both local
resources 140 and cloud server 115 can be used to provide a
response to a user's speech. In some implementations, if there is
some inconsistency with the answers provided by results 315a and
315b, then one of local resources 140 or cloud server 115 can be
prioritized and the results of that one can be used for the answer
or the corresponding portion of the answer.
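As a non-limiting illustration, combining results 315a and 315b
with one resource prioritized on inconsistency might be sketched as
follows. Representing each result as a dictionary keyed by the
portion of the answer it supplies is an assumption made for this
sketch, as are the name merge_results and its prefer parameter.

```python
from typing import Dict


def merge_results(
    local: Dict[str, str], cloud: Dict[str, str], prefer: str = "local"
) -> Dict[str, str]:
    """Combine answer portions from both resources.

    When both resources supply the same portion with different values,
    the value from the preferred resource is kept.
    """
    if prefer not in ("local", "cloud"):
        raise ValueError("prefer must be 'local' or 'cloud'")
    first, second = (cloud, local) if prefer == "local" else (local, cloud)
    merged = dict(first)
    merged.update(second)  # the preferred resource overwrites conflicts
    return merged
```

Portions supplied by only one resource pass through unchanged, so
the prioritization only affects the inconsistent portion of the
answer.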
[0054] FIG. 4 illustrates an example of a block diagram for using
local resources and cloud resources to respond to voice input. In
FIG. 4, at block 405, speech can be received. For example, in FIG.
3, speech 120 having local portion 305 and cloud portion 310 can be
received. At block 410, it can be determined that the speech
includes a first portion to be provided to cloud resources and a
second portion to be provided to local resources. For example, in
FIG. 3, cloud portion 310 should be provided to cloud server 115
and local portion 305 should be provided to local resources 140. In
some implementations, the different portions of the speech can be
determined based on characteristics of one or more of the portions
of the speech (e.g., content, time, location, person or identity of
person providing the speech, etc.). For example, if a certain word
has been detected, then a portion of the speech within a time
period before and after that word was spoken can be identified as
one of the portions (e.g., local portion 305) and the rest of the
speech can be identified as the other portion (e.g., cloud portion
310). In one example, some words can be identified as being related
to sensitive speech that a user might not want to be sent to cloud
server 115, and therefore, if the certain word is identified as a
sensitive word then local portion 305 can be identified. Thus, home
assistant device 105 can include a dictionary (e.g., data in
memory) of sensitive words that can be identified. At block 415,
the different portions can be provided to the resources. For
example, new speech data for the portions can be generated and
provided to cloud server 115 and local resources 140 at blocks 420
and 435, respectively. At blocks 425 and 440, results based on the
portions can be determined by the resources. The results can then
be provided to the home assistant device at blocks 430 and 445.
Home assistant device 105 can then use both results received from
the cloud resources and local resources to provide a response. For
example, both can be combined to provide an answer to a question
that was asked. That is, results from both local resources 140 and
cloud server 115 can be used to provide a response to a user's
speech.
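As a non-limiting illustration, the determination at block 410
might be sketched as follows, using a dictionary of sensitive words
and a time period before and after each detected word. The word
list, the one-second window, and the assumption that each word
arrives with a timestamp are all hypothetical choices made for this
sketch.

```python
from typing import List, Set, Tuple

Word = Tuple[str, float]  # (word, time in seconds at which it was spoken)

SENSITIVE_WORDS: Set[str] = {"diagnosis", "salary"}  # hypothetical dictionary
WINDOW_SECONDS = 1.0  # hypothetical period before and after a sensitive word


def split_portions(words: List[Word]) -> Tuple[List[str], List[str]]:
    """Return (local_portion, cloud_portion) as lists of words.

    A word belongs to the local portion when it was spoken within
    WINDOW_SECONDS of any sensitive word; all other words form the
    cloud portion.
    """
    sensitive_times = [t for w, t in words if w.lower() in SENSITIVE_WORDS]
    local: List[str] = []
    cloud: List[str] = []
    for w, t in words:
        near = any(abs(t - s) <= WINDOW_SECONDS for s in sensitive_times)
        (local if near else cloud).append(w)
    return local, cloud
```

If no sensitive word is detected, the local portion is empty and
the whole utterance falls into the cloud portion.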
[0055] Regarding characteristics of the speech, home assistant
device 105 can determine portions of speech 120 that are relatively
sensitive, classify those portions as local portion 305, and
provide them to local resources 140. Portions that are not sensitive
can be classified as cloud portion 310 and provided to cloud server
115. For example, home assistant device 105 can develop an
understanding of a user's privacy expectations and classify speech
as local portion 305 based on the user's privacy expectations.
Thus, characteristics of the speech can result in different privacy
expectations and those privacy expectations can be used to
determine whether speech should be provided to cloud server 115 or
local resources 140.
[0056] In some implementations, home assistant device 105 can
determine who is speaking. For example, home assistant device 105
can use voice recognition to determine a particular user. In
another example, home assistant device 105 can include a camera to
visually determine who is speaking, or home assistant device 105
can access a camera connected with the home's wireless network or a
personal area network (PAN) set up by either the camera or home
assistant device 105. Based on the user interacting with home
assistant device 105, different privacy expectations can be
determined. As a result, different users can say the same speech
120, but different local portion 305 and cloud portion 310 may be
identified based on the privacy expectations of the user.
[0057] In some implementations, other characteristics of speech 120
can be used to determine the privacy expectations, and therefore,
whether local resources 140 or cloud server 115 is to be used for
speech 120 or a portion of speech 120. For example, the context
(e.g., multiple people talking, whether the user appears to be
incapacitated in some manner such as intoxicated, etc.) of speech
120 can be used. In another example the content of speech 120 can
be used, as previously discussed. For example, if the user is
identified as speaking often regarding privacy concerns, discussing
topics related to privacy, etc. then the privacy expectations of
that user can be increased. In another example, the time when
speech 120 was received by home assistant device 105 can be used.
In one example, if speech 120 was received late at night or early
in the morning, then this can indicate a higher privacy
expectation.
[0058] In another example, if a user's speech is quiet (e.g., the
volume of the speech is determined to be within a threshold volume
range or beneath a threshold volume value), then this can mean that
the user expects more privacy, and therefore, the privacy
expectations for that speech can be stricter, increasing the
likelihood of the speech or portions of the speech being provided
to local resources 140 rather than cloud server 115. If the volume
of the user's speech is loud, then this can indicate that it is not
a sensitive topic, and therefore, the speech can be provided to
cloud server 115. Thus, audio characteristics of the speech can be
used. In other examples, stuttering, mumbling, etc. can also be
used to determine the privacy expectations. For example, if a user
is stuttering, then he or she may be nervous (e.g., due to the
content of the speech) and, therefore, might not want the speech to
be provided to cloud server 115.
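As a non-limiting illustration, comparing the volume of speech
against a threshold volume value might be sketched as follows. The
use of a root-mean-square level in decibels relative to full scale,
the -30 dB threshold, and the function names are assumptions made
for this sketch; the disclosure does not specify how volume is
measured.

```python
import math
from typing import Sequence

QUIET_THRESHOLD_DBFS = -30.0  # hypothetical threshold volume value


def rms_dbfs(samples: Sequence[float]) -> float:
    """RMS level of samples normalized to [-1.0, 1.0], in dB full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")


def is_quiet(samples: Sequence[float]) -> bool:
    """True when the speech falls beneath the threshold volume value."""
    return rms_dbfs(samples) < QUIET_THRESHOLD_DBFS
```

A quiet result could then raise the privacy expectations for the
speech, making it more likely to be provided to local resources 140.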
[0059] Other characteristics of the speech of the users can be
determined to adjust the privacy expectations. For example, the
distance of the user from home assistant device 105 can be used to
determine the user's privacy expectations. If the user is close to
home assistant device 105 (e.g., determined to be within a
threshold distance range of home assistant device 105 using cameras
or audio recognition), then this can indicate that the user has
higher privacy expectations, and therefore, the speech or portions
of the speech should be provided to local resources 140 rather than
cloud server 115. If the user is farther away, then this might
indicate that the user has lower privacy expectations.
[0060] In some implementations, the location of the speech can
influence whether the speech is kept within local resources 140,
provided to cloud server 115, or both. For example, if speech is from
participants in a bedroom, then it might be kept within local
resources 140 due to that speech being from a more sensitive
location where many people have a higher expectation of privacy. By
contrast, if speech is from participants in a living room, then it
can be provided to cloud server 115. Accordingly, home assistant
device 105 can determine the location of speech and then determine
whether that speech should be kept within local resources 140,
provided to cloud server 115, or both based on the location within
the home environment.
[0061] In some implementations, home assistant device 105 can
determine that a user's privacy expectations have changed. For
example, home assistant device 105 can store the user's birthday
or age, and as the user ages, the privacy expectations can become
stricter (i.e., more speech is to be restricted to local resources
140 rather than allowed to be transmitted to cloud server 115). In
another example, as the user ages, the privacy expectations can be
more lenient (i.e., more speech is to be allowed to be transmitted
to cloud server 115).
[0062] FIG. 5 illustrates an example of a block diagram of
determining privacy expectations. In FIG. 5, at block 505,
characteristics of the speech can be determined. For example, as
previously discussed, the context of the speech, the volume of the
speech, etc. can be used to determine various characteristics. At
block 510, privacy expectations can be determined based on those
characteristics. For example, if a user is speaking quietly, then
higher privacy expectations can be determined than if the user is
speaking loudly. At block 515, the speech can be provided to cloud
resources or local resources based on the privacy expectations. For
example, higher privacy expectations can result in speech being
provided to local resources rather than cloud resources.
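As a non-limiting illustration, blocks 505 through 515 might be
sketched as follows. The additive score, its weights, the hour and
distance cutoffs, and the routing threshold are all hypothetical
values chosen for this sketch; the disclosure only states that such
characteristics can raise or lower privacy expectations.

```python
from dataclasses import dataclass


@dataclass
class SpeechCharacteristics:
    """Block 505: characteristics determined for a piece of speech."""
    quiet: bool               # volume beneath a threshold volume value
    hour: int                 # local hour of day, 0-23
    distance_m: float         # distance from the user to the device
    room: str                 # location of the speech within the home
    has_sensitive_word: bool  # content matched a sensitive-word dictionary


def privacy_expectation(c: SpeechCharacteristics) -> int:
    """Block 510: a higher score means higher privacy expectations."""
    score = 0
    if c.quiet:
        score += 1
    if c.hour >= 22 or c.hour < 6:  # late at night or early in the morning
        score += 1
    if c.distance_m <= 1.0:         # user is close to the device
        score += 1
    if c.room == "bedroom":         # a more sensitive location
        score += 1
    if c.has_sensitive_word:
        score += 2
    return score


def choose_resource(c: SpeechCharacteristics, threshold: int = 2) -> str:
    """Block 515: provide speech to local or cloud resources."""
    return "local" if privacy_expectation(c) >= threshold else "cloud"
```

For example, quiet speech from a bedroom late at night would score
at or above the threshold and be kept local, while loud daytime
speech from a living room would be provided to the cloud.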
[0063] In some implementations, home assistant device 105 can be
set with user preferences as to what should be provided to local
resources 140 and cloud server 115. In some implementations, home
assistant device 105 can learn the user's privacy expectations over
time.
[0064] Many of the aforementioned examples discuss speech including
a question. However, in other examples, the speech can include
commands. For example, home assistant device 105 can be commanded
to perform an activity, such as turn on lights, open windows, turn
on a security system, etc. in a smart home environment. In some
implementations, speech including commands can be provided to cloud
server 115, which can perform speech-to-text translation. Cloud
server 115 can then provide results to home assistant device 105
indicating what the device is supposed to do. That is, home
assistant device 105 can be provided data indicating how it should
respond to the commands, for example, by turning on lights.
Home assistant device 105 can then act on those commands. This can
allow for cloud server 115 to perform the processing to determine
the content of speech, but home assistant device 105 to actually
perform the commands rather than cloud server 115.
[0065] In some implementations, home assistant device 105 can
process a subset of possible speech on-device, but speech outside
of its capabilities can be provided to cloud server 115. For
example, home assistant device 105 might be able to recognize
speech for a small dictionary (e.g., four hundred words) so that it
can perform common commands, such as turning on lights, adjusting a
thermostat, etc. This can allow home assistant device 105 to
control various devices in the home without transmitting data to
cloud server 115, and therefore, it can still control devices even
if the Internet connection to cloud server 115 goes down. However,
more complex speech including commands can be determined to include
content outside of the dictionary, and therefore, can be provided
to cloud server 115 for processing.
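As a non-limiting illustration, deciding between on-device
processing and cloud server 115 based on a small dictionary might
be sketched as follows. The vocabulary shown is a hypothetical
stand-in for the roughly four-hundred-word dictionary mentioned
above, and the sketch assumes a candidate transcript is already
available for checking.

```python
# Hypothetical on-device vocabulary for common commands.
LOCAL_VOCABULARY = {
    "turn", "on", "off", "the", "lights",
    "set", "thermostat", "to", "up", "down",
}


def can_handle_locally(transcript: str) -> bool:
    """True when every word of the transcript is in the local vocabulary."""
    words = [w.strip(",.!?").lower() for w in transcript.split()]
    return bool(words) and all(w in LOCAL_VOCABULARY for w in words)


def dispatch(transcript: str, cloud_available: bool = True) -> str:
    """Return "local", "cloud", or "unavailable" for a command."""
    if can_handle_locally(transcript):
        return "local"  # works even if the Internet connection is down
    return "cloud" if cloud_available else "unavailable"
```

Commands within the vocabulary never depend on cloud_available,
which reflects the ability to keep controlling devices when the
connection to cloud server 115 goes down.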
[0066] Home assistant device 105 can also provide a response based
on whether results are received from cloud resources, local
resources, or both. For example, home assistant device 105 can play
back an audio response to speech 120 at different volumes based on
where the response or a portion of the response was received from.
If results 315a in FIG. 3 are received (i.e., some speech was
provided to local resources), then the volume of playback of the
response to speech 120 can be lower than if only results 315b
(i.e., speech was provided to cloud server 115) were received. In
some implementations, the response to speech 120 can be displayed
on the display screen of home assistant device 105 if speech was
provided to local resources. In another example, if the results are
only from cloud server 115, then the response can be played back on
the speaker of home assistant device 105.
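As a non-limiting illustration, selecting how a response is
presented based on which resources supplied results might be
sketched as follows. The volume values, the dictionary keys, and
the name response_plan are hypothetical choices made for this
sketch.

```python
from typing import Dict, Union

LOW_VOLUME = 0.3     # hypothetical playback level when local results are used
NORMAL_VOLUME = 0.8  # hypothetical playback level for cloud-only results


def response_plan(used_local: bool, used_cloud: bool) -> Dict[str, Union[float, bool]]:
    """Choose playback volume and display use from the result sources."""
    if not (used_local or used_cloud):
        raise ValueError("no results were received")
    if used_local:
        # Some speech was kept local, so respond more discreetly.
        return {"volume": LOW_VOLUME, "show_on_display": True}
    return {"volume": NORMAL_VOLUME, "show_on_display": False}
```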
[0067] In some implementations, privacy expectations can be
determined using many of the aforementioned examples. An increase
in privacy expectations can result in home assistant device 105
encrypting data provided to cloud server 115 more strongly, for
example, using different encryption algorithms that might take
longer for home assistant device 105 to encrypt and for cloud
server 115 to decrypt. However, some users might find a delay
acceptable if their privacy is ensured. Thus, a
hierarchy of encryption levels can provide different levels,
strengths, or types of encryption based on the determined privacy
expectations.
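As a non-limiting illustration, a hierarchy of encryption levels
keyed to a privacy score might be sketched as follows. The
disclosure does not name any algorithms; the AES-GCM labels, key
sizes, and score cutoffs here are assumptions made for this sketch,
and the code only selects a tier without performing any encryption.

```python
from typing import List, NamedTuple, Tuple


class EncryptionTier(NamedTuple):
    cipher: str    # label only; no cipher is invoked in this sketch
    key_bits: int


# (minimum privacy score, tier), listed in ascending order of score.
TIERS: List[Tuple[int, EncryptionTier]] = [
    (0, EncryptionTier("AES-GCM", 128)),
    (2, EncryptionTier("AES-GCM", 256)),
]


def select_tier(privacy_score: int) -> EncryptionTier:
    """Return the strongest tier whose minimum score is satisfied."""
    chosen = TIERS[0][1]  # default to the lowest tier
    for minimum, tier in TIERS:
        if privacy_score >= minimum:
            chosen = tier
    return chosen
```

Additional tiers could be added to the list to trade longer
processing time for stronger protection as privacy expectations
increase.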
[0068] In some implementations, home assistant device 105 can
include an intercom feature and a home environment can include
multiple home assistant devices. The different home assistant
devices can communicate with each other and other devices (e.g.,
speakers) using technology such as Bluetooth, local WLAN, etc. This
can allow users to communicate securely within a home without
having communications routed through cellular communications.
[0069] In some implementations, whether speech is provided to cloud
resources or local resources can also be based on the context of an
activity. For example, the activity can be understood through the
context of what is being communicated. In other implementations,
the context can include the time of day, past behaviors, or other
variables.
[0070] Many of the aforementioned examples discuss a home
environment. In other examples, the devices and techniques
discussed herein can also be set up in an office, public facility,
outdoors, etc.
[0071] Many of the aforementioned examples discuss speech. In other
examples, noise within the environment can be used with the devices
and techniques disclosed herein. For example, music, television
sounds, etc. can be used. In another example, environmental sounds
such as glass breaking, objects shattering, etc. can be determined
and provided to one or both of the local resources or cloud server
based on the techniques disclosed herein.
[0072] FIG. 6 illustrates an example of an assistant device. In
FIG. 6, home assistant device 105 can be an electronic device with
one or more processors 605 (e.g., circuits) and memory 610 for
storing instructions that can be executed by processors 605 to
implement privacy control 630 providing the techniques described
herein. Home assistant device 105 can also include microphone 620
(e.g., one or more microphones that can implement a microphone
array) to convert sounds into electrical signals, and therefore,
speech into data that can be processed using processors 605 and
stored in memory 610. Speaker 615 can be used to provide audio
output. Additionally, display 625 can display a graphical user
interface (GUI) implemented by processors 605 and memory 610 to
provide visual feedback. Memory 610 can be a non-transitory
computer-readable storage medium. Home assistant device 105 can also
include various other hardware, such as cameras, antennas, etc. to
implement the techniques disclosed herein. Thus, the examples
described herein can be implemented with programmable circuitry
(e.g., one or more microprocessors) programmed with software and/or
firmware, or entirely in special-purpose hardwired
(non-programmable) circuitry, or in a combination of such forms.
Special-purpose hardwired circuitry may be in the form of, for
example, one or more application specific integrated circuits
(ASICs), complex programmable logic devices (CPLDs), field
programmable gate arrays (FPGAs), structured ASICs, etc.
[0073] Those skilled in the art will appreciate that the logic and
process steps illustrated in the various flow diagrams discussed
herein may be altered in a variety of ways. For example, the order
of the logic may be rearranged, sub-steps may be performed in
parallel, illustrated logic may be omitted, other logic may be
included, etc. One will recognize that certain steps may be
consolidated into a single step and that actions represented by a
single step may be alternatively represented as a collection of
substeps. The figures are designed to make the disclosed concepts
more comprehensible to a human reader. Those skilled in the art
will appreciate that actual data structures used to store this
information may differ from the figures and/or tables shown, in
that they, for example, may be organized in a different manner; may
contain more or less information than shown; may be compressed,
scrambled and/or encrypted; etc.
[0074] From the foregoing, it will be appreciated that specific
embodiments of the invention have been described herein for
purposes of illustration, but that various modifications can be
made without deviating from the scope of the invention.
Accordingly, the invention is not limited except as by the appended
claims.
* * * * *