U.S. patent application number 15/912519, filed on 2018-03-05, was published by the patent office on 2018-10-11 for a smart Bluetooth headset for speech command.
The applicant listed for this patent is Kopin Corporation. Invention is credited to Dashen Fan, John Gassel, Frederick Herrmann, Murshed Khandaker, Christopher Parkinson.
Publication Number | 20180295656 |
Application Number | 15/912519 |
Family ID | 52463243 |
Filed Date | 2018-03-05 |
Publication Date | 2018-10-11 |
United States Patent Application | 20180295656 |
Kind Code | A1 |
Parkinson; Christopher; et al. | October 11, 2018 |
Smart Bluetooth Headset For Speech Command
Abstract
A method of interfacing with a serving device from a wearable
device worn by a user includes establishing a lossless
and wireless data link between the serving device and the wearable
device. The method further includes sending, by the serving device,
display information to the wearable device through the lossless and
wireless data link. The method also includes presenting, by the
wearable device, the display information to a display on the
wearable device. The display information may be rendered at the
wearable device. Alternatively, the display information may be
rendered at the serving device and provided to the wearable device
as an image or partial image.
Inventors: | Parkinson; Christopher (Richland, WA); Fan; Dashen
(Bellevue, WA); Herrmann; Frederick (Sharon, MA); Gassel; John
(Southborough, MA); Khandaker; Murshed (Sharon, MA) |
Applicant: |
Name | City | State | Country | Type |
Kopin Corporation | Westborough | MA | US | |
Family ID: | 52463243 |
Appl. No.: | 15/912519 |
Filed: | March 5, 2018 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
14612832 | Feb 3, 2015 | 9913302 |
15912519 | | |
61935141 | Feb 3, 2014 | |
Current U.S. Class: | 1/1 |
Current CPC Class: | H04M 1/6066 20130101; H04M 2250/74 20130101;
H04W 76/10 20180201; G10L 19/005 20130101; H04L 65/602 20130101;
G10L 19/0017 20130101 |
International Class: | H04W 76/10 20060101 H04W076/10; H04M 1/60
20060101 H04M001/60; G10L 19/005 20060101 G10L019/005; G10L 19/00
20060101 G10L019/00; H04L 29/06 20060101 H04L029/06 |
Claims
1. A method of interfacing with a serving device from a wearable
device worn by a user, the method comprising: establishing a
lossless and wireless data link between the serving device and the
wearable device; sending, by the serving device, display
information to the wearable device through the lossless and
wireless data link; and presenting, by the wearable device, the
display information to a display on the wearable device.
2. The method of claim 1, wherein the wearable device is a headset
device.
3. The method of claim 1, wherein the wearable device is a wrist
watch device.
4. The method of claim 1, further comprising rendering the display
information at the serving device, such that the display
information is one of an image and a partial image.
5. The method of claim 1, further comprising rendering the display
information at the wearable device, such that the display
information is un-rendered image data.
6. The method of claim 1, wherein the serving device is one or more
of a cellphone, a smartphone, a tablet device, a laptop computer, a
notebook computer, a desktop computer, a network server, a wearable
mobile communications device, a wearable mobile computer and a
cloud-based computing entity.
7. The method of claim 1, further including sending, from the
wearable device to the serving device, information to establish, at
the serving device, one or more components necessary to support the
lossless and wireless data link.
8. The method of claim 7, wherein the one or more components
necessary to support the lossless and wireless data link includes
(i) one or more of a custom WIFI connection and a custom Bluetooth
profile, (ii) a driver and (iii) compression/decompression
code.
9. The method of claim 1, wherein the lossless and wireless data
link is a Bluetooth link operating with a custom Bluetooth
profile.
10. A wearable device, comprising: a display; a receiver configured
to receive display information over a lossless, wireless data
channel; and a driver configured to present the display information
on the display.
11. The wearable device of claim 10, wherein the driver is further
configured to render the display information into one of an image
and a partial image and to present the one of an image and a
partial image on the display.
12. The wearable device of claim 10, wherein the display
information received is one of an image and a partial image.
13. The wearable device of claim 10, wherein the lossless, wireless
data channel is based on a Bluetooth SPP profile.
14. The headset of claim 10, further including a code deployment
module configured to convey a custom Bluetooth profile and driver
to a serving device, to facilitate implementation of the lossless
link at the serving device.
15. The headset of claim 14, wherein the code deployment module
conveys an applet to the serving device to install the custom
Bluetooth profile and driver on the serving device.
Description
RELATED APPLICATION
[0001] This application is a continuation of U.S. application Ser.
No. 14/612,832, filed Feb. 3, 2015, which claims the benefit of
U.S. Provisional Application No. 61/935,141, filed on Feb. 3, 2014.
The entire teachings of the above applications are incorporated
herein by reference.
BACKGROUND
[0002] A Bluetooth headset designed to pair with a cellphone, or
other serving device, typically employs a Bluetooth Hands-Free
Profile (HFP) or Bluetooth HeadSet Profile (HSP) to control how
audio is passed from the cellphone to the headset. The HFP or HSP
profile allows incoming audio data on the cellphone to be relayed
directly to the headset for immediate playback via a near-ear
speaker. Simultaneously, audio collected at the headset from one or
more near-mouth microphones is passed immediately to the cellphone,
which includes the collected audio in the current audio telephone
call.
SUMMARY
[0003] Bluetooth headsets may offer some form of speech recognition
to the user. Such speech recognition can be used to control
features of the cellphone and to provide the user the ability to
place calls just by speaking a command. However, to date, all
Bluetooth headsets either run the speech recognition service
directly on the Bluetooth headset itself, or use cloud-based
recognition systems. A drawback of the former speech recognition
service is the need for complex, expensive electronics in the
headset. A drawback of the latter speech recognition service is the
requirement of an always-on connection to the cloud.
[0004] In Bluetooth devices, speech recognition services have
utilized the HFP or HSP for audio data transmission. The band for
the HFP or HSP is 8 kHz, which is generally too narrow for proper
speech recognition. To address this problem, a new Bluetooth HFP
standard (v1.6), Wide-Band-Speech (WBS) with a 16 kHz sampling
rate, has been used recently, together with a compression method such
as modified subband coding (mSBC).
[0005] Both HFP and HSP, which are designed for voice transmission,
are lossy (i.e., they sometimes lose voice packets or data). HFP
and HSP typically do not re-transmit the lost voice packets at all,
or re-transmit them at most once or twice, to limit delay in the
wireless phone call and preserve the continuity of the conversation.
Losing a packet or two of speech data may be barely noticeable in
the decoded speech output. Packet erasure concealment algorithms
further reduce the speech degradation caused by missing speech
packets. More important is reducing delay or lag in the cell phone
conversation, so a lossy link is more acceptable than a high
latency link for speech channels.
[0006] While it does not have a major impact on a cell phone call,
a lost packet significantly degrades speech recognition.
Bluetooth so far does not have a standard profile to address the
problem of packet erasure when used for speech recognition
purposes. The lossy protocol in the voice channel has yet to be
addressed in Bluetooth. In addition, HFP and HSP do not cancel
enough non-stationary noises and can distort voice transmissions,
which can degrade the accuracy of speech recognition.
[0007] In an embodiment of the present invention, a standard
Bluetooth headset is improved to provide better speech recognition
and deliver information to the user. In addition, the present
invention substantially improves voice recognition by
addressing the problem of data packet loss in Bluetooth.
[0008] In some embodiments, the Bluetooth device may be, rather
than a headset, another type of wearable device. Such wearable
devices may include a wrist-worn device, a device worn on the upper
arm or other part of the body.
[0009] In one aspect, the invention may be a method of interfacing
with a serving device from a wearable device worn by a user. The
method may include establishing a lossless and wireless data link
between the serving device and the wearable device, and collecting, by
the wearable device, audio data from one or more microphones of the
wearable device. The method may further include sending, by the
wearable device, the collected audio data to the serving device
through the lossless and wireless data link.
[0010] In one embodiment, the wearable device is a headset device.
In another embodiment, the wearable device is a wrist watch
device.
[0011] One embodiment further includes providing, by the serving
device, speech recognition services associated with the audio
data.
[0012] In an embodiment, the speech recognition services include
wide band speech processing and (iii) low-distortion speech
compression.
[0013] Another embodiment further includes providing, by the
wearable device, speech compression of the collected audio
data.
[0014] In one embodiment, the serving device is one or more of a
cellphone, a smartphone, a tablet device, a laptop computer, a
notebook computer, a desktop computer, a network server, a wearable
mobile communications device, a wearable mobile computer and a
cloud-based computing entity.
[0015] Another embodiment further includes providing, by the
wearable device, noise cancellation services associated with the
collected audio data. Another embodiment further includes sending,
from the wearable device to the serving device, information to
establish, at the serving device, one or more components necessary
to support the lossless and wireless data link.
[0016] In one embodiment, the one or more components necessary to
support the lossless and wireless data link includes (i) one or
more of a custom WIFI connection and a custom Bluetooth profile,
(ii) a driver and (iii) compression/decompression code.
[0017] In another embodiment, the lossless and wireless data link
is a Bluetooth link operating with a custom Bluetooth profile.
[0018] In another aspect, the invention may be a method of
establishing a lossless and wireless data link between a serving
device and a wearable device. The method may include establishing,
by the wearable device, a wireless link of a first protocol between
the wearable device and the serving device. The method may further
include establishing, by the wearable device and using the wireless
link of the first protocol, a lossless wireless link of a second
protocol. The method may further include conveying to the serving
device, by the wearable device, information to establish, at the
serving device, one or more components necessary to support the
lossless and wireless data link.
[0019] In one embodiment, the one or more components necessary to
support the lossless and wireless data link includes a custom
Bluetooth profile, a driver and compression/decompression code. In
another embodiment, the wireless link of a first protocol is a
lossy Bluetooth link, and the wireless link of a second protocol is
a lossless Bluetooth link. In another embodiment, the lossless
Bluetooth link is based on a Bluetooth SPP profile. In another
embodiment, the lossless and wireless link of a first protocol is a
Bluetooth link operating with a custom Bluetooth profile.
[0020] In another aspect, the invention may be a wearable device,
including at least one microphone, at least one speaker, a voice
compression engine, and a driver configured to transmit voice
packets over a lossless, wireless data channel.
[0021] In one embodiment, the lossless, wireless data channel is
based on a Bluetooth SPP profile. In another embodiment, the voice
compression engine includes one or more of (i) Sub-Band-Coder, (ii)
Speex, and (iii) ETSI Distributed Speech Recognition.
[0022] One embodiment may further include a noise cancellation
engine. In another embodiment, the noise cancellation engine
receives an audio signal from two or more sources, and uses linear
noise cancellation algorithms to reduce ambient noise.
[0023] One embodiment may further include a code deployment module
configured to convey a custom Bluetooth profile and driver to a
serving device, to facilitate implementation of the lossless link
at the serving device. In another embodiment, the code deployment
module conveys an applet to the serving device to install the custom
Bluetooth profile and driver on the serving device.
BRIEF DESCRIPTION OF THE DRAWINGS
[0024] The foregoing will be apparent from the following more
particular description of example embodiments of the invention, as
illustrated in the accompanying drawings in which like reference
characters refer to the same parts throughout the different views.
The drawings are not necessarily to scale, emphasis instead being
placed upon illustrating embodiments of the present invention.
[0025] FIG. 1 is a block diagram illustrating an example embodiment
of connecting a headset with a cellphone using two audio links.
[0026] FIG. 2 is a block diagram illustrating an example embodiment
of processing and transmitting of audio signal for speech
recognition according to the invention.
[0027] FIG. 3 is a block diagram illustrating another example
embodiment of processing and transmitting of audio signal for
speech recognition according to the invention.
DETAILED DESCRIPTION
[0028] A description of example embodiments of the invention
follows.
[0029] FIG. 1, described in more detail below, is an example
embodiment of the invention. This embodiment concerns two primary
components--a headset 102 and a serving device 104, connected by
one or more wireless links. The serving device 104 may be any
device that could implement a wireless link to a hands-free
headset, including but not limited to a cellphone, a smartphone, a
tablet device, a laptop computer, a notebook computer, a desktop
computer, a network server, a wearable mobile communications
device, a wearable mobile computer or a cloud-based entity. The
wearable device may include a device worn on the user's wrist,
upper arm, leg, waist or neck, or any other body part suitable for
supporting a communications and/or computing device. Similarly, the
headset 102 component may be, rather than a headset, a device worn
on the user's wrist, upper arm, leg, waist or neck, or any other
body part suitable for supporting a wireless device (e.g., a
Bluetooth or WiFi device).
[0030] In embodiments of the present invention, the serving device
directly hosts a speech recognition service. To facilitate this
hosting, the embodiments establish a new, secondary data link from
the serving device to the headset. The secondary data link should
be lossless. The secondary data link may be a Bluetooth data link.
The secondary Bluetooth data link may be used to send the
near-mouth microphone input (or a second copy of the microphone
input, if the HFP link is active) to the serving device, which is
running a speech recognition service/speech recognition processing
software. The secondary Bluetooth data link preserves the original
Hands-free Profile link and ensures ongoing compatibility with the
cellphone's existing firmware. In taking this approach, compression
schemes can compress the audio data between serving device and
headset in ways not supported by standard Hands-Free profiles
(e.g., by using compression/decompression schemes that require a
lossless data path).
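The dependence of some compression schemes on a lossless data path can be illustrated with a short sketch. It is not the codec of the described embodiments: Python's general-purpose zlib compressor stands in for a voice codec, and the function names and the 64-byte packet size are hypothetical choices made only for illustration. The sketch shows that a compressed stream decodes correctly when every packet arrives and fails to decode when a single packet is missing.

```python
import zlib
from typing import List


def compress_to_packets(pcm: bytes, packet_size: int = 64) -> List[bytes]:
    """Compress audio bytes and split the result into link-sized packets."""
    blob = zlib.compress(pcm)
    return [blob[i:i + packet_size] for i in range(0, len(blob), packet_size)]


def decode_packets(packets: List[bytes]) -> bytes:
    """Reassemble packets and decompress; raises zlib.error on damaged input."""
    return zlib.decompress(b"".join(packets))


def survives(packets: List[bytes]) -> bool:
    """Report whether the received packets still decode."""
    try:
        decode_packets(packets)
        return True
    except zlib.error:
        return False
```

Calling `survives(packets)` on the full packet list returns True, while removing any one packet from the middle of the list makes the remaining stream undecodable, which is why such schemes are paired with a link that guarantees delivery.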
[0031] With this system setup, the user can speak a command to the
headset. The command (e.g., the spoken audio) is immediately
conveyed to the serving device via the secondary Bluetooth data
link, whereupon the audio is passed into a speech recognition
system. Depending on the commands spoken, the speech recognizer is
then able to take appropriate action, such as initiating a new call
to a given phone number.
[0032] Furthermore, with this system in place, functionality is no
longer confined to just establishing telephone calls. Natural
sentences can be spoken by the headset wearer to action other
important functions, such as "send SMS message to John that I shall
be late tonight". This sentence, when processed by speech
recognition and natural language/natural language understanding
engines on the serving device or on a network server through the
wireless link, can be used to create and send appropriate SMS
messages, for example. In the same way, the user can query the
state of the phone or perform web-based queries by speaking to the
headset and letting the serving device perform speech recognition
and execute an action appropriate to the recognized speech.
[0033] At the same time as using the secondary Bluetooth data link
to collect microphone data and send it to the serving device, the link
can also send audio from the serving device back to the headset for
playback via the near-ear speaker. In particular, this is used to
convey information back to the headset wearer via computer
generated spoken phrases, aka Text-to-Speech (TTS).
[0034] For example, software running on the serving device can
detect an incoming SMS text message. Typically a serving device
alerts the user with a chime and can display the incoming message
on the screen. In an embodiment of the present invention, the SMS
message can be converted to speech (e.g., text-to-speech) on the
server side, and the speech audio of the reading can be sent over
the Bluetooth link for playback to the user. The result here is a
system that reads aloud incoming messages to the user without the
user having to operate or look at the serving device.
[0035] This technique can be combined with the speech recognition
service to provide a two-way question and answer system. For
example, the user can now speak to the headset to ask a question
such as "what time is it?" This audio can be processed by the
speech recognition service, an answer calculated, and then spoken
aloud to the user.
[0036] FIG. 1 is a block diagram illustrating an example embodiment
of connecting a headset 102 with a serving device 104 using two
bi-directional channels: a lossless data link 106 and a lossy data
link 108. In this example embodiment, the lossless data link 106 is
a Bluetooth link using Serial Port Profile (SPP), and the lossy
data link 108 is a Bluetooth link using the headset profile (HSP)
or the hands free profile (HFP). In other embodiments, the lossless
data link 106 may be another digital data link such as WiFi or
other wireless technologies known in the art.
[0037] As will be described in more detail below, while SPP may
provide an underlying basis for a lossless data link, the profile
itself does not provide lossless transmission. As of this time,
Bluetooth does not provide a standard profile to address the
problem of packet loss, in particular when used for speech
recognition purposes. A customized profile is required, or at the
very least a modified version of the SPP is required.
[0038] In this example, the lossless data link 106 is established
and allowed to remain active as long as both the serving device 104
and the headset 102 are active (i.e., turned on). The lossy data
link 108, on the other hand, is active only when the user of the
headset 102 is making a voice call.
[0039] In this example embodiment, one or more microphones 110 on
the headset 102 collect audio data. Audio can then, optionally, be
passed through a noise cancellation module 112 on the headset 102
to reduce background noise and improve speech recognition. The use
of multiple microphones 110 may further improve the overall noise
cancellation performance by more effectively canceling both
stationary and non-stationary noises.
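One possible linear approach to two-microphone noise cancellation is an adaptive filter. The sketch below uses a normalized least-mean-squares (NLMS) filter, which is an assumption chosen for illustration; the specification does not name a particular algorithm. The function name `nlms_cancel`, the tap count, and the step size are hypothetical. The primary input is taken to be the near-mouth microphone and the reference input a microphone that picks up mostly ambient noise.

```python
from typing import List


def nlms_cancel(primary: List[float], reference: List[float],
                taps: int = 4, mu: float = 0.5,
                eps: float = 1e-8) -> List[float]:
    """Subtract an adaptively filtered reference from the primary signal."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    output = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        # Estimate of the noise component present in the primary sample.
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate  # cleaned output sample
        # Normalize the update by the reference power for stable adaptation.
        power = sum(h * h for h in history) + eps
        weights = [w + mu * error * h / power
                   for w, h in zip(weights, history)]
        output.append(error)
    return output
```

When the primary signal contains only a scaled copy of the reference noise, the output decays toward zero as the weights converge. With speech present, a practical design would typically use a smaller step size or pause adaptation while the user is talking.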
[0040] The microphone audio 114 may then be split into two streams,
as shown. One of the audio streams is sent to the lossless data
link 106 and one to the lossy data link 108.
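The routing of microphone audio onto the two links can be sketched as follows. This is a minimal illustration under stated assumptions: the class names `Link` and `HeadsetRouter` and their methods are hypothetical, and each link is modeled as a simple in-memory list rather than a real Bluetooth connection. The lossless link is treated as always active, and the lossy link carries audio only while a call is in progress.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Link:
    """In-memory stand-in for a wireless link."""
    name: str
    active: bool = False
    sent: List[bytes] = field(default_factory=list)

    def send(self, frame: bytes) -> None:
        # Frames offered to an inactive link are simply not transmitted.
        if self.active:
            self.sent.append(frame)


@dataclass
class HeadsetRouter:
    """Copies each microphone frame to both links."""
    lossless: Link = field(
        default_factory=lambda: Link("lossless", active=True))
    lossy: Link = field(default_factory=lambda: Link("lossy"))

    def set_call_state(self, in_call: bool) -> None:
        # The lossy (HFP or HSP) link exists only during a voice call.
        self.lossy.active = in_call

    def on_microphone_frame(self, frame: bytes) -> None:
        self.lossless.send(frame)
        self.lossy.send(frame)
```

In this sketch, frames captured outside a call reach only the lossless link, and frames captured during a call reach both.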
[0041] As described earlier, the lossy data link 108 is only
established between headset 102 and serving device 104 as
associated with an active telephone call. Thus, this communication
link is intermittent. When the lossy data link 108 is established,
one of the audio streams is sent to the serving device 104 as part
of the normal hands-free system. Audio is sent from serving device 104
to headset 102 over the lossless data link 106. Audio may also be
sent from the serving device 104 to the headset 102 over the lossy
(HFP or HSP) data link 108, in the event that the serving device
operation requires a call to occur over the HFP or HSP data link
108. In some embodiments, the audio may be in the form of computer
generated spoken phrases (e.g., Text-To-Speech service), which are
played back on the headset.
[0042] If a Bluetooth Hands-free call is active, the audio is also
played back on the headset 102 and merged with any spoken phrases
from the lossless data link 106 (also referred to herein as
command/control link). The audio received through the lossless data
link 106 may be given priority by temporarily muting the telephone
call speech from the lossy data link 108, or the two audio signals
may be mixed so the user hears both simultaneously, or the audio
from the lossy data link 108 may be temporarily attenuated (i.e.,
partially muted), to make it easier to hear the audio from the
lossless data link 106.
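The three playback options described above (muting, mixing, and attenuating) can be sketched as follows. The sketch assumes both links deliver equal-length blocks of signed 16-bit PCM samples; the function name `merge_playback`, the mode names, and the default attenuation gain of 0.25 are hypothetical and are not taken from the specification.

```python
from typing import List

INT16_MIN, INT16_MAX = -32768, 32767


def clip16(value: float) -> int:
    """Clamp a sample to the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, int(round(value))))


def merge_playback(call: List[int], prompt: List[int],
                   mode: str = "attenuate",
                   duck_gain: float = 0.25) -> List[int]:
    """Merge call audio (lossy link) with prompt audio (lossless link).

    mode "mute": call audio is silenced while the prompt plays.
    mode "mix": both signals are summed at full level.
    mode "attenuate": call audio is scaled by duck_gain, then summed.
    """
    if len(call) != len(prompt):
        raise ValueError("blocks must have the same length")
    gains = {"mute": 0.0, "mix": 1.0, "attenuate": duck_gain}
    if mode not in gains:
        raise ValueError(f"unknown mode: {mode}")
    gain = gains[mode]
    return [clip16(gain * c + p) for c, p in zip(call, prompt)]
```

Expressing all three options as a single gain applied to the call audio keeps the playback path identical in every mode; only the gain value changes.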
[0043] FIG. 2 and FIG. 3 are block diagrams illustrating example
embodiments of processing and transmission of audio speech signal
for speech recognition. In these example embodiments, audio
information is conveyed between a headset 202 and a serving device
204 across a bidirectional, lossless, wireless data link.
[0044] In the example embodiment shown in FIG. 2, the audio speech
signal is collected from two or more microphones 206, and processed
by a noise cancellation module 208. In one embodiment, noise
cancellation may be processed using linear algorithms to avoid
introducing any non-linear distortion to the speech signal. FIG. 2
illustrates compression of the speech signal with a voice
compression module 210. The compressed speech signal is sent to the
serving device 204 across a lossless, bidirectional, wireless data
link 212, for example a Serial Port Profile (SPP) Bluetooth data
link.
[0045] The serving device 204 receives the compressed speech signal
from the lossless data link 212 and decompresses the compressed
speech data using a voice decompression module 214. The resulting
voice data, acquired through a lossless data path, can be used by
an Automatic Speech Recognition (ASR) engine and/or a Natural
Language Processing engine 216.
[0046] The serving device 204 may have digital speech files (e.g.,
Text-To-Speech (TTS) or WAVE (.wav format)) to send to the headset
202. The speech data is first compressed by a voice compression
module and sent to the headset through the lossless data link 212.
A voice decompression module 222 decompresses the speech data and
provides the data to a TTS or WAVE play module 224, which converts
the audio file to an audio signal that drives a speaker 226.
[0047] FIG. 3 illustrates an embodiment that provides front-end
feature extraction and noise cancellation in the headset 302, with
an ASR backend and a natural language processing (NLP) engine in
the serving device. As with the embodiment of FIG. 2, audio is
collected with two or more microphones 306, and a noise cancellation
module 308 reduces ambient noise. Data passes between the headset
302 and the serving device 304 over a lossless data link 312, to an
ASR backend module 330 at the serving device 304. The ASR backend
module 330 provides the processed speech data to an NLP engine. As
with the embodiment shown in FIG. 2, TTS/WAVE files 318 may be
transferred from the serving device 304 to the headset 302 through
a voice compression module 320, the lossless data link 312, a voice
decompression module 322 and a TTS or WAVE player driving a speaker
326. In other embodiments, WAVE files may be stored on the headset,
and initiated for playback on the headset by a simple command
conveyed by the serving device.
[0048] The features highlighted by FIG. 2 and FIG. 3 are examples
of how the described embodiments may provide useful
functionality. These embodiments may be combined with each other,
or with other embodiments that provide other features.
[0049] The following are examples of voice compression techniques
that may be employed for speech recognition in the described
embodiments:
[0050] Sub-Band-Coder (SBC)
[0051] Bluetooth WBS mSBC
[0052] Speex (or other Code Excited Linear Prediction (CELP) based
compression algorithms)
[0053] Opus
[0054] European Telecommunications Standards Institute (ETSI)
Distributed Speech Recognition (DSR)
[0055] As described above, the Bluetooth Serial Port Profile (SPP)
does not by itself provide lossless transmission. The described
embodiments, however, when used in conjunction with Bluetooth SPP,
do create a lossless data link. The described embodiments implement
at least a custom Bluetooth profile and driver to implement the
operations necessary for a lossless link. Such operations may
include retransmission protocols such as Automatic Repeat reQuest
(ARQ), Hybrid ARQ (HARQ), and other lost packet recovery techniques
known in the art. Some embodiments include custom software in both
ends of the Bluetooth link. The software may include custom
Bluetooth profile(s), driver(s) and compression/decompression
codes.
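The retransmission idea behind ARQ can be sketched with a stop-and-wait scheme over a simulated lossy channel. This is an illustrative simplification, not the custom profile of the described embodiments and not a Bluetooth API: the names `LossyChannel` and `send_reliably`, the one-byte sequence header, and the attempt limit are all hypothetical. The sender repeats each packet until an acknowledgement arrives, and the receiver discards duplicates.

```python
import random
from typing import List, Optional


class LossyChannel:
    """Simulated link that drops each frame with a fixed probability."""

    def __init__(self, loss_rate: float, seed: int = 0) -> None:
        self.loss_rate = loss_rate
        self.rng = random.Random(seed)

    def deliver(self, frame: bytes) -> Optional[bytes]:
        # Return None to model a frame that never arrives.
        return None if self.rng.random() < self.loss_rate else frame


def send_reliably(packets: List[bytes], channel: LossyChannel,
                  max_attempts: int = 50) -> List[bytes]:
    """Stop-and-wait ARQ: resend each packet until it is acknowledged."""
    received: List[bytes] = []
    for seq, payload in enumerate(packets):
        frame = bytes([seq % 256]) + payload  # 1-byte sequence header
        for _ in range(max_attempts):
            arrived = channel.deliver(frame)
            if arrived is None:
                continue  # data frame lost: time out and retransmit
            # Accept the frame once; a repeat after a lost ACK is ignored.
            if len(received) == seq:
                received.append(arrived[1:])
            ack = channel.deliver(bytes([seq % 256]))
            if ack is not None:
                break  # ACK seen by the sender, move to the next packet
        else:
            raise ConnectionError(f"packet {seq} not acknowledged")
    return received
```

The cost of this reliability is added latency whenever a frame or acknowledgement is lost, which is acceptable for speech recognition input but is the reason the voice-call profiles limit retransmission.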
[0056] Some embodiments modify the Bluetooth SPP to provide a
lossless data link, while other embodiments provide a completely
custom Bluetooth profile to provide a lossless data link suitable
for ASR. It should also be noted that while the example embodiments
utilize Bluetooth to provide a wireless link, the described
embodiments may utilize other wireless protocols and interfaces to
provide the described benefits.
[0057] The described embodiments may also provide techniques for
installing the aforementioned custom software and codes at the
serving device side. In some embodiments, the serving device side
may include a pre-installed custom driver. In other embodiments,
the Bluetooth Hands-free device can download an applet (or other
vehicle for conveying the necessary drivers and software) to the
serving device through the Bluetooth SPP link described above, once
that Bluetooth link is established.
[0058] The described embodiments can easily be extended to
accommodate a display on the Bluetooth headset. In such an
extension, the information required for display on the headset can
be sent from the cellphone to the headset using the always-on
command and control link. Information can be sent and rendered by
the headset. Alternatively, information can be rendered by the
cellphone and sent as an image or partial image to the headset for
display. This latter method allows for the headset firmware to be
simple and flexible--all of the hard work is done by the
cellphone.
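The partial-image option can be sketched as a bounding-box update. The sketch assumes a frame is a list of rows of integer pixel values and that the serving device keeps a copy of the last frame it sent; the names `diff_patch` and `apply_patch` and the patch layout are hypothetical. The serving device sends only the rectangle that changed, and the wearable device copies that rectangle into its display buffer.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]          # rows of pixel values
Patch = Tuple[int, int, Frame]   # (top row, left column, pixel block)


def diff_patch(old: Frame, new: Frame) -> Optional[Patch]:
    """Serving-device side: bounding box of changed pixels, or None."""
    changed = [(r, c) for r, row in enumerate(new)
               for c, px in enumerate(row) if px != old[r][c]]
    if not changed:
        return None
    top = min(r for r, _ in changed)
    bottom = max(r for r, _ in changed)
    left = min(c for _, c in changed)
    right = max(c for _, c in changed)
    block = [row[left:right + 1] for row in new[top:bottom + 1]]
    return top, left, block


def apply_patch(display: Frame, patch: Patch) -> None:
    """Wearable side: copy the received block into the display buffer."""
    top, left, block = patch
    for dr, row in enumerate(block):
        display[top + dr][left:left + len(row)] = row
```

Because the wearable side only copies pixels, its firmware stays simple, which matches the division of labor described above.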
[0059] It will be apparent that one or more embodiments, described
herein, may be implemented in many different forms of software and
hardware. Software code and/or specialized hardware used to
implement embodiments described herein is not limiting of the
invention. Thus, the operation and behavior of embodiments were
described without reference to the specific software code and/or
specialized hardware--it being understood that one would be able to
design software and/or hardware to implement the embodiments based
on the description herein.
[0060] Further, certain embodiments of the invention may be
implemented as logic that performs one or more functions. This
logic may be hardware-based, software-based, or a combination of
hardware-based and software-based. Some or all of the logic may be
stored on one or more tangible computer-readable storage media and
may include computer-executable instructions that may be executed
by a controller or processor. The computer-executable instructions
may include instructions that implement one or more embodiments of
the invention. The tangible computer-readable storage media may be
volatile or non-volatile and may include, for example, flash
memories, dynamic memories, removable disks, and non-removable
disks.
[0061] While this invention has been particularly shown and
described with references to example embodiments thereof, it will
be understood by those skilled in the art that various changes in
form and details may be made therein without departing from the
scope of the invention encompassed by the appended claims.
* * * * *