U.S. patent application number 16/732549 was filed with the patent office on 2020-01-02 and published on 2020-04-30 for user-controlled beam steering in microphone array.
The applicant listed for this patent is Bose Corporation. Invention is credited to Darren W. Abrams, Pelham Norville, William D. Rice, Noah J. Stupak, John A. Trotter, William H. A. Voigt.
United States Patent Application 20200137487
Kind Code: A1
Abrams; Darren W.; et al.
April 30, 2020
USER-CONTROLLED BEAM STEERING IN MICROPHONE ARRAY
Abstract
Various aspects include approaches for controlling acoustic
beams. In some particular aspects, a system includes: a microphone
array; a camera system for detecting a visual characteristic of a
user; and a control system coupled with the microphone array and
the camera system, the control system programmed to control the
microphone array by: receiving a camera signal from the camera
system; and focusing the microphone array in response to the camera
signal indicating the visual characteristic of the user, prior to
receiving an audio input from the user at the microphone array.
Inventors: Abrams; Darren W.; (Stow, MA); Voigt; William H. A.; (Southborough, MA); Norville; Pelham; (Wayland, MA); Rice; William D.; (Wayland, MA); Stupak; Noah J.; (Arlington, MA); Trotter; John A.; (Sudbury, MA)
Applicant: Bose Corporation, Framingham, MA, US
Family ID: 63528920
Appl. No.: 16/732549
Filed: January 2, 2020
Related U.S. Patent Documents

Application Number | Filing Date | Patent Number
15687961 (parent) | Aug 28, 2017 | 10547937
16732549 (present application) | Jan 2, 2020 | —
Current U.S. Class: 1/1
Current CPC Class: H04N 21/42222 20130101; H04R 2420/07 20130101; H04R 2430/20 20130101; G10L 2021/02166 20130101; H04R 3/005 20130101; G06F 3/002 20130101; H04N 2005/4428 20130101; G06F 3/017 20130101; G10L 21/0208 20130101; G10L 21/028 20130101; G06F 3/165 20130101; H03G 3/32 20130101; H04N 5/60 20130101; H04N 7/15 20130101; H04R 1/406 20130101
International Class: H04R 1/40 20060101 H04R001/40; G10L 21/0208 20060101 G10L021/0208; H03G 3/32 20060101 H03G003/32; G10L 21/028 20060101 G10L021/028; G06F 3/16 20060101 G06F003/16
Claims
1. A system comprising: a microphone array; a camera system for
detecting a visual characteristic of a user; and a control system
coupled with the microphone array and the camera system, the
control system programmed to control the microphone array by:
receiving a camera signal from the camera system; and focusing the
microphone array in response to the camera signal indicating the
visual characteristic of the user, prior to receiving an audio
input from the user at the microphone array.
2. The system of claim 1, wherein the control system is further
configured to compare the visual characteristic of the user
detected by the camera system with a stored visual characteristic
library, wherein focusing the microphone array is performed in
response to the detected visual characteristic matching a visual
characteristic in the stored visual characteristic library.
3. The system of claim 2, wherein the visual characteristic library
comprises a thresholding mechanism for comparing the camera signal
with pre-sorted video data or pre-sorted image data to determine
whether the detected visual characteristic is linked with a focus
direction for the microphone array.
4. The system of claim 1, wherein the visual characteristic
comprises at least one of: a detected presence of the user, a
detected location of the user, a detected identity of the user, or
a detected gesture by the user.
5. The system of claim 1, wherein the microphone array is contained
within an audio and video conferencing system.
6. The system of claim 5, wherein the audio and video conferencing
system includes a video display for providing video captured by the
camera system at a remote geographic location.
7. The system of claim 5, further comprising a directional
controller for indicating a desired direction of focus for the
microphone array.
8. The system of claim 7, wherein the directional controller is at
a remote geographic location from the microphone array when sending
the command indicating the desired direction of focus, and wherein
the directional controller is configured to send a command signal
through the audio and video conferencing system to indicate the
desired direction of focus for the microphone array at the remote
geographic location.
9. The system of claim 1, further comprising a directional
controller for training a visual characteristic engine to identify
the visual characteristic of the user.
10. The system of claim 1, wherein the control system includes a
digital signal processor for modifying the focus direction of the
microphone array toward a physical location of the user or away
from the physical location of the user as detected by the camera
system.
11. An audio and video conferencing system comprising: a microphone
array; a camera system for detecting a visual characteristic of a
user at a first geographic location; and a control system coupled
with the microphone array and the camera system, the control system
programmed to control the microphone array by: receiving a camera
signal from the camera system; and focusing the microphone array
toward the user at the first geographic location in response to the
camera signal indicating the visual characteristic of the user,
prior to receiving an audio input from the user at the microphone
array.
12. The system of claim 11, wherein the control system is further
configured to compare the visual characteristic of the user
detected by the camera system with a stored visual characteristic
library, wherein focusing the microphone array is performed in
response to the detected visual characteristic matching a visual
characteristic in the stored visual characteristic library.
13. The system of claim 12, wherein the visual characteristic
library comprises a thresholding mechanism for comparing the camera
signal with pre-sorted video data or pre-sorted image data to
determine whether the detected visual characteristic is linked with
a focus direction for the microphone array.
14. The system of claim 11, wherein the visual characteristic
comprises at least one of: a detected presence of the user, a
detected location of the user at a geographic location, a detected
identity of the user, or a detected gesture by the user.
15. The system of claim 11, further comprising a video display and
a speaker located at a second, distinct geographic location.
16. The system of claim 15, further comprising a directional
controller for indicating a desired direction of focus for the
microphone array.
17. The system of claim 16, wherein the directional controller is
at the second, distinct geographic location when sending the
command indicating the desired direction of focus.
18. The system of claim 17, wherein the directional controller is
configured to send a command signal through the audio and video
conferencing system to indicate a desired direction of focus for
the microphone array at the first geographic location.
19. The system of claim 11, further comprising a directional
controller for training a visual characteristic engine to identify
the visual characteristic of the user.
20. The system of claim 11, wherein the control system includes a
digital signal processor for modifying the focus direction of the
microphone array toward a physical location of the user or away
from the physical location of the user as detected by the camera
system.
Description
PRIORITY CLAIM
[0001] This application is a continuation application of U.S.
patent application Ser. No. 15/687,961, filed on Aug. 28, 2017,
which is hereby incorporated by reference in its entirety.
TECHNICAL FIELD
[0002] This disclosure generally relates to acoustic beam steering.
More particularly, the disclosure relates to controlling beam
direction in a microphone array based upon one or more user
inputs.
BACKGROUND
[0003] Directionality in acoustic signals can greatly affect the
user experience. In dynamic environments, where users and/or
noise-generating devices move relative to a microphone array and/or
speaker, systems that fail to adapt to changes in relative position
may deliver an undesirable user experience.
SUMMARY
[0004] Various aspects of the disclosure include approaches for
controlling acoustic beams. In a first aspect, a system includes: a
microphone array; and a control system coupled with the microphone
array, the control system programmed to control the microphone
array by: processing a signal from a beacon device to identify a
physical location of the beacon device relative to the microphone
array; and focusing the microphone array in a direction based upon
the physical location of the beacon device prior to receiving an
audio input at the microphone array.
[0005] A second aspect of the disclosure includes an apparatus
having: a microphone array; and a control system coupled with the
microphone array, the control system programmed to: receive a
command from a directional controller indicating a desired
direction of focus for the microphone array; and focus the
microphone array in the desired direction of focus prior to
transmitting an audio output or receiving an audio input at the
microphone array.
[0006] A third aspect of the disclosure includes a system having: a
control system programmed to control a microphone array by:
providing a prompt to initiate at least one noise source for
sampling using the microphone array; receiving a set of noise
source samples of the at least one noise source at the microphone
array; creating a microphone array filter configured to at least
partially reject the at least one noise source from the set of
noise source samples; and storing the microphone array filter in a
library of microphone array filters.
[0007] A fourth aspect of the disclosure includes a system having:
a microphone array; a camera system for detecting a visual
characteristic of a user; and a control system coupled with the
microphone array and the camera system, the control system
programmed to control the microphone array by: receiving a camera
signal from the camera system; and focusing the microphone array in
response to the camera signal indicating the visual characteristic
of the user, prior to receiving an audio input from the user at the
microphone array.
[0008] A fifth aspect of the disclosure includes an audio and video
conferencing system having: a microphone array; a camera system for
detecting a visual characteristic of a user at a first geographic
location; and a control system coupled with the microphone array
and the camera system, the control system programmed to control the
microphone array by: receiving a camera signal from the camera
system; and focusing the microphone array toward the user at the
first geographic location in response to the camera signal
indicating the visual characteristic of the user, prior to receiving
an audio input from the user at the microphone array.
[0009] Implementations may include one of the following features,
or any combination thereof.
[0010] In certain cases, the control system is further configured
to compare the visual characteristic of the user detected by the
camera system with a stored visual characteristic library, where
focusing the microphone array is performed in response to the
detected visual characteristic matching a visual characteristic in
the stored visual characteristic library.
[0011] In particular aspects, the visual characteristic library
includes a thresholding mechanism for comparing the camera signal
with pre-sorted video data or pre-sorted image data to determine
whether the detected visual characteristic is linked with a focus
direction for the microphone array.
[0012] In some implementations, the visual characteristic includes
at least one of: a detected presence of the user, a detected
location of the user, a detected identity of the user, or a
detected gesture by the user.
[0013] In certain aspects, the microphone array is contained within
an audio and video conferencing system.
[0014] In particular cases, the audio and video conferencing system
includes a video display for providing video captured by the camera
system at a remote geographic location.
[0015] In some aspects, the system further includes a directional
controller for indicating a desired direction of focus for the
microphone array.
[0016] In certain cases, the directional controller is at a remote
geographic location from the microphone array when sending the
command indicating the desired direction of focus.
[0017] In particular implementations, the directional controller is
configured to send a command signal through the audio and video
conferencing system to indicate the desired direction of focus for
the microphone array at the remote geographic location.
[0018] In some cases, the system further includes a directional
controller for training a visual characteristic engine to identify
the visual characteristic of the user.
[0019] In certain aspects, the control system includes a digital
signal processor for modifying the focus direction of the
microphone array toward a physical location of the user or away
from the physical location of the user as detected by the camera
system.
[0020] In particular cases, the system further includes a video
display and a speaker located at a second, distinct geographic
location.
[0021] Two or more features described in this disclosure, including
those described in this summary section, may be combined to form
implementations not specifically described herein.
[0022] The details of one or more implementations are set forth in
the accompanying drawings and the description below. Other
features, objects and advantages will be apparent from the
description and drawings, and from the claims.
DESCRIPTION OF THE DRAWINGS
[0023] FIG. 1 shows a schematic depiction of an environment
illustrating a microphone array control system according to various
implementations.
[0024] FIG. 2 shows a close-up depiction of portions of the
microphone array control system of FIG. 1.
[0025] FIG. 3 is an example polar plot illustrating aspects of
microphone array control.
[0026] FIG. 4 is an example polar plot illustrating a control
process for modifying the polar plot of FIG. 3.
[0027] FIG. 5 shows a schematic depiction of an environment
illustrating a microphone array control system according to various
additional implementations.
[0028] FIG. 6 shows a schematic depiction of a microphone array
control system according to various further implementations.
[0029] It is noted that the drawings of the various implementations
are not necessarily to scale. The drawings are intended to depict
only typical aspects of the disclosure, and therefore should not be
considered as limiting the scope of the invention. In the drawings,
like numbering represents like elements between the drawings.
DETAILED DESCRIPTION
[0030] As noted herein, various aspects of the disclosure generally
relate to acoustic beam steering. More particularly, aspects of the
disclosure relate to controlling beam direction in a microphone
array based upon one or more user inputs.
[0031] Commonly labeled components in the FIGURES are considered to
be substantially equivalent components for the purposes of
illustration, and redundant discussion of those components is
omitted for clarity.
[0032] Various implementations of the disclosure include approaches
for providing user control of microphone arrays to enhance user
experiences and system responses. In conventional microphone
systems, such as those used in virtual personal assistant systems
(VPAs), voice user interface systems (VUIs), smart speaker systems,
conferencing systems, etc., a control system dictates the direction
of the beam formed by the microphone array (or, "microphone array
beam," or simply, "beam") based upon an audio input, such as the
voice of one or more users. The microphone array can detect voice
directionality, e.g., that a user is speaking in a particular
portion of the room, and the control system can use digital signal
processing (DSP) to modify the direction of the beam in order to
enhance the signal to noise ratio of the audio signal from that
user.
[0033] However, these conventional systems limit the user's ability
to direct the array, and can struggle to accurately parse commands
and/or other audio signals through noise. In contrast to these
conventional systems and approaches, various aspects of the
disclosure include apparatuses and systems for controlling
microphone array directionality through user-based commands. In
some cases, a system configured to receive beacon signals is
disclosed. This beacon-based system can adaptively direct a
microphone array based upon the physical location of a beacon
device. In particular cases, this beacon-based system can
pre-select the direction of the beam formed by the microphone array
based upon the beacon location, thereby enhancing signal quality
and signal-to-noise ratio when compared with conventional systems
which require an audio input signal to determine direction.
[0034] In some other cases, an apparatus is configured to receive
signals from a (microphone) directional controller. This apparatus
can be remotely controlled (e.g., in a remote physical location
such as one separated from the microphone array by many miles), or
may be locally controlled (e.g., within a localized region such as
a single room, home or office). In some particular implementations,
the directional controller can enable a remote user to control the
direction of a microphone array in order to enhance the audio
signal ultimately transmitted to that remote user at his/her remote
location. As will be described herein, in various implementations,
the directional controller can be used as a beacon.
[0035] In still other cases, a system is configured to interact
with a user in order to sample noise sources and create customized
microphone array filters for subsequently neutralizing those types
of noise. This system can provide a prompt to the user or other
system control in order to sample noise sources (e.g., common noise
sources in a home, office or other environment), and can create and
store specific microphone array filters for those noises for later
implementation. It is understood that the term "user prompt" may
not require a human user to initiate the noise sampling process.
For example, in some cases, the "user" may be a robotic user, an
artificial intelligence (AI) system or a control system configured
to initiate the noise source in response to a prompt. In any case,
this system can be configured to enhance audio signal quality at
the microphone array by filtering specific noise signals.
[0036] FIG. 1 shows an illustrative environment 10 including a
system 20 according to various implementations. System 20 can
include a microphone array 30. The microphone array 30 can include
a plurality of microphones 40 for receiving audio signals, e.g.,
such as commands from one or more users 50 (distinct users 50 shown
as 50A, 50B, 50C, etc.). While a plurality of users 50 (e.g., 50A,
50B, 50C) are shown in various depictions, it is understood that
the systems and apparatuses described according to various
implementations may be utilized by a single user (e.g., 50A) moving
between a plurality of locations (e.g., Location A, Location B,
Location C). Microphone array 30 can be coupled with a control
system 60 for modifying the beam direction of microphone array 30
based upon one or more conditions, as discussed herein. As noted
herein, control system 60 can include conventional hardware and/or
software components for executing program instructions or code
according to processes described herein. For example, control
system 60 may include one or more processors, memory,
communications pathways between components, and/or one or more
logic engines for executing program code. Control system 60 can be
coupled with microphone array 30 via any conventional wireless
and/or hardwired connection which allows control system 60 to
send/receive signals to/from microphone array 30 and control
operation thereof. In various implementations, control system 60
and microphone array 30 are collectively housed in a speaker system
70 (shown optionally in phantom), which may include a speaker 80
for transmitting audio signals to user(s) 50. Environment 10 can
further include beacon device(s) 90 for providing location
information to control system 60 according to processes described
herein. Control system 60 is programmed to control microphone array
30 according to various approaches, as noted herein. For example,
in one implementation as illustrated in FIG. 1, control system 60
is programmed to control microphone array 30 by: i) Processing a
signal from the beacon device 90 to identify a physical location
(e.g., Location A, Location B, Location C) of the beacon device 90
relative to the microphone array 30; and ii) Focusing the
microphone array 30 in a direction (e.g., Direction A, Direction B,
Direction C) based upon the physical location (e.g., Location A,
Location B, Location C) of beacon device 90 prior to transmitting
an audio output 110 or receiving an audio input 120 at the
microphone array 30.
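For purposes of illustration only, the following Python sketch shows one plausible realization of this control flow; the function names (bearing_to_beacon, steer_array) and the message format are assumptions introduced here, not part of the disclosure, and the negation handling anticipates the focus/negation modes described below.

```python
import math

# Minimal sketch of the pre-focusing flow in paragraph [0036]. All names
# and the location-signal format are illustrative assumptions.

def bearing_to_beacon(array_xy, beacon_xy):
    """Angle (degrees) from the microphone array to the beacon device."""
    dx = beacon_xy[0] - array_xy[0]
    dy = beacon_xy[1] - array_xy[1]
    return math.degrees(math.atan2(dy, dx))

def pre_focus(array_xy, location_signal, steer_array):
    """Focus the array toward (or away from) a beacon before any audio input."""
    angle = bearing_to_beacon(array_xy, location_signal["xy"])
    if location_signal.get("mode") == "negation":
        angle = (angle + 180.0) % 360.0  # steer away from a negation beacon
    steer_array(angle)  # e.g., hand the angle to the DSP beamformer
    return angle

# Example: a focus-mode beacon at Location A, two meters off-axis.
pre_focus((0.0, 0.0), {"xy": (2.0, 1.0), "mode": "focus"},
          steer_array=lambda a: print(f"array pre-focused toward {a:.1f} deg"))
```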
[0037] That is, system 20 can permit focusing of microphone array
30 without the need for an audio input 120 at microphone array 30
regarding a focus direction. This pre-focusing process can allow
for improved signal quality in commands, transmissions, etc. from
the desired direction (e.g., Direction A, Direction B, Direction
C), as the microphone array 30 is already pre-positioned to focus
on that direction based upon input from beacon device 90.
[0038] Beacon device 90 can include any device capable of being
located within environment 10, and in some cases, can include at
least one of: a Bluetooth beacon device, a mobile phone, an
electronic tablet, a headset or headphones, a controller such as a
game controller, a computing device, a global positioning system
(GPS) transceiver, a wearable electronic device (e.g., smart watch,
smart glasses, smart headphones or smart headsets) or a remote
control with location-based capabilities. In some cases, beacon
device 90 includes a wireless transceiver 125 (shown in one example
beacon device 90 in FIG. 1) configured to communicate with a
wireless transceiver (WT) 130 in the control system 60.
[0039] According to various implementations, beacon device 90 is
movable across a plurality of physical locations (Location A,
Location B, Location C, etc.) relative to the microphone array 30,
and can provide information about its physical location (Location
A, Location B, Location C, etc.) relative to microphone array 30.
While multiple users 50A, 50B, 50C are shown at distinct locations
(Location A, Location B, etc.) for illustrative purposes, it is
understood that each user 50 could represent a single user (e.g.,
50A) moving between locations. Additionally, different types of
beacon devices 90 are illustrated in FIG. 1, e.g., a smaller
wearable beacon device 90 on user 50A and a hand-held beacon device 90
carried by user 50B. At Location C, a tag-style beacon device 90 could
be used to designate this location within environment 10 according
to various implementations.
[0040] In some cases, beacon device 90 can include an electronic
device configured to communicate with control system 60 directly,
or via a communications protocol (or, network), such as any
conventional wireless communications protocol. In some cases,
communications protocol can include a Wi-Fi protocol using a wireless
local area network (LAN) communication protocol, such as IEEE 802.11
b/g; a cellular network-based protocol (e.g., a third, fourth or fifth
generation (3G, 4G, 5G) cellular network); or one of a plurality of
internet-of-things (IoT) protocols, such as: Bluetooth, Bluetooth Low
Energy (BLE), ZigBee (mesh LAN), Z-Wave (sub-GHz mesh network),
6LoWPAN (a lightweight IP protocol), LTE protocols, RFID,
ultrasonic audio protocols, etc.
In various implementations, beacon device 90 sends a location
signal 150 to control system 60 (e.g., via one or more
communications protocols described herein) indicating the physical
location (Location A, Location B, etc.) of beacon device 90
relative to control system 60. In some particular cases, control
system 60 logs or otherwise stores location information (Location
A, Location B, etc.) about beacon device(s) 90 from a location
signal 150 until a later location signal 150 is received from the
beacon device(s) 90. This process may allow control system 60 to
direct the microphone array 30 in a direction (Direction A,
Direction B, etc.) without polling or otherwise querying beacon
device(s) 90. In various implementations, beacon device 90 is
configured to operate in two distinct modes: active mode and
passive mode. In passive mode, beacon device 90 remains in a sleep
(non-signaling) configuration until it receives a ping or other
location-requesting signal from speaker system 70 (e.g., a
location-based ping). In active mode, beacon device 90 is
configured to continuously or periodically send location signals
without requiring a ping or other location-requesting signal from
speaker system 70. Additionally, passive vs. active modes can be
controlled or otherwise modified through a software application
stored or otherwise implemented (e.g., via cloud-based retrieval)
in control system 60 and/or on beacon device 90, such that control
system 60 sends a query signal 160 to beacon device 90 whenever
beacon device 90 switches from a passive mode to an active mode;
and beacon device 90 may respond with a location signal 150
updating the location information (Location A, Location B, etc.)
about beacon device 90. Additionally, or alternatively, the
software application may cause beacon device 90 to switch from
passive mode to active mode and send location signal 150 whenever
beacon device 90 moves within environment 10, e.g., moves a
distance that exceeds a threshold distance.
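For illustration, the passive/active behavior described above might be sketched as follows, under the assumption (not stated in the disclosure) that pings and movement updates arrive as callbacks and that the movement trigger is a fixed distance threshold:

```python
# Sketch of passive vs. active beacon modes from paragraph [0041].
# The threshold value and message format are illustrative assumptions.

MOVE_THRESHOLD_M = 1.0  # active beacons report after moving this far

def dist(a, b):
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

class BeaconDevice:
    def __init__(self, send):
        self.mode = "passive"      # starts asleep (non-signaling)
        self.last_reported = None
        self.send = send           # callable that transmits location signal 150

    def on_ping(self, current_xy):
        # Passive mode: answer only a location-requesting ping from the speaker.
        if self.mode == "passive":
            self.send({"type": "location", "xy": current_xy})

    def on_move(self, current_xy):
        # Active mode: report unprompted once movement exceeds the threshold.
        if self.mode == "active" and (
            self.last_reported is None
            or dist(current_xy, self.last_reported) > MOVE_THRESHOLD_M
        ):
            self.send({"type": "location", "xy": current_xy})
            self.last_reported = current_xy
```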
[0042] Location signals 150 can include data indicating a location
of beacon device 90, and can be sent via one or more location-based
protocols. For example, as noted herein, beacon device 90 can
include a wireless transceiver 125 for communicating with a
wireless transceiver 130 in control system 60. In some cases,
beacon device 90 can include a Bluetooth beacon device configured
to communicate (e.g., via location signals 150) with a Bluetooth
transceiver at control system 60. In other cases, beacon device 90
can include a global positioning system (GPS) transceiver or
satellite-based transceiver configured to send location signals 150
to control system 60. In still other implementations, beacon device
90 can include a Wi-Fi access point location system (e.g.,
including an antenna array on beacon device 90 and/or speaker
system 70) configured to transmit location signals 150 via Wi-Fi to
indicate the location of beacon 90 within a Wi-Fi network range. In
some particular implementations, wireless transceiver 130 on
speaker system 70 can include an antenna configured to detect
signal direction from location signals 150. In still other
embodiments, speaker system 70 and/or beacon device 90 can include
an infra-red locator for detecting the direction of location
signals 150. Additionally, microphone array 30 may be used for
ultrasonic detection to indicate the location of beacon device 90
relative to speaker system 70.
[0043] According to various implementations, one or more beacon
device(s) 90 can include a focus mode (FM) 170 for pre-focusing the
microphone array 30 toward the beacon device 90, or a negation mode
(NM) 180 for pre-focusing the microphone array 30 away from the
beacon device 90. That is, in some configurations, each beacon
device 90 can be designated as a "focus" beacon or a "negation"
beacon. These modes can allow control system 60 to pre-focus
microphone array 30 on particular beacon device(s) 90 or negate
other beacon device(s) 90. In some cases, the mode (e.g., focus
mode 170 or negation mode 180) for one or more beacon devices 90 is
pre-set prior to beacon device 90 sending a location signal 150 to
control system 60. This could allow for use of beacon devices 90 as
labels on particular areas, people, devices, etc., of interest or
disinterest in terms of microphone directionality. For example, a
user 50 may employ a beacon device 90 (e.g., his/her mobile phone,
tablet, smart glasses, earpiece, hearing aid, smart watch) set on
focus mode 170 in order to continuously update the direction of
microphone array 30 to "follow" the user 50 as he/she moves within
environment 10. This may enhance the user's experience and provide
greater quality of the audio input 120 to microphone array 30. In
another example, a user 50 may couple a beacon device 90 (e.g.,
electronic tag, RFID tag, Wi-Fi tag) set to negation mode 180 with
an area of a home where user(s) 50 are unlikely to be located,
e.g., a room where few people congregate. In still other examples,
a user 50 may couple a beacon device 90 (e.g., electronic tag, RFID
tag, Wi-Fi tag) set to negation mode 180 with a noise source that
may not interact with (or intend to interact with) the speaker
system 70, such as a device, person and/or thing (e.g., a vacuum,
home theater system, a child or a dog).
[0044] In some cases, speaker system 70 and/or beacon device 90 can
include a camera system 190 configured to detect a visual
characteristic (e.g., a presence, location, proximity, identity
and/or gesture) of one or more users 50. In some cases, camera
system 190 is located only at speaker system 70 for detecting the
visual characteristic (e.g., gestures or identity) of user(s) 50,
e.g., across a room or a space. However, as shown in phantom in
FIG. 1, and in the additional detailed depiction of control system
60 in FIG. 2, in some cases beacon device 90 can also (or
alternatively) include a camera system 190, e.g., where beacon
device 90 is a smart phone, wearable electronic device (e.g., smart
watch, smart glasses, smart headphones or smart headsets) or other
portable camera-enabled system. In any case, camera system 190 can
be configured to detect a visual characteristic (e.g., presence,
location, proximity, identity and/or gesture) of one or more
user(s) 50. In various implementations, camera system 190 can be
actively enabled, e.g., via user prompt, to track visual
characteristics of one or more user(s) 50. In one particular
implementation, camera system 190 can include a visual
characteristic (VC) tracking engine 200 for processing image data
captured at camera system 190 on beacon device 90 and/or at speaker
system 70.
[0045] According to various implementations, control system 60 is
programmed or otherwise configured to: receive a camera signal from
camera system 190 which indicates the visual characteristic of user
50 (e.g., a gesture such as a wave of the hand, or other
significant motion); and focus the microphone array 30 in response
to the visual characteristic of the user 50 matching a visual
characteristic in a stored visual characteristic library (VC Lib)
220 (locally stored or remotely accessible via any conventional
network interface or communications protocol described herein).
Stored VC library 220 may also include a thresholding mechanism
(e.g., logic, or a relational database with indexing) for comparing
a camera signal (e.g., from camera system 190 at beacon device 90
and/or at speaker system 70) with pre-sorted video or image data to
determine whether the visual characteristic of the user 50 is
linked with a focus direction for the microphone array 30.
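One plausible realization of this thresholded lookup is sketched below; the feature vectors, the cosine-similarity metric, and the cutoff value are assumptions introduced for illustration, since the disclosure leaves the comparison mechanism open.

```python
import numpy as np

MATCH_THRESHOLD = 0.85  # assumed similarity cutoff

def match_visual_characteristic(camera_features, vc_library):
    """Thresholded lookup against stored VC library 220.

    camera_features: 1-D feature vector derived from the camera signal
    vc_library: maps entry name -> (reference_features, focus_direction_deg)
    Returns the linked focus direction on a match, else None.
    """
    best_name, best_score = None, -1.0
    for name, (ref, _direction) in vc_library.items():
        score = float(
            np.dot(camera_features, ref)
            / (np.linalg.norm(camera_features) * np.linalg.norm(ref))
        )
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= MATCH_THRESHOLD:
        return vc_library[best_name][1]  # focus direction linked to the match
    return None  # below threshold: leave the array's focus unchanged
```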
[0046] In some cases, where the visual characteristic includes a
gesture, user motion which exceeds a certain threshold may trigger
a camera signal from camera system 190 to control system 60 (e.g.,
via connection between camera system 190 and control system 60, or
via beacon device 90). In these examples, a user 50 could wave a
left arm to indicate a desire to direct the microphone array 30 to
the user's left-hand side. Where the movement of that left arm is
significant enough to trigger VC tracking engine 200 (e.g., such
that movement range is greater than a threshold distance such as
one or two feet), and a camera signal is sent to control system 60,
control system 60 can compare that camera signal with image or
video data in VC library 220 that is user-specific or generic to a
plurality of users to determine a characteristic of the visual
data, e.g., in the case of a gesture, the direction of the gesture
and an intent of the gesture (e.g., that the gesture indicates an
intention to focus the microphone array 30 in a particular
direction). It is understood that in various particular
implementations, the "gesture" can include a physical gesture such
as movement of an arm, leg, torso, head, etc., but can also (or
alternatively) include a facial gesture (e.g., movement of an
eyebrow, change in direction of gaze). In some of these cases, the
facial gaze of a user 50 can be detected, e.g., using a retina
detection feature in camera system 190, in order to detect that the
user 50 is looking directly at camera system 190. This application
may be beneficial, for example, in some communications settings
(e.g., video conferencing), as well as when user 50 engages with a
virtual assistant. In some particular implementations, control
system 60 includes logic for comparing camera signals from camera
system(s) 190 (at beacon device 90 and/or speaker system 70) with
image or video data in VC library 220, including, e.g., image
recognition logic to determine that the user 50 is intentionally
making a particular gesture for controlling the focus of the
microphone array 30.
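A small illustrative sketch of the motion gate described here, assuming (hypothetically) that hand positions are already tracked in meters; the threshold value and names are assumptions:

```python
GESTURE_THRESHOLD_M = 0.5  # assumed minimum displacement (roughly 1-2 feet)

def detect_gesture(track):
    """Gate a gesture on total displacement of a tracked hand.

    track: list of (x, y) hand positions in meters over the capture window.
    Returns 'left' or 'right' once displacement exceeds the threshold,
    otherwise None (no camera signal is sent to control system 60).
    """
    if len(track) < 2:
        return None
    dx = track[-1][0] - track[0][0]
    dy = track[-1][1] - track[0][1]
    if (dx * dx + dy * dy) ** 0.5 < GESTURE_THRESHOLD_M:
        return None
    return "left" if dx < 0 else "right"
```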
[0047] As noted herein, in other examples, the visual
characteristic (e.g., visual characteristic of user 50) can include
one or more of a presence of a user 50, a proximity of a user 50 to
one or more other reference items (e.g., speaker system 70, beacon
device(s) 90 or other user(s) 50), a location of one or more
user(s) 50 or an identity of a user 50. In some of these cases, the
VC tracking engine 200 can be configured to detect the identity of
a user 50, using image data from camera system 190 (e.g., either
periodically or continuously obtained image data), and indicate the
location of the identified user 50. In these example
implementations, VC tracking engine 200 could visually identify the
user 50 as he/she moves through an environment, and send an
indication of that location to control system 60 in order to
dynamically update the focus direction of the microphone array 30.
In still other examples, the beacon device 90 can utilize its
camera system 190 to identify (and in some cases, track) a user 50.
For instance, a user 50 can direct the camera on a beacon device 90
(e.g., smartphone, tablet, wearable smart device) toward
himself/herself, or to a particular other user (e.g., user 50A,
user 50B, or user 50C), and VC tracking engine 200 can be trained
to identify the user(s), e.g., via image capture and recognition
and/or artificial intelligence (AI)-based training.
[0048] It is understood that a single beacon device 90 (at Location
A) in FIGS. 1 and 2 is illustrated in greater detail than other
beacon devices 90 (e.g., at Location B or Location C) merely for
simplicity of illustration. It is further understood that any
beacon device 90 shown and described herein may have some or all of
the capabilities of that beacon device 90 shown at Location A in
FIGS. 1 and 2.
[0049] Turning to FIG. 2, and with continuing reference to FIG. 1,
control system 60 and microphone array 30 are shown in greater
detail to describe additional implementations of the disclosure. As
shown in FIG. 2, control system 60 can further include a location
identification (Location ID) engine 225 for detecting the location
of beacon device 90 within environment 10. That is, according to
various implementations, location identification engine 225 is
configured to process location signal(s) 150 for one or more beacon
device(s) 90 in order to determine where in environment 10 beacon
device 90 is located (e.g., Location A, Location B, Location C or
any other location). That is, location identification engine 225
can include logic for processing data about one or more location
signals 150 received via one or more location protocols described
herein. For example, location identification engine 225 may include
logic for processing Bluetooth location data, Wi-Fi location data,
video (camera) data, etc. to determine a location for each beacon
device 90. In particular implementations, location identification
engine 225 can rely upon multiple location protocols to identify
and/or verify a location of beacon device(s) 90.
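Where multiple protocols are relied upon, one simple (hypothetical) approach is a confidence-weighted average of the per-protocol estimates; the weights below are illustrative assumptions, not values from the disclosure:

```python
# Hypothetical confidence weights per location protocol.
PROTOCOL_WEIGHTS = {"bluetooth": 0.5, "wifi": 0.3, "camera": 0.9}

def fuse_location_estimates(estimates):
    """Weighted average of (protocol, (x, y)) estimates for one beacon."""
    wx = wy = total = 0.0
    for protocol, (x, y) in estimates:
        w = PROTOCOL_WEIGHTS.get(protocol, 0.1)  # small default weight
        wx, wy, total = wx + w * x, wy + w * y, total + w
    return (wx / total, wy / total) if total else None

# Example: Bluetooth and Wi-Fi roughly agree; the fused estimate leans
# toward the higher-weight reading.
print(fuse_location_estimates([("bluetooth", (2.0, 1.0)), ("wifi", (2.4, 0.8))]))
```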
[0050] Control system 60 can further include a digital signal
processor (DSP) 230 for modifying the focus direction of the
microphone array 30 according to various implementations described
herein. That is, DSP 230 can be configured to modify the focus
direction (Direction A, Direction B, etc.) of the microphone array
30 in the direction toward the physical location (Location A,
Location B, etc.) of beacon device 90, or in a direction (Direction
B, Direction C, etc.) away from the physical location (Location A,
Location B, etc.) of beacon device 90 based upon one or more
indicators from beacon device 90 (e.g., focus mode 170 or a
negation mode 180, camera signal including visual characteristic
information). Microphone array 30 can include a plurality of
microphones 40, which may each include a conventional receiver for
receiving audio signals (e.g., audio input signal 120). In some
cases, microphones 40 can include one or more directional
microphones. However, in other cases, each microphone 40 in the
array 30 can include an omnidirectional microphone configured to be
directed by digital signal processor 230. DSP 230 can be coupled
with microphones 40 (and/or wireless transceiver 130) and include
one or more DSP filters 260 for processing audio input 120 and/or
audio output 110 in order to control the direction of the array 30,
e.g., by DSP beamforming. DSP beamforming is a known technique for
summing delayed copies of the input (e.g., audio input 120) received
at the individual microphones 40 to achieve a narrower response to
input(s) 120 from a particular
direction (e.g., Direction A, Direction B, etc.). FIG. 2
illustrates an example curved microphone array 30 including a
plurality of microphones 40 arranged along an arcuate path, each
having an x and y coordinate value.
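As a concrete, simplified illustration of DSP beamforming, the sketch below implements a basic delay-and-sum beamformer under far-field and integer-sample-delay assumptions; production filters such as DSP filters 260 would typically use fractional delays and frequency-domain weights, so this is a sketch of the general technique rather than the disclosed implementation.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def delay_and_sum(signals, mic_xy, angle_deg, fs):
    """Steer the array toward angle_deg by delaying and summing channels.

    signals: (num_mics, num_samples) captured audio, one row per microphone
    mic_xy:  (num_mics, 2) microphone coordinates in meters
    angle_deg: desired maximum response angle (MRA)
    fs: sample rate in Hz
    """
    theta = np.deg2rad(angle_deg)
    direction = np.array([np.cos(theta), np.sin(theta)])  # toward the source
    # Plane-wave arrival lead at each microphone (seconds).
    lead_s = mic_xy @ direction / SPEED_OF_SOUND
    delays = np.round((lead_s - lead_s.min()) * fs).astype(int)
    n = signals.shape[1]
    out = np.zeros(n)
    for ch, d in zip(signals, delays):
        out[d:] += ch[: n - d]  # delay earlier-arriving channels to align them
    return out / signals.shape[0]

# Example: an 8-microphone arc steered toward 30 degrees.
fs = 16000
arc = np.linspace(-0.5, 0.5, 8)
mic_xy = 0.1 * np.stack([np.cos(arc), np.sin(arc)], axis=1)
focused = delay_and_sum(np.random.randn(8, fs), mic_xy, angle_deg=30.0, fs=fs)
```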
[0051] An example response from a given environment (without
beamforming) at microphones 40 is shown in the polar graphical
depiction of FIG. 3, where the desired pointing direction (e.g.,
Direction A, FIGS. 1 and 2) is called the maximum response angle
(MRA), the angle in the polar graph of FIG. 3 is the off-set from
that MRA, and the radius is the amplitude response in that MRA
direction. FIG. 4 illustrates a filtered response at microphones 40
with DSP filters 260 applied to direct the microphone array 30 in a
particular direction (e.g., the MRA direction, which can be
dictated by the location of beacon device(s) 90, visual
characteristic tracking at beacon device(s) 90, etc.).
[0052] FIG. 5 shows a schematic depiction of another environment
300, including an apparatus 310 configured to control a microphone
array 30 according to various additional implementations. For
example, apparatus 310 can include a microphone array 30 (as
described with respect to microphone array 30 in FIG. 1 and FIG.
2), and a control system 60 (as described with respect to control
system 60 in FIG. 1 and FIG. 2). Additionally, as shown in FIG. 5,
control system 60 is configured to: A) receive a command (command
signal) 320 (either directly or via a communications protocol) from
a directional controller 330 indicating a desired direction of
focus (e.g., Direction A, Direction B) for microphone array 30; and
B) focus the microphone array 30 in the desired direction (e.g.,
Direction A, Direction B) prior to transmitting audio output 110 or
receiving audio input 120.
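A minimal sketch of the command handling described above follows; the message schema (a dict, e.g., decoded from JSON carried over the conferencing link) is an assumption, since the disclosure covers any conventional communications protocol.

```python
# Hypothetical handler for command signal 320 from directional controller 330.
# The "set_focus_direction" schema is an illustrative assumption.

def handle_command_signal(command, steer_array):
    """Pre-focus the array per a remote command, before any audio I/O."""
    if command.get("type") != "set_focus_direction":
        return False  # not a direction command; ignore here
    steer_array(float(command["angle_deg"]))
    return True

# Example: a remote user at Location R selects Direction A (say, 30 degrees).
handle_command_signal(
    {"type": "set_focus_direction", "angle_deg": 30.0},
    steer_array=lambda a: print(f"array steered to {a:.0f} deg"),
)
```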
[0053] In this implementation, in contrast to FIG. 1, apparatus 310
may be configured to interact with directional controller 330,
which may include a beacon device 90 (with some or all of the
functionality described herein), but may also include a command
engine 340 for dictating a number of directions of microphone array
30 using an active command signal 320. For example, in some cases,
directional controller 330 can be contained in, or form a part of,
an audio conferencing system or a video conferencing system
(audio/video conferencing system) 360 (which may utilize one or
more communications protocols described herein), and may be located
at a remote geographic location (Location R) from microphone array
30. In these cases, directional controller 330 can send a command
signal 320 through audio/video conferencing system 360 (or directly
to control system 60) indicating a desired direction of focus
(Direction A, Direction B, etc.) for the microphone array 30. In
these cases, a user 50 can control the direction of microphone
array 30 from a remote geographic location (Location R), for
example, when communicating with another user 50 who is in the same
general location (Location L, or local location) as the microphone
array 30. This apparatus 310 may be beneficial to the remote user
50 when one or more users 50 are present at Location L, and the
remote user 50 wishes to pre-focus the microphone array 30 on a
particular user 50 (e.g., Direction A, Direction B, etc.). This
apparatus 310 may also be beneficial where one or more users 50 at
Location L will move around within that location during an
audio/video conference. It is understood that as used herein, the
term "conference" may include any designated communication between
users 50, including, e.g., video chat, audio call, virtual
reality/augmented reality (VR/AR) chats, audio/video overlay, etc.
In various implementations, the conference application can include
any conventional conferencing system such as a chat system,
audio-conferencing system or video-conferencing system, e.g., using
protocol(s) such as voice over IP (VoIP), voice over LTE (VoLTE),
or WebRTC.
[0054] In yet another embodiment, as illustrated in the schematic
depiction of FIG. 6, a system 400 can be configured to provide
customized noise filtering for particular applications. That is, in
various implementations, system 400 can include a control system
410 programmed to control a microphone array 30 (similar to
microphone array 30 described with reference to FIG. 1) by:
[0055] A) Providing a prompt 420 to initiate at least one noise
source 430 for sampling using the microphone array 30. In some
cases, this can include providing the prompt 420 to a user 50,
e.g., as an audio prompt and/or via a user interface (UI) 440
(which may include any conventional UI on an electronic device
having interface capabilities, e.g., smartphone, wearable
electronic device (e.g., smart watch, smart glasses, smart
headphones or smart headsets), or other device similar to beacon
device 90 described herein), or to another control engine 450
(e.g., control system for noise-generating devices in a home,
office or other environment) for controlling one or more noise
sources 430. Prompt 420 may request that user 50 or control engine
450 initiate the noise source(s) 430 for sampling by microphone
array 30 during a sampling period, e.g., within a prescribed time
such as 10-15 seconds or one or two minutes (as non-limiting
examples). The prompt 420 can request that user 50 or control
engine 450 power on the noise source 430 or otherwise initiate
noise from noise source 430, and may request additional information
about the noise source 430, such as the type of device generating
the noise, e.g., the make and/or model of the device generating the
noise. For example, noise source 430 could include a vacuum
cleaner, a heating/ventilating/air-conditioning (HVAC) unit, a food
processor, etc., and prompt 420 may request that one or more of
these noise sources 430 be individually operated (and potentially,
operated within a certain range such as a High setting, Low
setting, etc., or for a particular period) in order to gather
information about these noise sources 430. At the input stage,
prompt 420 could ask user 50 to provide information about the type
of noise source 430, such as by selecting a type of the source 430
from a menu.
[0056] B) After prompting user 50 and/or control engine 450 to
initiate noise source 430, microphone array 30 can receive a set of
noise source samples 460 of noise source(s) 430. Noise source
samples 460 may include audio signals from noise source(s) 430
taken over a period, or at one or more particular operating levels
(e.g., High, Low, full volume, partial volume). Noise source
samples 460 will include spectral content for each noise source
430, and can be categorized by device and/or noise spectrum
(frequency) for later filtering.
[0057] C) From the noise source samples 460, control system 410 can
create one or more microphone array filters (mic. array filters)
470 configured to at least partially reject the noise source(s)
430. Control system 410, when paired with microphone array 30, can
be configured to identify audio signals from noise source 430 and
apply microphone array filters 470 to at least partially reject the
noise source 430. In some cases, creating the set of microphone
array filters 470 includes categorizing the noise source samples
460 according to at least one of frequency or amplitude, and
correlating control instructions for modifying a gain on the
microphone array 30 with the categorized noise source samples 460.
For example, the microphone array filters 470 can include
frequency-dependent filters selected to knock out (interfere with)
frequencies along a particular spectrum for one or more noise
sources 430 (one plausible realization is sketched after step E, below).
[0058] D) Storing the microphone array filters 470 in a library 480
for later use in filtering noise sources 430. It is understood that
library 480 could be a local library in a common geographic
location as one or more portions of control system 410, or may be a
remote library stored at least partially in a distinct location or
in a cloud-based server. It is further understood that library 480
could include a plurality of microphone array filters 470 for a
plurality of noise sources 430 from distinct users 50 or profiles.
In this sense, library 480 could store microphone array filters 470
which are applicable to specific users 50 or profiles, but may also
store microphone array filters 470 that can be used by distinct
users 50 or profiles, e.g., where a noise source 430 is common to
multiple users 50 and/or profiles.
[0059] E) In some cases, in response to detecting audio noise
signals 490 (similar to noise source sample signals 460), control
system 410 is further configured to apply microphone array filters
470 during at least one of receiving an audio input 120 at
microphone array 30 or transmitting an audio output signal 110 at
speaker 80 (e.g., where control system 410 is part of speaker
system 70). That is, in various implementations, when control
system 410 receives an audio input signal 120 or sends an audio
output signal 110 (e.g., to user 50) and detects an audio noise
signal 490, control system 410 can apply one or more microphone
array filters 470 from library 480 to filter (e.g., interfere with)
that audio noise signal 490. In some particular cases, control
system 410 can modify the direction of microphone array 30
according to the stored microphone array filter 470 in response to
detecting the noise source 430 (audio noise signal 490).
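The sketch below illustrates one plausible realization of steps B through D: averaging the sampled noise spectra and deriving per-frequency-bin attenuation gains that are stored in a library keyed by the user-selected source type. The windowing, the Wiener-like gain rule, and all names are assumptions; the disclosure does not fix a filter design.

```python
import numpy as np

def create_noise_filter(noise_samples, fs, n_fft=1024):
    """Build a frequency-dependent rejection filter from noise samples 460.

    noise_samples: list of 1-D arrays recorded during the sampling period
    Returns (noise_mag, gains): average noise magnitude spectrum and the
    per-bin attenuation (0..1) derived from it.
    """
    spectra = []
    for s in noise_samples:
        # 50%-overlapped, Hann-windowed frames -> average magnitude spectrum.
        frames = np.lib.stride_tricks.sliding_window_view(s, n_fft)[:: n_fft // 2]
        mags = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1))
        spectra.append(mags.mean(axis=0))
    noise_mag = np.mean(spectra, axis=0)
    # Attenuate most where the noise carries the most energy (Wiener-like).
    gains = 1.0 / (1.0 + (noise_mag / (noise_mag.mean() + 1e-12)) ** 2)
    return noise_mag, gains

# Example: sample a source for one second, then store the filter under the
# type the user selected in response to prompt 420.
fs = 16000
library = {"vacuum_cleaner": create_noise_filter([np.random.randn(fs)], fs)}
```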
[0060] In various implementations, library 480 can include a
relational database (relational DB) 500 including relationships
between the microphone array filter(s) 470 and the set of noise
source samples 460. In some cases, library 480 could also include a
text index for noise sources, e.g., with pre-set or user-definable
categories. This could allow the user (e.g., in response to prompt
420) to provide information about the type of noise source 430,
such as by selecting a type of the source 430 from a menu (e.g.,
provided in UI 440 or at control engine 450). Control system 410
can further include a learning engine (e.g., an artificial
intelligence component such as an artificial neural network)
configured to learn about the type of source selected, e.g., from a
group of users' noise sources 430 (e.g., washing machines or vacuum
cleaners).
[0061] Control system 410 can include a DSP 230 (similar to DSP
described with reference to FIGS. 1-4) for comparing audio noise
signal 490 with noise source samples 460 stored in the library 480,
and can modify the gain on the microphone array 30 in response to
the audio noise signal 490 matching at least one of those noise
source samples 460. In particular, control system 410 can utilize
DSP 230 to modify a gain and a direction of the microphone array 30
in order to knock out (interfere with) the noise from noise source
430.
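Continuing the library sketch above, matching a detected block against the stored spectra and applying the corresponding gains might look like the following; the correlation metric and the threshold are assumptions introduced for illustration.

```python
import numpy as np

MATCH_THRESHOLD = 0.9  # assumed spectral-similarity cutoff

def match_and_filter(block, library, n_fft=1024):
    """Compare one audio block (>= n_fft samples) against stored noise spectra.

    library maps source name -> (noise_magnitude_spectrum, per_bin_gains),
    in the format produced by the create_noise_filter sketch above.
    Returns (possibly filtered block, matched source name or None).
    """
    spec = np.abs(np.fft.rfft(block[:n_fft] * np.hanning(n_fft)))
    best_name, best_corr = None, -1.0
    for name, (noise_mag, _gains) in library.items():
        corr = float(np.corrcoef(spec, noise_mag)[0, 1])  # spectral similarity
        if corr > best_corr:
            best_name, best_corr = name, corr
    if best_corr < MATCH_THRESHOLD:
        return block, None  # no stored noise source recognized
    gains = library[best_name][1]
    filtered = np.fft.irfft(np.fft.rfft(block[:n_fft]) * gains, n=n_fft)
    return filtered, best_name
```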
[0062] The functionality described herein, or portions thereof, and
its various modifications (hereinafter "the functions") can be
implemented, at least in part, via a computer program product,
e.g., a computer program tangibly embodied in an information
carrier, such as one or more non-transitory machine-readable media,
for execution by, or to control the operation of, one or more data
processing apparatus, e.g., a programmable processor, a computer,
multiple computers, and/or programmable logic components.
[0063] A computer program can be written in any form of programming
language, including compiled or interpreted languages, and it can
be deployed in any form, including as a stand-alone program or as a
module, component, subroutine, or other unit suitable for use in a
computing environment. A computer program can be deployed to be
executed on one computer or on multiple computers at one site or
distributed across multiple sites and interconnected by a
network.
[0064] Actions associated with implementing all or part of the
functions can be performed by one or more programmable processors
executing one or more computer programs to perform the functions of
the calibration process. All or part of the functions can be
implemented as special purpose logic circuitry, e.g., an FPGA
(field-programmable gate array) and/or an ASIC (application-specific
integrated circuit).
Processors suitable for the execution of a computer program
include, by way of example, both general and special purpose
microprocessors, and any one or more processors of any kind of
digital computer. Generally, a processor will receive instructions
and data from a read-only memory or a random access memory or both.
Components of a computer include a processor for executing
instructions and one or more memory devices for storing
instructions and data.
[0065] In various implementations, components described as being
"coupled" to one another can be joined along one or more
interfaces. In some implementations, these interfaces can include
junctions between distinct components, and in other cases, these
interfaces can include a solidly and/or integrally formed
interconnection. That is, in some cases, components that are
"coupled" to one another can be simultaneously formed to define a
single continuous member. However, in other implementations, these
coupled components can be formed as separate members and be
subsequently joined through known processes (e.g., soldering,
fastening, ultrasonic welding, bonding). In various
implementations, electronic components described as being "coupled"
can be linked via conventional hard-wired and/or wireless means
such that these electronic components can communicate data with one
another. Additionally, sub-components within a given component can
be considered to be linked via conventional pathways, which may not
necessarily be illustrated.
[0066] Other embodiments not specifically described herein are also
within the scope of the following claims. Elements of different
implementations described herein may be combined to form other
embodiments not specifically set forth above. Elements may be left
out of the structures described herein without adversely affecting
their operation. Furthermore, various separate elements may be
combined into one or more individual elements to perform the
functions described herein.
* * * * *