U.S. patent application number 15/480,271, for a simple affirmative response operating system, was filed on April 5, 2017 and published by the patent office on 2018-01-11.
The applicant listed for this patent is Intelligently Interactive, Inc. The invention is credited to Eric H. Weingartner.
United States Patent Application 20180012595
Kind Code: A1
Weingartner; Eric H.
January 11, 2018
SIMPLE AFFIRMATIVE RESPONSE OPERATING SYSTEM
Abstract
A simple affirmative response operating system for selecting a
data item from a list of options using a unique affirmative action.
Text-based labels in a listing of content are converted to speech
using an embedded text-to-speech engine and an audio output of a
first converted label is provided. A listening state is entered
into for a predefined pause time to await receipt of the simple
affirmative action. If the simple affirmative action is performed
during the predefined pause time, an associated content item is
selected for output. If the simple affirmative action is not
performed during the predefined pause time, an audio output of a
next converted label in the list is provided. This protocol may be
used to control a variety of computing devices safely and
efficiently while a user is distracted or disabled from using
traditional input methods.
Inventors: Weingartner; Eric H. (New York, NY)

Applicant: Intelligently Interactive, Inc., New York, NY, US

Family ID: 60892418

Appl. No.: 15/480271

Filed: April 5, 2017
Related U.S. Patent Documents

    Application Number   Filing Date    Patent Number
    15441742             Feb 24, 2017                  (parent of 15480271)
    15204981             Jul 7, 2016    9619202        (parent of 15441742)
Current U.S. Class: 1/1

Current CPC Class: H04L 65/1083 (20130101); G10L 15/22 (20130101); H04L 12/2816 (20130101); H04L 67/02 (20130101); H04L 12/2827 (20130101); G10L 15/1815 (20130101); G06F 16/951 (20190101); G10L 2015/223 (20130101); G10L 13/08 (20130101); H04L 51/02 (20130101)

International Class: G10L 15/22 (20060101); H04L 12/58 (20060101); G10L 13/08 (20130101); H04L 12/28 (20060101); G10L 15/18 (20130101); H04L 29/08 (20060101); G06F 17/30 (20060101)
Claims
1. A system, comprising: a processor, a sensor coupled to the
processor, an electronic sensory presentation device, and
non-transitory memory storing instructions that, when executed by
the processor, cause the processor to: store an identification of a
unique, system-wide, and context-independent affirmative action;
divide a content input to generate a sequence of contents and a
sequence of identifiers, wherein each identifier of the sequence of
identifiers is associated with a content from the sequence of
contents; enter a list-pause cycle comprising: outputting, via the
sensory presentation device, an identifier from the sequence of
identifiers; and pausing a predetermined length of time; repeat
list-pause cycles until determining, via input from the sensor,
that the affirmative action has been performed during the pause of
a list-pause cycle; and responsive to determining that the
affirmative action has been performed during the pause of the
list-pause cycle, output, via the sensory presentation device, the
content from the sequence of contents associated with the
identifier output during that list-pause cycle.
2. The system of claim 1, wherein the non-transitory memory stores
instructions that, when executed by the processor, further cause
the processor to: store an identification of a unique, system-wide,
and context-independent pause-initiation action; and responsive to
determining that the pause-initiation action has been performed
during the pause of a list-pause cycle, terminate the repetition of
list-pause cycles and enter a wait-pause mode.
3. The system of claim 2, wherein the non-transitory memory stores
instructions that, when executed by the processor, further cause
the processor to: store an identification of a unique, system-wide,
and context-independent options-initiation action; responsive to
determining that the options-initiation action has been performed
during either a list-pause cycle or the wait-pause mode: generate a
new sequence of identifiers, each of which represents an option for
controlling the computing device; enter a list-pause cycle
comprising: outputting, in an audio form and via the speaker, an
identifier of the new sequence of identifiers; and pausing a
predetermined length of time; repeating list-pause cycles until the
affirmative action or pause-initiation action is performed by the
user during the pause of a list-pause cycle; and responsive to the
user performing the affirmative action or pause-initiation action,
perform an action associated with the option for controlling the
computing device or terminate the list-pause cycle and enter a
wait-pause mode, respectively.
4. The system of claim 1, wherein the list-pause cycle further
comprises, after outputting the identifier and before pausing the
predetermined length of time, outputting, via the speaker, an audio
signifier that the pause is beginning.
5. The system of claim 1, wherein the list-pause cycle further
comprises, after pausing the predetermined length of time,
outputting, via the speaker, an audio signifier that the
predetermined length of time has elapsed and a new list-pause cycle
will begin.
6. The system of claim 1, wherein the content input comprises
textual data, and wherein outputting, in an audio form, the content
comprises converting the content with a text-to-speech engine and
playing the converted content.
7. The system of claim 6, wherein the textual data is received from
a search engine, the sequence of identifiers is a sequence of
webpage titles, and the sequence of contents is a sequence of
webpage contents.
8. The system of claim 6, wherein the textual data is received from
an email server, the sequence of identifiers is a sequence of email
header values, and the sequence of contents is a sequence of email
contents or attachments.
9. The system of claim 6, wherein the textual data is received from
a mapping service, the sequence of identifiers is a sequence of
location identifiers, and the sequence of contents is a sequence of
informative texts about locations associated with the location
identifiers.
10. The system of claim 1, wherein the content input is received
from a video streaming service, the sequence of identifiers is a
sequence of video identifiers or portions of video content, and the
sequence of contents comprises at least a portion of an audio track
or video track from each video of a sequence of videos.
11. The system of claim 1, wherein the content input is received
from an application programming interface (API) of a second
computing device, the sequence of identifiers is a sequence of
commands of the API, and the sequence of contents comprises at
least a confirmation of an action taken by the second computing
device.
12. A computer-implemented method for interaction between a human
and a computing device, comprising: storing an identification of a
unique, system-wide, and context-independent affirmative action;
dividing a content input to generate a sequence of contents and a
sequence of identifiers, wherein each identifier of the sequence of
identifiers is associated with a content from the sequence of
contents; entering a list-pause cycle comprising: outputting, in an
audio form, an identifier of the sequence of identifiers; and
pausing a predetermined length of time; repeating list-pause cycles
until determining the affirmative action has been performed during
the pause of a list-pause cycle; and responsive to determining the
affirmative action has been performed during the pause of a
list-pause cycle, outputting, in an audio form, the content from
the sequence of contents associated with the identifier output
during that list-pause cycle.
13. The computer-implemented method of claim 12, further
comprising: storing an identification of a unique, system-wide, and
context-independent pause-initiation action; and responsive to
determining that the pause-initiation action has been performed
during the pause of a list-pause cycle, terminating the repetition
of list-pause cycles and entering a wait-pause mode.
14. The computer-implemented method of claim 13, further
comprising: storing an identification of a unique, system-wide, and
context-independent options-initiation action; responsive to
determining that the options-initiation action has been performed
during either a list-pause cycle or the wait-pause mode: generating
a new sequence of identifiers, each of which represents an option
for controlling the computing device; entering a list-pause cycle
comprising: outputting, in an audio form, an identifier of the new
sequence of identifiers; and pausing a predetermined length of
time; repeating list-pause cycles until the affirmative action or
pause-initiation action is performed by the user during the pause
of a list-pause cycle; and responsive to the user performing the
affirmative action or pause-initiation action, performing an action
associated with the option for controlling the computing device or
terminating the list-pause cycle and entering a wait-pause mode,
respectively.
15. The computer-implemented method of claim 12, wherein the
list-pause cycle further comprises, after outputting the identifier
and before pausing the predetermined length of time, outputting an
audio signifier that the pause is beginning.
16. The computer-implemented method of claim 12, wherein the
list-pause cycle further comprises, after pausing the predetermined
length of time, outputting an audio signifier that the
predetermined length of time has elapsed and a new list-pause cycle
will begin.
17. The computer-implemented method of claim 12, wherein the
content input comprises textual data, and wherein outputting, in an
audio form, the content comprises converting the content with a
text-to-speech engine and playing the converted content.
18. The computer-implemented method of claim 17, wherein the
textual data is received from a search engine, the sequence of
identifiers is a sequence of webpage titles, and the sequence of
contents is a sequence of webpage contents.
19. The computer-implemented method of claim 17, wherein the
textual data is received from an email server, the sequence of
identifiers is a sequence of email header values, and the sequence
of contents is a sequence of email contents or attachments.
20. The computer-implemented method of claim 17, wherein the
textual data is received from a mapping service, the sequence of
identifiers is a sequence of location identifiers, and the sequence
of contents is a sequence of informative texts about locations
associated with the location identifiers.
21. The computer-implemented method of claim 12, wherein the
content input is received from a video streaming service, the
sequence of identifiers is a sequence of video identifiers or
portions of video content, and the sequence of contents comprises
at least a portion of an audio track or video track from each video
of a sequence of videos.
22. The computer-implemented method of claim 12, wherein the
content input is received from an application programming interface
(API) of a second computing device, the sequence of identifiers is
a sequence of commands of the API, and the sequence of contents
comprises at least a confirmation of an action taken by the second
computing device.
23. A computer-implemented method for interaction between a human
and a computing device, comprising: storing, by a first computing
device, an identification of a unique, system-wide, and
context-independent affirmative action; storing, by the first
computing device, a sequence of available commands for a second
computing device and a sequence of identifiers, wherein each
identifier of the sequence of identifiers is associated with a
command from the sequence of available commands; entering a
list-pause cycle comprising: outputting, in an audio form, an
identifier of the sequence of identifiers; and pausing a
predetermined length of time; repeating list-pause cycles until
determining the affirmative action has been performed during the
pause of a list-pause cycle; and responsive to determining the
affirmative action has been performed during the pause of a
list-pause cycle, transmitting, to the second computing device, the
command from the sequence of available commands associated with the
identifier output during that list-pause cycle.
24. The computer-implemented method of claim 23, further
comprising: receiving a confirmation from the second computing
device that the command has been received; and notifying a user
that the command was received by the second computing device.
25. The computer-implemented method of claim 23, wherein the second
computing device is a smart home appliance.
Description
CROSS-REFERENCE TO RELATED APPLICATION
[0001] This application is a continuation-in-part of co-pending
U.S. patent application Ser. No. 15/441,742, filed Feb. 24, 2017
and titled "VOICE COMMAND-DRIVEN DATABASE", which is itself a
continuation of U.S. patent application Ser. No. 15/204,981, filed
Jul. 7, 2016 and titled "VOICE COMMAND-DRIVEN DATABASE", both of
which are incorporated herein by reference in their entirety.
FIELD OF INVENTION
[0002] This disclosure relates generally to operating systems, and
more specifically, to a simple response-driven system for
initializing and controlling a variety of applications.
BACKGROUND
[0003] Modern computing devices are able to access a vast quantity
of information, both via the Internet and from other sources.
Functionality for such devices is increasing rapidly, as mobile
computing devices are able to run software applications to perform
various tasks and provide different types of information. However,
modern computing devices rely primarily upon a visual screen to output
content to a user, and upon that same screen to acknowledge user input.
As a result, users who wish to operate a computing device either
while concurrently performing other distracting or strenuous
activities (e.g., operating a vehicle, riding a bicycle,
exercising, etc.), are visually impaired or disabled in some
manner, or simply wish to rest their eyes while interacting with
the device, may have difficulty interfacing effectively with their
devices due to limited or no ability to read a display screen or
physically interact with the device using existing physical input
methods.
[0004] Some modern computing devices include functionality that
enables a user to interact with the device using spoken natural
language, rather than employing a conventional manual user
interface. Most of the popular natural language voice recognition
systems for mobile computing devices and consumer products today,
such as Apple Inc.'s Siri.RTM. and Amazon.com, Inc.'s Amazon
Echo.RTM., utilize command-driven automatic speech recognition (ASR)
systems that allow spoken interaction to control the mobile device.
Existing systems do not provide a sustained interaction predicated
on the first action initiated by the user, but rather respond with
a single result--for example, playing a song, or providing a single
fact that is the answer to a question.
[0005] Command-driven ASR systems typically rely on a limited
vocabulary list of words at any given time during the course of
interaction by the user and may be part of an embedded system
within a mobile device that does not require a remote server to
perform the speech-to-text (STT) translation used to control the
system.
the user is predominantly accessing a limited type of data (e.g.,
phone numbers, music, etc.) that is generally known to the user at
the time of a voice command input.
[0006] Systems that rely on commands, however, shift the burden to
the user to remember different commands or keywords in a dynamic
implementation of the vocabulary list, thus increasing the
difficulty for the user to know, remember or guess the commands to
enable useful control and interaction. For this reason,
conventional embedded, command-driven ASR systems are suitable for
limited applications in mobile devices (e.g., retrieving phone
numbers or email addresses, selecting music, or requesting
directions to a specific address) where the vocabulary list is
limited, finite, and generally known by the user.
[0007] Conventional command-driven, embedded ASR systems are not
suitable for more complex applications requiring a large vocabulary
due to the limited computational, memory and battery resources of
mobile computing devices. As the vocabulary required for responses
increases or varies, the accuracy of the speech recognition
decreases in embedded ASR systems. In addition, there are many
applications that require large vocabularies, oftentimes without
the ASR system or the user knowing in advance what vocabulary is
required.
[0008] Another area that adds complexity is interaction with an ASR
system using the microphone and speaker of a device. Because the
microphone is typically close to the speaker on most mobile devices,
the ASR system can erroneously act upon its own text-to-speech (TTS)
or spoken output, or upon ambient sounds, if it is simultaneously
"listening" for a voice command from the user. Additionally, it can
be a challenge for the user to know when to speak while interacting
with a TTS list when the pause between items of varied-length
content is erratic: the user cannot tell that the TTS of an
individual item has concluded without delaying their response. The
pause length between items can be set to give the user enough time
to respond, but a short pause still demands close attention so the
user can speak quickly enough to initiate a selection, while a long
pause increases the overall time it takes for the user to navigate
through the list of content.
[0009] To address the spoken voice feedback loop, some digital
personal assistants utilize ASR systems that are always listening
but require the user to use a keyword to let the system know that
the user is initiating voice interaction. This creates awkward
interaction because the user cannot continue with the system after
receiving a response without using the keyword. It relegates these
systems to a form of communication that resembles amateur
radio.
[0010] Additionally, natural language systems are capable of
deciphering the meaning of a user query and providing a series of
result descriptions correlating with the query. However, these
systems do not offer a method for the user to then continue to use
spoken input to select one of the results from the list and
initiate the presentation of the content associated with a
particular result description, as well as to traverse back to the
list of result descriptions and interact with another result and
its associated content, all by way of spoken input.
[0011] Accordingly, there is a need for a simple command system
with a minimal number of commands or equivalent command-actions
that allows the user to easily interact and control the system in a
sustained, interactive manner as well as navigate dynamic, unknown
content.
SUMMARY
[0012] A simple affirmative response operating system (OS) is
disclosed, designed to enable a user to interact with a computer
system in a screen-free manner by engaging in a sustained and
immersive interaction between user and system, initiated by the
user using a minimum number of commands or equivalent
command-actions.
[0013] The simple affirmative response OS allows the user to
initiate a selection with a single, system-wide and
context-independent affirmative response and can receive unlimited,
undefined and/or defined information, rather than the user being
limited to receiving a single answer. The OS may present a list or
sequence of audio or visual items to a user, after each of which
the system may output an optional response prompt (either a tone or
a visual indicator), pause and wait for user input, and output an
optional conclusion response (either a tone or a visual
indicator).
[0014] The simple affirmative response OS solves the issue that
prevents current natural language voice searches from being truly
interactive for the user. Used in conjunction with a natural
language system, the user can initially perform a search on the
Internet by voice using a natural language query. Using the simple
affirmative response OS, the search results are read aloud in the
form of a list with a pause between each search result. The user
can then select and listen to various webpages associated with each
search result in a sustained and interactive manner.
[0015] In contrast with existing intelligent home systems (which
typically use a predefined list of commands that a user must
remember, and which simply wait for a valid input instead of
dialoguing to clarify user intent), the simple affirmative response
OS allows an
intelligent home system to interact with the user, guiding them
through various options and settings presented as a list with a
pause between each item. Even if the user does not know what the
options or settings are, the user is guided and empowered to make
selections and invoke personal settings using a simple affirmative
response.
[0016] The simple affirmative response OS can facilitate a
tourist's self-guided tour of a city and allow the tourist to hear
historical information for various places of interest. With the
information presented as audio based on the tourist's GPS location,
the user can interact with and navigate the information hands-free
by way of the simple affirmative response OS. Furthermore, a
tourist can be provided with a list of
eating establishments within their proximity and use the simple
affirmative response system to obtain detailed information
regarding the types of food, hours, etc. for a chosen eating
establishment.
[0017] A system is disclosed, comprising a processor, a sensor
coupled to the processor, an electronic sensory presentation
device, and non-transitory memory. The memory stores instructions
that, when executed by the processor, cause the processor to store
an identification of a unique, system-wide, and context-independent
affirmative action; divide a content input to generate a sequence
of contents and a sequence of identifiers, wherein each identifier
of the sequence of identifiers is associated with a content from
the sequence of contents; enter a list-pause cycle (comprising
outputting, via the sensory presentation device, an identifier from
the sequence of identifiers and pausing a predetermined length of
time); repeat list-pause cycles until determining, via input from
the sensor, that the affirmative action has been performed during
the pause of a list-pause cycle; and responsive to determining that
the affirmative action has been performed during the pause of the
list-pause cycle, output, via the sensory presentation device, the
content from the sequence of contents associated with the
identifier output during that list-pause cycle.
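By way of illustration only, the list-pause cycle described in this summary can be sketched in a few lines of Python. The speak callback, the actions queue, and the "GO" token below are assumptions made for the sketch, not elements drawn from the disclosure; any sensor input and electronic sensory presentation device could stand behind them.

    import queue

    PAUSE_SECONDS = 2.5  # the predetermined length of time

    def run_list_pause_cycles(identifiers, contents, speak, actions):
        """Output each identifier, pause, and watch for the affirmative action.

        speak(text) renders text via the sensory presentation device (e.g.,
        TTS); actions is a queue.Queue onto which a sensor-monitoring thread
        pushes detected user actions such as "GO".
        """
        for identifier, content in zip(identifiers, contents):
            speak(identifier)  # output the identifier for this cycle
            try:
                # Pause for the predetermined length of time while listening.
                action = actions.get(timeout=PAUSE_SECONDS)
            except queue.Empty:
                continue  # no response during the pause: next list-pause cycle
            if action == "GO":
                speak(content)  # output the content tied to that identifier
                return content
        return None

A sensor-handling thread would push detected affirmative actions onto the actions queue, so a response arriving during the pause selects the identifier just output.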
[0018] A computer-implemented method for interaction between a
human and a computing device is disclosed, comprising storing an
identification of a unique, system-wide, and context-independent
affirmative action; dividing a content input to generate a sequence
of contents and a sequence of identifiers, wherein each identifier
of the sequence of identifiers is associated with a content from
the sequence of contents; entering a list-pause cycle (comprising
outputting, in an audio form, an identifier of the sequence of
identifiers and pausing a predetermined length of time); repeating
list-pause cycles until determining the affirmative action has been
performed during the pause of a list-pause cycle; and responsive to
determining the affirmative action has been performed during the
pause of a list-pause cycle, outputting, in an audio form, the
content from the sequence of contents associated with the
identifier output during that list-pause cycle.
[0019] A method is disclosed, comprising storing, by a first
computing device, an identification of a unique, system-wide, and
context-independent affirmative action; storing, by the first
computing device, a sequence of available commands for a second
computing device and a sequence of identifiers, wherein each
identifier of the sequence of identifiers is associated with a
command from the sequence of available commands; entering a
list-pause cycle (comprising outputting, in an audio form, an
identifier of the sequence of identifiers and pausing a
predetermined length of time); repeating list-pause cycles until
determining the affirmative action has been performed during the
pause of a list-pause cycle; and responsive to determining the
affirmative action has been performed during the pause of a
list-pause cycle, transmitting, to the second computing device, the
command from the sequence of available commands associated with the
identifier output during that list-pause cycle.
BRIEF DESCRIPTION OF THE DRAWINGS
[0020] FIG. 1 is an illustrative network environment in which a
simple response-driven operating system may be implemented;
[0021] FIG. 2 is a block diagram of an illustrative computing
device that may be utilized to implement the various features and
processes described herein;
[0022] FIG. 3 is an illustrative diagram of representative
descriptors and content stored in a database on a computing
device;
[0023] FIG. 4 is an illustrative flow diagram of an example
implementation of a Wait Pause with the simple response-driven
system;
[0024] FIG. 5 is an illustrative flow diagram of an example
implementation of a List Pause with the simple response-driven
system;
[0025] FIG. 6 is a continuation of the illustrative flow diagram of
the example implementation of FIG. 5;
[0026] FIG. 7 is an illustrative example of a default action of
FIG. 6;
[0027] FIG. 8 is an example graphical user interface (GUI)
displayed on a computing device to provide a visual display of all
or a part of the list available for audio output in response to a
user response;
[0028] FIG. 9 is an illustrative flow diagram of another example
implementation of the simple response-driven system;
[0029] FIG. 10 is a continuation of the illustrative flow diagram
of the example implementation of FIG. 9;
[0030] FIG. 11 is an example GUI of a command controller displayed
on a computing device to provide a visual display of responses
available to a user;
[0031] FIG. 11A illustrates the example command controller GUI of
FIG. 11 displaying active responses available to a user; and
[0032] FIG. 12 is a block diagram of an optional system server that
may be utilized to implement various features and processes
described herein.
DETAILED DESCRIPTION
[0033] The following detailed description refers to the
accompanying drawings. The same labels and/or reference numbers in
different drawings may identify the same or similar elements. The
invention can be embodied in many different forms and should not be
construed as limited to the embodiments set forth herein. Rather,
the embodiments are provided so that this disclosure will be
complete and will fully convey the invention to those skilled in
the art.
[0034] This disclosure provides a technical solution to address the
shortcomings of existing natural language systems (NLS) and other
command-driven systems, since the embodiments described herein are
not dependent on having an Internet connection, and allow a user to
use and control, with a minimum number of voice commands or other
responses, a dynamic application. The dynamic application may
include, but is not limited to, an application for displaying
elements from a playlist of unknown content.
[0035] For illustrative purposes, embodiments of the invention are
described in connection with two primary methods. The first is a
mechanism for interacting with information stored in a database,
playlist, or other data storage, such as (but not limited to)
email, text messages, news feeds, music and other media content,
search results, smart homes, cars, and artificial intelligence (AI)
systems. The second is a mechanism for more generally initializing
or completely controlling an application via as few as three
responses or actions. It is understood, however, that the invention is
not intended to be limited to the specific, illustrative
embodiments described herein and is applicable to a variety of
interactive applications and automated systems.
[0036] Embodiments of this disclosure are particularly well-suited
for allowing a user of a computing device to use a minimum number
of responses to navigate and control a list stored in a database or
other data storage, especially where the list is dynamic and,
therefore, not readily known to the user of the device.
[0037] Various specific details are set forth herein and in the
drawings, to aid in understanding the invention. However, such
specific details are intended to be illustrative, and are not
intended to restrict in any way the scope of the invention as
claimed herein. In addition, the particular screen layouts,
appearance, and terminology as depicted and described herein are
intended to be illustrative and exemplary and in no way limit the
scope of the invention as claimed.
[0038] FIG. 1 illustrates a representative network environment 100
in which embodiments may be implemented. As shown in FIG. 1, a user
employing a computing device 110 communicates over network 120.
Network 120 may be a wide area network (WAN) such as the Internet,
the Public Switched Telephone Network (PSTN), a local area network
(LAN), an intranet, an extranet, a cellular network, any wired or
wireless network, or any combination of the above.
[0039] The computing device(s) 110 are preferably mobile computing
devices, such as (but not limited to) a smart phone 110a, tablet
computer 110b, laptop computer 110c, in-vehicle computing devices
110d, or similar portable computing device, such as a smart
television remote control, portable video game console, portable
digital assistant, intelligent personal assistant, or wearable
device (including, for example, a smart watch, bracelet, or
eyeglasses). It is understood, however, that the invention
disclosed herein is not limited to mobile computing devices and is
applicable to other computing devices where it is desirable to
interact with and control applications using voice or other
commands.
[0040] For example, a visually impaired person utilizing a desktop
computer 110e may want to hear a list of content or data and select
desired content from the list using a voice command. Similarly, a
person using an interactive television (iTV) or a cable television
set-top box 110f may want to select a desired television program or
movie from a list (e.g., listing of available programming on an
interactive program guide) using a simple voice command. Other
examples of non-mobile computing devices compatible with the
disclosed method include the intelligent personal assistant 110g,
such as the Amazon Echo.RTM. voice-activated speaker or the Google
Home.RTM. voice-activated speaker.
[0041] For illustration purposes only, six representative computing
devices 110 are shown in FIG. 1--a smart phone 110a, such as (but
not limited to) an iPhone.RTM. or Android.RTM. handheld device; a
tablet 110b, such as (but not limited to) an iPad.RTM. or
Windows.RTM. Surface.RTM. tablet computer or a tablet computer
running the Android.RTM. operating system; a laptop computer 110c;
a desktop computer 110e; an iTV 110f; and an in-vehicle computing
device 110d, such as (but not limited to) dedicated computing
devices within a vehicle. Although representative computing
devices 110a-110g are shown in FIG. 1, it is understood that
environment 100 may include any number of computing devices, each
controlled by a respective user. For purposes of this disclosure,
the term "computing device 110" will refer collectively to each of
the devices 110a-110g, as well as to any other computing device
capable of receiving any form of user input, as described
below.
[0042] The computing devices 110, in a preferred embodiment,
communicate with one or more third-party servers 130 over network
120. Data or other content may be downloaded or streamed from one
or more respective servers 130 for storage in memory on computing
device 110. Third-party servers 130 may include (but are not
limited to) servers hosting a website, search engines, mapping
services, email servers, news publisher servers for providing news
feeds (e.g., Atom or RSS (Really Simple Syndication) feeds), or any
other server from which desired content may be downloaded by or
streamed to computing device 110 over network 120. The computing
device 110 communicates with the third-party servers 130 to
download content or commands, but does not require communication
with the third-party servers over network 120 for ASR processing or
processing of other forms of user input. Although only two
third-party servers 130 are shown in FIG. 1, it is understood that
environment 100 may include any number of third-party servers
130.
[0043] An optional intermediary system server 140 is also
illustrated in FIG. 1 that, among other things, processes the
desired content or data before it is downloaded by or streamed to
computing device 110 from third-party servers 130 over network 120.
This ensures that only the desired content or data is downloaded by
or streamed to computing device 110. In some embodiments, the
intermediary system server 140 strips out unwanted information that
may be included with the desired content or data to be downloaded
by or streamed to the computing device 110. For example, if desired
content from a webpage hosted on third-party server 130 is to be
downloaded by computing device 110, that webpage may include
additional content (e.g., advertisements, images, navigation links,
other unimportant content, etc.). Intermediary system server 140
may use conventional extractor software or programming to strip out
this additional or unnecessary content so only the desired content
(e.g., actual article/page text with no additional or unnecessary
content) is downloaded for storage in the computing device 110.
[0044] In addition, in some embodiments, the intermediary server
140 may serve as a source of content to be downloaded by or
streamed to computing device 110 over network 120, as opposed to or
in addition to obtaining content or data from third-party servers
130. The intermediary server 140 may also be the source of
available voice commands for client application software running on
computing device 110. It is understood that intermediary server 140
may be separate from or combined with third-party server 130, and
that one or more intermediary servers 140 may optionally be
utilized with the various embodiments of this disclosure. Again, it
is understood that ASR processing or other processing of input
commands (such as gestures, movements, or other input detected by a
sensor) can be performed locally on computing device 110 without
the need to communicate over network 120 with third-party servers
130 or intermediary servers 140 for ASR processing or other
processing of input commands. Nothing in this disclosure, however,
should be construed to preclude the computing device 110 from being
connected to network 120 for any purposes (e.g., connected to a
cellular network for making calls, another application software
running on computing device 110 communicating over a network for
purposes other than described herein, etc.).
[0045] FIG. 2 is a block diagram of an illustrative computing
device 110. As discussed above, examples of computing devices 110
include (but are not limited to) mobile computing devices, such as,
for example, smart phones, tablet computers, personal digital
assistants (PDAs), laptop computers, wearable devices, portable
game consoles, or any other portable computing devices. While a
representative mobile computing device 110 is illustrated in FIG.
2, it is understood that the invention disclosed herein is not
limited to mobile computing devices and is applicable to other
computing devices where it is desirable to interact with and
control applications (including, e.g., web browsers, email clients,
file managers, etc.), display media content, or display or review
data by way of voice commands or other forms of user input.
[0046] As shown, computing device 110 includes one or more
processing units 210 coupled to system memory 220, and various
sensors and subsystems including (but not limited to) wireless
communication subsystem(s) 230, audio subsystem 240, I/O subsystem
250, and power subsystem 260.
[0047] Wireless communication subsystem 230 facilitates wireless
device communication functions over network 120. For example,
wireless communication subsystem 230 may include one or more
transmitters and receivers (not shown), which are implemented to
operate over network 120.
[0048] Audio subsystem 240 is coupled to speaker(s) 242 to output
audio and to microphone(s) 244 to facilitate voice-enabled
functions, such as voice recognition, digital recording, telephony,
etc.
[0049] I/O subsystem 250 facilitates the transfer between I/O
peripheral devices, such as (but not limited to) a touch screen 253
and other I/O devices 255, and the data bus of the processing
unit(s) 210. I/O subsystem 250 may include a touch screen
controller 252 and other I/O controller(s) 254 to facilitate the
transfer between I/O peripheral devices and the data bus of
processing unit(s) 210.
[0050] Other I/O devices 255 may include, in some embodiments, a
camera, a photodetector, a motion sensor, a pressure sensitive
button or pad, a motion-sensitive trackpad, a manipulable joystick
or other raised switch having one or more possible degrees of
motion, an accelerometer, or a gradiometer. Other I/O devices 255
may, rather than being directly incorporated into computing device
110, be peripheral devices that connect to other I/O controllers
254 via a wired connection to a physical data port or via a
wireless connection that is received by wireless communications
subsystem 230. For example, other I/O devices may include a
wireless mouse, wireless keyboard, wearable device (e.g., a
smartwatch or bracelet having buttons, touchscreens, or
accelerometers for user input), or a companion mobile computing
device (e.g., a remote control for a television, or a mobile phone
connected to a desktop computer).
[0051] The processing unit(s) 210 is coupled to system memory 220,
which stores various software components or applications 224 to be
executed by processing unit(s) 210. Preferably, memory 220 stores
an operating system (OS) 221, which includes various procedures,
sets of instructions, software components and/or drivers for
controlling and managing general system tasks (e.g., memory
management, storage device control, power management, etc.) and
facilitates communication between various hardware and software
components. System memory 220 also preferably includes RAM 222 and
non-volatile flash memory 223, a combination of
volatile/non-volatile memory, and/or any other type of memory.
[0052] System memory may include a speech-to-text (STT) engine 225,
which receives speech input from the user of computing device 110
via microphone 244 and audio subsystem 240. In some embodiments,
the STT engine 225 uses various acoustic and language models to
recognize the speech input as a sequence of phonemes, and
ultimately, a sequence of words or tokens written in one or more
languages. The STT engine 225 can be implemented using any suitable
speech recognition techniques, acoustic models, and language
models, such as (but not limited to) Hidden Markov Models, Dynamic
Time Warping (DTW)-based speech recognition, and other statistical
and/or analytical techniques.
[0053] System memory may also include a text-to-speech (TTS) engine
226, which converts data in the form of text to a synthesized
speech output via speaker 242 and audio subsystem 240. In some
implementations, the synthesized speech can be created by
concatenating phonemes of recorded speech that are stored in a
voice database.
[0054] System memory 220 may also include at least one database
227. As will be discussed below with respect to FIG. 3, database
227 functions to store desired content and associated data in
memory 220 on computing device 110. Database 227 is preferably a
Structured Query Language (SQL) database, but can be structured as
any other form of database, including, for example, a
non-relational (NOSQL) database; a structured plaintext datafile,
such as a file storing comma-separated values (CSV), JavaScript object
notation (JSON), or extensible markup language (XML); an
Excel.RTM., Access.RTM., or other desktop productivity suite
filetype; or any other data storage format.
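As a non-limiting sketch, database 227 might be realized with SQLite; the table and column names below are illustrative assumptions rather than part of the disclosure.

    import sqlite3

    # One possible shape for database 227: each row pairs a text-based
    # descriptor with its associated content (compare FIG. 3).
    conn = sqlite3.connect("content.db")
    conn.execute(
        """CREATE TABLE IF NOT EXISTS items (
               position   INTEGER PRIMARY KEY,  -- order of presentation
               descriptor TEXT NOT NULL,        -- e.g., headline or subject
               content    TEXT NOT NULL         -- e.g., full article text
           )"""
    )
    conn.execute(
        "INSERT INTO items (descriptor, content) VALUES (?, ?)",
        ("Example headline", "Full text of the example article..."),
    )
    conn.commit()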
[0055] System memory 220 also includes communication instructions
to facilitate communicating with one or more additional devices;
graphical user interface (GUI) instructions to facilitate GUI
processing; image processing instructions to facilitate
image-related processing and functions; input processing
instructions to facilitate input-related (e.g., manual input)
processes and functions; audio processing instructions to
facilitate audio-related processes and functions; and camera
instructions to facilitate camera-related processes and functions.
It is understood that the communication instructions described
above are merely exemplary and the system memory 220 may also
include additional instructions. For example, memory 220 for a
smartphone 110 may include phone instructions to facilitate
phone-related processes and functions. It is also understood that
the above-mentioned instructions need not be implemented as
separate software programs or modules.
[0056] While the components illustrated in FIG. 2 are shown as
separate components, one of ordinary skill in the art will
recognize that two or more components or modules may be integrated
into a single component or module. In addition, two or more
components may be coupled together by one or more communication
buses or signal lines. Also, while many of the functions have been
described as being performed by one component or module, one of
ordinary skill in the art will realize that the functions described
with respect to FIG. 2 may be split into two or more components or
modules.
[0057] The computing device 110 also includes one or more power
source(s) 280 for powering the various hardware components of the
computing device. Power source 280 may include a power management
system, one or more power sources, a recharging system, a power
failure detection circuit, a power converter or inverter, a power
status indicator, and any other components typically associated
with the generation, management and distribution of power in
computing devices.
[0058] FIG. 3 illustrates a high level representation of content
and descriptors of content stored in a database 227 on a computing
device 110. Content 310 refers to desired content or data items
(Data 1, Data 2, Data 3, Data 4, . . . Data n) stored in memory 220
of computing device 110.
[0059] In some implementations, the desired content is downloaded
by or streamed to computing device 110 over network 120 for storage
in memory 220. For example, the desired content may be (but is not
limited to) full text news articles obtained from third-party
servers 130 (and optionally stripped of unwanted or unnecessary
information by an extraction process performed by intermediate
system server 140). In other implementations, the desired content
may be generated by, or entered into, another application running
on computing device 110.
[0060] Any form of sequential content or content that can be
converted or divided up into a sequential form may be presented
using the presently disclosed system. Other examples of desired
content or data items include (but are not limited to) the text
associated with webpages identified by an Internet search engine,
searches within private data repositories such as Lexis-Nexis.RTM.,
emails from an email server or system, books in a digital format
presented as text or audio, text messages received
by a mobile phone or messaging system, instant messages received
from a messaging protocol, events/appointments in a calendar
system, information about physical locations identified by mapping
software or search engines, audio files (e.g., episodes of a
podcast, songs from an internet radio service or other audio
streaming service, or subparts of an HTTP Live Streaming playlist),
video files (including, e.g., episodic video content from a video
hosting service or video streaming service), updates or status
indications from a user's contacts on a social networking site, new
threads or posts in an interactive forum, contents of recipes in
electronic cookbooks or other databases, new articles in an online
newspaper, online magazine, or blog, updates or actual events
recorded in an available data source (e.g., a police
scanner/blotter, a list of real estate available from a realtor,
classified advertisements from a newspaper, a list of newly
introduced legislation in a legislature, etc.), or information
gathering by way of an interactive questionnaire used for research
(e.g., medical, marketing, or political surveys).
[0061] In an alternative embodiment, rather than the display of
textual or other media content for a user's consumption, the stored
elements may be user interface items intended for a user to
interact with. For example, memory 220 may store information
regarding graphical user interface (GUI) buttons, toggles, radio
buttons, drop down boxes, hyperlinks, tabs, windows, dialog boxes,
error messages, or collapsible sections of content. Memory 220 may
also store a list of other applications currently running in
memory, and any application programming interface (API) or other
elements that are capable of manipulating, controlling, inputting
data into, or getting data from these other applications.
Consequently, in contrast to the primary embodiment of outputting
content described in the following paragraphs, a command system may
be used to interact with other applications as an additional mode
of input to any identified GUI or API elements of those
applications. An API may in turn be provided by the disclosed
method in order to allow other software to control aspects of data
output, recognition of user input, or the method by which a user's
selection of input is determined.
[0062] An illustrative high level list of content descriptors 320
is also stored in database 227 of computing device 110. The list of
content descriptors 320 includes Descriptor 1, Descriptor 2,
Descriptor 3, Descriptor 4, . . . Descriptor n. Each descriptor in
the list of content descriptors 320 is preferably a text-based
summary or description of an associated content or data listed in
content 310. For example, Descriptor 1 is a text-based summary or
description of Data 1, Descriptor 2 is a text-based summary or
description of Data 2, and so on. The descriptor may be, for
example, a text-based headline for a news article, text appearing
in the subject of an email or text message, a text-based summary of
a webpage identified in the results of an Internet search, etc.
[0063] In some implementations, the descriptor in the list of
content descriptors 320 may be downloaded by or streamed to the
computing device 110 over network 120.
[0064] By way of example only, in the context of news feeds, a user
of computing device 110 could subscribe to one or more RSS feeds
published by third-party news server(s) 130. The RSS feed may
include text-based headlines and URLs for the actual news article
associated with each headline. One or more headlines included in
the RSS feed can be stored in memory 220 of computing device 110
for inclusion in the list of content descriptors 320 of database
227. An application software 224 running on computing device 110
causes the processing unit(s) 210 to send a request over network
120 to the third-party server 130 to obtain the full text of the
actual news article corresponding to the URL included in the RSS
feed. Once received (optionally after extracting unwanted or
unnecessary content from the retrieved news article using
intermediate system server 140), the full text of the news article
can be stored in memory 220 of computing device 110 for inclusion
in the list of content 310 of database 227.
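A minimal sketch of this news-feed flow, using only the Python standard library, might read as follows; the feed URL is a placeholder, and the extraction step performed by intermediary system server 140 is omitted.

    import urllib.request
    import xml.etree.ElementTree as ET

    FEED_URL = "https://example.com/news/rss.xml"  # placeholder feed

    # Parse the RSS feed: each <item> holds a headline and an article URL.
    with urllib.request.urlopen(FEED_URL) as response:
        feed = ET.parse(response)

    descriptors, article_urls = [], []
    for item in feed.iterfind(".//item"):
        descriptors.append(item.findtext("title", default=""))
        article_urls.append(item.findtext("link", default=""))

    # Each URL would then be fetched over network 120 and the article
    # text stored in content 310 alongside its headline in the list of
    # content descriptors 320.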
[0065] In a second example embodiment, in the context of emails, a
user of computing device 110 could have one or more email accounts
on one or more email servers 130. Via any email protocol, including
for example IMAP, POP, or Microsoft Exchange.RTM., the computing
device may access the email headers and contents of one or more
stored emails on email servers 130, and the headers and/or contents
may be stored in memory 220 of computing device 110 for inclusion
in the list of content descriptors 320 of database 227.
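For illustration, the header-gathering step might be sketched with Python's standard imaplib; the server name, credentials, and mailbox below are placeholders.

    import email
    import imaplib
    from email.header import decode_header

    # Collect subject lines as candidate descriptors for database 227.
    with imaplib.IMAP4_SSL("imap.example.com") as imap:
        imap.login("user@example.com", "app-password")
        imap.select("INBOX", readonly=True)
        _, data = imap.search(None, "ALL")
        for num in data[0].split():
            _, msg_data = imap.fetch(num, "(RFC822.HEADER)")
            msg = email.message_from_bytes(msg_data[0][1])
            # Decode any MIME-encoded words in the subject line.
            subject = "".join(
                part.decode(enc or "utf-8") if isinstance(part, bytes) else part
                for part, enc in decode_header(msg.get("Subject", ""))
            )
            print(subject)  # candidate entry for descriptor list 320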
[0066] In a third example embodiment, in the context of search
engine queries, a user of computing device 110 could cause a search
query to be transmitted (via the hypertext transfer protocol (HTTP)
or any other protocol) to one or more search engines hosted by one
or more web servers 130. Search results containing a series of URLs
of webpages responding to the query, and/or contents or titles of
the webpages themselves, may be stored in memory 220 of computing
device 110 for inclusion in the list of content descriptors 320 of
database 227.
[0067] In a fourth example embodiment, in the context of
location-aware searching, a user of computing device 110 could
cause a current location of the user or the user's device (as
determined by a street address or GPS sensor) to be transmitted to
search engines hosted by one or more web servers 130. Search
results containing a series of locations, identifiers of those
locations, and/or pertinent information about the history of those
locations or services provided at those locations may be stored in
memory 220 of computing device 110 for inclusion in the list of
content descriptors 320 of database 227.
[0068] In a fifth example embodiment, in the context of video
streaming, a user of computing device 110 could cause the computing
device to transmit a request for available videos to a video
streaming service (e.g., YouTube.RTM., Hulu.RTM., Netflix.RTM., HBO
Go.RTM., etc.). The request could comprise additional information,
such as an identification of the user, subscription information
about the user, or a query for a particular brand or source of
content, in order to narrow down which videos are available. In
response, the computing device receives a series of video titles
(and/or other information such as publisher, video length, episode
or season numbers) and identifiers allowing a video to be played,
which may be stored in memory 220 of computing device 110 for
inclusion in the list of content descriptors 320 of database 227.
In addition, the user of the computing device can receive a stream
of shortened video clips in a successive order and make a selection
without storing any data locally on the device.
[0069] In a sixth example embodiment, in the context of home
appliances, a user of computing device 110 could cause the
computing device to transmit a request for available settings or
commands to a smart home appliance such as a thermostat or coffee
maker. In response, the computing device receives a list of
possible settings to change (and the values to which the setting
may be changed) and/or a list of possible commands to the device
(including, for example, "turn off" or "begin brewing"), which may
be stored in memory 220 of computing device 110 for inclusion in
the list of content descriptors 320 of database 227.
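A hypothetical sketch of this appliance exchange over HTTP follows; the appliance address, endpoint paths, and JSON shape are assumptions, as the disclosure does not fix a particular transport or protocol.

    import json
    import urllib.request

    APPLIANCE = "http://192.168.1.50"  # placeholder appliance address

    def available_commands():
        # e.g., [{"id": "brew", "label": "Begin brewing"}, ...]
        with urllib.request.urlopen(APPLIANCE + "/commands") as resp:
            return json.load(resp)

    def send_command(command_id):
        # Transmit the selected command to the second computing device.
        req = urllib.request.Request(
            APPLIANCE + "/commands/" + command_id, method="POST"
        )
        with urllib.request.urlopen(req) as resp:
            return resp.status == 200  # confirmation to relay to the user

The command labels would serve as the sequence of identifiers read aloud during list-pause cycles, and the id of the item preceding a GO response would be passed to send_command.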
[0070] In a seventh example embodiment, in the context of smart
vehicles, a driver or manufacturer of smart vehicle 110 could cause
the vehicle to be pre-loaded with a series of settings or commands
(including, for example, turning on a radio, requesting a current
value such as fuel level or oil level, or determining other vehicle
statuses) in a list of content descriptors 320 stored in database
227.
GO (or Equivalent Simple Affirmative) Response
[0071] User navigation of the application software 224 executed by
processing unit(s) 210 on computing device 110 is designed so the
device user need only know a limited or minimal number of simple,
easily remembered responses ("simple responses"). In a preferred
embodiment, the simple responses are voice responses, each
consisting of a single spoken word or phrase.
[0072] In an alternative embodiment, responses may be provided by a
user without speaking, including, by way of non-limiting example,
manipulation of an input device (e.g., a keyboard, mouse, button,
joystick, trackpad, etc.), a hand gesture (e.g., a hand wave, a
shaking of a mobile device, a movement of the hand from one
elevation to another, or a gesture having a meaning in a sign
language), a movement or positioning of fingers (e.g., a "thumbs
up", an open palm, a closed fist, a particular number of fingers
pointing away from the hand, a particular finger or fingers
pointing away from the hand, etc.), a facial movement (e.g., a
wink, a blink, a number of blinks, a movement of the eyes in a
particular direction, a raised eyebrow, a smile, a frown, etc.),
other muscle movements (e.g., a tap of a finger or foot on a
surface, a muscle twitch in an otherwise paralyzed or amputated
limb), non-verbal sound inputs (e.g., clapping hands or a series of
claps, humming, coughs, clicks, utterances, etc.), or even input
from an electroencephalograph or other device that directly
measures brain activity of a user. A response may consist of a
combination of two or more such response types, such as speaking a
word while holding down a button.
[0073] In some embodiments, the application software 224 can be
configured to respond to only three responses, such as a simple
affirmative response, a simple pause-initiation response, and a
simple options-initiation response. Consequently, a user of
computing device 110 need only remember the three simple responses.
The responses are ideally unique responses both system-wide and
independent of context. For example, the same set of three
responses should be used in every menu of an application, or in
every task that might be performed by a productivity software
suite, or for the display of every type of media content, or for
navigating a variety of different websites, or for reviewing
results of different query types or search engines, or for the
control of every physical component controlled by a controller
using an affirmative response interface, or for every application
controlled by an operating system, or for both a normal mode and a
navigation or help mode of an application. As a result, a user need
not learn a different set of responses based on media type (such as
"read" for text or "play" for video) or for navigation (such as
"back/forward" in a web browser or "parent/child" for a file
explorer with hierarchical nodes).
[0074] For example, the simple affirmative response could be a
verbal response selected from any one of "GO", "YES", "SELECT",
"CHOOSE", a similar statement of assent to continuing, or the
equivalent of one of these responses, localized to a different
language ("ALLEZ", "OUI", "SI", "DA", etc.). The simple
pause-initiation response could be, for example, any one of "STOP",
"NO", "CEASE", "WAIT", a similar statement indicating that
continuing is not desired, or the equivalent of one of these
responses, localized to a different language ("ARRET", "ALTO",
etc.). The simple options-initiation response could be, for
example, any one of "HELP", "WHAT", "MENU", "OPTIONS", another
response indicating an intent that is neither continuing nor
stopping, or the equivalent of one of these responses, localized to
a different language ("AIDE-MOI", "AYUDA", etc.).
[0075] In a non-verbal embodiment, the three simple responses could
be, by way of example only, a thumbs up, thumbs down, and an open
palm; or one blink, two blinks, and a wink; or clicks of different
buttons on a mouse or other peripheral device having at least three
buttons; or a single-click, double-click, and triple-click of a
mouse or other button; or different numbers of taps against a
surface with a foot or finger; or any three ways of interacting
with an input device capable of distinguishing three separate
examples of input.
[0076] In some embodiments, more than three responses may also be
available to the user to enable faster interaction with and
navigation of the software application 224.
[0077] The simple affirmative response GO (or equivalent simple
affirmative response) is preferably the default response to
initiate different actions at different states within the
application software 224 running on computing device 110. For
example, when a user inputs a GO response (or equivalent simple
affirmative response) during a pause between the TTS output of a
list, the application software 224 may cause the processing unit(s)
210 to initiate a default action specific to that mode or sequence.
Representative examples of default actions by the processing
unit(s) 210 may include (but are not limited to) the following (a
non-limiting software sketch follows this list):
[0078] A GO response input may initiate the output of a content
associated with an identifier of that content that immediately
preceded the GO response input.
[0079] At the end of a mode or sequence where a list of available
options for how to proceed is output to the user, a GO response
initiates the action associated with the item in the list that
immediately preceded the GO response input.
[0080] When presented with a list of Settings, a GO response input
may select the setting (e.g., changing the speed of the output)
that immediately preceded the GO response input.
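By way of illustration only, the non-limiting Python sketch below
expresses the pattern common to paragraphs [0078]-[0080]: each mode
supplies a single default action, and a GO input applies that
action to the item whose identifier immediately preceded the
response. The mode names and callables shown are hypothetical:

    # Non-limiting sketch of the default GO action: every mode supplies
    # one callable, and a GO input always applies that callable to the
    # item output immediately before the pause.
    def make_go_handler(default_actions):
        """default_actions maps a mode name to a one-argument callable."""
        def handle_go(mode, preceding_item):
            action = default_actions.get(mode)
            if action is None:
                raise KeyError(f"no default action for mode {mode!r}")
            return action(preceding_item)
        return handle_go

    # Example wiring (hypothetical callables):
    # handle_go = make_go_handler({
    #     "content_list": output_content,      # [0078]
    #     "navigational_list": invoke_option,  # [0079]
    #     "settings_list": apply_setting,      # [0080]
    # })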
[0081] The STOP response (or equivalent simple pause-initiation
response) input allows the user of computing device 110 to
interrupt action within the application software 224 and enter into
an Indefinite-Length Pause ("Wait Pause") as will be described
below. For example, a user of computing device 110 could input a
STOP response during the pause(s) in output of a list, which would
cause processing unit(s) 210 to interrupt the output and remain
indefinitely in a paused state until the user provides another
response input.
[0082] The HELP response (or equivalent simple options-initiation
response) input allows the user of computing device 110 to obtain a
list of voice or other input responses that are available to the
user within the software application 224. Upon receipt of a HELP
response input, the application software 224 causes processing
unit(s) 210 to output a list of available responses through
speaker(s) 242 or other electronic sensory presentation device.
Optionally, after the user hears the desired response, the user
simply provides a GO response (or equivalent simple affirmative
response) input to cause the processing unit(s) 210 to invoke the
desired response within software application 224.
Pauses
[0083] Pauses introduced in the navigation of application software
224 provide the user with the time needed to input a response.
Because a pause results in the absence of sound and occurs
naturally during presentation of speech, the pause may be preceded
by a short tone or other media or sensory output to alert the user
to the imminent pause. In addition, the pause may also be followed
by a short tone (or other media or sensory output) to habituate the
user to a predefined length of time (t) of the pause and remove any
sense of randomness to the length of the pause.
[0084] In some embodiments, there may be two types of pauses--a
List-With-A-Defined-Length Pause ("List Pause") and an
Indefinite-Length Pause ("Wait Pause")--to facilitate user
interaction with the application software 224 executed by
processing unit(s) 210 on computing device 110. The List Pause is
preferably set for a short, predefined length of time (t) (e.g.,
1.5 seconds-4 seconds) and used predominantly throughout the
presentation of labels in the application software 224. The Wait
Pause often occurs at the end of a List Pause or after the user
inputs a STOP response during a List Pause.
[0085] The List Pause may be a different length for different
applications, purposes, or content types; for example, when
presenting streaming content such as a video, audio book, or
podcast, the List Pause may be much shorter, to allow a return to
the content with minimal interruption, and when presenting text or
other information typically meant for consumption at the user's
chosen pace, a longer List Pause may be used.
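By way of illustration only, a content-type-dependent List Pause
length might be represented as a simple lookup, as in the following
non-limiting Python sketch; the specific values are arbitrary
examples within the 1.5 to 4 second range described above:

    # Non-limiting sketch: List Pause length (t) selected by content type.
    LIST_PAUSE_SECONDS = {
        "streaming": 1.5,   # shorter, for minimal interruption of video/audio
        "text": 4.0,        # longer, for user-paced consumption of text
        "default": 2.5,
    }

    def list_pause_for(content_type):
        return LIST_PAUSE_SECONDS.get(content_type, LIST_PAUSE_SECONDS["default"])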
[0086] The Wait Pause facilitates changing to a different mode in
the computer system in addition to pausing indefinitely. Once a
Wait Pause is invoked, the user can continue with the current mode
using the GO command/command-action, or use other commands to
change to, for example, Settings or Selecting a Different Channel,
or to pass user control to a Natural Language interface.
[0087] List Pause
[0088] In some embodiments, the application software 224 executed
by processing unit(s) 210 on computing device 110 may provide two
or more types of lists to the user. The first list type may be the
Content List 310 or List of Content Descriptors 320 (e.g., short
content, such as (but not limited to) headlines, names of songs,
etc.; and/or longer content divided into segments (e.g., one or
more paragraphs) separated with a pause, such as (but not limited
to) summaries, stories, articles, etc.). The second list type may
be a Navigational List that presents options to the user to
navigate throughout the application software 224.
[0089] With the Content List, List of Content Descriptors or
Navigational List, the application software 224 preferably causes
the processing unit(s) 210 to initiate the output of each item in
the list followed by a short List Pause for a predefined length of
time (t) to allow sufficient time for the user of computing device
110 to interact with the list (or a story/article divided into
segments) and control movement within the list or content. The
predefined pause time (t) may be a fixed time (e.g., about 2.5
seconds) programmed in the application software 224, or it can be a
specific time (t) selected by the user of computing device 110 from
a range of available pause times (e.g., from about 1.5 seconds to
about 4 seconds). In addition to the GO response (or equivalent
simple affirmative response), other simple responses that may also
be available for the user to interact and control the list or
content may include (but are not limited to) RESTART, REPEAT,
GO-BACK, SKIP, NEXT and/or PREVIOUS, as well as SAVE and/or
DELETE.
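By way of illustration only, the following non-limiting Python
sketch captures the list-pause cycle described above. The callables
say, tone, and listen are hypothetical stand-ins for the TTS engine
226, the audio subsystem 240, and the STT engine 225 (or another
sensor input); listen(timeout) is assumed to return a recognized
response string, or None if the pause times out:

    # Non-limiting sketch of the list-pause cycle.
    def list_pause_cycle(items, say, tone, listen, t=2.5):
        """items is a sequence of (descriptor, content) pairs."""
        for descriptor, content in items:
            say(descriptor)                  # output the identifier
            tone("listen")                   # signal the imminent pause
            response = listen(timeout=t)     # List Pause of length t
            if response is None:
                tone("end_of_pause")         # habituate user to length t
                continue                     # advance to the next item
            if response == "GO":
                tone("affirmation")
                return content               # select the preceding item
            # other simple responses (RESTART, SKIP, NEXT, ...) would be
            # dispatched here
        return None                          # list exhausted; offer options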
[0090] In addition, the short List Pause allows the user to
input a simple affirmative response to initiate a default action
specific to that mode or sequence by the application software 224,
such as (but not limited to) selection and output of the associated
content (story, article, etc.), or to initiate the action
associated with an item in a Navigational List, such as, by way of
example only:
[0091] "When you hear your selection, say GO" [0092] [1] "Choose
another Channel within the current Topic" [0093] [Tone, List Pause,
Tone] [0094] [2] "Go back to the list of Topics" [0095] [Tone, List
Pause, Tone] [0096] [3] "Exit and go back to where you were" [0097]
[Tone, List Pause, Tone]
[0098] With respect to output of longer content, it is preferred to
divide the content into segments (e.g., paragraphs or groups of
paragraphs) and insert a List Pause between the output of each
segment. This is useful because, when computing device 110 is in
speaker mode, command feedback may impair the device's ability to
listen for or receive other input, so the user may not be able to
effectively interrupt the output using a command. Introducing a
List Pause between segments therefore improves the overall
experience for TTS output of longer content by providing the user
with control over the longer content during speakerphone playback.
This contrasts with other forms of audio broadcast (e.g., radio)
and other audio recordings.
[0099] Wait Pause
[0100] When a user of computing device 110 encounters a Wait Pause,
the application software 224 has paused indefinitely and preferably
remains in the paused state until the user provides a response
input or passes control to a natural language interface. If the
user passes control to a natural language interface, the List Pause
and Wait Pause cycles may be terminated until the natural language
interface is used to request content that can be converted into a
sequential form amenable to navigation and output using simple
affirmative responses.
[0101] In some embodiments, this can occur at the end of output of
a Navigational List when the user has not initiated a change to a
different mode and the application software 224 remains in a wait
state. When the user inputs a simple affirmative response during
the Wait Pause, the application software 224 may provide the user
with instructions as to how to proceed, such as, by way of example
only: [0102] "You are in Extended Help with Responses for Channel
Mode while the news reader application software is Paused" [0103]
"After you hear your selection and the Listen Tone, say GO" [0104]
[1] "Exit Extended Help and return to Current Channel." [0105]
[Tone, List Pause, Tone] [0106] [2] "Return to General Response
Help." [0107] [Tone, List Pause, Tone] [0108] [3] "Read the list of
available commands for Channel Mode." [0109] [Tone, List Pause,
Tone] [0110] [4] "Restart Extended Command Help for Channel Mode"
[0111] [Tone, List Pause, Tone]
[0112] A Wait Pause may also occur when the user of computing
device 110 inputs a simple pause-initiation command. The
application software 224 causes the processing unit(s) 210 to then
change to the Wait Pause, preferably preceded by a tone or other
media or sensory output. The tone or other output preceding the
Wait Pause is preferably a different output than the one preceding
the List Pause to indicate to the user that they are in a different
pause state.
[0113] The Wait Pause that occurs after the simple pause-initiation
command enables the user to navigate to other modes within the
application software 224 using additional responses, such as, for
example, HEADLINES, STORIES, CHANNELS and SETTINGS. If the user is
in a Wait Pause during the output of content (e.g., playback of a
story or article), the user may input a simple affirmative
response, for which the default action specific to that mode or sequence
may be to continue the output of the paused content. In addition to
the affirmative response, the user may use other List control
simple responses, such as (but not limited to) RESTART, REPEAT,
GO-BACK, SKIP, NEXT, PREVIOUS and/or HELP.
[0114] If a user attempts a predetermined number of times to
perform an input action while in a Wait Pause and the system is
unable to determine the user's intent (such as three failed input
attempts by the user), the system may automatically switch into a
guide or help mode and begin outputting possible commands that a
user may have meant, or outputting all possible commands or
navigation options to allow the user to reorient themselves.
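By way of illustration only, the automatic switch into a guide or
help mode after repeated misunderstood inputs might be sketched as
follows; listen, understood, and enter_help_mode are hypothetical
callables, and the threshold of three attempts is merely an example:

    # Non-limiting sketch: after a predetermined number of misunderstood
    # inputs during a Wait Pause, switch to a guide or help mode.
    MAX_FAILED_ATTEMPTS = 3

    def wait_for_understood_input(listen, understood, enter_help_mode):
        failures = 0
        while True:
            response = listen()          # blocks indefinitely (Wait Pause)
            if understood(response):
                return response
            failures += 1
            if failures >= MAX_FAILED_ATTEMPTS:
                enter_help_mode()        # output possible commands/options
                failures = 0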
[0115] A simple affirmative response following a Wait Pause may
cause the processing unit(s) 210 to initiate the output of a list,
such as (but not limited to) an entire list of headlines, with each
item in the list separated by a List Pause; continue the output of
a paused story (which may represent segments or paragraphs of
content); or merely "awaken" the application software 224 from the
indefinite-length Wait Pause to inform the user of their "location"
in the application software, such as the sequence in which the user
resides in the application software, and initiate the output to
the user of available options, such as (but not limited to)
remaining in or changing to a different mode within the application
software.
[0116] FIG. 4 is an illustrative flow diagram of an example
implementation of a Wait Pause (described above) with the simple
affirmative response operating system. While not illustrated in
FIG. 4, the process described below may be initiated, for example,
in response to a response input by the user using microphone(s) 244
or other I/O device 255 of the computing device 110, or in response
to some other action or event occurring while the application
software 224 is running on computing device 110. For example,
assume that one of the following has occurred that will result in
the application software 224 causing the processing unit(s) 210 to
enter into a Wait Pause: the application software 224 has just been
launched by the user on the computing device 110; the processing
unit(s) 210 have completed output of a list (e.g., content
descriptors, content, available responses, etc.) through speaker(s)
242 or other electronic sensory presentation device; or the
processing unit(s) 210 have received a simple pause-initiation
response input from the user via microphone(s) 244 or other I/O
devices (Step 400). In response, the
processing unit(s) 210 causes the output of a listen tone or other
sensory feedback--a Wait Listen signal in this example--to the user
through speaker(s) 242 of computing device 110 (Step 405).
[0117] The application software 224 causes the processing unit(s)
210 to change to a listening state to detect a response input by
the user (Step 410). In the example illustrated in FIG. 4, the
listening state 410 is an indefinite-length Wait Pause, where the
application software 224 has paused indefinitely and preferably
remains in the paused state until the user provides a response
input or passes user control to a natural language interface. Wait
Listen signal (Step 405) is preferably a distinctive signal, such
as, for example, a double tone or the like, to indicate to the user
of computing device 110 that the application software 224 is paused
indefinitely and waiting for a response input from the user.
[0118] Once the user provides a response input via microphone(s)
244 or other I/O device 255 (Step 415), the STT engine 225 (or, if
the response is non-verbal, a visual input or other signal
processing unit) converts the received response input into a form
suitable for processing by processing unit(s) 210. If the received response
input is not understood (Step 420), then the processing unit(s) 210
causes output of a Response Not Understood signal (Step 425). The
Response Not Understood signal is preferably distinct from Wait
Listen signal (Step 405) or other signals so that the user may
recognize that the response was not understood by the system. The
application software 224 then causes the processing unit(s) 210 to
respond accordingly to not understanding a response input (Step
430). The response may be, for example, a TTS output indicating
that the response input was not understood, or a TTS output of a
list of responses available to the user at that location or
sequence within application software 224 with a List Pause of a
predefined time (t) between each item in the list to allow the user
time to provide a GO voice response (or equivalent simple
affirmative response) input after hearing the TTS output of the
desired item in the list.
[0119] If the response is understood (Step 420), then the
processing unit(s) 210 causes the audio subsystem 240 to output an
Affirmation signal through speaker(s) 242 (Step 435). The
Affirmation signal is preferably distinct from Wait Listen signal
(Step 405), Response Not Understood signal (Step 425) and any other
signal so that the user may recognize that the response was
understood by the system. The processing unit(s) 210 determines
whether the received simple response input was a GO (or equivalent
affirmative) response (Step 440). If the processing unit(s) 210
determines that the received voice response was a GO (or equivalent
affirmative) response, then the processing unit(s) 210 processes
and responds accordingly to the received default GO response (Step
445). Alternatively, if the processing unit(s) 210 determines that
a simple response other than a GO response (e.g., RESTART, RESTORE,
CHANNELS, STORIES, SETTINGS, HELP, etc.) is received from the user
(Step 440), then the processing unit(s) 210 processes the received
other response and responds accordingly (Step 450).
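By way of illustration only, the flow of FIG. 4 (Steps 400-450)
might be sketched in Python as follows. The callables tone, listen,
classify, handle_go, and handle_other are hypothetical stand-ins;
listen() is assumed to block indefinitely, consistent with the Wait
Pause, and classify() is assumed to return a falsy value when the
input is not understood:

    # Non-limiting sketch following the flow of FIG. 4.
    def wait_pause(tone, listen, classify, handle_go, handle_other):
        tone("wait_listen")               # Step 405: e.g., a double tone
        while True:
            response = listen()           # Steps 410-415: Wait Pause
            if not classify(response):    # Step 420: not understood
                tone("not_understood")    # Step 425
                # Step 430: e.g., TTS list of available responses
                continue
            tone("affirmation")           # Step 435
            if response == "GO":          # Step 440
                return handle_go()        # Step 445: default action
            return handle_other(response) # Step 450: other response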
[0120] FIGS. 5 and 6 are an illustrative flow diagram of an example
implementation of a List Pause (described above) with the simple
affirmative response operating system.
[0121] While not illustrated in FIG. 5, the process described below
may be initiated, for example, in response to a response input by
the user using microphone(s) 244 or other I/O device 255 of the
computing device 110, or in response to some other action or event
occurring while the application software 224 is running on
computing device 110.
[0122] The process described below may, in some embodiments, be
initiated by an open-ended natural language prompt (e.g., "What
were the greatest achievements of George Washington?") that, when
converted by a speech-to-text system and input to a search engine,
yields an output suitable for sequential processing such as a list
of search results. After a user has received one or more of the
sequential results and is satisfied, the user may use the
navigation system (via an options-initiation response or another
command during a Wait Pause) to exit the simple affirmative
response system and to return to a natural language input mode. In
other embodiments, a system may use exclusively a simple
affirmative response protocol for all input and output
interactions.
[0123] In this example, the application software 224 causes the
processing unit(s) 210 to retrieve the list of content descriptors
320 residing in database 227 (Step 500). Once retrieved, the
processing unit(s) 210 may cause the TTS engine 226 and audio
subsystem 240, or another electronic sensory presentation device
(including, e.g., a screen or light-emitting diode for visual
feedback, or a vibration device for tactile feedback) to provide
output of a first content descriptor (e.g., Descriptor 1) in the
list of content descriptors 320 to the user through speaker(s) 242
(Step 510). For purposes of this disclosure, the first content
descriptor refers to text of the first content descriptor retrieved
for TTS output from the list 320, but is not necessarily the first
content descriptor in the list 320.
[0124] After output of the first content descriptor (e.g.,
Descriptor 1 of FIG. 3) retrieved from the list of content
descriptors 320 is completed, the processing unit(s) 210 may cause
the audio subsystem 240 or other electronic sensory presentation
device to output a listen signal (Step 520) and change to a List
Pause listening state for a predefined time (t) to detect a
response input from the user (Step 530). The List Pause listen
signal (Step 520) is preferably a distinct audible tone (e.g., a
single tone or the like) from the Wait Pause Listen signal (Step
405), Response Not Understood signal (Step 425) and Affirmation
signal (Step 435) described above with respect to FIG. 4, to
indicate to the user of computing device 110 that the application
software 224 is in a List Pause state that will time out after
pause time (t) if the user does not provide a response.
[0125] If no response input is received during the predefined pause
time (t) (Step 540), then the processing unit(s) 210 determines
whether there remain any content descriptors (e.g., Descriptor 2,
Descriptor 3, Descriptor 4, . . . Descriptor n) in the list of
content descriptors 320 that have not been TTS output to the user
through speaker(s) 242 (Step 550). If there are descriptors
remaining that have not been TTS output to the user (Step 550),
then the processing unit(s) 210 may optionally cause the audio
subsystem 240 or other electronic sensory presentation device to
output an end of pause signal (Step 560). The end of pause signal
(Step 560) is preferably distinct from the List Pause listen
signal, Wait Pause listen signal, Response Not Understood signal
and Affirmation signal, so that the user of computing device 110
can differentiate between them.
[0126] If there are content descriptors remaining from the
retrieved list 320 that have not been output to the user (Step 550)
or after output of the optional end of pause signal (Step 560), the
processing unit(s) 210 causes the conversion of a next content
descriptor in the retrieved list of content descriptors 320 for
output to the user through audio subsystem 240 and speaker(s) 242
or other electronic sensory presentation device (Step 570). The
process in Steps 520-570 is repeated until either a response input
is received from the user through microphone(s) 244 or other I/O
input device 255 (Step 540), or a predefined number or all of the
descriptors (e.g., Descriptor 3, Descriptor 4, . . . Descriptor n
of FIG. 3) in the retrieved list of content descriptors 320 have
been output to the user through audio subsystem 240 and speaker(s)
242 or other electronic sensory presentation device. In the latter
case, the processing unit(s) 210 may optionally cause the system to
provide an output to the user through audio subsystem 240 and
speaker(s) 242 or other electronic sensory presentation device as
to other options from which the user may provide response input
(Step 580). It is understood that the retrieval and output of the
content descriptors within the list of content descriptors 320 can
be sequential, random, or in any order desired. The important aspect
is the retrieval and output of descriptors in the list 320, not the
order in which the descriptors are retrieved and output to the
user.
[0127] Referring to FIG. 6, if a response input from the user is
received during the pause time (t) (Step 540 in FIG. 5), then the
processing unit(s) 210 causes the STT engine 225 or other
processing unit to convert the input to a text response for
processing by processing unit(s) 210. If the received response
input is not understood (Step 600), then the processing unit(s) 210
causes the audio subsystem 240 or other electronic sensory
presentation device to output a Response Not Understood signal
(Step 610). The Response Not Understood signal is preferably
distinct from the List Pause listen signal, Wait Pause listen
signal, Affirmation signal and optional End of Pause signal so that
the user may recognize that the response
was not understood by the system. The application software 224 then
causes the processing unit(s) 210 to respond accordingly to not
understanding a response input (Step 620). The response may be, for
example, a TTS output indicating that the response input was not
understood, or a TTS output of a list of responses available to the
user at that location within application software 224.
[0128] If the converted response is understood (Step 600), then the
processing unit(s) 210 causes the audio subsystem 240 or other
electronic sensory presentation device to output an Affirmation
signal through speaker(s) 242 (Step 630), which informs the user
that the response input was understood by the system. The
Affirmation signal is preferably distinct from List Pause listen
signal, Wait Pause listen signal, Response Not Understood signal
and optional End of Pause signal so that the user may recognize
that the response was understood by the system.
[0129] The application software 224 causes the processing unit(s)
210 to determine whether the received response input was a GO
response (or equivalent affirmative response) (Step 640). If the
processing unit(s) 210 determines that a response other than a GO
response (e.g., RESTART, RESTORE, CHANNELS, STORIES, SETTINGS,
HELP, etc.) is received from the user (Step 640), then the
processing unit(s) 210 processes the received other response and
responds accordingly (Step 650). Alternatively, if the processing
unit(s) 210 determines that the received response was a GO response
(or equivalent affirmative response), then the application software
224 causes the processing unit(s) 210 to execute default action
specific to that mode or sequence (Step 660).
[0130] An illustrative example of a default action is illustrated
in FIG. 7. In this illustrative example, the default action (Step
660 of FIG. 6) may be (but is not limited to) retrieving the
content (e.g., Data n) from the list of content 310 (FIG. 3) that
is associated with the last descriptor (e.g., Descriptor n) in the list
of content descriptors 320 output to the user as text-to-speech
through speaker(s) 242 (Step 700). Once retrieved, the processing
unit(s) 210 may cause the TTS engine 226 and audio subsystem 240 to
provide an audio output of the retrieved content (e.g., Data n) to
the user through speaker(s) 242 (Step 730). In some embodiments,
processing unit(s) 210 may divide the retrieved content (e.g., Data
n) into segments (e.g., by paragraphs, number of words or lines, or
the like) (Step 710) and cause the TTS engine 226 to provide an
audio output of one segment (or group of segments) at a time (Step
720). After audio output of one segment (or a grouping of segments)
of the retrieved content is completed, the processing unit(s) 210
may cause the audio subsystem 240 to output an audible listen tone
through speaker(s) 242 (Step 730) and change to a List Pause
listening state for a predefined time (t) to detect a response
input from the user (Step 740).
[0131] If a response input from the user is received during the
pause time (t) (Step 750), then the processing unit(s) 210 causes
the STT engine 225 or other processing unit to convert the input
from speech-to-text and the application software 224 responds
according to the received response input (Step 760). By way of
example only, the received response input may be other simple voice
responses, such as (but not limited to) RESTART, GO-BACK, SKIP,
HELP, SETTINGS, STOP, etc.
[0132] If no response input is received during the pause time (t)
(Step 750), then the processing unit(s) 210 determines whether
there remain any segments of the retrieved content that have not
been TTS output to the user through speaker(s) 242 (Step 770). If
there are content segments remaining that have not been output to
the user (Step 770), then the processing unit(s) 210 causes the TTS
engine 226 to convert the next segment (or group of segments) of
the retrieved content for audio output to the user through
speaker(s) 242 (Step 780). The process in Steps 730-780 is repeated
until either a response is received from the user through
microphone(s) 244 (Step 750), or all (or a predefined number of)
segments of the retrieved content have been converted from
text-to-speech by the TTS engine 226 and output to the user through
speaker(s) 242. In the latter case, the processing unit(s) 210 may
optionally cause the TTS engine 226 to provide an audio output to
the user through speaker(s) 242 as to other options from which the
user may provide response input (Step 790).
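By way of illustration only, the segmented-playback default action
of FIG. 7 might be sketched as follows; say, tone, listen, and
handle_response are hypothetical callables, and splitting on blank
lines is merely one example of segmentation by paragraph:

    # Non-limiting sketch following FIG. 7: retrieved content is divided
    # into segments with a List Pause after each segment.
    def play_content(content, say, tone, listen, handle_response, t=2.5):
        segments = content.split("\n\n")      # Step 710: e.g., paragraphs
        for segment in segments:
            say(segment)                      # Step 720: output one segment
            tone("listen")                    # Step 730: audible listen tone
            response = listen(timeout=t)      # Step 740: List Pause
            if response is not None:          # Step 750: input received
                return handle_response(response)  # Step 760: RESTART, SKIP, ...
        # Step 790: all segments output; optionally offer other options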
[0133] FIG. 8 illustrates an example graphical user interface (GUI)
displayed on a computing device 110 to provide a visual display of
all or a part of the list of content descriptors 320 available for
audio output in response to a user response, with additional
content extending beyond the viewing area of the display screen. It is to be
understood that the simple affirmative response OS does not require
a screen at all; however, when used on a device having a screen, a
screen may be used to supplement the system and display content in
a visual medium in addition to an auditory or other medium.
[0134] For instance, Content Descriptor 1 (810) corresponds to
Descriptor 1 in the list of content descriptors 320 (FIG. 3).
Content Descriptor 2 (820) corresponds to Descriptor 2 in the list
of content descriptors 320 (FIG. 3). Content Descriptor 3 (830)
corresponds to Descriptor 3 in the list of content descriptors 320
(FIG. 3). Content Descriptor 4 (840) corresponds to Descriptor 4 in
the list of content descriptors 320 (FIG. 3). Content Descriptor n
(850) corresponds to Descriptor n in the list of content
descriptors 320 (FIG. 3).
[0135] FIGS. 9 and 10 are an illustrative flow diagram of another
example implementation of the simple affirmative response OS. In
this illustrative example, a user of computing device 110 can
control settings on the device using input responses. The user of
computing device 110 may input a response such as, for example,
"Settings," through microphone(s) 244 or other I/O device 255 (Step
900). The STT engine 225 or other processing unit converts the
response input into text for processing by processing unit(s) 210,
which determines that the user has selected "Settings." The
application software 224 running on computing device 110 causes the
processing unit(s) 210 to retrieve a list of settings from database
227 (Step 910). The list of settings may include short descriptors
for each setting that a user can control on the computing device
110, such as (but not limited to) "Play Back Speed," "Pause
Length," "Volume," "Choosing a Voice," "Language," etc. The
processing unit(s) 210 may optionally cause the TTS engine 226 to
provide an audio output to the user through speaker(s) 242, such
as, for example, "After you hear the setting that you wish to
select, say GO."
[0136] The processing unit(s) 210 causes the TTS engine 226 to
convert the descriptor for one of the settings (e.g., the first
descriptor in the list of settings, such as "Play Back Speed") in
the list of settings for audio output to the user through
speaker(s) 242 (Step 920). After audio output of the converted
descriptor is completed, the processing unit(s) 210 causes an
audible listen tone to be output through speaker(s) 242 of
computing device 110 (Step 930). The application software 224
causes the processing unit(s) 210 to change to a listening state
for a predefined pause time (t) to detect a response input from the
user (Step 940). As discussed above, the predefined pause time (t)
may be a fixed time (e.g., 2.5 seconds) programmed in the
application software 224, or it can be a specific time (t) selected
by the user of computing device 110 from a range of available pause
times (e.g., from 1.5 seconds to 4 seconds).
[0137] If no input response is received during the predefined pause
time (t) (Step 950), then the processing unit(s) 210 causes an
audible end of pause tone to be output through speaker(s) 242 of
computing device 110 (Step 960). The end of pause tone (Step 960)
is preferably audibly distinct from the listen tone (Step 930) so
that the user of computing device 110 can differentiate between
them. After the end of pause tone (Step 960), the processing
unit(s) 210 determines whether there are any additional descriptors
in the list of settings that have not been text-to-speech output or
otherwise presented to the user of the device 110 (Step 970). If
yes, the processing unit(s) 210 causes the TTS engine 226 to
convert the next descriptor in the list of settings (e.g., the
second descriptor in the list of settings, such as "Pause Length")
for audio output to the user through speaker(s) 242 (Step 980), and
the processing unit(s) 210 again causes the audio subsystem 240 to
output an audible listen tone through speaker(s) 242 of computing
device 110 (Step 930). After the listen tone (Step 930), the
application software 224 causes the processing unit(s) 210 to
change to a List Pause listening state for the predefined pause
time (t) to detect a response input from the user (Step 940).
[0138] The loop in Steps 930-980 continues until either a response
input is received from the user of the computing device 110 (Step
950), a predefined number of descriptors have been output to the
user, or there are no unread descriptors remaining in the list of
settings (Step 970). If the latter, then the application software
224 causes the processing unit(s) to end the process, or provide an
audio text-to-speech output to the user through the TTS engine 226,
audio subsystem 240 and speaker(s) 242 of other options available
for user selection, control or action (Step 990). These other
options may be presented to the user in the form of a TTS list of
options, where the processing unit(s) 210 causes the audio
subsystem 240 to output an option in the list through speaker(s)
242 followed by a List Pause for a time (t). If no response input
is received after time (t), then the processing unit(s) 210 causes
the audio subsystem 240 to output the next option in the list
through speaker(s) 242. This process would continue until either
the processing unit(s) 210 receive a GO response (or equivalent
affirmative response) input from the user indicating the user's
selection of the last option in the list that was output to the
user through speaker(s) 242, or all options in the list have been
TTS output to the user. In this manner, the user may, for instance,
encounter a Wait Pause and remain in Settings or move to another
mode in the application software 224 through the input of the
affirmative, default GO voice response.
[0139] Referring to FIG. 10, if a response input is received within
the pause time (t) from the user through microphone(s) 244 (Step
950 of FIG. 9), then the response input is converted to text by the
STT engine 225 or other processing unit and the processing unit(s)
210 causes an audible affirmation tone to be output through
speaker(s) 242 of computing device 110 (Step 1010). The affirmation
tone is preferably audibly different from the listen tone and end
of pause tone so that the user can differentiate it from the other
audible tones.
[0140] If the received response in Step 950 was a GO response (or
equivalent affirmative response) (Step 1020), the application
software 224 causes the processing unit(s) 210 to execute default
action (Step 1030). Alternatively, if the received response was not
a GO response (or equivalent affirmative response) (Step 1020), the
application software 224 causes the processing unit(s) 210 to
respond according to the other response received (Step 1040).
Examples of other simple responses that may be available to the
user in this illustrative example include (but are not limited to)
STOP, GO-BACK, REPEAT, EXIT, etc.
[0141] By way of example only, the default action (Step 1030) might
be (but is not limited to) providing an audio text-to-speech (TTS)
output of another list of options available to the user with
respect to the specific setting descriptor that was selected by the
user with a GO response input. For instance, if the received voice
response input was a GO response (Step 1020) following the audio
text-to-speech output of the setting "Pause Length," then the
application software 224 may cause the processing unit(s) 210 to
retrieve a numerical list of times (t) (e.g., t.sub.1, t.sub.2,
t.sub.3, t.sub.4, etc.) from which the user may select the desired
pause length (t) (Step 910). The processing unit(s) 210 causes the
TTS engine 226 to convert an initial pause-length time (e.g.,
t.sub.1) in the retrieved list of pause-length times (t) for audio
output to the user through speaker(s) 242. After audio output of
the converted initial pause-length time is completed, the
processing unit(s) 210 causes the audio subsystem 240 to output an
audible listen tone through speaker(s) 242 of computing device 110
and changes to a List Pause listening state for the predefined or
default pause-length time (t) to detect a response input from the
user.
[0142] If no response input is received after the default pause
time (t), then the processing unit(s) causes an end of pause tone
to be output to the user through speaker(s) 242. If additional
pause-length times (e.g., t.sub.2, t.sub.3, t.sub.4, etc.) are
available in the retrieved list of pause times, then the processing
unit(s) 210 causes the TTS engine 226 to convert the next
pause-length time (e.g., t.sub.2) in the retrieved list of pause
times for audio output to the user through speaker(s) 242, and the
processing unit(s) 210 again causes the audio subsystem 240 to
output an audible listen tone through speaker(s) 242 of the
computing device 110.
[0143] After the listen tone (Step 930), the application software
224 causes the processing unit(s) 210 to change to a listening
state for the default pause time (t) to detect an audible voice
response from the user. This loop continues until a GO response
(or equivalent affirmative response) input is received from the
user; the resulting default action in this example may be to set
the pause-length time (t) to the pause-length time (e.g., t.sub.2)
output to the user through speaker(s) 242 that immediately preceded
the GO response.
[0144] In an alternative implementation, after the initial
pause-length time (e.g., t.sub.1) from the list of pause-length
times (t) is output to the user through speaker(s) 242, the user
could input a response, such as (but not limited to) "Shorter" or
"Longer," through microphone(s) 244, which would be converted to
text by the STT engine 225 and processed by the processing unit(s)
210. Depending on the received response input, the application
software 224 may cause the processing unit(s) to select a shorter
or longer List Pause length time (t) from the retrieved list of
pause-length times for text-to-speech output to the user through
speaker(s) 242. The application software 224 may continue to
respond to "Shorter" or "Longer" inputs in this way until the user
hears the desired pause-length time. Once the user hears the desired pause-length
time, the user may input a GO response (or equivalent simple
affirmative response) through microphone(s) 244 and the processing
unit(s) 210 may set the List Pause length time (t) to the desired
pause-length time (e.g., t.sub.2) output to the user through
speaker(s) 242 that immediately preceded the affirmative simple
response (e.g., a GO response) input.
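By way of illustration only, this alternative "Shorter"/"Longer"
adjustment might be sketched as follows; say and listen are
hypothetical callables, and the list of candidate times is an
arbitrary example within the range described above:

    # Non-limiting sketch: stepping through a list of pause-length times
    # with "SHORTER"/"LONGER" responses until GO fixes the value.
    def choose_pause_length(say, listen, times=(1.5, 2.0, 2.5, 3.0, 3.5, 4.0)):
        i = times.index(2.5)                  # start from a default value
        while True:
            say(f"{times[i]} seconds")
            response = listen(timeout=times[i])
            if response == "GO":
                return times[i]               # set t to the value just heard
            if response == "SHORTER" and i > 0:
                i -= 1
            elif response == "LONGER" and i < len(times) - 1:
                i += 1
            # no input (None) simply repeats the current value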
[0145] It is understood that the example above is illustrative and
that the invention is not intended to be limited to a particular
list stored in a database or specific voice responses. For example,
instead of adjusting "Pause Length," the user may adjust the speed
of the TTS output by the TTS engine 226 by providing a GO response
input (Step 1020) following the audio text-to-speech output of the
setting "Voice Speed." In that case, the application software 224
may cause the processing unit(s) 210 to retrieve a numerical list
of voice speeds (v) and present the user with a TTS output of each
speed (e.g., v.sub.1, v.sub.2, v.sub.3, v.sub.4, etc.) in the list,
from which the user may select the desired speed (v) by providing
the affirmative GO response (or an equivalent affirmative response)
through microphone(s) 244 after hearing the audio TTS output of the
desired speed (Step 910).
[0146] FIGS. 11 and 11A illustrate an example response controller
GUI 1110 displayed on a touch screen 253 (or other display) of
computing device 110. The response controller GUI 1110 provides a
visual display of simple responses available to the user of
computing device 110 and, preferably, of active simple responses
available to the user as they change at different times or stages
within the application software 224. The response controller GUI
1110 prompts the user of computing device 110 with the active
responses differentiated from the inactive (or unavailable)
responses to facilitate a response-driven system so that the user
does not have to remember all responses available. Non-limiting
examples for differentiating active responses from inactive (or
unavailable) responses include (but are not limited to) only
displaying the active responses on the visual display, presenting
the active responses in a different color than the inactive
responses, or presenting the active responses with a different
apparent brightness than the inactive responses.
[0147] In some embodiments, the response controller GUI 1110
displayed on touch screen 253 of computing device 110 is in the
form of a virtual remote control in which the text of active
responses is displayed (changing at different times or
stages within the application software 224) to inform the user of
the responses that are available at that time or sequence within
application software 224. By way of example only, the response
controller GUI 1110 of FIG. 11 may include (but is not limited to)
a visual display of the text for the following illustrative simple
responses available within application software 224: GO, STOP, GO
BACK, RESTART, SKIP, REPEAT, SAVE, DELETE, SHARE, HELP, TIME, MUTE,
SETTINGS, HEADLINES, STORIES, CHANNELS, REFRESH, PLAY ALL, RESTORE,
PREVIOUS and NEXT. However, only certain responses (active
responses) are available to the user of computing device 110 at
different stages or times within application software 224. FIG. 11A
illustrates, for emphasis, the response controller GUI 1110 with
only the text for active responses displayed to the user at a given
stage or time within application software 224. It is understood,
however, that other techniques may be utilized to differentiate
active and inactive responses on the visual display, such as (but
not limited to) displaying active responses in a different color
from inactive responses. In this manner, response controller GUI
1110 provides the user with a visual display that differentiates
active responses that are available to the user from inactive
responses that are unavailable to the user at any given time within
the application software 224.
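By way of illustration only, the differentiation of active and
inactive responses might be driven by a per-mode set of active
responses, as in the following non-limiting Python sketch; the mode
names, the active sets, and the show_button callable are
hypothetical:

    # Non-limiting sketch: the response controller GUI displays every
    # response but enables only those active in the current mode.
    ALL_RESPONSES = [
        "GO", "STOP", "GO BACK", "RESTART", "SKIP", "REPEAT", "SAVE",
        "DELETE", "SHARE", "HELP", "TIME", "MUTE", "SETTINGS",
        "HEADLINES", "STORIES", "CHANNELS", "REFRESH", "PLAY ALL",
        "RESTORE", "PREVIOUS", "NEXT",
    ]

    ACTIVE_BY_MODE = {
        "channel": {"GO", "STOP", "SKIP", "REPEAT", "HELP", "SETTINGS"},
        "settings": {"GO", "STOP", "GO BACK", "REPEAT", "HELP"},
    }

    def render_controller(mode, show_button):
        active = ACTIVE_BY_MODE.get(mode, set())
        for label in ALL_RESPONSES:
            # inactive responses may be hidden, dimmed, or recolored
            show_button(label, enabled=(label in active))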
[0148] In some embodiments, the text of each active response
displayed by the response controller GUI 1110 on touch screen 253
of computing device 110 may also function as a virtual button to
allow the user to provide response inputs by tapping on the text of
the desired response. By functioning as a manual interface, the
response controller GUI 1110 provides the user of computing device
110 with a simplified interface and an alternative to providing a
response input or navigating the graphical screen interface. The
remote control style presentation of GUI 1110 may also facilitate
the use of the system by users who are physically challenged and
cannot use a keyboard or other manual input devices to interact
with the system. Furthermore, for users who are visually impaired,
response controller GUI 1110 may work in conjunction with other
systems, such as Apple, Inc.'s VoiceOver.TM. screen reader, whereby
all buttons to control the system are labeled and accessible to
visually impaired users for interaction with the system.
[0149] FIG. 12 is a high level block diagram of a representative
intermediate system server 140 that may be utilized to implement
various features and processes described herein. Intermediate
system server 140 may be described in the general context of
computer system-executable instructions, such as program modules,
being executed by a computer system. Generally, program modules may
include routines, programs, objects, components, logic, data
structures, and so on that perform particular tasks or implement
particular abstract data types.
[0150] As shown in FIG. 12, optional intermediate system server 140
is illustrated in the form of a special purpose computer system.
The components of intermediate system server 140 may include (but
are not limited to) one or more processors or processing units
1200, a system memory 1210, and a bus 1215 that couples various
system components including memory 1210 to processor 1200.
[0151] Bus 1215 represents one or more of any of several types of
bus structures, including a memory bus or memory controller, a
peripheral bus, an accelerated graphics port, and a processor or
local bus using any of a variety of bus architectures. By way of
example, and not limitation, such architectures include Industry
Standard Architecture (ISA) bus, Micro Channel Architecture (MCA)
bus, Enhanced ISA (EISA) bus, Video Electronics Standards
Association (VESA) local bus, and Peripheral Component Interconnect
(PCI) bus.
[0152] Processing unit(s) 1200 may execute computer programs stored
in memory 1210. Any suitable programming language can be used to
implement the routines of particular embodiments including C, C++,
Java, Swift, assembly language, etc. Different programming
techniques can be employed such as procedural or object oriented.
The routines can execute on a single intermediate system server 140
or multiple intermediate system servers 140. Further, multiple
processors 1200 may be used.
[0153] Intermediate system server 140 typically includes a variety
of computer system readable media. Such media may be any available
media that is accessible by intermediate system server 140, and it
includes both volatile and non-volatile media, removable and
non-removable media.
[0154] System memory 1210 can include computer system readable
media in the form of volatile memory, such as random access memory
(RAM) 1220 and/or cache memory 1230. Intermediate system server 140
may further include other removable/non-removable,
volatile/non-volatile computer system storage media. By way of
example only, storage system 1240 can be provided for reading from
and writing to a non-removable, non-volatile magnetic media (not
shown and typically referred to as a "hard drive" or "solid state
drive"). Although not shown, a magnetic disk drive for reading from
and writing to a removable, non-volatile magnetic disk (e.g., a
"floppy disk"), and an optical disk drive for reading from or
writing to a removable, non-volatile optical disk such as a CD-ROM,
DVD-ROM or other optical media can be provided. In such instances,
each can be connected to bus 1215 by one or more data media
interfaces. As will be further depicted and described below, memory
1210 may include at least one program product having a set (e.g.,
at least one) of program modules that are configured to carry out
the functions of embodiments described in this disclosure.
[0155] Program/utility 1250, having a set (at least one) of program
modules 1255, may be stored in memory 1210 by way of example, and
not limitation, as well as an operating system, one or more
application software, other program modules, and program data. Each
of the operating system, one or more application programs, other
program modules, and program data or some combination thereof, may
include an implementation of a networking environment.
[0156] Intermediate system server 140 may also communicate with one
or more external devices 1270 such as a keyboard, a pointing
device, a display, etc.; one or more devices that enable a user to
interact with intermediate system server 140; and/or any devices
(e.g., network card, modem, etc.) that enable intermediate system
server 140 to communicate with one or more other computing devices.
Such communication can occur via Input/Output (I/O) interface(s)
1260.
[0157] In addition, as described above, intermediate system server
140 can communicate with one or more networks 120, such as a local
area network (LAN), a general wide area network (WAN) and/or a
public network (e.g., the Internet) via network adaptor 1280. As
depicted, network adaptor 1280 communicates with other components
of intermediate system server 140 via bus 1215. It should be
understood that although not shown, other hardware and/or software
components could be used in conjunction with intermediate system
server 140. Examples include (but are not limited to) microcode,
device drivers, redundant processing units, external disk drive
arrays, RAID systems, tape drives, and data archival storage
systems, etc.
[0158] The present invention may be a system, a method, and/or a
computer program product at any possible technical detail level of
integration. The computer program product may include a computer
readable storage medium (or media) having computer readable program
instructions thereon for causing a processor to carry out aspects
of the present invention.
[0159] The computer readable storage medium can be a tangible
device that can retain and store instructions for use by an
instruction execution device. The computer readable storage medium
may be, for example, but is not limited to, an electronic storage
device, a magnetic storage device, an optical storage device, an
electromagnetic storage device, a semiconductor storage device, or
any suitable combination of the foregoing. A non-exhaustive list of
more specific examples of the computer readable storage medium
includes the following: a portable computer diskette, a hard disk,
a random access memory (RAM), a read-only memory (ROM), an erasable
programmable read-only memory (EPROM or Flash memory), a static
random access memory (SRAM), a portable compact disc read-only
memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a
floppy disk, a mechanically encoded device such as punch-cards or
raised structures in a groove having instructions recorded thereon,
and any suitable combination of the foregoing. A computer readable
storage medium, as used herein, is not to be construed as being
transitory signals per se, such as radio waves or other freely
propagating electromagnetic waves, electromagnetic waves
propagating through a waveguide or other transmission media (e.g.,
light pulses passing through a fiber-optic cable), or electrical
signals transmitted through a wire.
[0160] Computer readable program instructions described herein can
be downloaded to respective computing/processing devices from a
computer readable storage medium or to an external computer or
external storage device via a network, for example, the Internet, a
local area network, a wide area network and/or a wireless network.
The network may comprise copper transmission cables, optical
transmission fibers, wireless transmission, routers, firewalls,
switches, gateway computers and/or edge servers. A network adapter
card or network interface in each computing/processing device
receives computer readable program instructions from the network
and forwards the computer readable program instructions for storage
in a computer readable storage medium within the respective
computing/processing device.
[0161] Computer readable program instructions for carrying out
operations of the present invention may be assembler instructions,
instruction-set-architecture (ISA) instructions, machine
instructions, machine dependent instructions, microcode, firmware
instructions, state-setting data, configuration data for integrated
circuitry, or either source code or object code written in any
combination of one or more programming languages, including an
object oriented programming language such as Smalltalk, C++, or the
like, and procedural programming languages, such as the "C"
programming language or similar programming languages. The computer
readable program instructions may execute entirely on the user's
computer, partly on the user's computer, as a stand-alone software
package, partly on the user's computer and partly on a remote
computer or entirely on the remote computer or server. In the
latter scenario, the remote computer may be connected to the user's
computer through any type of network, including a local area
network (LAN) or a wide area network (WAN), or the connection may
be made to an external computer (for example, through the Internet
using an Internet Service Provider). In some embodiments,
electronic circuitry including, for example, programmable logic
circuitry, field-programmable gate arrays (FPGA), or programmable
logic arrays (PLA) may execute the computer readable program
instructions by utilizing state information of the computer
readable program instructions to personalize the electronic
circuitry, in order to perform aspects of the present
invention.
[0162] Aspects of the present invention are described herein with
reference to flowchart illustrations and/or block diagrams of
methods, apparatus (systems), and computer program products
according to embodiments of the invention. It is understood that
each block of the flowchart illustrations and/or block diagrams,
and combinations of blocks in the flowchart illustrations and/or
block diagrams, can be implemented by computer readable program
instructions.
[0163] These computer readable program instructions may be provided
to a processor of a general purpose computer, special purpose
computer, or other programmable data processing apparatus to
produce a machine, such that the instructions, which execute via
the processor of the computer or other programmable data processing
apparatus, create means for implementing the functions/acts
specified in the flowchart and/or block diagram block or blocks.
These computer readable program instructions may also be stored in
a computer readable storage medium that can direct a computer, a
programmable data processing apparatus, and/or other devices to
function in a particular manner, such that the computer readable
storage medium having instructions stored therein comprises an
article of manufacture including instructions which implement
aspects of the function/act specified in the flowchart and/or block
diagram block or blocks.
[0164] The computer readable program instructions may also be
loaded onto a computer, other programmable data processing
apparatus, or other device to cause a series of operational steps
to be performed on the computer, other programmable apparatus or
other device to produce a computer implemented process, such that
the instructions which execute on the computer, other programmable
apparatus, or other device implement the functions/acts specified
in the flowchart and/or block diagram block or blocks.
[0165] The flowchart and block diagrams in the Figures illustrate
the architecture, functionality, and operation of possible
implementations of systems, methods, and computer program products
according to various embodiments of the present invention. In this
regard, each block in the flowchart or block diagrams may represent
a module, segment, or portion of instructions, which comprises one
or more executable instructions for implementing the specified
logical function(s). In some alternative implementations, the
functions noted in the blocks may occur out of the order noted in
the Figures. For example, two blocks shown in succession may, in
fact, be executed substantially concurrently, or the blocks may
sometimes be executed in the reverse order, depending upon the
functionality involved. It will also be noted that each block of
the block diagrams and/or flowchart illustration, and combinations
of blocks in the block diagrams and/or flowchart illustration, can
be implemented by special purpose hardware-based systems that
perform the specified functions or acts or carry out combinations
of special purpose hardware and computer instructions.
[0166] The descriptions of the various embodiments of the present
invention have been presented for purposes of illustration, but are
not intended to be exhaustive or limited to the embodiments
disclosed. Many modifications and variations will be apparent to
those of ordinary skill in the art without departing from the scope
and spirit of the described embodiments. The terminology used
herein was chosen to best explain the principles of the
embodiments, the practical application or technical improvement
over technologies found in the marketplace, or to enable others of
ordinary skill in the art to understand the embodiments disclosed
herein.
* * * * *