U.S. patent application number 10/610699 was filed with the patent office on 2003-07-02 and published on 2004-06-17 for speech based personal information manager.
Invention is credited to Daniel Kiecza and Francis Kubala.
Application Number | 20040117188 10/610699 |
Family ID | 30003990 |
Filed Date | 2003-07-02 |
United States Patent Application | 20040117188 |
Kind Code | A1 |
Kiecza, Daniel; et al. | June 17, 2004 |
Speech based personal information manager
Abstract
A personal information manager (PIM) [100] stores user personal
data. The PIM may be a personal-computer-like box located in the
home of the user. The PIM includes an audio interface [108] and a
visual interface [110]. Users may equivalently interact and manage
their data through the audio interface or the visual interface.
Accordingly, the user has full access to the PIM whenever the user
can establish a voice or data connection.
Inventors: | Kiecza, Daniel (Cambridge, MA); Kubala, Francis (Boston, MA) |
Correspondence Address: | Leonard C. Suchyta, c/o Christian Andersen, Verizon Corporate Services Group Inc., 600 Hidden Ridge, HQE03H01, Irving, TX 75038, US |
Family ID: | 30003990 |
Appl. No.: | 10/610699 |
Filed: | July 2, 2003 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
60394064 | Jul 3, 2002 | |
60394082 | Jul 3, 2002 | |
60419214 | Oct 17, 2002 | |
Current U.S. Class: | 704/270.1; 705/1.1 |
Current CPC Class: | G10L 25/78 20130101; H04M 2201/60 20130101; G10L 15/26 20130101; H04M 2201/42 20130101; H04M 2203/305 20130101; Y10S 707/99943 20130101 |
Class at Publication: | 704/270.1; 705/001 |
International Class: | G06F 017/60; G10L 011/00; G10L 021/00 |
Claims
What is claimed is:
1. An information management device comprising: a voice interface;
a database configured to store data for a user of the information
management device, the data including at least voicemail data and
email data; and a dialog manager component configured to provide
access to the voicemail data and the email data when the user
connects to the information management device through the voice
interface.
2. The information management device of claim 1, wherein the dialog
manager component further comprises: a speech recognition component
configured to analyze verbal commands given by the user; wherein
the information management device accesses the database based on
the analyzed verbal commands.
3. The information management device of claim 1, wherein the dialog
manager component further comprises: a text-to-speech component
that transmits the email data to the user as audio information
synthesized from text of the email data.
4. The information management device of claim 1, further
comprising: a network interface configured to provide access to the
database through a visual user interface.
5. The information management device of claim 4, wherein the
network interface and the voice interface provide equivalent access
to the database.
6. The information management device of claim 1, wherein the data
in the database includes additional data related to at least one of
personal to-do lists, contact information, and calendar
information.
7. The information management device of claim 6, further
comprising: a network interface configured to provide access to the
database through a visual user interface; wherein the additional
data is accessible through the voice interface and the network
interface.
8. The information management device of claim 1, wherein the
information management device is a personal-computer-like device
located at a residence of the user.
9. The information management device of claim 1, wherein the access
provided by the dialog manager component includes reviewing and
deleting the voicemail data and the email data.
10. A device comprising: means for receiving and transmitting audio
data; means for storing personal data of a user, the personal data
including at least voicemail data and email data; means for
providing access to the voicemail data and the email data to the
user via an audio interface implemented via the means for receiving
and transmitting; and means for creating email data based on speech
received through the means for receiving and transmitting.
11. The device of claim 10, further comprising: means for
recognizing commands spoken by the user; and means for accessing
the voicemail data and the email data based on the commands spoken
by the user.
12. A method of managing personal information, comprising: storing
the personal information in a database associated with a personal
information management device; establishing a connection to the
personal information management device via a voice interface; and
receiving spoken commands over the voice interface, the spoken
commands initiating retrieval of voicemail, email, and personal
organization information.
13. The method of claim 12, wherein the personal organization
information includes at least one of personal to-do lists, contact
information, and calendar information.
14. The method of claim 12, wherein the spoken commands
additionally include commands to create an email message.
15. The method of claim 12, wherein establishing the connection
includes authorizing a user based on an identification of acoustic
properties of voice information of the user.
16. The method of claim 12, wherein the personal information
includes passwords of a user.
17. A device for managing personal information, comprising: means
for storing the personal information; means for accepting a
connection initiated over a voice interface; and means for
receiving spoken commands over the voice interface, the spoken
commands initiating retrieval of voicemail, email, and personal
organization information.
18. The device of claim 17, wherein the personal organization
information includes at least one of personal to-do lists, contact
information, and calendar information.
19. The device of claim 17, further comprising: means for
authorizing the connection based on an identification of acoustic
properties of voice information of a user.
20. A system comprising: a database configured to store personal
data of a user; a voice interface configured to provide access to
the personal data via a voice connection; a network interface
configured to provide access to the personal data via a data
connection that provides visual information to the user; and a
control processing component configured to receive spoken commands
from the voice interface and respond to the spoken commands by
providing the personal data to the user as audio data, the control
processing component being further configured to receive logical
commands from the network interface and to respond to the logical
commands by providing the personal data to the user as visual
data.
21. The system of claim 20, further comprising: a dialog manager
component configured to analyze the spoken commands and generate
logical commands based on the spoken commands.
22. The system of claim 21, wherein the dialog manager component
further comprises: a speech recognition component configured to
perform speech recognition functions on the spoken commands; and a
text-to-speech component configured to convert textual information
from the database to the audio data.
23. The system of claim 22, wherein the speech recognition
component further includes: speaker identification logic configured
to identify the user as a speaker of the spoken commands by
comparing acoustic features of the voice of the user to a set of
pre-stored acoustic features.
24. The system of claim 23, wherein the speech recognition
component further comprises: name spotting logic configured to
locate the names of at least one of people, places, and
organizations in speech spoken by the user; and topic
classification logic configured to assign topics to the speech
spoken by the user.
25. The system of claim 20, wherein the personal data in the
database includes data related to at least one of personal to-do
lists, contact information, and calendar information.
26. The system of claim 20, wherein the personal data in the
database includes data related to voicemail and email.
Description
RELATED APPLICATIONS
[0001] This application claims priority under 35 U.S.C. § 119
based on U.S. Provisional Application Nos. 60/394,064 and
60/394,082 filed Jul. 3, 2002 and Provisional Application No.
60/419,214 filed Oct. 17, 2002, the disclosures of which are
incorporated herein by reference.
BACKGROUND OF THE INVENTION
[0002] 1. Field of the Invention
[0003] The present invention relates generally to personal
information management systems and, more particularly, to a
personal information manager that includes speech recognition
capabilities.
[0004] 2. Description of Related Art
[0005] Personal information management devices assist users in
organizing personal information. Typically, conventional personal
information management devices are portable and relatively small
computing systems that provide a number of information management
functions. For example, the personal information management devices
may store user telephone contact numbers, to-do lists, and
calendar/scheduling information. In some cases, these personal
information management devices may also function as communication
devices by, for example, providing the ability to transmit and
receive email.
[0006] Conventionally, users enter information into personal
information management devices through a text-based input device,
such as a keyboard or a stylus, and view the contents of the
personal information management devices through a visual display
associated with the device, such as an LCD.
[0007] One reason for the popularity of conventional personal
information management devices is that the small size of these
devices allows users to easily carry the devices with them during
the course of their day. Because the devices are always accessible,
users can interact with the devices whenever the need arises.
[0008] Despite the portability and general accessibility of
conventional personal information management devices, these devices
still have a number of limitations. These limitations include a
limited interface for entering and retrieving data. Further,
although conventional personal information management devices are
portable, the user still has to remember to carry the device with
him.
[0009] Accordingly, there is a need in the art for an improved
personal information management device. In particular, it would be
desirable for a personal information management device to have an
improved user interface and be always accessible.
SUMMARY OF THE INVENTION
[0010] Systems and methods consistent with the present invention
provide improved personal information management services.
[0011] One aspect of the invention is directed to an information
management device that includes a voice interface, a database, and
a dialog manager. The database stores data for a user that includes
at least voicemail data and email data. The dialog manager provides
access to the voicemail data and the email data when the user
connects to the information management device through the voice
interface.
[0012] A second aspect of the invention is directed to a method of
managing personal information. The method includes storing the
personal information in a database associated with a personal
information management device; establishing a connection to the
personal information management device via a voice interface; and
receiving spoken commands over the voice interface, the spoken
commands initiating retrieval of voicemail, email, and personal
organization information.
[0013] Another aspect of the invention is directed to a system. The
system includes a database configured to store personal data of a
user and a voice interface configured to provide access to the
personal data via a voice connection. Further, the system includes
a network interface configured to provide access to the personal
data via a data connection that provides visual information to the
user. The system further includes a control processing component
configured to receive spoken commands from the voice interface and
respond to the spoken commands. The system responds to the spoken
commands by providing the personal data to the user as audio data.
The system receives logical commands from the network interface and
responds to the logical commands by providing the personal data to
the user as visual data.
BRIEF DESCRIPTION OF THE DRAWINGS
[0014] The accompanying drawings, which are incorporated in and
constitute a part of this specification, illustrate the invention
and, together with the description, explain the invention. In the
drawings,
[0015] FIG. 1 is a diagram illustrating a personal information
manager (PIM) implemented in a manner consistent with the present
invention;
[0016] FIG. 2 is a diagram illustrating a speech recognition
component in the PIM of FIG. 1;
[0017] FIG. 3 is an exemplary diagram illustrating portions of the
speech recognition component in additional detail;
[0018] FIG. 4 is a diagram illustrating contents of a database used
by the PIM; and
[0019] FIG. 5 is a flow chart illustrating exemplary operation of
the PIM in accessing personal data stored in a database through a
voice interface.
DETAILED DESCRIPTION
[0020] The following detailed description of the invention refers
to the accompanying drawings. The same reference numbers in
different drawings may identify the same or similar elements. Also,
the following detailed description does not limit the invention.
Instead, the scope of the invention is defined by the appended
claims and equivalents.
[0021] A personal information manager (PIM), as described below,
stores a variety of information for a user. The PIM is remotely
accessible through both a voice port and a network data port. The
PIM may permanently reside at a single location, such as the user's
home, but is remotely accessible by the user whenever the user has
access to a voice or data line. Information in the PIM can be
accessed and/or modified through either the voice or data
ports.
[0022] FIG. 1 is a diagram illustrating a PIM 100 implemented in a
manner consistent with the present invention. PIM 100 includes
database 102, control processing component 104, dialog manager 106,
voice interface 108, network interface 110, and local interface
112.
[0023] In one implementation, PIM 100 is designed to be a personal
computer-like box that resides in the home of the user. Unlike some
conventional personal information management devices, which users
carry with them, PIM 100 does not necessarily need to be designed
to be portable. Instead, PIM 100 includes a number of interfaces
108-112 that allow users to connect and interact with PIM 100 from
anywhere the user has a telephone or a network connection.
[0024] Voice interface 108 connects PIM 100 to the user via a voice
line. Voice interface 108 may be, for example, a standard telephone
connection that connects the user to a standard public switched
telephone network (PSTN) 120. By dialing the number assigned to
voice interface 108, users can verbally give instructions to and
receive information from PIM 100.
[0025] Network interface 110 connects PIM 100 to network 121.
Network 121 may be a wide area network, such as the Internet. Users
may connect to PIM 100 via an HTTP (hyper-text transfer protocol)
web-based connection. In this situation, PIM 100 may act as a web
server in receiving and responding to user commands.
[0026] Through voice interface 108 and network interface 110, users
can connect to PIM 100 whenever they have access to either a voice
line, such as via a cell or wireless phone, or a network
connection, such as via the Internet.
[0027] In one implementation, PIM 100 may also include local
interface 112. Local interface 112 may include connections for
wired or wireless devices, such as a keyboard and/or a display
device (not shown). Through local interface 112, users may interact
directly with PIM 100.
[0028] As previously mentioned, PIM 100 may interact with a user
through a voice interface, such as through a telephone line. Dialog
manager 106 handles the speech recognition and text-to-speech
synthesis functions necessary for this interaction. Dialog manager
106 includes speech recognition component 115, command interface
116, and text-to-speech synthesis component 117. Speech recognition
component 115 processes incoming audio signals and converts the
audio signals into their textual transcriptions. Command interface
116 analyzes the transcriptions produced by speech recognition
component 115 to, for example, spot user commands in the
transcription. An example of a user command may include the command
"check email," which may cause PIM 100 to audibly alert the user if
he has new email. Text-to-speech synthesis component 117 outputs
information stored in a textual format as one or more audio signals
appropriate for transmission through voice interface 108. For
example, textual email may be converted to speech and sent to a
user over voice interface 108.
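The command spotting described above can be sketched as follows. This
is a minimal illustration, not the disclosed implementation: it
assumes simple phrase matching over the transcription, and the table
COMMAND_PHRASES, the function names, and every command other than
"check email" are hypothetical.

```python
import re

# Hypothetical phrase-to-command table. Only "check email" appears in the
# description; the other entries are illustrative assumptions.
COMMAND_PHRASES = {
    "check email": "CHECK_EMAIL",
    "play voicemail": "PLAY_VOICEMAIL",
    "read to-do list": "READ_TODO_LIST",
}


def normalize(text):
    """Lowercase the text and drop punctuation so phrases match reliably."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def spot_commands(transcription):
    """Return logical command names in the order their phrases occur."""
    padded = f" {normalize(transcription)} "
    hits = []
    for phrase, command in COMMAND_PHRASES.items():
        # Padding with spaces restricts matches to whole words.
        position = padded.find(f" {normalize(phrase)} ")
        if position >= 0:
            hits.append((position, command))
    return [command for _, command in sorted(hits)]
```

Matching on whole words keeps an utterance such as "recheck emails"
from triggering the "check email" command.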
[0029] FIG. 2 is a diagram illustrating speech recognition
component 115 of dialog manager 106 in additional detail. As shown
in FIG. 2, speech recognition component 115 may include training
system 210, statistical model 220, and recognition system 230.
Training system 210 may include logic that estimates parameters of
statistical model 220 from a corpus of training data. The training
data may initially include human-produced data. For example, the
training data might include one hundred hours of audio data that
has been meticulously and accurately transcribed by a human.
Training system 210 may use the training data to generate
parameters for statistical model 220 that recognition system 230
may later use to recognize future data that it receives (i.e., new
audio that it has not heard before).
[0030] Statistical model 220 may include acoustic models and
language models. The acoustic models may describe the time-varying
evolution of feature vectors for each sound or phoneme. The
acoustic models may employ continuous Hidden Markov Models (HMMs)
to model each of the phonemes in the various phonetic contexts.
[0031] The language models may include n-gram language models,
where the probability of each word is a function of the previous
word (for a bi-gram language model) or the previous two words (for
a tri-gram language model). Typically, the higher the order of the
language model, the higher the recognition accuracy at the cost of
slower recognition speeds.
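As a rough illustration of a bi-gram language model, the sketch below
estimates the probability of a word given the previous word from raw
counts. It assumes unsmoothed maximum-likelihood estimation and
whitespace tokenization; a practical recognizer would likely add
smoothing. The function names and the sentence markers are
hypothetical.

```python
from collections import Counter


def train_bigram(sentences):
    """Count word histories and bigrams, padding with <s> and </s> markers."""
    histories, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        histories.update(tokens[:-1])            # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent (previous, word) pairs
    return histories, bigrams


def bigram_probability(histories, bigrams, previous, word):
    """Maximum-likelihood estimate of P(word | previous); 0.0 if unseen."""
    if histories[previous] == 0:
        return 0.0
    return bigrams[(previous, word)] / histories[previous]
```

Trained on "check email", "check voicemail", and "check email", the
estimate of P(email | check) is 2/3, since "check" is followed by
"email" in two of its three occurrences.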
[0032] Recognition system 230 may use statistical model 220 to
process input audio data. FIG. 3 is an exemplary diagram of
recognition system 230 according to an implementation consistent
with the principles of the invention. Recognition system 230 may
include audio classification logic 310, speech recognition logic
320, speaker identification logic 340, name spotting logic 350, and
topic classification logic 360. Audio classification logic 310 may
distinguish speech from silence, noise, and other audio signals in
input audio data. For example, audio classification logic 310 may
analyze five second windows of the input data to determine whether
it contains speech.
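The windowed analysis can be sketched as below. The description does
not say how a window is classified, so this example substitutes a
simple root-mean-square energy threshold, which separates loud audio
from silence but cannot by itself tell speech from noise. The function
name, the threshold value, and the label strings are assumptions.

```python
import math


def classify_windows(samples, sample_rate, window_seconds=5.0, threshold=0.02):
    """Label each fixed-length window 'speech' or 'non-speech' by RMS energy."""
    window_size = int(sample_rate * window_seconds)
    if window_size <= 0:
        raise ValueError("window must cover at least one sample")
    labels = []
    for start in range(0, len(samples), window_size):
        window = samples[start:start + window_size]  # last window may be shorter
        rms = math.sqrt(sum(x * x for x in window) / len(window))
        labels.append("speech" if rms >= threshold else "non-speech")
    return labels
```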
[0033] Speech recognition logic 320 may perform continuous speech
recognition to recognize the words spoken in the segments that it
receives from audio classification logic 310. Speech recognition
logic 320 may generate a transcription of the speech using
statistical model 220. Speaker identification logic 340 may
identify the speaker by comparing acoustic features of the
speaker's voice with a set of pre-stored acoustic features. Speaker
identification may be useful for security verification.
[0034] Name spotting logic 350 may locate the names of people,
places, and organizations in the transcription. Name spotting logic
350 may extract the names and store them in a database. Topic
classification logic 360 may assign topics to the transcription.
Each of the words in the transcription may contribute differently
to each of the topics assigned to the transcription. Topic
classification logic 360 may generate a rank-ordered list of all
possible topics and corresponding scores for the transcription.
Topics identified for a speaker segment may be stored with the
segment and later used when performing searches over an archive of
spoken segments.
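One way to picture the rank-ordered topic list is the sketch below, in
which each word adds a per-topic weight to a running score. The weight
table TOPIC_WEIGHTS, the topic names, and the additive scoring rule are
hypothetical; the description does not specify how scores are
computed.

```python
# Hypothetical per-word topic weights; a real system would learn these.
TOPIC_WEIGHTS = {
    "meeting": {"scheduling": 2.0, "work": 1.0},
    "tomorrow": {"scheduling": 1.5},
    "invoice": {"finance": 2.5, "work": 0.5},
}


def rank_topics(transcription):
    """Return (topic, score) pairs for all known topics, best score first."""
    all_topics = {t for weights in TOPIC_WEIGHTS.values() for t in weights}
    scores = {topic: 0.0 for topic in all_topics}
    for word in transcription.lower().split():
        # Each word contributes differently to each topic.
        for topic, weight in TOPIC_WEIGHTS.get(word, {}).items():
            scores[topic] += weight
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```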
[0035] Referring back to FIG. 1, PIM 100 includes control
processing component 104 and database 102. Database 102 stores user
information. FIG. 4 is a diagram illustrating potential contents of
database 102. Database 102 may include user created to-do lists
401, user memos 402, contact lists 403, user calendar information
404, voicemail 405-406, email 407, and other documents or files
408. To-do lists 401 may include lists of "to-do" action items
created by the user. Memos 402 may include brief written documents
created by the user. Contact lists 403 may include contact
information, such as names, telephone numbers, and email addresses,
for a number of people.
[0036] Voicemails received through voice interface 108 may be
stored as digitized audio 405. Similarly, voicemail transcriptions
406 may include a rich transcription of the received voicemails, as
transcribed by speech recognition component 115. Email received via
network interface 110 may also be stored in database 102.
[0037] Items 401-408, stored in database 102, are exemplary. In
general, database 102 functions as a central storage location for a
user's personal information. Accordingly, database 102 could be
used to store other appropriate user information, such as any
miscellaneous files 408 (e.g., documents, audio files, video files,
etc.) that the user wishes to store.
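The contents of database 102 might be modeled in memory as shown
below. This is only a sketch of the data layout: the class and field
names are hypothetical, and the description does not specify a schema
or storage engine. Comments map fields to the reference numbers of
FIG. 4.

```python
from dataclasses import dataclass, field


@dataclass
class Contact:
    name: str
    phone: str = ""
    email: str = ""


@dataclass
class Voicemail:
    audio: bytes             # digitized audio (405)
    transcription: str = ""  # transcription of the message (406)


@dataclass
class Email:
    sender: str
    subject: str
    body: str


@dataclass
class PimDatabase:
    todo_lists: list = field(default_factory=list)  # to-do lists (401)
    memos: list = field(default_factory=list)       # memos (402)
    contacts: list = field(default_factory=list)    # contact lists (403)
    calendar: list = field(default_factory=list)    # calendar information (404)
    voicemails: list = field(default_factory=list)  # voicemail (405-406)
    emails: list = field(default_factory=list)      # email (407)
    files: dict = field(default_factory=dict)       # other documents or files (408)
```

Using default_factory gives each PimDatabase instance its own lists,
so two users' data never share storage by accident.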
[0038] Referring again to FIG. 1, control processing component 104
may manage the information in database 102. Control processing
component 104 may, for example, act as a web server and process
requests, such as HTTP requests, received via network interface
110. The requests may relate to the display or manipulation of the
user data in database 102. Control processing component 104 may
also interact with dialog manager 106 to transfer data to or
receive data from dialog manager 106. Thus, for example, voice
commands received from a user may be identified by speech
recognition component 115 and then transmitted from command
interface 116 to control processing component 104.
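The sketch below illustrates how a single logical command, whether it
originates from dialog manager 106 or from an HTTP request, could be
handled once and then rendered as audio text or as visual data. All
class and function names are hypothetical, and a plain dictionary
stands in for database 102.

```python
import html
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalCommand:
    """Interface-independent request, e.g. action='list', target='todo'."""
    action: str
    target: str


class ControlProcessor:
    """Hypothetical stand-in for control processing component 104."""

    def __init__(self, database):
        self.database = database  # plain dict used in place of database 102

    def handle(self, command):
        if command.action == "list":
            return list(self.database.get(command.target, []))
        raise ValueError(f"unsupported action: {command.action}")


def render_for_voice(items):
    """Build the sentence that would be handed to text-to-speech synthesis."""
    if not items:
        return "You have no items."
    return f"You have {len(items)} items: " + "; ".join(items) + "."


def render_for_web(items):
    """Build an HTML list that a browser could display."""
    rows = "".join(f"<li>{html.escape(item)}</li>" for item in items)
    return f"<ul>{rows}</ul>"
```

Keeping the command handling separate from the rendering is one way to
give both interfaces the same view of the data.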
[0039] FIG. 5 is a flow chart illustrating exemplary operation of
PIM 100 in accessing personal data stored in database 102 through
voice interface 108.
[0040] A user begins by placing a call to voice interface 108 of
PIM 100 (Act 501). The call may be placed from any conventional
phone. If voice interface 108 is connected to a telephone line that
shares other household functions, such as an answering machine
and/or a general phone line, PIM 100 may monitor the connection to
determine if the caller intends to interact with PIM 100. The
caller may, for example, press a predetermined number sequence on
the phone or speak a predetermined command to indicate that the
caller wishes to interact with PIM 100.
[0041] PIM 100 may check to determine whether the caller is
authorized to use PIM 100 (Act 502). Authorization may be performed
through a predetermined password entered on the phone keypad,
through a predetermined spoken password, or through speaker
identification performed by speaker identification logic 340.
When using speaker identification logic 340 to authorize a
user, speaker identification logic 340 may compare pre-stored
acoustic features of the user's voice to acoustic features derived
from the active voice connection. If the acoustic features match,
the user is authorized.
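The feature comparison can be sketched as follows, assuming that each
voice is summarized as a fixed-length feature vector and that a
"match" means cosine similarity above a threshold. Both assumptions,
along with the function names and the threshold value, are
illustrative; the description does not state how features are compared.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_authorized(enrolled, observed, threshold=0.9):
    """Authorize the caller when observed features match the enrolled ones."""
    return cosine_similarity(enrolled, observed) >= threshold
```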
[0042] Once authorized, the user may interact with PIM 100 by
giving spoken audio commands to PIM 100 (Act 503). The spoken
commands may relate to any of the data in database 102. The spoken
commands are processed by recognition system 230 of speech
recognition component 115 to generate a logical representation of
the command, which command interface 116 may transmit to control
processing component 104.
[0043] One set of commands may relate to voicemail data 405 and
406. Through these commands, PIM 100 assists the user in
retrieving voicemail (Act 504). The user may,
for example, control the playback of voice messages that other
parties left for the user. Thus, via voice commands, the user may
listen to, delete, and/or file voice messages for future reference.
In some embodiments, the user may search or otherwise interact with
voicemail transcriptions 406. For example, after listening to a
voice message from "Bob," the user may instruct PIM 100 to search
archived voice messages for other messages from Bob, and then
play back a particular one of those messages.
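A search over archived voice messages could look like the sketch
below. It assumes each message is a dictionary with hypothetical "id",
"caller", and "transcription" keys and uses case-insensitive substring
matching, so a query of "bob" would also match "Bobby".

```python
def search_voicemails(voicemails, query):
    """Return ids of messages whose caller or transcription mentions query."""
    needle = query.lower()
    return [
        message["id"]
        for message in voicemails
        if needle in message["caller"].lower()
        or needle in message["transcription"].lower()
    ]
```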
[0044] Another set of commands may relate to email. Through these
commands, PIM 100 allows the user to retrieve, compose, and/or
manage email account(s) (Act 505). The user may, for example,
command PIM 100 to play back, through text-to-speech synthesis
component 117, the "subject" and "from" fields of all newly
received emails. Emails that the user is particularly interested in
may be selected for playback of the "body" of the email. In
addition to merely retrieving emails, in some implementations, the
user may compose emails. In this situation, the user may speak the
name of the intended recipient, the subject line of the email, and
the body of the email. Speech recognition component 115 transcribes
the user's spoken email. Alternatively, the body of the email could
be transmitted as an acoustic file (i.e., send what the person said
as an acoustic file attachment). Control processing component 104
may then look up the address corresponding to the intended
recipient in database 102 and prepare the email with the
transcribed speech.
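The address lookup and email preparation can be sketched with the
Python standard library as shown below. The contacts dictionary stands
in for the contact lists in database 102, and the function name and
its parameters are hypothetical; sending the message is outside the
scope of this example.

```python
from email.message import EmailMessage


def compose_email(contacts, sender, recipient_name, subject, body):
    """Look up the spoken recipient name and build an email message."""
    address = contacts.get(recipient_name.lower())
    if address is None:
        raise KeyError(f"no contact named {recipient_name!r}")
    message = EmailMessage()
    message["From"] = sender
    message["To"] = address
    message["Subject"] = subject
    message.set_content(body)  # body would be the transcribed speech
    return message
```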
[0045] In addition to email and voicemail control, PIM 100 may
respond to commands to manage personal information of the user,
such as to-do lists 401, memos 402, contact lists 403, and/or
calendar information 404 (Act 506). A user may, for example, add
action items to to-do list 401 or review action items already in
the to-do list. The user may similarly edit, review, and manage
memos 402, contact lists 403, and calendar information 404. As with
voicemail and email management, speech recognition component 115
and text-to-speech synthesis component 117 enable recognition of
user voice commands, provide dictation of user speech, and convert
data to audio for playback to the user.
[0046] In addition to accessing personal data in database 102
through voice interface 108, the user may access his personal data
via network interface 110. In one implementation, PIM 100 presents
a web browser based interface to the user. Accordingly, in the
situation in which network 121 is the Internet, the user may
connect to PIM 100 through any computing device that contains a
suitable browser program and is connected to the Internet. Control
processing component 104 may act as a web server that provides
access to database 102.
[0047] In general, when interacting with a user through network
interface 110, control processing component 104 allows the user to
perform all of the functions that are available through voice
interface 108. Thus, the user may access and manage voicemail and
email accounts, as well as access and manage to-do lists 401, memos
402, contact lists 403, and calendar information 404. Web servers
that provide access to email, to-do lists, memos, contact lists,
and calendars are known in the art and will not be discussed
further herein.
[0048] When providing access to voicemail functions through network
interface 110, control processing component 104 may transmit
transcriptions of the received voicemails to the user.
Alternatively, control processing component 104 may stream audio
corresponding to the voicemail over network 121.
[0049] Local interface 112 provides direct access to PIM 100.
Through local interface 112 a user may connect, for example, a
keyboard, a mouse, and a monitor. Control processing component 104
may provide functionality through local interface 112 that is
similar to that provided through network interface 110. In some
implementations, a user may connect a microphone and speakers through
local interface 112. In this situation, the microphone and speaker
lines may be coupled to dialog manager 106, which may provide
functionality through local interface 112 that is similar to that
provided through voice interface 108.
[0050] In summary, whether accessing PIM 100 through voice
interface 108, network interface 110, or local interface 112, PIM
100 gives the user full access to the personal data stored in
database 102. Users may experience equivalent personal data access
features when accessing their data through audio or visual
interfaces.
[0051] Because PIM 100 stores the user's personal data at a single
location that can be accessed on-demand from virtually anywhere,
PIM 100 can provide a number of additional data management
services. PIM 100 may, for example, act as a centralized storage
for user passwords and login names. The user can simply call PIM
100 and request that PIM 100 log onto a web site and retrieve user
specified information, which PIM 100 may then return to the user.
Thus, in addition to managing the user's personal information, PIM
100 can retrieve, store, and provide on-demand access to other
information in which the user is interested.
CONCLUSION
[0052] Systems and methods consistent with principles of the
invention provide convenient and easily obtainable access to
personal data. Users can manage their personal data through
traditional visual interfaces, or equivalently, through an audio
interface. In this way, the user's personal data remains totally
under the control of the user at all times.
[0053] The foregoing description of preferred embodiments of the
invention provides illustration and description, but is not
intended to be exhaustive or to limit the invention to the precise
form disclosed. Modifications and variations are possible in light
of the above teachings or may be acquired from practice of the
invention. For example, while a series of acts has been presented
with respect to FIG. 5, the order of the acts may be different in
other implementations consistent with the present invention.
[0054] Certain portions of the invention have been described as
software that performs one or more functions. The software may more
generally be implemented as any type of logic. This logic may
include hardware, such as an application specific integrated
circuit or a field programmable gate array, software, or a
combination of hardware and software.
[0055] No element, act, or instruction used in the description of
the present application should be construed as critical or
essential to the invention unless explicitly described as such.
Also, as used herein, the article "a" is intended to include one or
more items. Where only one item is intended, the term "one" or
similar language is used.
[0056] The scope of the invention is defined by the claims and
their equivalents.
* * * * *