U.S. patent application number 09/749598, filed on 2000-12-28, was published on 2002-07-04 for a method and system for providing textual content along with voice messages.
Invention is credited to Jason Alan Davidson, Thomas J. Hernandez, and Shuvranshu Pokhariyal.
Publication Number | 20020085690 |
Application Number | 09/749598 |
Family ID | 25014415 |
Publication Date | 2002-07-04 |
United States Patent Application | 20020085690 |
Kind Code | A1 |
Davidson, Jason Alan; et al. | July 4, 2002 |
Method and system for providing textual content along with voice messages
Abstract
A system and method for providing textual content along with
voice messages is presented. A caller at a calling station places a
call to a callee at a receiving station. The call is placed through
a connection, and the receiving station includes a callee's phone
linked to a callee's computer. The call is recorded at the calling
station to generate voice data. The call is then transcribed based
on the voice data to generate the textual content of the call. The
textual content is sent by the calling station to a server via the
connection. The server then transfers the textual content, as well
as voice data received by the server, to an electronic incoming
mailbox of the callee as electronic mail.
Inventors: | Davidson, Jason Alan (Forest Grove, OR); Hernandez, Thomas J. (Portland, OR); Pokhariyal, Shuvranshu (Hillsboro, OR) |
Correspondence Address: | PILLSBURY WINTHROP, LLP, P.O. BOX 10500, MCLEAN, VA 22102, US |
Family ID: | 25014415 |
Appl. No.: | 09/749598 |
Filed: | December 28, 2000 |
Current U.S. Class: | 379/88.17 |
Current CPC Class: | H04M 1/6505 20130101 |
Class at Publication: | 379/88.17 |
International Class: | H04M 001/64 |
Claims
What is claimed is:
1. A method for providing textual content along with voice
messages, the method comprising: placing a call, by a caller on a
calling station, to a callee, represented by a receiving station,
through a connection, the receiving station including a callee's
phone linked to a callee's computer; recording the call, at the
calling station, to generate voice data; transcribing the call
based on the voice data to generate the textual content of the
call; sending the textual content by the calling station to a
server via the connection; and transferring, by the server, the
textual content and the voice data, received by the server, as
electronic mail to an electronic incoming mailbox of the callee on
the callee's computer.
2. The method according to claim 1, wherein the calling station
includes a caller's computer.
3. The method according to claim 2, wherein the caller's computer
is connected to a caller's phone that has analog and digital
capabilities.
4. The method according to claim 3, wherein the connection includes
a broadband access line connection.
5. The method according to claim 1, wherein the recording the call
to generate voice data includes recording the call on a phone to
generate analog voice data.
6. The method according to claim 1, wherein the recording the call
to generate voice data includes recording the call on a phone to
generate digital voice data.
7. The method according to claim 1, wherein the recording the call
to generate voice data includes recording the call on a computer to
generate digital voice data.
8. The method according to claim 1, wherein the transcribing
comprises: receiving a first message, at the calling station, sent
by the server, the message indicating an absence of the callee at
the receiving station; sending a second message, by the calling
station, to the server to acknowledge the first message;
retrieving, at the calling station, a local voice profile
corresponding to the caller stored at the calling station; and
processing the voice data to generate the textual content of the
call using the local voice profile.
9. The method according to claim 8, wherein the receiving
comprises: switching, at the server, the call, transmitted by the
placing through the connection, to the callee's phone; detecting an
unattended call at the callee's phone by the server; sending, by
the server, the first message to the calling station as a result of
the unattended call, detected by the detecting; and intercepting
the first message, by the calling station, sent by the sending by
the server.
10. The method according to claim 9, wherein the server includes a
Private Branch Exchange (PBX).
11. The method according to claim 8, wherein the processing
comprises: digitizing the voice data, if the voice data is recorded
in analog form, to generate digital voice data; and performing
speech recognition, using a speech recognition engine and the local
voice profile, on the digital voice data to generate the textual
content of the call.
12. The method according to claim 11, wherein the local voice
profile characterizes the speech properties of the caller.
13. The method according to claim 12, wherein the speech properties
include vocal tract characteristics of the caller.
14. A method for a phone connected to a computer, the method
comprising: establishing an analog connection from the phone,
representing a caller, to a receiving station, representing a
callee; transmitting an analog signal of a call, placed by the
caller to the callee using the phone; recording the analog signal
of the call at the phone; receiving a digital signal, sent by a
server connecting the caller to the callee, the digital signal
indicating the absence of the callee; sending the analog signal to
the computer; receiving digital textual content, transcribed from
the call by the computer; and sending the digital textual content
to the server.
15. The method according to claim 14, further comprising:
determining the identity of the caller prior to the establishing;
associating the analog signal with the identity, determined by the
determining; and sending the identity of the caller with the analog
signal to the computer.
16. The method according to claim 15, wherein the determining the
identity includes determining the identity of the caller via
speaker identification.
17. A method for a computer connected to a phone, the method
comprising: receiving an analog signal representing a call placed
by a caller via the phone; converting the analog signal to a
corresponding digital signal of the call; retrieving a local voice
profile, stored on the computer, that corresponds to the caller and
that characterizes speech properties of the caller; and performing
speech recognition on the digital signal based on the local voice
profile using a speech recognition engine to generate textual
content of the call; and sending the textual content to the
phone.
18. The method according to claim 17, further comprising: receiving
an identification corresponding to the caller from the phone; and
selecting the local voice profile, from a plurality of local voice
profiles, that has the identification.
19. A method for a server linked to a caller and a callee via a
connection, the method comprising: receiving a voice signal
transmitted from the caller, representing a call placed from the
caller to the callee, via the connection; storing the voice signal
at the server; transmitting the voice signal to a phone
corresponding to the callee; detecting absence of the callee;
sending a first message to the caller if the absence is detected;
receiving a second message from the caller as an acknowledgement to
the first message; receiving the textual content of the call, sent
by the caller; and sending the textual content and the voice signal
to the electronic mail address of the callee.
20. The method according to claim 19, wherein the textual content
is sent to the electronic mail address as an attachment to an
electronic mail message.
21. A computer-readable medium encoded with a plurality of
processor-executable instruction sequences for: placing a call, by
a caller on a calling station, to a callee, represented by a
receiving station, through a connection, the receiving station
including a callee's phone linked to a callee's computer; recording
the call, at the calling station, to generate voice data;
transcribing the call based on the voice data to generate the
textual content of the call; sending the textual content by the
calling station to a server via the connection; and transferring,
by the server, the textual content and the voice data, received by
the server, as electronic mail to an electronic incoming mailbox of
the callee on the callee's computer.
22. The computer-readable medium according to claim 21, wherein the
transcribing comprises: receiving a first message, at the calling
station, sent by the server, the message indicating an absence of
the callee at the receiving station; sending a second message, by
the calling station, to the server to acknowledge the first
message; retrieving, at the calling station, a local voice profile
corresponding to the caller stored at the calling station; and
processing the voice data to generate the textual content of the
call using the local voice profile.
23. A computer-readable medium encoded with a plurality of
processor-executable instruction sequences for: establishing an
analog connection from the phone, representing a caller, to a
receiving station, representing a callee; transmitting an analog
signal of a call, placed by the caller to the callee using the
phone; recording the analog signal of the call at the phone;
receiving a digital signal, sent by a server connecting the caller
to the callee, the digital signal indicating the absence of the
callee; sending the analog signal to the computer; receiving
digital textual content, transcribed from the call by the computer;
and sending the digital textual content to the server.
24. The computer-readable medium according to claim 23, further
having processor-executable instruction sequences for: determining
the identity of the caller prior to the establishing; associating
the analog signal with the identity, determined by the determining;
and sending the identity of the caller with the analog signal to
the computer.
25. A computer-readable medium encoded with a plurality of
processor-executable instruction sequences for: receiving a voice
signal transmitted from the caller, representing a call placed from
the caller to the callee, via the connection; storing the voice
signal at a server; transmitting the voice signal to a phone
corresponding to the callee; detecting absence of the callee;
sending a first message to the caller if the absence is detected;
receiving a second message from the caller as an acknowledgement to
the first message; receiving the textual content of the call, sent
by the caller; and sending the textual content and the voice signal
to the electronic mail address of the callee.
26. The computer-readable medium of claim 25, wherein the textual
content is sent to the electronic mail address embedded in an
e-mail message.
27. A system for providing textual content along with voice
messages, the system comprising: a calling station configured to
enable a caller to place a call; a receiving station configured to
communicate with the calling station through a connection, the
receiving station including a callee's phone linked to a callee's
computer; and a server configured to communicate with the calling
station and the receiving station, wherein said call is recorded at
the calling station to generate voice data, the calling station
transcribes the call based on the voice data to generate textual
content of the call, the calling station sends the textual content
to the server via the connection, and the server transfers the
textual content and the voice data, received by the server, as
electronic mail to an electronic incoming mailbox of the callee on
the callee's computer.
28. The system of claim 27, wherein the calling station includes a
caller's computer.
29. A calling station for providing textual content along with
voice messages, the calling station comprising: a phone configured
to communicate with a callee phone and a server through a
connection; and a computer linked to the phone, the computer having
an identification mechanism and a speech recognition mechanism, the
identification mechanism being configured to determine the identity
of a caller placing a call from the phone, the speech recognition
mechanism being configured to perform speech recognition on a
digital signal recorded from the call based on a retrieved local
voice profile corresponding to the identity, the speech recognition
generating textual content of the call, wherein the phone sends the
textual content to the server.
30. The calling station according to claim 29, wherein the computer
and the phone are incorporated into one device.
Description
BACKGROUND
[0001] 1. Field of the Invention
[0002] The present invention relates in general to messaging.
Specifically, the present invention relates to methods and systems
for unified messaging.
[0003] 2. General Background and Related Art
[0004] In our modern telecommunications era, messaging is carried
out in various ways, such as by leaving a phone message or sending
electronic mail. To enable efficient message search and retrieval,
messages may be indexed according to date received, date sent,
sender, subject, etc.
[0005] Information management techniques attempt to organize
information such that search and retrieval are easy and meaningful.
Such techniques have been applied to electronic mail systems.
Advanced systems enable users to search email messages based on
their content. For example, a user may enter a certain keyword as a
search criterion, and the mail system will return a set of all
messages that contain the keyword.
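By way of illustration only, the keyword search described above can be sketched as a small inverted index over message text. The function names and the sample messages are assumptions made for this sketch and do not come from any particular mail system.

```python
from collections import defaultdict


def build_index(messages):
    """Map each lowercase word to the set of message ids whose text contains it."""
    index = defaultdict(set)
    for message_id, text in messages.items():
        for word in text.lower().split():
            # Strip common punctuation so "Friday." and "Friday?" match "friday".
            index[word.strip(".,;:!?\"'")].add(message_id)
    return index


def search(index, keyword):
    """Return the ids of all messages that contain the keyword."""
    return index.get(keyword.lower(), set())


# Hypothetical mailbox contents used only to exercise the sketch.
messages = {
    1: "Budget meeting moved to Friday.",
    2: "Lunch on Friday?",
    3: "Call me back",
}
index = build_index(messages)
friday_hits = search(index, "Friday")
```

A search of this kind works only on text, which is why an audio attachment alone cannot be matched against a keyword.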
[0006] Conventional voice messages may be transferred to an
electronic mail system as a digital file attachment wherein an
audio signal is encoded as a digital representation. One popular
form of such files is "*.wav". There are other such waveform files
as well. Thus, an audio file, such as a voicemail message, can be
sent as an attachment over the Internet. For example, a phone
message may be intercepted by a server that detects when the
intended recipient of a call is absent. The audio phone message may
be digitized to generate a wave signal representation of the voice
message, and then sent as an attachment to a pre-specified e-mail
address of the recipient. Accordingly, the recipient may be able to
access voice messages by means other than calling his voice mail
box on the telephone, such as through e-mail. An audio voice mail
attachment may be played using appropriate software when the e-mail
message to which it is attached is opened by the recipient.
[0007] One advantage of sending a voice message as an attachment in
an electronic mail system is that the message can be indexed
according to certain criteria, such as by date received. As such,
the recipient may search and retrieve messages based on the
criteria. Such selective search and retrieval of messages is not
possible if a voice message is left on a conventional phone. That
is, when a voice message is sent to an electronic mail system, some
information management techniques may be applied to facilitate
message search and retrieval.
[0008] However, it is not possible to directly apply information
management techniques to the content of voice messages that are
sent to electronic systems as attachments. Indeed, the digital
waveform exists in its intrinsic signal form instead of in digital
textual form. Indexing digital voicemail messages according to
their content would be more semantically meaningful to users.
[0009] Therefore, what is needed is a method and system that allows
users to search and retrieve information based on the content of
voicemail messages in an e-mail system.
BRIEF DESCRIPTION OF THE DRAWINGS
[0010] FIG. 1 is a block diagram of a system according to an
embodiment of the present invention.
[0011] FIG. 2 is a block diagram illustrating processing of data
according to the present invention.
[0012] FIG. 3 is a block diagram of a calling station according to
an embodiment of the present invention.
[0013] FIG. 4 is a flowchart showing a method according to the
present invention.
[0014] FIG. 5 is a flowchart showing a method according to the
present invention.
DETAILED DESCRIPTION
[0015] The following detailed description refers to the
accompanying drawings that illustrate exemplary embodiments of the
claimed invention. Other embodiments are possible and
modifications may be made to the embodiments without departing from
the spirit and scope of the invention. Therefore, the following
detailed description is not meant to limit the invention. Rather,
the scope of the invention is defined by the appended claims.
[0016] It will be apparent to one of ordinary skill in the art that
the embodiments as described below may be implemented in many
different embodiments of software, firmware, and hardware in the
entities illustrated in the figures. The actual software code or
specialized control hardware used to implement the present
invention is not limiting of the present invention. Thus, the
operation and behavior of the embodiments will be described without
specific reference to the actual software code or specialized
hardware components. The absence of such specific references is
feasible because it is clearly understood that artisans of ordinary
skill would be able to design software and control hardware to
implement the embodiments of the present invention based on the
description herein.
[0017] Moreover, the processes associated with the presented
embodiments may be stored in any storage device, such as, for
example, a computer system (non-volatile) memory, an optical disk,
magnetic tape, or magnetic disk. Furthermore, the processes may be
programmed when the computer system is manufactured or via a
computer-readable medium at a later date. Such a medium may include
any of the forms listed above with respect to storage devices and
may further include, for example, a carrier wave modulated, or
otherwise manipulated, to convey instructions that can be read,
demodulated/decoded and executed by a computer.
[0018] A system and method for providing textual content along with
voice messages, as described herein, involves a caller at a calling
station placing a call to a callee at a receiving station. The call
is placed through a connection, and the receiving station includes
a callee's phone linked to a callee's computer. The call is
recorded at the calling station to generate voice data. The call is
then transcribed based on the voice data to generate the textual
content of the call. The textual content is sent by the calling
station to a server via the connection. The server then transfers
the textual content, as well as voice data received by the server,
to an electronic incoming mailbox of the callee as electronic
mail.
[0019] FIG. 1 is a block diagram of system 100 according to an
embodiment of the present invention. System 100 comprises calling
station 110, PBX network 190, voicemail server 170, e-mail server
180, and receiving station 120. The various components of system
100 send to each other, and receive from each other, information
via PBX network 190. It is to be noted that system 100 may include
other types of connections in lieu of or in addition to PBX network
190, such as, for example, a narrowband phone line connection, a
broadband access line connection, a wireless connection, or a Voice
over IP (VoIP) connection.
[0020] Calling station 110 comprises caller phone 140 and caller
computer 130. Caller phone 140 may have both analog and digital
capabilities, wherein a caller speaks into a handset of caller
phone 140, the spoken words are received as an analog signal, and
the analog signal is converted into digital data by caller phone
140. Caller phone 140 interfaces with caller computer 130 via
Universal Serial Bus (USB) connection 135, or another suitable
connection interface.
Caller computer 130 may process digital data to generate textual
information. The functions performed by caller phone 140 and caller
computer 130 may be performed by one multifunctional device or
multiple discrete devices.
[0021] Voicemail server 170 provides voicemail functions in system
100. Voicemail server 170 may provide conventional voicemail
functions. However, voicemail server 170 may also be configured to
communicate with caller phone 140 or caller computer 130. For
instance, voicemail server 170 may send handshaking tones to caller
phone 140, and receive handshaking tones therefrom. E-mail server
180 stores e-mail messages, and provides a platform for
transmission and reception of e-mail messages to and from various
components of system 100, as well as other external nodes (not
shown) in remote locations. For instance, e-mail server 180 may run
Microsoft Outlook server software. Voicemail server 170 and e-mail
server 180 may be implemented as one server.
[0022] Receiving station 120 comprises callee phone 150 and callee
computer 160 linked thereto. Callee phone 150 may interface with
callee computer 160 via a USB connection 155. Callee phone 150 and
callee computer 160 may be implemented in system 100 as one
device.
[0023] FIG. 2 is a high-level block diagram illustrating processing
of data according to the present invention. As shown, analog voice
data 210 is received at calling station 110 when a caller speaks
into a handset of caller phone 140. Analog voice data 210 is then
converted, via analog-to-digital conversion, to digital voice data
220. Textual content 230 is extracted from digital voice data 220,
and both digital voice data 220 and associated textual content 230
are placed in an e-mail message 240. E-mail message 240, which may
be retrieved or received by receiving station 120, may include
digital voice data 220 in the form of a wave file.
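The analog-to-digital step of FIG. 2 might be sketched as follows. This is a minimal illustration, not the conversion the embodiment actually uses: the 8000 Hz sampling rate, the 16-bit mono format, and the function name `digitize` are all assumptions, and a synthetic tone stands in for analog voice data 210.

```python
import io
import math
import struct
import wave

SAMPLE_RATE_HZ = 8000  # assumed telephone-quality sampling rate


def digitize(analog_samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Quantize samples in [-1.0, 1.0] to 16-bit PCM and wrap them in a wave file."""
    frames = bytearray()
    for sample in analog_samples:
        clipped = max(-1.0, min(1.0, sample))
        frames += struct.pack("<h", int(round(clipped * 32767)))
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate_hz)
        wav_file.writeframes(bytes(frames))
    return buffer.getvalue()


# A 440 Hz tone, 0.1 s long, stands in for analog voice data 210.
tone = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE_HZ) for n in range(800)]
wave_bytes = digitize(tone)  # plays the role of digital voice data 220
```

The resulting bytes are what would later travel as the wave file attachment of e-mail message 240.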
[0024] In an exemplary implementation, interaction of the
components of system 100 in FIG. 1 may occur as follows. A caller
at calling station 110, desiring to place a call to a callee at
receiving station 120, places a call from caller phone 140. PBX
network 190 routes the caller's call to callee phone 150 at
receiving station 120. If the callee is present, the callee may
pick up callee phone 150 and may converse with the caller calling
from caller phone 140. If the callee does not pick up callee phone
150--for instance, the callee is absent or unable to get to callee
phone 150 in due time--the call is forwarded to voicemail server
170. Voicemail server 170 may then send a tone to caller phone 140.
Such a tone may signify to caller phone 140 that caller phone 140
should switch to another mode of operation. Caller phone 140 may
then send meaningful tones back to voicemail server 170. As such, a
digital handshake may occur between caller phone 140 and voicemail
server 170.
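The tone exchange between voicemail server 170 and caller phone 140 can be pictured as a two-message handshake. The sketch below is one possible reading under stated assumptions: the tone labels, the `CallerPhone` class, and the two modes are hypothetical, since the description does not specify the actual tones or states.

```python
from enum import Enum, auto


class PhoneMode(Enum):
    CONVERSATION = auto()  # ordinary call handling
    TRANSCRIBING = auto()  # message is recorded for local transcription


# Hypothetical tone labels; the description does not define the real tones.
SERVER_ABSENCE_TONE = "ABSENCE"
PHONE_ACK_TONE = "ACK"


class CallerPhone:
    def __init__(self, supports_transcription=True):
        self.supports_transcription = supports_transcription
        self.mode = PhoneMode.CONVERSATION

    def on_tone(self, tone):
        """Return the reply tone, or None if the phone stays silent."""
        if tone == SERVER_ABSENCE_TONE and self.supports_transcription:
            self.mode = PhoneMode.TRANSCRIBING
            return PHONE_ACK_TONE
        return None


def handshake(phone):
    """Server side: send the absence tone and report whether it was acknowledged."""
    return phone.on_tone(SERVER_ABSENCE_TONE) == PHONE_ACK_TONE


capable_phone = CallerPhone()
legacy_phone = CallerPhone(supports_transcription=False)
capable_result = handshake(capable_phone)
legacy_result = handshake(legacy_phone)
```

In this sketch a phone that cannot transcribe never answers the tone, so the server can fall back to conventional voicemail handling.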
[0025] Via USB connection 135, caller computer 130 may extract text
from digital voice data received by caller phone 140. Caller
computer 130 may also assign a digital timestamp, or other
appropriate indicia, to the call. During this extraction phase, the
caller's message may be received and recorded at voicemail server
170. When the caller finishes leaving the message, caller phone 140
may transfer the text extracted by caller computer 130, the
timestamp, and any other indicia associated with the message to
voicemail server 170.
[0026] Voicemail server 170 may then include the text, the
timestamp and other indicia of the message, and a wave file
containing received voice data for the message in an e-mail message
in e-mail server 180. The wave file may be included as an
attachment to the e-mail message. Textual information may be
embedded within the e-mail message or attached thereto in a text
file. The message may be placed in the callee's inbox in e-mail
server 180. Accordingly, the callee may retrieve the message, which
not only contains digital voice data, but also includes textual
information corresponding to the message and any other indicia
associated with the message. It is to be appreciated that a
timestamp or other such indicia for the message need not be
included in the information transmitted to voicemail server 170, or
ultimately, to receiving station 120.
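One way a server could assemble such a message is sketched below with Python's standard `email` package. The function name, the subject line, the attachment file name, and the example address are assumptions for illustration. The transcript and timestamp are optional, mirroring the statement that such indicia need not be included.

```python
from datetime import datetime, timezone
from email.message import EmailMessage


def build_voicemail_email(callee_address, wave_bytes, transcript=None, timestamp=None):
    """Embed the text in the body and attach the voice data as a wave file."""
    message = EmailMessage()
    message["To"] = callee_address
    message["Subject"] = "Voice message"  # assumed subject line
    body_lines = []
    if timestamp is not None:
        body_lines.append("Received: " + timestamp.isoformat())
    body_lines.append(transcript if transcript else "(no transcript available)")
    message.set_content("\n".join(body_lines))
    message.add_attachment(
        wave_bytes, maintype="audio", subtype="wav", filename="message.wav"
    )
    return message


example = build_voicemail_email(
    "callee@example.com",  # hypothetical address
    b"RIFF",               # placeholder bytes, not a valid wave file
    transcript="Please call me back about the budget.",
    timestamp=datetime(2000, 12, 28, 9, 30, tzinfo=timezone.utc),
)
```

Because the transcript sits in the message body as plain text, an ordinary mail search can match it, while the attachment preserves the original audio.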
[0027] If caller phone 140 and caller computer 130 are not
configured to perform various handshaking and extraction functions
described above, then a caller from caller phone 140 may simply
leave a voicemail message for a callee. Voicemail server 170 may
receive digital voice data and attach such data as a wave file to
an e-mail message placed in the callee's inbox of e-mail server
180. Similarly, if caller phone 140 or caller computer 130 is not
functioning properly, or processing demands prevent such components
from sending handshaking tones, then caller phone 140 need not
respond to the handshaking tones transmitted by voicemail server
170.
[0028] FIG. 3 is a high-level block diagram of calling station 301
according to an embodiment of the present invention. In this
embodiment, calling station 301 comprises caller computer 305 and
caller phone 330. Caller computer 305 may comprise identification
mechanism 310 and speech recognition mechanism 320. Identification
mechanism 310 and speech recognition mechanism 320 may also be
implemented individually or together within caller computer 305 or
within another device.
[0029] A local voice profile corresponding to a given caller may be
stored at calling station 301. The local voice profile may
characterize various speech properties of the caller, such as, for
example, vocal tract characteristics of the caller, voice
characteristics of the caller, or pronunciation habits of the
caller. Such speech properties may be determined by a training
program, wherein a user speaks various sample words and phrases
such that applicable speech processing algorithms learn to more
accurately process the user's speech.
[0030] Identification mechanism 310 may determine the identity of a
caller before the caller attempts to leave a message with a callee.
Identification mechanism 310 may determine the identity of the
caller via speaker identification methods. For instance,
identification mechanism 310 may identify the caller based on her
voice, and her associated local voice profile may be loaded on
caller computer 305 for further processing. Speaker verification
methods may also be used to determine the identity of the caller.
Before dialing a callee's number, a caller may enter an
identification code on a keypad of caller phone 330; the code may
be processed to identify the caller. Smart-cards or biometric
detectors may also be employed to identify the caller.
[0031] Speech recognition mechanism 320 may receive as input
digital voice data, and may output textual content of the digital
voice data. If voice data is initially in analog form, then the
data may first be digitized such that speech recognition mechanism
320 may act upon such data. In other words, speech recognition
mechanism 320 may transcribe spoken information encapsulated within
digital voice data. Software to transcribe digital voice data to
text may be prepared or purchased from a software developer, such
as Dragon Systems, Inc., and incorporated into the present
invention. Speech recognition mechanism 320 may load a local voice
profile associated with the caller to more accurately process and
transcribe the content of the caller's message. Because speech
recognition may be processor-intensive, speech recognition
mechanism 320 may be located within a client computer, such as caller
computer 305, as shown in FIG. 3.
[0032] Multiple local voice profiles may be stored in, or
accessible to, caller computer 305 and caller phone 330. As such,
caller computer 305 may, via identification mechanism 310, select
from among the stored local voice profiles a particular local voice
profile associated with a given caller.
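Profile selection of this kind might be sketched as a lookup keyed by caller identity. The class names, the profile fields, and the use of a keypad code as the identification method are assumptions. A real identification mechanism could equally rely on speaker identification, smart-cards, or biometric detectors.

```python
from dataclasses import dataclass, field


@dataclass
class LocalVoiceProfile:
    caller_id: str
    # Hypothetical fields standing in for the speech properties of the caller.
    vocal_tract_params: dict = field(default_factory=dict)
    pronunciation_hints: dict = field(default_factory=dict)


class ProfileStore:
    def __init__(self):
        self._profiles = {}
        self._keypad_codes = {}

    def register(self, profile, keypad_code):
        """Store a profile and the identification code that maps to its caller."""
        self._profiles[profile.caller_id] = profile
        self._keypad_codes[keypad_code] = profile.caller_id

    def identify(self, keypad_code):
        """Resolve a code entered on the phone keypad to a caller id, or None."""
        return self._keypad_codes.get(keypad_code)

    def select(self, caller_id):
        """Return the stored profile for the caller, or None if there is none."""
        return self._profiles.get(caller_id)


store = ProfileStore()
store.register(LocalVoiceProfile("alice"), keypad_code="1234")
store.register(LocalVoiceProfile("bob"), keypad_code="5678")
selected = store.select(store.identify("5678"))
unknown = store.select(store.identify("0000"))
```

An unknown code yields no profile, in which case a speech recognition engine would presumably fall back to speaker-independent settings.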
[0033] FIG. 4 is a flowchart illustrating a method for providing
textual content along with voice messages according to an
embodiment of the present invention. In block B410, a caller places
a call to a callee. In block B420, the call and a timestamp for the
call are recorded as digital data. During this recording phase, the
call, which may be a message left for the callee, may also be
received and recorded by voicemail server 170. The call may be
transcribed in block B430 to generate textual content. In block
B440, the textual content and timestamp for the call may be sent to
voicemail server 170. In block B450, the textual content, voice
data, and time of the call are transferred to the callee's computer
as an e-mail message stored in e-mail server 180.
[0034] FIG. 5 is a flowchart showing a method for providing textual
content along with voice messages according to another embodiment
of the present invention. In block B510, a caller places a call to
a callee. In block B520, the method tests whether the callee is
absent from the callee's phone. If the callee is not absent, no
further processing within the method occurs. If the callee is
absent, then in block B530, handshaking between voicemail server
170 and calling station 301 may be performed. In block B540, a
local voice profile corresponding to the caller at calling station
301 may be retrieved. Voice data representing the caller's message
is recorded as digital data in block B550; recording may occur at
both calling station 301 and voicemail server 170. In block B560,
voice data may be transcribed via speech recognition mechanism 320,
which may load the local voice profile of the caller to produce a
more accurate transcription. In block B570, after the message has
been transcribed, the textual content of the message may be sent to
voicemail server 170. Thereafter, voicemail server 170 may send an
e-mail message to the callee containing voice data, attached as a
wave file, and textual content, as indicated in block B580.
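The control flow of FIG. 5 can be summarized in a short sketch. Recording, profile retrieval, and transcription are passed in as callables because the description leaves their implementation open. The function name and the stub behavior in the usage lines are assumptions, and the handshake of block B530 is noted but not modeled.

```python
from typing import Callable, Optional


def run_message_flow(
    callee_absent: bool,
    record: Callable[[], bytes],
    load_profile: Callable[[], dict],
    transcribe: Callable[[bytes, dict], str],
) -> Optional[dict]:
    """Follow blocks B510 to B580 and return what the server would mail."""
    # B510: the caller has placed the call. B520: test whether the callee is absent.
    if not callee_absent:
        return None  # no further processing within the method
    # B530: handshaking between server and calling station (not modeled here).
    profile = load_profile()                # B540: retrieve the local voice profile
    voice_data = record()                   # B550: record the message as digital data
    text = transcribe(voice_data, profile)  # B560: transcribe using the profile
    # B570 and B580: the text is sent to the server, which mails it with the voice data.
    return {"voice_data": voice_data, "text": text}


# Stub callables used only to exercise the flow.
result = run_message_flow(
    True,
    record=lambda: b"\x00\x01",
    load_profile=lambda: {"caller": "alice"},
    transcribe=lambda voice, profile: f"{len(voice)} bytes from {profile['caller']}",
)
answered = run_message_flow(False, lambda: b"", lambda: {}, lambda v, p: "")
```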
[0035] The foregoing description of the preferred embodiments is
provided to enable any person skilled in the art to make or use the
present invention. Various modifications to these embodiments are
possible, and the generic principles presented herein may be
applied to other embodiments as well. For example, a message may be
both recorded and transcribed at the calling station, and the
calling station may forward both the textual content and a wave
file to the voicemail server.
[0036] In addition, the invention may be implemented in part or in
whole as a hard-wired circuit, as a circuit configuration
fabricated into an application-specific integrated circuit, or as a
firmware program loaded into non-volatile storage or a software
program loaded from or into a data storage medium as
machine-readable code, such code being instructions executable by
an array of logic elements such as a microprocessor or other
digital signal processing unit. Furthermore, speech recognition and
transcription of voice data may be performed at a voicemail server,
e-mail server, or receiving station, or another dedicated
transcription server.
[0037] As such, the present invention is not intended to be limited
to the embodiments shown above but rather is to be accorded the
widest scope consistent with the principles and novel features
disclosed in any fashion herein.