System and Method for Selective Voicemail Transcription JACKSON; James ; et al. [NUANCE COMMUNICATIONS, INC.]

Patent Application Summary

U.S. patent application number 15/996669 was filed with the patent office on 2018-06-04 and published on 2018-10-04 for system and method for selective voicemail transcription. The applicant listed for this patent is NUANCE COMMUNICATIONS, INC. The invention is credited to Philip CUNETTO, James JACKSON, Mehrad YASREBI.

Publication Number	20180288227
Application Number	15/996669
Family ID	45556178
Publication Date	2018-10-04

United States Patent Application	20180288227
Kind Code	A1
JACKSON; James ; et al.	October 4, 2018

Abstract

Disclosed herein are systems, methods, and non-transitory computer-readable storage media for selectively transcribing messages. Five general approaches are disclosed herein. The first approach is directed to checking for a transcription-capable client, which transcribes messages when a client device is capable of receiving transcriptions. The second and third approaches are platform-controlled and user-controlled predefined selective transcription, respectively. One aspect of these approaches is driven by transcription rules. The fourth approach is user-controlled on-demand selective transcription before the message is stored or deposited for transcription. An example of this is a user transferring an incoming caller to voicemail and indicating that the voicemail be transcribed. The fifth approach is user-controlled on-demand selective transcription after the message is stored. In one embodiment of this approach, a user must specifically request that a stored message be transcribed.

Inventors:

JACKSON; James; (Austin, TX) ; CUNETTO; Philip; (Austin, TX) ; YASREBI; Mehrad; (Austin, TX)

Applicant:

Name	City	State	Country	Type
NUANCE COMMUNICATIONS, INC.	Burlington	MA	US

Family ID:

45556178

Appl. No.:

15/996669

Filed:

June 4, 2018

Related U.S. Patent Documents


Application Number	Filing Date	Patent Number	Continuation Application
14850082	Sep 10, 2015	9992344	15996669
14531572	Nov 3, 2014	9137375	14850082
12852190	Aug 6, 2010	8879695	14531572

Current U.S. Class:	1/1
Current CPC Class:	H04M 15/8011 20130101; H04M 3/53366 20130101; H04M 2203/459 20130101; H04M 3/5307 20130101; H04M 2201/40 20130101; H04M 2201/60 20130101; G10L 15/26 20130101; H04M 3/53333 20130101
International Class:	H04M 3/533 20060101 H04M003/533; H04M 15/00 20060101 H04M015/00; G10L 15/26 20060101 G10L015/26; H04M 3/53 20060101 H04M003/53

Claims

1. A method comprising: transferring a call to a subscriber to a voicemail system; determining that the call to the subscriber resulted in a voicemail; determining that the voicemail should be transcribed to text to yield a determination, the determination being based on at least one of a class of service associated with the subscriber or an input from the subscriber requesting transcription of the voicemail into text; based on the determination, transcribing the voicemail into text to yield a voicemail transcription; and presenting the voicemail transcription on a device associated with the subscriber.

2. The method of claim 1, further comprising: presenting, via a client device, a notification of the call to the subscriber.

3. The method of claim 1, further comprising: receiving, while presenting the notification, an input from the subscriber with regard to the incoming call, wherein the input is associated with instructions to: (1) transfer the incoming call to a voicemail system; and (2) when the incoming call results in a voicemail at the voicemail system, transcribe the voicemail into text; transferring, based on the input, the incoming call to the voicemail system to yield the voicemail; determining whether a current time is within a first time window associated with a first class of service or a second time window associated with a second class of service to yield a determined class of service according to the current time; generating, based on the determined class of service according to the current time, a transcription of the voicemail, the transcription comprising text generated from the voicemail; and presenting the transcription on a device of the subscriber.

2. The method of claim 1, wherein the transferring of the incoming call and the generating of the transcription are further based on a class of service associated with the subscriber.

3. The method of claim 2, wherein the class of service is one of a plurality of classes of service, each class of the plurality of classes of service having a distinct functionality and cost, wherein the cost requires a premium for a threshold time.

4. The method of claim 1, further comprising: identifying a current transcription state of the voicemail for the subscriber; storing the current transcription state in a subscriber directory; and notifying the subscriber when the current transcription state changes.

5. The method of claim 1, wherein the transferring of the incoming call and the generating of the transcription are performed by the client device.

6. The method of claim 1, further comprising, prior to the generating of the transcription, comparing the voicemail to a set of transcription rules to determine if the voicemail should remain untranscribed.

7. The method of claim 1, wherein the first time window and the second time window do not overlap and cover a consecutive twenty-four hours.

8. A system comprising: a processor; and a computer-readable storage device having instructions stored which, when executed by the processor, cause the processor to perform operations comprising: presenting, via a client device, a notification of an incoming call to a subscriber; receiving, while presenting the notification, an input from the subscriber with regard to the incoming call, wherein the input indicates to: (1) transfer the incoming call to a voicemail system; and (2) when the incoming call results in a voicemail at the voicemail system, transcribe the voicemail into text; transferring, based on the input, the incoming call to the voicemail system; determining whether a current time is within a first time window associated with a first class of service or a second time window associated with a second class of service to yield a determined class of service according to the current time; generating, based on the determined class of service according to the current time, a transcription of the voicemail, the transcription comprising text generated from the voicemail; and presenting the transcription on a device of the subscriber.

9. The system of claim 8, wherein the transferring of the incoming call and the generating of the transcription are further based on a class of service associated with the subscriber.

10. The system of claim 9, wherein the class of service is one of a plurality of classes of service, each class of the plurality of classes of service having a distinct functionality and cost, wherein the cost requires a premium for a threshold time.

11. The system of claim 8, the computer-readable storage device having additional instructions stored which, when executed by the processor, cause the processor to perform operations comprising: identifying a current transcription state of the voicemail for the subscriber; storing the current transcription state in a subscriber directory; and notifying the subscriber when the current transcription state changes.

12. The system of claim 8, wherein the transferring of the incoming call and the generating of the transcription are performed by the client device.

13. The system of claim 8, the computer-readable storage device having additional instructions stored which, when executed by the processor, cause the processor to perform operations comprising, prior to the generating of the transcription, comparing the voicemail to a set of transcription rules to determine if the voicemail should remain untranscribed.

14. The system of claim 8, wherein the transcription occurs via a hybrid transcription service comprising a first class of service for a time window and a second class of service for a remainder time window, and wherein the time window and the remainder time window do not overlap and cover a consecutive twenty-four hours.

15. A computer-readable storage device having instructions stored which, when executed by a computing device, result in the computing device performing operations comprising: presenting, via a client device, a notification of an incoming call to a subscriber; receiving, while presenting the notification, an input from the subscriber with regard to the incoming call, wherein the input is associated with instructions to: (1) transfer the incoming call to a voicemail system; and (2) when the incoming call results in a voicemail at the voicemail system, transcribe the voicemail into text; transferring, based on the input, the incoming call to the voicemail system to yield the voicemail; determining whether a current time is within a first time window associated with a first class of service or a second time window associated with a second class of service to yield a determined class of service according to the current time; generating, based on the determined class of service according to the current time, a transcription of the voicemail, the transcription comprising text generated from the voicemail; and presenting the transcription on a device of the subscriber.

16. The computer-readable storage device of claim 15, wherein the transferring of the incoming call and the generating of the transcription are further based on a class of service associated with the subscriber.

17. The computer-readable storage device of claim 16, wherein the class of service is one of a plurality of classes of service, each class of the plurality of classes of service having a distinct functionality and cost, wherein the cost requires a premium for a threshold time.

18. The computer-readable storage device of claim 15, having additional instructions stored which, when executed by the computing device, cause the computing device to perform operations comprising: identifying a current transcription state of the voicemail for the subscriber; storing the current transcription state in a subscriber directory; and notifying the subscriber when the current transcription state changes.

19. The computer-readable storage device of claim 15, wherein the transferring of the incoming call and the generating of the transcription are performed by the client device.

20. The computer-readable storage device of claim 15, having additional instructions stored which, when executed by the computing device, cause the computing device to perform operations comprising, prior to the generating of the transcription, comparing the voicemail to a set of transcription rules to determine if the voicemail should remain untranscribed.

Description

PRIORITY INFORMATION

[0001] The present application is a continuation of U.S. patent application Ser. No. 14/850,082, filed Sep. 10, 2015, which is a continuation of Ser. No. 14/531,572, filed Nov. 3, 2014, now U.S. Pat. No. 9,137,375, issued Sep. 15, 2015, which is a continuation of U.S. patent application Ser. No. 12/852,190, filed Aug. 6, 2010, now U.S. Pat. No. 8,879,695, issued Nov. 4, 2014, the contents of which are incorporated herein by reference in their entirety.

BACKGROUND

1. Technical Field

[0002] The present disclosure relates to message transcriptions and more specifically to selectively transcribing messages in a messaging platform.

2. Introduction

[0003] Transcribing voicemails or other messages from multimedia forms such as video, images, and audio to text is a resource-intensive process that can require significant amounts of processing time, memory, disk space, and so forth. Many subscribers either cannot view transcriptions at a given moment because of device limitations or do not wish to view the transcriptions at that time or for that particular message. Further, certain subscribers simply do not access transcriptions regularly and prefer to listen to or view the original message instead of the transcription. In such cases, the resources spent to transcribe messages are effectively wasted and could have been allocated to processing more urgent messages. This waste leads system designers to intentionally overdesign a transcription and messaging system and to spend more money constructing such a system than is actually necessary.

SUMMARY

[0004] Additional features and advantages of the disclosure will be set forth in the description which follows, and in part will be obvious from the description, or can be learned by practice of the herein disclosed principles. The features and advantages of the disclosure can be realized and obtained by means of the instruments and combinations particularly pointed out in the appended claims. These and other features of the disclosure will become more fully apparent from the following description and appended claims, or can be learned by the practice of the principles set forth herein.

[0005] Disclosed herein are systems, methods, and non-transitory computer-readable storage media for selectively transcribing messages. Five general approaches are disclosed herein. The first approach is directed to checking for a transcription-capable client, which transcribes messages when a client device is capable of receiving transcriptions and when an associated class of service indicates that transcriptions should be performed. The second approach is platform-controlled predefined selective transcription and the third approach is user-controlled predefined selective transcription. One aspect of these approaches is driven by transcription rules. The fourth approach is user-controlled on-demand selective transcription before the message is stored or deposited for transcription. An example of this is a user transferring an incoming caller to voicemail and indicating that the voicemail be transcribed. The fifth approach is user-controlled on-demand selective transcription after the message is stored. In one embodiment of this approach, a user must specifically request that a stored message be transcribed. These approaches can be used separately, in combination with each other, and/or with other transcription optimization techniques.

BRIEF DESCRIPTION OF THE DRAWINGS

[0006] In order to describe the manner in which the above-recited and other advantages and features of the disclosure can be obtained, a more particular description of the principles briefly described above will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. Understanding that these drawings depict only exemplary embodiments of the disclosure and are not therefore to be considered to be limiting of its scope, the principles herein are described and explained with additional specificity and detail through the use of the accompanying drawings in which:

[0007] FIG. 1 illustrates an example system embodiment;

[0008] FIG. 2 illustrates an example unified messaging (UM) server and UM client configuration;

[0009] FIG. 3 illustrates a first example method embodiment;

[0010] FIG. 4 illustrates a second example method embodiment; and

[0011] FIG. 5 illustrates a third example embodiment.

DETAILED DESCRIPTION

[0012] Various embodiments of the disclosure are discussed in detail below. While specific implementations are discussed, it should be understood that this is done for illustration purposes only. A person skilled in the relevant art will recognize that other components and configurations may be used without parting from the spirit and scope of the disclosure.

[0013] The present disclosure addresses the need in the art for prioritizing and selectively transcribing messages. Some introductory principles and concepts are discussed first, followed by a brief description, in FIG. 1, of a basic general-purpose system or computing device which can be employed to practice the concepts disclosed herein. A more detailed description of a unified messaging platform and the various methods then follows.

[0014] Transcribing a voicemail from audio to text is a resource-intensive process, requiring significant amounts of processor time, memory, storage, and so forth. This disclosure provides a framework for optimizing resource utilization, and thereby reducing costs, through selective transcription mechanisms that perform transcriptions only when necessary. The five major types of selective transcription disclosed herein are (1) a transcription-capable client check, (2) platform-controlled predefined selective transcription, (3) user-controlled predefined selective transcription, (4) user-controlled on-demand selective transcription (pre-deposit), and (5) user-controlled on-demand selective transcription (post-deposit). These five types of selective transcription are discussed as the various embodiments are set forth. The disclosure now turns to FIG. 1.

[0015] With reference to FIG. 1, an exemplary system 100 includes a general-purpose computing device 100, including a processing unit (CPU or processor) 120 and a system bus 110 that couples various system components including the system memory 130 such as read only memory (ROM) 140 and random access memory (RAM) 150 to the processor 120. The system 100 can include a cache (not shown) of high speed memory connected directly with, in close proximity to, or integrated as part of the processor 120. The system 100 copies data from the memory 130 and/or the storage device 160 to the cache for quick access by the processor 120. In this way, the cache provides a performance boost that avoids processor 120 delays while waiting for data. These and other modules can be configured to control the processor 120 to perform various actions. Other system memory 130 may be available for use as well. The memory 130 can include multiple different types of memory with different performance characteristics. It can be appreciated that the disclosure may operate on a computing device 100 with more than one processor 120 or on a group or cluster of computing devices networked together to provide greater processing capability. The processor 120 can include any general purpose processor and a hardware module or software module, such as module 1 162, module 2 164, and module 3 166 stored in storage device 160, configured to control the processor 120 as well as a special-purpose processor where software instructions are incorporated into the actual processor design. The processor 120 may essentially be a completely self-contained computing system, containing multiple cores or processors, a bus, memory controller, cache, etc. A multi-core processor may be symmetric or asymmetric.

[0016] The system bus 110 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. A basic input/output system (BIOS) stored in ROM 140 or the like may provide the basic routine that helps to transfer information between elements within the computing device 100, such as during start-up. The computing device 100 further includes storage devices 160 such as a hard disk drive, a magnetic disk drive, an optical disk drive, a tape drive, or the like. The storage device 160 can include software modules 162, 164, 166 for controlling the processor 120. Other hardware or software modules are contemplated. The storage device 160 is connected to the system bus 110 by a drive interface. The drives and the associated computer-readable storage media provide nonvolatile storage of computer-readable instructions, data structures, program modules, and other data for the computing device 100. In one aspect, a hardware module that performs a particular function includes the software component stored in a non-transitory computer-readable medium in connection with the necessary hardware components, such as the processor 120, bus 110, output device (e.g., display) 170, and so forth, to carry out the function. The basic components are known to those of skill in the art and appropriate variations are contemplated depending on the type of device, such as whether the device 100 is a small, handheld computing device, a desktop computer, or a computer server.

[0017] Although the exemplary embodiment described herein employs the hard disk 160, it should be appreciated by those skilled in the art that other types of computer readable media which can store data that are accessible by a computer, such as magnetic cassettes, flash memory cards, digital versatile disks, cartridges, random access memories (RAMs) 150, read only memory (ROM) 140, a cable or wireless signal containing a bit stream and the like, may also be used in the exemplary operating environment. Non-transitory computer-readable storage media expressly exclude media such as energy, carrier signals, electromagnetic waves, and signals per se.

[0018] To enable user interaction with the computing device 100, an input device 190 represents any number of input mechanisms, such as a microphone for speech, a touch-sensitive screen for gesture or graphical input, a keyboard, a mouse, motion input, and so forth. An output device 170 can also be one or more of a number of output mechanisms known to those of skill in the art. In some instances, multimodal systems enable a user to provide multiple types of input to communicate with the computing device 100. The communications interface 180 generally governs and manages the user input and system output. There is no restriction on operating on any particular hardware arrangement and therefore the basic features here may easily be substituted for improved hardware or firmware arrangements as they are developed.

[0019] For clarity of explanation, the illustrative system embodiment is presented as including individual functional blocks including functional blocks labeled as a "processor" or processor 120. The functions these blocks represent may be provided through the use of either shared or dedicated hardware, including, but not limited to, hardware capable of executing software and hardware, such as a processor 120, that is purpose-built to operate as an equivalent to software executing on a general purpose processor. For example the functions of one or more processors presented in FIG. 1 may be provided by a single shared processor or multiple processors. (Use of the term "processor" should not be construed to refer exclusively to hardware capable of executing software.) Illustrative embodiments may include microprocessor and/or digital signal processor (DSP) hardware, read-only memory (ROM) 140 for storing software performing the operations discussed below, and random access memory (RAM) 150 for storing results. Very large scale integration (VLSI) hardware embodiments, as well as custom VLSI circuitry in combination with a general purpose DSP circuit, may also be provided.

[0020] The logical operations of the various embodiments are implemented as: (1) a sequence of computer-implemented steps, operations, or procedures running on a programmable circuit within a general use computer, (2) a sequence of computer implemented steps, operations, or procedures running on a specific-use programmable circuit; and/or (3) interconnected machine modules or program engines within the programmable circuits. The system 100 shown in FIG. 1 can practice all or part of the recited methods, can be a part of the recited systems, and/or can operate according to instructions in the recited non-transitory computer-readable storage media. Such logical operations can be implemented as modules configured to control the processor 120 to perform particular functions according to the programming of the module. For example, FIG. 1 illustrates three modules Mod1 162, Mod2 164 and Mod3 166 which can be modules configured to control the processor 120. These modules may be stored on the storage device 160 and loaded into RAM 150 or memory 130 at runtime or may be stored as would be known in the art in other computer-readable memory locations.

[0021] Having disclosed some basic system components, the disclosure now turns to the exemplary unified messaging configuration shown in FIG. 2. For the sake of clarity, the methods that follow are discussed in terms of an exemplary system such as is shown in FIG. 1 configured to practice them.

[0022] The disclosure now turns to FIG. 2, which illustrates an example overview 200 of a unified messaging (UM) server 202 and UM client configuration with a diversity of clients, such as a limited display device 210A, a smartphone 210B, a telephone with no display 210C, and a personal computer 210D. The UM server 202 and/or UM clients 210A, 210B, 210C, 210D can include all or part of the elements of the exemplary system 100 shown in FIG. 1. The UM server 202 receives messages from multiple message sources 204a, 204b, 204c via a communication network 206, such as the public switched telephone network or the Internet. The message sources can provide messages such as voicemails, video messages, faxes, images, multimedia messages, and/or hyperlinks.

[0023] When the UM server 202 receives messages, the UM server 202 can identify a recipient (also called user or subscriber herein) of the message and retrieve a subscriber profile from a UM directory 214, and can store the message in the subscriber's mailbox (not shown). The subscriber profile can provide information about a class of service for the subscriber. For example, one subscriber can pay a premium fee for real-time transcription service, another subscriber can pay a lower fee for a first non-real-time transcription service that indicates a preference for a short transcription time, but the short time is not guaranteed, and a third subscriber can use a second non-real-time transcription service for free that has no preference for a transcription delay. The UM server 202 can send non-text contents of messages (e.g., voice messages) to the transcription server(s) 208 to be transcribed (converted to text messages). Content to be transcribed is referred to as raw-media content herein for conciseness. In at least one embodiment, the UM server 202 transmits raw content to the transcription server(s) 208 after receiving a complete message, which contains one or more raw content(s). In another embodiment, the UM server 202 transmits raw content to the transcription server(s) 208 even if the UM server 202 has not received the entire message. While waiting to be transcribed, non-real-time raw contents can be deposited in a queue internal to the UM server 202, a queue internal to the transcription server(s) 208, and/or a queue external to both the UM server 202 and the transcription server(s) 208. In one case, multiple non-real-time queues can distinguish between different classes of non-real-time transcriptions.
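A minimal Python sketch of the class-of-service routing described above, assuming three hypothetical class-of-service labels that mirror the three example subscribers and one in-memory queue per non-real-time class. The names `ClassOfService`, `RawContent`, and `TranscriptionRouter` are illustrative only; as the text notes, the queues could equally live in the UM server 202, in the transcription server(s) 208, or in an external component.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum


class ClassOfService(Enum):
    # Hypothetical labels mirroring the three example subscribers above.
    REAL_TIME = "real_time"                # premium fee, transcribed immediately
    NRT_PREFER_SHORT = "nrt_prefer_short"  # lower fee, short delay preferred but not guaranteed
    NRT_NO_PREFERENCE = "nrt_no_pref"      # free, no preference for transcription delay


@dataclass
class RawContent:
    message_id: str
    subscriber: str


class TranscriptionRouter:
    """Dispatches real-time raw-media content at once and queues the rest per class."""

    def __init__(self) -> None:
        # Stand-in for content handed straight to the transcription server(s) 208.
        self.sent_now: list[str] = []
        # One queue per non-real-time class, as in the multiple-queue case above.
        self.queues = {
            ClassOfService.NRT_PREFER_SHORT: deque(),
            ClassOfService.NRT_NO_PREFERENCE: deque(),
        }

    def submit(self, content: RawContent, cos: ClassOfService) -> None:
        if cos is ClassOfService.REAL_TIME:
            self.sent_now.append(content.message_id)
        else:
            self.queues[cos].append(content)

    def next_non_real_time(self):
        """Return the next queued item, serving the 'prefers short' class first."""
        for cos in (ClassOfService.NRT_PREFER_SHORT, ClassOfService.NRT_NO_PREFERENCE):
            if self.queues[cos]:
                return self.queues[cos].popleft()
        return None
```

Serving the "prefers a short time" queue first is one simple way to honor a preference that is not a guarantee; a weighted or aging policy would avoid starving the free class under sustained load.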

[0024] The UM directory 214 can store additional classes of service beyond the exemplary classes of service discussed herein. In one aspect, a hybrid class of service provides a different class of service based on time, location, subscription, date, and other user parameters. For example, a hybrid class of service for an accountant may indicate a real-time class of service on weekdays which are not federal holidays between 8:00 a.m. and 6:30 p.m. and a no preference class of service all other times. In another example, a salesman can indicate that all incoming messages from phone numbers or emails originating from a group of client companies are associated with a real-time transcription class of service and all other messages are associated with a class of service which prefers but does not require a short transcription time. Other variations and classes of service can be applied.
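The accountant and salesman examples can be expressed as small rule functions. This sketch assumes string class-of-service labels, a hypothetical hard-coded holiday set, and senders identified either by a phone number or by an email address; none of these names come from the disclosure.

```python
from datetime import date, datetime, time

REAL_TIME = "real_time"
PREFER_SHORT = "nrt_prefer_short"
NO_PREFERENCE = "nrt_no_preference"

# Hypothetical holiday set; a deployment would load the actual federal holiday calendar.
FEDERAL_HOLIDAYS = {date(2018, 7, 4), date(2018, 12, 25)}


def accountant_cos(now: datetime) -> str:
    """Real-time on non-holiday weekdays from 8:00 a.m. to 6:30 p.m., no preference otherwise."""
    is_workday = now.weekday() < 5 and now.date() not in FEDERAL_HOLIDAYS
    in_hours = time(8, 0) <= now.time() < time(18, 30)
    return REAL_TIME if is_workday and in_hours else NO_PREFERENCE


def salesman_cos(sender: str, client_domains: set[str], client_numbers: set[str]) -> str:
    """Real-time for messages from client companies, 'prefers short' for everything else."""
    # Email senders are matched by domain; anything without '@' is treated as a phone number.
    domain = sender.rsplit("@", 1)[-1].lower() if "@" in sender else None
    if sender in client_numbers or (domain is not None and domain in client_domains):
        return REAL_TIME
    return PREFER_SHORT
```

Because the business-hours window and its complement come from a single boolean test, they cannot overlap and together cover the full twenty-four hours, which is the property recited for the first and second time windows in the claims.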

[0025] In one aspect, the UM directory 214 or another component (not shown) associated with the UM server 202 also provides the UM server 202 with information related to the probability of messages being accessed in the near term. If the user receives and accesses a new message notification while the message transcription is pending, the UM server 202 can increase the probability that the message will be accessed in the near term. If the user receives the new message notification indicating that he/she has received a new message in his/her mailbox on the UM server 202, but does not access the message, the UM server 202 can lower the probability or leave it unchanged. The probability of near-term access can be based on historical statistics for subscriber message/transcription access times, such as the average time between new message notification and transcription access. The average can be computed per user, giving a very granular estimate for a particular user, or across similar customers. For example, the average time between new message notification and message access can be calculated for males from ages 18-25 in Florida, for Asian females in the Rocky Mountains, or for college students nationwide.
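One way to turn the historical statistics into a probability is sketched below. It assumes, purely for illustration, that the delay between notification and access is roughly exponentially distributed with the observed mean, and it uses hypothetical adjustment factors for an opened or ignored notification and a hypothetical `MIN_PERSONAL_SAMPLES` threshold for trusting per-user history; the disclosure specifies none of these.

```python
import math
from statistics import mean
from typing import Optional

MIN_PERSONAL_SAMPLES = 5  # hypothetical threshold before per-user history is trusted


def mean_access_delay(user_delays: list[float], cohort_delays: list[float]) -> float:
    """Average seconds from new-message notification to access.

    Uses the subscriber's own history when there is enough of it, otherwise
    falls back to the average for similar customers (the cohort).
    """
    if len(user_delays) >= MIN_PERSONAL_SAMPLES:
        return mean(user_delays)
    return mean(cohort_delays)


def near_term_access_probability(mean_delay_s: float, horizon_s: float,
                                 notification: Optional[str] = None) -> float:
    """Probability of access within horizon_s seconds (exponential-delay assumption)."""
    if mean_delay_s <= 0:
        raise ValueError("mean delay must be positive")
    p = 1.0 - math.exp(-horizon_s / mean_delay_s)
    if notification == "accessed":
        # Notification opened while transcription is pending: move p halfway to 1.
        p = p + (1.0 - p) * 0.5
    elif notification == "ignored":
        # Notification delivered but not acted on: lower p (leaving it unchanged is also allowed).
        p = p * 0.8
    return p
```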

[0026] The probability of near-term access can further be based on subscriber presence information. Presence information can convey a user's availability and capacity to communicate. For example, presence can indicate whether a user is available or not, whether a user can accept a video feed or not, the user's physical location, which specific communication devices the user has available, and so forth. Presence can also indicate a user's willingness to accept communications. For example, a user presence can indicate "do not disturb", "in a meeting", or "available". Presence information can be automatically generated (e.g., based on communications with other components, some of which are not shown) or manually set by the user. In one configuration, the UM directory 214 receives subscriber presence information from UM clients 210A, 210B, 210C, 210D and/or components that directly and/or indirectly communicate with such clients, and bases the probability of messages being accessed in the near term on that presence information. Presence information can be gleaned from one source or from multiple sources, such as web browser logins, smartphone applications, GPS signals, calendar events, and so forth. Furthermore, presence information can also be determined from the activities and/or login status of subscribers using the example devices for UM clients 210A, 210B, 210C, and 210D.
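A sketch of merging presence reports from several sources into one adjustment of the near-term access probability. It assumes a hypothetical `PresenceReport` record, a policy in which a manually set status overrides automatically generated ones and otherwise the most recent report wins, and made-up `PRESENCE_FACTOR` weights; the disclosure names the sources and states but not a merge rule.

```python
from dataclasses import dataclass


@dataclass
class PresenceReport:
    source: str          # e.g. "smartphone_app", "web_login", "calendar", "gps"
    state: str           # "available", "in_a_meeting", "do_not_disturb", "offline"
    timestamp: float     # seconds since epoch when the source reported
    manual: bool = False  # True when the user set the status explicitly


# Hypothetical weights: how each presence state scales the near-term access probability.
PRESENCE_FACTOR = {
    "available": 1.2,
    "in_a_meeting": 0.6,
    "do_not_disturb": 0.3,
    "offline": 0.2,
}


def effective_presence(reports: list[PresenceReport]) -> str:
    """Manual status beats automatic status; among the remaining pool, the newest wins."""
    if not reports:
        return "offline"
    manual = [r for r in reports if r.manual]
    pool = manual or reports
    return max(pool, key=lambda r: r.timestamp).state


def adjust_for_presence(p: float, reports: list[PresenceReport]) -> float:
    """Scale a probability by the presence factor, capped at 1.0."""
    factor = PRESENCE_FACTOR.get(effective_presence(reports), 1.0)
    return min(1.0, p * factor)
```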

[0027] Other potentially relevant factors to the probability of near-term access can include message parameters, such as indicators of message urgency, and message meta-data, such as a message source or message title (where available). The UM server 202 can also dedicate more resources to subscribers that have historically received higher confidence transcriptions from the transcription server(s) 208 for their raw messages.
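The factors above can be folded into an ordering of pending non-real-time transcriptions. This sketch uses Python's `heapq` min-heap with negated scores; the additive scoring formula, the `PendingQueue` class, and the weights `URGENT_BONUS` and `CONFIDENCE_WEIGHT` are assumptions for illustration, since the text lists the factors but not how to combine them.

```python
import heapq
import itertools

# Hypothetical weights; the disclosure does not specify how factors are combined.
URGENT_BONUS = 0.5
CONFIDENCE_WEIGHT = 0.3


class PendingQueue:
    """Pending transcriptions ordered by a combined priority score, highest first."""

    def __init__(self) -> None:
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps equal scores in insertion order

    @staticmethod
    def score(access_prob: float, urgent: bool, historical_confidence: float) -> float:
        return access_prob + (URGENT_BONUS if urgent else 0.0) + CONFIDENCE_WEIGHT * historical_confidence

    def push(self, message_id: str, access_prob: float,
             urgent: bool = False, historical_confidence: float = 0.0) -> None:
        s = self.score(access_prob, urgent, historical_confidence)
        # heapq is a min-heap, so the score is negated to pop the highest score first.
        heapq.heappush(self._heap, (-s, next(self._counter), message_id))

    def pop(self):
        return heapq.heappop(self._heap)[2] if self._heap else None
```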

[0028] The UM server 202 communicates, via a finite number of communication channels 212, with one or more transcription servers 208, which transcribe all or part of each message received from the message sources. The finite number of communication channels can be divided into multiple groups (not shown). For example, a first group of communication channels associated with a first group of transcription servers can handle real-time transcriptions and a second group of communication channels associated with a second group of transcription servers can handle non-real-time transcriptions. The transcription server 208 can transcribe messages using speech to text, OCR, pattern recognition, and/or any other suitable mechanism(s) to extract text from non-textually formatted messages or raw content. The transcription server 208 can also perform translation services to translate extracted text from one language to another, if needed. The UM server 202 can then offer an original-language transcription and a translated transcription to the UM client. The UM server 202 identifies a particular UM client 210A, 210B, 210C, 210D for each message and transmits information to the respective UM client regarding the message, including a transcription status. In the case of a voicemail, the UM server 202 can transmit information indicating a sender of the voicemail, a duration of the voicemail, a callback number, a time of the voicemail, a "headline" of the voicemail transcription, and so forth.
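A minimal model of the finite channels 212 split into a real-time group and a non-real-time group. The `ChannelGroups` class, the group names, and the counter-based bookkeeping are illustrative assumptions; a production system would more likely use per-group connection pools or semaphores.

```python
class ChannelGroups:
    """Tracks a finite set of channels split into real-time and non-real-time groups."""

    def __init__(self, real_time: int, non_real_time: int) -> None:
        self.capacity = {"real_time": real_time, "non_real_time": non_real_time}
        self.free = dict(self.capacity)

    def acquire(self, group: str) -> bool:
        """Take a free channel from the group; False means the content must stay queued."""
        if self.free[group] > 0:
            self.free[group] -= 1
            return True
        return False

    def release(self, group: str) -> None:
        """Return a channel to its group once a transcription finishes."""
        if self.free[group] >= self.capacity[group]:
            raise ValueError(f"no busy channel to release in group {group!r}")
        self.free[group] += 1
```

Keeping the groups separate means a burst of non-real-time work can never consume the channels reserved for real-time subscribers.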

[0029] Some example UM clients include smartphones, PDAs, cellular phones, web browsers, mobile phone applications, personal computers, intermediate UM servers, IPTV set-top boxes, and so forth. Additional types of client devices, which are not shown in FIG. 2, can be used as well. When a UM client establishes a session with the UM server 202, the UM server 202 can return a listing of messages and, optionally, the transcription progress for messages in the listing. If the UM server 202 receives progress updates from the transcription server(s) 208, such as a revised expected completion time or a completed transcription, the UM server 202 can transmit updated notifications to the appropriate UM client device.
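The session behavior described above (return a listing with transcription progress, then push updates as the transcription server reports progress) might be sketched as follows. The `MessageInfo` fields, the `SessionManager` class, and the callback-based notification are assumptions made for illustration, not an interface from the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class MessageInfo:
    message_id: str
    sender: str
    status: str = "pending"                    # "pending" or "complete"
    expected_completion: Optional[str] = None  # revised estimate, if any
    transcription: Optional[str] = None


class SessionManager:
    def __init__(self) -> None:
        self.mailboxes: Dict[str, Dict[str, MessageInfo]] = {}
        self.sessions: Dict[str, List[Callable[[MessageInfo], None]]] = {}

    def deposit(self, subscriber: str, info: MessageInfo) -> None:
        self.mailboxes.setdefault(subscriber, {})[info.message_id] = info

    def open_session(self, subscriber: str,
                     notify: Callable[[MessageInfo], None]) -> List[MessageInfo]:
        """Register the client's notification callback and return the listing."""
        self.sessions.setdefault(subscriber, []).append(notify)
        return list(self.mailboxes.get(subscriber, {}).values())

    def on_progress(self, subscriber: str, message_id: str,
                    expected_completion: Optional[str] = None,
                    transcription: Optional[str] = None) -> None:
        """Apply a progress update from the transcription server and notify clients."""
        info = self.mailboxes[subscriber][message_id]
        if transcription is not None:
            info.status = "complete"
            info.transcription = transcription
        elif expected_completion is not None:
            info.expected_completion = expected_completion
        for notify in self.sessions.get(subscriber, []):
            notify(info)
```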

[0030] Having disclosed some basic system components and an exemplary unified messaging server and client configuration, the disclosure now turns to a discussion of five types of selective message transcription. The first type of selective message transcription is checking for a transcription-capable UM client. Before transcription is enabled for a subscriber, the UM server 202 retrieves a Class-of-Service (CoS) for the subscriber, such as from a UM directory 214, to ensure that transcription is allowed. If the subscriber's CoS allows transcription, the UM server 202 proceeds to determine, where possible, whether the subscriber is currently accessing the UM server 202 via a transcription-capable device, such as a device capable of displaying text. Some examples of such devices include a smartphone 210b and a personal computer 210d. Certain devices, such as a plain telephone 210a, have no display capabilities and are thus not transcription-capable. Some devices have a limited ability to display text, such as a desktop phone 210c with a display capable of showing only a single, short line of characters. Depending on its display capabilities, such a device may or may not qualify as transcription-capable. The UM server 202 can track user logins from specific clients, client types, client versions, client identifications, and so forth. In one embodiment, the UM server 202 tracks, on behalf of each subscriber, the source of the last "getMessageTranscription" application programming interface (API) call from UM client applications. For example, the API call may include a fingerprint of the requesting client device that identifies the device type. The UM server 202 can then look up in a table whether that device type is transcription-capable. Alternatively, the API call can include a flag indicating whether the device is transcription-capable.
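The capability check above can be sketched as a CoS gate followed by either an explicit flag or a table lookup keyed by device type. The fingerprint format (`"device_type/version/..."`), the table entries, and the function names below are hypothetical; the choice to let an explicit flag take precedence over the table is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical table of device types; a single-line desk phone is treated as
# not transcription-capable here, although the text allows either choice.
DEVICE_TYPE_CAPABLE = {
    "plain_telephone": False,    # no display (e.g. telephone 210a)
    "desk_phone_1line": False,   # one short line of text (e.g. phone 210c)
    "smartphone": True,          # e.g. smartphone 210b
    "personal_computer": True,   # e.g. personal computer 210d
}


@dataclass
class ApiCall:
    fingerprint: str                     # assumed format "device_type/version/..."
    capable_flag: Optional[bool] = None  # optional explicit flag from the client


def device_type_from_fingerprint(fingerprint: str) -> str:
    return fingerprint.split("/", 1)[0]


def is_transcription_capable(call: ApiCall) -> bool:
    # An explicit flag, when present, wins over the table lookup (assumption).
    if call.capable_flag is not None:
        return call.capable_flag
    device_type = device_type_from_fingerprint(call.fingerprint)
    return DEVICE_TYPE_CAPABLE.get(device_type, False)  # unknown types: not capable


def should_enable_transcription(cos_allows: bool,
                                last_call: Optional[ApiCall]) -> bool:
    """CoS check first, then the source of the last getMessageTranscription call."""
    if not cos_allows or last_call is None:
        return False
    return is_transcription_capable(last_call)
```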

[0031] In one variation, if the last "getMessageTranscription" API call occurred within the last N days, then the UM server 202 enables transcription. N can be a CoS configurable attribute, allowing different values to be used for different subscribers. The current state of transcription may be stored in an attribute in the UM Directory 214. Whenever the "transcription capable" state changes for a particular user, the UM server 202 can update such information in the UM server 202 and/or the UM directory 214 for that user.
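The "last N days" variation amounts to comparing the age of the last "getMessageTranscription" call against a CoS-specific window. In this minimal sketch the CoS names, their N values, and the default window are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical per-CoS values of N, in days.
COS_WINDOW_DAYS = {"basic": 3, "premium": 30}


def transcription_enabled(last_get_transcription_call: Optional[datetime],
                          cos_name: str,
                          now: datetime,
                          default_days: int = 7) -> bool:
    """Enable transcription if the last call falls within the CoS window of N days."""
    if last_get_transcription_call is None:
        return False  # the subscriber has never used a transcription-capable client
    n_days = COS_WINDOW_DAYS.get(cos_name, default_days)
    return now - last_get_transcription_call <= timedelta(days=n_days)
```

The result can then be compared with the state stored in the UM directory 214 so that the directory is written only when the "transcription capable" state actually changes.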

[0032] The second type of selective transcription is platform-controlled pre-defined selective transcription. In this type, the UM server 202 is configured to act based on pre-defined transcription exceptions. The transcription exceptions can be applied globally or can be associated with a particular subscriber and/or CoS. The UM directory 214 can store these exceptions. Exceptions can take the form of a ruleset that determines when transcriptions that would otherwise be performed should be skipped. A ruleset can include one or more rules that skip transcriptions entirely or change their priorities, such as "skip transcription for any messages greater than 1 min in length" or "skip transcription if subscriber currently has more than 4 transcriptions pending". Rulesets can also include positive rules specifying which types of messages should always be transcribed.
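A ruleset of this kind can be modeled as a list of rules, each with a predicate and an action (skip, always transcribe, or change priority). The sketch below encodes the two example exceptions from the paragraph plus a hypothetical priority rule. Letting positive "always" rules win over skip exceptions is an assumption, as are the `Rule`, `Message`, `Context`, and `decide` names.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Message:
    duration_s: float
    urgent: bool = False


@dataclass
class Context:
    pending_transcriptions: int  # a factor external to the message itself


@dataclass
class Rule:
    name: str
    applies: Callable[[Message, Context], bool]
    action: str                    # "skip", "always", or "priority"
    priority: Optional[int] = None


GLOBAL_RULES: List[Rule] = [
    Rule("always transcribe urgent", lambda m, c: m.urgent, "always"),
    Rule("skip messages > 1 min", lambda m, c: m.duration_s > 60, "skip"),
    Rule("skip if > 4 pending", lambda m, c: c.pending_transcriptions > 4, "skip"),
    Rule("lower priority > 30 s", lambda m, c: m.duration_s > 30, "priority", 9),
]


def decide(msg: Message, ctx: Context, rules: List[Rule],
           default_priority: int = 5) -> Optional[int]:
    """Return None to skip transcription, else a priority (lower runs sooner)."""
    # Positive rules are checked first so they override skip exceptions (assumption).
    for rule in rules:
        if rule.action == "always" and rule.applies(msg, ctx):
            return default_priority
    priority = default_priority
    for rule in rules:
        if not rule.applies(msg, ctx):
            continue
        if rule.action == "skip":
            return None
        if rule.action == "priority" and rule.priority is not None:
            priority = rule.priority
    return priority
```

Global, per-CoS, and per-subscriber exceptions could be combined simply by concatenating the lists, for example `GLOBAL_RULES + cos_rules`, before calling `decide`.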

[0033] The third type of selective transcription is user-controlled pre-defined selective transcription. In one variation of this approach, the UM directory 214 and/or the UM server 202 provide a subscriber interface that allows subscribers to adjust the details of each rule and the order in which the rules are applied to messages for that subscriber. Subscriber settings in the UM platform 200 are updated to reflect the new pre-defined transcription options. Some exemplary options and rules include "skip transcription if the sender of the message is not an approved contact", "assign a low priority to transcriptions for messages left between midnight and 7:30 a.m.", "transcribe messages from unknown callers", "transcribe messages from callers present in my address book", "transcribe messages from callers not present in my address book", "transcribe messages marked urgent", "transcribe messages with a read receipt request", and "transcribe messages from callers in the Legal department". User-controlled rules can be conditional, for example on a client device state, a user location, a current user activity, calendar events, and so forth. Conditional rules can depend on multiple user-dependent or user-independent factors. User-controlled rules can be applied in addition to CoS rules, or they can be overridden by conflicting CoS rules and/or exceptions.
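Because subscribers can reorder their rules, a first-match-wins evaluation over an ordered list is a natural fit. In the sketch below, CoS rules are evaluated before the user's rules so that a conflicting CoS rule overrides the user's choice, which is one of the two options the paragraph mentions. The `UserRule` type, the outcome strings, and the sample address book are hypothetical.

```python
from dataclasses import dataclass
from datetime import time
from typing import Callable, List


@dataclass
class Msg:
    sender: str
    left_at: time
    urgent: bool = False


@dataclass
class UserRule:
    name: str
    matches: Callable[[Msg], bool]
    outcome: str  # "transcribe", "skip", or "low_priority"


def evaluate(msg: Msg, cos_rules: List[UserRule], user_rules: List[UserRule],
             default: str = "transcribe") -> str:
    # CoS rules come first, so a conflicting CoS rule overrides user rules.
    # Within each list the first matching rule wins, so order matters.
    for rule in list(cos_rules) + list(user_rules):
        if rule.matches(msg):
            return rule.outcome
    return default


address_book = {"alice", "bob"}  # hypothetical subscriber contacts

user_rules = [
    UserRule("transcribe messages marked urgent", lambda m: m.urgent, "transcribe"),
    UserRule("low priority overnight",
             lambda m: time(0, 0) <= m.left_at < time(7, 30), "low_priority"),
    UserRule("skip senders not in address book",
             lambda m: m.sender not in address_book, "skip"),
]
```

Moving the "skip senders not in address book" rule to the front of the list would cause an urgent message from an unknown caller to be skipped rather than transcribed, which is why the rule order is exposed to the subscriber.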

[0034] The fourth type of selective transcription is pre-deposit, user-controlled, on-demand selective transcription. This type supports scenarios in which the subscriber is presented with an enhanced call handling interface, such as an enhanced graphical interface on a smartphone, IPTV, or softphone. Upon receiving a new call, the subscriber can choose an option to "forward to voicemail and provide a transcription". To support this, the UM platform 200 is enhanced to receive and process a new parameter in call signaling that indicates the specific feature being requested. Alternatively, the UM server 202 can receive this indication from client devices 210a, 210b, 210c, 210d via a separate data channel, such as a web services API channel. For example, when a call is forwarded to voicemail, the system can associate a redirecting reason code with the redirecting number. In Session Initiation Protocol (SIP), this can be the reason code associated with a SIP diversion header, a cause code associated with a voicemail URI, and so forth.
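As an illustration of carrying the request in call signaling, the sketch below parses the parameters of a SIP Diversion header value (for example `<sip:+15551234567@example.com>;reason=user-busy;counter=1`) and of a voicemail URI carrying a `cause` parameter. The reason value `transcribe-requested` is hypothetical; it is not one of the standard Diversion reason values, and a real deployment would need to agree on its own value. The parser is deliberately simplified: it handles a single header value and does not deal with commas, escaping, or quoted semicolons.

```python
from typing import Dict, Tuple

# Hypothetical reason value signalling "forward to voicemail and transcribe".
TRANSCRIBE_REASON = "transcribe-requested"


def parse_header(value: str) -> Tuple[str, Dict[str, str]]:
    """Split a single header value or URI into (uri, {param: value})."""
    value = value.strip()
    if "<" in value:
        # name-addr form, e.g. "Bob" <sip:bob@example.com>;reason=...
        start = value.index("<")
        end = value.index(">", start)
        uri, rest = value[start + 1:end], value[end + 1:]
    else:
        # bare URI form, e.g. sip:voicemail@example.com;cause=486
        uri, _, rest = value.partition(";")
    params: Dict[str, str] = {}
    for part in rest.split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, val = part.partition("=")
        params[key.strip().lower()] = val.strip().strip('"')
    return uri, params


def transcription_requested(diversion_value: str) -> bool:
    """True if the Diversion reason carries the (hypothetical) transcription request."""
    _, params = parse_header(diversion_value)
    return params.get("reason", "").lower() == TRANSCRIBE_REASON
```

The `cause` parameter of a voicemail URI can be extracted with the same `parse_header` call; how a particular cause value maps to "transcribe" would again be a deployment-specific convention.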

[0035] The fifth type of selective transcription is post-deposit, user-controlled, on-demand selective transcription. When this type is enabled in a subscriber's CoS, no messages are automatically transcribed. Instead, the subscriber must specifically request transcription of a message. This approach can rely on additional functionality exposed through a new API call to the UM server 202, such as a "TranscribeMessages([arrays of message-numbers])" API call, with which a client device can request that the UM server 202 initiate transcription for a particular message or a group of messages. This can require modifying systems that are designed to transcribe either all messages or no messages.

[0036] This approach is not limited to subscribers who have on-demand transcription service. The TranscribeMessages API call can include implied and/or explicitly requested limits on the maximum number of simultaneously pending transcriptions for a given subscriber, to ensure that a client does not request transcription of a large number of previously untranscribed messages in a short interval.
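A server-side handler for a TranscribeMessages-style request, with a cap on simultaneously pending transcriptions, might look like the following sketch. The class and method names, the return shape `(accepted, rejected)`, and the rule that an explicitly requested limit can only tighten (never raise) the implied limit are assumptions.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple


class OnDemandTranscriber:
    """Hypothetical handler for a TranscribeMessages([...]) request."""

    def __init__(self, max_pending: int) -> None:
        self.max_pending = max_pending               # implied per-subscriber limit
        self.pending: Dict[str, Set[int]] = {}       # subscriber -> queued message numbers
        self.transcribed: Dict[str, Set[int]] = {}   # subscriber -> finished message numbers

    def transcribe_messages(self, subscriber: str, message_numbers: Iterable[int],
                            requested_limit: Optional[int] = None
                            ) -> Tuple[List[int], List[int]]:
        """Queue messages for transcription; return (accepted, rejected)."""
        # An explicit limit may only tighten the implied one (assumption).
        limit = self.max_pending if requested_limit is None else min(self.max_pending, requested_limit)
        pending = self.pending.setdefault(subscriber, set())
        done = self.transcribed.setdefault(subscriber, set())
        accepted: List[int] = []
        rejected: List[int] = []
        for number in message_numbers:
            if number in pending or number in done:
                continue  # already queued or transcribed: nothing to do
            if len(pending) >= limit:
                rejected.append(number)  # over the cap; the client can retry later
            else:
                pending.add(number)
                accepted.append(number)
        return accepted, rejected

    def complete(self, subscriber: str, number: int) -> None:
        """Called when the transcription server finishes a message."""
        self.pending.setdefault(subscriber, set()).discard(number)
        self.transcribed.setdefault(subscriber, set()).add(number)
```

For example, with `max_pending=3`, a request for messages 1 through 5 accepts the first three and rejects 4 and 5; once message 1 completes, a later request for message 4 is accepted.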

[0037] The approaches set forth herein can reduce the hardware cost for a unified messaging platform 200 and can improve scalability of the platform by ensuring that transcriptions are only performed when subscribers are actively using transcription-capable clients. The unified messaging platform 200 can allow pre-defined and on-demand voicemail transcription and reduce the possibility of delays in voicemail transcription to improve the customer experience.

[0038] The disclosure now turns to the exemplary method embodiments of these types of selective transcription. FIG. 3 illustrates a first example method embodiment for selectively transcribing messages. The method is discussed in terms of a system configured to practice the method, such as system 100 shown in FIG. 1. The system 100 receives, at a messaging server, such as the UM server 202 of FIG. 2, a message addressed to a subscriber (302) and retrieves a class of service associated with the subscriber (304). Then, if the class of service indicates that transcription is to be performed, the system 100 determines whether the subscriber is accessing the messaging server via a transcription capable client (306). The system 100 transcribes the message if the subscriber is accessing the messaging server via the transcription capable client (308). A transcription server separate from or incorporated into the system 100 can transcribe the message. In one variation, the system 100 transcribes the message if the messaging server has received a request for a transcription from the subscriber within a threshold time. The threshold time can be associated with the class of service. The system 100 can optionally identify a current transcription state for the subscriber, store the current transcription state in a subscriber directory, and notify the subscriber when the current transcription state changes.
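The steps of FIG. 3 can be traced in a short sketch that uses the threshold-time variation to decide whether the subscriber is on a transcription-capable client, and that records the transcription state and notifies the subscriber when it changes. The `ClassOfService`, `Subscriber`, and `SelectiveTranscriber` types and the injected `transcribe`/`notify` callables are hypothetical stand-ins for the messaging server, transcription server, and subscriber directory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, Dict, Optional


@dataclass
class ClassOfService:
    transcription_allowed: bool
    threshold: timedelta  # max age of the last transcription request


@dataclass
class Subscriber:
    name: str
    cos: ClassOfService
    last_transcription_request: Optional[datetime] = None


@dataclass
class SelectiveTranscriber:
    transcribe: Callable[[str], str]      # stand-in for the transcription server
    notify: Callable[[str, bool], None]   # tells the subscriber about state changes
    directory: Dict[str, bool] = field(default_factory=dict)  # subscriber -> state

    def handle(self, sub: Subscriber, message: str, now: datetime) -> Optional[str]:
        # (302) message received for the subscriber; (304) retrieve the CoS.
        cos = sub.cos
        if not cos.transcription_allowed:
            return None
        # (306) treat a recent transcription request as evidence of a capable client.
        last = sub.last_transcription_request
        capable = last is not None and now - last <= cos.threshold
        # Optional: store the state in the directory and notify on change.
        if capable != self.directory.get(sub.name, False):
            self.directory[sub.name] = capable
            self.notify(sub.name, capable)
        # (308) transcribe only for a transcription-capable client.
        return self.transcribe(message) if capable else None
```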

[0039] FIG. 4 illustrates a second example embodiment for selectively transcribing messages. The system 100 receives, at a messaging server, such as the UM server 202 of FIG. 2, a message for a subscriber (402) and, if the message is transcribable, checks attributes associated with the message against a set of transcription rules (404). The attributes can include, for example, the type of the message, the raw content of the message, a sender of the message, a category of the sender of the message, a contact list of the subscriber, a priority marking of the message, and a request for a read receipt. Other attributes can include any information that describes one or more aspects of the message, its sender, or any other associated entity.

[0040] The set of transcription rules can include a set of transcription exceptions. The set of transcription rules can be retrieved from a directory of user accounts. Each user's account can include a set of transcription rules and the user's class of service can indicate additional transcription rules to apply. Transcription rules can include transcription exceptions which define messages having a certain attribute or pattern of attributes that are not to be transcribed automatically. A user can explicitly define one or more transcription rules, or the system can infer and automatically generate transcription rules by observing user behavior. In one aspect, the transcription rules are not based solely on attributes of the message or the sender, but also on factors external to the message. For example, one of the transcription rules can be based on a threshold of the number of currently pending transcriptions.

[0041] If the message, or at least one of its attributes, matches any rule in the set of transcription rules, the system 100 passes the message to a transcription server for transcription (406); otherwise, the system 100 leaves the message untranscribed (408).
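Steps 404 through 408 can be sketched as a routing function: a non-transcribable message stays untranscribed, a matching exception (here based on the number of pending transcriptions, a factor external to the message) vetoes automatic transcription, and otherwise any matching rule sends the message to the transcription server. The attribute fields, sample rules, and the precedence of exceptions over rules are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MessageAttributes:
    msg_type: str          # e.g. "voice", "fax", "video"
    sender: str
    sender_category: str   # e.g. "Legal"
    priority: str          # "urgent" or "normal"
    read_receipt: bool


# A rule sees the message attributes plus the current pending-transcription count.
Rule = Callable[[MessageAttributes, int], bool]

TRANSCRIBABLE_TYPES = {"voice", "fax"}  # hypothetical


def route(attrs: MessageAttributes, rules: List[Rule], exceptions: List[Rule],
          pending_count: int) -> str:
    """Return "transcribe" (step 406) or "untranscribed" (step 408)."""
    if attrs.msg_type not in TRANSCRIBABLE_TYPES:
        return "untranscribed"
    # Exceptions veto automatic transcription (assumed to take precedence).
    if any(exc(attrs, pending_count) for exc in exceptions):
        return "untranscribed"
    if any(rule(attrs, pending_count) for rule in rules):
        return "transcribe"
    return "untranscribed"


rules: List[Rule] = [
    lambda a, p: a.priority == "urgent",
    lambda a, p: a.read_receipt,
    lambda a, p: a.sender_category == "Legal",
]
exceptions: List[Rule] = [lambda a, p: p >= 10]  # too many pending transcriptions
```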

[0042] FIG. 5 illustrates a third exemplary method embodiment for selectively transcribing messages. The system 100 presents, via a UM client device, a notification of an incoming call to a subscriber (502). The notification can be presented in real time, simultaneously with the incoming call, and can be presented to the user (UM subscriber) via an enhanced interface. The system 100 converts the incoming call to a saved message based on input received from the subscriber and generates a transcription of the saved message (504). Converting the incoming call can optionally be performed based on input received from the subscriber in response to the notification. The system 100 can also assign the transcription a reason code and/or a redirecting number received from the subscriber via a separate data channel. The reason code can be associated with a Session Initiation Protocol (SIP) diversion header and/or a cause code associated with a voicemail Uniform Resource Identifier (URI). The system 100 presents the transcription to the subscriber (506). The system 100 can present the transcription via one or more communication media, such as a text message, an image of the transcribed text, an email, a tweet, and so forth.
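The FIG. 5 flow can be sketched as a function that takes the subscriber's response to the notification, saves the call as a message tagged with a reason code and redirecting number, transcribes it, and delivers the text over each selected medium. The choice string, the `transcribe-requested` reason value, the `SavedMessage` type, and the per-medium delivery callables are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class SavedMessage:
    caller: str
    audio: bytes
    reason_code: Optional[str] = None
    redirecting_number: Optional[str] = None


def handle_incoming_call(caller: str, audio: bytes, subscriber_choice: str,
                         transcribe: Callable[[bytes], str],
                         deliverers: Dict[str, Callable[[str], None]],
                         media: List[str],
                         redirecting_number: Optional[str] = None) -> Optional[str]:
    # (502) the notification has been shown; subscriber_choice is the response.
    if subscriber_choice != "voicemail_with_transcription":
        return None
    # (504) convert the call to a saved message, tag it, and transcribe it.
    msg = SavedMessage(caller, audio, reason_code="transcribe-requested",
                       redirecting_number=redirecting_number)
    text = transcribe(msg.audio)
    # (506) present the transcription over each requested medium.
    for medium in media:
        deliverers[medium](text)
    return text
```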

[0043] Embodiments within the scope of the present disclosure may also include tangible and/or non-transitory computer-readable storage media for carrying or having computer-executable instructions or data structures stored thereon. Such non-transitory computer-readable storage media can be any available media that can be accessed by a general purpose or special purpose computer, including the functional design of any special purpose processor as discussed above. By way of example, and not limitation, such non-transitory computer-readable media can include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to carry or store desired program code means in the form of computer-executable instructions, data structures, or processor chip design. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or combination thereof) to a computer, the computer properly views the connection as a computer-readable medium. Thus, any such connection is properly termed a computer-readable medium. Combinations of the above should also be included within the scope of the computer-readable media.

[0044] Computer-executable instructions include, for example, instructions and data which cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. Computer-executable instructions also include program modules that are executed by computers in stand-alone or network environments. Generally, program modules include routines, programs, components, data structures, objects, and the functions inherent in the design of special-purpose processors, etc. that perform particular tasks or implement particular abstract data types. Computer-executable instructions, associated data structures, and program modules represent examples of the program code means for executing steps of the methods disclosed herein. The particular sequence of such executable instructions or associated data structures represents examples of corresponding acts for implementing the functions described in such steps.

[0045] Those of skill in the art will appreciate that other embodiments of the disclosure may be practiced in network computing environments with many types of computer system configurations, including personal computers, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, and the like. Embodiments may also be practiced in distributed computing environments where tasks are performed by local and remote processing devices that are linked (either by hardwired links, wireless links, or by a combination thereof) through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.

[0046] The various embodiments described above are provided by way of illustration only and should not be construed to limit the scope of the disclosure. Those skilled in the art will readily recognize various modifications and changes that may be made to the principles described herein without following the example embodiments and applications illustrated and described herein, and without departing from the spirit and scope of the disclosure.

* * * * *