U.S. patent application number 10/316567, for a system and method for collaboration summarization playback, was filed with the patent office on December 11, 2002 and published on June 17, 2004. This patent application is currently assigned to Siemens Information Communication Networks, Inc. The invention is credited to Beyda, William J. and Caspi, Rami.

United States Patent Application 20040114541
Kind Code: A1
Application Number: 10/316567
Family ID: 32505979
Publication Date: June 17, 2004
Inventors: Caspi, Rami; et al.
System and method for collaboration summarization playback
Abstract
A system for collaboration summarization playback includes a
graphical user interface (950) for displaying summarization
categories (952) associated with recording cues and clips of the
multimedia conference. The categories may be arranged as a list or
as thumbnails and typically include a length of time for each
category; a time during the conference when the associated clip was
recorded; and a media type. The categories are clickable and allow
the associated clip to be played or displayed as recorded. In
addition, in certain embodiments, a voice recognition transcription
of audio clips may be provided.
Inventors: Caspi, Rami (Sunnyvale, CA); Beyda, William J. (Cupertino, CA)
Correspondence Address: Attn: Elsa Keller, Legal Administrator, Siemens Corporation, Intellectual Property Department, 186 Wood Avenue South, Iselin, NJ 08830, US
Assignee: Siemens Information Communication Networks, Inc.
Family ID: 32505979
Appl. No.: 10/316567
Filed: December 11, 2002
Current U.S. Class: 370/260; 370/352; 709/204
Current CPC Class: H04M 3/567 (2013.01); H04M 7/006 (2013.01); H04M 2201/40 (2013.01); H04M 2201/42 (2013.01); H04M 2203/4536 (2013.01); H04L 65/403 (2013.01); H04M 2207/203 (2013.01); H04L 29/06027 (2013.01); H04M 3/42221 (2013.01); H04M 3/5307 (2013.01); H04M 2203/2072 (2013.01); H04M 3/382 (2013.01); H04M 2201/60 (2013.01)
Class at Publication: 370/260; 370/352; 709/204
International Class: H04L 012/16
Claims
What is claimed is:
1. A telecommunications method, comprising: storing a plurality of
recording cues, said recording cues adapted for marking a
predetermined time period around which a portion of a multimedia
conference is to be recorded; capturing portions of said multimedia
conference responsive to execution of said plurality of recording
cues; and storing said portions according to user-defined
categories.
2. A telecommunications method in accordance with claim 1, further
comprising playing said portions back based on particular
categories.
3. A telecommunications method in accordance with claim 2, said
playing back further comprising playing back based on said
particular categories and a relevance probability
determination.
4. A telecommunications method in accordance with claim 2, further
comprising rendering said categories as audio cues.
5. A telecommunications method in accordance with claim 4, further
comprising arranging said audio cues in an interactive voice response
menu.
6. A telecommunications method in accordance with claim 5, further
comprising playing back captured portions having a selectable
probability.
7. A telecommunications system, comprising: a local area network
(LAN); a multimedia server operably coupled to said network, said
multimedia server adapted to manage a multimedia conference and
including a memory for storing selectable portions of said
multimedia conference; one or more client devices operably coupled
to said LAN and adapted to set recording cues for choosing said
portions of said multimedia conference for playback and indexing
said portions according to user-defined categories.
8. A telecommunications system in accordance with claim 7, said one
or more clients adapted to select for storing a transcription of an
audio portion of said multimedia conference.
9. A telecommunications system in accordance with claim 7, wherein
said one or more client devices are adapted to set probabilities of
recognition of said recording cues.
10. A telecommunications system in accordance with claim 7, wherein
said recording cues comprise audio recording cues.
11. A telecommunications system in accordance with claim 7, wherein
said recording cues comprise video recording cues.
12. A telecommunications server, comprising: a multimedia
communication controller for interfacing multimedia conferences;
and a collaboration controller operably coupled to said multimedia
communication controller, said collaboration controller adapted to
store a multimedia conference and play back selected portions of
said multimedia conference according to user selected criteria
based on recording cues and responsive to a user-defined index.
13. A telecommunications server in accordance with claim 12, said
collaboration controller adapted to select for storing a
transcription of an audio portion of said multimedia conference.
14. A telecommunications server in accordance with claim 12,
wherein said collaboration controller is adapted to play back said
portions based on probabilities of recognition of said recording
cues.
15. A telecommunications server in accordance with claim 12,
wherein said recording cues comprise audio recording cues.
16. A telecommunications server in accordance with claim 12,
wherein said recording cues comprise video recording cues.
17. A telecommunications server in accordance with claim 16,
wherein said recording cues comprise whiteboard recording cues.
18. A telecommunications device, comprising an interaction center
adapted to conduct a multimedia conference including instant
messaging and adapted to allow defining recording cues for playing
back portions of said multimedia conference based on an index of
said recording cues.
19. A telecommunications device in accordance with claim 18, said
interaction center further adapted to specify a playback content by
selecting a recording cue match probability.
20. A telecommunications device in accordance with claim 19,
wherein said recording cues comprise audio recording cues.
21. A telecommunications device in accordance with claim 19,
wherein said recording cues comprise video recording cues.
22. A telecommunications device in accordance with claim 19,
wherein said recording cues comprise instant messaging recording
cues.
Description
FIELD OF THE INVENTION
[0001] The present invention relates to telecommunications systems
and, in particular, to an improved system and method for messaging
collaboration summarization.
BACKGROUND OF THE INVENTION
[0002] The development of various voice over IP protocols such as
the H.323 Recommendation and the Session Initiation Protocol (SIP)
has led to increased interest in multimedia conferencing. In such
conferencing, typically, a more or less central server manages the
conference and maintains the various communications paths. Parties
to the conference are able to communicate via voice and/or video
through the server.
[0003] Instant messaging can provide an added dimension to
multimedia conferences. In addition to allowing text chatting,
instant messaging systems such as Microsoft Windows Messenger can
allow for transfer of files, document sharing and collaboration,
collaborative whiteboarding, and even voice and video.
[0004] As can be appreciated, a complete multimedia conference can
involve multiple voice and video streams, the transfer of many
files, and much marking-up of documents and whiteboarding. On
occasion, an individual who is not a party to all or part of the
conference may nevertheless find it necessary to review what was
said. While a messaging server or individual clients may be able to
record or store an entirety of such a conference, the reviewing
party may not wish to replay the entire meeting, including all the
irrelevant comments and dead ends typical in any multiparty
collaboration.
[0005] As such, there is a need for a system and method for easily
reviewing a multimedia conference. There is a further need for a
system and method for accessing particular portions of a multimedia
conference upon review.
SUMMARY OF THE INVENTION
[0006] These and other drawbacks in the prior art are overcome in
large part by a system and method according to embodiments of the
present invention.
[0007] A telecommunications system according to an embodiment of
the present invention includes a network and a multimedia server
operably coupled to the network. The multimedia server is adapted
to manage a multimedia conference and includes a memory for storing
selectable portions of the multimedia conference. The system
further includes one or more client devices operably coupled to the
network and adapted to set recording cues for choosing portions of
said multimedia conference for playback. The multimedia server or
clients may include a voice recognition system for transcribing
audio portions of the conference. The voice recognition system may
further be used to detect instances of the recording cues.
[0008] A method according to an embodiment of the present invention
includes storing a plurality of recording cues adapted for marking
a predetermined time period around which a portion of a multimedia
conference is to be recorded; and capturing sequentially portions
of the multimedia conference responsive to execution of the
recording cues. The recording cues may be audio cues or may be
whiteboard or document identifiers.
[0009] A telecommunications server according to an embodiment of
the present invention is adapted to store or record a multimedia
conference. In addition, the server may store a plurality of
predetermined recording cues, settable by a user. The recording
cues may include voice recording cues, recognizable by a voice
recognition unit, or may include text or whiteboard identification
recording cues. When the cues are identified, a predetermined
amount of the conference is tagged or stored for summary play
later. In addition, a percentage match when tags are identified may
be assigned, such that the summary may be played back later based
on the likelihood of a match.
[0010] A system for collaboration summarization playback according
to an embodiment of the present invention includes a graphical user
interface for displaying summarization categories associated with
recording cues and clips of the multimedia conference. The
categories may be arranged as a list or as thumbnails and typically
include a length of time for each category; a time during the
conference when the associated clip was recorded; and a media type.
The categories are clickable and allow the associated clip to be
played or displayed as recorded. In addition, in certain
embodiments, a voice recognition transcription of audio clips may
be provided.
[0011] A method according to an embodiment of the present invention
includes recording a multimedia conference and associating portions
thereof with one or more categories derived from recording cues.
The method further includes making the portions accessible for
selective playback via a user interface. This can include
identifying a media type and a time associated with each portion.
Further, the method includes playing back the portions responsive
to a selection in the original media type or by a text
transcription.
[0012] A better understanding of these and other specific
embodiments of the invention is obtained when the following
detailed description is considered in conjunction with the
following drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
[0013] FIG. 1 is a diagram of a telecommunication system according
to an embodiment of the present invention;
[0014] FIG. 2 is a diagram illustrating a telecommunications
collaboration system according to an embodiment of the present
invention;
[0015] FIG. 3 is a diagram illustrating a graphical user interface
according to an embodiment of the present invention;
[0016] FIG. 4 is a diagram illustrating collaboration summarization
according to an embodiment of the present invention;
[0017] FIG. 5A and FIG. 5B are flowcharts illustrating setting
recording cues according to embodiments of the present
invention;
[0018] FIG. 5C is a graphical user interface according to an
embodiment of the present invention;
[0019] FIG. 5D is a signaling diagram illustrating operation of an
embodiment of the present invention;
[0020] FIG. 6A is a flowchart illustrating operation of an
embodiment of the present invention;
[0021] FIG. 6B is a graphical user interface according to an
embodiment of the present invention;
[0022] FIG. 6C is a signaling diagram illustrating operation of an
embodiment of the present invention;
[0023] FIG. 7A is a flowchart illustrating operation of an
embodiment of the present invention;
[0024] FIG. 7B is a graphical user interface according to an
embodiment of the present invention;
[0025] FIG. 7C is a signaling diagram illustrating operation of an
embodiment of the present invention;
[0026] FIG. 8 is a flowchart illustrating operation of an
embodiment of the present invention;
[0027] FIG. 9A and FIG. 9B represent schematically the storage of
the recorded conference and summarization(s);
[0028] FIG. 10 illustrates an exemplary graphical user interface
according to an embodiment of the present invention;
[0029] FIG. 11 illustrates category handling according to an
embodiment of the present invention;
[0030] FIG. 12 illustrates signaling according to an embodiment of
the present invention;
[0031] FIG. 13 is a flowchart illustrating operation of an
embodiment of the present invention;
[0032] FIG. 14 illustrates signaling according to an embodiment of
the present invention;
[0033] FIG. 15 is a flowchart illustrating operation of an
embodiment of the present invention;
[0034] FIG. 16A and FIG. 16B are flowcharts illustrating operation
of an embodiment of the present invention;
[0035] FIG. 17 is a diagram illustrating an exemplary IVR menu
according to an embodiment of the present invention; and
[0036] FIG. 18 is a signaling diagram showing operation of an
embodiment of the present invention.
DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION
[0037] Turning now to the drawings and, with particular attention
to FIG. 1, a diagram of an exemplary telecommunications system 100
according to an embodiment of the present invention is shown. As
shown, the telecommunications system 100 includes a local area
network (LAN) 102. The LAN 102 may be implemented using a TCP/IP
network and may implement voice or multimedia over IP using, for
example, the Session Initiation Protocol (SIP). Operably coupled to
the local area network 102 is a server 104. The server 104 may
include one or more controllers 101, which may be embodied as one
or more microprocessors, and memory 103 for storing application
programs and data. The controller 101 implements an instant
messaging system 106. The instant messaging system may be embodied
as Microsoft Windows Messenger or other instant messaging system.
Thus, according to certain embodiments of the present invention,
the instant messaging system 106 implements the Microsoft .Net
environment 108 and Real Time Communications protocol (RTC)
110.
[0038] In addition, according to embodiments of the present
invention, a collaboration system 114 may be provided, which may be
part of an interactive suite of applications 112, run by controller
101, as will be described in greater detail below.
[0039] Also coupled to the LAN 102 is a gateway 116 which may be
implemented as a gateway to a private branch exchange (PBX), the
public switched telephone network (PSTN) 118, or any of a variety
of other networks, such as a wireless or cellular network. In
addition, one or more LAN telephones 120a-120n and one or more
computers 122a-122n may be operably coupled to the LAN 102.
[0040] The computers 122a-122n may be personal computers
implementing the Windows XP operating system and thus, Windows
Messenger. In addition, the computers 122a-122n may include
telephony and other multimedia messaging capability using, for
example, peripheral cameras, microphones and speakers (not shown)
or peripheral telephony handsets 124, such as the Optipoint
handset, available from Siemens Corporation. In other embodiments,
one or more of the computers may be implemented as wireless
telephones, digital telephones, or personal digital assistants
(PDAs). Thus, the figures are exemplary only. As shown with
reference to computer 122a, the computers may include one or more
controllers 129, such as Pentium-type microprocessors, and storage
131 for applications and other programs.
[0041] Finally, the computers 122a-122n may implement Interaction
Services 128a-128n according to embodiments of the present
invention. As will be described in greater detail below, the
Interaction Services 128a-128n allow for interworking of phone,
buddy list, instant messaging, presence, collaboration, calendar
and other applications. In addition, according to embodiments of
the present invention, the Interaction Services 128 allow access to
the collaboration summarization module 114 of the server 104 and
thus permit the user to access and manipulate conference
summaries.
[0042] Turning now to FIG. 2, a functional model diagram
illustrating collaboration system 114 is shown. More particularly,
FIG. 2 is a logical diagram illustrating a particular embodiment of
a collaboration server 104. The server 104 includes a plurality of
application modules 200 and a communication broker module 201. One
or more of the application modules and communication broker module
201 may include an inference engine, i.e., a rules based artificial
intelligence engine for implementing functions according to the
present invention, as will be described in greater detail below. In
addition, the server 104 provides interfaces, such as APIs
(application programming interfaces) to SIP phones 220 and
gateways/interworking units 222.
[0043] According to the embodiment illustrated, the broker module
201 includes a basic services module 214, an advanced services
module 216, an automation module 212, and a toolkit module 218.
[0044] The basic services module 214 functions to implement, for
example, phone support, PBX interfaces, call features and
management, as well as Windows Messaging and RTC add-ins, when
necessary. The phone support features allow maintenance of and
access to buddy lists and provide presence status.
[0045] The advanced services module 216 implements functions such as
presence, multipoint control unit (MCU), recording, and the like.
MCU functions are used for voice conferencing and support ad hoc
and dynamic conference creation from a buddy list following the SIP
conferencing model for ad hoc conferences. In certain embodiments,
support for G.711 and G.723.1 codecs is provided. Further, in
certain embodiments, the MCU can distribute media processing over
multiple servers using the MEGACO protocol.
[0046] Presence features provide device context for both SIP
registered devices and user-defined non-SIP devices. Various user
contexts, such as In Meeting, On Vacation, In the Office, etc., can
be provided for. In addition, voice, e-mail and instant messaging
availability may be provided across the user's devices. The
presence feature enables real time call control using presence
information, e.g., to choose a destination based on the presence of
a user's devices. In addition, various components have a central
repository for presence information and for changing and querying
presence information. In addition, the presence module provides a
user interface for presenting the user with presence
information.
[0047] In addition, the broker module 201 may include the
ComResponse platform, available from Siemens Information and
Communication Networks, Inc. ComResponse features include speech
recognition, speech-to-text, and text-to-speech, and allow for
creation of scripts for applications. The speech recognition and
speech-to-text features may be used by the collaboration
summarization unit 114, as will be discussed in greater detail
below.
[0048] In addition, real time call control is provided by a SIP API
220 associated with the basic services module 214. That is, calls
can be intercepted in progress and real time actions performed on
them, including directing those calls to alternate destinations
based on rules and or other stimuli. The SIP API 220 also provides
call progress monitoring capabilities and for reporting status of
such calls to interested applications. The SIP API 220 also
provides for call control from the user interface.
[0049] According to the embodiment illustrated, the application
modules include a collaboration module 202, an interaction center
module 204, a mobility module 206, an interworking services module
208, and a collaboration summarization module 114.
[0050] The collaboration module 202 allows for creation,
modification or deletion of a collaboration session for a group of
users. The collaboration module 202 may further allow for invoking
a voice conference from any client. In addition, the collaboration
module 202 can launch a multi-media conferencing package, such as
the WebEx package. It is noted that the multi-media conferencing
can be handled by other products.
[0051] The interaction center 204 provides a telephony interface
for both subscribers and guests. Subscriber access functions
include calendar access and voicemail and e-mail access. The
calendar access allows the subscriber to accept, decline, or modify
appointments, as well as block out particular times. The voicemail
and e-mail access allows the subscriber to access and sort
messages.
[0052] Similarly, the guest access feature allows guests access
to voicemail for leaving messages and calendar functions for
scheduling, canceling, and modifying appointments with subscribers.
Further, the guest access feature allows a guest user to access
specific data meant for them, e.g., receiving e-mail and fax back,
etc.
[0053] The mobility module 206 provides for message forwarding and
"one number" access across media, and message "morphing" across
media for the subscriber. Further, various applications can send
notification messages to a variety of destinations, such as
e-mails, instant messages, pagers, and the like. In addition, the
subscriber can set rules that the mobility module 206 uses to
define media handling, such as e-mail, voice and instant messaging
handling. Such rules specify data and associated actions. For
example, a rule could be defined to say "If I'm traveling, and I
get a voicemail or e-mail marked Urgent, then page me."
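The rule just quoted pairs a condition over message and context data with an action. A minimal sketch of how such a rule might be evaluated is shown below; this is purely illustrative, and none of the function or field names come from the patent or from any actual API of the described system:

```python
# Hypothetical sketch of a mobility rule such as "If I'm traveling, and I
# get a voicemail or e-mail marked Urgent, then page me." The rule tests
# the subscriber's context and the message's attributes, and returns the
# action to take (or None when the rule does not apply).

def traveling_urgent_rule(context, message):
    if (context == "traveling"
            and message["kind"] in ("voicemail", "e-mail")
            and message["priority"] == "Urgent"):
        return "page"            # action chosen by the rule
    return None                  # rule does not apply

print(traveling_urgent_rule("traveling",
                            {"kind": "voicemail", "priority": "Urgent"}))
# prints: page
```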
[0054] Further, as will be explained in greater detail below, the
collaboration summarization module 114 is used to identify or
highlight portions of a multimedia conference and configure the
portions sequentially for later playback. The portions may be
stored or identified based on recording cues either preset or
settable by one or more of the participants in the conference, such
as a moderator. As will be explained in greater detail below, the
recording cues may be based on vocalized keywords identified by the
voice recognition unit of the ComResponse module, or may be invoked
by special controls or video or whiteboarding or other
identifiers.
[0055] Turning now to FIG. 3, a diagram of a graphical user
interface 300 according to embodiments of the present invention is
shown. In particular, shown are a variety of windows for invoking
various functions. Such a graphical user interface 300 may be
implemented on one or more of the network clients. Thus, the
graphical user interface 300 interacts with the Interactive
Services unit 128 to control collaboration sessions.
[0056] Shown are a collaboration interface 302, a phone interface
304, and a buddy list 306. It is noted that other functional
interfaces may be provided. According to particular embodiments,
certain of the interfaces may be based on, be similar to, or
interwork with, those provided by Microsoft Windows Messenger or
Outlook.
[0057] The buddy list 306 is used to set up instant messaging calls
and/or multimedia conferences. The phone interface 304 is used to
make calls, e.g., by typing in a phone number, and also allows
invocation of supplementary service functions such as transfer,
forward, etc. The collaboration interface 302 allows for viewing
the parties to a collaboration 302a and the type of media involved.
It is noted that, while illustrated in the context of personal
computers 122, similar interfaces may be provided for the
telephones, cellular telephones, or PDAs.
[0058] As noted above, an aspect of the present invention allows
selective summarization based on recognition of recording cues.
FIG. 4 is a diagram schematically illustrating collaboration
summarization according to an embodiment of the present invention.
More particularly, shown are a plurality of media streams
representative of, for example, a multimedia conference between
multiple parties. Shown are a whiteboard stream 400, an audio
stream 402, a video stream 404, and an instant messaging stream
406. It is noted that, in practice, more or fewer of such data
streams may be present. Thus, the figure is exemplary only.
[0059] Also shown in FIG. 4 is a time scale 408 showing a time T1.
The time T1 represents, for example, a duration of the conference
and hence the period required to review the conference in its
entirety once it has been recorded. According to the present
invention, however, a participant in the conference, such as a
designated moderator, can set and activate or invoke a recording
cue, which causes the collaboration summarization system to either
mark predetermined periods on the recorded conference or save
predetermined periods as a separate summary file. As shown in FIG.
4, at a time Ta, a user activates a recording cue 4000. A period
410 of the conference is then either marked or stored in memory 103
for later playback as part of a collaboration summary. Similarly,
at time Tb, another recording cue is activated and a period 412 is
then either marked or stored for later playback as part of a
collaboration summary. As seen at 416, the result on playback is a
summary of the multimedia conference of duration T2.
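The marking scheme of FIG. 4 can be sketched as follows. This is an illustrative model only, assuming each cue activation marks a fixed-length window of the conference timeline; the names and the window length are hypothetical, not taken from the patent:

```python
from dataclasses import dataclass

# Hypothetical sketch of the FIG. 4 marking scheme: each recording cue
# activation (e.g. at times Ta and Tb) marks a predetermined period of the
# recorded conference, and playback concatenates only the marked periods,
# yielding a summary of duration T2 <= T1.

@dataclass
class MarkedPeriod:
    start: float   # seconds from conference start
    end: float

def mark_periods(cue_times, window, conference_length):
    """Mark a `window`-second period beginning at each cue activation."""
    return [MarkedPeriod(t, min(t + window, conference_length))
            for t in cue_times]

def summary_duration(periods):
    return sum(p.end - p.start for p in periods)

# A conference of duration T1 = 3600 s with cues at Ta = 600 s, Tb = 2400 s:
clips = mark_periods([600.0, 2400.0], window=120.0, conference_length=3600.0)
print(summary_duration(clips))  # total playback time T2, here 240.0
```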
[0060] FIG. 5A and FIG. 5B are flowcharts illustrating setting
recording cues or keywords for conference summarization according
to embodiments of the present invention. FIG. 5C illustrates an
exemplary user interface window 5000 that may be used to set the
recording cue(s). Shown are a cue display area 5002 for displaying
the recited cue and accept and reject buttons 5004, 5006. The user
interface window 5000 may be generated by or in association with
the interaction services module 128 of the client 122 and in
communication with the collaboration module 114 of the server
104.
[0061] As shown in FIG. 5A, a moderator may set recording cues or
keywords for later use in a conference. At 502a, the moderator
speaks or otherwise enters the desired recording cue. For example,
the moderator may set phrases such as "Action Item," "A decision
has been reached," "We have a consensus," "Our next meeting will be
. . . " and the like. The computer's sound system will receive the
cue and display it at 5002 on the graphical user interface of FIG.
5C. In other embodiments, the user can type in a recording cue that
will be recognized either from the speech unit of the ComResponse
platform or from transcribed text. Alternatively, the user may
define a particular entry into whiteboard or instant messaging
windows as the recording cue. For example, the moderator may
indicate that an R in the whiteboard window means that the contents
should be recorded; an X through it, conversely, may indicate that
it should not be. The user then has the option of accepting or
rejecting the cue, by selecting the buttons 5004, 5006 (FIG. 5C).
If rejected, the user can re-try. If accepted, the collaboration
summarization system 114 will then record the cue at 504a (e.g.,
store it in a database in memory 103) and monitor the conference
for instances of the cue at 506a, as will be explained in greater
detail below. It is noted that an accept/reject option may also be
provided for video or other cues, as well.
[0062] In addition to, or instead of, the moderator setting the
recording cues, in certain embodiments, the recording cues may be
set by the individual users prior to beginning the conference. This
may be particularly useful if, for example, a voice response system
needs to learn the voices of various participants. As shown in FIG.
5B, at step 502b, the system may connect the conferees and enter a
training mode. In the training mode, while the users may be
connected to the server, they are not necessarily connected to one
another. At step 504b, the users may each set their cues, in a
manner similar to that described above with reference to FIG. 5A
and FIG. 5C. The training mode may allow, for example, the users to
each set various phrases as recording cues and may allow the system
to establish a personalized summary of the conference, keyed to the
person who made the cue. At step 506b, the system stores the cues
in memory 103 for use during the conference and then connects the
users.
[0063] Signaling for exemplary system recording cue training is
shown in FIG. 5D. Shown are a server 104 and a client 122, which
may represent the conference moderator or a participant. At 5500,
the client 122 requests and receives access to the server 104 for a
media session. This can include, for example, a SIP INVITE,
RINGING, OK sequence. At 5502, the server 104 and the
client 122 open a media channel and the client 122 accesses the
collaboration system 114. At 5504, the client 122 uploads the
recording cue. As discussed above, this can include a voice or
video cue, or whiteboard, etc., markup. At 5506, the collaboration
system 114 downloads a confirmation of the recording cue and stores
it. For example, it may convert the speech to text and download the
text, or may store and analyze the cue and repeat it back, for
confirmation. If the cue is appropriately confirmed, then at 5508,
the client 122 sends an acknowledge.
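The upload/confirm/acknowledge portion of the FIG. 5D exchange can be pictured as a simple handshake. The sketch below is a hypothetical simulation (the class and method names are invented for illustration), with the server echoing the cue back as its confirmation:

```python
# Hypothetical simulation of the FIG. 5D cue-training exchange: the client
# uploads a recording cue, the server stores it provisionally and returns a
# confirmation (e.g. the speech converted to text), and the cue becomes
# active only once the client acknowledges the confirmation.

class CollaborationServer:
    def __init__(self):
        self.cues = []          # confirmed recording cues
        self.pending = None     # cue awaiting client acknowledgement

    def upload_cue(self, cue_text):
        self.pending = cue_text
        return cue_text         # confirmation echoed back to the client

    def acknowledge(self):
        if self.pending is not None:
            self.cues.append(self.pending)
            self.pending = None

server = CollaborationServer()
confirmation = server.upload_cue("Action Item")
if confirmation == "Action Item":   # client checks the confirmation
    server.acknowledge()
print(server.cues)                  # ['Action Item']
```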
[0064] FIG. 6A and FIG. 6B illustrate conferencing and activating
recording cues according to an embodiment of the present invention.
FIG. 6A is a flowchart illustrating operation of a collaboration
according to an embodiment of the present invention. FIG. 6B
illustrates an exemplary graphical user interface for use with a
collaboration summarization session. In particular, shown are a
master window 6000a, a whiteboard application window 6000b, and a
chat/document window 6000c. It is noted that in other embodiments,
more or fewer of each type of window, as well as windows pertaining
to other functions, may also be present. In the embodiment
illustrated, the master window 6000a includes an In Collaboration
field 6002 which defines the participants to the conference; a
speech-to-text field 6004 for displaying the converted audio into
text; and an Activate Cue button 6006. It is noted that in certain
embodiments, in which audio cues are used exclusively, the Activate
Cue button 6006 might not be present.
[0065] Turning now to FIG. 6A, at 604, the conference begins, with
the users all connected via the server, using various media. As
noted above, such a conference can include various combinations of
media such as voice, video, Instant Messaging, application sharing,
whiteboarding, and the like. At 602, the collaboration system
records the entirety of the multimedia conference, including all
threads and media, by storing it in memory 103. Further, in certain
embodiments, the collaboration system activates a speech-to-text
unit, e.g., the ComResponse platform, to transcribe all speech from
the voice and video channels, which is also stored in association
with the conference in memory 103. The window 6004 (FIG. 6B) may be
used to display the transcription. At 606, the moderator or one of
the users activates one of the recording cues. The recording cue
may be activated, for example, by the user or moderator speaking it
or by marking the whiteboard or other document being collaborated
on. Alternatively, in certain embodiments, the recording cue may be
activated by selecting a button or key associated with the client.
For example, with reference to FIG. 6B, the user may activate the
button 6006; or may draw the X 6008 in the whiteboarding window
6000b; or may activate the Record button 6010 of the chat/shared
application window 6000c. The recording cue may be invoked formally
by the moderator or a party, or by the system "picking up" its use
during the conference.
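For illustration only, monitoring the speech-to-text output for spoken recording cues could be sketched as follows; the cue phrases and all names here are assumptions, not taken from the application:

```python
# Illustrative sketch: scan a speech-to-text transcript fragment for
# registered recording cues. The cue list is hypothetical.

RECORDING_CUES = ("action item", "need to implement", "of utmost urgency")

def detect_cue(transcript_fragment):
    """Return the first registered cue found in a transcript fragment,
    or None if no cue is present."""
    text = transcript_fragment.lower()
    for cue in RECORDING_CUES:
        if cue in text:
            return cue
    return None
```

A matching approach like this would let the system "pick up" a cue whether it was spoken deliberately or in passing.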
[0066] In response, at 608 (FIG. 6A), the collaboration
summarization system 114 either marks the point on the master
recording of the conference where the cue was invoked for later
playback, or stores in a separate file the associated passage, also
for later playback. In either case, the conference portion
pertinent to the cue is designated for later playback. In certain
embodiments, the summarization is stored or marked or categorized
by the party who has invoked the cue. In such an embodiment, a
moderator may maintain a master summarization record. In other
embodiments, the summarization occurs on a singular basis--i.e.,
only one summarization is performed, regardless of the invoking
party. Finally, at step 610, a match or relevance probability is
set in association with the marked or recorded summarization
portion of the conference. Any of a variety of probability matching
methods may be employed. In this manner, each part of the
conference is captured, separated and marked with a probability of
its relevance.
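The marking described above could be represented, as a sketch under assumed field names (the application does not specify a record layout), like this:

```python
from dataclasses import dataclass

# Hypothetical record for a marked conference passage; the field
# names are illustrative assumptions.
@dataclass
class SummaryMark:
    cue: str              # the recording cue that was invoked
    invoked_by: str       # moderator or party who invoked the cue
    start_seconds: float  # offset into the master recording
    end_seconds: float
    relevance: float      # match probability set at step 610

def mark_passage(marks, cue, party, start, end, relevance):
    """Append a mark designating a conference portion for later playback."""
    marks.append(SummaryMark(cue, party, start, end, relevance))
    return marks
```

Keeping the invoking party on each mark supports the per-party summarization variant described above, while a single shared list supports the singular-summarization variant.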
[0067] FIG. 6C illustrates signaling for a conference summarization
session according to an embodiment of the present invention. Shown
are a Client A 122a, which may also be the moderator; a server 104;
and a Client B 122b and a Client C 122c. At 6500, the client A or
moderator initiates a connection with the server 104, identifies
the session as a conference, and identifies the other parties. At
6502 and 6504, the other parties to the conference, Client B and
Client C, likewise log in to the server 104. As in the recording
cue case, the log in process can be in accordance with the SIP
protocol. Next, at 6506, 6508, and 6510, the clients 122a-122c
establish media connections via the server 104. At 6512, the server
104 records the conference and the collaboration summarization
system 114 monitors the ongoing media for the recording cue(s). If
a recording cue is detected, then at 6514, the collaboration
summarization system 114 records or marks the relevant passage or
clip or portion of the conference as part of the summary as it is
stored in memory. In addition, the collaboration summarization
system 114 may return a cue acknowledge signal to the moderator to
indicate that the cue was received or detected. The conference can
be terminated at 6518 in a known manner.
[0068] FIG. 7A and FIG. 7B illustrate playing a summarization
according to an embodiment of the present invention. FIG. 7A is a
flowchart illustrating operation of a playback embodiment of the
present invention. FIG. 7B is an exemplary user interface 7000 for
the playback.
[0069] As shown in FIG. 7B, the interface includes a conference
list 7002 listing conferences that have been saved and summarized;
one or more viewing windows 7004; a play button 7006; a relevance
probability entry field 7008; and training buttons 7010.
[0070] Turning now to FIG. 7A, at step 702, the user desiring a
summary will activate a summary function using his GUI 7000, for
example, by selecting the conference from the conference window
7002 and selecting the play button 7006. In certain embodiments, a
default match percentage will be used to deliver the summary. In
other embodiments, the user can designate a percentage match
threshold using the match field 7008--for matches to the cue higher
than the threshold, the system will play back a summary. As noted
above, in certain embodiments, this can be embodied as playing back
a single file containing all media above the threshold, or can be
embodied as accessing a single broad summary file with relevant
passages at the desired percent match marked. At 704, the system
will access the stored conference and play back the summary
according to the percent match.
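Playback at step 704 amounts to filtering the designated passages against the percent-match threshold. A minimal sketch, assuming each mark carries a start time and relevance probability (names are illustrative):

```python
# Sketch of step 704: select passages at or above the percent-match
# threshold and return them in conference order. The 0.75 default
# threshold is an assumption, not from the application.

def summary_for_playback(marks, threshold=0.75):
    """Return the marked passages meeting the threshold, ordered by
    their position in the recorded conference."""
    selected = [m for m in marks if m["relevance"] >= threshold]
    return sorted(selected, key=lambda m: m["start"])

marks = [
    {"start": 120, "relevance": 0.9},
    {"start": 45,  "relevance": 0.6},
    {"start": 300, "relevance": 0.8},
]
```

With the assumed 0.75 threshold, only the passages starting at 120 and 300 seconds would be played back.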
[0071] FIG. 7C illustrates signaling for summary access according
to an embodiment of the present invention. Shown are a client 122
and server 104. At 7500, the client 122 logs in to the server 104.
At 7502, the client accesses, for example, a web page interface,
such as described above. At 7504, the user can select the summary
for viewing. As noted above, this can include specifying percent
matches, and the like. Finally, at 7506, the server 104 sends back
the appropriate summary from memory 103. It is noted that, in other
embodiments, the entirety of the summary can be downloaded, and
thereafter accessed locally.
[0072] As noted above, the system can be trained to recognize cues
prior to the start of a conference. FIG. 8 illustrates another way
of training the system. More particularly, a user can activate
approval indicia, such as "thumbs up" or "thumbs down" (or
good-bad) buttons when playing back his selected summary. That is,
each time the user detects an inaccuracy on the part of the system,
he can select the "thumbs down" button, and each time he is
satisfied, he can push the "thumbs up" button. This is interpreted
by the system and can be used when the same scenario occurs in the
future. Such good-bad buttons 7010 are illustrated in FIG. 7B.
[0073] Operation of this training method is illustrated more
particularly with reference to FIG. 8. In particular, at 802, the
user elects to play back the selected summary. At 804, the user
presses the "thumbs up" or "thumbs down" buttons to indicate
approval or disapproval. At 806, the system stores the
approval-disapproval after identifying the context. The knowledge
can then be used on subsequent occasions when the context occurs
again. That is, the collaboration system 114 can learn whether a
cue was correctly detected as having been invoked. Thus, the next
time a cue is determined to be invoked, the system can check both
its database of "user-set" cues and cross-reference its record of
"learned" probabilities. Further, such training can be used by the
collaboration summarization system 114 to search through and update
other stored summarizations, if desired.
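The application does not specify how the feedback is combined with the stored cues; one simple, purely illustrative rule would nudge a learned per-cue probability up or down on each button press:

```python
# Illustrative learning rule only: adjust a cue's learned detection
# probability on "thumbs up" / "thumbs down" feedback. The prior of
# 0.5 and step size of 0.1 are assumptions.

def update_learned_probability(learned, cue, approved, step=0.1):
    """Raise the cue's learned probability on approval, lower it on
    disapproval, clamped to the range [0, 1]."""
    p = learned.get(cue, 0.5)  # start from an uninformed prior
    p = p + step if approved else p - step
    learned[cue] = min(1.0, max(0.0, p))
    return learned[cue]
```

The resulting `learned` table could then be cross-referenced against the database of user-set cues the next time a cue is determined to be invoked.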
[0074] As noted above, the summarization can be stored by the
system either as a completely separate file or as indices marking
"points" on the complete conference recording. This is illustrated
more particularly with reference to FIGS. 9A and 9B. Shown in FIG.
9A is a file 900a representing the complete recorded conference.
Also shown are files 902a, 902b representing one or more recorded
summaries of the conference. In certain embodiments, each file
represents a complete summary based on a particular user's
automatic or deliberate invocation of recording cues. In certain
embodiments, only one such file will be created (i.e., based on the
moderator's cuing). Alternatively, each file can represent a
complete summary based on a percent match with the recording
cue.
[0075] FIG. 9B illustrates indexing against the recorded
conference. More particularly, 900b represents the recorded
complete conference. Shown at 902b1, 902b2, 902b3, 902b4 are
indices representing invocation of recording cues, marked, for
example, by a time stamp on the recorded conference 900b. Again,
the recording cues can be invoked by the moderator or parties to
the conference. The indices can be unique to the party invoking the
cue. Alternatively, only the moderator can be allowed to invoke
cues other than automatic ones.
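The indexing scheme of FIG. 9B can be sketched as time stamps against the master recording, kept per invoking party; the data layout and names below are assumptions:

```python
from collections import defaultdict

# Illustrative sketch of FIG. 9B: each cue invocation is stored as a
# time-stamp index on the master recording, unique to the invoking
# party.

def add_index(indices, party, timestamp_seconds):
    """Record a cue invocation as a time stamp on the master
    recording, keyed by the party who invoked the cue."""
    indices[party].append(timestamp_seconds)
    return indices

indices = defaultdict(list)
add_index(indices, "moderator", 95.0)
add_index(indices, "client_b", 410.5)
```

Restricting non-automatic cues to the moderator, as in the alternative described above, would simply mean that only the moderator's key accumulates manually invoked indices.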
[0076] As noted above, an aspect of the present invention relates
to providing an interface for accessing a collaboration summary. In
one such embodiment, the summarized portions of a conference can be
stored according to summarization categories. In certain
embodiments, the recording cues themselves may form the category
indices.
[0077] Turning now to FIG. 10, a diagram of an exemplary graphical
user interface 950 according to an embodiment of the present
invention is shown. Typically, the graphical user interface 950 is
generated in conjunction with the Interactive Services module 128
and Collaboration Summarization module 114. In the embodiment
illustrated, the graphical user interface for playback 950 includes
a plurality of category headings 952a-952e, representative of, for
example, Action Items, Decisions, Items on Hold, Summaries, and
Open Items. It is noted that this list of categories is not
comprehensive and is exemplary only. Associated with each of the
categories 952a-952e are one or more thumbnails 954a-954e,
respectively. Each of the thumbnails is representative of a portion
or a clip from the multimedia conference. In certain embodiments,
displayed with each thumbnail is an indication of the media type,
size, and time associated with the clip. It is noted that,
while in the embodiment illustrated the categories are displayed as
thumbnails, the categories and associated information could be
displayed, for example, as a scrollable or dropdown list, or other
arrangement. Also, certain embodiments may include a timeline 956
to allow a visualization of where each of the associated clips
occurs during the conference. Thus, as shown, time indicia for
thumbnail clips 954a1 and 954e1 are displayed on the timeline. This
allows the user to better distinguish among clips in the same
category. In operation, as will be discussed in greater detail
below, the user can click on one of the thumbnails to view the
associated portion of the conference. The graphical user interface
950 may further include a relevance probability entry window 960.
This allows the user to specify both a category and a relevance
probability for summary viewing.
[0078] The category headings can be settable by a user and
associated with one or more recording cues, also settable by the
user. More particularly, shown in FIG. 11 are exemplary categories
and associated recording cues. In particular, shown at 1150 is a
category "Action Item," with associated cues 1151 "Action Item,"
"Need to Implement," "Progress must be made," and "Of utmost
urgency." It is noted that these are exemplary only. Similarly, an
exemplary category "Decisions" 1152 is shown, with associated
recording cues 1153 "When," "How Much."
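The cue phrases below are the examples from FIG. 11; holding them in a table keyed by category, as sketched here, is an illustrative assumption:

```python
# Example category/cue table per FIG. 11. The dictionary layout and
# lookup function are assumptions for illustration.

CATEGORY_CUES = {
    "Action Item": ["Action Item", "Need to Implement",
                    "Progress must be made", "Of utmost urgency"],
    "Decisions": ["When", "How Much"],
}

def category_for_cue(cue):
    """Return the category a recording cue is associated with,
    or None if the cue is not registered."""
    for category, cues in CATEGORY_CUES.items():
        if cue.lower() in (c.lower() for c in cues):
            return category
    return None
```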
[0079] In operation, the user can define the categories (or have
default categories provided) and then set associated recording
cues. Signaling for this is illustrated more particularly with
reference to FIG. 12. Shown are a client 122 and the server 104. At
1201, the client logs in. This can include, for example, logging in
via a Web page access portal. At 1202, the server 104 provides a
default list of categories. This may include, for example, the
presentation of a Web page having a form using CGI-BIN script. At
1204, the user can be provided with an option to change the
categories and transmit the changes to the server 104. At 1206,
the user can provide recording cues, in a manner similar to that
discussed above, and also associate them with the appropriate
categories. It is noted that providing the recording cues and
providing the categories need not necessarily occur in the same
session. Finally, at 1208, the server 104 stores the category/cue
lists in memory 103 and can use them for the designated
conference.
[0080] FIG. 13 is a flowchart illustrating operation of an
embodiment of the present invention and, in particular, illustrates
use of embodiments of the present invention during a conference or
collaboration. At step 1302, the various parties to the conference
log in to the server 104 for the conference. At step 1304, the
server 104 stores the ongoing conference in memory 103. At step
1306, the server 104 and, particularly, the collaboration
summarization module 114 monitors the conference for the invocation
of recording cues. At step 1308, the collaboration summarization
module 114 detects the recording cue during the conference. As in
the embodiment discussed above, a relevance probability may be
assigned to the associated conference portions. At step 1310, the
collaboration summarization module 114 accesses memory for
associated category information. Finally, at step 1312, the
collaboration summarization module 114 stores the summary portion
indexed to the category.
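Steps 1308 through 1312 can be sketched as a single handler: on detecting a cue, look up its category and file the conference portion there. The lookup function, fallback category, and all names are illustrative assumptions:

```python
# Illustrative sketch of steps 1308-1312: file a detected conference
# portion under the category associated with its recording cue.

def on_cue_detected(store, cue, clip, category_lookup):
    """Look up the cue's category and store the clip under it,
    falling back to a catch-all when the cue has no category."""
    category = category_lookup(cue) or "Uncategorized"
    store.setdefault(category, []).append(clip)
    return category
```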
[0081] FIG. 14 illustrates signaling for an embodiment of the
present invention. In particular, FIG. 14 illustrates accessing a
collaboration summarization according to an embodiment of the
present invention. Shown are a client 122 and server 104. At 1402,
the client 122 accesses the server 104. As discussed above, this
can include the user accessing a Web page portal. At 1404, the
client receives the Web page from the server 104. At 1406, the
client 122, presented with a web page selection interface, such as
that of FIG. 10, can select one of the categories for viewing from
the collaboration summarization module 114. At 1408, the selection
is transmitted to the server 104. Finally, at 1410, the server
returns the selected portion of the conference. That is, the
collaboration summarization module 114 returns a conference summary
including all conference portions in the category selected.
[0082] As noted above, in certain embodiments, the user can enter a
relevance probability in addition to a category when accessing a
summary. This is illustrated more particularly with reference to
the flowchart of FIG. 15. At step 1502, the collaboration
summarization module receives the category selection from the user.
The collaboration summarization module 114 may then prompt the user
for a relevance probability, which can be entered at step 1504. For
example, the probability can be entered using control 960 while the
category can be selected by clicking on one of the categories. The
collaboration summarization module 114 then searches the category
for all stored conference portions having that relevance
probability or higher, and displays them at step 1506.
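This category-plus-threshold query of steps 1502 through 1506 can be sketched as follows, assuming each stored clip carries the relevance probability assigned when its cue was detected (the data layout is an assumption):

```python
# Sketch of steps 1502-1506: return the clips in a selected category
# at or above the user-entered relevance probability.

def clips_for_category(store, category, min_relevance=0.0):
    """Filter the stored clips in a category by relevance threshold."""
    return [clip for clip in store.get(category, [])
            if clip["relevance"] >= min_relevance]

store = {"Action Item": [
    {"label": "budget clip", "relevance": 0.9},
    {"label": "aside",      "relevance": 0.55},
]}
```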
[0083] An additional aspect of embodiments of the present invention
makes use of the text-to-speech capabilities of the ComResponse
platform. More particularly, the ComResponse module is able to
convert the Web page interface 950 to speech and allow the user to
hear categories as voice prompts in an IVR function. The conference
summary can then be accessed remotely via voice telephone, if the
requesting party does not have Web access. This is illustrated more
particularly in flowchart form in FIG. 16A and FIG. 16B.
[0084] Turning now to FIG. 16A, in step 1602, after the conference
and after the recording summary has been made, the collaboration
summarization module 114 generates the control web page interface
950 from the user input categories (if any) and the detected
recording cues, as described above. In step 1604, the user or a
moderator can invoke the ComResponse platform to generate a
speech-based menu from the Web page. The result is a stored
"listing" of the category headings from the Web page 950. In
certain embodiments, identifiers of the individual records can also
be converted to speech. In step 1606, the system then associates
this listing with an IVR (interactive voice response) menu, with
the categories forming a first layer of prompts and the individual
summary portions beneath the headings forming a next layer of
prompts. Alternatively, only the main categories can
be rendered as speech; accessing the IVR choices then would cause
the system to "read" the record portions associated with the
heading serially. It is noted that, while the ComResponse system
employs one type of text-to-speech conversion, any suitable one may
be employed.
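The two-layer prompt structure described above can be sketched as building a menu tree from the category headings and the records beneath them; the key scheme ("1", then "11", "12", and so on) and all names are assumptions:

```python
# Illustrative two-layer IVR prompt tree: categories form the first
# layer of prompts, and the clips under each heading form the second.

def build_ivr_menu(categories):
    """Build (key, category, [(subkey, clip), ...]) entries for an
    IVR menu from a mapping of category headings to clip lists."""
    menu = []
    for i, (category, clips) in enumerate(categories.items(), start=1):
        subitems = [(f"{i}{j}", clip)
                    for j, clip in enumerate(clips, start=1)]
        menu.append((str(i), category, subitems))
    return menu
```

In the read-serially alternative, the second layer would be omitted and the clips under a selected heading played back in order instead.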
[0085] FIG. 17 illustrates schematically the rendering of the
category menu as speech. In the example illustrated, "Action Item,"
"Decisions," and "Summaries" have been converted to speech. If
there are choices below each category, as represented by the trees
1702, 1704, 1706, these, too, may be identified for future
access.
[0086] Accessing such a menu is illustrated with reference to the
flowchart of FIG. 16B. In step 1650, the accessing party dials in
to an access telephone number and enters any appropriate access
codes, etc. Once the accessing party has obtained access to the
system, he can select a conference for review, for example via an
interactive voice menu. In step 1654, the system delivers or
presents the voice menu associated with the conference that has been
rendered as discussed above. Finally, in step 1656, the accessing
party can access the desired summary portion or portions by
selecting the category or otherwise navigating the IVR menu. In
certain embodiments, the IVR menu may also give the user the option
of keying in a relevance probability in a manner similar to that
described above.
[0087] FIG. 18 illustrates signaling for accessing a collaboration
summarization by IVR according to an embodiment of the present
invention. Shown are a server 104, gateway 116, and PSTN 118. In
the example illustrated, the accessing party accesses the system
via the PSTN 118; for example, the user could access the system via
a landline analog or digital telephone or a cellular or wireless
telephone. At 1802, the user calls in, typically using a central
access telephone number. The call request is received at the
gateway 116 and a connection is made to the server 104 at 1804. In
the embodiment illustrated, the network is a SIP network, so the
exchange 1804 includes the SIP INVITE/RINGING/OK sequence. At 1806,
the server 104 opens a media channel to the gateway 116. At 1808,
the user enters a personal identification number (PIN), which is
received by the server via the gateway 116. In response, at 1810,
the server 104 accesses the user's account and presents an IVR menu
at 1812. For example, a list of conferences stored in summary form
could be provided. At 1814, the user can select a particular
conference either by keying in one or more digits or by speaking a
choice selection. In response, at 1816, the server 104 accesses the
conference. At 1818, the server 104 delivers the IVR menu of
conference summarization categories, as described above. At 1820,
the user can select the appropriate category. Finally, at 1822, the
summary portion selected is delivered as voice.
[0088] The invention described in the above detailed description is
not intended to be limited to the specific form set forth herein,
but is intended to cover such alternatives, modifications and
equivalents as can reasonably be included within the spirit and
scope of the appended claims.
* * * * *