U.S. patent application number 14/860414 was filed with the patent office on 2015-09-21 and published on 2016-01-14 for rapid transcription by dispersing segments of source material to a plurality of transcribing stations.
The applicant listed for this patent is TIGERFISH. Invention is credited to Adam Michael GOLDBERG.
Application Number | 20160012821 14/860414 |
Family ID | 40075546 |
Publication Date | 2016-01-14 |
United States Patent Application | 20160012821 |
Kind Code | A1 |
GOLDBERG; Adam Michael | January 14, 2016 |
RAPID TRANSCRIPTION BY DISPERSING SEGMENTS OF SOURCE MATERIAL TO A
PLURALITY OF TRANSCRIBING STATIONS
Abstract
A method and system for producing and working with transcripts
according to the invention eliminates time inefficiencies. By
dispersing a source recording to a transcription team in small
segments, so that team members transcribe segments in parallel, a
rapid transcription process delivers a fully edited transcript
within minutes. Clients can view accurate, grammatically correct,
proofread and fact-checked documents that shadow live proceedings
by mere minutes. The rapid transcript includes time coding, speaker
identification and summary. A viewer application allows a client to
view a video recording side-by-side with a transcript. Clicking on
a word in the transcript locates the corresponding recorded
content; advancing a recording to a particular point locates and
displays the corresponding spot in the transcript. The recording is
viewed using common video features, and may be downloaded. The
client can edit the transcript and insert comments. Any number of
colleagues can view and edit simultaneously.
Inventors: | GOLDBERG; Adam Michael (San Francisco, CA) |
Applicant: |
Name | City | State | Country | Type |
TIGERFISH | San Francisco | CA | US | |
Family ID: | 40075546 |
Appl. No.: | 14/860414 |
Filed: | September 21, 2015 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
13664353 | Oct 30, 2012 | 9141938 |
14860414 | | |
12127635 | May 27, 2008 | 8306816 |
13664353 | | |
60940197 | May 25, 2007 | |
Current U.S. Class: | 704/235 |
Current CPC Class: | G10L 15/01 20130101; G10L 15/26 20130101; G06F 40/169 20200101; G10L 15/30 20130101; G06F 40/18 20200101; G06Q 10/10 20130101 |
International Class: | G10L 15/26 20060101 G10L015/26; G10L 15/30 20060101 G10L015/30; G10L 15/01 20060101 G10L015/01; G06F 17/24 20060101 G06F017/24 |
Claims
1. A computer-implemented method for rapidly producing a transcript
of source material, the method comprising: receiving, at a provider
server, a real-time feed of source material from an event at a
remote location; automatically dividing the source material into
segments as the source material is received, wherein each of the
segments is a predetermined length; saving each of the segments as
a discrete file; allowing a transcriber to claim one or more
discrete files for transcription; dispersing the one or more
discrete files to a transcribing station associated with the
transcriber; receiving, at the provider server, an edited
transcript for each of the one or more discrete files; producing a
single composite transcript of the source material from the edited
transcripts.
2. The method of claim 1, further comprising: synchronizing the
composite transcript and the source material by: assigning
incremental time stamps to the source material; time-coding
instances within the composite transcript that correspond to the
source material; and associating the time codes with the
incremental time stamps by means of a table.
3. The method of claim 2, wherein the instances are written words
in the composite transcript.
4. The method of claim 1, further comprising: recognizing one or
more speakers whose utterances compose the source material; and
associating each word in the composite transcript with a particular
speaker.
5. The method of claim 1, further comprising: preparing a summary
of notable moments from the source material, wherein the summary
includes a link to an audio or video segment drawn from the source
material for each notable moment.
6. The method of claim 5, wherein the composite transcript and the
summary are simultaneously viewable within a viewer window that is
accessible by a client.
7. The method of claim 1, further comprising: delivering the
composite transcript to a client by transmitting the composite
transcript via email or making the composite transcript accessible
through a secure web page.
8. The method of claim 1, further comprising: encoding the source
material into a compressed media format; and specifying a filename
for each discrete file that includes a unique alphanumeric code
that identifies client, the event, date of the event, and time of
the event.
9. The method of claim 1, wherein the transcribing station includes
a computing device that is communicatively coupled to the provider
server.
10. The method of claim 1, wherein the source material is recorded
by an audio or video feed and transmitted across a network from a
remote location.
11. A computer-implemented method for constructing a synchronized
transcript of source material, the method comprising: receiving, at
a provider server, a plurality of edited transcripts, wherein each
edited transcript corresponds to a distinct segment of source
material from a single event, and wherein transcription is
performed by a plurality of transcribers, each of whom transcribes
and transmits at least one edited transcript; producing, from the
edited transcripts, a single transcript that is synchronized to the
source material, wherein the synchronized transcript can be used to
navigate the source material and the source material can be used to
navigate the synchronized transcript; presenting the synchronized
transcript and the source material within a client-accessible
viewer window, which includes elements by which the client
interacts with and modifies the synchronized transcript.
12. The method of claim 11, wherein the viewer window allows the
client to view the source material side-by-side with the
synchronized transcript and simultaneously follow progress of
both.
13. The method of claim 12, wherein clicking on a word in the
synchronized transcript locates a corresponding portion of the
source material, and wherein advancing the source material to a
particular location identifies a corresponding portion of the
synchronized transcript.
14. The method of claim 11, wherein the client is able to edit the
synchronized transcript and insert comments directly within the
viewer window.
15. The method of claim 14, wherein the client and at least one
other individual are able to view and edit the synchronized
transcript simultaneously.
16. A computer-implemented method for synchronizing source material
and a transcript of the source material, the method comprising:
automatically assigning incremental time stamps to the source
material, wherein the time stamps are incremented by a
predetermined, fixed interval, and wherein the time stamps
represent run time of the source material; allowing a transcriber
to manually assign time codes to portions of the transcript that
mark events within the source material; creating a tabular
structure that associates words in the transcript with the time
codes in a one-to-one relationship; and storing the tabular
structure in a database, wherein the transcript can be
reconstituted by displaying the words in the tabular structure in
the correct order.
17. The method of claim 16, wherein the time codes are embedded
within the transcript.
18. The method of claim 16, wherein a plurality of words are
identified by a time code range.
19. The method of claim 16, further comprising: allowing a client
to select the transcript for viewing, editing, or both; and
reconstituting the transcript by displaying the words in the
tabular structure in the correct order according to the time
codes.
20. The method of claim 16, wherein the tabular structure is
created using a spreadsheet application that splits the transcript
into cells so that each word is matched to a time code.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation of U.S. patent
application Ser. No. 13/664,353, filed Oct. 30, 2012, which is a
divisional application of U.S. patent application Ser. No.
12/127,635, filed May 27, 2008, now U.S. Pat. No. 8,306,816, which
claims benefit of U.S. Provisional Patent Application Ser. No.
60/940,197, filed May 25, 2007, all of which are incorporated
herein in their entirety by this reference thereto.
BACKGROUND OF THE INVENTION
[0002] 1. Field of the Invention
[0003] The invention generally relates to transcription of spoken
source material. More particularly the invention is directed to a
method and system for rapid transcription.
[0004] 2. Technical Background
[0005] Transcription is the rendering of the spoken word into
written form. While this process goes back to the earliest examples
of written poetry and narrative, this discussion will focus on the
modern practice of transcription in the various disciplines that
have need of it.
[0006] Thomas Edison invented the phonograph for the purpose of
recording and reproducing the human voice, using a tinfoil-covered
cylinder, in 1877. When Edison listed ten uses for his invention,
the top item was "Letter writing and all kinds of dictation without
the aid of a stenographer." Prior to this, a reporter would have to
rely on contemporaneous notes, and the business world on secretaries
trained to take shorthand and type up the notes later. Edison's
invention created the possibility that something said in one
location could later be transcribed elsewhere, with the additional
benefit of repeated listening for greater accuracy. Since then,
advances in the field of transcription have been closely tied to
the development of recording technology.
[0007] By the 1930's, machines specifically designed for dictation
and playback had become ubiquitous in American offices. Gradually
cylinder-based machines gave way to tape, but until the 1990's the
practice of transcription still required the physical delivery of
recorded media from the location of recording to the location of
transcription.
[0008] In the early 1990's, practitioners began to recognize and
make use of the potential of the Internet and email in the practice
of transcription. Whereas previously a transcript needed to be
printed and delivered to a client, Internet email made it possible
to simply attach a document in electronic form to an email message.
Additionally, as tape recordings began to be replaced by digital
media and businesses became more sophisticated in their use of the
Internet, recordings destined for transcription could be uploaded
to a secure web site and then downloaded by the transcriber.
[0009] In spite of these technological advances that have greatly
eased the receipt and delivery of transcription materials,
transcription of speech remains a cumbersome process that is of
limited utility to clients for at least two reasons. The first
reason is the amount of time required to transcribe speech into
written form; the second has to do with the ability of clients to
coordinate the original speech with the completed transcripts.
[0010] Transcription relies on the abilities of a trained
professional to listen carefully, understand what is being said,
and accurately transfer the content and nuance to the written page.
To do this well requires a great deal of time. Digital recording
and electronic communication have accelerated the transmission of
recordings and delivery of transcripts, but a skilled transcriber
still requires at least several hours to transcribe one hour of
recorded speech. In this era of instant communication and an
ever-accelerating need for information and materials, even this
amount of time has begun to seem a roadblock to timely business
interactions.
[0011] The second reason referred to above is the difficulty of
reconciling a written transcription with its recorded source. For
example, a documentary filmmaker may shoot twelve rolls
of interviews and have them transcribed in order to find the most
useful footage. Even though the transcripts contain time-coding
that is synchronized with the recordings, it can still be a
cumbersome, time-consuming task for the filmmaker to go back and
locate the desired footage based on the written transcript. Often,
this sort of project involves many hours of footage arriving from
different sources in various locations, thus compounding the
problem.
[0012] Accordingly, there exists a great need in the art to reduce
or eliminate the time inefficiencies imposed on clients by the
labor-intensive nature of the conventional transcription process
and the difficulty of reconciling the transcript with the
source.
SUMMARY OF THE INVENTION
[0013] A method and system for producing and working with
transcripts according to the invention eliminates the foregoing
time inefficiencies. By dispersing a source recording to a
transcription team in small segments, so that team members
transcribe segments in parallel, a rapid transcription process
delivers a fully edited transcript within minutes. Clients can view
accurate, grammatically correct, proofread and fact-checked
documents that shadow live proceedings by mere minutes. The rapid
transcript includes time coding, speaker identification and
summary. A viewer application allows a client to view a video
recording side-by-side with a transcript. Clicking on a word in the
transcript locates the corresponding recorded content; advancing a
recording to a particular point locates and displays the
corresponding spot in the transcript. The recording is viewed using
common video features, and may be downloaded. The client can edit
the transcript and insert comments. Any number of colleagues can
view and edit simultaneously.
BRIEF DESCRIPTION OF THE DRAWINGS
[0014] FIG. 1 provides a diagram of a machine in the exemplary form
of a computer system within which a set of instructions, for
causing the machine to perform any one of the methodologies
discussed herein below, may be executed;
[0015] FIG. 2 provides a schematic diagram of a method for rapid
transcription;
[0016] FIG. 3 provides a schematic diagram of a network-based
system for producing and working with transcripts;
[0017] FIG. 4 shows a main page from the system of FIG. 3;
[0018] FIG. 5 shows a matrix diagram illustrating workflow in a
rapid transcription process;
[0019] FIG. 6 shows a schematic diagram of the functional
architecture of a system for working with transcripts;
[0020] FIG. 7 provides a flow diagram of a sub-process for opening
a project;
[0021] FIG. 8 provides a flow diagram of a sub-process for making a
tagg;
[0022] FIG. 9 provides a flow diagram of a sub-process for
navigating text through video; and
[0023] FIG. 10 provides a flow diagram of a sub-process for
fine-tuning time stamps in a tagg.
DETAILED DESCRIPTION
[0024] A method and system for producing and working with
transcripts according to the invention eliminates the foregoing
time inefficiencies. By dispersing a source recording to a
transcription team in small segments, so that team members
transcribe segments in parallel, a rapid transcription process
delivers a fully edited transcript within minutes. Clients can view
accurate, grammatically correct, proofread and fact-checked
documents that shadow live proceedings by mere minutes. The rapid
transcript includes time coding, speaker identification and
summary. A viewer application allows a client to view a video
recording side-by-side with a transcript. Clicking on a word in the
transcript locates the corresponding recorded content; advancing a
recording to a particular point locates and displays the
corresponding spot in the transcript. The recording is viewed using
common video features, and may be downloaded. The client can edit
the transcript and insert comments. Any number of colleagues can
view and edit simultaneously.
[0025] Referring now to FIG. 1, shown is a diagrammatic
representation of a machine in the exemplary form of a computer
system 100 within which a set of instructions for causing the
machine to perform any one of the methodologies discussed herein
below may be executed. In alternative embodiments, the machine may
comprise a network router, a network switch, a network bridge,
a personal digital assistant (PDA), a cellular telephone, a web
appliance, or any machine capable of executing a sequence of
instructions that specify actions to be taken by that machine.
[0026] The computer system 100 includes a processor 102, a main
memory 104 and a static memory 106, which communicate with each
other via a bus 108. The computer system 100 may further include a
display unit 110, for example, a liquid crystal display (LCD) or a
cathode ray tube (CRT). The computer system 100 also includes an
alphanumeric input device 112, for example, a keyboard; a cursor
control device 114, for example, a mouse; a disk drive unit 116; a
signal generation device 118, for example, a speaker; and a network
interface device 128.
[0027] The disk drive unit 116 includes a machine-readable medium
124 on which is stored a set of executable instructions, i.e.
software, 126 embodying any one, or all, of the methodologies
described herein below. The software 126 is also shown to reside,
completely or at least partially, within the main memory 104 and/or
within the processor 102. The software 126 may further be
transmitted or received over a network 130 by means of a network
interface device 128.
[0028] In contrast to the system 100 discussed above, a different
embodiment of the invention uses logic circuitry instead of
computer-executed instructions to implement processing entities.
Depending upon the particular requirements of the application in
the areas of speed, expense, tooling costs, and the like, this
logic may be implemented by constructing an application-specific
integrated circuit (ASIC) having thousands of tiny integrated
transistors. Such an ASIC may be implemented with CMOS
(complementary metal oxide semiconductor), TTL
(transistor-transistor logic), VLSI (very large scale integration),
or another suitable construction. Other alternatives include a
digital signal processing chip (DSP), discrete circuitry (such as
resistors, capacitors, diodes, inductors, and transistors), field
programmable gate array (FPGA), programmable logic array (PLA),
programmable logic device (PLD), and the like.
[0029] It is to be understood that embodiments of this invention
may be used as or to support software programs executed upon some
form of processing core (such as the Central Processing Unit of a
computer) or otherwise implemented or realized upon or within a
machine or computer readable medium. A machine-readable medium
includes any mechanism for storing or transmitting information in a
form readable by a machine, e.g. a computer. For example, a machine
readable medium includes read-only memory (ROM); random access
memory (RAM); magnetic disk storage media; optical storage media;
flash memory devices; electrical, optical, acoustical or other form
of propagated signals, for example, carrier waves, infrared
signals, digital signals, etc.; or any other type of media suitable
for storing or transmitting information.
[0030] Turning now to FIG. 2, in overview, a method 200 for rapid
transcription may include at least one of the following steps:
Rapid Transcript:
[0031] acquire source material 202;
[0032] disperse source material in short segments to transcribers 204;
[0033] produce transcript from the transcribed segments 206;
[0034] edit, proofread and fact-check the transcript 208;
Synch:
[0035] synchronize the transcript to the source material 210; and
[0036] client accesses and interacts with the transcript via a
web-based viewer 212.
[0037] The rapid transcript process is used to create, within
minutes, a written transcript of taped or live proceedings. The
transcript is posted, in incrementally updated segments, on a web
page accessible only by the client. The client can read the text
directly from the web page, as well as download recordings of the
event and copies of the transcript in a series of continually
updated documents.
[0038] Rapid transcript is useful for creating written transcripts
of the following types of events:
[0039] financial conference calls;
[0040] basic conference calls;
[0041] interviews for television or documentary film production;
[0042] conventions and meetings;
[0043] keynotes;
[0044] breakout sessions;
[0045] panel discussion;
[0046] legal proceedings;
[0047] depositions;
[0048] hearings;
[0049] witness interviews and examinations;
[0050] transcription of broadcast for placement on internet;
[0051] candidate debates;
[0052] press conferences; and
[0053] previously recorded sessions requiring immediate transcription.
[0054] Rapid transcript employs a novel system configuration that
allows a provider to quickly disperse short segments, for example,
one minute in length, of a live event or recording to any number of
transcribers. This allows each member of a team of transcribers to
be simultaneously working on a given segment of recorded material.
Thus, if it takes a single transcriber one hour to transcribe a
fifteen-minute recording, a team of five transcribers can have the
entire recording transcribed, edited, and posted to a web site within
twenty minutes. In the case of a live event, this means that
participants in a meeting or on a conference call, for example, can
be viewing an accurate, grammatically correct, proofread and
fact-checked document that shadows the live proceedings upon which
it is based by mere minutes. This transcript includes time coding,
speaker identification, and a summary of important points. The
transcript can be delivered via email or through a secure web page,
giving clients easy access via computer or handheld device.
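By way of illustration only, the arithmetic behind the foregoing example can be sketched as follows. The function name, the one-minute segment length, and the assumptions that segments are shared evenly and that every transcriber works at the same rate are hypothetical and are not taken from this description; the time needed for editing and posting is deliberately left out.

```python
import math


def parallel_transcription_minutes(recording_min, team_size,
                                   work_min_per_recorded_min, segment_min=1.0):
    """Estimate wall-clock minutes for a team to transcribe a recording.

    Assumes (hypothetically) that segments are shared as evenly as
    possible and that every transcriber works at the same rate.
    """
    segments = math.ceil(recording_min / segment_min)
    # The busiest transcriber determines when the rough draft is complete.
    segments_per_person = math.ceil(segments / team_size)
    return segments_per_person * segment_min * work_min_per_recorded_min


# One transcriber needs 60 minutes for a 15-minute recording: a 4:1 ratio.
ratio = 60 / 15
solo = parallel_transcription_minutes(15, 1, ratio)
team = parallel_transcription_minutes(15, 5, ratio)
print(solo, team)  # 60.0 12.0
```

Under these assumptions a team of five completes the rough draft in about twelve minutes, which leaves roughly eight minutes of the twenty-minute figure for editing and posting.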
[0055] Referring now to FIG. 3, shown is a schematic diagram of a
system 300 for rapid transcribing. At a provider's location, an FTP
(file transfer protocol) server 302 receives source material 304 in
the form of audio input from any source and encodes the source
material into audio files, such as mp3 format or various other
compressed media formats. Audio may arrive via the Internet using
Voice over IP (VoIP), various streaming media, or SHOUTCAST (AOL LLC,
Dulles, Va.) software. Audio may also arrive via landline telephone
or cell phone connection.
[0056] The audio signal is converted into segments of
predetermined, configurable length, one minute, for example, by a
recording system 306, such as the VRS recording system (NCH
SOFTWARE PTY. LTD., Canberra, Australia). Each audio segment may be
saved to a source folder 308 with a filename that includes a unique
alphanumeric code identifying the client and project, followed by
the year, month, day, hour, and minute of the segment.
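One possible naming scheme of this kind is sketched below. The underscore separators, the field widths, the sample code "ACME01", and the function name are assumptions made for illustration; the description specifies only the order of the fields.

```python
from datetime import datetime


def segment_filename(project_code, segment_start, extension="mp3"):
    """Build a name: client/project code, then year, month, day, hour, minute."""
    if not project_code.isalnum():
        raise ValueError("project code must be alphanumeric")
    # Zero-padded date and time fields keep the names in chronological order.
    return f"{project_code}_{segment_start:%Y%m%d_%H%M}.{extension}"


name = segment_filename("ACME01", datetime(2008, 5, 27, 14, 3))
print(name)  # ACME01_20080527_1403.mp3
```

Because every field is zero-padded, an alphabetical listing of the source folder for a given project is also a chronological listing of its segments.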
[0057] Any number of individual, password-protected folders 312,
314 is established on the FTP server for each transcriber and
editor. A synchronization module 310 copies the audio segments from
the source folder and pastes a copy of each one into each
transcriber's FTP folder 312 and editor's FTP folder 314. In one embodiment,
SYNCTOY (MICROSOFT CORPORATION, Redmond, Wash.) fills the role of
synchronization module. When copying files from the source folder
308 to the transcriber and editor folders 312, 314, using SYNCTOY'S
`combine` synchronization method assures that files exist at both
locations, while deleted and renamed files are ignored. Using such
a synchronization method is preferred in this case because, as
explained below, during the production process, as transcribers
transcribe audio segments downloaded from the FTP server, the
segments are deleted from the transcriber's folder. The use of a
synchronization method that ignores deleted files assures that
system resources are not wasted copying files from the source
folder 308 unnecessarily.
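The effect of a synchronization method that ignores deletions can be approximated as shown below. This is not SYNCTOY's implementation; it is a minimal sketch in which a hypothetical set, `already_copied`, records which segments have been delivered, so that a segment a transcriber has deleted is never copied again. The folder and file names are likewise hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def combine_copy(source_dir, dest_dir, already_copied):
    """Copy segments not yet delivered to dest_dir; never re-copy deleted ones.

    already_copied is a set of filenames delivered on earlier passes. A file
    the transcriber has since deleted stays in the set, so it is skipped.
    """
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    delivered = []
    for src in sorted(source_dir.iterdir()):
        if src.is_file() and src.name not in already_copied:
            shutil.copy2(src, dest_dir / src.name)
            already_copied.add(src.name)
            delivered.append(src.name)
    return delivered


with tempfile.TemporaryDirectory() as tmp:
    source, folder = Path(tmp, "source"), Path(tmp, "transcriber_ns")
    source.mkdir()
    (source / "seg_0001.mp3").write_bytes(b"audio")
    seen = set()
    first_pass = combine_copy(source, folder, seen)
    (folder / "seg_0001.mp3").unlink()  # transcriber finished and deleted it
    (source / "seg_0002.mp3").write_bytes(b"audio")
    second_pass = combine_copy(source, folder, seen)
    remaining = sorted(p.name for p in folder.iterdir())
```

On the second pass only the new segment is copied; the deleted first segment does not reappear in the transcriber's folder.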
[0058] The transcriber stations 324 and editor stations 326 are
typically, but not necessarily, located off-site. A transcriber
station 324 typically constitutes a computational device programmed
to enable transcription of human speech. In one embodiment, the
computational device is programmed with transcribing software, such
as EXPRESS SCRIBE (NCH SOFTWARE PTY., LTD.). Additionally, the
transcriber station 324 includes a transcriber operating the
computational device to transcribe downloaded segments.
[0059] An editor station 326 typically constitutes a computational
device programmed to enable activities such as editing,
proofreading and fact-checking. Additionally, the editor station
includes an editor using the computational device to perform the
activities of an editor.
[0060] In FIG. 3, the transcriber stations 324 and editor stations
326 are surrounded by a dashed line for ease of representation to
indicate the similarity of their status as clients to the web
server 316 and the FTP server 302. Double-headed arrows between the
transcriber stations and editor stations and the servers are
included to illustrate the bidirectional data flow between the
clients and the servers. The dashed line is not intended to
indicate that the clients, in the form of transcriber stations 324
and/or editor stations 326, are disposed at the same location,
although they could be.
[0061] A web server 316 includes a front end 318 that manages
authentication to the main page 320--described in greater detail
below. Whenever a transcriber arrives at the main page URL, the
log-in action through a `before_filter` verifies a browser's
session user ID against those stored at the web server 316. If
verification fails, the browser is redirected to a login page. If
the browser authenticates successfully, the browser is redirected
to the main page.
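The check performed ahead of the main page can be illustrated with a Python decorator standing in for the `before_filter` mechanism. The decorator is a substitute technique chosen for this sketch, and the stored user IDs, the paths, and the tuple return values are all hypothetical.

```python
import functools

KNOWN_USER_IDS = {"ns", "ag"}  # hypothetical IDs stored at the web server


def require_login(action):
    """Run a session check before the wrapped page action."""
    @functools.wraps(action)
    def wrapper(session, *args, **kwargs):
        if session.get("user_id") not in KNOWN_USER_IDS:
            # Verification failed: send the browser to the login page.
            return ("redirect", "/login")
        return action(session, *args, **kwargs)
    return wrapper


@require_login
def main_page(session):
    return ("render", "/main")


anonymous = main_page({})
verified = main_page({"user_id": "ns"})
print(anonymous, verified)
```

Placing the check in a wrapper keeps the page action itself free of authentication logic, which is the same separation a filter provides.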
[0062] Preferably, a new, password-protected web page is created
for each Rapid Transcript project that may include: one or more
main pages 320 for use by the Transcribers, one or more append
pages 322, and one or more Client pages 330 upon which the final
edited transcript is posted.
[0063] FIG. 4 shows an exemplary main page 400. As shown in FIG. 4,
the main page 400 includes an alternating sequence of time stamps
406 and buttons 404. The main page 400 is refreshed at
regular intervals, one minute, for example. As in FIG. 4, the
initial time stamp in the main page preferably starts at 00:00:00.
Each subsequent time stamp in the time stamp sequence is
incremented by a fixed interval, for example, one minute, until the
sequence reaches the duration of the recorded source material. The
time stamps are preferably displayed in HH:MM:SS format. As shown,
time stamps 406 alternate with buttons on the page.
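The time stamp sequence described above might be generated as follows. The function name is hypothetical, and the sketch assumes that each time stamp marks the start of a segment, so that a partial final segment receives its own time stamp.

```python
def main_page_time_stamps(duration_seconds, interval_seconds=60):
    """Return HH:MM:SS labels from 00:00:00 up to the end of the recording."""
    labels = []
    for start in range(0, duration_seconds, interval_seconds):
        hours, remainder = divmod(start, 3600)
        minutes, seconds = divmod(remainder, 60)
        labels.append(f"{hours:02d}:{minutes:02d}:{seconds:02d}")
    return labels


stamps = main_page_time_stamps(duration_seconds=185)
print(stamps)  # ['00:00:00', '00:01:00', '00:02:00', '00:03:00']
```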
[0064] Clicking one of the buttons redirects the browser to an
append page 322 that is associated with the time stamp above the
button. An append page contains a text field and a text field
`submit` button. It is to be noted here that portions of an
embodiment of the present application employ a
model-view-controller development framework, for example "Ruby on
Rails", wherein the "model" is understood to constitute a data
model, the "view" constitutes a user interface and the "controller"
constitutes the application's control logic. However, other
programming approaches and/or development frameworks can be
employed to implement the system herein described and are within
the scope of the claims. The text field `submit` button, when clicked,
submits the text field input to an action which then passes the
input to a model, which, in turn, adds the input to a data
structure, such as a table, which is associated with a particular
time stamp. Transcribers "claim" a segment for transcription by
placing their initials in the field associated with a given audio
segment. In this way, a group of transcribers is able to
spontaneously organize itself for maximum efficiency.
[0065] For example, as shown in FIG. 4, the first two time stamps
bear the transcriber initials "NS", and "AG", respectively. Thus,
those two segments have been claimed by the transcribers identified
by the initials "NS" and "AG." As above, the transcriber may claim
a time stamp by clicking the `append` button for the time stamp,
which navigates the transcriber to the `append` page 322 for the
time stamp. Upon gaining access to the `append` page, the
transcriber may enter his/her initials in the text field, and click
the `submit` button, whereupon the transcriber's initials are
entered into the data structure for the time stamp and subsequently
displayed on the `append` page. In one embodiment, as shown in FIG.
4, the transcriber's initials appear above the `append` button for
the time stamp.
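A minimal sketch of a data structure that could support this claiming behavior is given below. The class and method names are hypothetical, and the convention that the first entry submitted for a time stamp holds the claimant's initials is an assumption of the sketch rather than a requirement of the system.

```python
from collections import defaultdict


class SegmentBoard:
    """Holds the text appended to each time stamp, in submission order."""

    def __init__(self):
        self._entries = defaultdict(list)

    def append(self, time_stamp, text):
        """Add submitted input to the record for the given time stamp."""
        self._entries[time_stamp].append(text)

    def claimed_by(self, time_stamp):
        """Treat the first entry for a time stamp as the claimant's initials."""
        entries = self._entries.get(time_stamp)
        return entries[0] if entries else None

    def unclaimed(self, time_stamps):
        """List the time stamps that nobody has claimed yet."""
        return [ts for ts in time_stamps if self.claimed_by(ts) is None]


board = SegmentBoard()
board.append("00:00:00", "NS")
board.append("00:01:00", "AG")
open_segments = board.unclaimed(["00:00:00", "00:01:00", "00:02:00"])
print(open_segments)  # ['00:02:00']
```

If two transcribers submit initials for the same time stamp, the earlier submission is treated as the claim under this convention.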
[0066] As previously described, recorded segments are written to
the transcriber folders 312 on the FTP server 302. In order to
download the recorded segments to a transcribing station 324, the
transcriber sets up an automated incoming FTP connection. The
transcribing software on the transcription station is then
configured to authenticate and check for new files in the
transcriber's folder 312, at regular intervals of, for example, one
minute. As new files become available, they are automatically
downloaded to the transcription station 324. Whenever the client
station downloads new files over the FTP connection, the
transcriber can choose to transcribe a new audio file. In actual
practice, the transcriber transcribes only files that the
transcriber has claimed by means of the foregoing reservation
process.
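A single polling pass of the kind described might look like the following sketch, which relies on the `nlst()` and `retrbinary()` methods of Python's standard `ftplib.FTP` class. The function names, the dictionary standing in for the local disk, and the `FakeFTP` class (included only so that the sketch runs without a server) are assumptions; `poll_transcriber_folder` shows how a real connection could be opened but is not invoked here, and the one-minute repetition is omitted.

```python
import ftplib


def download_new_segments(ftp, local_files):
    """Fetch every remote file not yet held locally; return the new names.

    ftp is any object offering the nlst() and retrbinary() methods of
    ftplib.FTP. local_files maps filename -> bytes (a stand-in for disk).
    """
    new_names = []
    for name in sorted(ftp.nlst()):
        if name in local_files:
            continue
        chunks = []
        ftp.retrbinary(f"RETR {name}", chunks.append)
        local_files[name] = b"".join(chunks)
        new_names.append(name)
    return new_names


def poll_transcriber_folder(host, user, password, folder, local_files):
    """One polling pass against a real server (not exercised in this sketch)."""
    with ftplib.FTP(host) as ftp:
        ftp.login(user, password)
        ftp.cwd(folder)
        return download_new_segments(ftp, local_files)


class FakeFTP:
    """Offline stand-in for ftplib.FTP so the sketch runs without a server."""

    def __init__(self, files):
        self.files = files

    def nlst(self):
        return list(self.files)

    def retrbinary(self, cmd, callback):
        callback(self.files[cmd.split(" ", 1)[1]])


server = FakeFTP({"seg_0001.mp3": b"a", "seg_0002.mp3": b"b"})
disk = {"seg_0001.mp3": b"a"}
fetched = download_new_segments(server, disk)
print(fetched)  # ['seg_0002.mp3']
```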
[0067] The transcriber looks for the claimed file among the
downloaded audio files and transcribes the claimed file.
[0068] After finishing the transcription, the transcriber may then
upload the transcription by the previously described process for
submitting input: he/she navigates to the main page 320, clicks the
`append` button 404 for the particular time stamp. Clicking the
`append` button navigates the transcriber to the `append` page 322
for the time stamp. The transcriber then pastes the transcript into
the text submit box for the time stamp, clicks the `submit` button
and the text of the transcript is submitted in the manner
previously described. The transcriber's browser is then
navigated to the main page 320 to claim another time stamp. In
this way, a rough draft of the source material is produced by a
team of transcribers working in parallel.
[0069] It should be noted that the Rapid Transcript process can
also make use of stenography or voice recognition software to
produce this initial rough draft.
[0070] If a staff member is observing a live meeting involving
multiple speakers, she may also log on to the main page 320, and
take notes regarding the speaker ID directly on the web page.
Transcribers can consult this speaker ID log while
transcribing.
[0071] The next step is to edit the individual segments into a
single document. An editor logs on to the project website and
copies transcribed text from one or more cells (text from more than
one cell can be copied in one swath). The text is then pasted into
a word processing program on the Editor's computer. The editor
listens to the audio segments and cross-references the transcribed
material against the recorded segments. If the material at this
point is ready for the client, the editor uploads the edited
transcript to the Client web page. If the material requires
additional fact-checking, the editor can upload the edited
transcript to a second website, where another reviews the material
and checks facts.
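Although the editor performs this step by hand in the embodiment described, the assembly of individual segments into a single document can be illustrated programmatically. In the sketch below, the function name, the bracketed time code prefix, and the sample sentences are invented for illustration only.

```python
def composite_transcript(segments):
    """Join per-segment text in time stamp order, prefixing each time code.

    segments maps an HH:MM:SS time stamp to the transcribed text for it.
    Zero-padded HH:MM:SS strings sort chronologically as plain text.
    """
    parts = []
    for time_stamp in sorted(segments):
        text = segments[time_stamp].strip()
        if text:  # skip segments that have not been transcribed yet
            parts.append(f"[{time_stamp}] {text}")
    return "\n".join(parts)


draft = composite_transcript({
    "00:01:00": "Thank you. Revenue rose this quarter. ",
    "00:00:00": "Good morning and welcome to the call.",
    "00:02:00": "",
})
print(draft)
```

Segments are ordered by time stamp rather than by order of submission, so a transcriber who finishes a later segment first does not disturb the sequence of the composite document.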
The Rapid Transcript Web Page
[0072] A client begins by logging on to the provider's
password-protected website. This brings the client to a dedicated web page
containing a list of the client's projects. More will be said about
the client page herein below. Any number of people associated with
a given project may be logged on concurrently.
[0073] The web page created for the client consists of the
transcript, with time coding, side by side with a column containing
a summary, if requested. Links to the one-minute audio or video
segments are provided, as well as a link to a recording of the
entire proceeding. As well as reading from the website, the client
is able to download the transcript in either PDF or word-processing
file formats. The transcript can also be sent by email directly
from this web page.
Translation
[0074] The rapid transcript method can be utilized in translation
as well. For a session that includes a translator, the client's web
page typically displays three main columns: transcription in the
original language, the translator's words, and the provider's own
expert translation.
[0075] Turning now to FIG. 5, shown is a matrix depicting an
exemplary workflow for the rapid transcript process. The timeline
shown in FIG. 5 is merely illustrative and is not intended to be
limiting. The following description proceeds by describing the role
of each of the parties at each point on the timeline.
[0076] Initially, before an event, a number of preparatory
activities are carried out, by the provider, the transcribers and
the editors. The provider for example may perform any of: [0077]
Creating a web page for the transcribers and the editors; [0078]
Pre-populating the web page with individual segments that
correspond to the anticipated time and length of recording; [0079]
Specifying in the page the desired transcribing style and client
keywords; [0080] Securing an audio feed and patching into the
recording computer; and [0081] Setting up a backup recording
computer.
[0082] Each of the transcribers and editors may also have a
preparatory role. For example, each may need to configure his or
her audio playback software to download recorded segments.
Additionally, each transcriber claims at least one segment.
Finally, the client, in anticipation of the event, may log onto a
password protected web page. More will be said about the client's
page below.
[0083] At minute 1, the provider records the first segment of the
speech that is to be transcribed. In one embodiment, the segment
length is set at one minute. However, segment length is a
configurable parameter and thus, the exemplary segment length of
one minute is not intended to be limiting. It should be noticed
that the segments are recorded with an overlap in order to prevent
loss of material. In one embodiment, the overlap is configured to
be five seconds in length.
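The segmenting scheme just described can be illustrated with a short
sketch. It is not the provider's implementation: the function name
`planSegments` is hypothetical, and the way the overlap is applied
(each segment after the first begins `overlap` seconds early) is an
assumption, since the text states only that segments overlap.

```javascript
// Sketch: compute start/end times (in seconds) for overlapping segments.
// The defaults of 60 s and 5 s mirror the example values in the text.
// ASSUMPTION: the overlap is applied by starting each segment after the
// first one `overlap` seconds before its nominal start.
function planSegments(totalSeconds, segmentLength = 60, overlap = 5) {
  if (segmentLength <= 0 || overlap < 0 || overlap >= segmentLength) {
    throw new RangeError("need segmentLength > 0 and 0 <= overlap < segmentLength");
  }
  const segments = [];
  for (let start = 0, n = 1; start < totalSeconds; start += segmentLength, n += 1) {
    segments.push({
      number: n,
      start: Math.max(0, start - overlap),                 // reach back into the previous segment
      end: Math.min(totalSeconds, start + segmentLength),  // clamp the final segment
    });
  }
  return segments;
}
```

For a recording of 150 seconds, this yields three segments, the second
of which starts at second 55 so that it repeats the last five seconds
of the first.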
[0084] At minute 2, the provider uploads segment 1 to the FTP site.
Transcriber 1 downloads segment 1 and begins transcribing it. The
remaining transcribers download subsequent segments as they become
available. The client may, from their web site, review the audio of
segment 1.
[0085] At minute 3, segment 2 is uploaded to the FTP site and
transcriber 2 begins transcription of segment 2.
[0086] At minute 4, the provider uploads segment 3 and transcriber
3 begins transcribing segment 3.
[0087] At minute 5, the provider uploads segment 4 to the FTP site.
Transcriber 1 completes transcription of segment 1. Transcriber 1
posts the transcript to the web page in an appropriate cell and
claims segment 6. Transcriber 4 begins transcription of segment 4.
Editor 1 begins proofreading of segment 1.
[0088] At minute 6, the provider uploads segment 5 to the FTP site
and transcriber 5 begins transcription of segment 5. Editor 1
completes proofreading of segment 1 and posts the proofread
transcription to a second web page.
[0089] At minute 7, the provider uploads segment 6 to the FTP site
and transcriber 1 begins transcription of segment 6. Transcriber 3
may complete transcribing segment 3 and post the transcript to
the web page in the appropriate cell. Transcriber 3 then claims
segment 7. Editor 1 begins proofreading segment 3. Editor 2
fact-checks segment 1 and posts to the client web page.
[0090] At minute 8, the provider uploads segment 7 to the FTP site.
Transcriber 2 may finish with segment 2 and post it to the web page
in the appropriate cell, and then claim segment 8. Transcriber 3
may begin transcribing segment 7. Transcriber 4 may complete
transcribing segment 4 and post it to the web page in the
appropriate cell. Transcriber 4 then may claim segment 9. Editor 1
completes proofreading segment 3 and posts the proofread transcript
to the second web page. Editor 1 may then begin proofreading
segment 2. The client may review the first installment of the
transcript.
[0091] At minute 9, the provider uploads segment 8 to the FTP site.
Transcriber 2 may then begin transcription of segment 8. Editor 1
typically completes segment 2 by this time and posts it to the
second web page. Editor 1 then proceeds to proofread segment 4.
Editor 2 fact-checks segment 3.
[0092] At minute 10, there may remain no further segments to
upload. Transcriber 1 completes segment 6 and posts it to the web
page in the appropriate cell. Transcriber 5 completes segment 5 and
posts the transcript to the web page in an appropriate cell. Editor
1 completes proofreading segment 4 and posts to the second web
page. Editor 1 then begins proofreading segments 5 and 6. Editor 2
fact-checks segment 2 and posts segments 2 and 3 to the client web
page.
[0093] At minute 11, Editor 1 completes proofreading segments 5 and
6. Editor 2 begins and completes fact-checking segment 4. The
second installment of the transcript is available to the client,
containing segments 1-3.
[0094] At minute 12, Editor 2 begins and completes fact-checking
segments 5 and 6. The workflow proceeds in such fashion until all
segments have been transcribed, edited, fact-checked and
reviewed by the client. The rapid transcript process may terminate,
for example, approximately five minutes after the end of a live
event with the delivery of a completed transcript. Additionally,
the client may download the audio and/or video of the entire event
from the client's project web site. Additionally, the transcript
may be downloaded as a word-processing or PDF file, or sent via
email.
[0095] As described above, the usefulness of a conventional
transcript to the client is limited because the transcript is not
synchronized to the source content. Therefore, the client must
spend a great deal of time searching through the transcript and the
source material in order to reconcile them to each other. In order
to eliminate this cumbersome chore, transcripts are synchronized to
the source material.
[0096] Synch links a written transcript to its source material. In
one embodiment, the process of linking transcript and source
material follows the rapid transcript process, either immediately
or at some unspecified future time. In another embodiment, the
synchronization process can be applied to any conventional
transcript produced from audio and/or video source material. A web
page allows a client to view a video recording, for example,
side-by-side with a transcript of the video's audio content. In one
embodiment, the client sees four main items on the screen: A video,
a transcript, a list of projects, and a "tagg" table. A tagg is a
memo of description created by the client that describes the
contents of a portion of the video. In this way, the client is
provided a means for reviewing and making decisions concerning
audio or video recording by means of a transcript which is linked
to the recording. Multiple recordings can thus be categorized and
referenced; a particular phrase can be located in the written
transcript and immediately reviewed in the recording.
[0097] Each point in the recording is synchronized to the
corresponding text in the written transcript, so that clicking on a
word in the transcript automatically locates the corresponding spot
in the recording. Conversely, advancing a recording to a particular
point automatically locates and displays the corresponding text in
the transcript. The recording can be viewed using common video
viewing features (play, stop, skip forwards and backwards, play at
various speeds, etc.), and may be downloaded. The client can read
and edit the written transcript, and insert comments. Any number of
colleagues can be using this service to work on a series of
transcripts simultaneously.
[0098] To synchronize the source material with the transcript, the
source material is generally converted to a media player file
format, if necessary. One embodiment converts source videos to the
QUICKTIME (APPLE, INC., Cupertino CA) video format. The ordinarily
skilled practitioner will also appreciate that the QUICKTIME
application can also play many other types of media files, such as
MPEG, MP3 and WAV. Thus, audio files can also be rendered in a
format compatible with the QUICKTIME player.
[0099] As previously described, one or more transcribers create a
transcript of the source material, typically using a word
processing application such as WORD (MICROSOFT CORPORATION). At the
time the transcription is created, the transcriber(s) place(s) time
code markers in the transcript. If necessary, the transcript may be
re-formatted using a text editor, such as TEXTPAD (HELIOS SOFTWARE
SOLUTIONS, Longridge, UK). Using a media transcription application
such as INQSCRIBE (INQUIRIUM, LLC, Chicago, Ill.) the media file is
combined with information about the embedded time codes in the
transcript. The transcript is then further formatted to convert
into a table wherein the time codes and the written words in the
transcript are associated with each other in a one-to-one
relationship. In one embodiment, converting the transcript to a
table is accomplished by using a spreadsheet application, such as
EXCEL (MICROSOFT CORPORATION) to format the transcript into cells
consisting of time codes and transcript words. Each cell from the
spreadsheet file then becomes a field in the database table. The
words of the transcript are associated to the appropriate segment
of source content by means of the time codes. As described below, a
project includes a database, wherein the transcript tables are
stored. When the client selects a project to view, as described
below, the transcript is reconstituted by displaying the words
stored in the database in the correct order.
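The conversion of a time-coded transcript into a table, and its later
reconstitution, might be sketched as follows. The bracketed
`[hh:mm:ss]` marker format, the row layout and the function names are
assumptions made for illustration only; the text does not specify
them, and the tools named above are not used here.

```javascript
// Sketch: turn a transcript with embedded time code markers into rows
// that pair each word with the most recent time code.
// ASSUMPTION: markers look like "[00:01:05]" and stand alone as tokens.
function transcriptToRows(text) {
  const rows = [];
  let currentTimeCode = null;
  for (const token of text.split(/\s+/).filter(Boolean)) {
    const marker = /^\[(\d{2}:\d{2}:\d{2})\]$/.exec(token);
    if (marker) {
      currentTimeCode = marker[1];   // remember the latest marker
    } else {
      rows.push({ wordId: rows.length + 1, timeCode: currentTimeCode, word: token });
    }
  }
  return rows;
}

// Rebuild the readable transcript by emitting the stored words in order.
function reconstitute(rows) {
  return [...rows]
    .sort((a, b) => a.wordId - b.wordId)
    .map((row) => row.word)
    .join(" ");
}
```

Each row here corresponds to what the text calls a field in the
database table; keeping a word ID alongside the time code is what
allows the words to be displayed again in the correct order.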
[0100] As previously described in relation to the time code
process, a web site is created for each project. The project web
site preferably includes at least one password-protected client
page 328 wherein the client is able to view the transcript.
Log In
[0101] As above, a project is created, for example, in `RUBY ON
RAILS.` As above, the project is preferably password-protected.
Authentication is preferably managed by the development platform by
components provided within the development platform, including an
authentication database and controller. A client begins by logging
on to the password-protected web site. This brings them to the
client page, 328, a detailed view 602 of which is shown in FIG. 6,
which contains a list of their projects. Any number of people
associated with a given project may be logged on concurrently.
Main Interface
[0102] Having negotiated the authentication process, the client
gains access to the main interface. In one embodiment, the main
interface includes one or more of the following sections: [0103]
Project List 606; [0104] Transcript 608; [0105] Video 610; and
[0106] Tagg table 612.
[0107] After the client has logged in, the client's web browser calls a
controller 604 from `Ruby on Rails.` The controller collects all
the HTML items and text items from the `Ruby on Rails` model and
from the `Ruby on Rails` view template. These text items are
displayed in a pre-made template on the client's browser. The
template is specified by the HTML items mentioned above. The
template specifies which text item goes to which portion of the
four portions of the main page 602 mentioned above. Then the main
page is loaded.
Opening a Project
[0108] The client opens a project 700 by selecting a project from
the list 606. This brings up the web page where, on one side of the
screen, one can view a recording 610 and, on the other side, there
is a corresponding written transcript 608. As described above, the
transcript and video are linked. That is to say, each point in the
recording corresponds to a specific place in the written
transcript. Additionally, the project's tagg table is displayed
612. Whenever a client selects a project different from the current
one being displayed, the web page replaces these four elements with
elements of the new project without the browser being reloaded. The
replacement is done by a script that hides the element that is
being replaced and sets the new element's CSS (cascading style
sheets) `visibility` attribute to "visible." All projects are
loaded. Their CSS `visibility` attribute is, by default, set to
"hidden."
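A minimal sketch of this show-and-hide replacement follows. Plain
objects with a `style` field stand in for page elements so that the
sketch runs outside a browser; the function name `showProject` and
the shape of the `projects` map are hypothetical.

```javascript
// Sketch: every project's elements are already loaded; switching projects
// only toggles the CSS `visibility` value, so the page is never reloaded.
// ASSUMPTION: `projects` maps a project ID to that project's elements
// (project list entry, transcript, video, tagg table).
function showProject(projects, selectedId) {
  for (const [id, elements] of Object.entries(projects)) {
    const visibility = id === selectedId ? "visible" : "hidden";
    for (const element of elements) {
      element.style.visibility = visibility;
    }
  }
}
```

In a browser, the same function could be applied to real DOM elements,
since they expose the same `style.visibility` property.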
Navigating Video, Text, and Taggs
[0109] An embodiment enables the linking and coordination of the
viewable recording, the written transcript, and the various
associated taggs. Each point in the recording corresponds to a
specific place in the written transcript; if taggs have been
created they are linked to specific places in both the video and
the written transcript. The client, then, has three options for
navigating the material: clicking on a word in the transcript
automatically locates the corresponding spot in the recording;
advancing a recording to a particular point automatically locates
and displays the corresponding spot in the transcript; and clicking
on a tagg will bring the user to the corresponding places in the
video and transcript, and display the contents of the tagg in a
dialogue box.
1. Navigating by Way of the Transcript
[0110] After the project is open, the client may click on any text
in the transcript. Clicking on a point in the transcript
automatically locates the corresponding spot in the recording and
plays the recording from this spot forward. This occurs because an
event called ST (Set Time) 704 in a JavaScript library 702 passes
the video ID and the time stamp, which are embedded in the text, to
the QUICKTIME player 710. The QUICKTIME player
automatically issues a CB (Call Back) indicating whether or not the
requested portion of the video has been loaded. When the very
same JavaScript library 702 receives this message it either
notifies the client through a CSS pop-up that the video is still
being downloaded and cannot be played, or it plays the video,
beginning at the spot indicated by the time stamp 706.
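The Set Time and Call Back exchange can be sketched as shown below.
The `player` object is a hypothetical stand-in: its `requestPosition`
and `playFrom` methods are invented for illustration and are not the
QUICKTIME API, and `makeFakePlayer` exists only so the sketch can be
run.

```javascript
// Sketch: pass the video ID and time stamp to the player, then act on the
// player's callback, which reports whether that portion has been loaded.
function setTime(player, videoId, timeStamp, notify) {
  player.requestPosition(videoId, timeStamp, (loaded) => {
    if (loaded) {
      player.playFrom(videoId, timeStamp);
    } else {
      notify("The video is still being downloaded and cannot be played yet.");
    }
  });
}

// Hypothetical player used only to exercise the sketch: everything up to
// `loadedUpTo` seconds counts as already downloaded.
function makeFakePlayer(loadedUpTo) {
  const log = [];
  return {
    log,
    requestPosition(videoId, timeStamp, callback) {
      callback(timeStamp <= loadedUpTo);
    },
    playFrom(videoId, timeStamp) {
      log.push(`play ${videoId} @ ${timeStamp}`);
    },
  };
}
```

In the described embodiment the notification would be a CSS pop-up;
here it is an arbitrary function supplied by the caller.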
[0111] Additionally, advancing a recording to a particular point
automatically locates and displays the corresponding spot in the
transcript.
[0112] The client can view the recording using common video viewing
features (play, stop, skip forwards and backwards, play at various
speeds, and so on).
[0113] The client can also read and edit the written transcript.
When the client mouses over the written transcript, a transparent
"balloon tip" reveals the time code in the corresponding recording.
Depending on the client's specifications, time codes may show the
elapsed time of the recording, the day and time of the original
recording, or the SMPTE (Society of Motion Picture and Television
Engineers) time code embedded in the original recording.
[0114] The transcript may also be downloaded.
2. Navigating by Way of the Video
[0115] The client can drag the play bar in the QUICKTIME player to
a desired position 900. The client then pushes a `Show Text` button
next to the QUICKTIME player. This calls a JavaScript function 902
which passes the current playtime timestamp in the QUICKTIME player
to a JavaScript library, which then evaluates the timestamp 904
(converting the hours/minutes/seconds timestamp to an integer
timestamp) and finds the word ID 906 whose timestamp is the closest
match to the current playtime timestamp 902. This JavaScript
function then highlights the text 908 whose timestamp matches the
word ID that was just found. The same JavaScript function also
passes that word ID and timestamp to a hidden form for possible
tagg creation 910. The browser displays and highlights the text
corresponding to the words spoken in the video 912.
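The timestamp conversion and closest-match lookup described above can
be sketched as follows. The function names and the shape of the word
records (`id`, `timestamp` in whole seconds) are assumptions; the text
does not say how the library stores them.

```javascript
// Sketch: convert an "hh:mm:ss" timestamp to an integer number of seconds.
function toSeconds(hms) {
  const parts = hms.split(":").map(Number);
  if (parts.length !== 3 || parts.some(Number.isNaN)) {
    throw new Error(`not an hh:mm:ss timestamp: ${hms}`);
  }
  const [hours, minutes, seconds] = parts;
  return hours * 3600 + minutes * 60 + seconds;
}

// Find the word whose timestamp lies closest to the current playtime.
// Returns null when there are no words; ties go to the earlier word.
function closestWord(words, playtime) {
  const target = toSeconds(playtime);
  let best = null;
  for (const word of words) {
    if (best === null ||
        Math.abs(word.timestamp - target) < Math.abs(best.timestamp - target)) {
      best = word;
    }
  }
  return best;
}
```

A linear scan is used for clarity. Because transcript words are
ordered by time, a binary search would serve equally well for long
transcripts.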
3. Navigating by Using Taggs
[0116] As above, a tagg is a memo of description created by the
client that describes the contents of a portion of the video. By
interacting with the tagg table, the client is able to navigate
through the video, in a similar manner to that enabled by
interacting with the transcript. The client may click on a tagg 714
whose URL calls an action `Show Text` 716 in the controller 604.
The action then calls a plurality of functions from the JavaScript
library 702: `Replace Text` 718, `Replace Video` 718, `Start/Stop`
718 and `Seek in Transcript` 718: [0117] `Replace Text` and
`Replace Video` both use the CSS `visibility` attribute to hide the
current text or video and show the desired one; [0118] `Start/Stop` uses
the default QUICKTIME API 710 (application programming interface)
to set the start time and the stop time of the video. The video is
then played immediately; [0119] The `Seek in Transcript` feature
uses the default `Set Timeout` JavaScript function and the `Focus`
function to center and highlight the text 720 relating to this time
stamp for five seconds.
[0120] The video is replaced and played, and the text is centered
and highlighted.
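The timed highlighting performed by `Seek in Transcript` might look
like the following sketch. The `highlighted` flag, the function name
and the injectable `schedule` parameter are assumptions introduced so
that the sketch can be exercised without a browser; by default it
relies on the standard `setTimeout` function.

```javascript
// Sketch: focus and highlight a transcript element, then clear the
// highlight after a fixed delay (five seconds in the described embodiment).
// ASSUMPTION: `element.highlighted` stands in for whatever styling the
// page applies; `schedule` defaults to the standard setTimeout.
function highlightTemporarily(element, durationMs = 5000, schedule = setTimeout) {
  element.highlighted = true;
  if (typeof element.focus === "function") {
    element.focus();               // bring the text into view
  }
  schedule(() => {
    element.highlighted = false;   // remove the highlight when time is up
  }, durationMs);
}
```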
Creating a Tagg
[0121] When the client finds a useful or interesting portion of the
written transcript, he or she can create a tagg. Any number of
taggs can be created by various users in a single project and they
can be cross-referenced with multiple recordings and transcripts
within a project. Taggs are created by highlighting a desired
portion of text. Once the text is highlighted, a dialogue box
appears and the client may give this tagg a title. Taggs may be of
any length. A new tagg may overlap an existing segment.
Additionally, a tagg may have a `name` attribute, which allows the
client, or another party, to assign a name to the portion of the
recorded source material to which the tagg applies.
[0122] When a new tagg is defined by the client, the program
automatically records details related to the tagg, including the
author of the segment; the date and time the segment was created;
the original recording that correlates with the segment; and the
beginning and ending time codes of the segment. [0123] MARKERS:
Once a tagg has been defined, the written transcript includes
visual markers that identify the starting and ending points of the
tagg. Markers are named according to the source recording. The
opening marker of a segment is displayed in colored text, green for
example; the closing marker is displayed in a color distinct from
that of the opening marker, red for example. [0124] COMMENTS:
Immediately upon defining a tagg, clients are prompted to write a
comment. The client may write a comment of any length relating to a
given tagg. Additionally, a client may add a comment to a
previously created tagg. Clients can also record comments about
taggs in response to other clients' previously written remarks.
This may serve as a forum for discussing materials.
[0125] When "mousing over" the beginning or ending markers, a
balloon opens with comments and identifying information.
[0126] To create a tagg, the client highlights 800, using the
mouse, a portion of the transcript. Highlighting the portion of the
transcript triggers a pair of JavaScript events 802: [0127]
`OnMouseDown` passes a Word ID, transcript ID, and a timestamp of
the first word highlighted to a hidden html form 812; and [0128]
`OnMouseUp` passes a Word ID, transcript ID, and a timestamp of the
last word highlighted to the hidden html form 812, which triggers a
third event called `FormPopUp`. [0129] The `FormPopUp` event
utilizes the CSS `visibility` attribute to unhide the hidden form
812 mentioned above, so that the client is now able to enter
desired descriptive information in this form. When he or she has
done so, the client clicks a button which submits the form to a URL
which triggers a `Create Tagg` action in a `Ruby on Rails`
controller 816. That action sends the parameters to the tagg model
818 in `Ruby on Rails`, where a validation function is triggered
that checks all the incoming data. [0130] Table: All comments
relating to any recording in a given project, along with their
identifying details as described above, are stored in the tagg
table 612. The tagg table may be viewed and edited online, or
downloaded for use in spreadsheet applications. Additionally, the
tagg table, or any portion thereof, is separately exportable. In
one embodiment, at least the export version of the tagg table is
written in a standardized markup language such as XML (eXtensible
Markup Language) so that the tagg table is compatible with video
editing programs such as FINAL CUT PRO (APPLE, INC.) or AVID (AVID
TECHNOLOGY, INC., Tewksbury, Mass.). In this way, taggs can be used
to facilitate the processes of video logging and editing.
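An export of the tagg table to XML could be sketched as follows. The
element and attribute names are invented for illustration; they are
not the interchange format of any particular video editing program,
which would have to be consulted separately.

```javascript
// Escape the five characters that XML reserves in text and attributes.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Sketch: serialize tagg rows to a simple XML document.
// ASSUMPTION: the <taggs>/<tagg> schema and the field names are hypothetical.
function taggTableToXml(taggs) {
  const rows = taggs.map((t) =>
    `  <tagg author="${escapeXml(t.author)}" recording="${escapeXml(t.recording)}"` +
    ` start="${escapeXml(t.start)}" end="${escapeXml(t.end)}">${escapeXml(t.title)}</tagg>`
  );
  return ['<?xml version="1.0" encoding="UTF-8"?>', "<taggs>", ...rows, "</taggs>"].join("\n");
}
```

The fields carried here (author, recording, beginning and ending time
codes) follow the details that the text says are recorded for each
tagg.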
[0131] The table may be sorted according to various criteria. For
example, a client may wish to view all of the material related to a
particular speaker, and next to see which videotapes this material
is on. As such, this is a valuable tool for editing, as it allows
people to organize material from various sources.
[0132] The table is linked to the transcripts. Clicking on an item
in the table will bring to view the corresponding point in the
transcript and recording.
[0133] Taggs may be edited, removed from the table perhaps to be
used later, or permanently deleted.
[0134] A client may place (or remove) a star next to her favorite
segments to assist in editing. Segments may be sorted by the
criterion of stars.
[0135] If all data is validated, the model creates a new row 826 in
the `MySQL` database table 822. The `Ruby on Rails controller` 816
calls a built in `Prototype` JavaScript event which updates the
client's browser with the new data, without reloading the
browser.
[0136] If not all data is validated, the model 818 calls the Ruby
on Rails controller 816 to send back to the client's browser an
error message, which is displayed by setting the CSS `visibility`
attribute of the error message HTML div tag to "visible."
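The tagg creation path, from the two mouse events through validation
to either a new table row or an error, can be summarized in the
sketch below. It is written in JavaScript for consistency with the
other sketches and does not reproduce the `Ruby on Rails` model; the
validation rules shown are assumptions, since the text says only that
all incoming data is checked.

```javascript
// Sketch: combine the values captured on mouse-down (first word) and
// mouse-up (last word) into the contents of the hidden form.
function buildTaggForm(mouseDown, mouseUp, title) {
  return {
    transcriptId: mouseDown.transcriptId,
    startWordId: mouseDown.wordId,
    startTime: mouseDown.timestamp,
    endWordId: mouseUp.wordId,
    endTime: mouseUp.timestamp,
    title,
  };
}

// ASSUMPTION: illustrative rules only; the actual checks are not specified.
function validateTagg(form) {
  const errors = [];
  if (!Number.isInteger(form.startWordId) || !Number.isInteger(form.endWordId)) {
    errors.push("word IDs must be integers");
  }
  if (!Number.isFinite(form.startTime) || !Number.isFinite(form.endTime)) {
    errors.push("time stamps must be numbers");
  } else if (form.endTime < form.startTime) {
    errors.push("end time precedes start time");
  }
  return errors;
}

// Append a row when the data is valid; otherwise report the errors so the
// page can display them. `table` is a plain array standing in for the
// database table.
function createTagg(table, form) {
  const errors = validateTagg(form);
  if (errors.length > 0) {
    return { ok: false, errors };
  }
  const row = { id: table.length + 1, ...form };
  table.push(row);
  return { ok: true, row };
}
```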
Fine Tuning Time Stamps of a Tagg
[0137] The client clicks 1002 on a time stamp in the tagg table
612. This triggers an `OnClick` JavaScript event that calls the
inline html JavaScript function 1004 to replace this time stamp
with an editable field. This editable field contains four
subfields, labeled Minute 1006, Second 1010, and milliseconds or
Frames 1012. The client can then manually input a number in one or
more of these subfields. The client then hits either "Enter" or "Escape"
1014.
[0138] Hitting `Enter` or clicking anywhere on the browser submits
these parameters to an action 1016 in the `Ruby on Rails`
controller 716 that passes these parameters to a model 1020 in
`Ruby on Rails` which validates this data. If the data
is validated, the model updates the entry in the MySQL database
table 1028. The action 1016 also overrides the original time code
in tagg 1018 and sends these parameters back to another action 1022
in the controller 1022 which updates 1024 the tagg in the tagg
Table in the client's browser without reloading it.
[0139] Hitting `Escape` triggers an `onkeypress` JavaScript event
which triggers a JavaScript function to cancel and remove the
editable mode and hence leave the timestamp unmodified. The
JavaScript function also restores the display mode 1014.
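The Enter and Escape behavior described above can be sketched as a
single function. Several details are assumptions: time stamps are
held as integer milliseconds, only the minute, second and millisecond
subfields named in the text are modeled, and the range checks stand
in for the unspecified server-side validation.

```javascript
// Sketch: apply or cancel an edit to a tagg time stamp.
// `original` is the stored time stamp in milliseconds (ASSUMPTION).
// `fields` holds the values typed into the subfields.
function editTimeStamp(original, fields, key) {
  if (key === "Escape") {
    return { value: original, changed: false };   // leave the time stamp unmodified
  }
  if (key !== "Enter") {
    throw new Error(`unsupported key: ${key}`);
  }
  const { minute = 0, second = 0, millisecond = 0 } = fields;
  const valid =
    [minute, second, millisecond].every((n) => Number.isInteger(n) && n >= 0) &&
    second < 60 &&
    millisecond < 1000;
  if (!valid) {
    return { value: original, changed: false, error: "invalid time stamp" };
  }
  return { value: minute * 60000 + second * 1000 + millisecond, changed: true };
}
```

Returning the original value on both cancellation and failed
validation keeps the displayed time stamp consistent with the stored
one in every case.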
[0140] In the foregoing specification, the invention has been
described with reference to specific exemplary embodiments thereof.
It will, however, be evident that various modifications and changes
may be made thereto without departing from the broader spirit and
scope of the invention as set forth in the appended claims. The
specification and drawings are, accordingly, to be regarded in an
illustrative sense rather than a restrictive sense.
* * * * *