U.S. patent application number 10/325061 was filed with the patent office on 2002-12-20 and published on 2004-06-24 for a system and method for annotating multi-modal characteristics in multimedia documents.
Invention is credited to Adams, Hugh W. JR.; Iyengar, Giridharan; Lin, Ching-Yung; Neti, Chalapathy V.; Smith, John R.; Tseng, Belle L.
Application Number | 20040123231 10/325061 |
Family ID | 32593641 |
Filed Date | 2002-12-20 |
United States Patent Application | 20040123231 |
Kind Code | A1 |
Adams, Hugh W. JR.; et al. | June 24, 2004 |
System and method for annotating multi-modal characteristics in
multimedia documents
Abstract
A manual annotation system of multi-modal characteristics in
multimedia files. There is provided an arrangement for selecting an
observation modality of video with audio, video without audio,
audio with video, or audio without video, to be used to annotate
multimedia content. While annotating video or audio features in
isolation results in less confidence in the identification of the
features, observing both audio and video simultaneously and
annotating that observation results in a higher confidence
level.
Inventors: |
Adams, Hugh W. JR. (Wappingers Falls, NY);
Iyengar, Giridharan (Mahopac, NY);
Lin, Ching-Yung (Forest Hills, NY);
Neti, Chalapathy V. (Yorktown Heights, NY);
Smith, John R. (New Hyde Park, NY);
Tseng, Belle L. (Forest Hills, NY) |
Correspondence Address: |
FERENCE & ASSOCIATES
400 BROAD STREET
PITTSBURGH, PA 15143
US |
Family ID: | 32593641 |
Appl. No.: | 10/325061 |
Filed: | December 20, 2002 |
Current U.S. Class: | 715/202; 707/E17.009; 715/230 |
Current CPC Class: | G06F 16/48 20190101; G06F 40/169 20200101 |
Class at Publication: | 715/500.1; 715/512 |
International Class: | G06F 017/21 |
Claims
What is claimed is:
1. An apparatus for managing multimedia content, said apparatus
comprising: an arrangement for supplying multimedia content; an
input interface for permitting the selection, for observation, of
at least one of the following modes associated with the multimedia
content: an audio portion that includes video; and a video portion
that includes audio; and an arrangement for annotating observations
of a selected mode.
2. The apparatus according to claim 1, wherein said input interface
permits the selection, for observation, of both of the following
associated with the multimedia content: an audio portion that
includes video; and a video portion that includes audio.
3. The apparatus according to claim 1, wherein said input interface
additionally permits the selection, for observation, of solely a
video portion of multimedia content.
4. The apparatus according to claim 1, wherein said input interface
additionally permits the selection, for observation, of solely an
audio portion of multimedia content.
5. The apparatus according to claim 1, wherein said arrangement for
supplying multimedia content comprises a working memory which
stores multimedia files.
6. The apparatus according to claim 1, wherein said input interface
is adapted to: first permit the selection of a multimedia file and
then permit the selection of said at least one of: an audio portion
simultaneously with video; and a video portion simultaneously with
audio.
7. The apparatus according to claim 1, further comprising a working
memory for saving the annotated observations of a selected
mode.
8. The apparatus according to claim 1, wherein said input interface
is adapted to permit the selection, for observation, of at least the
following mode associated with the multimedia content: a video
portion that includes audio.
9. The apparatus according to claim 8, wherein said input interface
comprises: an arrangement for permitting the selection, for
observation, of a video mode of multimedia content; and an
arrangement for selectably adding audio to the video mode for
observation.
10. A method of managing multimedia content, said method comprising
the steps of: supplying multimedia content; permitting the
selection, for observation, of at least one of the following modes
associated with the multimedia content: an audio portion that
includes video; and a video portion that includes audio; and
annotating observations of a selected mode.
11. The method according to claim 10, wherein said step of
permitting selection comprises permitting the selection, for
observation, of both of the following associated with the
multimedia content: an audio portion that includes video; and a
video portion that includes audio.
12. The method according to claim 10, wherein said step of
permitting selection additionally comprises permitting the
selection, for observation, of solely a video portion
of multimedia content.
13. The method according to claim 10, wherein said step of permitting
selection comprises permitting the selection, for observation, of
solely an audio portion of multimedia content.
14. The method according to claim 10, wherein said step of
supplying multimedia content comprises providing a working memory
which stores multimedia files.
15. The method according to claim 10, wherein said step of
permitting selection comprises: first permitting the selection of a
multimedia file and then permitting the selection of said at least
one of: an audio portion simultaneously with video; and a video
portion simultaneously with audio.
16. The method according to claim 10, further comprising the step
of providing a working memory for saving the annotated observations
of a selected mode.
17. The method according to claim 10, wherein said step of
permitting selection comprises permitting the selection, for
observation, of at least the following mode associated with the
multimedia content: a video portion that includes audio.
18. The method according to claim 17, wherein said step of
permitting selection comprises: permitting the selection, for
observation, of a video mode of multimedia content; and thereafter
enabling the addition of audio to the video mode for
observation.
19. A program storage device readable by machine, tangibly
embodying a program of instructions executable by the machine to
perform method steps for managing multimedia content, said method
comprising the steps of: supplying multimedia content; permitting
the selection, for observation, of at least one of the following
modes associated with the multimedia content: an audio portion that
includes video; and a video portion that includes audio; and
annotating observations of a selected mode.
Description
FIELD OF THE INVENTION
[0001] The present invention relates to the computer processing of
multimedia files. More specifically, the present invention relates
to the manual annotation of multi-modal events, objects, scenes,
and audio occurring in multimedia files.
BACKGROUND OF THE INVENTION
[0002] Multimedia content is becoming more common both on the World
Wide Web and local computers. As the corpus of multimedia content
increases, the indexing of features within the content becomes more
and more important. Observing both audio and video simultaneously
and annotating that observation results in a higher confidence
level.
[0003] Existing multimedia tools provide capabilities to annotate
either audio or video separately, but not as a whole. (An example
of a video-only annotation tool is the IBM MPEG7 Annotation Tool,
inventors J. Smith et al., available through
[http://]www.alphaworks.ibm.com/tech/videoannex. Other
conventional arrangements are described in: Park et al.,
"iMEDIA-CAT: Intelligent Media Content Annotation Tool", Proc.
International Conference on Inductive Modeling (ICIM) 2001, South
Korea, November 2001; and Minka et al., "Interactive Learning using
a Society of Models," Pattern Recognition, Vol. 30, pp. 565, 1997,
TR #349.)
[0004] It has long been recognized that annotating video or audio
features in isolation results in less confidence in the
identification of the features.
[0005] In view of the foregoing, a need has been recognized in
connection with providing improved systems and methods for
observing and annotating multi-modal events, objects, scenes, and
audio occurring in multimedia files.
SUMMARY OF THE INVENTION
[0006] In accordance with at least one presently preferred
embodiment of the present invention, there are broadly contemplated
multimedia annotation systems and methods that permit users to
observe solely video, video with audio, solely audio, or audio with
video and to annotate what has been observed.
[0007] In one embodiment, there is provided a computer system which
has one or more multimedia files that are stored in a working
memory. The multi-modal annotation process displays a user-selected
multimedia file, permits the selection of a mode or modes to
observe the file content, annotates the observations, and saves the
annotations in a working memory (such as an MPEG-7 XML file).
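The process of paragraph [0007] can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation described in the application: the names `ObservationMode`, `Annotation`, and `to_xml`, the file name `news.mpg`, and the XML element names are all hypothetical, and the output is a simplified XML layout rather than the MPEG-7 schema.

```python
from dataclasses import dataclass, field
from enum import Enum
import xml.etree.ElementTree as ET


class ObservationMode(Enum):
    """The four observation modalities named in the summary."""
    VIDEO_ONLY = "video without audio"
    VIDEO_WITH_AUDIO = "video with audio"
    AUDIO_ONLY = "audio without video"
    AUDIO_WITH_VIDEO = "audio with video"


@dataclass
class Annotation:
    """One annotated observation (hypothetical record layout)."""
    media_file: str
    mode: ObservationMode
    labels: list = field(default_factory=list)


def to_xml(annotations):
    """Serialize annotations to a simplified XML string.

    The element names are placeholders, not the MPEG-7 schema.
    """
    root = ET.Element("Annotations")
    for ann in annotations:
        node = ET.SubElement(root, "Annotation",
                             file=ann.media_file, mode=ann.mode.value)
        for label in ann.labels:
            ET.SubElement(node, "Label").text = label
    return ET.tostring(root, encoding="unicode")


# The user selects a file, selects a mode, annotates, and saves.
ann = Annotation("news.mpg", ObservationMode.VIDEO_WITH_AUDIO, ["explosion"])
xml_text = to_xml([ann])
```

Storing the selected mode alongside each annotation is an assumption of this sketch; it would let a later reader of the annotations tell which observations were made with both audio and video present.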
[0008] In summary, one aspect of the invention provides an
apparatus for managing multimedia content, the apparatus
comprising: an arrangement for supplying multimedia content; an
input interface for permitting the selection, for observation, of
at least one of the following modes associated with the multimedia
content: an audio portion that includes video; and a video portion
that includes audio; and an arrangement for annotating observations
of a selected mode.
[0009] A further aspect of the invention provides a method of
managing multimedia content, the method comprising the steps of:
supplying multimedia content; permitting the selection, for
observation, of at least one of the following modes associated with
the multimedia content: an audio portion that includes video; and a
video portion that includes audio; and annotating observations of a
selected mode.
[0010] Furthermore, an additional aspect of the invention provides
a program storage device readable by machine, tangibly embodying a
program of instructions executable by the machine to perform method
steps for managing multimedia content, the method comprising the
steps of: supplying multimedia content; permitting the selection,
for observation, of at least one of the following modes associated
with the multimedia content: an audio portion that includes video;
and a video portion that includes audio; and annotating
observations of a selected mode.
[0011] For a better understanding of the present invention,
together with other and further features and advantages thereof,
reference is made to the following description, taken in
conjunction with the accompanying drawings, and the scope of the
invention will be pointed out in the appended claims.
BRIEF DESCRIPTION OF THE DRAWINGS
[0012] FIG. 1 is a block diagram depicting a multi-modal annotation
system.
[0013] FIG. 2 is an illustration of a system annotating video
scenes, objects, and events.
[0014] FIG. 3 is an illustration of a system annotating audio with
video.
[0015] FIG. 4 is an illustration of a system annotating audio
without video.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
[0016] FIG. 1 is a block diagram of one preferred embodiment of a
multi-modal annotation system in accordance with the present
invention. The multimedia content and previous annotations are
stored on the storage medium 100. When a user 130 selects a
multimedia file via the annotation tool from the storage medium
100, it is loaded into working memory 110 and portions of it
displayed in the annotation tool 120. At any time, the user 130 may
also request that previously saved annotations associated with the
current multi-modal file be loaded from the storage medium 100 into
working memory 110. The user 130 views the multimedia data by
making requests through the annotation tool 120. The user 130 then
annotates his observations and the annotation tool 120 saves these
annotations in working memory 110. The user can at any time request
the annotation tool 120 to save the annotation on the storage
medium 100.
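The data flow of FIG. 1 between the storage medium 100, the working memory 110, and the annotation tool 120 might be modeled as in the sketch below. Everything here is an assumption made for illustration: the class `AnnotationTool`, its method names, the JSON sidecar file, and the placeholder file `clip.mpg` do not come from the application, which does not specify a file layout.

```python
import json
import tempfile
from pathlib import Path


class AnnotationTool:
    """Hypothetical stand-in for annotation tool 120."""

    def __init__(self, storage_dir):
        self.storage = Path(storage_dir)  # stands in for storage medium 100
        self.memory = {}                  # stands in for working memory 110

    def _annotation_path(self, media_name):
        return self.storage / (media_name + ".annotations.json")

    def open_media(self, media_name):
        """Load the selected multimedia file into working memory."""
        self.memory["media_name"] = media_name
        self.memory["media"] = (self.storage / media_name).read_bytes()
        self.memory["annotations"] = []

    def load_annotations(self):
        """Load previously saved annotations for the current file, if any."""
        path = self._annotation_path(self.memory["media_name"])
        if path.exists():
            self.memory["annotations"] = json.loads(path.read_text())

    def annotate(self, note):
        """Record an observation in working memory only."""
        self.memory["annotations"].append(note)

    def save_annotations(self):
        """Write the working-memory annotations to the storage medium."""
        path = self._annotation_path(self.memory["media_name"])
        path.write_text(json.dumps(self.memory["annotations"]))


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "clip.mpg").write_bytes(b"\x00\x01")  # placeholder media
    tool = AnnotationTool(tmp)
    tool.open_media("clip.mpg")
    tool.annotate({"shot": 1, "label": "outdoors"})
    tool.save_annotations()

    # A later session reloads the saved annotations on request.
    reopened = AnnotationTool(tmp)
    reopened.open_media("clip.mpg")
    reopened.load_annotations()
    restored = reopened.memory["annotations"]
```

Keeping `annotate` and `save_annotations` separate mirrors the description, in which annotations accumulate in working memory and are written to the storage medium only when the user requests it.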
[0017] FIG. 2 is an illustration of a system annotating video
scenes, objects, and events. (Simultaneous reference should also be
made to FIG. 1.) The multimedia data has been loaded from the
storage medium 100 into working memory 110. A video tab 290 has
been selected. The multimedia video has been segmented using scene
change detection into shots. A shot list window 200 displays a
portion of the shots in the multimedia. Here, the user 130 has
selected a shot 210 which is highlighted in the shot list window
200. A key frame 220, which is a representative frame among the
frames of a shot, is preferably displayed. In addition, the frames of
the shot may be viewed in the video window 230 using play controls 240.
The video can be viewed with or without audio depending upon the
selection of a mute button 250. The user 130 may select annotations
for this shot by clicking the boxes in events 260, static scenes
270, or key objects 280 lists of boxes. Any significant
observations which are not contained in the check boxes can be
noted in a keywords text box 300.
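As an illustration of the per-shot annotation state described for FIG. 2, the following sketch keeps one set per check-box list plus a free-text keywords field. The class `ShotAnnotation`, its field names, and the sample labels are hypothetical; recording the mute state as an observation modality is likewise an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class ShotAnnotation:
    """Annotation state for one shot (hypothetical layout)."""
    shot_id: int
    muted: bool = False                             # mute button 250
    events: set = field(default_factory=set)        # events list 260
    static_scenes: set = field(default_factory=set) # static scenes list 270
    key_objects: set = field(default_factory=set)   # key objects list 280
    keywords: str = ""                              # keywords text box 300

    def toggle(self, category, label):
        """Check or uncheck one box in the named list."""
        boxes = getattr(self, category)
        if label in boxes:
            boxes.remove(label)
        else:
            boxes.add(label)

    @property
    def modality(self):
        """Observation modality implied by the mute button."""
        return "video without audio" if self.muted else "video with audio"


shot = ShotAnnotation(shot_id=210)
shot.toggle("events", "explosion")
shot.toggle("key_objects", "car")
shot.toggle("key_objects", "car")  # a second click clears the box
shot.keywords = "smoke plume"
```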
[0018] FIG. 3 is an illustration of the system annotating audio
with video. (Simultaneous reference should also be made to FIG. 1.)
The multimedia data has been loaded from the storage medium 100
into working memory 110. The audio with video tab 370 has been
selected. The multimedia video has been segmented using scene
change detection into shots. The shot list window 200 displays a
portion of the shots in the multimedia. The shot 210 associated
with the current audio position is highlighted in the shot list
window 200. The audio data is displayed in the window 390. A
segment of audio 340 has been delimited for annotation; that is,
the limits or bounds of the audio have been fixed for subsequent
annotation. The video associated with the audio is shown in the
video window 230. As
the user 130 uses the play controls 360, the audio data display 390
is updated to display the current audio data and the video window
230 changes to reflect the current video frame. Thus, the user 130
may observe the video and simultaneously hear the audio while
making audio annotations. The user 130 preferably uses the buttons
350 to delimit audio segments. Check boxes corresponding to the
foreground sounds (320) (the most prominent sounds in the segment)
and background sounds (330) (sounds which are present but are
secondary to other sounds) may be checked to indicate sounds heard
within the audio segment 340. Any significant observations which
are not contained in the check boxes can be noted in keywords text
box 300.
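One way to picture the audio-with-video mode of FIG. 3 is a delimited audio segment carrying foreground and background labels, together with a lookup that finds the shot containing the current audio position so that it can be highlighted. The sketch below assumes that shots are represented by a sorted list of start times in seconds; the names `AudioSegment` and `shot_at`, the boundary values, and the sound labels are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class AudioSegment:
    """A delimited span of audio with its checked sound labels."""
    start: float
    end: float
    foreground: set = field(default_factory=set)  # check boxes 320
    background: set = field(default_factory=set)  # check boxes 330


def shot_at(shot_starts, position):
    """Return the index of the shot containing the playback position.

    shot_starts is a sorted list of shot start times in seconds,
    beginning with 0.0.
    """
    return bisect_right(shot_starts, position) - 1


shot_starts = [0.0, 4.2, 9.8, 15.0]  # made-up shot boundaries
segment = AudioSegment(start=5.0, end=8.5)
segment.foreground.add("speech")
segment.background.add("music")
current_shot = shot_at(shot_starts, segment.start)
```

Using a binary search over the shot start times keeps the highlighted shot in step with playback without scanning the whole shot list on every update.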
[0019] FIG. 4 is an illustration of the system annotating audio
without video. (Simultaneous reference should be made to FIG. 1.)
The multimedia data has been loaded from the storage medium 100
into working memory 110. Audio-without-video tab 400 has been
selected. The audio data is displayed in the window 390. A segment
of audio 340 has been delimited for annotation. As the user 130
uses the play controls 360, the audio data display 390 is updated
to display the current audio data. Thus, the user 130 hears only
the audio while making audio annotations. The user 130 uses the
buttons 350 to delimit audio segments. The check boxes for
foreground sounds 320 and background sounds 330 may be checked to
indicate sounds heard within the audio segment 340. Any significant
observations which are not contained in the check boxes can be
noted in the keywords text box 300.
[0020] It is to be understood that the present invention, in
accordance with at least one presently preferred embodiment,
includes an arrangement for supplying multimedia content, an input
interface for permitting the selection, for observation, of a mode
associated with the multimedia content, and an arrangement for
annotating observations of a selected mode. Together, these
elements may be implemented on at least one general-purpose
computer running suitable software programs. These may also be
implemented on at least one Integrated Circuit or part of at least
one Integrated Circuit. Thus, it is to be understood that the
invention may be implemented in hardware, software, or a
combination of both.
[0021] If not otherwise stated herein, it is to be assumed that all
patents, patent applications, patent publications and other
publications (including web-based publications) mentioned and cited
herein are hereby fully incorporated by reference herein as if set
forth in their entirety herein.
[0022] Although illustrative embodiments of the present invention
have been described herein with reference to the accompanying
drawings, it is to be understood that the invention is not limited
to those precise embodiments, and that various other changes and
modifications may be effected therein by one skilled in the art
without departing from the scope or spirit of the invention.
* * * * *