U.S. patent application number 13/317,509 was published by the patent office on 2012-02-16 as publication number 20120039515, for a method and system for classifying a scene for each person in a video. This patent application is currently assigned to SAMSUNG ELECTRONICS CO., LTD. Invention is credited to Jin Guk Jeong, Ji Yeun Kim, Sang Kyun Kim, and San Ko.
United States Patent Application 20120039515
Kind Code: A1
Jeong; Jin Guk; et al.
Published: February 16, 2012
Application Number: 13/317,509
Family ID: 39382421
Method and system for classifying scene for each person in video
Abstract
Described is a method of classifying a scene for each person in
a video, the method including: detecting a face within input video
frames; detecting a shot change of the input video frames;
extracting a person representation frame in the shot; performing a
person clustering in the extracted person representation frame
based on time information; detecting a scene change by separating a
person portion from a background based on face extraction
information, and comparing the person portion and the background;
and merging similar clusters from the extracted person
representation frame and performing a scene clustering for each
person.
Inventors: Jeong; Jin Guk (Yongin-si, KR); Kim; Ji Yeun (Seoul, KR); Kim; Sang Kyun (Yongin-si, KR); Ko; San (Seoul, KR)
Assignee: SAMSUNG ELECTRONICS CO., LTD. (Suwon-si, KR)
Family ID: 39382421
Appl. No.: 13/317,509
Filed: October 20, 2011
Related U.S. Patent Documents

Application Number | Filing Date | Patent Number
11/882,733 | Aug 3, 2007 | 8,073,208
13/317,509 (present application) | Oct 20, 2011 | --
Current U.S. Class: 382/118
Current CPC Class: G06K 9/00718 (20130101); G06F 16/784 (20190101); G06K 9/00765 (20130101); G06K 9/00295 (20130101)
Class at Publication: 382/118
International Class: G06K 9/62 (20060101)
Foreign Application Data

Date | Code | Application Number
Jan 4, 2007 | KR | 10-2007-0000957
Claims
1. A method of classifying a scene for each person in a video, the method comprising: extracting a person representation frame in a shot; comparing a first person representation frame and a second person representation frame; performing a person clustering by extending a time window when the first person representation frame is similar to the second person representation frame; and merging similar clusters using a person cluster extracted from a representation frame and performing a scene clustering for each person based on a scene change, wherein the scene change is determined using a person portion and a background portion.
2. The method of claim 1, wherein the performing of the person clustering further comprises: receiving cluster information, the first person representation frame, and the second person representation frame to be compared; including the second person representation frame which has been currently compared in the current cluster information when the first person representation frame is similar to the second person representation frame; and setting a subsequent person representation frame as a third person representation frame to be compared on the time window.
3. The method of claim 2, further comprising: moving to a subsequent cluster when the first person representation frame and the second person representation frame are at the end of the time window; or setting the subsequent person representation frame in the time window as another person representation frame to be compared on the time window, when the first person representation frame and the second person representation frame to be compared are not at the end of the time window.
4. The method of claim 1, wherein the performing of the scene
clustering comprises: receiving time information-based clusters;
selecting two clusters having a minimum difference value; comparing
the minimum difference value and a threshold value; and merging the
two clusters when the minimum difference value is less than the
threshold value.
5. A non-transitory computer-readable recording medium storing a program for implementing a method of classifying a scene for each person in a video, the method comprising: extracting a person representation frame in a shot; comparing a first person representation frame and a second person representation frame; performing a person clustering by extending a time window when the first person representation frame is similar to the second person representation frame; and merging similar clusters using a person cluster extracted from a representation frame and performing a scene clustering for each person based on a scene change, wherein the scene change is determined using a person portion and a background portion.
6. A system for classifying a scene for each person in a video, the system comprising: a person representation frame extracting unit to extract a person representation frame in a shot; a person clustering unit to compare a first person representation frame and a second person representation frame and to perform a person clustering by extending a time window when the first person representation frame is similar to the second person representation frame; and a scene clustering unit to merge similar clusters using a person cluster extracted from a representation frame and to perform a scene clustering for each person based on a scene change, wherein the scene change is determined using a person portion and a background portion.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation of application Ser. No.
11/882,733 filed on Aug. 3, 2007, which claims the priority of
Korean Patent Application No. 10-2007-0000957, filed on Jan. 4,
2007, in the Korean Intellectual Property Office, the disclosure of
which is incorporated herein by reference.
BACKGROUND
[0002] 1. Field
[0003] The present invention relates to a method and system for
classifying a scene for each person in a video, and more
particularly, to a method and system for classifying a scene for
each person in a video based on person information and background
information in video data.
[0004] 2. Description of the Related Art
[0005] Generally, a scene is a unit of a video delimited by changes in the video content. In the conventional art, scenes are classified using low-level information such as color information or edge information.
[0006] Specifically, in a conventional automatic scene segmentation algorithm, shots are clustered using low-level information, such as color information extracted from all frames, and scene boundaries are detected from the clusters. However, when a person in a video moves or a camera moves, the low-level information changes. Accordingly, the degree of accuracy decreases.
[0007] Also, in a conventional person classification method, persons in a video are clustered using face information, and the persons are thereby classified. However, face information changes depending on pose, lighting, and the like, which results in low accuracy.
[0008] Accordingly, a method and system for classifying a scene for
each person in a video is required.
SUMMARY
[0009] An aspect of the present invention provides a method and system for classifying a scene for each person in a video which may provide a story overview for each person by classifying persons on a scene-by-scene basis using temporal information in video data.
[0010] An aspect of the present invention also provides a method and system for classifying a scene for each person in a video which may improve the accuracy of scene segmentation detection by separating a person portion from a background in video data and using information about the person portion and the background together.
[0011] According to an aspect of the present invention, there is
provided a method of classifying a scene for each person in a
video, the method including: detecting a face within input video
frames; detecting a shot change of the input video frames;
extracting a person representation frame in the shot; performing a
person clustering in the extracted person representation frame
based on time information; detecting a scene change by separating a
person portion from a background based on face extraction
information, and comparing the person portion and the background;
and merging similar clusters from the extracted person
representation frame and performing a scene clustering for each
person.
[0012] According to another aspect of the present invention, there
is provided a system for classifying a scene for each person in a
video, the system including: a face detection unit detecting a face
within input video frames; a shot change detection unit detecting a
shot change of the input video frames; a person representation
frame extraction unit extracting a person representation frame in
the shot; a person clustering unit performing a person clustering
in the extracted person representation frame based on time
information; a scene change detection unit detecting a scene change
by separating a person portion from a background based on face
extraction information and comparing the person portion and the
background; and a scene clustering unit merging similar clusters
from the extracted person representation frame and performing a
scene clustering for each person.
[0013] Additional aspects and/or advantages of the invention will
be set forth in part in the description which follows and, in part,
will be apparent from the description, or may be learned by
practice of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
[0014] The patent or application file contains at least one drawing
executed in color. Copies of this patent or patent application
publication with color drawing(s) will be provided by the Office
upon request and payment of the necessary fee.
[0015] These and/or other aspects, features, and advantages of the
invention will become apparent and more readily appreciated from
the following description of exemplary embodiments, taken in
conjunction with the accompanying drawings of which:
[0016] FIG. 1 is a block diagram illustrating a configuration of a
system for classifying a scene for each person in a video according
to an embodiment of the present invention;
[0017] FIG. 2 is a diagram illustrating an example of clothes
information and face information detected in a same time window
according to an embodiment of the present invention;
[0018] FIG. 3 is a diagram illustrating an example of performing a
clustering for each person according to an embodiment of the
present invention;
[0019] FIG. 4 is a flowchart illustrating a method of classifying a
scene for each person in a video according to another embodiment of
the present invention;
[0020] FIG. 5 is a flowchart illustrating an operation of a time
information-based person clustering illustrated in FIG. 4 according
to another embodiment of the present invention;
[0021] FIG. 6 is a flowchart illustrating an operation of a scene
change detection illustrated in FIG. 4 according to another
embodiment of the present invention; and
[0022] FIG. 7 is a flowchart illustrating an operation of a scene
clustering for each person according to another embodiment of the
present invention.
DETAILED DESCRIPTION
[0023] Reference will now be made in detail to embodiments of the
present invention, examples of which are illustrated in the
accompanying drawings, wherein like reference numerals refer to the
like elements throughout. The embodiments are described below in
order to explain the present invention by referring to the
figures.
[0024] FIG. 1 is a block diagram illustrating a configuration of a
system for classifying a scene for each person in a video according
to an embodiment of the present invention.
[0025] Referring to FIG. 1, the system 100 for classifying a scene for each person in a video includes a face detection unit 110, a shot change detection unit 120, a person representation frame extraction unit 130, a person clustering unit 140, a scene change detection unit 150, and a scene clustering unit 160.
[0026] The face detection unit 110 detects a face within input video frames. Specifically, the face detection unit 110 analyzes the input video frames and detects faces within them.
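By way of a non-limiting illustration, a minimal Python sketch of per-frame face detection follows. The disclosure does not name a particular detector, so the OpenCV Haar cascade used here is an assumption.

    import cv2

    # Stock frontal-face Haar cascade shipped with opencv-python; an
    # assumed detector, since the disclosure does not specify one.
    _CASCADE = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    def detect_faces(frame_bgr):
        """Return (x, y, w, h) boxes for faces found in one BGR frame."""
        gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
        return _CASCADE.detectMultiScale(gray, scaleFactor=1.1,
                                         minNeighbors=5)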
[0027] The shot change detection unit 120 detects a shot change
within the input video frames. Specifically, the shot change
detection unit 120 detects the shot change of the input video
frames to segment the input video frames into a shot which is a
basic unit of the video.
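A minimal sketch of shot change detection follows, assuming a frame-to-frame color histogram difference test; the disclosure does not specify how shot boundaries are detected, so the HSV histogram and the Bhattacharyya threshold here are illustrative assumptions.

    import cv2

    def detect_shot_changes(video_path, threshold=0.5):
        """Yield indices of frames that likely start a new shot, using a
        frame-to-frame HSV histogram difference."""
        cap = cv2.VideoCapture(video_path)
        prev_hist, index = None, 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
            hist = cv2.calcHist([hsv], [0, 1], None, [16, 16],
                                [0, 180, 0, 256])
            cv2.normalize(hist, hist, alpha=1.0, norm_type=cv2.NORM_L1)
            if prev_hist is not None and cv2.compareHist(
                    prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA) > threshold:
                yield index  # distance in [0, 1]; large means a cut
            prev_hist, index = hist, index + 1
        cap.release()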
[0028] The person representation frame extraction unit 130 extracts a person representation frame in the shot. Using all person frames for a person clustering is inefficient. Accordingly, after performing a clustering of the frames including a face in the shot, the person representation frame extraction unit 130 extracts, as the person representation frame, the frame which is closest to the center of each cluster, i.e., the frame having the greatest similarity to the other frames in the cluster. Specifically, the person representation frame extraction unit 130 extracts one frame from each cluster and may set each such frame as a person representation frame of the shot, since more than one person may be included in the shot.
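The following sketch illustrates this selection, assuming each face frame has already been reduced to a feature vector and assigned a cluster label; the feature representation itself is an assumption, as the disclosure does not fix one.

    import numpy as np

    def representation_frames(features, labels):
        """Pick, for each cluster, the frame whose feature vector is
        closest to the cluster centroid; one representation frame is
        returned per cluster (i.e., per person appearing in the shot)."""
        features, labels = np.asarray(features), np.asarray(labels)
        reps = []
        for c in np.unique(labels):
            members = np.where(labels == c)[0]
            center = features[members].mean(axis=0)
            dists = np.linalg.norm(features[members] - center, axis=1)
            reps.append(int(members[np.argmin(dists)]))
        return reps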
[0029] The person clustering unit 140 performs the person clustering on the extracted person representation frames based on time information. When a clustering is simply performed over all person representation frames, the algorithm may not be robust to various poses or lighting conditions. Accordingly, the person clustering unit 140 performs the person clustering using the time information so that the clustering starts from the various forms of each person. Specifically, as illustrated in FIG. 2, a single person generally wears the same clothes within a similar time period in the same video data, and such clothes information provides a clearer distinction than face information. Accordingly, the person clustering unit 140 obtains the various forms of the single person by using the clothes information.
[0030] FIG. 2 is a diagram illustrating an example of clothes
information and face information detected in a same time window
according to an embodiment of the present invention.
[0031] FIG. 2 illustrates the locations and sizes of faces 211, 221, 231, 241, and 251, automatically detected in person representation frames 210, 220, 230, 240, and 250 of a shot, and the locations and sizes of clothes 212, 222, 232, 242, and 252, extracted from the person representation frames 210, 220, 230, 240, and 250. The size of the clothes region is determined in proportion to the size of the key person in each person representation frame 210, 220, 230, 240, and 250 of the shot.
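A sketch of deriving the clothes region from a detected face box follows; the disclosure states only that the clothes size is proportional to the person's size, so the placement below the face and the exact ratios here are illustrative assumptions.

    def clothes_box(face_box, frame_shape, width_ratio=2.0, height_ratio=2.5):
        """Derive an (x, y, w, h) clothes region from an (x, y, w, h)
        face box, sized in proportion to the face and clipped to the
        frame (ratios are illustrative assumptions)."""
        x, y, w, h = face_box
        frame_h, frame_w = frame_shape[:2]
        cw, ch = int(w * width_ratio), int(h * height_ratio)
        cx = max(0, x + w // 2 - cw // 2)  # centered under the face
        cy = min(frame_h, y + h)           # starting just below the face
        return cx, cy, min(cw, frame_w - cx), min(ch, frame_h - cy)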
[0032] The person clustering unit 140 extracts clothes information
from current cluster information, a current person representation
frame, and a comparison person representation frame, i.e. a person
representation frame to be compared. The person clustering unit 140
compares the current person representation frame and the comparison
person representation frame, and determines whether the current
person representation frame is similar to the comparison person
representation frame as a result of the comparing. The person
clustering unit 140 extends a time window when the current person
representation frame is similar to the comparison person
representation frame, and includes the person representation frame
which has been currently compared in the current cluster
information. The person clustering unit 140 sets a subsequent
person representation frame as another comparison person
representation frame on the time window. Also, the person
clustering unit 140 determines whether the current person
representation frame and the comparison person representation frame
are at an end of the time window, when the current person
representation frame is different from the comparison person
representation frame. The person clustering unit 140 sets the
subsequent person representation frame in the time window as the
other comparison person representation frame, when the current
person representation frame and the comparison person
representation frame are not at the end of the time window.
[0033] The scene change detection unit 150 detects a scene change by
separating a person portion from a background based on face
extraction information and comparing the person portion and the
background. Specifically, the scene change detection unit 150 may
approximately extract a person by using the face extraction
information, and thus may detect the scene change by the separating
and the comparing after the person is approximately extracted.
[0034] The scene change detection unit 150 receives current scene
information, a current shot representation frame, and a comparison
shot representation frame, and extracts background information from
the current shot representation frame and the comparison shot
representation frame. The scene change detection unit 150 compares
the current shot representation frame and the comparison shot
representation frame, and determines whether the current shot
representation frame is similar to the comparison shot
representation frame. The scene change detection unit 150 extends
the time window when the current shot representation frame is
similar to the comparison shot representation frame, and marks that
the comparing of the current shot representation frame is
completed. The scene change detection unit 150 assigns the
comparison shot representation frame to the current shot
representation frame, and assigns a subsequent shot representation
frame in the time window to the comparison shot representation
frame. The scene change detection unit 150 marks that the comparing
of the current shot representation frame is completed, when the
current shot representation frame is different from the comparison
shot representation frame, and determines whether comparing all
frames in the time window is completed. The scene change detection
unit 150 assigns a subsequent shot representation frame where the
comparing is incomplete to the current shot representation frame,
and assigns the subsequent shot representation frame to the
comparison shot representation frame, when the comparing is
incomplete.
[0035] The scene clustering unit 160 merges similar clusters from the
extracted person representation frame and performs a scene
clustering for each person. Specifically, the scene clustering unit
160 may perform the scene clustering for each person by comparing
the person representation frame in the shot and merging the similar
clusters according to the comparison, as illustrated in FIG. 3.
[0036] The scene clustering unit 160 receives time
information-based clusters, and selects two clusters having a
minimum difference value. The scene clustering unit 160 compares
the minimum difference value and a threshold value, and merges the
two clusters when the minimum difference value is less than the
threshold value. The scene clustering unit 160 connects
scenes including a person frame in a same cluster, when the minimum
difference value is equal to or greater than the threshold value. A
scene clustering method for each person is described in greater
detail with reference to FIG. 3.
[0037] FIG. 3 is a diagram illustrating an example of performing a
clustering for each person according to an embodiment of the
present invention.
[0038] In operation S1, the scene clustering unit 160 compares a
first person representation frame 310 and a second person
representation frame 320, and performs a first merge of similar
clusters based on a result of the comparison. In operation S2, the
scene clustering unit 160 compares a fifth person representation
frame 350 and a sixth person representation frame 360, and performs
a second merge of similar clusters based on a result of the
comparison. In operation S3, the scene clustering unit 160 compares
a third person representation frame 330 and a seventh person
representation frame 370, and performs a third merge of similar
clusters based on a result of the comparison. In operation S4, the
scene clustering unit 160 compares the first merge and the second
merge, and performs a fourth merge of similar clusters based on a
result of the comparison.
[0039] FIG. 4 is a flowchart illustrating a method of classifying a
scene for each person in a video according to another embodiment of
the present invention.
[0040] Referring to FIG. 4, in operation S410, a system for
classifying a scene for each person in a video detects a face
within input video frames. Specifically, the system for classifying
a scene for each person in a video analyzes the input video frames
via a face detector and thereby may detect the face within the
input video frames.
[0041] In operation S420, the system for classifying a scene for
each person in a video detects a shot change within the input video
frames. Specifically, the system for classifying a scene for each
person in a video detects the shot change within the input video
frames to segment the input video frames into a shot which is a
basic unit of the video.
[0042] In operation S430, the system for classifying a scene for each person in a video extracts a person representation frame in the shot. Since using all person frames for a person clustering is inefficient, the system for classifying a scene for each person in a video extracts, as the person representation frame, the frame which is closest to the center of each cluster, after performing a clustering of the frames including a face in the shot. Specifically, the system for classifying a scene for each person in a video extracts one frame from each cluster and may set each such frame as a person representation frame of the shot, since more than one person may be included in the shot.
[0043] In operation S440, the system for classifying a scene for each person in a video performs the person clustering on the extracted person representation frames based on time information. When a clustering is simply performed over all person representation frames, the algorithm may not be robust to various poses or lighting conditions. Accordingly, the system for classifying a scene for each person in a video performs the person clustering using the time information so that the clustering starts from the various forms of each person. Specifically, a single person generally wears the same clothes within a similar time period in the same video data, and such clothes information provides a clearer distinction than face information. Accordingly, the system for classifying a scene for each person in a video obtains the various forms of the single person by using the clothes information. An operation of a time information-based person clustering is described in greater detail with reference to FIG. 5.
[0044] FIG. 5 is a flowchart illustrating an operation of a time
information-based person clustering illustrated in FIG. 4 according
to another embodiment of the present invention.
[0045] Referring to FIG. 5, in operation S501, the system for
classifying a scene for each person in a video receives current
cluster information, a current person representation frame, and a
comparison person representation frame. The comparison person
representation frame is a person representation frame to be
compared.
[0046] In operation S502, the system for classifying a scene for
each person in a video extracts clothes information of each of the
current person representation frame and the comparison person
representation frame. Specifically, the system for classifying a
scene for each person in a video may extract the clothes
information by referring to the location and size of the face from
the face information, as illustrated in FIG. 2, to reduce the time required to extract the clothes information.
[0047] In operation S503, the system for classifying a scene for
each person in a video compares the current person representation
frame and the comparison person representation frame. Specifically,
the system for classifying a scene for each person in a video, when comparing, adds a comparison value of the color information corresponding to the clothes information and a weighted comparison value corresponding to the face information.
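The sketch below illustrates such a weighted combination. The histogram features, the cosine face term, the patch ratios, and the weight value are all illustrative assumptions; the disclosure fixes neither the features nor the weight.

    import cv2
    import numpy as np

    def person_similarity(frame_a, face_a, frame_b, face_b, face_weight=0.3):
        """Weighted similarity of two person representation frames: a
        clothes-color histogram term plus a weighted face term."""
        def clothes_hist(frame, face):
            x, y, w, h = face
            # Clothes patch just below the face (illustrative ratios;
            # assumes the face is not at the bottom edge of the frame).
            patch = frame[y + h:y + h + int(2.5 * h),
                          max(0, x - w // 2):x + w + w // 2]
            hsv = cv2.cvtColor(patch, cv2.COLOR_BGR2HSV)
            hist = cv2.calcHist([hsv], [0, 1], None, [16, 16],
                                [0, 180, 0, 256])
            cv2.normalize(hist, hist, alpha=1.0, norm_type=cv2.NORM_L1)
            return hist

        def face_vec(frame, face):
            x, y, w, h = face
            gray = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2GRAY)
            return cv2.resize(gray, (32, 32)).astype(np.float32).ravel()

        clothes_sim = cv2.compareHist(clothes_hist(frame_a, face_a),
                                      clothes_hist(frame_b, face_b),
                                      cv2.HISTCMP_INTERSECT)
        va, vb = face_vec(frame_a, face_a), face_vec(frame_b, face_b)
        face_sim = float(np.dot(va, vb) /
                         (np.linalg.norm(va) * np.linalg.norm(vb)))
        return (1.0 - face_weight) * clothes_sim + face_weight * face_sim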
[0048] In operation S504, the system for classifying a scene for
each person in a video determines whether the current person
representation frame is similar to the comparison person
representation frame, as a result of the comparing.
In operation S505, when the current person representation
frame is similar to the comparison person representation frame, the
system for classifying a scene for each person in a video extends a
time window T.sub.fw. Specifically, when the current person
representation frame is similar to the comparison person
representation frame, the system for classifying a scene for each
person in a video resets the time window T.sub.fw from a present
point in time, since a same person exists up to the present point
in time.
[0050] In operation S506, the system for classifying a scene for
each person in a video includes the comparison person
representation frame which has been currently compared in the
current cluster information. Specifically, the system for
classifying a scene for each person in a video includes the
comparison person representation frame, which has been compared
with the current person representation frame, in the current
cluster information.
[0051] In operation S507, the system for classifying a scene for
each person in a video sets a subsequent person representation
frame in the time window T.sub.fw as another comparison person
representation frame, and performs operation S502. Specifically,
the system for classifying a scene for each person in a video
continues to compare using the subsequent person representation
frame in the time window T.sub.fw.
[0052] In operation S508, when the current person representation
frame is different from the comparison person representation frame,
the system for classifying a scene for each person in a video
determines whether the current person representation frame and the
comparison person representation frame are at an end of the time
window T.sub.fw. Specifically, when the current person
representation frame is different from the comparison person
representation frame, the system for classifying a scene for each
person in a video determines whether all the frames in the time window T.sub.fw have been compared, by using the result of determining
whether the current person representation frame and the comparison
person representation frame are at the end of the time window
T.sub.fw.
[0053] In operation S509, when the current person representation
frame and the comparison person representation frame are at the end
of the time window T.sub.fw, the system for classifying a scene for
each person in a video moves to a subsequent cluster and performs a
time information-based person clustering for the subsequent
cluster, since all person representation frames corresponding to a
current cluster are extracted.
[0054] In operation S510, when the current person representation
frame and the comparison person representation frame are not at the
end of the time window T.sub.fw, the system for classifying a scene
for each person in a video sets the subsequent person
representation frame as the comparison person representation frame,
and performs operation S502, since not all person representation frames corresponding to the current cluster have been detected.
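A compact sketch of this time-window loop follows, mirroring operations S501 through S510; the window length and the similarity test (e.g., a thresholded person_similarity from the earlier sketch) are illustrative assumptions.

    def cluster_persons(rep_frames, similar, window=8):
        """Time-window person clustering over time-ordered person
        representation frames; `similar(a, b)` is any boolean test."""
        clusters, assigned = [], set()
        for i in range(len(rep_frames)):
            if i in assigned:
                continue
            cluster = [i]                                       # S501
            end = min(i + 1 + window, len(rep_frames))
            j = i + 1
            while j < end:                                      # S508
                if j not in assigned and similar(rep_frames[i],
                                                 rep_frames[j]):
                    end = min(j + 1 + window, len(rep_frames))  # S505
                    cluster.append(j)                           # S506
                j += 1                                          # S507 / S510
            assigned.update(cluster)
            clusters.append(cluster)                            # S509
        return clusters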
[0055] In operation S450, the system for classifying a scene for
each person in a video detects a scene change by separating a
person portion from a background based on face extraction
information and comparing the person portion and the background.
Specifically, the system for classifying a scene for each person in
a video may approximately extract a person by using the face
extraction information, and thus may detect the scene change by the
separating and the comparing after the person is approximately
extracted. A scene change detection operation is described in
greater detail with reference to FIG. 6.
[0056] FIG. 6 is a flowchart illustrating an operation of a scene
change detection illustrated in FIG. 4 according to another
embodiment of the present invention.
[0057] Referring to FIG. 6, in operation S601, the system for
classifying a scene for each person in a video receives current
scene information, a current shot representation frame P.sub.f, and
a comparison shot representation frame C.sub.f.
[0058] In operation S602, the system for classifying a scene for
each person in a video extracts background information of the
current shot representation frame P.sub.f and the comparison shot
representation frame C.sub.f. The background information is
information about a pixel of another location excluding a face
location and a clothes location.
[0059] In operation S603, the system for classifying a scene for
each person in a video compares the current shot representation
frame P.sub.f and the comparison shot representation frame C.sub.f.
Specifically, the system for classifying a scene for each person in
a video, when comparing, adds the comparison value of the color information corresponding to the clothes information and the weighted comparison value corresponding to the face information. Also, when comparing the background information, a normalized color histogram in the hue, saturation, value (HSV) color space is used.
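The sketch below illustrates this background comparison, masking out the face and clothes regions and comparing normalized HSV histograms; the bin counts, the intersection measure, and the threshold are illustrative assumptions.

    import cv2
    import numpy as np

    def background_hist(frame_bgr, person_boxes, bins=(16, 16, 16)):
        """Normalized HSV histogram over pixels outside the given face
        and clothes boxes, i.e., over the background only."""
        mask = np.full(frame_bgr.shape[:2], 255, dtype=np.uint8)
        for x, y, w, h in person_boxes:
            mask[y:y + h, x:x + w] = 0        # exclude person pixels
        hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1, 2], mask, list(bins),
                            [0, 180, 0, 256, 0, 256])
        cv2.normalize(hist, hist, alpha=1.0, norm_type=cv2.NORM_L1)
        return hist

    def backgrounds_similar(frame_a, boxes_a, frame_b, boxes_b,
                            threshold=0.6):
        """True when the two backgrounds overlap enough to belong to
        the same scene; the threshold is an illustrative assumption."""
        sim = cv2.compareHist(background_hist(frame_a, boxes_a),
                              background_hist(frame_b, boxes_b),
                              cv2.HISTCMP_INTERSECT)
        return sim > threshold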
[0060] In operation S604, the system for classifying a scene for
each person in a video determines whether the current shot
representation frame P.sub.f is similar to the comparison shot
representation frame C.sub.f, as a result of the comparing.
[0061] In operation S605, when the current shot representation
frame P.sub.f is similar to the comparison shot representation
frame C.sub.f, the system for classifying a scene for each person
in a video extends a time window T.sub.sw. Specifically, the system
for classifying a scene for each person in a video resets the time
window T.sub.sw to extend a scene again, since a same scene is
continued up to a point in time when the current shot
representation frame P.sub.f is similar to the comparison shot
representation frame C.sub.f.
[0062] In operation S606, the system for classifying a scene for
each person in a video marks that the comparing of the current shot
representation frame P.sub.f is completed, and sets the comparison
shot representation frame C.sub.f as the current shot
representation frame P.sub.f.
[0063] In operation S607, the system for classifying a scene for
each person in a video sets a subsequent shot representation frame
in the time window T.sub.sw as the comparison shot representation frame C.sub.f, and performs operation S602. Specifically, the
system for classifying a scene for each person in a video continues
to compare using the subsequent shot representation frame in the
time window T.sub.sw.
[0064] In operation S608, when the current shot representation
frame P.sub.f is different from the comparison shot representation
frame C.sub.f, the system for classifying a scene for each person
in a video marks that the comparing of the current shot
representation frame P.sub.f is completed.
[0065] In operation S609, the system for classifying a scene for
each person in a video determines whether comparing all frames in
the time window T.sub.sw is completed.
[0066] In operation S610, when the comparing of all frames in the time window T.sub.sw is completed, the system for classifying a scene
for each person in a video determines a shot, which is examined
last and determined to be a similar shot, as a last shot of a
current scene, since all shots corresponding to the current scene
are detected. Also, the system for classifying a scene for each
person in a video performs a detection operation of a subsequent
scene.
[0067] In operation S611, when the comparing is incomplete, the
system for classifying a scene for each person in a video sets a
subsequent shot representation frame where the comparing is
incomplete as the current shot representation frame P.sub.f, and
sets the subsequent shot representation frame as the comparison
shot representation frame C.sub.f. Also, the system for classifying
a scene for each person in a video performs operation S602.
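A compact sketch of this scene boundary loop follows, loosely mirroring operations S601 through S611; the window length and the shot similarity test (e.g., a combination of person_similarity and backgrounds_similar from the earlier sketches) are illustrative assumptions.

    def segment_scenes(shot_reps, similar, window=5):
        """Group time-ordered shot representation frames into scenes;
        returns (first_shot, last_shot) index pairs, one per scene."""
        scenes, start = [], 0
        while start < len(shot_reps):
            last = start                                    # last similar shot
            end = min(start + 1 + window, len(shot_reps))
            i, j = start, start + 1
            while j < end:                                  # S609
                if similar(shot_reps[i], shot_reps[j]):     # S603 / S604
                    end = min(j + 1 + window, len(shot_reps))  # S605
                    last = j
                i, j = j, j + 1                             # S606 / S607 / S611
            scenes.append((start, last))                    # S610: close scene
            start = last + 1
        return scenes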
[0068] In operation S460, the system for classifying a scene for
each person in a video merges similar clusters from the extracted
person representation frame and performs the scene clustering for
each person. Specifically, the system for classifying a scene for
each person in a video may perform the scene clustering by
comparing and merging as illustrated in FIG. 3. An operation of a
scene clustering for each person is described in greater detail
with reference to FIG. 7.
[0069] FIG. 7 is a flowchart illustrating an operation of a scene
clustering for each person according to another embodiment of the
present invention.
[0070] Referring to FIG. 7, in operation S701, the system for
classifying a scene for each person in a video receives time
information-based clusters.
[0071] In operation S702, the system for classifying a scene for each person in a video selects the two clusters having the minimum difference value from among all clusters. Specifically, the difference values between clusters may be computed using the average value of each cluster. Alternatively, the minimum difference value may be obtained after comparing all objects of a corresponding cluster with all objects of a comparison cluster.
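The two alternatives mentioned above may be realized as follows, where each cluster is an array of member feature vectors; the Euclidean metric is an illustrative assumption.

    import numpy as np

    def mean_difference(cluster_a, cluster_b):
        """Difference value computed from the average value of each cluster."""
        return float(np.linalg.norm(cluster_a.mean(axis=0) -
                                    cluster_b.mean(axis=0)))

    def all_pairs_min_difference(cluster_a, cluster_b):
        """Minimum difference after comparing all objects of one cluster
        against all objects of the comparison cluster."""
        diffs = cluster_a[:, None, :] - cluster_b[None, :, :]
        return float(np.linalg.norm(diffs, axis=2).min())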
[0072] In operation S703, the system for classifying a scene for
each person in a video compares the minimum difference value and a
threshold value and determines whether the minimum difference value
is less than the threshold value.
[0073] In operation S704, when the minimum difference value is less
than the threshold value, the system for classifying a scene for
each person in a video merges the two clusters, as illustrated in
FIG. 3, since the two clusters include a similar person. Also, the
system for classifying a scene for each person in a video performs
operation S702.
[0074] In operation S705, when the minimum difference value is equal to or greater than the threshold value, the system for classifying a scene for each person in a video connects scenes including a person frame in a same cluster. Specifically, the system for classifying a scene for each person in a video determines that all clustering is completed when the minimum difference value is equal to or greater than the threshold value. Also, when the scenes including a same person are connected, the operation of the scene clustering for each person is completed. Each scene may be included in many clusters since various persons may exist in a single scene.
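By way of a non-limiting illustration, a sketch of the merge loop of FIG. 7 (operations S701 through S705) follows; the difference function (e.g., mean_difference from the previous sketch) and the threshold are illustrative assumptions, and connecting the scenes that share a cluster (operation S705) is left to the caller.

    import numpy as np

    def cluster_scenes(clusters, threshold, difference):
        """Merge time information-based clusters until the minimum
        difference value reaches the threshold; `clusters` maps a
        cluster id to an array of member feature vectors."""
        clusters = dict(clusters)                        # S701
        while len(clusters) > 1:
            ids = list(clusters)
            d, a, b = min((difference(clusters[p], clusters[q]), p, q)
                          for i, p in enumerate(ids)
                          for q in ids[i + 1:])          # S702: closest pair
            if d >= threshold:                           # S703 / S705
                break                                    # connect scenes next
            clusters[a] = np.concatenate([clusters[a], clusters[b]])  # S704
            del clusters[b]
        return clusters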
[0075] The method and system for classifying a scene for each
person in a video according to the above-described exemplary
embodiments of the present invention may be recorded in
computer-readable media including program instructions to implement
various operations embodied by a computer. The media may also
include, alone or in combination with the program instructions,
data files, data structures, and the like. Examples of
computer-readable media include magnetic media such as hard disks,
floppy disks, and magnetic tape; optical media such as CD ROM disks
and DVD; magneto-optical media such as optical disks; and hardware
devices that are specially configured to store and perform program
instructions, such as read-only memory (ROM), random access memory
(RAM), flash memory, and the like. The media may also be a
transmission medium such as optical or metallic lines, wave guides,
etc. including a carrier wave transmitting signals specifying the
program instructions, data structures, etc. Examples of program
instructions include both machine code, such as produced by a
compiler, and files containing higher level code that may be
executed by the computer using an interpreter. The described
hardware devices may be configured to act as one or more software
modules in order to perform the operations of the above-described
exemplary embodiments of the present invention.
[0076] A method and system for classifying a scene for each person
in a video according to the above-described embodiments of the
present invention may provide a story overview for each person by
classifying a person by a scene unit by using temporal information
in video data.
[0077] Also, a method and system for classifying a scene for each
person in a video according to the above-described embodiments of
the present invention may improve an accuracy of a scene
segmentation detection by separating a person portion and a
background in video data and using information about the person
portion and the background together.
[0078] Also, a method and system for classifying a scene for each person in a video according to the above-described embodiments of the present invention may enable replay for each person in video data, and thereby may enable a user to selectively view a scene including a person that the user likes.
[0079] Also, a method and system for classifying a scene for each person in a video according to the above-described embodiments of the present invention may classify a person by a scene unit, which is a story unit in video data, and thereby may improve scene classification accuracy and enable scene-based navigation.
[0080] Also, a method and system for classifying a scene for each person in a video according to the above-described embodiments of the present invention may facilitate video data analysis by improving scene classification accuracy in video data.
[0081] Although a few embodiments of the present invention have
been shown and described, the present invention is not limited to
the described embodiments. Instead, it would be appreciated by
those skilled in the art that changes may be made to these
embodiments without departing from the principles and spirit of the
invention, the scope of which is defined by the claims and their
equivalents.
* * * * *