U.S. patent application number 16/843447, for a system and method for clustering multimedia content elements, was filed with the patent office on 2020-04-08 and published on 2020-07-23.
This patent application is currently assigned to CORTICA LTD. The applicant listed for this patent is CORTICA LTD. The invention is credited to Karina Odinaev, Igal Raichelgauz, and Yehoshua Y. ZEEVI.
Publication Number: 20200233891
Application Number: 16/843447
Family ID: 54065620
Publication Date: 2020-07-23
![](/patent/app/20200233891/US20200233891A1-20200723-D00000.png)
![](/patent/app/20200233891/US20200233891A1-20200723-D00001.png)
![](/patent/app/20200233891/US20200233891A1-20200723-D00002.png)
![](/patent/app/20200233891/US20200233891A1-20200723-D00003.png)
![](/patent/app/20200233891/US20200233891A1-20200723-D00004.png)
![](/patent/app/20200233891/US20200233891A1-20200723-D00005.png)
United States Patent Application 20200233891
Kind Code: A1
Raichelgauz; Igal; et al.
July 23, 2020
SYSTEM AND METHOD FOR CLUSTERING MULTIMEDIA CONTENT ELEMENTS
Abstract
A system and method for clustering multimedia content. The
method includes: detecting at least one clustering trigger event
related to at least one multimedia content element to be clustered;
generating at least one signature for the at least one multimedia
content element, each signature representing at least a portion of
the at least one multimedia content element; determining, based on
the generated at least one signature, at least one multimedia
content element cluster, wherein each multimedia content element
cluster includes a plurality of clustered multimedia content
elements sharing at least one common concept with the at least one
multimedia content element; and adding, to each determined cluster,
the at least one multimedia content element.
Inventors: Raichelgauz; Igal (New York, NY); Odinaev; Karina (New York, NY); ZEEVI; Yehoshua Y. (Haifa, IL)
Applicant: CORTICA LTD., Tel Aviv, IL
Assignee: CORTICA LTD., Tel Aviv, IL
Family ID: 54065620
Appl. No.: 16/843447
Filed: April 8, 2020
Related U.S. Patent Documents
| Application Number | Filing Date | Patent Number |
|---|---|---|
| 15420989 | Jan 31, 2017 | |
| 16843447 | | |
| 62307515 | Mar 13, 2016 | |
Current U.S. Class: 1/1
Current CPC Class: G06F 16/487 20190101;
G06Q 30/0201 20130101; G06F 16/7847 20190101; G06K 9/00281
20130101; G06K 9/6267 20130101; G06Q 30/0261 20130101; G09B 19/0092
20130101; G10L 15/32 20130101; H04H 20/103 20130101; H04N 21/2668
20130101; Y10S 707/99948 20130101; G06F 16/4393 20190101; H04H
2201/90 20130101; G06K 9/00711 20130101; G06F 16/14 20190101; G06F
3/0488 20130101; G06F 16/51 20190101; G10L 25/51 20130101; H04H
60/33 20130101; H04H 60/71 20130101; H04N 21/466 20130101; G06F
16/685 20190101; G06N 5/022 20130101; H04N 21/25891 20130101; G06K
9/00758 20130101; G06F 16/41 20190101; G06F 16/438 20190101; G06F
16/951 20190101; H04H 20/93 20130101; H04H 60/58 20130101; H04L
67/22 20130101; G06F 40/134 20200101; G06F 16/433 20190101; H04H
60/66 20130101; G06F 16/9558 20190101; G06F 16/285 20190101; G10L
15/26 20130101; H04H 60/46 20130101; G06F 16/434 20190101; G06F
16/683 20190101; G06N 5/04 20130101; H04H 20/26 20130101; H04H
60/59 20130101; H04L 67/10 20130101; G06F 16/904 20190101; G06F
3/0484 20130101; G06F 16/172 20190101; G06F 16/783 20190101; G06N
5/025 20130101; G06Q 30/0246 20130101; G06F 16/40 20190101; G06T
19/006 20130101; H04H 60/56 20130101; G06F 16/284 20190101; G06F
16/43 20190101; G06F 16/7844 20190101; H04H 60/37 20130101; H04L
67/327 20130101; H04N 7/17318 20130101; G06F 16/435 20190101; G06F
16/48 20190101; G06N 5/02 20130101; G06N 7/005 20130101; G06N 20/00
20190101; H04L 67/306 20130101; H04N 21/8106 20130101; G06F 16/152
20190101; H04H 60/49 20130101; G06K 2209/27 20130101; H04L 65/601
20130101; G06F 3/048 20130101; G06F 16/7834 20190101; G06F 16/1748
20190101; Y10S 707/99943 20130101; G06K 9/00744 20130101; G06F
16/2228 20190101; G06F 16/35 20190101
International Class: G06F 16/41 20060101; G06F016/41
Claims
1. A method for clustering multimedia content, comprising:
detecting at least one clustering trigger event related to at least
one multimedia content element to be clustered; generating at least
one signature for the at least one multimedia content element, each
signature representing at least a portion of the at least one
multimedia content element; following the generating of the at
least one signature, generating, based on the generated at least
one signature, at least one tag for the at least one multimedia
content element; determining, based on the generated at least one
signature, at least one multimedia content element cluster, wherein
each multimedia content element cluster includes a plurality of
clustered multimedia content elements sharing at least one common
concept with the at least one multimedia content element; and
adding, to each determined cluster, the at least one multimedia
content element.
2. The method of claim 1, wherein the at least one multimedia
content element cluster is determined based further on the
generated at least one tag.
3. The method of claim 2, wherein each multimedia content element
cluster includes a plurality of multimedia content elements
associated with the generated at least one tag.
4. The method of claim 1, further comprising: querying, with
respect to the generated at least one signature, a deep content
classification system to obtain at least one concept structure
matching the at least one multimedia content element, each concept
structure including a signature reduced cluster and metadata, and
generating the at least one tag for the at least one multimedia
content element based on the metadata of the at least one concept
structure.
5. The method of claim 1, wherein each multimedia content element
cluster is associated with at least one portion of a signature that
is common to the plurality of clustered multimedia content elements
of the multimedia content element cluster and to the at least one
multimedia content element.
6. The method of claim 1, further comprising: determining, based on
the generated at least one signature, whether an existing
multimedia content element cluster sharing a common concept with
the multimedia content element can be found; and generating another
multimedia content element cluster, when it is determined that an
existing multimedia content element cluster sharing a common
concept with the multimedia content element cannot be found.
7. The method of claim 1, wherein the at least one signature is
generated via a signature generator system, wherein the signature
generator system includes a plurality of at least partially
statistically independent computational cores, wherein the
properties of each computational core are set independently of
properties of each other computational core.
8. The method of claim 1, wherein the detected at least one
clustering trigger event includes receiving a request to cluster
the at least one multimedia content element, wherein the request
includes at least one of: the at least one multimedia content
element, at least one identifier of the at least one multimedia
content element, and at least one location of the at least one
multimedia content element.
9. The method of claim 1, further comprising: storing, in a data
storage, the at least one cluster including the added at least one
multimedia content element, wherein each cluster is stored in a
separate location of the data storage.
10. The method according to claim 1, wherein the common concept
represents sub textual information selected from (a) activities or
actions being performed and (b) relationships among individuals
shown in the at least one multimedia content element.
11. The method according to claim 1, wherein the common concept
represents a meta aspect indicating information about an
acquisition of the at least one multimedia content element.
12. The method according to claim 1, wherein the common concept
represents a user having a user device that captured the at least
one multimedia content element.
13. The method according to claim 1, wherein the common concept
differs from a textual tag.
14. The method according to claim 1, wherein the common concept
represents an item captured in the at least one multimedia content
element.
15. A non-transitory computer readable medium having stored thereon
instructions for causing a processing circuitry to execute a
process, the process comprising: detecting at least one clustering
trigger event related to at least one multimedia content element to
be clustered; generating at least one signature for the at least
one multimedia content element, each signature representing at
least a portion of the at least one multimedia content element;
following the generating of the at least one signature, generating,
based on the generated at least one signature, at least one tag for
the at least one multimedia content element; determining, based on
the generated at least one signature, at least one multimedia
content element cluster, wherein each multimedia content element
cluster includes a plurality of clustered multimedia content
elements sharing at least one common concept with the at least one
multimedia content element; and adding, to each determined cluster,
the at least one multimedia content element.
16. A system for clustering multimedia content, comprising: a
processing circuitry; and a memory, the memory containing
instructions that, when executed by the processing circuitry,
configure the system to: detect at least one clustering trigger
event related to at least one multimedia content element to be
clustered; generate at least one signature for the at least one
multimedia content element, each signature representing at least a
portion of the at least one multimedia content element; generate,
following a generating of the at least one signature, and based on
the generated at least one signature, at least one tag for the at
least one multimedia content element; determine, based on the
generated at least one signature, at least one multimedia content
element cluster, wherein each multimedia content element cluster
includes a plurality of clustered multimedia content elements
sharing at least one common concept with the at least one
multimedia content element; and add, to each determined cluster,
the at least one multimedia content element.
17. The system of claim 16, wherein the at least one multimedia
content element cluster is determined based further on the
generated at least one tag.
18. The system of claim 17, wherein each multimedia content element
cluster includes a plurality of multimedia content elements
associated with the generated at least one tag.
19. The system of claim 16, wherein the system is further
configured to: query, with respect to the generated at least one
signature, a deep content classification system to obtain at least
one concept structure matching the at least one multimedia content
element, each concept structure including a signature reduced
cluster and metadata, and generate the at least one tag for the at
least one multimedia content element based on the metadata of the
at least one concept structure.
20. The system of claim 16, wherein each determined cluster is
associated with at least one portion of a signature that is common
to the clustered multimedia content elements of the cluster and to
the at least one multimedia content element.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation of U.S. patent application
Ser. No. 15/420,989, filed on Jan. 31, 2017, which claims the benefit
of U.S. Provisional Application No. 62/307,515, filed on Mar. 13,
2016.
[0002] The contents of the above-referenced applications are hereby
incorporated by reference.
TECHNICAL FIELD
[0003] The present disclosure relates generally to organizing
multimedia content, and more specifically to clustering based on
analysis of multimedia content elements.
BACKGROUND
[0004] As the Internet continues to grow exponentially in size and
content, the task of finding relevant and appropriate information
has become increasingly complex. Organized information can be
browsed or searched more quickly than unorganized information. As a
result, effective organization of content allowing for subsequent
retrieval is becoming increasingly important.
[0005] Search engines are often used to search for information,
either locally or over the World Wide Web. Many search engines
receive queries from users and use such queries to find and return
relevant content. The search queries may be in the form of, for
example, textual queries, images, audio queries, etc.
[0006] Search engines often face challenges when searching for
multimedia content (e.g., images, audio, videos, etc.). In
particular, existing solutions for searching for multimedia content
are typically based on metadata of multimedia content elements.
Such metadata may be associated with a multimedia content element
and may include parameters such as, for example, size, type, name,
short description, tags describing articles or subject matter of
the multimedia content element, and the like. A tag is a
non-hierarchical keyword or term assigned to data (e.g., multimedia
content elements). The name, tags, and short description are
typically manually provided by, e.g., the creator of the multimedia
content element (for example, a user who captured the image using
his smart phone), a person storing the multimedia content element
in a storage, and the like.
[0007] Tagging has gained widespread popularity in part due to the
growth of social networking, photograph sharing, and bookmarking of
websites. Some websites allow users to create and manage tags that
categorize content using simple keywords. The users of such sites
manually add and define descriptions used for tags. Some of these
websites only allow tagging of specific portions of multimedia
content elements (e.g., portions of images showing people). Thus,
the tags assigned to a multimedia content element may not fully capture the
contents shown therein.
[0008] Further, because at least some of the metadata of a
multimedia content element is typically provided manually by a
user, such metadata may not accurately describe the multimedia
content element or facets thereof. For instance, the metadata may be
misspelled, provided with respect to a different image than
intended, vague or otherwise failing to identify one or more
aspects of the multimedia content, and the like. As an example, a
user may provide a file name "weekend fun" for an image of a cat,
which does not accurately indicate the contents (e.g., the cat)
shown in the image. Thus, a query for the term "cat" would not
return the "weekend fun" image.
[0009] Additionally, different users may utilize different tags to
refer to the same subject or topic, thereby resulting in some
multimedia content elements related to a particular subject having
one tag and other multimedia content elements related to the
subject having a different tag. For example, one user may tag
images of trees with the term "plants," while another user tags
images of trees with the term "trees." Thus, a query based on
either the tag "plants" or the tag "trees" will only return results
including one of the images despite both images being relevant to
the query.
[0010] It would therefore be advantageous to provide a solution
that would overcome the deficiencies of the prior art.
SUMMARY
[0011] A summary of several example embodiments of the disclosure
follows. This summary is provided for the convenience of the reader
to provide a basic understanding of such embodiments and does not
wholly define the breadth of the disclosure. This summary is not an
extensive overview of all contemplated embodiments, and is intended
neither to identify key or critical elements of all embodiments nor
to delineate the scope of any or all aspects. Its sole purpose is
to present some concepts of one or more embodiments in a simplified
form as a prelude to the more detailed description that is
presented later. For convenience, the term "some embodiments" may
be used herein to refer to a single embodiment or multiple
embodiments of the disclosure.
[0012] Some embodiments disclosed herein include a method for
clustering multimedia content. The method comprises detecting at
least one clustering trigger event related to at least one
multimedia content element to be clustered; generating at least one
signature for the at least one multimedia content element, each
signature representing at least a portion of the at least one
multimedia content element; determining, based on the generated at
least one signature, at least one multimedia content element
cluster, wherein each multimedia content element cluster includes a
plurality of clustered multimedia content elements sharing at least
one common concept with the at least one multimedia content
element; and adding, to each determined cluster, the at least one
multimedia content element.
[0013] Some embodiments disclosed herein also include a
non-transitory computer readable medium having stored thereon
instructions for causing a processing circuitry to execute a
process, the process comprising: detecting at least one clustering
trigger event related to at least one multimedia content element to
be clustered; generating at least one signature for the at least
one multimedia content element, each signature representing at
least a portion of the at least one multimedia content element;
determining, based on the generated at least one signature, at
least one multimedia content element cluster, wherein each
multimedia content element cluster includes a plurality of
clustered multimedia content elements sharing at least one common
concept with the at least one multimedia content element; and
adding, to each determined cluster, the at least one multimedia
content element.
[0014] Some embodiments disclosed herein also include a system for
clustering multimedia content. The system comprises a processing
circuitry; and a memory, the memory containing instructions that,
when executed by the processing circuitry, configure the system to:
detect at least one clustering trigger event related to at least
one multimedia content element to be clustered; generate at least
one signature for the at least one multimedia content element, each
signature representing at least a portion of the at least one
multimedia content element; determine, based on the generated at
least one signature, at least one multimedia content element
cluster, wherein each multimedia content element cluster includes a
plurality of clustered multimedia content elements sharing at least
one common concept with the at least one multimedia content
element; and add, to each determined cluster, the at least one
multimedia content element.
BRIEF DESCRIPTION OF THE DRAWINGS
[0015] The subject matter disclosed herein is particularly pointed
out and distinctly claimed in the claims at the conclusion of the
specification. The foregoing and other objects, features, and
advantages of the disclosed embodiments will be apparent from the
following detailed description taken in conjunction with the
accompanying drawings.
[0016] FIG. 1 is a network diagram utilized to describe the various
disclosed embodiments.
[0017] FIG. 2 is a flowchart illustrating a method for clustering
multimedia content elements according to an embodiment.
[0018] FIG. 3 is a block diagram depicting the basic flow of
information in the signature generator system.
[0019] FIG. 4 is a diagram showing the flow of patches generation,
response vector generation, and signature generation in a
large-scale speech-to-text system.
[0020] FIG. 5 is a block diagram illustrating a clustering system
according to an embodiment.
DETAILED DESCRIPTION
[0021] It is important to note that the embodiments disclosed
herein are only examples of the many advantageous uses of the
innovative teachings herein. In general, statements made in the
specification of the present application do not necessarily limit
any of the various claimed embodiments. Moreover, some statements
may apply to some inventive features but not to others. In general,
unless otherwise indicated, singular elements may be in plural and
vice versa with no loss of generality. In the drawings, like
numerals refer to like parts throughout the several views.
[0022] The various disclosed embodiments include a method and
system for clustering multimedia content elements (MMCEs). The
clustering allows for organizing and searching of multimedia
content elements based on common concepts. In an example
embodiment, multimedia content elements to be clustered are
obtained. For each multimedia content element, at least one
signature is generated. Based on the signatures generated for each
multimedia content element, a search tag may be generated. In an
embodiment, a plurality of search tags can be generated for each
multimedia content element. Each of the multimedia content elements
is added to a multimedia content element cluster based on the
generated at least one signature, the generated tags, or both. Each
multimedia content element cluster includes a plurality of
multimedia content elements having at least one concept in
common.
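The flow in [0022] can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: a "signature" is modelled as a frozenset of feature identifiers, `TAG_TABLE` is a hypothetical mapping from features to textual tags, and each cluster is represented only by the signature portion its members share.

```python
from dataclasses import dataclass, field

# Hypothetical feature-to-tag table; used here only for illustration.
TAG_TABLE = {"f_cat": "cat", "f_beach": "beach"}


@dataclass
class Cluster:
    concept: frozenset                           # signature portion shared by all members
    members: list = field(default_factory=list)


def generate_signature(features):
    # Placeholder for a real signature generator: the signature is simply
    # the set of (hypothetical) feature identifiers of the element.
    return frozenset(features)


def generate_tags(signature):
    # Tags are derived from the signature, not from user-supplied metadata.
    return {TAG_TABLE[f] for f in signature if f in TAG_TABLE}


def cluster_element(element_id, features, clusters):
    """Generate a signature and tags, then add the element to every cluster
    whose shared signature portion is contained in the element's signature."""
    sig = generate_signature(features)
    tags = generate_tags(sig)
    matched = [c for c in clusters if c.concept <= sig]
    for c in matched:
        c.members.append(element_id)
    return sig, tags, matched
```

For example, calling `cluster_element("img1.jpg", ["f_cat", "f_grass"], clusters)` with clusters keyed on `{"f_cat"}` and `{"f_dog"}` adds the image only to the first cluster and yields the tag `cat`.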
[0023] In an example embodiment, the common concept among
multimedia content elements of a multimedia content element cluster
may be a collection of signatures representing elements of the
unstructured data and metadata describing the concept. The common
concept may represent an item or aspect of the multimedia content
elements such as, but not limited to, an object, a person, an
animal, a pattern, a color, a background, a character, a sub
textual aspect (e.g., an aspect indicating sub textual information
such as activities or actions being performed, relationships among
individuals shown such as teams or members of an organization,
etc.), a meta aspect indicating information about the multimedia
content element itself (e.g., an aspect indicating that an image is
a "selfie" taken by a person in the image), words, sounds, voices,
motions, combinations thereof, and the like. Multimedia content
elements may share a common concept when each of the multimedia
content elements is associated with at least one signature, at
least one portion of a signature, at least one tag, or a
combination thereof, that is common to all of the multimedia
content elements sharing a common concept.
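The sharing condition at the end of [0023] can be expressed as a set intersection. In this hedged sketch, each element is a dict with hypothetical keys `"portions"` (identifiers of signature portions, a whole signature being treated as one portion) and `"tags"`; elements share a concept when some portion or some tag is common to all of them.

```python
def common_concepts(elements):
    """Return the signature portions and tags common to all elements.

    Each element is a dict with hypothetical keys "portions" and "tags".
    At least one element is required.
    """
    portions = set.intersection(*(set(e["portions"]) for e in elements))
    tags = set.intersection(*(set(e["tags"]) for e in elements))
    return portions, tags


def share_common_concept(elements):
    # Elements share a concept if any signature portion or tag is common to all.
    portions, tags = common_concepts(elements)
    return bool(portions or tags)
```

Adding a third element that has neither a portion nor a tag in common with the others makes `share_common_concept` return `False` for the group.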
[0024] In an embodiment, the at least one multimedia content
element may be clustered based further on metadata associated with
a user. The user may be, but is not limited to, a user of a user
device in which the at least one multimedia content element is
stored. In another embodiment, the clustering may include
searching, based on the generated at least one signature, for
clusters including multimedia content elements sharing a common
concept. The searching may further include comparing the generated
at least one signature to signatures of a plurality of multimedia
content element clusters to determine matching signatures, where
the at least one multimedia content element may be added to a
cluster associated with matching signatures.
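The signature comparison described in [0024] might look like the following sketch. It assumes, purely for illustration, that a signature is an integer bit vector, that the match score is the fraction of a cluster signature's active bits also active in the element's signature, and that 0.8 is an acceptable threshold; none of these choices come from the source.

```python
def popcount(x: int) -> int:
    # Number of set bits; bin().count works on all Python 3 versions.
    return bin(x).count("1")


def match_score(element_sig: int, cluster_sig: int) -> float:
    """Fraction of the cluster signature's active bits that are also
    active in the element signature (a hypothetical matching criterion)."""
    active = popcount(cluster_sig)
    return popcount(element_sig & cluster_sig) / active if active else 0.0


def add_to_matching_clusters(element_id, element_sig, clusters, threshold=0.8):
    """clusters maps a cluster id to {"signature": int, "members": list}.
    The element is appended to every cluster whose signature it matches."""
    matched = []
    for cluster_id, cluster in clusters.items():
        if match_score(element_sig, cluster["signature"]) >= threshold:
            cluster["members"].append(element_id)
            matched.append(cluster_id)
    return matched
```

An asymmetric score is used so that an element carrying extra content (extra active bits) can still match a cluster whose concept it fully contains.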
[0025] FIG. 1 shows an example network diagram 100 utilized to
describe the various embodiments disclosed herein. The example
network diagram includes a user device 110, a clustering system 130,
a database 150, and a deep content classification (DCC) system 160,
communicatively connected via a network 120.
[0026] The network 120 is used to communicate between different
components of the network diagram 100. The network 120 may be the
Internet, the world-wide-web (WWW), a local area network (LAN), a
wide area network (WAN), a metro area network (MAN), or other
networks capable of enabling communication between the components
of the network diagram 100.
[0027] The user device 110 may be, but is not limited to, a
personal computer (PC), a personal digital assistant (PDA), a
mobile phone, a smart phone, a tablet computer, a wearable
computing device, a smart television, or other devices configured
for storing, viewing, and sending multimedia content elements.
[0028] The user device 110 may have installed thereon an
application (app) 115. The application 115 may be downloaded from
application repositories such as, but not limited to, the
AppStore®, Google Play®, or any other repositories storing
applications. The application 115 may be pre-installed in the user
device 110. The application 115 may be, but is not limited to, a
mobile application, a virtual application, a web application, a
native application, and the like. In an example implementation, the
application 115 may be a web browser.
[0029] In an embodiment, the clustering system 130 is configured to
cluster multimedia content elements. The clustering system 130
typically includes, but is not limited to, a processing circuitry
connected to a memory, the memory containing instructions that,
when executed by the processing circuitry, configure the clustering
system 130 to at least perform clustering of multimedia content
elements as described herein. In an embodiment, the processing
circuitry may be realized as an array of at least partially
statistically independent computational cores, the properties of
each core being set independently of the properties of each other
core. An example block diagram of the clustering system 130 is
described further herein below with respect to FIG. 5.
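To make the idea of independently configured cores in [0029] concrete, the sketch below uses a swapped-in technique, random projections with per-core thresholds, as a stand-in. It is not the patented core design: each hypothetical `Core` draws its weights and threshold from its own seeded generator, and the signature is the set of indices of cores that fire on an input feature vector.

```python
import random


class Core:
    """One computational core: a random projection with its own threshold.

    Each core draws its parameters from its own seeded generator, so its
    properties are set independently of every other core's.
    """

    def __init__(self, dim: int, seed: int):
        rng = random.Random(seed)
        self.weights = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        self.threshold = rng.uniform(-0.5, 0.5)

    def fires(self, x) -> bool:
        return sum(w * v for w, v in zip(self.weights, x)) > self.threshold


def make_cores(n_cores: int, dim: int, base_seed: int = 0):
    return [Core(dim, base_seed + i) for i in range(n_cores)]


def signature(x, cores) -> frozenset:
    # The signature is the set of indices of the cores that fired on x.
    return frozenset(i for i, core in enumerate(cores) if core.fires(x))
```

Because the cores are seeded, the same input always yields the same signature, which is what allows signatures of different elements to be compared later.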
[0030] In an embodiment, the clustering system 130 is configured to
initiate clustering of the multimedia content elements upon
detection of at least one clustering trigger event. The at least
one clustering trigger event may include, but is not limited to,
receipt of a request to cluster a multimedia content element or a
plurality of multimedia content elements.
[0031] To this end, in an embodiment, the clustering system 130 is
configured to receive, from the user device 110, a request to
cluster a multimedia content element or a plurality of multimedia
content elements. Clustering each of the multimedia content
elements may include generating a cluster based on two or more
multimedia content elements, or adding a multimedia content element
to an existing cluster. The request may include, but is not limited
to, the multimedia content element or plurality of multimedia
content elements, an identifier of one or more of the multimedia
content elements, an indicator of a location of one or more of the
multimedia content elements (e.g., an indicator of a location in
the database 150 in which one or more of the multimedia content
elements is stored), a combination thereof, and the like. As
non-limiting examples, the request may include an image, an
identifier used for finding the image, a location of the image in a
storage (e.g., the database 150), or a combination
thereof.
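The request handling in [0030] and [0031] can be sketched as a small data structure. The field names, the `is_trigger_event` rule, and the use of a plain dict in place of the database 150 are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClusterRequest:
    # Hypothetical request shape: the element itself, an identifier,
    # a storage location, or a combination thereof.
    content: Optional[bytes] = None
    identifier: Optional[str] = None
    location: Optional[str] = None


def is_trigger_event(request: ClusterRequest) -> bool:
    # Treat a request as a clustering trigger only if it references some element.
    return any(v is not None for v in (request.content, request.identifier, request.location))


def resolve_content(request: ClusterRequest, database: dict) -> Optional[bytes]:
    """Return the element's bytes, fetching them from `database` (a dict
    standing in for the database 150) when only a location or identifier is given."""
    if request.content is not None:
        return request.content
    key = request.location if request.location is not None else request.identifier
    return database.get(key)
```

Content supplied directly takes precedence; otherwise the location is preferred over the identifier as the lookup key.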
[0032] Each multimedia content element may include, but is not
limited to, images, graphics, video streams, video clips, audio
streams, audio clips, video frames, photographs, images of signals
(e.g., spectrograms, phasograms, scalograms, etc.), combinations
thereof, portions thereof, and the like. The multimedia content
elements may be, e.g., captured via the user device 110.
[0033] In an optional embodiment, the clustering system 130 is
further communicatively connected to a signature generator system
(SGS) 140. In a further embodiment, the clustering system 130 may
be configured to send, to the signature generator system 140, the
multimedia content element to be clustered. The signature generator
system 140 is configured to generate signatures based on the
multimedia content element and to send the generated signatures to
the clustering system 130. In another embodiment, the clustering
system 130 may be configured to generate the signatures. Generation
of signatures based on multimedia content elements is described
further herein below with respect to FIGS. 3 and 4. In another
embodiment, the signatures generated for more than one multimedia
content element may be clustered.
[0034] In an optional embodiment, the clustering system 130 is
further communicatively connected to a deep-content classification
(DCC) system 160. The DCC system 160 may be configured to
continuously create a knowledge database for multimedia data. To
this end, the DCC system 160 may be configured to initially receive
a large number of multimedia content elements to create a knowledge
database that is condensed into concept structures that are
efficient to store, retrieve, and check for matches. As new
multimedia content elements are collected by the DCC system 160,
they are efficiently added to the knowledge base and concept
structures such that the resource requirement is generally
sub-linear rather than linear or exponential. The DCC system 160 is
configured to extract patterns from each multimedia content element
and to select the important/salient patterns for the creation of
signatures thereof. The patterns are then inter-matched and
clustered, after which the number of signatures in each cluster is
reduced to the minimum that still maintains matching and enables
generalization to new multimedia content elements. Metadata
respective of the multimedia content elements is collected, thereby
forming, together with the reduced clusters, a concept structure.
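The reduction step in [0034] can be approximated as a greedy set cover. The sketch below is an assumption-laden illustration, not the DCC system's actual algorithm: signatures are frozensets of active indices, two signatures "match" when their Jaccard similarity reaches a hypothetical `threshold`, and signatures are kept until every member of the cluster matches at least one kept signature. Greedy selection does not guarantee the true minimum.

```python
from dataclasses import dataclass


def jaccard(a: frozenset, b: frozenset) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


@dataclass
class ConceptStructure:
    reduced_signatures: list   # the signature reduced cluster
    metadata: dict


def reduce_cluster(signatures, threshold=0.5):
    """Greedily keep signatures until every signature in the cluster matches
    (Jaccard >= threshold) at least one kept signature. Requires threshold <= 1."""
    uncovered = set(range(len(signatures)))
    kept = []
    while uncovered:
        # Pick the signature that covers the most still-uncovered members.
        best = max(range(len(signatures)),
                   key=lambda i: sum(jaccard(signatures[i], signatures[j]) >= threshold
                                     for j in uncovered))
        kept.append(signatures[best])
        uncovered -= {j for j in uncovered
                      if jaccard(signatures[best], signatures[j]) >= threshold}
    return kept


def build_concept(signatures, metadata, threshold=0.5):
    return ConceptStructure(reduce_cluster(signatures, threshold), metadata)
```

The loop always terminates because any uncovered signature matches itself, so each iteration covers at least one more member.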
[0035] In a further embodiment, the clustering system 130 may be
configured to obtain, from the DCC system 160, at least one concept
structure matching each of the multimedia content elements to be
clustered. In yet a further embodiment, the clustering system 130
may be configured to query the DCC system 160 for the at least one
matching concept structure. The query may be made with respect to
the signatures for the multimedia content elements to be clustered.
In an embodiment, multimedia content elements associated with the
obtained matching concept structures may be utilized for
determining clusters to which the multimedia content elements to be
clustered are added.
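A query against concept structures, as described in [0035], might be sketched with an in-memory list standing in for the DCC system 160. The dict keys (`"reduced_signatures"`, `"metadata"`, `"members"`), the Jaccard match, and the 0.5 threshold are hypothetical.

```python
def jaccard(a: frozenset, b: frozenset) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def query_concepts(signature, concept_structures, threshold=0.5):
    """Return the concept structures having at least one reduced signature
    that matches the query signature (an in-memory stand-in for the DCC)."""
    return [cs for cs in concept_structures
            if any(jaccard(signature, s) >= threshold
                   for s in cs["reduced_signatures"])]


def candidate_clusters(signature, concept_structures, threshold=0.5):
    # Elements already associated with each matching concept structure
    # indicate the clusters to which the new element may be added.
    return {cs["metadata"]["name"]: list(cs["members"])
            for cs in query_concepts(signature, concept_structures, threshold)}
```

Comparing only against each structure's reduced signatures keeps the query cost proportional to the condensed knowledge base rather than to every stored element.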
[0036] In an optional embodiment, the clustering system 130 is
configured to generate, based on the signatures for the multimedia
content elements to be clustered, at least one tag for each
multimedia content element. Each tag is a textual index term
assigned to content. The generated tags are searchable (e.g., by
the user device 110 or other user devices), and may be included in
metadata for the multimedia content element. In an embodiment, the
tags may be generated based on metadata of the obtained at least
one concept structure. As a non-limiting example, if metadata of an
obtained concept structure includes the word "Superman®", the
generated tags may include the textual term "Superman®".
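Tag generation from concept metadata, as in [0036], could be as simple as collecting terms from the matched structures. This sketch assumes, hypothetically, that each metadata dict may hold a `"name"` string and an optional `"keywords"` list; the source does not define the metadata layout.

```python
def tags_from_concepts(matched_concepts):
    """Collect textual tags from the metadata of matched concept structures.

    Assumes (hypothetically) that each metadata dict may carry a "name"
    string and an optional "keywords" list.
    """
    tags = []
    for concept in matched_concepts:
        meta = concept.get("metadata", {})
        for term in [meta.get("name"), *meta.get("keywords", [])]:
            if term and term not in tags:   # skip missing names and duplicates
                tags.append(term)
    return tags
```

For a concept whose metadata name is "Superman", the returned tags include "Superman", mirroring the example in the text.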
[0037] In an embodiment, based on the generated signatures, the
generated tags, or both, the clustering system 130 is configured to
determine at least one multimedia content element cluster for each
multimedia content element to be clustered. Each determined
multimedia content element cluster includes a plurality of
multimedia content elements sharing at least one common concept
with each other and with the multimedia content element or
plurality of multimedia content elements to be clustered. The
common concept of a multimedia content element may be a collection
of signatures representing elements of the unstructured data and
metadata describing the concept. The common concept may represent
an item or aspect of the multimedia content element such as, but
not limited to, an object, a person, an animal, a pattern, a color,
a background, a character, a sub textual aspect, a meta aspect,
words, sounds, voices, motions, combinations thereof, and the like.
In a further embodiment, multimedia content elements may share a
common concept when each of the multimedia content elements is
associated with at least one signature, at least one portion of a
signature, at least one tag, or a combination thereof, that is
common to the multimedia content elements sharing a common
concept.
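As a minimal sketch of the sharing test just described, the following Python assumes (hypothetically) that each signature is modeled as a frozenset of indices of the nodes that fired, that a "portion" of a signature is a shared subset of at least `min_portion` such indices, and that tags are plain strings. The names `ContentElement`, `shares_common_concept`, and `min_portion` are illustrative only and do not come from the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class ContentElement:
    """Hypothetical record for one multimedia content element."""
    name: str
    # Each signature is modeled as a frozenset of indices of nodes that fired.
    signatures: set = field(default_factory=set)
    tags: set = field(default_factory=set)


def shares_common_concept(elements, min_portion=3):
    """Return True if all elements share a tag, a whole signature, or a
    signature portion of at least `min_portion` active node indices."""
    if not elements:
        return False
    common_tags = set.intersection(*(e.tags for e in elements))
    common_signatures = set.intersection(*(e.signatures for e in elements))
    # Pool each element's active indices, then intersect across elements.
    pooled = [set().union(*e.signatures) for e in elements]
    common_portion = set.intersection(*pooled)
    return bool(common_tags or common_signatures
                or len(common_portion) >= min_portion)
```

Treating a shared tag, a shared signature, and a sufficiently large shared signature portion as alternatives mirrors the "or a combination thereof" wording above.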
[0038] It should be noted that multiple multimedia content element
clusters may be determined for each multimedia content element. As
a non-limiting example, for an image showing a "selfie" of a person
(i.e., an image showing the person that is captured by the person)
taken on the beach, multimedia content element clusters including
multimedia content elements showing the person, selfies of the
person or of other people, and beach scenery may be determined, and
the selfie image may be clustered into each of the determined
multimedia content element clusters.
[0039] In a further embodiment, determining the multimedia content
element clusters may include comparing the generated signatures or
the generated tags to signatures or tags, respectively, of a
plurality of multimedia content element clusters. Each determined
multimedia content element cluster may be, e.g., a cluster having
signatures or tags that match the generated signatures or tags
above a predetermined threshold. As a non-limiting example, a
signature is generated based on a video showing a stand-up comedy
performance by the comedian Jerry Seinfeld, and tags including
"Jerry Seinfeld" and "stand-up comedy" are generated based on the
generated signature. In yet a further embodiment, the determined
multimedia content element clusters may include one cluster for
each tag.
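The threshold comparison described above can be sketched as follows. This is only an illustration under stated assumptions: signatures are frozensets of active node indices, signature similarity is measured with the Jaccard index (a stand-in chosen here; the disclosure does not specify a measure), tag similarity is the fraction of a cluster's tags found among the generated tags, and the threshold of 0.5 is arbitrary. Because every cluster above the threshold is returned, one element may be added to several clusters, as when one cluster is determined for each of the tags "Jerry Seinfeld" and "stand-up comedy".

```python
def jaccard(a, b):
    """Overlap between two signatures modeled as sets of active node indices."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_clusters(signature, tags, clusters, threshold=0.5):
    """Return the IDs of all clusters whose signatures or tags match above threshold.

    `clusters` maps a cluster ID to (list of member signatures, set of cluster tags).
    """
    matched = []
    for cluster_id, (cluster_signatures, cluster_tags) in clusters.items():
        signature_score = max(
            (jaccard(signature, s) for s in cluster_signatures), default=0.0)
        tag_score = (len(tags & cluster_tags) / len(cluster_tags)
                     if cluster_tags else 0.0)
        if max(signature_score, tag_score) > threshold:
            matched.append(cluster_id)
    return matched
```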
[0040] In yet a further embodiment, one or more of the multimedia
content element clusters may be included in or associated with a
concept structure such that the comparison may include comparing
the generated signatures or the generated tags to a reduced set of
signatures or tags of the concept structure, respectively. In a
further embodiment, the multimedia content elements to be clustered
may be added to the concept structures having matching multimedia
content element clusters.
[0041] In another embodiment, if no existing multimedia content
element clusters having concepts in common with the multimedia
content element can be found (e.g., if no signatures or tags match
the generated signatures or tags above a predetermined threshold),
the clustering system 130 may be configured to generate a
multimedia content element cluster including the multimedia content
elements to be clustered. Generating the multimedia content element
cluster may include, but is not limited to, searching in one or
more data sources (e.g., the user device 110, the database 150, or
other data sources not shown that may be accessible over, e.g., the
Internet) to identify multimedia content elements sharing common
concepts with the multimedia content element. The searching may be
based on the generated signatures, the generated tags, or both. The
identified multimedia content elements are clustered with the
multimedia content element to be clustered, and the resulting
cluster may be stored in, e.g., the database 150. In a further
embodiment, the generated cluster may further include the generated
tags.
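One way to picture the fallback in this paragraph is the sketch below. It assumes, hypothetically, that clusters are held in a dictionary mapping a cluster ID to a list of `(name, signature)` pairs, that each data source can be represented as a mapping from element name to signature, and that Jaccard overlap above a threshold stands in for "sharing common concepts". The ID format `cluster-N` is likewise invented for the example.

```python
def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_or_create(name, signature, clusters, data_sources, threshold=0.5):
    """Add the element to every matching cluster; if none matches, build a new
    cluster from data-source elements that match, plus the element itself."""
    added_to = []
    for cluster_id, members in clusters.items():
        if any(jaccard(signature, sig) > threshold for _, sig in members):
            members.append((name, signature))
            added_to.append(cluster_id)
    if added_to:
        return added_to
    # No existing cluster matched: search the (hypothetical) data sources.
    found = [(other, sig)
             for source in data_sources
             for other, sig in source.items()
             if jaccard(signature, sig) > threshold]
    new_id = f"cluster-{len(clusters)}"
    clusters[new_id] = found + [(name, signature)]
    return [new_id]
```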
[0042] It should be noted that using signatures for tagging
multimedia content elements, clustering multimedia content
elements, or both, ensures more accurate clustering of multimedia
content than, for example, using manually provided metadata
(e.g., tags provided by users). For instance, in order to cluster
an image of a sports car into an appropriate cluster, it may be
desirable to recognize the particular model of the car. However, in
most cases the model of the car would not be part of the metadata
associated with the multimedia content (image). Moreover, the car
may be shown in the image at angles different from those of a
specific photograph of the car that is available as a search item.
The signature generated for that image would nevertheless enable
accurate recognition of the model of the car, because the signatures
generated for the multimedia content elements according to the
disclosed embodiments allow for recognition and classification of
multimedia content elements in applications such as content
tracking, video filtering, multimedia taxonomy generation, video
fingerprinting, speech-to-text, audio classification, element
recognition, video/image search, and any other application
requiring content-based signature generation and matching for large
content volumes, such as the web and other large-scale databases.
[0043] The database 150 stores multimedia content elements,
clusters of multimedia content elements, or both. In the example
network diagram 100 shown in FIG. 1, the clustering system 130
communicates with the database 150 through the network 120. In
other non-limiting configurations, the clustering system 130 may be
directly connected to the database 150. The database 150 may be
accessible to, e.g., the user device 110, other user devices (not
shown), or both, thereby allowing for retrieval of clusters from
the database 150 by such user devices.
[0044] It should also be noted that the signature generator system
140 and the DCC system 160 are shown in FIG. 1 as being directly
connected to the clustering system 130 merely for simplicity
purposes and without limitation on the disclosed embodiments. The
signature generator system 140, the DCC system 160, or both, may be
included in the clustering system 130 or communicatively connected
to the clustering system 130 over, e.g., the network 120, without
departing from the scope of the disclosure.
[0045] It should be further noted that the clustering is described
as being performed by the clustering system 130 merely for
simplicity purposes and without limitation on the disclosed
embodiments. The clustering may be equally performed locally by,
e.g., the user device 110, without departing from the scope of the
disclosure. In such a case, the user device 110 may include the
clustering system 130, the signature generator system 140, the DCC
system 160, or any combination thereof, or may otherwise be
configured to perform any or all of the processes performed by such
systems. Further, local clustering by the user device 110 may be
based on multimedia content clusters stored locally on the user
device 110.
[0046] As a non-limiting example for local clustering by the user
device 110, the clustering may be based on clusters of images in a
photo library stored on the user device 110 such that new images
may be clustered in real-time and, therefore, subsequently searched
by a user of the user device 110. Thus, when, for example, the user
of the user device 110 captures an image of his dog named "Lucky,"
the user device 110 may cluster the image with other images of the
dog Lucky stored in the user device 110 such that, when the user
searches through the user device 110 for images using the query
"lucky," the captured image is returned along with other clustered
images of the dog Lucky.
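A minimal sketch of such an on-device index is shown below. It assumes, for illustration only, that signature-based tagging has already produced tags for each captured image, and it clusters images by case-folded tag so that the query "lucky" retrieves images tagged "Lucky". The class name `LocalPhotoLibrary` is hypothetical.

```python
from collections import defaultdict


class LocalPhotoLibrary:
    """Hypothetical on-device index: clusters of image IDs keyed by tag."""

    def __init__(self):
        self._clusters = defaultdict(list)

    def add(self, image_id, tags):
        # Cluster the new image under each of its (signature-derived) tags.
        for tag in tags:
            self._clusters[tag.casefold()].append(image_id)

    def search(self, query):
        # Case-insensitive lookup; .get() avoids creating empty clusters.
        return list(self._clusters.get(query.casefold(), []))
```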
[0047] FIG. 2 is an example flowchart 200 illustrating a method for
clustering of multimedia content elements according to an
embodiment. In another embodiment, the method may be performed in
response to a request to cluster one or more multimedia content
elements.
[0048] At S205, a clustering trigger event is detected. The
clustering trigger event may be or may include, but is not limited
to, receiving a request to cluster at least one multimedia content
element.
[0049] At S210, at least one multimedia content element to be
clustered is obtained. In an embodiment, the at least one
multimedia content element may be obtained based on a request to
cluster the at least one multimedia content element. The request
may include the at least one multimedia content element to be
clustered, an identifier of one or more of the at least one
multimedia content element, an indicator of a location of one or
more of the at least one multimedia content element, or a
combination thereof.
[0050] At S220, at least one signature is generated for each
multimedia content element. Each generated signature may be robust
to noise and distortion. In an embodiment, the signatures are
generated by a signature generator system as described further
herein below with respect to FIGS. 3 and 4. In another embodiment,
S220 may include sending, to a signature generator system (e.g.,
the signature generator system 140, FIG. 1), the multimedia content
element and receiving, from the signature generator system, the at
least one signature generated for each multimedia content
element.
[0051] At optional S230, at least one tag is generated for the at
least one multimedia content element based on the generated at
least one signature. Each tag is a textual index term assigned to
the multimedia content element as described further herein above.
As non-limiting examples of tags, the tag "me" may be assigned to
an image of the user's face, the tag "my dog" may be assigned to an
image of a dog, and the tag "my dog and I" may be assigned to an
image featuring both the user and a dog.
[0052] In an embodiment, S230 may include comparing the generated
at least one signature to signatures of a plurality of multimedia
content elements having assigned predetermined tags. In a further
embodiment, tags of multimedia content elements having signatures
that match one or more of the generated at least one signature may
be generated as tags for the multimedia content element.
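The matching step of S230 might be sketched as follows, assuming (as a hypothetical model) that signatures are frozensets of active node indices, that reference elements are given as `(signature, tags)` pairs, and that Jaccard overlap above a threshold counts as a match. With a lower threshold, a signature overlapping both a "my dog" reference and a "me" reference inherits both tags, loosely mirroring the "my dog and I" example.

```python
def generate_tags(signature, tagged_references, threshold=0.5):
    """Collect the tags of every reference element whose signature matches."""
    tags = set()
    for reference_signature, reference_tags in tagged_references:
        union = signature | reference_signature
        score = len(signature & reference_signature) / len(union) if union else 0.0
        if score > threshold:
            tags |= reference_tags
    return tags
```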
[0053] In another embodiment, the at least one tag may be generated
based on metadata of concept structures matching the at least one
multimedia content element to be clustered. To this end, in a
further embodiment, S230 may further include obtaining, from a DCC
system (e.g., the DCC system 160, FIG. 1), at least one concept
structure matching each multimedia content element to be clustered.
In yet a further embodiment, S230 may further include querying the
DCC system with respect to the signatures for each multimedia
content element to be clustered.
[0054] At S240, at least one multimedia content element cluster is
determined. Each determined multimedia content element cluster
includes a plurality of multimedia content elements sharing a
common concept. Each of the at least one multimedia content element
also shares the common concept of the multimedia content element
cluster. The common concept of a multimedia content element may be
a collection of signatures representing elements of the
unstructured data and metadata describing the concept. The common
concept may represent an item or aspect in the multimedia content
element such as, but not limited to, an object, a person, an
animal, a pattern, a color, a background, a character, a sub
textual aspect, a meta aspect, words, sounds, voices, motions,
combinations thereof, and the like. Multimedia content elements may
share a common concept when each of the multimedia content elements
is associated with at least one signature, at least one portion of
a signature, at least one tag, or a combination thereof, that is
common to all of the multimedia content elements sharing a common
concept.
[0055] As non-limiting examples, the common concept may represent,
e.g., a Labrador retriever dog shown in images or videos, a voice
of the actor Daniel Radcliffe that can be heard in audio or videos,
a motion including swinging of a baseball bat shown in videos, a
subtext of playing chess, an indication that an image is a
"selfie," and the like.
[0056] The common concept may be further based on levels of
granularity. For example, the common concept may be related to cats
generally such that any cat shown or heard in multimedia content
elements is considered a common concept, or may be related to a
particular cat such that only visual or audio representations of
that cat are considered to be a common concept. Such granularity
may depend on, e.g., a threshold for matching signatures, tags, or
both, such that higher thresholds result in more granular
results.
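The effect of the threshold on granularity can be illustrated with the following sketch. The signatures and the Jaccard measure are assumptions made for the example: a low threshold groups the query cat with another cat (cats generally), while a high threshold keeps only the near-identical signature of the same cat.

```python
def members_for_concept(query, candidates, threshold):
    """Return candidate names whose signature overlaps the query above threshold."""
    selected = []
    for name, signature in candidates.items():
        union = query | signature
        if union and len(query & signature) / len(union) > threshold:
            selected.append(name)
    return selected


# Hypothetical signatures (sets of active node indices).
query = frozenset({1, 2, 3, 4})                 # a photo of one particular cat
candidates = {
    "same_cat_2": frozenset({1, 2, 3, 4, 5}),   # overlap 4/5 = 0.80
    "other_cat": frozenset({1, 2, 6, 7}),       # overlap 2/6, about 0.33
    "dog": frozenset({8, 9}),                   # overlap 0
}
print(members_for_concept(query, candidates, threshold=0.3))  # ['same_cat_2', 'other_cat']
print(members_for_concept(query, candidates, threshold=0.7))  # ['same_cat_2']
```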
[0057] In another embodiment, the determined at least one
multimedia content element cluster may include only multimedia
content elements of the same type as the obtained multimedia content
element. For example, if the obtained multimedia content element is
an image, only other images having a common concept may be
determined. In yet another embodiment, multimedia content elements
of different types may be determined. The types of multimedia
content elements that may be determined may be based on, e.g., one
or more rules.
[0058] As a non-limiting example of a common concept, for an image
showing a person wearing a parachute with the sky in the
background, a tag for the image may be "skydiving." The common
concept may be the sub textual aspect "skydiving" indicating an
activity that the person shown in the image is performing. Other
multimedia content elements showing or otherwise illustrating
people skydiving may also be associated with the tag "skydiving"
and, therefore, the sub textual aspect "skydiving" is a common
concept of the multimedia content elements.
[0059] As another non-limiting example of a common concept, for an
audio clip in which a user recites information that the user wishes
to reference later, a portion of a signature generated for the
audio clip may be related to the meta aspect "note to self." In
particular, a portion of the signature may be generated based on
the words "note to self" spoken at the beginning of the audio clip.
Other multimedia content elements may also have portions of
signatures related to the concept "note to self" (e.g., other
content illustrating the words "note to self" or similar phrases)
and, therefore, the meta aspect "note to self" is a common concept
of the multimedia content elements. In a further example, only
multimedia content elements related to the particular user heard in
the obtained multimedia content element (i.e., multimedia content
elements featuring a voice of the user who recorded the obtained
multimedia content element) may be determined as having a concept
in common with the obtained multimedia content element such that
the cluster includes only notes to self by the same user.
[0060] In an embodiment, if no existing multimedia content element
clusters having a common concept with the multimedia content
element can be found (e.g., if no multimedia content element
clusters are associated with signatures or tags matching the
generated at least one signature or the generated at least one tag
above a predetermined threshold), S240 may include generating a new
multimedia content element cluster. In a further embodiment,
generating the new multimedia content element cluster may include
searching in one or more data sources to identify multimedia
content elements sharing a common concept with the obtained
multimedia content element. The identified multimedia content
elements may be clustered with the obtained multimedia content
element.
[0061] At S250, the at least one multimedia content element is
added to the determined or generated new multimedia content element
cluster. In an embodiment, S250 may further include storing the at
least one multimedia content element cluster with the added at
least one multimedia content element in a storage (e.g., the
database 150 of FIG. 1, a data source such as a web server, etc.).
As a non-limiting example, the cluster may be stored in a server of
a social media platform, thereby enabling other users to find the
cluster during searches. Each cluster may be stored separately such
that different groupings of multimedia content elements are stored
in separate locations. For example, different clusters of
multimedia content elements may be stored in different folders.
[0062] At S260, it is determined if additional multimedia content
elements are to be clustered and, if so, execution continues with
S205; otherwise, execution terminates.
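The loop of FIG. 2 can be summarized in a short driver, sketched here with the signature generation, tagging, and cluster-determination steps passed in as functions because the disclosure leaves their internals to the systems described elsewhere. The function name `run_clustering` and its parameters are hypothetical; each request stands for one clustering trigger event (S205), and the loop ending when no requests remain corresponds to the check at S260.

```python
from typing import Callable, Dict, Iterable, List, Set


def run_clustering(
    requests: Iterable[List[str]],
    sign: Callable[[str], frozenset],
    tag: Callable[[frozenset], Set[str]],
    determine: Callable[[frozenset, Set[str]], List[str]],
    clusters: Dict[str, List[str]],
) -> None:
    for request in requests:                   # S205: a request is a trigger event
        for element in request:                # S210: obtain each element to cluster
            signature = sign(element)          # S220: generate a signature
            tags = tag(signature)              # S230 (optional): generate tags
            for cluster_id in determine(signature, tags):            # S240
                clusters.setdefault(cluster_id, []).append(element)  # S250
    # S260: execution terminates once no further requests remain.
```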
[0063] Clustering of the multimedia content elements allows for
organizing the multimedia content elements based on subject matter
represented by various concepts. Such organization may be useful
for, e.g., organizing photos captured by a user of a smart phone
based on common subject matter. As a non-limiting example, images
showing dogs, a football game, and food may be organized into
different collections and, for example, stored in separate folders
on the smart phone. Such organization may be particularly useful
for social media or other content sharing applications, as
multimedia content being shared can be organized and shared with
respect to content. Additionally, such organization may be useful
for subsequent retrieval, particularly when the organization is
based on tags. As noted above, using signatures to classify the
multimedia content elements typically results in more accurate
identification of multimedia content elements sharing similar
content.
[0064] It should be noted that the embodiments described herein
above with respect to FIG. 2 are discussed as including clustering
multimedia content elements in series merely for simplicity
purposes and without limitation on the disclosure. Multiple
multimedia content elements may be clustered in parallel without
departing from the scope of the disclosure. Further, the clustering
method discussed above can be performed by the clustering system
130, or locally by a user device (e.g., the user device 110, FIG.
1).
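Clustering elements in parallel could look like the sketch below, which uses Python's standard `concurrent.futures.ThreadPoolExecutor`. The `assign` callable, which stands for signature generation plus cluster determination for one element, is a hypothetical placeholder assumed to be thread-safe; the shared cluster dictionary is updated under a lock so that concurrent additions do not interleave.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def cluster_in_parallel(elements, assign, max_workers=4):
    """Cluster elements concurrently; `assign(element)` returns cluster IDs."""
    clusters = {}
    lock = threading.Lock()

    def work(element):
        cluster_ids = assign(element)  # assumed thread-safe
        with lock:
            for cluster_id in cluster_ids:
                clusters.setdefault(cluster_id, []).append(element)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for every task and re-raises any worker exception.
        list(pool.map(work, elements))
    return clusters
```

Member order within each cluster depends on thread scheduling, so callers that need a stable order should sort the results.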
[0065] FIGS. 3 and 4 illustrate the generation of signatures for
the multimedia content elements by the signature generator system
140 according to an embodiment. An example high-level description
of the process for large-scale matching is depicted in FIG. 3. In
this example, the matching is for video content.
[0066] Video content segments 2 from a Master database (DB) 6 and a
Target DB 1 are processed in parallel by a large number of
independent computational Cores 3 that constitute an architecture
for generating the Signatures (hereinafter the "Architecture").
Further details on the computational Cores generation are provided
below. The independent Cores 3 generate a database of Robust
Signatures and Signatures 4 for Target content-segments 5 and a
database of Robust Signatures and Signatures 7 for Master
content-segments 8. An exemplary and non-limiting process of
signature generation for an audio component is shown in detail in
FIG. 4. Finally, Target Robust Signatures and/or Signatures are
effectively matched, by a matching algorithm 9, to Master Robust
Signatures and/or Signatures database to find all matches between
the two databases.
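The disclosure does not specify matching algorithm 9. Purely as an illustration, the brute-force sketch below compares every Target signature with every Master signature, models each signature as a frozenset of active node indices, and reports pairs whose Jaccard overlap exceeds an assumed threshold. A system operating at the scale described here would presumably need an index rather than an all-pairs scan.

```python
def match_databases(target, master, threshold=0.6):
    """Return every (target_id, master_id) pair whose signatures overlap above
    threshold. Brute force: compares all pairs, O(|target| * |master|)."""
    matches = []
    for target_id, target_sig in target.items():
        for master_id, master_sig in master.items():
            union = target_sig | master_sig
            if union and len(target_sig & master_sig) / len(union) > threshold:
                matches.append((target_id, master_id))
    return matches
```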
[0067] To demonstrate an example of the signature generation
process, it is assumed, merely for the sake of simplicity and
without limitation on the generality of the disclosed embodiments,
that the signatures are based on a single frame, leading to certain
simplification of the computational core generation. The Matching
System is extensible to signature generation that captures the
dynamics between frames.
[0068] The Signatures' generation process is now described with
reference to FIG. 4. The first step in the process of signature
generation from a given speech segment is to break down the speech
segment into K patches 14 of random length P and random position
within the speech segment 12. The breakdown is performed by the
patch generator component 21. The values of the number of patches
K, the random length P, and the random position parameters are
determined based on optimization, considering the tradeoff between
accuracy rate and the number of fast matches required in the flow
process of the clustering system 130 and the signature generator
system (SGS) 140. Thereafter, all the K patches are injected in
parallel into all computational Cores 3 to generate K response
vectors 22, which are fed into a signature generator system 23 to
produce a database of Robust Signatures and Signatures 4.
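The patch breakdown performed by the patch generator component 21 can be sketched as follows. The uniform sampling of lengths and positions and the `min_len`/`max_len` bounds are assumptions for the example; the disclosure states only that K, P, and the positions are chosen by optimization.

```python
import random


def generate_patches(segment, k, min_len, max_len, rng=None):
    """Break a 1-D segment into k patches of random length and random position.

    Lengths are drawn uniformly from [min_len, max_len] (capped at the segment
    length) and start positions uniformly among those that keep the patch inside.
    """
    if not 1 <= min_len <= len(segment):
        raise ValueError("min_len must be between 1 and len(segment)")
    rng = rng or random.Random()
    patches = []
    for _ in range(k):
        length = rng.randint(min_len, min(max_len, len(segment)))
        start = rng.randint(0, len(segment) - length)
        patches.append(segment[start:start + length])
    return patches
```

Passing a seeded `random.Random` makes the breakdown reproducible, which is convenient when the same patch layout must be applied to every segment.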
[0069] In order to generate Robust Signatures, i.e., Signatures
that are robust to additive noise, by the $L$ Computational Cores 3
(where $L$ is an integer equal to or greater than 1), a frame is
injected into all the Cores 3. Then, the Cores 3 generate two binary
response vectors: $\vec{S}$, which is a Signature vector, and
$\vec{RS}$, which is a Robust Signature vector.
[0070] For generation of signatures robust to additive noise, such
as White-Gaussian-Noise, scratch, etc., but not robust to
distortions, such as crop, shift and rotation, etc., a core
$C_i = \{n_i\}$ ($1 \le i \le L$) may consist of a single leaky
integrate-to-threshold unit (LTU) node or more nodes. The node $n_i$
equations are:

$$V_i = \sum_j w_{ij} k_j$$

$$n_i = \theta(V_i - Th_x)$$

where $\theta$ is a Heaviside step function; $w_{ij}$ is a coupling
node unit (CNU) between node $i$ and image component $j$; $k_j$ is
an image component $j$ (for example, grayscale value of a certain
pixel $j$); $Th_x$ is a constant Threshold value, where $x$ is $S$
for Signature and $RS$ for Robust Signature; and $V_i$ is a Coupling
Node Value.
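The node equations above translate directly into code. The sketch below is a plain-Python illustration with made-up weights and thresholds; it assumes the convention that the step function is 0 at exactly 0 (the disclosure does not state one) and a Robust Signature threshold above the Signature threshold, so every node active in the Robust Signature is also active in the Signature.

```python
def heaviside(x):
    # Convention assumed here: theta(0) = 0.
    return 1 if x > 0 else 0


def core_responses(weights, frame, th_s, th_rs):
    """Compute V_i = sum_j w_ij * k_j for each node i, then the binary
    Signature (threshold Th_S) and Robust Signature (threshold Th_RS) bits."""
    v = [sum(w_ij * k_j for w_ij, k_j in zip(row, frame)) for row in weights]
    signature = [heaviside(v_i - th_s) for v_i in v]
    robust_signature = [heaviside(v_i - th_rs) for v_i in v]
    return v, signature, robust_signature


# Three hypothetical nodes (L = 3) over two image components (pixel values).
weights = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
frame = [2.0, 4.0]
print(core_responses(weights, frame, th_s=2.5, th_rs=5.0))
# ([2.0, 3.0, 8.0], [0, 1, 1], [0, 0, 1])
```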
[0071] The Threshold values $Th_x$ are set differently for Signature
generation and for Robust Signature generation. For example, for a
certain distribution of $V_i$ values (for the set of nodes), the
thresholds for Signature ($Th_S$) and Robust Signature ($Th_{RS}$)
are set apart, after optimization, according to one or more of the
following criteria:
[0072] 1: For $V_i > Th_{RS}$:

$$1 - p(V > Th_S) = 1 - (1 - \epsilon)^{l} \ll 1$$

[0073] i.e., given that $l$ nodes (cores) constitute a Robust
Signature of a certain image $I$, the probability that not all of
these $l$ nodes will belong to the Signature of the same, but noisy,
image is sufficiently low (according to a system's specified
accuracy).
[0074] 2:

$$p(V_i > Th_{RS}) \approx l/L$$

[0075] i.e., approximately $l$ out of the total $L$ nodes can be
found to generate a Robust Signature according to the above
definition.
[0076] 3: Both Robust Signature and Signature are generated for
certain frame $i$.
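The two quantitative criteria can be illustrated numerically. The sketch below picks the Robust Signature threshold midway between the l-th and (l+1)-th largest node values, so that about l of the L nodes exceed it (criterion 2), and evaluates 1 - (1 - epsilon)^l for criterion 1 under the added assumption that each robust node independently drops below the Signature threshold in the noisy image with probability epsilon. Both helper names are hypothetical.

```python
def robust_threshold(v_values, l):
    """Choose Th_RS so that roughly l of the L node values exceed it (criterion 2)."""
    ordered = sorted(v_values, reverse=True)
    if not 1 <= l < len(ordered):
        raise ValueError("need 1 <= l < L")
    # Midpoint between the l-th and (l+1)-th largest values of V_i.
    return (ordered[l - 1] + ordered[l]) / 2


def miss_probability(epsilon, l):
    """Criterion 1: probability that not all l robust nodes stay in the noisy
    image's Signature, assuming independent per-node dropout probability epsilon."""
    return 1 - (1 - epsilon) ** l


th_rs = robust_threshold([5.0, 1.0, 9.0, 3.0, 7.0], l=2)
print(th_rs)                          # 6.0, so 2 of the 5 values exceed it
print(miss_probability(0.001, l=10))  # roughly 0.01, i.e. << 1
```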
[0077] It should be understood that the generation of a signature
is unidirectional, and typically yields lossy compression, where
the characteristics of the compressed data are maintained but the
uncompressed data cannot be reconstructed. Therefore, a signature
can be used for the purpose of comparison to another signature
without the need for comparison to the original data. The detailed
description of the Signature generation can be found in U.S. Pat.
Nos. 8,326,775 and 8,312,031, assigned to the common assignee,
which are hereby incorporated by reference for all the useful
information they contain.
[0078] Computational Core generation is a process of definition,
selection, and tuning of the parameters of the cores for a certain
realization in a specific system and application. The process is
based on several design considerations, such as:
[0079] (a) The Cores should be designed so as to obtain maximal
independence, i.e., the projection from a signal space should
generate a maximal pair-wise distance between any two cores'
projections into a high-dimensional space.
[0080] (b) The Cores should be optimally designed for the type of
signals, i.e., the Cores should be maximally sensitive to the
spatio-temporal structure of the injected signal, for example, and
in particular, sensitive to local correlations in time and space.
Thus, in some cases a core represents a dynamic system, such as in
state space, phase space, edge of chaos, etc., which is uniquely
used herein to exploit their maximal computational power.
[0081] (c) The Cores should be optimally designed with regard to
invariance to a set of signal distortions, of interest in relevant
applications.
[0082] The Computational Core generation and the process for
configuring such cores are described in more detail in U.S. Pat.
No. 8,655,801.
[0083] FIG. 5 is an example block diagram illustrating a clustering
system 130 implemented according to an embodiment. The clustering
system 130 includes a processing circuitry 510 coupled to a memory
520, a storage 530, and a network interface 540. In an embodiment,
the components of the clustering system 130 may be communicatively
connected via a bus 550.
[0084] The processing circuitry 510 may be realized as one or more
hardware logic components and circuits. For example, and without
limitation, illustrative types of hardware logic components that
can be used include field programmable gate arrays (FPGAs),
application-specific integrated circuits (ASICs),
application-specific standard products (ASSPs), system-on-a-chip
systems (SOCs), general-purpose microprocessors, microcontrollers,
digital signal processors (DSPs), and the like, or any other
hardware logic components that can perform calculations or other
manipulations of information. In an embodiment, the processing
circuitry 510 may be realized as an array of at least partially
statistically independent computational cores. The properties of
each computational core are set independently of those of each
other core, as described further herein above.
[0085] The memory 520 may be volatile (e.g., RAM, etc.),
non-volatile (e.g., ROM, flash memory, etc.), or a combination
thereof. In one configuration, computer readable instructions to
implement one or more embodiments disclosed herein may be stored in
the storage 530.
[0086] In another embodiment, the memory 520 is configured to store
software. Software shall be construed broadly to mean any type of
instructions, whether referred to as software, firmware,
middleware, microcode, hardware description language, or otherwise.
Instructions may include code (e.g., in source code format, binary
code format, executable code format, or any other suitable format
of code). The instructions, when executed by the processing
circuitry 510, cause the processing circuitry 510 to perform the
various processes described herein. Specifically, the instructions,
when executed, cause the processing circuitry 510 to perform
clustering of multimedia content elements as described herein.
[0087] The storage 530 may be magnetic storage, optical storage,
and the like, and may be realized, for example, as flash memory or
other memory technology, CD-ROM, Digital Versatile Disks (DVDs), or
any other medium which can be used to store the desired
information.
[0088] The network interface 540 allows the clustering system 130
to communicate with the signature generator system 140 for the
purpose of, for example, sending multimedia content elements,
receiving signatures, and the like. Additionally, the network
interface 540 allows the clustering system 130 to communicate with
the user device 110 in order to obtain multimedia content elements
to be clustered.
[0089] It should be understood that the embodiments described
herein are not limited to the specific architecture illustrated in
FIG. 5, and other architectures may be equally used without
departing from the scope of the disclosed embodiments. In
particular, the clustering system 130 may further include a
signature generator system configured to generate signatures as
described herein without departing from the scope of the disclosed
embodiments.
[0090] It should be understood that any reference to an element
herein using a designation such as "first," "second," and so forth
does not generally limit the quantity or order of those elements.
Rather, these designations are generally used herein as a
convenient method of distinguishing between two or more elements or
instances of an element. Thus, a reference to first and second
elements does not mean that only two elements may be employed there
or that the first element must precede the second element in some
manner. Also, unless stated otherwise, a set of elements comprises
one or more elements.
[0091] As used herein, the phrase "at least one of" followed by a
listing of items means that any of the listed items can be utilized
individually, or any combination of two or more of the listed items
can be utilized. For example, if a step in a method is described as
including "at least one of A, B, and C," the step can include A
alone; B alone; C alone; A and B in combination; B and C in
combination; A and C in combination; or A, B, and C in
combination.
[0092] The various embodiments disclosed herein can be implemented
as hardware, firmware, software, or any combination thereof.
Moreover, the software is preferably implemented as an application
program tangibly embodied on a program storage unit or computer
readable medium consisting of parts, or of certain devices and/or a
combination of devices. The application program may be uploaded to,
and executed by, a machine comprising any suitable architecture.
Preferably, the machine is implemented on a computer platform
having hardware such as one or more central processing units
("CPUs"), a memory, and input/output interfaces. The computer
platform may also include an operating system and microinstruction
code. The various processes and functions described herein may be
either part of the microinstruction code or part of the application
program, or any combination thereof, which may be executed by a
CPU, whether or not such a computer or processor is explicitly
shown. In addition, various other peripheral units may be connected
to the computer platform such as an additional data storage unit
and a printing unit. Furthermore, a non-transitory computer
readable medium is any computer readable medium except for a
transitory propagating signal.
[0093] All examples and conditional language recited herein are
intended for pedagogical purposes to aid the reader in
understanding the principles of the disclosed embodiment and the
concepts contributed by the inventor to furthering the art, and are
to be construed as being without limitation to such specifically
recited examples and conditions. Moreover, all statements herein
reciting principles, aspects, and embodiments of the disclosed
embodiments, as well as specific examples thereof, are intended to
encompass both structural and functional equivalents thereof.
Additionally, it is intended that such equivalents include both
currently known equivalents as well as equivalents developed in the
future, i.e., any elements developed that perform the same
function, regardless of structure.
* * * * *