U.S. patent application number 15/293679 was filed with the patent office on 2016-10-14 and published on 2018-04-19 for image processing system and method.
The applicant listed for this patent is Cloudera, Inc.. Invention is credited to Grant Custer, Micha Gorelick, Hilary Mason.
Application Number | 20180107899 15/293679 |
Document ID | / |
Family ID | 61872669 |
Filed Date | 2016-10-14 |
United States Patent
Application |
20180107899 |
Kind Code |
A1 |
Gorelick; Micha ; et
al. |
April 19, 2018 |
IMAGE PROCESSING SYSTEM AND METHOD
Abstract
An image processing system involves a camera, at least one
processor associated with the camera, non-transitory storage, a
lexical database of terms and image classification software. The
image processing system uses the image classification software to
assign hyponyms and associated probabilities to an image and then
builds a subset hierarchical tree of hypernyms from the lexical
database of terms. The processor then scores the hypernyms and
identifies at least one hypernym for the image that has a score
that is calculated to have a value that is greater than one of: a
pre-specified threshold score, or all other calculated level scores
within the subset hierarchical tree. The associated methods are
also disclosed.
Inventors: |
Gorelick; Micha; (Brooklyn,
NY) ; Mason; Hilary; (Brooklyn, NY) ; Custer;
Grant; (Brooklyn, NY) |
|
Applicant: |
Name |
City |
State |
Country |
Type |
Cloudera, Inc. |
Palo Alto |
CA |
US |
|
|
Family ID: |
61872669 |
Appl. No.: |
15/293679 |
Filed: |
October 14, 2016 |
Current U.S.
Class: |
1/1 |
Current CPC
Class: |
G06K 9/6219 20130101;
G06F 40/284 20200101; G06F 40/44 20200101; G06F 16/51 20190101;
G06K 9/72 20130101; G06F 16/50 20190101; G06F 16/5866 20190101;
G06F 16/901 20190101; G06F 16/55 20190101; G06K 9/6201 20130101;
G06K 9/6277 20130101; G06F 40/58 20200101; G06F 16/3344 20190101;
G06F 16/41 20190101; G06F 16/285 20190101; G06K 9/6215
20130101 |
International
Class: |
G06K 9/62 20060101
G06K009/62; G06F 17/30 20060101 G06F017/30; G06K 9/72 20060101
G06K009/72 |
Claims
1. An image processing system comprising: a camera; at least one
processor associated with the camera; non-transitory storage,
associated with the camera and accessible to the processor, the
non-transitory storage containing therein an image obtained via the
camera, and programming executable by the at least one processor; a
lexical database of terms, accessible to the processor, wherein the
lexical database of terms is arranged in a hierarchy including a
single root node, multiple leaf nodes, and multiple hypernym nodes,
located between the root node and at least one leaf node and
arranged such that an individual hypernym node will have either the
root node or another hypernym node as a parent node, and at least
one leaf node or hypernym node as a child node, wherein the root
node is associated with a term in the hierarchy of greatest
generality, wherein each of the leaf nodes has an associated
hyponym term, the hyponym terms representing terms in the hierarchy
of greatest specificity, and wherein each of the hypernym nodes is
associated with a term that is more specific than its parent node
and less specific than each of its child nodes, and wherein each of
the hyponym terms has an assigned value between 0 and 1; image
classification software which, when executed by the at least one
processor will classify the image, using contents of a visual
database having multiple images therein with each of the images
having at least one hyponym from the lexical database of terms
associated with it so as to assign to the image at least two
hyponym terms and a specific probability value representing a
probability that each of the at least two hyponym terms accurately
describes content of the image; wherein, when the programming is
executed by the at least one processor, the processor will, for the
image i) build a subset hierarchical tree, from the lexical
database of terms, containing the assigned at least two hyponym
terms, all hyponym terms sharing a common parent with the assigned
at least two hyponym terms, and all hypernyms within the hierarchy
between the at least two hyponyms and the root node, ii) calculate
a first level score for each hypernym in the subset hierarchical
tree, that is a first level hypernym because it is directly
connected to a leaf node, using a specified scoring function and
the assigned specific probability value for each hyponym; iii)
calculate a second level score for each hypernym that is a second
level hypernym, because it is directly connected to at least one
first level hypernym, using first level scores of its child nodes
in the specified scoring function, iv) calculate additional level
scores for each additional level of parent hypernyms that are above
the second level hypernyms using calculated scores for all
immediate child hypernyms of each parent hypernym in the specified
scoring function, and v) identify at least one hypernym for the
image, the at least one hypernym being a hypernym associated with a
node, other than the root node, that has a specific level score
that is calculated to have a value that is greater than one of: a
pre-specified threshold score, or all other calculated level scores
within the subset hierarchical tree.
2. The image processing system of claim 1, wherein the image is a
classified image that is one of multiple classified images and
wherein, when the programming is executed by the at least one
processor following classification but prior to "i)", the processor
will, for the multiple classified images combine their individual
leaf node probabilities, on an individual hypernym basis, as a
normalized log sum of the individual probabilities; and treat the
multiple classified images as the image for purposes of "i)"
through "v)".
3. The image processing system of claim 2, further comprising: a
user interface via which a user can input a query, wherein, upon
receipt of the query, the processor will apply a clustering
algorithm to a set of the multiple classified images, selected
based upon hyponym scores, to determine at least one similarity
among the set.
4. The image processing system of claim 3, wherein the clustering
algorithm is one of a Leacock-Chodorow similarity algorithm, a
Wu-Palmer similarity algorithm, a Resnik similarity algorithm, a
Jiang & Conrath similarity algorithm, a Lin similarity
algorithm, or a Nguyen and Al-Mubaid similarity algorithm.
5. The image processing system of claim 1, wherein the lexical
database of terms is one or more of: WordNet, eXtended WordNet,
FrameNet, BabelNet, Malayalam WordNet, Chinese Wordnet, WordNet
Libre du Francais, Just Another WordNet Subset ("JAWS"),
IndoWordNet, MultiWordNet, EuroWordNet, Global Wordnet, BalkaNet,
Russian WordNet, FinnWordNet, GermaNet, OpenWN-PT, plWordNet,
PolNet, or BulNet.
6. The image processing system of claim 5, wherein the visual
database is ImageNet.
7. The image processing system of claim 6, wherein the image
classification software is constructed according to an architecture
corresponding to one of: GoogLeNet or AlexNet.
8. The image processing system of claim 1, wherein the visual
database is ImageNet.
9. The image processing system of claim 8, wherein the image
classification software is constructed according to an architecture
corresponding to one of: GoogLeNet or AlexNet.
10. The image processing system of claim 1, wherein the camera, at
least one processor, and non-transitory storage are housed within
one of: an autonomous undersea vehicle, an autonomous land vehicle,
an autonomous flying vehicle, a game/trail camera unit, or a
smartphone.
11. The image processing system of claim 1, wherein the specified
scoring function is calculated, on a per node basis, as:
$$\mathrm{Score}(\mathrm{node}) = \frac{\text{Sum of probabilities of all immediate children}}{\log_{10}(\text{minimum distance to a leaf node} + 2) \times (\text{total number of immediate child nodes})}.$$
12. The image processing system of claim 1, wherein the specified
scoring function is calculated, on a per node basis, as:
$$\mathrm{Score}(\mathrm{node}) = \frac{\text{Sum of probabilities of all immediate children}}{\text{total number of immediate child nodes}}.$$
13. The image processing system of claim 1, wherein the specified
scoring function is calculated, on a per node basis, as:
$$\mathrm{Score}(\mathrm{node}) = g(\mathrm{node}) \times \log_{10}(\text{sum of } g(\mathrm{node}) \text{ for all images in visual database})$$
where g(node)=average probability of all directly connected child
nodes.
14. The image processing system of claim 1, wherein the specified
scoring function is calculated, on a per node basis, as:
$$\mathrm{Score}(\mathrm{node}) = \frac{g(\mathrm{node}) \times \log_{10}(\text{popularity of term associated with this node})}{\log_{10}(\text{minimum distance to a leaf node} + 2)}$$
where g(node)=average probability of all directly connected child
nodes.
Description
FIELD OF THE INVENTION
[0001] This disclosure relates generally to computing and, more
particularly, to computerized image processing.
BACKGROUND
[0002] The advent of digital photography has dramatically reduced
the cost of taking photographic images relative to using film. As a
result, taking hundreds or even thousands of pictures is extremely
easy and cost effective.
[0003] However, that ability has a down side in that it takes
longer for a human to view and isolate those pictures that may be
of interest for some reason.
[0004] Attempts have been made to sort/group images using output
from classification software such as GoogLeNet. However, such
sorting is typically not accurate enough for use in certain
research applications, such as
identification/discovery/classification of new land and marine
creatures, because the output probabilities are often extremely
low, so significant human effort in sorting/grouping the images is
still required.
[0005] So, there is still a need for more accurate image processing
than is currently available.
SUMMARY
[0006] We have devised a system and approach that represents an
improvement to the field of computerized image processing.
[0007] Advantageously, our solution is broadly suitable for use
with any application for which there is a hierarchical taxonomy
available, for example in biology--Life, Domain, Kingdom, Phylum,
Class, Order, Family, Genus, Species, Sub-species, and for which a
set of images is also available where each image in the set is
annotated by one or more of the labels at a most specific level of
the taxonomy (e.g., grey wolf, basset hound, Indian elephant,
etc.).
[0008] One aspect of this disclosure involves an image processing
system. The image processing system includes a camera, at least one
processor associated with the camera, and non-transitory storage,
associated with the camera and accessible to the processor. The
non-transitory storage has therein, at least an image obtained via
the camera, and programming executable by the at least one
processor.
[0009] The image processing system also includes a lexical database
of terms that is accessible to the processor. The lexical database
of terms is arranged in a hierarchy including a single root node,
multiple leaf nodes, and multiple hypernym nodes, located between
the root node and at least one leaf node and is further arranged
such that an individual hypernym node will have either: the root
node or another hypernym node as a parent node, and at least one
leaf node or hypernym node as a child node. In addition, the root
node is associated with a term in the hierarchy of greatest
generality. Also, each of the leaf nodes has an associated hyponym
term, the hyponym terms representing terms in the hierarchy of
greatest specificity, and each of the hypernym nodes is associated
with a term that is more specific than its parent node and less
specific than each of its child nodes. Finally, each of the hyponym
terms has an assigned value between 0 and 1.
[0010] The image processing system additionally includes image
classification software which, when executed by the at least one
processor will classify the image, using contents of a visual
database having multiple images therein with each of the images
having at least one hyponym from the lexical database of terms
associated with it so as to assign to the image at least two
hyponym terms and a specific probability value representing a
probability that each of the at least two hyponym terms accurately
describes content of the image.
[0011] When the programming is executed by the at least one
processor, the processor will, for the image: i) build a subset
hierarchical tree, from the lexical database of terms, containing
the assigned at least two hyponym terms, all hyponym terms sharing
a common parent with the assigned at least two hyponym terms, and
all hypernyms within the hierarchy between the at least two
hyponyms and the root node, ii) calculate a first level score for
each hypernym in the subset hierarchical tree, that is a first
level hypernym because it is directly connected to a leaf node,
using a specified scoring function and the assigned specific
probability value for each hyponym; iii) calculate a second level
score for each hypernym that is a second level hypernym, because it
is directly connected to at least one first level hypernym, using
first level scores of its child nodes in the specified scoring
function, iv) calculate additional level scores for each additional
level of parent hypernyms that are above the second level hypernyms
using calculated scores for all immediate child hypernyms of each
parent hypernym in the specified scoring function, and v) identify
at least one hypernym for the image, the at least one hypernym
being a hypernym associated with a node, other than the root node,
that has a specific level score that is calculated to have a value
that is greater than one of: a pre-specified threshold score, or
all other calculated level scores within the subset hierarchical
tree.
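By way of illustration only, the selection in step v) can be sketched as follows. This is a minimal sketch, not the disclosed implementation: the function name `identify_hypernyms`, the score values, and the hypernym names are all hypothetical, and returning every tied node when no threshold is given is an assumption.

```python
def identify_hypernyms(scores, root, threshold=None):
    """Pick hypernym(s) per step v): above a threshold, or the top score.

    scores    -- dict mapping hypernym term -> calculated level score
    root      -- term of the root node, which is never selected
    threshold -- optional pre-specified threshold score
    """
    # The root node is excluded from consideration.
    candidates = {n: s for n, s in scores.items() if n != root}
    if not candidates:
        return []
    if threshold is not None:
        # Threshold mode: every hypernym scoring above the threshold.
        return sorted(n for n, s in candidates.items() if s > threshold)
    # Maximum mode: the hypernym(s) with the highest score (ties kept).
    best = max(candidates.values())
    return sorted(n for n, s in candidates.items() if s == best)


# Hypothetical scores for a small subset hierarchical tree.
example_scores = {"Entity": 0.90, "Animal": 0.31, "Dog": 0.40, "Motor vehicle": 0.22}
top = identify_hypernyms(example_scores, root="Entity")
above = identify_hypernyms(example_scores, root="Entity", threshold=0.30)
```

Either mode alone satisfies the "greater than one of" condition; which one is used would be a configuration choice of the particular implementation.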
[0012] Another aspect of this disclosure involves an image
processing method that uses a processor to perform: A) classifying
an image, taken using a camera, using contents of a visual
database, stored in non-transitory storage, having multiple images
therein with each of the images having at least one hyponym
corresponding to a leaf node of a hierarchically-organized lexical
database of terms associated with it, by assigning to the image at
least two hyponym terms and a specific probability value for each
representing a probability that each of the at least two hyponym
terms accurately describes content of the image; B) building a
subset hierarchical tree from the hierarchically-organized lexical
database of terms, the subset hierarchical tree containing the
assigned at least two hyponym terms, all hyponym terms in the
hierarchically-organized lexical database of terms sharing a common
parent with the assigned at least two hyponym terms, and all
hypernyms within the hierarchically-organized lexical database of
terms connected between the at least two hyponyms and the root
node; C) calculating a first level score, using a specified scoring
function and the assigned specific probability value for each
hyponym, for each hypernym in the subset hierarchical tree that is
a first level hypernym because it is directly connected to a leaf
node; D) calculating a second level score for each hypernym that is
a second level hypernym, because it is directly connected to at
least one first level hypernym, using first level scores of its
child nodes in the specified scoring function; E) calculating
additional level scores for each additional level of parent
hypernyms that are above the second level hypernyms using
calculated scores for all immediate child hypernyms of each parent
hypernym in the specified scoring function; F) identifying at least
one hypernym for the image, the at least one hypernym being a
hypernym associated with a node, other than the root node, that has
a specific level score that is calculated to have a value that is
greater than one of: a pre-specified threshold score, or all other
calculated level scores within the subset hierarchical tree; and G)
linking the at least one identified hypernym to the image so that,
in response to a user query that identifies the at least one
identified hypernym, the image will be retrieved for the user in
response to the query.
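One simple way to picture step G) is an inverted index from hypernym terms to image identifiers. The sketch below is an assumption made for illustration; the class name `HypernymIndex`, the case-insensitive matching, and the image identifiers are hypothetical and do not come from this disclosure.

```python
from collections import defaultdict


class HypernymIndex:
    """Hypothetical inverted index: hypernym term -> set of linked images."""

    def __init__(self):
        self._images_by_term = defaultdict(set)

    def link(self, image_id, hypernyms):
        """Link each identified hypernym to the given image."""
        for term in hypernyms:
            self._images_by_term[term.lower()].add(image_id)

    def query(self, term):
        """Return the images linked to the queried hypernym, sorted by id."""
        return sorted(self._images_by_term.get(term.lower(), set()))


index = HypernymIndex()
index.link("img_0001.jpg", ["Dog"])
index.link("img_0002.jpg", ["Dog", "Motor vehicle"])
dog_images = index.query("dog")
```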
[0013] The foregoing and following outline rather generally the
features and technical advantages of one or more embodiments of
this disclosure in order that the following detailed description
may be better understood. Additional features and advantages of
this disclosure will be described hereinafter, which may form the
subject of the claims of this application.
BRIEF DESCRIPTION OF THE DRAWINGS
[0014] This disclosure is further described in the detailed
description that follows, with reference to the drawings, in
which:
[0015] FIG. 1 illustrates, in simplified form, one example variant
implementation of our image processing system solution along with
three suitable implementation environments;
[0016] FIG. 2 illustrates, in simplified form, another example
variant implementation of our image processing system solution for
the three implementation environments of FIG. 1;
[0017] FIG. 3 illustrates, in simplified form, an alternative
example variant of our image processing system similar to that of
FIG. 2;
[0018] FIG. 4 illustrates, in simplified form, an alternative
example variant of our image processing system similar to that of
FIGS. 1-3;
[0019] FIG. 5 illustrates in simplified form, one example of a
lexical database;
[0020] FIG. 6 illustrates, in simplified form, a representative
portion of a visual database suitable for use in an image
processing system as described herein in conjunction with the
lexical database of FIG. 5;
[0021] FIG. 7 illustrates, in simplified form, a representative
example of the output of an image that has been classified using
the image classification software;
[0022] FIG. 8 illustrates, in simplified overview form, the
operation of an image processing system to process the classified
image of FIG. 7;
[0023] FIG. 9 illustrates, in simplified form, a more concrete
example of our solution using WordNet as the lexical database and
GoogLeNet as the image classification software;
[0024] FIG. 10 is a flowchart illustrating, in simplified form, the
use of our image processing solution with a collection of images to
improve the processing of a user query; and
[0025] FIG. 11 illustrates, in simplified form, details of how
multiple classified images are combined in conjunction with the
process of FIG. 10.
DETAILED DESCRIPTION
[0026] This disclosure provides a technical solution to address the
aforementioned problems inherent with current computer image
processing for purposes of sorting and/or grouping of images. Our
technical solution improves the functioning of computer systems
that are constructed to perform those types of operations by
improving the accuracy and thereby making them more suitable for
use in identification/discovery/classification of new land and
marine creatures by addressing a problem that only arises in the
image processing art. Still further, the systems and methods
described herein are particularly well suited for use with
autonomous exploration vehicles, for example, autonomous underwater
vehicles (e.g., mini submarines), small autonomous land vehicles,
or autonomous aerial vehicles (e.g., drones, UAVs) that can be sent
out to explore a region and take many thousands of pictures (e.g.,
still images or significant frames from video) for purposes of, for
example, new creature discovery and identification. Finally,
systems and methods embodying the teachings described herein make
unconventional use of existing technology in order to make such
activities more viable than is possible with the existing
technology itself.
[0027] FIG. 1 illustrates, in simplified form, one example variant
implementation of our image processing system 100 solution along
with three suitable implementation environments.
[0028] As shown in FIG. 1, the image processing system is made up
of a camera 102, at least one processor 104, associated with the
camera 102, and storage 106 configured to store at least images 108
and programming 110 that is executable by the processor(s) 104 to
effect the operation(s) described herein.
[0029] At this point it should be noted that, unless otherwise
expressly stated herein, the term "storage" as used herein is
intended to mean any storage medium that stores data,
data-containing structures, and program instructions in a
non-transitory manner, such as, for example, non-transient solid
state memory, a magnetic hard drive, a CD or DVD, a tape drive, or
an analogous or equivalent storage medium type.
[0030] Returning to FIG. 1, the image processing system further
includes a lexical database 112 and image classification software
116 based upon, or that makes use of, a visual database 114 as
described herein.
[0031] The lexical database 112 is a hierarchically arranged
taxonomy of terms, which at its most generic is a single root node,
for example, "entity" or "life" and then a series of nodes, having
associated terms, arranged in a hierarchical manner to represent
immediate subgroups of the respective node's parent node. In
addition, the hierarchy should ideally have a single root node, but
need not be balanced (i.e., the number of nodes between the root
node and each leaf need not be the same within the hierarchy). If
the hierarchy does not have a single root node, it must be able to
be broken up by treating each root node as the only root node.
[0032] For example, using the biological taxonomy, "Life" is the
root node, and it has multiple child nodes, the "Domain" nodes,
each of which, in turn has multiple "Kingdom" nodes, and the same
is true for "Phylum" nodes, "Class" nodes, "Order" nodes, "Family"
nodes, "Genus" nodes, "Species" nodes, each having one or more
"Subspecies" nodes. The lowest level nodes (e.g., subspecies) are
referred to herein as "leaf" nodes. Each of these nodes has a
lexical term (i.e., word or phrase) associated with it, with the
lexical terms for the leaf nodes being referred to as "hyponyms"
herein and the terms associated with nodes between the leaf nodes
and the root node are referred to as "hypernyms" herein. Examples
of some lexical databases suitable for use with embodiments of the
image processing system 100 described herein include, but are not
limited to:
[0033] WordNet (https://wordnet.princeton.edu/),
[0034] BabelNet (http://babelnet.org/),
[0035] eXtended WordNet (http://www.hlt.utdallas.edu/~xwn/),
[0036] FrameNet (https://framenet.icsi.berkeley.edu/fndrupal/),
[0037] FrameNet foreign counterparts:
[0038]   http://sccfn.sxu.edu.cn/portal-en/home.aspx
[0039]   http://www.ufjf.br/framenetbr/
[0040]   https://spraakbanken.gu.se/eng/swefn/
[0041]   http://jfn.st.hc.keio.ac.jp/
[0042]   http://www.ramki.uw.edu.pl/en/index.html
[0043]   http://framenet.dk/
[0044]   http://www.laits.utexas.edu/gframenet/
[0045] Malayalam WordNet (http://malayalamwordnet.cusat.ac.in/),
[0046] Chinese Wordnet (http://lope.linguistics.ntu.edu.tw/cwn/),
[0047] WordNet Libre du Francais (http://alpage.inria.fr/~sagot/wolf.html),
[0048] IndoWordNet (http://www.cfilt.iitb.ac.in/indowordnet/),
[0049] MultiWordNet (http://multiwordnet.fbk.eu/english/home.php),
[0050] EuroWordNet (http://www.illc.uva.nl/EuroWordNet/),
[0051] Global Wordnet (http://globalwordnet.org/),
[0052] BalkaNet (http://www.dblab.upatras.gr/balkanet/),
[0053] RussNet (http://project.phil.spbu.ru/RussNet/),
[0054] FinnWordNet (http://www.ling.helsinki.fi/en/lt/research/finnwordnet/),
[0055] GermaNet (http://www.sfs.uni-tuebingen.de/lsd/index.shtml),
[0056] OpenWN-PT (https://github.com/own-pt/openWordnet-PT),
[0057] plWordNet (http://plwordnet.pwr.wroc.pl/wordnet/),
[0058] PolNet (http://ltc.amu.edu.pl/polnet/), and
[0059] BulNet (http://dcl.bas.bg/bulnet/), to name a few.
[0060] The visual database 114 is a database of images, with each
of the images in it associated with at least one leaf node's
hyponym from the schema of the particular lexical database 112. One
representative example of such a visual database 114 is ImageNet
(available from http://image-net.org/) which is a visual database
organized according to the WordNet lexical database hierarchy such
that all images in ImageNet are labeled/tagged (i.e., associated)
with at least one leaf term from WordNet.
[0061] The image classification software 116 is a deep neural
network program constructed according to an architecture such as,
for example, GoogLeNet and its Inception module (described in C.
Szegedy, et al., "Going deeper with convolutions" arXiv:1409.4842
(2014), and C. Szegedy, et al., "Rethinking the Inception
Architecture for Computer Vision" arXiv:1512.00567 (2015), program
available from http://vision.princeton.edu/pvt/GoogLeNet/), or
using an architecture such as AlexNet (described in A. Krizhevsky,
et al., "Imagenet classification with deep convolutional neural
networks" Advances in neural information processing systems, pp.
1097-1105 (2012)), to classify images. In addition, the visual
database 114 is used by the image classification software 116, in
some cases, both as a training base and to classify a new image.
[0062] With respect to the image processing system 100 described
herein, the image classification software 116 is suitable for use
if it will assign to each image it classifies:
[0063] (a) at least one hyponym from the leaf nodes of the lexical
database 112, and
[0064] (b) a probability value between 0 and 1 to each assigned
hyponym.
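Conditions (a) and (b) can be expressed as a simple check on a classifier's output. The following is a minimal sketch under stated assumptions: the function name `is_suitable_output`, the dictionary representation of the output, the sample leaf terms, and treating "between 0 and 1" as inclusive of both endpoints are all hypothetical choices made for illustration.

```python
def is_suitable_output(assignments, leaf_terms):
    """Check a classifier output against conditions (a) and (b).

    assignments -- dict mapping assigned hyponym term -> probability value
    leaf_terms  -- set of hyponym terms at the leaf nodes of the lexical database
    """
    # (a) at least one hyponym must be assigned ...
    if not assignments:
        return False
    for term, probability in assignments.items():
        # ... and every assigned term must come from the leaf nodes.
        if term not in leaf_terms:
            return False
        # (b) each assigned hyponym carries a probability in [0, 1] (assumed inclusive).
        if not 0.0 <= probability <= 1.0:
            return False
    return True


# Hypothetical leaf terms and classifier outputs.
leaves = {"German Shepherd", "Poodle", "Motorcycle", "Car"}
ok = is_suitable_output({"German Shepherd": 0.20, "Poodle": 0.18}, leaves)
not_leaf = is_suitable_output({"Dog": 0.50}, leaves)
```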
[0065] Advantageously, the image processing system 100 of FIG. 1
can be entirely housed, for purposes of exploration-related
implementations, in, for example, an autonomous undersea vehicle
118 such as a mini-submarine, an autonomous land vehicle 120, or an
autonomous aerial vehicle 122 such as a drone, UAV or mini-UAV.
[0066] FIG. 2 illustrates, in simplified form, another example
variant implementation of our image processing system 100 solution
for the three implementation environments of FIG. 1. As shown in
FIG. 2, the image processing system 100 is made up of the same
components as that of FIG. 1. However, with the implementation
of FIG. 2, the camera 102, processor(s) 104 and storage 106 are
part of an image acquisition unit 202 that is housed within the
autonomous undersea vehicle 118, autonomous land vehicle 120 or
autonomous aerial vehicle 122, whereas the lexical database 112 and
image classification software 116 (based upon the visual database
114) are part of a processing unit 204 that is located
geographically remotely from the image acquisition unit 202, with
the image acquisition unit 202 and processing unit 204 being
communicatively connectable to each other 206, for example, via a
wired or wireless connection, either remotely during image
acquisition or when co-located somewhere post-image acquisition. As
such, it should be understood that the processing unit 204 may have
its own processor(s) 104 and/or the image acquisition unit 202 may
not have processor(s) 104 that operate to perform any function
described herein if the processing unit 204 does.
[0067] FIG. 3 illustrates, in simplified form, an alternative
example variant 300 of our image processing system 100 similar to
that of FIG. 2 in that it is made up of an image acquisition unit
202 and a processing unit 204. However, with this variant
implementation, the image acquisition unit 202 is part of a game
camera or trail camera (referred to herein as a "game/trail camera
unit") 302 that can be moved about and/or situated in a particular
location to acquire images, for example, continuously, or in
response to some trigger, such as movement within the field of view
of the camera 102. With this variant 300, the image acquisition
unit 202 will typically be wirelessly connectable/connected to a
remotely located processing unit 204 via a network 304 so that the
processing unit 204 can be shared by multiple, geographically
dispersed devices containing their own image acquisition units
202.
[0068] FIG. 4 illustrates, in simplified form, an alternative
example variant 400 of our image processing system 100 similar to
that of FIGS. 1-3 except that the entire image processing system
100 is housed in a handheld mobile device 400, such as a smart
phone, tablet or laptop computer.
[0069] FIG. 5 illustrates in simplified form, one example of a
lexical database 112 suitable for use as described herein (only a
portion of which is shown for simplicity). As shown, the lexical
database 112 is made up of a root node 502 at its "highest" part
(meaning that it has no parent nodes) that has a term associated
with it, in this case "Entity," which is the most general term that
can be used to describe anything that is classifiable within this
lexical database 112. A series of nodes 504_1, 504_2,
504_3, . . . , 504_n, 504_{n+1}, 504_{n+2}, 504_{n+3}
are at the "lowest" part of the lexical database 112 and are each
referred to herein as a "leaf" node (because they have no child
nodes). Each of the leaf nodes 504 also has an individual term
associated with it, as respectively shown in the illustrated
portion: "Lion," "Tiger," "Puma," "Domestic dog," "Wolf," "Fox" and
"Coyote."
[0070] In between the root node 502 and the leaf nodes 504_1,
504_2, 504_3, . . . , 504_n, 504_{n+1}, 504_{n+2},
504_{n+3} are multiple hypernym nodes 506 that each have a parent
node and at least one child node. Each hypernym node 506 also
has a term associated with it that is more specific than that of its
parent node and more general than that of any of its child nodes.
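The root, hypernym, and leaf roles described for FIG. 5 can be sketched with a child-to-parent map. This is an illustrative sketch only: "Entity" and the seven leaf terms come from the description above, whereas the intermediate hypernym terms ("Animal", "Feline", "Canine") and the helper names are hypothetical, since the hypernym terms of FIG. 5 are not given in this text.

```python
# Child -> parent map. Only "Entity" and the leaf terms are taken from the
# description; "Animal", "Feline", and "Canine" are hypothetical hypernyms.
PARENT = {
    "Animal": "Entity",
    "Feline": "Animal",
    "Canine": "Animal",
    "Lion": "Feline",
    "Tiger": "Feline",
    "Puma": "Feline",
    "Domestic dog": "Canine",
    "Wolf": "Canine",
    "Fox": "Canine",
    "Coyote": "Canine",
}


def children_of(parent_map):
    """Invert the child -> parent map into parent -> list of children."""
    kids = {}
    for child, parent in parent_map.items():
        kids.setdefault(parent, []).append(child)
    return kids


def classify_nodes(parent_map):
    """Split all nodes into root, hypernym, and leaf sets."""
    kids = children_of(parent_map)
    nodes = set(parent_map) | set(parent_map.values())
    roots = {n for n in nodes if n not in parent_map}   # no parent node
    leaves = {n for n in nodes if n not in kids}        # no child nodes
    hypernyms = nodes - roots - leaves                  # everything in between
    return roots, hypernyms, leaves


roots, hypernyms, leaves = classify_nodes(PARENT)
```

A hierarchy with more than one root would show up here as `len(roots) > 1`, in which case each root could be treated separately as the only root node.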
[0071] FIG. 6 illustrates, in simplified form, a representative
portion of a visual database 114 suitable for use in conjunction
with the lexical database 112 of FIG. 5 in an image processing
system 100 as described herein. As shown in FIG. 6, the visual
database 114 is made up of multiple images 602 that are each tagged
or otherwise associated with at least one term, e.g., "Lion"
604_1, "Tiger" 604_2, "Puma" 604_3, . . . , "Domestic
dog" 604_n, "Wolf" 604_{n+1}, "Fox" 604_{n+2}, "Coyote"
604_{n+3} from the leaf nodes 504_1, 504_2, 504_3, . . . ,
504_n, 504_{n+1}, 504_{n+2}, 504_{n+3} of the
lexical database 112 of FIG. 5.
[0072] FIG. 7 illustrates, in simplified form, a representative
example of the output of an image 700 that has been classified
using the image classification software 116.
[0073] As shown in FIG. 7, the image 700 has associated with it
four hyponyms 702a, 704a, 706a, 708a from a lexical database 112 as
described herein and their respective probabilities 702b, 704b,
706b, 708b. In this example case, the image 700 has been classified
as displaying a: "German Shepherd" with a probability of 0.20,
"Poodle" with a probability of 0.18, "Motorcycle" with a
probability of 0.10, and "Car" with a probability of 0.11.
[0074] Having described various example variant system structures
for our processing system 100, the operation of implementations of
our solution will now be discussed.
[0075] FIG. 8 illustrates, in simplified overview form, the
operation of an image processing system 100 to process the
classified image 700 of FIG. 7.
[0076] As shown, under program control, the processor(s) 104
access the classified image 700, along with its associated
hyponyms 702a, 704a, 706a, 708a and their probabilities 702b, 704b,
706b, 708b in the storage 106.
[0077] Next, the processor(s) 104 use the lexical database 112 to
build a subset hierarchical tree 802 from it by taking at least the
hyponyms 702a, 704a, 706a, 708a and their respective parent,
grandparent, great grandparent, etc. nodes 506 from the lexical
database 112, and doing so up the hierarchy until either the root
node 502 is reached or they all come together at some single
hypernym node 506 below the root node 502.
[0078] Note here that, optionally, in building the subset
hierarchical tree 802, if there are other leaf nodes 504 that share
a common parent hypernym node 506 with one of the hyponyms assigned
to a classified image, they can be included in the tree built,
irrespective of whether they have been assigned a value because,
for some implementations, this can yield more accurate results
during scoring as described below.
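By way of illustration only, the tree-building step described above might be sketched as follows. The PARENT mapping is a hypothetical toy stand-in for the lexical database 112, the function name build_subset_tree and its include_siblings flag are assumptions made for this sketch, and, for simplicity, the sketch always walks up to the root rather than stopping at a lower common hypernym.

```python
# Hypothetical toy lexical database: term -> parent hypernym (root -> None).
PARENT = {
    "Entity": None,
    "Dog": "Entity",
    "Motor vehicle": "Entity",
    "German Shepherd": "Dog",
    "Poodle": "Dog",
    "Beagle": "Dog",
    "Motorcycle": "Motor vehicle",
    "Car": "Motor vehicle",
    "Truck": "Motor vehicle",
}


def build_subset_tree(hyponyms, parent=PARENT, include_siblings=False):
    """Return {hypernym: sorted children} for the ancestors of the hyponyms."""
    children = {}
    for leaf in hyponyms:
        node = leaf
        # Walk parent, grandparent, ... until the root is reached.
        while parent[node] is not None:
            children.setdefault(parent[node], set()).add(node)
            node = parent[node]
    if include_siblings:
        # Optional variant: also add unassigned leaves that share a
        # parent with one of the assigned hyponyms.
        has_children = set(parent.values())
        leaf_parents = {parent[h] for h in hyponyms}
        for term, par in parent.items():
            if par in leaf_parents and term not in has_children:
                children[par].add(term)
    return {hypernym: sorted(kids) for hypernym, kids in children.items()}


assigned = ["German Shepherd", "Poodle", "Motorcycle", "Car"]
print(build_subset_tree(assigned))
print(build_subset_tree(assigned, include_siblings=True))
```

With include_siblings=True, the unassigned leaves "Beagle" and "Truck" join the tree; a scoring function could then treat them as having a value of zero.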
[0079] Once the subset hierarchical tree 802 has been built, the
processor(s) 104 then use the probabilities 702b, 704b, 706b, 708b to
score the hypernym nodes 506 that are the immediate parents of the
hyponym leaf nodes 504 using a scoring function ("Score(node)").
[0080] Depending upon the particular implementation and lexical
database, different scoring functions can be used. Equations 1-4
below are a few representative examples of scoring functions that
can be used to score the terms for the hypernym nodes 506:
$$\mathrm{Score}(\mathrm{node}) = \frac{\text{Sum of probabilities of all immediate children}}{\log_{10}(\text{minimum distance to a leaf node} + 2) \times (\text{total number of immediate child nodes})}, \qquad \text{(Eq. 1)}$$

$$\mathrm{Score}(\mathrm{node}) = \frac{\text{Sum of probabilities of all immediate children}}{\text{Total number of immediate child nodes}}, \qquad \text{(Eq. 2)}$$

$$\mathrm{Score}(\mathrm{node}) = g(\mathrm{node}) \times \log_{10}(\text{sum of } g(\mathrm{node}) \text{ for all images in visual database}), \text{ and} \qquad \text{(Eq. 3)}$$

$$\mathrm{Score}(\mathrm{node}) = \frac{g(\mathrm{node}) \times \log_{10}(\text{popularity of term associated with this node})}{\log_{10}(\text{minimum distance to a leaf node} + 2)} \qquad \text{(Eq. 4)}$$
where, in Equation 3 and Equation 4, g(node)=(the average
probability of all directly-connected child nodes).
[0081] For purposes of the example of FIG. 8, the scoring function
of Equation 1 will be used. However, it should be understood that
any of the other scoring functions could have been used instead and
that different scoring functions may yield better results when used
with some specific image classification software 116 and/or lexical
database 112.
[0082] Referring back to FIG. 8, the score for the hypernym node
506 with the term "Dog" that is the parent to the hyponym leaf
nodes for the terms "German Shepherd" and "Poodle" assigned by the
image classification software 116 is calculated as:
$$\mathrm{Score}(\mathrm{Dog}) = \frac{0.20 + 0.18}{\log_{10}(1 + 2) \times 2} = \frac{0.38}{\log_{10}(3) \times 2} = 0.398$$
so the probability value of 0.398 is assigned to the hypernym node
506 for the term "Dog."
[0083] Likewise, the score for the hypernym node 506 for the term
"Motor vehicle" is calculated from the probabilities of the hyponym
leaf node terms "Motorcycle" and "Car" assigned by the image
classification software 116 as:
$$\mathrm{Score}(\text{Motor vehicle}) = \frac{0.10 + 0.11}{\log_{10}(1 + 2) \times 2} = \frac{0.21}{\log_{10}(3) \times 2} = 0.220$$
and the probability value of 0.22 is assigned to the hypernym node
506 for the term "Motor vehicle."
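The two calculations above can be checked with a short sketch such as the following. The function names score_eq1 and score_eq2 are assumptions made for illustration; only the arithmetic of Equation 1 and Equation 2 is taken from the description.

```python
import math


def score_eq1(child_probs, min_leaf_distance):
    """Equation 1: sum of child values / (log10(distance + 2) * child count)."""
    denominator = math.log10(min_leaf_distance + 2) * len(child_probs)
    return sum(child_probs) / denominator


def score_eq2(child_probs):
    """Equation 2: sum of child values / child count (a plain average)."""
    return sum(child_probs) / len(child_probs)


# Both hypernyms sit one level above their leaf nodes (distance = 1).
dog = score_eq1([0.20, 0.18], min_leaf_distance=1)
motor_vehicle = score_eq1([0.10, 0.11], min_leaf_distance=1)
print(f"Dog: {dog:.3f}")                      # Dog: 0.398
print(f"Motor vehicle: {motor_vehicle:.3f}")  # Motor vehicle: 0.220
```

Because Equation 1 divides by a term that grows with the distance to the nearest leaf, a node further up the hierarchy needs proportionally more support from its children to reach the same score.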
[0084] This process is then repeated for each successively higher
hypernym node 506, using the scores of its respective child hypernym
nodes 506 as inputs, until the score for the highest single common
hypernym node 506 has been calculated or the root node 502 has been
reached.
[0085] For purposes of this example, presume that the only node
above the "Dog" and "Motor vehicle" termed hypernym nodes 506 is
the single hypernym node 506 for the term "Entity." Thus, the
probability value for that hypernym node 506 can be calculated in a
similar fashion as:
$$\mathrm{Score}(\mathrm{Entity}) = \frac{0.398 + 0.22}{\log_{10}(2 + 2) \times 2} = \frac{0.618}{\log_{10}(4) \times 2} = 0.513$$
so the probability value of 0.513 is assigned to the hypernym node
506 for the term "Entity."
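The level-by-level repetition described above can be sketched as a single bottom-up (post-order) walk of the subset tree. The names CHILDREN, LEAF_PROBS, min_leaf_distance and score_all are assumptions made for this sketch, which uses Equation 1 and passes each child score upward without rounding, so the printed values may differ slightly in the last digit from a hand calculation that rounds at each level.

```python
import math

# Subset tree for the FIG. 8 example: hypernym -> immediate children.
CHILDREN = {
    "Entity": ["Dog", "Motor vehicle"],
    "Dog": ["German Shepherd", "Poodle"],
    "Motor vehicle": ["Motorcycle", "Car"],
}
LEAF_PROBS = {"German Shepherd": 0.20, "Poodle": 0.18,
              "Motorcycle": 0.10, "Car": 0.11}


def min_leaf_distance(node):
    """Number of edges from node down to its nearest leaf (0 for a leaf)."""
    kids = CHILDREN.get(node, [])
    if not kids:
        return 0
    return 1 + min(min_leaf_distance(kid) for kid in kids)


def score_all(node, scores):
    """Score the children first, then apply Equation 1 to the node itself."""
    kids = CHILDREN.get(node)
    if not kids:
        # Leaf: use the classifier's probability (0 if none was assigned).
        return LEAF_PROBS.get(node, 0.0)
    child_values = [score_all(kid, scores) for kid in kids]
    denominator = math.log10(min_leaf_distance(node) + 2) * len(kids)
    scores[node] = sum(child_values) / denominator
    return scores[node]


scores = {}
score_all("Entity", scores)
for term, value in scores.items():
    print(term, round(value, 3))
```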
[0086] Once the scoring of all hypernym nodes 506 in the subset
hierarchical tree 802 is complete, the calculated values can be
used to identify the one or more hypernyms 506 that most accurately
represent the content of the classified image and that are neither
the root node 502 nor the highest common ultimate parent node 506.
Thus, for the example of FIG. 8, the hypernym node 506 "Dog"
represents the least generic hypernym 506 that accurately describes
the content of the image 700.
[0087] Now, empirically, we have determined that, using our scoring
approach, a "Goldilocks" probability range (i.e., one that is not
too generic and not too specific) will typically involve calculated
score function probability values in the range of 0.30 to 0.40 when
GoogLeNet is the image classification software 116 and WordNet is
the lexical database 112. However, it is to be understood that
other specific combinations of image classification software 116
and lexical database 112 may yield a different "Goldilocks" range
that should be readily determinable using known images that are the
same as, or close to, one of the images in the visual database 114.
Alternatively, depending upon the particular scoring function used,
there may not be a need to specify any particular range because
selecting the (non-root) node(s) with the highest score(s) will
achieve the same result, irrespective of the actual values for the
calculated score(s).
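The two selection strategies just described (a "Goldilocks" range, or simply the highest-scoring non-root node) might be sketched as follows. The function name pick_hypernyms and its parameters are assumptions made for illustration, and the root score shown is only approximate because the root is excluded in either case.

```python
def pick_hypernyms(scores, root, score_range=None):
    """Return non-root hypernyms in score_range, or the top scorer(s) if no range."""
    candidates = {term: s for term, s in scores.items() if term != root}
    if score_range is not None:
        low, high = score_range
        return sorted(t for t, s in candidates.items() if low <= s <= high)
    best = max(candidates.values())
    return sorted(t for t, s in candidates.items() if s == best)


# Scores from the FIG. 8 example; the root value is approximate and unused.
scores = {"Dog": 0.398, "Motor vehicle": 0.220, "Entity": 0.51}
print(pick_hypernyms(scores, root="Entity", score_range=(0.30, 0.40)))  # ['Dog']
print(pick_hypernyms(scores, root="Entity"))                            # ['Dog']
```

For this example both strategies select "Dog," which is the situation in which no explicit range needs to be specified.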
[0088] Finally, it is to be understood that, in practice, and
depending upon the particular image classification software 116,
the subset hierarchical tree 802 may actually be the overall
lexical database 112, because some leaf node values may actually be
zero or so small as to be effectively zero. Thus, it is to be
understood that, as in set theory, where a subset can equal a set,
in some cases, the "subset" hierarchical tree 802 may be equal to
the entire lexical database 112.
[0089] FIG. 9 illustrates, in simplified form, a more concrete
example of our solution using WordNet as the lexical database 112
and GoogLeNet as the image classification software 116.
[0090] As shown in FIG. 9, a camera 102 has been used to take a
photograph 900 and that photograph has been classified 902 using
GoogLeNet to yield a set of thirteen hyponyms 904 that each
correspond to a leaf node in the WordNet lexical database 112 and
associated probabilities 906 that the individual classification is
an accurate description of what appears in the photograph 900.
[0091] From those leaf node hyponyms 504, a subset hierarchical
tree 802 was created from the WordNet lexical database 112.
Following our approach, a probability value for each hypernym node
506 between those leaf nodes 504 and the root node 502 for the term
"Entity" is then calculated using a scoring function such as
described above in Equations 1 through 4. Based upon the scoring,
the top five hypernyms are identified 908. In order of specificity
from most to least, these are the terms "Mountain," "Natural
elevation," "Geological formation," "Object," and "Entity." Thus, as
should be appreciated,
the hypernym "Mountain" is a more accurate description of the image
than any of the assigned hyponym terms.
[0092] In addition, it should now be appreciated that this
improvement to image classification technology can be of
significant value in fields relating to
identification/discovery/classification of new land and marine
creatures. For example, presume that an undersea exploration
vehicle takes a photograph of some creature during one of its
dives. Using present image classification software, that photograph
might be classified with the following hyponyms: "squid,"
"jellyfish," "octopus," "anemone," "tiger fish" and "seaweed" all
with fairly low probabilities (i.e., less than 20%). Thus, it would
be difficult, based upon the output of the image classification
software to know what is actually displayed, because it may be a
new creature that has characteristics of several of those
creatures. However, using our approach, the highest scoring
hypernym might be "cephalopod." In other words, with our system and
approach, the photograph will effectively have been labeled as
"this is a photograph that is highly likely to be a cephalopod
although we cannot specifically identify it more particularly, so
it may be new."
[0093] As a result, researchers can better zero in on the photos of
creatures of interest. A researcher looking for new types of
cephalopods would want to examine that photograph (and potentially
pictures taken around it), whereas a researcher looking for new
types of jellyfish or seaweed would not need to do so, even though
the image classification software may have tagged that photograph
with those terms (which, absent our image processing solution, would
likely have required that researcher to look at it).
[0094] Still further, our solution can be extended for use with
multiple images collectively so that the collection can be more
accurately queried by a user due to the addition, through our
solution, of probability values for hypernyms in the associated
lexical database. This extended approach will now be described with
reference to FIGS. 10-11.
[0095] In this regard, FIG. 10 is a flowchart 1000 illustrating, in
simplified form, the use of our image processing solution with a
collection of classified images to improve the processing of a user
query of that collection.
[0096] As shown, the process begins with receipt of an image query
from a user (Step 1002), for example, in words: "provide all images
of otters" or, using an image, a query that is, in effect, "return
all images most like this image."
[0097] In the former case, the term "otters" would have to be a
hyponym in the lexical database 112, and in the latter case, the
"query" would be based upon the hyponyms assigned by the image
classification software 116 to the image that is used as the query.
Alternatively, the query could be "all images from user X" or "all
images from today's expedition" which could return various sets of
images sub-grouped according to their highest scoring non-root node
hypernyms.
[0098] In either case, at least the images in the collection that
satisfy one of those two cases (e.g., their classifications include
the hyponym "otter" or share at least one hyponym with the image
that is used as the query) are grouped and the probabilities for
those images are combined (Step 1004).
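The grouping portion of this step might be sketched as follows, assuming (for illustration only) that the collection is held as a mapping from an image identifier to that image's hyponym probabilities. The function name select_images, the image identifiers and all probability values are hypothetical.

```python
def select_images(collection, query_term=None, query_image=None):
    """Return ids of images matching a term, or sharing a hyponym with a query image."""
    if query_term is not None:
        wanted = {query_term}
    else:
        # An image query is reduced to the hyponyms assigned to that image.
        wanted = set(query_image)
    return [image_id for image_id, probs in collection.items()
            if wanted & set(probs)]


# Hypothetical collection: image id -> {hyponym: probability}.
collection = {
    "img1": {"otter": 0.40, "beaver": 0.20},
    "img2": {"seal": 0.50},
    "img3": {"otter": 0.10, "seal": 0.30},
}
print(select_images(collection, query_term="otter"))
# ['img1', 'img3']
print(select_images(collection, query_image={"seal": 0.60, "walrus": 0.10}))
# ['img2', 'img3']
```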
[0099] FIG. 11 illustrates, in simplified form, details of how
multiple classified images are combined in conjunction with the
process of FIG. 10.
[0100] In FIG. 11, each image (I.sub.1, I.sub.2, . . . ,
I.sub.n) in the collection 1100 of classified images has associated
hyponym terms 604 and associated probabilities. To combine the
images, the unique hyponym terms are collected from all of the
relevant images in the collection and their probabilities are
combined, on an individual hyponym basis, by taking the log sum of
their probabilities for all images classified to that hyponym.
[0101] By way of example, in FIG. 11, Image I.sub.1 has been
classified to hyponym terms T1, T2 and T5, Image I.sub.2 has been
classified to hyponym terms T1, T4, T7 and T9, and Image I.sub.n
has been classified to hyponym terms T2 and T9. So the probability
for hyponym term T1 for the collection 1100 would be the log sum of
the probabilities for Image I.sub.1 and Image I.sub.2 and all other
images classified to hyponym term T1. Likewise, the probability for
hyponym term T2 for the collection 1100 would be the log sum of the
probabilities for Image I.sub.1 and Image I.sub.n and all other
images classified to hyponym term T2, the probability for hyponym
term T4 for the collection 1100 would be the log sum of the
probabilities for Image I.sub.2 and all other images classified to
hyponym term T4, and so on.
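A sketch of this combination step follows. The description does not spell out the exact form of the "log sum," so the sketch assumes, as one plausible reading, the sum of the natural logarithms of the per-image probabilities, and offers the logarithm of the summed probabilities as an alternative mode. The function name combine_probabilities, the mode names and all probability values are hypothetical.

```python
import math
from collections import defaultdict


def combine_probabilities(images, mode="sum_of_logs"):
    """Combine per-image hyponym probabilities into one value per unique term."""
    per_term = defaultdict(list)
    for probs in images.values():
        for term, p in probs.items():
            per_term[term].append(p)  # p must be > 0 for the logarithm
    if mode == "sum_of_logs":
        return {t: sum(math.log(p) for p in ps) for t, ps in per_term.items()}
    if mode == "log_of_sum":
        return {t: math.log(sum(ps)) for t, ps in per_term.items()}
    raise ValueError(f"unknown mode: {mode}")


# Term assignments follow FIG. 11; the probability values are made up.
images = {
    "I1": {"T1": 0.5, "T2": 0.3, "T5": 0.2},
    "I2": {"T1": 0.4, "T4": 0.3, "T7": 0.2, "T9": 0.1},
    "In": {"T2": 0.6, "T9": 0.4},
}
combined = combine_probabilities(images)
for term in sorted(combined):
    print(term, round(combined[term], 3))
```

Either reading yields one value per unique hyponym term, which is what the subsequent tree-building and scoring steps require; which reading performs better would need to be determined for a particular implementation.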
[0102] Once that is done, the result is effectively the creation of a
classification of a single "image" 1102 (referred to in FIG. 11 as
"I.sub.Combined") that has been classified 1104 with all unique
hyponym terms of the collection 1100, and, through the probability
combination above, now has a single calculated associated
probability for each hyponym term based upon the probabilities of
the images in the collection that were classified to that term.
[0103] Once that is done, the process proceeds largely as described
previously for an actual single image but using the terms and
probabilities for the "image" I.sub.Combined, by creating a subset
hierarchical tree for all hypernyms of the hyponyms of
I.sub.Combined.
[0104] Then each hypernym of the subset hierarchical tree is scored
using a scoring function as described above (Step 1008).
[0105] Once all of the hypernyms of the subset hierarchical tree
have been scored, some number "n" of the top hypernyms based
directly upon score or based upon exceeding some threshold score
value (depending upon the particular implementation) are then
filtered (i.e., all other hypernyms are disregarded) (Step
1010).
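The filtering of Step 1010 might be sketched as follows, supporting both variants mentioned above (a fixed number "n" of top hypernyms, or a threshold score). The function name filter_hypernyms and the example terms and scores are hypothetical.

```python
def filter_hypernyms(scores, n=None, threshold=None):
    """Keep hypernyms above threshold and/or the n best, highest score first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if threshold is not None:
        ranked = [(term, s) for term, s in ranked if s > threshold]
    if n is not None:
        ranked = ranked[:n]
    return [term for term, _ in ranked]


# Hypothetical hypernym scores for a combined "image".
scores = {"Mustelid": 0.42, "Carnivore": 0.35, "Mammal": 0.28, "Animal": 0.15}
print(filter_hypernyms(scores, n=2))            # ['Mustelid', 'Carnivore']
print(filter_hypernyms(scores, threshold=0.2))  # ['Mustelid', 'Carnivore', 'Mammal']
```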
[0106] Then the top "n" hypernyms are clustered, for the images in
the collection 1100, using a similarity algorithm and the user's
query (Step 1012).
[0107] Representative example known similarity algorithms suitable
for use in this respect include, but are not limited to, a
Leacock-Chodorow similarity algorithm (described in C. Leacock
& M. Chodorow, "Combining Local Context and WordNet Similarity
for Word Sense Identification," Ch. 11, pp. 265-283. MIT Press,
Cambridge, Mass. (1998)), a Wu-Palmer similarity algorithm
(described in Z. Wu & M. Palmer, "Verb Semantics and Lexical
Selection," 32nd Annual Meeting of the Assoc. for Computational
Linguistics, pp. 133-138 (1994)), a Resnik similarity algorithm
(described in P. Resnik, "Using Information Content to Evaluate
Semantic Similarity in a Taxonomy," Int'l Joint Conf. on Artificial
Intelligence (IJCAI-95), pp. 448-453, Montreal, Canada (1995)), a
Jiang & Conrath similarity algorithm (described in J. Jiang
& D. Conrath, "Semantic Similarity Based on Corpus Statistics
and Lexical Taxonomy," Proc. of Int'l Conf. Research on
Computational Linguistics, Taiwan (1997)), a Lin similarity
algorithm (described in D. Lin, "An Information-Theoretic
Definition of Similarity," Proc. 15th Int'l Conf. on Machine
Learning, pp. 296-304, San Francisco, Calif. (1998)), or a Nguyen
and Al-Mubaid similarity algorithm (described in H. Al-Mubaid &
H. A. Nguyen, "A Cross-Cluster Approach for Measuring Semantic
Similarity Between Concepts," IEEE Int'l Conf. on Information Reuse
and Integration, pp. 551-556 (2006)). Of course, others can be
used.
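As one illustration of such a measure, a Wu-Palmer style similarity can be computed on a small hierarchy as sketched below. The PARENT mapping is a hypothetical toy taxonomy rather than WordNet, the function names are assumptions, and the sketch adopts the common convention that the root has a depth of 1, so that the similarity is 2 x depth(lowest common ancestor) / (depth(a) + depth(b)).

```python
# Hypothetical toy taxonomy: term -> parent hypernym (root -> None).
PARENT = {
    "Entity": None,
    "Animal": "Entity",
    "Canine": "Animal",
    "Feline": "Animal",
    "Dog": "Canine",
    "Wolf": "Canine",
    "Lion": "Feline",
}


def path_to_root(term):
    """Return [term, parent, ..., root]."""
    path = [term]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def depth(term):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(term))


def wu_palmer(a, b):
    """Wu-Palmer style similarity between two terms of the toy taxonomy."""
    ancestors_b = set(path_to_root(b))
    # First ancestor of a (starting from a itself) that is also above b.
    lowest_common = next(t for t in path_to_root(a) if t in ancestors_b)
    return 2 * depth(lowest_common) / (depth(a) + depth(b))


print(wu_palmer("Dog", "Wolf"))  # 0.75
print(wu_palmer("Dog", "Lion"))  # 0.5
```

Terms that share a deeper common hypernym score higher, which is the property that lets the top "n" hypernyms be clustered against the terms of the user's query.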
[0108] Finally, the images in the collection 1100 that are
identified based upon the result of the similarity clustering are
returned to the user as the response to the query (Step 1014).
Optionally, depending upon the particular implementation, the "n"
hypernyms can also be provided as part of the query response and/or
the individual images making up the response can be processed and
those hypernyms (either the top hypernyms or those within a
"Goldilocks" range--depending upon the scoring function used) can
be returned with each image.
[0109] Having described and illustrated the principles of this
application by reference to one or more example embodiments, it
should be apparent that the embodiment(s) may be modified in
arrangement and detail without departing from the principles
disclosed herein and that it is intended that the application be
construed as including all such modifications and variations
insofar as they come within the spirit and scope of the subject
matter disclosed.
* * * * *
References