U.S. patent number 8,805,653 [Application Number 12/854,781] was granted by the patent office on 2014-08-12 for supervised nonnegative matrix factorization.
This patent grant is currently assigned to Seiko Epson Corporation. The grantees listed for this patent are Mithun Das Gupta, Seung-il Huh, and Jing Xiao. Invention is credited to Mithun Das Gupta, Seung-il Huh, and Jing Xiao.
United States Patent 8,805,653
Huh, et al.
August 12, 2014
Supervised nonnegative matrix factorization
Abstract
Graph embedding is incorporated into nonnegative matrix
factorization, NMF, while using the original formulation of graph
embedding. Negative values are permitted in the definition of graph
embedding without violating the nonnegative requirement of NMF. The
factorized matrices of NMF are found by an iterative process.
Inventors: Huh; Seung-il (Pittsburgh, PA), Das Gupta; Mithun (Cupertino, CA), Xiao; Jing (Cupertino, CA)
Applicant: Huh; Seung-il (Pittsburgh, PA, US); Das Gupta; Mithun (Cupertino, CA, US); Xiao; Jing (Cupertino, CA, US)
Assignee: Seiko Epson Corporation (Tokyo, JP)
Family ID: 45565435
Appl. No.: 12/854,781
Filed: August 11, 2010
Prior Publication Data: US 20120041725 A1, published Feb 16, 2012
Current U.S. Class: 703/2; 382/181; 382/118
Current CPC Class: G06K 9/00275 (20130101); G06K 9/00308 (20130101); G06F 17/16 (20130101); G06K 9/6252 (20130101); G06K 9/00 (20130101); G06K 9/6239 (20130101); G06K 9/00288 (20130101)
Current International Class: G06F 17/10 (20060101)
References Cited
[Referenced By]
U.S. Patent Documents
Other References
Yan, S., et al., "Graph Embedding and Extensions: A General Framework for Dimensionality Reduction", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 40-51, Jan. 2007. Cited by examiner.
Xu, D., Yan, S., Tao, D., Lin, S., and Zhang, H., "Marginal Fisher Analysis and Its Variants for Human Gait Recognition and Content-Based Image Retrieval", pp. 2811-2821, IEEE, 2007. Cited by examiner.
Yang, J., Yang, S., Fu, Y., Li, X., and Huang, T., "Non-Negative Graph Embedding", pp. 1-8, IEEE, 2008. Cited by examiner.
Sastry, V.N., et al., "Modified Algorithm to Compute Pareto-Optimal Vectors", Journal of Optimization Theory and Applications, vol. 103, no. 1, pp. 241-244, Oct. 1999. Cited by applicant.
He, X., "Incremental Semi-Supervised Subspace Learning for Image Retrieval", Proceedings of the 12th Annual ACM International Conference on Multimedia, pp. 2-8, 2004. Cited by applicant.
Wang, H., et al., "Trace Ratio vs. Ratio Trace for Dimensionality Reduction", IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-8, 2007. Cited by applicant.
Brunet, J.P., et al., "Metagenes and Molecular Pattern Discovery Using Matrix Factorization", Proceedings of the National Academy of Sciences, 102(12): pp. 4164-4169, 2004. Cited by applicant.
Buciu, I., et al., "Non-negative Matrix Factorization in Polynomial Feature Space", IEEE Transactions on Neural Networks, 19(6), 2008. Cited by applicant.
Cai, D., et al., "Non-negative Matrix Factorization on Manifold", ICDM, 2008. Cited by applicant.
Cooper, M., et al., "Summarizing Video Using Non-Negative Similarity Matrix Factorization", IEEE Workshop on Multimedia Signal Processing, pp. 25-28, 2002. Cited by applicant.
Kim, J., et al., "Toward Faster Nonnegative Matrix Factorization: A New Algorithm and Comparisons", ICDM, pp. 353-362, 2008. Cited by applicant.
Sha, F., et al., "Multiplicative Updates for Nonnegative Quadratic Programming in Support Vector Machines", NIPS, 2003. Cited by applicant.
Lee, D.D., et al., "Learning the Parts of Objects by Non-negative Matrix Factorization", Nature, 401: 788-791, 1999. Cited by applicant.
Lee, D.D., et al., "Algorithms for Non-negative Matrix Factorization", NIPS, pp. 556-562, 2000. Cited by applicant.
Li, S.Z., et al., "Learning Spatially Localized, Parts-Based Representation", CVPR, pp. 207-212, 2001. Cited by applicant.
Lin, C.-J., "On the Convergence of Multiplicative Update Algorithms for Non-negative Matrix Factorization", IEEE Transactions on Neural Networks, pp. 1589-1596, 2007. Cited by applicant.
Lin, C.-J., "Projected Gradient Methods for Non-negative Matrix Factorization", Neural Computation, 19(10): pp. 2756-2779, 2007. Cited by applicant.
Wang, C., et al., "Multiplicative Nonnegative Graph Embedding", CVPR, 2009. Cited by applicant.
Wang, Y., et al., "Non-Negative Matrix Factorization Framework for Face Recognition", International Journal of Pattern Recognition and Artificial Intelligence, 19(4): 495-511, 2005. Cited by applicant.
Wang, Y., et al., "Fisher Non-Negative Matrix Factorization for Learning Local Features", ACCV, 2004. Cited by applicant.
Xu, D., et al., "Marginal Fisher Analysis and Its Variants for Human Gait Recognition and Content-Based Image Retrieval", IEEE Transactions on Image Processing, 16(11), 2007. Cited by applicant.
Xu, W., et al., "Document Clustering Based on Non-negative Matrix Factorization", SIGIR, ACM Conference, pp. 267-273, 2003. Cited by applicant.
Yang, J., et al., "Non-Negative Graph Embedding", CVPR, 2008. Cited by applicant.
Zafeiriou, S., et al., "Nonlinear Nonnegative Component Analysis", CVPR, pp. 2860-2865, 2009. Cited by applicant.
Duda, R., et al., "Pattern Classification", Second Edition, pp. 215-281 ("Linear Discriminant Functions"), 2006. Cited by applicant.
Primary Examiner: Rivas; Omar Fernandez
Assistant Examiner: Gebresilassie; Kibrom
Claims
What is claimed is:
1. A pattern recognition method, comprising: providing a data
processing device to implement the following steps: accessing
multiple sets of training data, each set of training data having at
least one of a true example of a pattern to be recognized or a
false example of a pattern to be recognized; arranging said
multiple sets of data into a data matrix U in an electronic memory
constituting a data store, wherein each set of training data is
arranged as a separate column in data matrix U and data matrix U is
defined as U ∈ ℝ^(d×n), where ℝ is the set of real numbers and d×n define the dimensions of data matrix U;
defining an intrinsic graph G to label specific features of most
interest in the sets of training data, wherein G={U,W}, labels that
identify favorable features that are features characteristic of the
pattern to be recognized are added to the sets of training data in
U, each column of U represents a vertex, W is a similarity matrix
and each element of similarity matrix W measures the similarity
between vertex pairs; defining a penalty graph Ḡ to label specific features of least interest in the sets of training data, wherein Ḡ = {U, W̄}, labels that identify unfavorable features that are features not characteristic of the pattern to be recognized are added to the sets of training data in U, W̄ is a dissimilarity matrix, and each element of dissimilarity matrix W̄ measures unfavorable relationships between said vertex pairs; defining an intrinsic diagonal matrix D, wherein D = [D_ij] and D_ii = Σ_{j=1}^{n} W_ij; defining an intrinsic Laplacian matrix L, wherein L = D − W; defining a penalty diagonal matrix D̄, wherein D̄ = [D̄_ij] and D̄_ii = Σ_{j=1}^{n} W̄_ij; defining a penalty Laplacian matrix L̄, wherein L̄ = D̄ − W̄;
defining a basis matrix V, wherein V ∈ ℝ^(d×r) and basis matrix V is to hold basis examples of the sets of training data, the basis examples being a reduction of the sets of training data into simplified representations that highlight distinguishing characteristics of the pattern to be recognized; defining a feature matrix X, wherein X ∈ ℝ^(r×n) and feature matrix X is to hold feature values to construct an approximation of U from basis matrix V; incorporating the label information of intrinsic graph G and penalty graph Ḡ into the construction of basis matrix V and feature matrix X by defining a measure of the compactness of intrinsic graph G by the weighted sum of squared distances defined as Σ_{i<j}^{n} W_ij ‖x_i − x_j‖² = Tr(X L Xᵀ), wherein x_i is the i-th column of X and x_j is the j-th column of X; defining a measure of the separability of penalty graph Ḡ by the weighted sum of squared distances defined as Σ_{i<j}^{n} W̄_ij ‖x_i − x_j‖² = Tr(X L̄ Xᵀ), wherein x_i is the i-th column of X and x_j is the j-th column of X; defining F^(1)(V,X) as an objective of NMF (nonnegative matrix factorization), F^(1)(V,X) being proportional to F^(1)(V,X) = ½‖U − VX‖_F²; defining F^(2)(X) as an objective of graph embedding, F^(2)(X) being proportional to the ratio Tr(X L Xᵀ)/Tr(X L̄ Xᵀ); deriving an SNMF (supervised nonnegative matrix factorization) objective from a sum of F^(1)(V,X) and F^(2)(X); populating basis matrix V and feature matrix X by
solving the derived SNMF objective through iterative multiplicative
updates; and separating recognizable patterns within the basis
examples of basis matrix V into distinct pattern classifications
using a data classifier, these pattern classifications of the basis
examples being deemed the recognized patterns of the sets of
training data; wherein: said pattern recognition method is a face
detection method; each of said set of training data is a distinct
training image; the favorable features labeled in intrinsic graph G
identify regions within each distinct training image that are to be
focused upon when defining the basis examples; the unfavorable
features labeled in penalty graph Ḡ identify regions within each
distinct training image that are of least interest when defining
the basis examples; at least one of said distinct pattern
classifications is a face pattern classification; and a received
test image is tested for the existence of a face by determining a
basis test sample from the received test image, submitting the
basis test sample to the data classifier, and if the data
classifier identifies the basis test sample as belonging to the
face pattern classification, then deeming the received test image
as having a rendition of a face.
2. The method of claim 1, wherein: F^(1)(V,X) is defined as F^(1)(V,X) = ½‖U − VX‖_F²; and F^(2)(X) is defined as F^(2)(X) = λ · Tr(X L Xᵀ)/Tr(X L̄ Xᵀ), where λ is a multiplication factor determined by a validation technique.
3. The method of claim 1, wherein F^(1)(V,X) is defined as F^(1)(V,X) = ½‖U − VX‖_F²; F^(2)(X) is defined as F^(2)(X) = λ · Tr(X L Xᵀ)/Tr(X L̄ Xᵀ), where λ is a multiplication factor determined by a validation technique, and where ##EQU00043##, and said SNMF objective is defined as min_{V,X} [F^(1)(V,X) + F^(2)(X)] = min_{V,X} [½‖U − VX‖_F² + λ · Tr(X L Xᵀ)/Tr(X L̄ Xᵀ)].
4. The method of claim 3, further comprising: approximating the objective of SNMF as F̃(V,X) = ½‖U − VX‖_F² + λ(α·Tr(X L Xᵀ) − β·Tr(X L̄ Xᵀ)), where V = Vᵗ and X = Xᵗ at time t, α = 1/Tr(Xᵗ L̄ (Xᵗ)ᵀ), and β = Tr(Xᵗ L (Xᵗ)ᵀ)/(Tr(Xᵗ L̄ (Xᵗ)ᵀ))².
5. The method of claim 4, wherein: said SNMF objective is determined through the following iterative multiplicative updates: ##EQU00047## where T_D = αD − βD̄ and T_W = αW − βW̄, and ##EQU00048## where S = αL − βL̄.
6. The method of claim 1, wherein the data classifier is a support
vector machine (SVM).
7. The method of claim 1, wherein W and W̄ are generated from true
relationships among data pairs.
8. The method of claim 7, wherein said data pairs are class label
data.
9. The method of claim 1, wherein each column of feature matrix X
is a low dimensional representation of the corresponding column of
U.
10. The method of claim 1, wherein at least one of similarity
matrix W or dissimilarity matrix W̄ has negative values.
11. The method of claim 10, wherein Tr(X L Xᵀ) and Tr(X L̄ Xᵀ) are positive.
12. The method of claim 1, wherein similarity matrix W and
dissimilarity matrix W̄ are defined by the concept of within-class
and between-class distances of Linear Discriminant Analysis
(LDA).
13. The method of claim 12, wherein: similarity matrix W = [W_ij] is defined as W_ij = 1/n_c if y_i = y_j = c, and 0 otherwise, wherein y_i is a class label of the i-th sample, y_j is a class label of the j-th sample, and n_c is the size of class c; and dissimilarity matrix W̄ = [W̄_ij] is defined as W̄_ij = 1/n, wherein n is the number of data points.
14. A pattern recognition system for processing input test data,
comprising: an electronic memory storing multiple sets of training
data, each set of training data having at least one of a true
example of a pattern to be recognized or a false example of a
pattern to be recognized, wherein said electronic memory
constitutes a data store, said multiple sets of training data are
arranged into a data matrix U, each set of training data is
arranged as a separate column in data matrix U, and data matrix U
is defined as U ∈ ℝ^(d×n), where ℝ is the set of real numbers and d×n define the dimensions of data matrix U; a
data processing device having access to said electronic memory and
being configured to implement the following steps: defining an
intrinsic graph G to label specific features of most interest in
the sets of training data, wherein G={U,W}, labels that identify
favorable features that are features characteristic of the pattern
to be recognized are added to the sets of training data in U, each
column of U representing a vertex, W is a similarity matrix and
each element of similarity matrix W measures the similarity between
vertex pairs; defining a penalty graph Ḡ to label specific features of least interest in the sets of training data, wherein Ḡ = {U, W̄}, labels that identify unfavorable features that are features not characteristic of the pattern to be recognized are added to the sets of training data in U, W̄ is a dissimilarity matrix, and each element of dissimilarity matrix W̄ measures unfavorable relationships between said vertex pairs; defining an intrinsic diagonal matrix D as D = [D_ij] and D_ii = Σ_{j=1}^{n} W_ij; defining an intrinsic Laplacian matrix L as L = D − W; defining a penalty diagonal matrix D̄ as D̄ = [D̄_ij] and D̄_ii = Σ_{j=1}^{n} W̄_ij; defining a penalty Laplacian matrix L̄ as L̄ = D̄ − W̄; defining a basis matrix V as V ∈ ℝ^(d×r), wherein basis matrix V is to
hold basis examples of the sets of training data, the basis
examples being a reduction of the sets of training data into
simplified representations that highlight distinguishing
characteristics of the pattern to be recognized; defining a feature
matrix X as X ∈ ℝ^(r×n), wherein feature matrix X is to hold feature values to construct an approximation of U from
basis matrix V; incorporating the label information of intrinsic graph G and penalty graph Ḡ into the construction of basis matrix V and feature matrix X by defining a measure of the compactness of intrinsic graph G by the weighted sum of squared distances defined as Σ_{i<j}^{n} W_ij ‖x_i − x_j‖² = Tr(X L Xᵀ), wherein x_i is the i-th column of X and x_j is the j-th column of X; defining a measure of the separability of penalty graph Ḡ by the weighted sum of squared distances defined as Σ_{i<j}^{n} W̄_ij ‖x_i − x_j‖² = Tr(X L̄ Xᵀ), wherein x_i is the i-th column of X and x_j is the j-th column of X; defining F^(1)(V,X) as an objective of NMF (nonnegative matrix factorization), wherein F^(1)(V,X) = ½‖U − VX‖_F²; defining F^(2)(X) as an objective of graph embedding, where F^(2)(X) = λ · Tr(X L Xᵀ)/Tr(X L̄ Xᵀ), λ is a multiplication factor determined by a validation technique, and ##EQU00052##; defining an approximation of supervised nonnegative factorization, SNMF, as F̃(V,X) = ½‖U − VX‖_F² + λ(α·Tr(X L Xᵀ) − β·Tr(X L̄ Xᵀ)), where V = Vᵗ and X = Xᵗ at time t, α = 1/Tr(Xᵗ L̄ (Xᵗ)ᵀ), and β = Tr(Xᵗ L (Xᵗ)ᵀ)/(Tr(Xᵗ L̄ (Xᵗ)ᵀ))²; identifying factorized matrices X_ij and V_ij by the following iterative multiplicative updates: ##EQU00055## where T_D = αD − βD̄ and T_W = αW − βW̄, and ##EQU00056## where S = αL − βL̄; and classifying the test data according to classifications defined by X_ij using a data
classifier; wherein: the pattern recognition system is a face
recognition system and the classifications defined by X.sub.ij
include a face classification; each of said set of training data is
a distinct training image; the favorable features labeled in
intrinsic graph G identify regions within each distinct training
image that are to be focused upon when defining the basis examples;
the unfavorable features labeled in penalty graph Ḡ identify
regions within each distinct training image that are of least
interest when defining the basis examples; and a received test
image is tested for the existence of a face by determining a basis
test sample from the received test image, submitting the basis test
sample to the data classifier, and if the data classifier identifies the basis test sample as belonging to the
face classification, then deeming the received test image as having
a rendition of a face.
15. The system of claim 14, wherein said data pairs are class
labels of data.
16. The system of claim 14, wherein at least one of similarity
matrix W or dissimilarity matrix W̄ has negative values.
17. The system of claim 14, wherein: similarity matrix W = [W_ij] is defined as W_ij = 1/n_c if y_i = y_j = c, and 0 otherwise, wherein y_i is a class label of the i-th sample and n_c is the size of class c; and dissimilarity matrix W̄ = [W̄_ij] is defined as W̄_ij = 1/n, wherein n is the total number of data points.
18. A pattern recognition method, comprising: providing a data
processing device to implement the following steps: accessing
multiple sets of training data, each set of training data having at
least one of a true example of a pattern to be recognized or a
false example of a pattern to be recognized; arranging said
multiple sets of data into a data matrix U in an electronic memory
constituting a data store, wherein each set of training data is
arranged as a separate column in data matrix U and data matrix U is
defined as U ∈ ℝ^(d×n), where ℝ is the set of real numbers and d×n define the dimensions of data matrix U;
defining an intrinsic graph G to label specific features of most
interest in the sets of training data, wherein G={U,W}, labels that
identify favorable features that are features characteristic of the
pattern to be recognized are added to the sets of training data in
U, each column of U represents a vertex, W is a similarity matrix
and each element of similarity matrix W measures the similarity
between vertex pairs; defining a penalty graph Ḡ to label specific features of least interest in the sets of training data, wherein Ḡ = {U, W̄}, W̄ is a dissimilarity matrix, and each element of dissimilarity matrix W̄ measures unfavorable relationships between said vertex pairs; defining an intrinsic diagonal matrix D, wherein D = [D_ij] and D_ii = Σ_{j=1}^{n} W_ij; defining an intrinsic Laplacian matrix L, wherein L = D − W; defining a penalty diagonal matrix D̄, wherein D̄ = [D̄_ij] and D̄_ii = Σ_{j=1}^{n} W̄_ij; defining a penalty Laplacian matrix L̄, wherein L̄ = D̄ − W̄; defining a basis matrix V,
wherein V ∈ ℝ^(d×r) and basis matrix V is to hold
basis examples of the sets of training data, the basis examples
being a reduction of the sets of training data into simplified
representations that highlight distinguishing characteristics of
the pattern to be recognized; incorporating the label information
of intrinsic graph G and penalty graph G into the construction of
basis matrix V and feature matrix X by defining a feature matrix X, where X ∈ ℝ^(r×n), wherein feature matrix X is to hold feature values to construct an approximation of U from basis
matrix V; given a kernel NMF optimization problem of min ½‖U^φ − V^φ X‖_F², wherein v^φ_ij ≥ 0 and x_ij ≥ 0 for ∀i, j, U^φ = [φ(u_1), φ(u_2), . . . , φ(u_N)], V^φ = [φ(v_1), φ(v_2), . . . , φ(v_R)], and φ: ℝ^M_+ → H_+ is a mapping that projects an image u to a Hilbert space H, representing φ(v_j) as a linear combination of φ(u_i), where φ(v_j) = Σ_{i=1}^{N} H_ij φ(u_i), and redefining said kernel NMF optimization as ½‖U^φ − U^φ H X‖_F²;
defining favorable relationships among feature vector pairs as: ##EQU00059## ##EQU00059.2## ##EQU00059.3## defining unfavorable relationships between feature vector pairs as: ##EQU00060## defining an SNMF (supervised nonnegative matrix factorization) objective function as ##EQU00061## populating basis matrix V and feature matrix X by applying the following iterative multiplicative updates to achieve said SNMF objective function: ##EQU00062## ##EQU00062.2## ##EQU00062.3## separating recognizable patterns within the basis
examples of basis matrix V into distinct pattern classifications
using a data classifier, these pattern classifications of the basis
examples being deemed the recognized patterns of the sets of
training data; wherein: said pattern recognition method is a face
detection method; each of said set of training data is a distinct
training image; the favorable features labeled in intrinsic graph G
identify regions within each distinct training image that are to be
focused upon when defining the basis examples; the unfavorable
features labeled in penalty graph Ḡ identify regions within each
distinct training image that are of least interest when defining
the basis examples; at least one of said distinct pattern
classifications is a face pattern classification; and a received
test image is tested for the existence of a face by determining a
basis test sample from the received test image, submitting the
basis test sample to the data classifier, and if the data
classifier identifies the basis test sample as belonging to the
face pattern classification, then deeming the received test image
as having a rendition of a face.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is related to U.S. patent application Ser. No.
12/854,768 entitled "Supervised Nonnegative Matrix Factorization"
filed on the same day as the instant application, and U.S. patent
application Ser. No. 12/854,776 entitled "Supervised Nonnegative
Matrix Factorization" filed on the same day as the instant
application. These related applications are hereby incorporated by
reference for all purposes.
BACKGROUND
1. Field of Invention
The present invention relates to the field of matrix factorization.
More specifically, it relates to the field of matrix factorization
with incorporated data classification properties.
2. Description of Related Art
Matrix factorization is a mechanism by which a large matrix U (where U ∈ ℝ^(d×n)) is factorized into the product of two, preferably smaller, matrices: a basis matrix V (where V ∈ ℝ^(d×r)) and a coefficient matrix X (where X ∈ ℝ^(r×n)). A motivation for this is that it is often easier to store and manipulate the smaller matrices V and X than it is to work with a single, large matrix U. However, since not all matrices can be factorized perfectly, if at all, matrices V and X are often approximations. An objective of matrix factorization is therefore to identify matrices V and X such that, when they are multiplied together, the result closely matches matrix U with minimal error.
Among the different approaches to matrix factorization, an approach that has gained favor in the community is nonnegative matrix factorization (NMF), due to its ease of implementation and useful applications.
Nonnegative matrix factorization has recently been used for various
applications, such as face recognition, multimedia, text mining,
and gene expression discovery. NMF is a part-based representation
wherein nonnegative inputs are represented by additive combinations
of nonnegative bases. The inherent nonnegativity constraint in NMF
leads to improved physical interpretation compared to other
factorization methods, such as Principal Component Analysis
(PCA).
Although NMF and its variants are well suited for recognition applications, they lack classification capability. This lack of classification capability is a natural consequence of their unsupervised factorization method, which does not utilize relationships within input entities, such as class labels.
Several approaches have been proposed for NMF to generate more
descriptive features for classification and clustering tasks. For
example, "Fisher Nonnegative Matrix Factorization", ACCV, 2004, by
Y. Wang, Y. Jia, C. Hu, and M. Turk, proposes incorporating the
NMF cost function and the difference of the between-class scatter
from the within-class scatter. However, the objective of this
Fisher-NMF is not guaranteed to converge since it may not be a
convex function. "Non-negative Matrix Factorization on Manifold",
ICDM, 2008, by D. Cai, X. He, X. Wu, and J. Han proposes graph
regularized NMF (GNMF), which appends terms representing favorable
relationships among feature vector pairs. But, GNMF is handicapped
by not considering unfavorable relationships.
A different approach better suited for classification is a technique called "graph embedding", which is derived from topological graph theory. Graph embedding embeds a graph G on a surface, and is a representation of graph G on the surface in which points of the surface are associated with vertices.
Recently, J. Yang, S. Yang, Y. Fu, X. Li, and T. Huang suggested combining a variation of graph embedding with nonnegative matrix factorization in an approach termed "Non-negative Graph Embedding" (NGE), in CVPR, 2008. NGE resolved the previous problems by introducing the concept of complementary space, and is widely considered the state of the art. NGE, however, does not use
true graph embedding, and instead utilizes an approximate
formulation of graph embedding. As a result, NGE is not effective
enough for classification, particularly when intra-class variations
are large.
In a general sense, all of these previous works tried to
incorporate NMF with graph embedding, but none of them successfully
adopted the original formulation of graph embedding because the
incorporated optimization problem is considered intractable. In
addition, all the works are limited in that they depend on suitable
parameters which are not easy to determine appropriately.
It is an object of the present invention to incorporate NMF with
graph embedding using the original formulation of graph
embedding.
It is another object of the present invention to permit the use of
negative values in the definition of graph embedding without
violating the requirement of NMF to limit itself to nonnegative
values.
SUMMARY OF INVENTION
The above objects are met in a method of factorizing a data matrix U by supervised nonnegative factorization, SNMF, including: providing a data processing device to implement the following steps: accessing the data matrix U from a data store, wherein data matrix U is defined as U ∈ ℝ^(d×n); defining an intrinsic graph G, wherein G = {U, W}, each column of U representing a vertex, and each element of similarity matrix W measures the similarity between vertex pairs; defining a penalty graph Ḡ, wherein Ḡ = {U, W̄} and each element of dissimilarity matrix W̄ measures unfavorable relationships between the vertex pairs; defining an intrinsic diagonal matrix D, wherein D = [D_ij] and D_ii = Σ_{j=1}^{n} W_ij; defining an intrinsic Laplacian matrix L, wherein L = D − W; defining a penalty diagonal matrix D̄, wherein D̄ = [D̄_ij] and D̄_ii = Σ_{j=1}^{n} W̄_ij; defining a penalty Laplacian matrix L̄, wherein L̄ = D̄ − W̄; defining a basis matrix V, where V ∈ ℝ^(d×r); defining a feature matrix X, where X ∈ ℝ^(r×n); defining a measure of the compactness of intrinsic graph G by the weighted sum of squared distances defined as Σ_{i<j}^{n} W_ij ‖x_i − x_j‖² = Tr(X L Xᵀ), wherein x_i is the i-th column of X and x_j is the j-th column of X; defining a measure of the separability of penalty graph Ḡ by the weighted sum of squared distances defined as Σ_{i<j}^{n} W̄_ij ‖x_i − x_j‖² = Tr(X L̄ Xᵀ), wherein x_i is the i-th column of X and x_j is the j-th column of X; defining F^(1)(V,X) as an objective of NMF (nonnegative matrix factorization); defining F^(2)(X) as an objective of graph embedding, F^(2)(X) being proportional to the ratio Tr(X L Xᵀ)/Tr(X L̄ Xᵀ); deriving an SNMF objective from a sum of F^(1)(V,X) and F^(2)(X); and determining the SNMF objective through iterative multiplicative updates.
Preferably, F^(1)(V,X) is defined as F^(1)(V,X) = ½‖U − VX‖_F²; and F^(2)(X) is defined as F^(2)(X) = λ · Tr(X L Xᵀ)/Tr(X L̄ Xᵀ), where λ is a multiplication factor determined by a validation technique.
Further preferably, F^(1)(V,X) is defined as F^(1)(V,X) = ½‖U − VX‖_F²; F^(2)(X) is defined as F^(2)(X) = λ · Tr(X L Xᵀ)/Tr(X L̄ Xᵀ), where λ is a multiplication factor determined by a validation technique; and the SNMF objective is defined as min_{V,X} [F^(1)(V,X) + F^(2)(X)] = min_{V,X} [½‖U − VX‖_F² + λ · Tr(X L Xᵀ)/Tr(X L̄ Xᵀ)]. Following this definition of F^(1)(V,X) and F^(2)(X), the SNMF objective is approximated as F̃(V,X) = ½‖U − VX‖_F² + λ(α·Tr(X L Xᵀ) − β·Tr(X L̄ Xᵀ)), where V = Vᵗ and X = Xᵗ at time t, α = 1/Tr(Xᵗ L̄ (Xᵗ)ᵀ), and β = Tr(Xᵗ L (Xᵗ)ᵀ)/(Tr(Xᵗ L̄ (Xᵗ)ᵀ))². Following this approach, the SNMF objective is determined through the following iterative multiplicative updates: ##EQU00008## ##EQU00008.2## ##EQU00008.3## where T_D = αD − βD̄, T_W = αW − βW̄, and S = αL − βL̄.
In a preferred embodiment, matrix U is comprised of n samples and
each column of U represents a sample. Further preferably, each of
the samples is an image file.
W and W̄ may be generated from true relationships among data pairs. These data pairs may be class label data.
In a preferred embodiment, each column of feature matrix X is a low
dimensional representation of the corresponding column of U.
Also preferably, at least one of similarity matrix W or dissimilarity matrix W̄ has negative values, but Tr(X L Xᵀ) and Tr(X L̄ Xᵀ) are preferably positive.
In an embodiment of the present invention, similarity matrix W and dissimilarity matrix W̄ are defined by the concept of within-class and between-class distances of Linear Discriminant Analysis (LDA). In this embodiment, similarity matrix W = [W_ij] is defined as W_ij = 1/n_c if y_i = y_j = c, and 0 otherwise, wherein y_i is a class label of the i-th sample, y_j is a class label of the j-th sample, and n_c is the size of class c; and dissimilarity matrix W̄ = [W̄_ij] is defined as W̄_ij = 1/n, wherein n is the number of data points.
The present invention is also embodied in a data classification system for classifying test data, having: a data processing device with access to a data matrix U of training data and with access to the test data, the data matrix U being defined as U ∈ ℝ^(d×n); wherein the data processing device classifies the test data according to a classification defined by X_ij; wherein an intrinsic graph G is defined as G = {U, W}, each column of U representing a vertex and each element of similarity matrix W measuring the similarity between vertex pairs; a penalty graph Ḡ is defined as Ḡ = {U, W̄} and each element of dissimilarity matrix W̄ measures unfavorable relationships between the vertex pairs; an intrinsic diagonal matrix D is defined as D = [D_ij] and D_ii = Σ_{j=1}^{n} W_ij; an intrinsic Laplacian matrix L is defined as L = D − W; a penalty diagonal matrix D̄ is defined as D̄ = [D̄_ij] and D̄_ii = Σ_{j=1}^{n} W̄_ij; a penalty Laplacian matrix L̄ is defined as L̄ = D̄ − W̄; a basis matrix V is defined as V ∈ ℝ^(d×r); a feature matrix X is defined as X ∈ ℝ^(r×n); a measure of the compactness of intrinsic graph G is defined by the weighted sum of squared distances defined as Σ_{i<j}^{n} W_ij ‖x_i − x_j‖² = Tr(X L Xᵀ), wherein x_i is the i-th column of X and x_j is the j-th column of X; a measure of the separability of penalty graph Ḡ is defined by the weighted sum of squared distances defined as Σ_{i<j}^{n} W̄_ij ‖x_i − x_j‖² = Tr(X L̄ Xᵀ), wherein x_i is the i-th column of X and x_j is the j-th column of X; F^(1)(V,X) defines an objective of NMF (nonnegative matrix factorization), wherein F^(1)(V,X) = ½‖U − VX‖_F²; F^(2)(X) defines an objective of graph embedding, where F^(2)(X) = λ · Tr(X L Xᵀ)/Tr(X L̄ Xᵀ) and λ is a multiplication factor determined by a validation technique; an approximation of supervised nonnegative factorization, SNMF, is defined as F̃(V,X) = ½‖U − VX‖_F² + λ(α·Tr(X L Xᵀ) − β·Tr(X L̄ Xᵀ)), where V = Vᵗ and X = Xᵗ at time t, α = 1/Tr(Xᵗ L̄ (Xᵗ)ᵀ), and β = Tr(Xᵗ L (Xᵗ)ᵀ)/(Tr(Xᵗ L̄ (Xᵗ)ᵀ))²; and factorized matrices X_ij and V_ij are identified by the following iterative multiplicative updates: ##EQU00015## ##EQU00015.2## ##EQU00015.3## where T_D = αD − βD̄, T_W = αW − βW̄, and S = αL − βL̄.
Preferably, data matrix U is comprised of n samples and each column
of U represents a sample. In this case, each of the samples may be
an image file.
Further preferably, the data pairs are class labels of data.
Additionally, each column of feature matrix X may be a low
dimensional representation of the corresponding column of U.
In an embodiment of the present invention, at least one of similarity matrix W or dissimilarity matrix W̄ has negative values.
Additionally, similarity matrix W = [W_ij] is preferably defined as W_ij = 1/n_c if y_i = y_j = c, and 0 otherwise, wherein y_i is a class label of the i-th sample and n_c is the size of class c; and dissimilarity matrix W̄ = [W̄_ij] is defined as W̄_ij = 1/n, wherein n is the total number of data points.
The above objects are also met in a method of factorizing a data matrix U by supervised nonnegative factorization, SNMF, having: providing a data processing device to implement the following steps: accessing the data matrix U from a data store, wherein data matrix U is defined as U ∈ ℝ^(d×n); defining an intrinsic graph G, wherein G = {U, W}, each column of U represents a vertex, and each element of similarity matrix W measures the similarity between vertex pairs; defining a penalty graph Ḡ, wherein Ḡ = {U, W̄} and each element of dissimilarity matrix W̄ measures unfavorable relationships between the vertex pairs; defining an intrinsic diagonal matrix D, wherein D = [D_ij] and D_ii = Σ_{j=1}^{n} W_ij; defining an intrinsic Laplacian matrix L, wherein L = D − W; defining a penalty diagonal matrix D̄, wherein D̄ = [D̄_ij] and D̄_ii = Σ_{j=1}^{n} W̄_ij; defining a penalty Laplacian matrix L̄, wherein L̄ = D̄ − W̄; defining a basis matrix V, where V ∈ ℝ^(d×r); defining a feature matrix X, where X ∈ ℝ^(r×n); given a kernel NMF optimization problem of min ½‖U^φ − V^φ X‖_F², wherein v^φ_ij ≥ 0 and x_ij ≥ 0 for ∀i, j, U^φ = [φ(u_1), φ(u_2), . . . , φ(u_N)], V^φ = [φ(v_1), φ(v_2), . . . , φ(v_R)], and φ: ℝ^M_+ → H is a mapping that projects an image u to a Hilbert space H, redefining the kernel NMF optimization as ½‖U^φ − U^φ H X‖_F²; defining favorable relationships among feature vector pairs as ##EQU00018## ##EQU00018.2##; defining unfavorable relationships between feature vector pairs as ##EQU00019##; defining an SNMF objective function as ##EQU00020##; and applying the following iterative multiplicative updates to achieve the SNMF objective function: ##EQU00021## ##EQU00021.2## ##EQU00021.3##.
Other objects and attainments together with a fuller understanding
of the invention will become apparent and appreciated by referring
to the following description and claims taken in conjunction with
the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
In the drawings wherein like reference symbols refer to like
parts.
FIG. 1 is a flowchart of a preferred SNMF method in accord with the
present invention.
FIG. 2 is a flow chart of preferred updates for the preferred SNMF
objective.
FIG. 3 is an example of the present invention incorporating a
kernel NMF.
FIG. 4 shows a set of sixteen, simplified, test face images having a combination of four distinct eye-pairs and four distinct mouth shapes for comparing the generation of basis images.
FIG. 5 illustrates the basis images generated by the present invention for identifying the distinctive eye-pairs of FIG. 4.
FIG. 6 illustrates the basis images generated by the present invention for identifying the distinctive mouth shapes of FIG. 4.
FIG. 7 illustrates four basis images generated by the prior art NMF
approach for distinguishing between the 16 images of FIG. 4.
FIG. 8 is an example of 7 basic facial expressions of an image, as
incorporated in the JAFFE database.
FIG. 9a is an example of face class images found in the CBCL
dataset.
FIG. 9b is an example of non-face class images found in the CBCL
dataset.
FIG. 10 shows exemplary hardware for implementing the present
invention.
FIG. 11 shows plot results of testing of the present invention.
FIG. 12 is a table comparing the results of the present invention
with those of eight prior art approaches.
FIG. 13 shows sample basis images generated by the prior art NMF
approach.
FIG. 14 shows sample basis images generated by the present
invention.
FIG. 15 is a plot comparing the accuracy of the present invention
with three prior art approaches.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
Recently, Nonnegative Matrix Factorization (NMF) has received much
attention due to its representative power for nonnegative data. The
discriminative power of NMF, however, is limited by its inability
to consider relationships present in data, such as class labels.
Several works tried to address this issue by adopting the concept
of graph embedding, albeit in an approximated form. Herein, a
Supervised NMF (SNMF) approach that incorporates the objective
function of graph embedding with that of nonnegative matrix
factorization is proposed.
Before describing SNMF, it is beneficial to first provide
background information regarding non-negative matrix factorization
(NMF) and graph embedding.
With reference to FIG. 1, SNMF combines the benefits of
non-negative matrix factorization (NMF) and graph embedding, each
of which is discussed in turn.
Given a raw matrix U = [u_1, u_2, . . . , u_n] ∈ ℝ^(d×n), SNMF, like NMF, factorizes matrix U into the product of two, preferably smaller, matrices: a basis matrix V (where V = [v_1, v_2, . . . , v_r] ∈ ℝ^(d×r)) and a coefficient matrix (or feature matrix) X (where X = [x_1, x_2, . . . , x_n] ∈ ℝ^(r×n)). For example, matrix U may be a raw data matrix of n samples (or data points) with each sample being of dimension d such that U ∈ ℝ^(d×n) (step S1). A specific example of this is when each of the n columns of U (i.e. each of the n samples) is an image of size d. Matrix U is factorized into the product of basis matrix V and feature matrix X by minimizing the following reconstruction error: min_{V,X} ½‖U − VX‖_F² subject to V_ij ≥ 0, X_ij ≥ 0 ∀i, j (1)
where ‖·‖_F denotes the Frobenius norm.
Since Eq. (1) is not a convex function of both V and X, there is no
closed form solution for the global optimum. Thus, many researchers
have developed iterative update methods to solve the problem. Among
them, a popular approach is the multiplicative updates devised by
Lee and Seung in "Learning the parts of objects by non-negative
matrix factorization", Nature, 401:788-791, 1999, which is hereby
incorporated in its entirety by reference. These multiplicative
updates, shown below as equation (2), are popular due to their
simplicity.
X ← X ⊙ (VᵀU)/(VᵀVX), V ← V ⊙ (UXᵀ)/(VXXᵀ) (2) where ⊙ denotes element-wise multiplication and the division is element-wise. These updates monotonically decrease the objective function in Eq. (1).
Graph embedding, on the other hand, may be defined as the optimal
low dimensional representation that best characterizes the
similarity relationships between data pairs. In graph embedding,
dimensionality reduction involves two graphs: an intrinsic graph
that characterizes the favorable relationships among feature vector
pairs and a penalty graph that characterizes the unfavorable
relationships among feature vector pairs. Thus, applying graph
embedding to data matrix U would organize its raw data into classes
according to specified favorable and unfavorable relationships. To
achieve this, however, one first needs to define graph embedding as
applied to data matrix U.
For graph embedding, let G = {U, W} be an intrinsic graph where each column of U represents a vertex and each element of W (where W ∈ ℝ^(n×n)) measures the similarity between vertex pairs (step S3). In the same way, a penalty graph Ḡ, which measures the unfavorable relationships between vertex pairs, may be defined as Ḡ = {U, W̄}, where W̄ ∈ ℝ^(n×n) (step S5). In this case, W and W̄ can be generated from true relationships among data pairs, such as class labels of data.
In addition, the diagonal matrix D = [D_ij] is defined, where D_ii = Σ_{j=1}^{n} W_ij (step S7), and the Laplacian matrix L = D − W is defined (step S9). Matrices D̄ and L̄ are defined from W̄ in the same way (steps S11 and S13), such that D̄_ii = Σ_{j=1}^{n} W̄_ij and L̄ = D̄ − W̄.
As is explained above, to factorize data matrix U, which is defined
as U.epsilon., one defines a basis matrix V such that V.epsilon.
(step S15), defines a feature matrix X such that X.epsilon. (step
S17), and seeks to populate V and X such that the product of V and
X approximates U with minimal error. An object of the present
invention, however, is to combine graph embedding with the
factorization of matrix U such that the classification properties
of graph embedding are incorporated into factorized basis matrix V
and a feature matrix X. The present embodiment achieves this by
defining the objective of graph embedding in terms of feature
matrix X.
First, let each column of feature matrix X be a low dimensional
representation of the corresponding column of U. Then, one can
measure the compactness of the intrinsic graph G and the
separability of the penalty graph Ḡ by the weighted sum of squared distances of feature matrix X, as follows: F_DIS(X) = Σ_{i<j}^{n} W_ij ‖x_i − x_j‖² = Tr(X L Xᵀ) (Step S19) F̄_DIS(X) = Σ_{i<j}^{n} W̄_ij ‖x_i − x_j‖² = Tr(X L̄ Xᵀ) (Step S21) where F_DIS expresses the compactness of favorable relationships, F̄_DIS expresses the separability of unfavorable relationships, x_i is the i-th column of X, and x_j is the j-th column of X.
It is desired to minimize F_DIS and maximize F̄_DIS for a given W and W̄. The objective of graph embedding, as is the case for most dimensionality reduction methods, can therefore be generalized to the following unified framework with specifically defined W and W̄: min_X F_DIS(X)/F̄_DIS(X) = min_X Tr(X L Xᵀ)/Tr(X L̄ Xᵀ) (5)
To acquire both the benefits of part-based representation and the
classification power of graph embedding, the present approach
addresses both the objectives of NMF and the objective of graph
embedding. However, unlike previous works, the present invention
utilizes the ratio formulation of graph embedding. The objective of NMF, F^(1)(V,X), can be derived from equation (1), or can be re-expressed as equation (7) (step S23), where the constant multiple of ½ may be optionally dropped for simplicity (that is, it simplifies the derivative): F^(1)(V,X) = ½‖U − VX‖_F² (7) The objective of graph embedding, F^(2)(X), can be derived from equation (5) or re-expressed as equation (8) (step S25), as: F^(2)(X) = λ · Tr(X L Xᵀ)/Tr(X L̄ Xᵀ) (8) where parameter λ is a multiplication factor determined using a validation technique, i.e. determined by running experiments with different values of λ and selecting the best one.
Thus the objective of SNMF may be defined by the combined objectives of NMF and graph embedding (step S27) as: min_{V,X} [F^(1)(V,X) + F^(2)(X)] (9) or alternatively, min_{V,X} [½‖U − VX‖_F² + λ · Tr(X L Xᵀ)/Tr(X L̄ Xᵀ)] (10)
This approach explicitly minimizes the ratio of the two distances, which is the relative compactness of the favorable relationships. Consequently, SNMF can employ any definitions of similarity and dissimilarity matrices W and W̄ (including negative values) as long as both Tr(X L Xᵀ) and Tr(X L̄ Xᵀ) are positive. These constraints are reasonable since Tr(X L Xᵀ) and Tr(X L̄ Xᵀ) are distance measures. By contrast, NGE of the prior art requires more restrictive constraints when defining the matrices. For example, in NGE, all the elements of W and W̄ must be nonnegative because negative elements can make the objective of NGE a non-convex function.
Also unlike NGE, SNMF does not require any complementary spaces. NGE requires the introduction of complementary spaces to construct objective functions by addition of nonnegative terms. However, it is doubtful whether the complementary space exists without violating the nonnegative constraints. Even if such spaces exist, there is no guarantee that the objective function of NGE can discover the complementary space.
Before describing a detailed implementation for achieving the objectives of SNMF, as described in equations (9) and (10), a sample definition of W and W̄ is provided. A presently preferred embodiment defines W and W̄ by borrowing the concept of within-class and between-class distances from Linear Discriminant Analysis (LDA), as is generally described, for example, in chapter 5 of the book "Pattern Classification" by R. O. Duda, P. E. Hart, and D. G. Stork, published by Wiley-Interscience, Hoboken, N.J., 2nd edition, 2001, which is hereby incorporated by reference. This approach begins by letting y_i be a class label of the i-th sample and n_c be the size of class c. Alternatively, y = [y_1, y_2, . . . , y_n]ᵀ, where y_i ∈ {1, 2, . . . , C}, is a true label vector. Matrices W = [W_ij] and W̄ = [W̄_ij] may be defined as W_ij = 1/n_c if y_i = y_j = c, and 0 otherwise, and W̄_ij = 1/n, where n is the total number of data points. Alternatively, matrices W = [W_ij] and W̄ = [W̄_ij] may also be defined as ##EQU00030##
Note that the elements of W can be negative, which means that NGE
cannot use W and W̄ from the LDA formulation, as described
immediately above. Not only can SNMF adopt the LDA formulation in
order to measure similarities, but other formulations can be
adopted as well. For example, for multi-modal data sets, the
Marginal Fisher Analysis (MFA) formulation, which effectively
reflects local relationships among data, can be used. Information
on MFA is provided in "Marginal Fisher Analysis and Its Variants
For Human Gait Recognition and Content-based Image Retrieval", IEEE
Trans on Image Processing, 16(11), 2007, by D. Xu, S. Yan, D. Tao,
S. Lin, and H. J. Zhang, which is herein incorporated in its
entirety.
Preferably, all the pair-wise distances are computed based on the
unit basis vectors. This normalized distance calculation prevents
the distance ratio from meaninglessly decreasing due to rescaling
of basis vectors.
With reference to FIG. 2, in the following example, the following SNMF objective function (step S28), as defined from equation (10), is optimized: min_{V,X} F(V,X) = ½‖U − VX‖_F² + λ · Tr(X L Xᵀ)/Tr(X L̄ Xᵀ), subject to V_ij ≥ 0, X_ij ≥ 0 ∀i, j (11)
F(V,X) is not a convex function of both V and X. Therefore, iterative updates are needed to minimize the objective function (11). Due to its fractional term, F(V,X) can be troublesome to optimize by multiplicative updates. Therefore, a presently preferred embodiment uses an approximation of the fractional term as a subtraction of two terms at each time t. Suppose that V = Vᵗ and X = Xᵗ at time t (step S33). The approximate function of F(V,X) may be defined as (step S35): F̃(V,X) = ½‖U − VX‖_F² + λ(α·Tr(X L Xᵀ) − β·Tr(X L̄ Xᵀ)) (12) where α = 1/Tr(Xᵗ L̄ (Xᵗ)ᵀ) and β = Tr(Xᵗ L (Xᵗ)ᵀ)/(Tr(Xᵗ L̄ (Xᵗ)ᵀ))². With these choices, F̃(V,X) ≈ F(V,X) and ∂F̃/∂X ≈ ∂F/∂X at V = Vᵗ, X = Xᵗ; then F̃(Vᵗ,X) is non-increasing under the following multiplicative update rules (step S37): ##EQU00033## where T_D = αD − βD̄ and T_W = αW − βW̄.
In addition, for a matrix A, A⁺ = [A⁺_ij] and A⁻ = [A⁻_ij], where A⁺_ij = A_ij if A_ij > 0 and 0 otherwise, and A⁻_ij = −A_ij if A_ij < 0 and 0 otherwise. Therefore, F̃(Vᵗ,X) is non-increasing under the following multiplicative update (step S39): ##EQU00035## where S = αL − βL̄. This leads to the following theorem: Theorem 1: The approximation of objective function F̃ in equation (12) is non-increasing under the update rules of equations (14) and (18). A proof of Theorem 1 is provided in the appendix, attached below.
Since the multiplicative factors of equations (14) and (18) are
always non-negative by Theorem 1, it follows that all elements in V
and X are maintained non-negative after each update.
As is stated above, the distance ratio part of SNMF, which may be computed based on class labels, can be incorporated into other NMF variations. As an illustrative example, in FIG. 3 the ratio part of the objective of the present invention, as defined in equation (8), is incorporated into a Kernel NMF (KNMF). The present example uses a Kernel NMF approach as explained in the article "Nonlinear Nonnegative Component Analysis" by S. Zafeiriou and M. Petrou, in CVPR, pp. 2860-2865, 2009, which is herein incorporated in its entirety.
Beginning with step S41, let φ: ℝ^M_+ → H be a mapping that projects an image u to a Hilbert space H of arbitrary dimensionality. In Kernel NMF, the decomposed matrix contains the images projected by the mapping φ. More formally, Kernel NMF solves the following optimization problem: min ½‖U^φ − V^φ X‖_F² (20) subject to: v^φ_ij ≥ 0 and x_ij ≥ 0 for ∀i, j, where U^φ = [φ(u_1), φ(u_2), . . . , φ(u_N)] and V^φ = [φ(v_1), φ(v_2), . . . , φ(v_R)]. To solve this optimization problem, KNMF assumes that every φ(v_j) can be represented as a linear combination of φ(u_i), i.e. φ(v_j) = Σ_{i=1}^{N} H_ij φ(u_i).
Then the objective function in Eq. (20) can be converted (step S43) to ½‖U^φ − U^φ H X‖_F² (21). This objective can be monotonically minimized by the following updates: X ← X ⊙ (HᵀK)/(HᵀKHX), H ← H ⊙ (KXᵀ)/(KHXXᵀ) (22) where K is the kernel (Gram) matrix with K_ij = φ(u_i)ᵀφ(u_j).
Using this Kernel NMF as a feature generation method, the presently suggested approach for SNMF can now be applied. The normalized compactness of favorable relationships is (step S45): ##EQU00037## Therefore the objective function F is defined as (step S47): ##EQU00038##
Following a similar logic as described above, the approximation of F is non-increasing under the following multiplicative update rules (step S49): ##EQU00039##
The present SNMF approach was tested in various applications, and
the results compared to other techniques known in the art.
In a first application, the present invention is illustrated as applied to a simplified face classification problem, and its ability to generate basis images and identify specific image features is tested.
With reference to FIG. 4, a set of sixteen simplified, 7.times.7
pixel test face images 51 was first generated by combining four
images of distinct eye-pairs with four images of distinct mouth
shapes in all possible combinations. Each eye-pair is distinguished
by its pupil position.
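The construction of the sixteen test images can be pictured with the following short sketch; the pixel patterns below are hypothetical stand-ins, not the actual patterns of FIG. 4:

    import numpy as np

    rng = np.random.default_rng(0)
    # Hypothetical stand-in patches; the real FIG. 4 patterns differ.
    eyes = [rng.integers(0, 2, size=(3, 7)) for _ in range(4)]    # four eye-pair patches
    mouths = [rng.integers(0, 2, size=(2, 7)) for _ in range(4)]  # four mouth patches

    faces = []
    for e in eyes:
        for m in mouths:
            face = np.zeros((7, 7))
            face[0:3, :] = e   # eye region occupies the top rows
            face[5:7, :] = m   # mouth region occupies the bottom rows
            faces.append(face)
    assert len(faces) == 16    # 4 eye-pairs x 4 mouth shapes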
Because of SNMF's ability to make distinctions based on labels, it
is possible to specify specific features on which one wishes to
focus. For example, in a first test run, the present invention is
asked to identify basis images (i.e., characteristic images used to
classify features) to distinguish between types of eyes in the
sixteen test face images. In a second test run, the present
invention is asked to identify basis images to distinguish between
mouth shapes. The results are shown in FIGS. 5 and 6,
respectively.
In FIG. 5, because the present method finds the representation that
effectively distinguishes different classes, when class labels are
imposed placing an emphasis on eye position, the present invention
correctly identifies the original four distinct eye-pairs as four
basis images 55a-55d. In the present case, since the mouth area
does not provide much information for classification of eye
position, the mouth section of each basis image 55a-55d is averaged
out.
In FIG. 6, when class labels are imposed placing an emphasis on
classifying mouth shape, the present invention again identifies
four basis images 57a-57d for distinguishing between mouth shapes.
In the present case, the eyes are averaged out since they do not
provide much information for classification of mouth shape.
The prior art NMF approach is also applied to the sixteen test
images 51 of FIG. 4, but for comparison purposes, a restriction is
imposed to limit its creation of basis images to only four. The
resultant four basis images are shown in FIG. 7. Since NMF does not
support class labels, the resultant four images 59a-59d are
insufficient for classifying the four different eye positions or
the four different mouth shapes. NMF would require more basis images
in order to classify even the simplified, test face images 51 of
FIG. 4.
Unlike the present approach, NMF cannot utilize label information,
and therefore cannot focus on specific parts of images, which is
often important for classification purposes.
Consequently, NMF needs to represent all the components
sufficiently well for classification of each part. As a result, NMF
requires more basis images to achieve classification of any
specific feature.
The sixteen test face images 51 of FIG. 4 are very simplified
representations of human faces, but real world datasets are much
more complicated. In real world datasets, as the number of basis
images increases, not only does the amount of information needed to
discriminate between different classes increase, but also the
amount of noise (i.e. image data not necessary for classification)
increases, which degrades classification performance.
Because the present approach can use class data to focus on
specific features, it is much more resistant to such noise, and
obtains greater performance with fewer basis images. This ability
is particularly important in identifying specific features, such as
facial expressions.
Two examples using two industry-standard databases of actual human
faces are provided below. A first example uses the JAFFE database,
and the second example uses the CBCL database. The JAFFE database
contains 213 images of 10 Japanese female subjects. For each
subject, 3 or 4 samples for each of 7 basic facial expressions are
provided, as is illustrated in FIG. 8. The CBCL dataset consists of
two classes of images (faces and non-faces), with each image having a size
of 19.times.19 pixels. A sample of the CBCL database is illustrated
in FIGS. 9a and 9b showing a sampling of the face class images and
of the non-face class images, respectively.
For evaluation purposes when using the JAFFE database, once the
face region is cropped, each image is down-sampled to 40.times.30
pixels. Following the typical approach of previous works, 150
images from the JAFFE database are randomly selected as a training
set (i.e. training data), and the rest are utilized as a test set
(i.e. test data). The results after ten tests are presented and
compared with the accuracy results of previous works.
To test the effectiveness of the present SNMF approach, its results
are compared with those of eight other popular subspace learning
algorithms: Nonnegative Matrix Factorization (NMF), Localized NMF
(LNMF), polynomial NMF (PNMF), Principal Component Analysis (PCA),
Independent Component Analysis (ICA), Linear Discriminant Analysis
(LDA), kernel independent component analysis (KICA), and kernel
principal component analysis (KPCA).
In the feature generation and classification setup, each column of
a data matrix U is constructed by concatenating all columns of an
image. All elements of U are adjusted (i.e., normalized) to range
from 0 to 1. U is then divided into a training set $U_{training}$
and a test set $U_{test}$. The training set $U_{training}$ is
factorized into $V \times X$. The feature matrices for the training
set (i.e., $X_{training}$) and the test set (i.e., $X_{test}$) are
obtained as $X_{training} = (V^{T}V)^{-1}V^{T}U_{training}$ and
$X_{test} = (V^{T}V)^{-1}V^{T}U_{test}$, respectively.
For classification, a linear kernel SVM is used. The SVM parameter
is determined through a validation approach. The parameter $\lambda$,
which is the multiplicative factor of the distance ratio part, is
likewise determined through validation.
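A compact sketch of this feature generation and classification pipeline follows; scikit-learn's LinearSVC stands in for the unspecified SVM implementation, and V and the data here are random stand-ins (with a JAFFE-style 150/63 split) rather than learned or real quantities:

    import numpy as np
    from sklearn.svm import LinearSVC

    def extract_features(V, U_part):
        # Least-squares projection onto the basis: X = (V^T V)^{-1} V^T U.
        return np.linalg.solve(V.T @ V, V.T @ U_part)

    # Illustrative stand-in data: M-pixel images as columns, scaled to [0, 1].
    rng = np.random.default_rng(0)
    M, R, N_train, N_test = 1200, 20, 150, 63
    V = rng.random((M, R))              # basis, assumed learned by the updates above
    U_training, U_test = rng.random((M, N_train)), rng.random((M, N_test))
    y_training, y_test = rng.integers(0, 7, N_train), rng.integers(0, 7, N_test)

    X_training = extract_features(V, U_training)
    X_test = extract_features(V, U_test)

    # Linear-kernel SVM; the C parameter is chosen by validation, as in the text.
    clf = LinearSVC(C=1.0).fit(X_training.T, y_training)  # samples are columns
    accuracy = clf.score(X_test.T, y_test)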
The above-described SNMF methods, which, as is illustrated below,
are well suited for data classification, may be implemented in
various types of data processing hardware.
With reference to FIG. 10, a general example of such data
processing hardware includes a data processing device 11. As is
known in the art, data processing device 11 may be a
micro-computer, a central processing unit (CPU), a specialized
image processor, a programmable logic device (PLD), a complex
programmable logic device (CPLD), an application specific
integrated circuit (ASIC), or other computing device. In general,
data processing device 11 may include an arithmetic logic unit
(ALU) or CPU 13, control logic 15, various timing clocks 17,
various types of registers 19 (including data registers, shift
registers, workspace registers, status registers, address
registers, interrupt registers, instruction registers, program
counters, etc.), and a memory unit 21 (including RAM and/or
ROM).
In the present example of FIG. 10, raw data matrix U of n samples,
which may consist of training data when used for data
classification or categorization, may be maintained in a data store
23. Data processing device 11 may directly access data store 23 via
a direct link 32 and appropriate input/output interface 27, or may
alternatively access data store 23 via communication links 31/33
and network 29, which may be a LAN, WLAN, or the Internet.
Similarly, test data 37, which is the data that is to be
classified, may be accessible via a direct link 34 or through
communication network 29 and communication links 31/35. It is to be
understood that test data 37 may be an archive of data (such as a
store of face images) or may be generated in real time (such as
face images created by surveillance cameras). It is further to be
understood that communication links 31-35 may be wired or wireless
communication links.
The results of this first approach are summarized in FIGS. 11 and
12. As is illustrated in FIG. 11, the residual error and the
objective function error decrease gradually with increasing
iterations. Table 1 in FIG. 12 compares the performance on the
JAFFE dataset of the present approach versus those of other prior
art methods, in terms of the maximum accuracy and the number of
required basis images. As shown, the present approach outperforms
the other prior art methods. Although the LDA approach required
fewer basis images, it may be noted that LDA is not robust to
variations, and is particularly poor at dealing with occlusion
regions (i.e. regions of an image that are obscured either by image
corruption or human obstruction, such as an intentional covering of
a part of the face).
For illustration purposes, FIG. 13a provides a sampling of basis
images created by NMF and FIG. 13b provides a sampling of basis
images created by the present invention. As compared with the NMF
basis images, the basis images of the present invention are sparser
and more focused on the regions of a human face better suited
for distinguishing facial expression.
The results of the present invention upon the CBCL database are
summarized in FIG. 14. A graph 71 plots the classification
performance of the present invention 61, and compares it with the
classification performance of the NMF 63, PCA 65, and ICA 67
methods. As shown, the present invention 61 outperforms the prior
art methods.
While the invention has been described in conjunction with several
specific embodiments, it is evident to those skilled in the art
that many further alternatives, modifications and variations will
be apparent in light of the foregoing description. Thus, the
invention described herein is intended to embrace all such
alternatives, modifications, applications and variations as may
fall within the spirit and scope of the appended claims.
* * * * *