U.S. patent application number 10/512194 was filed with the patent office on 2004-10-22 and published on 2005-09-15 for a pattern characteristic extraction method and device for the same. This patent application is currently assigned to NEC CORPORATION. Invention is credited to Toshio Kamei.
United States Patent Application 20050201595
Kind Code: A1
Kamei, Toshio
September 15, 2005
Pattern characteristic extraction method and device for the same
Abstract
An input pattern feature is decomposed into a plurality of feature vectors. For each feature vector, a discriminant matrix obtained by discriminant analysis is prepared in advance. Each feature vector is projected into the discriminant space defined by its discriminant matrix, and its dimensionality is compressed. The resulting projection vectors are combined and projected again by a further discriminant matrix to calculate the output feature vector, thereby suppressing the loss of feature components effective for discrimination and performing efficient feature extraction.
Inventors: Kamei, Toshio (Tokyo, JP)
Correspondence Address: YOUNG & THOMPSON, 745 SOUTH 23RD STREET, 2ND FLOOR, ARLINGTON, VA 22202, US
Assignee: NEC CORPORATION, 7-1, Shiba 5-chome, Minato-ku, Tokyo, JP
Family ID: 30118927
Appl. No.: 10/512194
Filed: October 22, 2004
PCT Filed: July 4, 2003
PCT No.: PCT/JP03/08556
Current U.S. Class: 382/118; 382/190
Current CPC Class: G06K 9/00275 (20130101); G06K 9/00281 (20130101); G06K 9/6234 (20130101)
Class at Publication: 382/118; 382/190
International Class: G06K 009/00; G06K 009/66
Foreign Application Data

Date         | Code | Application Number
Jul 16, 2002 | JP   | 2002-207022
Oct 15, 2002 | JP   | 2002-300594
Jan 13, 2003 | JP   | 2003-68916
Claims
1-22. (canceled)
23. A pattern feature extraction method comprising the steps of
extracting a plurality of input vectors from an input pattern,
projecting the input vectors to obtain projection vectors by using
basis matrices respectively corresponding to the input vectors, and
projecting, using a discriminant matrix corresponding to a joint
vector, the joint vector obtained by combining a plurality of
projection vectors, thereby extracting a feature of the input
pattern.
24. A pattern feature extraction method comprising the steps of
extracting a plurality of input vectors from an input pattern,
projecting the input vectors to obtain projection vectors by using
basis matrices respectively corresponding to the input vectors,
normalizing the projection vectors to obtain normalized vectors,
and projecting, using a discriminant matrix corresponding to a
joint vector, the joint vector obtained by combining a plurality of
normalized vectors, thereby extracting a feature of the input
pattern.
25. A pattern feature extraction method including the steps of
extracting a plurality of input vectors from an input pattern and
projecting the input vectors to obtain projection vectors, thereby
extracting a feature of the input pattern, characterized in that in
the step of projecting the input vectors to obtain projection
vectors, the input vectors are projected using a transformation
matrix specified by basis matrices respectively corresponding to
the input vectors and by a discriminant matrix corresponding to a
joint vector obtained by combining projection vectors respectively
obtained by projecting the input vectors using the basis
matrices.
26. A pattern feature extraction method according to claim 23,
characterized in that the basis matrices corresponding to the input
vectors serve as discriminant matrices for the input vectors.
27. A pattern feature extraction method according to claim 24,
characterized in that the basis matrices corresponding to the input
vectors serve as discriminant matrices for the input vectors.
28. A pattern feature extraction method according to claim 25,
characterized in that the basis matrices corresponding to the input
vectors serve as discriminant matrices for the input vectors.
29. A pattern feature extraction method according to claim 23,
characterized in that the basis matrices corresponding to the input
vectors are basis matrices specified by transformation matrices for
extracting principal component vectors of the input vectors and by
discriminant matrices for the principal component vectors.
30. A pattern feature extraction method according to claim 24,
characterized in that the basis matrices corresponding to the input
vectors are basis matrices specified by transformation matrices for
extracting principal component vectors of the input vectors and by
discriminant matrices for the principal component vectors.
31. A pattern feature extraction method according to claim 25,
characterized in that the basis matrices corresponding to the input
vectors are basis matrices specified by transformation matrices for
extracting principal component vectors of the input vectors and by
discriminant matrices for the principal component vectors.
32. A pattern feature extraction method according to claim 23,
characterized in that the step of extracting input vectors
comprises the step of extracting vectors whose elements are pixel
values obtained from sample points in each sample point set for
each of a plurality of predetermined sample point sets in an image
serving as an input pattern.
33. A pattern feature extraction method according to claim 24,
characterized in that the step of extracting input vectors
comprises the step of extracting vectors whose elements are pixel
values obtained from sample points in each sample point set for
each of a plurality of predetermined sample point sets in an image
serving as an input pattern.
34. A pattern feature extraction method according to claim 25,
characterized in that the step of extracting input vectors
comprises the step of extracting vectors whose elements are pixel
values obtained from sample points in each sample point set for
each of a plurality of predetermined sample point sets in an image
serving as an input pattern.
35. A pattern feature extraction method according to claim 32,
characterized in that the sample point set comprises a set having
as sample points pixels in partial images obtained from a local
region of the image, thereby extracting a feature of the image.
36. A pattern feature extraction method according to claim 32,
characterized in that the sample point set comprises a set having
as sample points pixels in each reduced image obtained from the
image, thereby extracting a feature of the image.
37. A pattern feature extraction method according to claim 23,
characterized in that the step of extracting input vectors
comprises the step of extracting as input vectors feature amounts
calculated from each local region for each of a plurality of local
regions of the image serving as the input pattern.
38. A pattern feature extraction method according to claim 24,
characterized in that the step of extracting input vectors
comprises the step of extracting as input vectors feature amounts
calculated from each local region for each of a plurality of local
regions of the image serving as the input pattern.
39. A pattern feature extraction method according to claim 25,
characterized in that the step of extracting input vectors
comprises the step of extracting as input vectors feature amounts
calculated from each local region for each of a plurality of local
regions of the image serving as the input pattern.
40. A pattern feature extraction method according to claim 23,
characterized in that the step of extracting input vectors
comprises the steps of Fourier-transforming the image serving as
the input pattern, extracting Fourier spectrum vectors as the input
vectors from a Fourier spectrum of the image, and extracting
Fourier amplitude vectors as the input vectors from a Fourier
amplitude spectrum of the image, thereby extracting a feature of
the image.
41. A pattern feature extraction method according to claim 24,
characterized in that the step of extracting input vectors
comprises the steps of Fourier-transforming the image serving as
the input pattern, extracting Fourier spectrum vectors as the input
vectors from a Fourier spectrum of the image, and extracting
Fourier amplitude vectors as the input vectors from a Fourier
amplitude spectrum of the image, thereby extracting a feature of
the image.
42. A pattern feature extraction method according to claim 25,
characterized in that the step of extracting input vectors
comprises the steps of Fourier-transforming the image serving as
the input pattern, extracting Fourier spectrum vectors as the input
vectors from a Fourier spectrum of the image, and extracting
Fourier amplitude vectors as the input vectors from a Fourier
amplitude spectrum of the image, thereby extracting a feature of
the image.
43. A pattern feature extraction method according to claim 40,
characterized in that a plurality of partial images or reduced
images are extracted from the image, and Fourier spectrum vectors
or Fourier amplitude vectors of the partial images or reduced
images are extracted to extract a feature of the image.
44. A pattern feature extraction apparatus comprising vector
extraction means for extracting a plurality of input vectors from
an input pattern, basis matrix storage means for storing basis
matrices respectively corresponding to the input vectors, linear
transformation means for projecting the input vectors to obtain
projection vectors using the basis matrices stored in said basis
matrix storage means, discriminant matrix storage means for storing
a discriminant matrix corresponding to a joint vector obtained by
combining a plurality of projection vectors obtained by said linear
transformation means, and second linear transformation means for
projecting, using the discriminant matrix stored in said
discriminant matrix storage means, the joint vector obtained by
combining the plurality of projection vectors, thereby extracting a
feature of the input pattern.
45. A pattern feature extraction apparatus comprising vector
extraction means for extracting a plurality of input vectors from
an input pattern, basis matrix storage means for storing basis
matrices respectively corresponding to the input vectors, linear
transformation means for projecting the input vectors to obtain
projection vectors using the basis matrices stored in said basis
matrix storage means, normalization means for normalizing the
projection vectors to obtain normalized vectors, discriminant
matrix storage means for storing a discriminant matrix
corresponding to a joint vector obtained by combining a plurality
of normalized vectors obtained by said normalization means, and
second linear transformation means for projecting, using the
discriminant matrix stored in said discriminant matrix storage
means, the joint vector obtained by combining the plurality of
normalized vectors, thereby extracting a feature of the input
pattern.
46. A pattern feature extraction apparatus comprising vector
extraction means for extracting a plurality of input vectors from
an input pattern, basis matrix storage means for storing basis
matrices respectively corresponding to the input vectors, and
linear transformation means for projecting the input vectors using the transformation matrices stored in said basis matrix storage means, thereby extracting a feature of the input pattern, characterized in that the transformation matrices stored in said basis matrix storage means comprise transformation
matrices specified by basis matrices respectively corresponding to
the input vectors and the discriminant matrix corresponding to the
joint vector obtained by combining the plurality of projection
vectors obtained by projecting the input vectors using the basis
matrices.
47. A computer-readable storage medium which stores a program for
allowing a computer to execute pattern feature extraction for
extracting a feature of an input pattern, characterized in that the
program comprises a program which executes a function of extracting
a plurality of input vectors from an input pattern, a function of
projecting the input vectors to obtain projection vectors using
basis matrices respectively corresponding to the input vectors, and
a function of projecting, using a discriminant matrix corresponding
to a joint vector, the joint vector obtained by combining the
projection vectors.
48. A computer-readable storage medium which stores a program for
allowing a computer to execute pattern feature extraction for
extracting a feature of an input pattern, characterized in that the
program comprises a program which executes a function of extracting
a plurality of input vectors from an input pattern, a function of
projecting the input vectors to obtain projection vectors using
basis matrices respectively corresponding to the input vectors, a
function of normalizing the projection vectors to obtain normalized
vectors, and a function of projecting, using a discriminant matrix
corresponding to a joint vector, the joint vector obtained by
combining the normalized vectors.
49. A computer-readable storage medium which stores a program for
allowing a computer to execute pattern feature extraction for
extracting a feature of an input pattern by executing a function of
extracting a plurality of input vectors from the input pattern and
a function of projecting the input vectors, characterized in that
the function of projecting the input vectors comprises a function
of projecting the input vectors by using a transformation matrix
specified by basis matrices respectively corresponding to the input
vectors and by a discriminant matrix corresponding to a joint
vector obtained by combining the plurality of projection vectors
obtained by projecting the input vectors using the basis
matrices.
50. A pattern feature extraction method characterized by comprising
the steps of segmenting an input image using different segmentation
numbers to obtain a plurality of block images and the step of
extracting Fourier amplitudes of the block images, thereby
extracting a feature amount of the input image.
51. A pattern feature extraction method according to claim 50,
characterized by comprising the steps of scanning the Fourier
amplitudes to extract multiblock Fourier amplitude vectors, and
projecting the multiblock Fourier amplitude vectors using basis
matrices to obtain projection vectors.
52. A pattern feature extraction method according to claim 51,
characterized by further comprising the step of normalizing the
projection vectors to obtain normalized vectors.
53. A pattern feature extraction method according to claim 51,
characterized in that the basis matrices comprise basis matrices
specified by transformation matrices for extracting principal
component vectors of the multiblock Fourier amplitude vectors and
by discriminant matrices corresponding to the principal component
vectors.
54. A pattern feature extraction method according to claim 50,
characterized in that in the step of obtaining the plurality of
block images, at least one entire image having the entire input
image as one block image, four block images obtained by segmenting
the entire input image into four blocks, and 16 block images
obtained by segmenting the input image into 16 blocks are
obtained.
55. A pattern feature extraction method characterized by comprising
the steps of obtaining a Fourier spectrum vector by calculating a
Fourier spectrum for an input normalized image by using a
predetermined calculation expression, extracting a multiblock
Fourier amplitude vector from a Fourier amplitude of a partial
image of the normalized image, performing feature vector projection
of the Fourier spectrum vector and the multiblock Fourier amplitude vector
by using a basis matrix, thereby obtaining respective normalized
vectors, combining the normalized vectors to obtain a coupled
Fourier vector and using a second basis matrix to transform the
coupled Fourier vector into a projection vector, and extracting a Fourier
feature by quantizing the projection vector.
Description
BACKGROUND OF THE INVENTION
[0001] Conventionally, in the field of pattern recognition, the
similarity between patterns such as characters or human faces has
been determined by extracting feature vectors from input patterns,
extracting feature vectors effective for identification from the
feature vectors, and comparing the feature vectors obtained from
the respective patterns.
[0002] In the case of face verification, for example, the pixel values of a facial image normalized with respect to the eye positions or the like are raster-scanned into a one-dimensional feature vector, and principal component analysis is performed using this feature vector as the input feature vector (non-patent reference 1: B. Moghaddam and A. Pentland, "Probabilistic Visual Learning for Object Representation", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 7, pp. 696-710, 1997), or linear discriminant analysis is performed on the principal components of the feature vector (non-patent reference 2: W. Zhao et al., "Discriminant Analysis of Principal Components for Face Recognition", Proceedings of the IEEE Third International Conference on Automatic Face and Gesture Recognition, pp. 336-341, 1998). The dimensionality is thereby reduced, and personal identification or the like based on faces is performed using the obtained feature vectors.
[0003] In these methods, covariance matrices, within-class covariance matrices, and between-class covariance matrices are calculated from prepared learning samples, and basis vectors are obtained as solutions to eigenvalue problems for these covariance matrices. Input feature vectors are then transformed by using these basis vectors.
[0004] Linear discriminant analysis will be described in more
detail below.
[0005] Linear discriminant analysis is a method of obtaining a
transformation matrix W which maximizes the ratio of a
between-class covariance matrix S.sub.B to a within-class
covariance matrix S.sub.W of an M-dimensional vector y (=W.sup.Tx)
obtained when an N-dimensional feature vector x is transformed by
the transformation matrix W. As the evaluation criterion, equation (1) is defined:

$$J(W) = \frac{|S_B|}{|S_W|} = \frac{|W^T \Sigma_B W|}{|W^T \Sigma_W W|} \qquad (1)$$
[0006] In this equation, the within-class covariance matrix .SIGMA..sub.W and the between-class covariance matrix .SIGMA..sub.B are, respectively, the covariance within each of the C classes .omega..sub.i (i=1, 2, . . . , C; with n.sub.i samples each) of the set of feature vectors x in a learning sample, and the covariance between the classes:

$$\Sigma_W = \sum_{i=1}^{C} P(\omega_i)\,\Sigma_i = \sum_{i=1}^{C} \left( P(\omega_i)\,\frac{1}{n_i} \sum_{x \in \chi_i} (x - m_i)(x - m_i)^T \right) \qquad (2)$$

$$\Sigma_B = \sum_{i=1}^{C} P(\omega_i)\,(m_i - m)(m_i - m)^T \qquad (3)$$
[0007] where m.sub.i is the mean vector of class .omega..sub.i (equation (4)) and m is the overall mean vector of x (equation (5)):

$$m_i = \frac{1}{n_i} \sum_{x \in \chi_i} x \qquad (4)$$

$$m = \sum_{i=1}^{C} P(\omega_i)\, m_i \qquad (5)$$
[0008] If the a priori probability P(.omega..sub.i) of each class .omega..sub.i is to reflect the sample count n.sub.i in advance, it suffices to assume P(.omega..sub.i)=n.sub.i/n. If the probabilities can be assumed to be equal, it suffices to set P(.omega..sub.i)=1/C.
[0009] The transformation matrix W which maximizes equation (1) can be obtained as the set of generalized eigenvectors corresponding to the M largest eigenvalues of equation (6), the generalized eigenvalue problem for the column vectors w.sub.i. The transformation matrix W obtained in this manner will be referred to as a discriminant matrix.

$$\Sigma_B w_i = \lambda_i \Sigma_W w_i \qquad (6)$$
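For concreteness, the learning step just described can be sketched in NumPy/SciPy as follows. This is a minimal illustration, not the patent's implementation; the function name, the maximum-likelihood covariance estimates, and the choice of priors P(.omega..sub.i)=n.sub.i/n are our own:

```python
import numpy as np
from scipy.linalg import eigh


def discriminant_matrix(X, labels, n_axes):
    """Estimate the discriminant matrix W of equations (1)-(6).

    X      : (n_samples, N) matrix of learning feature vectors x
    labels : (n_samples,) class indices
    n_axes : number M of discriminant axes to keep
    """
    n, N = X.shape
    classes = np.unique(labels)
    Sw = np.zeros((N, N))
    Sb = np.zeros((N, N))
    means, priors = {}, {}
    for c in classes:
        Xc = X[labels == c]
        priors[c] = len(Xc) / n                    # P(w_i) = n_i / n
        means[c] = Xc.mean(axis=0)                 # m_i, equation (4)
        Sw += priors[c] * np.cov(Xc.T, bias=True)  # equation (2)
    m = sum(priors[c] * means[c] for c in classes)  # m, equation (5)
    for c in classes:
        d = (means[c] - m)[:, None]
        Sb += priors[c] * (d @ d.T)                # equation (3)
    # Generalized eigenvalue problem (6): Sb w_i = lambda_i Sw w_i.
    # Note: Sw must be non-singular, which fails for small learning
    # sets in high dimensions (see paragraph [0011] below).
    vals, vecs = eigh(Sb, Sw)
    order = np.argsort(vals)[::-1][:n_axes]        # M largest eigenvalues
    return vecs[:, order]                          # N x M discriminant matrix W
```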
[0010] Note that a conventional linear discriminant analysis method is described in, for example, non-patent reference 5: Richard O. Duda et al., "Pattern Classification" (Japanese translation supervised by Morio Onoue, Shingijutsu Communications, 2001, pp. 113-122).
[0011] Assume that the number of dimensions of the input feature vector x is especially large. In this case, if only a small amount of learning data is used, .SIGMA..sub.W becomes singular, and the eigenvalue problem of equation (6) cannot be solved by general methods.
[0012] As described in patent reference 1: Japanese Patent
Laid-Open No. 7-296169, it is known that a high-order component
with a small eigenvalue in a covariance matrix includes a large
parameter estimation error, which adversely affects recognition
precision.
[0013] According to the above article by W. Zhao et al., the
principal component analysis is performed on input feature vectors,
and discriminant analysis is applied to principal components with
large eigenvalues. More specifically, as shown in FIG. 2, after
principal components are extracted by projecting an input feature
vector by using a basis matrix obtained by the principal component
analysis, a feature vector effective for identification is
extracted by projecting principal components by using the
discriminant matrix obtained by discriminant analysis as a basis
matrix.
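A minimal sketch of this two-stage scheme of FIG. 2 (PCA followed by discriminant analysis) is given below; it reuses the discriminant_matrix helper sketched earlier, and the function and variable names are our own, not those of the reference:

```python
import numpy as np


def pca_then_lda(X, labels, n_pc, n_ld):
    """Project onto n_pc principal components, then apply discriminant
    analysis in the reduced space (the scheme of W. Zhao et al.)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Principal axes = right singular vectors of the centered data matrix
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_pc].T                           # (N, n_pc) PCA basis matrix
    Y = Xc @ P                                # principal components
    W = discriminant_matrix(Y, labels, n_ld)  # discriminant matrix, eq. (6)
    return P @ W                              # combined (N, n_ld) basis
```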
[0014] According to the computation scheme for feature transformation matrices described in patent reference 1: Japanese Patent Laid-Open No. 7-296169, the number of dimensions is reduced by deleting the high-order eigenvalues of the total covariance matrix .SIGMA..sub.T and the corresponding eigenvectors, and discriminant analysis is applied to the reduced feature space. Deleting the high-order eigenvalues of the total covariance matrix and the corresponding eigenvectors is equivalent to performing discriminant analysis in a space of only the principal components with large eigenvalues obtained by principal component analysis. In this sense, this technique, like the method of W. Zhao et al., provides stable parameter estimation by removing high-order features.
[0015] The principal component analysis using the total covariance matrix .SIGMA..sub.T, however, merely selects orthogonal axes of the feature space sequentially, in the directions in which the variance is largest. For this reason, feature axes effective for pattern identification can be lost.
[0016] Assume that the feature vector x is comprised of three
elements (x=(x.sub.1, x.sub.2, x.sub.3).sup.T), x.sub.1 and x.sub.2
are features which have large variances but are irrelevant to
pattern identification, and x.sub.3 is effective for pattern
identification but has a small variance (between-class
variance/within-class variance, i.e., Fisher's ratio, is large, but
the variance value itself is sufficiently smaller than those of
x.sub.1 and x.sub.2). In this case, if principal component analysis is performed and only two dimensions are retained, the feature subspace associated with x.sub.1 and x.sub.2 is selected, and the contribution of x.sub.3, which is effective for identification, is discarded.
[0017] This phenomenon will be described with reference to the accompanying drawings. FIG. 3A shows the distribution of data viewed from a direction almost perpendicular to the plane defined by x.sub.1 and x.sub.2, with the black circles and white circles representing data points in different classes. When viewed in the space defined by x.sub.1 and x.sub.2 (the plane of FIG. 3A), the black and white circles cannot be distinguished. When viewed along the feature axis x.sub.3 perpendicular to this plane, however, as shown in FIG. 3B, the black and white circles can be separated from each other. If an axis with a large variance is selected, the plane defined by x.sub.1 and x.sub.2 is chosen as the feature space, which is equivalent to performing discrimination by looking at FIG. 3A. This makes discrimination difficult.
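This effect is easy to reproduce numerically. The following toy example (all distribution parameters chosen arbitrarily for illustration) builds such a data set and shows that the two retained principal axes assign almost no weight to x.sub.3:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
labels = rng.integers(0, 2, n)              # two classes
x12 = rng.normal(0.0, 10.0, (n, 2))         # large variance, class-independent
x3 = rng.normal(labels.astype(float), 0.1)  # small variance, large Fisher ratio
X = np.column_stack([x12, x3])

vals, vecs = np.linalg.eigh(np.cov(X.T, bias=True))
pca2 = vecs[:, np.argsort(vals)[::-1][:2]]  # two largest-variance axes
print(np.abs(pca2[2]))                      # weight of x3: close to (0, 0)
```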
[0018] This phenomenon cannot be avoided in the prior art, either by principal component analysis or by the technique of deleting subspaces with small eigenvalues of the (total) covariance matrix.
DISCLOSURE OF INVENTION
[0019] The present invention has been made in consideration of the
above problems in the prior art, and has as its object to provide a
feature vector transformation technique for suppressing a reduction
in feature amount effective for discrimination and performing
efficient feature extraction when a feature vector effective for
discrimination is to be extracted from an input pattern feature
vector and feature dimensions are to be compressed.
[0020] A pattern feature extraction method according to the present invention is characterized by comprising the step of expressing one of a pattern feature and a feature from an image by using a plurality of feature vectors x.sub.i, the step of obtaining a discriminant matrix W.sub.i of each feature vector by linear discriminant analysis with respect to each of the plurality of feature vectors x.sub.i, the step of obtaining in advance a discriminant matrix W.sub.T by linear discriminant analysis with respect to a feature vector y obtained by arraying vectors y.sub.i obtained by linearly transforming the vectors x.sub.i using the discriminant matrices W.sub.i, and the step of performing the linear transformation specified by the discriminant matrices W.sub.i and the discriminant matrix W.sub.T.
[0021] This pattern feature extraction method is characterized in
that the step of performing linear transformation comprises the
step of compressing the number of feature dimensions by
transforming a feature vector of a pattern.
[0022] In addition, the method is characterized in that the step of
expressing comprises the step of dividing a pattern feature into a
plurality of feature vectors x.sub.i, the step of obtaining the
discriminant matrix W.sub.T comprises the step of calculating a
feature vector y.sub.i by performing linear transformation
y.sub.i=W.sub.i.sup.Tx.sub.i with respect to the feature vector
x.sub.i using the discriminant matrix W.sub.i, and the step of
performing linear transformation comprises the step of compressing
the number of dimensions of a pattern feature by calculating a feature vector z by the linear transformation z=W.sub.T.sup.Ty with respect to a vector y obtained by combining the calculated feature vectors y.sub.i, using the discriminant matrix W.sub.T.
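A minimal sketch of this two-stage transformation, with our own function and variable names, is:

```python
import numpy as np


def extract_feature(x_list, W_list, W_T):
    """Two-stage discriminant projection: y_i = W_i^T x_i, then z = W_T^T y."""
    y_list = [W.T @ x for W, x in zip(W_list, x_list)]  # per-vector projections
    y = np.concatenate(y_list)                          # joint vector y
    return W_T.T @ y                                    # compressed feature z
```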
[0023] In addition, the method is characterized by further
comprising the step of calculating in advance a matrix W specified
by the discriminant matrices W.sub.i and W.sub.T, wherein the step
of performing linear transformation comprises the step of compressing the number of dimensions of a pattern feature by calculating a feature vector z by the linear transformation z=W.sup.Tx with respect to a feature vector x obtained by combining the input feature vectors x.sub.i, using the matrix W.
[0024] The above pattern feature extraction method is characterized
in that the step of expressing comprises the step of extracting a
feature vector x.sub.i formed from pixel values obtained from a
plurality of sample points with respect to a plurality of sample
point sets S.sub.i preset in an image, and the step of performing
linear transformation comprises the step of extracting a feature amount from the image by transforming the feature vector for each sample point set.
[0025] This pattern feature extraction method is characterized in
that the step of obtaining in advance the discriminant matrix
W.sub.T comprises the step of calculating a feature vector y.sub.i
by performing linear transformation y.sub.i=W.sub.i.sup.Tx.sub.i
with respect to a plurality of feature vectors x.sub.i formed from
a plurality of sample points by using the discriminant matrix
W.sub.i, and the step of performing linear transformation comprises
the step of extracting a feature amount from an image by
calculating a feature vector z by calculating linear transformation
z=W.sub.T.sup.Ty with respect to a vector y obtained by combining
calculated feature vectors y.sub.i by using the discriminant matrix
W.sub.T.
[0026] The method is characterized by further comprising the step
of calculating in advance a matrix W specified by the discriminant
matrices W.sub.i and W.sub.T, wherein the step of performing the
linear transformation comprises the step of extracting a feature amount from an image by calculating a feature vector z by the linear transformation z=W.sup.Tx of a vector x obtained by combining the feature vectors x.sub.i, using the matrix W.
[0027] The above pattern feature extraction method is characterized
in that the step of expressing comprises the step of segmenting an
image into a plurality of preset local regions, and expressing a
feature amount as a feature vector x.sub.i extracted for each of
the plurality of local regions, and the step of performing linear
transformation comprises the step of extracting a feature amount
from an image by transforming a feature vector of a local
region.
[0028] This pattern feature extraction method is characterized in
that the step of obtaining in advance the discriminant matrix
W.sub.T comprises the step of calculating a feature vector y.sub.i
by performing linear transformation y.sub.i=W.sub.i.sup.Tx.sub.i
with respect to a feature vector x.sub.i by using the discriminant
matrix W.sub.i, and the step of performing linear transformation
comprises the step of extracting a feature amount from an image by
calculating a feature vector z by calculating linear transformation
z=W.sub.T.sup.Ty with respect to a vector y obtained by combining the calculated feature vectors y.sub.i, using the discriminant matrix W.sub.T.
[0029] The method is characterized by further comprising the step
of calculating in advance a matrix W specified by the discriminant
matrices W.sub.i and W.sub.T, wherein the step of performing linear
transformation comprises the step of extracting a feature amount from an image by calculating a feature vector z by the linear transformation z=W.sup.Tx with respect to a feature vector x obtained by combining the input feature vectors x.sub.i, using the matrix W.
[0030] The above pattern feature extraction method is characterized
by further comprising the step of performing a two-dimensional
Fourier transform for an image, wherein the step of expressing
comprises the step of extracting a real component and an imaginary component of the two-dimensional Fourier transform as a feature vector x.sub.1, and the step of calculating a power spectrum of the two-dimensional Fourier transform and extracting the power spectrum as a feature vector x.sub.2, and in the step of performing
linear transformation, a feature amount is extracted from an image
by transforming a feature vector.
[0031] This pattern feature extraction method is characterized in
that in the step of performing linear transformation, a feature
amount is extracted from an image by transforming a feature vector
x.sub.1 corresponding to a real component and an imaginary
component of a Fourier component and a feature vector x.sub.2
corresponding to a power spectrum of the Fourier component by
linear transformation specified by a discriminant matrix W.sub.i
and a discriminant matrix W.sub.T corresponding to principal
components of a feature vector x.sub.i in such a manner that
dimension reduction is realized.
[0032] This pattern feature extraction method is characterized by further comprising the step of calculating a discriminant feature of the principal components of a feature vector x.sub.1, formed from the real and imaginary components of the Fourier transform, by the linear transformation y.sub.1=.PHI..sub.1.sup.Tx.sub.1 using a transformation matrix .PSI..sub.1 for transforming the feature vector x.sub.1 to principal components and a basis matrix .PHI..sub.1 (=(W.sub.1.sup.T.PSI..sub.1.sup.T).sup.T) represented by a discriminant matrix W.sub.1 corresponding to the principal components, the step of normalizing the size of the obtained feature vector y.sub.1 to a predetermined size, the step of calculating a discriminant feature of the principal components of a feature vector x.sub.2, formed from the power spectrum of the Fourier transform, by using a transformation matrix .PSI..sub.2 for transforming the feature vector x.sub.2 to principal components and a basis matrix .PHI..sub.2 (=(W.sub.2.sup.T.PSI..sub.2.sup.T).sup.T) represented by a discriminant matrix W.sub.2 corresponding to the principal components, the step of normalizing the size of the obtained feature vector y.sub.2 to a predetermined size, and the step of extracting a feature amount from the image by calculating a feature vector z by the linear transformation z=W.sub.T.sup.Ty using a discriminant matrix W.sub.T with respect to a feature vector y obtained by combining the two feature vectors y.sub.1 and y.sub.2.
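A rough sketch of this pipeline for a single image follows. The unit-norm normalization is an assumption (the text only requires normalization to a predetermined size), and Phi1, Phi2, and W_T stand for the learned matrices .PHI..sub.1, .PHI..sub.2, and W.sub.T:

```python
import numpy as np


def fourier_discriminant_feature(img, Phi1, Phi2, W_T):
    """Project Fourier real/imaginary parts (x1) and the power spectrum (x2)
    through learned bases, normalize, combine, and project again."""
    F = np.fft.fft2(img)
    x1 = np.concatenate([F.real.ravel(), F.imag.ravel()])  # feature vector x1
    x2 = (np.abs(F) ** 2).ravel()                          # power spectrum x2
    y1 = Phi1.T @ x1                                       # y1 = Phi1^T x1
    y1 = y1 / np.linalg.norm(y1)                           # normalize size
    y2 = Phi2.T @ x2                                       # y2 = Phi2^T x2
    y2 = y2 / np.linalg.norm(y2)
    y = np.concatenate([y1, y2])                           # joint vector y
    return W_T.T @ y                                       # z = W_T^T y
```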
[0033] This pattern feature extraction method is characterized in
that the step of expressing further comprises the step of
segmenting an image into a plurality of regions, and in the step of
extracting the feature vector x.sub.2, a two-dimensional Fourier
power spectrum is calculated in each of the segmented regions.
[0034] In addition, the method is characterized in that in the step of segmenting, the image is segmented into regions having different sizes in a multi-resolution manner.
[0035] In addition, the method is characterized by further
comprising the step of reducing feature dimensions by performing
feature extraction by kernel discriminant analysis on an obtained
two-dimensional Fourier power spectrum and extracting an effective
feature amount.
[0036] The method is characterized by further comprising the step
of reducing feature dimensions by performing linear transformation
using a discriminant matrix obtained in advance by linear
discriminant analysis with respect to an obtained two-dimensional
Fourier power spectrum.
[0037] The method is characterized in that the step of obtaining in
advance the discriminant matrix W.sub.i comprises the step of
obtaining the discriminant matrix W.sub.i of feature vectors
obtained by linear discriminant analysis on principal components of
a feature vector x.sub.i (i=1, 2), and in the step of performing
linear transformation, a feature amount is extracted from an image
by transforming a feature vector x.sub.1 corresponding to a real
component and an imaginary component of a Fourier component and a
feature vector x.sub.2 corresponding to a power spectrum of the
Fourier component by linear transformation specified so as to
reduce dimensions by a discriminant matrix W.sub.i and a
discriminant matrix W.sub.T corresponding to principal components
of the feature vector x.sub.i.
[0038] This pattern feature extraction method is characterized in
that the step of expressing further comprises the step of
calculating a power spectrum of a two-dimensional Fourier
transform, the step of segmenting an image into a plurality of
regions and calculating a power spectrum of a two-dimensional
Fourier transform for each of the regions, and the step of
extracting a vector obtained by combining the respective power spectra
as a feature vector x.sub.2.
[0039] A pattern feature extraction device according to the present
invention is a pattern feature extraction device for compressing
feature dimensions of a pattern feature by using linear
transformation, characterized by comprising basis matrix storage
means for storing a basis matrix specified by a discriminant matrix
W.sub.i of feature vectors obtained by linear discriminant analysis
on a plurality of feature vectors x.sub.i representing a pattern
feature and a discriminant matrix W.sub.T obtained in advance by
linear discriminant analysis on a feature vector y obtained by
combining vectors y.sub.i obtained by performing linear transformation of the vectors x.sub.i using the discriminant matrices W.sub.i, and linear transformation means for compressing feature dimensions by transforming a feature vector of a pattern by using the basis matrix stored in the basis matrix storage means.
[0040] A computer-readable storage medium according to the present
invention is a computer-readable storage medium which records a
program for causing a computer to execute pattern feature
extraction to compress feature dimensions of a pattern feature by
using linear transformation, the program being characterized by
including a program for executing a function of expressing a
pattern feature by a plurality of feature vectors x.sub.i,
obtaining in advance a discriminant matrix W.sub.i of feature
vectors obtained by performing linear discriminant analysis on each
of the feature vectors x.sub.i, and obtaining in advance a
discriminant matrix W.sub.T by linear discriminant analysis on a
feature vector y obtained by combining vectors y.sub.i obtained by
linear transformation of the vectors x.sub.i, and a function of
compressing feature dimensions by transforming a feature vector of
a pattern by linear transformation specified by the discriminant
matrix W.sub.i and the discriminant matrix W.sub.T.
[0041] An image feature extraction method according to the present invention is characterized by comprising the step of obtaining a Fourier spectrum vector by calculating a Fourier spectrum of an input normalized image by using a predetermined mathematical expression, the step of extracting a multiblock Fourier amplitude vector from a Fourier amplitude of a partial image of the normalized image, the step of obtaining normalized vectors of the Fourier spectrum vector and the multiblock Fourier amplitude vector by performing projection of feature vectors with respect to the Fourier spectrum vector and the multiblock Fourier amplitude vector by using a basis matrix, the step of combining the normalized vectors to form a combined Fourier vector and obtaining a projection vector of the combined vector by using a second basis matrix, and the step of extracting a Fourier feature by quantizing the projection vector.
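The quantization rule itself is not fixed by this summary; purely as an illustrative assumption, a uniform n-bit quantizer over the observed range could look like:

```python
import numpy as np


def quantize(z, n_bits=5):
    """Uniformly quantize a projection vector into n-bit unsigned integers
    (an assumed rule; the summary leaves the quantizer unspecified)."""
    lo, hi = z.min(), z.max()
    levels = (1 << n_bits) - 1
    return np.round((z - lo) / (hi - lo) * levels).astype(np.uint8)
```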
BRIEF DESCRIPTION OF DRAWINGS
[0042] FIG. 1 is a block diagram showing the arrangement of a
pattern feature extraction device according to an embodiment of the
present invention;
[0043] FIG. 2 is a view for explaining the prior art;
[0044] FIG. 3 is a view for explaining the distribution of pattern
features;
[0045] FIG. 4 is a block diagram showing the arrangement of a
pattern feature extraction device according to the second
embodiment of the present invention;
[0046] FIG. 5 is a view for explaining an embodiment of the present
invention;
[0047] FIG. 6 is a view for explaining an embodiment of the present
invention;
[0048] FIG. 7 is a block diagram showing the arrangement of a
facial image matching system according to the third embodiment of
the present invention;
[0049] FIG. 8 is a view for explaining an embodiment of the present
invention;
[0050] FIG. 9 is a view for explaining an embodiment of the present
invention;
[0051] FIG. 10 is a view for explaining an embodiment of the
present invention;
[0052] FIG. 11 is a view for explaining an embodiment of the
present invention;
[0053] FIG. 12 is a view for explaining an embodiment of the
present invention;
[0054] FIG. 13 is a view for explaining an embodiment of the
present invention;
[0055] FIG. 14 is a view for explaining an embodiment of the
present invention;
[0056] FIG. 15 is a view showing an example of a facial description
according to the fifth embodiment of the present invention;
[0057] FIG. 16 is a view showing an example of a rule when a binary
representation syntax is used in the fifth embodiment of the
present invention;
[0058] FIG. 17 is a view for explaining how to extract a Fourier
feature (FourierFeature) in the fifth embodiment of the present
invention;
[0059] FIG. 18 is a view showing an example of a Fourier spectrum
scanning method in the fifth embodiment of the present
invention;
[0060] FIG. 19 is a table showing an example of a Fourier spectrum
scanning rule in the fifth embodiment of the present invention;
[0061] FIG. 20 is a table showing an example of scanning regions in
a Fourier space for CentralFourierFeature elements in the fifth
embodiment of the present invention; and
[0062] FIG. 21 is a view showing an example of a block diagram in
the fifth embodiment of the present invention.
DETAILED DESCRIPTION OF EMBODIMENTS
[0063] (First Embodiment)
[0064] An embodiment of the present invention will be described in
detail with reference to the accompanying drawings. FIG. 1 is a block diagram showing the arrangement of a pattern feature extraction device according to the present invention.
[0065] The pattern feature extraction device will be described in
detail below.
[0066] As shown in FIG. 1, the pattern feature extraction device
according to the present invention includes a first linear
transformation means 11 for linearly transforming an input feature
vector x.sub.1, a second linear transformation means 12 for
linearly transforming an input feature vector x.sub.2, and a third
linear transformation means 13 for receiving feature vectors which
are transformed and dimension-reduced by the linear transformation
means 11 and 12 and linearly transforming them. The respective
linear transformation means perform basis transformation based on
discriminant analysis by using discriminant matrices obtained in
advance by learning and stored in discriminant matrix storage means
14, 15, and 16.
[0067] The input feature vectors x.sub.1 and x.sub.2 are feature
amounts which are extracted in accordance with purposes in
character recognition, face verification, and the like, and
include, for example, directional features calculated from the gradient features of an image, and density features, i.e., the pixel values of the image themselves. Each vector includes a plurality of elements. In this case, for example, N.sub.1 directional features are input as one feature vector x.sub.1, and N.sub.2 density values are input as the other feature vector x.sub.2.
[0068] The discriminant matrix storage means 14 and 15 store
discriminant matrices W.sub.1 and W.sub.2 obtained by performing
linear discriminant analysis on the feature vectors x.sub.1 and
x.sub.2.
[0069] As described above, these discriminant matrices may be obtained by calculating the within-class covariance matrix .SIGMA..sub.W (equation (2)) and the between-class covariance matrix .SIGMA..sub.B (equation (3)) with respect to the feature vectors in prepared learning samples in accordance with their classes. The a priori probability P(.omega..sub.i) of each class .omega..sub.i may be given by P(.omega..sub.i)=n.sub.i/n, with the sample count n.sub.i being reflected.
[0070] The discriminant matrices can be obtained in advance by selecting the eigenvectors w.sub.i corresponding to the largest eigenvalues of the eigenvalue problem expressed by equation (6) for these covariance matrices.
[0071] When bases of M.sub.1 and M.sub.2 dimensions, smaller than the input feature dimensions N.sub.1 and N.sub.2, are selected for the feature vectors x.sub.1 and x.sub.2, the M.sub.1-dimensional and M.sub.2-dimensional feature vectors y.sub.1 and y.sub.2 can be obtained by projective transformation onto the discriminant bases:

$$y_1 = W_1^T x_1, \qquad y_2 = W_2^T x_2 \qquad (7)$$
[0072] In this case, the sizes of the matrices W.sub.1.sup.T and W.sub.2.sup.T are M.sub.1.times.N.sub.1 and M.sub.2.times.N.sub.2, respectively.
[0073] The numbers of feature dimensions can be efficiently reduced
by greatly reducing dimension counts M.sub.1 and M.sub.2 of feature
spaces to be projected. This can effectively decrease the data
amount and increase the processing speed. If, however, the number
of feature dimensions is reduced too much, the discriminant
performance deteriorates. This is because as the number of feature
dimensions is reduced, feature amounts effective for discrimination
are lost.
[0074] For this reason, the dimension counts M.sub.1 and M.sub.2 of
feature vectors are amounts which are easily influenced by the
number of learning samples, and are preferably determined on the
basis of experiments.
[0075] The third linear transformation means 13 projects the vectors y.sub.1 and y.sub.2, calculated by the first and second linear transformation means, onto a discriminant space as a single input feature vector y. The discriminant matrix W.sub.3 to be registered in the discriminant matrix storage means 16 is obtained from learning samples in the same way as the first and second discriminant matrices. The input feature vector y has its elements arranged as in equation (8):

$$y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} \qquad (8)$$
[0076] As in the case of equation (7), the (M.sub.1+M.sub.2)-dimensional feature vector y is projected according to equation (9) by using the basis matrix W.sub.3 (W.sub.3.sup.T has size L.times.(M.sub.1+M.sub.2)), and the L-dimensional feature vector z to be output is obtained:

$$z = W_3^T y \qquad (9)$$
[0077] In this manner, each feature vector is divided, and linear
discriminant analysis is performed on learning samples of feature
vectors with small dimension counts, thereby suppressing estimation
errors, which tend to occur in high-dimensional feature components,
and obtaining features effective for discrimination.
[0078] In the above case, the three linear transformation means are provided to perform processing in parallel and stepwise. However, since a linear transformation means can basically be realized by a product-sum computing unit, one linear transformation means can be shared by switching the discriminant matrices to be read out in accordance with the input feature vector to be linearly transformed.
[0079] The size of a necessary computing unit can be reduced by
using one linear transformation means in this manner.
[0080] As is obvious from equations (7), (8), and (9), the computation of an output feature vector z can be expressed as:

$$z = W_3^T \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = W_3^T \begin{pmatrix} W_1^T x_1 \\ W_2^T x_2 \end{pmatrix} = W_3^T \begin{pmatrix} W_1^T & 0 \\ 0 & W_2^T \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = W^T \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \qquad (10)$$
[0081] That is, the linear transformations using the respective discriminant matrices can be integrated into a linear transformation using one matrix. In stepwise computation processing, the number of product-sum operations is L.times.(M.sub.1+M.sub.2)+M.sub.1N.sub.1+M.sub.2N.sub.2; when the matrices are integrated into one matrix, it is L.times.(N.sub.1+N.sub.2). If, for example, N.sub.1=N.sub.2=500, M.sub.1=M.sub.2=200, and L=100, stepwise processing requires 240,000 product-sum operations, whereas batch processing requires 100,000. The computation amount of batch processing is thus smaller, and high-speed computation can be realized. As is obvious from these expressions, when the final dimension count L is small, the batch computation method reduces the computation amount and hence is effective.
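Both the operation counts and the equivalence of equation (10) can be checked directly. The following sketch (dimensions as in the example above) folds the three matrices into a single matrix W and compares the two computation paths:

```python
import numpy as np
from scipy.linalg import block_diag

N1 = N2 = 500
M1 = M2 = 200
L = 100
print(L * (M1 + M2) + M1 * N1 + M2 * N2)  # stepwise: 240000 product-sums
print(L * (N1 + N2))                      # batch:    100000 product-sums

rng = np.random.default_rng(1)
W1 = rng.normal(size=(N1, M1))            # discriminant matrix W_1
W2 = rng.normal(size=(N2, M2))            # discriminant matrix W_2
W3 = rng.normal(size=(M1 + M2, L))        # discriminant matrix W_3
x1, x2 = rng.normal(size=N1), rng.normal(size=N2)

W = block_diag(W1, W2) @ W3               # single matrix W of equation (10)
z_step = W3.T @ np.concatenate([W1.T @ x1, W2.T @ x2])
z_batch = W.T @ np.concatenate([x1, x2])
print(np.allclose(z_step, z_batch))       # True: the two paths agree
```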
[0082] (Second Embodiment)
[0083] According to the above case, when different kinds of
features, e.g., directional features and density features, are to
be joined together, discriminant analysis is repeatedly performed
on a feature vector having undergone discriminant analysis for each
feature. However, a plurality of elements corresponding to one
feature may be divided into a plurality of vectors, discriminant
analysis may be performed on each element set as an input feature,
and the corresponding projected vector may be further subjected to
discriminant analysis.
[0084] In the second embodiment, a facial image feature extraction
device will be described.
[0085] As shown in FIG. 4, the facial image feature extraction
device according to the second embodiment includes an image feature
decomposition means 41 for decomposing the density feature of an
input facial image, a linear transformation means 42 for projecting
a feature vector in accordance with a discriminant matrix
corresponding to the feature vector, and a discriminant matrix
group storage means 43 for storing the respective discriminant
matrices described above.
[0086] Techniques of extracting features from facial images include
a method of positioning facial images at the eye position or the
like and setting their density values as vector features, as
disclosed in the above article by W. Zhao et al.
[0087] In the second embodiment as well, the pixel density values of an image are handled as an input feature, i.e., an original feature.
However, an image feature has a large image size, for example,
42.times.54 pixels=2352 dimensions with the central positions of
the left and right eyes being normalized to the coordinates (14,
23) and (29, 23). With such large feature dimensions, it is
difficult to perform high-precision feature extraction by directly
performing linear discriminant analysis using limited learning
samples. Therefore, a deterioration in feature which is caused when
the principal component analysis or the like is applied is
suppressed by decomposing image feature elements, performing
discriminant analysis on the decomposed features, and obtaining
discriminant matrices.
[0088] One of the methods of decomposing image features is to
segment an image. For example, as shown in FIG. 5, an image is divided into nine parts each having a size of 14.times.18 pixels (=252 dimensions), the resulting local images are set as feature vectors x.sub.i (i=1, 2, . . . , 9), and discriminant analysis is performed on each of the partial images by using learning samples, thereby obtaining in advance the discriminant matrices W.sub.i corresponding to the respective feature vectors.
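A minimal sketch of this decomposition, with the sizes taken from the text and our own function name, is shown below (a non-overlapping 3.times.3 grid; the next paragraph notes that the regions may also be overlapped):

```python
import numpy as np


def local_region_vectors(img):
    """Divide a 42x54 face image into nine 14x18-pixel local regions and
    raster-scan each into a 252-dimensional feature vector x_i."""
    assert img.shape == (54, 42)            # rows (height) x cols (width)
    vectors = []
    for by in range(3):                     # 3 x 3 grid of regions
        for bx in range(3):
            block = img[18 * by:18 * (by + 1), 14 * bx:14 * (bx + 1)]
            vectors.append(block.ravel())   # 14 * 18 = 252 dimensions
    return vectors                          # x_1, ..., x_9
```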
[0089] Note that letting regions have overlaps when an image is
segmented makes it possible to reflect, in feature vectors, feature
amounts based on the correlations between pixels in the boundary
regions. Therefore, the respective regions may be sampled after
being overlapped.
[0090] Since the number of feature dimensions is greatly reduced to
252 as compared with the original image, a basis matrix based on
discriminant analysis can be calculated with high precision by
sampling several images of each of several hundred individuals,
i.e., a total of several thousand facial images. If the number of
feature dimensions is as large as that of the original feature
(2352 dimensions), in order to obtain similar performance with
features based on discriminant analysis, it is expected that facial
images of several thousand individuals must be sampled. In
practice, however, it is difficult to collect such a large amount
of image data, and hence this technique cannot be realized.
[0091] Assume that the feature in each local region is compressed to a 20-dimensional feature by the first-stage discriminant transformation. In this case, the resultant outputs together form a feature vector of 9 regions.times.20 dimensions=180 dimensions.
performing discriminant analysis on this feature vector, the number
of dimensions can be efficiently reduced to about 50 dimensions.
This second-stage discriminant matrix is also stored in the
discriminant matrix group storage means 43, and discriminant
analysis is performed again by the linear transformation means 42
upon receiving the 180-dimensional vector of the first-stage
discriminant feature. Note that the first-stage discriminant matrix
and second-stage discriminant matrix may be calculated in advance
as indicated by equation (10). However, when 252 dimensions.times.9
regions are to be compressed to 20 dimensions.times.9 regions, and
the 180 dimensions are to be transformed into 50 dimensions, the
calculation in two stages will reduce the memory to be used and the
computation amount to 1/2 or less and hence is efficient.
[0092] By applying discriminant analysis locally and stepwise, a facial feature with high identification performance can be extracted. Assume that in character recognition, for example, two similar characters differing only in a small stroke are to be distinguished. If principal component analysis is performed on each entire character image to extract the components having large eigenvalues, the small stroke feature that distinguishes the two characters tends to be lost (for this reason, similar-character identification is sometimes performed by using a specific high-order feature instead of a feature of a portion with a large eigenvalue obtained by principal component analysis). The effectiveness of segmenting an image into local regions and extracting discriminant features parallels this phenomenon in similar-character identification. Spatially limiting a feature that is easy to identify can ensure higher precision per unit dimension than performing discriminant analysis on the principal components as a whole.
[0093] In addition, the image feature decomposition means 41 may sample images from the entire image and segment the sampled images, instead of segmenting the image and forming a feature vector for each local region. When, for example, the primary feature is to be divided into nine 252-dimensional vectors, sampling is performed from 3.times.3 regions, as shown in FIG. 6. That is, the sampled images become reduced images with slight positional differences. These reduced images are raster-scanned to be transformed into nine feature vectors. Such feature vectors are used as primary vectors to calculate discriminant components, and these discriminant components may be integrated to perform discriminant analysis again.
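A minimal sketch of this sampling scheme (the function name is ours) is:

```python
import numpy as np


def reduced_image_vectors(img):
    """Form nine reduced images by taking every third pixel at each of the
    nine (dy, dx) offsets (FIG. 6) and raster-scan each into a vector."""
    assert img.shape == (54, 42)
    vectors = []
    for dy in range(3):
        for dx in range(3):
            reduced = img[dy::3, dx::3]      # 18 x 14 reduced image
            vectors.append(reduced.ravel())  # 252-dimensional primary vector
    return vectors
```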
[0094] (Third Embodiment)
[0095] Another embodiment of the present invention will be
described in detail with reference to the accompanying drawings.
FIG. 7 is a block diagram showing a facial image matching system
using a facial metadata creating device according to the present
invention.
[0096] The facial image matching system will be described in detail
below.
[0097] As shown in FIG. 7, the facial image matching system
according to the present invention includes a facial image input
unit 71 which inputs facial images, a facial metadata creating unit
72 which creates facial metadata, a facial metadata storage unit 73
which stores extracted facial metadata, a facial similarity
calculation unit 74 which calculates a facial similarity from
facial metadata, a facial image database 75 which stores facial
images, a control unit 76 which controls the input of images, the
creation of metadata, the storage of metadata, and the calculation
of facial similarities in accordance with an image registration
request/retrieval request, and a display unit 77 which displays facial images and other information.
[0098] The facial metadata creating unit 72 is comprised of a
region cutting means 721 for cutting a facial region from an input
facial image, and a facial image feature extraction means 722 which
extracts a facial feature of the cut region. The facial metadata
creating unit 72 creates metadata about a facial image by
extracting facial feature vectors.
[0099] When a facial image is to be registered, a facial photo or
the like is input upon adjustment of the size and position of the
face by using the facial image input unit 71 such as a scanner or
video camera. Alternatively, a human face may be directly input
from a video camera or the like. In this case, it is preferable
that the face position of the input image be detected by using a
face detection technique like that disclosed in the above reference by Moghaddam et al., and the size and the like of the facial image be automatically normalized.
[0100] The input facial image is registered in the facial image
database 75 as needed. At the same time as facial image registration, the facial metadata creating unit 72 creates facial metadata and stores it in the facial metadata storage unit 73.
[0101] At the time of retrieval, the facial image input unit 71
inputs a facial image, and the facial metadata creating unit 72
creates facial metadata as in the case of registration. The created
facial metadata is either registered in the facial metadata storage
unit 73 or directly sent to the facial similarity calculation unit
74.
[0102] In retrieval operation, when it is to be checked whether or
not data identical to a pre-input facial image exists in the
database (facial identification), the similarity between the input
facial image and each data registered in the facial metadata
storage unit 73 is calculated. The control unit 76 selects a facial
image from the facial image database 75 on the basis of the result
exhibiting the highest similarity, and displays the facial image on
the display unit 77 or the like. An operator then checks the
coincidence between the faces in the retrieved image and the
registered image.
[0103] When it is to be checked whether or not a facial image
specified by an ID number or the like in advance coincides with a
retrieved facial image (face verification), the facial similarity calculation unit 74 calculates the similarity between the facial image specified by the ID number and the retrieved image. If the calculated similarity is lower than a
predetermined similarity, it is determined that the two images do
not coincide with each other, and the result is displayed on the
display unit 77. Assume that this system is used for room access
management. In this case, room access management can be performed
by causing the control unit 76 to send an opening/closing control
signal to an automatic door so as to control the automatic door,
instead of displaying a facial image.
[0104] The facial image matching system operates in the above
manner. Such operation can be implemented on a computer system. For
example, facial image matching can be realized by storing, in a
memory, a metadata creation program for executing the metadata
creation described in detail below together with a similarity
calculation program, and executing these programs using a program
control processor.
[0105] In addition, these programs may be recorded on a
computer-readable recording medium.
[0106] The operation of this facial image matching system, and more
specifically, the operations of the facial metadata creating unit
72 and facial similarity calculation unit 74, will be described in
detail next.
[0107] (1) Creation of Facial Metadata
[0108] The facial metadata creating unit 72 extracts a facial
feature amount by using an image I(x, y) whose position and size
have been normalized. In normalizing the position and size, the
image is preferably normalized so that the eye positions are (16, 24)
and (31, 24) and the size is 46×56 pixels. In the following
description, the image is assumed to have been normalized to this
size.
[0109] The region cutting means 721 then cuts a plurality of preset
local regions from the facial image. In the case of the above image,
for example, one region is the entire normalized facial image f(x, y)
and the other is a central region g(x, y) of 32×32 pixels centered on
the face. This region may be cut such that the positions of the two
eyes are (9, 12) and (24, 12).
[0110] The reason why a central region of the face is cut in the
above manner is that a stable feature can be extracted, even if the
hair style changes, by cutting a range free from the influence of the
hair style and the like (for example, when facial verification is
used in a home robot, verification can be done even if the hair style
changes before and after bathing). If the hair style and the like do
not change (for example, in personal identification within scenes of
a video clip), an improvement in verification performance can be
expected by performing verification using images that include the
hair style. For these reasons, a large facial image including the
hair style and a small facial image of the central portion of the
face are both cut.
[0111] The facial image feature extraction means 722 then performs
two-dimensional discrete Fourier transforms on the two cut regions
f(x, y) and g(x, y) to extract a facial image feature.
[0112] FIG. 8 shows the more detailed arrangement of the facial
image feature extraction means 722. The facial image feature
extraction means includes a Fourier transform means 81 for performing
a discrete Fourier transform on a normalized cut image, and a Fourier
power calculation means 82 for calculating the power spectrum of the
Fourier-transformed Fourier frequency components. It further includes
a linear transformation means 83, which regards the raster-scanned
real and imaginary components of the Fourier frequency components
calculated by the Fourier transform means 81 as a one-dimensional
feature vector and extracts a discriminant feature from the principal
components of that vector; a basis matrix storage means 84 for
storing a basis matrix for this transformation; a linear
transformation means 85 for extracting a discriminant feature of
principal components from the power spectrum in the same manner; and
a basis matrix storage means 86 for storing a basis matrix for that
transformation. The facial image feature extraction means 722 further
includes a normalization means 87 for normalizing each of the
discriminant feature of the real and imaginary components of the
Fourier feature and the discriminant feature of the power spectrum to
a vector with a size of 1, a linear transformation means 88 for
calculating a discriminant feature of the vector obtained by
combining the two normalized feature vectors, and a discriminant
matrix storage means 89 for storing a discriminant matrix for that
discriminant feature.
[0113] After a Fourier frequency feature is extracted with this
arrangement, discriminant features of principal components are
calculated for a feature vector whose elements are the real and
imaginary parts of the Fourier frequency components and for a feature
vector whose elements are the power spectrum, and a discriminant
feature is calculated again for the feature vector obtained by
combining these vectors, thereby calculating the feature amount of
the face.
[0114] Each operation will be described in more detail below.
[0115] The Fourier transform means 81 performs a two-dimensional
Fourier transform on the input image f(x, y) (x = 0, 1, 2, ..., M-1;
y = 0, 1, 2, ..., N-1) to calculate a Fourier feature F(u, v)
according to equation (11). This method is widely known and described
in, for example, Rosenfeld et al., "Digital Picture Processing",
Kindai Kagaku Sha, pp. 20-26, and hence a description thereof will be
omitted.

F(u, v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y) \exp\left(-2\pi i \left(\frac{xu}{M} + \frac{yv}{N}\right)\right)    (11)
[0116] The Fourier power calculation means 82 calculates the Fourier
power spectrum |F(u, v)| by obtaining the magnitude of the Fourier
feature F(u, v) according to equation (12).

|F(u, v)| = \sqrt{|\mathrm{Re}(F(u, v))|^2 + |\mathrm{Im}(F(u, v))|^2}    (12)
[0117] Since the two-dimensional Fourier spectra F(u, v) and
|F(u, v)| obtained in this manner result from transforming a
two-dimensional image that has only real parts, the obtained Fourier
frequency components are symmetrical. For this reason, although these
spectrum images F(u, v) and |F(u, v)| have M×N components (u = 0, 1,
..., M-1; v = 0, 1, ..., N-1), half of the components, i.e., the
M×N/2 components (u = 0, 1, ..., M-1; v = 0, 1, ..., N/2-1), and the
remaining half are substantially equivalent. Therefore, the
subsequent processing may be performed by using only half of the
components as a feature vector. Obviously, computation can be
simplified by omitting, in the Fourier transform means 81 and the
Fourier power calculation means 82, the computation of components
which are not used as elements of the feature vector.
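For illustration, the computations of equations (11) and (12) and the
use of the symmetry can be sketched in Python with NumPy as follows.
This is a minimal sketch (the function names are illustrative, not
part of the invention); np.fft.fft2 uses the same negative-exponent
convention as equation (11), and np.fft.rfft2 exploits the conjugate
symmetry of a real input to return roughly half of the components.

    import numpy as np

    def fourier_features(img):
        # Equation (11): unnormalized 2D DFT of the cut region.
        F = np.fft.fft2(img)
        # Equation (12): power spectrum |F(u, v)|.
        power = np.abs(F)
        return F, power

    def half_components(img):
        # For a real image the spectrum is conjugate-symmetric, so
        # only about half of the M x N components carry information;
        # rfft2 returns just those (shape M x (N//2 + 1)).
        return np.fft.rfft2(img)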
[0118] The linear transformation means 83 then handles the feature
amount extracted as a frequency feature as a vector. A partial space
defined in advance is set by basis vectors (eigenvectors) obtained by
preparing a facial image set for learning and performing discriminant
analysis on the principal components of the frequency feature vectors
in the corresponding cut region. Since these basis vectors are
obtained by a widely known method described in various references,
including the reference by W. Zhao, a description thereof will be
omitted. The reason why discriminant analysis is not performed
directly is that the number of dimensions of a feature vector
obtained by a Fourier transform is too large for discriminant
analysis to handle directly. Although the problem already pointed out
in principal component discriminant analysis remains unsolved, this
technique is one choice as a technique of extracting a first-stage
feature vector. Alternatively, a basis matrix obtained by the method
of repeating discriminant analysis may be used.
[0119] That is, a discriminant matrix Φ_1 of principal components,
which is to be stored in the basis matrix storage means 84, can be
obtained in advance from learning samples by performing discriminant
analysis on the principal components of the one-dimensional feature
vector x_1 obtained by raster-scanning the real and imaginary
components of the frequency feature. In this case, a Fourier feature
need not always be handled as a complex number, and may be handled as
a real number, with the imaginary component handled as another
feature element.
[0120] Letting Ψ_1 be a basis matrix for the principal components and
W_1 be a discriminant matrix obtained by discriminant analysis on the
vector of the principal components, the discriminant matrix Φ_1 of
the principal components can be expressed by

\Phi_1^T = W_1^T \Psi_1^T    (13)
[0121] It suffices if the number of dimensions after the principal
component analysis is set to about 1/10 of that of the original
Fourier feature (about 200 dimensions). Thereafter, the number of
dimensions is reduced to about 70 by this discriminant matrix. This
basis matrix is calculated in advance from learning samples and is
used as the information stored in the basis matrix storage means 84.
[0122] In the case of the Fourier spectrum |F(u, v)| as well, the
spectrum is expressed as a one-dimensional feature vector x_2 by
raster scanning, and a basis matrix Φ_2^T = W_2^T Ψ_2^T, obtained by
discriminant analysis on the principal components of this feature
vector, is obtained in advance from learning samples.
[0123] Calculating a principal component discriminant feature for
each component of the Fourier feature in this manner makes it
possible to obtain a discriminant feature y_1 of the principal
components of the feature vector x_1 of the real and imaginary
components of the Fourier components, and a discriminant feature y_2
of the principal components of the feature vector x_2 of the power
spectrum.
[0124] A normalization means 87 normalizes each obtained feature
vector to a unit vector with a size of 1. In this case, the vector
length varies depending on the position of the origin from which the
vector is measured, and hence the reference position must also be
determined in advance. It suffices if the reference point is set by
using a mean vector m_i obtained from learning samples of the
projected feature vector y_i. By setting a mean vector as the
reference point, feature vectors are distributed around the reference
point; in the case of a Gaussian distribution, in particular, feature
vectors are isotropically distributed. This makes it easy to limit
the distribution region when the feature vector is finally quantized.
[0125] That is, a vector y_i^0 obtained by normalizing the feature
vector y_i to a unit vector by using the mean vector m_i can be
expressed by

y_i^0 = \frac{y_i - m_i}{\|y_i - m_i\|}    (14)
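As a minimal sketch, the projection of [0119]-[0120] followed by the
normalization of equation (14) can be written as follows (the
function and argument names are ours; phi stands for the discriminant
matrix Φ_i and m for the learned mean vector m_i):

    import numpy as np

    def project_and_normalize(x, phi, m):
        # Project the raster-scanned feature with the discriminant
        # matrix of equation (13): y = Phi^T x.
        y = phi.T @ x
        # Equation (14): shift by the learned mean and scale to a
        # unit vector, so features are distributed around the origin.
        d = y - m
        return d / np.linalg.norm(d)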
[0126] In this manner, the normalization means is provided to
normalize the feature vector y_1, associated with the real and
imaginary components of the Fourier feature, and the feature vector
y_2, associated with the power spectrum, to unit vectors in advance.
This makes it possible to normalize the sizes of two different kinds
of feature amounts and to stabilize the distribution characteristics
of the feature vectors.
[0127] In addition, since the sizes of these vectors are normalized
within a feature space that has already been restricted, through
dimension reduction, to what is necessary for discrimination,
normalization robust against noise can be realized as compared with a
case wherein normalization is performed in a feature space still
containing the noise that dimension reduction removes. This
normalization can remove the influence of variation elements, such as
variation components proportional to the overall illumination
intensity, which are difficult to remove by simple linear
transformation.
[0128] The feature vectors y_1^0 and y_2^0 normalized in this manner
are combined into one feature vector y in the same manner as in
equation (8), and the combined feature vector y is projected into a
discriminant space by using the discriminant matrix W_3 obtained by
linear discriminant analysis, thereby obtaining an output feature
vector z. The discriminant matrix W_3 for this purpose is stored in
the discriminant matrix storage means 89, and the linear
transformation means 88 performs the corresponding projection
computation to calculate, for example, a 24-dimensional feature
vector z.
[0129] When the output feature vector z is to be quantized into five
bits per element, the size of each element must be normalized in
advance, for example, in accordance with the variance of each
element.
[0130] That is, the standard deviation σ_i of each element z_i of the
feature vector z is obtained in advance from learning samples, and
normalization is performed so as to satisfy z_i^0 = 16 z_i / (3σ_i).
For a size of five bits, it then suffices if the result is quantized
to a value falling within the range of -16 to 15.
[0131] In this case, normalization is the computation of multiplying
each element by the reciprocal of its standard deviation. Considering
a diagonal matrix Σ whose diagonal elements correspond to these
scaling factors, the normalized vector z^0 becomes z^0 = Σz. That is,
since this is a simple linear transformation, Σ may be applied to the
discriminant matrix W_3 in advance as indicated by equation (15).

(W_3^0)^T = \Sigma W_3^T    (15)
[0132] Performing normalization in this manner achieves the range
correction necessary for quantization. In addition, since
normalization is performed by using the standard deviation,
computation based on the Mahalanobis distance can be performed by
simply calculating an L2 norm when computing an inter-pattern
distance at the time of collation, thereby reducing the computation
amount at the time of collation.
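The folding of the scaling into the basis (equation (15)) can be
sketched as follows; this sketch assumes that the diagonal entries of
Σ are the factors 16/(3σ_i) of [0130], and the names are ours:

    import numpy as np

    def fold_scaling_into_basis(W3, sigma):
        # Diagonal scaling matrix; here assumed to hold the factors
        # 16/(3 sigma_i) used in [0130] for 5-bit quantization.
        S = np.diag(16.0 / (3.0 * sigma))
        # Equation (15): (W3^0)^T = Sigma W3^T, so a plain L2
        # distance on z^0 behaves like a Mahalanobis distance.
        return (S @ W3.T).T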
[0133] As described above, the facial image feature extraction means
722 extracts a feature vector z_f from the normalized image f(x, y).
From the image g(x, y) obtained by cutting only the central portion
of the face, a feature vector z_g is likewise extracted by the facial
image feature extraction means 722. The facial metadata creating unit
outputs the two feature vectors z_f and z_g as the facial feature
amount z.
[0134] Note that a computer may be caused to execute the above
facial metadata creation sequence by a computer program. In
addition, this program may be recorded on a computer-readable
recording medium.
[0135] (2) Facial Similarity Calculation
[0136] The operation of the facial similarity calculation unit 74
will be described next.
[0137] The facial similarity calculation unit 74 calculates a
similarity d(z_1, z_2) by using the K-dimensional feature vectors z_1
and z_2 obtained from two facial metadata.
[0138] For example, a similarity is calculated by the weighted square
distance of equation (16):

d(z_1, z_2) = \sum_{i=1}^{K} \alpha_i |z_{1,i} - z_{2,i}|^2    (16)
[0139] where α_i is a weighting factor. If, for example, the
reciprocal of the variance of each feature dimension z_i is used,
calculation based on the Mahalanobis distance is performed. If
feature vectors are normalized in advance by equation (15) or the
like, since the basis matrix has been normalized in advance with the
variance values, the Mahalanobis distance is obtained. Alternatively,
a similarity may be calculated by the cosine of the angle between the
feature vectors to be compared, expressed by equation (17):

d(z_1, z_2) = \frac{z_1 \cdot z_2}{\|z_1\| \|z_2\|}    (17)
[0140] Note that when a distance is used, a larger value indicates
a lower similarity (the faces do not resemble each other), whereas
when a cosine is used, a larger value indicates a higher similarity
(the faces resemble each other).
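Both measures are straightforward to sketch (names ours; alpha holds
the weighting factors α_i of equation (16)):

    import numpy as np

    def square_distance(z1, z2, alpha):
        # Equation (16): weighted squared distance; a larger value
        # means a lower similarity.
        return np.sum(alpha * np.abs(z1 - z2) ** 2)

    def cosine_similarity(z1, z2):
        # Equation (17): cosine of the angle between the vectors; a
        # larger value means a higher similarity.
        return (z1 @ z2) / (np.linalg.norm(z1) * np.linalg.norm(z2))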
[0141] According to the above description, one facial image is
registered, and a retrieval is performed by using one facial image.
When, however, a plurality of images are registered for the face of
one individual and a retrieval is to be performed by using one
facial image, a similarity may be calculated for each of a
plurality of facial metadata on the registration side.
[0142] Likewise, when a plurality of images are registered for the
face of one individual and retrieval is to be performed by using a
plurality of images, a similarity for one face can be calculated by
obtaining the mean or minimum value of the similarities over the
respective combinations. This indicates that the matching system of
the present invention can also be applied to face verification in an
image sequence by regarding the image sequence as a plurality of
images.
[0143] The embodiments of the present invention have been described
above with reference to the accompanying drawings as needed.
Obviously, however, the present invention can also be implemented as
a computer-executable program.
[0144] In addition, this program may be recorded on a
computer-readable recording medium.
[0145] (Fourth Embodiment)
[0146] Another embodiment of the present invention will be
described in detail with reference to the accompanying drawings.
The present invention is directed to an improvement in the facial
metadata creating unit 72 according to the third embodiment.
According to the third embodiment, the discriminant features of the
principal components of a feature vector having the real and
imaginary parts of a Fourier frequency component obtained by
performing a Fourier transform for an input facial image and a
feature vector having a power spectrum as an element are
calculated, and the discriminant feature of a feature vector
obtained by combining the respective vectors is calculated again,
thereby calculating the feature amount of the face. In this case,
since a Fourier power spectrum reflects the overall feature amount
of an input image, components of the input pixels which contain
much noise (e.g., pixels around the mouth which tend to change in
relative position) are reflected in the power spectrum in the same
manner as the remaining pixels. As a consequence, even if an
effective feature amount is selected by discriminant analysis,
sufficient performance may not be obtained. In such a case, the
input image is segmented into regions, and a Fourier transform is
performed for each local region. Discriminant analysis is then
performed by using a power spectrum for each local region as a
feature amount. Discriminant analysis can then reduce the influence
of the feature amounts of regions which locally exhibit poor
discriminant performance (large within-class variance).
[0147] FIG. 9 is a view for explaining an embodiment and shows the
flow of feature extraction processing. In this embodiment, for
example, a 32×32 pixel region is segmented into four 16×16 pixel
regions, 16 8×8 pixel regions, 64 4×4 pixel regions, 256 2×2 pixel
regions, and 1024 1×1 pixel regions (the last being substantially the
same as the input image, so the input image can be used without
segmentation) (S1001). A Fourier transform is performed in each
segmented region (S1002). A power spectrum is then calculated
(S1003). The above calculation is performed for all the segmented
regions (S1004). The size of a region is then changed (S1005), and
this is repeated until all the region sizes have been processed
(S1006). FIG. 10 summarizes this processing flow. A 1024×5 =
5120-dimensional feature amount consisting of all the power spectra
of the respective regions obtained in this manner is extracted.
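The multi-scale segmentation of this paragraph might be sketched as
follows (a sketch with names of ours; with the five block sizes 16,
8, 4, 2, and 1 on a 32×32 image, the concatenated power spectra
indeed total 1024 × 5 = 5120 dimensions):

    import numpy as np

    def multiblock_power_spectra(img, block_sizes=(16, 8, 4, 2, 1)):
        # img: 32x32 region. Each scale contributes 1024 components
        # (e.g., four 16x16 blocks x 256 components each), so five
        # scales give a 5120-dimensional feature (S1001-S1006).
        feats = []
        for b in block_sizes:
            for y0 in range(0, img.shape[0], b):
                for x0 in range(0, img.shape[1], b):
                    block = img[y0:y0 + b, x0:x0 + b]
                    feats.append(np.abs(np.fft.fft2(block)).ravel())
        return np.concatenate(feats)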
[0148] Since this number of dimensions is generally too large when
the amount of learning data is small, principal component analysis is
performed first, and a principal component basis which reduces the
number of dimensions is obtained in advance. For example, an
appropriate number of dimensions is about 300. Discriminant analysis
is further performed on the feature vector of this dimension count to
obtain a basis which reduces the number of dimensions and corresponds
to feature axes exhibiting good discriminant performance. A basis
combining the principal component analysis and the discriminant
analysis is calculated in advance (this basis will be referred to as
a PCLDA projection basis Ψ).
[0149] A discriminant feature z can be obtained by projecting the
5120-dimensional feature by linear computation using this PCLDA
projection basis Ψ. The feature amount of the face can be obtained by
further performing quantization and the like on this feature.
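A rough sketch of how such a PCLDA basis might be learned follows
(PCA to about 300 dimensions, then Fisher discriminant analysis in
the reduced space); the function and dimension choices here are
illustrative assumptions, not the patent's exact procedure:

    import numpy as np

    def pclda_basis(X, labels, n_pca=300, n_lda=48):
        # X: training features in rows; labels: class id per row.
        labels = np.asarray(labels)
        Xc = X - X.mean(axis=0)
        # PCA basis from the top right-singular vectors.
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        P = Vt[:n_pca].T                    # (D, n_pca)
        Y = Xc @ P                          # samples in PCA space
        # Within-class (Sw) and between-class (Sb) scatter matrices.
        mu = Y.mean(axis=0)
        Sw = np.zeros((n_pca, n_pca))
        Sb = np.zeros((n_pca, n_pca))
        for c in np.unique(labels):
            Yc = Y[labels == c]
            mc = Yc.mean(axis=0)
            d = (mc - mu)[:, None]
            Sb += len(Yc) * (d @ d.T)
            Sw += (Yc - mc).T @ (Yc - mc)
        # Fisher LDA: leading eigenvectors of Sw^{-1} Sb.
        evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
        order = np.argsort(-evals.real)[:n_lda]
        W = evecs[:, order].real            # (n_pca, n_lda)
        return P @ W                        # project as Psi.T @ x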
[0150] Note that the number of dimensions of the 5120-dimensional
feature amount can be reduced by considering the symmetry of the
Fourier power spectrum and removing high-frequency components. This
can realize high-speed learning, reduce the amount of data required,
and realize high-speed feature extraction. Therefore, the number of
dimensions is preferably reduced as needed.
[0151] Segmenting a region into blocks and multiplexing the Fourier
spectra in this manner yields multiple expressions, ranging from
feature amounts having translation invariance to local feature
amounts equivalent to an image feature (in the case of 1024
segmentations). A feature amount effective for identification is
selected from these multiple, redundant feature expressions by
discriminant analysis, thereby obtaining a compact feature amount
which provides good identification performance.
[0152] Since a Fourier power spectrum is obtained from an image by
nonlinear computation, it can provide an effective feature amount
that cannot be obtained by simply applying discriminant analysis
based on linear computation to the image itself.
[0153] Although the application of linear discriminant analysis to
principal components has been described above, the second-stage
feature extraction may be performed by using kernel discriminant
analysis (discriminant analysis using a kernel technique, variously
called Kernel Fisher Discriminant Analysis (KFDA), Kernel
Discriminant Analysis (KDA), or Generalized Discriminant Analysis
(GDA)).
[0154] For a detailed description of kernel discriminant analysis,
see the reference by Q. Liu et al. (non-patent reference 3:
"Kernel-based Optimized Feature Vectors Selection and Discriminant
Analysis for Face Recognition", Proceedings of the IAPR International
Conference on Pattern Recognition (ICPR), Vol. II, pp. 362-365, 2002)
or the reference by G. Baudat (non-patent reference 4: "Generalized
Discriminant Analysis Using a Kernel Approach", Neural Computation,
Vol. 12, pp. 2385-2404, 2000).
[0155] By extracting a feature using kernel discriminant analysis,
the effect of nonlinear feature extraction can be enhanced to allow
extraction of an effective feature.
[0156] In this case, however, since a large feature vector of 5120
dimensions is to be processed, a large amount of memory and a large
amount of learning data are required even for the principal component
analysis. Referring to FIG. 11, in order to avoid this problem,
principal component analysis/discriminant analysis is performed
individually for each block, and thereafter a second-stage linear
discriminant analysis (LDA) is performed. This makes it possible to
reduce the computation amount.
[0157] In this case, the principal component analysis and
discriminant analysis are performed for each region by using a
1024-dimensional feature amount (512 dimensions if the number of
dimensions is halved in consideration of symmetry) to obtain basis
matrices Ψ_i (i = 0, 1, 2, ..., 5) in advance. Each feature vector is
then normalized by using its mean value, and the second-stage LDA
projection is performed.
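A sketch of this block-wise two-stage arrangement follows (names
ours; all bases and means are assumed to have been learned in advance
as described above):

    import numpy as np

    def two_stage_feature(block_feats, bases, means, W_lda):
        # block_feats: per-block power-spectrum vectors; bases: the
        # per-block PCLDA matrices Psi_i; means: learned mean vectors.
        ys = []
        for x, psi, m in zip(block_feats, bases, means):
            d = psi.T @ x - m
            ys.append(d / np.linalg.norm(d))  # per-block unit vector
        # Second-stage LDA projection of the concatenated features.
        return W_lda.T @ np.concatenate(ys)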
[0158] By performing processing for each block in this manner, the
amount of data and the computer resources required for learning can
be reduced. This makes it possible to shorten the time required for
the optimization of learning.
[0159] Note that high-speed computation can be realized by omitting
the vector normalization processing and calculating a basis matrix
for PCLDA projection and a basis matrix for LDA projection in
advance.
[0160] FIG. 12 is a view for explaining still another embodiment and
shows the flow of feature extraction processing. In this embodiment,
such region segmentation is performed in a plurality of stages (two
stages in FIG. 12) to extract multiple power spectra at multiple
resolutions as feature amounts for discriminant analysis, in
consideration of the translation invariance of Fourier power spectra
in local regions and the reliability of the local regions. Feature
extraction is then performed using the optimal feature space obtained
by discriminant analysis.
[0161] Assume that an input image f(x, y) has 32×32 pixels. In this
case, as shown in FIG. 10, a power spectrum |F(u, v)| of the entire
image; power spectra |F^1_1(u, v)|, |F^1_2(u, v)|, |F^1_3(u, v)|, and
|F^1_4(u, v)| of the four 16×16 pixel regions obtained by segmenting
the entire image into four regions; and power spectra |F^2_1(u, v)|,
|F^2_2(u, v)|, ..., |F^2_16(u, v)| of the 16 8×8 pixel regions
obtained by segmenting the entire image into 16 regions are extracted
as feature vectors.
[0162] In consideration of the symmetry of the Fourier power spectrum
of a real image, it suffices to extract half of them. Alternatively,
in order to avoid an increase in the size of the feature vector for
discriminant analysis, the feature vector may be formed without
sampling any high-frequency components for discrimination. If, for
example, a feature vector is formed by sampling the 1/4 of the
spectra which corresponds to low-frequency components, the number of
learning samples required can be reduced, and the processing time
required for learning and recognition can be shortened. If the amount
of learning data is small, discriminant analysis may be performed
after the number of feature dimensions is reduced by principal
component analysis in advance.
[0163] Discriminant analysis is performed by using a feature vector
x_2^f extracted in this manner and a learning set prepared in advance
to obtain a basis matrix Ψ_2^f in advance. FIG. 9 shows an example of
projection for the extraction of a discriminant feature from
principal components (Principal Component Linear Discriminant
Analysis; PCLDA). The feature vector x_2^f is projected by using the
basis matrix Ψ_2^f, and the mean and size of the projected feature
vector are normalized, thereby calculating a feature vector y_2^f.
[0164] Likewise, the feature vector x_1^f obtained by combining the
real and imaginary components of the Fourier frequency is projected
by linear computation using a basis matrix Ψ_1^f to obtain a feature
vector with a reduced number of dimensions, and the mean and size of
that vector are normalized to calculate a feature vector y_1^f. The
feature vector obtained by combining these vectors is projected again
by using a discriminant basis Ψ_3^f to obtain a feature vector z^f.
This vector is quantized into, for example, five bits to extract a
facial feature amount.
[0165] Assume that the input is a facial image normalized to a size
of 44×56 pixels. In this case, the above processing is applied to the
32×32 pixels of the central portion to extract a facial feature
amount. In addition, facial feature amounts are also extracted from
multiple segmented regions of the 44×56 pixel region of the entire
face, namely the entire 44×56 pixel region, four 22×28 pixel regions,
and 16 11×14 pixel regions.
[0166] FIG. 13 shows another embodiment, in which PCLDA projection of
a combination of a real component, an imaginary component, and a
power spectrum is performed for each local region; alternatively, as
shown in FIG. 14, PCLDA projection of a feature obtained by combining
the real and imaginary components and PCLDA projection of a power
spectrum are performed separately, and LDA projection is finally
performed.
[0167] (Fifth Embodiment)
[0168] Another embodiment of the present invention will be
described in detail with reference to the accompanying
drawings.
[0169] This embodiment is an embodiment of a facial feature
description method using the present invention and of descriptors of
facial features. FIG. 15 shows a description of a facial feature
amount, as an example of a facial feature description, which uses the
DDL representation syntax (Description Definition Language
Representation Syntax) in ISO/IEC FDIS 15938-3, "Information
technology -- Multimedia content description interface -- Part 3:
Visual".
[0170] In this case, for the description of a facial feature named
"AdvancedFaceRecognition", elements named "FourierFeature" and
"CentralFourierFeature" are provided. Each of "FourierFeature" and
"CentralFourierFeature" is an array of unsigned 5-bit integers and
can have from 24 to 63 components.
[0171] FIG. 16 shows the rule used when a binary representation
syntax is used for data representation. According to this rule, the
sizes of the arrays FourierFeature and CentralFourierFeature are
stored as unsigned 6-bit integers in the fields numOfFourierFeature
and numOfCentralFourierFeature, and each component of FourierFeature
and CentralFourierFeature is stored in the form of an unsigned 5-bit
integer.
[0172] Descriptors of such facial features using the present
invention will be described in more detail.
[0173] numOfFourierFeature
[0174] This field specifies the number of components of
FourierFeature. The allowable range is from 24 to 63.
[0175] numOfCentralFourierFeature
[0176] This field specifies the number of components of
CentralFourierFeature. The allowable range is from 24 to 63.
[0177] FourierFeature
[0178] This element represents a facial feature based on the
cascaded LDA of the Fourier characteristics of a normalized face
image. The normalized face image is obtained by scaling an original
image into 56 lines with 46 luminance values in each line. The
center positions of two eyes in the normalized face image shall be
located on the 24th row and the 16th and 31st columns for the right
and left eyes respectively.
[0179] The FourierFeature element is derived from two feature
vectors: one is a Fourier Spectrum Vector x_1^f, and the other is a
Multi-block Fourier Amplitude Vector x_2^f. FIG. 17 illustrates the
extraction process of FourierFeature. Given a normalized face image,
five steps should be performed to extract the element:
[0180] (1) Extraction of a Fourier Spectrum Vector x_1^f,
[0181] (2) Extraction of a Multi-block Fourier Amplitude Vector
x_2^f,
[0182] (3) Projection of the feature vectors using the PCLDA basis
matrices Ψ_1^f and Ψ_2^f, and their normalization to unit vectors
y_1^f and y_2^f,
[0183] (4) Projection of the Joint Fourier Vector y_3^f of the unit
vectors using the LDA basis matrix Ψ_3^f,
[0184] (5) Quantization of the projected vector z^f.
[0185] STEP 1) Extraction of Fourier Spectrum Vector
Given a normalized face image f(x, y), the Fourier spectrum F(u, v)
of f(x, y) is calculated by

F(u, v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y) \exp\left(-2\pi i \left(\frac{xu}{M} + \frac{yv}{N}\right)\right)  (u = 0, ..., M-1; v = 0, ..., N-1)    (18)
[0186] where M = 46 and N = 56. A Fourier Spectrum Vector x_1^f is
defined as a set of scanned components of the Fourier spectrum. FIG.
18 shows the scanning method for the Fourier spectrum. The scanning
shall be performed only on two rectangular regions, region A and
region B, in the Fourier domain. The scanning rule is summarized in
FIG. 19, where S_R(u, v) denotes the top-left coordinate of region R
and E_R(u, v) denotes the bottom-right point of region R. The Fourier
Spectrum Vector x_1^f is therefore expressed by

x_1^f = ( Re[F(0, 0)], ..., Re[F(11, 0)], Re[F(35, 0)], ..., Re[F(45, 0)], ..., Re[F(45, 13)], Im[F(0, 0)], ..., Im[F(11, 0)], Im[F(35, 0)], ..., Im[F(45, 0)], ..., Im[F(45, 13)] )^T    (19)

[0187] The dimension of x_1^f is 644.
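The scan itself amounts to concatenating real parts and then
imaginary parts over the two rectangles. A sketch follows; the actual
corner coordinates of regions A and B are those of FIG. 19, which is
not reproduced here, so the `regions` argument is a hypothetical
stand-in parameter:

    import numpy as np

    def fourier_spectrum_vector(F, regions):
        # regions: list of ((u0, v0), (u1, v1)) inclusive corner
        # pairs standing in for regions A and B of FIG. 19; F is
        # indexed F[u, v].
        parts = [F[u0:u1 + 1, v0:v1 + 1].ravel()
                 for (u0, v0), (u1, v1) in regions]
        flat = np.concatenate(parts)
        # Real parts first, then imaginary parts, as in equation (19).
        return np.concatenate([flat.real, flat.imag])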
[0188] STEP 2) Extraction of Multi-block Fourier Amplitude Vector
[0189] A Multi-block Fourier Amplitude Vector is extracted from the
Fourier amplitudes of partial images of the normalized face image. As
the partial images, three types of images are used: (a) a holistic
image, (b) quarter images, and (c) one-sixteenth images.
[0190] (a) holistic image
[0191] A holistic image f_1^0(x, y) is obtained by clipping the
normalized image f(x, y) to a 44×56 image size, removing the boundary
columns on both sides. It is given by

f_1^0(x, y) = f(x+1, y)  (x = 0, 1, ..., 43; y = 0, 1, ..., 55)    (20)
[0192] (b) quarter images
[0193] Quarter images are obtained by dividing the holistic image
f_1^0(x, y) equally into 4 blocks f_k^1(x, y) (k = 1, 2, 3, 4) given
by

f_k^1(x, y) = f_1^0(x + 22 s_k^1, y + 28 t_k^1)  (x = 0, 1, ..., 21; y = 0, 1, ..., 27)    (21)

[0194] where s_k^1 = (k-1)%2 and t_k^1 = (k-1)/2 (integer division).
[0195] (c) one-sixteenth images
[0196] One-sixteenth images are obtained by dividing f_1^0(x, y)
equally into 16 blocks f_k^2(x, y) (k = 1, 2, 3, ..., 16) given by

f_k^2(x, y) = f_1^0(x + 11 s_k^2, y + 14 t_k^2)  (x = 0, 1, ..., 10; y = 0, 1, ..., 13)    (22)

[0197] where s_k^2 = (k-1)%4 and t_k^2 = (k-1)/4 (integer division).
[0198] From these images, Fourier amplitudes |F_k^j(u, v)| are
calculated as follows:

F_k^j(u, v) = \sum_{x=0}^{M^j-1} \sum_{y=0}^{N^j-1} f_k^j(x, y) \exp\left(-2\pi i \left(\frac{xu}{M^j} + \frac{yv}{N^j}\right)\right),
|F_k^j(u, v)| = \sqrt{\mathrm{Re}[F_k^j(u, v)]^2 + \mathrm{Im}[F_k^j(u, v)]^2}    (23)

[0199] where M^j is the width of each partial image, that is, M^0 =
44, M^1 = 22, and M^2 = 11, and N^j is the height of each partial
image, that is, N^0 = 56, N^1 = 28, and N^2 = 14.
[0200] The Multi-block Fourier Amplitude Vector is obtained by
scanning the low-frequency regions of each amplitude |F_k^j(u, v)| of
1) the holistic image (k = 1), 2) the quarter images (k = 1, 2, 3,
4), and 3) the one-sixteenth images (k = 1, 2, ..., 16). The scan
regions are defined in FIG. 19.
[0201] Therefore, the Multi-block Fourier Amplitude Vector x_2^f is
expressed as follows:

x_2^f = ( |F_1^0(0, 0)|, ..., |F_1^0(43, 13)|, |F_1^1(0, 0)|, ..., |F_1^1(21, 6)|, |F_2^1(0, 0)|, ..., |F_2^1(21, 6)|, |F_3^1(0, 0)|, ..., |F_4^1(21, 6)|, |F_1^2(0, 0)|, ..., |F_16^2(10, 2)| )^T    (24)

[0202] The dimension of x_2^f is 856.
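STEP 2 as a whole might be sketched as follows (assuming f is the
44×56 holistic image as a NumPy array indexed [y, x]; the per-level
scan bounds are a hypothetical stand-in for the scan regions of FIG.
19):

    import numpy as np

    def multiblock_amplitude_vector(f, scan_bounds):
        # f: holistic image of shape (56, 44); scan_bounds[j] gives
        # the (u_max, v_max) low-frequency scan limits for level j.
        parts = []
        for j, n in enumerate((1, 2, 4)):   # 1, 4, and 16 blocks
            bw, bh = 44 // n, 56 // n       # block size M^j x N^j
            for k in range(n * n):
                x0, y0 = (k % n) * bw, (k // n) * bh
                # Equations (20)-(23): block Fourier amplitude.
                amp = np.abs(np.fft.fft2(f[y0:y0 + bh, x0:x0 + bw]))
                u_max, v_max = scan_bounds[j]
                parts.append(amp[:v_max, :u_max].ravel())
        return np.concatenate(parts)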
[0203] STEP 3) PCLDA Projection and Vector Normalization
[0204] The Fourier Spectrum Vector x_1^f and the Multi-block Fourier
Amplitude Vector x_2^f shall be respectively projected using the
PCLDA basis matrices Ψ_1^f and Ψ_2^f, and normalized to unit vectors
y_1^f and y_2^f. The normalized vector y_k^f (k = 1, 2) is given by

y_k^f = \frac{\Psi_k^{fT} x_k^f - m_k^f}{\|\Psi_k^{fT} x_k^f - m_k^f\|}    (25)

[0205] where the PCLDA basis matrix Ψ_k^f is a basis matrix obtained
by performing linear discriminant analysis on the principal
components of x_k^f, and the mean vector m_k^f is the mean of the
projected vectors. Their values are given by referring to a look-up
table calculated in advance. The dimensions of y_1^f and y_2^f are 70
and 80, respectively.
[0206] STEP 4) LDA Projection of Joint Fourier Vector
[0207] The normalized vectors y_1^f and y_2^f are combined to form a
150-dimensional Joint Fourier Vector y_3^f and projected using the
LDA basis matrix Ψ_3^f. The projected vector z^f is given by

z^f = \Psi_3^{fT} y_3^f, where y_3^f = (y_1^{fT}, y_2^{fT})^T    (26)
[0208] STEP 5) Quantization
[0209] Each element of z^f is clipped to the range of an unsigned
5-bit integer using the following equation:

w_i^f = \begin{cases} 0 & \text{if } z_i^f < -16 \\ 31 & \text{if } z_i^f > 15 \\ \lfloor z_i^f + 16 \rfloor & \text{otherwise} \end{cases}    (27)

[0210] The quantized elements are stored as FourierFeature.
FourierFeature[0] represents the first quantized element w_0^f, and
FourierFeature[numOfFourierFeature-1] corresponds to the
(numOfFourierFeature)th element w_{numOfFourierFeature-1}^f.
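Equation (27) amounts to an offset, a floor, and a clip; a
one-function sketch (name ours):

    import numpy as np

    def quantize_5bit(z):
        # Equation (27): shift by 16, take the floor, and clip each
        # element to the unsigned 5-bit range 0..31.
        return np.clip(np.floor(z + 16.0), 0, 31).astype(np.uint8)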
[0211] CentralFourierFeature
[0212] This element represents a facial feature based on the cascaded
LDA of the Fourier characteristics of the central part of the
normalized face image. CentralFourierFeature is extracted in a
similar way to FourierFeature.
[0213] The central portion g(x, y) is obtained by clipping the image
f(x, y) to a 32×32 image starting at (7, 12), as follows:

g(x, y) = f(x+7, y+12)  (x = 0, 1, ..., 31; y = 0, 1, ..., 31)    (28)
[0214] STEP 1) Extraction of Central Fourier Spectrum Vector
[0215] The Fourier spectrum G(u, v) of g(x, y) is calculated by

G(u, v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} g(x, y) \exp\left(-2\pi i \left(\frac{xu}{M} + \frac{yv}{N}\right)\right)  (u = 0, ..., M-1; v = 0, ..., N-1)    (29)

[0216] where M = 32 and N = 32. A 256-dimensional Central Fourier
Spectrum Vector x_1^g is produced by scanning the Fourier spectrum
G(u, v) as defined in FIG. 20.
STEP 2) Extraction of Multi-block Central Fourier Amplitude Vector
[0217] A Central Multi-block Fourier Amplitude Vector x_2^g is
extracted from the Fourier amplitudes of (a) the central part
g_1^0(x, y), (b) quarter images g_k^1(x, y) (k = 1, 2, 3, 4), and (c)
one-sixteenth images g_k^2(x, y) (k = 1, 2, 3, ..., 16).
[0218] (a) central part

g_1^0(x, y) = g(x, y)  (x = 0, 1, ..., 31; y = 0, 1, ..., 31)    (30)

[0219] (b) quarter images

g_k^1(x, y) = g(x + 16 s_k^1, y + 16 t_k^1)  (x = 0, 1, ..., 15; y = 0, 1, ..., 15)    (31)

[0220] where s_k^1 = (k-1)%2 and t_k^1 = (k-1)/2 (integer division).
[0221] (c) one-sixteenth images

g_k^2(x, y) = g_1^0(x + 8 s_k^2, y + 8 t_k^2)  (x = 0, 1, ..., 7; y = 0, 1, ..., 7)    (32)

[0222] where s_k^2 = (k-1)%4 and t_k^2 = (k-1)/4 (integer division).
[0223] A Fourier amplitude |G_k^j(u, v)| of each image is calculated
as follows:

G_k^j(u, v) = \sum_{x=0}^{M^j-1} \sum_{y=0}^{N^j-1} g_k^j(x, y) \exp\left(-2\pi i \left(\frac{xu}{M^j} + \frac{yv}{N^j}\right)\right),
|G_k^j(u, v)| = \sqrt{\mathrm{Re}[G_k^j(u, v)]^2 + \mathrm{Im}[G_k^j(u, v)]^2}    (33)

[0224] where M^0 = 32, M^1 = 16, M^2 = 8, N^0 = 32, N^1 = 16, and
N^2 = 8. A Multi-block Central Fourier Amplitude Vector x_2^g is
obtained by scanning each amplitude |G_k^j(u, v)| as defined in FIG.
20.
[0225] The processing in STEPs 3 to 5 is the same as that for
FourierFeature; for example, the Joint Central Fourier Vector y_3^g
consists of the normalized vectors y_1^g and y_2^g. The basis
matrices Ψ_1^g, Ψ_2^g, and Ψ_3^g and the mean vectors m_1^g and
m_2^g for CentralFourierFeature are calculated in advance and
prepared in the form of a look-up table.
[0226] The size of CentralFourierFeature is indicated by
numOfCentralFourierFeature.
[0227] Facial feature description data obtained in this manner is
compact in description length but exhibits high recognition
performance, and hence is an efficient representation for the storage
and transmission of data.
[0228] Note that the present invention may be implemented as a
computer-executable program. In the case of the fifth embodiment, the
present invention can be implemented by describing the functions
indicated by steps 1 to 5 in FIG. 17 in a computer-readable program
and causing the program to run on a computer.
[0229] In addition, this program may be recorded on a
computer-readable recording medium.
[0230] When the example shown in FIG. 17 is to be implemented as a
device, all or some of the functions shown in the block diagram of
FIG. 21 may be implemented. More specifically, all or some of a
normalized face image output means 211, a Fourier spectrum vector
extraction means 212, a multi-block Fourier amplitude vector
extraction means 213, and a PCLDA projection/vector normalization
means 214 may be implemented.
[0231] According to each embodiment described above, a feature vector
effective for discrimination by discriminant analysis is extracted
from an input pattern feature vector for each element vector, and
feature extraction is performed again on the obtained feature vectors
by using a discriminant matrix obtained by discriminant analysis.
This makes it possible to suppress the loss of feature amounts
effective for discrimination when feature dimension reduction is
performed, and to transform a feature vector for efficient feature
extraction.
[0232] Each embodiment described above is effective for a case
wherein the number of learning samples available for discriminant
analysis is limited despite a large pattern feature amount. That is,
the number of feature dimensions can be reduced, while the loss of
features effective for identification is suppressed, without
necessarily using principal component analysis.
[0233] As has been described above, the image feature extraction
method, the image feature extraction device, and the recording
medium storing the corresponding program in the field of pattern
recognition according to the present invention are suitable for use
in a feature vector transformation technique that compresses feature
dimensions by extracting feature vectors effective for recognition
from input feature vectors.
* * * * *