U.S. patent application number 15/278461 was filed with the patent office on 2016-09-28 and published on 2017-02-02 for a method and a system for verifying facial data.
The applicant listed for this patent is Huawei Technologies Co., Ltd. Invention is credited to Chaochao Lu, Xiaoou Tang.
Application Number | 20170031953 15/278461 |
Document ID | / |
Family ID | 54193837 |
Filed Date | 2016-09-28 |
United States Patent Application | 20170031953 |
Kind Code | A1 |
Tang; Xiaoou ; et al. | February 2, 2017 |
Method and a System for Verifying Facial Data
Abstract
A method for verifying facial data and a corresponding system,
which comprises retrieving a plurality of source-domain datasets
from a first database and a target-domain dataset from a second
database different from the first database; determining a latent
subspace matching with target-domain dataset best and a posterior
distribution for the determined latent subspace from the
target-domain dataset and the source-domain datasets; determining
information shared between the target-domain data and the
source-domain datasets; and establishing a Multi-Task learning
model from the posterior distribution P and the shared information
M on the target-domain dataset and the source-domain datasets.
Inventors: | Tang; Xiaoou; (Hong Kong, HK) ; Lu; Chaochao; (Hong Kong, HK) |
Applicant: |
Name | City | State | Country | Type |
Huawei Technologies Co., Ltd. | Shenzhen | | CN | |
Family ID: | 54193837 |
Appl. No.: | 15/278461 |
Filed: | September 28, 2016 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
PCT/CN2014/000350 | Mar 28, 2014 | |
15278461 | | |
Current U.S. Class: | 1/1 |
Current CPC Class: | G06K 9/00288 20130101; G06N 7/005 20130101; G06K 9/629 20130101; G06N 20/00 20190101; G06F 16/5838 20190101; G06K 9/6218 20130101; G06K 9/6232 20130101 |
International Class: | G06F 17/30 20060101 G06F017/30; G06K 9/00 20060101 G06K009/00 |
Claims
1. A method for verifying facial data, comprising: retrieving a
plurality of source-domain datasets X.sub.i from a first database
and a target-domain dataset X.sub.t from a second database
different from the first database; determining a latent subspace
Z.sub.t matching with target-domain dataset X.sub.t best and a
posterior distribution P for the determined latent subspace Z.sub.t
from the target-domain dataset X.sub.t and the source-domain
datasets X.sub.i; determining information M shared between the
target-domain data X.sub.t and the source-domain datasets X.sub.i;
and establishing a Multi-Task learning model L.sub.model from the
posterior distribution P and the shared information M on the
target-domain dataset X.sub.t and the source-domain datasets
X.sub.i.
2. The method according to claim 1, wherein the posterior
distribution P is determined using the following rule: given
hyper-parameters .theta. to be determined for concreting form of
the model, for a data x of the source-domain datasets X.sub.i, the
latent subspace and the hyper-parameters .theta. shall match with
the data x best.
3. The method according to claim 1, wherein the shared information
M is determined by extending a mutual entropy to the posterior
distributions P.
4. The method according to claim 1, wherein the model is
established with maximization of the posterior distributions P and
the shared information M.
5. The method according to claim 4, further comprising determining
all parameters for the established model by: initializing
hyper-parameters .theta..sup.0 randomly for .theta.; applying a
gradient descent process to the model to obtain
$\partial L_{model} / \partial \theta$;
and iterating .theta. by rule of
$\theta^{t} = \theta^{t-1} - \alpha \, \partial L_{model} / \partial \theta$
so as to make .theta. converge to optimal values.
6. The method according to claim 1, further comprising determining
a plurality of the training data from a plurality of pairs of face
images.
7. The method according to claim 6, wherein determining the
plurality of the training data from each pair of face images
further comprises: obtaining a first plurality of multiple scale
features m.sub.1, m.sub.2, . . . m.sub.p from a first face A and a
second plurality of multiple scale features n.sub.1, n.sub.2 . . .
n.sub.p from a second person B in the different landmarks of two
faces of A and B; and determining similarities S.sub.1, S.sub.2, .
. . S.sub.p of each two features in same landmarks, wherein S.sub.1
refers to the similarity of m.sub.1 and n.sub.1, wherein S.sub.2 refers
to the similarity of m.sub.2 and n.sub.2, . . . , and wherein
S.sub.p refers to the similarity of m.sub.p and n.sub.p.
8. The method according to claim 7, further comprising: inputting
S.sub.1, S.sub.2, . . . S.sub.p formed as input vector x to the
model to determine whether the first multiple scale features and
the second multiple scale features are from the same individual
using the following rules: given any unseen test point x* of X
formed of S.sub.1, S.sub.2, . . . S.sub.p, a probability of its
latent function f* is
$f_* \mid X, y, x_* \sim \mathcal{N}\left( K_* K^{-1} \hat{f},\; K_{**} - K_* \tilde{K}^{-1} K_*^{T} \right)$,
wherein K is a Kernel matrix, and $\tilde{K} = K + W^{-1}$, wherein
f* is squashed to find a probability of class membership as follows:
$\pi(f_*) = \int \pi(f_*)\, p(f_* \mid X, y, x_*)\, df_*$,
wherein label y.sub.i=1
when the first multiple scale features and the second multiple
scale features of the face pair are from the same individual, and
wherein label y.sub.i=-1 when the first multiple scale features and
the second multiple scale features of the face pair are not from
the same individual.
9. The method according to claim 6, wherein determining the
training data from the pair of face images further comprises:
obtaining a first plurality of multiple scale features m.sub.1,
m.sub.2, . . . m.sub.p from a first face A and a second plurality
of multiple scale features n.sub.1, n.sub.2 . . . n.sub.p from a
second person B in the different landmarks of two faces of A and B;
and concatenating each pair of multi-scale features and its flipped
version to obtain [m.sub.1,n.sub.1] and [n.sub.1, m.sub.1] so as to
generate 2P multi-scale features of size 2L
[m.sub.1,n.sub.1],[n.sub.1,m.sub.1], . . .
,[m.sub.p,n.sub.p],[n.sub.p, m.sub.p] as input vector x to the
Multi-Task learning model.
10. The method according to claim 9, further comprising: grouping
data points of the input vector x into C different clusters,
wherein centers of the clusters are denoted by
{c.sub.i}.sub.i=1.sup.C, wherein variances of the clusters are
denoted by {.SIGMA..sub.i.sup.2}.sub.i=1.sup.C, wherein weights of
the clusters are denoted by {w.sub.i}.sub.i=1.sup.C, and wherein
w.sub.i is a ratio of the number of data points from the i-th
cluster to the number of all data points; determining a
corresponding probability p.sub.i and a variance
.sigma..sub.i.sup.2 for each c.sub.i; computing, for any un-seen
pair of face images, a joint feature vector x* for each pair of
patches; computing a first-order statistic and a second-order
statistic to the centers, wherein the statistics and variance of x*
are represented as its high-dimensional facial features, denoted by
$\hat{x}_* = [\Delta_1^1, \Delta_1^2, \Delta_1^3, \Delta_1^4, \ldots, \Delta_C^1, \Delta_C^2, \Delta_C^3, \Delta_C^4]^{T}$,
and wherein
$\Delta_i^1 = w_i \left( \frac{x_* - c_i}{\Sigma_i} \right)$, $\Delta_i^2 = w_i \left( \frac{x_* - c_i}{\Sigma_i} \right)$, $\Delta_i^3 = p_i$, and $\Delta_i^4 = \sigma_i^2$;
and concatenating the high-dimensional
features from each pair of patches to form the final new
high-dimensional feature for the pair of face images, so as to
determine whether the first and the second multiple scale features
are from the same individual.
11. An apparatus for verifying facial data, comprising: at least
one processor configured to: retrieve a plurality of source-domain
datasets X.sub.i from a first database and a target-domain dataset
X.sub.t from a second database different from the first database;
determine a latent subspace Z.sub.t matching with target-domain
dataset X.sub.t best and a posterior distribution P for the
determined latent subspace Z.sub.t from the target-domain dataset
X.sub.t and the source-domain datasets X.sub.i; determine
information M shared between the target-domain data X.sub.t and the
source-domain datasets X.sub.i; and establish a Multi-Task learning
model L.sub.model from the posterior distribution P, and the shared
information M on the target-domain dataset X.sub.t and the
source-domain datasets X.sub.i.
12. The apparatus according to claim 11, wherein the posterior
distribution P is determined using the following rule: given
hyper-parameters .theta. to be determined for concreting form of
the model, for a data x of the source-domain datasets X.sub.i, the
latent subspace and the hyper-parameters .theta. shall match with
the data x best.
13. The apparatus according to claim 11, wherein the shared
information M is determined by extending a mutual entropy to the
posterior distributions P.
14. The apparatus according to claim 11, wherein the model is
established by maximizing the posterior distributions P and the
shared information M.
15. The apparatus according to claim 14, wherein the processor is
further configured to determine all parameters for the established
model by: initializing hyper-parameters .theta..sup.0 randomly
for .theta.; applying a gradient descent process to the model to
obtain $\partial L_{model} / \partial \theta$;
and iterating .theta. by rule of
$\theta^{t} = \theta^{t-1} - \alpha \, \partial L_{model} / \partial \theta$
so as to make .theta. converge to optimal values.
16. The apparatus according to claim 11, wherein the processor is
further configured to determine a plurality of the training data
from a pair of face images.
17. The apparatus according to claim 16, wherein the processor is
further configured to determine the training data from the pair of
face images by: obtaining a first plurality of multiple scale
features m.sub.1, m.sub.2, . . . m.sub.p from a first face A and a
second plurality of multiple scale features n.sub.1, n.sub.2 . . .
n.sub.p from a second person B in the different landmarks of two
faces of A and B; and determining similarities S.sub.1, S.sub.2, .
. . S.sub.p of each two features in same landmarks, wherein S.sub.1
refers to the similarity of m.sub.1 and n.sub.1, wherein S.sub.2 refers
to the similarity of m.sub.2 and n.sub.2, . . . , and wherein
S.sub.p refers to the similarity of m.sub.p and n.sub.p.
18. The apparatus according to claim 17, wherein the processor is
further configured to input S.sub.1, S.sub.2, . . . S.sub.p formed
as input vector x to the Multi-Task learning model to determine
whether the first and the second multiple scale features are from
the same individual by the following rules: given any unseen test
point x* of X formed of S.sub.1, S.sub.2, . . . S.sub.p, a
probability of its latent function f* is
$f_* \mid X, y, x_* \sim \mathcal{N}\left( K_* K^{-1} \hat{f},\; K_{**} - K_* \tilde{K}^{-1} K_*^{T} \right)$,
wherein K is a Kernel matrix and $\tilde{K} = K + W^{-1}$, wherein
f* is squashed to find a probability of class membership as follows:
$\pi(f_*) = \int \pi(f_*)\, p(f_* \mid X, y, x_*)\, df_*$,
wherein label y.sub.i=1
when the first and the second multiple scale features of the face
pair are from the same individual, and wherein label y.sub.i=-1
when the first and the second multiple scale features of the face
pair are not from the same individual.
19. The apparatus according to claim 16, wherein the processor is
further configured to determine the training data from the pair of
face images by: obtaining a first plurality of multiple scale
features m.sub.1, m.sub.2, . . . m.sub.p from a first face A and a
second plurality of multiple scale features n.sub.1, n.sub.2 . . .
n.sub.p from a second person B in the different landmarks of two
faces of A and B; and concatenating each pair of multi-scale
features and its flipped version to obtain [m.sub.1,n.sub.1] and
[n.sub.1,m.sub.1] so as to generate 2P multi-scale features of size
2L [m.sub.1,n.sub.1],[n.sub.1,m.sub.1], . . .
,[m.sub.p,n.sub.p],[n.sub.p, m.sub.p] as input vector x to the
Multi-Task learning model.
20. The apparatus according to claim 19, wherein the processor is
further configured to: group data points of the input vector x into
C different clusters, wherein centers of the clusters are denoted
by {c.sub.i}.sub.i=1.sup.C, wherein variances of the clusters are
denoted by {.SIGMA..sub.i.sup.2}.sub.i=1.sup.C, wherein weights of
the clusters are denoted by {w.sub.i}.sub.i=1.sup.C, and wherein
w.sub.i is a ratio of the number of data points from the i-th
cluster to the number of all data points; determine a corresponding
probability p.sub.i and a variance .sigma..sub.i.sup.2 for each
c.sub.i; compute, for any un-seen pair of face images, a joint
feature vector x* for each pair of patches; compute a first-order
statistic and a second-order statistic to the centers, wherein the
statistics and variance of x* are represented as its
high-dimensional facial features, denoted by
$\hat{x}_* = [\Delta_1^1, \Delta_1^2, \Delta_1^3, \Delta_1^4, \ldots, \Delta_C^1, \Delta_C^2, \Delta_C^3, \Delta_C^4]^{T}$,
and wherein
$\Delta_i^1 = w_i \left( \frac{x_* - c_i}{\Sigma_i} \right)$, $\Delta_i^2 = w_i \left( \frac{x_* - c_i}{\Sigma_i} \right)$, $\Delta_i^3 = p_i$, and $\Delta_i^4 = \sigma_i^2$;
and concatenate the high-dimensional features from each pair of
patches to form the final new high-dimensional feature for the pair
of face images, so as to determine whether the first and the second
multiple scale features are from the same individual.
21. A system for verifying facial data, comprising: means for
retrieving a plurality of source-domain datasets X.sub.i from a
first database and a target-domain dataset X.sub.t from a second
database different from the first database; means for determining a
latent subspace Z.sub.t matching with target-domain dataset X.sub.t
best and a posterior distribution P for the determined latent
subspace Z.sub.t from the target-domain dataset X.sub.t and the
source-domain datasets X.sub.i; means for determining information M
shared between the target-domain data X.sub.t and the source-domain
datasets X.sub.i; and means for establishing a Multi-Task learning
model L.sub.model from the posterior distribution P, and the shared
information M on the target-domain dataset X.sub.t and the
source-domain datasets X.sub.i.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation of International
Application No. PCT/CN2014/000350, filed on Mar. 28, 2014, the
disclosure of which is hereby incorporated by reference in its
entirety.
TECHNICAL FIELD
[0002] The present application relates to a method for verifying
facial data and a system thereof.
BACKGROUND
[0003] Face verification, which is a task of determining whether a
pair of face images are from the same person, has been an active
research topic in computer vision for decades. It has many
important applications, including surveillance, access control,
image retrieval, and automatic log-on for personal computer or
mobile devices. However, various visual complications deteriorate
the performance of face verification. This has been shown
particularly by numerous studies on real-world face images from the
wild.
[0004] Modern face verification methods are mainly divided into two
categories: extracting low-level features, and building
classification models. Although these existing methods have made
great progress in face verification, most of them are less flexible
when dealing with complex data distributions. For the methods in
the first category, for example, low-level features are
handcrafted. Even for features learned from data, the algorithm
parameters (such as the depth of random projection tree, or the
number of centers in k-means) also need to be specified by users.
Similarly, for the methods in the second category, the
architectures of neural networks (for example, the number of
layers, the number of nodes in each layer, etc.), and the
parameters of the models (for example, the number of Gaussians, the
number of classifiers, etc.) must also be determined in advance.
Since most existing methods require some assumptions to be made
about the structures of the data, they cannot work well when the
assumptions are not valid. Moreover, due to the existence of the
assumptions, it is hard to capture the intrinsic structures of data
using these methods.
[0005] Most existing face verification methods are suitable for
handling verification tasks under the underlying assumption that
the training data and the test data are drawn from the same feature
space and follow the same distribution. When the distribution
changes, these methods may suffer a large performance drop.
However, many practical scenarios involve cross-domain data drawn
from different facial appearance distributions. It is difficult to
recollect the necessary training data and rebuild the models in new
scenarios. Moreover, there is usually not enough training data in a
specified target domain to train a sufficiently good model for
high-accuracy face verification, because the weak diversity
of source data often leads to over-fitting. In such cases, it
becomes especially important to exploit more data from multiple
source-domains to improve the performance of face verification
methods in the target-domain.
SUMMARY
[0006] To address these issues, the present application proposes a
Multi-Task Learning approach based on Discriminative Gaussian
Process Latent Variable Model (MTL-DGPLVM) for face verification.
The MTL-DGPLVM model is based on Gaussian Processes (GPs), which are a
non-parametric Bayesian kernel method.
[0007] The present application uses the GP method mainly due to at
least one of the following three notable advantages. Firstly, it is
a non-parametric method, which means it is flexible and can cover
complex data distributions in the real world. Secondly, the GP method
can be computed efficiently because its marginal probability is a
closed-form expression. Furthermore, its hyper-parameters can be
learned from data automatically without using model selection
methods such as cross validation, thereby avoiding the high
computational cost. Thirdly, the inference of GPs is based on
Bayesian rules, resulting in robustness to over-fitting.
[0008] According to one embodiment of the present application, the
discriminative information constraint is used to enhance the
discriminability of GPs. Considering that GPs depend on the
covariance function, it is logical to adopt Kernel Fisher
Discriminant Analysis (KFDA) as the discriminative regularizer. In
order to take advantage of more data from multiple source-domains
to improve the performance in the target-domain, the present
application also introduces the multi-task learning constraint to
GPs. Here, it investigates the asymmetric multi-task learning
because the present application only focuses on the performance
improvement of the target task. From the perspective of information
theory, this constraint is to maximize the mutual information
between the distributions of target-domain data and multiple
source-domains data. The MTL-DGPLVM model can be optimized
effectively using the gradient descent method.
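By way of a non-limiting illustration, the gradient descent optimization mentioned above may be sketched as follows. The sketch assumes that the gradient of the model objective with respect to the hyper-parameters .theta. is available as a function. The names gradient_descent, grad_fn, alpha, n_iters and tol, and the quadratic toy objective that stands in for the actual model objective, are hypothetical and are not taken from the present application.

```python
import numpy as np

def gradient_descent(grad_fn, theta0, alpha=0.1, n_iters=200, tol=1e-8):
    """Iterate theta_t = theta_{t-1} - alpha * dL/dtheta until the step is tiny."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iters):
        step = alpha * grad_fn(theta)
        theta = theta - step
        if np.linalg.norm(step) < tol:
            break
    return theta

# Hypothetical stand-in for the model objective: L(theta) = ||theta - target||^2,
# whose gradient is 2 * (theta - target).
target = np.array([1.0, -2.0])

def toy_grad(theta):
    return 2.0 * (theta - target)

# Random initialization of theta^0, then iteration toward the optimum.
rng = np.random.default_rng(0)
theta_opt = gradient_descent(toy_grad, rng.normal(size=2))
```

In this toy setting the iterate approaches `target`; for the actual model the step size alpha and the stopping rule would have to be chosen for the objective at hand.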
[0009] The proposed MTL-DGPLVM model can be applied to face
verification in two different ways: as a binary classifier and as a
feature extractor. In the first way, given a pair of face images,
it directly computes the posterior likelihood for each class to
make a prediction. In the second way, it automatically extracts
high-dimensional features for each pair of face images, and then
feeds them to a classifier to make the final decision.
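As a hedged illustration of the binary classifier usage, the prediction rule recited in claim 8 may be sketched as follows. The sketch assumes that the kernel matrix K, the latent values at the training points and the diagonal matrix W are already available from training. It further assumes a logistic sigmoid as the squashing function and Gauss-Hermite quadrature for the integral, neither of which is specified in the present application. The function name predict_same_person, the helper rbf and the toy data are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def predict_same_person(K, k_star, k_star_star, f_hat, W, n_nodes=32):
    """Approximate the averaged class probability for one unseen test point.

    K           : (N, N) kernel matrix of the training inputs
    k_star      : (N,) kernel values between the test point and training inputs
    k_star_star : scalar kernel value of the test point with itself
    f_hat       : (N,) latent function values at the training inputs
    W           : (N, N) positive diagonal matrix (assumed given)
    """
    # Mean of the latent function: K_* K^{-1} f_hat
    mean = k_star @ np.linalg.solve(K, f_hat)
    # Variance: K_** - K_* (K + W^{-1})^{-1} K_*^T
    K_tilde = K + np.linalg.inv(W)
    var = k_star_star - k_star @ np.linalg.solve(K_tilde, k_star)
    var = max(float(var), 0.0)  # guard against tiny negative round-off

    # Assumption: logistic sigmoid squashing, integrated against
    # N(mean, var) with Gauss-Hermite quadrature.
    nodes, weights = hermgauss(n_nodes)
    f_values = mean + np.sqrt(2.0 * var) * nodes
    return float(weights @ (1.0 / (1.0 + np.exp(-f_values))) / np.sqrt(np.pi))

def rbf(a, b):
    # Squared-exponential kernel with unit length scale (assumed).
    return np.exp(-0.5 * np.sum((a - b) ** 2))

# Toy data (hypothetical): five training similarity vectors, one test vector.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(5, 3))
x_test = rng.normal(size=3)

K = np.array([[rbf(a, b) for b in X_train] for a in X_train]) + 1e-6 * np.eye(5)
k_star = np.array([rbf(x_test, a) for a in X_train])
W = 0.25 * np.eye(5)

prob_neutral = predict_same_person(K, k_star, 1.0, np.zeros(5), W)
prob_positive = predict_same_person(K, k_star, 1.0, K @ np.ones(5), W)
```

A returned probability above 0.5 would correspond to the label y.sub.i=1 (same individual) under the assumed sigmoid; the threshold itself is a design choice not fixed by the text.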
[0010] In one aspect, there is disclosed a method for verifying
facial data, comprising a step of retrieving a plurality of
source-domain datasets from a first database and a target-domain
dataset from a second database different from the first database, a
step of determining a latent subspace matching with target-domain
dataset best, and a posterior distribution for the determined
latent subspace from the target-domain dataset and the
source-domain datasets; a step of determining information shared
between the target-domain data and the source-domain datasets; and
a step of establishing a Multi-Task learning model from the
posterior distribution P, and the shared information M on the
target-domain dataset and the source-domain datasets.
[0011] In another aspect of the present application, there is
disclosed an apparatus for verifying facial data, comprising a
model establishing module, wherein the model establishing module
comprises a retrieve unit configured to retrieve a plurality of
source-domain datasets from a first database and a target-domain
dataset from a second database different from the first database
and a model establisher configured to determine a latent subspace
matching with target-domain dataset best, and a posterior
distribution for the determined latent subspace from the
target-domain dataset and the source-domain datasets; determine
information shared between the target-domain data and the
source-domain datasets; and establish a Multi-Task learning model
from the posterior distribution, and the shared information on the
target-domain dataset and the source-domain datasets.
[0012] In a further aspect of the present application, the
application further proposes a system for verifying facial data,
comprising: [0013] means for retrieving a plurality of
source-domain datasets from a first database and a target-domain
dataset from a second database different from the first database;
[0014] means for determining a latent subspace matching with
target-domain dataset best, and a posterior distribution for the
determined latent subspace from the target-domain dataset and the
source-domain datasets; [0015] means for determining information
shared between the target-domain data and the source-domain
datasets; and [0016] means for establishing a Multi-Task learning
model from the posterior distribution, and the shared information
on the target-domain dataset and the source-domain datasets.
BRIEF DESCRIPTION OF THE DRAWINGS
[0017] Exemplary non-limiting embodiments of the present disclosure
are described below with reference to the attached drawings. The
drawings are illustrative and generally not to an exact scale. The
same or similar elements on different figures are referenced with
the same reference numbers.
[0018] FIG. 1 is a schematic diagram illustrating an apparatus for
verifying facial data consistent with some disclosed
embodiments.
[0019] FIG. 2 is a schematic diagram illustrating an apparatus for
verifying facial data when it is implemented in software,
consistent with some disclosed embodiments.
[0020] FIG. 3 is a schematic diagram illustrating a method for
verifying facial data, consistent with a first disclosed
embodiment.
[0021] FIG. 4 is a schematic diagram illustrating a method for
verifying facial data, consistent with a second disclosed
embodiment.
[0022] FIG. 5A and FIG. 5B are schematic diagrams illustrating a
scenario for the first and the second embodiments as shown in FIG.
3 and FIG. 4, respectively.
DETAILED DESCRIPTION
[0023] Reference will now be made in detail to exemplary
embodiments, examples of which are illustrated in the accompanying
drawings. When appropriate, the same reference numbers are used
throughout the drawings to refer to the same or like parts. FIG. 1
is a schematic diagram illustrating an exemplary apparatus 1000 for
verifying facial data consistent with some disclosed
embodiments.
[0024] It shall be appreciated that the apparatus 1000 may be
implemented using certain hardware, software, or a combination
thereof. In addition, the embodiments of the present disclosure may
be adapted to a computer program product embodied on one or more
computer readable storage media (comprising but not limited to disk
storage, compact disc read only memory (CD-ROM), optical memory and
the like) containing computer program codes.
[0025] In the case that the apparatus 1000 is implemented with
software, the apparatus 1000 may include a general purpose
computer, a computer cluster, a mainstream computer, a computing
device dedicated for providing online contents, or a computer
network comprising a group of computers operating in a centralized
or distributed fashion. As shown in FIG. 2, the apparatus 1000 may
include one or more processors (processors 102, 104, 106 etc.), a
memory 112, a storage device 116, a communication interface, and a
bus to facilitate information exchange among various components of
apparatus 1000. Processors 102-106 may include a central processing
unit (CPU), a graphic processing unit (GPU), or other suitable
information processing devices. Depending on the type of hardware
being used, processors 102-106 can include one or more printed
circuit boards, and/or one or more microprocessor chips. Processors
102-106 can execute sequences of computer program instructions to
perform various methods that will be explained in greater detail
below.
[0026] Memory 112 can include, among other things, a random access
memory (RAM) and a read-only memory (ROM). Computer program
instructions can be stored, accessed, and read from memory 112 for
execution by one or more of processors 102-106. For example, memory
112 may store one or more software applications. Further, memory
112 may store an entire software application or only a part of a
software application that is executable by one or more of
processors 102-106. It is noted that although only one block is
shown in FIG. 2, memory 112 may include multiple physical devices
installed on a central computing device or on different computing
devices.
[0027] Referring to FIG. 1 again, where the apparatus 1000 is
implemented by the hardware, it may comprise a device 1001 for
establishing a Multi-Task learning model of facial data. The device
1001 comprises a retrieve unit 1011 configured to retrieve a
plurality of source-domain datasets X.sub.i from a first database
and a target-domain dataset X.sub.t from a second database
different from the first database. In the prior art, most existing
face verification methods are suitable for handling verification
tasks under the underlying assumption that the training data and
the test data are drawn from the same feature space and follow the
same distribution. When the distribution changes, these methods may
suffer a large performance drop. However, many practical scenarios
involve cross-domain data drawn from different facial appearance
distributions. It is difficult to recollect the necessary training
data and rebuild the models in new scenarios. Moreover, there is
usually not enough training data in a specified target domain to
train a sufficiently good model for high-accuracy face
verification, because the weak diversity of source data
often leads to over-fitting. In such cases, it becomes especially
important to exploit more data from multiple source-domains to
improve the performance of face verification methods in the
target-domain. Accordingly, besides the target-domain dataset as
used in the prior art, the present application utilizes at least
four source-domain datasets. For example, for each one of the four
source-domain datasets, a training set consisting of around
20,000 (for example) pairs of matched images and around 20,000 (for
example) pairs of mismatched images may be collected.
[0028] In one embodiment of the present application, the
source-domain datasets may include different types of datasets,
which may comprise (for example):
[0029] Web Images. This dataset contains around 40,000 facial images
from 3261 subjects; that is, approximately 10 images for each
person. The images were collected from the Web with significant
variations in pose, expression, and illumination conditions.
[0030] Multi-PIE. This dataset contains face images from 337
subjects under 15 view points and 19 illumination conditions in
four recording sessions. These images are collected under
controlled conditions.
[0031] YouTube® Faces. This dataset contains 3425 videos of 1595
different subjects in the unconstrained environment. All the videos
were downloaded from YouTube. Each subject has a large-scale
collection of labeled low-resolution images.
[0032] LifePhotos. This dataset contains approximately 5000
images of 400 subjects collected online. Each subject has roughly
10 images.
[0033] The target-domain dataset may comprise, for example, the
face verification benchmark Labeled Faces in the Wild (LFW) as
disclosed in the prior art. This dataset contains 13,233
uncontrolled face images of 5749 public figures with a variety of
ethnicity, gender, age, etc. All of these images are collected from
the Web. In the present application, the LFW dataset may be used as
the target-domain dataset because it is a challenging benchmark
for comparison with other existing face verification methods.
[0034] The device 1001 further comprises a model establisher 1012
configured to establish a Multi-Task learning model of facial data
based on the source-domain datasets X.sub.i and the
target-domain dataset X.sub.t retrieved by the retrieve unit 1011.
In particular, the model establisher 1012 is configured to
determine a latent subspace Z matching with target-domain dataset
X.sub.t best, and a posterior distribution P for the target-domain
dataset X.sub.t from the determined latent subspace Z and the
source-domain datasets X.sub.i. The model establisher 1012 is
configured to further determine information M shared between the
target-domain data X.sub.t and the source-domain datasets X.sub.i,
and then establish a Multi-Task learning model L.sub.model from the
posterior distribution P, the shared information M and the
target-domain dataset X.sub.t, which will be discussed in detail
with reference to the process as disclosed in another aspect of the
present application later.
[0035] As shown in FIG. 1, the apparatus 1000 may further comprise
a verification device 1002. The verification device 1002 is
configured to obtain a first plurality of multiple scale features
m.sub.1, m.sub.2, . . . m.sub.p from a first face A and a second
plurality of multiple scale features n.sub.1, n.sub.2 . . . n.sub.p
from a second person B in the different landmarks of two faces of A
and B. The similarities S.sub.1, S.sub.2, . . . S.sub.p of
each two features in the same landmarks are then computed by
conventional means. That is, S.sub.1 refers to the similarity of
m.sub.1 and n.sub.1, S.sub.2 refers to the similarity of m.sub.2 and
n.sub.2 . . . and S.sub.p refers to the similarity of m.sub.p and
n.sub.p. The S.sub.1, S.sub.2, . . . S.sub.p are formed as input
vector x to the Multi-Task learning model to determine whether the
first and the second multiple scale features are from the same
individual, which will be discussed in detail later.
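The construction of the input vector described above may be sketched as follows, purely as an illustration. The present application does not specify which similarity measure is meant by conventional means; cosine similarity is assumed here. The function name similarity_vector, the parameter eps and the random toy features are hypothetical.

```python
import numpy as np

def similarity_vector(feats_a, feats_b, eps=1e-12):
    """Return [S_1, ..., S_p], where S_k compares m_k and n_k at the same landmark.

    feats_a, feats_b : (P, L) arrays holding P landmark features of length L
                       for face A and face B, respectively.
    """
    feats_a = np.asarray(feats_a, dtype=float)
    feats_b = np.asarray(feats_b, dtype=float)
    if feats_a.shape != feats_b.shape:
        raise ValueError("both faces need the same number and size of features")
    # Assumption: cosine similarity per landmark.
    dots = np.sum(feats_a * feats_b, axis=1)
    norms = np.linalg.norm(feats_a, axis=1) * np.linalg.norm(feats_b, axis=1)
    return dots / np.maximum(norms, eps)

# Toy features (hypothetical): P = 4 landmarks, each feature of length L = 8.
rng = np.random.default_rng(0)
m = rng.normal(size=(4, 8))  # features m_1..m_p of face A
n = rng.normal(size=(4, 8))  # features n_1..n_p of face B
x = similarity_vector(m, n)  # input vector x for the learning model
```

Each entry of `x` pairs features from the same landmark only, which is the pairing the paragraph above describes.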
[0036] FIG. 3 shows a flowchart illustrating a method for verifying
facial data consistent with some disclosed embodiments. In FIG. 3,
process 100 comprises a series of steps that may be performed by
one or more of processors 102-106 or each module/unit of the
apparatus 1000 to implement a data processing operation. For
purpose of description, the following discussion is made in
reference to the situation where each module/unit of the apparatus
1000 is made in hardware or the combination of hardware and
software.
[0037] At step S101, the apparatus 1000 operates to retrieve a
plurality of source-domain datasets X.sub.1, X.sub.2 . . . X.sub.S
from a first database and a target-domain dataset X.sub.t from a
second database by the retrieve unit 1011. For example, the
source-domain datasets include different types of datasets, which
have been discussed above.
[0038] At step S102, the apparatus 1000 operates to determine a
latent subspace Z, which matches with X.sub.t best, based on a
conventional Gaussian Process Latent Variable Model (GPLVM) by the
model establisher 1012. Then, in this step, it determines a
posterior distribution P for the target data X.sub.t from the
latent subspace Z and the source-domain datasets X.sub.i, which is
discussed in detail below.
[0039] To be specific, let Z=[z.sub.1, . . . z.sub.N].sup.T denote
the matrix whose rows represent corresponding positions of X in
latent space, where z.sub.i .di-elect cons. R.sup.d (d<<D). The
GPLVM can be interpreted as a Gaussian process mapping from a low
dimensional latent space to a high dimensional data set, where the
location of the points in latent space is determined by maximizing
the Gaussian process likelihood with respect to Z. Given a
covariance function for the Gaussian process, denoted by k(.,.),
the likelihood of the data given the latent positions is as
follows,
$$p(X \mid Z, \theta) = \frac{1}{\sqrt{(2\pi)^{ND} |K|^{D}}} \exp\left(-\frac{1}{2} \operatorname{tr}\left(K^{-1} X X^{T}\right)\right), \qquad (1)$$
where K.sub.i,j=k(z.sub.i, z.sub.j). Therefore, the posterior P can
be written as
$$p(Z, \theta \mid X) = \frac{1}{Z_a}\, p(X \mid Z, \theta)\, p(Z)\, p(\theta), \qquad (2)$$
where Z.sub.a is a normalization constant. The uninformative priors
over .theta. and the simple spherical Gaussian priors over Z are
known in the art, and thus their detailed description is omitted
herein. To obtain the optimal .theta. and Z, the above expression
is optimized with respect to .theta. and Z, respectively. In other
words, given the data X, this formula describes the posterior of
the latent positions Z and the hyper-parameters .theta., so that
maximizing it yields the latent positions Z and the
hyper-parameters .theta. that match the data X best.
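As a minimal sketch of how the logarithm of the likelihood in formula (1) may be evaluated, the following assumes a squared-exponential covariance function with a small diagonal jitter for numerical stability. The kernel choice, the jitter value and the function names are assumptions for illustration and are not taken from the present application.

```python
import numpy as np

def rbf_kernel(Z, variance=1.0, lengthscale=1.0, jitter=1e-6):
    """Assumed covariance k(z_i, z_j); jitter keeps K well conditioned."""
    sq_norms = np.sum(Z ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * Z @ Z.T
    K = variance * np.exp(-0.5 * sq_dists / lengthscale ** 2)
    return K + jitter * np.eye(len(Z))

def gplvm_log_likelihood(X, Z):
    """log p(X | Z, theta) from formula (1).

    X: (N, D) data matrix, Z: (N, d) latent positions.
    """
    N, D = X.shape
    K = rbf_kernel(Z)
    _, logdet = np.linalg.slogdet(K)
    # tr(K^{-1} X X^T) via a linear solve instead of an explicit inverse.
    trace_term = np.trace(np.linalg.solve(K, X @ X.T))
    return -0.5 * N * D * np.log(2.0 * np.pi) - 0.5 * D * logdet - 0.5 * trace_term

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(5, 3))   # N = 5 points, D = 3 dimensions
Z_demo = rng.normal(size=(5, 2))   # d = 2 latent dimensions
ll_demo = gplvm_log_likelihood(X_demo, Z_demo)
```

In a GPLVM, this quantity (plus the log-priors of formula (2)) would be maximized with respect to Z and the hyper-parameters; the optimizer itself is omitted here.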
[0040] At step S103, the model establisher 1012 determines
information M shared between the target data X.sub.t and the source
data X.sub.i by extending a mutual entropy to the posterior
distributions P obtained in step S102.
[0041] From an asymmetric multi-task learning perspective, the
tasks should be allowed to share common hyper-parameters of the
covariance function. Moreover, from an information theory
perspective, the information cost between target task and multiple
source tasks should be minimized. A natural way to quantify the
information cost is to use the mutual entropy, because it is the
measure of the mutual dependence of two distributions. For
multi-task learning, we extend the mutual entropy to multiple
distributions as follows:
$$M = H(p_t) - \frac{1}{S} \sum_{i=1}^{S} H(p_t \mid p_i), \qquad (3)$$
where H(.) is the marginal entropy, H(.|.) is the conditional
entropy, S is the number of source tasks, and
{p.sub.i}.sub.i=1.sup.S and p.sub.t are the probability
distributions of the source tasks and the target task,
respectively. p.sub.t may be formulated as
$$p(Z) = \frac{1}{Z_b} \exp\left(-\frac{1}{\sigma^{2}} J^{*}\right), \qquad (4)$$
where Z.sub.b is a normalization constant, and .sigma..sup.2
represents a global scaling of the prior.
[0042] At step S104, the model establisher 1012 is configured to
establish a Multi-Task learning model L.sub.model from the posterior
distribution P, the shared information M and the target-domain
dataset X.sub.t by rule of
L.sub.model=-log p(Z.sub.T, .theta.|X.sub.T)-.beta.M, (5)
where the parameter .beta. is preset to balance the relative
importance between the target-domain data X.sub.t and the
multi-task learning constraint M (i.e. the shared information).
[0043] P represents a posterior distribution for the target data
X.sub.t as set forth in formula (2);
[0044] Z.sub.t represents the latent subspace for X.sub.t, which
matches X.sub.t best, as discussed in step S102;
[0045] M represents the information shared between the target data
X.sub.t and the source data X.sub.i, obtained by extending a mutual
entropy to multiple distributions P as discussed in formula (3).
[0046] The Multi-Task learning model consists of two terms:
p(Z.sub.T, .theta.|X.sub.T) and M. The first term seeks the
optimal hyper-parameters and latent subspace in the given
target-domain dataset. So p(Z.sub.T, .theta.|X.sub.T) should be
maximized, which means that the latent positions Z.sub.T and the
hyper-parameters .theta. matching the data X.sub.T best should be
obtained. Because the logarithm is monotonic, maximizing
p(Z.sub.T, .theta.|X.sub.T) is equivalent to maximizing
log p(Z.sub.T, .theta.|X.sub.T). The second term is the multi-task
learning constraint, which describes how much information is shared
between the target-domain dataset and the source-domain datasets in
the latent subspaces. Therefore, the second term should also be
maximized. For convenience, the maximization of the two terms is
expressed as the minimization of their negative forms, as in
formula (5).
[0047] Substituting formula (2) into formula (3) gives
$$M = H\big(p(Z_T, \theta \mid X_T)\big) - \frac{1}{S} \sum_{i=1}^{S} H\big(p(Z_T, \theta \mid X_T) \mid p(Z_i, \theta \mid X_i)\big), \qquad (6)$$
where H(.) is the marginal entropy, H(.|.) is the conditional
entropy, S is the number of source-domain datasets, and
{p(Z.sub.i)}.sub.i=1.sup.S and p(Z.sub.T) are the prior
distributions of the source-domain datasets and the target-domain
dataset, respectively. The prior takes the form
$$p(Z) = \frac{1}{Z_b} \exp\left(-\frac{1}{\sigma^{2}} J^{*}\right), \qquad (7)$$
where Z.sub.b is a normalization constant, and .sigma..sup.2
represents a global scaling of the prior.
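$$J^{*} = \frac{1}{\lambda}\left(a^{T} K a - a^{T} K A \left(\lambda I_n + A K A\right)^{-1} A K a\right),$$
where
$$a = \left[\frac{1}{N_+} \mathbf{1}_{N_+}^{T},\; -\frac{1}{N_-} \mathbf{1}_{N_-}^{T}\right], \quad A = \operatorname{diag}\left(\frac{1}{\sqrt{N_+}}\left(I_{N_+} - \frac{1}{N_+} \mathbf{1}_{N_+} \mathbf{1}_{N_+}^{T}\right),\; \frac{1}{\sqrt{N_-}}\left(I_{N_-} - \frac{1}{N_-} \mathbf{1}_{N_-} \mathbf{1}_{N_-}^{T}\right)\right). \qquad (8)$$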
Here, I.sub.N denotes the N.times.N identity matrix and 1.sub.N
denotes the length-N vector of all ones in R.sup.N.
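The quantity J* of formula (8) may be evaluated, for illustration only, as sketched below. The sketch assumes that the first N.sub.+ rows of K belong to the positive class, that the centering blocks of A carry a 1/sqrt(N) scaling as in the standard KFDA formulation, and that .lamda. is a small positive regularizer; the function name `kfda_j_star` is hypothetical.

```python
import numpy as np

def kfda_j_star(K, n_pos, lam=1e-3):
    """Evaluate J* of formula (8) for a kernel matrix K.

    K: (n, n) symmetric kernel matrix, positive samples first.
    n_pos: number of positive samples N_+.
    lam: regularization parameter lambda (assumed value).
    """
    n = K.shape[0]
    n_neg = n - n_pos
    # a = [1/N_+ ... 1/N_+, -1/N_- ... -1/N_-]
    a = np.concatenate([np.full(n_pos, 1.0 / n_pos),
                        np.full(n_neg, -1.0 / n_neg)])

    def centering_block(m):
        # (1/sqrt(m)) * (I_m - (1/m) 1 1^T); the sqrt scaling is assumed.
        return (np.eye(m) - np.ones((m, m)) / m) / np.sqrt(m)

    A = np.zeros((n, n))
    A[:n_pos, :n_pos] = centering_block(n_pos)
    A[n_pos:, n_pos:] = centering_block(n_neg)

    Ka = K @ a
    # Solve (lambda I + A K A) v = A K a rather than forming the inverse.
    v = np.linalg.solve(lam * np.eye(n) + A @ K @ A, A @ Ka)
    return float((a @ Ka - Ka @ A @ v) / lam)

# Linear kernel on 1-D points: two well-separated classes vs. overlapping ones.
Z_sep = np.array([[2.0], [2.1], [-2.0], [-2.1]])
Z_mix = np.array([[0.0], [0.1], [0.0], [-0.1]])
j_sep = kfda_j_star(Z_sep @ Z_sep.T, n_pos=2, lam=0.1)
j_mix = kfda_j_star(Z_mix @ Z_mix.T, n_pos=2, lam=0.1)
```

Under these assumptions, J* grows as the two classes become better separated in the kernel space, so the prior of formula (7) depends on the class structure of the latent positions through J*.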
[0048] Since face verification is a binary classification problem
and the GPs mainly depend on the kernel function, it is natural to
use KFDA to model class structures in kernel spaces. For simplicity
of inference in the following, we introduce another equivalent
formulation of KFDA to replace the one in the prior art. KFDA is a
kernelized version of the linear discriminant analysis method. It
finds the direction defined by a kernel in a feature space, onto
which the projections of the positive and negative classes are well
separated by maximizing the ratio of the between-class variance to
the within-class variance. Formally, let {z.sub.1, . . . ,
z.sub.N.sub.+} denote the positive class and {z.sub.N.sub.+.sub.+1,
. . . , z.sub.N} the negative class, where the numbers of positive
and negative samples are N.sub.+ and N.sub.-=N-N.sub.+,
respectively. Let K be the kernel matrix. Therefore, in the feature
space, the two sets {.phi..sub.K(z.sub.1), . . . ,
.phi..sub.K(z.sub.N.sub.+)} and {.phi..sub.K(z.sub.N.sub.+.sub.+1),
. . . , .phi..sub.K(z.sub.N)} represent the positive class and the
negative class, respectively.
[0049] From the above, to give the Multi-Task learning model a
concrete form, only the kernel matrix K needs to be determined,
which in turn depends on .theta.. At S104, the hyper-parameters
.theta..sup.0 are randomly initialized first, and then the gradient
descent method is applied to the Multi-Task learning model as shown
in formula (5) by rule of
$$\frac{\partial L_{model}}{\partial \theta_j} = \left(\beta\left(\log P_T + 1\right) + \frac{\beta}{S P_T} \sum_{i=1}^{S} P_{T,i} - \frac{1}{P_T}\right) \frac{\partial P_T}{\partial \theta_j} + \frac{\beta}{S} \sum_{i=1}^{S} \left(\log P_T - \log P_{T,i} - 1\right) \frac{\partial P_{T,i}}{\partial \theta_j}. \qquad (9)$$
Then, an iteration process is applied to
$$\frac{\partial L_{model}}{\partial \theta}$$
by rule of
$$\theta^{t} = \theta^{t-1} - \alpha \frac{\partial L_{model}}{\partial \theta}. \qquad (10)$$
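The update rule of formula (10) may be sketched as follows. Because the gradient of the Multi-Task learning model depends on quantities not reproduced here, a simple quadratic objective is used as a stand-in; the step size `alpha`, the iteration count and the function name are assumed values for illustration only.

```python
import numpy as np

def gradient_descent(grad, theta0, alpha=0.1, n_iters=200):
    """Iterate theta^t = theta^{t-1} - alpha * grad(theta^{t-1})."""
    theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(n_iters):
        theta = theta - alpha * grad(theta)
    return theta

# Stand-in objective ||theta - target||^2, whose gradient is 2 (theta - target).
target = np.array([1.0, -2.0])
theta_star = gradient_descent(lambda th: 2.0 * (th - target), theta0=np.zeros(2))
```

In the described method, `grad` would instead return the derivative of L.sub.model with respect to each hyper-parameter, and the starting point would be the randomly initialized .theta..sup.0.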
[0050] In the t-th iteration, all source-domain datasets
X.sub.1, X.sub.2, . . . X.sub.S and the target-domain dataset
X.sub.t obtained above are used to obtain .theta..sup.t using the
gradient descent method according to Equation (9). By running the
iteration many times, .theta. will converge to some optimal values
.theta.*. The optimal .theta.* is used to determine K by rule of
$$K_{i,j} = k_{\theta}(x_i, x_j) = \theta_0 \exp\left(-\frac{1}{2} \sum_{m=1}^{d} \theta_m \left(x_i^{m} - x_j^{m}\right)^{2}\right) + \theta_{d+1} + \frac{\delta_{x_i, x_j}}{\theta_{d+2}}, \qquad (11)$$
where .theta.={.theta..sub.i}.sub.i=0.sup.d+2 and d is the
dimension of the data point.
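For illustration, the kernel matrix of formula (11) may be assembled as sketched below. The sketch assumes that the data points are distinct, so that the Kronecker delta term reduces to the identity matrix; the function name `kernel_matrix` and the example values of .theta. are hypothetical.

```python
import numpy as np

def kernel_matrix(X, theta):
    """Build K with K[i, j] = k_theta(x_i, x_j) as in formula (11).

    X: (n, d) data points, assumed distinct.
    theta: length d + 3 vector [theta_0, theta_1..theta_d, theta_{d+1}, theta_{d+2}].
    """
    X = np.asarray(X, dtype=float)
    theta = np.asarray(theta, dtype=float)
    n, d = X.shape
    diff = X[:, None, :] - X[None, :, :]                    # (n, n, d)
    weighted_sq = np.sum(theta[1:d + 1] * diff ** 2, axis=2)
    K = theta[0] * np.exp(-0.5 * weighted_sq) + theta[d + 1]
    # delta_{x_i, x_j} / theta_{d+2}: added on the diagonal only.
    K = K + np.eye(n) / theta[d + 2]
    return K

X_demo = np.array([[0.0, 0.0], [1.0, 0.0]])
theta_demo = np.array([1.0, 1.0, 1.0, 0.5, 10.0])
K_demo = kernel_matrix(X_demo, theta_demo)
```

The per-dimension weights .theta..sub.1 . . . .theta..sub.d act as inverse squared length scales, the bias .theta..sub.d+1 shifts all entries, and 1/.theta..sub.d+2 acts as a noise term on the diagonal.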
[0051] Once K is determined, the Multi-Task learning model
(based on the Discriminative Gaussian Process Latent Variable
Model) takes a concrete form.
[0052] At step S105, the apparatus 1000 operates to control the
verification device 1002 to verify the two faces based on the
concrete model established above. Firstly, the apparatus 1000
operates to determine a plurality of training data by running the
retrieve unit 1011. Specifically, assuming that there are S
source-domain datasets, and each dataset contains N matched pairs
and N mismatched pairs, P multi-scale features are extracted from
each image. Herein, the P multi-scale features are the features
obtained from patches of different sizes in the facial image. As
shown in FIG. 1, for each pair of face images, the similarity score
is computed between one multi-scale feature of one image and its
corresponding multi-scale feature of the other image. As each pair
of face images has P pairs of multi-scale features, a similarity
vector of size P can be obtained for each pair of face images. The
similarity vector is regarded as an input x.sub.i, so each pair of
face images can be converted to a similarity vector as an input in
the above way. Therefore, each source-domain dataset X.sub.i
consists of N positive inputs and N negative inputs.
[0053] For example, as shown in FIG. 5A, a first plurality of
multiple scale features m.sub.1, m.sub.2, . . . m.sub.p are
obtained from a first face A and a second plurality of multiple
scale features n.sub.1, n.sub.2, . . . n.sub.p are obtained from a
second face B at the different landmarks of the two faces A and B.
The similarities S.sub.1, S.sub.2, . . . S.sub.p of each two
features at the same landmark are then computed by conventional
means. That is, S.sub.1 refers to the similarity of m.sub.1 and
n.sub.1, S.sub.2 refers to the similarity of m.sub.2 and n.sub.2,
. . . and S.sub.p refers to the similarity of m.sub.p and n.sub.p.
S.sub.1, S.sub.2, . . . S.sub.p are formed into an input vector x
to the Multi-Task learning model to determine whether the first and
the second multiple scale features are from the same individual by
the following rules at step S106.
[0054] Given any unseen test point x* formed of S.sub.1,
S.sub.2, . . . S.sub.p, the distribution of its latent function f*
is
$$f_{*} \mid X, y, x_{*} \sim \mathcal{N}\left(K_{*} K^{-1} \hat{f},\; K_{**} - K_{*} \tilde{K}^{-1} K_{*}^{T}\right), \qquad (11)$$
where {tilde over (K)}=K+W.sup.-1.
[0055] That is, given the training data X, the labels of the
training data y, and an un-seen test input x*, its corresponding
latent function f* follows the Gaussian distribution with the mean
K*K.sup.-1{circumflex over (f)} and the covariance
K**-K*{tilde over (K)}.sup.-1K*.sup.T.
[0056] Finally, the present application squashes f* to find the
probability of class membership as follows:
$$\bar{\pi}(f_{*}) = \int \pi(f_{*})\, p(f_{*} \mid X, y, x_{*})\, df_{*}.$$
[0057] In other words, given the training data X, the labels of the
training data y, an un-seen test input x*, and the distribution of
its corresponding latent function f*, all possibilities over the
latent function f* of the un-seen test input x* are integrated to
predict its label. If the first and the second multiple scale
features of the face pair are from the same individual, then the
corresponding label is y.sub.i=1; otherwise, y.sub.i=-1.
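The integral over the latent function f* may be approximated numerically, for example as sketched below. The sketch assumes that the squashing function .pi.(.) is the logistic sigmoid and that the Gaussian mean and variance of f* have already been obtained; Gauss-Hermite quadrature is one possible choice of integration rule, not one stated in the present application, and the function name is hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def averaged_class_probability(mean, var, n_nodes=32):
    """Approximate the integral of sigmoid(f) * N(f | mean, var) over f.

    Uses Gauss-Hermite quadrature with the substitution
    f = mean + sqrt(2 * var) * t, where t are the quadrature nodes.
    """
    nodes, weights = hermgauss(n_nodes)
    f = mean + np.sqrt(2.0 * var) * nodes
    sigmoid = 1.0 / (1.0 + np.exp(-f))
    return float(np.sum(weights * sigmoid) / np.sqrt(np.pi))

p_uncertain = averaged_class_probability(mean=0.0, var=1.0)
p_match = averaged_class_probability(mean=3.0, var=0.5)
# Assumed decision rule: label +1 (same individual) if probability exceeds 0.5.
label = 1 if p_match > 0.5 else -1
```

A larger predictive variance pulls the averaged probability toward 0.5, which reflects lower confidence in the predicted label.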
[0058] Hereinafter, as shown in FIG. 4, a process 200 for verifying
facial data consistent with another embodiment of the present
application will be discussed. Similar to process 100, the process
200 also comprises a series of steps that may be performed by one
or more of processors 102-106 of the apparatus 1000 to implement a
data processing operation.
[0059] The process 200 comprises a step S201 of retrieving a
plurality of source-domain datasets X.sub.1, X.sub.2 . . . X.sub.S
from a first database and a target-domain dataset X.sub.t from a
second database by running the retrieve unit 1011; and a step S202
of determining a latent subspace Z for X.sub.t, which matches
X.sub.t best, based on a conventional GPLVM by running the model
establisher 1012. Then, in this step, a posterior distribution P
for the target data X.sub.t is determined from the latent subspace
Z and the source-domain datasets X.sub.i. In addition, the process
200 further comprises a step S203 of determining information M
shared between the target data X.sub.t and the source data X.sub.i
by extending a mutual entropy to the posterior distributions P
obtained in step S202; and a step S204 of applying the gradient
descent method to the Multi-Task learning model.
[0060] The steps S201.about.S204 are the same as the steps
S101.about.S104, and thus the detailed description thereof is
omitted.
[0061] Then, in step S205, the inputs for the proposed model
(MTL-DGPLVM) are determined. Specifically, assuming that there are
S source-domain datasets, and each dataset contains N matched pairs
and N mismatched pairs, P multi-scale features are extracted from
each image. Herein, the P multi-scale features refer to the
features obtained from patches of different sizes in the facial
image. Each pair of multi-scale features and its flipped version
are then concatenated as shown in FIG. 5B, which is discussed
below.
[0062] Suppose that the length of each multi-scale feature is L;
then each pair of face images can generate 2P multi-scale features
of size 2L. If the pair is from the same individual, then the
corresponding label of each multi-scale feature of size 2L is
y.sub.i=1; otherwise, y.sub.i=-1. Therefore, each source-domain
dataset X.sub.i consists of 2PN positive inputs and 2PN negative
inputs. For example, as shown in FIG. 5B, a first plurality of
multiple scale features m.sub.1, m.sub.2, . . . m.sub.p are
obtained from a first face A and a second plurality of multiple
scale features n.sub.1, n.sub.2, . . . n.sub.p are obtained from a
second face B at the different landmarks of the two faces A and B.
Then each pair of multi-scale features and its flipped version are
concatenated to obtain [m.sub.1,n.sub.1] and [n.sub.1,m.sub.1].
Therefore, each face pair generates 2P multi-scale features of size
2L as follows: [m.sub.1,n.sub.1], [n.sub.1,m.sub.1], . . . ,
[m.sub.p,n.sub.p], [n.sub.p,m.sub.p]. These vectors are formed as
input vectors x to the Multi-Task learning model.
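The concatenation of each pair of multi-scale features with its flipped version may be sketched as follows. The function name `joint_features` and the toy dimensions are hypothetical and are chosen only to make the 2P-by-2L shape visible.

```python
import numpy as np

def joint_features(feats_a, feats_b):
    """Return the 2P vectors [m_1,n_1], [n_1,m_1], ..., [m_P,n_P], [n_P,m_P].

    feats_a, feats_b: arrays of shape (P, L) holding m_i and n_i row by row.
    The result has shape (2P, 2L).
    """
    a = np.asarray(feats_a, dtype=float)
    b = np.asarray(feats_b, dtype=float)
    forward = np.hstack([a, b])   # rows [m_i, n_i]
    flipped = np.hstack([b, a])   # rows [n_i, m_i]
    out = np.empty((2 * a.shape[0], 2 * a.shape[1]))
    out[0::2] = forward           # even rows: original order
    out[1::2] = flipped           # odd rows: flipped order
    return out

P, L = 3, 4
m = np.arange(P * L, dtype=float).reshape(P, L)
n = -np.arange(P * L, dtype=float).reshape(P, L)
inputs = joint_features(m, n)
labels = np.full(len(inputs), 1)  # +1 if the pair is from the same individual
```

Including the flipped version makes the set of inputs for a face pair independent of which face is listed first.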
[0063] At step S206, the following method is used to group the
input data points into different clusters automatically. To be
specific, the principle of GP clustering is based on the key
observation that the variances of predictive values are smaller in
dense areas and larger in sparse areas. The variances can be
employed as a good estimate of the support of a probability density
function, where each separate support domain can be considered as a
cluster. This observation can be explained from the variance
function of any predictive data point x*:
$$\sigma^{2}(x_{*}) = K_{**} - K_{*} \tilde{K}^{-1} K_{*}^{T}. \qquad (12)$$
[0064] If x* is in a sparse region, then K*{tilde over
(K)}.sup.-1K*.sup.T becomes small, which leads to large variance
.sigma..sup.2(x*), and vice versa. Another good property of
Equation (12) is that it does not depend on the labels, which means
it can be applied to the unlabeled data.
[0065] To perform clustering, the following dynamic system
associated with Equation (12) can be written as
$$F(x) = -\nabla \sigma^{2}(x). \qquad (13)$$
[0066] An existing theorem guarantees that almost all the
trajectories approach one of the stable equilibrium points detected
from Equation (13). After each data point finds its corresponding
stable equilibrium point, a complete graph can be employed to
assign cluster labels to data points with the stable equilibrium
points. The variance function in Equation (12) therefore completely
determines the performance of clustering.
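The observation that the predictive variance is smaller in dense areas and larger in sparse areas may be illustrated with the sketch below. It assumes a squared-exponential kernel and replaces W.sup.-1 by a constant noise term on the diagonal; both are simplifying assumptions, and the dynamic system of Equation (13) is not implemented here.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    """Assumed squared-exponential kernel between rows of A and rows of B."""
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=2)
    return np.exp(-0.5 * sq / lengthscale ** 2)

def predictive_variance(X_train, x_star, noise=0.1):
    """sigma^2(x*) = K** - K* (K + noise I)^{-1} K*^T; no labels are needed."""
    x_star = np.atleast_2d(x_star)
    K_tilde = rbf(X_train, X_train) + noise * np.eye(len(X_train))
    k_star = rbf(x_star, X_train)                 # shape (1, n)
    k_ss = rbf(x_star, x_star)[0, 0]
    reduction = (k_star @ np.linalg.solve(K_tilde, k_star.T))[0, 0]
    return float(k_ss - reduction)

rng = np.random.default_rng(0)
X_train = rng.normal(scale=0.3, size=(30, 2))     # one dense cloud near the origin
var_dense = predictive_variance(X_train, np.array([0.0, 0.0]))
var_sparse = predictive_variance(X_train, np.array([5.0, 5.0]))
```

Following the negative gradient of this variance from each data point, as in Equation (13), would move the point toward a nearby low-variance (dense) region, which is what allows points to be grouped by their equilibrium points.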
[0067] Suppose that C clusters are finally obtained. The centers of
these clusters are denoted by {c.sub.i}.sub.i=1.sup.C, the
variances of these clusters by {.SIGMA..sub.i.sup.2}.sub.i=1.sup.C,
and their weights by {w.sub.i}.sub.i=1.sup.C, where w.sub.i is the
ratio of the number of data points from the i-th cluster to the
number of all data points. All of the above variables can be
computed.
[0068] Then each c.sub.i is taken as the input of Equation (11),
and its corresponding probability p.sub.i and variance
.sigma..sub.i.sup.2 can be obtained.
[0069] For any un-seen pair of face images, its joint feature
vector x* is first computed for each pair of patches as shown in
FIG. 5B. Then its first-order and second-order statistics to the
centers are computed. The statistics and variance of x* are
represented as its high-dimensional facial features, denoted by
$$\hat{x}_{*} = \left[\Delta_1^1, \Delta_1^2, \Delta_1^3, \Delta_1^4, \ldots, \Delta_C^1, \Delta_C^2, \Delta_C^3, \Delta_C^4\right]^{T},$$
where
$$\Delta_i^1 = w_i \left(\frac{x_{*} - c_i}{\Sigma_i}\right), \quad \Delta_i^2 = w_i \left(\frac{x_{*} - c_i}{\Sigma_i}\right)^{2}, \quad \Delta_i^3 = p_i, \quad \text{and} \quad \Delta_i^4 = \sigma_i^{2}.$$
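The assembly of the high-dimensional feature from the cluster statistics may be sketched as follows. The sketch assumes that each .SIGMA..sub.i is a scalar (or per-dimension) standard deviation applied element-wise and that the second-order statistic is the element-wise square of the first-order one; the function name and the toy values are hypothetical.

```python
import numpy as np

def cluster_statistics_feature(x_star, centers, sigmas, weights, probs, variances):
    """Concatenate [Delta_i^1, Delta_i^2, Delta_i^3, Delta_i^4] over all C clusters.

    x_star: joint feature vector of one pair of patches.
    centers, sigmas, weights: per-cluster center c_i, scale Sigma_i, weight w_i.
    probs, variances: per-cluster p_i and sigma_i^2 predicted for each center.
    """
    x_star = np.asarray(x_star, dtype=float)
    parts = []
    for c, s, w, p, v in zip(centers, sigmas, weights, probs, variances):
        r = (x_star - np.asarray(c, dtype=float)) / s
        parts.append(w * r)              # first-order statistic
        parts.append(w * r ** 2)         # second-order statistic (assumed squared)
        parts.append(np.atleast_1d(p))   # predicted probability of the center
        parts.append(np.atleast_1d(v))   # predicted variance of the center
    return np.concatenate(parts)

x_hat = cluster_statistics_feature(
    x_star=[1.0, 1.0],
    centers=[[0.0, 0.0], [1.0, 1.0]],
    sigmas=[1.0, 2.0],
    weights=[0.5, 0.5],
    probs=[0.9, 0.2],
    variances=[0.1, 0.3],
)
```

For an input of dimension 2L and C clusters, the resulting vector has C(4L + 2) entries under these assumptions.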
[0070] And then, all of the new high-dimensional features from each
pair of patches are concatenated to form the final new
high-dimensional feature for the pair of face images, so as to
determine whether the first and the second multiple scale features
are from the same individual, for example by using the linear
Support Vector Machine (SVM).
[0071] In the MTL-DGPLVM model, a large matrix needs to be inverted
when doing inference and prediction. For large problems, both
storing the matrix and solving the associated linear systems are
computationally prohibitive. In one embodiment of this application,
the well-known anchor graphs method may be used to speed up this
process. To put it simply, the present application first selects
q (q<<n) anchors to cover a cloud of n data points, and forms an
n.times.q matrix Q, where Q.sub.i,j=k.sub..theta.(x.sub.i,
c.sub.j). x.sub.i and c.sub.j are from the n training data points
and the q anchors, respectively. Then the original kernel matrix K
can be approximated as K=QQ.sup.T. Using the matrix inversion
lemma, inverting an n.times.n matrix built from QQ.sup.T can be
transformed into inverting a q.times.q matrix built from Q.sup.TQ,
which is more efficient.
[0072] Speedup on Inference
[0073] When optimizing the proposed model of the present
application, two large matrices (K.sup.-1+W) and
(.lamda.I.sub.n+AKA) need to be inverted. Using the well-known
Woodbury identity, the following equation can be deduced
(K.sup.-1+W).sup.-1=K-KW.sup.1/2B.sup.-1W.sup.1/2K (14)
where W is a diagonal matrix and B=I.sub.n+W.sup.1/2KW.sup.1/2.
[0074] During inference, it takes q k-means clustering centers as
anchors to form Q. Substituting K=QQ.sup.T into B, it gets
B.sup.-1=I.sub.n-W.sup.1/2Q(I.sub.q+Q.sup.TWQ).sup.-1Q.sup.TW.sup.1/2
(15)
where (I.sub.q+Q.sup.TWQ) is a q.times.q matrix.
[0075] Similarly, it can get
$$(\lambda I_n + A K A)^{-1} = (\lambda I_n + A Q Q^{T} A)^{-1} = \lambda^{-1} I_n - \lambda^{-1} A Q \left(\lambda I_q + Q^{T} A A Q\right)^{-1} Q^{T} A. \qquad (16)$$
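Equation (16) may be checked numerically as sketched below. The sketch uses random data, a random n-by-q matrix Q and a diagonal matrix as a stand-in for the symmetric matrix A; these are assumptions chosen only to exercise the identity and do not reproduce the KFDA matrix A itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n, q, lam = 50, 5, 0.5
Q = rng.normal(size=(n, q))                    # stand-in for the anchor matrix
A = np.diag(rng.uniform(0.5, 1.5, size=n))     # any symmetric A satisfies the identity
K = Q @ Q.T                                    # low-rank kernel approximation

# Direct route: invert an n x n matrix.
direct = np.linalg.inv(lam * np.eye(n) + A @ K @ A)

# Equation (16): only a q x q matrix is inverted.
AQ = A @ Q
small_inv = np.linalg.inv(lam * np.eye(q) + AQ.T @ AQ)
fast = (np.eye(n) - AQ @ small_inv @ AQ.T) / lam

max_error = np.max(np.abs(direct - fast))
```

The direct route costs on the order of n cubed operations, whereas the route of Equation (16) is dominated by products with the n-by-q matrix AQ and a q-by-q inversion.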
[0076] Speedup on Prediction
[0077] When computing the predictive variance .sigma..sup.2(x*),
the matrix (K+W.sup.-1) needs to be inverted. At this time, the
Gaussian Processes for Clustering discussed above may be used to
calculate accurate clustering centers that can be regarded as the
anchors. Using the Woodbury identity again, it obtains
(K+W.sup.-1).sup.-1=W-WQ(I.sub.q+Q.sup.TWQ).sup.-1Q.sup.TW (17)
where (I.sub.q+Q.sup.TWQ) is only a q.times.q matrix, and its
inverse matrix can be computed more efficiently.
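Equation (17) may be sketched in code as follows, assuming that W is diagonal with positive entries and is therefore stored as a vector. The function name `fast_inverse_k_plus_w_inv` and the random test data are hypothetical.

```python
import numpy as np

def fast_inverse_k_plus_w_inv(Q, w_diag):
    """Compute (Q Q^T + W^{-1})^{-1} through a q x q solve, as in Equation (17).

    Q: (n, q) anchor-based factor with K approximated by Q Q^T.
    w_diag: length-n vector holding the positive diagonal of W.
    """
    WQ = w_diag[:, None] * Q                       # W Q without forming W
    small = np.eye(Q.shape[1]) + Q.T @ WQ          # I_q + Q^T W Q
    return np.diag(w_diag) - WQ @ np.linalg.solve(small, WQ.T)

rng = np.random.default_rng(1)
n, q = 40, 4
Q_demo = rng.normal(size=(n, q))
w_demo = rng.uniform(0.1, 0.25, size=n)
fast_inv = fast_inverse_k_plus_w_inv(Q_demo, w_demo)
direct_inv = np.linalg.inv(Q_demo @ Q_demo.T + np.diag(1.0 / w_demo))
```

Only the q-by-q system is solved, so the saving grows as the number of anchors q becomes small relative to the number of training points n.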
[0078] Although the preferred examples of the present disclosure
have been described, those skilled in the art can make variations
or modifications to these examples upon knowing the basic inventive
concept. The appended claims are intended to be considered as
comprising the preferred examples and all the variations or
modifications falling within the scope of the present disclosure.
[0079] Obviously, those skilled in the art can make variations or
modifications to the present disclosure without departing from the
spirit and scope of the present disclosure. As such, if these
variations or modifications belong to the scope of the claims and
equivalent techniques, they also fall within the scope of the
present disclosure.
* * * * *