U.S. patent number RE36,041 [Application Number 08/340,615] was granted by the patent office on 1999-01-12 for face recognition system.
This patent grant is currently assigned to Massachusetts Institute of Technology. Invention is credited to Alex P. Pentland, Matthew Turk.
United States Patent RE36,041
Turk, et al.
January 12, 1999
Face recognition system
Abstract
A recognition system for identifying members of an audience, the
system including an imaging system which generates an image of the
audience; a selector module for selecting a portion of the
generated image; a detection means which analyzes the selected
image portion to determine whether an image of a person is present;
and a recognition module responsive to the detection means for
determining whether a detected image of a person identified by the
detection means resembles one of a reference set of images of
individuals.
Inventors: Turk; Matthew (Cambridge, MA), Pentland; Alex P. (Cambridge, MA)
Assignee: Massachusetts Institute of Technology (Cambridge, MA)
Family ID: 24434619
Appl. No.: 08/340,615
Filed: November 16, 1994
Related U.S. Patent Documents

Reissue of: Application No. 608,000, filed Nov. 1, 1990, now U.S. Pat. No. 5,164,992, issued Nov. 17, 1992
Current U.S. Class: 382/118; 382/201; 382/204

Current CPC Class: G06K 9/6232 (20130101); G06K 9/00275 (20130101); H04N 19/94 (20141101); H04H 60/45 (20130101); H04N 19/00 (20130101); G06K 9/00241 (20130101); H04N 21/42201 (20130101); G07C 9/37 (20200101); G06K 9/00228 (20130101); A61B 5/1176 (20130101); H04H 60/56 (20130101); G06K 9/6247 (20130101); H04N 19/20 (20141101); H04H 60/59 (20130101)

Current International Class: G06K 9/62 (20060101); H04N 7/26 (20060101); H04H 9/00 (20060101); G06K 9/00 (20060101); G06K 009/00 ()

Field of Search: 382/115,118,200,201,195,278,279,724,204; 348/143,1; 340/825.36,825.49; 455/2
References Cited
U.S. Patent Documents
Other References
L. Sirovich et al., "Low-dimensional procedure for the characterization of human faces," Optical Society of America, 1987, pp. 519-524.
Primary Examiner: Mancuso; Joseph
Attorney, Agent or Firm: Fish & Richardson, P.C.
Claims
What is claimed is:
1. A recognition system for identifying members of an audience, the
system comprising:
an imaging system which generates an image of the audience;
a selector module for selecting a portion of said generated
image;
means for representing a reference set of images of individuals as
a set of eigenvectors in a multi-dimensional image space;
a detection means which determines whether the selected image
portion contains an image that can be classified as an image of a
person, said detection means including means for representing said
selected image portion as an input vector in said multi-dimensional
image space and means for computing the distance between a point
identified by said input vector and a multi-dimensional subspace
defined by said set of eigenvectors, wherein said detection means
uses the computed distance to determine whether the selected image
portion contains an image that can be classified as an image of a
person; and
a recognition module responsive to said detection means for
determining whether a detected image of a person identified by said
detection means resembles one of the reference set of images of
individuals.
2. The recognition system of claim 1 wherein said detection means
further comprises a thresholding means for determining whether an
image of a person is present by comparing said computed distance to
a preselected threshold.
3. The recognition system of claim 1 wherein said .[.selection
means.]. .Iadd.selector module .Iaddend.comprises a motion detector
for identifying the selected portion of said image by detecting
motion.
4. The recognition system of claim 3 wherein said .[.selection
means.]. .Iadd.selector module .Iaddend.further comprises a locator
module for locating the portion of said image corresponding to a
face of the person based on motion detected by said motion
detector.
5. The recognition system of claim 1 wherein said image of a person
is an image of a person's face and wherein said reference set
comprises images of faces of said individuals.
6. The recognition system of claim 1 wherein said recognition
module comprises means for representing each member of said
reference set as a corresponding point in said subspace.
7. The recognition system of claim 6 wherein the location of each
point in subspace associated with a corresponding member of said
reference set is determined by projecting a vector associated with
that member onto said subspace.
8. The recognition system of claim 7 wherein said recognition
module further comprises means for projecting said input vector
onto said subspace.
9. The recognition system of claim 8 wherein said recognition
module further comprises means for selecting a particular member of
said reference set and means for computing a distance within said
subspace between a point identified by the projection of said input
vector onto said subspace and the point in said subspace associated
with said selected member.
10. The recognition system of claim 8 wherein said recognition
module further comprises means for determining for each member of
said reference set a distance in subspace between the location
associated with that member in subspace and the point identified by
the projection of said input vector onto said subspace.
11. The recognition system of claim 10 wherein said image of a
person is an image of a person's face and wherein said reference
set comprises images of faces of said individuals.
12. A method for identifying members of an audience, the method
comprising:
generating an image of the audience;
selecting a portion of said generated image;
representing a reference set of images of individuals as a set of
eigenvectors in a multi-dimensional image space;
representing said selected image portion as an input vector in said
multi-dimensional image space;
computing the distance between a point identified by said input
vector and a multi-dimensional subspace defined by said set of
eigenvectors;
using the computed distance to determine whether the selected image
portion contains an image that can be classified as an image of a
person; and
if it is determined that the selected image contains an image that
can be classified as an image of a person, determining whether said
image of a person resembles one of a reference set of images of
individuals.
13. The method of claim 12 further comprising the step of
determining which one, if any, of the members of said reference set
said image of a person resembles.
14. The method of claim 12 wherein the image of the audience is a
sequence of image frames and wherein the method further comprises
detecting motion within the sequence of image frames and wherein
the selected image portion is determined on the basis of the
detected motion.
15. The method of claim 12 wherein the step of determining whether
the selected image portion contains an image that can be classified
as an image of a person further comprises comparing said computed
distance to a preselected threshold.
16. The method of claim 15 wherein the step of determining whether
said image of a person resembles a member of said reference set
comprises representing each member of said reference set as a
corresponding point in said subspace.
17. The method of claim 16 wherein the step of determining whether
said image of a person resembles a member of said reference set
further comprises determining the location of each point in
subspace associated with a corresponding member of said reference
set by projecting a vector associated with that member onto said
subspace.
18. The method of claim 17 wherein the step of determining whether
said image of a person resembles a member of said reference set
further comprises projecting said input vector onto said
subspace.
19. The method of claim 18 wherein the step of determining whether
said image of a person resembles a member of said reference set
further comprises selecting a member of said reference set and
computing a distance within said subspace between a point
identified by the projection of said input vector onto said
subspace and the point in said subspace associated with said
selected member.
20. The method of claim 18 wherein the step of determining whether
said image of a person resembles a member of said reference set
further comprises determining for each member of said reference set
a distance in subspace between the location for that member in
subspace and the point identified by the projection of said input
vector onto said subspace.
21. The method of claim 20 wherein said image of a person is an
image of a person's face and wherein said reference set comprises
images of faces of said individuals. .Iadd.
22. A recognition system comprising:
an imaging system which generates an image;
a selector module for selecting a portion of said generated
image;
means for representing a reference set of images of individuals as
a set of eigenvectors in a multi-dimensional image space;
a detection means which determines whether the selected image
portion contains an image that can be classified as an image of a
person, said detection means including means for representing said
selected image portion as an input vector in said multi-dimensional
image space and means for computing the distance between a point
identified by said input vector and a multi-dimensional subspace
defined by said set of eigenvectors, wherein said detection means
uses the computed distance to determine whether the selected image
portion contains an image that can be classified as an image of a
person; and
a recognition module responsive to said detection means for
determining whether a detected image of a person identified by said
detection means resembles one of the reference set of images of
individuals. .Iaddend..Iadd.23. The recognition system of claim 22
wherein said detection means further comprises a thresholding means
for determining whether an image of a person is present by
comparing said computed distance to a preselected threshold.
.Iaddend..Iadd.24. The recognition system of claim 22 wherein said
image of a person is an image of a person's face and wherein said
reference set comprises images of faces of said individuals.
.Iaddend..Iadd.25. The recognition system of claim 22 wherein said
recognition module comprises means for representing each member of
said reference set as a corresponding point in said subspace.
.Iaddend..Iadd.26. The recognition system of claim 25 wherein the
location of each point in subspace associated with a corresponding
member of said reference set is determined by projecting a vector
associated with that member onto said subspace. .Iaddend..Iadd.27.
The recognition system of claim 26 wherein said recognition module
further comprises means for projecting said input vector onto said
subspace. .Iaddend..Iadd.28. The recognition system of claim 27
wherein said recognition module further comprises means for
selecting a particular member of said reference set and means for
computing a distance within said subspace between a point
identified by the projection of said input vector onto said
subspace and the point in said subspace associated with said
selected member. .Iaddend..Iadd.29. The recognition system of claim
27 wherein said recognition module further comprises means for
determining for each member of said reference set a distance in
subspace between the location associated with that member in
subspace and the point identified by the projection of said input
vector onto said subspace. .Iaddend..Iadd.30. The recognition
system of claim 24 wherein said means for representing said
reference set includes means for adding a member to said reference
set by projecting into said subspace an input vector having a
computed distance indicative of an image of a face.
.Iaddend..Iadd.31. A method comprising:
generating an image;
selecting a portion of said generated image;
representing a reference set of images of faces of individuals as a
set of eigenvectors in a multi-dimensional image space;
representing said selected image portion as an input vector in said
multi-dimensional image space;
computing the distance between a point identified by said input
vector and a multi-dimensional subspace defined by said set of
eigenvectors;
using the computed distance to determine whether the selected image
portion contains an image that can be classified as an image of a
person's face; and
if it is determined that the selected image contains an image that
can be classified as an image of a person's face, determining
whether said image of a person's face resembles one of a reference
set of images of faces of
individuals. .Iaddend..Iadd.32. The method of claim 31 further
comprising the step of determining which one, if any, of the
members of said reference set said image of a person's face
resembles. .Iaddend..Iadd.33. The method of claim 31 wherein the
step of determining whether the selected image portion contains an
image that can be classified as an image of a person's face further
comprises comparing said computed distance to a preselected
threshold. .Iaddend..Iadd.34. The method of claim 33 wherein the
step of determining whether said image of a person's face resembles
a member of said reference set comprises representing each member
of said reference set as a corresponding point in said subspace.
.Iaddend..Iadd.35. The method of claim 34 wherein the step of
determining whether said image of a person's face resembles a
member of said reference set further comprises determining the
location of each point in subspace associated with a corresponding
member of said reference set by projecting a vector associated with
that member onto said subspace.
.Iaddend..Iadd.36. The method of claim 35 wherein the step of
determining whether said image of a person's face resembles a
member of said reference set further comprises projecting said
input vector onto said subspace. .Iaddend..Iadd.37. The method of
claim 36 wherein the step of determining whether said image of a
person's face resembles a member of said reference set further
comprises determining for each member of said reference set a
distance in subspace between the location for that member in
subspace and the point identified by the projection of said input
vector onto said subspace. .Iaddend.
Description
BACKGROUND OF THE INVENTION
The invention relates to a system for identifying members of a
viewing audience.
For a commercial television network, the cost of its advertising
time depends critically on the popularity of its programs among the
television viewing audience. Popularity, in this case, is typically
measured in terms of the program's share of the total audience
viewing television at the time the program airs. As a general rule
of thumb, advertisers prefer to place their advertisements where
they will reach the greatest number of people. Thus, there is a
higher demand among commercial advertisers for advertising time
slots alongside more popular programs. Such time slots can also
command a higher price.
Because the economics of television advertising depends so
critically on the tastes and preferences of the television
audience, the television industry invests a substantial amount of
time, effort and money in measuring those tastes and preferences.
One preferred approach involves monitoring the actual viewing
habits of a group of volunteer families which represent a
cross-section of all people who watch television. Typically, the
participants in such a study allow monitoring equipment to be
placed in their homes. Whenever a participant watches a television
program, the monitoring equipment records the time, the identity of
the program and the identity of the members of the viewing
audience. Many of these systems require active participation by the
television viewer to obtain the monitoring information. That is,
the viewer must in some way interact with the equipment to record
his presence in the viewing audience. If the viewer forgets to
record his presence the monitoring statistics will be incomplete.
In general, the less manual intervention required by the television
viewer, the more likely it is that the gathered statistics on
viewing habits will be complete and error free.
Systems have been developed which automatically identify members of
the viewing audience without requiring the viewer to enter any
information. For example, U.S. Pat. No. 4,858,000 to Daozheng Lu,
issued Aug. 15, 1989, describes such a system. In the system, a
scanner using infrared detectors locates a member of the viewing
audience, captures an image of the located member, extracts a
pattern signature for the captured image and then compares the
extracted pattern signature to a set of stored pattern image
signatures to identify the audience member.
SUMMARY OF THE INVENTION
In general, in one aspect, the invention is a recognition system
for identifying members of an audience. The invention includes an
imaging system which generates an image of the audience; a selector
module for selecting a portion of the generated image; a detection
means which analyzes the selected image portion to determine
whether an image of a person is present; and a recognition module
for determining whether a detected image of a person resembles one
of a reference set of images of individuals.
Preferred embodiments include the following features. The
recognition module also determines which one, if any, of the
individuals in the reference set the detected image resembles. The
selection means includes a motion detector for identifying the
selected portion of the image by detecting motion and it includes a
locator module for locating the portion of the image corresponding
to the face of the person detected. In the recognition system, the
detection means and the recognition module employ first and second
pattern recognition techniques, respectively, to determine
whether an image of a person is present in the selected portion of
the image and both pattern recognition techniques employ a set of
eigenvectors in a multi-dimensional image space to characterize the
reference set. In addition, the second pattern recognition
technique also represents each member of the reference set as a
point in a subspace defined by the set of eigenvectors. Also, the
image of a person is an image of a person's face and the reference
set includes images of faces of the individuals.
Also in preferred embodiments, the recognition system includes
means for representing the reference set as a set of eigenvectors
in a multi-dimensional image space and the detection means includes
means for representing the selected image portion as an input
vector in the multi-dimensional image space and means for computing
the distance between a point identified by the input vector and a
subspace defined by the set of eigenvectors. The detection means
also includes a thresholding means for determining whether an image
of a person is present by comparing the computed distance to a
preselected threshold. The recognition module includes means for
representing each member of the reference set as a corresponding
point in the subspace. To determine the location of each point in
subspace associated with a corresponding member of the reference
set, a vector associated with that member is projected onto the
subspace.
The recognition module also includes means for projecting the input
vector onto the subspace, means for selecting a particular member
of the reference set, and means for computing a distance within the
subspace between a point identified by the projection of the input
vector onto the subspace and the point in the subspace associated
with the selected member.
In general, in another aspect, the invention is a method for
identifying members of an audience. The invention includes the
steps of generating an image of the audience; selecting a portion
of the generated image; analyzing the selected image portion to
determine whether an image of a person is present; and if an image
of a person is determined to be present, determining whether the
image of a person resembles one of a reference set of images of
individuals.
One advantage of the invention is that it is fast, relatively
simple and works well in a constrained environment, i.e., an
environment for which the associated image remains relatively
constant except for the coming and going of people. In addition,
the invention determines whether a selected portion of an image
actually contains an image of a face. If it is determined that the
selected image portion contains an image of a face, the invention
then determines which one of a reference set of known faces the
detected face image most resembles. If the detected face image is
not present among the reference set, the invention reports the
presence of an unknown person in the audience. The invention has the
ability to discriminate face images from images of other
objects.
Other advantages and features will become apparent from the
following description of the preferred embodiment and from the
claims.
DESCRIPTION OF THE PREFERRED EMBODIMENT
FIG. 1 is a block diagram of a face recognition system;
FIG. 2 is a flow diagram of an initialization procedure for the
face recognition module;
FIG. 3 is a flow diagram of the operation of the face recognition
module; and
FIG. 4 is a block diagram of a motion detection system for locating
faces within a sequence of images.
STRUCTURE AND OPERATION
Referring to FIG. 1, in an audience monitoring system 2, a video
camera 4, which is trained on an area where members of a viewing
audience generally sit to watch the TV, sends a sequence of video
image frames to a motion detection module 6. Video camera 4, which
may, for example, be installed in the home of a family that has
volunteered to participate in a study of public viewing habits,
generates images of the TV viewing audience. Motion detection module 6
processes the sequence of image frames to identify regions of the
recorded scene that contain motion, and thus may be evidence of the
presence of a person watching TV. In general, motion detection
module 6 accomplishes this by comparing successive frames of the
image sequence so as to find those locations containing image data
that changes over time. Since the image background (i.e., images of
the furniture and other objects in the room) will usually remain
unchanged from frame to frame, the areas of movement will generally
be evidence of the presence of a person in the viewing
audience.
When movement is identified, a head locator module 8 selects a
block of the image frame containing the movement and sends it to a
face recognition module 10 where it is analyzed for the presence of
recognizable faces. Face recognition module 10 performs two
functions. First, it determines whether the image data within the
selected block resembles a face. Then, if it does resemble a face,
module 10 determines whether the face is one of a reference set of
faces. The reference set may include, for example, the images of
faces of all members of the family in whose house the audience
monitoring system has been installed.
To perform its recognition functions, face recognizer 10 employs a
multi-dimensional representation in which face images are
characterized by a set of eigenvectors or "eigenfaces". In general,
according to this technique, each image is represented as a vector
(or a point) in very high dimensional image space in which each
pixel of the image is represented by a corresponding dimension or
axis. The dimension of this image space thus depends upon the size
of the image being represented and can become very large for any
reasonably sized image. For example, if the block of image data is
N pixels by N pixels, then the multi-dimensional image space has
dimension $N^2$. The image vector which represents the $N \times N$
block of image data in this multi-dimensional image space is
constructed by simply concatenating the rows of the image data to
generate a vector of length $N^2$.
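For illustration only, the row-concatenation representation described above might be sketched in Python with numpy as follows; the function name is an assumption, not something taken from the patent.

```python
import numpy as np

def image_block_to_vector(block: np.ndarray) -> np.ndarray:
    """Flatten an N-by-N block of pixel intensities into a length-N^2 vector.

    The rows are concatenated in order, so a 256-by-256 block becomes a
    point in a 65,536-dimensional image space.
    """
    n_rows, n_cols = block.shape
    assert n_rows == n_cols, "the patent describes square N-by-N blocks"
    return block.reshape(n_rows * n_cols).astype(np.float64)

# Example: a 256x256 block maps to a 65,536-dimensional vector.
block = np.zeros((256, 256), dtype=np.uint8)
print(image_block_to_vector(block).shape)  # (65536,)
```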
Face images, like all other possible images, are represented by
points within this multi-dimensional image space. The distribution
of faces, however, tends to be grouped within a region of the image
space. Thus, the distribution of faces of the reference set can be
characterized by using principal component analysis. The resulting
principal components of the distribution of faces, or the
eigenvectors of the covariance matrix of the set of face images,
defines the variation among the set of face images. These
eigenvectors are typically ordered, each one accounting for a
different amount of variation among the face images. They can be
thought of as a set of features which together characterize the
variation between face images within the reference set. Each face
image location within the multi-dimensional image space contributes
more or less to each eigenvector, so that each eigenvector
represents a sort of ghostly face which is referred to herein as an
eigenface.
Each individual face from the reference set can be represented
exactly in terms of a linear combination of M non-zero eigenfaces.
Each face can also be approximated using only the M' "best" faces,
i.e., those that have the largest eigenvalues, and which therefore
account for the most variance within the set of face images. The
best M' eigenfaces span an M'-dimensional subspace (referred to
hereinafter as "face space") of all possible images.
This approach to face recognition involves the initialization
operations shown in FIG. 2 to "train" recognition module 10. First,
a reference set of face images is obtained and each of the faces of
that set is represented as a corresponding vector or point in the
multi-dimensional image space (step 100). Then, using principal
component analysis, the distribution of points for the reference
set of faces is characterized in terms of a set of eigenvectors (or
eigenfaces) (step 102). If a full characterization of the
distribution of points is performed, it will yield $N^2$
eigenfaces of which M are non-zero. Of these, only the M'
eigenfaces corresponding to the highest eigenvalues are chosen,
where $M' < M \ll N^2$. This subset of eigenfaces is used to
define a subspace (or face space) within the multidimensional image
space. Finally, each member of the reference set is represented by
a corresponding point within face space (step 104). For a given
face, this is accomplished by projecting its point in the higher
dimensional image space onto face space.
If additional faces are added to the reference set at a later time,
these operations are repeated to update the set of eigenfaces
characterizing the reference set.
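As a rough sketch of the initialization of FIG. 2 (steps 100-104), the following Python code computes a mean face, a set of eigenfaces, and a face-space point for each reference image. It uses numpy's SVD as a stand-in for the principal component analysis described above (the right singular vectors of the mean-adjusted data are its principal components); all names are illustrative assumptions.

```python
import numpy as np

def train_face_space(reference_vectors: np.ndarray, m_prime: int):
    """reference_vectors: (M, N^2) array, one flattened face image per row.

    Returns the mean face, the M' eigenfaces (as rows), and the projection
    of each reference face into the M'-dimensional face space (step 104).
    """
    mean_face = reference_vectors.mean(axis=0)            # Psi
    phi = reference_vectors - mean_face                   # mean-adjusted faces
    # SVD of the mean-adjusted data; the right singular vectors are the
    # principal components (eigenfaces), ordered by singular value.
    _, _, vt = np.linalg.svd(phi, full_matrices=False)
    eigenfaces = vt[:m_prime]                             # (M', N^2)
    reference_points = phi @ eigenfaces.T                 # (M, M') face-space points
    return mean_face, eigenfaces, reference_points
```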
After face recognition module 10 is initialized, it implements the
steps shown in FIG. 3 to recognize face images supplied by face
locator module 8. First, face recognition module 10 projects the
input image (i.e., the image presumed to contain a face) onto face
space by projecting it onto each of the M' eigenfaces (step 200).
Then, module 10 determines whether the input image is a face at all
(whether known or unknown) by checking to see if the image is
sufficiently close to "face space" (step 202). That is, module 10
computes how far the input image in the multi-dimensional image
space is from the face space and compares this to a preselected
threshold. If the computed distance is greater than the preselected
threshold, module 10 indicates that the input image does not
represent a face image, and motion detection module 6 locates the next block of the
overall image which may contain a face image.
If the computed distance is sufficiently close to face space (i.e.,
less than the preselected threshold), recognition module 10 treats
it as a face image and proceeds with determining whose face it is
(step 206). This involves computing distances between the
projection of the input image onto face space and each of the
reference face images in face space. If the projected input image
is sufficiently close to any one of the reference faces (i.e., the
computed distance in face space is less than a predetermined
distance), recognition module 10 identifies the input image as
belonging to the individual associated with that reference face. If
the projected input image is not sufficiently close to any one of
the reference faces, recognition module 10 reports that a person
has been located but the identity of the person is unknown.
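The run-time flow of FIG. 3 might be sketched as follows, reusing the quantities from the training sketch above. The two thresholds and all names are assumptions for illustration, not values from the patent.

```python
import numpy as np

def recognize(input_vector, mean_face, eigenfaces, reference_points,
              face_space_threshold, class_threshold):
    """Return ("not a face", None), ("unknown face", None), or ("known", index)."""
    phi = input_vector - mean_face
    weights = eigenfaces @ phi                    # projection onto face space (step 200)
    reconstruction = eigenfaces.T @ weights       # closest point in face space
    distance_from_face_space = np.linalg.norm(phi - reconstruction)
    if distance_from_face_space > face_space_threshold:   # step 202
        return ("not a face", None)
    # Step 206: whose face is it? Compare against each reference point in face space.
    distances = np.linalg.norm(reference_points - weights, axis=1)
    best = int(np.argmin(distances))
    if distances[best] < class_threshold:
        return ("known", best)
    return ("unknown face", None)
```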
The mathematics underlying each of these steps will now be
described in greater detail.
Calculating Eigenfaces
Let a face image I(x,y) be a two-dimensional N by N array of
(8-bit) intensity values. The face image is represented in the
multi-dimensional image space as a vector of dimension $N^2$.
Thus, a typical image of size 256 by 256 becomes a vector of
dimension 65,536, or, equivalently, a point in 65,536-dimensional
image space. An ensemble of images, then, maps to a collection of
points in this huge space.

Images of faces, being similar in overall configuration, are not
randomly distributed in this huge image space and thus can be
described by a relatively low dimensional subspace. Using principal
component analysis, one identifies the vectors which best account
for the distribution of face images within the entire image space.
These vectors, namely, the "eigenfaces", define the "face space".
Each vector is of length $N^2$, describes an N by N image, and is
a linear combination of the original face images of the reference
set.
Let the training set of face images be $\Gamma_1, \Gamma_2, \Gamma_3, \ldots, \Gamma_M$. The average face of the set is defined by

$$\Psi = \frac{1}{M}\sum_{n=1}^{M}\Gamma_n.$$

Each face differs from the average by the vector $\Phi_i = \Gamma_i - \Psi$. This set of very large vectors is then subject to principal component analysis, which seeks a set of M orthonormal vectors, $u_n$, which best describes the distribution of the data. The kth vector, $u_k$, is chosen such that

$$\lambda_k = \frac{1}{M}\sum_{n=1}^{M}\left(u_k^T \Phi_n\right)^2$$

is a maximum, subject to the orthonormality constraint

$$u_l^T u_k = \delta_{lk}.$$

The vectors $u_k$ and scalars $\lambda_k$ are the eigenvectors and eigenvalues, respectively, of the covariance matrix

$$C = \frac{1}{M}\sum_{n=1}^{M}\Phi_n \Phi_n^T = A A^T,$$

where the matrix $A = [\Phi_1\ \Phi_2\ \ldots\ \Phi_M]$. The matrix C, however, is $N^2$ by $N^2$, and determining the $N^2$ eigenvectors and eigenvalues can become an intractable task for typical image sizes.

If the number of data points in the face space is less than the dimension of the overall image space (namely, if $M < N^2$), there will be only $M-1$, rather than $N^2$, meaningful eigenvectors. (The remaining eigenvectors will have associated eigenvalues of zero.) One can solve for the $N^2$-dimensional eigenvectors in this case by first solving for the eigenvectors of an M by M matrix (e.g., solving a 16 by 16 matrix rather than a 16,384 by 16,384 matrix) and then taking appropriate linear combinations of the face images $\Phi_i$. Consider the eigenvectors $v_i$ of $A^T A$ such that

$$A^T A\,v_i = \mu_i v_i.$$

Premultiplying both sides by A yields

$$A A^T (A v_i) = \mu_i (A v_i),$$

from which it is apparent that $A v_i$ are the eigenvectors of $C = A A^T$.

Following this analysis, it is possible to construct the M by M matrix $L = A^T A$, where $L_{mn} = \Phi_m^T \Phi_n$, and find the M eigenvectors $v_l$ of L. These vectors determine linear combinations of the M training set face images to form the eigenfaces $u_l$:

$$u_l = \sum_{k=1}^{M} v_{lk}\,\Phi_k, \qquad l = 1, \ldots, M.$$

With this analysis the calculations are greatly reduced, from the order of the number of pixels in the images ($N^2$) to the order of the number of images in the training set (M). In practice, the training set of face images will be relatively small ($M \ll N^2$), and the calculations become quite manageable. The associated eigenvalues provide a basis for ranking the eigenvectors according to their usefulness in characterizing the variation among the images.
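The small-matrix trick just described (solving the M-by-M eigenproblem for L rather than the N^2-by-N^2 one for C) might be sketched in Python as follows; the names and the eigenvalue cutoff are illustrative assumptions.

```python
import numpy as np

def eigenfaces_via_small_matrix(A: np.ndarray) -> np.ndarray:
    """A: (N^2, M) matrix whose columns are the mean-adjusted face images Phi_i.

    Returns unit-length eigenfaces as the columns of an (N^2, K) matrix,
    computed from the small M-by-M matrix L = A^T A rather than from the
    N^2-by-N^2 covariance matrix A A^T.
    """
    L = A.T @ A                              # L_mn = Phi_m^T Phi_n (M-by-M)
    eigvals, V = np.linalg.eigh(L)           # eigenvectors v_i of A^T A
    order = np.argsort(eigvals)[::-1]        # sort by decreasing eigenvalue
    eigvals, V = eigvals[order], V[:, order]
    keep = eigvals > 1e-10                   # drop the (near-)zero eigenvalues
    U = A @ V[:, keep]                       # A v_i are eigenvectors of A A^T
    U /= np.linalg.norm(U, axis=0)           # normalize each eigenface column
    return U
```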
In practice, a smaller M' is sufficient for identification, since
accurate construction of the image is not a requirement. In this
framework, identification becomes a pattern recognition task. The
eigenfaces span an M'-dimensional subspace of the original $N^2$-dimensional
image space. The M' significant eigenvectors of the L matrix are
chosen as those with the largest associated eigenvalues. In test
cases based upon M = 16 face images, M' = 7 eigenfaces were found to
yield acceptable results, i.e., a level of accuracy sufficient for
monitoring a TV audience for purposes of studying viewing habits
and tastes.
A new face image $\Gamma$ is transformed into its eigenface
components (i.e., projected into "face space") by the simple operation

$$\omega_k = u_k^T\,(\Gamma - \Psi)$$

for $k = 1, \ldots, M'$. This describes a set of point-by-point image
multiplications and summations, operations which may be performed
at approximately frame rate on current image processing hardware.
The weights form a vector $\Omega^T = [\omega_1\ \omega_2\ \ldots\ \omega_{M'}]$ that describes the contribution of each eigenface in representing the input face image, treating the eigenfaces as a basis set for face images. The vector may then be used in a standard pattern recognition algorithm to find which of a number of pre-defined face classes, if any, best describes the face. The simplest method for determining which face class provides the best description of an input face image is to find the face class k that minimizes the Euclidean distance

$$\epsilon_k^2 = \lVert \Omega - \Omega_k \rVert^2,$$

where $\Omega_k$ is a vector describing the kth face class. The face classes $\Omega_k$ are calculated by averaging the results of the eigenface representation over a small number of face images (as few as one) of each individual. A face is classified as belonging to class k when the minimum $\epsilon_k$ is below some chosen threshold $\theta_\epsilon$. Otherwise the face is classified as "unknown", and optionally used to create a new face class.
Because creating the vector of weights is equivalent to projecting the original face image onto the low-dimensional face space, many images (most of them looking nothing like a face) will project onto a given pattern vector. This is not a problem for the system, however, since the squared distance $\epsilon^2$ between the image and face space is simply the squared distance between the mean-adjusted input image $\Phi = \Gamma - \Psi$ and its projection onto face space, $\Phi_f = \sum_{k=1}^{M'} \omega_k u_k$:

$$\epsilon^2 = \lVert \Phi - \Phi_f \rVert^2.$$

Thus, there are four possibilities for an input image and its
pattern vector: (1) near face space and near a face class; (2) near
face space but not near a known face class; (3) distant from face
space and near a face class; and (4) distant from face space and
not near a known face class.
In the first case, an individual is recognized and identified. In
the second case, an unknown individual is present. The last two
cases indicate that the image is not a face image. Case three
typically shows up as a false positive in most other recognition
systems. In the described embodiment, however, the false
recognition may be detected because of the significant distance
between the image and the subspace of expected face images.
Summary of Eigenface Recognition Procedure
To summarize, the eigenfaces approach to face recognition involves
the following steps:
1. Collect a set of characteristic face images of the known
individuals. This set may include a number of images for each
person, with some variation in expression and in lighting. (Say
four images of ten people, so M = 40.)

2. Calculate the ($40 \times 40$) matrix L, find its eigenvectors and
eigenvalues, and choose the M' eigenvectors with the highest
associated eigenvalues. (Let M' = 10 in this example.)

3. Combine the normalized training set of images according to Eq. 7
to produce the (M' = 10) eigenfaces $u_k$.

4. For each known individual, calculate the class vector $\Omega_k$
by averaging the eigenface pattern vectors $\Omega$ (from Eq. 9)
calculated from the original (four) images of the individual. Choose
a threshold $\theta_\epsilon$ which defines the maximum allowable
distance from any face class, and a threshold $\theta_t$ which
defines the maximum allowable distance from face space (according to
Eq. 10).

5. For each new face image to be identified, calculate its pattern
vector $\Omega$, the distances $\epsilon_k$ to each known class, and
the distance $\epsilon$ to face space, as sketched below. If the
distance $\epsilon > \theta_t$, classify the input image as not a
face. If the minimum distance $\epsilon_k \le \theta_\epsilon$ and
the distance $\epsilon \le \theta_t$, classify the input face as the
individual associated with class vector $\Omega_k$. If the minimum
distance $\epsilon_k > \theta_\epsilon$ and $\epsilon \le \theta_t$,
then the image may be classified as "unknown", and optionally used
to begin a new face class.

6. If the new image is classified as a known individual, this image
may be added to the original set of familiar face images, and the
eigenfaces may be recalculated (steps 1-4). This gives the
opportunity to modify the face space as the system encounters more
instances of known faces.
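A minimal Python sketch of steps 4 and 5, assuming the pattern vectors and distances have already been computed; theta_t and theta_eps correspond to the two thresholds chosen in step 4, and the function names are illustrative.

```python
import numpy as np

def class_vectors(pattern_vectors_per_person):
    """Step 4: each class vector Omega_k is the average of the eigenface
    pattern vectors computed from that individual's training images.

    pattern_vectors_per_person: list of (n_images, M') arrays, one per person.
    """
    return np.stack([w.mean(axis=0) for w in pattern_vectors_per_person])

def classify(eps, min_eps_k, theta_t, theta_eps):
    """Step 5: threshold logic, given the distance eps to face space and
    the minimum distance min_eps_k to any face class."""
    if eps > theta_t:
        return "not a face"
    if min_eps_k <= theta_eps:
        return "known individual"
    return "unknown"   # optionally used to begin a new face class
```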
In the described embodiment, calculation of the eigenfaces is done
offline as part of the training. The recognition currently takes
about 400 msec running rather inefficiently in Lisp on a Sun 4,
using face images of size 128 by 128. With some special-purpose
hardware, the current version could run at close to frame rate (33
msec).
Designing a practical system for face recognition within this
framework requires assessing the tradeoffs between generality,
required accuracy, and speed. If the face recognition task is
restricted to a small set of people (such as the members of a
family or a small company), a small set of eigenfaces is adequate
to span the faces of interest. If the system is to learn new faces
or represent many people, a larger basis set of eigenfaces will
likely be required.
Motion Detection And Head Tracking
In the described embodiment, motion detection module 6 and head
locator module 8 locate and track the position of the head of any
person within the scene viewed by video camera 4 by implementing
the tracking algorithm depicted in FIG. 4. A sequence of image
frames 30 from video camera 4 first passes through a
spatio-temporal filtering module 32 which accentuates image
locations which change with time. Spatio-temporal filtering module
32 identifies the locations of motion by performing a differencing
operation on successive frames of the sequence of image frames. In
the output of the spatio-temporal filter module 32, a moving person
"lights up" whereas the other areas of the image containing no
motion appear as black.
The spatio-temporal filtered image passes to a thresholding module
34 which produces a binary motion image identifying the locations
of the image for which the motion exceeds a preselected threshold.
That is, it locates the areas of the image containing the most
motion. In all such areas, the presence of a person is
postulated.
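A simple frame-differencing sketch of the spatio-temporal filtering and thresholding stages (modules 32 and 34); the threshold value is an arbitrary placeholder, not a value from the patent.

```python
import numpy as np

def binary_motion_image(prev_frame: np.ndarray, curr_frame: np.ndarray,
                        threshold: float = 20.0) -> np.ndarray:
    """Difference successive grayscale frames and threshold the result.

    Moving regions "light up" (True); the static background stays False.
    """
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff > threshold
```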
A motion analyzer module 36 analyzes the binary motion image to
watch how "motion blobs" change over time to decide if the motion
is caused by a person moving and to determine head position. A few
simple rules are applied, such as "the head is the small upper blob
above a larger blob (i.e., the body)", and "head motion must be
reasonably slow and contiguous" (i.e., heads are not expected to
jump around the image erratically).
The motion image also allows for an estimate of scale. The size of
the blob that is assumed to be the moving head determines the size
of the subimage to send to face recognition module 10 (see FIG. 1).
This subimage is rescaled to fit the dimensions of the
eigenfaces.
Using "Face Space" To Locate The Face
Face space may also be used to locate faces in single images,
either as an alternative to locating faces from motion (e.g. if
there is too little motion or many moving objects) or as a method
of achieving more precision than is possible by use of motion
tracking alone.
Typically, images of faces do not change radically when projected
into the face space, whereas the projections of non-face images
appear quite different. This basic idea may be used to detect the
presence of faces in a scene. To implement this approach, the
distance $\epsilon$ between the local subimage and face space is
calculated at every location in the image. This calculated distance
from face space is then used as a measure of "faceness". The result
of calculating the distance from face space at every point in the
image is a "face map" $\epsilon(x,y)$ in which low values (i.e., the
dark areas) indicate the presence of a face.

Direct application of Eq. 10, however, is rather expensive
computationally. A simpler, more efficient method of calculating
the face map $\epsilon(x,y)$ is as follows.
To calculate the face map at every pixel of an image I(x,y), the subimage centered at that pixel is projected onto face space and the projection is then subtracted from the original subimage. To project a subimage $\Gamma$ onto face space, one first subtracts the mean image (i.e., $\Psi$), resulting in $\Phi = \Gamma - \Psi$. With $\Phi_f$ being the projection of $\Phi$ onto face space, the distance measure at a given image location is then

$$\epsilon^2 = \lVert \Phi - \Phi_f \rVert^2 = \lVert \Phi \rVert^2 - \lVert \Phi_f \rVert^2,$$

since $\Phi_f \perp (\Phi - \Phi_f)$. Because $\Phi_f$ is a linear combination of the eigenfaces ($\Phi_f = \sum_i \omega_i u_i$) and the eigenfaces are orthonormal vectors,

$$\lVert \Phi_f \rVert^2 = \sum_{i=1}^{L} \omega_i^2,$$

and therefore

$$\epsilon^2(x,y) = \lVert \Phi(x,y) \rVert^2 - \sum_{i=1}^{L} \omega_i^2(x,y),$$

where $\epsilon(x,y)$ and $\omega_i(x,y)$ are scalar functions of image location, and $\Phi(x,y)$ is a vector function of image location.

The second term of Eq. 13 is calculated in practice by a correlation with the L eigenfaces:

$$\omega_i(x,y) = u_i^T\,\Phi(x,y) = u_i \otimes \Gamma(x,y) - u_i^T\,\Psi,$$

where $\otimes$ is the correlation operator. The first term of Eq. 13 becomes

$$\lVert \Phi(x,y) \rVert^2 = \Gamma^T(x,y)\,\Gamma(x,y) - 2\,\Psi^T\Gamma(x,y) + \Psi^T\Psi.$$

Since the average face $\Psi$ and the eigenfaces $u_i$ are fixed, the terms $\Psi^T\Psi$ and $\Psi \otimes u_i$ may be computed ahead of time.

Thus, the computation of the face map involves only L+1 correlations over the input image and the computation of the first term $\Gamma^T(x,y)\,\Gamma(x,y)$. This is computed by squaring the input image I(x,y) and, at each image location, summing the squared values of the local subimage.
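A correlation-based face-map computation along these lines might be sketched with numpy and scipy as follows: one box-filter sum of squared intensities, one correlation with the mean face, and one correlation per eigenface. The function and argument names are assumptions for illustration.

```python
import numpy as np
from scipy.signal import correlate2d

def face_map(image, mean_face_img, eigenface_imgs):
    """image: (H, W) grayscale frame; mean_face_img: (n, n) mean face Psi;
    eigenface_imgs: list of (n, n) orthonormal eigenfaces u_i.

    Returns eps^2(x, y) for every n-by-n subimage position
    (output shape (H - n + 1, W - n + 1)); low values indicate a face.
    """
    image = image.astype(np.float64)
    n = mean_face_img.shape[0]
    box = np.ones((n, n))

    # First term: Gamma^T Gamma at every location (sum of squared pixels in the window).
    gamma_sq = correlate2d(image ** 2, box, mode="valid")
    # Correlation of the image with the mean face: Psi^T Gamma(x, y).
    psi_gamma = correlate2d(image, mean_face_img, mode="valid")
    psi_sq = np.sum(mean_face_img ** 2)
    phi_sq = gamma_sq - 2.0 * psi_gamma + psi_sq          # ||Phi(x, y)||^2

    # Second term: sum over eigenfaces of omega_i(x, y)^2, one correlation each.
    omega_sq_sum = np.zeros_like(phi_sq)
    for u in eigenface_imgs:
        omega = correlate2d(image, u, mode="valid") - np.sum(u * mean_face_img)
        omega_sq_sum += omega ** 2

    return phi_sq - omega_sq_sum
```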
Scale Invariance
Experiments reveal that recognition performance decreases quickly
as the head size, or scale, is misjudged. It is therefore desirable
that the head size in the input image be close to that of the
eigenfaces. The motion analysis can give an estimate of head size,
from which the face image is rescaled to the eigenface size.
Another approach to the scale problem, which may be separate from
or in addition to the motion estimate, is to use multiscale
eigenfaces, in which an input face image is compared with
eigenfaces at a number of scales. In this case the image will
appear to be near the face space of only the closest scale
eigenfaces. Equivalently, the input image (i.e., the portion of the
overall image selected for analysis) can be scaled to multiple
sizes, and the scale which results in the smallest distance measure
to face space is used.
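One way to realize this multiscale comparison, sketched under the assumption that the candidate portion is rescaled by several trial factors and the central eigenface-sized window of each result is tested against face space; scipy.ndimage.zoom stands in for whatever rescaling the system actually performs, and the scale factors are placeholders.

```python
import numpy as np
from scipy.ndimage import zoom

def distance_to_face_space(vec, mean_face, eigenfaces):
    """eigenfaces: (M', n*n) array with orthonormal rows; mean_face: (n*n,)."""
    phi = vec - mean_face
    w = eigenfaces @ phi
    return np.linalg.norm(phi - eigenfaces.T @ w)

def central_crop(img, n):
    """Take the central n-by-n window (assumes img is at least n-by-n)."""
    r0 = (img.shape[0] - n) // 2
    c0 = (img.shape[1] - n) // 2
    return img[r0:r0 + n, c0:c0 + n]

def best_scale(portion, mean_face, eigenfaces, n,
               scales=(0.6, 0.8, 1.0, 1.2, 1.4)):
    """Return the trial scale with the smallest distance to face space."""
    best_eps, best_s = np.inf, None
    for s in scales:
        rescaled = zoom(portion.astype(np.float64), s)
        if rescaled.shape[0] < n or rescaled.shape[1] < n:
            continue                      # too small to contain an n-by-n face window
        vec = central_crop(rescaled, n).reshape(-1)
        eps = distance_to_face_space(vec, mean_face, eigenfaces)
        if eps < best_eps:
            best_eps, best_s = eps, s
    return best_s, best_eps
```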
Other embodiments are within the following claims. For example,
although the eigenfaces approach to face recognition has been
presented as an information processing model, it may also be
implemented using simple parallel computing elements, as in a
connectionist system or artificial neural network.
* * * * *