Face Recognition Using Kernel Methods

Part of Advances in Neural Information Processing Systems 14 (NIPS 2001)

Bibtex Metadata Paper


Ming-Hsuan Yang


Principal Component Analysis and Fisher Linear Discriminant methods have demonstrated their success in face detection, recog(cid:173) nition, and tracking. The representation in these subspace methods is based on second order statistics of the image set, and does not address higher order statistical dependencies such as the relation(cid:173) ships among three or more pixels. Recently Higher Order Statistics and Independent Component Analysis (ICA) have been used as in(cid:173) formative low dimensional representations for visual recognition. In this paper, we investigate the use of Kernel Principal Compo(cid:173) nent Analysis and Kernel Fisher Linear Discriminant for learning low dimensional representations for face recognition, which we call Kernel Eigenface and Kernel Fisherface methods. While Eigenface and Fisherface methods aim to find projection directions based on the second order correlation of samples, Kernel Eigenface and Ker(cid:173) nel Fisherface methods provide generalizations which take higher order correlations into account. We compare the performance of kernel methods with Eigenface, Fisherface and ICA-based meth(cid:173) ods for face recognition with variation in pose, scale, lighting and expression. Experimental results show that kernel methods pro(cid:173) vide better representations and achieve lower error rates for face recognition.

1 Motivation and Approach

Subspace methods have been applied successfully in numerous visual recognition tasks such as face localization, face recognition, 3D object recognition, and tracking. In particular, Principal Component Analysis (PCA) [20] [13] ,and Fisher Linear Dis(cid:173) criminant (FLD) methods [6] have been applied to face recognition with impressive results. While PCA aims to extract a subspace in which the variance is maximized (or the reconstruction error is minimized), some unwanted variations (due to light(cid:173) ing, facial expressions, viewing points, etc.) may be retained (See [8] for examples). It has been observed that in face recognition the variations between the images of the same face due to illumination and viewing direction are almost always larger than image variations due to the changes in face identity [1]. Therefore, while the PCA projections are optimal in a correlation sense (or for reconstruction" from a low dimensional subspace), these eigenvectors or bases may be suboptimal from the

classification viewpoint.

Representations of Eigenface [20] (based on PCA) and Fisherface [6] (based on FLD) methods encode the pattern information based on the second order dependencies, i.e., pixelwise covariance among the pixels, and are insensitive to the dependencies among multiple (more than two) pixels in the samples. Higher order dependencies in an image include nonlinear relations among the pixel intensity values, such as the relationships among three or more pixels in an edge or a curve, which can cap(cid:173) ture important information for recognition. Several researchers have conjectured that higher order statistics may be crucial to better represent complex patterns. Recently, Higher Order Statistics (HOS) have been applied to visual learning prob(cid:173) lems. Rajagopalan et ale use HOS of the images of a target object to get a better approximation of an unknown distribution. Experiments on face detection [16] and vehicle detection [15] show comparable, if no better, results than other PCA-based methods.

The concept of Independent Component Analysis (ICA) maximizes the degree of statistical independence of output variables using contrast functions such as Kullback-Leibler divergence, negentropy, and cumulants [9] [10]. A neural net(cid:173) work algorithm to carry out ICA was proposed by Bell and Sejnowski [7], and was applied to face recognition [3]. Although the idea of computing higher order mo(cid:173) ments in the ICA-based face recognition method is attractive, the assumption that the face images comprise of a set of independent basis images (or factorial codes) is not intuitively clear. In [3] Bartlett et ale showed that ICA representation out(cid:173) perform PCA representation in face recognition using a subset of frontal FERET face images. However, Moghaddam recently showed that ICA representation does not provide significant advantage over PCA [12]. The experimental results suggest that seeking non-Gaussian and independent components may not necessarily yield better representation for face recognition.

In [18], Sch6lkopf et ale extended the conventional PCA to Kernel Principal Com(cid:173) ponent Analysis (KPCA). Empirical results on digit recognition using MNIST data set and object recognition using a database of rendered chair images showed that Kernel PCA is able to extract nonlinear features and thus provided better recog(cid:173) nition results. Recently Baudat and Anouar, Roth and Steinhage, and Mika et ale applied kernel tricks to FLD and proposed Kernel Fisher Linear Discriminant (KFLD) method [11] [17] [5]. Their experiments showed that KFLD is able to ex(cid:173) tract the most discriminant features in the feature space, which is equivalent to extracting the most discriminant nonlinear features in the original input space.

In this paper we seek a method that not only extracts higher order statistics of samples as features, but also maximizes the class separation when we project these features to a lower dimensional space for efficient recognition. Since much of the important information may be contained in the high order dependences among the pixels of a: face image, we investigate the use of Kernel PCA and Kernel FLD for face recognition, which we call Kernel Eigenface and Kernel Fisherface methods, and compare their performance against the standard Eigenface, Fisherface and ICA methods. In the meanwhile, we explain why kernel methods are suitable for visual recognition tasks such as face recognition.

2 Kernel Principal Component Analysis

== Given a set of m centered (zero mean, unit variance) samples Xk, Xk [Xkl, ... ,Xkn]T ERn, PCA aims to find the projection directions that maximize the variance, C, which is equivalent to finding the eigenvalues from the covariance