{"title": "Face Recognition Using Kernel Methods", "book": "Advances in Neural Information Processing Systems", "page_first": 1457, "page_last": 1464, "abstract": "", "full_text": "Face Recognition Using Kernel Methods\n\nMing-Hsuan Yang\n\nHonda Fundamental Research Labs\n\nMountain View, CA 94041\n\nmyang@hra.com\n\nAbstract\n\nPrincipal Component Analysis and Fisher Linear Discriminant\nmethods have demonstrated their success in face detection, recog(cid:173)\nnition, and tracking. The representation in these subspace methods\nis based on second order statistics of the image set, and does not\naddress higher order statistical dependencies such as the relation(cid:173)\nships among three or more pixels. Recently Higher Order Statistics\nand Independent Component Analysis (ICA) have been used as in(cid:173)\nformative low dimensional representations for visual recognition.\nIn this paper, we investigate the use of Kernel Principal Compo(cid:173)\nnent Analysis and Kernel Fisher Linear Discriminant for learning\nlow dimensional representations for face recognition, which we call\nKernel Eigenface and Kernel Fisherface methods. While Eigenface\nand Fisherface methods aim to find projection directions based on\nthe second order correlation of samples, Kernel Eigenface and Ker(cid:173)\nnel Fisherface methods provide generalizations which take higher\norder correlations into account. We compare the performance of\nkernel methods with Eigenface, Fisherface and ICA-based meth(cid:173)\nods for face recognition with variation in pose, scale, lighting and\nexpression. Experimental results show that kernel methods pro(cid:173)\nvide better representations and achieve lower error rates for face\nrecognition.\n\n1 Motivation and Approach\n\nSubspace methods have been applied successfully in numerous visual recognition\ntasks such as face localization, face recognition, 3D object recognition, and tracking.\nIn particular, Principal Component Analysis (PCA) [20] [13] ,and Fisher Linear Dis(cid:173)\ncriminant (FLD) methods [6] have been applied to face recognition with impressive\nresults. While PCA aims to extract a subspace in which the variance is maximized\n(or the reconstruction error is minimized), some unwanted variations (due to light(cid:173)\ning, facial expressions, viewing points, etc.) may be retained (See [8] for examples).\nIt has been observed that in face recognition the variations between the images of\nthe same face due to illumination and viewing direction are almost always larger\nthan image variations due to the changes in face identity [1]. Therefore, while the\nPCA projections are optimal in a correlation sense (or for reconstruction\" from a\nlow dimensional subspace), these eigenvectors or bases may be suboptimal from the\n\n\fclassification viewpoint.\n\nRepresentations of Eigenface [20] (based on PCA) and Fisherface [6] (based on FLD)\nmethods encode the pattern information based on the second order dependencies,\ni.e., pixelwise covariance among the pixels, and are insensitive to the dependencies\namong multiple (more than two) pixels in the samples. Higher order dependencies\nin an image include nonlinear relations among the pixel intensity values, such as\nthe relationships among three or more pixels in an edge or a curve, which can cap(cid:173)\nture important information for recognition. 
Several researchers have conjectured that higher order statistics may be crucial to better represent complex patterns. Recently, Higher Order Statistics (HOS) have been applied to visual learning problems. Rajagopalan et al. use HOS of the images of a target object to get a better approximation of an unknown distribution. Experiments on face detection [16] and vehicle detection [15] show comparable, if not better, results than other PCA-based methods.\n\nThe concept of Independent Component Analysis (ICA) maximizes the degree of statistical independence of output variables using contrast functions such as Kullback-Leibler divergence, negentropy, and cumulants [9] [10]. A neural network algorithm to carry out ICA was proposed by Bell and Sejnowski [7], and was applied to face recognition [3]. Although the idea of computing higher order moments in the ICA-based face recognition method is attractive, the assumption that face images comprise a set of independent basis images (or factorial codes) is not intuitively clear. In [3] Bartlett et al. showed that ICA representations outperform PCA representations in face recognition using a subset of frontal FERET face images. However, Moghaddam recently showed that ICA representation does not provide a significant advantage over PCA [12]. These experimental results suggest that seeking non-Gaussian and independent components may not necessarily yield a better representation for face recognition.\n\nIn [18], Schölkopf et al. extended conventional PCA to Kernel Principal Component Analysis (KPCA). Empirical results on digit recognition using the MNIST data set and object recognition using a database of rendered chair images showed that Kernel PCA is able to extract nonlinear features and thus provided better recognition results. Recently Baudat and Anouar, Roth and Steinhage, and Mika et al. applied kernel tricks to FLD and proposed the Kernel Fisher Linear Discriminant (KFLD) method [11] [17] [5]. Their experiments showed that KFLD is able to extract the most discriminant features in the feature space, which is equivalent to extracting the most discriminant nonlinear features in the original input space.\n\nIn this paper we seek a method that not only extracts higher order statistics of samples as features, but also maximizes the class separation when we project these features to a lower dimensional space for efficient recognition. Since much of the important information may be contained in the high order dependencies among the pixels of a face image, we investigate the use of Kernel PCA and Kernel FLD for face recognition, which we call the Kernel Eigenface and Kernel Fisherface methods, and compare their performance against the standard Eigenface, Fisherface and ICA methods. In addition, we explain why kernel methods are suitable for visual recognition tasks such as face recognition.\n\n2 Kernel Principal Component Analysis\n\nGiven a set of m centered (zero mean, unit variance) samples x_k, x_k = [x_k1, ..., x_kn]^T ∈ R^n, PCA aims to find the projection directions that maximize the variance, which is equivalent to finding the eigenvalues of the covariance matrix C:\n\nλw = Cw    (1)\n\nfor eigenvalues λ ≥ 0 and eigenvectors w ∈ R^n. In Kernel PCA, each vector x is projected from the input space R^n to a high dimensional feature space R^f by a nonlinear mapping function Φ: R^n → R^f, f ≥ n. Note that the dimensionality of the feature space can be arbitrarily large. In R^f, the corresponding eigenvalue problem is\n\nλw^Φ = C^Φ w^Φ    (2)\n\nwhere C^Φ is the covariance matrix in R^f. All solutions w^Φ with λ ≠ 0 lie in the span of Φ(x_1), ..., Φ(x_m), and there exist coefficients α_i such that\n\nw^Φ = Σ_{i=1}^m α_i Φ(x_i)    (3)\n\nDenoting an m × m matrix K by\n\nK_ij = k(x_i, x_j) = Φ(x_i) · Φ(x_j)    (4)\n\nthe Kernel PCA problem becomes\n\nmλKα = K^2 α    (5)\n\nmλα = Kα    (6)\n\nwhere α denotes a column vector with entries α_1, ..., α_m. The above derivations assume that all the projected samples Φ(x) are centered in R^f. See [18] for a method to center the vectors Φ(x) in R^f.\n\nNote that conventional PCA is a special case of Kernel PCA with a polynomial kernel of first order. In other words, Kernel PCA is a generalization of conventional PCA since different kernels can be utilized for different nonlinear projections.\n\nWe can now project the vectors in R^f to a lower dimensional space spanned by the eigenvectors w^Φ. Let x be a test sample whose projection is Φ(x) in R^f; then the projection of Φ(x) onto the eigenvectors w^Φ gives the nonlinear principal components corresponding to Φ:\n\nw^Φ · Φ(x) = Σ_{i=1}^m α_i (Φ(x_i) · Φ(x)) = Σ_{i=1}^m α_i k(x_i, x)    (7)\n\nIn other words, we can extract the first q (1 ≤ q ≤ m) nonlinear principal components (i.e., eigenvectors w^Φ) using the kernel function without the expensive operation that explicitly projects the samples to the high dimensional space R^f. The first q components correspond to the first q non-increasing eigenvalues of (6). For face recognition where each x encodes a face image, we call the extracted nonlinear principal components Kernel Eigenfaces.
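\n\nIn code, the whole procedure reduces to an eigenproblem on the kernel matrix: form K as in (4), center it in feature space, solve (6), and project with (7). The following NumPy sketch is illustrative rather than the paper's implementation; the Gaussian kernel, the function names, and the unit-norm normalization step are our assumptions.\n\nimport numpy as np\n\ndef gaussian_kernel(X, Y, sigma=1.0):\n    # k(x, y) = exp(-||x - y||^2 / (2 sigma^2)); an assumed kernel choice\n    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)\n    return np.exp(-d2 / (2 * sigma ** 2))\n\ndef kernel_pca(X, kernel, q):\n    # X: m x n matrix of raster-scanned face images, one per row\n    m = X.shape[0]\n    K = kernel(X, X)\n    # Center the projected samples Phi(x) in feature space (see [18])\n    J = np.ones((m, m)) / m\n    K = K - J @ K - K @ J + J @ K @ J\n    # Solve m*lambda*alpha = K*alpha, eq. (6); eigh returns ascending eigenvalues\n    lam, alpha = np.linalg.eigh(K)\n    lam, alpha = lam[::-1], alpha[:, ::-1]\n    # Normalize so each w = sum_i alpha_i Phi(x_i) has unit norm:\n    # w.w = alpha^T K alpha = lam * ||alpha||^2\n    alpha = alpha[:, :q] / np.sqrt(np.maximum(lam[:q], 1e-12))\n    # Rows of K @ alpha are the q nonlinear principal components of the\n    # training samples, eq. (7)\n    return K @ alpha, alpha\n\nFor a new test sample x, the same coefficients are applied to the vector of kernel evaluations k(x_i, x) against the (centered) training samples, exactly as in (7).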
\n\n3 Kernel Fisher Linear Discriminant\n\nSimilar to the derivations in Kernel PCA, we assume the projected samples Φ(x) are centered in R^f (see [18] for a method to center the vectors Φ(x) in R^f), and we formulate the equations in a way that uses only dot products for FLD. Denoting the within-class and between-class scatter matrices by S_W^Φ and S_B^Φ, and applying FLD in kernel space, we need to find eigenvalues λ and eigenvectors w^Φ of\n\nλ S_W^Φ w^Φ = S_B^Φ w^Φ    (8)\n\nwhich can be obtained by\n\nW_OPT^Φ = argmax_{W^Φ} |(W^Φ)^T S_B^Φ W^Φ| / |(W^Φ)^T S_W^Φ W^Φ| = [w_1^Φ w_2^Φ ... w_m^Φ]    (9)\n\nwhere {w_i^Φ | i = 1, 2, ..., m} is the set of generalized eigenvectors corresponding to the m largest generalized eigenvalues {λ_i | i = 1, 2, ..., m}.\n\nFor given classes t and u and their samples, we define the kernel function by\n\n(k_rs)_tu = Φ(x_tr) · Φ(x_us)    (10)\n\nLet K be an m × m matrix defined by the elements (K_tu), where K_tu is a matrix composed of dot products in the feature space R^f, i.e.,\n\nK = (K_tu)_{t=1,...,c; u=1,...,c}  where  K_tu = (k_rs)_{r=1,...,l_t; s=1,...,l_u}    (11)\n\nNote K_tu is an l_t × l_u matrix, and K is an m × m symmetric matrix. We also define a matrix Z:\n\nZ = (Z_t)_{t=1,...,c}    (12)\n\nwhere Z_t is an l_t × l_t matrix with all terms equal to 1/l_t, i.e., Z is an m × m block diagonal matrix. The between-class and within-class scatter matrices in the high dimensional feature space R^f are defined as\n\nS_B^Φ = Σ_{i=1}^c l_i μ_i^Φ (μ_i^Φ)^T    (13)\n\nS_W^Φ = Σ_{i=1}^c Σ_{j=1}^{l_i} Φ(x_ij) Φ(x_ij)^T    (14)\n\nwhere μ_i^Φ is the mean of class i in R^f and l_i is the number of samples belonging to class i. From the theory of reproducing kernels, any solution w^Φ ∈ R^f must lie in the span of all training samples in R^f, i.e.,\n\nw^Φ = Σ_{p=1}^c Σ_{q=1}^{l_p} α_pq Φ(x_pq)    (15)\n\nIt follows that we can get the solution for (15) by solving\n\nλKKα = KZKα    (16)\n\nConsequently, we can write (9) as\n\nW_OPT^Φ = argmax_{W^Φ} |(W^Φ)^T S_B^Φ W^Φ| / |(W^Φ)^T S_W^Φ W^Φ| = argmax_α |α^T KZK α| / |α^T KK α| = [w_1^Φ ... w_m^Φ]    (17)\n\nWe can project Φ(x) to a lower dimensional space spanned by the eigenvectors w^Φ in a way similar to Kernel PCA (see Section 2). Adopting the same technique as in the Fisherface method (which avoids singularity problems in computing W_OPT^Φ) for face recognition [6], we call the extracted eigenvectors in (17) Kernel Fisherfaces.
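\n\nEquation (16) is a generalized symmetric eigenproblem over the expansion coefficients α. A minimal sketch follows, reusing the helpers from the Kernel PCA sketch above; the small ridge added to KK is a stand-in for the Fisherface-style singularity fix mentioned above, not the paper's exact procedure.\n\nimport numpy as np\nfrom scipy.linalg import eigh\n\ndef kernel_fld(X, labels, kernel, ridge=1e-6):\n    # X: m x n samples, labels: length-m array of class ids\n    m = X.shape[0]\n    K = kernel(X, X)\n    # Z, eq. (12): block diagonal, the class-t block filled with 1/l_t\n    Z = np.zeros((m, m))\n    for t in np.unique(labels):\n        idx = np.flatnonzero(labels == t)\n        Z[np.ix_(idx, idx)] = 1.0 / len(idx)\n    A = K @ Z @ K                   # between-class term KZK, eq. (16)\n    B = K @ K + ridge * np.eye(m)   # within-class term KK, ridged for stability\n    # Generalized eigenproblem A a = lambda B a, eigenvalues ascending\n    lam, alpha = eigh(A, B)\n    c = len(np.unique(labels))\n    alpha = alpha[:, ::-1][:, :c - 1]   # keep the c-1 leading directions\n    return K @ alpha                    # training samples in the reduced space\n\nAs in Section 2, a test image is projected by evaluating its kernel products against the training samples and applying the same coefficients.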
\n\n4 Experiments\n\nWe test both kernel methods against the standard ICA, Eigenface, and Fisherface methods using the publicly available AT&T and Yale databases. The face images in these databases have several unique characteristics. While the images in the AT&T database contain the facial contours and vary in pose as well as scale, the face images in the Yale database have been cropped and aligned. The face images in the AT&T database were taken under well controlled lighting conditions, whereas the images in the Yale database were acquired under varying lighting conditions. We use the first database as a baseline study and then use the second one to evaluate face recognition methods under varying lighting conditions.\n\n4.1 Variation in Pose and Scale\n\nThe AT&T (formerly Olivetti) database contains 400 images of 40 subjects. To reduce computational complexity, each face image is downsampled to 23 × 28 pixels. We represent each image by a raster scan vector of the intensity values, and then normalize them to be zero-mean vectors. The mean and standard deviation of the kurtosis of the face images are 2.08 and 0.41, respectively (the kurtosis of a Gaussian distribution is 3). Figure 1 shows images of two subjects. In contrast to images of the Yale database, the images include the facial contours and variation in pose as well as scale. However, the lighting conditions remain constant.\n\nFigure 1: Face images in the AT&T database (Left) and the Yale database (Right).\n\nThe experiments are performed using the \"leave-one-out\" strategy: to classify an image of a person, that image is removed and the projection matrix is computed from the remaining (m - 1) training images. All the remaining images are projected to a reduced space using the computed projection matrix w or w^Φ, and the held-out image is recognized with a nearest neighbor classifier. The number of principal components or independent components is empirically determined to achieve the lowest error rate for each method. A sketch of this protocol is given below.
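\n\nTo make the protocol concrete, here is a minimal sketch reusing the helpers above; fit_project is a hypothetical hook that refits any of the projection methods on the retained images, and Euclidean nearest neighbor in the reduced space is our assumed distance.\n\nimport numpy as np\n\ndef leave_one_out_error(X, labels, fit_project):\n    # fit_project(X_train) -> callable mapping images to reduced features\n    m, errors = X.shape[0], 0\n    for i in range(m):\n        keep = np.arange(m) != i\n        project = fit_project(X[keep])   # projection matrix w or w^Phi\n        train = project(X[keep])\n        test = project(X[i:i + 1])\n        # nearest neighbor classification in the reduced space\n        j = np.argmin(((train - test) ** 2).sum(axis=1))\n        errors += int(labels[keep][j] != labels[i])\n    return errors / m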
\n\nFigure 2 shows the experimental results. Among all the methods, the Kernel Fisherface method with Gaussian kernel and with second degree polynomial kernel achieves the lowest error rate. Furthermore, the kernel methods perform better than the standard ICA, Eigenface and Fisherface methods. Though our experiments using ICA seem to contradict the good empirical results reported in [3] [4] [2], a close look at the data sets reveals a significant difference in pose and scale variation of the face images in the AT&T database, whereas a subset of frontal FERET face images with change of expression was used in [3] [2]. Furthermore, the comparative study on classification with respect to PCA in [4] (p. 819, Table 1) and the errors made by two ICA algorithms in [2] (p. 50, Figure 2.18) seem to suggest that ICA methods do not have a clear advantage over other approaches in recognizing faces with pose and scale variation.\n\nMethod                 | Number of features | Error rate (%)\nICA                    | -                  | -\nEigenface              | 30                 | 2.75 (11/400)\nFisherface             | 14                 | 1.50 (6/400)\nKernel Eigenface, d=2  | 50                 | 2.50 (10/400)\nKernel Eigenface, d=3  | 50                 | 2.00 (8/400)\nKernel Fisherface (P)  | 14                 | 1.25 (5/400)\nKernel Fisherface (G)  | 14                 | 1.25 (5/400)\n\nFigure 2: Experimental results on the AT&T database (P: polynomial kernel; G: Gaussian kernel).\n\n4.2 Variation in Lighting and Expression\n\nThe Yale database contains 165 images of 15 subjects and includes variation in both facial expression and lighting. For computational efficiency, each image has been downsampled to 29 × 41 pixels. Likewise, each face image is represented by a centered vector of normalized intensity values. The mean and standard deviation of the kurtosis of the face images are 2.68 and 1.49, respectively. Figure 1 shows 22 closely cropped images of two subjects which include internal facial structures such as the eyebrows, eyes, nose, mouth and chin, but do not contain the facial contours.\n\nUsing the same leave-one-out strategy, we experiment with the number of principal components and independent components to achieve the lowest error rates for the Eigenface and Kernel Eigenface methods. For the Fisherface and Kernel Fisherface methods, we project all the samples onto a subspace spanned by the c - 1 largest eigenvectors. The experimental results are shown in Figure 3. Both kernel methods perform better than the standard ICA, Eigenface and Fisherface methods. Notice that the improvement by the kernel methods is rather significant (more than 10%). Notice also that kernel methods consistently perform better than conventional methods for both databases. The performance achieved by the ICA method indicates that face representation using independent sources is not effective when the images are taken under varying lighting conditions.\n\nMethod                 | Number of features | Error rate (%)\nICA                    | -                  | 29.09 (48/165)\nEigenface              | 30                 | 28.48 (47/165)\nFisherface             | 14                 | 8.48 (14/165)\nKernel Eigenface, d=2  | 80                 | 27.27 (45/165)\nKernel Eigenface, d=3  | 60                 | 24.24 (40/165)\nKernel Fisherface (P)  | 14                 | 6.67 (11/165)\nKernel Fisherface (G)  | 14                 | 6.06 (10/165)\n\nFigure 3: Experimental results on the Yale database (P: polynomial kernel; G: Gaussian kernel). The figure also plots the error rates as a bar chart.\n\nFigure 4 shows the training samples of the Yale database projected onto the first two eigenvectors extracted by the Kernel Eigenface and Kernel Fisherface methods. The projected samples of different classes are smeared together by the Kernel Eigenface method, whereas the samples projected by the Kernel Fisherface method are separated quite well. In fact, the samples belonging to the same class are projected to nearly the same position by the largest two eigenvectors. This example provides an explanation for the good results achieved by the Kernel Fisherface method.
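\n\nFor illustration, a Figure 4-style plot can be produced with the sketches above; here X, labels, the kernel width, and the use of matplotlib are all assumptions rather than details from the paper.\n\nimport matplotlib.pyplot as plt\n\n# X: m x n image matrix, labels: length-m class ids (see earlier sketches)\nkern = lambda A, B: gaussian_kernel(A, B, sigma=10.0)\nY_kpca, _ = kernel_pca(X, kern, q=2)\nY_kfld = kernel_fld(X, labels, kern)[:, :2]\n\nfig, axes = plt.subplots(1, 2, figsize=(10, 4))\nfor ax, Y, title in [(axes[0], Y_kpca, '(a) Kernel Eigenface method'),\n                     (axes[1], Y_kfld, '(b) Kernel Fisherface method')]:\n    # scatter the samples projected onto the two leading eigenvectors\n    ax.scatter(Y[:, 0], Y[:, 1], c=labels, cmap='tab20', s=12)\n    ax.set_title(title)\nplt.show()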
\n\nThe experimental results show that the Kernel Eigenface and Kernel Fisherface methods are able to extract nonlinear features and achieve lower error rates. Instead of using a nearest neighbor classifier, the performance can potentially be improved by other classifiers (e.g., k-nearest neighbor and perceptron). Another potential improvement is to use all the extracted nonlinear components as features (i.e., without projecting to a lower dimensional space) and use a linear Support Vector Machine (SVM) to construct a decision surface. Such a two-stage approach is, in spirit, similar to nonlinear SVMs, in which the samples are first projected to a high dimensional feature space where a hyperplane with the largest margin is constructed. In fact, one important factor in the recent success of SVM applications for visual recognition is the use of kernel methods.\n\n(a) Kernel Eigenface method. (b) Kernel Fisherface method.\n\nFigure 4: Samples projected by the Kernel PCA and Kernel Fisher methods.\n\n5 Discussion and Conclusion\n\nThe representation in the conventional Eigenface and Fisherface approaches is based on second order statistics of the image set, i.e., the covariance matrix, and does not use higher order statistical dependencies such as the relationships among three or more pixels. For face recognition, much of the important information may be contained in the high order statistical relationships among the pixels. Using the kernel tricks that are often used in SVMs, we extend the conventional methods to kernel space where we can extract nonlinear features among three or more pixels. We have investigated the Kernel Eigenface and Kernel Fisherface methods, and demonstrated that they provide a more effective representation for face recognition. Compared to other techniques for nonlinear feature extraction, kernel methods have the advantage that they do not require nonlinear optimization, but only the solution of an eigenvalue problem. Experimental results on two benchmark databases show that the Kernel Eigenface and Kernel Fisherface methods achieve lower error rates than the ICA, Eigenface and Fisherface approaches in face recognition. The performance achieved by the ICA method also indicates that face representation using independent basis images is not effective when the images contain pose, scale or lighting variation. Our future work will focus on analyzing face recognition methods using other kernel methods in high dimensional space. We plan to investigate and compare the performance of other face recognition methods [14] [12] [19].
\n\nReferences\n\n[1] Y. Adini, Y. Moses, and S. Ullman. Face recognition: The problem of compensating for changes in illumination direction. IEEE PAMI, 19(7):721-732, 1997.\n\n[2] M. S. Bartlett. Face Image Analysis by Unsupervised Learning and Redundancy Reduction. PhD thesis, University of California at San Diego, 1998.\n\n[3] M. S. Bartlett, H. M. Lades, and T. J. Sejnowski. Independent component representations for face recognition. In Proc. of SPIE, volume 3299, pages 528-539, 1998.\n\n[4] M. S. Bartlett and T. J. Sejnowski. Viewpoint invariant face recognition using independent component analysis and attractor networks. In NIPS 9, page 817, 1997.\n\n[5] G. Baudat and F. Anouar. Generalized discriminant analysis using a kernel approach. Neural Computation, 12:2385-2404, 2000.\n\n[6] P. Belhumeur, J. Hespanha, and D. Kriegman. Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE PAMI, 19(7):711-720, 1997.\n\n[7] A. J. Bell and T. J. Sejnowski. An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 7(6):1129-1159, 1995.\n\n[8] C. M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.\n\n[9] P. Comon. Independent component analysis: A new concept? Signal Processing, 36(3):287-314, 1994.\n\n[10] A. Hyvärinen, J. Karhunen, and E. Oja. Independent Component Analysis. Wiley-Interscience, 2001.\n\n[11] S. Mika, G. Rätsch, J. Weston, B. Schölkopf, A. Smola, and K.-R. Müller. Invariant feature extraction and classification in kernel spaces. In NIPS 12, pages 526-532, 2000.\n\n[12] B. Moghaddam. Principal manifolds and bayesian subspaces for visual recognition. In Proc. IEEE Int'l Conf. on Computer Vision, pages 1131-1136, 1999.\n\n[13] B. Moghaddam and A. Pentland. Probabilistic visual learning for object recognition. IEEE PAMI, 19(7):696-710, 1997.\n\n[14] P. J. Phillips. Support vector machines applied to face recognition. In NIPS 11, pages 803-809, 1998.\n\n[15] A. N. Rajagopalan, P. Burlina, and R. Chellappa. Higher order statistical learning for vehicle detection in images. In Proc. IEEE Int'l Conf. on Computer Vision, volume 2, pages 1204-1209, 1999.\n\n[16] A. N. Rajagopalan, K. S. Kumar, J. Karlekar, R. Manivasakan, and M. M. Patil. Finding faces in photographs. In Proc. IEEE Int'l Conf. on Computer Vision, pages 640-645, 1998.\n\n[17] V. Roth and V. Steinhage. Nonlinear discriminant analysis using kernel functions. In NIPS 12, pages 568-574, 2000.\n\n[18] B. Schölkopf, A. Smola, and K.-R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5):1299-1319, 1998.\n\n[19] Y. W. Teh and G. E. Hinton. Rate-coded restricted Boltzmann machines for face recognition. In NIPS 13, pages 908-914, 2001.\n\n[20] M. Turk and A. Pentland. Eigenfaces for recognition. J. of Cognitive Neuroscience, 3(1):71-86, 1991.\n", "award": [], "sourceid": 2087, "authors": [{"given_name": "Ming-Hsuan", "family_name": "Yang", "institution": null}]}