{"title": "Bayesian Modeling of Facial Similarity", "book": "Advances in Neural Information Processing Systems", "page_first": 910, "page_last": 916, "abstract": null, "full_text": "Bayesian Modeling of Facial Similarity \n\nBaback Moghaddam \n\nMitsubishi Electric Research Laboratory \n\n201 Broadway \n\nCambridge, MA 02139, USA \n\nbaback@merl.com \n\nTony Jebara and Alex Pentland \n\nMassachusetts Institute of Technology \n\n20 Ames St. \n\nCambridge, MA 02139, USA \n\n{jebara,sandy}@media.mit.edu \n\nAbstract \n\nIn previous work [6, 9, 10], we advanced a new technique for direct visual matching of images for the purposes of face recognition and image retrieval, using a probabilistic measure of similarity based primarily on a Bayesian (MAP) analysis of image differences, leading to a \"dual\" basis similar to eigenfaces [13]. The performance advantage of this probabilistic matching technique over standard Euclidean nearest-neighbor eigenface matching was recently demonstrated using results from DARPA's 1996 \"FERET\" face recognition competition, in which this probabilistic matching algorithm was found to be the top performer. We have further developed a simple method of replacing the costly computation of nonlinear (online) Bayesian similarity measures by the relatively inexpensive computation of linear (offline) subspace projections and simple (online) Euclidean norms, resulting in a significant computational speed-up for implementation with the very large image databases typically encountered in real-world applications. \n\n1 Introduction \n\nCurrent approaches to image matching for visual object recognition and image database retrieval often make use of simple image similarity metrics such as Euclidean distance or normalized correlation, which correspond to a template-matching approach to recognition [2, 5]. 
For example, in its simplest form, the similarity measure S(I_1, I_2) between two images I_1 and I_2 can be set to be inversely proportional to the norm ||I_1 - I_2||. Such a simple formulation suffers from a major drawback: it does not exploit knowledge of which types of variation are critical (as opposed to incidental) in expressing similarity. In this paper, we formulate a probabilistic similarity measure which is based on the probability that the image intensity differences, denoted by Δ = I_1 - I_2, are characteristic of typical variations in appearance of the same object. For example, for purposes of face recognition, we can define two classes of facial image variations: intrapersonal variations Ω_I (corresponding, for example, to different facial expressions of the same individual) and extrapersonal variations Ω_E (corresponding to variations between different individuals). Our similarity measure is then expressed in terms of the probability \n\nS(I_1, I_2) = P(Ω_I | Δ)    (1) \n\nwhere P(Ω_I | Δ) is the a posteriori probability given by Bayes rule, using estimates of the likelihoods P(Δ|Ω_I) and P(Δ|Ω_E). The likelihoods are derived from training data using an efficient subspace method for density estimation of high-dimensional data [7, 8]. This Bayesian (MAP) approach can also be viewed as a generalized nonlinear extension of Linear Discriminant Analysis (LDA) [12, 3] or \"Fisher Face\" techniques [1] for face recognition. Moreover, our nonlinear generalization has distinct computational/storage advantages over some of these linear methods for large databases. \n\n2 Difference Density Modeling \n\nConsider the problem of characterizing the type of intensity differences which occur when matching two images in a face recognition task. 
We have two classes (intrapersonal Ω_I and extrapersonal Ω_E) which we will assume form Gaussian distributions whose likelihoods can be estimated as P(Δ|Ω_I) and P(Δ|Ω_E) for a given intensity difference Δ = I_1 - I_2. \n\nGiven these likelihoods we can evaluate a similarity score S(I_1, I_2) between a pair of images directly in terms of the intrapersonal a posteriori probability as given by Bayes rule: \n\nS = P(Ω_I | Δ) = P(Δ|Ω_I) P(Ω_I) / [ P(Δ|Ω_I) P(Ω_I) + P(Δ|Ω_E) P(Ω_E) ]    (2) \n\nwhere the priors P(Ω) can be set to reflect specific operating conditions (e.g., the number of test images vs. the size of the database) or other sources of a priori knowledge regarding the two images being matched. Additionally, this particular Bayesian formulation casts the standard face recognition task (essentially an M-ary classification problem for M individuals) into a binary pattern classification problem with Ω_I and Ω_E. This much simpler problem is then solved using the maximum a posteriori (MAP) rule - i.e., two images are determined to belong to the same individual if P(Ω_I | Δ) > P(Ω_E | Δ), or equivalently, if S(I_1, I_2) > 1/2. \n\nTo deal with the high dimensionality of Δ, we make use of the efficient density estimation method proposed by Moghaddam & Pentland [7, 8], which divides the vector space R^N into two complementary subspaces using an eigenspace decomposition. This method relies on a Principal Components Analysis (PCA) [4] to form a low-dimensional estimate of the complete likelihood which can be evaluated using only the first M principal components, where M << N. \n\n3 Efficient Similarity Computation \n\nConsider now a feature space of Δ vectors, the differences between two images (I_j and I_k). The two classes of interest in this space correspond to intrapersonal and extrapersonal variations and each is modeled as a high-dimensional Gaussian density as in Equation 3. 
The densities are zero-mean since for each Δ = I_j - I_k there exists a Δ = I_k - I_j. \n\nP(Δ|Ω) = exp( -(1/2) Δ^T Σ^{-1} Δ ) / ( (2π)^{D/2} |Σ|^{1/2} )    (3) \n\nBy PCA, the Gaussians are known to occupy only a subspace of image space (face-space) and thus only the top few eigenvectors of the Gaussian densities are relevant for modeling. These densities are used to evaluate the similarity score in Equation 2. \n\nComputing the similarity score involves first subtracting a candidate image I_j from a database entry I_k. The resulting Δ image is then projected onto the eigenvectors of the extrapersonal Gaussian and also the eigenvectors of the intrapersonal Gaussian. The exponentials are computed, normalized and then combined as in Equation 2. This operation is iterated over all members of the database (many I_k images) until the maximum score is found (i.e., the match). Thus, for large databases, this evaluation is expensive but can be simplified by offline transformations. \n\nTo compute the likelihoods P(Δ|Ω_I) and P(Δ|Ω_E) we pre-process the I_k images with whitening transformations. Each image is converted and stored as whitened subspace coefficients: i for intrapersonal space and e for extrapersonal space (see Equation 4). Here, Λ and V are matrices of the largest eigenvalues and eigenvectors of Σ_E or Σ_I. Typically, we have used M_I = 100 and M_E = 100 for Ω_I and Ω_E respectively. \n\ni_j = Λ_I^{-1/2} V_I^T I_j ,   e_j = Λ_E^{-1/2} V_E^T I_j    (4) \n\nAfter this pre-processing, evaluating the Gaussians can be reduced to simple Euclidean distances as in Equation 5. Denominators are of course pre-computed. These likelihoods are evaluated and used to compute the MAP similarity S in Equation 2. Euclidean distances are computed between the 100-dimensional i vectors as well as the 100-dimensional e vectors. Thus, roughly 2 x (M_E + M_I) = 400 arithmetic operations are required for each similarity computation, avoiding repeated image differencing and projections. 
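The offline/online split described above can be sketched in NumPy (a minimal illustration, not the authors' implementation; the toy covariances, shapes and the helper name `whiten_projector` are assumptions for the example):

```python
import numpy as np

def whiten_projector(Sigma, M):
    """Return a map x -> Lambda^{-1/2} V^T x built from the top-M eigenpairs of Sigma."""
    w, V = np.linalg.eigh(Sigma)              # ascending eigenvalues
    idx = np.argsort(w)[::-1][:M]             # pick the M largest
    Lam, V = w[idx], V[:, idx]
    return lambda x: (V.T @ x) / np.sqrt(Lam)

rng = np.random.default_rng(0)
N, M_I, M_E = 16, 4, 4                        # tiny stand-ins for N-pixel images
A_I = rng.standard_normal((N, N))
A_E = rng.standard_normal((N, N))
Sigma_I, Sigma_E = A_I @ A_I.T, A_E @ A_E.T   # toy intra/extra covariances

proj_I = whiten_projector(Sigma_I, M_I)
proj_E = whiten_projector(Sigma_E, M_E)

# Offline: store each gallery image as its whitened coefficients i_k and e_k.
gallery = rng.standard_normal((5, N))
gal_i = np.array([proj_I(I) for I in gallery])
gal_e = np.array([proj_E(I) for I in gallery])

# Online: because the projection is linear, the whitened coefficients of the
# difference image are just coefficient differences, so each match costs only
# two small Euclidean norms -- roughly 2 x (M_I + M_E) operations.
probe = rng.standard_normal(N)
p_i, p_e = proj_I(probe), proj_E(probe)
d_I = np.sum((gal_i - p_i) ** 2, axis=1)      # ||i_j - i_k||^2 for every entry
d_E = np.sum((gal_e - p_e) ** 2, axis=1)
```

The key property exploited is linearity: whitening the difference Δ = I_j - I_k yields exactly the difference of the stored whitened coefficients, so no per-match image differencing or dense projection is needed.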
\n\nP(Δ|Ω_I) = exp( -(1/2) ||i_j - i_k||^2 ) / ( (2π)^{D/2} |Σ_I|^{1/2} ) ,   P(Δ|Ω_E) = exp( -(1/2) ||e_j - e_k||^2 ) / ( (2π)^{D/2} |Σ_E|^{1/2} )    (5) \n\nThe ML similarity matching is even simpler since only the intrapersonal class is evaluated, leading to the following modified form for the similarity measure \n\nS' = P(Δ|Ω_I)    (6) \n\nFigure 1: Examples of FERET frontal-view image pairs used for (a) the Gallery set (training) and (b) the Probe set (testing). \n\nFigure 2: Face alignment system [7]. \n\n4 Experimental Results \n\nTo test our recognition strategy we used a collection of images from the ARPA FERET face database. The set of images consists of pairs of frontal views (FA/FB) and is divided into two subsets: the \"gallery\" (training set) and the \"probes\" (testing set). The gallery consisted of 74 pairs of images (2 per individual) and the probe set consisted of 38 pairs of images, corresponding to a subset of the gallery members. The probe and gallery datasets were captured a week apart and exhibit differences in clothing, hair and lighting (see Figure 1). \n\nEach of these images was affine normalized to a canonical model using an automatic face-processing system which normalizes for translation, scale and slight rotations (both in-plane and out-of-plane). This system is described in detail in [7, 8] and uses maximum-likelihood estimation of object location (in this case the position and scale of a face and the location of individual facial features) to geometrically align faces into standard normalized form, as shown in Figure 2. All the faces in our experiments were geometrically aligned and normalized in this manner prior to further analysis. \n\n4.1 Eigenface Matching \n\nAs a baseline comparison, we first used an eigenface matching technique for recognition [13]. 
The normalized images from the gallery and the probe sets were projected onto a 100-dimensional eigenspace similar to that shown in Figure 3, and a nearest-neighbor rule based on a Euclidean distance measure was used to match each probe image to a gallery image. We note that this method corresponds to a generalized template-matching method which uses a Euclidean norm measure of similarity that is, however, restricted to the principal subspace of the data. The rank-1 recognition rate obtained with this method was found to be 84%. \n\nFigure 3: Standard Eigenfaces. \n\nFigure 4: \"Dual\" Eigenfaces: (a) Intrapersonal, (b) Extrapersonal. \n\n4.2 Bayesian Matching \n\nFor our probabilistic algorithm, we first gathered training data by computing the intensity differences for a training subset of 74 intrapersonal differences (by matching the two views of every individual in the gallery) and a random subset of 296 extrapersonal differences (by matching images of different individuals in the gallery), corresponding to the classes Ω_I and Ω_E, respectively, and performing a separate PCA analysis on each. \n\nWe note that the two mutually exclusive classes Ω_I and Ω_E correspond to a \"dual\" set of eigenfaces as shown in Figure 4. Note that the intrapersonal variations shown in Figure 4(a) represent subtle variations due mostly to expression changes (and lighting) whereas the extrapersonal variations in Figure 4(b) are more representative of general eigenfaces which code variations such as hair color, facial hair and glasses. These extrapersonal eigenfaces are qualitatively similar to the standard normalized intensity eigenfaces shown in Figure 3. 
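The construction of the two difference sets and their separate PCAs can be sketched as follows (a toy NumPy sketch; the helper `pca_basis` and the synthetic "gallery" are illustrative assumptions, not the FERET data):

```python
import numpy as np

def pca_basis(deltas, M):
    """Top-M eigenvalues/eigenvectors of the second-moment matrix of the
    difference vectors (no mean subtraction: the Delta classes are zero-mean)."""
    Sigma = deltas.T @ deltas / len(deltas)
    w, V = np.linalg.eigh(Sigma)
    idx = np.argsort(w)[::-1][:M]
    return w[idx], V[:, idx]

rng = np.random.default_rng(1)
N, people, views = 32, 8, 2
faces = rng.standard_normal((people, views, N))   # toy gallery: 2 views each

# Intrapersonal differences: the two views of every individual.
intra = np.array([p[0] - p[1] for p in faces])

# Extrapersonal differences: pairs of views from *different* individuals.
extra = np.array([faces[a, 0] - faces[b, 1]
                  for a in range(people) for b in range(people) if a != b])

lam_I, V_I = pca_basis(intra, M=4)   # "intrapersonal eigenfaces"
lam_E, V_E = pca_basis(extra, M=4)   # "extrapersonal eigenfaces"
```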
\n\nWe next computed the likelihood estimates P(Δ|Ω_I) and P(Δ|Ω_E) using the PCA-based method [7, 8], using subspace dimensions of M_I = 10 and M_E = 30 for Ω_I and Ω_E, respectively. These density estimates were then used with a default setting of equal priors, P(Ω_I) = P(Ω_E), to evaluate the a posteriori intrapersonal probability P(Ω_I | Δ) for matching probe images to those in the gallery. Therefore, for each probe image we computed probe-to-gallery differences and sorted the matching order, this time using the a posteriori probability P(Ω_I | Δ) as the similarity measure. This probabilistic ranking yielded an improved rank-1 recognition rate of 90%. \n\n[Figure: cumulative match score curves; legend includes \"MIT Sep 96\" - remainder truncated in source]
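Putting the pieces together, the MAP similarity of Equation 2 reduces to a small function of the two whitened distances (a hedged sketch; the log-domain formulation and the name `map_similarity` are ours, not the paper's code):

```python
import numpy as np

def map_similarity(d_I, d_E, log_norm_I, log_norm_E, prior_I=0.5):
    """S = P(Omega_I | Delta) as in Equation 2, from squared whitened
    distances d_I = ||i_j - i_k||^2 and d_E = ||e_j - e_k||^2.
    log_norm_* are logs of the pre-computed Gaussian normalizers."""
    log_I = -0.5 * d_I - log_norm_I + np.log(prior_I)
    log_E = -0.5 * d_E - log_norm_E + np.log(1.0 - prior_I)
    m = np.maximum(log_I, log_E)          # log-sum-exp for numerical stability
    return np.exp(log_I - m) / (np.exp(log_I - m) + np.exp(log_E - m))

# MAP rule: declare "same individual" iff S > 1/2.
s_same = map_similarity(d_I=4.0, d_E=25.0, log_norm_I=0.0, log_norm_E=0.0)
s_diff = map_similarity(d_I=25.0, d_E=4.0, log_norm_I=0.0, log_norm_E=0.0)
```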