{"title": "Facial Memory Is Kernel Density Estimation (Almost)", "book": "Advances in Neural Information Processing Systems", "page_first": 24, "page_last": 30, "abstract": null, "full_text": "Facial Memory Is Kernel Density Estimation (Almost) \n\nMatthew N. Dailey, Garrison W. Cottrell \nDepartment of Computer Science and Engineering \nU.C. San Diego \nLa Jolla, CA 92093-0114 \n{mdailey,gary}@cs.ucsd.edu \n\nThomas A. Busey \nDepartment of Psychology \nIndiana University \nBloomington, IN 47405 \nbusey@indiana.edu \n\nAbstract \n\nWe compare the ability of three exemplar-based memory models, each using three different face stimulus representations, to account for the probability that a human subject responded \"old\" in an old/new facial memory experiment. The models are 1) the Generalized Context Model, 2) SimSample, a probabilistic sampling model, and 3) MMOM, a novel model related to kernel density estimation that explicitly encodes stimulus distinctiveness. The representations are 1) positions of stimuli in MDS \"face space,\" 2) projections of test faces onto the \"eigenfaces\" of the study set, and 3) a representation based on responses to a grid of Gabor filter jets. Of the 9 model/representation combinations, only the distinctiveness model in MDS space predicts the observed \"morph familiarity inversion\" effect, in which the subjects' false alarm rate for morphs between similar faces is higher than their hit rate for many of the studied faces. This evidence is consistent with the hypothesis that human memory for faces is a kernel density estimation task, with the caveat that distinctive faces require larger kernels than do typical faces. \n\n1 Background \n\nStudying the errors subjects make during face recognition memory tasks aids our understanding of the mechanisms and representations underlying memory, face processing, and visual perception. 
One way of evoking such errors is to test subjects' recognition of new faces created by combining studied faces in some way (e.g. Solso and McCarthy, 1981; Reinitz, Lammers, and Cochran, 1992). Busey and Tunnicliff (submitted) have recently examined the extent to which image-quality morphs between unfamiliar faces affect subjects' tendency to make recognition errors. \n\nTheir experiments used facial images of bald males and morphs between these images (see Figure 1) as stimuli. \n\nFigure 1: Three normalized morphs from the database. \n\nIn one study, Busey (in press) had subjects rate the similarity of all pairs in a large set of faces and morphs, then performed a multidimensional scaling (MDS) of these similarity ratings to derive a 6-dimensional \"face space\" (Valentine and Endo, 1992). In another study, \"Experiment 3\" (Busey and Tunnicliff, submitted), 179 subjects studied 68 facial images, including 8 similar pairs and 8 dissimilar pairs, as determined in a pilot study. These pairs were included in order to study how morphs between similar faces and between dissimilar faces evoke false alarms. We call the pair of images from which a morph is derived its \"parents,\" and the morph itself their \"child.\" In the experiment's test phase, the subjects were asked to make new/old judgments in response to 8 of the 16 morphs, 20 completely new distractor faces, the 36 non-parent targets, and one of the parents of each of the 8 morphs. The results were that, for many of the morph/parent pairs, subjects responded \"old\" to the unstudied morph more often than to its studied parent. 
However, this effect (a morph familiarity inversion) occurred only for the morphs with similar parents. It seems that the similar parents are so similar to their \"child\" morphs that they both contribute toward an \"old\" (false alarm) response to the morph. \n\nResearchers have proposed many models to account for data from explicit memory experiments. Although we have applied other types of models to Busey and Tunnicliff's data with largely negative results (Dailey et al., 1998), in this paper we limit discussion to exemplar-based models, such as the Generalized Context Model (Nosofsky, 1986) and SAM (Gillund and Shiffrin, 1984). These models rely on the assumption that subjects explicitly store representations of each of the stimuli they study. Busey and Tunnicliff applied several exemplar-based models to the Experiment 3 data, but none of these models has been able to fully account for the observed similar morph familiarity inversion without positing that the similar parents are explicitly blended in memory, producing prototypes near the morphs. \n\nWe extend Busey and Tunnicliff's (submitted) work by applying two of their exemplar models to additional image-based face stimulus representations, and we propose a novel exemplar model that accounts for the similar morphs' familiarity inversion. The results are consistent with the hypothesis that facial memory is a kernel density estimation (Bishop, 1995) task, except that distinctive exemplars require larger kernels. Also, on the basis of our model, we predict that distinctiveness with respect to the study set, as opposed to a context-free notion of distinctiveness, is the critical factor influencing kernel size. This prediction is straightforward to test empirically. 
\n\n2 Experimental Methods \n2.1 Face Stimuli and Normalization \nThe original images were 104 digitized 560x662 grayscale images of bald men, with consistent lighting and background and fairly consistent position. The subjects varied in race and extent of facial hair. We automatically located the left and right eyes on each face using a simple template correlation technique, then translated, rotated, scaled, and cropped each image so that the eyes were aligned across images. We then scaled each image to 114x143 to speed up image processing. Figure 1 shows three examples of the normalized morphs (the original images are copyrighted and cannot be published). \n\n2.2 Representations \nPositions in multidimensional face space. Many researchers have used a multidimensional scaling approach to model various phenomena in face processing (e.g. Valentine and Endo, 1992). Busey (in press) had 343 subjects rate the similarity of pairs of faces in the test set and performed a multidimensional scaling on the similarity matrix for 100 of the faces (four non-parent target faces were dropped from this analysis). The process resulted in a 6-dimensional solution with r^2 = 0.785 and a stress of 0.13. In the MDS modeling results described below, we used the 6-dimensional vector associated with each stimulus as its representation. \n\nPrincipal component projections. \"Eigenfaces,\" or the eigenvectors of the covariance matrix for a set of face images, are a common basis for face representations (e.g. Turk and Pentland, 1991). We performed a principal components analysis on the 68 face images used in the study set for Busey and Tunnicliff's experiment to get the 67 eigenvectors of their covariance matrix that have non-zero eigenvalues. We then projected each of the 104 test set images onto the 30 most significant eigenfaces to obtain a 30-dimensional vector representing each face.\u00b9 
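The eigenface projection described above can be sketched in NumPy as follows. This is a minimal illustration under our own assumptions, not the authors' code: images are taken to be already normalized and flattened into matrix rows, the eigenfaces come from an SVD of the mean-centered study matrix (equivalent to diagonalizing the pixel covariance matrix), and test images are centered on the study-set mean before projection. The function name `eigenface_projections` is hypothetical. Centering 68 study images leaves at most 67 non-zero eigenvalues, which matches the 67 eigenvectors reported in the text. The usage lines run on random synthetic data of the normalized image size, not on the face images.

```python
import numpy as np


def eigenface_projections(study, test, k=30):
    '''Project test images onto the k most significant eigenfaces of a study set.

    study: (n_study, n_pixels) array, one flattened normalized image per row.
    test:  (n_test, n_pixels) array of flattened images to represent.
    Returns an (n_test, k) array of projection coefficients.
    '''
    mean_face = study.mean(axis=0)
    centered = study - mean_face
    # Rows of vt are eigenvectors of the pixel covariance matrix (eigenfaces),
    # sorted by decreasing singular value, i.e. decreasing variance explained.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    eigenfaces = vt[:k]
    # Assumption: test images are centered on the study-set mean face.
    return (test - mean_face) @ eigenfaces.T


# Random stand-ins with the normalized image size (114x143 pixels).
rng = np.random.default_rng(0)
study = rng.normal(size=(68, 114 * 143))
test = rng.normal(size=(104, 114 * 143))
print(eigenface_projections(study, test).shape)  # (104, 30)
```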
\n\nGabor filter responses. Von der Malsburg and colleagues have made effective use of banks of Gabor filters at various orientations and spatial frequencies in face recognition systems. We used one form of their wavelet (Buhmann, Lades, and von der Malsburg, 1990) at 5 scales and 8 orientations in an 8x8 square grid over each normalized face image as the basis for a third face stimulus representation. Since this representation resulted in a 2560-dimensional vector for each face stimulus (5 scales x 8 orientations x 64 grid points), we performed a principal components analysis to reduce the dimensionality to 30, matching the dimensionality of the eigenface representation. Thus we obtained a 30-dimensional vector based on Gabor filter responses to represent each test set face image. \n\n2.3 Models \nThe Generalized Context Model (GCM). There are several different flavors of the GCM. We consider only a simple sum-similarity form that leads directly to our distinctiveness-modulated density estimation model. Our version of the GCM's predicted P(old), given a representation $y$ of a test stimulus and representations $x \\in X$ of the studied exemplars, is \n\n$$\\mathrm{pred}_y = \\alpha + \\beta \\sum_{x \\in X} e^{-c\\, d_{x,y}^2}$$ \n\nwhere $\\alpha$ and $\\beta$ linearly convert the probe's summed similarity to a probability, $X$ is the set of representations of the study set stimuli, $c$ widens or narrows the similarity function, and $d_{x,y}$ is either $\\|x - y\\|$, the Euclidean distance between $x$ and $y$, or the weighted Euclidean distance $\\sqrt{\\sum_k w_k (x_k - y_k)^2}$, where the \"attentional weights\" $w_k$ are constants that sum to 1. Intuitively, this model simply places a Gaussian-shaped function over each of the studied exemplars, and the predicted familiarity of a test probe is the summed height of these surfaces at the probe's location. 
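A direct transcription of the sum-similarity GCM above might look like the following NumPy sketch. The function name `gcm_predict` and its argument layout are our own. The sketch does not clip predictions to [0, 1], since the text does not say whether the fitted linear map was constrained. Passing `weights` switches from the plain to the attention-weighted Euclidean distance. Because the exponent uses the squared distance, the square root in the weighted distance never has to be taken.

```python
import numpy as np


def gcm_predict(probes, exemplars, alpha, beta, c, weights=None):
    '''Sum-similarity GCM: pred_y = alpha + beta * sum_x exp(-c * d(x, y)**2).

    probes:    (n_probe, D) representations y of the test stimuli.
    exemplars: (n_study, D) representations x of the studied stimuli.
    weights:   optional (D,) attentional weights that sum to 1; if None,
               the plain Euclidean distance is used.
    Returns an (n_probe,) array of predicted P(old) values (not clipped).
    '''
    diff = probes[:, None, :] - exemplars[None, :, :]  # (n_probe, n_study, D)
    if weights is None:
        sq_dist = np.sum(diff ** 2, axis=-1)
    else:
        sq_dist = np.sum(np.asarray(weights) * diff ** 2, axis=-1)
    similarity = np.exp(-c * sq_dist)                  # one Gaussian bump per exemplar
    return alpha + beta * similarity.sum(axis=1)


# Toy check: a probe sitting on one exemplar and at distance 1 from another.
probe = np.array([[0.0]])
study = np.array([[0.0], [1.0]])
print(gcm_predict(probe, study, alpha=0.0, beta=1.0, c=1.0))  # 1 + exp(-1), about 1.3679
```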
\n\nRecall that two of our representations, PC projection space and Gabor filter space, are 30-dimensional, whereas the other, MDS, is only 6-dimensional. Allowing adaptive weights for the MDS representation is therefore reasonable, since the resulting model uses only 8 parameters to fit 100 points (for the GCM, the two linear-conversion constants and $c$, plus 5 independent attentional weights, because the 6 weights sum to 1). It is clearly unreasonable, however, to allow adaptive weights in PC and Gabor space, where the resulting models would be fitting 32 parameters to 100 points. Thus, for all models, we report results in MDS space both with and without adaptive weights, but we do not report adaptive-weight results for models in PC and Gabor space. \n\n\u00b9 We used 30 eigenfaces because, with this number, our theoretical \"distinctiveness\" measure was best correlated with the same measure in MDS space. \n\nSimSample. Busey and Tunnicliff (submitted) proposed SimSample in an attempt to remedy the GCM's poor predictions of the human data. It is related both to the GCM, in that it uses representations in MDS space, and to SAM (Gillund and Shiffrin, 1984), in that it involves sampling exemplars. The idea behind the model is that when a subject is shown a test stimulus, instead of making a summed comparison to all of the exemplars in memory, the test probe probabilistically samples a single exemplar in memory, and the subject responds \"old\" if the probe's similarity to that exemplar is above a noisy criterion. The model has a similarity scaling parameter and two parameters describing the noisy threshold function. Due to space limitations, we cannot provide the details of the model here. \n\nBusey and Tunnicliff were able to fit the human data within the SimSample framework, but only when they introduced prototypes at the locations of the morphs in MDS space and made the probability of sampling the prototype proportional to the similarity of the parents. 
\nHere, however, we compare only with the basic version, which does not blend exemplars. \n\nMixture Model of Memory (MMOM). In this model, we assume that subjects, at study time, implicitly create a probability density surface corresponding to the training set. The subjects' probability of responding \"old\" to a probe is then taken to be proportional to the height of this surface at the point corresponding to the probe. The surface must be robust to the variability or noise typically encountered in face recognition (lighting changes, perspective changes, etc.), yet must also support some level of discrimination (i.e. even when the regions of possible representations for different faces overlap due to noise, some rational decision boundary must still be constructed). If we assume a Gaussian mixture model, in which the density surface is built from Gaussian \"blobs\" centered on each studied exemplar, the task is a form of kernel density estimation (Bishop, 1995). \n\nIn this framework, then, we can formulate the task of predicting the human subjects' P(old) as optimizing the priors and widths of the kernel functions to minimize the mean squared error of the prediction. However, we also want to minimize the number of free parameters in the model, since parsimonious methods for setting the priors and kernel function widths potentially lead to more useful insights into the principles underlying the human data. If the priors and widths were held constant, we would have a simple two-parameter model predicting the probability that a subject responds \"old\" to a test stimulus $y$: \n\n$$\\mathrm{pred}_y = \\sum_{x \\in X} \\alpha\\, e^{-\\|x - y\\|^2 / (2\\sigma^2)}$$ \n\nwhere $\\alpha$ folds together the uniform prior and normalization constants, and $\\sigma$ is the standard deviation of the Gaussian kernels. If we ignore the constants, however, this model is essentially the same as the version of the GCM described above: it is that GCM with the additive constant set to zero, the multiplicative constant equal to $\\alpha$, and $c = 1/(2\\sigma^2)$. 
As the results section will show, this model cannot fully account for the human familiarity data in any of our representational spaces. \n\nTo improve the model, we introduce two parameters that allow the prior (kernel function height) and standard deviation (kernel function width) to vary with the distinctiveness of the studied exemplar. This modification has two intuitive motivations. First, when humans are asked which of two parent faces a 50% morph is most similar to, and one parent is distinctive while the other is typical, subjects tend to choose the more distinctive parent (Tanaka et al., submitted). Second, we hypothesize that when a human is asked to study and remember a set of faces for a recognition test, faces with few neighbors will likely have more relaxed (wider) discrimination boundaries than faces with many nearby neighbors. \n\nThus in each representation space, for each studied face $x$, we computed $d(x)$, the theoretical distinctiveness of the face, as the Z-scored average distance to the five nearest studied faces. We then allowed the height and width of each kernel function to vary with $d(x)$: \n\n$$\\mathrm{pred}_y = \\sum_{x \\in X} \\alpha\\,(1 + c_\\alpha d(x))\\, \\exp\\left(-\\frac{\\|x - y\\|^2}{2\\,[\\sigma\\,(1 + c_\\sigma d(x))]^2}\\right)$$ \n\nAs was the case for the GCM and SimSample, we report results using a weighted Euclidean distance between $y$ and $x$ in MDS space only. \n\nModel | MDS space | MDS + weights | PC projections | Gabor jets \nGCM | 0.1633 | 0.1417 | 0.1745 | 0.1624 \nSimSample | 0.1521 | 0.1404 | 0.1756 | 0.1704 \nMMOM | 0.1601 | 0.1528 | 0.1992 | 0.1668 \n\nTable 1: RMSE for the three models and three representations. Quality of fit for models with adaptive attentional weights is reported only for the low-dimensional representation (\"MDS + weights\"). The baseline RMSE, achievable with a constant prediction, is 0.2044. 
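The distinctiveness-modulated mixture model can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation. The function names are hypothetical. Distances use the unweighted Euclidean norm, the Z-score uses the population standard deviation over the studied faces, and each face is excluded from its own five nearest neighbors. Nothing in the sketch keeps 1 + c d(x) positive; the text does not say how, or whether, that was constrained during fitting. Setting both distinctiveness coefficients to zero recovers the constant-kernel model.

```python
import numpy as np


def distinctiveness(exemplars, k=5):
    '''Theoretical distinctiveness d(x): the Z-scored mean Euclidean distance
    from each studied exemplar to its k nearest other studied exemplars.'''
    diff = exemplars[:, None, :] - exemplars[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))
    np.fill_diagonal(dist, np.inf)            # a face is not its own neighbor
    mean_nn = np.sort(dist, axis=1)[:, :k].mean(axis=1)
    return (mean_nn - mean_nn.mean()) / mean_nn.std()


def mmom_predict(probes, exemplars, alpha, sigma, c_alpha, c_sigma, k=5):
    '''Distinctiveness-modulated kernel sum: exemplar x contributes a Gaussian
    of height alpha*(1 + c_alpha*d(x)) and width sigma*(1 + c_sigma*d(x)).'''
    d = distinctiveness(exemplars, k)
    height = alpha * (1.0 + c_alpha * d)          # (n_study,)
    width = sigma * (1.0 + c_sigma * d)           # (n_study,)
    diff = probes[:, None, :] - exemplars[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)          # (n_probe, n_study)
    kernels = height * np.exp(-sq_dist / (2.0 * width ** 2))
    return kernels.sum(axis=1)


# Synthetic stand-in for 68 studied faces and 104 probes in a 6-D space.
rng = np.random.default_rng(0)
study = rng.normal(size=(68, 6))
probes = rng.normal(size=(104, 6))
p = mmom_predict(probes, study, alpha=0.05, sigma=1.0, c_alpha=0.2, c_sigma=0.2)
print(p.shape)  # (104,)
```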
\n\n2.4 Parameter fitting and model evaluation \nFor each of the twelve combinations of models with face representations (counting MDS space with and without adaptive weights separately), we searched parameter space by simple hill climbing for the parameter settings that minimized the mean squared error between the model's predicted P(old) and the actual human P(old) data. \n\nWe rate each model's effectiveness by two criteria. First, we measure the model's global fit with RMSE over all test set points. A model's RMSE can be compared to the baseline performance of the \"dumbest\" model, which simply predicts the mean human P(old) of 0.5395 and achieves an RMSE of 0.2044. Second, we evaluate the extent to which a model predicts the mean human response for each of the six categories of test set stimuli: 1) non-parent targets, 2) non-morph distractors, 3) similar parents, 4) dissimilar parents, 5) similar morphs, and 6) dissimilar morphs. If a model correctly predicts the rank ordering of these category means, it also accounts for the similar morph familiarity inversion pattern in the human data, since the inversion is itself an ordering among these category means. As long as models do an adequate job of fitting the human data overall, as measured by RMSE, we prefer models that predict the morph familiarity inversion effect as a natural consequence of minimizing RMSE. \n\n3 Results \n\nTable 1 shows the global fit of each model/representation pair. The SimSample model in MDS space provides the best quantitative fit (RMSE 0.1404 with adaptive weights). The GCM generally outperforms MMOM, indicating that for a tight quantitative fit, having parameters for a linear transformation built into the model is more important than allowing the kernel function to vary with distinctiveness. Also of note is that the PC projection representation is consistently outperformed by both the Gabor jet representation and the MDS space representation. 
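The fitting and evaluation procedure of Section 2.4 can be sketched as follows. The paper says only that parameters were found by simple hill climbing on the mean squared error, so the particular variant here (Gaussian proposals, accept-if-better, halving the step after 50 consecutive rejections) is our assumption, as are all function names. One consequence of the baseline definition is worth making explicit: a constant prediction equal to the mean human P(old) has an RMSE equal to the population standard deviation of the human data, so the reported baseline of 0.2044 is that standard deviation. The rank-order check compares the ordering of the category means in the predictions with their ordering in the human data.

```python
import numpy as np


def rmse(pred, human):
    '''Root-mean-squared error between predicted and observed P(old).'''
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(human)) ** 2)))


def hill_climb(loss, theta0, step=0.1, n_iter=2000, patience=50, seed=0):
    '''Stochastic hill climbing: keep a Gaussian perturbation of the parameters
    only if it lowers the loss; halve the step after `patience` rejections.'''
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    best = loss(theta)
    fails = 0
    for _ in range(n_iter):
        cand = theta + step * rng.normal(size=theta.shape)
        val = loss(cand)
        if val < best:
            theta, best, fails = cand, val, 0
        else:
            fails += 1
            if fails >= patience:
                step, fails = step / 2.0, 0
    return theta, best


def category_means(values, labels):
    '''Mean of values within each stimulus category label.'''
    values, labels = np.asarray(values), np.asarray(labels)
    return {c: float(np.mean(values[labels == c])) for c in sorted(set(labels))}


def same_rank_order(pred, human, labels):
    '''True if the predicted category means are ordered like the human ones.'''
    hm, pm = category_means(human, labels), category_means(pred, labels)
    cats = list(hm)
    return sorted(cats, key=hm.get) == sorted(cats, key=pm.get)


# Toy usage on synthetic data (not the experiment's data).
rng = np.random.default_rng(0)
human = rng.uniform(0.2, 0.9, size=100)
feature = human + rng.normal(scale=0.05, size=100)  # stand-in for a summed similarity


def mse(theta):
    return np.mean((theta[0] + theta[1] * feature - human) ** 2)


theta, _ = hill_climb(mse, [0.5, 0.0])
baseline = rmse(np.full_like(human, human.mean()), human)
print(baseline, human.std())                       # agree up to rounding
print(rmse(theta[0] + theta[1] * feature, human))  # fitted RMSE, below the baseline
```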
\n\nBut for our purposes, the degree to which a model predicts the mean human responses for each of the six categories of stimuli is more important, provided that it does a reasonably good job globally. Figure 2 takes a more detailed look at how well each model predicts the human category means. Even though SimSample in MDS space has the best global fit to the human familiarity ratings, it does not predict the familiarity inversion for similar morphs. Only the mixture model in weighted MDS space correctly predicts the morph familiarity inversion. All of the other models underpredict the human responses to the similar morphs. \n\n4 Discussion \n\nThe results for the mixture model are consistent with the hypothesis that facial memory is a kernel density estimation task, with the caveat that distinctive exemplars require larger kernels. Whereas true density estimation would tend to de-emphasize outliers in sparse areas of the face space, the human data show that the priors and kernel function widths for outliers should actually be increased. 
Two potentially significant problems with the work presented here are, first, that we experimented with several models before finding that MMOM was able to predict the morph familiarity inversion effect, and second, that we are fitting a single experiment. The model thus must be carefully tested against new data, and its predictions empirically validated. \n\nFigure 2: Average actual/predicted responses to the faces in each category, with one panel per model/representation pair (GCM, SimSample, and MMOM, each in MDS, MDS+wts, PC, and Gabor space; response axis from 0.0 to 0.6). Key: DP = Dissimilar parents; SM = Similar morphs; T = Non-parent targets; SP = Similar parents; DM = Dissimilar morphs; D = Distractors. 
\n\nSince a theoretical distinctiveness measure based on the sparseness of face space around an exemplar was sufficient to account for the similar morphs' familiarity inversion, we predict that distinctiveness with respect to the study set is the critical factor influencing kernel size, rather than context-free human distinctiveness judgments. We can test this prediction by having subjects rate the distinctiveness of the stimuli without prior exposure and then determining whether their distinctiveness ratings improve or degrade the model's fit. \n\nA somewhat disappointing (though not particularly surprising) aspect of our results is that the model requires a representation based on human similarity judgments. Ideally, we would prefer to provide an information-processing account using image-based representations like eigenface projections or Gabor filter responses. Interestingly, the efficacy of the image-based representations seems to depend on how similar they are to the MDS representations. The PC projection representation performed the worst, and distances between pairs of PC representations had a correlation of 0.388 with the distances between the corresponding pairs of MDS representations. For the Gabor filter representation, which performed better, the correlation is 0.517. In future work, we plan to investigate how the MDS representation (or a representation like it) might be derived directly from the face images. \n\nBesides providing an information-processing account of the human data, there are several other avenues for future research. These include empirical testing of our distinctiveness predictions, evaluating the applicability of the distinctiveness model in domains other than face processing, and evaluating the ability of other modeling paradigms to account for these data. 
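The distance correlations quoted in the Discussion compare two representations through their pairwise geometry rather than their coordinates, which is what makes a 6-dimensional MDS space comparable with 30-dimensional image-based spaces. Below is a minimal sketch, assuming Pearson correlation over all unordered pairs of the same stimuli; the paper does not state the correlation type or exactly which stimuli were included. The function names are hypothetical, and the demo uses synthetic coordinates. Because Euclidean distances are unchanged by rotations and only rescaled by uniform scaling, the measure ignores both.

```python
import numpy as np


def pairwise_distances(reps):
    '''Euclidean distances between all unordered pairs of rows.'''
    diff = reps[:, None, :] - reps[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))
    upper = np.triu_indices(len(reps), k=1)   # each pair once, no self-pairs
    return dist[upper]


def distance_profile_correlation(reps_a, reps_b):
    '''Pearson correlation between matching pairwise distances in two
    representations of the same stimuli (rows must refer to the same faces).'''
    if len(reps_a) != len(reps_b):
        raise ValueError('both representations must cover the same stimuli')
    da, db = pairwise_distances(reps_a), pairwise_distances(reps_b)
    return float(np.corrcoef(da, db)[0, 1])


# Synthetic stand-ins: a 6-D space and a 30-D space that embeds it plus noise dims.
rng = np.random.default_rng(0)
mds_like = rng.normal(size=(100, 6))
image_like = np.hstack([mds_like, rng.normal(size=(100, 24))])
print(round(distance_profile_correlation(mds_like, image_like), 3))
```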
\n\nAcknowledgements \n\nWe thank Chris Vogt for comments on a previous draft, and other members of Gary's Unbelievable Research Unit (GURU) for earlier comments on this work. This research was supported in part by NIMH grant MH57075 to GWC. \n\nReferences \n\nBishop, C. M. (1995). Neural networks for pattern recognition. Oxford University Press, Oxford. \n\nBuhmann, J., Lades, M., and von der Malsburg, C. (1990). Size and distortion invariant object recognition by hierarchical graph matching. In Proceedings of the IJCNN International Joint Conference on Neural Networks, volume II, pages 411-416. \n\nBusey, T. A. (1999). Where are morphed faces in multi-dimensional face space? Psychological Science. In press. \n\nBusey, T. A. and Tunnicliff, J. (submitted). Accounts of blending, distinctiveness and typicality in face recognition. Journal of Experimental Psychology: Learning, Memory, and Cognition. \n\nDailey, M. N., Cottrell, G. W., and Busey, T. A. (1998). Eigenfaces for familiarity. In Proceedings of the Twentieth Annual Conference of the Cognitive Science Society, pages 273-278, Mahwah, NJ. Erlbaum. \n\nGillund, G. and Shiffrin, R. (1984). A retrieval model for both recognition and recall. Psychological Review, 93(4):411-428. \n\nNosofsky, R. M. (1986). Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology: General, 116(1):39-57. \n\nReinitz, M., Lammers, W., and Cochran, B. (1992). Memory-conjunction errors: Miscombination of stored stimulus features can produce illusions of memory. Memory & Cognition, 20(1):1-11. \n\nSolso, R. L. and McCarthy, J. E. (1981). Prototype formation of faces: A case of pseudomemory. British Journal of Psychology, 72(4):499-503. \n\nTanaka, J., Giles, M., Kremen, S., and Simon, V. (submitted). Mapping attractor fields in face space: The atypicality bias in face recognition. \n\nTurk, M. and Pentland, A. 
(1991). Eigenfaces for recognition. The Journal of Cognitive Neuroscience, 3:71-86. \n\nValentine, T. and Endo, M. (1992). Towards an exemplar model of face processing: The effects of race and distinctiveness. The Quarterly Journal of Experimental Psychology, 44A(4):671-703. \n", "award": [], "sourceid": 1527, "authors": [{"given_name": "Matthew", "family_name": "Dailey", "institution": null}, {"given_name": "Garrison", "family_name": "Cottrell", "institution": null}, {"given_name": "Thomas", "family_name": "Busey", "institution": null}]}