{"title": "Beauty-in-averageness and its contextual modulations: A Bayesian statistical account", "book": "Advances in Neural Information Processing Systems", "page_first": 4082, "page_last": 4092, "abstract": "Understanding how humans perceive the likability of high-dimensional ``objects'' such as faces is an important problem in both cognitive science and AI/ML. Existing models generally assume these preferences to be fixed. However, psychologists have found human assessment of facial attractiveness to be context-dependent. Specifically, the classical Beauty-in-Averageness (BiA) effect, whereby a blended face is judged to be more attractive than the originals, is significantly diminished or reversed when the original faces are recognizable, or when the blend is mixed-race/mixed-gender and the attractiveness judgment is preceded by a race/gender categorization, respectively. This \"Ugliness-in-Averageness\" (UiA) effect has previously been explained via a qualitative disfluency account, which posits that the negative affect associated with the difficult race or gender categorization is inadvertently interpreted by the brain as a dislike for the face itself. In contrast, we hypothesize that human preference for an object is increased when it incurs lower encoding cost, in particular when its perceived {\\it statistical typicality} is high, in consonance with Barlow's seminal ``efficient coding hypothesis.'' This statistical coding cost account explains both BiA, where facial blends generally have higher likelihood than ``parent faces'', and UiA, when the preceding context or task restricts face representation to a task-relevant subset of features, thus redefining statistical typicality and encoding cost within that subspace. We use simulations to show that our model provides a parsimonious, statistically grounded, and quantitative account of both BiA and UiA. We validate our model using experimental data from a gender categorization task. 
We also propose a novel experiment, based on model predictions, that will be able to arbitrate between the disfluency account and our statistical coding cost account of attractiveness.", "full_text": "Beauty-in-averageness and its contextual modulations: A Bayesian statistical account\n\nChaitanya K. Ryali\nDepartment of Computer Science and Engineering\nUniversity of California San Diego\n9500 Gilman Drive, La Jolla, CA 92093\nrckrishn@eng.ucsd.edu\n\nAngela J. Yu\nDepartment of Cognitive Science\nUniversity of California San Diego\n9500 Gilman Drive, La Jolla, CA 92093\najyu@ucsd.edu\n\nAbstract\n\nUnderstanding how humans perceive the likability of high-dimensional \u201cobjects\u201d such as faces is an important problem in both cognitive science and AI/ML. Existing models generally assume these preferences to be fixed. However, psychologists have found human assessment of facial attractiveness to be context-dependent. Specifically, the classical Beauty-in-Averageness (BiA) effect, whereby a blended face is judged to be more attractive than the originals, is significantly diminished or reversed when the original faces are recognizable, or when the blend is mixed-race/mixed-gender and the attractiveness judgment is preceded by a race/gender categorization, respectively. This \"Ugliness-in-Averageness\" (UiA) effect has previously been explained via a qualitative disfluency account, which posits that the negative affect associated with the difficult race or gender categorization is inadvertently interpreted by the brain as a dislike for the face itself. 
In contrast, we hypothesize that human preference for an object is increased when it incurs lower encoding cost, in particular when its perceived statistical typicality is high, in consonance with Barlow\u2019s seminal \u201cefficient coding hypothesis.\u201d This statistical coding cost account explains both BiA, where facial blends generally have higher likelihood than \u201cparent faces\u201d, and UiA, when the preceding context or task restricts face representation to a task-relevant subset of features, thus redefining statistical typicality and encoding cost within that subspace. We use simulations to show that our model provides a parsimonious, statistically grounded, and quantitative account of both BiA and UiA. We validate our model using experimental data from a gender categorization task. We also propose a novel experiment, based on model predictions, that will be able to arbitrate between the disfluency account and our statistical coding cost account of attractiveness.\n\n1 Introduction\n\nHumans readily express liking and disliking for complex, high-dimensional \u201cobjects\u201d, be they faces, movies, houses, technology, books, or life partners, even if they cannot verbalize exactly why. Understanding how these preferences arise is important both for cognitive science and for AI systems that interact with humans. In particular, face processing presents a prime case study for complex information processing in humans. Humans, including very young babies [1], efficiently perform sophisticated computational tasks based on a brief glimpse of a face, such as recognizing individuals, identifying emotional states, and assessing social traits such as attractiveness [2]. The last phenomenon has an obvious impact on real-life decisions such as dating, employment, education, law enforcement, and criminal justice [3]. 
Existing models of human preferences, in both machine learning and cognitive science, have generally assumed social processing of faces (e.g. attractiveness judgment) to be a fixed function of the underlying face features [4, 5, 6, 7, 8, 9]. However, a series of recent psychology experiments have indicated that facial preferences in the brain are not fixed, but rather systematically dependent on what other face-processing task the observer is also performing.\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00e9al, Canada.\n\nFigure 1: BiA, UiA in celebrity morphs. (a) The middle face, a 50% blend of the faces on the left and right, is generally judged by human subjects to be more attractive than either \u201cparent\u201d face (from [12]). (b) Simulated typicality increases with increasing number of faces used in the blend. (c) Simulated typicality of facial blends increases as the \u201cparent\u201d faces are more evenly represented in the blend. (d) Example image (from [11]) depicting a morph of two recognizable faces (here, Bush and Obama). (e) Blends of recognizable individuals are rated by human subjects as less attractive than individual recognizable faces, while blends of stranger faces are rated as more attractive (adapted from [11]). (f) Simulated typicality has a similar pattern as the data in (e). A constant offset of 6 was added to produce positive values. Simulation parameters: d = 60, drace = 1, s = 2, \u03c30 = 1, \u03c3r = 0.5, \u00b5 = 1, |K| = 50, \u03c3sal = 0.2, and \u2211_{k=1}^{|K|} pk = 0.05; all simulations are in a 2-d subspace, corresponding to a random subspace or a distinctive feature subspace.\n\n
Specifically, these experiments show that a classical phenomenon known as the beauty-in-averageness (BiA) effect (see Figure 1a), whereby blends of multiple faces are usually found to be more attractive than the originals [10], can be suppressed or even reversed (termed Ugliness-in-Averageness or UiA), when the facial blends are created from recognizable faces [11] (see Figure 1d;e), or when attractiveness judgment of a mixed-race/mixed-gender blend is preceded by a racial/gender categorization task [12, 13] (see Figure 2a;b), respectively.\n\nThe facial BiA effect has long been seen as an example of human preference for highly prototypical stimuli over more unusual stimuli [14]. Early accounts explained this phenomenon as reflecting a biological predisposition to interpret prototypicality as a cue to mate value or reproductive health [15, 16]. However, this mate-value account cannot explain human preference for prototypicality in a variety of natural and artificial object categories such as dogs, birds, fish, automobiles, watches, and even synthetic dot patterns [17, 18, 14]. Moreover, it cannot explain why the attractiveness of facial blends should depend sensitively on the behavioral context, e.g. when required to do racial or gender discrimination [12, 13]. To explain this diverse array of phenomena, a more parsimonious account based on processing fluency has been proposed: prototypes are processed more \u201cfluently\u201d and humans prefer more fluently processed stimuli [14, 12].\n\nWhile conceptually appealing, this fluency account does not explain what \u201cfluency\u201d really means computationally, nor in what sense it may be advantageous for the organism. We address these issues by hypothesizing that human preference for an object is increased when it incurs lower coding cost, in particular when its perceived statistical typicality is high. 
This hypothesis is consonant with ideas from information theory [19] and Barlow\u2019s \u201cefficient coding hypothesis,\u201d which stipulates that neural encoding should be organized so as to minimize the energy expenditure needed to represent the sensory environment, in particular using fewer spikes (expending less energy) to encode more probable stimuli [20, 21, 22, 23]. Efficient coding is necessary given that the brain accounts for 20% of the adult human body\u2019s energy expenditure, but only 2% of its weight [24]. In the context of faces and other complex objects, we suggest that the brain represents each new stimulus by first finding the closest previously learned representative (prototype) [25], and then using neural activations to encode the discrepancy between the features of the inferred prototype and the current stimulus, perhaps via predictive coding [26], whereby top-down inputs instantiate prototypical expectations and bottom-up activations encode additional prediction errors. Without delving into a detailed neural implementation of such a computational process [27], we broadly quantify this \u201ccoding cost\u201d in terms of category-conditional log likelihood (Section 2): the smaller the likelihood of the stimulus conditioned on the inferred category, the more encoding cost it incurs.\n\nFigure 2: UiA of biracial blends. (a) Example stimuli used in [12], with the middle face being a 50% blend of the Asian and Caucasian faces on either side. (b) Mean attractiveness ratings for single-race (left) and mixed-race blends (right), when race categorization preceded the attractiveness rating (from [12]). (c) Simulated typicality exhibits a similar pattern as the human data. A constant offset of 2 was added to produce positive values. Simulation parameters same as in Figure 1.\n\n
To summarize, we recast \u201cprocessing fluency\u201d into this statistically grounded notion of \u201ccategory-conditional likelihood\u201d, and propose that a statistically likely stimulus is preferable for humans due to lower neural coding cost, or less energy required to encode the prediction error.\n\nBesides coding efficiency, we make a second major assumption, which is that the brain uses attentional mechanisms to focus or project its face representation onto a subset (subspace) of task-relevant features, and that statistical typicality, and thus coding cost, are redefined in this projected representation. This is consistent with a large body of work showing that perceptual tasks are often performed in a low-dimensional, task-relevant subspace [28], focusing on informative dimensions in high-dimensional data, using attentional modulation that can be either top-down goal-directed [29] or bottom-up saliency-based [30]. It is also consistent with a body of work showing that people often use category membership to predict features and reason about members of the category [31, 32].\n\nWe will give a short intuitive explanation of how our theory explains both BiA and UiA. In BiA, assuming the population distribution of faces is unimodal (something we will verify on real face image data in Section 3), facial blends will generally have higher likelihood than \u201cparent faces\u201d, and therefore higher attractiveness ratings. On the other hand, when the behavioral context or task induces the brain to restrict face representation to a task-relevant subset of features, such as the subspace that maximally separates male and female faces in a gender discrimination task, then statistical typicality is redefined within that subspace, which is particularly unfavorable toward stimuli situated between categories in the task-relevant subspace even if they are close to the center of the original, undifferentiated representation. 
This scenario explains why bi-racial or bi-gender blends are generally perceived as more attractive than their \u201cparent faces\u201d, but that effect is diminished or even reversed when there is an explicit race- or gender-discrimination task [12, 13], respectively. Similarly, our model predicts that blends of familiar faces should be viewed as less attractive [11]: we hypothesize that familiar faces are represented by their statistically distinctive features (features that differentiate them from generic faces), and the recognition of familiar faces leads the attentional system to focus on the subset of distinctive features, within which the facial blends are statistically relatively less likely.\n\nThe rest of the paper is organized as follows. In Section 2, we formally define coding cost, statistical typicality, and attentional focusing in face processing. In Section 3.1, we use an abstract statistical model to illustrate our theory, and show how BiA and context-induced UiA can arise under our statistical assumptions. In Section 3.2, we use a real face image dataset, and an explicit face representation commonly used in machine vision, to fit a parametric distribution of faces. We then verify the statistical assumptions of our abstract model using this face dataset, and make detailed predictions of attractiveness (as a function of the percentage blend of bi-racial and bi-gender morphs) based on our measure of statistical typicality, the category-conditional log likelihood. Then, using actual face stimuli from a gender-discrimination task [12] and projecting them into our face representation, we show that our predicted facial attractiveness of these stimuli correlates significantly with subjects\u2019 actual attractiveness ratings. 
Next, we propose an experiment to disambiguate the disfluency and statistical typicality accounts of UiA (Section 4). Finally, we conclude in Section 5 with a discussion of the limitations of our model and future directions of research.\n\n2 A Formal Representation of Faces and Attractiveness\n\nWe assume humans have an internal d-dimensional representation of faces X [33, 5, 34], in which each face is represented by a vector x = (x1, . . . , xd) of d real-valued features. We also assume that this face space is endowed with a probability distribution pX(x) representing the perceived distribution of faces in the environment [35], which in general can be a complex mixture distribution with different components corresponding to different subgroups of the population (e.g. different races, genders, other subtypes). In general, we assume facial attractiveness is proportional to the log likelihood of the face, log pX(x). In the absence of any categorization task, explicit (e.g. race, gender) or implicit (e.g. individual recognition), we assume more likely faces are preferred. This explains the BiA effect, as long as pX(x) is approximately a single-peaked distribution (e.g. Gaussian), since the average of two points drawn from such a distribution will probably have a higher likelihood than the original two points. We will show in Section 3 that the empirical distribution of a large sample of real face images is indeed approximately Gaussian.\n\nIt may seem puzzling that pX(x) is both approximately Gaussian and a mixture distribution. The reason is that the different components differ from each other only on a small subset of features. For example, male and female faces may have indistinguishable distributions along most featural dimensions, but be quite distinct in features that are especially gender discriminating. 
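This claim about averages can be checked numerically. The following sketch is ours, not part of the paper's simulations; it uses a standard normal as a stand-in for pX(x) and an arbitrary dimensionality, and shows that the midpoint of two sampled \u201cparent\u201d faces is, on average, more typical than the parents themselves:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_trials = 5, 4000  # arbitrary dimensionality and sample size

def log_typicality(x):
    # log density under a standard-normal stand-in for p_X
    return -0.5 * np.sum(x * x, axis=-1) - 0.5 * d * np.log(2 * np.pi)

parents = rng.standard_normal((n_trials, 2, d))  # two parent faces per trial
blends = parents.mean(axis=1)                    # 50% blend = midpoint

parent_typ = log_typicality(parents.reshape(-1, d)).mean()
blend_typ = log_typicality(blends).mean()
print(blend_typ > parent_typ)  # blends are more typical on average (BiA)
```

The gap grows with dimensionality, since the blend's expected squared distance from the mode is half that of a single sample.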
We will see that real face data indeed exhibit this property in Section 3.\n\nWhen the observer performs a categorization task, such as race discrimination, we assume the attractiveness of a face stimulus is proportional to the category-conditional log-likelihood log pX(x|c), where c is the estimated category, among those of potential interest, based on the general distribution pX(x). For example, pX(x) can be viewed as approximately a mixture of two Gaussians (male and female) in the gender discrimination task (see Section 3). We propose that the brain first uses Bayesian estimation to estimate the gender of a face (e.g. c = male), and then the attractiveness of the face would be proportional to log pX(x|c = male), which is inversely related to the coding cost necessary to represent x on top of knowing its category.\n\nAn additional wrinkle is that we assume, in the context of a particular task, the brain projects the face space and its distribution into a task-relevant subspace \u02dcX \u2286 X. Denoting the projection of a face x into a subspace \u02dcX as \u02dcx, we redefine statistical typicality in the subspace \u02dcX as log p\u02dcX(\u02dcx|c), the log-likelihood of \u02dcx constrained to the subspace \u02dcX, conditioned on the Bayesian estimated category c. In a race- or gender-categorization task, we assume the brain projects the face space into a race- or gender-informative subspace \u02dcXcat, respectively. Bi-racial/bi-gender blends are statistically atypical of both categories in this projected space, where the different race or gender categories are clearly distinct, thus resulting in low attractiveness ratings.\n\nWe also use this projection idea to model UiA in blends of familiar or recognizable faces. We assume that the brain by default performs recognition, a categorization task, whereby recognizable faces have their own modes, plus there is a general distribution for all unfamiliar faces. It has been
It has been\nsuggested that people encode familiar faces using features that are most distinctive/salient, as this is\n\n4\n\n\fnot only computationally ef\ufb01cient but may also boost recognition [36]. Accordingly, we assume that\na familiar face is represented by its s statistically most distinctive (atypical) features: we assume xf\nis represented by its veridical value, if it is among the top s z-scored dimensions, and 0 otherwise. We\nassume that a blend of two recognizable faces x induces an implicit categorization in the subspace\n\u02dcXsal spanned by a subset of the distinctive features of the parent faces (an alternative is to project\nto the subspace spanned by the blend\u2019s own distinctive features, an approach that yields similar\nresults in our simulations, which are not shown here). Similar to explicit categorization tasks, c is\nthe a posteriori (i.e., after classi\ufb01cation) most probable identity, and statistical typicality in this case\n(\u02dcxsal|c). The attractiveness rating is low for the blend x in this subspace, because it is\nis log p \u02dcXsal\nsuf\ufb01ciently unlike either of the parent faces (low conditional likelihood), but also low likelihood\nrelative to the general distribution given that this is by de\ufb01nition the subspace of facial features that\nare distinctive (statistically unlikely). Relatedly, people might not compute the statistical typicality of\na face with respect all the underlying features in the absence of an implicit or explicit categorization\ntask, and may do some only for a random subset of features.\n\n3 Demonstrations\n\nWe will \ufb01rst present a simple abstract model in section 3.1 that captures both BiA and as well as UiA\nin various contexts. The simplicity of this model is deliberate, in that it is meant to be both expository,\nas well as demonstrating the generality of our proposal, since BiA and UiA are not speci\ufb01c to faces\nbut emerge for other natural and synthetic objects [37, 14, 38]. 
In section 3.2, we use a data-based face space representation for further validation.\n\n3.1 Abstract Model\n\n3.1.1 Generative Model\n\nWe assume that humans internally represent each face x = (x1, . . . , xd) \u2208 Rd as generated from a mixture of Gaussians, whereby the components can either correspond to well-known faces {fk} (assume |K| of these) or demographic subgroups {hr} (assume |G| of these, e.g. gender, race),\n\nX \u223c \u2211_{k=1}^{|K|} pk fk(x) + (1 \u2212 \u2211_{k=1}^{|K|} pk) g(x),    (1)\n\ng(x) = \u2211_{r=1}^{|G|} qr hr(x),    (2)\n\nwhere hr(x) = N(x; \u00b5r, \u03a3r), fk(x) = N(x; \u00b5k, \u03a3k), and \u2211_{k=1}^{|K|} pk << 1, as the number of known faces should be much fewer than unknown faces. We assume that the distributions of the mixture components hr differ only in a small number of dimensions, 1, . . . , drace, and are identical on the other dother := d \u2212 drace dimensions. Specifically, we assume \u00b5r,drace+1:d = 0 \u2208 Rdother and\n\n\u03a3r = [\u03c3r^2 1drace\u00d7drace, 0; 0, \u03c30^2 1dother\u00d7dother],    (3)\n\nwhere 1n\u00d7n is an identity matrix of dimensions n \u00d7 n. For simplicity, we assume |G| = 2 and set \u00b51 = \u2212\u00b52 = \u00b5, where \u00b51:drace = [\u00b5, . . . , \u00b5] \u2208 Rdrace. We also set the prior/mixture probability distribution q to be uniform.\n\nApproximation. Note that since the statistics of hr differ only in a small number of dimensions drace << d, the mixture g(x) = \u2211_{r=1}^{|G|} qr hr(x) is well approximated by \u02dcg(x) = N(x; \u00b50, \u03a30), where \u00b50 = 0 \u2208 Rd and \u03a30 = \u03c30^2 1d\u00d7d; this approximation can be assumed to be used for inference except when demographic features bear relevance, thus simplifying computations and representation.\n\nSalient feature representation. The mixture components {fk} represent known/recognizable faces, where the variance in each component corresponds to natural variability in a face, such as variations in pose or expressions. For each face k, we assume subjects encode/represent only s distinctive features (relative to the assumed generative distribution) as described in the previous section, denoted by ik1, . . . , iks (the variance along these dimensions is denoted as \u03c3sal^2), and assume the same statistics along the other dimensions as \u02dcg(x), the approximate, assumed generative distribution for a generic, unfamiliar face.\n\nRecognition. For simplicity, we assume the brain applies Bayes\u2019 rule to compute the posterior for each face x, and then picks the most probable category in each case via maximum a posteriori estimation.\n\n3.1.2 Simulation Results\n\nBiA. We first examine whether our abstract model can capture some nuances of the BiA effect. Our simulation shows that as the number of constituent faces that go into the blend increases, the typicality of the blend increases, so that the blend is expected to be perceived as increasingly more attractive (see Figure 1b). This is consistent with the finding that the attractiveness of faces increases (decreases) when they are distorted towards (away from) the population mean [39, 4]. Additionally, it has been found that more evenly blended face images are perceived as more attractive [12], something that is also captured by the typicality measure in a simulation of the abstract model (Figure 1c).\n\nUiA: Familiar Faces. In [11], participants from the Netherlands and New Zealand rated blends of local celebrities (people famous in one country but not the other). Blends of unknown celebrities were rated as more attractive than the \u201cparent\u201d face images (classic BiA), while blends of local celebrities were rated as less attractive relative to the constituent images: a reversal of BiA. 
An example image (from [11]) depicting a morph of two recognizable faces can be seen in Figure 1d, while Figure 1e shows BiA and its reversal in data from the study. As discussed in section 2, low statistical typicality of the blend in the distinctive feature subspace (here 1-d) results in UiA. Simulations qualitatively capture this effect in Figure 1f.\n\nUiA: Race Categorization. In [12], participants rated mixed- and single-race blends on attractiveness after performing a race categorization task (Asian or Caucasian). An example of the stimuli used in [12] is shown in Figure 2a. The data in Figure 2b show that mixed-race morphs are rated as less attractive relative to single-race morphs when a race categorization task preceded the attractiveness judgment. As previously hypothesized in section 2, low statistical typicality of a mixed-race blend in the subspace of race-informative features induces UiA. For simplicity, we assume this subspace is the one determined by Linear Discriminant Analysis (LDA). Simulations qualitatively capture the behavior of attractiveness judgments in the data (Figure 2c).\n\n3.2 Data-Based Face Representation\n\nWe model faces using the Active Appearance Model (AAM), a well-established machine vision technique that reconstructs images well, generates realistic synthetic faces, produces a latent representation of only a few dozen features [40, 41, 42], and whose features seem to be encoded by face-processing neurons in the monkey brain [43]. AAM assigns each face image a vector of shape features, which are just the (x, y) coordinates of some consistently defined landmarks across all faces \u2013 in our case, we use the free software Face++1, which labels 83 landmarks (e.g. contour points of the mouth, nose, eyes). 
AAM also assigns each face image a vector of texture features, which are the grayscale pixel values of a warped version of the image after aligning the landmark locations to the average landmark locations across the data set (see Figure 3a for a schematic illustration of AAM). Consistent with standard practice for reducing the dimensionality of AAM [40, 41, 42, 43], we perform principal component analysis (PCA) on each of the shape and texture features. In addition, to remove the correlation among shape and texture features, we then perform another PCA to get joint shape-texture features that are statistically uncorrelated with one another. We use the top 60 principal components (those with the highest eigenvalues), so that d = 60 for our face space. We train our version of AAM using a publicly available dataset of 597 face images [44], with neutral facial expressions, taken in the laboratory.\n\n1 https://www.faceplusplus.com\n\nFigure 3: AAM-based face representation and model simulations. (a) AAM consists of shape and texture features; a joint PCA is then conducted over both types of features to remove correlations. (b) The empirical distribution projected in a random direction (1-d subspace) is normally distributed. (c) The empirical distribution of white and Asian faces projected into a race-informative subspace (1-d subspace obtained by LDA) is approximately a mixture of two normal distributions. x: mean location of face images (60 total) for each % of blend, i.e. 1: 100% white and 0% Asian, . . ., 11: 0% white and 100% Asian. (d) Analogous to (c), projecting faces into a gender-informative subspace. x: mean location of face images (10 total) for each % of blend (male image: female image). (e) Model-simulated typicality for actual face images (indexed by racial blend proportion) exhibits BiA as a function of racial blend when there is no categorization task. (f) Analogous to (e), but indexed by gender blend proportion. (g) Model-simulated typicality for actual face images exhibits UiA when statistical typicality is measured in the race-informative subspace. (h) Analogous to (g), but measured in the gender-informative subspace.\n\nFirst, we validate several assumptions made in the abstract model. We find the distribution of faces learned from data is indeed approximately normal along a random face space (AAM) axis (Figure 3b), but a mixture of two Gaussians in race-informative (Figure 3c) or gender-informative subspaces (Figure 3d), found using LDA. We then use this face data-informed AAM representation to make nuanced predictions about facial attractiveness as a function of % blend between faces of different races or different genders [12, 13]. We first randomly drew 60 Asian and white face images (with replacement) from the face data set [44], and then blended them at 10% increments, from 100% of the white faces to 100% of the Asian faces, thus producing 11 morphs of each pair (see Figure 3c). Average predicted typicality, defined as category-conditional log likelihood, has an inverted-U shape (BiA) relative to % racial blend in the original space (Figure 3e) or a random subspace (not shown), but a U shape (UiA) in the race-informative subspace (Figure 3g). Analogously, when 60 pairs of male and female faces are randomly drawn and then blended in different proportions, the model predicts typicality, and thus attractiveness, to have similar BiA (Figure 3f) and UiA effects (Figure 3h).\n\n3.3 Experimental Validation: Attractiveness of Individual Faces\n\nUsing the actual face stimuli from the gender discrimination study [13], as well as subjects\u2019 actual attractiveness ratings, we can assess the ability of our model to predict the attractiveness of individual face images. 
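The full-space-versus-subspace contrast just described can be caricatured under the abstract model's assumptions; all parameters below (means of \u00b11 along a single race-informative dimension, \u03c3r = 0.5, \u03c30 = 1, d = 60) are illustrative stand-ins, with the discriminant direction reducing to that single dimension:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 60, 4000
mu, sig_r = 1.0, 0.5  # class means at +-mu on dim 0, within-class sd sig_r

def sample(race_sign, n):
    x = rng.standard_normal((n, d))                            # generic dims
    x[:, 0] = race_sign * mu + sig_r * rng.standard_normal(n)  # race dim
    return x

# single-race blends (two same-race parents) vs mixed-race blends
single = 0.5 * (sample(+1, n) + sample(+1, n))
mixed = 0.5 * (sample(+1, n) + sample(-1, n))

def full_space_typ(x):
    # log-likelihood under the approximate generic distribution g~ = N(0, I),
    # up to additive constants shared by both conditions
    return -0.5 * np.sum(x * x, axis=1).mean()

def subspace_typ(x):
    # category-conditional log-likelihood in the 1-d race-informative
    # subspace, conditioned on the MAP race (nearest class mean at +-mu)
    b = x[:, 0]
    nearest = np.where(b > 0, mu, -mu)  # MAP category under equal priors
    return (-0.5 * ((b - nearest) / sig_r) ** 2).mean()

print(full_space_typ(mixed) > full_space_typ(single))  # BiA, no categorization
print(subspace_typ(mixed) < subspace_typ(single))      # UiA in race subspace
```

The same samples thus yield an inverted U or a U depending only on the space in which typicality is evaluated, mirroring Figures 3e and 3g.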
In this study, subjects rated the attractiveness of blends from 10 unique pairs of male and female \u201cparent\u201d faces, in different proportions (10% increments, see Figure 4a), under either the control condition (no gender discrimination) or the experimental condition (following gender discrimination). We projected the stimuli into our AAM space (Figure 4b), and computed statistical typicality in the original/full space as well as in the task-informative subspace (Figure 4c;d). Even though this study used only 10 pairs of face images, we see that the model-predicted BiA/UiA effects are very similar to those based on a much larger random sampling from our face dataset (Figure 3f;h), and the predicted UiA pattern corresponds well to the reported attractiveness of the actual faces (Figure 4e). Moreover, we find that the difference in attractiveness ratings between the experimental and control conditions correlates significantly with model-predicted typicality of individual face images (r = 0.30, p = 0.0017). We would not expect the correlation to be close to 1, because there are clearly other determinants of facial attractiveness besides typicality, such as perceptual and conceptual priming, contrast, clarity, and symmetry [14]. 
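The model-data comparison above is an ordinary Pearson correlation between per-image predicted typicality and the per-image rating difference (experimental minus control). A minimal sketch with hypothetical stand-in arrays (the real analysis uses the quantities described above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n_images = 110  # hypothetical count, e.g. 10 pairs x 11 blend levels

# Hypothetical stand-ins: model-predicted typicality per face image, and a
# noisy, weakly related rating difference (experimental minus control).
typicality = rng.standard_normal(n_images)
rating_diff = 0.3 * typicality + rng.standard_normal(n_images)

r, p = stats.pearsonr(typicality, rating_diff)
print(-1.0 <= r <= 1.0)
```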
Indeed, female faces are generally found to be more attractive than male faces [14], which is why we plot the difference in attractiveness rating between experimental and control conditions – this removes any baseline effects of gender.

Figure 4: UiA of bi-gender blends induced by gender discrimination. (a) Example stimuli used in [13]: blends of varying proportions of male and female "parent" faces. (b) Analogous to Figure 3c;d, but using the actual experimental stimuli projected into our trained AAM. (c), (d) Analogous to Figure 3e;f;g;h, but using the actual experimental stimuli projected into our trained AAM. (e) Difference in attractiveness ratings between experimental and control conditions versus blend %.

4 Disentangling Disfluency and Typicality Accounts

In the simulations and experiments considered so far, our statistical typicality account and the disfluency account make qualitatively similar predictions, because the difficulty of categorization and statistical typicality are monotonically related: the faces that are hardest to categorize are also the least likely given either category. To disambiguate these accounts, we need an experiment that dissociates categorization difficulty from statistical atypicality. We therefore suggest the following experiment, which involves discriminating age, an attribute that is unimodally distributed (see the empirical distribution of age [45] in Figure 5a). The proposed experimental condition is to rate the attractiveness of faces after discriminating age: is the person older or younger than 37 years old? According to the statistical typicality account, attractiveness ratings would look like Figure 5b, having a shape similar to the population distribution (Figure 5a).
In contrast, the disfluency account would qualitatively predict the difficulty of categorization to be greatest, and thus processing fluency and attractiveness to be lowest, near the categorization boundary. Figure 5c illustrates this by simply joining two lines that decrease toward the categorization boundary. To summarize, our model predicts BiA in this experiment, while the disfluency account predicts UiA.

Figure 5: (a) Empirical distribution of ages in dataset [45]. (b) Predictions of our typicality-based model. (c) Predictions based on a disfluency account.

5 Discussion

Most existing models of human preferences assume these preferences to be fixed and do not model contextual dependence. In this paper, we propose a statistically grounded model of human "liking", whereby the attractiveness of a stimulus depends on its neural coding cost, in particular how likely it is relative to its perceived category. This argument is based on information-theoretic considerations, in particular the coding cost associated with statistically unlikely stimuli, and is related to Barlow's "efficient coding hypothesis." Additionally, we assume that humans naturally project high-dimensional data, such as faces, in a task-relevant manner to a low-dimensional subspace representation, either via top-down goal-directed specification (informativeness with respect to a particular discrimination problem, such as race or gender) or via bottom-up saliency (distinctiveness with respect to the assumed generative distribution).
Under our framework, therefore, the attractiveness of a stimulus is context-dependent for two different reasons: (1) the set of hypotheses under consideration in the Bayesian posterior computation is context-dependent (e.g. the two gender categories compose the hypothesis space in a gender discrimination task), and (2) the statistical distribution corresponding to the generative model changes according to the featural subspace that supports the current task (e.g. during a gender discrimination task, the face space and its distribution are defined only with respect to the subspace that is most informative for gender discrimination).
While race and gender correspond to existing multi-modality in the distribution of faces, our theory suggests that UiA can be produced with arbitrary cross-mode blends, if human participants can learn novel bimodal distributions of faces in an experimental setting – an experimentally testable prediction. Relatedly, it is worth making a distinction between statistical typicality, as we define it, and prototypicality, as usually conceived in the psychology literature [14]. Prototypicality implies that there are clear modes in the stimulus distribution, and how prototypical a stimulus is presumably depends on how "close" it is to the closest mode. Typicality, as we define it, however, does not always depend monotonically on distance to the closest mode, and is well defined even for distributions that have no distinct modes (such as a uniform distribution).
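The dissociation between typicality and distance-to-mode can be made concrete with a toy one-dimensional example (all parameters are illustrative assumptions): in a bimodal density with one narrow and one broad mode, a point sitting closer to its nearest mode can nevertheless be less typical (lower density) than a point sitting farther from its nearest mode.

```python
import math

def normal_pdf(x, mu, sigma):
    # Density of N(mu, sigma^2) at x
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def density(x):
    # Toy bimodal distribution: a narrow mode at 0, a broad mode at 4
    return 0.5 * normal_pdf(x, 0.0, 0.4) + 0.5 * normal_pdf(x, 4.0, 2.0)

a, b = -1.0, 5.5
dist_a = min(abs(a - 0.0), abs(a - 4.0))   # 1.0: a is closer to its nearest mode
dist_b = min(abs(b - 0.0), abs(b - 4.0))   # 1.5: b is farther from its nearest mode

# Yet a is *less* typical than b: typicality is not a monotone
# function of distance to the closest mode.
assert dist_a < dist_b and density(a) < density(b)
```

This is exactly the sense in which statistical typicality generalizes prototypicality: it tracks probability under the generative distribution, not geometric proximity to a prototype.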
Separately, it is important to reiterate that we do not claim that statistical typicality or coding cost is the only determinant of attractiveness or "liking." Many other factors have been shown experimentally to be important for human preferences for complex objects, such as perceptual and conceptual priming, contrast, clarity, and symmetry [14].
In addition to providing a statistically grounded explanation of the contextual dependence of human attractiveness judgments, our work also provides some general insight into how high-dimensional data can be analyzed and stored efficiently: the system needs to be able to dynamically shift its subspace projection according to task demands, so as to reduce representational and computational complexity at any given moment. Moreover, our work suggests two different ways to identify the appropriate subspace projection (and thus the appropriate form of complexity reduction). One is supervised, task-specified choice of hypothesis space, and thus of the corresponding subspace projection that best discriminates the hypotheses – we argue this is what underlies the context-induced decrease in the attractiveness of bi-racial/bi-gender faces during race- and gender-discrimination tasks, respectively. The other route is unsupervised, saliency-induced subspace projection, in which the statistically unlikely (relative to the population distribution) features of a high-dimensional stimulus are privileged in their processing and encoding, and subsequent computations are performed within this subspace – this is what underlies our explanation of the UiA effect in celebrity-blend faces. The general idea of "tagging" high-dimensional data by their distinctive features seems like a good way to store and analyze complex data.
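The unsupervised, saliency-driven route can be sketched in a few lines. This is a minimal toy implementation under assumed Gaussian population statistics, not the paper's actual face pipeline: "distinctive" features of a stimulus are those with the largest z-scores relative to the population, and typicality is then evaluated only within that salient subspace.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 100, 5000
pop = rng.standard_normal((n, d))     # toy population of feature vectors
mu, sd = pop.mean(0), pop.std(0)      # per-feature population statistics

def salient_subspace(x, k):
    """Indices of the k features where x is most atypical (largest |z|)."""
    z = (x - mu) / sd
    return np.argsort(-np.abs(z))[:k]

def typicality_in(x, dims):
    # Log likelihood under the (independent-Gaussian) population model,
    # restricted to the chosen feature subspace
    z = (x[dims] - mu[dims]) / sd[dims]
    return -0.5 * np.sum(z ** 2 + np.log(2 * np.pi * sd[dims] ** 2))

# A stimulus that is unremarkable overall but extreme on a few features
# (e.g. a distinctive celebrity face); indices 3, 17, 42 are arbitrary.
x = rng.standard_normal(d)
x[[3, 17, 42]] += 4.0

dims = salient_subspace(x, k=3)       # the stimulus's "tag"
score = typicality_in(x, dims)        # typicality within the salient subspace
```

By construction, the saliency-selected subspace is the one in which the stimulus is least typical, so computations restricted to it yield a much lower typicality than the same computation over the stimulus's most ordinary features.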
Our work sheds light on one possible role played by attention: it is one way to dynamically construct subspaces that emphasize the feature dimensions most relevant or salient for performing the task at hand. There is a broad and confusing literature on attention in both psychology and neuroscience. A productive direction of future research would be to relate our hypothesized role of attention to that larger literature.
Though human attractiveness perception is interesting in itself, we are more fundamentally interested in a computational understanding of how the brain encodes and processes complex, high-dimensional data (e.g. faces), and how attention can dynamically alter the featural representation in a task-dependent manner. Faces are appealing because they are informationally rich and ecologically important, and because we have a computationally tractable and neurally relevant [3] parametric representation (AAM) for them. We therefore used faces to implement and test concrete ideas about information representation and its contextual modulation in this work. The attractiveness literature provides a convenient empirical test of our theory, but we expect the dynamic representational framework we hypothesize here to also affect other cognitive processes, such as working memory, learning, decision-making, and problem-solving, in the sense that all these cognitive processes depend on what features are currently made salient by attentional mechanisms. For example, learning to memorize a set of items should be easier if one's attention is focused on the features that make these items easiest to organize conceptually. Indeed, while the ecological benefits of "liking" based on encoding cost are rather generic and long-term, the benefits of cognitive expediency or accuracy derived from focusing on task-relevant features are acute and immediate.
The latter points to a promising line of future research.

6 Acknowledgments

We thank Piotr Winkielman and Jamin Halberstadt for sharing the gender categorization data and for helpful discussions, Samer Sabri for helpful input on the writing, and the anonymous reviewers for helpful comments. This project was partially funded by a UCSD Academic Senate research award to AJY.

References

[1] Langlois, J. H., Roggman, L. A. & Rieser-Danner, L. A. Infants' differential social responses to attractive and unattractive faces. Developmental Psychology 26, 153 (1990).

[2] Gauthier, I., Tarr, M. & Bub, D. Perceptual Expertise: Bridging Brain and Behavior (OUP USA, 2010).

[3] Todorov, A., Olivola, C. Y., Dotsch, R. & Mende-Siedlecki, P. Social attributions from faces: Determinants, consequences, accuracy, and functional significance. Annual Review of Psychology 66 (2015).

[4] Kagian, A., Dror, G., Leyvand, T., Cohen-Or, D. & Ruppin, E. A humanlike predictor of facial attractiveness. In Advances in Neural Information Processing Systems, 649–656 (2007).

[5] Said, C. P. & Todorov, A. A statistical model of facial attractiveness. Psychological Science 22, 1183–1190 (2011).

[6] Todorov, A., Dotsch, R., Wigboldus, D. H. & Said, C. P. Data-driven methods for modeling social perception. Social and Personality Psychology Compass 5, 775–791 (2011).

[7] Todorov, A., Said, C. P., Engell, A. D. & Oosterhof, N. N. Understanding evaluation of faces on social dimensions. Trends in Cognitive Sciences 12, 455–460 (2008).

[8] Song, A., Linjie, L., Atalla, C. & Cottrell, G. Learning to see faces like humans: Modeling the social dimensions of faces. Journal of Vision 17, 837 (2017).

[9] Guan, J., Ryali, C. & Yu, A. J. Computational modeling of social face perception in humans: Leveraging the active appearance model. bioRxiv (2018).

[10] Langlois, J. H. & Roggman, L. A. Attractive faces are only average.
Psychological Science 1, 115–121 (1990).

[11] Halberstadt, J., Pecher, D., Zeelenberg, R., Ip Wai, L. & Winkielman, P. Two Faces of Attractiveness: Making Beauty in Averageness Appear and Reverse. Psychological Science 24, 2343–2346 (2013).

[12] Halberstadt, J. & Winkielman, P. Easy on the eyes, or hard to categorize: Classification difficulty decreases the appeal of facial blends. Journal of Experimental Social Psychology 50, 175–183 (2014).

[13] Owen, H. E., Halberstadt, J., Carr, E. W. & Winkielman, P. Johnny Depp, Reconsidered: How Category-Relative Processing Fluency Determines the Appeal of Gender Ambiguity. PLOS ONE 11, e0146328 (2016).

[14] Winkielman, P., Halberstadt, J., Fazendeiro, T. & Catty, S. Prototypes are attractive because they are easy on the mind. Psychological Science 17, 799–806 (2006).

[15] Symons, D. The Evolution of Human Sexuality (Oxford University Press, 1979).

[16] Thornhill, R. & Gangestad, S. W. Human facial beauty: Averageness, symmetry, and parasite resistance. Human Nature (Hawthorne, N.Y.) 4, 237–269 (1993).

[17] Halberstadt, J. & Rhodes, G. The Attractiveness of Nonface Averages: Implications for an Evolutionary Explanation of the Attractiveness of Average Faces. Psychological Science 11, 285–289 (2000).

[18] Halberstadt, J. & Rhodes, G. It's not just average faces that are attractive: Computer-manipulated averageness makes birds, fish, and automobiles attractive. Psychonomic Bulletin & Review 10, 149–156 (2003).

[19] Shannon, C. E. A mathematical theory of communication. Bell System Technical Journal 27, 379–423 (1948).

[20] Barron, A., Rissanen, J. & Yu, B. The minimum description length principle in coding and modeling. IEEE Transactions on Information Theory 44, 2743–2760 (1998).

[21] Cover, T. M. & Thomas, J. A. Elements of Information Theory, 2nd edition.
Wiley-Interscience (2006).

[22] Barlow, H. B. Possible principles underlying the transformations of sensory messages (1961).

[23] Levy, W. B. & Baxter, R. A. Energy efficient neural codes. Neural Computation 8, 531–543 (1996).

[24] Raichle, M. E. & Gusnard, D. A. Appraising the brain's energy budget. Proceedings of the National Academy of Sciences 99, 10237–10239 (2002).

[25] Murphy, G. The Big Book of Concepts (MIT Press, 2004).

[26] Rao, R. P. & Ballard, D. H. Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience 2, 79 (1999).

[27] Marr, D. Vision: A computational investigation into the human representation and processing of visual information (MIT Press, Cambridge, Massachusetts, 1982).

[28] Edelman, S. & Intrator, N. Learning as Extraction of Low-Dimensional Representations. In Mechanisms of Perceptual Learning, 353–380 (Academic Press, 1996).

[29] Navalpakkam, V. & Itti, L. Modeling the influence of task on attention. Vision Research 45, 205–231 (2005).

[30] Itti, L. & Koch, C. Feature combination strategies for saliency-based visual attention systems. Journal of Electronic Imaging 10, 161–170 (2001).

[31] Malt, B. C., Ross, B. H. & Murphy, G. L. Predicting features for members of natural categories when categorization is uncertain. Journal of Experimental Psychology: Learning, Memory, and Cognition 21, 646 (1995).

[32] Murphy, G. L. & Ross, B. H. Uncertainty in category-based induction: When do people integrate across categories? Journal of Experimental Psychology: Learning, Memory, and Cognition 36, 263–276 (2010).

[33] Valentine, T. A unified account of the effects of distinctiveness, inversion, and race in face recognition. The Quarterly Journal of Experimental Psychology Section A 43, 161–204 (1991).

[34] Oosterhof, N. N. & Todorov, A.
The functional basis of face evaluation. Proceedings of the National Academy of Sciences 105, 11087–11092 (2008).

[35] Dotsch, R., Hassin, R. R. & Todorov, A. Statistical learning shapes face evaluation. Nature Human Behaviour 1, 0001 (2016).

[36] Mauro, R. & Kubovy, M. Caricature and face recognition. Memory & Cognition 20, 433–440 (1992).

[37] Halberstadt, J. The generality and ultimate origins of the attractiveness of prototypes. Personality and Social Psychology Review 10, 166–183 (2006).

[38] Vogel, T., Carr, E. W., Davis, T. & Winkielman, P. Category structure determines the relative attractiveness of global versus local averages. Journal of Experimental Psychology: Learning, Memory, and Cognition 44, 250 (2018).

[39] Rhodes, G. & Tremewan, T. Averageness, exaggeration, and facial attractiveness. Psychological Science 7, 105–110 (1996).

[40] Edwards, G. J., Cootes, T. F. & Taylor, C. J. Face recognition using active appearance models. In European Conference on Computer Vision, 581–595 (Springer, 1998).

[41] Cootes, T. F., Edwards, G. J. & Taylor, C. J. Active appearance models. IEEE Transactions on Pattern Analysis & Machine Intelligence 23, 681–685 (2001).

[42] Tzimiropoulos, G. & Pantic, M. Optimization problems for fast AAM fitting in-the-wild. In Proceedings of the IEEE International Conference on Computer Vision, 593–600 (2013).

[43] Chang, L. & Tsao, D. Y. The Code for Facial Identity in the Primate Brain. Cell 169, 1013–1028.e14 (2017).

[44] Ma, D. S., Correll, J. & Wittenbrink, B. The Chicago face database: A free stimulus set of faces and norming data. Behavior Research Methods 47, 1122–1135 (2015).

[45] Bainbridge, W. A., Isola, P. & Oliva, A. The intrinsic memorability of face photographs.
Journal of Experimental Psychology: General 142, 1323 (2013).