{"title": "Adaptation and Unsupervised Learning", "book": "Advances in Neural Information Processing Systems", "page_first": 237, "page_last": 244, "abstract": null, "full_text": "Adaptation and Unsupervised Learning\n\nPeter Dayan Maneesh Sahani Gr\u00b4egoire Deback\n\nGatsby Computational Neuroscience Unit\n\n17 Queen Square, London, England, WC1N 3AR.\n\n dayan, maneesh\n\n@gatsby.ucl.ac.uk, gdeback@ens-lyon.fr\n\nAbstract\n\nAdaptation is a ubiquitous neural and psychological phenomenon, with\na wealth of instantiations and implications. Although a basic form of\nplasticity, it has, bar some notable exceptions, attracted computational\ntheory of only one main variety. In this paper, we study adaptation from\nthe perspective of factor analysis, a paradigmatic technique of unsuper-\nvised learning. We use factor analysis to re-interpret a standard view of\nadaptation, and apply our new model to some recent data on adaptation\nin the domain of face discrimination.\n\n1 Introduction\n\nAdaptation is one of the \ufb01rst facts with which neophyte neuroscientists and psychologists\nare presented. Essentially all sensory and central systems show adaptation at a wide variety\nof temporal scales, and to a wide variety of aspects of their informational milieu. Adap-\ntation is a product (or possibly by-product) of many neural mechanisms, from short-term\nsynaptic facilitation and depression,1 and spike-rate adaptation,28 through synaptic remod-\neling27 and way beyond. Adaptation has been described as the psychophysicist\u2019s electrode,\nsince it can be used as a sensitive method for revealing underlying processing mechanisms;\nthus it is both phenomenon and tool of the utmost importance.\nThat adaptation is so pervasive makes it most unlikely that a single theoretical framework\nwill be able to provide a compelling treatment. 
Nevertheless, adaptation should be just as much a tool for theorists interested in modeling neural statistical learning as for psychophysicists interested in neural processing. Put abstractly, adaptation involves short- or long-term changes to aspects of the statistics of the environment experienced by a system. Thus, accounts of neural plasticity driven by such statistics, even if originally conceived as accounts of developmental (or perhaps representational) plasticity,19 are automatically candidate models for the course and function of adaptation. Conversely, thoughts about adaptation lay at the heart of the earliest suggestions that redundancy reduction and information maximization should play a central role in models of cortical unsupervised learning.4-6,8,23

Redundancy reduction theories of adaptation reached their apogee in the work of Linsker,26 Atick, Li & colleagues2,3,25 and van Hateren.40 Their mathematical framework (see section 2) is that of maximizing information transmission subject to various sources of noise and limitations on the strength of key signals. Noise plays the critical roles of rendering some signals essentially undetectable, and providing a confusing background against which other signals should be amplified. Adaptation, by affecting noise levels and informational content (notably probabilistic priors), leads to altered stimulus processing. Early work concentrated on the effects of sensory noise on visual receptive fields; more recent studies41 have used the same framework to study stimulus-specific adaptation.

Redundancy reduction is one major conceptual plank in the modern theory of unsupervised learning.
However, there are various other important complementary ideas, notably generative models.19 Here, we consider adaptation from the perspective of factor analysis,15 which is one of the most fundamental forms of generative model. After describing the factor analysis model and its relationship with redundancy reduction models of adaptation in section 3, section 4 studies loci of adaptation in one version of this model. As examples, we consider adaptation of early visual receptive fields to light levels,38 orientation detection to a persistent bias (the tilt aftereffect),9,16 and a recent report of adaptation of face discrimination to morphed anti-faces.24

Figure 1: A) Redundancy reduction model. x is the explicit input, combining signal and noise; y is the explicit output, to be corrupted by noise to give z. We seek the filter W that minimizes redundancy subject to a power constraint. B) Factor analysis model. Now x is generated from the latent factor z. The empirical mean of x is μ; the factor, with a white, Gaussian prior, captures latent structure underlying the covariance Σ; the uniquenesses Ψ_i capture unmodeled variance and additional noise. Generative (G) and recognition (W) weights parameterize statistical inverses.

2 Information Maximization

Figure 1A shows a linear model of, for concreteness, retinal processing. Here, the N-dimensional photoreceptor input x = s + n_in, which is the sum of a signal s and detector noise n_in, is filtered by a retinal matrix W to produce an m-dimensional output y = Wx for communication down the optic nerve, z = y + n_out, against a background of additional noise n_out. We assume that the signal is Gaussian, with mean 0 and covariance Σ, and that the noise terms are white and Gaussian, with mean 0 and covariances σ²_in I and σ²_out I, respectively; all are mutually independent. The input may be higher dimensional than the output, ie m ≤ N, as is true of the retina. Here, the signal is translation invariant, ie Σ is a circulant matrix11 with Σ_jk = f(|j − k|). This means that the eigenvectors of Σ are (discrete) sines and cosines, with eigenvalues coming from the Fourier series for f (they are non-negative since Σ is a covariance matrix; we assume for simplicity that they are strictly positive), whose terms we will write as λ₁ ≥ λ₂ ≥ … ≥ 0.

Given no input noise (σ²_in = 0), the mutual information between x and z is

  I[x; z] = H[z] − H[z|x]   (1)

where H[·] is the entropy function (which, for a Gaussian distribution, is proportional to the log determinant of its covariance matrix). We consider maximizing this with respect to W, a calculation which only makes sense in the face of a constraint, such as on the average power tr(WΣWᵀ). It is a conventional result in principal components analysis12,20 that the solution to this constrained maximization problem involves whitening, ie making

  W = U D V,  with D = diag{c λ₁^(−1/2), …, c λ_m^(−1/2)}   (2)

where U is an arbitrary m-dimensional rotation matrix with UUᵀ = I, D is the m × m diagonal matrix with the given form (the constant c being set by the power constraint), and V is an m × N matrix whose rows are the first m (transposed) eigenvectors of Σ. This effectively amplifies weak input channels (ie those with small λ_k) so as fully to utilize all the output channels.
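This constrained solution can be checked numerically. Below is a minimal sketch (ours, not from the paper; the covariance, noise variance and power budget are arbitrary illustrative choices): it builds the whitening filter for a random signal covariance and verifies that, at matched output power, it transmits at least as much information as an arbitrary competitor filter.

```python
import numpy as np

rng = np.random.default_rng(2)
N = m = 6                          # equal input and output dimensions here
sigma_out2, P = 0.1, 10.0          # output noise variance, power budget

A = rng.standard_normal((N, N))
Sigma = A @ A.T                    # a generic signal covariance

def power(W):
    # average output power tr(W Sigma W')
    return np.trace(W @ Sigma @ W.T)

def info(W):
    # I[x;z] = (1/2) log det(W Sigma W' + s2 I) - (1/2) log det(s2 I)
    C = W @ Sigma @ W.T + sigma_out2 * np.eye(m)
    return 0.5 * (np.linalg.slogdet(C)[1] - m * np.log(sigma_out2))

# Whitening filter: makes W Sigma W' = (P/m) I, spreading the power
# budget equally over the output channels (U taken as the identity).
evals, evecs = np.linalg.eigh(Sigma)
W_white = np.sqrt(P / m) * np.diag(evals ** -0.5) @ evecs.T

# An arbitrary competitor, rescaled to use exactly the same output power.
W_other = rng.standard_normal((m, N))
W_other *= np.sqrt(P / power(W_other))
```

By a standard argument, with the total output signal power fixed, the log determinant in the information is maximized when the output power is spread equally, which is exactly what whitening does.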
This choice makes WΣWᵀ proportional to the identity matrix, whitening the output.

Figure 2: Simple adaptation. A;B) Filter power as a function of spatial frequency for the redundancy reduction (A: RR) and factor analysis (B: FA) solutions for the case of translation invariance, for low (solid) and high (dashed) input noise. C) Data9 (crosses) and RR solution41 (solid) for the tilt aftereffect. D) Data (crosses) and linear approximate FA solution (solid). For FA, angle estimation is based on the linear output of the single factor; linearity breaks down far from the adapted angle. Even though the optimal FA solution does not have exactly identical uniquenesses, the difference is too small to show in the figure. In (B), factors were found for a range of input noise levels. Adaptation was based on reducing the uniquenesses Ψ_i for units activated by the adapting stimulus (fitting the width and strength of this adaptation to the data).

In the face of input noise, whitening is dangerous for those channels for which σ²_in ≳ λ_k, since noise rather than signal would be amplified by the λ_k^(−1/2) gain. One heuristic is to pre-filter x using an N-dimensional matrix F such that Fx is the prediction of s that minimizes the average error ⟨|Fx − s|²⟩, and then apply the W of equation 2.14 Another conventional result12 is that F has a similar form to W, except that the rotation is the identity and the diagonal entries of the equivalent of D are λ_k/(λ_k + σ²_in). This makes the full (approximate) filter

  W = U D V,  with D = diag{λ₁^(1/2)/(λ₁ + σ²_in), …, λ_m^(1/2)/(λ_m + σ²_in)}   (3)

Figure 2A shows the most interesting aspect of this filter in the case that λ_ω ∝ 1/ω², inspired by the statistics of natural scenes,36 for which ω might be either a temporal or spatial frequency. The solid curve shows the diagonal components of D for small input noise. This filter is a band-pass filter. Intermediate frequencies with input power well above the noise level σ²_in are comparatively amplified against the output noise. On the other hand, the dashed line shows the same components for high input noise. This filter is a low-pass filter, as only those few components with sufficient input power are significantly transmitted. The filter in equation 3 is based on a heuristic argument.
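The band-pass to low-pass transition of figure 2A is easy to reproduce. A small sketch (ours; the 1/ω² spectrum follows the text, while the specific noise variances are illustrative):

```python
import numpy as np

def filter_gain(lam, sigma_in2):
    # Wiener prefilter lam/(lam + sigma_in2) suppresses channels whose
    # signal power is comparable to the input noise; the whitening stage
    # lam ** -0.5 then equalizes the power of the surviving channels.
    return (lam / (lam + sigma_in2)) / np.sqrt(lam)

freq = np.arange(1, 101)           # spatial (or temporal) frequencies
lam = 1.0 / freq ** 2              # 1/f^2 power spectrum of natural scenes

gain_low = filter_gain(lam, 1e-3)  # low input noise (bright illumination)
gain_high = filter_gain(lam, 1.0)  # high input noise (dim illumination)
```

The combined gain is λ^(1/2)/(λ + σ²_in), which peaks where the signal power crosses the input noise level: with low noise the peak sits at an intermediate frequency (band-pass), while with high noise it sits at the lowest frequency (low-pass).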
An exact argument2,3 leads to a slightly more complicated form for the optimal filter, in which, depending on the power constraint and the exact value of σ²_in, there is a sharp cut-off in which some frequencies are not transmitted at all. However, the main pattern of dependence on σ²_in is the same as in figure 2A; the differences lie well outside the realm of experimental test.

Figure 2A shows a powerful form of adaptation.3 High relative input noise arises in cases of low illumination; low noise in cases of high illumination. The whole filtering characteristics of the retina should change, from low-pass (smoothing in time or space) to band-pass (differentiation in space or time) filtering. There is evidence that this indeed happens, with dendritic remodeling happening over times of the order of minutes.42

Wainwright41 (see also10) suggested an account along exactly these lines for more stimulus-specific forms of adaptation such as the tilt aftereffect shown in figure 2C. Here (conceptually), subjects are presented with a vertical grating (θ = 90°) for an adapting period of a few seconds, and then are asked, by one of a number of means, to assess the orientation of test gratings. The crosses in figure 2C show the error in their estimates; the adapting orientation appears to repel nearby angles, so that true values of θ near 90° are reported as being further away. Wainwright modeled this in the light of a neural population code for representing orientation and a filter related to that of equation 3. He suggested that during adaptation, the signal associated with θ = 90° is temporarily increased. Thus, as in the solid line of figure 2A, the transmission through the adapted filter of this signal should be temporarily reduced. If the recipient structures that use the equivalent of y to calculate the orientation of a test grating are unaware of this adaptation, then, as in the solid line of figure 2C, an estimation error like that shown by the subjects will result.

3 Factor Analysis and Adaptation

We sought to understand the adaptation of equation 3 and figure 2A in a factor analysis model. Factor analysis15 is one of the simplest probabilistic generative schemes used to model the unsupervised learning of cortical representations, and underlies many more sophisticated approaches. The case of uniform input noise σ²_in is particularly interesting, because it is central to the relationship between factor analysis and principal components analysis.20,34,39

Figure 1B shows the elements of a factor analysis model (see Dayan & Abbott12 for a relevant tutorial introduction). The (so-called) visible variable x is generated from the latent variable z according to the two-step process

  P[z] = N[0, I],    P[x|z] = N[μ + Gz, Ψ],   Ψ = diag(Ψ₁, …, Ψ_N)   (4)

where N[m, C] is a multi-variate Gaussian distribution with mean m and covariance matrix C, G is a set of top-down generative weights, μ is the mean of x, and Ψ is a diagonal matrix of uniquenesses, which are the variances of the residuals of x that are not represented in the covariances associated with z.
Marginalizing out z, equation 4 specifies a Gaussian distribution for x,

  P[x] = N[μ, GGᵀ + Ψ]   (5)

and, indeed, the maximum likelihood values for the parameters given some input data x(1), …, x(T) are to set μ to the empirical mean of the x, and to set G and Ψ by maximizing the likelihood of the empirical covariance Σ̂ of the x under a Wishart distribution with mean GGᵀ + Ψ. G is only determined up to an m × m rotation U, since (GU)(GU)ᵀ = GGᵀ. The generative or synthetic model of equation 4 shows how z determines x; the recognition model, which maps a presented input x to the latent variable, is

  P[z|x] = N[W(x − μ), Θ],  with W = Gᵀ(GGᵀ + Ψ)⁻¹   (6)

The case of uniform uniquenesses, Ψ = ψI, is the one relevant to the comparison with redundancy reduction. Then the maximum likelihood factor analysis solution has the property that

  G = Vᵀ diag{(λ̂₁ − ψ)^(1/2), …, (λ̂_m − ψ)^(1/2)} U   (7)
  ψ = (1/(N − m)) Σ_{k>m} λ̂_k   (8)

with the same conventions as in equation 2, except that the λ̂_k are the (ordered) eigenvalues of the covariance matrix Σ̂ of the visible variables x, rather than explicitly of the signal. Here ψ has the natural interpretation of being the average power of the unexplained components. If x really comes from a signal and noise model as in figure 1, then λ̂_k = λ_k + σ²_in, so ψ = ψ₀ + σ²_in, where ψ₀ is the residual uniqueness of equation 8 in the case that σ²_in = 0. This makes the recognition weights of equation 6

  W = U D V,  with D = diag{(λ̂₁ − ψ)^(1/2)/λ̂₁, …, (λ̂_m − ψ)^(1/2)/λ̂_m}   (9)

The similarity between this and the approximate redundancy reduction expression of equation 3 is evident. Just like that filter, adaptation to high and low light levels (high and low signal/noise ratios) leads to a transition from bandpass to lowpass filtering in W.
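The closed-form uniform-uniqueness solution (the probabilistic principal components result of Tipping & Bishop) can be verified numerically. A sketch (ours, not the paper's code; a random matrix stands in for the empirical covariance):

```python
import numpy as np

rng = np.random.default_rng(0)
N, m = 8, 2                        # visible dimensions, number of factors

B = rng.standard_normal((N, N))
Sigma_hat = B @ B.T                # stand-in for an empirical covariance

lam, V = np.linalg.eigh(Sigma_hat)
lam, V = lam[::-1], V[:, ::-1]     # eigenvalues in descending order

# Uniform-uniqueness ML solution: psi is the mean discarded eigenvalue,
# and G spans the top-m principal subspace (rotation taken as identity).
psi = lam[m:].mean()
G = V[:, :m] @ np.diag(np.sqrt(lam[:m] - psi))

model_cov = G @ G.T + psi * np.eye(N)

# Recognition weights: E[z|x] = W (x - mu).
W = G.T @ np.linalg.inv(model_cov)
```

The model covariance GGᵀ + ψI then reproduces the top-m eigenvalues of the input covariance exactly, assigns the residual power ψ to every discarded direction, and the recognition weights reduce to the diagonal form (λ̂_k − ψ)^(1/2)/λ̂_k in the eigenbasis.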
The filter of equation 3 was heuristic; this is exact. Also, there is no power constraint imposed; rather, something similar derives from the generative model's prior over the latent variables z.

This analysis is particularly well suited to the standard treatment of redundancy reduction in the case of figure 2A, since adding independent noise of the same strength σ²_in to each of the input variables can automatically be captured by adding σ²_in to each of the uniquenesses. However, even though the signal s is translation invariant in this case, it need not be that the maximum likelihood W is proportional to the principal components of Σ. However, it is to a close approximation, and figure 2B shows that the strength of adaptation in the maximum likelihood solution, as a function of σ²_in (evaluated as in the figure caption), shows the same structure of adaptation as in the probabilistic principal components solution.

Figure 2D shows a version of the tilt illusion coming from a factor analysis model given population coded input (with Gaussian orientation tuning curves) and a single factor. It is impossible to perform the full non-linear computation of extracting an angle from the population activity x in a single linear operation W(x − μ); around the adapting orientation θ = 90°, a linear regime comprises only a limited range of angular variation in the input.
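As the caption of figure 2 describes, adaptation in the factor analysis model reduces the uniquenesses of units activated by the adapting stimulus. A sketch (ours; the population size, tuning width and shrinkage factor are illustrative assumptions) of how this concentrates the one-factor recognition weights on the adapted units, which is what biases a read-out that is unaware of the adaptation:

```python
import numpy as np

angles = np.linspace(0.0, 180.0, 64, endpoint=False)

def pop(theta, width=15.0):
    # Gaussian orientation tuning curves; orientation wraps at 180 degrees.
    d = np.abs(angles - theta)
    d = np.minimum(d, 180.0 - d)
    return np.exp(-0.5 * (d / width) ** 2)

g = pop(90.0)[:, None]             # single factor tuned to the adapted angle

def recognition(psi):
    # One-factor recognition weights W = g' (g g' + diag(psi))^{-1}.
    return np.linalg.solve(g @ g.T + np.diag(psi), g).ravel()

psi_pre = np.ones(len(angles))
psi_post = psi_pre.copy()
adapted = pop(90.0) > 0.5          # units reliably driven by the adapter
psi_post[adapted] *= 0.2           # adaptation: shrink their uniquenesses

W_pre, W_post = recognition(psi_pre), recognition(psi_post)
```

For a single factor the weights reduce to W_i ∝ g_i/Ψ_i, so shrinking the uniquenesses of the adapted units increases their relative contribution to the factor.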
However, in a regime in which a linear approximation holds, the one factor can represent the systematic covariation in the activity of the population coming from the single dimension of angular variation in the input. A close match in this model to Wainwright's41 suggestion is that the uniquenesses Ψ_i for the input units (around θ = 90°) that are reliably activated by an adapting stimulus should be decreased, as if the single factor would predict a greater proportion of the variability in the activation of those units. This makes W more sensitive to small variations in x away from θ = 90°, and so leads to a tilt aftereffect as an estimation bias. Figure 2D shows the magnitude of this effect in the linear regime. This is a rough match for the data in figure 2C. Our model also shows the same effect as Wainwright's41 in orientation discrimination, boosting sensitivity near the adapted θ and reducing it around half a tuning width away.33

4 Adaptation for Faces

Another, and even simpler, route to adaptation is changing the mean μ towards the mean of the recently presented (ie the adapting) stimuli. We use this to model a recently reported effect of adaptation on face discrimination.24
Note that changing the mean μ according to the input has no effect on the factor.

Figure 3: Face discrimination. Here, Adam and Henry are used for concreteness; all results are averages over all faces, and, for FA, random draws. A) Experimental24 mean propensity to report Adam as a function of the strength of Adam in the input for no adaptation ('o'); adaptation to anti-Adam ('x'); and adaptation to anti-Henry (squares). The curves are cumulative normal fits. B) Mean propensity in the factor analysis model for the same outcomes. The model, like some subjects, is more extreme than the mean of the subjects, particularly for test anti-faces. C;D) Experimental and model proportion of reports of Adam when adaptation was to anti-Adam, but various strengths of Henry are presented. The model captures the decrease in Adam given presentation of anti-Henry through a normalization pool (solid), although it does not decrease to quite the same extent as the data.
Just reporting the face with the largest factor value (dashed) shows no decrease in reporting Adam given presentation of anti-Henry (for the dashed line in D, parameters were chosen to match the peak of the solid curve). Adaptation shifted μ part of the way to the target face.

Leopold and his colleagues24 studied adaptation in the complex stimulus domain of faces. Their experiment involved four target faces (associated with names 'Adam', 'Henry', 'Jim', 'John') which were previously unfamiliar to subjects, together with morphed versions of these faces lying on 'lines' going through the target faces and the average of all four faces. These interpolations were made visually sensible using a dense correspondence map between the faces. The task for the subjects was always to identify which of the four faces was presented; this is obviously impossible at the average face, but becomes progressively easier as the average face is morphed progressively further (by an amount called its strength) towards one of the target faces. The circles in figure 3A show the mean performance of the subjects in choosing the correct face as a function of its strength; performance is essentially perfect at high strengths.

A negative strength version of one of the target faces (eg anti-Adam) was then shown to the subjects for a number of seconds before one of the positive strength faces was shown as a test. The other two lines in figure 3A show that the effect of adaptation is to boost the effective strength of the given face (Adam), since (crosses) the subjects were much readier to report Adam, even for the average face (which contains no identity information), and much less ready to report the other faces even if they were actually the test stimulus (shown by the squares).
As for the tilt aftereffect, discrimination is biased away from the adapted stimulus. Figure 3C shows that adapting to anti-Adam offers the greatest boost to the event that Adam is reported to a test face (say Henry) that is not Adam, at the average face. Reporting Adam falls off if either increasing strengths of Henry or anti-Henry are presented. That presenting Henry should decrease the reporting of Adam is obvious, and is commented on in the paper. However, that presenting anti-Henry should decrease the reporting of Adam is less obvious, since, by removing Henry as a competitor, one might have expected Adam to have received an additional boost.

Figure 3B;D shows our factor analysis model of these results. Here, we consider a case with N visible units and four factors, one for each face, with generative weights, one per face, governing the input activity associated with full strength versions of each face, generated from independent Gaussian distributions. In this representation, morphing is easy, consisting of presenting the mean plus a scaled version of a face's generative weight pattern. The probability that a given face is reported then depends on the strength of the test stimulus, the strength of the adapting stimulus, and the angle between the corresponding generative weight vectors.
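A minimal sketch of this mean-shift account (ours; the dimensions, strengths and linear posterior read-out are illustrative assumptions, not the paper's fitted model): after shifting μ toward anti-Adam, the posterior factor means read even the average face as Adam.

```python
import numpy as np

rng = np.random.default_rng(1)
N, F = 100, 4                      # visible units; one factor per target face

G = rng.standard_normal((N, F))    # columns: full-strength face patterns
psi = 1.0
mu = np.zeros(N)                   # the average face

def factor_means(x, mu):
    # Posterior factor means E[z|x] = G' (G G' + psi I)^{-1} (x - mu).
    C = G @ G.T + psi * np.eye(N)
    return G.T @ np.linalg.solve(C, x - mu)

ADAM = 0
anti_adam = mu - 0.5 * G[:, ADAM]  # a negative-strength morph of Adam

# Adaptation: move the model's mean part of the way toward the adapter.
mu_adapted = mu + 0.8 * (anti_adam - mu)

avg_face = np.zeros(N)             # the average face: no identity information
z_before = factor_means(avg_face, mu)
z_after = factor_means(avg_face, mu_adapted)
```

Relative to the shifted mean, the average face now contains a positive-strength component of Adam's pattern, so its posterior is dominated by the Adam factor, mirroring the boost in Adam reports after adaptation to anti-Adam.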