{"title": "Correlations strike back (again): the case of associative memory retrieval", "book": "Advances in Neural Information Processing Systems", "page_first": 288, "page_last": 296, "abstract": "It has long been recognised that statistical dependencies in neuronal activity need to be taken into account when decoding stimuli encoded in a neural population. Less studied, though equally pernicious, is the need to take account of dependencies between synaptic weights when decoding patterns previously encoded in an auto-associative memory. We show that activity-dependent learning generically produces such correlations, and failing to take them into account in the dynamics of memory retrieval leads to catastrophically poor recall. We derive optimal network dynamics for recall in the face of synaptic correlations caused by a range of synaptic plasticity rules. These dynamics involve well-studied circuit motifs, such as forms of feedback inhibition and experimentally observed dendritic nonlinearities. We therefore show how addressing the problem of synaptic correlations leads to a novel functional account of key biophysical features of the neural substrate.", "full_text": "Correlations strike back (again): the case of\n\nassociative memory retrieval\n\nCristina Savin1\n\ncs664@cam.ac.uk\n\nPeter Dayan2\n\ndayan@gatsby.ucl.ac.uk\n\nM\u00b4at\u00b4e Lengyel1\n\nm.lengyel@eng.cam.ac.uk\n\n1Computational & Biological Learning Lab, Dept. Engineering, University of Cambridge, UK\n\n2Gatsby Computational Neuroscience Unit, University College London, UK\n\nAbstract\n\nIt has long been recognised that statistical dependencies in neuronal activity need\nto be taken into account when decoding stimuli encoded in a neural population.\nLess studied, though equally pernicious, is the need to take account of dependen-\ncies between synaptic weights when decoding patterns previously encoded in an\nauto-associative memory. 
We show that activity-dependent learning generically produces such correlations, and failing to take them into account in the dynamics of memory retrieval leads to catastrophically poor recall. We derive optimal network dynamics for recall in the face of synaptic correlations caused by a range of synaptic plasticity rules. These dynamics involve well-studied circuit motifs, such as forms of feedback inhibition and experimentally observed dendritic nonlinearities. We therefore show how addressing the problem of synaptic correlations leads to a novel functional account of key biophysical features of the neural substrate.

1 Introduction

Auto-associative memories have a venerable history in computational neuroscience. However, it is only rather recently that the statistical revolution in the wider field has provided theoretical traction for this problem [1]. The idea is to see memory storage as a form of lossy compression – information on the item being stored is mapped into a set of synaptic changes – with the neural dynamics during retrieval representing a biological analog of a corresponding decompression algorithm. This implies there should be a tight, and indeed testable, link between the learning rule used for encoding and the neural dynamics used for retrieval [2].
One issue that has been either ignored or trivialized in these treatments of recall is correlations among the synapses [1–4] – beyond the perfect (anti-)correlations emerging between reciprocal synapses with precisely (anti-)symmetric learning rules [5]. There is ample experimental data for the existence of such correlations: for example, in rat visual cortex, synaptic connections tend to cluster together in the form of overrepresented patterns, or motifs, with reciprocal connections being much more common than expected by chance, and the strengths of the connections to and from each neuron being correlated [6]. 
The study of neural coding has indicated that it is essential to treat correlations in neural activity appropriately in order to extract stimulus information well [7–9]. Similarly, it becomes pressing to examine the nature of correlations among synaptic weights in auto-associative memories, the consequences for retrieval of ignoring them, and methods by which they might be accommodated.
Here, we consider several well-known learning rules, from simple additive ones to bounded synapses with metaplasticity, and show that, with a few significant exceptions, they induce correlations between synapses that share a pre- or a post-synaptic partner. To assess the importance of these dependencies for recall, we adopt the strategy of comparing the performance of decoders which either do or do not take them into account [10], showing that they do indeed have an important effect on efficient retrieval. Finally, we show that approximately optimal retrieval involves particular forms of nonlinear interactions between different neuronal inputs, as observed experimentally [11].

2 General problem formulation

We consider a network of N binary neurons that enjoy all-to-all connectivity.1 As is conventional, and indeed plausibly underpinned by neuromodulatory interactions [12], we assume that network dynamics do not play a role during storage (with stimuli being imposed as patterns of activity on the neurons), and that learning does not occur during retrieval.
To isolate the effects of different plasticity rules on synaptic correlations from other sources of correlations, we assume that the patterns of activity inducing the synaptic changes have no particular structure, i.e. their distribution factorizes. For further simplicity, we take these activity patterns to be binary with pattern density f, i.e. 
a prior over patterns defined as:

Pstore(x) = ∏_i Pstore(xi),    Pstore(xi) = f^xi · (1 − f)^(1−xi)    (1)

During recall, the network is presented with a cue, x̃, which is a noisy or partial version of one of the originally stored patterns. Network dynamics should complete this partial pattern, using the information in the weights W (and the cue). We start by considering arbitrary dynamics; later we impose the critical constraint for biological realisability that they be strictly local, i.e. the activity of neuron i should depend exclusively on inputs through incoming synapses Wi,·.
Since information storage by synaptic plasticity is lossy, recall is inherently a probabilistic inference problem [1, 13] (Fig. 1a), requiring estimation of the posterior over patterns, given the information in the weights and the recall cue:

P(x|W, x̃) ∝ Pstore(x) · Pnoise(x̃|x) · P(W|x)    (2)

This formulation has formed the foundation of recent work on constructing efficient autoassociative recall dynamics for a range of different learning rules [2–4]. In this paper, we focus on the last term P(W|x), which expresses the probability of obtaining W as the synaptic weight matrix when x is stored along with T − 1 random patterns (sampled from the prior, Eq. 1). Critically, this is where we diverge from previous analyses that assumed this distribution was factorised, or only trivially correlated due to reciprocal synapses being precisely (anti-)symmetric [1, 2, 4]. In contrast, we explicitly study the emergence and effects of non-trivial correlations in the synaptic weight matrix distribution, because almost all synaptic plasticity rules induce statistical dependencies between the synaptic weights of each neuron (Fig. 1a, d).
The inference problem expressed by Eq. 
2 can be translated into neural dynamics in several ways – dynamics could be deterministic, attractor-like, converging to the most likely pattern (a MAP estimate) of the distribution of x [2], or to a mean-field approximate solution [3]; alternatively, the dynamics could be stochastic, with the activity over time representing samples from the posterior, and hence implicitly capturing the uncertainty associated with the answer [4]. We consider the latter. Since we estimate performance by average errors, the optimal response is the mean of the posterior, which can be estimated by integrating the activity of the network during retrieval.
We start by analysing the class of additive learning rules, to get a sense for the effect of correlations on retrieval. Later, we focus on multi-state synapses, for which learning rules are described by transition probabilities between the states [14]. These have been used to capture a variety of important biological constraints such as bounds on synaptic strengths and metaplasticity, i.e. the fact that synaptic changes induced by a certain activity pattern depend on the history of activity at the synapse [15]. The two classes of learning rule are radically different; so if synaptic correlations matter during retrieval in both cases, then the conclusion likely applies in general.

1Complete connectivity simplifies the computation of the parameters for the optimal dynamics for cascade-like learning rules considered in the following, but is not necessary for the theory.

Figure 1: Memory recall as inference and additive learning rules. a. Top: Synaptic weights, W, arise by storing the target pattern x together with T − 1 other patterns, {x(t)}t=1...T−1. During recall, the cue, x̃, is a noisy version of the target pattern. The task of recall is to infer x given W and x̃ (by marginalising out {x(t)}). 
Bottom: The activity of neuron i across the stored patterns is a source of shared variability between synapses connecting it to neurons j and k. b-c. Covariance rule: patterns of synaptic correlations and recall performance for retrieval dynamics ignoring or considering synaptic correlations; T = 5. d-e. Same for the simple Hebbian learning rule. The control is an optimal decoder that ignores W.

3 Additive learning rules

Local additive learning rules assume that synaptic changes induced by different activity patterns combine additively, such that storing a sequence of T patterns from Pstore(x) results in weights Wij = Σ_t Ω(xi^(t), xj^(t)), with function Ω(xi, xj) describing the change in synaptic strength induced by presynaptic activity xj and postsynaptic activity xi. We consider a generalized Hebbian form for this function, with Ω(xi, xj) = (xi − α)(xj − β). This class includes, for example, the covariance rule (α = β = f), classically used in Hopfield networks, or simple Hebbian learning (α = β = 0).
As synaptic changes are deterministic, the only source of uncertainty in the distribution P(W|x) is the identity of the other stored patterns. To estimate this, let us first consider the distribution of the weights after storing one random pattern from Pstore(x). The mean µ and covariance C of the weight change induced by this event can be computed as:2

µ = ∫ Pstore(x) Ω|(x) dx,    C = ∫ Pstore(x) (Ω|(x) · Ω|(x)^T) dx − µ · µ^T    (3)

Since the rule is additive and the patterns are independent, the mean and covariance scale linearly with the number of intervening patterns. 
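As an illustration, the single-pattern mean and covariance of Eq. 3 can be evaluated by direct enumeration for a minimal motif of two synapses, Wij and Wik, that share their postsynaptic neuron i. The sketch below is our own (function names and the value f = 0.2 are assumptions, not from the paper); it confirms the two cases contrasted in the text: under the covariance rule shared-partner synapses are uncorrelated, while under simple Hebbian learning their covariance is f³(1 − f) > 0.

```python
import itertools
import numpy as np

def pattern_dist(f, n):
    """All binary patterns over n neurons with factorised prior P(x_i = 1) = f."""
    pats = np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)
    probs = (f ** pats * (1 - f) ** (1 - pats)).prod(axis=1)
    return pats, probs

def weight_change_stats(alpha, beta, f):
    """Mean and covariance of the changes to W_ij and W_ik (two synapses sharing
    postsynaptic neuron i) after one pattern, for Omega(xi, xj) = (xi - a)(xj - b)."""
    pats, probs = pattern_dist(f, 3)            # x = (x_i, x_j, x_k)
    omega = np.stack([(pats[:, 0] - alpha) * (pats[:, 1] - beta),
                      (pats[:, 0] - alpha) * (pats[:, 2] - beta)], axis=1)
    mu = probs @ omega                           # E[Omega]
    C = (omega.T * probs) @ omega - np.outer(mu, mu)   # E[Omega Omega^T] - mu mu^T
    return mu, C

f = 0.2
_, C_cov = weight_change_stats(f, f, f)          # covariance rule (alpha = beta = f)
_, C_hebb = weight_change_stats(0.0, 0.0, f)     # simple Hebbian (alpha = beta = 0)
print(C_cov[0, 1])    # ~0: shared-partner synapses uncorrelated
print(C_hebb[0, 1])   # f^3 (1 - f) > 0: correlated
```

Because the rule is additive, the statistics after T − 1 intervening patterns follow by scaling: (T − 1)·µ and (T − 1)·C.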
Hence, the distribution over possible weight values at recall, given that pattern x is stored along with T − 1 other, random, patterns has mean µW = Ω(x) + (T − 1) · µ, and covariance CW = (T − 1) · C. Most importantly, because the rule is additive, in the limit of many stored patterns (and in practice even for modest values of T), the distribution P(W|x) approaches a multivariate Gaussian that is characterized completely by these two quantities; moreover, its covariance is independent of x.
For retrieval dynamics based on Gibbs sampling, the key quantity is the log-odds ratio

Ii = log( P(xi = 1|x¬i, W, x̃) / P(xi = 0|x¬i, W, x̃) )    (4)

for neuron i, which could be represented by the total current entering the unit. This would translate into a probability of firing given by the sigmoid activation function f(Ii) = 1/(1 + e^−Ii).
The total current entering a neuron is a sum of two terms: one term from the external input of the form c1 · x̃i + c2 (with constants c1 and c2 determined by parameters f and r [16]), and one term from the recurrent input, of the form:

Ii^rec = 1/(2(T−1)) · ( (W| − µW^(0))^T C^−1 (W| − µW^(0)) − (W| − µW^(1))^T C^−1 (W| − µW^(1)) )    (5)

2For notational convenience, we use a column-vector form of the matrix of weight changes Ω, and the weight matrix W, marked by subscript |.

where µW^(0/1) = Ω|(x^(0/1)) + (T−1)µ and x^(0/1) is the vector of activities obtained from x in which the activity of neuron i is 
set to 0, or 1, respectively.
It is easy to see that for the covariance rule, Ω(xi, xj) = (xi − f)(xj − f), synapses sharing a single pre- or post-synaptic partner happen to be uncorrelated (Fig. 1b). Moreover, as for any (anti-)symmetric additive learning rule, reciprocal connections are perfectly correlated (Wij = Wji). The (non-degenerate part of the) covariance matrix in this case becomes diagonal, and the total current in optimal retrieval reduces to simple linear dynamics:

Ii = 1/((T−1)σW²) · ( Σ_j Wij xj − (1−2f)²/2 · Σ_j xj − f · Σ_j Wij − f²(1−2f)/2 )    (6)
       [recurrent input]   [feedback inhibition]   [homeostatic term]   [constant]

where σW² is the variance of a synaptic weight resulting from storing a single pattern. 
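The linear retrieval dynamics for the covariance rule can be sketched in a few lines. This is our own minimal illustration, not the authors' implementation: the external-input term (c1·x̃i + c2) is omitted, and the network size, pattern count, and noise level are assumed values.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, f = 100, 5, 0.5      # network size, stored patterns, density (assumed values)

# storage with the covariance rule: W_ij = sum_t (x_i^t - f)(x_j^t - f)
X = (rng.random((T, N)) < f).astype(float)
W = (X - f).T @ (X - f)
np.fill_diagonal(W, 0.0)

def recurrent_current(W, x, T, f):
    """Recurrent drive of Eq. 6: excitation, feedback inhibition,
    homeostatic term and constant offset (external-input term omitted)."""
    sigma2_W = (f * (1 - f)) ** 2               # per-pattern weight variance
    I = (W @ x                                  # recurrent excitatory input
         - 0.5 * (1 - 2 * f) ** 2 * x.sum()     # feedback inhibition
         - f * W.sum(axis=1)                    # homeostatic term
         - 0.5 * f ** 2 * (1 - 2 * f))          # constant
    return I / ((T - 1) * sigma2_W)

# Gibbs-style updates from a corrupted cue for the first stored pattern
x = X[0].copy()
flip = rng.random(N) < 0.1                      # corrupt 10% of the cue bits
x[flip] = 1 - x[flip]
for _ in range(200):
    i = rng.integers(N)
    p = 1.0 / (1.0 + np.exp(-recurrent_current(W, x, T, f)[i]))
    x[i] = float(rng.random() < p)
```

Note that for balanced patterns (f = 0.5) the feedback-inhibition and constant terms vanish, consistent with the Hopfield-like special case discussed in the text.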
This current includes a contribution from recurrent excitatory input, dynamic feedback inhibition (proportional to the total population activity) and a homeostatic term that reduces neuronal excitability as a function of the net strength of its synapses (a proxy for the average current the neuron expects to receive) [17]. Reassuringly, the optimal decoder for the covariance rule recovers a form for the input current that is closely related to classic Hopfield-like [5] dynamics (with external field [1, 18]): feedback inhibition is needed only when the stored patterns are not balanced (f ≠ 0.5); for the balanced case, the homeostatic term can be integrated in the recurrent current, by rewriting neural activities as spins. In sum, for the covariance rule, synapses are fortuitously uncorrelated (except for symmetric pairs which are perfectly correlated), and thus simple, classical linear recall dynamics suffice (Fig. 1c).
The covariance rule is, however, the exception rather than the rule. For example, for simple Hebbian learning, Ω(xi, xj) = xi · xj, synapses sharing a pre- or post-synaptic partner are correlated (Fig. 1d) and so the covariance matrix C is no longer diagonal. Interestingly, the final expression of the recurrent current to a neuron remains strictly local (because of additivity and symmetry), and very similar to Eq. 6, but feedback inhibition becomes a non-linear function of the total activity in the network [16]. In this case, synaptic correlations have a dramatic effect: using the optimal non-linear dynamics ensures high performance, but trying to retrieve information using a decoder that assumes synaptic independence (and thus uses linear dynamics) yields extremely poor performance, which is even worse than the obvious control of relying only on the information in the recall cue and the prior over patterns (Fig. 
1e).
For the generalized Hebbian case, Ω(xi, xj) = (xi − α)(xj − β) with α ≠ β, the optimal decoder becomes even more complex, with the total current including additional terms accounting for pairwise correlations between any two synapses that have neuron i as a pre- or post-synaptic partner [16]. Hence, retrieval is no longer strictly local3 and a biological implementation will require approximating the contribution of non-local terms as a function of locally available information, as we discuss in detail for palimpsest learning below.

4 Palimpsest learning rules

Though additive learning rules are attractive for their analytical tractability, they ignore several important aspects of synaptic plasticity, e.g. they assume that synapses can grow without bound. We investigate the effects of bounded weights by considering another class of learning rules, which assumes synaptic efficacies can only take binary values, with stochastic transitions between the two underpinned by paired cascades of latent internal states [14] (Fig. 2). These learning rules, though very simple, capture an important aspect of memory – the fact that memory is leaky, and information about the past is overwritten by newly stored items (usually referred to as the palimpsest property). Additionally, such rules can account for experimentally observed synaptic metaplasticity [15].

3For additive learning rules, the current to neuron i always depends only on synapses local to a neuron, but these can also include outgoing synapses of which the weight, W·i, should not influence its dynamics. We refer to such dynamics as ‘semi-local’. For other learning rules, the optimal current to neuron i may depend on all connections in the network, including Wjk with j, k ≠ i (‘non-local’ dynamics).

Figure 2: Palimpsest learning. a. The cascade model. 
Colored circles are latent states (V) that belong to two different synaptic weights (W), arrows are state transitions (blue: depression, red: potentiation). b. Different variants of mapping pre- and post-synaptic activations to depression (D) and potentiation (P): R1 – postsynaptically gated, R2 – presynaptically gated, R3 – XOR rule. c. Correlation structure induced by these learning rules. d. Retrieval performance for each rule.

Learning rule

Learning is stochastic and local, with changes in the state of a synapse Vij being determined only by the activation of the pre- and post-synaptic neurons, xj and xi. In general, one could define separate transition matrices for each activity pattern, M(xi, xj), describing the probability of a synaptic state transitioning between any two states Vij to V′ij following an activity pattern, (xi, xj). For simplicity, we define only two such matrices, for potentiation, M+, and depression, M−, respectively, and then map different activity patterns to these events. In particular, we assume Fusi's cascade model [14]4 and three possible mappings (Fig. 2b [16]): 1) a postsynaptically gated learning rule, where changes occur only when the postsynaptic neuron is active, with co-activation of pre- and post-synaptic neurons leading to potentiation, and to depression otherwise5; 2) a presynaptically gated learning rule, typically assumed when analysing cascades [20, 21]; and 3) an XOR-like learning rule which assumes potentiation occurs whenever the pre- and post-synaptic activity levels are the same, with depression otherwise. The last rule, proposed by Ref. 
22, was specifically designed to eliminate correlations between synapses, and can be viewed as a version of the classic covariance rule fashioned for binary synapses.

Estimating the mean and covariance of synaptic weights

At the level of a single synapse, the presentation of a sequence of uncorrelated patterns from Pstore(x) corresponds to a Markov random walk, defined by a transition matrix M, which averages over possible neural activity patterns: M = Σ_{xi,xj} Pstore(xi) · Pstore(xj) · M(xi, xj). The distribution over synaptic states t steps after the initial encoding can be calculated by starting from the stationary distribution of the weights πV0 (assuming a large number of other patterns have previously been stored; formally, this is the eigenvector of M corresponding to eigenvalue λ = 1), then storing the pattern (xi, xj), and finally t − 1 other patterns from the prior:

πV(xi, xj, t) = M^(t−1) · M(xi, xj) · πV0,    (7)

with the distribution over states given as a column vector, πV^l = P(Vij = l|xi, xj), l ∈ {1 . . . 2n}, where n is the depth of the cascade. Lastly, the distribution over weights, P(Wij|xi, xj), can be derived as πW = MV→W · πV, where MV→W is a deterministic map from states to observed weights (Fig. 2a).
As in the additive case, the states of synapses sharing a pre- or post-synaptic partner will be correlated (Figs. 1a, 2c). The degree of correlations for different synaptic configurations can be estimated by generalising the above procedure to computing the joint distribution of the states of pairs of synapses, which we represent as a matrix ρ. E.g. for a pair of synapses sharing a postsynaptic partner (Figs. 1b, d, and 2c), element (u, v) is ρuv = P(Vpost,pre1 = u, Vpost,pre2 = v). 
Hence, the presentation of an activity pattern (xpre1, xpre2, xpost) induces changes in the corresponding pair of incoming synapses to neuron post as ρ(1) = M(xpost, xpre1) · ρ(0) · M(xpost, xpre2)^T, where ρ(0) is the stationary distribution corresponding to storing an infinite number of triplets from the pattern distribution [16].

4Other models, e.g. serial [19], could be used as well without qualitatively affecting the results.
5One could argue that this is the most biologically relevant as plasticity is often NMDA-receptor dependent, and hence it requires postsynaptic depolarisation for any effect to occur.

Replacing πV with ρ (which is now a function of the triplet (xpre1, xpre2, xpost)), and the multiplication by M with the slightly more complicated operator above, we can estimate the evolution of the joint distribution over synaptic states in a manner very similar to Eq. 7:

ρ(t) = Σ_{xi} Pstore(xi) · M̂(xi) · ρ(t−1) · M̂(xi)^T,    (8)

where M̂(xi) = Σ_{xj} Pstore(xj) M(xi, xj). Also as above, the final joint distribution over states can be mapped into a joint distribution over synaptic weights as MV→W · ρ(t) · MV→W^T. This approach can be naturally extended to all other correlated pairs of synapses [16].
The structure of correlations for different synaptic pairs varies significantly as a function of the learning rule (Fig. 2c), with the overall degree of correlations depending on a range of factors. Correlations tend to decrease with cascade depth and pattern sparsity. 
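The pair computation of Eqs. 7–8 can be sketched numerically. The following is our own minimal illustration, not the authors' code: a binary synapse (cascade depth n = 1) stands in for the full cascade, the rule is the postsynaptically gated variant R1, the values of f, q, and T are assumptions, and for simplicity the pair starts from a product of the single-synapse stationary state rather than the exact pair stationary distribution.

```python
import numpy as np

f, q, T = 0.2, 0.3, 5     # density, transition probability, patterns (assumed)

# binary synapse (weak = 0, strong = 1): a cascade-depth n = 1 stand-in
M_pot = np.array([[1 - q, 0.0], [q, 1.0]])   # potentiation: weak -> strong w.p. q
M_dep = np.array([[1.0, q], [0.0, 1 - q]])   # depression:   strong -> weak w.p. q

def M(x_post, x_pre):
    """Postsynaptically gated rule R1: plasticity only if x_post = 1;
    co-activation potentiates, otherwise the synapse depresses."""
    if x_post == 0:
        return np.eye(2)
    return M_pot if x_pre == 1 else M_dep

Px = {0: 1 - f, 1: f}
# prior-averaged single-synapse transition matrix and its stationary state (cf. Eq. 7)
Mbar = sum(Px[a] * Px[b] * M(a, b) for a in (0, 1) for b in (0, 1))
lam, V = np.linalg.eig(Mbar)
pi0 = np.real(V[:, np.argmax(np.real(lam))])
pi0 /= pi0.sum()

# store the triplet (x_post, x_pre1, x_pre2) = (1, 1, 1): both synapses potentiate
rho = M_pot @ np.outer(pi0, pi0) @ M_pot.T
# ... then T - 1 intervening random patterns, marginalising the presynaptic partners (Eq. 8)
Mhat = {a: Px[0] * M(a, 0) + Px[1] * M(a, 1) for a in (0, 1)}
for _ in range(T - 1):
    rho = sum(Px[a] * Mhat[a] @ rho @ Mhat[a].T for a in (0, 1))

p1, p2 = rho.sum(axis=1)[1], rho.sum(axis=0)[1]   # marginal P(strong) per synapse
corr = (rho[1, 1] - p1 * p2) / np.sqrt(p1 * (1 - p1) * p2 * (1 - p2))
print(corr)   # positive: the shared postsynaptic partner correlates the pair
```

The random intervening patterns act on both synapses through the shared postsynaptic activity, which is what leaves a positive correlation in P(W|x) at finite t.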
The first two variants of the learning rule considered are not symmetric, and so induce different patterns of correlations than the additive rules above. The XOR rule is similar to the covariance rule, but the reciprocal connections are no longer perfectly correlated (due to metaplasticity), which means that it is no longer possible to factorize P(W|x). Hence, assuming independence at decoding seems bound to introduce errors.

Approximately optimal retrieval when synapses are independent

If we ignore synaptic correlations, the evidence from the weights factorizes, P(W|x) = ∏_{i,j} P(Wij|xi, xj), and so the exact dynamics would be semi-local3. We can further approximate the contribution of the outgoing weights by its mean, which recovers the same simple dynamics derived for the additive case:

Ii = log( P(xi = 1|x¬i, W, x̃) / P(xi = 0|x¬i, W, x̃) ) = c1 Σ_j Wij xj + c2 Σ_j Wij + c3 Σ_j xj + c4 x̃i + c5    (9)

The parameters c. depend on the prior over x, the noise model, the parameters of the learning rule and t. Again, the optimal decoder is similar to previously derived attractor dynamics; in particular, for stochastic binary synapses with presynaptically gated learning the optimal dynamics require dynamic inhibition only for sparse patterns, and no homeostatic term, as used in [21].
To validate these dynamics, we remove synaptic correlations by a pseudo-storage procedure in which synapses are allowed to evolve independently according to transition matrix M, rather than changing as actual intermediate patterns are stored. The dynamics work well in this case, as expected (Fig. 2d, blue bars). However, when storing actual patterns drawn from the prior, performance becomes extremely poor, and often worse than the control (Fig. 2d, gray bars). Moreover, performance worsens as the network size increases (not shown). 
Hence, ignoring correlations is highly detrimental for this class of learning rules too.

Approximately optimal retrieval when synapses are correlated

To accommodate synaptic correlations, we approximate P(W|x) with a maximum entropy distribution with the same marginals and covariance structure, ignoring the higher order moments.6 Specifically, we assume the evidence from the weights has the functional form:

P(W|x, t) = 1/Z(x, t) · exp( Σ_{ij} kij(x, t) · Wij + 1/2 Σ_{ijkl} J(ij)(kl)(x, t) · Wij Wkl )    (10)

We use the TAP mean-field method [23] to find parameters k and J and the partition function, Z, for each possible activity pattern x, given the mean and covariance for the synaptic weights matrix, computed above7 [16].

6This is just a generalisation of the simple dynamics which assume a first order max entropy model; moreover, the resulting weight distribution is a binary analog of the multivariate normal used in the additive case, allowing the two to be directly compared.
7Here, we ask whether it is possible to accommodate correlations in appropriate neural dynamics at all, ignoring the issue of how the optimal values for the parameters of the network dynamics would come about.

Figure 3: Implications for neural dynamics. a. R1: parameters for Ii^rec; linear modulation by network activity, nb. b. R2: nonlinear modulation of pairwise term by network activity (cf. middle panel in a); other parameters have linear dependences on nb. c. R1: Total current as a function of the number of coactivated inputs, Σ_j Wij xj; lines: different levels of neural excitability Σ_j Wij, line widths scale with frequency of occurrence in a sample run. d. Same for R2. e. Nonlinear integration in dendrites, reproduced from [11], cf. curves in c.

Exact retrieval dynamics based on Eq. 
10, but not respecting locality constraints, work substantially better in the presence of synaptic correlations, for all rules (Fig. 2d, yellow bars). It is important to note that for the XOR rule, which was supposed to be the closest analog to the covariance rule and hence afford simple recall dynamics [22], error rates stay above control, suggesting that it is actually a case in which even dependencies beyond 2nd-order correlation would need to be considered.
As in the additive case, exact recall dynamics are biologically implausible, as the total current to the neuron depends on the full weight matrix. It is possible to approximate the dynamics using strictly local information by replacing the nonlocal term by its mean, which, however, is no longer a constant, but rather a linear function of the total activity in the network, nb = Σ_{j≠i} xj [16]. Under this approximation, the current from recurrent connections corresponding to the evidence from the weights becomes:

Ii^rec = log( P(W|x^(1)) / P(W|x^(0)) ) = Σ_j k△ij(x) · Wij + 1/2 Σ_{jk} J△(ij)(ik)(x) · Wij Wik − Z△    (11)

where i is the index of the neuron to be updated, and the x^(0/1) activity vector has the to-be-updated neuron's activity set to 0 or 1, respectively, and all other components given by the current network state. The functions k△ij(x) = kij(x^(1)) − kij(x^(0)), J△(ij)(kl)(x) = J(ij)(kl)(x^(1)) − J(ij)(kl)(x^(0)), and Z△ = log(Z(x^(1))) − log(Z(x^(0))) depend on the local activity at the indexed synapses, modulated by the number of active neurons in the network, nb. This approximation is again consistent with our previous analysis, i.e. in the absence of synaptic correlations, the complex dynamics recover the simple case presented before. Importantly, this approximation also does about as well as exact dynamics (Fig. 2d, red bars).
For post-synaptically gated learning, comparing the parameters of the dynamics in the case of independent versus correlated synapses (Fig. 3a) reveals a modest modulation of the recurrent input by the total activity. More importantly, the net current to the postsynaptic neuron depends non-linearly (formally, quadratically) on the number of co-active inputs, nW1 = Σ_j xj Wij (Fig. 3c), which is reminiscent of experimentally observed dendritic non-linearities [11] (Fig. 3e). Conversely, for the presynaptically gated learning rule, approximately optimal dynamics predict a non-monotonic modulation of activity by lateral inhibition (Fig. 3b), but linear neural integration (Fig. 3d).8 Lastly, retrieval based on the XOR rule has the same form as the simple dynamics derived for the factorized case [16]. However, the total current has to be rescaled to compensate for the correlations introduced by reciprocal connections.

8The difference between the two rules emerges exclusively because of the constraint of strict locality of the approximation, since the exact form of the dynamics is essentially the same for the two.

RULE | EXACT DYNAMICS | NEURAL IMPLEMENTATION
additive: covariance | strictly local, linear | linear feedback inh., homeostasis
additive: simple Hebbian | strictly local, nonlinear | nonlinear feedback inh.
additive: generalized Hebbian | semi-local, nonlinear | nonlinear feedback inh.
cascade: presyn. gated | nonlocal, nonlinear | nonlinear feedback inh., linear dendritic integr.
cascade: postsyn. gated | nonlocal, nonlinear | linear feedback inh., non-linear dendritic integr.
cascade: XOR | beyond correlations | ?

Table 1: Results summary: circuit adaptations against correlations for different learning rules.

5 Discussion

Statistical dependencies between synaptic efficacies are a natural consequence of activity dependent synaptic plasticity, and yet their implications for network function have been unexplored. Here, in the context of an auto-associative memory network, we investigated the patterns of synaptic correlations induced by several well-known learning rules and their consequent effects on retrieval. We showed that most rules considered do indeed induce synaptic correlations and that failing to take them into account greatly damages recall. One fortuitous exception is the covariance rule, for which there are no synaptic correlations. This might explain why the bulk of classical treatments of auto-associative memories, using the covariance rule, could achieve satisfying capacity levels despite overlooking the issue of synaptic correlations [5, 24, 25].
In general, taking correlations into account optimally during recall requires dynamics in which there are non-local interactions between neurons. However, we derived approximations that perform well and are biologically realisable without such non-locality (Table 1). Examples include the modulation of neural responses by the total activity of the population, which could be mediated by feedback inhibition, and specific dendritic nonlinearities. In particular, for the post-synaptically gated learn-
In particular, for the post-synaptically gated learning rule, which may be viewed as an abstract model of hippocampal NMDA receptor-dependent plasticity, our model predicts a form of non-linear mapping of recurrent inputs into postsynaptic currents which is similar to experimentally observed dendritic integration in cortical pyramidal cells [11]. In general, the tight coupling between the synaptic plasticity used for encoding (manifested in patterns of synaptic correlations) and circuit dynamics offers an important route for experimental validation [2].
None of the rules governing synaptic plasticity that we considered perfectly reproduced the pattern of correlations in [6]; and indeed, exactly which rule applies in what region of the brain under which neuromodulatory influences is unclear. Furthermore, the results in [6] concern the neocortex rather than the hippocampus, which is a more common target for models of auto-associative memory. Nonetheless, our analysis has shown that synaptic correlations matter for a range of very different learning rules that span the spectrum of empirical observations.
Another strategy to handle the negative effects of synaptic correlations is to weaken or eliminate them. For instance, in the palimpsest synaptic model [14], the deeper the cascade, the weaker the correlations, and so metaplasticity may have the beneficial effect of making recall easier. Another popular idea is to use very sparse patterns [21], although this reduces the information content of each one. More speculatively, one might imagine a process of off-line synaptic pruning or recoding, in which strong correlations are removed or the weights adjusted so that simple recall methods will work.
Here, we focused on second-order correlations. However, for plasticity rules such as XOR, we showed that this does not suffice.
Rather, higher-order correlations would need to be considered, and thus, presumably, higher-order interactions between neurons approximated. Finally, we know from work on neural coding of sensory stimuli that there are regimes in which correlations either help or hurt the informational quality of the code, assuming that decoding takes them into account. Given our results, it becomes important to look at the relative quality of different plasticity rules, assuming realizable decoding: it is not clear whether rules that strive to eliminate correlations will be bested by ones that do not.

Acknowledgments This work was supported by the Wellcome Trust (CS, ML), the Gatsby Charitable Foundation (PD), and the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 269921 (BrainScaleS) (ML).

References
1. Sommer, F.T. & Dayan, P. Bayesian retrieval in associative memories with storage errors. IEEE Transactions on Neural Networks 9, 705-713 (1998).
2. Lengyel, M., Kwag, J., Paulsen, O. & Dayan, P. Matching storage and recall: hippocampal spike timing-dependent plasticity and phase response curves. Nature Neuroscience 8, 1677-1683 (2005).
3. Lengyel, M. & Dayan, P. Uncertainty, phase and oscillatory hippocampal recall. Advances in Neural Information Processing Systems (2007).
4. Savin, C., Dayan, P. & Lengyel, M. Two is better than one: distinct roles for familiarity and recollection in retrieving palimpsest memories. In Advances in Neural Information Processing Systems 24 (MIT Press, Cambridge, MA, 2011).
5. Hopfield, J.J. Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. USA 79, 2554-2558 (1982).
6. Song, S., Sjöström, P.J., Reigl, M., Nelson, S. & Chklovskii, D.B. Highly nonrandom features of synaptic connectivity in local cortical circuits. PLoS Biology 3, e68 (2005).
7. Dayan, P. & Abbott, L. Theoretical Neuroscience (MIT Press, 2001).
8. Averbeck, B.B., Latham, P.E. & Pouget, A. Neural correlations, population coding and computation. Nature Reviews Neuroscience 7, 358-366 (2006).
9. Pillow, J.W. et al. Spatio-temporal correlations and visual signalling in a complete neuronal population. Nature 454, 995-999 (2008).
10. Latham, P.E. & Nirenberg, S. Synergy, redundancy, and independence in population codes, revisited. Journal of Neuroscience 25, 5195-5206 (2005).
11. Branco, T. & Häusser, M. Synaptic integration gradients in single cortical pyramidal cell dendrites. Neuron 69, 885-892 (2011).
12. Hasselmo, M.E. & Bower, J.M. Acetylcholine and memory. Trends Neurosci. 16, 218-222 (1993).
13. MacKay, D.J.C. Maximum entropy connections: neural networks. In Maximum Entropy and Bayesian Methods, Laramie, 1990 (eds. Grandy, Jr, W.T. & Schick, L.H.) 237-244 (Kluwer, Dordrecht, The Netherlands, 1991).
14. Fusi, S., Drew, P.J. & Abbott, L.F. Cascade models of synaptically stored memories. Neuron 45, 599-611 (2005).
15. Abraham, W.C. Metaplasticity: tuning synapses and networks for plasticity. Nature Reviews Neuroscience 9, 387 (2008).
16. For details, see Supplementary Information.
17. Zhang, W. & Linden, D. The other side of the engram: experience-driven changes in neuronal intrinsic excitability. Nature Reviews Neuroscience (2003).
18. Engel, A., Englisch, H. & Schütte, A. Improved retrieval in neural networks with external fields. Europhysics Letters (EPL) 8, 393-397 (1989).
19. Leibold, C. & Kempter, R. Sparseness constrains the prolongation of memory lifetime via synaptic metaplasticity. Cerebral Cortex 18, 67-77 (2008).
20. Amit, Y. & Huang, Y. Precise capacity analysis in binary networks with multiple coding level inputs. Neural Computation 22, 660-688 (2010).
21. Huang, Y. & Amit, Y. Capacity analysis in multi-state synaptic models: a retrieval probability perspective. Journal of Computational Neuroscience (2011).
22. Dayan Rubin, B. & Fusi, S. Long memory lifetimes require complex synapses and limited sparseness. Frontiers in Computational Neuroscience (2007).
23. Thouless, D.J., Anderson, P.W. & Palmer, R.G. Solution of 'Solvable model of a spin glass'. Philosophical Magazine 35, 593-601 (1977).
24. Amit, D., Gutfreund, H. & Sompolinsky, H. Storing infinite numbers of patterns in a spin-glass model of neural networks. Phys. Rev. Lett. 55, 1530-1533 (1985).
25. Treves, A. & Rolls, E.T. What determines the capacity of autoassociative memories in the brain? Network 2, 371-397 (1991).