{"title": "Maximising Sensitivity in a Spiking Network", "book": "Advances in Neural Information Processing Systems", "page_first": 121, "page_last": 128, "abstract": null, "full_text": "Maximising Sensitivity in a Spiking Network\n\nAnthony J. Bell\nRedwood Neuroscience Institute\n1010 El Camino Real, Suite 380\nMenlo Park, CA 94025\ntbell@rni.org\n\nLucas C. Parra\nBiomedical Engineering Department\nCity College of New York\nNew York, NY 10033\nparra@ccny.cuny.edu\n\nAbstract\n\nWe use unsupervised probabilistic machine learning ideas to try to explain the kinds of learning observed in real neurons, the goal being to connect abstract principles of self-organisation to known biophysical processes. For example, we would like to explain Spike Timing-Dependent Plasticity (see [5, 6] and Figure 3A) in terms of information theory. Starting out, we explore the optimisation of a network sensitivity measure related to maximising the mutual information between input spike timings and output spike timings. Our derivations are analogous to those in ICA, except that the sensitivity of output timings to input timings is maximised, rather than the sensitivity of output 'firing rates' to inputs. ICA and related approaches have been successful in explaining the learning of many properties of early visual receptive fields in rate-coding models, and we are hoping for similar gains in understanding of spike coding in networks, and how this is supported, in principled probabilistic ways, by cellular biophysical processes. For now, in our initial simulations, we show that our derived rule can learn synaptic weights which can unmix, or demultiplex, mixed spike trains. 
That is, it can recover independent point processes embedded in distributed correlated input spike trains, using an adaptive single-layer feedforward spiking network.\n\n1 Maximising Sensitivity.\n\nIn this section, we will follow the structure of the ICA derivation [4] in developing the spiking theory. We cannot claim, as before, that this gives us an information maximisation algorithm, for reasons that we will delay addressing until Section 3. But for now, to first develop our approach, we will explore an interim objective function called sensitivity, which we define as the log Jacobian of how input spike timings affect output spike timings.\n\n1.1 How to maximise the effect of one spike timing on another.\n\nConsider a spike in neuron j at time t_l that has an effect on the timing of another spike in neuron i at time t_k. The neurons are connected by a weight w_ij. We use i and j to index neurons, and k and l to index spikes, but sometimes for convenience we will use spike indices in place of neuron indices. For example, w_kl, the weight between an input spike l and an output spike k, is naturally understood to be just the corresponding w_ij.\n\nFigure 1: Firing time t_k is determined by the time of threshold crossing. A change of an input spike time dt_l affects, via a change du of the membrane potential, the time of the output spike by dt_k. (The figure shows the membrane potential u(t) and spike response R(t) rising from the resting potential to the threshold potential, with input spikes at t_l and output spikes at t_k.)\n\nIn the simplest version of the Spike Response Model [7], spike l has an effect on spike k that depends on the time-course of the evoked EPSP or IPSP, which we write as R_{kl}(t_k - t_l). In general, this R_{kl} models both synaptic and dendritic linear responses to an input spike, and thus models synapse type and location. 
For learning, we need only consider the value of this function when an output spike, k, occurs. In this model, depicted in Figure 1, a neuron adds up its spiking inputs until its membrane potential, u_i(t), reaches threshold at time t_k. This threshold value we will often, again for convenience, write as u_k \\equiv u_i(t_k; \\{t_l\\}), and it is given by a sum over spikes l:\n\nu_k = \\sum_l w_{kl} R_{kl}(t_k - t_l) .   (1)\n\nTo maximise timing sensitivity, we need to determine the effect of a small change in the input firing time t_l on the output firing time t_k. (A related problem is tackled in [2].) When t_l is changed by a small amount dt_l, the membrane potential will change as a result. This change in the membrane potential leads to a change in the time of threshold crossing dt_k. The contribution to the membrane potential, du, due to dt_l is (\\partial u_k / \\partial t_l) dt_l, and the change in du corresponding to a change dt_k is (\\partial u_k / \\partial t_k) dt_k. We can relate these two effects by noting that the total change of the membrane potential du has to vanish, because u_k is defined as the potential at threshold, i.e.:\n\ndu = (\\partial u_k / \\partial t_k) dt_k + (\\partial u_k / \\partial t_l) dt_l = 0 .   (2)\n\nThis is the total differential of the function u_k = u(t_k; \\{t_l\\}), and is a special case of the implicit function theorem. Rearranging this:\n\ndt_k / dt_l = -(\\partial u_k / \\partial t_l) / (\\partial u_k / \\partial t_k) = -w_{kl} \\dot{R}_{kl} / \\dot{u}_k .   (3)\n\nNow, to connect with the standard ICA derivation [4], recall the 'rate' (or sigmoidal) neuron, for which y_i = g_i(u_i) and u_i = \\sum_j w_{ij} x_j. 
For this neuron, the output dependence on input is \\partial y_i / \\partial x_j = w_{ij} g_i', while the learning gradient is:\n\n(\\partial / \\partial w_{ij}) \\log |\\partial y_i / \\partial x_j| = 1/w_{ij} - f_i(u_i) x_j ,   (4)\n\nwhere the 'score functions', f_i, are defined in terms of a density estimate on the summed inputs: f_i(u_i) = -(\\partial / \\partial u_i) \\log g_i' = -(\\partial / \\partial u_i) \\log \\hat{p}(u_i).\n\nThe analogous learning gradient for the spiking case, from (3), is:\n\n(\\partial / \\partial w_{ij}) \\log |dt_k / dt_l| = 1/w_{ij} - (\\sum_a j(a) \\dot{R}_{ka}) / \\dot{u}_k ,   (5)\n\nwhere j(a) = 1 if spike a came from neuron j, and 0 otherwise.\n\nComparing the two cases in (4) and (5), we see that the input variable x_j has become the temporal derivative of the sum of the EPSPs coming from synapse j, and the output variable (or score function) f_i(u_i) has become \\dot{u}_k^{-1}, the inverse of the temporal derivative of the membrane potential at threshold. It is intriguing (A) to see this quantity appear as analogous to the score function in the ICA likelihood model, and (B) to speculate that experiments could show that this 'voltage slope at threshold' is a hidden factor in STDP data, explaining some of the scatter in Figure 3A. In other words, an STDP datapoint should lie on a 2-surface in a 3D space of \\{\\Delta w, \\Delta t, \\dot{u}_k\\}. Incidentally, \\dot{u}_k shows up in any learning rule optimising an objective function involving output spike timings.\n\n1.2 How to maximise the effect of N spike timings on N other ones.\n\nNow we deal with the case of a 'square' single-layer feedforward mapping between spike timings. There can be several input and output neurons, but here we ignore which neurons are spiking, and just look at how the input timings affect the output timings. This is captured in a Jacobian matrix of all timing dependencies we call T. 
The entries of this matrix are T_{kl} \\equiv \\partial t_k / \\partial t_l. A multivariate version of the sensitivity measure introduced in the previous section is the log of the absolute determinant of the timing matrix, i.e. \\log |T|. The full derivation for the gradient \\nabla_W \\log |T| is in the Appendix. Here, we again draw out the analogy between Square ICA [4] and this gradient, as follows. Square ICA with a network y = g(Wx) is:\n\n\\Delta W \\propto \\nabla_W \\log |J| = W^{-T} - f(u) x^T ,   (6)\n\nwhere the Jacobian J has entries \\partial y_i / \\partial x_j, and the score functions are now f_i(u) = -(\\partial / \\partial u_i) \\log \\hat{p}(u) for the general likelihood case, with \\hat{p}(u) = \\prod_i g_i' being the special case of ICA. We will now split the gradient in (6) according to the chain rule:\n\n\\nabla_W \\log |J| = [\\nabla_J \\log |J|] \\otimes [\\nabla_W J]   (7)\n\n= [J^{-T}] \\otimes [ J_{kl} \\, i(k) ( j(l)/w_{kl} - f_k(u) x_j ) ] .   (8)\n\nIn this equation, i(k) = \\delta_{ik} and j(l) = \\delta_{jl}. The righthand term is a 4-tensor with entries \\partial J_{kl} / \\partial w_{ij}, and \\otimes is defined by (A \\otimes B)_{ij} = \\sum_{kl} A_{kl} B_{klij}. We write the gradient this way to preserve, in the second term, the independent structure of the 1-to-1 gradient term in (4), and to separate a difficult derivation into two easy parts. The structure of (8) holds up when we move to the spiking case, giving:\n\n\\nabla_W \\log |T| = [\\nabla_T \\log |T|] \\otimes [\\nabla_W T]   (9)\n\n= [T^{-T}] \\otimes [ T_{kl} \\, i(k) ( j(l)/w_{kl} - (\\sum_a j(a) \\dot{R}_{ka}) / \\dot{u}_k ) ] ,   (10)\n\nwhere i(k) is now defined as being 1 if spike k occurred in neuron i, and 0 otherwise; j(l) and j(a) are analogously defined.\n\nBecause the T matrix is much bigger than the J matrix, and because its entries are more complex, here the similarity ends. 
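For concreteness, the square-ICA gradient (6) can be checked numerically. Below is a minimal sketch (our own illustration, not the paper's code) using the logistic nonlinearity g, for which the score works out to f(u) = 2 g(u) - 1:

```python
import numpy as np

def ica_gradient(W, x):
    """Square-ICA gradient (6): dW ~ W^{-T} - f(u) x^T, for the special
    case p-hat(u) = prod_i g'(u_i) with g the logistic sigmoid, so that
    f_i(u) = -(d/du_i) log g'(u_i) = 2 g(u_i) - 1."""
    u = W @ x
    f = 2.0 / (1.0 + np.exp(-u)) - 1.0   # logistic score function
    return np.linalg.inv(W).T - np.outer(f, x)
```

A finite-difference check against the objective log |det J| = log |det W| + sum_i log g'(u_i) confirms the formula entry by entry.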
When (10) is evaluated for a single weight influencing a single spike coupling (see the Appendix for the full derivation), it yields:\n\n\\Delta w_{kl} \\propto \\partial \\log |T| / \\partial w_{kl} = (T_{kl} / w_{kl}) ( [T^{-1}]_{lk} - 1 ) .   (11)\n\nThis is a non-local update involving a matrix inverse at each step. In the ICA case of (6), such an inverse was removed by the Natural Gradient transform (see [1]), but in the spike-timing case this has turned out not to be possible, because of the additional asymmetry introduced into the T matrix (as opposed to the J matrix) by the \\dot{R}_{kl} term in (3).\n\n2 Results.\n\nNonetheless, this learning rule can be simulated. It requires running the network for a while to generate spikes (and a corresponding T matrix), and then, for each input/output spike coupling, updating the corresponding synapse according to (11). When this is done, and the weights learn, it is clear that something has been sacrificed by ignoring the issue of which neurons are producing the spikes. Specifically, the network will often put all the output spikes on one output neuron, with the rates of the others falling to zero. It is happy to do this, if a large \\log |T| can thereby be achieved, because we have not included this 'which neuron' information in the objective. We will address these and other problems in Section 3, but now we report on our simulation results on demultiplexing.\n\n2.1 Demultiplexing spike trains.\n\nAn interesting possibility in the brain is that 'patterns' are embedded in spatially distributed spike timings that are input to neurons. Several patterns could be embedded in single input trains. This is called multiplexing. To extract and propagate these patterns, the neurons must demultiplex these inputs using their threshold nonlinearity. Demultiplexing is the 'point process' analog of the unmixing of independent inputs in ICA. 
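Before reporting results, here is a sketch of how one step of (11) might look in code. This is our illustrative rendering only; the pseudo-inverse stands in for the inverse when T is not square, as in the simulations reported next:

```python
import numpy as np

def update_weights(T, w, lr=0.01):
    """One gradient step of (11): dw_kl ~ (T_kl / w_kl) * ([T^-1]_lk - 1).
    T[k, l] = dt_k/dt_l for output spike k and input spike l; w[k, l] is the
    weight of the synapse coupling that spike pair. np.linalg.pinv stands in
    for the inverse when T is not square (an assumption, mirroring Section 2)."""
    Tinv = np.linalg.pinv(T)            # shape (L, K): entries [T^+]_{lk}
    grad = (T / w) * (Tinv.T - 1.0)     # entrywise gradient, shape (K, L)
    return w + lr * grad
```

For a square, invertible T the pseudo-inverse coincides with the exact inverse, so this reduces to (11) as written.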
We have been able to robustly achieve demultiplexing, as we now report.\n\nWe simulated a feedforward network with 3 integrate-and-fire neurons and inputs from 3 presynaptic neurons. Learning followed (11), where we replace the inverse by the pseudo-inverse computed on the spikes generated during 0.5 s. The pseudo-inverse is necessary because, even though on average the learning matches the number of output spikes to the number of input spikes, the matrix T is still not usually square, and so its actual inverse cannot be taken.\n\nIn addition, in these simulations, an additional term is introduced in the learning to make sure all the output neurons fire with equal probability. This partially counters the ignoring of the 'which neuron' information, which we explained above. Assuming a Poisson spike count n_i for the i-th output neuron with equal firing rate \\bar{n}_i, it is easy to derive an approximate term that will control the spike count, \\sum_i (\\bar{n}_i - n_i). The target firing rates \\bar{n}_i were set to match the \"source\" spike train in this example.\n\nThe network learns to demultiplex mixed spike trains, as shown in Figure 2. This demultiplexing is a robust property of learning using (11) with this new spike-controlling term.\n\nFinally, what about the spike-timing dependence of the observed learning? Does it match experimental results? 
The comparison is made in Figure 3, and the answer is no.\n\nFigure 2: Unmixed spike trains. The input (top left) are 3 spike trains which are a mixture of three independent Poisson processes (bottom left). The network unmixes the spike trains to approximately recover the originals (center left); in this example 19 spikes correspond to the original, with 4 deletions and 2 insertions. The two panels at the right show the mixing matrix (top) and the synaptic weight matrix after training (bottom). (All spike-train panels share a 0-500 ms time axis.)\n\nThere is a timing-dependent transition between depression and potentiation in our result in Figure 3B, but it is not a sharp transition like the experimental result in Figure 3A. In addition, it does not transition at zero (i.e. when t_k - t_l = 0), but at a time offset by the rise time of the EPSPs. In earlier experiments, in which we transformed the gradient in (11) by an approximate inverse Hessian to get an approximate Natural Gradient method, a sharp transition did emerge in simulations. However, the approximate inverse Hessian was singular, and we had to de-emphasise this result. 
It does suggest, however, that if the Natural Gradient transform can be usefully done on some variant of this learning rule, it may well be what accounts for the sharp transition effect of STDP.\n\n3 Discussion\n\nAlthough these derivations started out smoothly, the reader possibly shares the authors' frustration at the approximations involved here. Why isn't this simple, like ICA? Why don't we just have a nice maximum spikelihood model, i.e. a density estimation algorithm for multivariate point processes, as ICA was a model in continuous space? We are going to be explicit about the problems now, and will propose a direction where the solution may lie. The over-riding problem is: we are unable to claim that in maximising \\log |T| we are maximising the mutual information between inputs and outputs, because:\n\n1. The Invertibility Problem. Algorithms such as ICA which maximise log Jacobians can only be called Infomax algorithms if the network transformation is both deterministic and invertible. The Spike Response Model is deterministic, but it is not invertible in general. When not invertible, the key formula (considering here vectors of input and output timings, t_in and t_out) is transformed from simple to complex, i.e.:\n\np(t_out) = p(t_in) / |T|   becomes   p(t_out) = \\int_{solns \\, t_in} ( p(t_in) / |T| ) \\, d t_in .   (12)\n\nThus when not invertible, we need to know the Jacobians of all the inputs that could have caused an output (called here 'solns'), something we simply don't know.\n\n2. 
The 'Which Neuron' Problem. Instead of maximising the mutual information I(t_out; t_in), we should be maximising I(ti_out; ti_in), where the vector ti is the timing vector, t, with the vector, i, of corresponding neuron indices, concatenated.\n\nFigure 3: Dependence of synaptic modification on pre/post inter-spike interval. Left (A): from Froemke & Dan, Nature (2002). Dependence of synaptic modification on pre/post inter-spike interval in cat L2/3 visual cortical pyramidal cells in slice, with naturalistic spike trains; each point represents one experiment. Right (B): according to Equation (11); each point corresponds to a spike pair among approximately 100 input and 100 output spikes. (Vertical axes: \\Delta w / w in % for A, \\Delta w in arbitrary units for B; horizontal axes: \\Delta t in ms.)\n\nThus, 'who spiked?' should be included in the analysis, as it is part of the information.\n\n3. The Predictive Information Problem. In ICA, since there was no time involved, we did not have to worry about mutual informations over time between inputs and outputs. But in the spiking model, output spikes may well have (predictive) mutual information with future input spikes, as well as the usual (causal) mutual information with past input spikes. 
The former has been entirely missing from our analysis so far.\n\nThese temporal and spatial information dependencies, missing in our analysis so far, are thrown into a different light by a single empirical observation, which is that Spike Timing-Dependent Plasticity is not just a feedforward computation like the Spike Response Model. Specifically, there must be at least a statistical, if not a causal, relation between a real synapse's plasticity and its neuron's output spike timings, for Figure 3B to look like it does.\n\nIt seems we have to confront the need for both a 'memory' (or reconstruction) model, such as the T we have thus far dealt with, in which output spikes talk about past inputs, and a 'prediction' model, in which they talk about future inputs. This is most easily understood from the point of view of Barber & Agakov's variational Infomax algorithm [3]. They argue for optimising a lower bound on mutual information, which, for our neurons, would be expressed using an inverse model \\hat{p}, as follows:\n\n\\tilde{I}(ti_in; ti_out) = H(ti_in) - \\langle \\log \\hat{p}(ti_in | ti_out) \\rangle_{p(ti_in, ti_out)} \\le I(ti_in; ti_out) .   (13)\n\nIn a feedforward model, H(ti_in) may be disregarded in taking gradients, leading us to the optimisation of a 'memory-prediction' model \\hat{p}(ti_in | ti_out) related to something supposedly happening in dendrites, somas and at synapses. In trying to guess what this might be, it would be nice if the math worked out. We need a square Jacobian matrix, T, so that |T| = \\hat{p}(ti_in | ti_out) can be our memory/prediction model. 
Now let's rename our feedforward timing Jacobian T ('up the dendritic trees') as \\overrightarrow{T}, and let's fantasise that there is some, as yet unspecified, feedback Jacobian \\overleftarrow{T} ('down the dendritic trees'), which covers electrotonic influences as they spread from soma to synapse, and with which \\overrightarrow{T} can be combined by some operation '\\otimes' to make things square. Imagine further that doing this yields a memory/prediction model on the inputs. Then the T we are looking for is \\overrightarrow{T} \\otimes \\overleftarrow{T}, and the memory-prediction model is: \\hat{p}(ti_in | ti_out) = | \\overrightarrow{T} \\otimes \\overleftarrow{T} |.\n\nIdeally, the entries of \\overrightarrow{T} should be as before, i.e. \\overrightarrow{T}_{kl} = \\partial t_k / \\partial t_l. What should the entries of \\overleftarrow{T} be? Becoming just one step more concrete, suppose \\overleftarrow{T} had entries \\overleftarrow{T}_{lk} = \\partial c_l / \\partial t_k, where c_l is some, as yet unspecified, value, or process, occurring at an input synapse when spike l comes in. What seems clear is that \\otimes should combine the correctly tensorised forms of \\overrightarrow{T} and \\overleftarrow{T} (giving them each 4 indices ijkl), so that T = \\overrightarrow{T} \\otimes \\overleftarrow{T} sums over the spikes k and l to give an I \\times J matrix, where I is the number of output neurons and J the number of input neurons. Then our quantity, T, would represent all dependencies of input neuronal activity on output activity, summed over spikes.\n\nFurther, we imagine that \\overleftarrow{T} contains reverse (feedback) electrotonic transforms from soma to synapse, \\overleftarrow{R}_{lk}, that are somehow symmetrically related to the feedforward Spike Responses from synapse to soma, which we now rename \\overrightarrow{R}_{kl}. 
Thinking for a moment in terms of somatic k and synaptic l, voltages V, currents I and linear cable theory, the synapse-to-soma transform, \\overrightarrow{R}_{kl}, would be related to an impedance in V_k = I_l \\overrightarrow{Z}_{kl}, while the soma-to-synapse transform, \\overleftarrow{R}_{lk}, would be related to an admittance in I_l = V_k \\overleftarrow{Y}_{lk} [8]. The symmetry in these equations is that \\overrightarrow{Z}_{kl} is just the inverse conjugate of \\overleftarrow{Y}_{lk}.\n\nFinally, then, what is c_l? And what is its relation to the calcium concentration, [Ca^{2+}]_l, at a synapse, when spike l comes in? These questions naturally follow from considering the experimental data, since it is known that the calcium level at synapses is the critical integrating factor in determining whether potentiation or depression occurs [5].\n\n4 Appendix: Gradient of log |T| for the full Spike Response Model.\n\nHere we give full details of the gradient for Gerstner's Spike Response Model [7]. This is a general model for which Integrate-and-Fire is a special case. In this model the effect of a presynaptic spike at time t_l on the membrane potential at time t is described by a post-synaptic potential or spike response, which may also depend on the time that has passed since the last output spike t_{k-1}; hence the spike response is written as R(t - t_{k-1}, t - t_l). This response is weighted by the synaptic strength w_l. Excitatory or inhibitory synapses are determined by the sign of w_l. Refractoriness is incorporated by adding a hyperpolarising contribution (spike-afterpotential) to the membrane potential in response to the last preceding spike, \\eta(t - t_{k-1}). The membrane potential as a function of time is therefore given by\n\nu(t) = \\eta(t - t_{k-1}) + \\sum_l w_l R(t - t_{k-1}, t - t_l) .   (14)\n\nWe have ignored here potential contributions from external currents, which can easily be included without modifying the following derivations. 
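To make (14) concrete, here is a small sketch of the membrane potential. The exponential afterpotential and spike-response kernels below are our own assumptions for illustration; the model itself does not fix them:

```python
import numpy as np

def membrane_potential(t, t_prev, t_in, w, tau=10.0, eta0=-5.0, tau_eta=5.0):
    """Equation (14): u(t) = eta(t - t_{k-1}) + sum_l w_l R(t - t_{k-1}, t - t_l).
    Illustrative kernel choices (assumptions, not from the paper): the
    afterpotential eta(s) = eta0 * exp(-s / tau_eta), and a spike response
    that here depends only on its second argument r: exp(-r / tau) for
    r >= 0, else 0 (causality)."""
    eta = eta0 * np.exp(-(t - t_prev) / tau_eta)    # hyperpolarising afterpotential
    r = t - np.asarray(t_in, dtype=float)           # time since each input spike
    R = np.where(r >= 0.0, np.exp(-r / tau), 0.0)   # causal spike response
    return eta + float(np.dot(np.asarray(w, dtype=float), R))
```

An output spike would then be emitted at the first t where u(t) crosses the threshold from below, as in (15).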
The output firing times t_k are defined as the times for which u(t) reaches firing threshold from below. We consider a dynamic threshold, \\vartheta(t - t_{k-1}), which may depend on the time since the last spike t_{k-1}; the output spike times are then defined implicitly by:\n\nt = t_k :  u(t) = \\vartheta(t - t_{k-1})  and  du(t)/dt > 0 .   (15)\n\nFor this more general model, T_{kl} is given by\n\nT_{kl} = dt_k / dt_l = -( \\partial u / \\partial t_k - \\partial \\vartheta / \\partial t_k )^{-1} ( \\partial u / \\partial t_l ) = w_{kl} \\dot{R}(t_k - t_{k-1}, t_k - t_l) / ( \\dot{u}(t_k) - \\dot{\\vartheta}(t_k - t_{k-1}) ) ,   (16)\n\nwhere \\dot{R}(s, t), \\dot{u}(t) and \\dot{\\vartheta}(t) are derivatives with respect to t. The dependence of T_{kl} on t_{k-1} should be implicitly assumed; it has been omitted to simplify the notation.\n\nNow we compute the derivative of \\log |T| with respect to w_{kl}. For any matrix T we have \\partial \\log |T| / \\partial T_{ab} = [T^{-1}]_{ba}. Therefore:\n\n\\partial \\log |T| / \\partial w_{kl} = \\sum_{ab} ( \\partial \\log |T| / \\partial T_{ab} ) ( \\partial T_{ab} / \\partial w_{kl} ) = \\sum_{ab} [T^{-1}]_{ba} ( \\partial T_{ab} / \\partial w_{kl} ) .   (17)\n\nUtilising the Kronecker delta \\delta_{ab} (1 if a = b, else 0), the derivative of (16) with respect to w_{kl} gives:\n\n\\partial T_{ab} / \\partial w_{kl} = \\partial / \\partial w_{kl} [ w_{ab} \\dot{R}(t_a - t_{a-1}, t_a - t_b) / ( \\dot{u}(t_a) - \\dot{\\vartheta}(t_a - t_{a-1}) ) ]\n\n= \\delta_{ak} \\delta_{bl} \\dot{R}(t_a - t_{a-1}, t_a - t_b) / ( \\dot{u}(t_a) - \\dot{\\vartheta}(t_a - t_{a-1}) ) - \\delta_{ak} w_{ab} \\dot{R}(t_a - t_{a-1}, t_a - t_b) \\dot{R}(t_a - t_{a-1}, t_a - t_l) / ( \\dot{u}(t_a) - \\dot{\\vartheta}(t_a - t_{a-1}) )^2   (18)\n\n= \\delta_{ak} T_{ab} [ \\delta_{bl} / w_{ab} - T_{al} / w_{al} ] .   (19)\n\nTherefore:\n\n
\\partial \\log |T| / \\partial w_{kl} = \\sum_{ab} [T^{-1}]_{ba} \\delta_{ak} T_{ab} [ \\delta_{bl} / w_{ab} - T_{al} / w_{al} ] = ( T_{kl} / w_{kl} ) ( [T^{-1}]_{lk} - \\sum_b [T^{-1}]_{bk} T_{kb} ) = ( T_{kl} / w_{kl} ) ( [T^{-1}]_{lk} - 1 ) ,   (20)\n\nsince \\sum_b T_{kb} [T^{-1}]_{bk} = [T T^{-1}]_{kk} = 1. This is the rule quoted in (11).\n\nAcknowledgments\n\nWe are grateful for inspirational discussions with Nihat Ay, Michael Eisele, Hong Hui Yu, Jim Crutchfield, Jeff Beck, Surya Ganguli, Sophie Deneve, David Barber, Fabian Theis, Tony Zador and Arunava Banerjee. AJB thanks all RNI colleagues for many such discussions.\n\nReferences\n\n[1] Amari S-I. 1997. Natural gradient works efficiently in learning. Neural Computation, 10, 251-276.\n\n[2] Banerjee A. 2001. On the phase-space dynamics of systems of spiking neurons. Neural Computation, 13, 161-225.\n\n[3] Barber D. & Agakov F. 2003. The IM algorithm: a variational approach to information maximization. Advances in Neural Information Processing Systems 16, MIT Press.\n\n[4] Bell A.J. & Sejnowski T.J. 1995. An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 7, 1129-1159.\n\n[5] Dan Y. & Poo M-m. 2004. Spike timing-dependent plasticity of neural circuits. Neuron, 44, 23-30.\n\n[6] Froemke R.C. & Dan Y. 2002. Spike-timing-dependent synaptic modification induced by natural spike trains. Nature, 416, 433-438.\n\n[7] Gerstner W. & Kistler W.M. 2002. Spiking Neuron Models. Cambridge University Press.\n\n[8] Zador A.M., Agmon-Snir H. & Segev I. 1995. The morphoelectrotonic transform: a graphical approach to dendritic function. J. Neurosci., 15(3), 1669-1682.\n", "award": [], "sourceid": 2674, "authors": [{"given_name": "Anthony", "family_name": "Bell", "institution": null}, {"given_name": "Lucas", "family_name": "Parra", "institution": null}]}