{"title": "A Novel Kernel for Learning a Neuron Model from Spike Train Data", "book": "Advances in Neural Information Processing Systems", "page_first": 595, "page_last": 603, "abstract": "From a functional viewpoint, a spiking neuron is a device that transforms input spike trains on its various synapses into an output spike train on its axon. We demonstrate in this paper that the function mapping underlying the device can be tractably learned based on input and output spike train data alone. We begin by posing the problem in a classification based framework. We then derive a novel kernel for an SRM0 model that is based on PSP and AHP like functions. With the kernel we demonstrate how the learning problem can be posed as a Quadratic Program. Experimental results demonstrate the strength of our approach.", "full_text": "A Novel Kernel for Learning a Neuron Model from\n\nSpike Train Data\n\nNicholas Fisher, Arunava Banerjee\n\nDepartment of Computer and Information Science and Engineering\n\nUniversity of Florida\nGainesville, FL 32611\n\n{nfisher,arunava}@cise.ufl.edu\n\nAbstract\n\nFrom a functional viewpoint, a spiking neuron is a device that transforms input\nspike trains on its various synapses into an output spike train on its axon. We\ndemonstrate in this paper that the function mapping underlying the device can be\ntractably learned based on input and output spike train data alone. We begin by\nposing the problem in a classi\ufb01cation based framework. We then derive a novel\nkernel for an SRM0 model that is based on PSP and AHP like functions. With\nthe kernel we demonstrate how the learning problem can be posed as a Quadratic\nProgram. Experimental results demonstrate the strength of our approach.\n\n1\n\nIntroduction\n\nNeurons are the predominant component of the nervous system and understanding them is a ma-\njor challenge in modern neuroscience research [1]. 
Many neuron models have been proposed to\nunderstand the dynamics of individual and populations of neurons. Although these models vary\nin complexity, at a fundamental level they are mechanisms which transform input spike trains into\nan output spike train. This view has found expression in the Quantitative Single-Neuron Modeling\ncompetition where submitted models compete on how accurately they can predict the output spike\ntrain of a biological neuron given an input current [2]. Since the vast majority of neurons receive\ninput from chemical synapses [3], a stricter stipulation would be to predict output spikes based on\ninput spike trains at the various synapses of the neuron. There are advantages to this variation of\nthe problem: complicated subthreshold \ufb02uctuations in the membrane potential need not be modeled,\nsince models are now judged strictly on the basis of their performance at predicting the timing of\noutput spikes. Models now have the liberty to focus on threshold crossings at the expense of be-\ning inaccurate in the subthreshold regime. Not only does the model better represent the functional\ncomplexity of the input/output transformation of a neuron, comparisons to the real neuron can be\nconducted in a non-invasive manner.\nIn this paper we learn a Spike Response Model 0 (SRM0)[4] approximation of a neuron by only\nconsidering the timing of all afferent (incoming) and efferent (outgoing) spikes of the neuron over\na bounded past. We begin by formulating the problem in a classi\ufb01cation based supervised learning\nframework where spike train data is labeled according to whether the neuron is about to spike, or\nhas recently spiked. We demonstrate that optimizing the model to properly classify this labeled data\nnaturally leads to a quadratic programming problem when combined with an appropriate represen-\ntation of the model via a dictionary of functions. 
We then derive a novel kernel on spike trains which is computed from a dictionary of post-synaptic potential (PSP) and after-hyperpolarizing potential (AHP) like functions. Finally, experimental results are presented to demonstrate the efficacy of the approach. For a complementary approach to learning a neuron model from spike train data, see [5].\n\nAn SRM0 model was chosen for several reasons. First, SRM0 has been shown to be fairly versatile and accurate at modeling biological neurons [6]. Second, SRM0 is a relatively simple neuron model, and therefore is likely to display better generalizability on unseen input. Finally, the disparity between the learned neuron model and the actual neuron could shed light on the various operational modes of biological neurons. It is conceivable that the learned SRM0 model accurately predicts the behavior of the neuron a majority of the time. However, there could be states, bursting for example, where the prediction diverges. In such a case, the neuron can be seen as operating in two different modes, one SRM0 like, and the other not. Multiple models could then be learned to model the neuron in its various operational modes.\n\n2 General model of the neuron\n\nIt has been shown that if one assumes a neuron to be a finite precision device with fading memory and a refractory period, then the membrane potential of the neuron, P, can be modeled as a function of the timing of the neuron\u2019s afferent and efferent spikes which have occurred within a bounded past [7]. Spikes that have aged past this bound, denoted by \u03a5, are considered to have a negligible effect on the present value of P. We denote the arrival times of spikes at synapse j using the vector $t^j = \\langle t^j_1, t^j_2, \\ldots, t^j_{N_j} \\rangle$, where $N_j$ is bounded from above by the number of spikes that can be present in an \u03a5 window of time. $t^0$ represents the output spike train of the neuron and vectors $t^1, \\ldots, t^m$ represent spike trains on the input synapses. $t^j_i$ represents the time that has elapsed since that spike was generated or received by the neuron. Spikes are only considered if they occurred within \u03a5 time. We can then formalize the membrane potential function $P : \\mathbb{R}^N \\to \\mathbb{R}$, where $N = \\sum_{j=0}^{m} N_j$. $P(t^0, \\ldots, t^m)$ is defined over the space of all spike trains and reports the present membrane potential of the neuron. The neuron generates a spike when $P(t^0, \\ldots, t^m) = \\Theta$ and $dP/dt \\geq 0$, where \u0398 is the threshold of the neuron. For notational simplicity, we define the spike configuration, $s \\in \\mathbb{R}^N$, which represents the timing of all afferent and efferent spikes within the window of length \u03a5. s is the vector of vectors, $s = \\langle t^0, \\ldots, t^m \\rangle$. The neuron generates a spike when $P(s) = \\Theta$ and $dP/dt \\geq 0$.\n\nAs discussed in Section 1, we shall learn an SRM0 approximation of the neuron. The SRM0 model uses a bounded past history as described above to calculate the present membrane potential of the neuron. The present membrane potential $\\hat{P}$ is calculated as shown in Equation 1. $\\eta$ models the effect of a past generated spike, the AHP. $\\epsilon_j$ represents the response of the neuron to a presynaptic spike at synapse j, the PSP. $u_{rest}$ is the resting membrane potential. 
At any given time, the neuron generates a spike if the membrane potential crosses the threshold from below (i.e., $\\hat{P}(s) = \\Theta$, $d\\hat{P}/dt \\geq 0$).\n\n$$\\hat{P}(s) = \\sum_{i=1}^{N_0} \\eta(t^0_i) + \\sum_{j=1}^{m} \\sum_{i=1}^{N_j} \\epsilon_j(t^j_i) + u_{rest} \\tag{1}$$\n\n3 Classification Problem\n\nIn order to learn an SRM0 approximation of a neuron in a non-invasive manner, we pose a supervised learning classification problem which labels the given spike train data according to whether the neuron is about to spike or has recently spiked. We denote the former $S^-$ and the latter $S^+$. This problem is equivalent to classifying subthreshold spike configurations ($\\hat{P}(s) < \\Theta$) from suprathreshold spike configurations ($\\hat{P}(s) \\geq \\Theta$), which leads to the classification problem shown in Equation 2. It should be noted that the true membrane potential function, P, is a feasible solution to this problem since $P(s) < \\Theta \\; \\forall s \\in S^-$ and $P(s) \\geq \\Theta \\; \\forall s \\in S^+$.\n\n$$\\text{Min. } \\| \\hat{P}(s) \\|^2 \\quad \\text{s.t.} \\quad \\hat{P}(s) - \\Theta \\geq 1 \\;\\; \\forall s \\in S^+ \\;\\; \\text{AND} \\;\\; \\hat{P}(s) - \\Theta \\leq -1 \\;\\; \\forall s \\in S^- \\tag{2}$$\n\nTo generate training data which belong to $S^+$ and $S^-$, we provide the spike configurations which occur at a fixed infinitesimal time differential before and after the neuron generates a spike, as illustrated in Figure 1(a). The spike train at the instant the neuron generated a spike is shown by the solid lines. We shift the spike window infinitesimally into the past (future) to produce a spike configuration $s \\in S^-$ ($S^+$), shown by the up (down) arrows. Notice that the spike which is currently generated in the output spike train, $t^0$, emphasized by the dashed circle, is not included in either spike configuration s. 
The reason it is not included in $s \\in S^-$ is that it simply has not been generated at that point in time. The reason it is not included in $s \\in S^+$ is twofold. First, the spike would induce an AHP effect which would cause the membrane potential to fall below the threshold. Second, if it were included, this would cause the classifier to only consider whether or not that particular spike existed when classifying a given spike configuration as a member of $S^+$ or $S^-$. If the spike did exist, the configuration would belong to $S^+$, and if it did not exist, the configuration would belong to $S^-$. Although this method would work well for the training data, it would not generalize to unseen live spike train data.\n\nFigure 1: Figure (a) depicts the spike configurations used in the classification problem. Figure (b) shows the REEF for a fixed value of t = 1 s and variable \u03b2 and \u03c4 values. Figure (c) portrays the form of cross sections of the REEF as a function of t for different values of \u03b2 and \u03c4.\n\nProducing a hypersurface which can separate the suprathreshold spike configurations from the subthreshold spike configurations within the spike time feature space would be extremely difficult. As discussed above, if we could map a given spike configuration s to its corresponding membrane potential P(s), then the classification problem would be trivial. Although we do not have access to the membrane potential function, we can use a linear combination of functions from a dictionary to reproduce an approximation to the membrane potential function P. The choice of the dictionary is crucial. By choosing a dictionary which is tailored to the form of typical PSP and AHP functions, we increase the likelihood of successfully modeling the given neuron.\n\nThe SRM0 model is an additively separable model [8], that is, the membrane potential is a sum of functions of the individual spikes of the spike configuration ($\\hat{P}(s) = \\sum_{j=0}^{m} \\sum_{i=1}^{N_j} \\hat{P}_{ij}(t^j_i)$). This feature lends itself well to modeling the membrane potential using a linear combination of dictionary elements. The dictionary used here was one derived from a function used by MacGregor and Lewis for neuron modeling [9]. It consists of functions (parametrized by \u03b2 and \u03c4) of the form\n\n$$f_{\\beta,\\tau}(t) = \\frac{1}{\\tau} \\cdot \\exp(-\\beta/t) \\cdot \\exp(-t/\\tau) \\tag{3}$$\n\nWe call this the reciprocal exponential \u2013 exponential function (REEF) dictionary. Figures 1(b) and (c) present the dictionary for various cross sections of t, \u03b2 and \u03c4.\n\n4 Approximation of the membrane potential function\n\nWe would like to combine members of the chosen dictionary of functions to construct an approximation of the membrane potential function, P, which will yield a solution to the classification problem posed in Equation 2. We shall first discuss how this can be achieved in a discrete setting, where we combine a finite number of \u03b2 and \u03c4 parametrized dictionary functions to model P. Following this we will discuss a continuous formulation, in which we combine elements drawn from an infinite continuous range of \u03b2 and \u03c4 parametrized dictionary functions to model P. In the context of the continuous formulation, we will prove a specific instance of the Representer theorem which was first shown by Kimeldorf and Wahba [10]. The Representer theorem shows that the optimal solution to the posed classification problem must lie in the span of the data points which were used to train the classifier. In the discrete and continuous formulation, we will first model the effect of a single spike for simplicity. 
We will conclude this section by extending the continuous formulation to the case of multiple spikes on a single synapse, and the case of multiple spikes on multiple synapses.\n\n[Figure 1 panels: (a) spike configurations for S+ and S\u2212; (b) REEF as a function of \u03b2 and \u03c4; (c) REEF versus time (ms), varying \u03c4 (\u03b2 = 0.5; \u03c4 = 10, 20, 30, 40) and varying \u03b2 (\u03c4 = 20; \u03b2 = 0, 5, 10, 15).]\n\n4.1 Discrete Formulation\n\nIn the discrete formulation, we wish to approximate the membrane potential function using a linear combination of a finite, predefined set of functions from the REEF dictionary. Focusing on the single spike case, our goal is to model the effect of a single spike on the membrane potential. We denote this effect on the membrane potential by $\\hat{P}$ and it is defined as a linear combination of parametrized REEF functions as shown in Equation 4. $f_t(\\beta, \\tau) = \\frac{1}{\\tau} \\cdot \\exp(-\\beta/t) \\cdot \\exp(-t/\\tau)$ is now a univariate function over t for fixed values of \u03b2 and \u03c4. A specific set of parameter settings $\\{(\\beta_1, \\tau_1), \\ldots, (\\beta_M, \\tau_1), (\\beta_1, \\tau_2), \\ldots, (\\beta_M, \\tau_N)\\}$ is used to construct a $\\hat{P}$ that can best reproduce the effect of the spike on the membrane potential. Inserting Equation 4 into Equation 2 yields a quadratic optimization problem on the mixing coefficients $\\alpha_{i,j}$.\n\n$$\\hat{P}(t) = \\sum_{i=1}^{M} \\sum_{j=1}^{N} \\alpha_{i,j} f_t(\\beta_i, \\tau_j) \\tag{4}$$\n\nThe major disadvantage of the discrete formulation is that for any given neuron, the optimal value set of the \u03b2\u2019s and \u03c4\u2019s is unlikely to be known beforehand. While one can argue that the approximation $\\hat{P}$ can be improved by increasing M and N, as the number of functions increases, so does the dimensionality of the feature space. 
Since M and N can be increased independently of the size of the training dataset, the procedure is susceptible to over-fitting. To resolve this issue, we shift to a continuous formulation of the problem, which by virtue of the Representer theorem does not suffer from the rising feature space dimensionality issue. The dimensionality of the feature space is now controlled by the span of the training dataset.\n\n4.2 Continuous formulation\n\nIn the continuous formulation, we consider $L^2$, the Hilbert space of square integrable functions on the domain $(\\beta, \\tau) \\in [0, \\infty)^2$. We are concerned with finding a threshold dependent classification function $\\hat{P}$, such that $\\hat{P}(t) \\geq \\Theta + 1$ when the spike $t \\in S^+$ and $\\hat{P}(t) \\leq \\Theta - 1$ when $t \\in S^-$. This function is defined in Equation 5.\n\n$$\\hat{P}(t) = \\langle \\alpha(\\beta, \\tau), f_t(\\beta, \\tau) \\rangle = \\int_0^\\infty \\int_0^\\infty \\alpha(\\beta, \\tau) f_t(\\beta, \\tau) \\, d\\beta \\, d\\tau \\tag{5}$$\n\nIn this formulation, the mixing function, $\\alpha(\\beta, \\tau)$, is by definition a member of $L^2$. Therefore, if $f_t(\\beta, \\tau) \\in L^2$, then $\\hat{P}(t)$ is finite by the Cauchy-Schwarz inequality since $\\langle \\alpha(\\beta, \\tau), f_t(\\beta, \\tau) \\rangle \\leq \\| \\alpha(\\beta, \\tau) \\| \\cdot \\| f_t(\\beta, \\tau) \\| < \\infty$ if both $\\| \\alpha(\\beta, \\tau) \\| < \\infty$ and $\\| f_t(\\beta, \\tau) \\| < \\infty$. To show that $f_t(\\beta, \\tau) \\in L^2$ we must show $\\langle f_t(\\beta, \\tau), f_t(\\beta, \\tau) \\rangle < \\infty$. For ease of readability we shall henceforth suppress the domain variables in $f_t(\\beta, \\tau)$ and $\\alpha(\\beta, \\tau)$ and refer to them as $f_t$ and $\\alpha$.\n\n4.2.1 Proof\n\n$$\\langle f_x, f_y \\rangle = \\int_0^\\infty \\int_0^\\infty \\frac{1}{\\tau} \\exp\\left(-\\frac{\\beta}{x}\\right) \\exp\\left(-\\frac{x}{\\tau}\\right) \\frac{1}{\\tau} \\exp\\left(-\\frac{\\beta}{y}\\right) \\exp\\left(-\\frac{y}{\\tau}\\right) d\\beta \\, d\\tau \\tag{6}$$\n\n$$= \\frac{xy}{(x + y)^2} \\tag{7}$$\n\nTherefore $\\langle f_t, f_t \\rangle = \\frac{t \\cdot t}{(t + t)^2} = \\frac{1}{4} < \\infty \\; \\forall t \\in [\\epsilon, \\infty)$ for some $\\epsilon > 0$.\n\nWe must note here that by defining the membrane potential function in this manner, we have formulated a problem which yields a solution which is different from the solution to the discrete problem. Since the delta function centered at any arbitrary point $(\\beta^*, \\tau^*)$ does not belong to $L^2$, the mixing function $\\alpha$ cannot be made up of a linear combination of these delta functions, as is the case in the discrete formulation. In addition, we are not working with a reproducing kernel Hilbert space since we are considering $L^2$. However, our definition in Equation 5 defines the \u201cpoint evaluation\u201d of our membrane potential function.\n\nSince $\\hat{P}(t)$ is defined using the standard inner product in $L^2$ with respect to particular members of $L^2$, we can reformulate the classification problem in Equation 2 as shown in Equation 8. Here M is the number of data points, m = 1 . . . M, and $y_m$ is the corresponding classification for spike time $t_m$ (that is, $y_m = +1$ if $t_m \\in S^+$ and $y_m = -1$ if $t_m \\in S^-$).\n\n$$\\text{Min. } \\| \\alpha \\|^2 \\quad \\text{s.t.} \\quad y_m \\left( \\langle \\alpha, f_{t_m} \\rangle - \\Theta \\right) \\geq 1, \\quad m = \\{1 \\ldots M\\} \\tag{8}$$\n\nWe can now use a specific instance of the Representer theorem [10] to show that the optimal solution for $\\alpha$ to the optimization problem specified in Equation 8 can be expressed as $\\alpha = \\sum_{k=1}^{M} \\nu_k f_{t_k}$. We can then substitute this equality back into Equation 8 to produce a dual formulation of the optimization problem, which is a standard quadratic programming problem.\n\n4.2.2 Representer Theorem\n\nFor some $\\nu_1, \\nu_2, \\ldots, \\nu_M \\in \\mathbb{R}$, the solution to Equation 8 can be written in the form\n\n$$\\alpha = \\sum_{k=1}^{M} \\nu_k f_{t_k} \\tag{9}$$\n\nProof We consider the subspace of $L^2$ spanned by the REEF functions evaluated at the times of the given training data points ($\\text{span}\\{ f_{t_k} : 1 \\leq k \\leq M \\}$). We then consider the projection $\\alpha_\\parallel$ of $\\alpha$ on this subspace. By noting $\\alpha = \\alpha_\\parallel + \\alpha_\\perp$ and rewriting Equation 8 in its Lagrangian form, we are left with Equation 10. However, by the definition of $\\alpha_\\perp$, $\\langle \\alpha_\\perp, f_{t_k} \\rangle = 0$, which then simplifies the summation term of Equation 10 to only depend upon $\\alpha_\\parallel$ as shown in Equation 11.\n\n$$\\text{Min. } \\| \\alpha \\|^2 + \\sum_{k=1}^{M} \\lambda_k \\left[ 1 - y_k \\left( \\langle \\alpha_\\parallel, f_{t_k} \\rangle + \\langle \\alpha_\\perp, f_{t_k} \\rangle - \\Theta \\right) \\right] \\tag{10}$$\n\n$$\\text{Min. } \\| \\alpha \\|^2 + \\sum_{k=1}^{M} \\lambda_k \\left[ 1 - y_k \\left( \\langle \\alpha_\\parallel, f_{t_k} \\rangle - \\Theta \\right) \\right] \\tag{11}$$\n\nIn addition, by considering the relation shown in Equation 12, we find that the first term is minimized when $\\alpha = \\alpha_\\parallel$. 
Hence, the optimal solution to Equation 8 will lie in the aforementioned subspace and therefore have the form of Equation 9.\n\n$$\\| \\alpha \\|^2 = \\| \\alpha_\\parallel \\|^2 + \\| \\alpha_\\perp \\|^2 \\geq \\| \\alpha_\\parallel \\|^2 \\tag{12}$$\n\n4.2.3 Dual Representation\n\nWe can now substitute the form of the optimal solution shown in Equation 9 back into the original optimization problem shown in Equation 8. This leads to the problem in Equation 13 which is equivalent to Equation 14. The resultant quadratic programming problem is solvable given that we have access to the positive definite matrix K, which was derived in Section 4.2.1 and is shown in Equation 15.\n\n$$\\text{Min. } \\left\\| \\sum_{k=1}^{M} \\nu_k f_{t_k} \\right\\|^2 \\quad \\text{s.t.} \\quad y_m \\left( \\left\\langle \\sum_{k=1}^{M} \\nu_k f_{t_k}, f_{t_m} \\right\\rangle - \\Theta \\right) \\geq 1, \\quad m = \\{1 \\ldots M\\} \\tag{13}$$\n\n$$\\text{Min. } \\sum_{i=1}^{M} \\sum_{j=1}^{M} \\nu_i \\nu_j K(t_i, t_j) \\quad \\text{s.t.} \\quad y_m \\left( \\sum_{k=1}^{M} \\nu_k K(t_k, t_m) - \\Theta \\right) \\geq 1, \\quad m = \\{1 \\ldots M\\} \\tag{14}$$\n\n$$K(t_i, t_j) = \\langle f_{t_i}, f_{t_j} \\rangle = \\int_0^\\infty \\int_0^\\infty f_{t_i} f_{t_j} \\, d\\beta \\, d\\tau = \\frac{t_i t_j}{(t_i + t_j)^2} \\tag{15}$$\n\n4.3 Single Synapse\n\nWe are now in a position to extend the framework to multiple spikes on a single synapse. Since we are learning an SRM0 approximation of a neuron, we assume that the effects of spikes are additively separable [8] and that each spike\u2019s effect on the membrane potential for the given synapse is identical. Introducing the latter assumption is the core contribution of this section. We first define the threshold dependent classification function for a single spike in a manner identical to that of the single spike formulation shown in Equation 5. 
This will be the \u201cstereotyped\u201d effect that a spike arriving at this synapse has on the membrane potential. Note that the AHP effect of the output spike train can be modeled seamlessly (as a virtual synapse) in this framework.\n\n4.3.1 Primal Problem\n\nWe now consider the additive effects of multiple spikes arriving at a synapse. We define the vector $t^m = \\langle t^m_1, t^m_2, \\ldots, t^m_{N_m} \\rangle$ to be the mth data point, which consists of $N_m$ spikes, represented by their spike times. Note that we have abused notation. Instead of the superscript referring to the synapse in question, as before, it now refers to the data point. The primal optimization problem, defined in Equation 16, is equivalent to Equation 17.\n\n$$\\text{Min. } \\| \\alpha \\|^2 \\quad \\text{s.t.} \\quad y_m \\left( \\sum_{h=1}^{N_m} \\langle \\alpha, f_{t^m_h} \\rangle - \\Theta \\right) \\geq 1, \\quad m = \\{1 \\ldots M\\} \\tag{16}$$\n\n$$\\text{Min. } \\| \\alpha \\|^2 \\quad \\text{s.t.} \\quad y_m \\left( \\left\\langle \\alpha, \\sum_{h=1}^{N_m} f_{t^m_h} \\right\\rangle - \\Theta \\right) \\geq 1, \\quad m = \\{1 \\ldots M\\} \\tag{17}$$\n\nThe Representer theorem states that the optimal $\\alpha$ must lie in $\\text{span}\\{ \\sum_{i=1}^{N_k} f_{t^k_i} : 1 \\leq k \\leq M \\}$. We omit the formal proof since it follows along the lines of the previous case. Therefore, the optimal $\\alpha$ to Equation 17 will be of the form\n\n$$\\alpha = \\sum_{k=1}^{M} \\nu_k \\sum_{i=1}^{N_k} f_{t^k_i} \\tag{18}$$\n\n4.3.2 Dual Problem\n\nSubstituting back Equation 18 yields the dual problem Equation 19, which can be solved given the positive definite kernel in Equation 20.\n\n$$\\text{Min. } \\left\\| \\sum_{k=1}^{M} \\nu_k \\sum_{i=1}^{N_k} f_{t^k_i} \\right\\|^2 \\quad \\text{s.t.} \\quad y_m \\left( \\left\\langle \\sum_{k=1}^{M} \\nu_k \\sum_{i=1}^{N_k} f_{t^k_i}, \\sum_{h=1}^{N_m} f_{t^m_h} \\right\\rangle - \\Theta \\right) \\geq 1, \\quad m = \\{1 \\ldots M\\} \\tag{19}$$\n\n$$K(t^p, t^q) = \\left\\langle \\sum_{i=1}^{N_p} f_{t^p_i}, \\sum_{k=1}^{N_q} f_{t^q_k} \\right\\rangle = \\sum_{i=1}^{N_p} \\sum_{k=1}^{N_q} \\langle f_{t^p_i}, f_{t^q_k} \\rangle = \\sum_{i=1}^{N_p} \\sum_{k=1}^{N_q} \\frac{t^p_i \\cdot t^q_k}{(t^p_i + t^q_k)^2} \\tag{20}$$\n\n4.4 Multiple Synapses\n\nIn the multiple synapse case, the principles are identical to that of the single synapse, with the exception that spikes arriving at different synapses could have different effects on the membrane potential, depending on the strength/type of the synaptic junction. Therefore, we keep the effects of each synapse on the membrane potential separate by assigning each synapse its own $\\alpha$ function.\n\n4.4.1 Primal Problem\n\nSince each synapse and the output has its own $\\alpha$ function, this simply adds another summation term over the S synapses and the output (indexed by 0). The primal optimization problem is defined in Equation 21 which is equivalent to Equation 22. S is the number of synapses, $N_{m,s}$ is the number of spikes on the sth synapse of the mth data point, and $t^{m,s}_h$ is the timing of the hth spike on the sth synapse of the mth data point.\n\n$$\\text{Min. } \\sum_{s=0}^{S} \\| \\alpha_s \\|^2 \\quad \\text{s.t.} \\quad y_m \\left( \\sum_{s=0}^{S} \\sum_{h=1}^{N_{m,s}} \\langle \\alpha_s, f_{t^{m,s}_h} \\rangle - \\Theta \\right) \\geq 1, \\quad m = \\{1 \\ldots M\\} \\tag{21}$$\n\n$$\\text{Min. } \\sum_{s=0}^{S} \\| \\alpha_s \\|^2 \\quad \\text{s.t.} \\quad y_m \\left( \\sum_{s=0}^{S} \\left\\langle \\alpha_s, \\sum_{h=1}^{N_{m,s}} f_{t^{m,s}_h} \\right\\rangle - \\Theta \\right) \\geq 1, \\quad m = \\{1 \\ldots M\\} \\tag{22}$$\n\nThe Representer theorem states that the optimal $\\alpha_s$ for the sth synapse must lie in $\\text{span}\\{ \\sum_{i=1}^{N_{k,s}} f_{t^{k,s}_i} : 1 \\leq k \\leq M \\}$. This is identical to the single synapse case for each synapse, and therefore, the optimal $\\alpha_s$ to Equation 22 will be of the form\n\n$$\\alpha_s = \\sum_{k=1}^{M} \\nu_k \\sum_{i=1}^{N_{k,s}} f_{t^{k,s}_i} \\tag{23}$$\n\n4.4.2 Dual Problem\n\nSubstituting Equation 23 into Equation 22 yields the dual problem shown in Equation 24 which can be solved given access to the positive definite kernel defined in Equation 25.\n\n$$\\text{Min. } \\sum_{s=0}^{S} \\left\\| \\sum_{k=1}^{M} \\nu_k \\sum_{i=1}^{N_{k,s}} f_{t^{k,s}_i} \\right\\|^2 \\quad \\text{s.t.} \\quad y_m \\left( \\sum_{s=0}^{S} \\left\\langle \\sum_{k=1}^{M} \\nu_k \\sum_{i=1}^{N_{k,s}} f_{t^{k,s}_i}, \\sum_{h=1}^{N_{m,s}} f_{t^{m,s}_h} \\right\\rangle - \\Theta \\right) \\geq 1, \\quad m = \\{1 \\ldots M\\} \\tag{24}$$\n\n$$K(t^p, t^q) = \\sum_{s=0}^{S} \\left\\langle \\sum_{i=1}^{N_{p,s}} f_{t^{p,s}_i}, \\sum_{k=1}^{N_{q,s}} f_{t^{q,s}_k} \\right\\rangle = \\sum_{s=0}^{S} \\sum_{i=1}^{N_{p,s}} \\sum_{k=1}^{N_{q,s}} \\frac{t^{p,s}_i \\cdot t^{q,s}_k}{(t^{p,s}_i + t^{q,s}_k)^2} \\tag{25}$$\n\n4.5 Summary\n\nWith the above kernels we are able to formulate quadratic programming problems which can be solved with SVMlight [11]. The choice of the dictionary used to derive the kernel is critical to the success of this technique. A dictionary of functions tailored to the forms of PSPs and AHPs will perform better than a more general class of functions. The properties of the REEF dictionary which make it suitable for this problem are its exponential decay as well as its additive separability [8]. This explains why a Gaussian radial basis function (GRBF) does not work well for this problem. The GRBF kernel is not additive. 
A slight variation of the GRBF which takes the sum of Gaussian\nfunctions, rather than their product, was also explored. This performed better than the GRBF;\nhowever it did not perform well when applied to more complicated neurons.\n\n5 Results\n\nTo test the kernel we learned SRM0 neurons with increasing levels of complexity. We \ufb01rst con-\nsidered a simplistic neuron which only received spikes on a single synapse. We then increased the\ncomplexity of the neuron, by introducing AHP effects as well as different types (excitatory and in-\nhibitory) of afferent synapses with varying synaptic weights. The PSP effect was modeled via the\nclassical alpha function [PSP(t) = C \u00b7 t \u00b7 exp(\u2212t/\u03c4 )] while the AHP effect was modeled by an\nexponential function[AHP(t) = K \u00b7 exp(\u2212t/\u03c4 )]. Although we learned neurons with varying com-\nplexity, for want of space, we discuss here the case of a single neuron that received input spike trains\nfrom 4 excitatory synapses and 1 inhibitory synapse to mimic the ratio of connections observed in\nthe cortex [12]. The stereotyped PSP for the excitatory and inhibitory synapses differed in their rise\nand fall times. The parameters for the stereotyped PSP were set as follows. For the excitatory PSP,\nC = 0.1 and \u03c4 = 10, where t is in units of milliseconds. For the inhibitory PSP, C = \u22120.39 and\n\u03c4 = 5. For the AHP, K = \u221216.667 and \u03c4 = 2.\nWe \ufb01rst trained the classi\ufb01er using 100,000 seconds of spike train data. Only the spike con\ufb01gurations\noccurring at \ufb01xed differentials before and after the neuron emitted a spike were considered. The in-\nput spike trains were generated using an inhomogeneous Poisson process, where the rate was varied\nsinusoidally around the intended mean spike rate in order to produce a more general set of training\ndata. 
This resulted in 1,647,249 training data points; however, only 10,681 of them were used in the solution as support vectors. After training, we tested our model using 100 seconds of unseen data. All spike configurations were considered when testing, regardless of temporal proximity to spike generation. To quantify our results, we first calculated the accuracy (correct classifications / total data points), the sensitivity (correct positive classifications / total positive data points), and specificity (correct negative classifications / total negative data points). They were 0.9947, 0.9532 and 0.9948 respectively. We also calculated a histogram of how close the spike predictions were. For every spike produced by the neuron, we determined the temporal proximity of the closest spike time predicted by the model. We then histogrammed this data. Figure 2(a) shows two histograms depicting these calculations. The larger histogram contains predictions with time differences varying between 0 and 70 ms, with a bin size of 1 ms while the inlaid histogram ranges from 0 to 10 ms and has a bin size of 0.1 ms. Both use a logarithmic scale on the y-axis. From the histograms, we see that the vast majority of spikes were predicted correctly (with a temporal proximity of 0 ms) and that all mispredicted spike times fell within 70 ms of the actual spike time.\n\nFigure 2: Figure (a) shows histograms of the difference in time between the actual and predicted spike time by the learned model. Figure (b) shows the various PSP approximations (gray) in comparison to the PSP functions used by the neuron (black). Figure (c) depicts the AHP approximation (gray) and the AHP function used by the neuron (black).\n\nIn Figures 2(b) and 2(c) we display a comparison of the approximated PSP and AHP versus the true PSP and AHP. 
To calculate the classi\ufb01cation model\u2019s approximated PSP we arti\ufb01cially send a single\nspike across each input synapse. We arti\ufb01cially generate a spike to produce the AHP approximation.\nBy considering the distance of the single spike data point from the classi\ufb01er\u2019s margin as the spike\nages, we can get a scaled and translated version of the PSP and AHP. The \ufb01gures show these ap-\nproximations scaled and translated back appropriately. In Figure 2(b) we show the approximations\nof the PSPs for the input synapses. The approximations are shown in gray; the true PSPs are shown\nin black. The different line styles are representative of the different synapses and therefore have\nvarying synaptic weights. A similar image for the AHP is shown in Figure 2(c). We note that there\nare small differences between the approximated and the true functions. If the PSP and AHP ap-\nproximations were exact, we would have seen perfect classi\ufb01cation results. However, as with most\nmachine learning techniques, the quality of the solution is limited by the training data given.\n\n6 Conclusion\n\nIn this paper we have developed a classi\ufb01cation framework which uses a novel kernel derived from\na REEF dictionary to produce an SRM0 approximation of a neuron. The technique used is non-\ninvasive in the sense that it only requires the timing of afferent and efferent spikes within a certain\nbounded past. The REEF dictionary was chosen due to its similarity to PSP and AHP functions used\nin a neuron model proposed by MacGregor and Lewis [9].\nBy producing an SRM0 approximation, which is additively separable [8], we produce a model which\nis both versatile and accurate [6].\nIn addition, it is a relatively simple model, which allows for\nincreased generalizability to unseen input. 
The simplicity of the SRM0 model has the potential to\nallow us to observe deviations between the model and the neuron, which can lead to insights into the\nvarious behavioral modes of neurons.\n\nAcknowledgments\n\nThis work was supported by a National Science Foundation grant (NSF IIS-0902230) to A.B.\n\n[Figure 2 panels: (a) Frequency (log scale) vs. Spike Time Difference (ms), with inset; (b) Voltage vs. Time (ms); (c) Voltage vs. Time (ms).]\n\nReferences\n[1] R. Jolivet, A. Roth, F. Sch\u00fcrmann, W. Gerstner, and W. Senn. Special issue on quantitative neuron modeling. Biological Cybernetics, 99(4):237\u2013239, 2008.\n[2] W. Gerstner and R. Naud. How Good Are Neuron Models? Science, 326(5951):379\u2013380, 2009.\n[3] W. Gerstner and W. Kistler. Spiking Neuron Models: An Introduction. Cambridge University Press, New York, NY, USA, 2002.\n[4] R. Jolivet, T.J. Lewis, and W. Gerstner. The spike response model: a framework to predict neuronal spike trains. Artificial Neural Networks and Neural Information Processing\u2013ICANN/ICONIP 2003, pages 173\u2013173, 2003.\n[5] L. Paninski, J.W. Pillow, and E.P. Simoncelli. Maximum likelihood estimation of a stochastic integrate-and-fire neural encoding model. Neural Computation, 16(12):2533\u20132561, 2004.\n[6] R. Jolivet, T.J. Lewis, and W. Gerstner. Generalized integrate-and-fire models of neuronal activity approximate spike trains of a detailed model to a high degree of accuracy. Journal of Neurophysiology, 92(2):959\u2013976, 2004.\n[7] A. Banerjee. On the phase-space dynamics of systems of spiking neurons. I: Model and experiments. Neural Computation, 13(1):161\u2013193, 2001.\n[8] T. Stanisz. Functions with separated variables. Master\u2019s thesis, Zeszyty Naukowe Uniwersytetu Jagiellonskiego, 1969.\n[9] R.J. MacGregor and E.R. Lewis. Neural Modeling. 
Plenum Press, New York, 1977.\n[10] G. Kimeldorf and G. Wahba. Some results on Tchebycheffian spline functions. Journal of Mathematical Analysis and Applications, 33(1):82\u201395, 1971.\n[11] T. Joachims. Making large-scale support vector machine learning practical. In Advances in Kernel Methods, pages 169\u2013184. MIT Press, 1999.\n[12] E.M. Izhikevich. Simple model of spiking neurons. IEEE Transactions on Neural Networks, 14(6):1569\u20131572, 2003.\n", "award": [], "sourceid": 490, "authors": [{"given_name": "Nicholas", "family_name": "Fisher", "institution": null}, {"given_name": "Arunava", "family_name": "Banerjee", "institution": null}]}