{"title": "Synergies between Intrinsic and Synaptic Plasticity in Individual Model Neurons", "book": "Advances in Neural Information Processing Systems", "page_first": 1417, "page_last": 1424, "abstract": null, "full_text": " Synergies between Intrinsic and Synaptic\n Plasticity in Individual Model Neurons\n\n\n\n Jochen Triesch\n Dept. of Cognitive Science, UC San Diego, La Jolla, CA, 92093-0515, USA\n Frankfurt Institute for Advanced Studies, Frankfurt am Main, Germany\n triesch@ucsd.edu\n\n\n\n\n Abstract\n\n This paper explores the computational consequences of simultaneous in-\n trinsic and synaptic plasticity in individual model neurons. It proposes\n a new intrinsic plasticity mechanism for a continuous activation model\n neuron based on low order moments of the neuron's firing rate distribu-\n tion. The goal of the intrinsic plasticity mechanism is to enforce a sparse\n distribution of the neuron's activity level. In conjunction with Hebbian\n learning at the neuron's synapses, the neuron is shown to discover sparse\n directions in the input.\n\n\n1 Introduction\n\nNeurons in the primate visual system exhibit a sparse distribution of firing rates. In partic-\nular, neurons in different visual cortical areas show an approximately exponential distribu-\ntion of their firing rates in response to stimulation with natural video sequences [1]. The\nbrain may do this because the exponential distribution maximizes entropy under the con-\nstraint of a fixed mean firing rate. The fixed mean firing rate constraint is often considered\nto reflect a desired level of metabolic costs. This view is theoretically appealing. However,\nit is currently not clear how neurons adjust their firing rate distribution to become sparse.\nSeveral different mechanisms seem to play a role: First, synaptic learning can change a\nneuron's response to a distribution of inputs. 
Second, intrinsic learning may change conductances in the dendrites and soma to adapt the distribution of firing rates [7]. Third, non-linear lateral interactions in a network can make a neuron's responses more sparse [8]. In the extreme case this leads to winner-take-all networks, which form a code where only a single unit is active for any given stimulus. Such ultra-sparse codes are considered inefficient, however. This paper investigates the interaction of intrinsic and synaptic learning processes in individual model neurons in the learning of sparse codes.
We consider an individual continuous activation model neuron with a non-linear transfer function that has adjustable parameters. We propose a simple intrinsic learning mechanism based on estimates of low-order moments of the activity distribution that allows the model neuron to adjust the parameters of its non-linear transfer function to obtain an approximately exponential distribution of its activity. We then show that when combined with a standard Hebbian learning rule employing multiplicative weight normalization, this leads to the extraction of sparse features from the input. This is in sharp contrast to standard Hebbian learning in linear units with multiplicative weight normalization, which leads to the extraction of the principal eigenvector of the input correlation matrix. We demonstrate the behavior of the combined intrinsic and synaptic learning mechanisms on the classic bars problem [4], a non-linear independent component analysis problem.
The remainder of this paper is organized as follows. Section 2 introduces our scheme for intrinsic plasticity and presents experiments demonstrating the effectiveness of the proposed mechanism for inducing a sparse firing rate distribution. Section 3 studies the combination of intrinsic plasticity with Hebbian learning at the synapses and demonstrates how it gives rise to the discovery of sparse directions in the input. 
Finally, Sect. 4 discusses our findings in the context of related work.

2 Intrinsic Plasticity Mechanism

Biological neurons not only adapt synaptic properties but also change their excitability through the modification of voltage-gated channels. Such intrinsic plasticity has been observed across many species and brain areas [9]. Although our understanding of these processes and their underlying mechanisms remains limited, it has been hypothesized that this form of plasticity contributes to a neuron's homeostasis of its mean firing rate level. Our basic hypothesis is that the goal of intrinsic plasticity is to ensure an approximately exponential distribution of firing rate levels in individual neurons. To our knowledge, this idea was first investigated in [7], where a Hodgkin-Huxley style model with a number of voltage-gated conductances was considered. A learning rule was derived that adapts the properties of voltage-gated channels to match the firing rate distribution of the unit to a desired distribution. In order to facilitate the simulation of potentially large networks we choose a different, more abstract level of modeling employing a continuous activation unit with a non-linear transfer function. Our model neuron is described by:

    Y = S_θ(X) ,    X = w^T u ,    (1)

where Y is the neuron's output (firing rate), X is the neuron's total synaptic current, w is the neuron's weight vector representing synaptic strengths, the vector u represents the pre-synaptic input, and S_θ(·) is the neuron's non-linear transfer function (activation function), parameterized by a vector of parameters θ. In this section we will not be concerned with synaptic mechanisms changing the weight vector w, so we will just consider a particular distribution p(X = x) ≡ p(x) of the net synaptic current and consider the resulting distribution of firing rates p(Y = y) ≡ p(y). 
Intrinsic plasticity is modeled as inducing changes to the non-linear transfer function with the goal of bringing the distribution of activity levels p(y) close to an exponential distribution.
In general terms, the problem is that of matching one distribution to another: given a signal with a certain distribution, find a non-linear transfer function that converts it into a signal with a desired distribution. In image processing, this is typically called histogram matching. If there are no restrictions on the non-linearity then a solution can always be found. The standard example is histogram equalization, where a signal is passed through its own cumulative density function to give a uniform distribution over the interval [0, 1]. While this approach offers a general solution, it is unclear how individual neurons could achieve this goal. In particular, it requires that the individual neuron can change its non-linear transfer function arbitrarily, i.e. it requires infinitely many degrees of freedom.

2.1 Intrinsic Plasticity Based on Low Order Moments of Firing Rate

In contrast to the general scheme outlined above, the approach proposed here utilizes a simple sigmoid non-linearity with only two adjustable parameters a and b:

    S_ab(X) = 1 / (1 + exp(−(X − b)/a)) .    (2)

Parameter a > 0 changes the steepness of the sigmoid, while parameter b shifts it left/right.¹ Qualitatively similar changes in spike threshold and slope of the activation function have been observed in cortical neurons. Since the non-linearity has only two degrees of freedom it is generally not possible to guarantee an exponential activity distribution for an arbitrary input distribution. A plausible alternative goal is to just match low order moments of the activity distribution to those of a specific target distribution. 
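For reference, the unrestricted histogram-matching solution mentioned above can be sketched in a few lines (an illustrative sketch, not part of the proposed mechanism): pass the signal through its empirical cumulative density function and then through the inverse CDF of the target exponential.

```python
import math
import random

def exponential_match(samples, mu):
    """Map arbitrary samples to an approximately exponential
    distribution with mean mu via empirical histogram matching."""
    n = len(samples)
    # empirical CDF rank of each sample, mapped into (0, 1)
    order = sorted(range(n), key=lambda i: samples[i])
    u = [0.0] * n
    for rank, i in enumerate(order):
        u[i] = (rank + 1) / (n + 1)
    # inverse CDF of the exponential distribution: F^-1(p) = -mu*ln(1 - p)
    return [-mu * math.log(1.0 - p) for p in u]

random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(100_000)]
y = exponential_match(x, mu=0.1)
print(sum(y) / len(y))  # ≈ 0.1, the target mean
```

Because only the ranks of the samples are used, this transform works for any input distribution; it is exactly the infinite-degrees-of-freedom solution that a two-parameter sigmoid cannot realize in general.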
Since our sigmoid non-linearity has two parameters, we consider the first and second moments. For a random variable T following an exponential distribution with mean μ we have:

    p(T = t) = (1/μ) exp(−t/μ) ;    M_1^T ≡ ⟨T⟩ = μ ;    M_2^T ≡ ⟨T²⟩ = 2μ² ,    (3)

where ⟨·⟩ denotes the expected value operator. Our intrinsic plasticity rule is formulated as a set of simple proportional control laws for a and b that drive the first and second moments ⟨Y⟩ and ⟨Y²⟩ of the output distribution to the values of the corresponding moments M_1^T and M_2^T of an exponential distribution:

    Δa = α (⟨Y²⟩ − 2μ²) ,    Δb = β (⟨Y⟩ − μ) ,    (4)

where α and β are learning rates. The mean μ of the desired exponential distribution is a free parameter which may vary across cortical areas. Equations (4) describe a system of coupled integro-differential equations where the integration is implicit in the expected value operations. Note that both ⟨Y⟩ and ⟨Y²⟩ depend on the sigmoid parameters a and b. From (4) it is obvious that there is a stationary point of these dynamics if the first and second moment of Y equal the desired values of μ and 2μ², respectively.
The first and second moments of Y need to be estimated online. In our model, we calculate estimates M̂_1^Y and M̂_2^Y of ⟨Y⟩ and ⟨Y²⟩ according to:

    ΔM̂_1^Y = η (y − M̂_1^Y) ,    ΔM̂_2^Y = η (y² − M̂_2^Y) ,    (5)

where η is a small learning rate.

2.2 Experiments with Intrinsic Plasticity Mechanism

We tested the proposed intrinsic plasticity mechanism for a number of distributions of the synaptic current X (Fig. 1). Consider the case where this current follows a Gaussian distribution with zero mean and unit variance: X ∼ N(0, 1). Under this assumption we can calculate the moments ⟨Y⟩ and ⟨Y²⟩ (although only numerically) for any particular values of a and b. Panel a in Fig. 1 shows a phase diagram of this system. Its flow field is sketched and two sample trajectories converging to a stationary point are given. 
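The stationary point of (4) reported below for standard normal input (a ≈ 0.90, b ≈ 2.38 at μ = 0.1) can be checked by Monte Carlo: at these parameter values the first two moments of Y should be close to μ and 2μ². A minimal sketch:

```python
import math
import random

def sigmoid(x, a, b):
    # eq. (2): S_ab(x) = 1 / (1 + exp(-(x - b)/a))
    return 1.0 / (1.0 + math.exp(-(x - b) / a))

random.seed(1)
mu = 0.1           # desired mean of the exponential output distribution
a, b = 0.90, 2.38  # stationary point for X ~ N(0, 1)
ys = [sigmoid(random.gauss(0.0, 1.0), a, b) for _ in range(200_000)]
m1 = sum(ys) / len(ys)
m2 = sum(y * y for y in ys) / len(ys)
print(m1, m2)  # approximately mu = 0.1 and 2*mu**2 = 0.02
```

The sample moments land close to the targets, confirming that this (a, b) pair is (approximately) a fixed point of the control laws (4).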
The stationary point is at the intersection of the nullclines where ⟨Y⟩ = μ and ⟨Y²⟩ = 2μ². Its coordinates are a ≈ 0.90, b ≈ 2.38. Panel b compares the theoretically optimal transfer function (dotted), which would lead to an exactly exponential distribution of Y, with the learned sigmoidal transfer function (solid). The learned transfer function gives a very good fit. The resulting distribution of Y is in fact very close to the desired exponential distribution. For the general case of a Gaussian input distribution with mean μ_G and standard deviation σ_G, the sigmoid parameters will converge to a → σ_G a* and b → μ_G + σ_G b* under the intrinsic plasticity rule, where a* and b* denote the values learned for the standard normal case. If the input to the unit can be assumed to be Gaussian, this relation can be used to calculate the desired parameters of the sigmoid non-linearity directly.

 ¹Note that while we view adjusting a and b as changing the shape of the sigmoid non-linearity, an equivalent view is that a and b are used to linearly rescale the signal X before it is passed through a "standard" logistic function. In general, however, intrinsic plasticity may give rise to non-linear changes that cannot be captured by such a linear re-scaling of all weights.

Figure 1: Dynamics of intrinsic plasticity mechanism for various input distributions. a,b: Gaussian input distribution. Panel a shows the phase plane diagram. Arrows indicate the flow field of the system. 
Dotted lines indicate approximate locations of the nullclines (found numerically). Two example trajectories are exhibited which converge to the stationary point (marked with a circle). Panel b shows the optimal (dotted) and learned transfer function (solid). The Gaussian input distribution (dashed, not drawn to scale) is also shown. c,d: same as b but for uniform and exponential input distributions. Parameters were μ = 0.1, α = 5×10⁻⁴, β = 2×10⁻³, η = 10⁻³.

Panels c and d show the result of intrinsic plasticity for two other input distributions. In the case of a uniform input distribution on the interval [0, 1] (panel c) the optimal transfer function becomes infinitely steep for x → 1. For an exponentially distributed input (panel d), the ideal transfer function would simply be the identity function. In both cases the intrinsic plasticity mechanism adjusts the sigmoid non-linearity in a sensible fashion and the output distribution is a fair approximation of the desired exponential distribution.

2.3 Discussion of the Intrinsic Plasticity Mechanism

The proposed mechanism for intrinsic plasticity is effective in driving a neuron to exhibit an approximately exponential distribution of firing rates, as observed in biological neurons in the visual system. The general idea is not restricted to the use of a sigmoid non-linearity: the same adaptation mechanism can also be used in conjunction with, say, an adjustable threshold-linear activation function. An interesting alternative to the proposed mechanism can be derived by directly minimizing the KL divergence between the output distribution and the desired exponential distribution through stochastic gradient descent. The resulting learning rule, which is closely related to a rule for adapting a sigmoid nonlinearity to maximize the output entropy derived by Bell and Sejnowski [2], will be discussed elsewhere. 
It leads to results very similar to the ones presented here.
A biological implementation of the proposed mechanism is plausible. All that is needed are estimates of the first and second moment of the firing rate distribution. A specific, testable prediction of the simple model is that changes to the distribution of a neuron's firing rate levels that keep the average firing rate unchanged but alter the second moment of the firing rate distribution should lead to measurable changes in the neuron's excitability.

3 Combination of Intrinsic and Synaptic Plasticity

In this section we study the effects of simultaneous intrinsic and synaptic learning for an individual model neuron. Synaptic learning is typically modeled with Hebbian learning rules, of which a large number are being used in the literature. In principle, any Hebbian learning rule can be combined with our scheme for intrinsic plasticity. Due to space limitations, we only consider the simplest of all Hebbian learning rules:

    Δw = η_w u Y(u) = η_w u S_ab(w^T u) ,    (6)

where the notation is identical to that of Sec. 2 and η_w is a learning rate. This learning rule is unstable and needs to be accompanied by a scheme limiting weight growth. We simply adopt a multiplicative normalization scheme that after each update re-scales the weight vector to unit length: w ← w/‖w‖.

3.1 Analysis for the Limiting Case of Fast Intrinsic Plasticity

Under a few assumptions, an interesting intuition about simultaneous intrinsic and Hebbian learning can be gained. Consider the limit of intrinsic plasticity being much faster than Hebbian plasticity. This may not be very plausible biologically, but it allows for an interesting analysis. 
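For concreteness, one step of the Hebbian rule (6) followed by multiplicative normalization can be sketched as follows (the learning-rate value is an illustrative choice):

```python
import math
import random

def hebb_step(w, u, a, b, eta=0.01):
    """One update of eq. (6) followed by multiplicative normalization."""
    x = sum(wi * ui for wi, ui in zip(w, u))         # net input X = w^T u
    y = 1.0 / (1.0 + math.exp(-(x - b) / a))         # output Y = S_ab(X)
    w = [wi + eta * ui * y for wi, ui in zip(w, u)]  # Hebbian update
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [wi / norm for wi in w]                   # re-scale to unit length

random.seed(2)
w = [random.random() for _ in range(10)]
u = [random.random() for _ in range(10)]
w = hebb_step(w, u, a=1.0, b=1.0)
print(sum(wi * wi for wi in w))  # squared norm is 1 up to floating-point error
```

The normalization step is what makes the stationary states of the weight vector those directions where the expected update is parallel to w itself, as used in the analysis below.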
In this case we may assume that the non-linearity has adapted to give an approximately exponential distribution of the firing rate Y before w can change much. Thus, from (6), ⟨Δw⟩ can be seen as a weighted sum of the inputs u, with the activities Y acting as weights that follow an approximately exponential distribution. Since similar inputs u will produce similar outputs Y, the expected value of the weight update ⟨Δw⟩ will be dominated by a small set of inputs that produce the highest output activities. The remainder of the inputs will "pull" the weight vector back to the average input ⟨u⟩. Due to the multiplicative weight normalization, the stationary states of the weight vector are reached if ⟨Δw⟩ is parallel to w, i.e., if ⟨Δw⟩ = kw for some constant k.
A simple example shall illustrate the effect of intrinsic plasticity on Hebbian learning in more detail. Consider the case where there are only two clusters of inputs at the locations c_1 and c_2. Let us also assume that both clusters account for exactly half of the inputs. If the weight vector is slightly closer to one of the two clusters, inputs from this cluster will activate the unit more strongly and will exert a stronger "pull" on the weight vector. Let m = μ ln(2) denote the median of the exponential firing rate distribution with mean μ. Then inputs from the closer cluster, say c_1, will be responsible for all activities above m while the inputs from the other cluster will be responsible for all activities below m. Hence, the expected value of the weight update ⟨Δw⟩ will be given by:

    ⟨Δw⟩ ∝ c_1 ∫_m^∞ (y/μ) exp(−y/μ) dy + c_2 ∫_0^m (y/μ) exp(−y/μ) dy    (7)
         = (μ/2) ((1 + ln 2) c_1 + (1 − ln 2) c_2) .    (8)
Taking the multiplicative weight normalization into account, we see that the weight vector will converge to either of the following two stationary states:

    w = ((1 ± ln 2) c_1 + (1 ∓ ln 2) c_2) / ‖(1 ± ln 2) c_1 + (1 ∓ ln 2) c_2‖ .    (9)

The weight vector moves close to one of the two clusters but does not fully commit to it. For the general case of N input clusters, only a few clusters will strongly contribute to the final weight vector. Generalizing the result from above, it is not difficult to derive that the weight vector w will be proportional to a weighted sum of the cluster centers:

    w ∝ Σ_{i=1}^{N} f_i c_i ;    with f_i = 1 + log(N) − i log(i) + (i − 1) log(i − 1) ,    (10)

where we define 0 log(0) ≡ 0. Here, f_i denotes the relative contribution of the i-th closest input cluster to the final weight vector. There can be at most N! resulting weight vectors owing to the number of possible assignments of the f_i to the clusters. Note that the final weight vector does not depend on the desired mean activity level μ. Fig. 2 plots (10) for N = 1000 (left) and shows that the resulting distribution of the f_i is approximately exponential (right).

Figure 2: Left: relative contributions f_i to the weight vector for N = 1000 input clusters (sorted). Right: the distribution of the f_i is approximately exponential.

We can see why such a weight vector may correspond to a sparse direction in the input space as follows: consider the case where the input cluster centers are random vectors of unit length in a high-dimensional space. 
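Before continuing, note that the contributions f_i in (10) can be computed directly; a small check confirms that they decay monotonically and sum to N (the sum telescopes), so their mean is 1:

```python
import math

def contributions(n):
    """f_i from eq. (10), with the convention 0*log(0) = 0."""
    def xlogx(x):
        return 0.0 if x == 0 else x * math.log(x)
    return [1.0 + math.log(n) - xlogx(i) + xlogx(i - 1)
            for i in range(1, n + 1)]

f = contributions(1000)
print(f[0], sum(f))  # f_1 = 1 + log(1000) ≈ 7.91; the f_i sum to N = 1000
```

Sorting the f_i and plotting their histogram reproduces the approximately exponential shape shown in Fig. 2.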
It is a property of high-dimensional spaces that random vectors are approximately orthogonal, so that c_i^T c_j ≈ δ_ij, where δ_ij is the Kronecker delta. If we consider the projection of an input from an arbitrary cluster, say c_j, onto the weight vector, we see that w^T c_j ≈ Σ_i f_i c_i^T c_j ≈ f_j. The distribution of X = w^T u thus follows the distribution of the f_i, which is approximately exponential. Thus, the projection of all inputs onto the weight vector has an approximately exponential distribution. Note that this behavior is markedly different from Hebbian learning in a linear unit, which leads to the extraction of the principal eigenvector of the input correlation matrix.
It is interesting to note that in this situation the optimal transfer function S that will make the unit's activity Y have an exponential distribution of a desired mean is simply a multiplication with a constant k, i.e. S(X) = kX. Thus, depending on the initial weight vector and the resulting distribution of X, the neuron's activation function may transiently adapt to enforce an approximately exponential firing rate distribution, but the simultaneous Hebbian learning drives it back to a linear form. In the end, a simple linear activation function may result from this interplay of intrinsic and synaptic plasticity. In fact, the observation of approximately linear activation functions in cortical neurons is not uncommon.

Figure 3: Left: example stimuli from the "bars" problem for a 10 by 10 pixel retina. Right: the activity record shows the unit's response to every 10th input pattern. Below, we show the learned weight vector after presentation of 10,000, 20,000, and 30,000 training patterns.

3.2 Application to the "Bars" Problem

The "bars" problem is a standard problem for unsupervised learning architectures [4]. 
It\nis a non-linear ICA problem for which traditional ICA approaches have been shown to\nfail [5]. The input domain consists of an N-by-N retina. On this retina, all horizontal\nand vertical bars (2N in total) can be displayed. The presence or absence of each bar is\ndetermined independently, with every bar occurring with the same probability p (in our\ncase p = 1/N). If a horizontal and a vertical bar overlap, the pixel at the intersection point\nwill be just as bright as any other pixels on the bars, rather than twice as bright. This makes\nthe problem a non-linear ICA problem. Example stimuli from the bars dataset are shown\nin Fig. 3 (left). Note that we normalize input vectors to unit length. The goal of learning in\nthe bars problem is to find the independent sources of the images, i.e., the individual bars.\nThus, the neural learning system should develop filters that represent the individual bars.\nWe have trained an individual sigmoidal model neuron on the bars input domain. The\ntheoretical analysis above assumed that intrinsic plasticity is much faster than synaptic\nplasticity. Here, we set the intrinsic plasticity to be slower than the synaptic plasticity,\nwhich is more plausible biologically, to see if this may still allow the discovery of sparse\ndirections in the input. As illustrated in Fig. 3 (right) the unit's weight vector aligns with\none of the individual bars as soon as the intrinsic plasticity has pushed the model neuron\ninto a regime where its responses are sparse: the unit has discovered one of the independent\nsources of the input domain. This result is robust if the desired mean activity of the unit\nis changed over a wide range. 
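The bars stimuli described above can be generated as follows (a sketch; bar probability p = 1/N and unit-length normalization as in the text, with overlapping pixels clipped rather than summed):

```python
import math
import random

def bars_stimulus(n=10, p=None, rng=random):
    """One stimulus: each of the 2n bars appears independently with
    probability p; pixels at intersections stay at 1 (the non-linearity)."""
    p = 1.0 / n if p is None else p
    rows = [rng.random() < p for _ in range(n)]
    cols = [rng.random() < p for _ in range(n)]
    img = [[1.0 if rows[i] or cols[j] else 0.0 for j in range(n)]
           for i in range(n)]
    flat = [v for row in img for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat  # unit length

random.seed(3)
u = bars_stimulus()
while not any(u):       # skip the occasional blank stimulus
    u = bars_stimulus()
print(len(u), round(sum(v * v for v in u), 6))  # 100 1.0
```

Taking the element-wise OR of the bars, rather than their sum, is exactly what makes the problem a non-linear ICA problem.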
If μ is reduced from its default value (1/(2N) = 0.05) over several orders of magnitude (we tried down to 10⁻⁵) the result remains unchanged. However, if μ is increased above about 0.15, the unit will fail to represent an individual bar but will learn a mixture of two or more bars, with different bars being represented with different strengths. Thus, in this example, in contrast to the theoretical result above, the desired mean activity μ does influence the weight vector that is being learned. The reason for this is that the intrinsic plasticity only imperfectly adjusts the output distribution to the desired exponential shape. As can be seen in Fig. 3, the output has a multimodal structure. For low μ, only the highest mode, which corresponds to a specific single bar presented in isolation, contributes strongly to the weight vector.

4 Discussion

Biological neurons are highly adaptive computation devices. While the plasticity of a neuron's synapses has always been a core topic of neural computation research, there has been little work investigating the computational properties of intrinsic plasticity mechanisms and the relation between intrinsic and synaptic learning. This paper has investigated the potential role of intrinsic learning mechanisms operating at the soma when used in conjunction with Hebbian learning at the synapses. To this end, we have proposed a new intrinsic plasticity mechanism that adjusts the parameters of a sigmoid nonlinearity to move the neuron's firing rate distribution to a sparse regime. The learning mechanism is effective in producing approximately exponential firing rate distributions as observed in neurons in the visual system of cats and primates. Studying simultaneous intrinsic and synaptic learning, we found a synergistic relation between the two. We demonstrated how the two mechanisms may cooperate to discover sparse directions in the input. 
When applied to the classic "bars" problem, a single unit was shown to discover one of the independent sources as soon as the intrinsic plasticity moved the unit's activity distribution into a sparse regime. Thus, this research is related to other work in the area of Hebbian projection pursuit and Hebbian ICA, e.g., [3, 6]. In such approaches, the "standard" Hebbian weight update rule is modified to allow the discovery of non-Gaussian directions in the input. We have shown that the combination of intrinsic plasticity with the standard Hebbian learning rule can be sufficient for the discovery of sparse directions in the input. Future work will analyze the combination of intrinsic plasticity with other Hebbian learning rules. Further, we would like to consider networks of such units and the formation of map-like representations. The nonlinear nature of the transfer function may facilitate the construction of hierarchical networks for unsupervised learning. It will also be interesting to study the effects of intrinsic plasticity in the context of recurrent networks, where it may contribute to keeping the network in a certain desired dynamic regime.

Acknowledgments

The author is supported by the National Science Foundation under grants NSF 0208451 and NSF 0233200. I thank Erik Murphy-Chutorian and Emanuel Todorov for discussions and comments on earlier drafts.

References

[1] R. Baddeley, L. F. Abbott, M. C. Booth, F. Sengpiel, and T. Freeman. Responses of neurons in primary and inferior temporal visual cortices to natural scenes. Proc. R. Soc. London, Ser. B, 264:1775-1783, 1998.
[2] A. J. Bell and T. J. Sejnowski. An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 7:1129-1159, 1995.
[3] B. S. Blais, N. Intrator, H. Shouval, and L. N. Cooper. Receptive field formation in natural scene environments. Neural Computation, 10:1797-1813, 1998.
[4] P. Foldiak. 
Forming sparse representations by local anti-Hebbian learning. Biological Cybernetics, 64:165-170, 1990.
[5] S. Hochreiter and J. Schmidhuber. Feature extraction through LOCOCODE. Neural Computation, 11(3):679-714, 1999.
[6] A. Hyvarinen and E. Oja. Independent component analysis by general nonlinear Hebbian-like learning rules. Signal Processing, 64(3):301-313, 1998.
[7] M. Stemmler and C. Koch. How voltage-dependent conductances can adapt to maximize the information encoded by neuronal firing rate. Nature Neuroscience, 2(6):521-527, 1999.
[8] W. E. Vinje and J. L. Gallant. Sparse coding and decorrelation in primary visual cortex during natural vision. Science, 287:1273-1276, 2000.
[9] W. Zhang and D. J. Linden. The other side of the engram: Experience-driven changes in neuronal intrinsic excitability. Nature Reviews Neuroscience, 4:885-900, 2003.
", "award": [], "sourceid": 2731, "authors": [{"given_name": "Jochen", "family_name": "Triesch", "institution": null}]}