{"title": "Information Maximization in Single Neurons", "book": "Advances in Neural Information Processing Systems", "page_first": 160, "page_last": 166, "abstract": null, "full_text": "Information Maximization in Single Neurons \n\nMartin Stemmler and Christof Koch \n\nComputation and Neural Systems Program \n\nCaltech 139-74 \n\nPasadena, CA 91125 \n\nEmail: stemmler@klab.caltech.edu, koch@klab.caltech.edu \n\nAbstract \n\nInformation from the senses must be compressed into the limited range of firing rates generated by spiking nerve cells. Optimal compression uses all firing rates equally often, implying that the nerve cell's response matches the statistics of naturally occurring stimuli. Since changing the voltage-dependent ionic conductances in the cell membrane alters the flow of information, an unsupervised, non-Hebbian, developmental learning rule is derived to adapt the conductances in Hodgkin-Huxley model neurons. By maximizing the rate of information transmission, each firing rate within the model neuron's limited dynamic range is used equally often. \n\nAn efficient neuronal representation of incoming sensory information should take advantage of the regularity and scale invariance of stimulus features in the natural world. In the case of vision, this regularity is reflected in the typical probabilities of encountering particular visual contrasts, spatial orientations, or colors [1]. Given these probabilities, an optimized neural code would eliminate any redundancy, while devoting increased representation to commonly encountered features. \n\nAt the level of a single spiking neuron, information about a potentially large range of stimuli is compressed into a finite range of firing rates, since the maximum firing rate of a neuron is limited.
Optimizing the information transmission through a single neuron in the presence of uniform, additive noise has an intuitive interpretation: the most efficient representation of the input uses every firing rate with equal probability. An analogous principle for non-spiking neurons has been tested experimentally by Laughlin [2], who matched the statistics of naturally occurring visual contrasts to the response amplitudes of the blowfly's large monopolar cell. \n\nFigure 1: The model neuron contains two compartments to represent the cell's soma and dendrites (soma: Hodgkin-Huxley spiking conductances; soma-dendrite coupling conductance). To maximize the information transfer, the parameters for six calcium and six potassium voltage-dependent conductances in the dendritic compartment are iteratively adjusted, while the somatic conductances responsible for the cell's spiking behavior are held fixed. \n\nFrom a theoretical perspective, the central question is whether a neuron can \"learn\" the best representation for natural stimuli through experience. During neuronal development, the nature and frequency of incoming stimuli are known to change both the anatomical structure of neurons and the distribution of ionic conductances throughout the cell [3]. We seek a guiding principle that governs the developmental timecourse of the Na+, Ca2+ and K+ conductances in the somatic and dendritic membrane by asking how a neuron would set its conductances to transmit as much information as possible. Spiking neurons must associate a range of different inputs with a set of distinct responses, a more difficult task than keeping the firing rate or excitatory postsynaptic potential (EPSP) amplitude constant under changing conditions, two tasks for which learning rules that change the voltage-dependent conductances have recently been proposed [4, 5]. Learning the proper representation of stimulus information goes beyond simply correlating input and output; an alternative to the classic postulate of Hebb [6], in which synaptic learning in networks is a consequence of correlated activity between pre- and postsynaptic neurons, is required for such learning in a single neuron. \n\nTo explore the feasibility of learning rules for information maximization, a simplified model of a neuron consisting of two electrotonic compartments, illustrated in fig. 1, was constructed. The soma (or cell body) contains the classic Hodgkin-Huxley sodium and delayed rectifier potassium conductances, with the addition of a transient potassium \"A\"-current and an effective calcium-dependent potassium current. The soma is coupled through an effective conductance G to the dendritic compartment, which contains the synaptic input conductance and three adjustable calcium and three adjustable potassium conductances. \n\nThe dynamics of this model are given by Hodgkin-Huxley-like equations that govern the membrane potential and a set of activation and inactivation variables, m_i and h_i, respectively. In each compartment of the neuron, the voltage V evolves as \n\nC dV/dt = Σ_i g_i m_i^{p_i} h_i^{q_i} (E_i − V),   (1) \n\nwhere C is the membrane capacitance, g_i is the (peak) value of the i-th conductance, p_i and q_i are integers, and E_i are the ion-specific reversal potentials. The variables h_i and m_i obey first order kinetics of the type dm/dt = (m_inf(V) − m)/τ(V), where m_inf(V) denotes the steady state activation when the voltage is clamped to V and τ(V) is a voltage-dependent time constant.
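The first order kinetics above can be sketched numerically. Below is a minimal Python illustration (ours, not the authors' code; the clamped voltage and the Boltzmann midpoint and slope are assumed values, while the 5 msec time constant follows the dendritic conductances) that relaxes a single gating variable with a forward-Euler step:

```python
import numpy as np

# Minimal sketch: forward-Euler integration of a gating variable obeying
# dm/dt = (m_inf(V) - m) / tau, with m_inf a Boltzmann function of voltage.
# Midpoint (-40 mV) and slope (5 mV) are illustrative; tau is fixed at 5 ms,
# as for the dendritic conductances in the model.

def m_inf(V, V_half=-40.0, s=5.0):
    # Steady-state activation as a Boltzmann function of voltage (mV).
    return 1.0 / (1.0 + np.exp(-(V - V_half) / s))

def integrate_gate(V, m0=0.0, tau=5.0, dt=0.01, t_end=50.0):
    # Relax the gating variable toward m_inf(V) at a clamped voltage V.
    m = m0
    for _ in range(int(t_end / dt)):
        m += dt * (m_inf(V) - m) / tau
    return m

# After ten time constants at a clamped voltage, m has converged to m_inf(V).
m_final = integrate_gate(V=-20.0)
```
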
\n\nAll parameters for the somatic compartment, with the exception of the adaptation conductance, are given by the standard model of Connor et al. (1977) [7]. This choice of somatic spiking conductances allows spiking to occur at arbitrarily low firing rates. Adaptation is modeled by a calcium-dependent potassium conductance that scales with the firing rate, such that the conductance has a mean value of 34 mS/cm2 Hz. The calcium and potassium conductances in the dendritic compartment have simple activation and inactivation functions described by distinct Boltzmann functions. Together with the peak conductance values, the midpoint voltages V_{1/2} and slopes s of these Boltzmann functions adapt to the statistics of stimuli. For simplicity, all time constants for the dendritic conductances are set to a constant 5 msec. For additional details and parameter values, see http://www.klab.caltech.edu/infomax. \n\nHodgkin-Huxley models can exhibit complex behaviors on several timescales, such as firing patterns consisting of \"bursts\", sequences of multiple spikes interspersed with periods of silence. We will, however, focus on models of regularly spiking cells that adapt to a sustained stimulus by spiking periodically. To quantify how much information about a continuous stimulus variable x the time-averaged firing rate f of a regularly spiking neuron carries, we use a lower bound [8] on the mutual information I(f; x) between the stimulus x and the firing rate f: \n\nI_LB(f; x) = −∫ ln(p(f) σ_f(x)) p(x) dx − ln √(2πe),   (2) \n\nwhere p(f) is the probability, given the set of all stimuli, of a firing rate f, and σ_f(x) is the standard deviation of the firing rate in response to a given stimulus x.
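To make the bound concrete, here is a small Monte Carlo sketch (ours, not the authors'; the rate range follows Fig. 3 and the noise level is an assumed value). For a constant noise standard deviation, the bound reduces to the differential entropy of the rate distribution minus the noise entropy, so a flat distribution of firing rates scores higher than a peaked one over the same range:

```python
import numpy as np

# Illustrative check of eq. 2 with constant noise sigma_f:
# I_LB = -E_x[ln(p(f) * sigma_f)] - ln(sqrt(2*pi*e)).
# Rate range [22, 59] Hz is taken from Fig. 3; sigma_f = 1 Hz is assumed.

rng = np.random.default_rng(0)
f_lo, f_hi, sigma_f, n = 22.0, 59.0, 1.0, 100_000
noise_term = np.log(sigma_f) + 0.5 * np.log(2 * np.pi * np.e)

# Flat distribution of rates: p(f) = 1/(f_hi - f_lo) over the whole range.
f_flat = rng.uniform(f_lo, f_hi, n)
I_flat = -np.mean(np.log(np.full(n, 1.0 / (f_hi - f_lo)))) - noise_term

# Peaked (Gaussian) distribution of rates with the same mean, sd 5 Hz.
mu, sd = 0.5 * (f_lo + f_hi), 5.0
f_peak = rng.normal(mu, sd, n)
log_p = -0.5 * ((f_peak - mu) / sd) ** 2 - np.log(sd * np.sqrt(2 * np.pi))
I_peak = -np.mean(log_p) - noise_term

# The flat distribution transmits more information (in nats) than the peaked one.
```
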
\n\nTo maximize the information transfer, does a neuron need to \"know\" the arrival rates of photons impinging on the retina or the frequencies of sound waves hitting the ear's tympanic membrane? Since the ion channels in the dendrites only sense a voltage and not the stimulus directly, the answer to this question, fortunately, is no: maximizing the information between the firing rate f and the dendritic voltage V_dend(t) is equivalent to maximizing the information about the stimuli, as long as we can guarantee that the transformation from stimuli to firing rates is always one-to-one. \n\nSince a neuron must be able to adapt to a changing environment and shifting intra- and extracellular conditions [4], learning and relearning of the proper conductance parameters, such as the channel densities, should occur on a continual basis. An alphabet zoo of different calcium (Ca2+) conductances in neurons of the central nervous system, denoted 'L', 'N', 'P', 'R', and 'T'-conductances, reflects a wealth of different voltage and pharmacological properties [9], matching an equal diversity of potassium (K+) channels. No fewer than ten different genes code for various Ca2+ subunits, allowing for a combinatorial number of functionally different channels [10]. A self-regulating neuron should be able to express different ionic channels and insert them into the membrane. In information maximization, the parameters for each of the conductances, such as the number of channels, are continually modified in the direction that most increases the mutual information I[f; V_dend(t)] each time a stimulus occurs. \n\nThe standard approach to such a problem is known as stochastic approximation of the mutual information, which was recently applied to feedforward neural networks for blind source separation of sound by Bell and Sejnowski [11].
We define a \"free energy\" F = E(f) − β^{-1} I_LB(f; x), where E(f) incorporates constraints on the peak or mean firing rate f, and β is a Lagrangean parameter that balances the mutual information and constraint satisfaction. Stochastic approximation then consists of adjusting the parameter Γ of a voltage-dependent conductance by \n\nΔΓ = −η(t) ∂F/∂Γ,   (3) \n\nwhenever a stimulus x is presented; this will, by definition, occur with probability p(x). In the model, the stimuli are taken to be maintained synaptic input conductances g_syn lasting 200 msec and drawn randomly from a fixed, continuous probability distribution. After an initial transient, we assume that the voltage waveform V_dend(t) settles into a simple periodic limit cycle as dictated by the somatic spiking conductances. We thus posit the existence of an invertible composition of maps, such that the input conductance g_syn maps onto a periodic voltage waveform V_dend(t) of period T, from thence onto an averaged current ⟨I⟩ = (1/T) ∫_0^T I(t) dt to the soma, and then finally onto an output firing rate f. The last element in this chain of transformations, the steady-state current-discharge relationship at the soma, can be predicted from the theory of dynamical systems (see http://www.klab.caltech.edu/~stemmler for details). \n\nFigure 2: The inputs to the model are synaptic conductances, drawn randomly from a Gaussian distribution of mean 141 nS and standard deviation of 25 nS, with the restriction that the conductance be non-negative (dot-dashed line). The learning rule of eq. 4, which maximizes the information in the cell's firing rate, was used to adjust the peak conductances, midpoint voltages, and slopes of the \"dendritic\" Ca2+ and K+ conductances over the course of 10.9 (simulated) minutes. The learning rate decayed with time: η(t) = η_0 exp(−t/τ_learning), with η_0 = 4.3 × 10^{-3} and τ_learning = 4.4 sec. The optimal firing rate response curve (dotted line) is asymptotically proportional to the cumulative probability distribution of inputs. The inset illustrates the typical timecourse of the dendritic voltage in the trained model. \n\nThe voltage and the conductances are nonlinearly coupled: the conductances affect the voltage, which, in turn, sets the conductances. Since the mutual information is a global property of the stimulus set, the learning rule for any one conductance would depend on the values of all other conductances, were it not for the nonlinear feedback loop between voltages and conductances.
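The end point of this learning problem can be sketched directly (illustrative code, not the authors' simulation; the input statistics are taken from Fig. 2 and the rate range from Fig. 3): mapping each input through the inputs' own cumulative distribution, rescaled to the cell's dynamic range, uses every firing rate equally often:

```python
import numpy as np

# Sketch of the infomax prediction: the optimal rate-response curve is the
# cumulative probability distribution of the inputs, rescaled into the
# cell's dynamic range. Input mean 141 nS and sd 25 nS follow Fig. 2;
# the rate range [22, 59] Hz follows Fig. 3.

rng = np.random.default_rng(1)
g_syn = rng.normal(141.0, 25.0, size=50_000)   # synaptic inputs (nS)
g_syn = g_syn[g_syn > 0]                       # conductances are non-negative

f_min, f_max = 22.0, 59.0                      # adapted rate range (Hz)

def optimal_rate(g, samples, f_lo, f_hi):
    # Map an input through the empirical CDF onto the firing-rate range.
    return f_lo + (f_hi - f_lo) * np.mean(samples <= g)

# Inputs drawn from the same distribution yield rates spread uniformly
# over [f_min, f_max], i.e. each firing rate is used equally often.
rates = np.array([optimal_rate(g, g_syn, f_min, f_max)
                  for g in rng.choice(g_syn, size=1_000)])
```
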
This nonlinear coupling must satisfy the strict physical constraint of charge conservation: when the neuron is firing periodically, the average current injected by the synaptic and voltage-dependent conductances must equal the average current discharged by the neuron. Remarkably, charge conservation results in a learning mechanism that is strictly local, so that the mechanism for changing one conductance does not depend on the values of any other conductances. \n\nFor instance, information maximization predicts that the peak calcium or potassium conductance g_i changes according to the update rule of eq. 4 each time a stimulus is presented. Here η(t) is a time-dependent learning rate, the angular brackets indicate an average over the stimulus duration, and c(⟨V_dend⟩) is a simple function that is zero for most commonly encountered voltages, equal to a positive constant below some minimum, and equal to a negative constant above some maximum voltage. This function represents the constraint on the maximum and minimum firing rate, which sets the limit on the neuron's dynamic range. A constraint on the mean firing rate implies that c(⟨V_dend⟩) is simply a negative constant for all suprathreshold voltages. Under this constraint, the optimal distribution of firing rates becomes exponential (not shown). This latter case corresponds to transmitting as much information as possible in the rate while firing as little as possible. \n\nFigure 3: The probability distribution of firing rates before and after adaptation of voltage-dependent conductances. Learning shifts the distribution from a peaked distribution to a much flatter one, so that the neuron uses each firing rate within the range [22, 59] Hz equally often in response to randomly selected synaptic inputs. \n\nGiven a stimulus x, the dominant term ∂/∂V(t) ⟨m_i h_i (E_i − V)⟩ of eq. 4 changes those conductances that increase the slope of the firing rate response to x. A higher slope means that more of the neuron's limited range of firing rates is devoted to representing the stimulus x and its immediate neighborhood. Since the learning rule is democratic yet competitive, only the most frequent inputs \"win\" and thereby gain the largest representation in the output firing rate. \n\nIn Fig. 2, the learning rule of eq. 4, generalized to also change the midpoint voltage and steepness of the activation and inactivation functions, has been used to train the model neuron as it responds to random, 200 msec long amplitude modulations of a synaptic input conductance to the dendritic compartment. The cell \"learns\" the statistical structure of the input, matching its adapted firing rate to the cumulative distribution function of the conductance inputs. The distribution of firing rates shifts from a peaked distribution to a much flatter one, so that all firing rates are used nearly equally often (Fig. 3). The information in the firing rate increases by a factor of three to 10.7 bits/sec, as estimated by adding a 5 msec, Gaussian-distributed noise jitter to the spike times.
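A toy single-unit analogue of this adaptation can be written down with the infomax gradient of Bell and Sejnowski (part of ref. [11]) for a logistic rate function; this is an illustration, not the paper's conductance rule of eq. 4, and the learning rate, iteration count, and input standardization are our own assumed choices (stimulus statistics follow Fig. 2):

```python
import numpy as np

# Bell-Sejnowski single-unit infomax for y = 1/(1 + exp(-(w*z + b))):
#   dw = eta * (1/w + z*(1 - 2y)),   db = eta * (1 - 2y).
# Each stimulus nudges the parameters uphill in information, and the
# distribution of outputs flattens, as in Fig. 3.

rng = np.random.default_rng(2)
w, b, eta = 0.5, 0.0, 0.01

for _ in range(20_000):
    x = rng.normal(141.0, 25.0)          # stimulus, statistics as in Fig. 2
    z = (x - 141.0) / 25.0               # standardize for numerical stability
    y = 1.0 / (1.0 + np.exp(-(w * z + b)))
    w += eta * (1.0 / w + z * (1.0 - 2.0 * y))   # infomax gradient step
    b += eta * (1.0 - 2.0 * y)

# After learning, the outputs y are spread nearly uniformly over (0, 1),
# i.e. each normalized "rate" is used about equally often.
zs = rng.standard_normal(5_000)
ys = 1.0 / (1.0 + np.exp(-(w * zs + b)))
```
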
\n\nChanging how tightly the stimulus amplitudes are clustered around the mean will increase or decrease the slope of the firing rate response to input, without necessarily changing the average firing rate. Neuronal systems are known to adapt not only to the mean of the stimulus intensity, but also to the variance of the stimulus [12]. We predict that such adaptation to stimulus variance will occur not just at the level of networks of neurons, but also at the single cell level. \n\nWhile the detailed substrate for maximizing the information at both the single cell and network level awaits experimental elucidation, the terms in the learning rule of eq. 4 have simple biophysical correlates: the derivative term, for instance, is reflected in the stochastic flicker of ion channels switching between open and closed states. The transitions between simple open and closed states will occur at a rate proportional to (∂m(V)/∂V)^γ in equilibrium, where the exponent γ is 1/2 or 1, depending on the kinetic model. To change the information transfer properties of the cell, a neuron could use state-dependent phosphorylation of ion channels or gene expression of particular ion channel subunits, possibly mediated by G-protein initiated second messenger cascades, to modify the properties of voltage-dependent conductances. The tools required to adaptively compress information from the senses are thus available at the subcellular level. \n\nReferences \n\n[1] D. L. Ruderman, Network 5(4), 517 (1995); R. J. Baddeley and P. J. B. Hancock, Proc. Roy. Soc. B 246, 219 (1991); J. J. Atick, Network 3, 213 (1992). \n[2] S. Laughlin, Z. Naturforsch. 36c, 910 (1981). \n[3] D. Purves, Neural Activity and the Growth of the Brain (Cambridge University Press, NY, 1994); X. Gu and N. C. Spitzer, Nature 375, 784 (1995). \n[4] G. LeMasson, E. Marder, and L. F. Abbott, Science 259, 1915 (1993).
\n[5] A. J. Bell, Neural Information Processing Systems 4, 59 (1992). \n[6] D. O. Hebb, The Organization of Behavior (Wiley, New York, 1949). \n[7] J. A. Connor, D. Walter, and R. McKown, Biophys. J. 18, 81 (1977). \n[8] R. B. Stein, Biophys. J. 7, 797 (1967). \n[9] R. B. Avery and D. Johnston, J. Neurosci. 16, 5567 (1996); F. Helmchen, K. Imoto, and B. Sakmann, Biophys. J. 70, 1069 (1996). \n[10] F. Hofmann, M. Biel, and V. Flockerzi, Ann. Rev. Neurosci. 17, 399 (1994). \n[11] Y. Z. Tsypkin, Adaptation and Learning in Automatic Systems (Academic Press, NY, 1971); R. Linsker, Neural Comp. 4, 691 (1992); A. J. Bell and T. J. Sejnowski, Neural Comp. 7, 1129 (1995). \n[12] S. M. Smirnakis et al., Nature 386, 69 (1997). \n", "award": [], "sourceid": 1572, "authors": [{"given_name": "Martin", "family_name": "Stemmler", "institution": null}, {"given_name": "Christof", "family_name": "Koch", "institution": null}]}