{"title": "Learning with Temporal Derivatives in Pulse-Coded Neuronal Systems", "book": "Advances in Neural Information Processing Systems", "page_first": 195, "page_last": 203, "abstract": null, "full_text": "LEARNING WITH TEMPORAL DERIVATIVES IN \n\nPULSE-CODED NEURONAL SYSTEMS \n\nMark Gluck \n\nDavid B. Parker \n\nEric S. Reifsnider \n\n195 \n\nDepartment of Psychology \n\nStanford University \nStanford, CA 94305 \n\nAbstract \n\nA number of learning models have recently been proposed which \ninvolve calculations of temporal differences (or derivatives in \ncontinuous-time models). These models. like most adaptive network \nmodels. are formulated in tenns of frequency (or activation), a useful \nabstraction of neuronal firing rates. To more precisely evaluate the \nimplications of a neuronal model. it may be preferable to develop a \nmodel which transmits discrete pulse-coded information. We point out \nthat many functions and properties of neuronal processing and learning \nmay depend. in subtle ways. on the pulse-coded nature of the informa(cid:173)\ntion coding and transmission properties of neuron systems. When com(cid:173)\npared to formulations in terms of activation. computing with temporal \nderivatives (or differences) as proposed by Kosko (1986). Klopf \n(1988). and Sutton (1988). is both more stable and easier when refor(cid:173)\nmulated for a more neuronally realistic pulse-coded system. In refor(cid:173)\nmulating these models in terms of pulse-coding. our motivation has \nbeen to enable us to draw further parallels and connections between \nreal-time behavioral models of learning and biological circuit models \nof the substrates underlying learning and memory. \n\nINTRODUCTION \n\nLearning algorithms are generally defined in terms of continuously-valued levels of input \nand output activity. This is true of most training methods for adaptive networks. (e.g .\u2022 \nParker. 1987; Rumelhart. Hinton. & Williams, 1986; Werbos. 
1974; Widrow & Hoff, 1960), and also of behavioral models of animal and human learning (e.g., Gluck & Bower, 1988a, 1988b; Rescorla & Wagner, 1972), as well as of more biologically oriented models of neuronal function (e.g., Bear & Cooper, in press; Hebb, 1949; Granger, Ambros-Ingerson, Staubli, & Lynch, in press; Gluck & Thompson, 1987; Gluck, Reifsnider, & Thompson, in press; McNaughton & Nadel, in press; Gluck & Rumelhart, in press). In spite of the attractive simplicity and utility of the \"activation\" construct, neurons use discrete trains of pulses for the transmission of information from cell to cell. Frequency (or activation) is a useful abstraction of pulse trains, especially for bridging the gap between whole-animal and single-neuron behavior. To more precisely evaluate the implications of a neuronal model, it may be preferable to develop a model which transmits discrete pulse-coded information; it is possible that many functions and properties of neuronal processing and learning depend, in subtle ways, on the pulse-coded nature of the information coding and transmission properties of neuronal systems. \n\nIn the last few years, a number of learning models have been proposed which involve computations of temporal differences (or derivatives in continuous-time models). Klopf (1988) presented a formal real-time model of classical conditioning that predicts the magnitude of conditioned responses (CRs), given the temporal relationships between conditioned stimuli (CSs) and an unconditioned stimulus (US). Klopf's model incorporates a \"differential-Hebbian\" learning algorithm in which changes in presynaptic levels of activity are correlated with changes in postsynaptic levels of activity. Motivated by the constraints and motives of engineering rather than of animal learning, 
Kosko (1986) proposed the same basic rule and provided extensive analytic insights into its properties. Sutton (1988) introduced a class of incremental learning procedures, called \"temporal difference\" methods, which update associative (predictive) weights according to the difference between temporally successive predictions. In addition to the applied potential of this class of algorithms, Sutton and Barto (1987) show how their model, like Klopf's (1988) model, provides a good fit to a wide range of behavioral data on classical conditioning. \n\nThese models, all of which depend on computations involving changes over time in activation levels, have been successful both at predicting a wide range of behavioral animal learning data (Klopf, 1988; Sutton & Barto, 1987) and at solving useful engineering problems in adaptive prediction (Kosko, 1986; Sutton, 1988). The possibility that these models might represent the computational properties of individual neurons seems, at first glance, highly unlikely. However, we show that by reformulating these models for pulse-coded communication (as in neuronal systems), rather than in terms of abstract activation levels, both the computational soundness and the biological relevance of the models are improved. By avoiding the use of unstable differencing methods in computing the time-derivative of activation levels, and by increasing the error-tolerance of the computations, pulse coding will be shown to improve the accuracy and reliability of these models. The pulse-coded models will also be shown to lend themselves to a closer comparison to the function of real neurons than do models that operate with activation levels. As the ability of researchers to directly measure neuronal behavior grows, the value of such close comparisons will increase. As an example, we describe here a pulse-coded version of Klopf's differential-Hebbian model of classical conditioning. 
Further details are contained in Gluck, Parker, and Reifsnider (1988). \n\nPulse-Coding in Neuronal Systems \n\nWe begin by outlining the general theory and engineering advantages of pulse-coding and then describe a pulse-coded reformulation of differential-Hebbian learning. The key idea is quite simple and can be summarized as follows: frequency can be seen, loosely speaking, as an integral of pulses; conversely, therefore, pulses can be thought of as carrying information about the derivatives of frequency. Thus, computing with the \"derivatives of frequency\" is analogous to computing with pulses. As described below, our basic conclusion is that differential-Hebbian learning (Klopf, 1988; Kosko, 1986), when reformulated for a pulse-coded system, is both more stable and easier to compute than is apparent when the rule is formulated in terms of frequencies. These results have important implications for any learning model which is based on computing with time-derivatives, such as Sutton's temporal difference model (Sutton, 1988; Sutton & Barto, 1987). \n\nThere are many ways to electrically transmit analog information from point to point. Perhaps the most obvious way is to transmit the information as a signal level. In electronic systems, for example, data that varies between 0 and 1 can be transmitted as a voltage level that varies between 0 volts and 1 volt. This method can be unreliable, however, because the receiver of the information cannot tell if a constant DC voltage offset has been added to the information, or if crosstalk has occurred with a nearby signal path. To the exact degree that the signal is interfered with, the data as read by the receiver will be erroneously altered. The consequences of faults appearing in the signal are particularly serious for systems that are based on derivatives of the signal. 
In such systems, even a small but sudden unintended change in signal level can drastically alter its derivative, creating large errors. \n\nA more reliable way to transmit analog information is to encode it as the frequency of a series of pulses. A receiver can reliably determine whether it has received a pulse, even in the face of DC voltage offsets or moderate crosstalk. Most errors will not be large enough to constitute a pulse, and thus will have no effect on the transmitted information. The receiver can count the number of pulses received in a given time window to determine the frequency of the pulses. Further information on encoding analog information as the frequency of a series of pulses can be found in many electrical engineering textbooks (e.g., Horowitz & Hill, 1980). \n\nAs noted by Parker (1987), another advantage of coding an analog signal as the frequency of a series of pulses is that the time derivative of the signal can be easily and stably calculated. If x(t) represents a series of pulses (x equals 1 if a pulse is occurring at time t; otherwise it equals 0), then we can estimate the frequency, f(t), of the series of pulses using an exponentially weighted time average: \n\nf(t) = μ ∫ x(τ) e^(−μ(t−τ)) dτ \n\nwhere μ is the decay constant. The well-known formula for the derivative of f(t) is then \n\ndf(t)/dt = μ(x(t) − f(t)) \n\nThus, the time derivative of pulse-coded information can be calculated without using any unstable differencing methods; it is simply a function of the presence or absence of a pulse relative to the current expectation (frequency) of pulses. As described earlier, the calculation of time derivatives is a critical component of the learning algorithms proposed by Klopf (1988), Kosko (1986), and Sutton (Sutton, 1988; Sutton & Barto, 1987). 
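The pulse-based derivative estimate above can be sketched in a few lines of code. This is an illustrative discrete-time sketch, not code from the paper: the function name, the decay constant μ = 0.1, and the example pulse train are our own choices. Note that the derivative at each step is just μ(x − f), with no differencing of successive noisy frequency estimates.

```python
def estimate_frequency_and_derivative(pulses, mu=0.1):
    """Return lists of (f, df/dt) estimates for a 0/1 pulse train.

    Discrete-time version of the exponentially weighted average:
    the derivative is mu * (pulse - current expectation), and the
    frequency estimate is updated by adding that derivative.
    """
    f = 0.0
    freqs, derivs = [], []
    for x in pulses:          # x is 1 if a pulse occurred at this step, else 0
        df = mu * (x - f)     # stable derivative: pulse vs. expected frequency
        f += df               # update the exponentially weighted average
        freqs.append(f)
        derivs.append(df)
    return freqs, derivs

# Illustrative input: a pulse on every fourth time step (true rate 0.25).
pulses = [1 if t % 4 == 0 else 0 for t in range(200)]
freqs, derivs = estimate_frequency_and_derivative(pulses)
print(round(freqs[-1], 2))  # fluctuates around the true pulse rate
```

Because the estimate is a leaky average, f(t) oscillates within a narrow band around the true rate rather than converging to it exactly; a smaller μ narrows the band at the cost of slower tracking.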
They are also an important aspect of second-order (pseudo-Newtonian) extensions of the backpropagation learning rule for multi-layer adaptive \"connectionist\" networks (Parker, 1987). \n\nSummary of Klopf's Model \n\nKlopf (1988) proposed a model of classical conditioning which incorporates the same learning rule proposed by Kosko (1986) and which extends some of the ideas presented in Sutton and Barto's (1981) real-time generalization of Rescorla and Wagner's (1972) model of classical conditioning. The mathematical specification of Klopf's model consists of two equations: one which calculates output signals based on a weighted sum of input signals (drives) and one which determines changes in synapse efficacy due to changes in signal levels. The specification of signal output level is defined as \n\ny(t) = Σᵢ₌₁ⁿ wᵢ(t) xᵢ(t) − θ \n\nwhere: y(t) is the measure of postsynaptic frequency of firing at time t; wᵢ(t) is the efficacy (positive or negative) of the ith synapse; xᵢ(t) is the frequency of action potentials at the ith synapse; θ is the threshold of firing; and n is the number of synapses on the \"neuron\". This equation expresses the idea that the postsynaptic firing frequency depends on the summation of the weighted presynaptic firing frequencies, wᵢ(t)xᵢ(t), relative to some threshold, θ. The learning mechanism is defined as \n\nΔwᵢ(t) = Δy(t) Σⱼ₌₁^τ cⱼ |wᵢ(t−j)| Δxᵢ(t−j) \n\nwhere: Δwᵢ(t) is the change in efficacy of the ith synapse at time t; Δy(t) is the change in postsynaptic firing at time t; and τ is the longest interstimulus interval over which delayed conditioning is effective. The cⱼ are empirically established learning rate constants, each corresponding to a different interstimulus interval. \n\nIn order to accurately simulate various behavioral phenomena observed in classical conditioning, Klopf adds three ancillary assumptions to his model. First, he places a lower bound of 0 on the activation of the node. Second, 
he proposes that changes in synaptic weight, Δwᵢ(t), be calculated only when the change in presynaptic signal level is positive -- that is, when Δxᵢ(t−j) > 0. Third, he proposes separate excitatory and inhibitory weights, in contrast to the single real-valued associative weights in other conditioning models (e.g., Rescorla & Wagner, 1972; Sutton & Barto, 1981). It is intriguing to note that all of these assumptions are not only justified by constraints from behavioral data but are also motivated by neuronal constraints. For a further examination of the biological and behavioral factors supporting these assumptions, see Gluck, Parker, and Reifsnider (1988). \n\nThe strength of Klopf's model as a simple formal behavioral model of classical conditioning is evident. Although the model has not yielded any new behavioral predictions, it has demonstrated an impressive ability to reproduce a wide, though not necessarily complete, range of Pavlovian behavioral phenomena with a minimum of assumptions. \n\nKlopf (1988) specifies his learning algorithm in terms of activation or frequency levels. Because neuronal systems communicate through the transmission of discrete pulses, it is difficult to evaluate the biological plausibility of an algorithm when so formulated. For this reason, we present and evaluate a pulse-coded reformulation of Klopf's model. \n\nA Pulse-Coded Reformulation of Klopf's Model \n\nWe illustrate here a pulse-coded reformulation of Klopf's (1988) model of classical conditioning. The equations that make up the model are fairly simple. 
A neuron is said to have fired an output pulse at time t if v(t) > θ, where θ is a threshold value and v(t) is defined as follows: \n\nv(t) = (1 − d)v(t−1) + Σᵢ wᵢ(t)xᵢ(t)    (1) \n\nwhere v(t) is an auxiliary variable, d is a small positive constant representing the leakage or decay rate, wᵢ(t) is the efficacy of synapse i at time t, and xᵢ(t) is the frequency of presynaptic pulses at time t at synapse i. The input to the decision of whether the neuron will fire thus consists of the weighted presynaptic inputs as well as information about previous activation levels at the neuronal output. Note that the leakage rate, d, causes older information about activation levels to have less impact on current values of v(t) than does recent information of the same type. \n\nThe output of the neuron, p(t), is: \n\nif v(t) > θ then p(t) = 1 (pulse generated) \nif v(t) ≤ θ then p(t) = 0 (no pulse generated) \n\nOnce p(t) has been determined, v(t) must be adjusted if p(t) = 1. To reflect the fact that the neuron has fired (i.e., p(t) = 1), v(t) is decremented: v(t) = v(t) − 1. This decrement occurs after p(t) has been determined for the current t. Frequencies of pulses at the output node and at the synapses are calculated using the following equations: \n\nf(t) = f(t−1) + Δf(t) \n\nwhere \n\nΔf(t) = m(p(t) − f(t−1)) \n\nand where f(t) is the frequency of outgoing pulses at time t; p(t) is the output (1 or 0) of the neuron at time t; and m is a small positive constant representing a leakage rate for the frequency calculation. 
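The neuron dynamics just defined (Equation 1, the threshold firing rule with its post-pulse decrement, and the leaky output-frequency estimate) can be sketched as follows. This is an illustrative sketch, not the authors' simulation code; the parameter values d = 0.1, m = 0.2, θ = 0.5 and the single fixed synaptic weight are hypothetical choices made only to exercise the equations.

```python
class PulseNeuron:
    """Minimal pulse-coded unit: leaky accumulator with threshold firing."""

    def __init__(self, n_synapses, d=0.1, m=0.2, theta=0.5):
        self.w = [0.0] * n_synapses  # synaptic efficacies w_i(t)
        self.v = 0.0                 # auxiliary variable v(t)
        self.f = 0.0                 # output pulse frequency estimate f(t)
        self.d, self.m, self.theta = d, m, theta

    def step(self, x):
        """x: presynaptic pulse frequencies x_i(t). Returns p(t) in {0, 1}."""
        # Equation 1: leaky accumulation of the weighted inputs.
        self.v = (1 - self.d) * self.v + sum(wi * xi for wi, xi in zip(self.w, x))
        p = 1 if self.v > self.theta else 0   # threshold firing rule
        if p:
            self.v -= 1.0                     # decrement after emitting a pulse
        # Leaky frequency estimate: f(t) = f(t-1) + m * (p(t) - f(t-1)).
        self.f += self.m * (p - self.f)
        return p

neuron = PulseNeuron(n_synapses=1)
neuron.w[0] = 0.3                                 # fixed, illustrative efficacy
pulses = [neuron.step([1.0]) for _ in range(50)]  # constant-rate input drive
print(sum(pulses), round(neuron.f, 2))
```

With a constant input, the unit settles into a regular firing pattern whose rate grows with the synaptic efficacy; the decrement after each pulse is what spaces the output pulses out, and f tracks that output rate.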
\n\nFollowing Klopf (1988), changes in synapse efficacy occur according to \n\n(2) \n\nwhere \n\nI1Wi(t) = Wi (t+l) - Wi(t) \n\nand l1y (t) and ru:i (t) are calculated analogously to 11/ (t); 't is the longest interstimulus \ninterval (lSI) over which delay conditioning is effective; and C j is an empirically esta(cid:173)\nblished set of learning rates which govern the efficacy of conditioning at an lSI of j . \nChanges in Wi (t) are governed by the learning rule in Equation 2 which alters v (t) via \nEquation 1. \n\nFigure 1 shows the results of a computer simulation of a pulse-coded version of Klopf's \nconditioning model. The first graph shows the excitatory weight (dotted line) and inhibi(cid:173)\ntory weight (dashed line) of the CS \"synapse\". Also on the same graph is the net synaptic \nweight (solid line), the sum of the excitatory and inhibitory weights. The subsequent \ngraphs show CS input pulses, US input pulses, and the output (CR) pulses. The simula(cid:173)\ntion consists of three acquisition trials followed by three extinction trials. \n\n\fLearning with Temporal Derivatives \n\n201 \n\n.................................. ( ................................. , \n\n\u00b7V1\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7U ............................. . \n\n( .............................. .J \n\n~ \n\n0.6 \nen \nl! 0.4 \n; \n.~ 02 \n. \n~ 0.0 \n\u00b70.2 \n\n. ....... . \n\n------------------------------------------~-----~-----------\n\no \n\n50 \n\n100 \n\n150 \n\ncycle \n\n200 \n\n250 \n\n300 \n\nFigure 1. Simulation of pulse.coded version of Klopf's conditioning model. \nTop panel shows excitatory and inhibitory weights as dashed lines and the net \nsynaptic weight of the CS as a solid line. Lower panels show the CS and US \ninputs and the CR output. \n\nAs expected, excitatory weight increases in magnitude over the three ~quisition trials, \nwhile inhibitory weight is stable. 
During the first two extinction trials, the excitatory and net synaptic weights decrease in magnitude, while the inhibitory weight increases. Thus, the CS produces a decreasing number of output pulses (the CR). During the third extinction trial the net synaptic weight is so low that the CS cannot produce output pulses, and so the CR is extinguished. However, as the net weight and excitatory weight remain positive, there are residual effects of the acquisition which will accelerate reacquisition. Because a threshold must be reached before a neuronal output pulse can be emitted, and because output must occur for weight changes to occur, pulse coding adds to the accelerated reacquisition effect that is evident in the original Klopf model; extinction is halted before the net weight is zero, when pulses can no longer be produced. \n\nDiscussion \n\nTo facilitate comparison between learning algorithms involving temporal derivative computations and actual neuronal capabilities, we formulated a pulse-coded variation of Klopf's classical conditioning model. Our basic conclusion is that computing with temporal derivatives (or differences) as proposed by Kosko (1986), Klopf (1988), and Sutton (1988) is more stable and easier when reformulated for a more neuronally realistic, pulse-coded system than when the rules are formulated in terms of frequencies or activation. \n\nIt is our hope that further examination of the characteristics of pulse-coded systems may reveal facts that bear on the characteristics of neuronal function. In reformulating these algorithms in terms of pulse-coding, our motivation has been to enable us to draw further parallels and connections between real-time behavioral models of learning and biological circuit models of the substrates underlying classical conditioning (e.g., Thompson, 1986; Gluck & Thompson, 1987; Donegan, Gluck, & Thompson, 
in press). More generally, noting the similarities and differences between algorithmic/behavioral theories and biological capabilities is one way of laying the groundwork for developing more complete integrated theories of the biological bases of associative learning (Donegan, Gluck, & Thompson, in press). \n\nAcknowledgments \n\nCorrespondence should be addressed to: Mark A. Gluck, Dept. of Psychology, Jordan Hall, Bldg. 420, Stanford, CA 94305. For their commentary and critique on earlier drafts of this and related papers, we are indebted to Harry Klopf, Bart Kosko, Richard Sutton, and Richard Thompson. This research was supported by an Office of Naval Research Grant to R. F. Thompson and M. A. Gluck. \n\nReferences \n\nBear, M. F., & Cooper, L. N. (in press). Molecular mechanisms for synaptic modification in the visual cortex: Interaction between theory and experiment. In M. A. Gluck & D. E. Rumelhart (Eds.), Neuroscience and Connectionist Theory. Hillsdale, NJ: Lawrence Erlbaum Associates. \n\nDonegan, N. H., Gluck, M. A., & Thompson, R. F. (1989). Integrating behavioral and biological models of classical conditioning. In R. D. Hawkins & G. H. Bower (Eds.), Computational models of learning in simple neural systems (Volume 22 of The Psychology of Learning and Motivation). New York: Academic Press. \n\nGluck, M. A., & Bower, G. H. (1988a). Evaluating an adaptive network model of human learning. Journal of Memory and Language, 27, 166-195. \n\nGluck, M. A., & Bower, G. H. (1988b). From conditioning to category learning: An adaptive network model. Journal of Experimental Psychology: General, 117(3), 225-244. \n\nGluck, M. A., Parker, D. B., & Reifsnider, E. (1988). Some biological implications of a differential-Hebbian learning rule. Psychobiology, 16(3), 298-302. \n\nGluck, M. A., Reifsnider, E. S., & Thompson, R. F. (in press). 
Adaptive signal processing and temporal coarse coding: Cerebellar models of classical conditioning and VOR adaptation. In M. A. Gluck & D. E. Rumelhart (Eds.), Neuroscience and Connectionist Theory. Hillsdale, NJ: Lawrence Erlbaum Associates. \n\nGluck, M. A., & Rumelhart, D. E. (in press). Neuroscience and Connectionist Theory. Hillsdale, NJ: Lawrence Erlbaum Associates. \n\nGluck, M. A., & Thompson, R. F. (1987). Modeling the neural substrates of associative learning and memory: A computational approach. Psychological Review, 94, 176-191. \n\nGranger, R., Ambros-Ingerson, J., Staubli, U., & Lynch, G. (in press). Memorial operation of multiple, interacting simulated brain structures. In M. A. Gluck & D. E. Rumelhart (Eds.), Neuroscience and Connectionist Theory. Hillsdale, NJ: Lawrence Erlbaum Associates. \n\nHebb, D. (1949). Organization of Behavior. New York: Wiley & Sons. \n\nHorowitz, P., & Hill, W. (1980). The Art of Electronics. Cambridge, England: Cambridge University Press. \n\nKlopf, A. H. (1988). A neuronal model of classical conditioning. Psychobiology, 16(2), 85-125. \n\nKosko, B. (1986). Differential Hebbian learning. In J. S. Denker (Ed.), Neural Networks for Computing, AIP Conference Proceedings 151 (pp. 265-270). New York: American Institute of Physics. \n\nMcNaughton, B. L., & Nadel, L. (in press). Hebb-Marr networks and the neurobiological representation of action in space. In M. A. Gluck & D. E. Rumelhart (Eds.), Neuroscience and Connectionist Theory. Hillsdale, NJ: Lawrence Erlbaum Associates. \n\nParker, D. B. (1987). Optimal algorithms for adaptive networks: Second order back propagation, second order direct propagation, and second order Hebbian learning. Proceedings of the IEEE First Annual Conference on Neural Networks. San Diego, CA. \n\nRescorla, R. A., & Wagner, A. R. (1972). 
A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and non-reinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory. New York: Appleton-Century-Crofts. \n\nRumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error propagation. In D. Rumelhart & J. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition (Vol. 1: Foundations). Cambridge, MA: MIT Press. \n\nSutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9-44. \n\nSutton, R. S., & Barto, A. G. (1981). Toward a modern theory of adaptive networks: Expectation and prediction. Psychological Review, 88, 135-170. \n\nSutton, R. S., & Barto, A. G. (1987). A temporal-difference model of classical conditioning. In Proceedings of the 9th Annual Conference of the Cognitive Science Society. Seattle, WA. \n\nThompson, R. F. (1986). The neurobiology of learning and memory. Science, 233, 941-947. \n\nWerbos, P. (1974). Beyond regression: New tools for prediction and analysis in the behavioral sciences. Doctoral dissertation (Economics), Harvard University, Cambridge, MA. \n\nWidrow, B., & Hoff, M. E. (1960). Adaptive switching circuits. Institute of Radio Engineers, Western Electronic Show and Convention, Convention Record, 4, 96-104. ", "award": [], "sourceid": 94, "authors": [{"given_name": "David", "family_name": "Parker", "institution": null}, {"given_name": "Mark", "family_name": "Gluck", "institution": null}, {"given_name": "Eric", "family_name": "Reifsnider", "institution": null}]}