{"title": "Fractionally Predictive Spiking Neurons", "book": "Advances in Neural Information Processing Systems", "page_first": 253, "page_last": 261, "abstract": "Recent experimental work has suggested that the neural firing rate can be interpreted as a fractional derivative, at least when signal variation induces neural adaptation. Here, we show that the actual neural spike-train itself can be considered as the fractional derivative, provided that the neural signal is approximated by a sum of power-law kernels. A simple standard thresholding spiking neuron suffices to carry out such an approximation, given a suitable refractory response. Empirically, we find that the online approximation of signals with a sum of power-law kernels is beneficial for encoding signals with slowly varying components, like long-memory self-similar signals. For such signals, the online power-law kernel approximation typically required less than half the number of spikes for similar SNR as compared to sums of similar but exponentially decaying kernels. As power-law kernels can be accurately approximated using sums or cascades of weighted exponentials, we demonstrate that the corresponding decoding of spike-trains by a receiving neuron allows for natural and transparent temporal signal filtering by tuning the weights of the decoding kernel.", "full_text": "Fractionally Predictive Spiking Neurons\n\nSander M. Bohte\nCWI, Life Sciences\n\nAmsterdam, The Netherlands\n\nS.M.Bohte@cwi.nl\n\nJaldert O. Rombouts\nCWI, Life Sciences\n\nAmsterdam, The Netherlands\nJ.O.Rombouts@cwi.nl\n\nAbstract\n\nRecent experimental work has suggested that the neural \ufb01ring rate can be inter-\npreted as a fractional derivative, at least when signal variation induces neural adap-\ntation. Here, we show that the actual neural spike-train itself can be considered as\nthe fractional derivative, provided that the neural signal is approximated by a sum\nof power-law kernels. 
A simple standard thresholding spiking neuron suf\ufb01ces\nto carry out such an approximation, given a suitable refractory response. Em-\npirically, we \ufb01nd that the online approximation of signals with a sum of power-\nlaw kernels is bene\ufb01cial for encoding signals with slowly varying components,\nlike long-memory self-similar signals. For such signals, the online power-law\nkernel approximation typically required less than half the number of spikes for\nsimilar SNR as compared to sums of similar but exponentially decaying kernels.\nAs power-law kernels can be accurately approximated using sums or cascades of\nweighted exponentials, we demonstrate that the corresponding decoding of spike-\ntrains by a receiving neuron allows for natural and transparent temporal signal\n\ufb01ltering by tuning the weights of the decoding kernel.\n\n1\n\nIntroduction\n\nA key issue in computational neuroscience is the interpretation of neural signaling, as expressed by\na neuron\u2019s sequence of action potentials. An emerging notion is that neurons may in fact encode\ninformation at multiple timescales simultaneously [1, 2, 3, 4]: the precise timing of spikes may be\nconveying high-frequency information, and slower measures, like the rate of spiking, may be relating\nlow-frequency information. Such multi-timescale encoding comes naturally, at least for sensory\nneurons, as the statistics of the outside world often exhibit self-similar multi-timescale features [5]\nand the magnitude of natural signals can extend over several orders. Since neurons are limited in\nthe rate and resolution with which they can emit spikes, the mapping of large dynamic-range signals\ninto spike-trains is an integral part of attempts at understanding neural coding.\nExperiments have extensively demonstrated that neurons adapt their response when facing persistent\nchanges in signal magnitude. 
Typically, adaptation changes the relation between the magnitude of\nthe signal and the neuron\u2019s discharge rate. Since adaptation thus naturally relates to neural coding,\nit has been extensively scrutinized [6, 7, 8]. Importantly, adaptation is found to additionally exhibit\nfeatures like dynamic gain control, when the standard deviation but not the mean of the signal\nchanges [1], and long-range time-dependent changes in the spike-rate response are found in response\nto large magnitude signal steps, with the changes following a power-law decay (e.g. [9]).\nTying the notions of self-similar multi-scale natural signals and adaptive neural coding together,\nit has recently been suggested that neuronal adaptation allows neuronal spiking to communicate a\nfractional derivative of the actual computed signal [10, 4]. Fractional derivatives are a generalization\nof standard \u2018integer\u2019 derivatives (\u2018\ufb01rst order\u2019, \u2018second order\u2019), to real valued derivatives (e.g. \u20180.5th\norder\u2019). A key feature of such derivatives is that they are non-local, and rather convey information\nover essentially a large part of the signal spectrum [10].\n\n1\n\n\fHere, we show how neural spikes can encode temporal signals when the spike-train itself is taken\nas the fractional derivative of the signal. We show that this is the case for a signal approximated\nby a sum of shifted power-law kernels starting at respective times ti and decaying proportional to\n1/(t \u2212 ti)\u03b2. Then, the fractional derivative of this approximated signal corresponds to a sum of\nspikes at times ti, provided that the order of fractional differentiation \u03b1 is equal to 1 \u2212 \u03b2: a spike-\ntrain is the \u03b1 = 0.2 fractional derivative of a signal approximated by a sum of power-law kernels\nwith exponent \u03b2 = 0.8. 
Such signal encoding with power-law kernels can be carried out for example\nwith simple standard thresholding spiking neurons with a refractory reset following a power-law.\nAs fractional derivatives contain information over many time-ranges, they are naturally suited for\npredicting signals. This links to notions of predictive coding, where neurons communicate deviations\nfrom expected signals rather than the signal itself. Predictive coding has been suggested as\na key feature of neuronal processing in e.g. the retina [11]. For self-similar scale-free signals,\nfuture signals may be influenced by past signals over very extended time-ranges: so-called\nlong-memory. For example, fractional Brownian motion (fBm) can exhibit long-memory, depending on\nits Hurst-parameter H. For H > 0.5, fBm models exhibit long-range dependence\n(long-memory), where the autocorrelation function follows a power-law decay [12]. The long-memory\nnature of signals approximated with sums of power-law kernels naturally extends this signal\napproximation into the future along the autocorrelation of the signal, at least for self-similar 1/f^\u03b3-like\nsignals. The key \u201cpredictive\u201d assumption we make is that a neuron\u2019s spike-train up to time t contains\nall the information that the past signal contributes to the future signal at times t\u2032 > t.\nThe correspondence between a spike-train as a fractional derivative and a signal approximated as a\nsum of power-law kernels is only exact when spike-trains are taken as a sum of Dirac-\u03b4 functions\nand the power-law kernels as 1/t^\u03b2. As both responses are singular, neurons would only be able\nto approximate this. We show empirically how sums of (approximated) 1/t^\u03b2 power-law kernels\ncan accurately approximate long-memory fBm signals via simple difference thresholding, in an\nonline greedy fashion. 
Encoding signals in this way, we show that the power-law kernels approximate\nsynthesized signals with about half the number of spikes needed to obtain the same Signal-to-Noise-Ratio,\nwhen compared to the same encoding method using similar but exponentially decaying kernels.\nWe further demonstrate the approximation of sine wave modulated white-noise signals with sums of\npower-law kernels. The resulting spike-trains, expressed as \u201cinstantaneous spike-rate\u201d, exhibit the\nphase-precession as in [4], with suppression of activity on the \u201cback\u201d of the sine-wave modulation,\nand stronger suppression for lower values of the power-law exponent (corresponding to a higher\norder for our fractional derivative). We find the effect is stronger when encoding the actual sine wave\nenvelope, mimicking the difference between thalamic and cortical neurons reported in [4]. This may\nsuggest that these cortical neurons are more concerned with encoding the sine wave envelope.\nThe power-law approximation also allows for the transparent and straightforward implementation of\ntemporal signal filtering by a post-synaptic, receiving neuron. Since neural decoding by a receiving\nneuron corresponds to adding a power-law kernel for each received spike, modifying this receiving\npower-law kernel then corresponds to a temporal filtering operation, effectively exploiting the\nwide-spectrum nature of power-law kernels. This is particularly relevant, since, as has been amply\nnoted [9, 14], power-law dynamics can be closely approximated by a weighted sum or cascade of\nexponential kernels. Temporal filtering would then correspond to simply tuning the weights for this\nsum or cascade. 
We illustrate this notion with an encoding/decoding example for both a high-pass\nand low-pass filter.\n\n2 Power-law Signal Encoding\n\nNeural processing can often be reduced to a Linear-Non-Linear (LNL) filtering operation on incoming\nsignals [15] (figure 1), where inputs are linearly weighted and then passed through a non-linearity\nto yield the neural activation. As this computation yields analog activations, and neurons communicate\nthrough spikes, the additional problem faced by spiking neurons is to decode the incoming\nsignal and then encode the computed LNL filter again into a spike-train. The standard spiking neuron\nmodel is that of Linear-Nonlinear-Poisson spiking, where spikes have a stochastic relationship\nto the computed activation [16]. Here, we interpret the spike encoding and decoding in the light of\nprocessing and communicating signals with fractional derivatives [10].\n\nFigure 1: Linear-Non-Linear filter, with spike-decoding front-end and spike-encoding back-end.\n\nAt least for signals with mainly (relatively) high-frequency components, it has been well established\nthat a neural signal can be decoded with high fidelity by associating a fixed kernel with each spike,\nand summing these kernels [17]; keeping track of doublet and triplet spikes allows for even greater\nfidelity. This approach however only worked for signals with a frequency response lacking low\nfrequencies [17]. Low-frequency changes lead to \u201cadaptation\u201d, where the kernel is adapted to fit the\nsignal again [18]. 
For long-range predictive coding, the absence of low frequencies leaves little to\npredict, as the effective correlation time of the signals is then typically very short as well [17].\nUsing the notion of predictive coding in the context of (possible) long-range dependencies, we\ndefine the goal of signal encoding as follows: let a signal xj(t) be the result of the continuous-time\ncomputation in neuron j up to time t, and let neuron j have emitted spikes tj up to time t. These\nspikes should be emitted such that the signal xj(t\u2032) for t\u2032 < t is decoded up to some signal-to-noise\nratio, and these spikes should be predictive for xj(t\u2032) for t\u2032 > t in the sense that no additional spikes\nare needed at times t\u2032 > t to convey the predictive information up to time t.\nTaking kernels as a signal filter of fixed width, as in the general approach in [17], has the important\ndrawback that the signal reconstruction incurs a delay for the duration of the filter: its detection\ncannot be communicated until the filter is actually matched to the signal. This is inherent to any\nbackward-looking filter-matching solution. Alternatively, a predictive coding approach could rely\nonly on a very short backward-looking filter, minimizing the delay in the system, and continuously\ncompute a forward predictive signal. At any time in the future then, only deviations of the actual\nsignal from this expectation are communicated.\n\n2.1 Spike-trains as fractional derivative\n\nAs recent work has highlighted the possibility that neurons encode fractional derivatives, it is noteworthy\nthat the non-local nature of fractional calculus offers a natural framework for predictive\ncoding. 
In particular, as we will show, when we assume that the predictive information about the\nfuture signal is fully contained in the current set of spikes, a signal approximated as a sum of\npower-law kernels corresponds to a fractional derivative in the form of a sum of Dirac-\u03b4 functions, which\nthe neuron can obviously communicate through timed spikes.\nThe fractional derivative r(t) of a signal x(t) is denoted as D^\u03b1 x(t), and intuitively expresses:\n\nr(t) = d^\u03b1 x(t) / dt^\u03b1,\n\nwhere \u03b1 is the fractional order, e.g. 0.5. This is most conveniently computed through the Fourier\ntransformation in the frequency domain, as a simple multiplication:\n\nR(\u03c9) = H(\u03c9)X(\u03c9),\n\nwhere the Fourier-transformed fractional derivative operator H(\u03c9) is by definition (i\u03c9)^\u03b1 [10], and\nX(\u03c9) and R(\u03c9) are the Fourier transforms of x(t) and r(t) respectively.\nWe assume that neurons carry out predictive coding by emitting spikes such that all predictive\ninformation is contained in the current spikes, and no more spikes will be fired if the signal follows this\nprediction. Approximating spikes by Dirac-\u03b4 functions, we take the spike-train up to some time t0\nto be the fractional derivative of the past signal and be fully predictive for the expected influence the\n\nFigure 2: a) Signal x(t) and corresponding fractional derivative r(t): 1/t^\u03b2 power-laws and\ndelta-functions; b) power-law approximation, timed to spikes; compared to sum of \u03b1-functions (black\ndashed line). c) Approximated 1/t^\u03b2 power-law kernel for different values of k from eq. (2). 
d)\nThe approximated 1/t^\u03b2 power-law kernel (blue line) can be decomposed as a weighted sum of\n\u03b1-functions with various decay time-constants (dashed lines).\n\npast signal has on the future signal:\n\nr(t) = \u03a3_{ti < t0} \u03b4(t \u2212 ti),\n\nso that the signal at times t > t0 due to x(t < t0) does not require\nadditional future spikes. We note that a sum of power-law decaying kernels with power-law t^(\u2212\u03b2)\nfor \u03b2 = 1 \u2212 \u03b1 corresponds to such a fractional derivative: the Fourier-transform for a power-law\ndecaying kernel of form t^(\u2212\u03b2) is proportional to (i\u03c9)^(\u03b2\u22121), hence for a signal that just experienced a\nsingle step from 0 to 1 at time t we get:\n\nR(\u03c9) = (i\u03c9)^\u03b1 (i\u03c9)^(\u03b2\u22121),\n\nand setting \u03b2 = 1 \u2212 \u03b1 yields a constant in Fourier-space, which of course is the Fourier-transform\nof \u03b4(t). It is easy to check that shifted power-law decaying kernels, e.g. (t \u2212 ta)^(\u2212\u03b2), correspond to\na shifted fractional derivative \u03b4(t \u2212 ta), and the fractional derivative of a sum of shifted power-law\ndecaying kernels corresponds to a sum of shifted delta-functions. Note that for decaying power-laws,\nwe need \u03b2 > 0, and for fractional derivatives we require \u03b1 > 0.\nThus, with the reverse reasoning, a signal approximated as the sum of power-law decaying kernels\ncorresponds to a spike-train with spikes positioned at the start of the kernel, and, beyond a current\ntime t, this sum of decaying kernels is interpreted as a prediction of the extent to which the future\nsignal can be predicted by the past signal.\nObviously, both the Dirac-\u03b4 function and the 1/t^\u03b2 kernels are singular (figure 2a) and can only be\napproximated. For real applications, only some part of the 1/t^\u03b2 curve can be considered, effectively\nleaving the magnitude of the kernel and the high frequency component (the extent to which the\ninitial 1/t^\u03b2 peak is approximated) as free parameters. 
Figure 2b illustrates the signal approximated\nby a random spike-train; as compared to a sum of exponentially decaying \u03b1-kernels, the\nlong-memory effect of power-law decay kernels is evident.\n\n2.2 Practical encoding\n\nTo explore the efficacy of the power-law kernel approach to signal encoding/decoding, we take a\nstandard thresholding online approximation approach, where neurons communicate only deviations\nbetween the current computed signal x(t) and the emitted approximated signal x\u0302(t) exceeding some\nthreshold \u03b8. The emitted signal x\u0302(t) is constructed as the (delayed) sum of filter kernels \u03ba each\nstarting at the time of the emitted spike:\n\nx\u0302(t) = \u03a3_{tj < t} \u03ba(t \u2212 tj).\n\nA positive or negative spike is emitted at time t0 when adding or, respectively, subtracting a kernel\nimproves the signal approximation over the preceding window \u2206 by more than \u03b8:\n\n\u03a3_{\u03c4 = t0\u2212\u2206}^{t0} (|x(\u03c4) \u2212 x\u0302(\u03c4)| \u2212 |x(\u03c4) \u2212 (x\u0302(\u03c4) + \u03ba(\u03c4))|) > \u03b8,\n\u03a3_{\u03c4 = t0\u2212\u2206}^{t0} (|x(\u03c4) \u2212 x\u0302(\u03c4)| \u2212 |x(\u03c4) \u2212 (x\u0302(\u03c4) \u2212 \u03ba(\u03c4))|) > \u03b8,     (1)\n\nthe signal approximation improvement is computed here as the absolute value of the difference\nbetween the current signal noise and the signal noise when a kernel is added (or subtracted).\nAs an approximation of 1/t^\u03b2 power-law kernels, we let the kernel first quickly rise, and then\ndecay according to the power-law. For a practical implementation, we use a 1/t^\u03b2 signal multiplied\nby a modified version of the logistic sigmoid function logsig(t) = 1/(1 + exp(\u2212t)):\nv(t, k) = 2 logsig(kt) \u2212 1, such that the kernel becomes:\n\n\u03ba(t) = \u03bb v(t, k) / t^\u03b2,     (2)\n\nwhere \u03ba(t) is zero for t\u2032 < t, and parameter k determines the angle of the initial increasing part of the\nkernel. 
The resulting kernel is further scaled by a factor \u03bb to achieve a certain signal approximation\nprecision (kernels for power-law exponent \u03b2 = 0.5 and several values of k are shown in figure\n2c). As an aside, the resulting (normalized) power-law kernel can very accurately be approximated\nover multiple orders of magnitude by a sum of just 11 \u03b1-function exponentials (figure 2d).\nNext, we compare the efficiency of signal approximation with power-law predictive kernels as compared\nto the same approximation using standard fixed kernels. For this, we synthesize self-similar\nsignals with long-range dependencies. We first remark on some properties of self-similar signals\nwith power-law statistics, and on how to synthesize them.\n\n2.3 Self-similar signals with power-law statistics\n\nThere is extensive literature on the synthesis of statistically self-similar signals with 1/f-like statistics,\nat least going back to Kolmogorov [19] and Mandelbrot [20]. Self-similar signals exhibit\nslowly decaying variances, long-range dependencies and a spectral density following a power law.\nImportantly, for wide-sense self-similar signals, the autocorrelation function also decays following\na power-law. Although various distinct classes of self-similar signals with 1/f-like statistics exist\n[12], fractional Brownian motion (fBm) is a popular model for many natural signals. Fractional\nBrownian motion is characterized by its Hurst-parameter H, where H = 0.5 corresponds to regular\nBrownian motion, and fBm models with H > 0.5 exhibit long-range (positive) dependence. The\nspectral density of an fBm signal is proportional to a power-law, 1/f^\u03b3, where \u03b3 = 2H + 1. We used\nfractional Brownian motion to generate self-similar signals for various H values, using the wfbm\nfunction from the Matlab wavelet toolbox.\n\nFigure 3: Left: example of encoding of fBm signal with power-law kernels. 
Using an exponentially\ndecaying kernel (inset) required 1398 spikes vs. 618 for the power-law kernel (k = 50), for the same\nSNR. Right: SNR for various \u03b2 power-law exponents using a fixed number of spikes (48Hz), with\ncurves for different H-parameters, each curve averaged over five 16s signals. The dashed blue curve\nplots the H = 0.6 curve, using fewer spikes (36Hz); the flat bottom dotted line shows the average\nperformance of the non-power-law exponentially decaying kernel, also for H = 0.6.\n\n3 Signal encoding/decoding\n\n3.1 Encoding long-memory self-similar signals\n\nWe applied the thresholded kernel approximation outlined above to synthesized fBm signals with\nH > 0.5, to ensure long-term dependence in the signal. An example of such encoding is given in\nfigure 3, left panel, using both positive and negative spikes (inset, red line: the power-law kernel\nused). When encoding the same signal with kernels without the power-law tail (inset, blue line), the\napproximation required more than twice as many spikes for the same Signal-to-Noise-Ratio (SNR).\nIn figure 3, right panel, we compared the encoding efficacy for signals with different H-parameters,\nas a function of the power-law exponent, using the same number of spikes for each signal (achieved\nby changing the \u03bb parameter and the threshold \u03b8). We find that more slowly varying signals,\ncorresponding to higher H-parameters, are better encoded by the power-law kernels. More surprisingly,\nwe find that signals are consistently best encoded for low \u03b2-values, in the order of 0.1\u22120.3. 
Similar\nresults were obtained for different values of k in equation (2).\nWe should remark that without negative spikes, there is no longer a clear performance advantage for\npower-law kernels (even for large \u03b2): where power-law kernels are bene\ufb01cial on the rising part of a\nsignal, they lose on downslopes where their slow decay cannot follow the signal.\n\n3.2 Sine-wave modulated white-noise\n\nFractional derivatives as an interpretation of neuronal \ufb01ring-rate has been put forward by a series of\nrecent papers [10, 21, 4], where experimental evidence was presented to suggest such an interpreta-\ntion. A key \ufb01nding in [4] was that the instantaneous \ufb01ring rate of neurons along various processing\nstages of a rat\u2019s whisker movement exhibit a phase-lead relative to the amplitude of the movement\nmodulation. The phase-lead was found to be greater for cortical neurons as compared to thala-\nmic neurons. When the \ufb01ring rate corresponds to the \u03b1-order fractional derivative, the phase-lead\nwould correspond to greater fractional order \u03b1 in the cortical neurons [10] . We used the sum-\nof-power-laws to approximate both the sine-wave-modulated white noise and the actual sine-wave\nitself, and found similar results (\ufb01gure 4): smaller power-law exponents, in our interpretation also\ncorresponding to larger fractional derivative orders, lead to increasingly fewer spikes at the back of\nthe sine-wave (both in the case where we encode the signal with both positive and negative spikes\n\u2013 then counting only the positive spikes \u2013 and when the signal is approximated with only positive\nspikes \u2013 not shown). 
We find an increased phase-lead when approximating the actual sine-wave kernel\nas opposed to the white-noise modulation, suggesting that perhaps cortical neurons more closely\nencode the former as compared to thalamic neurons.\n\nFigure 4: Sinewave phase-lead. Left: when encoding sine-wave modulated white noise (inset);\nright: encoding the sine-wave signal itself (inset). Average firing rate is computed over 100ms, and\nnormalized to match the sine-wave kernel.\n\nFigure 5: Illustration of frequency filtering with modified decoding kernels. The square boxes show\nthe respective kernels in both time and frequency space. See text for further explanation.\n\n3.3 Signal Frequency Filtering\n\nFor a receiving neuron i to properly interpret a spike-train rj(t) from neuron j, both neurons would\nneed to keep track of past events over extended periods of time: current spikes have to be added\nto or subtracted from the future expectation signal that was already communicated through past\nspikes. The required power-law processes can be implemented in various manners, for instance as\na weighted sum or a cascade of exponential processes [9, 10]. A natural benefit of implementing\npower-law kernels as a weighted sum or cascade of exponentials is that a receiving neuron can carry\nout temporal signal filtering simply by tuning the respective weight parameters for the kernel with\nwhich it decodes spikes into a signal approximation.\nIn figure 5, we illustrate this with power-law kernels that are transformed into high-pass and\nlow-pass filters. 
We first approximated our power-law kernel (2) with a sum of 11 exponentials (depicted\nin the left-center inset). Using this approximation, we encoded the signal (figure 5, center). The\nsignal was then reconstructed from the resultant spikes, using the power-law kernel approximation,\nbut with some zeroed out exponentials (respectively the slowly decaying exponentials for the high-pass\nfilter, and the fast-decaying kernels for the low-pass filter). Figure 5, rightmost, shows the\nresulting filtered signal approximations. Obviously, more elaborate tuning of the decoding kernel\nwith a larger sum of kernels can approximate a vast variety of signal filters.\n\n4 Discussion\n\nTaking advantage of the relationship between power-laws and fractional derivatives, we outlined the\npeculiar fact that a sum of Dirac-\u03b4 functions, when taken as a fractional derivative, corresponds to\na signal in the form of a sum of power-law kernels. Exploiting the obvious link to spiking neural\ncoding, we showed how a simple thresholding spiking neuron can compute a signal approximation\nas a sum of power-law kernels; importantly, such a simple thresholding spiking neuron closely\nfits standard biological spiking neuron models, when the refractory response follows a power-law\ndecay (e.g. [22]). 
We demonstrated the usefulness of such an approximation when encoding slowly\nvarying signals, finding that encoding with power-law kernels significantly outperformed similar but\nexponentially decaying kernels that do not take long-range signal dependencies into account.\nCompared to the work where the firing rate is considered as a fractional derivative, e.g. [10], the\npresent formulation extends the notion of neural coding with fractional derivatives to individual\nspikes, and hence finer temporal variations: each spike effectively encodes very local signal variations,\nwhile also keeping track of long-range variations. The interpretation in [10] of the fractional\nderivative r(t) as a rate leads to a 1:1 relation between the fractional derivative order and the\npower-law decay exponent of adaptation of about 0.2 [10, 13, 9]. For such fractional derivative \u03b1, our\nderivation implies a power-law exponent for the power-law kernels \u03b2 = 1 \u2212 \u03b1 \u2248 0.8, consistent\nwith our sine-wave reconstruction, as well as with recent adapting spiking neuron models [22]. We\nfind that when signals are approximated with non-coupled positive and negative neurons (i.e. one\nneuron encodes the positive part of the signal, the other the negative), such faster-decaying\npower-law kernels encode more efficiently than slower decaying ones. Non-coupled signal encoding\nobviously fares badly when signals rapidly change polarity; this however seems consistent with\nhuman illusory experiences [23].\nAs noted, the singularity of 1/t^\u03b2 power-law kernels means that the initial part of the kernel can only be\napproximated. Here, we initially focused our simulation on the use of long-range power-law kernels\nfor encoding slowly varying signals. 
A more detailed approximation of this initial part of the kernel\nmay be needed to incorporate effects like gain modulation [24, 8], and determine to what extent\nthe power-law kernels already account for this phenomenon. This would also provide a natural link\nto existing neural models of spike-frequency adaptation, e.g. [25], as they are primarily concerned\nwith modeling the spiking neuron behavior rather than the computational aspects.\nWe used a greedy online thresholding process to determine when a neuron would spike to approximate\na signal, in contrast to offline optimization methods that place spikes at optimal times, like\nSmith & Lewicki [26]. The key difference of course is that the latter work is concerned with decoding\na signal, and in effect attempts to determine the effective neural (temporal) filter. As we aimed\nto illustrate in the signal filtering example, these notions are not mutually exclusive: a receiving\nneuron could very well filter the incoming signal with a carefully shaped weighted sum of kernels,\nand then, when the filter is activated, signal the magnitude of the match through fractional spiking.\nPredictive coding seeks to find a careful balance between encoding known information and\nfuture, derived expectations [27]. It does not seem unreasonable to formulate this balance as a\nno-going-back problem, where current computations are projected forward in time, and corrected where\nneeded. In terms of spikes, this would correspond to our assumption that, absent new information,\nno additional spikes need to be fired by a neuron to transmit this forward information.\nThe kernels we find are somewhat in contrast to the kernels found by Bialek et al. [17], where\nthe optimal filter exhibited both a negative and a positive part and no long-range \u201ctail\u201d. 
Several\npractical issues may contribute to this difference, not least the relative absence of low frequency\nvariations, as well as the fact that the signal considered is derived from the \ufb02y\u2019s H1 neurons. These\ntwo neurons have only partially overlapping receptive \ufb01elds, and the separation into positive and\nnegative spikes is thus slightly more intricate. We need to remark though that we see no impediment\nfor the presented signal approximation to be adapted to such situations, or situations where more\nthan two neurons encode fractions of a signal, as in population coding, e.g. [28].\nThe issue of long-range temporal dependencies as discussed here seems to be relatively unappre-\nciated. Long-range power-law dynamics potentially offer a variety of \u201chooks\u201d for computation\nthrough time [9], like for temporal difference learning and relative temporal computations (and pos-\nsibly exploiting spatial and temporal statistical correspondences [29]).\nAcknowledgement: JOR supported by NWO Grant 612.066.826, SMB partly by NWO Grant 639.021.203.\n\n8\n\n\fReferences\n[1] A.L. Fairhall, G.D. Lewen, W. Bialek, and R.R.R. van Steveninck. Multiple timescales of adaptation in a\n\nneural code. In NIPS, volume 13. The MIT Press, 2001.\n\n[2] B. Wark, A. Fairhall, and F. Rieke. Timescales of inference in visual adaptation. Neuron, 61(5):750\u2013761,\n\n2009.\n\n[3] S. Panzeri, N. Brunel, N.K. Logothetis, and C. Kayser. Sensory neural codes using multiplexed temporal\n\nscales. Trends in Neurosciences, page in press, 2010.\n\n[4] B.N. Lundstrom, A.L. Fairhall, and M. Maravall. Multiple Timescale Encoding of Slowly Varying\nWhisker Stimulus Envelope in Cortical and Thalamic Neurons In Vivo. J. of Neurosci, 30(14):50\u201371,\n2010.\n\n[5] JH Van Hateren. Processing of natural time series of intensities by the visual system of the blow\ufb02y. Vision\n\nResearch, 37(23):3407\u20133416, 1997.\n\n[6] N. Brenner, W. Bialek, and R. de Ruyter van Steveninck. 
Adaptive rescaling maximizes information transmission. Neuron, 26(3):695\u2013702, 2000.\n[7] B. Wark, B.N. Lundstrom, and A. Fairhall. Sensory adaptation. Current opinion in neurobiology, 17(4):423\u2013429, 2007.\n[8] M. Famulare and A.L. Fairhall. Feature selection in simple neurons: how coding depends on spiking dynamics. Neural Computation, 22:1\u201318, 2009.\n[9] P.J. Drew and LF Abbott. Models and properties of power-law adaptation in neural systems. Journal of neurophysiology, 96(2):826, 2006.\n[10] B.N. Lundstrom, M.H. Higgs, W.J. Spain, and A.L. Fairhall. Fractional differentiation by neocortical pyramidal neurons. Nature neuroscience, 11(11):1335\u20131342, 2008.\n[11] T. Hosoya, S.A. Baccus, and M. Meister. Dynamic predictive coding by the retina. Nature, 436:71\u201377, 2005.\n[12] G.W. Wornell. Signal processing with fractals: a wavelet based approach. Prentice Hall, NJ, 1999.\n[13] Z. Xu, JR Payne, and ME Nelson. Logarithmic time course of sensory adaptation in electrosensory afferent nerve fibers in a weakly electric fish. Journal of neurophysiology, 76(3):2020, 1996.\n[14] S. Fusi, PJ Drew, and LF Abbott. Cascade models of synaptically stored memories. Neuron, 45:1\u201314, 2005.\n[15] C.M. Bishop. Neural networks for pattern recognition. Oxford University Press, USA, 1995.\n[16] EJ Chichilnisky. A simple white noise analysis of neuronal light responses. Network: Computation in Neural Systems, 12(2):199\u2013213, 2001.\n[17] F. Rieke, D. Warland, and W. Bialek. Spikes: exploring the neural code. The MIT Press, 1999.\n[18] A.L. Fairhall, G.D. Lewen, W. Bialek, and R.R.R. van Steveninck. Efficiency and ambiguity in an adaptive neural code. Nature, 412(6849):787\u2013792, 2001.\n[19] A. Kolmogorov. Wienersche Spiralen und einige andere interessante Kurven im Hilbertschen Raum. Comptes Rendus (Doklady) Academic Sciences USSR (NS), 26:115\u2013118, 1940.\n[20] B.B. Mandelbrot and J.W. 
Van Ness. Fractional Brownian motions, fractional noises and applications. SIAM review, 10(4):422\u2013437, 1968.\n[21] B.N. Lundstrom, M. Famulare, L.B. Sorensen, W.J. Spain, and A.L. Fairhall. Sensitivity of firing rate to input fluctuations depends on time scale separation between fast and slow variables in single neurons. Journal of Computational Neuroscience, 27(2):277\u2013290, 2009.\n[22] C Pozzorini, R Naud, S Mensi, and W Gerstner. Multiple timescales of adaptation in single neuron models. In Front. Comput. Neurosci.: Bernstein Conference on Computational Neuroscience, 2010.\n[23] A A Stocker and E P Simoncelli. Visual motion aftereffects arise from a cascade of two isomorphic adaptation mechanisms. J. Vision, 9(9):1\u201314, 2009.\n[24] S. Hong, B.N. Lundstrom, and A.L. Fairhall. Intrinsic gain modulation and adaptive neural coding. PLoS Computational Biology, 4(7), 2008.\n[25] R. Jolivet, A. Rauch, HR Luescher, and W. Gerstner. Integrate-and-Fire models with adaptation are good enough: predicting spike times under random current injection. NIPS, 18:595\u2013602, 2006.\n[26] E. Smith and M.S. Lewicki. Efficient coding of time-relative structure using spikes. Neural Computation, 17(1):19\u201345, 2005.\n[27] N. Tishby, F.C. Pereira, and W. Bialek. The information bottleneck method. Arxiv physics/0004057, 2000.\n[28] Q.J.M. Huys, R.S. Zemel, R. Natarajan, and P. Dayan. Fast population coding. Neural Computation, 19(2):404\u2013441, 2007.\n[29] O. Schwartz, A. Hsu, and P. Dayan. Space and time in visual context. Nature Rev. Neurosci., 8(11), 2007.\n", "award": [], "sourceid": 790, "authors": [{"given_name": "Jaldert", "family_name": "Rombouts", "institution": null}, {"given_name": "Sander", "family_name": "Bohte", "institution": null}]}