{"title": "Unsupervised Pixel-prediction", "book": "Advances in Neural Information Processing Systems", "page_first": 809, "page_last": 815, "abstract": null, "full_text": "Unsupervised Pixel-prediction \n\nWilliam R. Softky \nMath Research Branch \nNIDDK, NIH \n9190 Wisconsin Ave #350 \nBethesda, MD 20814 \nbill@homer.niddk.nih.gov \n\nAbstract \n\nWhen a sensory system constructs a model of the environment \nfrom its input, it might need to verify the model's accuracy. One \nmethod of verification is multivariate time-series prediction: a good \nmodel could predict the near-future activity of its inputs, much \nas a good scientific theory predicts future data. Such a predicting \nmodel would require copious top-down connections to compare \nthe predictions with the input. That feedback could improve the \nmodel's performance in two ways: by biasing internal activity toward \nexpected patterns, and by generating specific error signals if \nthe predictions fail. A proof-of-concept model (an event-driven, \ncomputationally efficient layered network incorporating \"cortical\" \nfeatures like all-excitatory synapses and local inhibition) was constructed \nto make near-future predictions of a simple, moving stimulus. \nAfter unsupervised learning, the network contained units \ntuned not only to obvious features of the stimulus like contour orientation \nand motion, but also to contour discontinuity (\"end-stopping\") \nand illusory contours. \n\n1 Introduction \n\nSomehow, brains make very accurate models of the outside world from their raw \nsensory input. How might brains check and improve those models? What signal is \nthere to verify a model of the world? \n\nThe scientific method faces a similar problem: how to verify theories. In science, \ntheories are verified by predicting future data, using the implicit assumption that \ngood predictions can only result from good models. 
By analogy, it is possible that \nbrains predict their afferent input (e.g. at the thalamus), and that making such \npredictions and using them as feedback is a unifying design principle of cortex. \nThe proof-of-concept model presented here uses unsupervised Hebbian learning to \npredict, pixel-wise, the location of a moving pattern slightly in the future. \n\nWhy try prediction? \n\n\u2022 Predicting future data usually requires a good generative model. For instance, to \npredict the brightness of individual TV pixels even a fraction of a second in advance, \none would need models of contours, objects, motion, occlusion, shadow, etc. \n\n\u2022 A successful prediction can help filter out input noise, like a Kalman filter. \n\n\u2022 A failed prediction provides a specific, high-dimensional error signal. \n\n\u2022 Prediction is not only possible in cortex (which has massive feedback \nconnections) but necessary as well, because those feedback fibers, their target dendrites, \nand synaptic integration impose inevitable delays. So for a feedback signal \nto arrive at the cell body \"on time,\" it would need to have been generated tens of \nmilliseconds earlier, as a prediction of imminent activity. \n\n\u2022 In this model, \"prediction\" means producing spikes in advance which will correlate \nwith subsequent input spikes. Specifically, the network's goal is to produce at each \ngrid point a train of spikes at times $P_j$ which predicts the input train $I_k$, in the \nsense of maximizing their normalized cross-correlation. 
The objective function L \n(\"likeness\") can be expressed in terms of a smoothing \"bump\" function $B(t_x, t_y)$ \n(of spikes at times $t_x$ and $t_y$) and a correlation function $C(\\mathrm{train}_1, \\mathrm{train}_2, \\Delta t)$: \n\n$$B(t_x, t_y) = \\exp\\left(-\\frac{|t_x - t_y|}{T}\\right)$$ \n\n$$C(P, I, \\Delta t) = \\sum_j \\sum_k B(P_j + \\Delta t, I_k)$$ \n\n$$L(P, I, \\Delta t) = \\frac{C(P, I, \\Delta t)}{\\sqrt{C(P, P, 0)\\, C(I, I, 0)}}$$ \n\n\u2022 In order to avoid a trivial but useless prediction (\"the weather tomorrow will be \njust like today\"), one must ensure that a unit cannot usually predict its own firing \n(for example, pick $\\Delta t \\sim T$ greater than the autocorrelation time of a spike train). \n\n2 Model \n\nThe input to the network is a 16 x 16 array of spike trains, with toroidal array \nboundary conditions. The spikes are driven by a \"stimulus\" bar of excitation one \nunit wide and seven units long, which moves smoothly perpendicular to its orientation \nbehind the array (in a broad circle, so that all orientations and directions \nare represented; Fig. 1A). The stimulus bar transiently generates spikes at each \ngrid point it passes, according to a Poisson process: the whole array of spikes can be \nvisualized as a twinkling, moving contour. \n\nFigure 1: A network predicts dynamic patterns. (A) A moving pattern on \na grid of spiking pixels describes a slow circle, and drives activity in a network \nabove. (B) The three-layer network learns to predict that activity just before it \noccurs. Forward connections, evolving by Hebbian rules, produce top-level units \nwith coarse receptive fields and fine stimulus-tuning (e.g. contour orientation and \nmotion). 
Each spike from a top unit is \"bound\" (by coincidence detection) with \nthe particular spike which triggered it, to produce feedback which is both stimulus-tuned \nand spatially specific. A Hebb rule determines how the delayed, predictive \nfeedback will drive middle-layer units and be compared to input-layer units. Because \nall connections are excitatory, winner-take-all inhibition within local groups of units \nprevents runaway excitation. \n\n2.1 Network Structure \n\nThe network has three layers. The bottom layer contains the spiking pixels and \nthe \"surprise\" units described below. The middle layer, having the same spatial \nresolution as the input, has four coarsely-tuned units per input pixel. The \ntop layer contains the most finely-tuned units, spaced at half the spatial resolution \n(at every fourth gridpoint, i.e. with coarser spatial resolution and larger receptive \nfields). The signal flow is bi-directional [7, 10], with both forward and feedback \nsynaptic connections. All connections between units are excitatory, and excitation \nis kept in check by local winner-take-all inhibition (WTA). For example, a given \ninput spike can only trigger one spike out of the 16 units directly above it in the \ntop layer (Fig. 1B). \n\nUnsupervised learning occurs through two local Hebb-like rules. Forward connections \nevolve to make nearby (competing) units strongly anticorrelated (for instance, \nunits typically become tuned to different contour orientations and directions of \nmotion), while feedback connections evolve to maximally correlate delayed feedback \nsignals with their targets. \n\n2.2 Binary multiplication in single units \n\nWhile some neural models implement multiplication as a nonlinear function of the \nsum of the inputs, the spiking model used here implements multiplication as a \nbinary operation on two distinct classes of synapses. \n\nFigure 2: Multiplicative synapses and surprise detection. (A) A spiking unit \nmultiplies two types of synaptic inputs: the \"helper\" type increments an internal \nbias without triggering a spike, and the \"trigger\" type can trigger a spike (*), \nwithout incrementing, but only if the bias is above a threshold. Spike propagation \nmay be discretely delayed, and coincidences of two units fired by the same input \nspike can be detected. (B) Once the network has generated a (delayed) prediction of \na given pixel's activity, the match of prediction and reality can be tested by special-purpose \nunits: one type which detects unpredicted input, the other which detects \nunfulfilled predictions. The firing of either type can drive the network's learning \nrules, so units above can become tuned to consistent patterns of failed predictions, \nas occur at discontinuities and illusory contours. \n\nA helper synapse, when activated by a presynaptic spike, will increment or decrement \nthe postsynaptic voltage without ever initiating a spike. A trigger synapse, on \nthe other hand, can initiate a spike (if the voltage is above the threshold determined \nby its WTA neighbors), but cannot adjust the voltage (Fig. 2A; the helper type is \nloosely based on the weak, slow NMDA synapses on cortical apical dendrites, while \ntriggers are based on strong, brief AMPA synapses on basal dendrites). Thus, a \nunit can only fire when both synaptic types are active, so the output firing rate \napproximates the product of the rates of helpers and triggers. Each unit has two \ncharacteristic timescales: a slower voltage decay time, and the essentially instantaneous \ntime necessary to trigger and propagate a spike. \n\nThis scheme has two advantages. 
One is that a single cell can implement a relatively \n\"pure\" multiplication of distinct inputs, as required for computations like motion-detection. \nThe other advantage is that feedback signals, restricted to only helper \nsynapses, cannot by themselves drive a cell, so closed positive-feedback loops cannot \n\"latch\" the network into a fixed state, independent of the input. Therefore, all \ntrigger synapses in this network are forward, while all delayed, lateral, and feedback \nconnections are of the helper type. \n\n2.3 Feedback \n\nThere are two issues in feedback: how to construct tuned, specific feedback, and \nwhat to do with the feedback where it arrives. \n\nAn accurate prediction requires information about the input: both about its exact \npresent state, and about its history over nearby space and recent time. In this model, \nthose signals are distinct: spatial and temporal specificity is given by each input \nspike, and the spatio-temporal history is given by the stimulus-tuned responses of \nthe slow, coarse-grained units in the top layer. Spatially-precise feedback requires \nrecombining those signals. (Feedback from V1 cortical Layer VI to thalamus has \nrecently been shown to fit these criteria, being both spatially refined and direction-selective; \n[3] Grieve & Sillito, 1995). \n\nIn this network, each feedback signal results from the AND of an input-layer \nspike (spatially specific) and the resulting top-layer spike it produces (stimulus-tuned). \nThis \"binding\" across levels of specificity requires single-spike temporal \nprecision, and may even be one of the perceptual uses for spike timing in cortex \n[1, 9]. \n\n2.4 Surprise detection \n\nOnce predictive feedback is learned, it can be used in two ways: biasing units toward \nexpected activity, and comparing predictions against actual input. 
Feedback to the \nmiddle layer is used as a bias signal: it arrives through helper synapses and adds \nto each unit's internal bias. But feedback to the bottom (input) layer is compared with actual \ninput by means of special \"surprise\" units which subtract prediction from input \n(and vice versa). \n\nBecause both prediction and input are noisy signals, their difference is even noisier, \nand must be both temporally smoothed and thresholded to generate a mismatch \nspike. In this model, these prediction/input differences are computed pixel-by-pixel \nusing ad-hoc units designed for the purpose (Fig. 2B). There is no indication \nthat cortex operates so simplistically, but there are indications that cortical cells \nare in general sensitive to mismatches between expectation and reality, such as \ndiscontinuities in space (edges), in time (on- and off-responses), and in context \n(saliency). \n\nThe resulting error vector can drive upper-layer units just as the input does, so that \nthe network can learn patterns of failed predictions, which typically correspond to \ndiscontinuities in the stimulus. Learning consistent patterns of bad predictions is \na completely generic recipe for discovering such discontinuities, which often correspond \nclosely to visually important features like contour ends, corners, illusory \ncontours, and occlusion. \n\n3 Results and Discussion \n\nAfter prolonged exposure to the stimulus, the network produces a blurred cloud of \nspikes which anticipates the actual input spikes, but which also consistently predicts \ninput beyond the bar's ends (leading to small clouds of surprise-unit activity tracking \nthe ends). The top-level units, driven both by input signals and by feedback, \nbecome tuned either to different motions of the bar itself (due to Hebbian learning \nof the input), or to different motions of its ends (due to Hebbian learning of the \nsurprise-units); see Fig. 3. 
Cells tuned to contour ends (\"end-stopped\") have been \nfound in visual cortex [11], although the principles of their genesis are not known. \nUsing the same parameters but a different stimulus, the network can also evolve units \nwhich detect the illusory contours present in certain moving gratings. \n\nFigure 3: Single units are highly stimulus-specific. Spikes from all units at \none location are shown (with time) as a stimulus bar (insets) passes them with six \ndifferent relative positions and motions. Out of the many units available, only one \nor two are active in each layer for a given stimulus configuration. The inactive \nunits are tuned to stimulus orientations not shown here. Some units are driven by \n\"surprise\" units (Figure 2 and text), and respond only to the bar's ends (. and x), \nbut not to its center (+). Such responses lag behind those of ordinary units, because \nthey must temporally integrate to determine whether a significant mismatch exists \nbetween the noisy prediction and the noisy input. Spikes from five passes have been \nsummed to show the units' reliability. \n\nSeveral researchers propose that cortex (or similar networks) might use feedback \npathways to recreate or regenerate their (static) input [4, 7, 10]. The approach here \nrequires instead that the network forecast future (dynamic) input [8]. 
In a general \nsense, predicting the future is a better test of a model than predicting the present, \njust as scientific theories which predict future experimental data are \nmore persuasive than theories which predict existing data. Prediction of the raw \ninput has advantages over prediction of some higher-level signal [2, 5, 6]: the raw \ninput is the only unprocessed \"reality\" available to the network, and comparing the \nprediction with that raw input yields the highest-dimensional error vector possible. \n\nSpiking networks are likewise useful. As in cortex, spikes both truncate small inputs \nand contaminate them with quantization noise, crucial practical problems which \nreal-valued networks avoid. Spike-driven units can implement purely correlative \ncomputations like motion-detection, and can avoid parasitic positive-feedback loops. \nSpike timing can identify which of many possible inputs fired a given unit, thereby \nmaking possible a more specific feedback signal. The most practical benefit is that \ninteractions among rare events (like spikes) are much faster to compute than real-valued \nones; this particular network of 8000 units and 200,000 synapses runs faster \nthan the workstation can display it. \n\nThis model is an ad-hoc network to illustrate some of the issues a brain might face \nin trying to predict its retinal inputs; it is not a model of cortex. Unfortunately, the \nhypothesis that cortex predicts its own inputs does not suggest any specific circuit \nor model to test. But two experimental tests may be sufficiently model-independent. \nOne is that cortical \"non-classical\" receptive fields should have a temporal structure \nwhich reflects the temporal sequences of natural stimuli, so a given cell's activity will \nbe either enhanced or suppressed when its input matches contextual expectations. 
\nAnother is that feedback to a single cell in thalamus, or to an individual cortical \napical dendrite, should arrive on average earlier than afferent input to the same \ncell. \n\nReferences \n\n[1] A. Engel, P. Koenig, A. Kreiter, T. Schillen, and W. Singer. Temporal coding \nin the visual cortex: New vistas on integration in the nervous system. TINS, \n15:218-226, 1992. \n\n[2] K. Fielding and D. Ruck. Recognition of moving light displays using hidden \nMarkov models. Pattern Recognition, 28:1415-1421, 1995. \n\n[3] K. L. Grieve and A. M. Sillito. Differential properties of cells in the feline \nprimary visual cortex providing the corticofugal feedback to the lateral geniculate \nnucleus and visual claustrum. J. Neurosci., 15:4868-4874, 1995. \n\n[4] G. Hinton, P. Dayan, B. Frey, and R. Neal. The wake-sleep algorithm for \nunsupervised neural networks. Science, 268:1158-1161, 1995. \n\n[5] P. R. Montague and T. Sejnowski. The predictive brain: Temporal coincidence \nand temporal order in synaptic learning mechanisms. Learning and Memory, \n1:1-33, 1994. \n\n[6] P. R. Montague, P. Dayan, C. Person, and T. Sejnowski. Bee \nforaging in uncertain environments using predictive Hebbian learning. Nature, \n377:725-728, 1995. \n\n[7] D. Mumford. Neuronal architectures for pattern-theoretic problems. In C. Koch \nand J. Davis, editors, Large-scale theories of the cortex, pages 125-152. MIT \nPress, 1994. \n\n[8] W. Softky. Could time-series prediction assist visual processing? Soc. Neurosci. \nAbstracts, 21:1499, 1995. \n\n[9] W. Softky. Simple codes vs. efficient codes. Current Opinion in Neurobiology, \n5:239-247, 1995. \n\n[10] S. Ullman. Sequence-seeking and counterstreams: a model for bidirectional \ninformation flow in cortex. In C. Koch and J. Davis, editors, Large-scale theories \nof the cortex, pages 257-270. MIT Press, 1994. \n\n[11] S. Zucker, A. Dobbins, and L. Iverson. 
Two stages of curve detection suggest \ntwo styles of visual computation. Neural Computation, 1:68-81, 1989. \n", "award": [], "sourceid": 1083, "authors": [{"given_name": "William", "family_name": "Softky", "institution": null}]}