{"title": "Dendritic cortical microcircuits approximate the backpropagation algorithm", "book": "Advances in Neural Information Processing Systems", "page_first": 8721, "page_last": 8732, "abstract": "Deep learning has seen remarkable developments over the last years, many of them inspired by neuroscience. However, the main learning mechanism behind these advances \u2013 error backpropagation \u2013 appears to be at odds with neurobiology. Here, we introduce a multilayer neuronal network model with simplified dendritic compartments in which error-driven synaptic plasticity adapts the network towards a global desired output. In contrast to previous work our model does not require separate phases and synaptic learning is driven by local dendritic prediction errors continuously in time. Such errors originate at apical dendrites and occur due to a mismatch between predictive input from lateral interneurons and activity from actual top-down feedback. Through the use of simple dendritic compartments and different cell-types our model can represent both error and normal activity within a pyramidal neuron. We demonstrate the learning capabilities of the model in regression and classification tasks, and show analytically that it approximates the error backpropagation algorithm. Moreover, our framework is consistent with recent observations of learning between brain areas and the architecture of cortical microcircuits. 
Overall, we introduce a novel view of learning on dendritic cortical circuits and on how the brain may solve the long-standing synaptic credit assignment problem.", "full_text": "Dendritic cortical microcircuits approximate the backpropagation algorithm\n\nJo\u00e3o Sacramento*\nDepartment of Physiology\nUniversity of Bern, Switzerland\nsacramento@pyl.unibe.ch\n\nYoshua Bengio\u2021\nMila and Universit\u00e9 de Montr\u00e9al, Canada\nyoshua.bengio@mila.quebec\n\nRui Ponte Costa\u2020\nDepartment of Physiology\nUniversity of Bern, Switzerland\ncosta@pyl.unibe.ch\n\nWalter Senn\nDepartment of Physiology\nUniversity of Bern, Switzerland\nsenn@pyl.unibe.ch\n\nAbstract\n\nDeep learning has seen remarkable developments over the last years, many of them inspired by neuroscience. However, the main learning mechanism behind these advances \u2013 error backpropagation \u2013 appears to be at odds with neurobiology. Here, we introduce a multilayer neuronal network model with simplified dendritic compartments in which error-driven synaptic plasticity adapts the network towards a global desired output. In contrast to previous work our model does not require separate phases and synaptic learning is driven by local dendritic prediction errors continuously in time. Such errors originate at apical dendrites and occur due to a mismatch between predictive input from lateral interneurons and activity from actual top-down feedback. Through the use of simple dendritic compartments and different cell-types our model can represent both error and normal activity within a pyramidal neuron. We demonstrate the learning capabilities of the model in regression and classification tasks, and show analytically that it approximates the error backpropagation algorithm. Moreover, our framework is consistent with recent observations of learning between brain areas and the architecture of cortical microcircuits. 
Overall, we introduce a novel view of learning on dendritic cortical circuits and on how the brain may solve the long-standing synaptic credit assignment problem.\n\n1 Introduction\n\nMachine learning is going through remarkable developments powered by deep neural networks (LeCun et al., 2015). Interestingly, the workhorse of deep learning is still the classical backpropagation of errors algorithm (backprop; Rumelhart et al., 1986), which has long been dismissed in neuroscience on the grounds of biological implausibility (Grossberg, 1987; Crick, 1989). Irrespective of such concerns, growing evidence demonstrates that deep neural networks outperform alternative frameworks in accurately reproducing activity patterns observed in the cortex (Lillicrap and Scott, 2013; Yamins et al., 2014; Khaligh-Razavi and Kriegeskorte, 2014; Yamins and DiCarlo, 2016; Kell et al., 2018). Although recent developments have started to bridge the gap between neuroscience and artificial intelligence (Marblestone et al., 2016; Lillicrap et al., 2016; Scellier and Bengio, 2017; Costa et al., 2017; Guerguiev et al., 2017), how the brain could implement a backprop-like algorithm remains an open question.\n\n*Present address: Institute of Neuroinformatics, University of Z\u00fcrich and ETH Z\u00fcrich, Z\u00fcrich, Switzerland\n\u2020Present address: Computational Neuroscience Unit, Department of Computer Science, SCEEM, Faculty of Engineering, University of Bristol, United Kingdom\n\u2021CIFAR Senior Fellow\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00e9al, Canada.\n\nIn neuroscience, understanding how the brain learns to associate different areas (e.g., visual and motor cortices) to successfully drive behaviour is of fundamental importance (Petreanu et al., 2012; Manita et al., 2015; Makino and Komiyama, 2015; Poort et al., 2015; Fu et al., 2015; Pakan et al., 2016; Zmarz and Keller, 2016; Attinger et al., 2017). 
However, how to correctly modify synapses to achieve this has puzzled neuroscientists for decades. This is often referred to as the synaptic credit assignment problem (Rumelhart et al., 1986; Sutton and Barto, 1998; Roelfsema and van Ooyen, 2005; Friedrich et al., 2011; Bengio, 2014; Lee et al., 2015; Roelfsema and Holtmaat, 2018), for which the backprop algorithm provides an elegant solution.\n\nHere we propose that the prediction errors that drive learning in backprop are encoded at distal dendrites of pyramidal neurons, which receive top-down input from downstream brain areas (Petreanu et al., 2009; Larkum, 2013); we interpret a brain area as being equivalent to a layer in machine learning. In our model, these errors arise because lateral input from local interneurons (e.g., somatostatin-expressing, SST, interneurons) cannot exactly match the top-down feedback from downstream cortical areas. Learning of bottom-up connections (i.e., feedforward weights) is driven by such error signals through local synaptic plasticity. Therefore, in contrast to previous approaches (Marblestone et al., 2016), in our framework a given neuron is used simultaneously for activity propagation (at the somatic level), error encoding (at distal dendrites) and error propagation to the soma, without the need for separate phases.\n\nWe first illustrate the different components of the model. Then, we show analytically that under certain conditions learning in our network approximates backpropagation. Finally, we empirically evaluate the performance of the model on nonlinear regression and recognition tasks.\n\n2 Error-encoding dendritic cortical microcircuits\n\n2.1 Neuron and network model\n\nBuilding upon previous work (Urbanczik and Senn, 2014), we adopt a simplified multicompartment neuron model and describe pyramidal neurons as three-compartment units (schematically depicted in Fig. 1A). 
These compartments represent the somatic, basal and apical integration zones that characteristically define neocortical pyramidal cells (Spruston, 2008; Larkum, 2013). The dendritic structure of the model is exploited by having bottom-up and top-down synapses converge onto separate dendritic compartments (basal and distal dendrites, respectively), a first approximation in line with experimental observations (Spruston, 2008) and reflecting the preferred connectivity patterns of cortico-cortical projections (Larkum, 2013).\n\nConsistent with the connectivity of SST interneurons (Urban-Ciecko and Barth, 2016), we also introduce a second population of cells within each hidden layer with both lateral and cross-layer connectivity, whose role is to cancel the top-down input so as to leave only the backpropagated errors as apical dendrite activity. Modelled as two-compartment units (depicted in red, Fig. 1A), such interneurons are predominantly driven by pyramidal cells within the same layer through weights $\\mathbf{W}^{IP}_{k,k}$, and they project back to the apical dendrites of the same-layer pyramidal cells through weights $\\mathbf{W}^{PI}_{k,k}$ (Fig. 1A). Additionally, cross-layer feedback onto SST cells originating at the next upper layer $k+1$ provides a weak nudging signal for these interneurons, modelled after Urbanczik and Senn (2014) as a conductance-based somatic input current. We modelled this weak top-down nudging on a one-to-one basis: each interneuron is nudged towards the potential of a corresponding upper-layer pyramidal cell. 
Although the one-to-one connectivity imposes a restriction on the model architecture, it is to a certain degree in accordance with recent monosynaptic input mapping experiments showing that SST cells do receive top-down projections (Leinweber et al., 2017), which according to our proposal may encode the weak interneuron \u2018teaching\u2019 signals from higher to lower brain areas.\n\nThe somatic membrane potentials of pyramidal neurons and interneurons evolve in time according to\n\n$$\\frac{d}{dt}\\mathbf{u}^{P}_{k}(t) = -g_{lk}\\,\\mathbf{u}^{P}_{k}(t) + g_{B}\\big(\\mathbf{v}^{P}_{B,k}(t) - \\mathbf{u}^{P}_{k}(t)\\big) + g_{A}\\big(\\mathbf{v}^{P}_{A,k}(t) - \\mathbf{u}^{P}_{k}(t)\\big) + \\sigma\\,\\boldsymbol{\\xi}(t), \\tag{1}$$\n\n$$\\frac{d}{dt}\\mathbf{u}^{I}_{k}(t) = -g_{lk}\\,\\mathbf{u}^{I}_{k}(t) + g_{D}\\big(\\mathbf{v}^{I}_{k}(t) - \\mathbf{u}^{I}_{k}(t)\\big) + \\mathbf{i}^{I}_{k}(t) + \\sigma\\,\\boldsymbol{\\xi}(t), \\tag{2}$$\n\nwith one such pair of dynamical equations for every hidden layer $0 < k < N$; input layer neurons are indexed by $k = 0$, the $g$\u2019s are fixed conductances, and $\\sigma$ controls the amount of injected noise. Basal and apical dendritic compartments of pyramidal cells are coupled to the soma with effective transfer conductances $g_{B}$ and $g_{A}$, respectively. Subscript $lk$ is for leak, $A$ for apical, $B$ for basal and $D$ for dendritic; superscript $I$ is for inhibitory and $P$ for pyramidal neuron. Eqs. 1 and 2 describe standard conductance-based voltage integration dynamics, having set membrane capacitance to unity and resting potential to zero for clarity. Background activity is modelled as a Gaussian white noise input, $\\boldsymbol{\\xi}$ in the equations above. To keep the exposition brief we use matrix notation, and denote by $\\mathbf{u}^{P}_{k}$ and $\\mathbf{u}^{I}_{k}$ the vectors of pyramidal and interneuron somatic voltages, respectively. Both matrices and vectors, assumed column vectors by default, are typed in boldface here and throughout. 
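As a minimal sketch of how Eqs. 1 and 2 can be simulated, the Python code below integrates a somatic potential coupled to dendritic compartments with a forward Euler scheme (Euler-Maruyama when noise is switched on). The helper names, conductance values and step size are hypothetical choices for illustration, not the parameters given in the SM. The dendritic potentials are also held fixed here, whereas in the full network they are recomputed from other layers.

```python
import numpy as np

def integrate_soma(u0, dendrites, current=0.0, g_lk=0.1, sigma=0.0,
                   dt=0.05, steps=4000, seed=0):
    # Forward-Euler integration of a somatic potential, following Eqs. 1-2.
    # dendrites: list of (conductance, potential) pairs, e.g.
    #   [(g_B, v_B), (g_A, v_A)] for a pyramidal cell (Eq. 1) or
    #   [(g_D, v_I)] for an interneuron (Eq. 2).
    # current: additional somatic input, e.g. the teaching current i^I_k.
    # Dendritic potentials are held fixed (a simplification for illustration).
    rng = np.random.default_rng(seed)
    u = np.array(u0, dtype=float)
    for _ in range(steps):
        drift = -g_lk * u + current
        for g, v in dendrites:
            drift = drift + g * (v - u)
        # White noise enters with a sqrt(dt) scaling (Euler-Maruyama).
        noise = sigma * np.sqrt(dt) * rng.standard_normal(u.shape)
        u = u + dt * drift + noise
    return u

def soma_fixed_point(dendrites, current=0.0, g_lk=0.1):
    # Noise-free steady state: conductance-weighted average of the inputs.
    numerator = current + sum(g * v for g, v in dendrites)
    denominator = g_lk + sum(g for g, _ in dendrites)
    return numerator / denominator

# Hypothetical pyramidal-cell example with fixed basal and apical inputs.
g_B, g_A = 1.0, 0.8
v_B = np.array([0.5, -1.0, 2.0])
v_A = np.array([0.1, 0.0, -0.3])
u = integrate_soma(np.zeros(3), [(g_B, v_B), (g_A, v_A)])
print(u, soma_fixed_point([(g_B, v_B), (g_A, v_A)]))
```

With the noise switched off, the soma settles at the conductance-weighted average of its input compartments. This is why basal predictions later appear scaled by attenuation factors such as g_B/(g_lk+g_B+g_A).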
Dendritic compartmental potentials are denoted by $\\mathbf{v}$ and are given in instantaneous form by\n\n$$\\mathbf{v}^{P}_{B,k}(t) = \\mathbf{W}^{PP}_{k,k-1}\\,\\phi\\big(\\mathbf{u}^{P}_{k-1}(t)\\big), \\tag{3}$$\n\n$$\\mathbf{v}^{P}_{A,k}(t) = \\mathbf{W}^{PP}_{k,k+1}\\,\\phi\\big(\\mathbf{u}^{P}_{k+1}(t)\\big) + \\mathbf{W}^{PI}_{k,k}\\,\\phi\\big(\\mathbf{u}^{I}_{k}(t)\\big), \\tag{4}$$\n\nwhere $\\phi(\\mathbf{u})$ is the neuronal transfer function, which acts componentwise on $\\mathbf{u}$.\n\nFigure 1: Learning in an error-encoding dendritic microcircuit network. (A) Schematic of the network with pyramidal cells and lateral inhibitory interneurons. Starting from a self-predicting state (see main text and supplementary material, SM), when a novel teaching (or associative) signal is presented at the output layer ($\\mathbf{u}^{trgt}_{2}$), a prediction error is generated in the apical compartments of pyramidal neurons in the upstream layer (layer 1, \u2018error\u2019). This error appears as an apical voltage deflection that propagates down to the soma (purple arrow), where it modulates the somatic firing rate, which in turn leads to plasticity at bottom-up synapses (bottom, green). (B) Activity traces in the microcircuit before and after a new teaching signal is learned. (i) Before learning: a new teaching signal is presented ($\\mathbf{u}^{trgt}_{2}$), which triggers a mismatch between the top-down feedback (grey blue) and the cancellation given by the lateral interneurons (red). (ii) After learning (with plasticity at the bottom-up synapses $\\mathbf{W}^{PP}_{1,0}$), the network successfully predicts the new teaching signal, reflected in the absence of a distal \u2018error\u2019 (top-down and lateral interneuron inputs cancel each other). (C) Interneurons learn to predict the backpropagated activity (i), while simultaneously silencing the apical compartment (ii), even though the pyramidal neurons remain active (not shown).\n\nFor simplicity, we reduce pyramidal output neurons to two-compartment cells: the apical compartment is absent ($g_{A} = 0$ in Eq. 1) and basal voltages are as defined in Eq. 3. 
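The connectivity and the instantaneous dendritic potentials of Eqs. 3 and 4 can be sketched in a few lines of Python. The dictionary layout, the uniform initialization and the logistic transfer function are hypothetical choices for illustration. One interneuron is allocated per pyramidal cell of the next layer, which matches the one-to-one nudging described above. The final lines show, as an illustrative configuration, how the two apical inputs cancel exactly when interneurons copy the upper layer and the interneuron-to-apical weights mirror the top-down weights.

```python
import numpy as np

def logistic(u):
    return 1.0 / (1.0 + np.exp(-u))

def make_weights(sizes, rng):
    # sizes = [n_0, ..., n_N]. Hidden layer k (0 < k < N) holds sizes[k]
    # pyramidal cells and sizes[k + 1] interneurons (one per upper-layer cell).
    N = len(sizes) - 1
    W = {}
    for k in range(1, N + 1):
        W['PP', k, k - 1] = rng.uniform(-1, 1, (sizes[k], sizes[k - 1]))  # bottom-up
    for k in range(1, N):
        W['PP', k, k + 1] = rng.uniform(-1, 1, (sizes[k], sizes[k + 1]))  # top-down
        W['IP', k, k] = rng.uniform(-1, 1, (sizes[k + 1], sizes[k]))      # pyramidal to interneuron
        W['PI', k, k] = rng.uniform(-1, 1, (sizes[k], sizes[k + 1]))      # interneuron to apical
    return W

def dendritic_potentials(W, u_pyr, u_int, k, phi=logistic):
    # Eq. 3: basal potential driven by the layer below.
    v_basal = W['PP', k, k - 1] @ phi(u_pyr[k - 1])
    # Eq. 4: apical potential = top-down feedback + lateral interneuron input.
    v_apical = W['PP', k, k + 1] @ phi(u_pyr[k + 1]) + W['PI', k, k] @ phi(u_int[k])
    return v_basal, v_apical

rng = np.random.default_rng(0)
sizes = [4, 6, 3]
W = make_weights(sizes, rng)
u_pyr = [rng.normal(size=n) for n in sizes]
u_int = {1: rng.normal(size=sizes[2])}
v_B, v_A = dendritic_potentials(W, u_pyr, u_int, k=1)

# Illustrative cancellation: interneurons copy layer 2, W_PI mirrors top-down.
W['PI', 1, 1] = -W['PP', 1, 2]
_, v_A_cancel = dendritic_potentials(W, u_pyr, {1: u_pyr[2].copy()}, k=1)
print(v_B.shape, v_A.shape, np.abs(v_A_cancel).max())
```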
Although the design can be extended to more complex morphologies, in the framework of dendritic predictive plasticity two compartments suffice to compare the desired target with the actual prediction. Synapses proximal to the soma of output neurons provide direct external teaching input, incorporated as an additional source of current $\\mathbf{i}^{P}_{N}$. In practice, one can simply set $\\mathbf{i}^{P}_{N} = g_{som}\\big(\\mathbf{u}^{trgt}_{N} - \\mathbf{u}^{P}_{N}\\big)$, with some fixed somatic nudging conductance $g_{som}$. This can be modelled closer to biology by explicitly setting the somatic excitatory and inhibitory conductance-based inputs (Urbanczik and Senn, 2014). For a given output neuron, $i^{P}_{N}(t) = g^{P}_{exc,N}(t)\\,\\big(E_{exc} - u^{P}_{N}(t)\\big) + g^{P}_{inh,N}(t)\\,\\big(E_{inh} - u^{P}_{N}(t)\\big)$, where $E_{exc}$ and $E_{inh}$ are the excitatory and inhibitory synaptic reversal potentials, respectively. The inputs are balanced according to\n\n$$g^{P}_{exc,N} = g_{som}\\,\\frac{u^{trgt}_{N} - E_{inh}}{E_{exc} - E_{inh}}, \\qquad g^{P}_{inh,N} = -g_{som}\\,\\frac{u^{trgt}_{N} - E_{exc}}{E_{exc} - E_{inh}}.$$\n\nThe point at which no current flows, $i^{P}_{N} = 0$, defines the target teaching voltage $u^{trgt}_{N}$ towards which the neuron is nudged\u2074.\n\n[Figure 1 graphic: (A) network schematic with sensory input (layer 0), hidden layer 1 and output layer 2; (B) apical potential, sensory input and output traces over time (ms), before learning (i) and after plasticity (ii); (C) apical potential norm and ||pyr_{k+1} - int_k||^2 over time (ms).]\n\nInterneurons are similarly modelled as two-compartment cells, cf. Eq. 2. Lateral dendritic projections from neighboring pyramidal neurons provide the main source of input as\n\n$$\\mathbf{v}^{I}_{k}(t) = \\mathbf{W}^{IP}_{k,k}\\,\\phi\\big(\\mathbf{u}^{P}_{k}(t)\\big), \\tag{5}$$\n\nwhereas cross-layer, top-down synapses define the teaching current $\\mathbf{i}^{I}_{k}$. 
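The balance condition can be checked numerically. Splitting a total conductance g_som into the excitatory and inhibitory parts given above reproduces the simplified current g_som(u_trgt - u) exactly. The reversal potentials, conductance and target below are hypothetical values in arbitrary units, and the function names are illustrative.

```python
import numpy as np

def balanced_conductances(u_trgt, g_som, E_exc, E_inh):
    # Excitatory/inhibitory split of g_som whose zero-current point is u_trgt.
    g_exc = g_som * (u_trgt - E_inh) / (E_exc - E_inh)
    g_inh = -g_som * (u_trgt - E_exc) / (E_exc - E_inh)
    return g_exc, g_inh

def teaching_current(u, g_exc, g_inh, E_exc, E_inh):
    # Conductance-based somatic input: g_exc (E_exc - u) + g_inh (E_inh - u).
    return g_exc * (E_exc - u) + g_inh * (E_inh - u)

E_exc, E_inh = 5.0, -1.0       # hypothetical reversal potentials
g_som, u_trgt = 0.8, 1.2       # hypothetical nudging conductance and target
g_exc, g_inh = balanced_conductances(u_trgt, g_som, E_exc, E_inh)
u = np.linspace(-0.5, 3.0, 8)  # a range of somatic potentials
i_full = teaching_current(u, g_exc, g_inh, E_exc, E_inh)
i_simple = g_som * (u_trgt - u)
print(g_exc + g_inh, np.max(np.abs(i_full - i_simple)))
```

Both conductances stay non-negative as long as the target lies between the two reversal potentials. The same construction, with the upper-layer pyramidal potential as target, provides the interneuron teaching current.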
This means that an interneuron at layer $k$ permanently (i.e., when learning or performing a task) receives balanced somatic teaching excitatory and inhibitory input from a pyramidal neuron at layer $k+1$ on a one-to-one basis (as above, but with $\\mathbf{u}^{P}_{k+1}$ as target). With this setting, the interneuron is nudged to follow the corresponding next-layer pyramidal neuron. See SM for detailed parameters.\n\n2.2 Synaptic learning rules\n\nThe synaptic learning rules we use belong to the class of dendritic predictive plasticity rules (Urbanczik and Senn, 2014; Spicher et al., 2018) that can be expressed in the general form\n\n$$\\frac{d}{dt}w = \\eta\\,\\big(\\phi(u) - \\phi(v)\\big)\\,r, \\tag{6}$$\n\nwhere $w$ is an individual synaptic weight, $\\eta$ is a learning rate, $u$ and $v$ denote distinct compartmental potentials, $\\phi$ is a rate function, and $r$ is the presynaptic input. Eq. 6 was originally derived with the aim of reducing the prediction error of somatic spiking, where $u$ represents the somatic potential and $v$ is a function of the postsynaptic dendritic potential.\n\nIn our model the plasticity rules for the various connection types are:\n\n$$\\frac{d}{dt}\\mathbf{W}^{PP}_{k,k-1} = \\eta^{PP}_{k,k-1}\\big(\\phi(\\mathbf{u}^{P}_{k}) - \\phi(\\hat{\\mathbf{v}}^{P}_{B,k})\\big)\\big(\\mathbf{r}^{P}_{k-1}\\big)^{T}, \\tag{7}$$\n\n$$\\frac{d}{dt}\\mathbf{W}^{IP}_{k,k} = \\eta^{IP}_{k,k}\\big(\\phi(\\mathbf{u}^{I}_{k}) - \\phi(\\hat{\\mathbf{v}}^{I}_{k})\\big)\\big(\\mathbf{r}^{P}_{k}\\big)^{T}, \\tag{8}$$\n\n$$\\frac{d}{dt}\\mathbf{W}^{PI}_{k,k} = \\eta^{PI}_{k,k}\\big(\\mathbf{v}_{rest} - \\mathbf{v}^{P}_{A,k}\\big)\\big(\\mathbf{r}^{I}_{k}\\big)^{T}, \\tag{9}$$\n\nwhere $(\\cdot)^{T}$ denotes vector transpose and $\\mathbf{r}_{k} \\equiv \\phi(\\mathbf{u}_{k})$ the layer $k$ firing rates. The synaptic weights evolve according to the product of dendritic prediction error and presynaptic rate, and can undergo either potentiation or depression depending on the sign of the first factor (i.e., the prediction error). For basal synapses, this prediction error factor amounts to a difference between the postsynaptic rate and a local dendritic estimate that depends on the branch potential. In Eqs. 7 and 8, $\\hat{\\mathbf{v}}^{P}_{B,k} = \\frac{g_{B}}{g_{lk}+g_{B}+g_{A}}\\,\\mathbf{v}^{P}_{B,k}$ and $\\hat{\\mathbf{v}}^{I}_{k} = \\frac{g_{D}}{g_{lk}+g_{D}}\\,\\mathbf{v}^{I}_{k}$ take into account the dendritic attenuation factors of the different compartments. On the other hand, the plasticity rule (9) of lateral interneuron-to-pyramidal synapses aims to silence the apical compartment, i.e., to bring it to the resting potential $\\mathbf{v}_{rest} = 0$ (set to zero here and throughout for simplicity); this introduces an attractive state for learning where the contribution from interneurons balances (or cancels out) the top-down dendritic input. This learning rule of apical-targeting interneuron synapses can be thought of as a dendritic variant of the homeostatic inhibitory plasticity proposed by Vogels et al. (2011) and Luz and Shamir (2012).\n\nIn experiments where the top-down connections are plastic, the weights evolve according to\n\n$$\\frac{d}{dt}\\mathbf{W}^{PP}_{k,k+1} = \\eta^{PP}_{k,k+1}\\big(\\phi(\\mathbf{u}^{P}_{k}) - \\phi(\\hat{\\mathbf{v}}^{P}_{TD,k})\\big)\\big(\\mathbf{r}^{P}_{k+1}\\big)^{T}, \\tag{10}$$\n\nwith $\\hat{\\mathbf{v}}^{P}_{TD,k} = \\mathbf{W}^{PP}_{k,k+1}\\,\\mathbf{r}^{P}_{k+1}$. An implementation of this rule requires a subdivision of the apical compartment into a distal part receiving the top-down input (with voltage $\\hat{\\mathbf{v}}^{P}_{TD,k}$) and another distal compartment receiving the lateral input from the interneurons (with voltage $\\mathbf{v}^{P}_{A,k}$).\n\n\u2074Note that in biology a target may be represented by an associative signal from the motor cortex to a sensory cortex (Attinger et al., 2017).\n\n2.3 Comparison to previous work\n\nIt has been suggested that error backpropagation could be approximated by an algorithm that requires alternating between two learning phases, known as contrastive Hebbian learning (Ackley et al., 1985). This link between the two algorithms was first established for an unsupervised learning task (Hinton and McClelland, 1988) and later analyzed (Xie and Seung, 2003) and generalized to broader classes of models (O\u2019Reilly, 1996; Scellier and Bengio, 2017).\n\nThe idea of apical dendrites as distinct integration zones, and the suggestion that this could simplify the implementation of backprop, were previously put forward by K\u00f6rding and K\u00f6nig (2000, 2001). Our microcircuit design builds upon this view, offering a concrete mechanism that enables apical error encoding. In a similar spirit, two-phase learning recently reappeared in a study that exploits dendrites for deep learning with biological neurons (Guerguiev et al., 2017). In this more recent work, the temporal difference between the activity of the apical dendrite in the presence and in the absence of the teaching input represents the error that induces plasticity at the forward synapses. This difference is used directly for learning the bottom-up synapses without influencing the somatic activity of the pyramidal cell. In contrast, we postulate that the apical dendrite has an explicit error representation by simultaneously integrating top-down excitation and lateral inhibition. 
As a consequence, we do not need to postulate separate temporal phases, and our network operates continuously while plasticity at all synapses is always turned on.\n\nError minimization is an integral part of brain function according to predictive coding theories (Rao and Ballard, 1999; Friston, 2005). Interestingly, recent work has shown that backprop can be mapped onto a predictive coding network architecture (Whittington and Bogacz, 2017), related to the general framework introduced by LeCun (1988). Whittington and Bogacz (2017) suggest a possible network implementation that requires intricate circuitry with appropriately tuned error-representing neurons. According to this work, the only plastic synapses are those that connect prediction and error neurons. By contrast, in our model, lateral, bottom-up and top-down connections are all plastic, and errors are directly encoded in dendritic compartments.\n\n3 Results\n\n3.1 Learning in dendritic error networks approximates backprop\n\nIn our model, neurons implicitly carry and transmit errors across the network. In the supplementary material, we formally show such propagation of errors for networks in a particular regime, which we term self-predicting. In a self-predicting net, when no external target is provided to output layer neurons, the lateral input from interneurons cancels the internally generated top-down feedback and renders apical dendrites silent. In this case, the output becomes a feedforward function of the input, which can in theory be optimized by conventional backprop. We demonstrate that synaptic plasticity in self-predicting nets approximates the weight changes prescribed by backprop.\n\nWe summarize below the main points of the full analysis (see SM). 
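To make the self-predicting regime concrete, the sketch below runs interneuron plasticity (Eq. 8) and interneuron-to-apical plasticity (Eq. 9, with v_rest = 0) on a small one-hidden-layer network. Bottom-up and top-down weights stay frozen, and the sketch reports how much the apical potential shrinks. It is a hypothetical, rate-based steady-state simplification rather than the full dynamics of Eqs. 1-5. Specifically, it assumes that no output target is applied, that the apical input is too weak to shift the hidden somata, and that interneuron leak is zero, so the attenuated interneuron prediction equals its dendritic potential. Updates are averaged over a batch of random input patterns. The layer sizes, learning rates and the nudging strength lam_I are illustrative.

```python
import numpy as np

def phi(u):
    # Logistic transfer function, as used in the MNIST experiments.
    return 1.0 / (1.0 + np.exp(-u))

rng = np.random.default_rng(1)
n_in, n_hid, n_out = 5, 8, 3                 # hypothetical layer sizes
W10 = rng.uniform(-1, 1, (n_hid, n_in))      # bottom-up, frozen here
W21 = rng.uniform(-1, 1, (n_out, n_hid))     # bottom-up, frozen here
W12 = rng.uniform(-1, 1, (n_hid, n_out))     # top-down, frozen here
W_IP = rng.uniform(-1, 1, (n_out, n_hid))    # pyramidal to interneuron, Eq. 8
W_PI = rng.uniform(-1, 1, (n_hid, n_out))    # interneuron to apical, Eq. 9
lam_I = 0.1                                  # interneuron nudging strength
eta_IP, eta_PI = 10.0, 0.5                   # illustrative learning rates
R0 = rng.uniform(0, 1, (n_in, 200))          # 200 input patterns (columns)

def relax(W_IP, W_PI):
    # Steady-state rates and potentials with no output target applied.
    r1 = phi(W10 @ R0)                       # hidden rates (apical influence ignored)
    u2 = W21 @ r1                            # output potentials
    v_I = W_IP @ r1                          # interneuron dendrite, Eq. 5
    u_I = (1 - lam_I) * v_I + lam_I * u2     # one-to-one nudging towards layer 2
    v_A = W12 @ phi(u2) + W_PI @ phi(u_I)    # apical potential, Eq. 4
    return r1, v_I, u_I, v_A

def mean_apical_norm(W_IP, W_PI):
    return np.linalg.norm(relax(W_IP, W_PI)[3], axis=0).mean()

before = mean_apical_norm(W_IP, W_PI)
n_patterns = R0.shape[1]
for _ in range(5000):
    r1, v_I, u_I, v_A = relax(W_IP, W_PI)
    # Eq. 8, batch-averaged: interneurons learn to predict their nudged rate.
    W_IP = W_IP + eta_IP * (phi(u_I) - phi(v_I)) @ r1.T / n_patterns
    # Eq. 9 with v_rest = 0: lateral weights drive the apical potential to rest.
    W_PI = W_PI + eta_PI * (0.0 - v_A) @ phi(u_I).T / n_patterns
after = mean_apical_norm(W_IP, W_PI)
print(f'mean apical norm: {before:.3f} -> {after:.3f}')
```

Once the apical compartment is nearly silent in the absence of a target, an apical deflection produced by a subsequent teaching signal is what carries the backpropagated error analysed next.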
First, we show that somatic membrane potentials at hidden layer $k$ integrate feedforward predictions (encoded in basal dendritic potentials) with backpropagated errors (encoded in apical dendritic potentials):\n\n$$\\mathbf{u}^{P}_{k} = \\mathbf{u}^{-}_{k} + \\lambda^{N-k+1}\\,\\mathbf{W}^{PP}_{k,k+1}\\left(\\prod_{l=k+1}^{N-1}\\mathbf{D}_{l}\\,\\mathbf{W}^{PP}_{l,l+1}\\right)\\mathbf{D}_{N}\\big(\\mathbf{u}^{trgt}_{N} - \\mathbf{u}^{-}_{N}\\big) + \\mathcal{O}(\\lambda^{N-k+2}).$$\n\nThe parameter $\\lambda \\ll 1$ sets the strength of feedback and teaching versus bottom-up inputs and is assumed to be small to simplify the analysis. The first term is the basal contribution and corresponds to $\\mathbf{u}^{-}_{k}$, the activation computed by a purely feedforward network that is obtained by removing lateral and top-down weights from the model (here and below, we use the superscript \u2018-\u2019 to refer to the feedforward model). The second term (of order $\\lambda^{N-k+1}$) is an error that is backpropagated from the output layer down to the $k$-th layer hidden neurons; $\\mathbf{D}_{k}$ is a diagonal matrix whose $i$-th entry contains the derivative of the neuronal transfer function evaluated at $u^{-}_{k,i}$.\n\nSecond, we compare model synaptic weight updates for the bottom-up connections to those prescribed by backprop. Output layer updates are exactly equal by construction. For hidden neuron synapses, we obtain\n\n$$\\Delta\\mathbf{W}^{PP}_{k,k-1} = \\eta^{PP}_{k,k-1}\\,\\lambda^{N-k+1}\\left(\\prod_{l=k}^{N-1}\\mathbf{D}_{l}\\,\\mathbf{W}^{PP}_{l,l+1}\\right)\\mathbf{D}_{N}\\big(\\mathbf{u}^{trgt}_{N} - \\mathbf{u}^{-}_{N}\\big)\\big(\\mathbf{r}^{-}_{k-1}\\big)^{T} + \\mathcal{O}(\\lambda^{N-k+2}).$$\n\nUp to a factor that can be absorbed into the learning rate, this plasticity rule becomes equal to the backprop weight change in the weak feedback limit $\\lambda \\to 0$, provided that the top-down weights are set to the transpose of the corresponding feedforward weights.\n\nIn our simulations, top-down weights are either set at random and kept fixed, in which case the equation above shows that the plasticity model optimizes the predictions according to an approximation of backprop known as feedback alignment (Lillicrap et al., 2016), or learned so as to minimize an inverse reconstruction loss, in which case the network implements a form of target propagation (Bengio, 2014; Lee et al., 2015).\n\n3.2 Deviations from self-predictions encode backpropagated errors\n\nTo illustrate learning in the model and to confirm our analytical insights, we first study a very simple task: memorizing a single input-output pattern association with only one hidden layer; the task naturally generalizes to multiple memories.\n\nGiven a self-predicting network (established by microcircuit plasticity, Fig. S1; see SM for more details), we focus on how prediction errors get propagated backwards when a novel teaching signal is provided to the output layer, modeled via the activation of additional somatic conductances in output pyramidal neurons. Here we consider a network model with an input, a hidden and an output layer (layers 0, 1 and 2, respectively; Fig. 1A).\n\nWhen the pyramidal cell activity in the output layer is nudged towards some desired target (Fig. 1B(i)), the bottom-up synapses $\\mathbf{W}^{PP}_{2,1}$ from the lower layer neurons to the basal dendrites are adapted, again according to the plasticity rule that implements the dendritic prediction of somatic spiking (see Eq. 7). What these synapses cannot explain away encodes a dendritic error in the pyramidal neurons of the lower layer 1. In fact, the self-predicting microcircuit can only cancel the feedback that is produced by the lower layer activity.\n\nThe somatic integration of apical activity induces plasticity at the bottom-up synapses $\\mathbf{W}^{PP}_{1,0}$ (Eq. 7). As the apical error changes the somatic activity, plasticity of the $\\mathbf{W}^{PP}_{1,0}$ weights tries to further reduce the error in the output layer. Importantly, the plasticity rule depends only on local information available at the synaptic level: postsynaptic firing and dendritic branch voltage, as well as the presynaptic activity, on a par with phenomenological models of synaptic plasticity (Sj\u00f6str\u00f6m et al., 2001; Clopath et al., 2010; Bono and Clopath, 2017). This learning occurs concurrently with modifications of lateral interneuron weights, which track changes in the output layer. Through the course of learning the network comes to a point where the novel top-down input is successfully predicted (Fig. 1B,C).\n\n3.3 Network learns to solve a nonlinear regression task\n\nWe now test the learning capabilities of the model on a nonlinear regression task, where the goal is to associate sensory input with the output of a separate multilayer network that transforms the same sensory input (Fig. 2A). More precisely, a pyramidal neuron network of dimensions 30-50-10 (and 10 hidden layer interneurons) learns to approximate a random nonlinear function implemented by a held-aside feedforward network of dimensions 30-20-10. One teaching example consists of a randomly drawn input pattern $\\mathbf{r}^{P}_{0}$ assigned to the corresponding target $\\mathbf{r}^{trgt}_{2} = \\phi\\big(k_{2,1}\\,\\mathbf{W}^{trgt}_{2,1}\\,\\phi(k_{1,0}\\,\\mathbf{W}^{trgt}_{1,0}\\,\\mathbf{r}^{P}_{0})\\big)$, with scale factors $k_{2,1} = 10$ and $k_{1,0} = 2$. Teacher network weights and input pattern entries are sampled from a uniform distribution $\\mathcal{U}(-1, 1)$. We used a soft rectifying nonlinearity as the neuronal transfer function, $\\phi(u) = \\gamma\\log\\big(1 + \\exp(\\beta(u - \\theta))\\big)$, with $\\gamma = 0.1$, $\\beta = 1$ and $\\theta = 3$. This parameter setting led to neuronal activity in the nonlinear, sparse firing regime.\n\nThe network is initialized to a random synaptic weight configuration, with both pyramidal-pyramidal weights $\\mathbf{W}^{PP}_{1,0}$, $\\mathbf{W}^{PP}_{2,1}$, $\\mathbf{W}^{PP}_{1,2}$ and pyramidal-interneuron weights $\\mathbf{W}^{IP}_{1,1}$, $\\mathbf{W}^{PI}_{1,1}$ independently drawn from a uniform distribution. The top-down weight matrix $\\mathbf{W}^{PP}_{1,2}$ is kept fixed throughout, in the spirit of feedback alignment (Lillicrap et al., 2016). Output layer teaching currents $\\mathbf{i}^{P}_{2}$ are set so as to nudge $\\mathbf{u}^{P}_{2}$ towards the teacher-generated $\\mathbf{u}^{trgt}_{2}$. Learning rates were manually chosen to yield the best performance. Some learning rate tuning was required to ensure the microcircuit could track the changes in the bottom-up pyramidal-pyramidal weights, but we did not observe high sensitivity once the correct parameter regime was identified. Error curves are exponential moving averages of the sum of squared errors loss $\\|\\mathbf{r}^{P}_{2} - \\mathbf{r}^{trgt}_{2}\\|^{2}$ computed after every example on unseen input patterns. Test error performance is measured in a noise-free setting ($\\sigma = 0$). Plasticity induction terms given by Eqs. 7-9 are low-pass filtered with time constant $\\tau_{w}$ before being definitively consolidated, to dampen fluctuations; synaptic plasticity is kept on throughout. Plasticity and neuron model parameters are as defined above.\n\nFigure 2: Dendritic error microcircuit learns to solve a nonlinear regression task online and without phases. (A-C) Starting from a random initial weight configuration, a 30-50-10 fully-connected network learns to approximate a nonlinear function (\u2018separate network\u2019) from input-output pattern pairs. (B) Example firing rates for a randomly chosen output neuron ($\\mathbf{r}^{P}_{2}$, blue noisy trace) and its desired target imposed by the associative input ($\\mathbf{r}^{trgt}_{2}$, blue dashed line), together with the voltage in the apical compartment of a hidden neuron ($\\mathbf{v}^{P}_{A,1}$, grey noisy trace) and the input rate from the sensory neuron ($\\mathbf{r}^{P}_{0}$, green). Traces are shown before (i) and after learning (ii). (C) Error curves for the full model and a shallow model for comparison.\n\nWe let learning occur in continuous time without pauses or alternations in plasticity as input patterns are sequentially presented. This is in contrast to previous learning models that rely on computing activity differences over distinct phases, requiring temporally nonlocal computation or globally coordinated plasticity rule switches (Hinton and McClelland, 1988; O\u2019Reilly, 1996; Xie and Seung, 2003; Scellier and Bengio, 2017; Guerguiev et al., 2017). Furthermore, we relaxed the bottom-up vs. top-down weight symmetry imposed by backprop and kept the top-down weights $\\mathbf{W}^{PP}_{1,2}$ fixed. Forward $\\mathbf{W}^{PP}_{2,1}$ weights quickly aligned to $\\sim 45^{\\circ}$ of the feedback weights $\\big(\\mathbf{W}^{PP}_{1,2}\\big)^{T}$ (see Fig. S1), in line with the recently discovered feedback alignment phenomenon (Lillicrap et al., 2016). This simplifies the architecture, because top-down and interneuron-to-pyramidal synapses need not be changed. We set the scale of the top-down weights and of the apical and somatic conductances such that feedback and teaching inputs were strong, to test the model outside the weak feedback regime ($\\lambda \\to 0$) for which our SM theory was developed. Finally, to test robustness, we injected a weak noise current into every neuron.\n\nOur network was able to learn this harder task (Fig. 2B), performing considerably better than a shallow learner where only hidden-to-output weights were adjusted (Fig. 2C). Useful changes were thus made to hidden layer bottom-up weights. The self-predicting network state emerged throughout learning from a random initial configuration (see SM; Fig. S1).\n\n3.4 Microcircuit network learns to classify handwritten digits\n\nNext, we turn to the problem of classifying MNIST handwritten digits. 
We wondered how our model would fare in this benchmark, in particular whether the prediction errors computed by the interneuron microcircuit would allow learning the weights of a hierarchical nonlinear network with multiple hidden layers. To that end, we trained a deeper, larger 4-layer network (with 784-500-500-10 pyramidal neurons, Fig. 3A) by pairing digit images with teaching inputs that nudged the 10 output neurons towards the correct class pattern. We initialized the network to a random but self-predicting configuration where interneurons cancelled top-down inputs, rendering the apical compartments silent before training started. Top-down and interneuron-to-pyramidal weights were kept fixed.\n\n[Figure 2 graphic: (A) separate (teacher) network and the dendritic microcircuit receiving teaching/associative input; (B) output rate (Hz), apical potential and sensory input over time (ms), before (i) and after (ii) learning; (C) squared error over training trials (x10^7) for shallow learning and pyramidal neuron learning.]\n\nFigure 3: Dendritic error networks learn to classify handwritten digits. (A) A network with two hidden layers learns to classify handwritten digits from the MNIST data set. (B) Classification error achieved on the MNIST testing set (blue; cf. shallow learner (black) and standard backprop\u2076 (red)).\n\nHere, for computational efficiency, we used simplified network dynamics in which the compartmental potentials are updated in only two steps before applying synaptic changes. In particular, for each presented MNIST image, both pyramidal cells and interneurons are first initialized to their bottom-up prediction state (3), $\\mathbf{u}_{k} = \\mathbf{v}_{B,k}$, starting from layer 1 up to the top layer $N$. Output layer neurons are then nudged towards their desired target $\\mathbf{u}^{trgt}_{N}$, yielding updated somatic potentials $\\mathbf{u}^{P}_{N} = (1-\\lambda_{N})\\,\\mathbf{v}_{B,N} + \\lambda_{N}\\,\\mathbf{u}^{trgt}_{N}$. To obtain the remaining final compartmental potentials, the network is visited in reverse order, proceeding from layer $k = N-1$ down to $k = 1$. For each $k$, interneurons are first updated to include top-down teaching signals, $\\mathbf{u}^{I}_{k} = (1-\\lambda_{I})\\,\\mathbf{v}^{I}_{k} + \\lambda_{I}\\,\\mathbf{u}^{P}_{k+1}$; this yields apical compartment potentials according to (4), after which we update hidden layer somatic potentials as a convex combination with mixing factor $\\lambda_{k}$. The convex combination factors introduced above are directly related to neuron model parameters as conductance ratios. Synaptic weights are then updated according to Eqs. 7-10. Such simplified dynamics approximates the full recurrent network relaxation in the deterministic setting $\\sigma \\to 0$, with the approximation improving as the top-down dendritic coupling is decreased, $g_{A} \\to 0$.\n\nWe train the models on the standard MNIST handwritten image database, further splitting the training set into 55000 training and 5000 validation examples. The reported test error curves are computed on the 10000 held-aside test images. The four-layer network shown in Fig. 3 is initialized in a self-predicting state with appropriately scaled initial weight matrices. For our MNIST networks, we used relatively weak feedback weights, apical and somatic conductances (see SM) to justify our simplified approximate dynamics described above, although we found that performance did not appreciably degrade with larger values. To speed up training we use a mini-batch strategy on every learning rule, whereby weight changes are averaged across 10 images before being applied. We take the neuronal transfer function $\\phi$ to be a logistic function, $\\phi(u) = 1/(1 + \\exp(-u))$, and include a learnable threshold on each neuron, modelled as an additional input fixed at unity with a plastic weight. 
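The two-step procedure above can be sketched for a network with one hidden layer. The same sketch also offers a small numerical check of the claim in Section 3.1. It is hypothetical in several respects. All mixing factors (lambda_N, lambda_I and lambda_k) share one value lam. The hidden-layer convex combination is assumed to mix the basal and apical potentials, and the attenuated basal prediction is taken as (1 - lam) times the basal potential. The network starts in a self-predicting configuration with top-down weights set to the transpose of the forward weights. Under these assumptions, the hidden-layer postsynaptic factor of Eq. 7, divided by lam squared (the order of the error term for N = 2, k = 1), should approach the backpropagated error term of Section 3.1 as lam shrinks.

```python
import numpy as np

def phi(u):
    return 1.0 / (1.0 + np.exp(-u))                  # logistic transfer function

def dphi(u):
    s = phi(u)
    return s * (1.0 - s)

def hidden_factor_two_step(W10, W21, W12, W_IP, W_PI, r0, u_trgt, lam):
    # Step 1: bottom-up initialization, u_k = v_B,k (Eq. 3).
    vB1 = W10 @ r0
    vB2 = W21 @ phi(vB1)
    # Step 2: nudge the output layer, then revisit the hidden layer.
    u2 = (1 - lam) * vB2 + lam * u_trgt
    uI1 = (1 - lam) * (W_IP @ phi(vB1)) + lam * u2   # interneuron (Eq. 5 dendrite)
    vA1 = W12 @ phi(u2) + W_PI @ phi(uI1)            # apical potential, Eq. 4
    u1 = (1 - lam) * vB1 + lam * vA1                 # assumed basal/apical mixing
    vB1_hat = (1 - lam) * vB1                        # attenuated basal prediction
    return phi(u1) - phi(vB1_hat)                    # postsynaptic factor of Eq. 7

def backprop_hidden_factor(W10, W21, r0, u_trgt):
    # D_1 W_21^T D_2 (u_trgt - u_2): the error term of Section 3.1 for N = 2.
    vB1 = W10 @ r0
    vB2 = W21 @ phi(vB1)
    return dphi(vB1) * (W21.T @ (dphi(vB2) * (u_trgt - vB2)))

rng = np.random.default_rng(2)
W10 = rng.uniform(-1, 1, (5, 6))
W21 = rng.uniform(-1, 1, (4, 5))
W12 = W21.T.copy()                 # top-down weights set to the transpose
W_IP, W_PI = W21.copy(), -W12      # self-predicting microcircuit
r0 = rng.uniform(0, 1, 6)
u_trgt = rng.uniform(-1, 1, 4)

bp = backprop_hidden_factor(W10, W21, r0, u_trgt)
errors = {}
for lam in (1e-1, 1e-4):
    model = hidden_factor_two_step(W10, W21, W12, W_IP, W_PI, r0, u_trgt, lam)
    errors[lam] = np.linalg.norm(model / lam**2 - bp) / np.linalg.norm(bp)
    print(f'lam = {lam:g}: relative deviation from backprop = {errors[lam]:.2e}')
```

Taking the outer product of either factor with the presynaptic rates r0 gives the corresponding update direction for the input-to-hidden weights.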
Desired target class vectors are 1-hot coded, with $r^{trgt}_N \in \{0.1, 0.8\}$. During testing, the output is determined by picking the class label corresponding to the neuron with the highest firing rate. We found the model to be relatively robust to learning rate tuning on the MNIST task, except for the rescaling by the inverse mixing factor to compensate for teaching signal dilution (see SM for the exact parameters).

The network achieved a test error of 1.96% (Fig. 3B). This is not far from the reference mark of non-convolutional artificial neural networks optimized with backprop (1.53%) and comparable to recently published results in the range 1.6–2.4% (Lee et al., 2015; Lillicrap et al., 2016; Nøkland, 2016). The performance of our model also compares favorably with the 3.2% test error reported by Guerguiev et al. (2017) for a two-hidden-layer network. Thanks to feedback alignment dynamics, this was possible despite the asymmetry between forward and top-down weights, which is at odds with exact backprop. Apical compartment voltages remained approximately silent when output nudging was turned off (data not shown). This reflects the maintenance of a self-predicting state throughout learning, which enabled the propagation of errors through the network. To further demonstrate that the microcircuit was able to propagate errors to deeper hidden layers, and that the task was not being solved by making useful changes only to the weights onto the topmost hidden layer, we re-ran the experiment while keeping the pyramidal-pyramidal weights connecting the two hidden layers fixed.
The network still learned the dataset and achieved a test error of 2.11%.

[Figure 3 panels: (A) MNIST handwritten digit images (28×28 input), hidden layer 1 (500), hidden layer 2 (500), output (10); (B) test error (%) vs. trials: single-layer 8.4%, dendritic microcircuit 1.96%, backprop 1.53%.]

As top-down weights are likely plastic in cortex, we also trained a one-hidden-layer (784-1000-10) network in which top-down weights were learned on a slow time scale according to learning rule (10). This inverse learning scheme is closely related to target propagation (Bengio, 2014; Lee et al., 2015). Such learning could play a role in perceptual denoising, pattern completion and disambiguation, and could boost alignment beyond that achieved by pure feedback alignment (Bengio, 2014). Starting from random initial conditions and keeping all weights plastic (bottom-up, lateral and top-down) throughout, our network achieved a test classification error of 2.48% on MNIST. Once more, useful changes were made to hidden synapses, even though the microcircuit had to track changes in both the bottom-up and the top-down pathways.

4 Conclusions

Our work makes several predictions across different levels of investigation. Here we briefly highlight some of these predictions and related experimental observations. The most fundamental feature of the model is that distal dendrites encode error signals that instruct learning of lateral and bottom-up connections. While monitoring such dendritic signals during learning is challenging, recent experimental evidence suggests that prediction errors in mouse visual cortex arise from a failure to locally inhibit motor feedback (Zmarz and Keller, 2016; Attinger et al., 2017), consistent with our model.
Interestingly, the plasticity rule for apical dendritic inhibition, which is central to error encoding in the model, received support from another recent experimental study (Chiu et al., 2018). A further implication of our model is that prediction errors occurring at a higher-order cortical area would also imply prediction errors co-occurring at earlier areas. Recent experimental observations in the macaque face-processing hierarchy support this (Schwiedrzik and Freiwald, 2017).

Here we have focused on the role of a specific interneuron type (SST) as a feedback-specific interneuron. There are many more interneuron types that we do not consider in our framework. One such type is the PV (parvalbumin-positive) cell, which has been postulated to mediate a somatic excitation-inhibition balance (Vogels et al., 2011; Froemke, 2015) and competition (Masquelier and Thorpe, 2007; Nessler et al., 2013). These functions could in principle be combined with our framework, in that PV interneurons may be involved in representing another type of prediction error (e.g., generative errors).

Humans have the ability to perform fast (e.g., one-shot) learning, whereas neural networks trained by backpropagation of error (or approximations thereof, like ours) require iterating over many training examples to learn. This is an important open problem that stands in the way of understanding the neuronal basis of intelligence.
One possibility, into which our model naturally fits, is to consider multiple subsystems (for example, the neocortex and the hippocampus) that transfer knowledge to each other and learn at different rates (McClelland et al., 1995; Kumaran et al., 2016).

Overall, our work provides a new view on how the brain may solve the credit assignment problem for time-continuous input streams by approximating the backpropagation algorithm, and it brings together many puzzling features of cortical microcircuits.

Acknowledgements

The authors would like to thank Timothy P. Lillicrap, Blake Richards, Benjamin Scellier and Mihai A. Petrovici for helpful discussions. WS thanks Matthew Larkum for many inspiring discussions on dendritic processing. JS thanks Elena Kreutzer, Pascal Leimer and Martin T. Wiechert for valuable feedback and critical reading of the manuscript.

This work has been supported by the Swiss National Science Foundation (grant 310030L-156863 of WS), the European Union's Horizon 2020 Framework Programme for Research and Innovation under the Specific Grant Agreement No. 785907 (Human Brain Project), NSERC, CIFAR, and Canada Research Chairs.

References

Ackley, D. H., Hinton, G. E., and Sejnowski, T. J. (1985). A learning algorithm for Boltzmann machines. Cognitive Science, 9(1):147–169.
Attinger, A., Wang, B., and Keller, G. B. (2017). Visuomotor coupling shapes the functional development of mouse visual cortex. Cell, 169(7):1291–1302.e14.
Bengio, Y. (2014). How auto-encoders could provide credit assignment in deep networks via target propagation. arXiv:1407.7906.
Bono, J. and Clopath, C. (2017). Modeling somatic and dendritic spike mediated plasticity at the single neuron and network level. Nature Communications, 8(1):706.
Bottou, L. (1998). Online algorithms and stochastic approximations. In Saad, D., editor, Online Learning and Neural Networks. Cambridge University Press, Cambridge, UK.
Chiu, C. Q., Martenson, J. S., Yamazaki, M., Natsume, R., Sakimura, K., Tomita, S., Tavalin, S. J., and Higley, M. J. (2018). Input-specific NMDAR-dependent potentiation of dendritic GABAergic inhibition. Neuron, 97(2):368–377.
Clopath, C., Büsing, L., Vasilaki, E., and Gerstner, W. (2010). Connectivity reflects coding: a model of voltage-based STDP with homeostasis. Nature Neuroscience, 13(3):344–352.
Costa, R. P., Assael, Y. M., Shillingford, B., de Freitas, N., and Vogels, T. P. (2017). Cortical microcircuits as gated-recurrent neural networks. In Advances in Neural Information Processing Systems, pages 271–282.
Crick, F. (1989). The recent excitement about neural networks. Nature, 337:129–132.
Dorrn, A. L., Yuan, K., Barker, A. J., Schreiner, C. E., and Froemke, R. C. (2010). Developmental sensory experience balances cortical excitation and inhibition. Nature, 465(7300):932–936.
Friedrich, J., Urbanczik, R., and Senn, W. (2011). Spatio-temporal credit assignment in neuronal population learning. PLOS Computational Biology, 7(6):e1002092.
Friston, K. (2005). A theory of cortical responses. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 360(1456):815–836.
Froemke, R. C. (2015). Plasticity of cortical excitatory-inhibitory balance. Annual Review of Neuroscience, 38(1):195–219.
Fu, Y., Kaneko, M., Tang, Y., Alvarez-Buylla, A., and Stryker, M. P. (2015). A cortical disinhibitory circuit for enhancing adult plasticity. eLife, 4:e05558.
Grossberg, S. (1987). Competitive learning: From interactive activation to adaptive resonance. Cognitive Science, 11(1):23–63.
Guerguiev, J., Lillicrap, T. P., and Richards, B. A. (2017). Towards deep learning with segregated dendrites. eLife, 6:e22901.
Hinton, G. E. and McClelland, J. L. (1988). Learning representations by recirculation. In Anderson, D. Z., editor, Neural Information Processing Systems, pages 358–366. American Institute of Physics.
Kell, A. J., Yamins, D. L., Shook, E. N., Norman-Haignere, S. V., and McDermott, J. H. (2018). A task-optimized neural network replicates human auditory behavior, predicts brain responses, and reveals a cortical processing hierarchy. Neuron.
Khaligh-Razavi, S.-M. and Kriegeskorte, N. (2014). Deep supervised, but not unsupervised, models may explain IT cortical representation. PLOS Computational Biology, 10(11):1–29.
Körding, K. P. and König, P. (2000). Learning with two sites of synaptic integration. Network: Computation in Neural Systems, 11:1–15.
Körding, K. P. and König, P. (2001). Supervised and unsupervised learning with two sites of synaptic integration. Journal of Computational Neuroscience, 11:207–215.
Kumaran, D., Hassabis, D., and McClelland, J. L. (2016). What learning systems do intelligent agents need? Complementary learning systems theory updated. Trends in Cognitive Sciences, 20(7):512–534.
Larkum, M. (2013). A cellular mechanism for cortical associations: an organizing principle for the cerebral cortex. Trends in Neurosciences, 36(3):141–151.
LeCun, Y. (1988). A theoretical framework for back-propagation. In Touretzky, D., Hinton, G., and Sejnowski, T., editors, Proceedings of the 1988 Connectionist Models Summer School, pages 21–28. Morgan Kaufmann, Pittsburgh, PA.
LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature, 521(7553):436–444.
Lee, D.-H., Zhang, S., Fischer, A., and Bengio, Y. (2015). Difference target propagation. In Machine Learning and Knowledge Discovery in Databases, pages 498–515. Springer.
Leinweber, M., Ward, D. R., Sobczak, J. M., Attinger, A., and Keller, G. B. (2017). A sensorimotor circuit in mouse cortex for visual flow predictions. Neuron, 95(6):1420–1432.e5.
Lillicrap, T. P., Cownden, D., Tweed, D. B., and Akerman, C. J. (2016). Random synaptic feedback weights support error backpropagation for deep learning. Nature Communications, 7:13276.
Lillicrap, T. P. and Scott, S. H. (2013). Preference distributions of primary motor cortex neurons reflect control solutions optimized for limb biomechanics. Neuron, 77(1):168–179.
Luz, Y. and Shamir, M. (2012). Balancing feed-forward excitation and inhibition via Hebbian inhibitory synaptic plasticity. PLOS Computational Biology, 8(1):e1002334.
Makino, H. and Komiyama, T. (2015). Learning enhances the relative impact of top-down processing in the visual cortex. Nature Neuroscience, 18(8):1116–1122.
Manita, S., Suzuki, T., Homma, C., Matsumoto, T., Odagawa, M., Yamada, K., Ota, K., Matsubara, C., Inutsuka, A., Sato, M., et al. (2015). A top-down cortical circuit for accurate sensory perception. Neuron, 86(5):1304–1316.
Marblestone, A. H., Wayne, G., and Kording, K. P. (2016). Toward an integration of deep learning and neuroscience. Frontiers in Computational Neuroscience, 10:94.
Masquelier, T. and Thorpe, S. (2007). Unsupervised learning of visual features through spike timing dependent plasticity. PLOS Computational Biology, 3.
McClelland, J. L., McNaughton, B. L., and O'Reilly, R. C. (1995). Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory. Psychological Review, 102(3):419.
Nessler, B., Pfeiffer, M., Buesing, L., and Maass, W. (2013). Bayesian computation emerges in generic cortical microcircuits through spike-timing-dependent plasticity. PLOS Computational Biology, 9(4):e1003037.
Nøkland, A. (2016). Direct feedback alignment provides learning in deep neural networks. In Advances in Neural Information Processing Systems, pages 1037–1045.
O'Reilly, R. C. (1996). Biologically plausible error-driven learning using local activation differences: The generalized recirculation algorithm. Neural Computation, 8(5):895–938.
Pakan, J. M., Lowe, S. C., Dylda, E., Keemink, S. W., Currie, S. P., Coutts, C. A., Rochefort, N. L., and Mrsic-Flogel, T. D. (2016). Behavioral-state modulation of inhibition is context-dependent and cell type specific in mouse visual cortex. eLife, 5:e14985.
Petreanu, L., Gutnisky, D. A., Huber, D., Xu, N.-l., O'Connor, D. H., Tian, L., Looger, L., and Svoboda, K. (2012). Activity in motor-sensory projections reveals distributed coding in somatosensation. Nature, 489(7415):299–303.
Petreanu, L., Mao, T., Sternson, S. M., and Svoboda, K. (2009). The subcellular organization of neocortical excitatory connections. Nature, 457(7233):1142–1145.
Poort, J., Khan, A. G., Pachitariu, M., Nemri, A., Orsolic, I., Krupic, J., Bauza, M., Sahani, M., Keller, G. B., Mrsic-Flogel, T. D., and Hofer, S. B. (2015). Learning enhances sensory and multiple non-sensory representations in primary visual cortex. Neuron, 86(6):1478–1490.
Rao, R. P. and Ballard, D. H. (1999). Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2(1):79–87.
Roelfsema, P. R. and Holtmaat, A. (2018). Control of synaptic plasticity in deep cortical networks. Nature Reviews Neuroscience, 19(3):166.
Roelfsema, P. R. and van Ooyen, A. (2005). Attention-gated reinforcement learning of internal representations for classification. Neural Computation, 17(10):2176–2214.
Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323:533–536.
Scellier, B. and Bengio, Y. (2017). Equilibrium propagation: Bridging the gap between energy-based models and backpropagation. Frontiers in Computational Neuroscience, 11:24.
Schwiedrzik, C. M. and Freiwald, W. A. (2017). High-level prediction signals in a low-level area of the macaque face-processing hierarchy. Neuron, 96(1):89–97.e4.
Sjöström, P. J., Turrigiano, G. G., and Nelson, S. B. (2001). Rate, timing, and cooperativity jointly determine cortical synaptic plasticity. Neuron, 32(6):1149–1164.
Spicher, D., Clopath, C., and Senn, W. (2018). Predictive plasticity in dendrites: from a computational principle to experimental data (in preparation).
Spruston, N. (2008). Pyramidal neurons: dendritic structure and synaptic integration. Nature Reviews Neuroscience, 9(3):206–221.
Sutton, R. S. and Barto, A. G. (1998). Reinforcement learning: An introduction, volume 1. MIT Press, Cambridge, Mass.
Urban-Ciecko, J. and Barth, A. L. (2016). Somatostatin-expressing neurons in cortical networks. Nature Reviews Neuroscience, 17(7):401–409.
Urbanczik, R. and Senn, W. (2014). Learning by the dendritic prediction of somatic spiking. Neuron, 81(3):521–528.
Vogels, T. P., Sprekeler, H., Zenke, F., Clopath, C., and Gerstner, W. (2011). Inhibitory plasticity balances excitation and inhibition in sensory pathways and memory networks. Science, 334(6062):1569–1573.
Whittington, J. C. R. and Bogacz, R. (2017). An approximation of the error backpropagation algorithm in a predictive coding network with local Hebbian synaptic plasticity. Neural Computation, 29(5):1229–1262.
Xie, X. and Seung, H. S. (2003). Equivalence of backpropagation and contrastive Hebbian learning in a layered network. Neural Computation, 15(2):441–454.
Yamins, D. L. and DiCarlo, J. J. (2016). Using goal-driven deep learning models to understand sensory cortex. Nature Neuroscience, 19(3):356–365.
Yamins, D. L., Hong, H., Cadieu, C. F., Solomon, E. A., Seibert, D., and DiCarlo, J. J. (2014). Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proceedings of the National Academy of Sciences, 111(23):8619–8624.
Zmarz, P. and Keller, G. B. (2016). Mismatch receptive fields in mouse visual cortex. Neuron, 92(4):766–772.", "award": [], "sourceid": 5269, "authors": [{"given_name": "Jo\u00e3o", "family_name": "Sacramento", "institution": "University of Bern"}, {"given_name": "Rui", "family_name": "Ponte Costa", "institution": "Univeristy of Bern"}, {"given_name": "Yoshua", "family_name": "Bengio", "institution": "U. Montreal"}, {"given_name": "Walter", "family_name": "Senn", "institution": "University of Bern"}]}