{"title": "Attractor Network Dynamics Enable Preplay and Rapid Path Planning in Maze\u2013like Environments", "book": "Advances in Neural Information Processing Systems", "page_first": 1684, "page_last": 1692, "abstract": "Rodents navigating in a well-known environment can rapidly learn and revisit observed reward locations, often after a single trial. While the mechanism for rapid path planning is unknown, the CA3 region in the hippocampus plays an important role, and emerging evidence suggests that place cell activity during hippocampal preplay periods may trace out future goal-directed trajectories. Here, we show how a particular mapping of space allows for the immediate generation of trajectories between arbitrary start and goal locations in an environment, based only on the mapped representation of the goal. We show that this representation can be implemented in a neural attractor network model, resulting in bump--like activity profiles resembling those of the CA3 region of hippocampus. Neurons tend to locally excite neurons with similar place field centers, while inhibiting other neurons with distant place field centers, such that stable bumps of activity can form at arbitrary locations in the environment. The network is initialized to represent a point in the environment, then weakly stimulated with an input corresponding to an arbitrary goal location. We show that the resulting activity can be interpreted as a gradient ascent on the value function induced by a reward at the goal location. Indeed, in networks with large place fields, we show that the network properties cause the bump to move smoothly from its initial location to the goal, around obstacles or walls. 
Our results illustrate that an attractor network with hippocampal-like attributes may be important for rapid path planning.", "full_text": "Attractor Network Dynamics Enable Preplay and\nRapid Path Planning in Maze\u2013like Environments\n\nDane Corneil\n\nWulfram Gerstner\n\nLaboratory of Computational Neuroscience\n\u00c9cole Polytechnique F\u00e9d\u00e9rale de Lausanne\n\nLaboratory of Computational Neuroscience\n\u00c9cole Polytechnique F\u00e9d\u00e9rale de Lausanne\n\nCH-1015 Lausanne, Switzerland\ndane.corneil@epfl.ch\n\nCH-1015 Lausanne, Switzerland\n\nwulfram.gerstner@epfl.ch\n\nAbstract\n\nRodents navigating in a well\u2013known environment can rapidly learn and revisit observed\nreward locations, often after a single trial. While the mechanism for rapid\npath planning is unknown, the CA3 region in the hippocampus plays an important\nrole, and emerging evidence suggests that place cell activity during hippocampal\n\u201cpreplay\u201d periods may trace out future goal\u2013directed trajectories. Here, we\nshow how a particular mapping of space allows for the immediate generation of\ntrajectories between arbitrary start and goal locations in an environment, based\nonly on the mapped representation of the goal. We show that this representation\ncan be implemented in a neural attractor network model, resulting in bump\u2013like\nactivity pro\ufb01les resembling those of the CA3 region of hippocampus. Neurons\ntend to locally excite neurons with similar place \ufb01eld centers, while inhibiting\nother neurons with distant place \ufb01eld centers, such that stable bumps of activity\ncan form at arbitrary locations in the environment. The network is initialized to\nrepresent a point in the environment, then weakly stimulated with an input corresponding\nto an arbitrary goal location. We show that the resulting activity can\nbe interpreted as a gradient ascent on the value function induced by a reward at\nthe goal location. 
Indeed, in networks with large place \ufb01elds, we show that the\nnetwork properties cause the bump to move smoothly from its initial location to\nthe goal, around obstacles or walls. Our results illustrate that an attractor network\nwith hippocampal\u2013like attributes may be important for rapid path planning.\n\n1 Introduction\n\nWhile early human case studies revealed the importance of the hippocampus in episodic memory [1,\n2], the discovery of \u201cplace cells\u201d in rats [3] established its role in spatial representation. Recent\nresults have further suggested that, along with these functions, the hippocampus is involved in active\nspatial planning: experiments in \u201cone\u2013shot learning\u201d have revealed the critical role of the CA3\nregion [4, 5] and the intermediate hippocampus [6] in returning to goal locations that the animal has\nseen only once. This poses the question of whether and how hippocampal dynamics could support\na representation of the current location, a representation of a goal, and the relation between the two.\nIn this article, we propose that a model of CA3 as a \u201cbump attractor\u201d [7] can be used for path\nplanning. The attractor map represents not only locations within the environment, but also the spatial\nrelationship between locations. In particular, broad activity pro\ufb01les (like those found in intermediate\nand ventral hippocampus [8]) can be viewed as a condensed map of a particular environment. The\nplanned path presents as rapid sequential activity from the current position to the goal location,\nsimilar to the \u201cpreplay\u201d observed experimentally in hippocampal activity during navigation tasks [9,\n10], including paths that require navigating around obstacles. In the model, the activity is produced\nby supplying input to the network consistent with the sensory input that would be provided at the\ngoal site. 
Unlike other recent models of rapid goal learning and path planning [11, 12], there is\nno backwards diffusion of a value signal from the goal to the current state during the learning or\nplanning process. Instead, the sequential activity results from the representation of space in the\nattractor network, even in the presence of obstacles.\nThe recurrent structure in our model is derived from the \u201csuccessor representation\u201d [13], which\nrepresents space according to the number and length of paths connecting different locations. The\nresulting network can be interpreted as an attractor manifold in a low\u2013dimensional space, where the\ndimensions correspond to weighted versions of the most relevant eigenvectors of the environment\u2019s\ntransition matrix. Such low\u2013frequency functions have recently found support as a viable basis for\nplace cell activity [14\u201316]. We show that, when the attractor network operates in this basis and is\nstimulated with a goal location, the network activity traces out a path to that goal. Thus, the bump\nattractor network can act as a spatial path planning system as well as a spatial memory system.\n\n2 The successor representation and path\u2013\ufb01nding\n\nA key problem in reinforcement learning is assessing the value of a particular state, given the expected\nreturns from that state in both the immediate and distant future. Several model\u2013free algorithms\nexist for solving this task [17], but they are slow to adjust when the reward landscape is\nrapidly changing. 
The successor representation, proposed by Dayan [13], addresses this issue.\nGiven a Markov chain described by the transition matrix P, where each element P(s, s\u2032) gives the\nprobability of transitioning from state s to state s\u2032 in a single time step; a reward vector r, where\neach element r(s\u2032) gives the expected immediate returns from state s\u2032; and a discount factor \u03b3, the\nexpected returns v from each state can be described by\n\nv = r + \u03b3Pr + \u03b3^2 P^2 r + \u03b3^3 P^3 r + \u2026\n  = (I \u2212 \u03b3P)^{\u22121} r\n  = Lr.    (1)\n\nThe successor representation L provides an ef\ufb01cient means of representing the state space according\nto the expected (discounted) future occupancy of each state s\u2032, given that the chain is initialized from\nstate s. An agent employing a policy described by the matrix P can immediately update the value\nfunction when the reward landscape r changes, without any further exploration.\nThe successor representation is particularly useful for representing many reward landscapes in the\nsame state space. Here we consider the set of reward functions where returns are con\ufb01ned to a single\nstate s\u2032; i.e. r(s\u2032) = \u03b4_{s\u2032g} where \u03b4 denotes the Kronecker delta function and the index g denotes a\nparticular goal state. From Eq. 1, we see that the value function is then given by the column s\u2032\nof the matrix L. Indeed, when we consider only a single goal, we can see the elements of L as\nL(s, s\u2032) = v(s|s\u2032 = g). We will use this property to generate a spatial mapping that allows for a\nrapid approximation of the shortest path between any two points in an environment.\n\n2.1 Representing space using the successor representation\n\nIn the spatial navigation problems considered here, we assume that the animal has explored the environment\nsuf\ufb01ciently to learn its natural topology. 
We represent the relationship between locations\nwith a Gaussian af\ufb01nity metric a: given states s(x, y) and s\u2032(x, y) in the 2D plane, their af\ufb01nity is\n\na(s(x, y), s\u2032(x, y)) = a(s\u2032(x, y), s(x, y)) = exp(\u2212d^2 / (2\u03c3_s^2))    (2)\n\nwhere d is the length of the shortest traversable path between s and s\u2032, respecting walls and obstacles.\nWe de\ufb01ne \u03c3 to be small enough that the metric is localized (Fig. 1) such that a(s(x, y), \u00b7) resembles\na small bump in space, truncated by walls. Normalizing the af\ufb01nity metric gives\n\np(s, s\u2032) = a(s, s\u2032) / \u2211_{s\u2032} a(s, s\u2032).    (3)\n\nThe normalized metric can be interpreted as a transition probability for an agent exploring the environment\nrandomly. In this case, a spectral analysis of the successor representation [14, 18] gives\n\nv(s|s\u2032 = g) = \u03c0(s\u2032) \u2211_{l=0}^{n} (1 \u2212 \u03b3\u03bb_l)^{\u22121} \u03c8_l(s) \u03c8_l(s\u2032)    (4)\n\nwhere \u03c8_l are the right eigenvectors of the transition matrix P, 1 = |\u03bb_0| \u2265 |\u03bb_1| \u2265 |\u03bb_2| \u2265 \u00b7\u00b7\u00b7 \u2265\n|\u03bb_n| are the eigenvalues [18], and \u03c0(s\u2032) denotes the steady\u2013state occupancy of state s\u2032 resulting\nfrom P. Although the af\ufb01nity metric is de\ufb01ned locally, large\u2013scale features of the environment are\nrepresented in the eigenvectors associated with the largest eigenvalues (Fig. 1).\nWe now express the position in the 2D space using a set of \u201csuccessor coordinates\u201d, such that\n\ns(x, y) \u21a6 \u02d8s = (\u221a((1 \u2212 \u03b3\u03bb_0)^{\u22121}) \u03c8_0(s), \u221a((1 \u2212 \u03b3\u03bb_1)^{\u22121}) \u03c8_1(s), \u2026, \u221a((1 \u2212 \u03b3\u03bb_q)^{\u22121}) \u03c8_q(s))\n             = (\u03be_0(s), \u03be_1(s), \u2026, \u03be_q(s))    (5)\n\nwhere \u03be_l = \u221a((1 \u2212 \u03b3\u03bb_l)^{\u22121}) \u03c8_l. This is similar to the \u201cdiffusion map\u201d framework by Coifman and\nLafon [18], with the useful property that, if q = n, the value of a given state when considering\na given goal is proportional to the scalar product of their respective mappings: v(s|s\u2032 = g) =\n\u03c0(s\u2032)\u27e8\u02d8s, \u02d8s\u2032\u27e9. We will use this property to show how a network operating in the successor coordinate\nspace can rapidly generate prospective trajectories between arbitrary locations.\nNote that the mapping can also be de\ufb01ned using the eigenvectors \u03c6_l of a related measure of the\nspace, the normalized graph Laplacian [19]. The eigenvectors \u03c6_l serve as the objective functions for\nslow feature analysis [20], and approximations have been extracted through hierarchical slow feature\nanalysis on visual data [15, 16], where they have been used to generate place cell\u2013like behaviour.\n\n2.2 Path\u2013\ufb01nding using the successor coordinate mapping\n\nSuccessor coordinates provide a means of mapping a set of locations in a 2D environment to a new\nspace based on the topology of the environment. In the new representation, the value landscape\nis particularly simple. To move from a location \u02d8s towards a goal position \u02d8s\u2032, we can consider a\nconstrained gradient ascent procedure on the value landscape:\n\n\u02d8s_{t+1} = arg min_{\u02d8s \u2208 \u02d8S} [(\u02d8s \u2212 (\u02d8s_t + \u03b1\u2207v(\u02d8s_t)))^2]\n        = arg min_{\u02d8s \u2208 \u02d8S} [(\u02d8s \u2212 (\u02d8s_t + \u02dc\u03b1\u02d8s\u2032))^2]    (6)\n\nwhere \u03c0(s\u2032) has been absorbed into the parameter \u02dc\u03b1. At each time step, the state closest to an\nincremental ascent of the value gradient is selected amongst all states in the environment \u02d8S. 
In the\nfollowing, we will consider how the step \u02d8s_t + \u02dc\u03b1\u02d8s\u2032 can be approximated by a neural attractor network\nacting in successor coordinate space.\nDue to the properties of the transition matrix, \u03c8_0 is constant across the state space and does not\ncontribute to the value gradient in Eq. 6. As such, we substituted a free parameter for the coef\ufb01cient\n\u221a((1 \u2212 \u03b3\u03bb_0)^{\u22121}), which controlled the overall level of activity in the network simulations.\n\n3 Encoding successor coordinates in an attractor network\n\nThe bump attractor network is a common model of place cell activity in the hippocampus [7, 21].\nNeurons in the attractor network strongly excite other neurons with similar place \ufb01eld centers, and\nweakly inhibit the neurons within the network with distant place \ufb01eld centers. As a result, the\nnetwork allows a stable bump of activity to form at an arbitrary location within the environment.\n\nFigure 1: [Left] A rat explores a maze\u2013like environment and passively learns its topology. We assume\na process such as hierarchical slow feature analysis, that preliminarily extracts slowly changing\nfunctions in the environment (here, the vectors \u03be_1, \u2026, \u03be_q). The vector \u03be_1 for the maze is shown in\nthe top left. In practice, we extracted the vectors directly from a localized Gaussian transition function\n(bottom center, for an arbitrary location). [Right] This basis can be used to generate a value\nmap approximation over the environment for a given reward (goal) position and discount factor \u03b3\n(inset). Due to the walls, the function is highly discontinuous in the xy spatial dimensions. The\ngoal position is circled in white. In the scatter plot, the same array of states and value function are\nshown in the \ufb01rst two non\u2013trivial successor coordinate dimensions. 
In this space, the value function\nis proportional to the scalar product between the states and the goal location. The grey and black\ndots show corresponding states between the inset and the scatter plot.\n\nSuch networks typically represent a periodic (toroidal) environment [7, 21], using a local excitatory\nweight pro\ufb01le that falls off exponentially. Here, we show how the spatial mapping of Eq. 5 can be\nused to represent bounded environments with arbitrary obstacles. The resulting recurrent weights\ninduce stable \ufb01ring \ufb01elds that decrease with distance from the place \ufb01eld center, around walls and\nobstacles, in a manner consistent with experimental observations [22]. In addition, the network\ndynamics can be used to perform rapid path planning in the environment.\nWe will use the techniques introduced in the attractor network models by Eliasmith and Anderson\n[23] to generalize the bump attractor. We \ufb01rst consider a purely feed\u2013forward network, composed of\na population of neurons with place \ufb01eld centers scattered randomly throughout the environment. We\nassume that the input is highly preprocessed, potentially by several layers of neuronal processing\n(Fig. 1), and given directly by units k whose activities \u02d8s^{in}_k(t) = \u03be_k(s^{in}(t)) represent the input in the\nsuccessor coordinate dimensions introduced above. The activity a_i of neuron i in response to the m\ninputs \u02d8s^{in}_k(t) can be described by\n\n\u03c4 da_i(t)/dt = \u2212a_i(t) + g [\u2211_{k=1}^{m} w^{ff}_{ik} \u02d8s^{in}_k(t)]_+    (7)\n\nwhere g is a gain factor, [\u00b7]_+ represents a recti\ufb01ed linear function, and w^{ff}_{ik} are the feed\u2013forward\nweights. 
Each neuron is particularly responsive to a \u201cbump\u201d in the environment given by its encoding\nvector e_i = \u02d8s_i / ||\u02d8s_i||, the normalized successor coordinates of a particular point in space, which\ncorresponds to its place \ufb01eld center. The input to neuron i in the network is then given by\n\nw^{ff}_{ik} = [e_i]_k,\n\u2211_{k=1}^{m} w^{ff}_{ik} \u02d8s^{in}_k(t) = e_i \u00b7 \u02d8s^{in}(t).    (8)\n\nA neuron is therefore maximally active when the input coordinates are nearly parallel to its encoding\nvector. Although we assume the input is given directly in the basis vectors \u03be_l for convenience, a\nneural encoding using an (over)complete basis based on a linear combination of the eigenvectors \u03c8_l\nor \u03c6_l is also possible given a corresponding transformation in the feed\u2013forward weights.\n\nFigure 2: [Left] The attractor network structure for the maze\u2013like environment in Fig. 1. The inputs\ngive a low\u2013dimensional approximation of the successor coordinates of a point in space. The network\nis composed of 500 neurons with encoding vectors representing states scattered randomly throughout\nthe environment. Each neuron\u2019s activation is proportional to the scalar product of its encoding\nvector and the input, resulting in a large \u201cbump\u201d of activity. Recurrent weights are generated using a\nleast\u2013squares error decoding of the successor coordinates from the neural activities, projected back\non to the neural encoding vectors. [Right] The generated recurrent weights for the network. The\nplot shows the incoming weights from each neuron to the unit at the circled position, where neurons\nare plotted according to their place \ufb01eld centers.\n\nIf the input \u02d8s^{in}(t) represents a location in the environment, a bump of activity forms in the network\n(Fig. 2). These activities give a (non\u2013linear) encoding of the input. Given the response properties of\nthe neurons, we can \ufb01nd a set of linear decoding weights d_j that recovers an approximation of the\ninput given to the network from the neural activities [23]:\n\n\u02d8s^{rec}(t) = \u2211_{j=1}^{n} d_j \u00b7 a_j(t).    (9)\n\nThese decoding weights d_j were derived by minimizing the least\u2013squares estimation error of a set\nof example inputs from their resulting steady\u2013state activities, where the example inputs correspond\nto the successor coordinates of points evenly spaced throughout the environment. The minimization\ncan be performed by taking the Moore\u2013Penrose pseudoinverse of the matrix of neural activities in\nresponse to the example inputs (with singular values below a certain tolerance removed to avoid\nover\ufb01tting). The vector d_j therefore gives the contribution of a_j(t) to a linear population code for\nthe input location.\nWe now introduce the recurrent weights w^{rec}_{ij} to allow the network to maintain a memory of past\ninput in persistent activity. The recurrent weights are determined by projecting the decoded location\nback on to the neuron encoding vectors such that\n\nw^{rec}_{ij} = (1 \u2212 \u03b5) \u00b7 e_i \u00b7 d_j,\n\u2211_{j=1}^{n} w^{rec}_{ij} a_j(t) = (1 \u2212 \u03b5) \u00b7 e_i \u00b7 \u02d8s^{rec}(t).    (10)\n\nHere, the factor \u03b5 \u226a 1 determines the timescale on which the network activity fades. Since the\nencoding and decoding vectors for the same neuron tend to be similar, recurrent weights are highest\nbetween neurons representing similar successor coordinates, and the weight pro\ufb01le decreases with\nthe distance between place \ufb01eld centers (Fig. 2). 
The full neuron\u2013level description is given by\n\n\u03c4 da_i(t)/dt = \u2212a_i(t) + g [\u2211_{j=1}^{n} w^{rec}_{ij} a_j(t) + \u03b1 \u2211_{k=1}^{m} w^{ff}_{ik} \u02d8s^{in}_k(t)]_+\n             = \u2212a_i(t) + g [e_i \u00b7 ((1 \u2212 \u03b5) \u00b7 \u02d8s^{rec}(t) + \u03b1 \u00b7 \u02d8s^{in}(t))]_+    (11)\n\nwhere the \u03b1 parameter corresponds to the input strength. If we consider the estimate of \u02d8s^{rec}(t)\nrecovered from decoding the activities of the network, we arrive at the update equation\n\n\u03c4 d\u02d8s^{rec}(t)/dt \u2248 \u03b1 \u00b7 \u02d8s^{in}(t) \u2212 \u03b5 \u00b7 \u02d8s^{rec}(t).    (12)\n\nGiven a location \u02d8s^{in}(t) as an initial input, the recovered representation \u02d8s^{rec}(t) approximates the\ninput and reinforces it, allowing a persistent bump of activity to form. When \u02d8s^{in}(t) then changes\nto a new (goal) location, the input and recovered coordinates con\ufb02ict. By Eq. 12, the recovered\nlocation moves in the direction of the new input, giving us an approximation of the initial gradient\nascent step in Eq. 6 with the addition of a decay controlled by \u03b5. As we will show, the attractor\ndynamics typically cause the network activity to manifest as a movement of the bump towards the\ngoal location, through locations intermediate to the starting position and the goal (as observed in\nexperiments [9, 10]). After a short stimulation period, the network activity can be decoded to give a\nstate near the starting position that is closer to the goal. Note that, with no decay \u03b5, the network\nactivity will tend to grow over time. 
To induce stable activity when the network representation\nmatches the goal position (\u02d8s^{rec}(t) \u2248 \u02d8s^{in}(t)), we balanced the decay and input strength (\u03b5 = \u03b1).\nIn the following, we consider networks where the successor coordinate representation was truncated\nto the \ufb01rst q dimensions, where q \u226a n. This was done because the network is composed of a limited\nnumber of neurons, representing only the portion of the successor coordinate space corresponding\nto actual locations in the environment. In a very high\u2013dimensional space, the network can rapidly\nmove into a regime far from any actual locations, and the integration accuracy suffers. In effect, the\nweight pro\ufb01les and feed\u2013forward activation pro\ufb01le become very narrow, and as a result the bump\nof activity simply disappears from the original position and reappears at the goal. Conversely, low\u2013dimensional\nrepresentations tend to result in broad excitatory weight pro\ufb01les and activity pro\ufb01les\n(Fig. 2). The high degree of excitatory overlap across the network causes the activity pro\ufb01le to move\nsmoothly between distant points, as we will show.\n\n4 Results\n\nWe generated attractor networks according to the layout of multiple environments containing walls\nand obstacles, and stimulated them successively with arbitrary startpoints and goals. We used\nn = 500 neurons to represent each environment, with place \ufb01eld centers selected randomly throughout\nthe environment. The successor coordinates were generated using \u03b3 = 1. We adjusted q to\ncontrol the dimensionality of the representation. The network activity resembles a bump across a\nportion of the environment (Fig. 3). 
Low\u2013dimensional representations (low q) produced large activity\nbumps across signi\ufb01cant portions of the environment; when a weak stimulus was provided at the\ngoal, the overall activity decreased while the center of the bump moved towards the goal through\nthe intervening areas of the environment. With a high\u2013dimensional representation, activity bumps\nbecame more localized, and shifted discontinuously to the goal (Fig. 3, bottom row).\nFor several networks representing different environments, we initialized the activity at points evenly\nspaced throughout the environment and provided weak feed\u2013forward stimulation corresponding to\na \ufb01xed goal location (Fig. 4). After a short delay (5\u03c4), we decoded the successor coordinates from\nthe network activity to determine the closest state (Eq. 6). The shifts in the network representation\nare shown by the arrows in Fig. 4. For two networks, we show the effect of different feed\u2013forward\nstimuli representing different goal locations. The movement of the activity pro\ufb01le was similar to the\nshortest path towards the goal (Fig. 4, bottom left), including reversals at equidistant points (center\nbottom of the maze). Irregularities were still present, however, particularly near the edges of the\nenvironment and in the immediate vicinity of the goal (where high\u2013frequency components play a\nlarger role in determining the value gradient).\n\nFigure 3: Attractor network activities illustrated over time for different inputs and networks, in\nmultiples of the membrane time constant \u03c4. Purple boxes indicate the most active unit at each point\nin time. [First row] Activities are shown for a network representing a maze\u2013like environment in\na low\u2013dimensional space (q = 5). The network was initially stimulated with a bump of activation\nrepresenting the successor coordinates of the state at the black circle; recurrent connections\nmaintain a similar yet fading pro\ufb01le over time. [Second row] For the same network and initial conditions,\na weak constant stimulus was provided representing the successor coordinates at the grey\ncircle; the activities transiently decrease and the center of the pro\ufb01le shifts over time through the\nenvironment. [Third row] Two positions (black and grey circles) were sequentially activated in a\nnetwork representing a second environment in a low\u2013dimensional space (q = 4). [Bottom row] For\na higher\u2013dimensional representation (q = 50), the activity pro\ufb01le fades rapidly and reappears at the\nstimulated position.\n\n5 Discussion\n\nWe have presented a spatial bump attractor model generalized to represent environments with arbitrary\nobstacles, and shown how, with large activity pro\ufb01les relative to the size of the environment, the\nnetwork dynamics can be used for path\u2013\ufb01nding. This provides a possible correlate for goal\u2013directed\nactivity observed in the hippocampus [9, 10] and a hypothesis for the role that the hippocampus\nand the CA3 region play in rapid goal\u2013directed navigation [4\u20136], as a complement to an additional\n(e.g. model\u2013free) system enabling incremental goal learning in unfamiliar environments [4].\nRecent theoretical work has linked the bump\u2013like \ufb01ring behaviour of place cells to an encoding of the\nenvironment based on its natural topology, including obstacles [22], and speci\ufb01cally to the successor\nrepresentation [14]. As well, recent work has proposed that place cell behaviour can be learned by\nprocessing visual data using hierarchical slow feature analysis [15, 16], a process which can extract\nthe lowest frequency eigenvectors of the graph Laplacian generated by the environment [20] and\ntherefore provide a potential input for successor representation\u2013based activity. 
We provide the \ufb01rst\nlink between these theoretical analyses and attractor\u2013based models of CA3.\nSlow feature analysis has been proposed as a natural outcome of a plasticity rule based on Spike\u2013Timing\u2013Dependent\nPlasticity (STDP) [24], albeit on the timescale of a standard postsynaptic potential\nrather than the behavioural timescale we consider here. However, STDP can be extended to\nbehavioural timescales when combined with sustained \ufb01ring and slowly decaying potentials [25] of\nthe type observed on the single\u2013neuron level in the input pathway to CA3 [26], or as a result of\nnetwork effects. Within the attractor network, learning could potentially be addressed by a rule that\ntrains recurrent synapses to reproduce feed\u2013forward inputs during exploration (e.g. [27]).\n\nFigure 4: Large\u2013scale, low\u2013dimensional attractor network activities can be decoded to determine\nlocal trajectories to long\u2013distance goals. Arrows show the initial change in the location of the\nactivity pro\ufb01le by determining the state closest to the decoded network activity (at t = 5\u03c4) after\nweakly stimulating with the successor coordinates at the black dot (\u03b1 = \u03b5 = 0.05). Pixels show the\nplace \ufb01eld centers of the 500 neurons representing each environment, coloured according to their\nactivity at the stimulated goal site. [Top left] Change in position of the activity pro\ufb01le in a maze\u2013like\nenvironment with low\u2013dimensional activity (q = 5) compared to [Bottom left] the true shortest\npath towards the goal at each point in the environment. [Additional plots] Various environments\nand stimulated goal sites using low\u2013dimensional successor coordinate representations.\n\nOur model assigns a key role to neurons with large place \ufb01elds in generating long\u2013distance goal\u2013directed\ntrajectories. 
This suggests that such trajectories in dorsal hippocampus (where place \ufb01elds\nare much smaller [8]) must be inherited from dynamics in ventral or intermediate hippocampus.\nThe model predicts that ablating the intermediate/ventral hippocampus [6] will result in a signi\ufb01cant\nreduction in goal\u2013directed preplay activity in the remaining dorsal region. In an intact hippocampus,\nthe model predicts that long\u2013distance goal\u2013directed preplay in the dorsal hippocampus is preceded\nby preplay tracing a similar path in intermediate hippocampus. However, these large\u2013scale networks\nlack the speci\ufb01city to consistently generate useful trajectories in the immediate vicinity of the goal.\nTherefore, higher\u2013dimensional (dorsal) representations may prove useful in generating trajectories\nclose to the goal location, or alternative methods of navigation may become more important.\nIf an assembly of neurons projecting to the attractor network is active while the animal searches the\nenvironment, reward\u2013modulated Hebbian plasticity provides a potential mechanism for reactivating\na goal location. In particular, the presence of a reward\u2013induced neuromodulator could allow for\npotentiation between the assembly and the attractor network neurons active when the animal receives\na reward at a particular location. Activating the assembly would then provide stimulation to the goal\nlocation in the network; the same mechanism could allow an arbitrary number of assemblies to\nbecome selective for different goal locations in the same environment. Unlike traditional model\u2013\nfree methods of learning which generate a static value map, this would give a highly con\ufb01gurable\nmeans of navigating the environment (e.g. visiting different goal locations based on thirst vs. 
hunger\nneeds), providing a link between spatial navigation and higher cognitive functioning.\nAcknowledgements\nThis research was supported by the Swiss National Science Foundation (grant agreement no. 200020 147200).\nWe thank Laureline Logiaco and Johanni Brea for valuable discussions.\n\n8\n\n\fReferences\n[1] William Beecher Scoville and Brenda Milner. Loss of recent memory after bilateral hippocampal lesions. Journal of\n\nneurology, neurosurgery, and psychiatry, 20(1):11, 1957.\n\n[2] Howard Eichenbaum. Memory, amnesia, and the hippocampal system. MIT press, 1993.\n[3] John O\u2019Keefe and Jonathan Dostrovsky. The hippocampus as a spatial map. preliminary evidence from unit activity in\n\nthe freely-moving rat. Brain research, 34(1):171\u2013175, 1971.\n\n[4] Kazu Nakazawa, Linus D Sun, Michael C Quirk, Laure Rondi-Reig, Matthew A Wilson, and Susumu Tonegawa.\nHippocampal CA3 NMDA receptors are crucial for memory acquisition of one-time experience. Neuron, 38(2):305\u2013\n315, 2003.\n\n[5] Toshiaki Nakashiba, Jennie Z Young, Thomas J McHugh, Derek L Buhl, and Susumu Tonegawa. Transgenic inhibition\n\nof synaptic transmission reveals role of ca3 output in hippocampal learning. Science, 319(5867):1260\u20131264, 2008.\n\n[6] Tobias Bast, Iain A Wilson, Menno P Witter, and Richard GM Morris. From rapid place learning to behavioral perfor-\n\nmance: a key role for the intermediate hippocampus. PLoS biology, 7(4):e1000089, 2009.\n\n[7] Alexei Samsonovich and Bruce L McNaughton. Path integration and cognitive mapping in a continuous attractor neural\n\nnetwork model. The Journal of Neuroscience, 17(15):5900\u20135920, 1997.\n\n[8] Kirsten Brun Kjelstrup, Trygve Solstad, Vegard Heimly Brun, Torkel Hafting, Stefan Leutgeb, Menno P Witter, Edvard I\nMoser, and May-Britt Moser. Finite scale of spatial representation in the hippocampus. Science, 321(5885):140\u2013143,\n2008.\n\n[9] Brad E Pfeiffer and David J Foster. 
Hippocampal place-cell sequences depict future paths to remembered goals. Nature, 497(7447):74\u201379, 2013.\n[10] Andrew M Wikenheiser and A David Redish. Hippocampal theta sequences reflect current goals. Nature neuroscience, 2015.\n[11] Louis-Emmanuel Martinet, Denis Sheynikhovich, Karim Benchenane, and Angelo Arleo. Spatial learning and action planning in a prefrontal cortical network model. PLoS computational biology, 7(5):e1002045, 2011.\n[12] Filip Ponulak and John J Hopfield. Rapid, parallel path planning by propagating wavefronts of spiking neural activity. Frontiers in computational neuroscience, 7, 2013.\n[13] Peter Dayan. Improving generalization for temporal difference learning: The successor representation. Neural Computation, 5(4):613\u2013624, 1993.\n[14] Kimberly L Stachenfeld, Matthew Botvinick, and Samuel J Gershman. Design principles of the hippocampal cognitive map. In Z. Ghahramani, M. Welling, C. Cortes, N.D. Lawrence, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 2528\u20132536. Curran Associates, Inc., 2014.\n[15] Mathias Franzius, Henning Sprekeler, and Laurenz Wiskott. Slowness and sparseness lead to place, head-direction, and spatial-view cells. PLoS Computational Biology, 3(8):e166, 2007.\n[16] Fabian Schoenfeld and Laurenz Wiskott. Modeling place field activity with hierarchical slow feature analysis. Frontiers in Computational Neuroscience, 9:51, 2015.\n[17] Richard S Sutton and Andrew G Barto. Introduction to reinforcement learning. MIT Press, 1998.\n[18] Ronald R Coifman and St\u00e9phane Lafon. Diffusion maps. Applied and computational harmonic analysis, 21(1):5\u201330, 2006.\n[19] Sridhar Mahadevan. Learning Representation and Control in Markov Decision Processes, volume 3. Now Publishers Inc, 2009.\n[20] Henning Sprekeler. On the relation of slow feature analysis and Laplacian eigenmaps. 
Neural computation, 23(12):3287\u20133302, 2011.\n[21] John Conklin and Chris Eliasmith. A controlled attractor network model of path integration in the rat. Journal of computational neuroscience, 18(2):183\u2013203, 2005.\n[22] Nicholas J Gustafson and Nathaniel D Daw. Grid cells, place cells, and geodesic generalization for spatial reinforcement learning. PLoS computational biology, 7(10):e1002235, 2011.\n[23] Chris Eliasmith and Charles H Anderson. Neural engineering: Computation, representation, and dynamics in neurobiological systems. MIT Press, 2004.\n[24] Henning Sprekeler, Christian Michaelis, and Laurenz Wiskott. Slowness: an objective for spike-timing-dependent plasticity. PLoS Comput Biol, 3(6):e112, 2007.\n[25] Patrick J Drew and LF Abbott. Extending the effects of spike-timing-dependent plasticity to behavioral timescales. Proceedings of the National Academy of Sciences, 103(23):8876\u20138881, 2006.\n[26] Phillip Larimer and Ben W Strowbridge. Representing information in cell assemblies: persistent activity mediated by semilunar granule cells. Nature neuroscience, 13(2):213\u2013222, 2010.\n[27] Robert Urbanczik and Walter Senn. Learning by the dendritic prediction of somatic spiking. Neuron, 81(3):521\u2013528, 2014.", "award": [], "sourceid": 1030, "authors": [{"given_name": "Dane", "family_name": "Corneil", "institution": "EPFL"}, {"given_name": "Wulfram", "family_name": "Gerstner", "institution": "EPFL"}]}