{"title": "Training recurrent networks to generate hypotheses about how the brain solves hard navigation problems", "book": "Advances in Neural Information Processing Systems", "page_first": 4529, "page_last": 4538, "abstract": "Self-localization during navigation with noisy sensors in an ambiguous world is computationally challenging, yet animals and humans excel at it. In robotics, {\\em Simultaneous Location and Mapping} (SLAM) algorithms solve this problem through joint sequential probabilistic inference of their own coordinates and those of external spatial landmarks. We generate the first neural solution to the SLAM problem by training recurrent LSTM networks to perform a set of hard 2D navigation tasks that require generalization to completely novel trajectories and environments. Our goal is to make sense of how the diverse phenomenology in the brain's spatial navigation circuits is related to their function. We show that the hidden unit representations exhibit several key properties of hippocampal place cells, including stable tuning curves that remap between environments. Our result is also a proof of concept for end-to-end-learning of a SLAM algorithm using recurrent networks, and a demonstration of why this approach may have some advantages for robotic SLAM.", "full_text": "Training recurrent networks to generate hypotheses\nabout how the brain solves hard navigation problems\n\nIngmar Kanitscheider & Ila Fiete\n\nAustin, TX 78712\n\nDepartment of Neuroscience\n\nThe University of Texas\n\nikanitscheider, ilafiete @mail.clm.utexas.edu\n\nAbstract\n\nSelf-localization during navigation with noisy sensors in an ambiguous world is\ncomputationally challenging, yet animals and humans excel at it. In robotics, Si-\nmultaneous Location and Mapping (SLAM) algorithms solve this problem through\njoint sequential probabilistic inference of their own coordinates and those of exter-\nnal spatial landmarks. 
We generate the first neural solution to the SLAM problem by training recurrent LSTM networks to perform a set of hard 2D navigation tasks that require generalization to completely novel trajectories and environments. Our goal is to make sense of how the diverse phenomenology in the brain's spatial navigation circuits is related to their function. We show that the hidden unit representations exhibit several key properties of hippocampal place cells, including stable tuning curves that remap between environments. Our result is also a proof of concept for end-to-end learning of a SLAM algorithm using recurrent networks, and a demonstration of why this approach may have some advantages for robotic SLAM.

1 Introduction

Sensory noise and ambiguous spatial cues make self-localization during navigation computationally challenging. Errors in self-motion estimation cause rapid deterioration in localization performance if localization is based simply on path integration (PI), the integration of self-motion signals. Spatial features in the world are often spatially extended (e.g., walls), or similar landmarks are found at multiple locations, and thus provide only partial position information. Worse, localizing in novel environments requires solving a chicken-or-egg problem: since landmarks are not yet associated with coordinates, agents must learn landmark positions from PI (known as mapping), but PI location estimates drift rapidly and require correction from landmark coordinates.

Despite the computational difficulties, animals exhibit stable neural tuning in familiar and novel environments over several tens of minutes [1, 2], even though PI estimates in the same animals are estimated to deteriorate within a few minutes [3]. 
These experimental and computational \ufb01ndings\nsuggest that the brain is solving some version of the simultaneous localization and mapping (SLAM)\nproblem.\nIn robotics, the SLAM problem is solved by algorithms that approximate Bayes-optimal sequential\nprobabilistic inference: at each step, a probability distribution over possible current locations and\nover the locations of all the landmarks is updated based on noisy motion and noisy, ambiguous\nlandmark inputs [4]. These algorithms simultaneously update location and map estimates, effectively\nbootstrapping their way to better estimates of both. Quantitative studies of neural responses in rodents\nsuggest that their brains might also perform high-quality sequential probabilistic fusion of motion\nand landmark cues during navigation [3]. The required probabilistic computations are dif\ufb01cult to\n\n31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.\n\n\ftranslate by hand into forms amenable to neural circuit dynamics, and it is entirely unknown how the\nbrain might perform them.\nWe ask here how the brain might solve the SLAM problem.\nInstead of imposing heavy prior\nassumptions on the form a neural solution might take, we espouse a relatively model-free approach\n[5, 6, 7]: supervised training of recurrent neural networks to solve spatial localization in familiar\nand novel environments. A recurrent architecture is necessary because self-localization from motion\ninputs and different landmark encounters involves integration over time, which requires memory. 
We\nexpect that the network will form representations of the latent variables essential to solving the task .\nUnlike robotic SLAM algorithms that simultaneously acquire a representation of the agent\u2019s location\nand a detailed metric map of a novel environment, we primarily train the network to perform accurate\nlocalization; the map representation is only explicitly probed by asking the network to extract features\nto correctly classify the environment it is currently in. However, even if the goal is to merely localize\nin one of several environments, the network must have created and used a map of the environment\nto enable accurate localization with noisy PI. In turn, an algorithm that successfully solves the\nproblem of accurate localization in novel environments can automatically solve the SLAM problem,\nas mapping a space then simply involves assigning correct coordinates to landmarks, walls, and\nother features in the space [4]. Our network solution exploits the fact that the SLAM problem can\nbe considered as one of mapping sequences of ambiguous motion and landmark observations to\nlocations, in a way that generalizes across trajectories and environments.\nOur goal is to better understand how the brain solves such problems, by relating emergent responses\nin the trained network to those observed in the brain, and through this process to synthesize, from\na function-driven perspective, the large body of phenomenology on the brain\u2019s spatial navigation\ncircuits. Because we have access to all hidden units and control over test environments and trajectories,\nthis approach allows us to predict the effective dimensionality of the dynamics required to solve the 2D\nSLAM task and make novel predictions about the representations the brain might construct to solve\nhard inference problems. 
Even from the perspective of well-studied robotic SLAM, this approach\ncould allow for the learning and use of rich environment structure priors from past experience, which\ncan enable faster map building in novel environments.\n\n2 Methods\n\n2.1 Environments and trajectories\n\nWe study the task of a simulated rat that must estimate its position (i.e., localize itself) while moving\nalong a random trajectory in two-dimensional enclosures, similar to a typical task in which rats chase\nrandomly scattered food pellets [8]. The enclosure is polygon-shaped and the rat does not have\naccess to any local or distal spatial cues other than touch-based information upon contact with the\nboundaries of the environment (Figure 1A-B; for details see SI Text, section 1-4). We assume that\nthe rat has access to noisy estimates of self-motion speed and direction, as might be derived from\nproprioceptive and vestibular cues (Figure 1A), and to boundary-contact information derived from\nits rare encounters with a boundary whose only feature is its geometry. On boundary contact, the\nrat receives information only about its distance and angle relative to the boundary (Figure 1B). This\ninformation is degenerate: it depends simply on the pose of the rat with respect to the boundary,\nand the same signal could arise at various locations along the boundary. Self-motion and boundary\ncontact estimates are realistically noisy, with magnitudes based on work in [3].\n\n2.2 Navigation tasks\n\nWe study the following navigation tasks:\n\n\u2022 Localization only: Localization in a single familiar environment. The rat is familiar with\nthe geometry of the environment but starts each trial at a random unknown location. 
To successfully solve the task, the rat must infer its location relative to a fixed point in the interior on the basis of successive boundary contacts and its knowledge of the environment's geometry, and be able to generalize this computation across novel random trajectories.

• Generalized SLAM: Localization in novel environments. Each trial takes place in a novel environment, sampled from a distribution of random polygons (Figure 1C; SI Text, section 1); the rat must accurately infer its location relative to the starting point by exploiting boundary inputs despite not knowing the geometry of its enclosure. To solve the task, the rat must be able to generalize its localization computations to trials with both novel trajectories and novel environments.

• Specialized task: Localization in and classification of any of 100 familiar environments. Each trial takes place in one of 100 known environments, sampled from a distribution of random polygons (Figure 1C; SI Text, section 1), but the rat does not know which one. The trial starts at a fixed point inside the polygon (known to the rat through training), and the ongoing trajectory is random. In addition to the challenges of the localization tasks above, the rat must correctly classify the environment.

Figure 1: Task setup. Self-localization in 2D enclosures. A Noisy heading direction and speed inputs allow the simulated rat to update its location in the interior. B Occasional boundary contacts provide noisy estimates of its relative angle (β) and distance (d) from the wall. Length scale shown is for the localization-only task. C Samples from the distribution of random environments. D Architecture of the recurrent neural network.

The environments are random polygons with 10 vertices. 
The center-to-vertex lengths are drawn randomly from a distribution with mean 1 m in the localization-only task or 0.33 m in the specialized and generalized SLAM tasks.

2.3 Recurrent network architecture and training

The network has three layers: input, recurrent hidden, and output (Figure 1D). The input layer encodes noisy self-motion cues (velocity and head-direction change) as well as noisy boundary-contact information (relative angle and distance to the boundary; SI Text, section 9). The recurrent layer contains 256 Long Short-Term Memory (LSTM) units with peepholes and forget gates [9], an architecture demonstrated to be able to learn dependencies across many timesteps [10]. We adapt the nonlinearity of the LSTM units to produce non-negative hidden activations in order to facilitate comparison with neural firing rates (footnote 1). Two self-localization units in the output layer perform a linear readout; their activations correspond to the estimated location coordinates. The cost function for localization is mean squared error. The classification output is implemented by a softmax layer with 100 neurons (one per environment); the cost function is cross-entropy. When the network is trained to both localize and classify, the relative weight is tuned such that the classification cost is half the localization cost. Independent trials used for training: 5,000 in the localization-only task, 250,000 in the specialized task, and 300,000 in the generalized task. The network is trained using the Adam algorithm [11], a form of stochastic gradient descent. 
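The modified LSTM update described above (peephole connections, and a rectified cell state inside the output nonlinearity so that hidden activations are non-negative) can be sketched in NumPy as follows. The toy dimensions and random initialization are assumptions for illustration; the paper uses 256 hidden units trained by Adam.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, p):
    """One step of the peephole LSTM (footnote 1), with the cell state
    rectified inside the output nonlinearity so hidden activations h >= 0."""
    i = sigmoid(p["Wxi"] @ x + p["Whi"] @ h + p["wci"] * c + p["bi"])   # input gate
    f = sigmoid(p["Wxf"] @ x + p["Whf"] @ h + p["wcf"] * c + p["bf"])   # forget gate
    c_new = f * c + i * np.tanh(p["Wxc"] @ x + p["Whc"] @ h + p["bc"])  # cell update
    o = sigmoid(p["Wxo"] @ x + p["Who"] @ h + p["wco"] * c_new + p["bo"])  # output gate (peeps at c_t)
    h_new = o * np.tanh(np.maximum(c_new, 0.0))  # rectification keeps h non-negative
    return h_new, c_new

# Toy sizes (assumed): 4 inputs, 8 hidden units; the paper uses 256 units
rng = np.random.default_rng(0)
n_in, n_h = 4, 8
p = {k: 0.1 * rng.standard_normal((n_h, n_in)) for k in ["Wxi", "Wxf", "Wxc", "Wxo"]}
p.update({k: 0.1 * rng.standard_normal((n_h, n_h)) for k in ["Whi", "Whf", "Whc", "Who"]})
p.update({k: 0.1 * rng.standard_normal(n_h)
          for k in ["wci", "wcf", "wco", "bi", "bf", "bc", "bo"]})

h, c = np.zeros(n_h), np.zeros(n_h)
for _ in range(10):
    h, c = lstm_step(rng.standard_normal(n_in), h, c, p)
```

Because the output gate is a sigmoid (positive) and tanh of a rectified cell is non-negative, every hidden activation is non-negative by construction, which is what makes the comparison with firing rates possible.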
Gradients are clipped to 1. During training, performance is monitored on a validation set of 1000 independent trials, and the network parameters with the smallest validation error are selected. All results are cross-validated on a separate set of 1000 test trials to ensure the network indeed generalizes across new random trajectories and/or environments.

Footnote 1: The LSTM units are implemented by the equations

i_t = σ(W_xi x_t + W_hi h_{t-1} + w_ci ⊙ c_{t-1} + b_i)
f_t = σ(W_xf x_t + W_hf h_{t-1} + w_cf ⊙ c_{t-1} + b_f)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ tanh(W_xc x_t + W_hc h_{t-1} + b_c)
o_t = σ(W_xo x_t + W_ho h_{t-1} + w_co ⊙ c_t + b_o)
h_t = o_t ⊙ tanh([c_t]_+)

where σ is the logistic sigmoid function, h is the hidden activation vector, i, f, o and c are respectively the input-gate, forget-gate, output-gate and cell activation vectors, a ⊙ b denotes point-wise multiplication, and [x]_+ denotes rectification.

3 Results

3.1 Network performance on spatial tasks rivals optimal performance

3.1.1 Localization in a familiar environment

The trained network, starting a trial from an unknown random initial position and running along a new random trajectory, quickly localizes itself within the space (Figure 2, red curve). The mean location error (averaged over new test trials) drops as a function of time in each trial, as the rat encounters more boundaries in the environment. After about 5 boundary contacts, the initial error has sharply declined.

Figure 2: Localization in a single familiar environment. Mean absolute error on the localization-only task (left), radial error measured from origin (middle), and angular error (right). One time step corresponds to 0.77 seconds. Network performance (red, NN) is compared to that of the particle filter (black, PF). 
Also shown: single hypothesis \ufb01lter (light red, SH) and simple path integration (gray, PI)\nestimates as controls.\n\nThe drop in error over time and the \ufb01nal error of the network match that of the optimal Bayesian esti-\nmator with access to the same noisy sensory data but perfect knowledge of the boundary coordinates\n(Figure 2, black). The optimal Bayesian estimator is implemented as a particle \ufb01lter (PF) with 1000\nparticles and performs fully probabilistic sequential inference about position, using the environment\ncoordinates and the noisy sensory data. The posterior location distributions are frequently elongated\nin an angular arc and multimodal (thus far from Gaussian).\nBoth network and PF vastly outperform pure PI. First, since the PI estimate does not have access to\nboundary information, it cannot overcome initial localization uncertainty due to the unknown starting\npoint. Second, the error in the PI estimate of location grows unbounded with time, as expected due to\nthe accumulating effects of noise in the motion estimates (Figure 2, gray). In contrast, the errors in\nthe network and PF \u2013 which make use of the same motion estimates \u2013 remain bounded.\nFinally we contrast the performance of the network and PF with the single hypothesis (SH) algorithm,\nwhich updates a single location estimate (rather than a probability distribution) by taking into account\nmotion, contact, and arena shape. The SH algorithm can be thought of as an abstraction of neural\nbump attractor models [12, 13], in which an activity bump is updated using PI and corrected when a\nlandmark or boundary with known spatial coordinates is observed. The SH algorithm overcomes, to\na certain degree, the initial localization uncertainty due to the unknown starting position, but the error\nsteadily increases thereafter. 
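For reference, the PF baseline described above can be sketched as a standard bootstrap particle filter. The square arena, noise scales, and contact threshold below are simplifying assumptions (the paper uses polygonal arenas and noise magnitudes taken from [3]); note the deliberate degeneracy of the observation: distance to the nearest wall is the same at many locations.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1000   # particles, matching the paper's PF baseline
L = 1.0    # assumed square arena of side L (the paper uses random polygons)

def wall_distance(p):
    # Distance to the nearest boundary of the [0, L] x [0, L] arena.
    # Many locations share the same wall distance: a degenerate observation.
    return np.minimum(np.minimum(p[..., 0], L - p[..., 0]),
                      np.minimum(p[..., 1], L - p[..., 1]))

def pf_step(particles, weights, v, obs_dist, sig_motion=0.02, sig_obs=0.03):
    """One bootstrap-filter step: propagate particles through the noisy
    motion model, reweight by the boundary observation (if any), resample."""
    particles = particles + v + sig_motion * rng.standard_normal(particles.shape)
    particles = np.clip(particles, 0.0, L)
    if obs_dist is not None:  # boundary contact: Gaussian likelihood of observed distance
        ll = np.exp(-0.5 * ((wall_distance(particles) - obs_dist) / sig_obs) ** 2)
        weights = weights * ll + 1e-12      # floor avoids an all-zero weight vector
        weights = weights / weights.sum()
    # systematic resampling to limit weight degeneracy
    u = (rng.random() + np.arange(N)) / N
    idx = np.minimum(np.searchsorted(np.cumsum(weights), u), N - 1)
    return particles[idx], np.full(N, 1.0 / N)

# Unknown start: particles uniform over the arena; true position known only to the simulator
particles = rng.random((N, 2)) * L
weights = np.full(N, 1.0 / N)
true_pos = np.array([0.04, 0.5])        # start near a wall so contacts occur
for t in range(50):
    v = 0.01 * rng.standard_normal(2)   # self-motion input
    true_pos = np.clip(true_pos + v, 0.0, L)
    d = wall_distance(true_pos)
    obs = d + 0.02 * rng.standard_normal() if d < 0.05 else None
    particles, weights = pf_step(particles, weights, v, obs)
estimate = particles.mean(axis=0)       # posterior-mean location estimate
```

Because the wall-distance observation is degenerate, the particle cloud typically remains elongated or multimodal until several contacts have been accumulated, which is the regime in which the full posterior representation pays off over a single-hypothesis estimate.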
It still vastly underperforms the network and PF, since it is not able to efficiently resolve the complex-shaped uncertainties induced by featureless boundaries.

3.1.2 Localization in novel environments

The network is trained to localize within a different environment in each trial, then tested on a set of trials in different novel environments.

Strikingly, the network localizes well in the novel environments, despite its ignorance about their specific geometry (Figure 3A, red). While the network (unsurprisingly) does not match the performance of an oracular PF that is supplied with the arena geometry at the beginning of the trial (Figure 3A, black), its error exceeds that of the oracular PF by only ≈ 50%, and it vastly outperforms PI-based estimation (Figure 3A, gray) and a naive Bayesian (NB) approach that takes into account the distribution of locations across the ensemble of environments (Figure 3A, reddish-gray; SI section 8). Compared to robotic SLAM in open-field environments, this task setting is especially difficult since distant boundary information is gathered only from sparse contacts, rather than from spatially extended and continuous measurements with laser or radar scanners.

3.1.3 Localization in and classification of 100 familiar environments

The network is trained on 100 environments, then tested in an arbitrary environment from that set. The goal is to identify the environment and localize within it, from a known starting location. Localization initially deteriorates because of PI errors (Figure 3B, red). After a few boundary encounters, the network correctly identifies the environment (Figure 3C) and, simultaneously, localization error drops as the network now associates the boundary with coordinates for the appropriate environment. 
The network's localization error post-classification matches that of an oracular PF with full knowledge of the environment geometry. Within 200 s of exploration within the environment, classification performance is close to 100%.

As a measure of the efficacy of the neural network in solving the specialized task, we compare its performance to PFs that do not know the identity of the environment at the outset of the trial (PF SLAM) and that perform both localization and classification, with varying numbers of particles (Figure 3D-E). For classification, the asymptotic network performance with 256 recurrent units is comparable to a 10,000-particle PF SLAM, while for localization, the asymptotic network performance is comparable to a 4,000-particle PF SLAM, suggesting that the network is extremely efficient. Even the 10,000-particle PF SLAM classification estimate sometimes collapses prematurely to an incorrect value. The network is slower to commit to a classification but more accurate, improving on a common failure of particle filter-based SLAM caused by particle depletion.

Figure 3: Localization and classification in the generalized and specialized SLAM tasks. A Localization performance of the generalized network (red, NN) tested in novel environments, compared to a PF that knows the environment identity (black, oracular PF). Controls: PI only (gray, PI) and a naive Bayes filter (see text and SI; reddish-gray, NB). B Same as (A), but for the specialized network tested in 100 familiar environments. C Classification performance of the specialized network in 100 familiar environments. D-E Localization and classification by a SLAM PF with different numbers of particles, compared to the specialized network in 100 familiar environments. 
F Classification performance of the generalized network after retraining of the readout weights on the specialized task.

3.1.4 Spontaneous classification of novel environments

In robotic SLAM, algorithms that self-localize accurately in novel environments in the presence of noise must simultaneously build a map of the environments. Since the network in the general task in Figure 3A successfully localizes in novel environments, and is able to distinguish between them though they are quite similar, we conjecture that it must entertain a spontaneous representation of the environment.

To test this hypothesis we fix the input and recurrent weights of the network trained on the generalized task (completely novel environments) and retrain it on the specialized task (one out of a hundred familiar environments), whereby only the readout weights are trained for classification. We find that the classification performance late in each trial is close to 80%, much higher than chance (1%) (Figure 3F). This implies that the hidden neurons spontaneously build a representation that separates novel environments so that they can be linearly classified. This separation can be interpreted as a simple form of spontaneous map-building. 
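The readout-retraining test above amounts to fitting a linear softmax probe on frozen hidden states. A minimal sketch follows; the synthetic Gaussian clusters stand in for the network's frozen hidden-state vectors (an assumption for illustration), and the toy sizes are much smaller than the paper's 100 environments and 256 units.

```python
import numpy as np

rng = np.random.default_rng(2)
n_env, n_hidden, n_per = 10, 32, 50   # toy scale (assumed)

# Synthetic stand-in for frozen hidden states: one cluster per environment
centers = rng.standard_normal((n_env, n_hidden))
X = np.vstack([c + 0.3 * rng.standard_normal((n_per, n_hidden)) for c in centers])
y = np.repeat(np.arange(n_env), n_per)

# Train ONLY the readout (softmax) weights, as in the fixed-weight probe
W = np.zeros((n_hidden, n_env))
b = np.zeros(n_env)
onehot = np.eye(n_env)[y]
lr = 0.5
for _ in range(300):
    logits = X @ W + b
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    grad = (p - onehot) / len(X)     # gradient of mean cross-entropy w.r.t. logits
    W -= lr * (X.T @ grad)
    b -= lr * grad.sum(axis=0)

acc = (np.argmax(X @ W + b, axis=1) == y).mean()
```

If the frozen representation linearly separates environments, this probe reaches high accuracy without touching the input or recurrent weights, which is the sense in which the generalized network "spontaneously maps" novel environments.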
However, this spontaneous map-building is done with fixed weights, unlike standard Hopfield-type network models, which require synaptic plasticity to learn a new environment.

3.2 Comparison with and predictions for neural representation

Neural activity in the hippocampus and entorhinal cortex, areas involved in spatial navigation, has been extensively catalogued, usually while animals chase randomly dropped food pellets in open-field environments. It is not always clear what function the observed responses play in solving hard navigation problems, or why certain responses exist. Here we compare the responses of our network, which is trained to solve such tasks, with the experimental phenomenology.

Hidden units in our network exhibit stable place tuning, similar to place cells in CA1/CA3 of the hippocampus [14, 15, 16] (Figure 4A,B, left two columns). Stable place fields are observed across tasks: the network trained to localize in a single familiar environment exhibits stable fields there, while the networks trained on the specialized and generalized tasks exhibit reproducibly stable fields in all tested environments.

Figure 4: Neuron-like representations. A Spatial tuning of four typical hidden units from the specialized network, measured twice with different trajectories in the same environment (columns 1-2, blue box). The same cells are measured in a second environment (column 3, red box). B Same as A, but for the generalized network; neither environment was in the training set. C Hidden units (representative sample of 20) are not tuned to head direction. D Cumulative distribution of similarity of hidden unit states in the specialized (top) and generalized (bottom) networks, for trials in the same environment (blue) versus trials in different environments (pink). Control: similarity after randomizing over environments (gray). 
E Spatial selectivities of hidden units in the specialized network. Inset: spatial selectivity (averaged across environments) versus effective projection strength to classifier neurons, per hidden unit.

The hidden units, all of which receive head direction inputs and use this information to compute location estimates, nevertheless exhibit weak to nil head direction tuning (Figure 4C), again similar to observations in rodent place cells [17] (but see [18] for a report of head direction tuning in bat place cells).

Between different environments, the network trained on the specialized task exhibits clear remapping [19, 20], both global and local: cells fire in some environments and not others, and cells that were co-active in one environment are not in another (Figure 4A,B, third column). There is, in addition, a substantial amount of rate modulation in cells when they do not globally remap. Strikingly, the network trained on the generalized task exhibits different but stable and reproducible maps of different novel environments, with remapping, even though the input and recurrent connections were never readjusted for these novel environments (Figure 4B). 
This result suggests a computation that is distinct from the dynamics of settling into pre-trained fixed maps for different environments.

The similarity and dissimilarity of the representations within the same environment and across environments, in the specialized and generalized tasks, are quantified in Figure 4D: the representations are randomized across environments but stable within an environment.

For networks trained on the specialized or generalized tasks, the spatial selectivity of hidden units in an environment, measured as the fraction of the variance of each hidden neuron's activation that can be explained by location, is broad and long-tailed or sparse (Figure 4E): a few cells exhibit high selectivity, many have low selectivity. Interestingly, cells with low spatial selectivity in one environment also tend to have low selectivity across environments (in other words, the distribution of selectivity per cell across environments is narrower than the distribution of selectivity across cells per environment). Indeed, spatial information in hippocampal neurons seems to be concentrated in a small set of neurons [21], an experimental observation that seemed to run counter to the information-theoretic view that whitened representations are most efficient. However, our 256-neuron recurrent network, which efficiently solves a hard task that requires 10^4 particles, seems to do the same.

There is a negative correlation between spatial selectivity and the strength of feedforward connections to the classification units: hidden units that more strongly drive classification also tend to be less spatially selective (Figure 4E, inset). In other words, some low-spatial-selectivity cells correspond to what are termed context cells [22]. 
The role of the remaining cells with low spatial selectivity remains unclear and is the focus of future work.

3.3 Inner workings of the network

Figure 5: Inner workings of the network. A Hidden units in the localization-only network predict the covariances (Cxx, Cyy, Cxy) of the posterior location (x, y) distributions in the particle filter. B Light red: snapshots of the narrowing set of potential environment classifications by the specialized neural network at different early times in a trajectory, as determined by the activation of classifier neurons in the output layer. C Dimensionality of the hidden representations, estimated by the correlation dimension measure: localization network (top), specialized network (middle), generalized network (bottom). Dimensionality estimated from across-environment pooled responses for the latter two networks.

Beyond the similarities between representations in our hidden units and neural representations, what can we learn about how the network solves the SLAM problem?

The performance of the network compared to the particle filter (and its superiority to the simpler strategies used as controls) already implies that the network is performing sophisticated probabilistic computations about location. If it is indeed tracking probabilities, it should be possible to predict the uncertainties in location estimation from the hidden units. 
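Predicting the filter's uncertainty from hidden states can be sketched as cross-validated linear regression from hidden-state vectors to a PF posterior covariance component. The data below are synthetic stand-ins (an assumption for illustration, not the paper's pipeline): the covariance target is constructed to be partly linear in the features, and held-out correlation plays the role of the r values reported in Figure 5A.

```python
import numpy as np

rng = np.random.default_rng(3)
n_trials, n_hidden = 400, 64   # toy sizes (assumed)

# Synthetic stand-ins: hidden states H, and a PF posterior covariance
# component (e.g. Cxx) that is, by construction, partly linear in H
H = rng.standard_normal((n_trials, n_hidden))
w_true = rng.standard_normal(n_hidden) / np.sqrt(n_hidden)
cxx = H @ w_true + 0.1 * rng.standard_normal(n_trials)

# Split into train/test halves for cross-validation
H_tr, H_te = H[:200], H[200:]
y_tr, y_te = cxx[:200], cxx[200:]

# Ridge regression (small penalty for numerical stability)
lam = 1e-3
w = np.linalg.solve(H_tr.T @ H_tr + lam * np.eye(n_hidden), H_tr.T @ y_tr)
pred = H_te @ w

# Held-out correlation between predicted and actual covariance component
r = np.corrcoef(pred, y_te)[0, 1]
```

A high held-out r on real data would mean the uncertainty of the posterior is linearly decodable from the hidden layer, i.e. the network carries more than a point estimate of location.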
Indeed, all three covariance components related to the location estimate of the particle filter can be predicted by cross-validated linear regression from the hidden units in the localization-only network (Figure 5A).

When first placed into one of 100 familiar environments, the specialized network simultaneously entertains multiple possibilities for the environment identity (Figure 5B). The activations of neurons in the soft-max classification layer may be viewed as a posterior distribution over environment identity. With continued exploration and boundary encounters, the represented possibilities shrink until the network has identified the correct environment.

Unlike the particle filter, and contrary to neural models that implement probabilistic inference by stochastic sampling of the underlying distribution [23], this network implements ongoing near-optimal probabilistic location estimation through fully deterministic dynamics.

Location in 2D spaces is a continuous 2D metric variable, so one might expect location representations to lie on a low-dimensional manifold. On the other hand, SLAM also involves the representation of landmark and boundary coordinates and the capability to classify environments, which may greatly expand the effective dimension of a system solving the problem. We analyze the fractal manifold dimension of the hidden layer activities in the three networks (Figure 5C; see footnote 2). The localization-only network has a dimension D = 5.0. Surprisingly, the specialized network states (pooled across all 100 environments) are equally low-dimensional: D = 5.6. The generalized network states, pooled across environments, have dimension D = 8.6. 
(The dimensionality of activity in the latter two networks, considered in single environments only, remains the same as when pooled across environments.) This implies that the network extracts and represents only the most relevant summary statistics required to solve the 2D localization tasks, and that these statistics have fairly low dimension. These dimension estimates could serve as a prediction for hippocampal dynamics in the brain.

4 Discussion

By training a recurrent network on a range of challenging navigation tasks, we have generated, to our knowledge, the first fully neural SLAM solution that is as effective as particle filter-based implementations. Existing neurally-inspired SLAM algorithms such as RatSLAM [24] have combined attractor models with semi-metric topological maps, but only the former was neurally implemented. [25] trained a bidirectional LSTM network to transform laser range sensor data into location estimates, but the network was not shown to generalize across environments. In contrast, our recurrent network implementation is fully neural and generalizes successfully across environments with very different shapes. (Also see [26], a new preprint posted while this paper was under review, reporting on a SLAM implementation with recurrent neural network components. Other recent efforts to combine DNNs with SLAM usually apply DNNs to the visual input, and the outputs of the DNN are then fed into an existing SLAM algorithm [27, 28]. By contrast, our focus has been on finding neural solutions to the SLAM algorithm itself.)

Previous hand-designed models such as the multichart attractor model of Samsonovich & McNaughton [12] could path integrate and use landmark information to correct the network's PI estimate in many different environments. 
Yet our model substantially transcends those computational capabilities: First, our model performs sequential probabilistic inference, not simply a hard resetting of the PI estimate according to external cues. Second, our network reliably localizes in 100 environments with 256 LSTM units (which correspond to 512 dynamical units); the low capacity of the multichart attractor model would require about 175,000 neurons for the same number of environments. This comparison suggests that the special network architecture of the LSTM not only affects learnability, but also capacity. Finally, unlike the multichart attractor model, our model is able to linearly separate completely novel environments without changing its weights, as shown in section 3.1.4.

Despite its success in reproducing some key elements of the phenomenology of the hippocampus, our network model does not incorporate many biological constraints. This is in itself interesting, since it suggests that observed phenomena like stable place fields and remapping may emerge from the computational demands of hard navigation tasks rather than from detailed biological constraints. 

Footnote 2: To estimate the fractal dimension, we use the "correlation dimension": measure the number of states across trials that fall into a ball of radius r around a point in state space. The slope of log(#states) versus log(r) is the fractal dimension at that point.
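The correlation-dimension measure described in footnote 2 can be sketched as follows, with a sanity check on a manifold of known dimension. The choice of radii and the synthetic 2D sheet are assumptions for illustration; the paper applies the estimator to pooled hidden-state vectors.

```python
import numpy as np

rng = np.random.default_rng(4)

def correlation_dimension(states, radii):
    """Slope of log(mean # neighbors within radius r) versus log(r),
    i.e. the correlation-dimension estimate of footnote 2."""
    sq = np.sum(states ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * states @ states.T   # pairwise squared distances
    counts = np.array([np.mean(np.sum(d2 < r * r, axis=1) - 1)  # "-1" excludes self-counts
                       for r in radii])
    slope, _ = np.polyfit(np.log(radii), np.log(counts), 1)
    return slope

# Sanity check on a known manifold: a 2D sheet embedded linearly in 10 dimensions
n = 1500
latent = rng.random((n, 2))                       # intrinsic 2D coordinates
states = latent @ rng.standard_normal((2, 10))    # linear embedding into 10D
dim = correlation_dimension(states, np.array([0.05, 0.1, 0.2, 0.4]))
```

For a d-dimensional manifold the neighbor count grows as r^d at small radii, so the log-log slope recovers d (here, close to 2 up to finite-sample and boundary effects); applied to the hidden states it yields the D = 5.0-8.6 values reported above.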
It will be interesting to see whether incorporating constraints like Dale\u2019s law and the\nknown gross architecture of the hippocampal circuit results in the emergence of additional features\nassociated with the brain\u2019s navigation circuits, such as sparse population activity, directionality in\nplace representations in 1D environments, and grid cell-like responses.\nThe choice of an LSTM architecture for the hidden layer units, involving multiplicative input, output\nand forget gates and persistent cells, was primarily motivated by its ability to learn longer time-\ndependencies. One might wonder whether such multiplicative interactions could be implemented\nin biological neurons. A model by [29] proposed that dendrites of granule cells in the dental gyrus\ncontextually gate projections from grid cells in the entorhinal cortex to place cells. Similarly, granule\ncells could implement LSTM gates by modulating recurrent connections between pyramidal neurons\nin hippocampal area CA3. LSTM cells might be interpreted as neural activity or as synaptic weights\nupdated by a form of synaptic plasticity.\nThe learning of synaptic weights by gradient descent does not map well to biologically plausible\nsynaptic plasticity rules, and such learning is slow, requiring a vast number of supervised training\nexamples. Our present results offer a hint that, through extensive learning, the generalized network\nacquires useful general prior knowledge about the structure of natural navigation tasks, which it then\nuses to map and localize in novel environments with minimal further learning. One could thus argue\nthat the slow phase of learning is evolutionary, while learning during a lifetime can be brief and\ndriven by relatively little experience in new environments. 
At the same time, progress in biologically plausible learning may one day bridge the efficiency gap to gradient descent [30].

Finally, although our work is focused on understanding the phenomenology of navigation circuits in the brain, it might also be of some interest for robotic SLAM. SLAM algorithms are sometimes augmented by feedforward convolutional networks to assist in specific tasks like place recognition from camera images (see e.g. [27, 28]), but the geometric calculations and parameters at the core of SLAM algorithms are still largely hand-specified. By contrast, this work provides a proof of concept for the feasibility of end-to-end learning of SLAM algorithms using recurrent neural networks, and shows that the trained network both avoids the particle depletion problem that plagues many particle filter-based approaches to SLAM and is highly effective in identifying which low-dimensional summary statistics to update over time.

Acknowledgments

This work is supported by the NSF (CRCNS 26-1004-04xx), an HFSP award to IRF (26-6302-87), and the Simons Foundation through the Simons Collaboration on the Global Brain. The authors acknowledge the Texas Advanced Computing Center (TACC) at The University of Texas at Austin (URL: http://www.tacc.utexas.edu) for providing HPC resources that have contributed to the research results reported within this paper.

References

[1] Etienne Save, Ludek Nerad, and Bruno Poucet. Contribution of multiple sensory information to place field stability in hippocampal place cells. Hippocampus, 10(1):64–76, 2000.

[2] Torkel Hafting, Marianne Fyhn, Sturla Molden, May-Britt Moser, and Edvard I. Moser. Microstructure of a spatial map in the entorhinal cortex. Nature, 436:801–806, 2005.

[3] Allen Cheung, David Ball, Michael Milford, Gordon Wyeth, and Janet Wiles. Maintaining a cognitive map in darkness: the need to fuse boundary knowledge with path integration.
PLoS Comput Biol, 8(8):e1002651, 2012.

[4] Sebastian Thrun, Wolfram Burgard, and Dieter Fox. Probabilistic Robotics. MIT Press, 2005.

[5] Valerio Mante, David Sussillo, Krishna V Shenoy, and William T Newsome. Context-dependent computation by recurrent dynamics in prefrontal cortex. Nature, 503(7474):78–84, 2013.

[6] Daniel LK Yamins, Ha Hong, Charles F Cadieu, Ethan A Solomon, Darren Seibert, and James J DiCarlo. Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proceedings of the National Academy of Sciences, 111(23):8619–8624, 2014.

[7] Adam Marblestone, Greg Wayne, and Konrad Kording. Towards an integration of deep learning and neuroscience. arXiv preprint arXiv:1606.03813, 2016.

[8] Robert U Muller and John L Kubie. The effects of changes in the environment on the spatial firing of hippocampal complex-spike cells. Journal of Neuroscience, 7(7):1951–1968, 1987.

[9] Alex Graves. Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850, 2013.

[10] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.

[11] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

[12] A Samsonovich and B L McNaughton. Path integration and cognitive mapping in a continuous attractor neural network model. J Neurosci, 17(15):5900–5920, 1997.

[13] Yoram Burak and Ila R Fiete. Fundamental limits on persistent activity in networks of noisy neurons. Proc Natl Acad Sci U S A, 109(43):17645–50, Oct 2012.

[14] J O'Keefe and J Dostrovsky. The hippocampus as a spatial map. Preliminary evidence from unit activity in the freely-moving rat. Brain Res, 34(1):171–175, 1971.

[15] John O'Keefe and Lynn Nadel. The hippocampus as a cognitive map.
Behavioral and Brain Sciences, 2(04):487–494, 1979.

[16] Matthew A Wilson and Bruce L McNaughton. Dynamics of the hippocampal ensemble code for space. Science, 261(5124):1055–1058, 1993.

[17] Robert U Muller, Elizabeth Bostock, Jeffrey S Taube, and John L Kubie. On the directional firing properties of hippocampal place cells. The Journal of Neuroscience, 14(12):7235–7251, 1994.

[18] Alon Rubin, Michael M Yartsev, and Nachum Ulanovsky. Encoding of head direction by hippocampal place cells in bats. The Journal of Neuroscience, 34(3):1067–1080, 2014.

[19] J O'Keefe and DH Conway. Hippocampal place units in the freely moving rat: why they fire where they fire. Experimental Brain Research, 31(4):573–590, 1978.

[20] Robert U. Muller, John L. Kubie, E. M. Bostock, J. S. Taube, and G. J. Quirk. Spatial firing correlates of neurons in the hippocampal formation of freely moving rats, pages 296–333. Oxford University Press, New York, NY, US, 1991.

[21] György Buzsáki and Kenji Mizuseki. The log-dynamic brain: how skewed distributions affect network operations. Nature Reviews Neuroscience, 15(4):264–278, 2014.

[22] David M Smith and Sheri J Y Mizumori. Hippocampal place cells, context, and episodic memory. Hippocampus, 16(9):716–729, 2006.

[23] József Fiser, Pietro Berkes, Gergő Orbán, and Máté Lengyel. Statistically optimal perception and learning: from behavior to neural representations. Trends in Cognitive Sciences, 14(3):119–130, 2010.

[24] Michael Milford and Gordon Wyeth. Persistent navigation and mapping using a biologically inspired SLAM system. The International Journal of Robotics Research, 29(9):1131–1153, 2010.

[25] Alexander Förster, Alex Graves, and Jürgen Schmidhuber. RNN-based learning of compact maps for efficient robot localization. In ESANN, pages 537–542, 2007.

[26] J.
Zhang, L. Tai, J. Boedecker, W. Burgard, and M. Liu. Neural SLAM. arXiv preprint arXiv:1706.09520, 2017.

[27] Zetao Chen, Obadiah Lam, Adam Jacobson, and Michael Milford. Convolutional neural network-based place recognition. CoRR, abs/1411.1509, 2014.

[28] Niko Sünderhauf, Sareh Shirazi, Feras Dayoub, Ben Upcroft, and Michael Milford. On the performance of ConvNet features for place recognition. In Intelligent Robots and Systems (IROS), 2015 IEEE/RSJ International Conference on, pages 4297–4304. IEEE, 2015.

[29] Robin M Hayman and Kathryn J Jeffery. How heterogeneous place cell responding arises from homogeneous grids – a contextual gating hypothesis. Hippocampus, 18(12):1301–1313, 2008.

[30] Yoshua Bengio, Dong-Hyun Lee, Jörg Bornschein, and Zhouhan Lin. Towards biologically plausible deep learning. arXiv preprint arXiv:1502.04156, 2015.