{"title": "Bounded Invariance and the Formation of Place Fields", "book": "Advances in Neural Information Processing Systems", "page_first": 1483, "page_last": 1490, "abstract": "", "full_text": "Bounded invariance and the formation of place fields\n\nReto Wyss and Paul F.M.J. Verschure\n\nInstitute of Neuroinformatics\n\nUniversity/ETH Z\u00fcrich\n\nZ\u00fcrich, Switzerland\n\nrwyss,pfmjv@ini.phys.ethz.ch\n\nAbstract\n\nOne current explanation of the view-independent representation of space by the place cells of the hippocampus is that they arise out of the summation of view-dependent Gaussians. This proposal assumes that visual representations show bounded invariance. Here we investigate whether a recently proposed visual encoding scheme called the temporal population code can provide such representations. Our analysis is based on the behavior of a simulated robot in a virtual environment containing specific visual cues. Our results show that the temporal population code provides a representational substrate that can naturally account for the formation of place fields.\n\n1 Introduction\n\nPyramidal cells in the CA3 and CA1 regions of the rat hippocampus have been shown to be selectively active depending on the animal\u2019s position within an environment[1]. The ensemble of locations where such a cell fires (the place field) can be determined by a combination of different environmental and internal cues[2], where vision has been shown to be of particular importance[3]. This raises the question of how egocentric representations of visual cues can give rise to an allocentric representation of space. Recently it has been proposed that a place field is formed by the summation of Gaussian tuning curves, each oriented perpendicular to a wall of the environment and peaked at a fixed distance from it[4, 5, 6]. 
While this proposal tries to explain the actual transformation from one coordinate system to another, it does not account for how appropriate egocentric representations of the environment are formed. Thus, it is unclear how the information about a rat\u2019s distance to different walls becomes available, and in particular how this proposal would generalize to other environments where more advanced visual skills, such as cue identification, are required.\n\nFigure 1: Place cells from multiple snapshots. The robot is placed in a virtual square environment with four patterns on the walls, i.e. a square, a triangle, a Z and an X. The robot scans the environment for salient stimuli by rotating in place. A saliency detector triggers the acquisition of visual snapshots which are subsequently transformed into TPCs. A place cell is defined through its associated TPC templates.\n\nFor an agent moving in an environment, visual percepts of objects/cues undergo a combination of transformations comprising zooming and rotation in depth. Thus, the question arises of how to construct a visual detector that has a Gaussian-like tuning with regard to the positions within the environment from which snapshots of a visual cue are taken. The internal representation of a stimulus, upon which such a detector is based, should be tolerant to certain degrees of visual deformation without losing specificity or, in other words, show a bounded invariance. In this study we show that a recently proposed cortical model of visual pattern encoding, the temporal population code (TPC), directly supports this notion of bounded invariance[7]. 
The TPC is based on the notion that a cortical network can be seen to transform a spatial pattern into a purely temporal code.\n\nHere, we investigate to what extent the bounded invariance provided by the TPC can be exploited for the formation of place fields. We address this question in the context of a virtual robot behaving in an environment containing several visual cues. Our results show that the combination of a simple saliency mechanism with the TPC naturally gives rise to allocentric representations of space, similar to the place fields observed in the hippocampus.\n\n2 Methods\n\n2.1 The experimental setup\n\nExperiments are performed using a simulated version of the real-world robot Khepera (K-team, Lausanne, Switzerland) programmed in C++ using OpenGL. The robot has a circular body with two wheels attached to its sides, each controlled by an individual motor. The visual input is provided by a camera with a viewing angle of 60\u00b0 mounted on top of the robot. The neural networks are simulated on a Linux computer using a neural network simulator programmed in C++.\n\nThe robot is placed in a square arena (fig. 1, left), and in the following, all lengths will be given in units of the side length of the square environment.\n\n2.2 The temporal population code\n\nVisual information is transformed into a TPC by a network of laterally coupled cortical columns, each selective to one of four orientations $\\psi \\in \\{0^\\circ, 45^\\circ, 90^\\circ, 135^\\circ\\}$ and one of three spatial frequencies $\\nu \\in \\{\\text{high}, \\text{medium}, \\text{low}\\}$[7]. The outputs of the network are twelve vectors $A_{\\psi,\\nu}$, each reflecting the average population activity recorded over 100 time-steps for each type of cortical column. These vectors are reduced to three vectors $A_\\nu$ by concatenating the four orientations. 
This set of vectors forms the TPC, which represents a single snapshot of a visual scene.\n\nThe similarity $S(s_1, s_2)$ between two snapshots $s_1$ and $s_2$ is defined as the average correlation $\\rho$ between the corresponding vectors, i.e.\n\n$$S(s_1, s_2) = \\left\\langle Z\\left(\\rho(A_\\nu^{s_1}, A_\\nu^{s_2})\\right) \\right\\rangle_{\\forall\\nu} \\tag{1}$$\n\nwhere $Z$ is the Fisher Z-transform given by $Z(\\rho) = \\frac{1}{2} \\ln\\left((1 + \\rho)/(1 - \\rho)\\right)$, which transforms a typically skewed distribution of correlation coefficients $\\rho$ into an approximately normal distribution of coefficients. Thus, $Z(\\rho)$ becomes a measure on a proportional scale such that mean values are well defined.\n\n2.3 Place cells from multiple snapshots\n\nIn this study, the response properties of a place cell are given by the similarity between incoming snapshots of the environment and template snapshots associated with the place cell when it was constructed. Thus, for both the acquisition of place cells and their exploitation, the system needs to be provided with snapshots of its environment that contain visual features. For this purpose, the robot is equipped with a simple visual saliency detector $s(t)$ that selects scenes with high central contrast:\n\n$$s(t) = \\frac{\\sum_y e^{-y^2} c(y, t)^2}{\\sum_y c(y, t)^2}$$\n\nwhere $c(y, t)$ denotes the contrast at location $y \\in [-1, +1]^2$ in the image at time $t$. At each point in time where $s(t) > \\theta_{\\text{saliency}}$, a new snapshot is acquired with a probability of 0.1. A place cell $k$ is defined by $n$ snapshots called templates $t_i^k$ with $i = 1 \\ldots n$.\n\nWhenever the robot tries to localize itself, it scans the environment by rotating in place and taking snapshots of visually salient scenes (fig. 1). The similarity $S$ between each incoming snapshot $s_j$ with $j = 1 \\ldots m$ and every template $t_i^k$ is determined using eq. 1. 
The activation $a_k$ of place cell $k$ for a series of $m$ snapshots $s_j$ is then given by a sigmoidal function\n\n$$a_k(i_k) = \\left(1 + \\exp\\left(-\\beta (i_k - \\theta)\\right)\\right)^{-1} \\quad \\text{where} \\quad i_k = \\left\\langle \\max_i S(t_i^k, s_j) \\right\\rangle_j. \\tag{2}$$\n\n$i_k$ represents the input to the place cell which is computed by determining the maximal similarity of each snapshot to any template of the place cell and subsequent averaging, i.e. $\\langle \\cdot \\rangle_j$ corresponds to the average over all snapshots $j$.\n\n2.4 Position reconstruction\n\nThere are many different approaches to the problem of position reconstruction or decoding from place cell activity[8]. A basis function method uses a linear combination of basis functions $\\phi_k(x)$ with the coefficients proportional to the activity of the place cells $a_k$. Here we use a direct basis approach, i.e. the basis function $\\phi_k(x)$ directly corresponds to the average activation $a_k$ of place cell $k$ at position $x$ within the environment. The reconstructed position $\\hat{x}$ is then given by\n\n$$\\hat{x} = \\operatorname*{argmax}_x \\sum_k a_k \\phi_k(x)$$\n\nThe reconstruction error is given by the distance between the reconstructed and true position averaged over all positions within the environment.\n\nFigure 2: Similarity surfaces for the four different cues. 
Similarity between a reference snapshot of the different cues taken at the position marked by the white cross and all the other positions surrounding the reference location.\n\n2.5 Place field shape and size\n\nIn order to investigate the shape of a place field $\\phi(x)$, and in particular to determine its degree of asymmetry and its size, we computed the two-dimensional normalized inertial tensor $I$ given by\n\n$$I_{ij} = \\frac{\\sum_r \\phi(r) \\left(\\delta_{ij} r^2 - r_i r_j\\right)}{\\sum_r \\phi(r)}$$\n\nwith $r = \\{r_1, r_2\\} = x - \\hat{x}$ where $\\hat{x} = \\sum_x x \\phi(x) / \\sum_x \\phi(x)$ corresponds to the \"center of gravity\" and $\\delta_{ij}$ is the Kronecker delta. $I$ is symmetric and can therefore be diagonalized, i.e. $I = V^T D V$, such that $V$ is an orthonormal transformation matrix and $D_{ii} > 0$ for $i = 1, 2$. A measure of the half-width of the place field along its two principal axes is then $d_i = \\sqrt{2 D_{ii}}$ such that a measure of asymmetry is given by\n\n$$0 \\le \\left| \\frac{d_1 - d_2}{d_1 + d_2} \\right| \\le 1$$\n\nThis measure becomes zero for symmetric place fields while approaching one for asymmetric ones. In addition, we can estimate the size of the place field by approximating its shape by an ellipse, i.e. $\\pi d_1 d_2$.\n\n3 Results\n\n3.1 Bounded invariance\n\nInitially, we investigate the topological properties of the temporal population coding space. Depending on the position within an environment, visual stimuli undergo a geometric transformation which is a combination of scaling and rotation in depth. Fig. 2 shows the similarity to a reference snapshot taken at the location of the white cross for the four different cues. 
Although the precise shape of the similarity surface differs, the similarity decreases smoothly and monotonically for increasing distances to the reference point for all stimuli.\n\nThe similarity surface for different locations of the reference point is shown in fig. 3 for the Z cue. Although the Z cue has no vertical mirror symmetry, the similarity surfaces are nearly symmetric with respect to the vertical center line. Thus, using a single cue, localization is only possible modulo a mirror along the vertical center. The implications of this will be discussed later. Concerning different distances of the reference point to the stimulus, fig. 3 (along the columns) shows that the specificity of the similarity measure is large for small distances while the tuning becomes broader for large distances. This is a natural consequence of the perspective projection which implies that the changes in visual perception due to different viewing positions are inversely proportional to the viewing distance.\n\nFigure 3: Similarity surface of Z cue for different reference points. The distance/angle of the reference point to the cue is kept constant along the rows/columns respectively.\n\n3.2 Place cells from multiple snapshots\n\nThe response of a place cell is determined by eq. 2 based on four associated snapshots/templates taken at the same location within the environment. The templates for each place cell are chosen by the saliency detector and therefore there is no explicit control over the actual snapshots defining a place cell, i.e. some place cells are defined based on two or more templates of the same cue. 
Furthermore, the stochastic nature of the saliency detector does not allow for any control over the precise position of the stimulus within the visual field. This is where the intrinsic translation invariance of the temporal population code plays an important role, i.e. the precise position of the stimulus within the visual field at the time of the snapshot has no effect on the resulting encoding as long as the whole stimulus is visible.\n\nFig. 4 shows examples of the receptive fields (subsequently also called place fields) of such place cells acquired at the nodes of a regular 5 \u00d7 5 lattice within the environment. Most of the place fields have a Gaussian-like tuning which is compatible with single cell recordings from pyramidal cells in CA3 and CA1[2], i.e. the place cells respond maximally close to their associated positions, and the response degrades smoothly and monotonically for increasing distances. Some place cells have multiple subfields in that they respond to different locations in the environment with a similar amplitude.\n\n3.3 Position reconstruction\n\nSubsequently, we determine the accuracy up to which the robot can be localized within the environment. To this end, we use the direct basis approach for position reconstruction as described in the Methods. As basis functions we take the normalized response profiles of place cells constructed from four templates taken at the nodes of a regular lattice covering the environment. Fig. 5a shows the reconstruction error averaged over the environment as a function of the number of place cells as well as the number of snapshots taken at each location. The reconstruction error decreases monotonically both for an increasing number of place cells as well as an increasing\n\nFigure 4: Place fields of 5 \u00d7 5 place cells. 
The small squares show the average response of 5 \u00d7 5 different place cells for all the positions of the robot within the environment. Darker regions correspond to stronger responses. The relative location of each square within the figure corresponds to the associated location of the place cell within the environment. All place fields are scaled to a common maximal response.\n\nnumber of snapshots. An asymptotic reconstruction error is approached very fast, i.e. for more than 25 place cells and more than two snapshots per location. Thus, for a behaving organism exploring an unknown environment, this implies that a relatively sparse exploration strategy suffices to create a complete representation of the new environment.\n\nAbove we have seen that localization with a single snapshot is only possible modulo a mirror along the axis where the cue is located. The systematic reconstruction error introduced by this shortcoming can be determined analytically and is \u2248 0.13 in units of the side length of the square environment. For an increasing number of snapshots, the probability that all snapshots are from the same pair of opposite cues decreases exponentially fast and we therefore also expect the systematic error to vanish. Considering 100 place cells, the difference in reconstruction error between 1 and 10 snapshots amounts to 0.147 \u00b1 0.008 (mean \u00b1 SD) which is close to the predicted systematic error due to the effect discussed above. Thus, an increasing number of snapshots primarily helps to resolve ambiguities due to the symmetry properties of the TPC.\n\n3.4 Place field shape\n\nFig. 
5b-c shows scatter plots of both place field asymmetry and size versus the distance of the place field\u2019s associated location from the center of the environment. There is a tendency for off-center place cells to have more asymmetric place fields than cells closer to the center (r=0.32), which is in accordance with experimental results[5]. Regarding place field size, there is no direct relation to the associated position of the place field (r=0.08) apart from the fact that the variance is maximal for intermediate distances from the center. It must be noted, however, that the size of the place field critically depends on the choice of the threshold $\\theta$ in eq. 2. Indeed, different relations between place field size and location can be achieved by assuming non-homogeneous thresholds, which for example might be determined for each place cell individually based on its range of inputs.\n\nFigure 5: (a) Position reconstruction error. The average error in position reconstruction as a function of the number of snapshots and the number of place cells considered. (b-c) Scatter plots of the place field asymmetry/size versus the distance of the place field\u2019s associated location to the center of the environment. The correlation coefficients are r=0.32/0.08 respectively.\n\n
The measure for place field asymmetry, in contrast, has been shown to be more stable in this respect (data not shown).\n\n4 Discussion\n\nWe have shown that the bounded invariance properties of visual stimuli encoded in a TPC are well suited for the formation of place fields. More specifically, the topology preservation of similarity amongst different viewing angles and distances allows a direct translation of the visual similarity between two views to their relative location within an environment. Therefore, only a small number of place cells are required for position reconstruction. Regarding the shape of the place fields, only weak correlations between their asymmetry and their distance to the center of the environment have been found.\n\nAs opposed to the present approach, experimental results suggest that place field formation in the hippocampus relies on multiple sensory modalities and not only vision. Although it was shown that vision may play an important role[3], proprioceptive stimuli, for example, can become important in situations where either visual information is not available, such as in the dark, or in the presence of visual singularities, where two different locations elicit the same visual sensation[9]. A type of information strongly related to proprioceptive stimuli is the causal structure of behavior which imposes continuous movement in both space and time, i.e. the information about the last location can be of great importance for estimating the current location[10]. Indeed, a recent study has shown that the position reconstruction error is greatly reduced if this additional constraint is taken into account[8]. In the present approach we analyzed the properties of place cells in the absence of a behavioral paradigm. Thus, it is not meaningful to integrate information over different locations. 
We expect, however, that for a continuously behaving robot this type of information would be particularly useful to resolve the ambiguities introduced by the mirror invariance in the case of a single visual snapshot.\n\nAs opposed to the large field of view of rats (\u2248 320\u00b0[11]) the robot used in this study has a very restricted field of view. This has direct implications for the robot\u2019s behavior. The advantage of only considering a 60\u00b0 field of view is, however, that the amount of information contributed by single cues can be investigated. We have shown that a single view allows for localization modulo a mirror along the orientation of the corresponding stimulus. This ambiguity can be resolved by taking additional snapshots into account. In this context, maximal additional information can be gained if a new snapshot is taken along a direction orthogonal to the first snapshot, which is also more efficient from a behavioral point of view than using stimuli from opposite directions.\n\nThe acquisition of place cells was supervised, in that their associated locations are assumed to correspond to the nodes of a regular lattice spanning the environment. While this allows for a controlled statistical analysis of the place cell properties, it is not very likely that an autonomously behaving agent can acquire place cells in such a regular fashion. Rather, place cells have to be acquired incrementally based on purely local information. Information about the number of place cells responding or the maximal response of any place cell for a particular location is locally available to the agent, and can therefore be used to selectively trigger the acquisition of new place cells. 
In general, the representation will most likely also reflect further behavioral requirements in that important locations, where decisions need to be taken, will be represented by a high density of place cells.\n\nAcknowledgments\n\nThis work was supported by the European Community/Bundesamt f\u00fcr Bildung und Wissenschaft Grant IST-2001-33066 (to P.V.). The authors thank Peter K\u00f6nig for valuable discussions and contributions to this study.\n\nReferences\n\n[1] J. O\u2019Keefe and J. Dostrovsky. The hippocampus as a spatial map: preliminary evidence from unit activity in the freely moving rat. Brain Res, 34:171\u20135, 1971.\n\n[2] J. O\u2019Keefe and L. Nadel. The hippocampus as a cognitive map. Clarendon Press, Oxford, 1987.\n\n[3] J. Knierim, H. Kudrimoti, and B. McNaughton. Place cells, head direction cells, and the learning of landmark stability. J. Neurosci., 15:1648\u201359, 1995.\n\n[4] J. O\u2019Keefe and N. Burgess. Geometric determinants of the place fields of hippocampal neurons. Nature, 381(6581):425\u20138, 1996.\n\n[5] J. O\u2019Keefe, N. Burgess, J.G. Donnett, K.J. Jeffrey, and E.A. Maguire. Place cells, navigational accuracy, and the human hippocampus. Philos Trans R Soc Lond B Biol Sci., 353(1373):1333\u201340, 1998.\n\n[6] N. Burgess, J.G. Donnett, H.J. Jeffrey, and J. O\u2019Keefe. Robotic and neuronal simulation of the hippocampus and rat navigation. Philos Trans R Soc Lond B Biol Sci., 352(1360):1535\u201343, 1997.\n\n[7] R. Wyss, P. K\u00f6nig, and P.F.M.J. Verschure. Invariant representations of visual patterns in a temporal population code. Proc. Natl. Acad. Sci. USA, 100(1):324\u20139, 2003.\n\n[8] K. Zhang, I. Ginzburg, B.L. McNaughton, and T.J. Sejnowski. Interpreting neuronal population activity by reconstruction: Unified framework with application in hippocampal place cells. J Neurophysiol., 79(2):1017\u201344, 1998.\n\n[9] A. Arleo and W. Gerstner. 
Spatial cognition and neuro-mimetic navigation: a model of hippocampal place cell activity. Biol Cybern., 83(3):287\u201399, 2000.\n\n[10] G. Quirk, R. Muller, and R. Kubie. The firing of hippocampal place cells in the dark depends on the rat\u2019s recent experience. J. Neurosci., 10:2008\u201317, 1995.\n\n[11] A. Hughes. A schematic eye for the rat. Vision Res., 19:569\u201388, 1977.", "award": [], "sourceid": 2425, "authors": [{"given_name": "Reto", "family_name": "Wyss", "institution": null}, {"given_name": "Paul", "family_name": "Verschure", "institution": null}]}