{"title": "Emergence of Global Structure from Local Associations", "book": "Advances in Neural Information Processing Systems", "page_first": 1101, "page_last": 1108, "abstract": null, "full_text": "Emergence of Global Structure from \n\nLocal Associations \n\nThea B. Ghiselli-Crippa \n\nDepartment of Infonnation Science \n\nUniversity of Pittsburgh \nPittsburgh PA 15260 \n\nPaul W. Munro \n\nDepartment of Infonnation Science \n\nUniversity of Pittsburgh \nPittsburgh PA 15260 \n\nABSTRACT \n\nA variant of the encoder architecture, where units at the input and out(cid:173)\nput layers represent nodes on a graph. is applied to the task of mapping \nlocations to sets of neighboring locations. The degree to which the re(cid:173)\nsuIting internal (i.e. hidden unit) representations reflect global proper(cid:173)\nties of the environment depends upon several parameters of the learning \nprocedure. Architectural bottlenecks. noise. and incremental learning of \nlandmarks are shown to be important factors in maintaining topograph(cid:173)\nic relationships at a global scale. \n\n1 INTRODUCTION \n\nThe acquisition of spatial knowledge by exploration of an environment has been the sub(cid:173)\nject of several recent experimental studies. investigating such phenomena as the relation(cid:173)\nship between distance estimation and priming (e.g. McNamara et al .\u2022 1989) and the influ(cid:173)\nence of route infonnation (McNamara et al., 1984). Clayton and Habibi (1991) have gath(cid:173)\nered data suggesting that temporal contiguity during exploration is an important factor in \ndetennining associations between spatially distinct sites. This data supports the notion \nthat spatial associations are built by a temporal process that is active during exploration \nand by extension supports Hebb's (1949) neurophysiological postulate that temporal as(cid:173)\nsociations underlie mechanisms of synaptic learning. 
Local spatial information acquired during the exploration process is continuously integrated into a global representation of the environment (cognitive map), which is typically arrived at by also considering global constraints, such as low dimensionality, not explicitly represented in the local relationships. \n\n1101 \n\n2 NETWORK ARCHITECTURE AND TRAINING \n\nThe goal of this network design is to reveal structure among the internal representations that emerges solely from integration of local spatial associations; in other words, to show how a network trained to learn only local spatial associations characteristic of an environment can develop internal representations which capture global spatial properties. A variant of the encoder architecture (Ackley et al., 1985) is used to associate each node on a 2-D graph with the set of its neighboring nodes, as defined by the arcs in the graph. This 2-D neighborhood mapping task is similar to the 1-D task explored by Wiles (1993) using an N-2-N architecture, which can be characterized in terms of a graph environment as a circular chain with broad neighborhoods. \n\nIn the neighborhood mapping experiments described in the following, the graph nodes are visited at random: at each iteration, a training pair (node-neighborhood) is selected at random from the training set. As in the standard encoder task, the input patterns are all orthogonal, so that there is no structure in the input domain that the network could exploit in constructing the internal representations; the only information about the structure of the environment comes from the local associations that the network is shown during training. \n\n2.1 N-H-N NETWORKS \n\nThe neighborhood mapping task was first studied using a strictly layered feed-forward N-H-N architecture, where N is the number of input and output units, 
corresponding to the number of nodes in the environment, and H is the number of units in the single hidden layer. Experiments were done using square grid environments with wrap-around (toroidal) and without wrap-around (bounded) at the edges. The resulting hidden unit representations reflect global properties of the environment to the extent that distances between them correlate with distances between corresponding points on the grid. These two distance measures are plotted against one another in Figure 1 for toroidal and bounded environments. \n\n[Figure 1: two scatterplots of representation distance vs. grid distance for a 5x5 grid with 4 hidden units; left panel with wrap-around, right panel without wrap-around (R^2 = 0.499).] \n\nFigure 1: Scatterplots of Distances between Hidden Unit Representations vs. Distances between Corresponding Locations in the Grid Environment. \n\n2.2 N-2-H-N NETWORKS \n\nA hidden layer with just two units forces representations into a 2-D space, which matches the dimensionality of the environment. Under this constraint, the image of the environment in the 2-D space may reflect the topological structure of the environment. This conjecture leads to a further conjecture that the 2-D representations will also reveal global relationships of the environment. Since the neighborhoods in a 2-D representation are not linearly separable regions, another layer (H-layer) is introduced between the two-unit layer and the output (see Figure 2). Thus, 
the network has a strictly layered feed-forward N-2-H-N architecture, where the N units at the input and output layers correspond to the N nodes in the environment, two units make up the topographic layer, and H is the number of units chosen for the new layer (H is estimated according to the complexity of the graph). Responses for the hidden units (in both the T- and H-layers) are computed using the hyperbolic tangent (which ranges from -1 to +1), while the standard sigmoid (0 to +1) is used for the output units, to promote orthogonality between representations (Munro, 1989). Instead of the squared error, the cross entropy function (Hinton, 1987) is used to avoid problems with low derivatives observed in early versions of the network. \n\n[Figure 2: diagram of a 3x3 grid environment with nodes numbered 0-8 and the corresponding N-2-H-N network.] \n\nFigure 2: A 3x3 Environment and the Corresponding Network. When input unit 3 is activated, the network responds by activating the same unit and all its neighbors. \n\n3 RESULTS \n\n3.1 T-UNIT RESPONSES \n\nNeighborhood mapping experiments were done using bounded square grid environments and N-2-H-N networks. After training, the topographic unit activities corresponding to each of the N possible inputs are plotted, with connecting lines representing the arcs from the environment. Each axis in Figure 3 represents the activity of one of the T-units. 
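The neighborhood mapping task that these networks are trained on can be made concrete with a short sketch. The helper below is our own illustration (the paper gives no code), using NumPy: inputs are one-hot and therefore orthogonal, and each target pattern activates a node together with its grid neighbors, with or without wrap-around at the edges.

```python
import numpy as np

def neighborhood_dataset(rows, cols, toroidal=False):
    # One-hot inputs (all orthogonal, as in the standard encoder task);
    # each target activates the node itself plus its 4-neighbors on the grid.
    n = rows * cols
    X = np.eye(n)
    Y = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            Y[i, i] = 1.0  # the node itself is part of its output pattern
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if toroidal:
                    rr, cc = rr % rows, cc % cols  # wrap-around edges
                elif not (0 <= rr < rows and 0 <= cc < cols):
                    continue  # bounded grid: no neighbor past the edge
                Y[i, rr * cols + cc] = 1.0
    return X, Y

X, Y = neighborhood_dataset(3, 3)  # the bounded 3x3 environment of Figure 2
```

For the 3x3 bounded grid, the center node's target activates five units (itself plus four neighbors) while a corner node's target activates three, matching the response described for Figure 2.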
\nThese maps can be readily examined to study the relationship between their global structure and the structure of the environment. The receptive fields of the T-units give an alternative representation of the same data: the response of each T-unit to all N inputs is represented by N circles arranged in the same configuration as the nodes in the grid environment. Circle size is proportional to the absolute value of the unit activity; filled circles indicate negative values, open circles indicate positive values. The receptive field represents the T-unit's sensitivity with respect to the environment. \n\nFigure 3: Representations at the Topographic Layer. Activity plots and receptive fields for two 3x3 grids (left and middle) and a 4x4 grid (right). \n\nThe two 3x3 cases shown in Figure 3 illustrate alternative solutions that are each locally consistent, but have different global structure. In the first case, it is evident how the first unit is sensitive to changes in the vertical location of the grid nodes, while the second unit is sensitive to their horizontal location. The axes are essentially rotated 45 degrees in the second case. Except for this rotation of the reference axes, both representations captured the global structure of the 3x3 environment. \n\n3.2 NOISE IN THE HIDDEN UNITS \n\nWhile networks tended to form maps in the T-layer that reflect the global structure of the environment, in some cases the maps showed correspondences that were less obvious: i.e., the grid lines crossed, even though the network converged. 
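The degree of global correspondence can be quantified as in the scatterplots of Figure 1, by correlating pairwise distances in representation space with pairwise distances on the grid. The helper below is our own sketch of that measure (the function name and NumPy usage are ours, not the authors'); note that a rotation of the reference axes, like the one seen in Figure 3, leaves the measure unchanged because it preserves all pairwise distances.

```python
import numpy as np

def topography_r2(reps, coords):
    # reps: (N, d) representation vectors, one row per node
    # coords: (N, 2) grid coordinates of the corresponding nodes
    # Returns the R^2 between the two sets of pairwise distances,
    # in the spirit of the Figure 1 scatterplots.
    n = len(reps)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    d_rep = np.array([np.linalg.norm(reps[i] - reps[j]) for i, j in pairs])
    d_grid = np.array([np.linalg.norm(coords[i] - coords[j]) for i, j in pairs])
    r = np.corrcoef(d_rep, d_grid)[0, 1]
    return r * r

# A perfectly topographic map scores R^2 = 1, even after a rigid rotation
coords = np.array([(r, c) for r in range(3) for c in range(3)], dtype=float)
rotated = coords @ np.array([[0.0, -1.0], [1.0, 0.0]])
```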
A few techniques have proven valuable for promoting global correspondence between the topographic representations and the environment, including Judd and Munro's (1993) introduction of noise as pressure to separate representations. The noise is implemented as a small probability for reversing the sign of individual H-unit outputs. As reported in a previous study (Ghiselli-Crippa and Munro, 1994), the presence of noise causes the network to develop topographic representations which are more separated, and therefore more robust, so that the correct output units can be activated even if one or more of the H-units provides an incorrect output. From another point of view, the noise can be seen as causing the network to behave as if it had an effective number of hidden units which is smaller than the given number H. The introduction of noise as a means to promote robust topographic representations can be appreciated by examining Figure 4, which illustrates the representations of a 5x5 grid developed by a 25-2-20-25 network trained without noise (left) and with noise (middle) (the network was initialized with the same set of small random weights in all cases). Note that the representations developed by the network subject to noise are more separated and exhibit the same global structure as the environment. To avoid convergence problems observed with the use of noise throughout the whole training process, the noise can be introduced at the beginning of training and then gradually reduced over time. \n\nA similar technique involves the use of low-level noise injected in the T-layer to directly promote the formation of well-separated representations. Either Gaussian or uniform noise directly added to the T-unit outputs gives comparable results. 
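Both noise schemes are simple to state in code. The sketch below is our own illustration under the descriptions above (helper names and NumPy usage are assumptions): sign-reversal noise negates each tanh H-unit output with a small probability, while the T-layer variant simply adds low-level Gaussian noise to the T-unit outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

def flip_signs(h, p):
    # Sign-reversal noise on H-unit outputs: each tanh output has a
    # small probability p of being negated on a given forward pass.
    return np.where(rng.random(h.shape) < p, -h, h)

def jitter(t, sigma):
    # Low-level Gaussian noise added directly to the T-unit outputs.
    return t + rng.normal(0.0, sigma, t.shape)

h = np.tanh(rng.normal(size=20))  # hypothetical H-layer activities
t = np.tanh(rng.normal(size=2))   # hypothetical T-layer activities
h_noisy = flip_signs(h, p=0.05)
t_noisy = jitter(t, sigma=0.01)
```

Sign reversal changes only the sign, never the magnitude, of an H-unit output, which is what makes it a plausible model of an unreliable unit rather than a weak one.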
The use of noise in either hidden layer has a beneficial influence on the formation of globally consistent representations. However, since the noise in the H-units exerts only an indirect influence on the T-unit representations, the choice of its actual value seems to be less crucial than in the case where the noise is directly applied at the T-layer. \n\nThe drawback of the use of noise is an increase in the number of iterations required by the network to converge, which scales up with the magnitude and duration of the noise. \n\nFigure 4: Representations at the Topographic Layer. Training with no noise (left) and with noise in the hidden units (middle); training using landmarks (right). \n\n3.3 LANDMARK LEARNING \n\nAnother effective method involves the organization of training in two separate phases, to model the acquisition of landmark information followed by the development of route and/or survey knowledge (Hart and Moore, 1973; Siegel and White, 1975). This method is implemented by manipulating the training set during learning, using coarse spatial resolution at the outset and introducing interstitial features as learning progresses to the second phase. The first phase involves training the network only on a subset of the possible N patterns (landmarks). Once the landmarks have been learned, the remaining patterns are added to the training set. In the second phase, training proceeds as usual with the full set of training patterns; the only restriction is applied to the landmark points, whose topographical representations are not allowed to change (the corresponding weights between input units and T-units are frozen), thus modeling the use of landmarks as stable reference points when learning the details of a new environment. 
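The weight freezing behind landmark learning can be sketched as a gradient mask. The fragment below is our own illustration, not the authors' implementation; the particular 9-node landmark subset is a hypothetical choice. Because the inputs are one-hot, row i of the input-to-T weight matrix is exactly node i's position in T-space, so zeroing the landmark rows of the gradient in phase 2 pins the landmark representations in place.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 25
W = rng.normal(scale=0.1, size=(N, 2))  # input-to-T weights; with one-hot
                                        # inputs, row i is node i's T-space position
landmarks = [0, 2, 4, 10, 12, 14, 20, 22, 24]  # hypothetical 9-landmark subset

def masked_step(W, grad, phase, lr=0.1):
    # Phase 1 trains on landmark patterns only, all weights free.
    # In phase 2 the full pattern set is used, but the gradient rows for
    # the landmark nodes are zeroed so their representations stay fixed.
    if phase == 2:
        grad = grad.copy()
        grad[landmarks, :] = 0.0
    return W - lr * grad

grad = rng.normal(size=(N, 2))
W2 = masked_step(W, grad, phase=2)
```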
The right pane of Figure 4 illustrates the representations developed for a 5x5 grid using landmark training; the same 25-2-20-25 network mentioned above was trained in two phases, first on a subset of 9 patterns (landmarks) and then on the full set of 25 patterns (the landmarks are indicated as white circles in the activity plot). \n\n3.4 NOISE IN LANDMARK LEARNING \n\nThe techniques described above (noise and landmark learning) can be combined to better promote the emergence of well-structured representation spaces. In particular, noise can be used during the first phase of landmark learning to encourage a robust representation of the landmarks: Figure 5 illustrates the representations obtained for a 5x5 grid using landmark training with two different levels of noise in the H-units during the first phase. The effect of noise is evident when comparing the 4 corner landmarks in the right pane of Figure 4 (landmark learning with no noise) with those in Figure 5. With increasing levels of noise, the T-unit activities corresponding to the 4 corner landmarks approach the asymptotic values of +1 and -1; the activity plots illustrate this effect by showing how the corner landmark representations move toward the corners of T-space, reaching a configuration which provides more resistance to noise. During the second phase of training, the landmarks function as reference points for the additional features of the environment, and their positioning in the representational space therefore becomes very important. A well-formed, robust representation of the landmarks at the end of the first phase is crucial for the formation of a map in T-space that reflects global structure, and the use of noise can help promote this. \n\nFigure 5: Representations at the Topographic Layer. Landmark training using noise in phase 1: low noise level (left), high noise level (right). 
\n\n4 DISCUSSION \n\nLarge scale constraints intrinsic to natural environments. such as low dimensionality, are \nnot necessarily reflected in local neighborhood relations, but they constitute infonnation \nwhich is essential to the successful development of useful representations of the environ-\n\n\fEmergence of Global Structure from Local Associations \n\n1107 \n\nment. In our model, some of the constraints imposed on the network architecture effec(cid:173)\ntively reduce the dimensionality of the representational space. Constraints have been in(cid:173)\ntroduced several ways: bottlenecks, noise, and landmark learning; in all cases, these con(cid:173)\nstraints have had constructive influences on the emergence of globally consistent repre(cid:173)\nsentation spaces. The approach described presents an alternative to Kohonen's (1982) \nscheme for capturing topography; here, topographic relations emerge in the representa(cid:173)\ntional space, rather than in the weights between directly connected units. \n\nThe experiments described thus far have focused on how global spatial structure can \nemerge from the integration of local associations and how it is affected by the introduc(cid:173)\ntion of global constraints. As mentioned in the introduction, one additional factor influ(cid:173)\nencing the process of acquisition of spatial knowledge needs to be considered: temporal \ncontiguity during exploration. that is. how temporal associations of spatially adjacent lo(cid:173)\ncations can influence the representation of the environment. For example, a random type \nof exploration (\"wandering\") can be considered. where the next node to be visited is select(cid:173)\ned at random from the neighbors of the current node. Preliminary studies indicate that \nsuch temporal contiguity during training reSUlts in the fonnation of hidden unit represen(cid:173)\ntations with global properties qualitatively similar to those reported here. 
Alternatively, more directed exploration methods can be studied, with a systematic pattern guiding the choice of the next node to be visited. The main purpose of these studies will be to show how different exploration strategies can affect the formation and the characteristics of cognitive maps of the environment. \n\nHigher order effects of temporal and spatial contiguity can also be considered. However, in order to capture regularities in the training process that span several exploration steps, simple feed-forward networks may no longer be sufficient; partially recurrent networks (Elman, 1990) are a likely candidate for the study of such processes. \n\nAcknowledgements \n\nWe wish to thank Stephen Hirtle, whose expertise in the area of spatial cognition greatly benefited our research. We are also grateful for the insightful comments of Janet Wiles. \n\nReferences \n\nD. H. Ackley, G. E. Hinton, and T. J. Sejnowski (1985) \"A learning algorithm for Boltzmann machines,\" Cognitive Science, vol. 9, pp. 147-169. \n\nK. Clayton and A. Habibi (1991) \"The contribution of temporal contiguity to the spatial priming effect,\" Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 17, pp. 263-271. \n\nJ. L. Elman (1990) \"Finding structure in time,\" Cognitive Science, vol. 14, pp. 179-211. \n\nT. B. Ghiselli-Crippa and P. W. Munro (1994) \"Learning global spatial structures from local associations,\" in M. C. Mozer, P. Smolensky, D. S. Touretzky, J. L. Elman, and A. S. Weigend (Eds.), Proceedings of the 1993 Connectionist Models Summer School, Hillsdale, NJ: Erlbaum. \n\nR. A. Hart and G. T. Moore (1973) \"The development of spatial cognition: A review,\" in R. M. Downs and D. Stea (Eds.), Image and Environment, Chicago, IL: Aldine. \n\nD. O. Hebb (1949) The Organization of Behavior, New York, NY: Wiley. \n\nG. E. 
Hinton (1987) \"Connectionist learning procedures,\" Technical Report CMU-CS-87-115, version 2, Pittsburgh, PA: Carnegie-Mellon University, Computer Science Department. \n\nS. Judd and P. W. Munro (1993) \"Nets with unreliable hidden nodes learn error-correcting codes,\" in C. L. Giles, S. J. Hanson, and J. D. Cowan (Eds.), Advances in Neural Information Processing Systems 5, San Mateo, CA: Morgan Kaufmann. \n\nT. Kohonen (1982) \"Self-organized formation of topologically correct feature maps,\" Biological Cybernetics, vol. 43, pp. 59-69. \n\nT. P. McNamara, J. K. Hardy, and S. C. Hirtle (1989) \"Subjective hierarchies in spatial memory,\" Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 15, pp. 211-227. \n\nT. P. McNamara, R. Ratcliff, and G. McKoon (1984) \"The mental representation of knowledge acquired from maps,\" Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 10, pp. 723-732. \n\nP. W. Munro (1989) \"Conjectures on representations in backpropagation networks,\" Technical Report TR-89-035, Berkeley, CA: International Computer Science Institute. \n\nA. W. Siegel and S. H. White (1975) \"The development of spatial representations of large-scale environments,\" in H. W. Reese (Ed.), Advances in Child Development and Behavior, New York, NY: Academic Press. \n\nJ. Wiles (1993) \"Representation of variables and their values in neural networks,\" in Proceedings of the Fifteenth Annual Conference of the Cognitive Science Society, Hillsdale, NJ: Erlbaum. \n"}, "award": [], "sourceid": 852, "authors": [{"given_name": "Thea", "family_name": "Ghiselli-Crippa", "institution": null}, {"given_name": "Paul", "family_name": "Munro", "institution": null}]}