{"title": "Investment Learning with Hierarchical PSOMs", "book": "Advances in Neural Information Processing Systems", "page_first": 570, "page_last": 576, "abstract": "", "full_text": "Investment Learning \n\nwith Hierarchical PSOMs \n\nJorg Walter and Helge Ritter \n\nDepartment of Information Science \n\nUniversity of Bielefeld, D-33615 Bielefeld, Germany \n\nEmail: {walter.helge}@techfak.uni-bielefeld.de \n\nAbstract \n\nWe propose a hierarchical scheme for rapid learning of context dependent \n\"skills\" that is based on the recently introduced \"Parameterized Self(cid:173)\nOrganizing Map\" (\"PSOM\"). The underlying idea is to first invest some \nlearning effort to specialize the system into a rapid learner for a more \nrestricted range of contexts. \n\nThe specialization is carried out by a prior \"investment learning stage\", \nduring which the system acquires a set of basis mappings or \"skills\" for \na set of prototypical contexts. Adaptation of a \"skill\" to a new context \ncan then be achieved by interpolating in the space of the basis mappings \nand thus can be extremely rapid. \n\nWe demonstrate the potential of this approach for the task of a 3D visuo(cid:173)\nmotor map for a Puma robot and two cameras. This includes the for(cid:173)\nward and backward robot kinematics in 3D end effector coordinates, the \n2D+2D retina coordinates and also the 6D joint angles. After the invest(cid:173)\nment phase the transformation can be learned for a new camera set-up \nwith a single observation. \n\n1 Introduction \n\nMost current applications of neural network learning algorithms suffer from a large number \nof required training examples. This may not be a problem when data are abundant, but in \nmany application domains, for example in robotics, training examples are costly and the \nbenefits of learning can only be exploited when significant progress can be made within a \nvery small number of learning examples. 
\n\n\fInvestment Learning with Hierarchical PSOMs \n\n571 \n\nIn the present contribution, we propose in section 3 a hierarchically structured learning ap(cid:173)\nproach which can be applied to many learning tasks that require system identification from \na limited set of observations. The idea builds on the recently introduced \"Parameterized \nSelf-Organizing Maps\" (\"PSOMs\"), whose strength is learning maps from a very small \nnumber of training examples [8, 10, 11]. \nIn [8], the feasibility of the approach was demonstrated in the domain of robotics, among \nthem, the learning of the inverse kinematics transform of a full 6-degree of freedom (DOF) \nPuma robot. In [10], two improvements were introduced, both achieve a significant in(cid:173)\ncrease in mapping accuracy and computational efficiency. In the next section, we give a \nshort summary of the PSOM algorithm; it is decribed in more detail in [11] which also \npresents applications in the domain of visual learning. \n\n2 The PSOM Algorithm \n\nA Parameterized Self-Organizing Map is a parametrized, m-dimensional hyper-surface \nM = {w(s) E X ~ rn.dls E S ~ rn.m} that is embedded in some higher-dimensional \nvector space X. M is used in a very similar way as the standard discrete self-organizing \nmap: given a distance measure dist(x, x') and an input vector x, a best-match location \ns*(x) is determined by minimizing \n\ns*:= argmin dist(x, w(s)) \n\nSES \n\n(1) \n\nThe associated \"best-match vector\" w(s*) provides the best approximation of input x in the \nmanifold M. If we require dist(\u00b7) to vary only in a subspace X in of X (i.e., dist( x, x') = \ndist(Px, Px/), where the diagonal matrix P projects into xin), s* (x) actually will only \ndepend on Px. The projection (l-P)w(s* (x)) E x out ofw(s* (x)) lies in the orthogonal \nsubspace x out can be viewed as a (non-linear) associative completion of a fragmentary \ninput x of which only the part Px is reliable. 
It is this associative mapping that we will exploit in applications of the PSOM. \n\nM is constructed as a manifold that passes through a given set D of data examples (Fig. 1 depicts the situation schematically). To this end, we assign to each data sample a point a ∈ S and denote the associated data sample by w_a. The set A of the assigned parameter values a should provide a good discrete \"model\" of the topology of our data set (Fig. 1, right). The assignment between data vectors and points a must be made in a topology-preserving fashion to ensure good interpolation by the manifold M that is obtained by the following steps. \n\nFigure 1: Best-match s* and associative completion w(s*(x)) of input x₁, x₂ (Px) given in the input subspace X^in. In this simple case, the m = 1 dimensional manifold M is constructed to pass through four data vectors (marked by squares). The left side shows the d = 3 dimensional embedding space X = X^in × X^out, and the right side depicts the best-match parameter s*(x) in the parameter manifold S, together with the \"hyper-lattice\" A of parameter values (indicated by white squares) belonging to the data vectors. \n\nFor each point a ∈ A, we construct a \"basis function\" H(·, a; A), or simplified H(·, a) : S → ℝ, that obeys (i) H(a_i, a_j) = 1 for i = j and vanishes at all other points of A, i ≠ j (orthonormality condition), and (ii) Σ_{a∈A} H(a, s) = 1 for all s ∈ S. \n\n[...] \n\n                                             Direct trained T-PSOM   T-PSOM with Meta-PSOM \npixel u ↦ Cartesian x ⇒ Cartesian error Δx   1.4 mm   0.008          4.4 mm   0.025 \nCartesian x ↦ pixel u ⇒ pixel error          1.2 pix  0.010          3.3 pix  0.025 \npixel u ↦ θ_robot ⇒ Cartesian error Δx       3.8 mm   0.023          5.4 mm   0.030 \n\nTable 1: Mean Euclidean deviation (mm or pixel) and normalized root mean square error (NRMS) for 1000 points total, comparing a directly trained T-PSOM with the described hierarchical Meta-PSOM network in the rapid learning mode after one single observation. \n\n5 Discussion and Conclusion \n\nA crucial question is how to structure systems such that learning can be efficient. In the present paper, we demonstrated a hierarchical approach that is motivated by a decomposition of the learning phase into two different stages: a longer, initial learning phase \"invests\" effort into a gradual and domain-specific specialization of the system. This investment learning does not yet produce the final solution, but instead pre-structures the system such that the subsequent final specialization to a particular solution (within the chosen domain) can be achieved extremely rapidly. \n\nTo implement this approach, we used a hierarchical architecture of mappings. While in principle various kinds of network types could be used for these mappings, a practically feasible solution must be based on a network type that makes it possible to construct the required basis mappings from a rather small number of training examples. In addition, since we use interpolation in weight space, similar mappings should give rise to similar weight sets to make interpolation meaningful.
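The weight-space interpolation just described relies on basis functions with exactly the orthonormality and partition-of-unity properties introduced above; Lagrange polynomials over a one-dimensional node set are one family with these properties. The sketch below, with made-up node values and prototype weight sets that are our illustrative assumptions rather than data from the paper, shows how such basis functions can blend prototype weight sets into a weight set for an unseen context:

```python
# Lagrange basis functions H over a 1-D node set A (illustrative sketch).
import numpy as np

A = np.array([0.0, 0.5, 1.0])   # assigned parameter values a in A

def H(i, s):
    """Basis function for node A[i]: equals 1 at A[i], 0 at the other nodes."""
    value = 1.0
    for j in range(len(A)):
        if j != i:
            value *= (s - A[j]) / (A[i] - A[j])
    return value

# Made-up prototype weight sets, one per prototypical context.
W = np.array([[0.0, 1.0],
              [1.0, 2.0],
              [4.0, 3.0]])

def blend(s):
    """Interpolate a new weight set for context parameter s."""
    return sum(H(i, s) * W[i] for i in range(len(A)))
```

At the nodes, the blend reproduces the stored prototypes exactly (orthonormality), and because the basis functions sum to one everywhere, the blend between nodes is an affine combination of the prototypes.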
PSOMs meet these requirements very well, since they allow a direct non-iterative construction of smooth mappings from rather small data sets. They achieve this by generalizing the discrete self-organizing map [3, 9] into a continuous map manifold, such that interpolation for new data points can benefit from topology information that is not available to most other methods. \n\nWhile PSOMs resemble local models [4, 5, 6] in that there is no interference between different training points, their use of an orthogonal set of basis functions to construct the map manifold puts them in an intermediate position between the extremes of local and of fully distributed models. \n\nA further very useful property in the present context is the ability of PSOMs to work as an attractor network with a continuous attractor manifold. Thus a PSOM needs no fixed designation of variables as inputs and outputs; instead, the projection matrix P can be used to freely partition the full set of variables into input and output values. Values of the latter are obtained by a process of associative completion. \n\nTechnically, the investment learning phase is realized by learning a set of prototypical basis mappings, represented as weight sets of a T-PSOM, that attempt to cover the range of tasks in the given domain. The capability for subsequent rapid specialization within the domain is then provided by an additional mapping that maps a situational context into a suitable combination of the previously learned prototypical basis mappings. The construction of this mapping is again solved with a PSOM (\"Meta\"-PSOM) that interpolates in the space of prototypical basis mappings that were constructed during the \"investment phase\". \n\nWe demonstrated the potential of this approach with the task of 3D visuo-motor mapping, learnable with a single observation after repositioning a pair of cameras.
\n\nThe achieved accuracy of 4.4 mm after learning from a single observation compares very well with the 0.5-2.1 m distance range of the traversed positions. As further data become available, the T-PSOM can certainly be fine-tuned to improve the performance to the level of the directly trained T-PSOM. \n\nThe presented arrangement of a basis T-PSOM and two Meta-PSOMs further demonstrates the possibility of splitting hierarchical learning into independently changing domain sets. As the number of involved free context parameters grows, this factorization becomes increasingly crucial to keep the number of pre-trained prototype mappings manageable. \n\nReferences \n\n[1] K. Fu, R. Gonzalez, and C. Lee. Robotics: Control, Sensing, Vision, and Intelligence. McGraw-Hill, 1987. \n[2] F. Girosi and T. Poggio. Networks and the best approximation property. Biol. Cybern., 63(3):169-176, 1990. \n[3] T. Kohonen. Self-Organization and Associative Memory. Springer, Heidelberg, 1984. \n[4] J. Moody and C. Darken. Fast learning in networks of locally-tuned processing units. Neural Computation, 1:281-294, 1989. \n[5] S. Omohundro. Bumptrees for efficient function, constraint, and classification learning. In NIPS*3, pages 693-699. Morgan Kaufmann Publishers, 1991. \n[6] J. Platt. A resource-allocating network for function interpolation. Neural Computation, 3:213-255, 1991. \n[7] M. Powell. Radial basis functions for multivariable interpolation: A review, pages 143-167. Clarendon Press, Oxford, 1987. \n[8] H. Ritter. Parametrized self-organizing maps. In S. Gielen and B. Kappen, editors, ICANN'93 Proceedings, Amsterdam, pages 568-575. Springer Verlag, Berlin, 1993. \n[9] H. Ritter, T. Martinetz, and K. Schulten. Neural Computation and Self-Organizing Maps. Addison-Wesley, 1992. \n[10] J. Walter and H. Ritter. Local PSOMs and Chebyshev PSOMs - improving the parametrised self-organizing maps. In Proc. ICANN, Paris, volume 1, pages 95-102, October 1995. \n[11] J. Walter and H. Ritter. Rapid learning with parametrized self-organizing maps. Neurocomputing, Special Issue, (in press), 1996. \n", "award": [], "sourceid": 1141, "authors": [{"given_name": "J\u00f6rg", "family_name": "Walter", "institution": null}, {"given_name": "Helge", "family_name": "Ritter", "institution": null}]}