{"title": "Kohonen Feature Maps and Growing Cell Structures - a Performance Comparison", "book": "Advances in Neural Information Processing Systems", "page_first": 123, "page_last": 130, "abstract": null, "full_text": "Kohonen Feature Maps and Growing \n\nCell Structures -\n\na Performance Comparison \n\nBernd Fritzke \n\nInternational Computer Science Institute \n\n1947 Center Street, Suite 600 \nBerkeley, CA 94704-1105, USA \n\nAbstract \n\nA performance comparison of two self-organizing networks, the Ko(cid:173)\nhonen Feature Map and the recently proposed Growing Cell Struc(cid:173)\ntures is made. For this purpose several performance criteria for \nself-organizing networks are proposed and motivated. The models \nare tested with three example problems of increasing difficulty. The \nKohonen Feature Map demonstrates slightly superior results only \nfor the simplest problem. For the other more difficult and also more \nrealistic problems the Growing Cell Structures exhibit significantly \nbetter performance by every criterion . Additional advantages of \nthe new model are that all parameters are constant over time and \nthat size as well as structure of the network are determined auto(cid:173)\nmatically. \n\n1 \n\nINTRODUCTION \n\nSelf-organizing networks are able to generate interesting low-dimensional represen(cid:173)\ntations of high-dimensional input data. The most well-known of these models is \nthe Kohonen Feature Map (Kohonen [1982)) . So far it has been applied to a large \nvariety of problems including vector quantization (Schweizer et al. [1991)), biolog(cid:173)\nical modelling (Obermayer, Ritter & Schulten [1990)), combinatorial optimization \n(Favata & Walker [1991]) and also processing of symbolic information(Ritter & \nKohonen [1989)) . \n\n123 \n\n\f124 \n\nFritzke \n\nIt has been reported by a number of researchers, that one disadvantage of Kohonen's \nmodel is the fact, that the network structure had to be specified in advance. 
This is generally not possible in an optimal way, since a necessary piece of information, the probability distribution of the input signals, is usually not available. The choice of an unsuitable network structure, however, can badly degrade network performance.

Recently we have proposed a new self-organizing network model - the Growing Cell Structures - which can automatically determine a problem-specific network structure (Fritzke [1992]). By now the model has been successfully applied to clustering (Fritzke [1991]) and combinatorial optimization (Fritzke & Wilke [1991]).

In this contribution we directly compare our model to that of Kohonen. We first review some general properties of self-organizing networks, and several performance criteria for these networks are proposed and motivated. The new model is then briefly described. Simulation results are presented which allow a comparison of both models with respect to the proposed criteria.

2 SELF-ORGANIZING NETWORKS

2.1 CHARACTERISTICS

A self-organizing network consists of a set of neurons arranged in some topological structure which induces neighborhood relations among the neurons. An n-dimensional reference vector is attached to every neuron. This vector determines the specific n-dimensional input signal to which the neuron is maximally sensitive.

By assigning to every input signal the neuron with the nearest reference vector (according to a suitable norm), a mapping is defined from the space of all possible input signals onto the neural structure. A given set of reference vectors thus divides the input vector space into regions with a common nearest reference vector. These regions are commonly denoted as Voronoi regions, and the corresponding partition of the input vector space is denoted the Voronoi partition.

Self-organizing networks learn (change internal parameters) in an unsupervised manner from a stream of input signals.
These input signals obey a generally unknown probability distribution. For each input signal the neuron with the nearest reference vector is determined, the so-called "best matching unit" (bmu). The reference vectors of the bmu and of a number of its topological neighbors are moved towards the input signal. The adaptation of topological neighbors distinguishes self-organization ("winner take most") from competitive learning, where only the bmu is adapted ("winner take all").

2.2 PERFORMANCE CRITERIA

One can identify three main criteria for self-organizing networks. The importance of each criterion may vary from application to application.

Topology Preservation. This denotes two properties of the mapping defined by the network. We call the mapping topology-preserving if

a) similar input vectors are mapped onto identical or closely neighboring neurons and

b) neighboring neurons have similar reference vectors.

Property a) ensures that small changes of the input vector cause correspondingly small changes in the position of the bmu. The mapping is robust against distortions of the input, a very important property for applications dealing with real, noisy data. Property b) ensures robustness of the inverse mapping. Topology preservation is especially interesting when the dimension of the input vectors is higher than the network dimension. Then the mapping reduces the data dimension but usually preserves important similarity relations among the input data.

Modelling of the Probability Distribution. A set of reference vectors is said to model the probability distribution if the local density of reference vectors in the input vector space approaches the probability density of the input vector distribution.

This property is desirable for two reasons.
First, we get an implicit model of the unknown probability distribution underlying the input signals. Second, the network becomes fault-tolerant against damage, since every neuron is only "responsible" for a small fraction of all input vectors. If neurons are destroyed for some reason, the mapping ability of the network degrades only proportionally to the number of destroyed neurons (soft fail). This is a very desirable property for technical (as well as natural) systems.

Minimization of the Quantization Error. The quantization error for a given input signal is the distance between this signal and the reference vector of the bmu. We call a set of reference vectors error minimizing for a given probability distribution if the mean quantization error is minimized.

This property is important if the original signals have to be reconstructed from the reference vectors, which is a very common situation in vector quantization. The quantization error in this case limits the accuracy of the reconstruction.

One should note that the optimal distribution of reference vectors for error minimization is generally different from the optimal distribution for distribution modelling.

3 THE GROWING CELL STRUCTURES

The Growing Cell Structures are a self-organizing network whose most important feature is the ability to automatically find a problem-specific network structure through a growth process.

Basic building blocks are k-dimensional hypertetrahedrons: lines for k = 1, triangles for k = 2, tetrahedrons for k = 3, etc. The vertices of the hypertetrahedrons are the neurons, and the edges denote neighborhood relations.

By insertion and deletion of neurons the structure is modified. This is done during a self-organization process which is similar to that in Kohonen's model. Input signals cause adaptation of the bmu and its topological neighbors.
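This adaptation step can be sketched as follows. The sketch is a minimal illustration only; the function name, the data layout and the constant rates eps_b and eps_n are assumptions for illustration, not values taken from the paper:

```python
def adapt(ref_vectors, neighbors, signal, eps_b=0.06, eps_n=0.002):
    """One 'winner take most' step: move the best matching unit (bmu)
    and its direct topological neighbors towards the input signal."""
    def dist2(u, v):  # squared Euclidean distance
        return sum((a - b) ** 2 for a, b in zip(u, v))

    # bmu = neuron whose reference vector is nearest to the signal
    bmu = min(range(len(ref_vectors)), key=lambda i: dist2(ref_vectors[i], signal))

    def move(vec, eps):  # move vec a fraction eps towards the signal
        return [a + eps * (s - a) for a, s in zip(vec, signal)]

    ref_vectors[bmu] = move(ref_vectors[bmu], eps_b)  # strong rate for the bmu
    for n in neighbors[bmu]:                          # small, constant rate for
        ref_vectors[n] = move(ref_vectors[n], eps_n)  # the direct neighbors only
    return bmu
```

Both rates stay constant over time, which is what distinguishes this scheme from an annealed neighborhood.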
In contrast to Kohonen's model, all parameters are constant, including the width of the neighborhood around the bmu where adaptation takes place: only the direct neighbors and the bmu itself are adapted.

3.1 INSERTION OF NEURONS

To determine the positions where new neurons should be inserted, the concept of a resource is introduced. Every neuron has a local resource variable, and new neurons are always inserted near the neuron with the highest resource value. New neurons get part of the resource of their neighbors, so that in the long run the resource is distributed evenly among all neurons.

Every input signal causes an increase of the resource variable of the best matching unit. Choices for the resource examined so far are

• the summed quantization error caused by the neuron
• the number of input signals received by the neuron

After a constant number of adaptation steps (e.g. 100), a new neuron is inserted. For this purpose the neuron with the highest resource is determined, and the edge connecting it to the neighbor with the most different reference vector is "split" by inserting the new neuron. Further edges are added to rebuild a structure consisting only of k-dimensional hypertetrahedrons.

The reference vector of the new neuron is interpolated from the reference vectors belonging to the end points of the split edge. The resource variable of the new neuron is initialized by subtracting some resource from its neighbors, the amount of which is determined by the reduction of their Voronoi regions through the insertion.

3.2 DELETION OF NEURONS

By comparing the fraction of all input signals which a specific neuron has received with the volume of its Voronoi region, one can derive a local estimate of the probability density of the input vectors.
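This local estimate can be sketched as a simple ratio (an illustrative sketch; the names signal_count, total_signals and voronoi_volume, and the exact form of the estimate, are assumptions rather than the paper's formulation):

```python
def local_density_estimate(signal_count, total_signals, voronoi_volume):
    """Local probability density near a neuron: the fraction of all input
    signals the neuron received, divided by the volume of its Voronoi
    region."""
    return (signal_count / total_signals) / voronoi_volume
```

A neuron whose estimate is very low compared to the rest of the network would be a candidate for the removal described next; the concrete threshold is not given here.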
\n\nThose neurons) whose reference vectors fall into regions of the input vector space \nwith a very low probability density) are regarded as \"superfluous)) and are removed. \nThe result are problem-specific network structures potentially consisting of several \nseparate sub networks and accurately modelling a given probability distribution. \n\n4 SIMULATION RESULTS \n\nA number of tests have been performed to evaluate the performance of the new \nmodel. One series is described in the following. \n\nThree methods have been compared. \n\na) Kohonen Feature Maps (KFM) \nb) Growing Cell Structures with quantization error as resource (GCS-l) \nc) Growing Cell Structures with number of input signals as resource (GCS-2) \n\n\fKohonen Feature Maps and Growing Cell Structures-a Performance Comparison \n\n127 \n\n[J \n\n[J \n\no \n\n[J \n\nc \n\nc \n\n[J \n\n[J \n\nDistribution A: \nThe probability density \nis uniform in the unit \nsquare \n\nDistribution B: \nThe probability density \nis uniform in the lOx \n10-field, by a factor 100 \nhigher in the 1 x I-field \nand zero elsewhere \n\ninside \n\nDistribution C: \nThe probability density \nis uniform \nthe \nseven lower squares, by \na factor 10 higher in the \ntwo upper squares and \nzero elsewhere. \n\nFigure 1: Three different probability distributions used for a performance compar(cid:173)\nison. Distribution A is very simple and has a form ideally suited for the Kohonen \nFeature Map which uses a square grid of neurons. Distribution B was chosen to \nshow the effects of a highly varying probability density. Distribution C is the most \nrealistic with a number of separate regions some of which have also different prob(cid:173)\nability densities. \n\nThese models were applied to the probability distributions shown in fig. 1. The Ko(cid:173)\nhonen model was used with a 10 x 10-grid of neurons. The Growing Cell Structures \nwere used to build up a two dimensional cell structure of the same size. 
This was achieved by stopping the growth process when the number of neurons had reached 100.

At the end of the simulation the proposed criteria were measured as follows:

• Topology preservation requires two properties. Property a) was measured by the topographical product recently proposed by Bauer et al. for this purpose (Bauer & Pawelzik [1992]). Property b) was measured by computing the mean edge length in the input space, i.e. the mean difference between reference vectors of directly neighboring neurons.

• Distribution modelling was measured by generating 5000 test signals according to the specific probability distribution and counting, for every neuron, the number of test signals for which it was the bmu. The standard deviation of all counter values was computed and divided by the mean value of the counters to get a normalized measure, the distribution error, for the modelling of the probability distribution.

• Error minimization was measured by computing the mean square quantization error of the test signals.

The numerical results of the simulations are shown in fig. 2. Typical examples of the final network structures can be seen in fig. 3.

[Four result tables: a) topographical product, b) mean edge length, c) distribution error, d) quantization error, each with rows KFM, GCS-1, GCS-2 and columns for distributions A, B and C; the individual values are not reliably recoverable from the scan.]

Figure 2: Simulation results of the performance comparison.
The model of Kohonen (KFM) and two versions of the Growing Cell Structures have been compared with respect to different criteria. All criteria are such that smaller values are better. The best (smallest) value in each column is enclosed in a box. Simulations were performed with the probability distributions A, B and C from fig. 1.

It can be seen from fig. 2 that the model of Kohonen has superior values only for distribution A, which is very regular and shaped exactly like the chosen network structure (a square). Since in general the probability distribution is unknown and irregular, the distributions B and C are by far more realistic. For these distributions the Growing Cell Structures have the best values.

The modelling of the distribution and the minimization of the quantization error are generally competing objectives. One has to decide which objective is more important for the current application. Then the appropriate version of the Growing Cell Structures can optimize with respect to that objective. For the complicated distribution C, however, either version of the Growing Cell Structures performs better than Kohonen's model on every criterion.

Especially notable is the low quantization error achieved for distribution C by the error minimizing version (GCS-2) of the Growing Cell Structures (see fig. 2d). This value indicates a good potential for vector quantization.

5 DISCUSSION

Our investigations indicate that - with respect to the proposed criteria - the Growing Cell Structures are superior to Kohonen's model for all but very carefully chosen trivial examples. Although we used small examples for the sake of clarity, our experiments lead us to conjecture that the difference will further increase with the difficulty and size of the problem.

There are some other important advantages of our approach. First, all parameters are constant. This eliminates the difficult choice of a "cooling schedule" which is necessary in Kohonen's model.
Second, the network size does not have to be specified in advance. Instead, the growth process can be continued until an arbitrary performance criterion is met. To meet a specific criterion with Kohonen's model, one generally has to try different network sizes. Always starting with a very large network is not a good solution to this problem, since the computational effort grows faster than quadratically with the network size.

Figure 3: Typical simulation results for the model of Kohonen and the two versions of the Growing Cell Structures, for distributions A, B and C. The network size is 100 in every case. The probability distributions are described in fig. 1.
a) Kohonen Feature Map (KFM). For distributions B and C the fixed network structure leads to long connections and to neurons in regions with zero probability density.
b) Growing Cell Structures, distribution modelling variant (GCS-1). The growth process, combined with the occasional removal of "superfluous" neurons, has led to several subnetworks for distributions B and C. For distribution B roughly half of the neurons are used to model either of the squares. This corresponds well to the underlying probability density.
c) Growing Cell Structures, error minimizing variant (GCS-2). The difference to the previous variant can be seen best for distribution B, where only a few neurons are used to cover the small square.

Currently, applications of variants of the new method to image compression and robot control are being investigated. Furthermore, a new type of radial basis function network related to (Moody & Darken [1989]), based on the Growing Cell Structures, is being explored.

REFERENCES

Bauer, H.-U. & K.
Pawelzik [1992], "Quantifying the neighborhood preservation of self-organizing feature maps," IEEE Transactions on Neural Networks 3, 570-579.

Favata, F. & R. Walker [1991], "A study of the application of Kohonen-type neural networks to the Travelling Salesman Problem," Biological Cybernetics 64, 463-468.

Fritzke, B. [1991], "Unsupervised clustering with growing cell structures," Proc. of IJCNN-91, Seattle, 531-536 (Vol. II).

Fritzke, B. [1992], "Growing cell structures - a self-organizing network in k dimensions," in Artificial Neural Networks II, I. Aleksander & J. Taylor, eds., North-Holland, Amsterdam, 1051-1056.

Fritzke, B. & P. Wilke [1991], "FLEXMAP - a neural network with linear time and space complexity for the traveling salesman problem," Proc. of IJCNN-91, Singapore, 929-934.

Kohonen, T. [1982], "Self-organized formation of topologically correct feature maps," Biological Cybernetics 43, 59-69.

Moody, J. & C. Darken [1989], "Fast learning in networks of locally-tuned processing units," Neural Computation 1, 281-294.

Obermayer, K., H. Ritter & K. Schulten [1990], "Large-scale simulations of self-organizing neural networks on parallel computers: application to biological modeling," Parallel Computing 14, 381-404.

Ritter, H.J. & T. Kohonen [1989], "Self-Organizing Semantic Maps," Biological Cybernetics 61, 241-254.

Schweizer, L., G. Parladori, G.L. Sicuranza & S. Marsi [1991], "A fully neural approach to image compression," in Artificial Neural Networks, T. Kohonen, K. Mäkisara, O. Simula & J. Kangas, eds., North-Holland, Amsterdam, 815-820.
", "award": [], "sourceid": 694, "authors": [{"given_name": "Bernd", "family_name": "Fritzke", "institution": null}]}