{"title": "ART2/BP architecture for adaptive estimation of dynamic processes", "book": "Advances in Neural Information Processing Systems", "page_first": 169, "page_last": 175, "abstract": null, "full_text": "ART2/BP architecture for adaptive estimation of \ndynamic processes \n\nEinar S\u00f8rheim * \nDepartment of Computer Science \nUNIK, Kjeller \nUniversity of Oslo \nN-2007 Norway \n\nAbstract \n\nThe goal has been to construct a supervised artificial neural network that \nincrementally learns an unknown mapping. As a result, a network consisting \nof a combination of ART2 and backpropagation is proposed and \nis called an \"ART2/BP\" network. The ART2 network is used to build \nand focus a supervised backpropagation network. The ART2/BP network \nhas the advantage of being able to dynamically expand itself in response \nto input patterns containing new information. Simulation results show \nthat the ART2/BP network outperforms a classical maximum likelihood \nmethod for the estimation of a discrete dynamic and nonlinear transfer \nfunction. \n\n1 INTRODUCTION \n\nMost current neural network architectures such as backpropagation require a cyclic \npresentation of the entire training set to converge. They are thus not very well suited \nfor adaptive estimation tasks where the training vectors arrive one by one, and where \nthe network may never see the same training vector twice. The ART2/BP network \nsystem is an attempt to construct a network that works well on these problems. 
\n\nMain features of our ART2/BP are: \n\n\u2022 implements incremental supervised learning \n\u2022 dynamically self-expanding \n\u2022 learning of a novel training pattern does not wash away memory of previous \ntraining patterns \n\u2022 short convergence time for learning a new pattern \n\n* e-mail address: einar@tellus.unik.no or einars@ifi.uio.no \n\n2 BACKGROUND \n\nAdaptive estimation of nonlinear functions requires some basic features of the estimation \nalgorithm. \n\n1. Incremental learning \n\nThe input/output pairs arrive at the estimation machine one by one. By accumulating \nthe input/output pairs into a training set and rerunning the training \nprocedure at every arrival of a new input/output pair, one could use a conventional \nmethod. Obvious disadvantages would however be \n\n\u2022 a huge learning time as the size of the training set increases. \n\u2022 an upper limit, N, on the number of elements in the training set will \nhave to be set. The training set will then be a gliding horizon of the N \nlast input/output pairs, and information prior to the N last input/output \npairs will be lost. \n\n2. Plasticity \n\nLearning of a new input/output pair should not wash away the memory of \npreviously learned nonconflicting input/output pairs. With most existing feedforward \nsupervised nets this is hard to accomplish, though some efforts have \nbeen made (Otwell 1990). Some networks, like the ART family and RCN (Ryan \n1988), are plastic, but they are self-organizing, not supervised. \n\nTo summarize: \nWe need a supervised network that incrementally learns the mapping of an unknown \nsystem and that can be used to predict future outputs. The system in question \nmaps analog vectors to analog vectors. \n\n3 COMBINED ARCHITECTURE \n\nIn the proposed network architecture an ART2 network controls a BP network, see \nFigure 1. 
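A minimal Python sketch of this control relationship, under loud simplifying assumptions: the class name `CategoryRouter`, the parameters `bound`, `keep_last` and `rate`, and the averaging local model are all hypothetical and not taken from the paper. The category stage here is only a nearest-prototype rule with a distance bound (in the spirit of the modified competition and reset of Section 4.1), not a full ART2 network, and a plain average of stored outputs stands in for a trained BP subnetwork.

```python
import math


class CategoryRouter:
    # Hypothetical stand-in for the ART2 stage: nearest prototype plus a distance bound.
    def __init__(self, bound, keep_last=3):
        self.bound = bound          # largest acceptable distance to a prototype
        self.keep_last = keep_last  # recent input/output pairs kept per category
        self.prototypes = []        # one prototype vector per category
        self.memory = []            # recent (input, output) pairs per category

    def winner(self, x):
        # Index and distance of the closest prototype, or (None, inf) if none exist.
        best, best_d = None, math.inf
        for j, z in enumerate(self.prototypes):
            d = math.dist(x, z)
            if d < best_d:
                best, best_d = j, d
        return best, best_d

    def learn(self, x, y, rate=0.5):
        # Learning mode: reuse the winning category or create a new one.
        j, d = self.winner(x)
        if j is None or d > self.bound:
            self.prototypes.append(list(x))
            self.memory.append([])
            j = len(self.prototypes) - 1
        else:
            # Simplified prototype update (assumption, not the ART2 LTM equations).
            z = self.prototypes[j]
            self.prototypes[j] = [zi + rate * (xi - zi) for zi, xi in zip(z, x)]
        self.memory[j] = (self.memory[j] + [(list(x), y)])[-self.keep_last:]
        return j

    def estimate(self, x):
        # Estimation mode: no new categories; the local model is a plain average.
        j, _ = self.winner(x)
        if j is None:
            raise ValueError('no categories learned yet')
        outputs = [y for _, y in self.memory[j]]
        return sum(outputs) / len(outputs)


router = CategoryRouter(bound=0.04)
first = router.learn([0.10, 0.20], 0.21)
same = router.learn([0.11, 0.20], 0.23)
other = router.learn([0.50, 0.60], 0.62)
```

With these values the two nearby inputs share category 0 and the distant input opens category 1. A fuller implementation would replace the averaging step with epoch training of the small BP subnetwork attached to the winning category.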
\n\nThe BP network consists of many relatively small subnetworks, where each subnet \nis specialized on one particular domain of the input space. ART2 controls how the \ninput space is divided among the subnets and the total number of subnets needed. \n\nThe ART2 network analyzes the input part of the input/output pairs as they arrive \nat the system. For a given input pattern $i_x$, ART2 finds the category $C_x$ which has \nthe closest resemblance to $i_x$. If this resemblance is good enough, $i_x$ is of category \n$C_x$ and the LTM weights of $C_x$ are updated. The BP subnetwork $BP_x$, connected \nto $C_x$, is as a consequence activated and relearning of $BP_x$ is done. The learning \nset consists of a \"representative\" set of the neighbouring subnets' patterns and a \nsmall number of the previous patterns belonging to category $C_x$. To summarize, the \nalgorithm goes as follows: \n\n1. Send input vector to ART2 network. \n2. ART2 classification. \n3. If in learning mode, adjust ART2 LTM weights of the winning node. \n4. Send input to the backpropagation network connected to the winning ART2 \nnode. \n5. If in learning mode: \n\n\u2022 find a representative training set. \n\u2022 do epoch learning on training set. \n\nOtherwise: \n\n\u2022 compute output of the selected backpropagation network. \n\n6. Go to 1 for new input vector. \n\nThe ART2/BP neural network can be used for adaptive estimation of nonlinear \ndynamic processes. The mapping to be estimated then is \n\n$y(t + \\delta t) = f(u(t), y(t))$ (1) \n\nwhere $u(t) \\in \\mathbb{R}^m$ and $y(t) \\in \\mathbb{R}^n$. \n\nThe input/output pairs will be $io = [u(t), y(t), y(t + \\delta t)]$; denote the input part of \n$io$ by $i = [u(t), y(t)]$ and the output part by $o = y(t + \\delta t)$. \n\n4 ART2 MODIFIED \n\nART2 was developed by Carpenter & Grossberg, see (Carpenter 1987) and (Carpenter 1988). 
ART2 categorizes arbitrary sequences of analog input patterns, and the \ncategories can be of arbitrary coarseness. For a detailed description of ART2, see \n(Carpenter 1987). \n\n4.1 MODIFICATION \n\nIn the standard ART2 algorithm input vectors (patterns) are normalized. For this \napplication it is not desired to classify parallel vectors of different magnitude as \nbelonging to the same category. By adding an extra element to the input vector \nwhere this element is simply \n\n(2) \n\nthe new input vector becomes \n\n(3) \n\nFrom a scaled version of the augmented vector, the original vector $i$ could easily be found as: \n\n(4) \n\nand by using the augmented vector as the input to ART2 instead of $i$ one can at any \npoint in F1 (representation layer) and F2 (categorization layer) generate the corresponding \nnon-normalized vector. The F2 node competition is modified so that the \nnode having bottom-up LTM weights with the smallest distance (distance being the \nEuclidean norm) to the F1 layer pattern code wins the competition. The distance \n$d_J$ of F2 node $J$ is given by: \n\n$d_J = \\|v - z_J\\|$ (5) \n\nwhere $\\|\\cdot\\|$ is the $l_2$-norm, $v$ is the F1 pattern code and $z_J$ are the \nbottom-up LTM weights of F2 node $J$. \n\nReset is done by calculating the distance $d$ between the F1 layer pattern code $v$ and \n$z_J$: \n\n$d = \\|v - z_J\\|$ (6) \n\nand comparing it to a largest acceptable bound $p$. If $d > p$ the winning node is \ninhibited and a new node will be created. If $d \\le p$ the LTM patterns of the winning \nnode $J$ are modified (learning). \n\n5 BACK PROPAGATION NETWORK \n\nThe backpropagation network used in this work is of the standard feedforward type, \nsee (Rumelhart 1986). The number of hidden layers and nodes should be kept low \nin the subnetworks; for the problems in our simulations we used 1 hidden layer with \n2 nodes. 
As for training algorithms several different kinds have been tried: \n\n\u2022 Standard backpropagation (SBP). \n\n\u2022 A modified backpropagation (MBP) method similar to the one used in the \nBPS simulator from George Mason University. \n\n\u2022 Quickprop (Q). \n\n\u2022 A quasi-Newton method (BFGS). \n\nAll of these except SBP show similar performance in my test cases. \n\nThe BP networks perform as interpolators in this algorithm and any good interpolation \nalgorithm can be used instead of BP. Approximation theory gives several \ninteresting techniques for approximation/interpolation of multidimensional functions, \nsuch as Radial Basis Functions and Hyper Basis Functions; for further detail \nsee (Poggio 1990). These methods require a representative training set where the \ninput part determines the location of centers in the input space. The ART2 algorithm \ncan be used for determining these centers in an adaptive way, thus making \npossible an incremental version of the approximation theory techniques. This idea \nhas not been tested yet, but is an interesting concept for further research. \n\n6 LEARNING \n\nLearning in ART2/BP is a two-stage process. First the input pattern is sent to \nthe ART2 network for categorizing and learning. ART2 will then activate the \nBP subnetwork that is a local expert on patterns of the same category as the \ninput pattern, and learning of this subnetwork will occur. A training set that is \nrepresentative for the domain of the input space has to be found. Let a small \nnumber of the last categorized input/output pairs be allocated to its corresponding \nsubnet to provide a part of the training set. Denote such a set as $L\\_IO_C$ ($C$ being \nthe category). Define the location of F2 node $J$ to be its bottom-up weights $z_J$. 
Let \nthe current input $i_x$ define an origin, then find the F2 nodes closest to the origin in each \northant (n-ant) of the input space. Call this set of nodes $N_x$ and the set of last input/output \npairs stored in these nodes $N\\_IO_x$. The training set is then chosen to be: \n\n$T_x = N\\_IO_x \\cup L\\_IO_x$ \n\nBefore training, the elements in $T_x$ are scaled to increase accuracy and to accelerate \nlearning. BP learning is then performed, the stopping criterion being a fixed error \nterm or a maximum number of iterations. \n\n7 ESTIMATION \n\nIn estimation mode learning in the network is turned off. Given an input, the network \nwill produce an output that hopefully will be close to the output of the real system. \nThe ART2 network selects a winning node in the same way as described before, but \nnow the reset assembly is not activated. Then the input is fed to the corresponding \nBP subnetwork and its output is used as an estimate of the original function's output. \nBecause each subnetwork is scaled to cover the domain of the input space made up \nby the convex hull $Co(T_x)$ of its training set $T_x$, the entire ART2/BP network will \ncover the convex hull $Co(T) \\subset \\mathbb{R}^{n+m}$ where: \n\n$T$ = {set of all previous $i$ used to train the network} \n\nGood estimation/prediction can thus be expected if $i \\in Co(T)$. This means that if \nthe input vector $i$ lies in a domain of the input space that has not been previously \nexplored by the elements in the training set, the network will generalize poorly. \n\n8 EXAMPLE \n\nThe ART2/BP network has been used to estimate a dynamic model of a tank filled \nwith liquid. The liquid level is sampled every $\\delta t$ time interval and the ART2/BP \nnetwork is used to estimate the discrete dynamic nonlinear transfer function of the \nliquid level as a function of inlet liquid flow and previous liquid level. That is, we \nwant to find a good estimate $\\hat{f}(\\cdot,\\cdot)$ 
of: \n\n$y(t + \\delta t) = f(u(t), y(t))$ (7) \n\nwhere $u(t)$ is the inlet liquid flow at time $t$ and $y(t)$ is the liquid level at time $t$. \n\nblack line: ARMA model estimation error ($y(t + \\delta t) - y_{ARMA}(t + \\delta t)$) \ngrey line: ART2/BP estimation error ($y(t + \\delta t) - y_{ART2/BP}(t + \\delta t)$) \n\nFigure 1: Comparison of the estimation error of the ARMA model and the \nART2/BP network \n\nTo increase the nonlinearities of the transfer function, the area of the tank varies \nas a step function of the liquid level. The BP subnetworks have 2 input nodes, \n1 hidden layer with 2 neurons and a single-neuron output layer. In the simulations \n$p = 0.04$ and the last three categorized input/output pairs are stored at every \nsubnetwork. As the input space is 2-dimensional, giving 4 neighbouring nodes, the \nmaximum size of the training set is 7 input/output pairs. After a learning period of \n1000 samples with random inlet flow, three test cases are run with the network in \nestimation mode. The network had then formed about 140 categories. The same \nset of simulation data is also run through an offline maximum likelihood method to \nestimate a linear ARMA model of the plant, see (Ljung 1983). \n\nFigure 1 shows the simulation results of the three test cases where: \n\nsamples 1-100: random input flow. \nsamples 101-200: constant input flow at a low level. \nsamples 201-300: constant input flow at a high level. \n\nIn Figure 1, the estimation errors of the two methods are compared. For the \nfirst 100 samples with stochastic input flow, the estimation error variance of the \nART2/BP network is roughly a factor 10 less than that of the ARMA model. 
The \nperformance of ART2/BP is also significantly better for the constant input flow \ncases; here the ARMA model has an error of approximately 0.02 while the ART2/BP error is \napproximately 0.002. The overall improvement in estimation error is a reduction to roughly 0.1 \nof the ARMA model's error. Also keep in mind that ART2/BP is compared to an offline maximum likelihood \nmethod while ART2/BP clearly is an online method. The online version of the \nmaximum likelihood method would most probably have given a worse performance than the \noffline version. \n\n9 CONCLUSION/COMMENTS \n\nThe proposed ART2/BP neural network architecture offers some unique features \ncompared to backpropagation. It provides incremental learning and can be applied \nto truly adaptive estimation tasks. In our example it also outperforms a classical \nmaximum likelihood method for the estimation of a discrete dynamic nonlinear \ntransfer function. Future work will be the investigation of ART2/BP's properties for \nmultistep-ahead prediction of dynamic nonlinear transfer functions, and embedding \nART2/BP in a neural adaptive controller. \n\nAcknowledgments \n\nSpecial thanks to Steve Lehar at Boston University for providing me with his ART2 \nsimulation program. It proved to be crucial for getting a quick start on ART2 and \nunderstanding the concept. \n\nReferences \n\nCarpenter, G.A. & Grossberg, S. (1987). ART2: Self-organization of stable category \nrecognition codes for analog input patterns. Applied Optics pp 4919-4930. \nCarpenter, G.A. & Grossberg, S. (1988). The ART of adaptive pattern recognition \nby a self-organizing neural network. Computer 21 pp 77-88. \nFahlman, S.E. (1988). Faster-Learning Variations on Back-Propagation: An Empirical \nStudy. Proceedings of the 1988 Connectionist Models Summer School. Morgan \nKaufmann. \nLjung, L. & S\u00f6derstr\u00f6m, T. (1983). Theory and practice of recursive identification. \nThe MIT Press, Cambridge, MA. \nOtwell, K. (1990). 
Incremental backpropagation learning from novelty-based orthogonalization. \nProceedings IJNN90. \nPoggio, T. & Girosi, F. (1990). Networks for Approximation and Learning. Proceedings \nof the IEEE, Vol. 78, No. 9. \nRumelhart, D.E., Hinton, G.E., & Williams, R.J. (1986). Parallel Distributed \nProcessing: Explorations in the Microstructure of Cognition, Vol. 1. The MIT \nPress, Cambridge, MA. \nRyan, T.W. (1988). The resonance correlation network. Proceedings IJNN88. \n", "award": [], "sourceid": 303, "authors": [{"given_name": "Einar", "family_name": "S\u00f8rheim", "institution": null}]}