{"title": "Self Organizing Neural Networks for the Identification Problem", "book": "Advances in Neural Information Processing Systems", "page_first": 57, "page_last": 64, "abstract": null, "full_text": "57 \n\nSelf Organizing Neural Networks for the \n\nIdentification Problem \n\nManoel Fernando Tenorio \nSchool of Electrical Engineering \nPurdue University \nVV. Lafayette, UN. 47907 \ntenoriQ@ee.ecn.purdue.edu \n\nVVei-Tsih Lee \nSchool of Electrical Engineering \nPurdue University \nVV. Lafayette, UN. 47907 \nlwt@ed.ecn.purdue.edu \n\nABSTRACT \n\nThis work introduces a new method called Self Organizing \nNeural Network (SONN) algorithm and demonstrates its use in a \nsystem identification task. The algorithm constructs the network, \nchooses the neuron functions, and adjusts the weights. It is compared to \nthe Back-Propagation algorithm in the identification of the chaotic time \nseries. The results shows that SONN constructs a simpler, more \naccurate model. requiring less training data and epochs. The algorithm \ncan be applied and generalized to appilications as a classifier. \n\nI. INTRODUCTION \n\n1.1 THE SYSTEM IDENTIFICATION PROBLEM \n\nIn various engineering applications, it is important to be able to estimate, interpolate, \nand extrapolate the behavior of an unknown system when only its input-output pairs are \navailable. Algorithms which produce an estimation of the system behavior based on these \npairs fall under the category of system identification techniques. \n1.2 SYSTEM IDENTIFICATION USING NEURAL \nNETWORKS \nA general form to represent systems, both linear and nonlinear, is the Kolmogorov(cid:173)\nGarbor polynomial tGarbor. 19611 shown below: \n\ny = ao + L aixi + L L aijxiXj + ... \n\n1 \n\ni \n\nJ \n\n(1) \n\n\f58 \n\nTenorio and Lee \n\nwhere the y is the output. and x the input to the system. 
[Gabor, 1961] proposed a learning method that adjusted the coefficients of (1) by minimizing the mean square error between each desired output sample and the actual output.

This paper describes a supervised learning algorithm for structure construction and adjustment. Here, systems which can be described by (1) are considered. The computation of the function for each neuron involves a choice from a set of possible functions previously assigned to the algorithm, and it is general enough to accept a wide range of both continuous and discrete functions. In this work, the set is taken from variants of the 2-input quadratic polynomial for simplicity, although there is no requirement making it so. This approach abandons the simplistic mean-square error as the performance measure in favor of a modified Minimum Description Length (MDL) criterion [Rissanen, 1978], with provisions to measure the complexity of the model generated. The algorithm searches for the simplest model which generates the best estimate. The modified MDL, from here on named the Structure Estimation Criterion (SEC), is applied hierarchically in the selection of the optimal neuron transfer function from the function set, and then used as an optimality criterion to guide the construction of the structure. The connectivity of the resulting structure is arbitrary, and under the correct conditions [Geman & Geman, 1984] the estimation of the structure is optimal in terms of the output error and low function complexity. This approach shares the same spirit as GMDH-type algorithms. However, the concept of parameter estimation from Information Theory, combined with a stochastic search algorithm (Simulated Annealing), was used to create a new tool for system identification.

This work is organized as follows: section II presents the problem formulation and the Self Organizing Neural Network (SONN) algorithm description; section III describes the results of applying SONN to a well known problem tested before with other neural network algorithms [Lapedes & Farber, 1987; Moody, 1988]; and finally, section IV presents a discussion of the results and future directions for this work.

II. THE SELF ORGANIZING NEURAL NETWORK ALGORITHM

II.1 SELF ORGANIZING STRUCTURES

The Self Organizing Neural Network (SONN) algorithm performs a search on the model space by the construction of hypersurfaces. A network of nodes, each node representing a hypersurface, is organized to be an approximate model of the real system. SONN can be fully characterized by three major components, which can be modified to incorporate knowledge about the process: (1) a generating rule for the primitive neuron transfer functions; (2) an evaluation method which assesses the quality of the model; and (3) a structure search strategy. Below, the components of SONN are discussed.

II.2 THE ALGORITHM STRUCTURE

II.2.1 The Generating Rule

Given a set of observations S:

S = {(X1, Y1), (X2, Y2), ..., (Xl, Yl)}
generated by
Yi = f(Xi) + η    (2)

where f(.) is represented by a Kolmogorov-Gabor polynomial, and the random variable η is normally distributed, N(0,1). The dimension of Y is m, and the dimension of X is n. Every component yk of Y forms a hypersurface yk = fk(X) in the space of dim(X) + 1. The problem is to find f(.), given the observations S, which are a corrupted version of the desired function. In this work, the model which estimates f(.) is desired to be as accurate and simple (small number of parameters, and low degree of nonlinearity) as possible.

The approach taken here is to estimate the simplest model which best describes f(.)
by generating optimal functions for each neuron, which can be viewed as the construction of a hypersurface based on the observed data. It can be described as follows: given a set of observations S, use p components of the n dimensional space of X to create a hypersurface which best describes yk = fk(X), through a three step process. First, given X = [x1, x2, x3, ..., xn] and yk, and the mapping Ψn: [x1, x2, x3, ..., xn] -> [xΨ(1), xΨ(2), xΨ(3), ..., xΨ(p)], construct the hypersurface h1(xΨ(1), xΨ(2), xΨ(3), ..., xΨ(p)) (hi after the i-th iteration) of p+1 dimensions, where Ψn is a projection from n dimensions to p dimensions. The elements of the domain of Ψn are called terminals. Second, if the global optimality criterion is reached by the construction of hi(xΨ(1), xΨ(2), xΨ(3), ..., xΨ(p)), then stop; otherwise continue to the third step. Third, generate from [x1, x2, x3, ..., xn, hi(xΨ(1), xΨ(2), xΨ(3), ..., xΨ(p))] a new p+1 dimensional hypersurface hi+1 through the extended mapping Ψn+1(.), and reapply the second step. The resulting model is a multilayered neural network whose topology is arbitrarily complex and created by a stochastic search guided by a structure estimation criterion. For simplicity, in this work the set of prototype functions (F) is restricted to 2-input quadratic surfaces or smaller, with only four possible types:

y = a0 + a1x1 + a2x2    (3)
y = a0 + a1x1 + a2x2 + a3x1x2    (4)
y = a0 + a1x1 + a2x1^2    (5)
y = a0 + a1x1 + a2x2 + a3x1x2 + a4x1^2 + a5x2^2    (6)

II.2.2 Evaluation of the Model Based on the MDL Criterion

The selection rule (T) for the neuron transfer function was based on a modification of the Minimal Description Length (MDL) information criterion. In [Rissanen, 1978] the principle of minimal description for statistical estimation was developed.
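The neuron-function selection can be sketched as follows: fit each of the four prototype surfaces (3)-(6) by least squares and keep the candidate with the lowest description-length score. The scoring used below is a generic two-part MDL (an accuracy term plus a 0.5 k log N complexity penalty); the paper's SEC modifies this, so treat this sketch as a stand-in under stated assumptions, not the authors' exact rule:

```python
import math
import random

def solve(M, v):
    """Gauss-Jordan elimination with partial pivoting for the system M c = v."""
    k = len(M)
    M = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(k):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]

def fit(rows, y):
    """Least squares via the normal equations; returns (coefficients, MSE)."""
    k = len(rows[0])
    G = [[sum(f[i] * f[j] for f in rows) for j in range(k)] for i in range(k)]
    b = [sum(f[i] * yi for f, yi in zip(rows, y)) for i in range(k)]
    c = solve(G, b)
    resid = [yi - sum(ci * fi for ci, fi in zip(c, f)) for f, yi in zip(rows, y)]
    return c, sum(r * r for r in resid) / len(y)

PROTOTYPES = {  # prototype name -> feature row over the inputs x1, x2
    "eq3": lambda x1, x2: [1.0, x1, x2],
    "eq4": lambda x1, x2: [1.0, x1, x2, x1 * x2],
    "eq5": lambda x1, x2: [1.0, x1, x1 ** 2],
    "eq6": lambda x1, x2: [1.0, x1, x2, x1 * x2, x1 ** 2, x2 ** 2],
}

def score_prototypes(xs, y):
    """Return {name: MDL score}; the candidate with the lowest score wins."""
    n = len(y)
    scores = {}
    for name, feat in PROTOTYPES.items():
        rows = [feat(a, b) for a, b in xs]
        _, s2 = fit(rows, y)
        scores[name] = 0.5 * n * math.log(s2 + 1e-12) + 0.5 * len(rows[0]) * math.log(n)
    return scores

# Noisy samples of a surface that is linear in x1 and x2, as in eq. (3).
random.seed(0)
xs = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(200)]
y = [1.0 + 2.0 * a - 3.0 * b + 0.01 * random.gauss(0, 1) for a, b in xs]
scores = score_prototypes(xs, y)
best = min(scores, key=scores.get)
```

On this data the penalty term steers the choice away from the richer surfaces (4) and (6) even though they fit slightly better, which is the trade-off the criterion is meant to enforce.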
The MDL provides a trade-off between the accuracy and the complexity of the model by including the structure estimation term of the final model. The final model (with the minimal MDL) is optimum in the sense of being a consistent estimate of the number of parameters while achieving the minimum error [Rissanen, 1980]. Given a sequence of observations x1, x2, x3, ..., xN from the random variable X, the dominant term of the MDL in [Rissanen, 1978] is:

MDL = -log f(x|θ) + 0.5 k log N    (7)

where f(x|θ) is the estimated probability density function of the model, k is the number of parameters, and N is the number of observations. The first term is actually the negative of the maximum likelihood (ML) with respect to the estimated parameters. The second term describes the structure of the model and is used as a penalty for the complexity of the model. In the case of linear polynomial regression, the MDL is:

MDL = -0.5 N log S^2 + 0.5 k log N    (8)

where S^2 is the estimated residual variance and k is the number of coefficients in the model selected.

In the SONN algorithm, the MDL criterion is modified to operate both recursively and hierarchically. First, the concept of the MDL is applied to each candidate prototype surface for a given neuron. Second, the acceptance of the node, based on Simulated Annealing, uses the MDL measure as the system energy. However, since the new neuron is generated from terminals which can be the outputs of other neurons, the original definition of the MDL is unable to compute the true number of system parameters of the final function. Recall that due to the arbitrary connectivity, feedback loops, and other configurations, it is nontrivial to compute the number of parameters in the entire structure. In order to reflect the hierarchical nature of the model,
a modified MDL called the Structure Estimation Criterion (SEC) is used in conjunction with a heuristic estimator of the number of parameters in the system at each stage of the algorithm. A computationally efficient heuristic for the estimation of the number of parameters in the model is based on the fact that SONN creates a tree-like structure with multiple roots at the input terminals. Then k, in expression (8), can be estimated recursively by:

k = kL + kR + (no. of parameters of the current node)    (9)

where kL and kR are the estimated numbers of parameters of the left and right parents of the current node, respectively. This heuristic estimator is neither a lower bound nor an upper bound of the true number of parameters in the model.

II.2.3 The SONN Algorithm

To explain the algorithm, the following definitions are necessary: NODE - a neuron and its associated function, connections, and SEC; BASIC NODE - a node for a system input variable; FRONT NODE - a node without children; INTERMEDIATE NODE - a node that is neither a front nor a basic node; STATE - the collection of nodes and the configuration of their interconnections; INITIAL STATE (SI) - the state with only basic nodes; PARENT AND CHILD STATE - the child state is equal to the parent state except for a new node and its interconnections generated on the parent state structure; NEIGHBOR STATE - a state that is either a child or a parent state of another; ENERGY OF THE STATE (SEC_Si) - the energy of the state is defined as the minimum SEC of all the front nodes in that state.

In the SONN algorithm, the search for the correct model structure is done via Simulated Annealing. Therefore the algorithm at times can accept partial structures that look less than ideal. In the same way, it is able to discard partially constructed substructures in search of better results.
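Expression (9) and the annealed node-acceptance rule can be sketched as follows. The Metropolis acceptance form is an assumption here (the paper says only that the SEC plays the role of the system energy), and all names are illustrative:

```python
import math
import random

class Node:
    def __init__(self, n_params, left=None, right=None):
        self.n_params = n_params  # coefficients of this node's transfer function
        self.left = left          # left parent terminal (None for a basic node)
        self.right = right        # right parent terminal

def estimate_k(node):
    """Recursive parameter-count heuristic of expression (9)."""
    if node is None:
        return 0
    return estimate_k(node.left) + estimate_k(node.right) + node.n_params

def accept(sec_new, sec_old, temperature, rng=random.random):
    """Node accepting rule R: always keep improvements; otherwise accept
    with probability exp(-(SEC_new - SEC_old) / T), annealed as T drops."""
    if sec_new <= sec_old:
        return True
    return rng() < math.exp(-(sec_new - sec_old) / temperature)

# Two basic input nodes (0 parameters each) feeding one eq. (4) node
# with 4 coefficients: k = 0 + 0 + 4 = 4.
k = estimate_k(Node(4, Node(0), Node(0)))
```

As the temperature falls, `accept` converges toward a pure greedy rule, matching the outer loop of the SONN pseudocode.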
The use of this algorithm implies that the node accepting rule (R) varies at run time according to a cooling temperature schedule. The SONN algorithm is as follows:

Initialize T and SI.
Repeat
    Repeat
        Sj = generate(Si),    - application of P.
        If accept(SEC_Sj, SEC_Si, T) then Si = Sj.    - application of R.
    Until the number of new neurons is greater than N.
    Decrease the temperature T.
Until the temperature T is smaller than Tend (the terminal temperature for Simulated Annealing).

Each neuron output and the system input variables are called terminals. Terminals are viewed as potential dimensions from which a new hypersurface can be constructed. Every terminal represents the best tentative approximation of the system function with the available information, and all are therefore treated equally.

III. EXAMPLE - THE CHAOTIC TIME SERIES

In the following results, the chaotic time series generated by the Mackey-Glass differential equation was used. The SONN with the SEC and its heuristic variant were used to obtain an approximate model of the system. The result is compared with those obtained using the nonlinear signal processing method of [Lapedes & Farber, 1987]. The advantages and disadvantages of both approaches are analyzed in the next section.

III.1 Structure of the Problem

The Mackey-Glass differential equation used here can be described as:

dx(t)/dt = a x(t - τ) / (1 + x^10(t - τ)) - b x(t)    (10)

By setting a = 0.2, b = 0.1, and τ = 17, a chaotic time series with a strange attractor of fractal dimension about 3.5 is produced [Lapedes & Farber, 1987]. To compare the accuracy of prediction, the normalized root mean square error is used as a performance index:

normalized RMSE = RMSE / Standard Deviation    (11)

III.2 SONN WITH THE HEURISTIC SEC (SONN.H)

In the following examples, a modified heuristic version of the SEC is used.
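The data of section III.1 can be reproduced approximately with a short sketch. Euler integration with unit step and a constant initial history are assumptions here (the paper does not state its integration scheme); the performance index of eq. (11) is implemented directly:

```python
import math

def mackey_glass(n, a=0.2, b=0.1, tau=17, dt=1.0, x0=1.2):
    """Euler integration of eq. (10); dt and the constant initial
    history x(t) = x0 for t <= 0 are illustrative assumptions."""
    hist = int(tau / dt)
    x = [x0] * (hist + 1)
    for _ in range(n):
        x_tau = x[-1 - hist]                       # x(t - tau)
        dx = a * x_tau / (1.0 + x_tau ** 10) - b * x[-1]
        x.append(x[-1] + dt * dx)
    return x[hist + 1:]

def normalized_rmse(pred, true):
    """Performance index of eq. (11): RMSE over the standard deviation
    of the true series."""
    n = len(true)
    mean = sum(true) / n
    std = math.sqrt(sum((t - mean) ** 2 for t in true) / n)
    rmse = math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / n)
    return rmse / std

series = mackey_glass(500)
```

Dividing by the standard deviation makes the index scale-free, so a trivial predictor that always outputs the series mean scores 1.0 and the reported values of roughly 0.02-0.12 indicate substantial predictive power.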
The estimator of the number of parameters is given by (9), and the final configuration is shown in figure 1.

III.2.1 Node 19

In this subsection, SONN is allowed to generate up to the 19th accepted node. In this first version of the algorithm, all neurons have the same number of interconnections, and therefore draw their transfer functions from the same pool of functions. Generalizations of the algorithm can easily be made to accommodate multiple-input functions, with neuron transfer functions drawn from separate pools. In this example, the first one hundred points of the time series were used for training, and samples 101 through 400 were used for prediction testing. The total number of weights in the network is 27. The performance index averages 0.07. The output of the network is overlapped in figure 2 with the original time series.

For comparison purposes, a GDR network with the structure used in [Lapedes & Farber, 1987] was trained for 6,500 epochs. The training data consisted of the first 500 points of the time series, and the testing data ran from the 501st sample to the 832nd. The total number of weights is 165, and the final performance index equals 0.12. This was done to give both algorithms similar computational resources. Figure 3 shows the original time series overlapped with the GDR network output.

III.2.2 Node 37

In this subsection, the model chosen was formed by the 37th accepted node. The network was trained in a similar manner to the first example, since it is part of the same run. The final number of weights is 40, and the performance index is 0.018. Figure 4 shows the output of the network overlapped with the original time series. Figure 5 shows the GDR with 11,500 epochs. Notice that in both cases the GDR network demands 150 connections and 150 weights, as compared to 12 connections and 27 weights for the first example and 10 connections and 40 weights for the second example.
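For readers who want to reproduce the experimental setup, building lagged input-output pairs from the series can be sketched as below. The lag pattern x(t), x(t-6), x(t-12), x(t-18) -> x(t+6) follows the setup commonly attributed to Lapedes & Farber and is an assumption here, since the paper does not list its input lags:

```python
# Build (input, target) training pairs from a scalar time series using
# fixed lags; the lags and horizon below are illustrative assumptions.
def make_pairs(series, lags=(0, 6, 12, 18), horizon=6):
    pairs = []
    start = max(lags)
    for t in range(start, len(series) - horizon):
        inputs = tuple(series[t - l] for l in lags)
        pairs.append((inputs, series[t + horizon]))
    return pairs

data = list(range(100))      # stand-in for the first 100 chaotic samples
train = make_pairs(data)
```

With the paper's split, the same routine applied to samples 101-400 would yield the prediction-testing pairs.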
The comparison of the performance of the different models is listed in figure 6.

IV. CONCLUSION AND FUTURE WORK

In this study, we proposed a new approach to the identification problem based on a flexible, self-organizing neural network (SONN) structure. The variable structure provides the opportunity to search for and construct the optimal model based on input-output observations. The hierarchical version of the MDL, called the Structure Estimation Criterion, was used to guide the trade-off between the model complexity and the accuracy of the estimation. The SONN approach demonstrates potential usefulness as a tool for system identification through the example of modeling a chaotic time series.

REFERENCES

Gabor, D., et al., "A universal nonlinear filter, predictor and simulator which optimizes itself by a learning process," Proc. IEE, 108B, pp. 422-438, 1961.

Rissanen, J., "Modeling by shortest data description," Automatica, vol. 14, pp. 465-471, 1978.

Geman, S. and Geman, D., "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images," IEEE Trans. PAMI, PAMI-6, pp. 721-741, 1984.

Lapedes, A. and Farber, R., "Nonlinear signal processing using neural networks: Prediction and system modeling," Technical Report LA-UR-87-2662, Los Alamos National Laboratory, 1987.

Moody, J., this volume.

Rissanen, J., "Consistent order estimation of autoregressive processes by shortest description of data," Analysis and Optimization of Stochastic Systems, Jacobs et al., Eds., N.Y.: Academic, 1980.

Figure 1. The 37th State Generated
Figure 2. SONN 19th Model, P.I. = 0.06
Figure 3. GDR after 6,500 Epochs, P.I. = 0.12
Figure 4. SONN 37th Model, P.I. = 0.038
Figure 5. GDR after 11,500 Epochs, P.I. = 0.018
Figure 6. Performance Index Versus the Number of Predicted Points
", "award": [], "sourceid": 149, "authors": [{"given_name": "Manoel", "family_name": "Tenorio", "institution": null}, {"given_name": "Wei-Tsih", "family_name": "Lee", "institution": null}]}