{"title": "Learning Fuzzy Rule-Based Neural Networks for Control", "book": "Advances in Neural Information Processing Systems", "page_first": 350, "page_last": 357, "abstract": null, "full_text": "Learning Fuzzy Rule-Based Neural \n\nNetworks for Control \n\nCharles M. Higgins and Rodney M. Goodman \n\nDepartment of Electrical Engineering, 116-81 \n\nCalifornia Institute of Technology \n\nPasadena, CA 91125 \n\nAbstract \n\nA three-step method for function approximation with a fuzzy sys(cid:173)\ntem is proposed. First, the membership functions and an initial \nrule representation are learned; second, the rules are compressed \nas much as possible using information theory; and finally, a com(cid:173)\nputational network is constructed to compute the function value. \nThis system is applied to two control examples: learning the truck \nand trailer backer-upper control system, and learning a cruise con(cid:173)\ntrol system for a radio-controlled model car. \n\n1 \n\nIntroduction \n\nFunction approximation is the problem of estimating a function from a set of ex(cid:173)\namples of its independent variables and function value. If there is prior knowledge \nof the type of function being learned, a mathematical model of the function can be \nconstructed and the parameters perturbed until the best match is achieved. How(cid:173)\never, if there is no prior knowledge of the function, a model-free system such as a \nneural network or a fuzzy system may be employed to approximate an arbitrary \nnonlinear function. A neural network's inherent parallel computation is efficient \nfor speed; however, the information learned is expressed only in the weights of the \nnetwork. The advantage of fuzzy systems over neural networks is that the informa(cid:173)\ntion learned is expressed in terms of linguistic rules. In this paper, we propose a \nmethod for learning a complete fuzzy system to approximate example data. 
The membership functions and a minimal set of rules are constructed automatically from the example data, and in addition the final system is expressed as a computational (neural) network for efficient parallel computation of the function value, combining the advantages of neural networks and fuzzy systems. The proposed learning algorithm can be used to construct a fuzzy control system from examples of an existing control system's actions. \n\nHereafter, we will refer to the function value as the output variable, and the independent variables of the function as the input variables. \n\n2 Fuzzy Systems \n\nIn a fuzzy system, a function is expressed in terms of membership functions and rules. Each variable has membership functions which partition its range into overlapping classes (see figure 1). \n\nFigure 1: Membership function example \n\nGiven these membership functions for each variable, a function may be expressed by making rules from the input space to the output space and smoothly varying between them. \n\nIn order to simplify the learning of membership functions, we will specify a number of their properties beforehand. First, we will use piecewise linear membership functions. We will also specify that membership functions are fully overlapping; that is, at any given value of the variable the total membership sums to one. Given these two properties, we need only specify the positions of the peaks of the membership functions to completely describe them. \n\nWe define a fuzzy rule as if y then X, where y (the condition side) is a conjunction in which each clause specifies an input variable and one of the membership functions associated with it, and X (the conclusion side) specifies an output variable membership function. 
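The fully-overlapping, piecewise-linear membership functions described above can be sketched as follows. This is an illustrative sketch rather than code from the paper; the function name `memberships` is our own. Because the sets fully overlap, each variable's fuzzy sets are described entirely by their peak positions, and at most two sets are active at any input value.

```python
def memberships(x, peaks):
    """Return the membership degree of x in each triangular fuzzy set.

    peaks -- sorted list of peak positions; neighbouring sets overlap
    fully, so the degrees always sum to one.
    """
    degrees = [0.0] * len(peaks)
    if x <= peaks[0]:
        degrees[0] = 1.0          # saturate below the first peak
        return degrees
    if x >= peaks[-1]:
        degrees[-1] = 1.0         # saturate above the last peak
        return degrees
    for i in range(len(peaks) - 1):
        lo, hi = peaks[i], peaks[i + 1]
        if lo <= x <= hi:
            t = (x - lo) / (hi - lo)   # linear interpolation between peaks
            degrees[i] = 1.0 - t
            degrees[i + 1] = t
            return degrees
    return degrees
```

For example, with peaks at -1.0, 0.0, and 1.0 (as in figure 1), an input of 0.5 belongs half to the middle set and half to the rightmost set.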
\n\n3 Learning a Fuzzy System from Example Data \n\nThere are three steps in our method for constructing a fuzzy system: first, learn the membership functions and an initial rule representation; second, simplify (compress) the rules as much as possible using information theory; and finally, construct a computational network with the rules and membership functions to calculate the function value given the independent variables. \n\n3.1 Learning the Membership Functions \n\nBefore learning, two parameters must be specified: first, the maximum allowable RMS error of the approximation from the example data; second, the maximum number of membership functions for each variable. The system will not exceed this number of membership functions, but may use fewer if the error is reduced sufficiently before the maximum number is reached. \n\n3.1.1 Learning by Successive Approximation to the Target Function \n\nThe following procedure is performed to construct membership functions and a set of rules to approximate the given data set. All of the rules in this step are cell-based; that is, they have a condition for every input variable, and there is a rule for every combination of input variables (cell). \n\nWe begin with input membership functions at input extrema. The closest example point to each \"corner\" of the input space is found and a membership function for the output is added at its value at the corner point. The initial rule set contains a rule for each corner, specifying the closest output membership function to the actual value at that corner. \n\nWe now find the example point with the greatest RMS error from the current model and add membership functions in each variable at that point. Next, we construct a new set of rules to approximate the function. Constructing rules simply means determining the output membership function to associate with each cell. 
While constructing this rule set, we also add any output membership functions which are needed. The best rule for a given cell is found by finding the closest example point to the rule (recall that each rule specifies a point in the input space). If the output value at this point is \"too far\" from the closest output membership function value, this output value is added as a new output membership function. After this addition has been made, if necessary, the closest output membership function to the value at the closest point is used as the conclusion of the rule. At this point, if the error threshold has been reached or all membership functions are full, we exit. Otherwise, we go back to find the point with the greatest error from the model and iterate again. \n\n3.2 Simplifying the Rules \n\nIn order to have as simple a fuzzy system as possible, we would like to use the minimum possible number of rules. The initial cell-based rule set can be \"compressed\" into a minimal set of rules; we propose the use of an information-theoretic algorithm for induction of rules from a discrete data set [1] for this purpose. The key to the use of this method is the interpretation of each of the original rules as a discrete example. The rule set becomes a discrete data set which is input to a rule-learning algorithm. This algorithm learns the best rules to describe the data set. \n\nThere are two components of the rule-learning scheme. First, we need a way to tell which of two candidate rules is the better. Second, we need a way to search the space of all possible rules in order to find the best rules without simply checking every rule in the search space. \n\n3.2.1 Ranking Rules \n\nSmyth and Goodman [2] have developed an information-theoretic measure of rule value with respect to a given discrete data set. 
This measure is known as the j-measure; defining a rule as if y then X, the j-measure can be expressed as follows: \n\nj(X|y) = p(X|y) log2( p(X|y) / p(X) ) + p(~X|y) log2( p(~X|y) / p(~X) ) \n\nwhere ~X denotes the complement of X. [2] also suggests a modified rule measure, the J-measure: \n\nJ(X|y) = p(y) j(X|y) \n\nThis measure discounts rules which are not as useful in the data set in order to remove the effects of \"noise\" or randomness. The probabilities in both measures are computed from relative frequencies counted in the given discrete data set. \n\nUsing the j-measure, examples will be combined only when no error is caused in the prediction of the data set. The J-measure, on the other hand, will combine examples even if some prediction ability of the data is lost. If we simply use the j-measure to compress our original rule set, we don't get significant compression. However, we can tolerate only a certain margin of error in prediction of our original rule set and still maintain the same control performance. In order to obtain compression, we wish to allow some error, but not so much as the J-measure will create. We thus propose the following measure, which allows a gradual variation of the amount of noise tolerance: \n\nL(X|y) = f(p(y), a) j(X|y)   where   f(x, a) = (1 - e^(-ax)) / (1 - e^(-a)) \n\nThe parameter a may be set at 0+ to obtain the J-measure, since f(x, 0+) = x, or at infinity to obtain the j-measure, since f(x, infinity) = 1 for x > 0. Any value of a between 0 and infinity will result in an amount of compression between that of the J-measure and the j-measure; thus if we are able to tolerate some error in the prediction of the original rule set, we can obtain more compression than the j-measure could give us, but not as much as the J-measure would produce. We show an example of the variation of a for the truck backer-upper control system in section 4.1. 
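The relationship between the three measures can be sketched as follows. This is an illustrative sketch, not code from the paper; the function names `j_measure` and `L_measure` are ours, and the probabilities are assumed to be relative frequencies already counted from the discrete rule set.

```python
import math

def j_measure(p_x, p_x_given_y):
    """Smyth-Goodman j-measure for a rule 'if y then X'.

    p_x         -- prior probability p(X)
    p_x_given_y -- posterior probability p(X|y)
    """
    def term(post, prior):
        # Convention: 0 * log(0/q) = 0.
        return 0.0 if post == 0.0 else post * math.log2(post / prior)
    return term(p_x_given_y, p_x) + term(1.0 - p_x_given_y, 1.0 - p_x)

def L_measure(p_y, p_x, p_x_given_y, alpha):
    """L-measure L(X|y) = f(p(y), a) * j(X|y),
    with f(x, a) = (1 - e^(-a*x)) / (1 - e^(-a)).

    alpha near 0 recovers the J-measure weighting p(y);
    large alpha recovers the plain j-measure (weight 1).
    """
    f = (1.0 - math.exp(-alpha * p_y)) / (1.0 - math.exp(-alpha))
    return f * j_measure(p_x, p_x_given_y)
```

Sweeping `alpha` between the two extremes trades prediction error on the original rule set against the amount of compression obtained.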
\n\n3.2.2 Searching for the Best Rules \n\nIn [1], we presented an efficient method for searching the space of all possible rules to find the most representative ones for discrete data sets. The basic idea is that each example is a very specific (and quite perfect) rule. However, this rule is applicable to only one example. We wish to generalize this very specific rule to cover as many examples as possible, while at the same time keeping it as correct as possible. The goodness measures shown above are just the tool for doing this. If we calculate the \"goodness\" of all the rules generated by removing a single input variable from the very specific rule, then we will be able to tell if any of the slightly more general rules generated from this rule are better. If so, we take the best and continue in this manner until no more general rule with a higher \"goodness\" exists. When we have performed this procedure on the very specific rule generated from each example (and removed duplicates), we will have a set of rules which represents the data set. \n\nFigure 2: Computational network constructed from fuzzy system (layers, from input to output: input membership functions, rules, output membership functions, and defuzzification, with lateral inhibitory connections between rule outputs) \n\n3.3 Constructing a Network \n\nConstructing a computational network to represent a given fuzzy system can be accomplished as shown in figure 2. From input to output, layers represent input membership functions, rules, output membership functions, and finally defuzzification. A novel feature of our network is the lateral links shown in figure 2 between the outputs of various rules. These links allow inference with dependent rules. \n\n3.3.1 The Layers of the Network \n\nThe first layer contains a node for every input membership function used in the rule set. 
Each of these nodes responds with a value between zero and one to a certain region of the input variable range, implementing a single membership function. The second layer contains a node for each rule; each of these nodes represents a fuzzy AND, implemented as a product. The third layer contains a node for every output membership function. Each of these nodes sums the outputs from each rule that concludes that output fuzzy set. The final node simply takes the output memberships collected in the previous layer and performs a defuzzification to produce the final crisp output, normalizing the weights from each output node and performing a convex combination with the peaks of the output membership functions. \n\n3.3.2 The Problem with Dependent Rules and a Solution \n\nThere is a problem with the standard fuzzy inference techniques when used with dependent rules. Consider a rule whose conditions are all contained in a more specific rule (i.e., one with more conditions) which contradicts its conclusion. Using standard fuzzy techniques, the more general rule will drive the output to an intermediate value between the two conclusions. What we really want is that a more general rule dependent on a more specific rule should only be allowed to fire to the degree that the more specific rule is not firing. Thus the degree of firing of the more specific rule should gate the maximum firing allowed for the more general rule. This is expressed in network form in the links between the rule layer and the output membership function layer. The lateral arrows are inhibitory connections which take the value at their input, invert it (subtract it from one), and multiply it by the value at their output. 
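A minimal sketch of the rule and defuzzification layers with the lateral inhibition described above. This is illustrative only, not the paper's implementation: we assume the first-layer input memberships have already been combined into per-rule firing strengths, and the argument names are ours.

```python
def infer(rule_fires, rule_outputs, out_peaks, gates):
    """Forward pass through the rule, output membership, and
    defuzzification layers of the network.

    rule_fires   -- firing degree of each rule (product fuzzy AND of its
                    condition memberships)
    rule_outputs -- index of the output membership function each rule concludes
    out_peaks    -- peak position of each output membership function
    gates        -- (specific, general) rule-index pairs: the general rule
                    is laterally inhibited by the specific rule's firing
    """
    fires = list(rule_fires)
    # Lateral inhibition: a general rule may fire only to the degree
    # that the more specific rule it depends on is NOT firing.
    for specific, general in gates:
        fires[general] *= (1.0 - fires[specific])
    # Third layer: sum firing strengths per output membership function.
    weights = [0.0] * len(out_peaks)
    for fire, out in zip(fires, rule_outputs):
        weights[out] += fire
    # Defuzzification: normalize and take a convex combination of peaks.
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, out_peaks)) / total
```

With a fully-firing specific rule gating a contradictory general rule, the specific rule's conclusion wins outright, whereas without the gate the output would be driven to an intermediate value.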
\n\nFigure 3: The truck and trailer backer-upper problem (state variables: cab angle, truck angle, and y position of the truck rear, relative to the loading dock) \n\n4 Experimental Results \n\nIn this section, we show the results of two experiments: first, a truck backer-upper in simulation; and second, a simple cruise controller for a radio-controlled model car constructed in our laboratory. \n\n4.1 Truck and Trailer Backer-Upper \n\nJenkins and Yuhas [3] have developed by hand a very efficient neural network for solving the problem of backing up a truck and trailer to a loading dock. The truck and trailer backer-upper problem is parameterized in figure 3. \n\nThe function approximator system was trained on 225 example runs of the Yuhas controller, with initial positions distributed symmetrically about the field in which the truck operates. In order to show the effect of varying the number of membership functions, we fixed the maximum number of membership functions for the y position and cab angle at 5 and set the maximum allowable error to zero, thus guaranteeing that the system will fill out all of the allowed membership functions. We varied the maximum number of truck angle membership functions from 3 to 9. The effects of this are shown in figure 4. Note that the error decreases sharply and then holds constant, reaching its minimum at 5 membership functions. The Yuhas network performance is shown as a horizontal line. At its best, the fuzzy system performs slightly better than the system it is approximating. \n\nFor this experiment, we set a goal of 33% rule compression. We varied the parameter a in the L-measure for each rule set to get the desired compression. Note in figure 4 the performance of the system with compressed rules. 
The performance is in every case almost identical to that of the original rule sets. The number of rules and the amount of rule compression obtained can be seen in table 1. \n\nFigure 4: Results of experiments with the truck backer-upper. (a) Control error: final y position; (b) control error: final truck angle; both plotted against the number of truck angle membership functions, with the Yuhas system performance shown as a horizontal line. \n\nTable 1: Number of rules and compression figures for learned TBU systems \n\nNumber of truck angle membership functions:   3    4    5    6    7    8    9 \nCell-based rules:                            75  100  125  150  175  200  225 \nCompressed rules:                            48   67   86  100  114  138  154 \nCompression:                                36%  33%  31%  33%  35%  31%  32% \n\n4.2 Cruise Controller \n\nIn this section, we describe the learning of a cruise controller to keep a radio-controlled model car driving at a constant speed in a circle. We designed a simple PD controller to perform this task, and then learned a fuzzy system to perform the same task. This example is not intended to suggest that a fuzzy system should replace a simple PD controller, since the fuzzy system may represent far more complex functions, but rather to show that the fuzzy system can learn from real control data and operate in real time. \n\nThe fuzzy system was trained on 6 runs of the PD controller, which included runs going forward and backward, and conditions in which the car's speed was perturbed momentarily by blocking the car or pushing it. 
Figure 5 shows the error trajectory of both the hand-crafted PD and learned fuzzy control systems from rest. The car builds speed until it reaches the desired set point with a well-damped response, then holds speed for a while. At a later time, an obstacle was placed in the path of the car to stop it and then removed; figure 5 shows the similar recovery responses of both systems. It can be seen from the numerical results in table 2 that the fuzzy system performs as well as the original PD controller. \n\nNo compression was attempted because the rule sets are already very small. \n\nTable 2: Analysis of cruise control performance \n\n                                        PD Controller   Learned Fuzzy System \nTime from 90% error to 10% error (s)         0.9                0.7 \nRMS error at steady state (uncal.)            59                 45 \nTime to correct after obstacle (s)           6.2                6.2 \n\nFigure 5: Error trajectories of the PD and learned fuzzy cruise control systems