{"title": "An Adaptive Network That Learns Sequences of Transitions", "book": "Advances in Neural Information Processing Systems", "page_first": 653, "page_last": 660, "abstract": null, "full_text": "653 \n\nAN ADAPTIVE NETWORK THAT LEARNS \n\nSEQUENCES OF TRANSITIONS \n\nC. L. Winter \n\nScience Applications International Corporation \n\n5151 East Broadway, Suite 900 \n\nTucson, Auizona 85711 \n\nABSTRACT \n\nWe describe an adaptive network, TIN2, that learns the transition \nfunction of a sequential system from observations of its behavior. It \nintegrates two subnets, TIN-I (Winter, Ryan and Turner, 1987) and \nTIN-2. TIN-2 constructs state representations from examples of \nsystem behavior, and its dynamics are the main topics of the paper. \nTIN-I abstracts transition functions from noisy state representations \nand environmental data during training, while in operation it produces \nsequences of transitions in response to variations in input. Dynamics \nof both nets are based on the Adaptive Resonance Theory of Carpenter \nand Grossberg (1987). We give results from an experiment in which \nTIN2 learned the behavior of a system that recognizes strings with an \neven number of l's . \n\nINTRODUCTION \n\nSequential systems respond to variations in their input environment with sequences of \nactivities. They can be described in two ways. A black box description characterizes a \nsystem as an input-output function, m = B(u), mapping a string of input symbols, ll, \ninto a single output symbol, m. A sequential automaton description characterizes a \nsystem as a sextuple (U, M, S, SO, f, g) where U and M are alphabets of input and output \nsymbols, S is a set of states, sO is an initial state and f and g are transition and output \nfunctions respectively. The transition function specifies the current state, St, as a \nfunction of the last state and the current input, Ut, \n\n(1) \n\nIn this paper we do not discuss output functions because they are relatively simple. 
To further simplify the discussion, we restrict ourselves to binary input alphabets, although the neural net we describe here can easily be extended to accommodate more complex alphabets. \n\nA common engineering problem is to identify and then simulate the functionality of a system from observations of its behavior. Simulation is straightforward when we can actually observe the internal states of a system, since then the function f can be specified by learning simple associations among internal states and external inputs. In robotic systems, for instance, internal states can often be characterized by such parameters as stepper motor settings, strain gauge values, etc., and so are directly accessible. Artificial neural systems have been found useful in such simulations because they can associate large, possibly noisy state space and input variables with state and output variables (Tolat and Widrow, 1988; Winter, Ryan and Turner, 1987). \n\nUnfortunately, in many interesting cases we must base simulations on a limited set of examples of a system's black box behavior because its internal workings are unobservable. The black box description is not, by itself, of much use as a simulation tool since usually it cannot be specified without resorting to infinitely large input-output tables. As an alternative we can try to develop a sequential automaton description of the system by observing regularities in its black box behavior. Artificial neural systems can contribute to the development of physical machines dedicated to system identification because i) frequently state representations must be derived from many noisy input variables, ii) data must usually be processed in continuous time and iii) the explicit dynamics of artificial neural systems can be used as a framework for hardware implementations. 
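To make the automaton description concrete, the sketch below (illustrative names, not from the paper) implements a sequential system as the sextuple (U, M, S, s0, f, g) and derives its black box description B(u) by iterating the transition function. The machine chosen is the even-1's/even-0's recognizer studied in the experiment section:

```python
# Minimal sketch of a sequential automaton (U, M, S, s0, f, g).
# The recognizer accepts strings with even numbers of both 1's and 0's,
# the system used in the paper's experiment; names are illustrative.

def make_parity_recognizer():
    # States track (parity of 0's, parity of 1's); initial state s0 = (0, 0).
    s0 = (0, 0)

    def f(s, u):                  # transition function: s_t = f(s_{t-1}, u_t)
        z, o = s
        return (z ^ (u == 0), o ^ (u == 1))

    def g(s):                     # output function: 1 iff both parities are even
        return 1 if s == (0, 0) else 0

    return s0, f, g

def B(u):
    """Black-box description: map an input string to a single output symbol."""
    s, f, g = make_parity_recognizer()
    for symbol in u:
        s = f(s, symbol)
    return g(s)
```

For example, B([1, 0]) is 0 (one 1 and one 0, both odd), while B([1, 1, 0, 0]) is 1.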
\n\nIn this paper we give a brief overview of a neural net, TIN2, which learns and processes state transitions from observations of correct black box behavior when the set of observations is large enough to characterize the black box as an automaton. The TIN2 net is based on two component networks. Each uses a modified adaptive resonance circuit (Carpenter and Grossberg, 1987) to associate heterogeneous input patterns. TIN-1 (Winter, Ryan and Turner, 1987) learns and executes transitions when given state representations. It has been used by itself to simulate systems for which explicit state representations are available (Winter, 1988a). TIN-2 is a highly parallel, continuous time implementation of an approach to state representation first outlined by Nerode (1958). \n\nNerode's approach to system simulation relies upon the fact that every string, u, moves a machine into a particular state, s(u), once it has been processed. The state s(u) can be characterized by putting the system initially into s(u) (by processing u) and then presenting a set of experimental strings, (w1, ..., wn), for further processing. Experiments consist of observing the output mi = B(u·wi), where · indicates concatenation. A state can then be represented by the entries in a row of a state characterization table, C (Table 1). The rows of C are indexed by strings, u, its columns are indexed by experiments, wi, and its entries are mi. In Table 1 annotations in parentheses indicate the nodes (artificial neurons) and subnetworks of TIN-2 equivalent to the corresponding C table entry. During experimentation C expands as states are distinguished from one another. The orchestration of experiments, their selection, and the \n\nTABLE 1. C Table Constructed by TIN-2 
\n\nu (rows) / wi (columns):   λ   |   0 (Assembly 1)   |   1 (Assembly 2) \nλ:    1 (Node 7)   0 (Node 2)   0 (Node 5) \n1:    0 (Node 6)   0 (Node 9)   1 (Node 1) \n0:    0 (Node 1)   1 (Node 6)   0 (Node 4) \n10:   0 (Node 3)   0 (Node 2)   0 (Node 0) \n\nrole of teachers and of the environment have been investigated by Arbib and Zeiger (1969), Arbib and Manes (1974), Gold (1972 and 1978) and Angluin (1987), to name a few. TIN-2 provides an architecture in which C can be embedded and expanded as necessary. Collections of nodes within TIN-2 learn to associate triples (mi, u, wi) so that inputting u later results in the output of the representation (m1, ..., mn) of the state associated with u. \n\nTIN-2 \n\nTIN-2 is composed of separate assemblies of nodes whose dynamics are such that each assembly comes to correspond to a column in the state characterization table C. Thus we call them column-assemblies. Competition among column-assemblies guarantees that nodes of only one assembly, say the ith, learn to respond to experimental pattern wi. Hence column-assemblies can be labelled w1, w2 and so on, but since labelings are not assigned ahead of time, arbitrarily large sets of experiments can be learned. \n\nThe theory of adaptive resonance is implemented in TIN-2 column-assemblies through partitioned adaptive resonance circuits (cf. Ryan, Winter and Turner, 1987). Adaptive resonance circuits (Carpenter and Grossberg, 1987; Ryan and Winter, 1987) are composed of four collections of nodes: Input, Comparison (F1), Recognition (F2) and Reset. In TIN-2 the Input, Comparison and Reset collections are split into disjoint m, u and w partitions. The net runs in either training or operational mode, and can move from one to the other as required. The training dynamics of the circuit are such that an F2 node is stimulated by the overall triple (m, u, w), but can be inhibited by a mismatch with any component. During operation input of u recalls the state representation s(u) = (m1, ...
, mn). \n\nNode activity for the ith node of the kth F1 partition, F1,k, k = m, u, w, is governed by \n\nτ dxi,k/dt = -xi,k + Ii,k + Σj∈F2 Tji f(yj)   (2) \n\nHere τ < 1 scales time, Ii,k is the value of the ith input node of partition k, xi,k is activity in the corresponding node of F1 and f is a sigmoid function with range [0, 1]. The elements of I are either 1, -1 or 0. The dynamics of the TIN-2 circuit are such that 0 indicates the absence of a symbol, while 1 and -1 represent the elements of the binary alphabet. The adaptive feedback filter, T, is a matrix (Tji) whose elements, after training, are also 1, -1 or 0. \n\nActivity, yj, in the jth F2 node is driven by \n\nτ dyj/dt = -yj + [Σu∈F1,u Buj h(xu) + Σw∈F1,w Bwj h(xw) + Σm∈F1,m Bmj h(xm)] - 4[Σl≠j f(yl) + Ruj + Rw]   (3) \n\nThe feedforward filter B is composed of matrices (Buj), (Bmj) and (Bwj) whose elements are normalized to the size of the patterns memorized. Note that (Bwj) is the same for every node in a given column-assembly, i.e. the rows of (Bwj) are all the same. Hence all nodes within a column-assembly learn to respond to the same experimental pattern, w, and it is in this sense that an assembly evolves to become equivalent to a column in table C. During training the sum Σl≠j f(yl) in (3) runs through the recognition nodes of all TIN-2 column-assemblies. Thus, during training only one F2 node, say the Jth, can be active at a time across all assemblies. In operation, on the other hand, we remove inhibition due to nodes in other assemblies so that at any time one node in each column-assembly can be active, and an entire state representation can be recalled. \n\nThe Reset terms Ruj and Rw in (3) actively inhibit nodes of F2 when mismatches between memory and input occur. Ruj is specific to the jth F2 node: \n\ndRuj/dt = -Ruj + f(yj) f(v ||Iu|| - ||TJ ∩ Iu||)   (4) \n\nRw affects all F2 nodes in a column-assembly and is driven by \n\ndRw/dt = -Rw + [Σj∈F2 f(yj)] f(v ||Iw|| - ||TJ ∩ Iw||)   (5) 
\n\nv < 1 is a vigilance parameter (Carpenter and Grossberg, 1987): for either (4) or (5), R > 0 at equilibrium just when the intersection between memory and input, P_I = T ∩ I, is relatively small, i.e. R > 0 when v ||I|| > ||P_I||. When the system is in operation, we fix Rw = 0 and input the pattern Iw = 0. To recall the row in table C indexed by u, we input u to all column-assemblies, and at equilibrium xi,m = Σj∈F2 Tji f(yj). Thus xi,m represents the memory of the element of C in the row corresponding to u and the column with the same label as the column-assembly. Winter (1988b) discusses recall dynamics in more detail. \n\nAt equilibrium in either training or operational mode only the winning F2 node has yJ ≠ 0, so Σj Tji f(yj) = TJi in (2). Hence xi,k = 0 if TJi = -Ii,k, i.e. if memory and input mismatch; |xi,k| = 2 if TJi = Ii,k, i.e. when memory and input match; and |xi,k| = 1 if TJi = 0, Ii,k ≠ 0 or if TJi ≠ 0, Ii,k = 0. The F1 output function h in (3) is defined so that h(x) = 1 if x > 1, h(x) = -1 if x < -1 and h(x) = 0 if -1 ≤ x ≤ 1. The output pattern H = (h(x1), ..., h(xn)) reflects TJ ∩ Ik, as h(xi) ≠ 0 only if TJi = Ii,k. \n\nThe adaptive filters (Buj) and (Bmj) store normalized versions of those patterns on F1,u and F1,m which have stimulated the jth F2 node. The evolution of Bij for i ∈ F1,u or F1,m is driven by \n\ndBij/dt = f(yj) [h(xi)/||H|| - Bij]   (6) \n\nOn the other hand, (Bwj) stores a normalized version of the experiment w which labels the entire column-assembly. Thus all nodes in a column-assembly share a common memory of w: \n\ndBij/dt = [Σl∈F2 f(yl)] [h(xi)/||H|| - Bij]   (7) \n\nwhere i ∈ F1,w. \n\nThe feedback filters (Tuj), (Tmj) and (Twj) store exact memories of patterns on the partitions of F1: \n\ndTji/dt = f(yj) [h(xi) - Tji]   (8) \n\nfor i ∈ F1,u, F1,m, and \n\ndTji/dt = [Σl∈F2 f(yl)] [h(xi) - Tji]   (9) \n\nfor i ∈ F1,w. In operation long-term memory modification is suspended. 
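The equilibrium match rule from (2) and the vigilance reset test from (4)-(5) can be illustrated with a toy numerical sketch (hypothetical helper names and made-up patterns; only the match/reset arithmetic follows the text):

```python
# Toy sketch of the F1 match rule and vigilance reset described above.
# Patterns use the paper's alphabet {1, -1}, with 0 marking "no symbol".

def h(x):
    # F1 output function: nonzero only where memory and input agree.
    return 1 if x > 1 else (-1 if x < -1 else 0)

def f1_equilibrium(I, T_J):
    # At equilibrium with winning node J fully active, x_i = I_i + T_Ji.
    return [i + t for i, t in zip(I, T_J)]

def reset_fires(I, T_J, v=0.9):
    # Reset fires when v * ||I|| exceeds ||T ∩ I|| (memory/input intersection).
    norm_I = sum(1 for i in I if i != 0)
    norm_match = sum(1 for i, t in zip(I, T_J) if i != 0 and i == t)
    return v * norm_I > norm_match

I   = [1, -1, 1, 0]
T_J = [1, -1, -1, 0]            # memory mismatches input at position 2
x   = f1_equilibrium(I, T_J)    # [2, -2, 0, 0]
out = [h(xi) for xi in x]       # [1, -1, 0, 0]: survives only where T_Ji == I_i
```

Here the single mismatched position drives h to 0, and with vigilance v = 0.9 the reset condition 0.9 * 3 > 2 holds, so this memory would be inhibited.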
\n\nEXPERIMENT \n\nHere we report partial results from an experiment in which TIN-2 learns a state characterization table for an automaton that recognizes strings containing even numbers of both 1's and 0's. More details can be found in Winter (1988b). For notational convenience in this section we will discuss patterns as if they were composed of 1's and 0's, but be aware that inside TIN-2 every 0 symbol is really a -1. Data is provided in the form of triples (m, u, w) by a teacher; the data set for this example is given in Table 1. Data were presented to the net in the order shown. The net consisted of three column-assemblies. Each F2 collection contained ten nodes. Although the strings that can be processed by an automaton of this type are in principle arbitrarily long, in practice some limitation on the length of training strings is necessary, if for no other reason than that the memory capacity of a computer is finite. For this simple example the Input and F1 partitions contain eight nodes, but in order to have a special symbol to represent λ, strings are limited to at most six elements. With this restriction the λ symbol can be distinguished from actual input strings through vigilance criteria. Other solutions to the problem of representing λ are being investigated, but for now the special eight-bit symbol, 00000011, is used to represent λ in training strings. \n\nThe net was trained using fast-learning (Carpenter and Grossberg, 1987): a triple in Table 1 was presented to the net, and all nodes were allowed to come to their equilibrium values, where they were held for about three long-term time units before the next triple was presented. Consider the processing that follows presentation of (0, 1, 0), the first datum in Table 1. The net can obtain equivalents to two C table entries from (0, 1, 0): the entry in row u = 10, column w = λ and the entry in row u = 1, column w = 0. 
The string 10 and the membership value 0 were displayed on the λ assembly's input slabs, and in this case the 3rd F2 node learned the association between the two patterns. When the pattern (0, 1, 0) was input to other column-assemblies, one F2 node (in this case the 9th in column-assembly 1) learned to associate the elements of the triple. Of course a side effect of this was that column-assembly 1 was labelled by w = 0 thereafter. When (1, 1, 1) was input next, node 9 in column-assembly 1 tried to respond to the new triple, all nodes in column-assembly 1 were then inhibited by a mismatch on w, and finally node 1 in column-assembly 2 learned (1, 1, 1). From that point on column-assembly 2 was labelled by 1. \n\nLEARNING TRANSITIONS \n\nThe TIN-1 net (Winter, Ryan and Turner, 1987) is composed of i) a partitioned adaptive resonance circuit with dynamics similar to (2) - (9) for learning state transitions and ii) a Control Circuit which forces transitions once they have been learned. Transitions are unique in the sense that a previous state and current input completely determine the current state. The partitioned adaptive resonance circuit has three input fields: one for the previous state, one for the current input and one for the next state. TIN-1's F2 nodes learn transitions by associating patterns in the three input fields. Once trained, TIN-1 processes strings sequentially, bit-by-bit. \n\nFigure 1. Training TIN2. \n\nThe architecture of TIN2, the net that integrates TIN-2 and TIN-1, is shown in Figure 1. The system resorts to the TIN-2 nets only to learn transitions. If TIN-2 has learned a C table in which examples of all transitions appear, TIN-1 can easily learn the automaton's state transitions. 
A C table contains an example of a transition from state si to state sj forced by current input u if it contains i) a row labelled by a string ui which leaves the automaton in si after processing and ii) a row labelled by the string ui·u which leaves the automaton in sj. To teach TIN-1 the transition we simply present ui to the lower TIN-2 in Figure 1, ui·u to the upper TIN-2 net and u to TIN-1. \n\nCONCLUSIONS \n\nWe have described a network, TIN-2, which learns the equivalent of state characterization tables (Gold, 1972). The principal reasons for developing a neural net implementation are i) neural nets are intrinsically massively parallel and so provide a nice model for systems that must process large data sets, ii) although in the interests of brevity we have not stressed the point, neural nets are robust against noisy data, iii) neural nets like the partitioned adaptive resonance circuit have continuous time activity dynamics and so can be synchronized with other elements of a larger real-time system through simple scaling parameters, and iv) the continuous time dynamics and precise architectural specifications of neural nets provide a blueprint for hardware implementations. \n\nWe have also sketched a neural net, TIN2, that learns state transitions by integrating TIN-2 nets with the TIN-1 net (Winter, Ryan and Turner, 1987). When a complete state characterization table is available from TIN-2, TIN2 can be taught transitions from examples of system behavior. However, the ultimate goal of a net like this lies in developing a system that \"operates acceptably\" with a partial state characterization table. To operate acceptably TIN2 must perform transitions correctly when it can, recognize when it cannot, signal for new data when it is required and expand the state characterization table when it must. 
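The construction just described, reading state transitions off a completed C table, can be sketched as follows (a hypothetical helper following Nerode's construction; states are identified with distinct table rows, and the row values are those of Table 1):

```python
# Sketch: extract transitions from a state characterization table C,
# as described above. C maps a string u to its row (m_1, ..., m_n) of
# experiment outcomes; strings with the same row are the same state.

def transitions_from_C(C, alphabet=("0", "1")):
    f = {}
    for u, row in C.items():
        for a in alphabet:
            ua = u + a
            if ua in C:          # rows u and u·a witness f(s(u), a) = s(u·a)
                f[(row, a)] = C[ua]
    return f

# Rows of Table 1 for the even-1's/even-0's recognizer (columns λ, 0, 1).
C = {"": (1, 0, 0), "1": (0, 0, 1), "0": (0, 1, 0), "10": (0, 0, 0)}
f = transitions_from_C(C)
```

For instance, f[((1, 0, 0), "1")] is (0, 0, 1): input 1 moves the start state to s("1"). Rows not covered by the table (e.g. s("0") under input 1) yield no transition, which is exactly the partial-table situation discussed in the conclusions.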
Happily, TIN2 already provides the first two capabilities, and combinations of TIN2 with rule-based controllers and with auxiliary control networks are currently being explored as approaches to satisfying the latter (Winter, 1988b). \n\nNets like TIN2 may eventually prove useful as control elements in physical machines because sequential automata can respond to unpredictable environments with a wide range of behavior. Even very simple automata can repeat activities and make decisions based upon environmental variations. Currently, most physical machines that make decisions are dedicated to a single task; applying one to a new task requires re-programming by a skilled technician. A programmer must, furthermore, determine a priori precisely which machine state - environment associations are significant enough to warrant insertion in the control structure of a given machine. TIN2, on the other hand, is trained, not programmed, and can abstract significant associations from noisy input. It is a \"blank slate\" that learns the structure of a particular sequential machine from examples. \n\nReferences \n\nD. Angluin, \"Learning Regular Sets from Queries and Counterexamples\", Information and Computation, 75 (2), 1987. \nM. A. Arbib and E. G. Manes, \"Machines in a Category: an Expository Introduction\", SIAM Review, 16 (2), 1974. \nM. A. Arbib and H. P. Zeiger, \"On the Relevance of Abstract Algebra to Control Theory\", Automatica, 5, 1969. \nG. Carpenter and S. Grossberg, \"A Massively Parallel Architecture for a Self-Organizing Neural Pattern Recognition Machine\", Comput. Vision Graphics Image Process., 37, 1987. \nE. M. Gold, \"System Identification Via State Characterization\", Automatica, 8, 1972. \nE. M. Gold, \"Complexity of Automaton Identification from Given Data\", Info. and Control, 37, 1978. \nA. Nerode, \"Linear Automaton Transformations\", Proc. Am. Math. Soc., 9, 1958. \nT. W. Ryan and C. L. 
Winter, \"Variations on Adaptive Resonance\", in Proc. 1st Intl. Conf. on Neural Networks, IEEE, 1987. \nT. W. Ryan, C. L. Winter and C. J. Turner, \"Dynamic Control of an Artificial Neural System: the Property Inheritance Network\", Appl. Optics, 26 (23), 1987. \nV. V. Tolat and B. Widrow, \"An Adaptive Neural Net Controller with Visual Inputs\", Neural Networks, 1, Suppl. 1, 1988. \nC. L. Winter, T. W. Ryan and C. J. Turner, \"TIN: A Trainable Inference Network\", in Proc. 1st Intl. Conf. on Neural Networks, IEEE, 1987. \nC. L. Winter, \"An Adaptive Network that Flees Pursuit\", Neural Networks, 1, Suppl. 1, 1988a. \nC. L. Winter, \"TIN2: An Adaptive Controller\", SAIC Tech. Rpt., SAIC, 5151 E. Broadway, Tucson, AZ, 85711, 1988b. \n", "award": [], "sourceid": 103, "authors": [{"given_name": "C.", "family_name": "Winter", "institution": null}]}