{"title": "Finite State Automata that Recurrent Cascade-Correlation Cannot Represent", "book": "Advances in Neural Information Processing Systems", "page_first": 612, "page_last": 618, "abstract": null, "full_text": "Finite State Automata that Recurrent Cascade-Correlation Cannot Represent \n\nStefan C. Kremer \nDepartment of Computing Science \nUniversity of Alberta \nEdmonton, Alberta, CANADA T6H 5B5 \n\nAbstract \n\nThis paper relates the computational power of Fahlman's Recurrent Cascade-Correlation (RCC) architecture to that of finite state automata (FSA). While some recurrent networks are FSA equivalent, RCC is not. The paper presents a theoretical analysis of the RCC architecture in the form of a proof describing a large class of FSA which cannot be realized by RCC. \n\n1 INTRODUCTION \n\nRecurrent networks can be considered to be defined by two components: a network architecture and a learning rule. The former describes how a network with a given set of weights and topology computes its output values, while the latter describes how the weights (and possibly topology) of the network are updated to fit a specific problem. It is possible to evaluate the computational power of a network architecture by analyzing the types of computations a network could perform assuming appropriate connection weights (and topology). This type of analysis provides an upper bound on what a network can be expected to learn, since no system can learn what it cannot represent. \n\nMany recurrent network architectures have been proven to be finite state automaton or even Turing machine equivalent (see for example [Alon, 1991], [Goudreau, 1994], [Kremer, 1995], and [Siegelmann, 1992]). The existence of such equivalence proofs naturally gives confidence in the use of the given architectures. \n\nThis paper relates the computational power of Fahlman's Recurrent Cascade-Correlation architecture [Fahlman, 1991] to that of finite state automata. 
It is organized as follows: Section 2 reviews the RCC architecture as proposed by Fahlman. Section 3 describes finite state automata in general and presents some specific automata which will play an important role in the discussions which follow. Section 4 describes previous work by other authors evaluating RCC's computational power. Section 5 expands upon the previous work and presents a new class of automata which cannot be represented by RCC. Section 6 further expands the result of the previous section to identify an infinite number of other unrealizable classes of automata. Section 7 contains some concluding remarks. \n\n2 THE RCC ARCHITECTURE \n\nThe RCC architecture consists of three types of units: input units, hidden units and output units. After training, a RCC network performs the following computation: First, the activation values of the hidden units are initialized to zero. Second, the input unit activation values are initialized based upon the input signal to the network. Third, each hidden unit computes its new activation value. Fourth, the output units compute their new activations. Then, steps two through four are repeated for each new input signal. \n\nThe third step of the computation, computing the activation value of a hidden unit, is accomplished according to the formula: \n\na_j(t+1) = σ( Σ_{i=1}^{j-1} w_ij a_i(t+1) + w_jj a_j(t) ). \n\nHere, a_i(t) represents the activation value of unit i at time t, σ(·) represents a sigmoid squashing function with finite range (usually from 0 to 1), and w_ij represents the weight of the connection from unit i to unit j. That is, each unit computes its activation value by multiplying the new activations of all lower numbered units and its own previous activation by a set of weights, summing these products, and passing the sum through a logistic activation function. 
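The cascaded hidden-unit update described above can be sketched as a minimal Python simulation. The function name `rcc_step` and the weight layout are illustrative assumptions for this sketch, not part of Fahlman's implementation:

```python
import math

def rcc_step(inputs, hidden, weights):
    """One time step of the cascaded hidden-unit update (illustrative sketch).

    inputs  : input-unit activations at time t+1
    hidden  : previous hidden activations a_j(t)
    weights : weights[j] = (incoming, w_self), where `incoming` holds one
              weight per input unit and per lower numbered hidden unit,
              and w_self is the recurrent weight w_jj.
    """
    new_hidden = []
    for j, (incoming, w_self) in enumerate(weights):
        # Unit j sees the *new* activations of the inputs and of all
        # lower numbered hidden units, plus its own old activation.
        sources = list(inputs) + new_hidden
        net = sum(w * a for w, a in zip(incoming, sources)) + w_self * hidden[j]
        new_hidden.append(1.0 / (1.0 + math.exp(-net)))  # logistic squashing
    return new_hidden
```

Repeating `rcc_step` over an input sequence reproduces steps two through four of the computation above.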
The recurrent weight w_jj from a unit to itself functions as a sort of memory by transmitting a modulated version of the unit's old activation value. \n\nThe output units of the RCC architecture can be viewed as special cases of hidden units which have weights of value zero for all connections originating from other output units. This interpretation implies that any restrictions on the computational powers of general hidden units will also apply to the output units. For this reason, we shall concern ourselves exclusively with hidden units in the discussions which follow. \n\nFinally, it should be noted that since this paper is about the representational power of the RCC architecture, its associated learning rule will not be discussed here. The reader wishing to know more about the learning rule, or requiring a more detailed description of the operation of the RCC architecture, is referred to [Fahlman, 1991]. \n\n3 FINITE STATE AUTOMATA \n\nA Finite State Automaton (FSA) [Hopcroft, 1979] is a formal computing machine defined by a 5-tuple M = (Q, Σ, δ, q0, F), where Q represents a finite set of states, Σ a finite input alphabet, δ a state transition function mapping Q×Σ to Q, q0 ∈ Q the initial state, and F ⊆ Q a set of final or accepting states. FSA accept or reject strings of input symbols according to the following computation: First, the FSA's current state is initialized to q0. Second, the next input symbol of the string, selected from Σ, is presented to the automaton by the outside world. Third, the transition function, δ, is used to compute the FSA's new state based upon the input symbol and the FSA's previous state. Fourth, the acceptability of the string is computed by comparing the current FSA state to the set of valid final states, F. If the current state is a member of F then the automaton is said to accept the string of input symbols presented so far. 
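The state-update and acceptance test just described can be sketched directly in Python. The helper name `accepts` and the toy parity machine are illustrative (the machine is not one of the paper's figures), and δ is assumed total:

```python
def accepts(delta, q0, final_states, symbols):
    """Simulate the FSA M = (Q, Sigma, delta, q0, F) on a string and
    report whether the state reached lies in the accepting set F.

    delta: dict mapping (state, symbol) -> next state (assumed total)."""
    state = q0
    for s in symbols:
        state = delta[(state, s)]
    return state in final_states

# A toy machine: accepts exactly the strings over {0, 1} containing an
# odd number of 1s.
parity_delta = {('q0', '0'): 'q0', ('q0', '1'): 'q1',
                ('q1', '0'): 'q1', ('q1', '1'): 'q0'}
```

For example, `accepts(parity_delta, 'q0', {'q1'}, "0100")` holds, since the single 1 drives the machine into the accepting state q1.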
Steps two through four are repeated for each input symbol presented by the outside world. Note that the steps of this computation mirror the steps of an RCC network's computation as described above. \n\nIt is often useful to describe specific automata by means of a transition diagram [Hopcroft, 1979]. Figure 1 depicts the transition diagrams of five FSA. In each case, the states, Q, are depicted by circles, while the transitions defined by δ are represented as arrows from the old state to the new state labelled with the appropriate input symbol. The arrow labelled \"Start\" indicates the initial state, q0; and final accepting states are indicated by double circles. \n\nWe now define some terms describing particular FSA which we will require for the following proof. The first concerns input signals which oscillate. Intuitively, the input signal to a FSA oscillates if every p-th symbol is repeated, for p > 1. More formally, a sequence of input symbols, s(t), s(t+1), s(t+2), ..., oscillates with a period of p if and only if p is the minimum value such that: ∀t s(t) = s(t+p). \n\nOur second definition concerns oscillations of a FSA's internal state when the machine is presented a certain sequence of input signals. Intuitively, a FSA's internal state can oscillate in response to a given input sequence if there is some starting state for which every subsequent ω-th state is repeated. Formally, a FSA's state can oscillate with a period of ω in response to a sequence of input symbols, s(t), s(t+1), s(t+2), ..., if and only if ω is the minimum value for which: \n\n∃q0 s.t. ∀t δ(q0, s(t)) = δ( ... δ( δ( δ(q0, s(t)), s(t+1) ), s(t+2) ), ..., s(t+ω) ) \n\nThe recursive nature of this formulation is based on the fact that a FSA's state depends on its previous state, which in turn depends on the state before, etc. \n\nWe can now apply these two definitions to the FSA displayed in Figure 1. 
The automaton labelled \"a)\" has a state which oscillates with a period of ω = 2 in response to any sequence consisting of 0s and 1s (e.g. \"00000...\", \"11111...\", \"010101...\", etc.). Thus, we can say that it has a state cycle of period ω = 2 (i.e. q0 q1 q0 q1 ...) when its input cycles with a period of p = 1 (i.e. \"0000...\"). Similarly, when automaton b)'s input cycles with period p = 1 (i.e. \"000000...\"), its state will cycle with period ω = 3 (i.e. q0 q1 q2 q0 q1 q2 ...). \n\nFor automaton c), things are somewhat more complicated. When the input is the sequence \"0000...\", the state sequence will either be q0 q0 q0 q0 ... or q1 q1 q1 q1 ... depending on the initial state. On the other hand, when the input is the sequence \"1111...\", the state sequence will alternate between q0 and q1. Thus, we say that automaton c) has a state cycle of ω = 2 when its input cycles with period p = 1. But this automaton can also have larger state cycles. For example, when the input oscillates with a period p = 2 (i.e. \"01010101...\"), then the state of the automaton will oscillate with a period ω = 4 (i.e. q0 q0 q1 q1 q0 q0 q1 q1 ...). Thus, we can also say that automaton c) has a state cycle of ω = 4 when its input cycles with period p = 2. \n\nThe remaining automata also have state cycles for various input cycles, but will not be discussed in detail. The importance of the relationship between the input period (p) and the state period (ω) will become clear shortly. \n\n4 PREVIOUS RESULTS CONCERNING THE COMPUTATIONAL POWER OF RCC \n\nThe first investigation into the computational powers of RCC was performed by Giles et al. [Giles, 1995]. These authors proved that the RCC architecture, regardless of connection weights and number of hidden units, is incapable of representing any FSA which \"for the same input has an output period greater than 2\" (p. 7). 
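The input/state period relationships above can be checked mechanically. In the following sketch, `delta_c` is read off the behaviour described for automaton c) (input 0 leaves the state unchanged, input 1 swaps q0 and q1), while `state_period` is an illustrative finite-horizon period detector, not a formula from the paper:

```python
from itertools import cycle, islice

def state_period(delta, q0, input_cycle, horizon=100):
    """Smallest w with state(t) == state(t+w) over a finite horizon when
    the input repeats `input_cycle` and the machine starts in q0."""
    states = [q0]
    for s in islice(cycle(input_cycle), horizon):
        states.append(delta[(states[-1], s)])
    for w in range(1, horizon // 2):
        if all(states[t] == states[t + w] for t in range(horizon // 2)):
            return w
    return None

# Automaton c): input 0 preserves the state, input 1 exchanges q0 and q1.
delta_c = {('q0', '0'): 'q0', ('q1', '0'): 'q1',
           ('q0', '1'): 'q1', ('q1', '1'): 'q0'}
```

Here `state_period(delta_c, 'q0', "01")` returns 4, matching the ω = 4 state cycle under input period p = 2 described above.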
Using our oscillation definitions above, we can re-express this result as: if a FSA's input oscillates with a period of p = 1 (i.e. input is constant), then its state can oscillate with a period of at most ω = 2. As already noted, Figure 1b) represents a FSA whose state oscillates with a period of ω = 3 in response to an input which oscillates with a period of p = 1. Thus, Giles et al.'s theorem proves that the automaton in Figure 1b) cannot be implemented (and hence learned) by a RCC network. \n\nFigure 1: Finite State Automata. [Transition diagrams of the five automata a) through e), with \"Start\" arrows marking the initial states and arcs labelled by the input symbols 0 and 1.] \n\nGiles et al. also examined the automata depicted in Figures 1a) and 1c). However, unlike the formal result concerning FSA b), the authors' conclusions about these two automata were of an empirical nature. In particular, the authors noted that while automata which oscillated with a period of 2 under constant input (i.e. Figure 1a)) were realizable, the automaton of 1c) appeared not to be realizable by RCC. Giles et al. could not account for this last observation by a formal proof. \n\n5 AUTOMATA WITH CYCLES UNDER ALTERNATING INPUT \n\nWe now turn our attention to the question: why is a RCC network unable to learn the automaton of 1c)? We answer this question by considering what would happen if 1c) were realizable. In particular, suppose that the input units of a RCC network which implements automaton 1c) are replaced by the hidden units of a RCC network implementing 1a). In this situation, the hidden units of 1a) will oscillate with a period of 2 under constant input. But if the inputs to 1c) oscillate with a period of 2, then the state of 1c) will oscillate with a period of 4. 
Thus, the combined network's state would oscillate with a period of four under constant input. Furthermore, the cascaded connectivity scheme of the RCC architecture implies that a network constructed by treating one network's hidden units as the input units of another would not violate any of the connectivity constraints of RCC. In other words, if RCC could implement the automaton of 1c), then it would also be able to implement a network which oscillates with a period of 4 under constant input. Since Giles et al. proved that the latter cannot be the case, it must also be the case that RCC cannot implement the automaton of 1c). \n\nThe line of reasoning used here to prove that the FSA of Figure 1c) is unrealizable can also be applied to many other automata. In fact, any automaton whose state oscillates with a period of more than 2 under input which oscillates with a period of 2 could be used to construct one of the automata proven to be illegal by Giles. This implies that RCC cannot implement any automaton whose state oscillates with a period of greater than ω = 2 when its input oscillates with a period of p = 2. \n\n6 AUTOMATA WITH CYCLES UNDER OSCILLATING INPUT \n\nGiles et al.'s theorem can be viewed as defining a class of automata which cannot be implemented by the RCC architecture. The proof in Section 5 adds another class of automata which also cannot be realized. More precisely, the two proofs concern inputs which oscillate with periods of one and two respectively. It is natural to ask whether further proofs for state cycles can be developed when the input oscillates with a period of greater than two. We now present the central theorem of this paper, a unified definition of unrealizable automata: \n\nTheorem: If the input signal to a RCC network oscillates with a period, p, then the network can represent only those FSA whose outputs form cycles of length ω, where p mod ω = 0 if p is even and 2p mod ω = 0 if p is odd. 
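The theorem's divisibility condition is easy to state as a predicate. The helper below (`realizable_cycle` is an illustrative name) reports whether a state cycle of length w is permitted under input period p:

```python
def realizable_cycle(p, w):
    """Necessary condition from the theorem: under input period p, an RCC
    network can only exhibit state cycles of length w satisfying
    p mod w == 0 when p is even, or 2p mod w == 0 when p is odd."""
    return p % w == 0 if p % 2 == 0 else (2 * p) % w == 0
```

For p = 1 this permits only w in {1, 2}, recovering the result of Giles et al.; for p = 2 it excludes the ω = 4 cycle of automaton c).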
\n\nTo prove this theorem we will first need to prove a simpler one relating the rate of oscillation of the input signal to one node in an RCC network to the rate of oscillation of that node's output signal. By \"the input signal to one node\" we mean the weighted sum of all activations of all connected nodes (i.e. all input nodes, and all lower numbered hidden nodes), but not the recurrent signal. I.e.: \n\nA(t+1) = Σ_{i=1}^{j-1} w_ij a_i(t+1). \n\nUsing this definition, it is possible to rewrite the equation to compute the activation of node j (given in Section 2) as: \n\na_j(t+1) = σ( A(t+1) + w_jj a_j(t) ). \n\nBut if we assume that the input signal oscillates with a period of p, then every value of A(t+1) can be replaced by one of a finite number of input signals (A_0, A_1, A_2, ..., A_{p-1}). In other words, A(t+1) = A_{t mod p}. Using this substitution, it is possible to repeatedly expand the addend of the previous equation to derive the formula: \n\na_j(t+1) = σ( A_{t mod p} + w_jj σ( A_{(t-1) mod p} + w_jj σ( A_{(t-2) mod p} + w_jj ... σ( A_{(t-p+1) mod p} + w_jj a_j(t-p+1) ) ... ) ) ) \n\nThe unravelling of the recursive equation now allows us to examine the relationship between a_j(t+1) and a_j(t-p+1). Specifically, we note that if w_jj > 0 or if p is even then a_j(t+1) = f(a_j(t-p+1)) implies that f is a monotonically increasing function. Furthermore, since σ is a function with finite range, f must also have finite range. \n\nIt is well known that for any monotonically increasing function with finite range, f, the sequence f(x), f(f(x)), f(f(f(x))), ... is guaranteed to monotonically approach a fixed point (where f(x) = x). This implies that the sequence a_j(t+1), a_j(t+p+1), a_j(t+2p+1), ... must also monotonically approach a fixed point (where a_j(t+1) = a_j(t-p+1)). 
In other words, the sequence does not oscillate. Since every p-th value of a_j(t) approaches a fixed point, the sequence a_j(t), a_j(t+1), a_j(t+2), ... can have a period of at most p, and must have a period which divides p evenly. We state this as our first lemma: \n\nLemma 1: If A(t) oscillates with even period, p, or if w_jj > 0, then state unit j's activation value must oscillate with a period ω, where p mod ω = 0. \n\nWe must now consider the case where w_jj < 0 and p is odd. In this case, a_j(t+1) = f(a_j(t-p+1)) implies that f is a monotonically decreasing function. But in this situation the function f²(x) = f(f(x)) must be monotonically increasing with finite range. This implies that the sequence a_j(t+1), a_j(t+2p+1), a_j(t+4p+1), ... must monotonically approach a fixed point, and hence that the sequence a_j(t), a_j(t+1), a_j(t+2), ... must have a period which divides 2p evenly. We state this as our second lemma: \n\nLemma 2: If A(t) oscillates with odd period, p, and if w_jj < 0, then state unit j's activation value must oscillate with a period ω, where 2p mod ω = 0. \n\nLemmas 1 and 2 relate the rate of oscillation of the weighted sum of input signals and lower numbered unit activations, A(t), to that of unit j. However, the theorem which we wish to prove relates the rate of oscillation of only the RCC network's input signal to the entire hidden unit activations. To prove the theorem, we use a proof by induction on the unit number, i: \n\nBasis: Node i = 1 is connected only to the network inputs. Therefore, if the input signal oscillates with period p, then node i can only oscillate with period ω, where p mod ω = 0 if p is even and 2p mod ω = 0 if p is odd. (This follows from Lemmas 1 and 2.) \n\nAssumption: If the input signal to the network oscillates with period p, then node i can only oscillate with period ω, where p mod ω = 0 if p is even and 2p mod ω = 0 if p is odd. \n\nProof: If the Assumption holds for all nodes up to and including i, then Lemmas 1 and 2 imply that it must also hold for node i+1. □ \n\nThis proves the theorem: \n\nTheorem: If the input signal to a RCC network oscillates with a period, p, then the network can represent only those FSA whose outputs form cycles of length ω, where p mod ω = 0 if p is even and 2p mod ω = 0 if p is odd. 
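The behaviour established by the two lemmas can also be checked numerically. The sketch below iterates a single unit a_j(t+1) = σ(A_{t mod p} + w_jj a_j(t)), discards the transient, and measures the period of the settled tail; the function name, the drive values, and the contractive weight magnitudes are illustrative choices for this sketch:

```python
import math

def unit_period(w_self, drive_cycle, steps=3000, tol=1e-9):
    """Iterate a(t+1) = sigma(A_{t mod p} + w_self * a(t)), discard the
    transient, and return the (tolerance-limited) period of the tail."""
    sigma = lambda x: 1.0 / (1.0 + math.exp(-x))
    a, trace = 0.0, []
    for t in range(steps):
        a = sigma(drive_cycle[t % len(drive_cycle)] + w_self * a)
        trace.append(a)
    tail = trace[-60:]  # settled portion of the trajectory
    for w in range(1, 30):
        if all(abs(tail[i] - tail[i + w]) < tol for i in range(30)):
            return w
    return None
```

With an odd drive period p = 3, a positive w_jj yields a tail period dividing 3 (Lemma 1), while a negative w_jj yields a tail period dividing 2p = 6 (Lemma 2).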
\n\n7 CONCLUDING REMARKS \n\nGiles et al.'s result concerns input cycles of length p = 1. Applying our theorem proves that an RCC network can only represent those machines whose state transitions form cycles of length ω, where 2(1) mod ω = 0, implying that state cannot oscillate with a period of greater than 2. This is exactly what Giles et al. concluded, and proves that (among others) the automaton of Figure 1b) cannot be implemented by RCC. \n\nSimilarly, the proof of Section 5 concerns input cycles of length p = 2. Applying our theorem proves that an RCC network can only represent those machines whose state transitions form cycles of length ω, where (2) mod ω = 0. This again implies that state cannot oscillate with a period greater than 2, which is exactly what was proven in Section 5. This proves that the automaton of Figure 1c) (among others) cannot be implemented by RCC. \n\nIn addition to unifying both the results of Giles et al. and Section 5, the theorem of Section 6 also accounts for many other FSA which are not representable by RCC. In fact, the theorem identifies an infinite number of other classes of non-representable FSA (for p = 3, p = 4, p = 5, ...). Each class itself of course contains an infinite number of machines. Careful examination of the automaton illustrated in Figure 1d) reveals that it contains a state cycle of length 9 (q0 q1 q2 q1 q2 q3 q2 q3 q4 q0 q1 q2 q1 q2 q3 q2 q3 q4 ...) in response to an input cycle of length 3 (\"001001...\"). Since this is not one of the allowable input/state cycle relationships defined by the theorem, it can be concluded that the automaton of Figure 1d) (among others) cannot be represented by RCC. \n\nFinally, it should be noted that it remains unknown if the classes identified by this paper's theorem represent the complete extent of RCC's computational limitations. Consider for example the automaton of Figure 1e). This device has no input/state cycles which violate the theorem, thus we cannot conclude that it is unrepresentable by RCC. 
Of course, the issue of whether or not this particular automaton is representable is of little interest. However, the class of automata to which the theorem does not apply, which includes automaton 1e), requires further investigation. Perhaps all automata in this class are representable; perhaps there are other subclasses (not identified by the theorem) which RCC cannot represent. This issue will be addressed in future work. \n\nReferences \n\nN. Alon, A. Dewdney, and T. Ott, Efficient simulation of finite automata by neural nets, Journal of the Association for Computing Machinery, 38 (2) (1991) 495-514. \n\nS. Fahlman, The recurrent cascade-correlation architecture, in: R. Lippmann, J. Moody and D. Touretzky, Eds., Advances in Neural Information Processing Systems 3 (Morgan Kaufmann, San Mateo, CA, 1991) 190-196. \n\nC.L. Giles, D. Chen, G.Z. Sun, H.H. Chen, Y.C. Lee, and M.W. Goudreau, Constructive learning of recurrent neural networks: Limitations of recurrent cascade correlation and a simple solution, IEEE Transactions on Neural Networks, 6 (4) (1995) 829-836. \n\nM. Goudreau, C. Giles, S. Chakradhar, and D. Chen, First-order vs. second-order single layer recurrent neural networks, IEEE Transactions on Neural Networks, 5 (3) (1994) 511-513. \n\nJ.E. Hopcroft and J.D. Ullman, Introduction to Automata Theory, Languages and Computation (Addison-Wesley, Reading, MA, 1979). \n\nS.C. Kremer, On the computational power of Elman-style recurrent networks, IEEE Transactions on Neural Networks, 6 (4) (1995) 1000-1004. \n\nH.T. Siegelmann and E.D. Sontag, On the computational power of neural nets, in: Proceedings of the Fifth ACM Workshop on Computational Learning Theory (ACM, New York, NY, 1992) 440-449. \n", "award": [], "sourceid": 1101, "authors": [{"given_name": "Stefan", "family_name": "Kremer", "institution": null}]}