{"title": "Learning on a General Network", "book": "Neural Information Processing Systems", "page_first": 22, "page_last": 30, "abstract": null, "full_text": "LEARNING ON A GENERAL NETWORK \n\nAmir F. Atiya \n\nDepartment of Electrical Engineering \n\nCalifornia Institute of Technology \n\nPasadena, CA 91125 \n\nAbstract \n\nThis paper generalizes the backpropagation method to a general network containing feedback connections. The network model considered consists of interconnected groups of neurons, where each group could be fully interconnected (it could have feedback connections, with possibly asymmetric weights), but no loops between the groups are allowed. A stochastic descent algorithm is applied, under a certain inequality constraint on each intra-group weight matrix which ensures that the network possesses a unique equilibrium state for every input. \n\nIntroduction \n\nIt has been shown in the last few years that large networks of interconnected \"neuron\"-like elements are quite suitable for performing a variety of computational and pattern recognition tasks. One of the well-known neural network models is the backpropagation model [1]-[4]. It is an elegant way of teaching a layered feedforward network with a set of given input/output examples. Neural network models having feedback connections, on the other hand, have also been devised (for example the Hopfield network [5]), and have been shown to be quite successful in performing some computational tasks. It is important, though, to have a method for learning by examples for a feedback network, since this is a general way of design, and thus one can avoid using an ad hoc design method for each different computational task. The existence of feedback is expected to improve the computational abilities of a given network. This is because in feedback networks the state iterates until a stable state is reached. 
Thus processing is performed in several steps, or recursions. This, in general, allows more processing abilities than the \"single step\" feedforward case (note also the fact that a feedforward network is a special case of a feedback network). Therefore, in this work we consider the problem of developing a general learning algorithm for feedback networks. \n\n\u00a9 American Institute of Physics 1988 \n\nIn developing a learning algorithm for feedback networks, one has to pay attention to the following (see Fig. 1 for an example of a configuration of a feedback network). The state of the network evolves in time until it goes to equilibrium, or possibly other types of behavior such as periodic or chaotic motion could occur. However, we are interested in having a steady and fixed output for every input applied to the network. Therefore, we have the following two important requirements for the network. Beginning in any initial condition, the state should ultimately go to equilibrium. The other requirement is that we have to have a unique equilibrium state. It is in fact that equilibrium state that determines the final output. The objective of the learning algorithm is to adjust the parameters (weights) of the network in small steps, so as to move the unique equilibrium state in a way that will finally result in an output as close as possible to the required one (for each given input). The existence of more than one equilibrium state for a given input causes the following problems. In some iterations one might be updating the weights so as to move one of the equilibrium states in a sought direction, while in other iterations (especially with different input examples) a different equilibrium state is moved. Another important point is that when implementing the network (after the completion of learning), for a fixed input there can be more than one possible output. 
Independently, other work appeared recently on training a feedback network [6],[7],[8]. Learning algorithms were developed, but the problem of ensuring a unique equilibrium was not considered. That problem is addressed in this paper, and an appropriate network and a learning algorithm are proposed. \n\nFig. 1. A recurrent network. \n\nThe Feedback Network \n\nConsider a group of n neurons which could be fully interconnected (see Fig. 1 for an example). The weight matrix W can be asymmetric (as opposed to the Hopfield network). The inputs are also weighted before entering the network (let V be the weight matrix). Let x and y be the input and output vectors respectively. In our model y is governed by the following set of differential equations, proposed by Hopfield [5]: \n\n$\\tau \\, du/dt = W f(u) - u + V x, \\qquad y = f(u) \\qquad (1)$ \n\nwhere $f(u) = (f(u_1), ..., f(u_n))^T$, $T$ denotes the transpose operator, $f$ is a bounded and differentiable function, and $\\tau$ is a positive constant. \n\nFor a given input, we would like the network, after a short transient period, to give a steady and fixed output, no matter what the initial network state was. This means that beginning from any initial condition, the state is to be attracted towards a unique equilibrium. This leads to looking for a condition on the matrix W. \n\nTheorem: A network (not necessarily symmetric) satisfying \n\n$\\sum_i \\sum_j w_{ij}^2 < 1 / \\max(f')^2$ \n\nexhibits no behavior other than going to a unique equilibrium for a given input. \n\nProof: Let $u_1(t)$ and $u_2(t)$ be two solutions of (1). Let \n\n$J(t) = \\| u_1(t) - u_2(t) \\|^2$, \n\nwhere $\\| \\cdot \\|$ is the two-norm. Differentiating J with respect to time, one obtains \n\n$dJ(t)/dt = 2 \\, (u_1(t) - u_2(t))^T \\, (du_1/dt - du_2/dt)$. \n\nUsing (1), the expression becomes \n\n$dJ(t)/dt = -(2/\\tau) \\| u_1(t) - u_2(t) \\|^2 + (2/\\tau) \\, (u_1(t) - u_2(t))^T W [ f(u_1(t)) - f(u_2(t)) ]$. 
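
The contraction that this proof establishes can also be checked numerically. The sketch below is not from the paper; it is an illustration under stated assumptions: f = tanh (so max f' = 1), forward-Euler integration of (1), and a random W scaled so that the sum of squared weights is below 1. Two trajectories started from very different initial states settle onto the same equilibrium.

```python
# Numerical check of the theorem (a sketch, not the paper's algorithm):
# with f = tanh (max f' = 1) and sum_ij w_ij^2 < 1, trajectories of
# tau du/dt = W f(u) - u + V x from different initial states should
# collapse onto one equilibrium.
import numpy as np

def settle(W, Vx, u0, tau=1.0, dt=0.01, steps=20000):
    # Forward-Euler integration of tau du/dt = W f(u) - u + V x, f = tanh.
    u = np.array(u0, dtype=float)
    for _ in range(steps):
        u += (dt / tau) * (W @ np.tanh(u) - u + Vx)
    return u

rng = np.random.default_rng(0)
n = 5
W = rng.normal(size=(n, n))
W *= 0.9 / np.linalg.norm(W)   # Frobenius norm 0.9, so sum of squares 0.81 < 1
Vx = rng.normal(size=n)        # a fixed weighted input V x
u_a = settle(W, Vx, 5.0 * rng.normal(size=n))
u_b = settle(W, Vx, -5.0 * rng.normal(size=n))
print(np.max(np.abs(u_a - u_b)))   # tiny: both states reached the same equilibrium
```

If W is instead scaled so that the theorem's inequality fails, the guarantee no longer applies and multiple equilibria may appear, which is exactly the situation the learning algorithm is designed to rule out.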
Using Schwarz's Inequality, we obtain \n\n$(u_1(t) - u_2(t))^T W [ f(u_1(t)) - f(u_2(t)) ] \\le \\| u_1(t) - u_2(t) \\| \\, \\| W [ f(u_1(t)) - f(u_2(t)) ] \\|$. \n\nAgain, by Schwarz's Inequality, \n\n$| w_i [ f(u_1(t)) - f(u_2(t)) ] | \\le \\| w_i \\| \\, \\| f(u_1(t)) - f(u_2(t)) \\|, \\qquad i = 1, ..., n \\qquad (2)$ \n\nwhere $w_i$ denotes the ith row of W. Using the mean value theorem, we get \n\n$\\| f(u_1(t)) - f(u_2(t)) \\| \\le ( \\max |f'| ) \\, \\| u_1(t) - u_2(t) \\|. \\qquad (3)$ \n\nUsing (2), (3), and the expression for J(t), we get \n\n$dJ(t)/dt \\le -a J(t) \\qquad (4)$ \n\nwhere \n\n$a = (2/\\tau) \\big[ 1 - ( \\max |f'| ) ( \\sum_i \\sum_j w_{ij}^2 )^{1/2} \\big]$. \n\nBy hypothesis of the Theorem, a is strictly positive. Multiplying both sides of (4) by exp(at), the inequality \n\n$d [ J(t) e^{at} ] / dt \\le 0$ \n\nresults, from which we obtain \n\n$J(t) \\le J(0) e^{-at}$. \n\nFrom that, and from the fact that J is non-negative, it follows that J(t) goes to zero as $t \\to \\infty$. \n\n$\\partial y^l / \\partial w_{kp}^l = ( A^l - W^l )^{-1} b_{kp}, \\qquad (A-1)$ \n\nwhere $b_{kp}$ is the $n_l$-dimensional vector whose ith component is given by \n\n$b_i^{kp} = y_p^l$ if $i = k$, and $b_i^{kp} = 0$ otherwise. \n\nBy the chain rule, \n\n$\\partial e_a / \\partial w_{kp}^l = \\sum_j ( \\partial e_a / \\partial y_j^l ) ( \\partial y_j^l / \\partial w_{kp}^l )$, \n\nwhich, upon substituting from (A-1), can be put in the form $y_p^l \\, g_k^T ( \\partial e_a / \\partial y^l )$, where $g_k$ is the kth column of $( A^l - W^l )^{-1}$. Finally, we obtain the required expression, which is \n\n$\\partial e_a / \\partial W^l = [ ( A^l )^T - ( W^l )^T ]^{-1} ( \\partial e_a / \\partial y^l ) ( y^l )^T$. \n\nRegarding $\\partial e_a / \\partial v_{kp}^{lr}$, it is obtained by differentiating (5) with respect to $v_{kp}^{lr}$. We get similarly \n\n$\\partial y^l / \\partial v_{kp}^{lr} = ( A^l - W^l )^{-1} c_{kp}$, \n\nwhere $c_{kp}$ is the $n_l$-dimensional vector whose ith component is given by \n\n$c_i^{kp} = y_p^r$ if $i = k$, and $c_i^{kp} = 0$ otherwise. \n\nA derivation very similar to the case of $\\partial e_a / \\partial W^l$ results in the following required expression: \n\n$\\partial e_a / \\partial V^{lr} = [ ( A^l )^T - ( W^l )^T ]^{-1} ( \\partial e_a / \\partial y^l ) ( y^r )^T$. \n\nNow, finally, consider $\\partial e_a / \\partial y^l$. Let $\\partial y^j / \\partial y^l$, $j \\in B_l$, be the matrix whose (k,p)th element is $\\partial y_k^j / \\partial y_p^l$. The elements of $\\partial y^j / \\partial y^l$ can be obtained by differentiating the equation for the fixed point for group j, as follows: \n\n$( A^j - W^j ) \\, \\partial y^j / \\partial y^l = V^{jl}$. \n\nHence, \n\n$\\partial y^j / \\partial y^l = ( A^j - W^j )^{-1} V^{jl}. \\qquad (A-2)$ \n\nUsing the chain rule, one can write \n\n$\\partial e_a / \\partial y^l = \\sum_{j \\in B_l} ( \\partial y^j / \\partial y^l )^T ( \\partial e_a / \\partial y^j )$. \n\nWe substitute from (A-2) into the previous equation to complete the derivation by obtaining \n\n$\\partial e_a / \\partial y^l = \\sum_{j \\in B_l} ( V^{jl} )^T [ ( A^j )^T - ( W^j )^T ]^{-1} ( \\partial e_a / \\partial y^j )$. \n\nReferences \n\n[1] P. 
Werbos, \"Beyond regression: New tools for prediction and analysis in behavioral sciences\", Harvard University dissertation, 1974. \n\n[2] D. Parker, \"Learning logic\", MIT Tech Report TR-47, Center for Computational Research in Economics and Management Science, 1985. \n\n[3] Y. Le Cun, \"A learning scheme for asymmetric threshold network\", Proceedings of Cognitiva, Paris, June 1985. \n\n[4] D. Rumelhart, G. Hinton, and R. Williams, \"Learning internal representations by error propagation\", in D. Rumelhart, J. McClelland and the PDP research group (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition, Vol. 1, MIT Press, Cambridge, MA, 1986. \n\n[5] J. Hopfield, \"Neurons with graded response have collective computational properties like those of two-state neurons\", Proc. Natl. Acad. Sci. USA, May 1984. \n\n[6] L. Almeida, \"A learning rule for asynchronous perceptrons with feedback in a combinatorial environment\", Proc. of the First Int. Annual Conf. on Neural Networks, San Diego, June 1987. \n\n[7] R. Rohwer and B. Forrest, \"Training time-dependence in neural networks\", Proc. of the First Int. Annual Conf. on Neural Networks, San Diego, June 1987. \n\n[8] F. Pineda, \"Generalization of back-propagation to recurrent neural networks\", Phys. Rev. Lett., vol. 59, no. 19, 9 Nov. 1987. ", "award": [], "sourceid": 9, "authors": [{"given_name": "Amir", "family_name": "Atiya", "institution": null}]}