{"title": "Subgrouping Reduces Complexity and Speeds Up Learning in Recurrent Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 638, "page_last": 641, "abstract": null, "full_text": "638 \n\nZipser \n\nSubgrouping Reduces Complexity and Speeds Up \n\nLearning in Recurrent Networks \n\nDavid Zipser \n\nDepartment of Cognitive Science \nUniversity of California, San Diego \n\nLa Jolla, CA 92093 \n\n1 INTRODUCTION \n\nRecurrent nets are more powerful than feedforward nets because they allow simulation of \ndynamical systems. Everything from sine wave generators through computers to the brain are \npotential candidates, but to use recurrent nets to emulate dynamical systems we need learning \nalgorithms to program them. Here I describe a new twist on an old algorithm for recurrent nets \nand compare it to its predecessors. \n\n2 BPTT \n\nIn the beginning there was BACKPROPAGATION THROUGH TUvffi (BPTT) which was \ndescribed by Rumelhart, Williams, and Hinton (1986). The idea is to add a copy of the whole \nrecurrent net to the top of a growing feedforward network on each update cycle. Backpropa(cid:173)\ngating through this stack corrects for past mistakes by adding up all the weight changes from \npast times. A difficulty with this method is that the feedforward net gets very big. The obvious \nsolution is to truncate it at a fixed number of copies by killing an old copy every time a new \ncopy is added. The truncated-BPTT algorithm is illustrated in Figure 1. It works well, more \nabout this later. \n\n3RTRL \nIt turns out that it is not necessary to keep an ever growing stack of copies of the recurrent \nnet as BPTT does. A fixed number of parameters can record all of past time. This is done in \nthe REAL TI!\\.1E RECURRENT LEARNING (RTRL) algorithm of Williams and Zipser \n(1989). 
Figure 1: BPTT. (The figure shows the growing stack of copies of the recurrent net, each receiving input IN, at times t-1, ..., t-k+2, t-k+1.) \n\nThe derivation is given elsewhere (Rumelhart, Hinton, & Williams, 1986), but a simple rationale comes from the fact that error backpropagation is linear, which makes it possible to collapse the whole feedforward stack of BPTT into a few fixed-size data structures. The biggest and most time-consuming of these to update is the matrix of p values, whose update rule is \n\nP it