{"title": "Training a Limited-Interconnect, Synthetic Neural IC", "book": "Advances in Neural Information Processing Systems", "page_first": 777, "page_last": 784, "abstract": null, "full_text": "TRAINING A LIMITED-INTERCONNECT, SYNTHETIC NEURAL IC \n\nM.R. Walker, S. Haghighi, A. Afghan, and L.A. Akers \n\nCenter for Solid State Electronics Research \nArizona State University \nTempe, AZ 85287-6206 \nmwalker@enuxha.eas.asu.edu \n\nABSTRACT \n\nHardware implementation of neuromorphic algorithms is hampered by high degrees of connectivity. Functionally equivalent feedforward networks may be formed by using limited fan-in nodes and additional layers, but this complicates procedures for determining weight magnitudes. No direct mapping of weights exists between fully and limited-interconnect nets. Low-level nonlinearities prevent the formation of internal representations of widely separated spatial features, and the use of gradient descent methods to minimize output error is hampered by error magnitude dissipation. The judicious use of linear summations or collection units is proposed as a solution. \n\nHARDWARE IMPLEMENTATIONS OF FEEDFORWARD, SYNTHETIC NEURAL SYSTEMS \n\nThe pursuit of hardware implementations of artificial neural network models is motivated by the need to develop systems capable of executing neuromorphic algorithms in real time. The most significant barrier is the high degree of connectivity required between the processing elements. Current interconnect technology does not support the direct implementation of large-scale arrays of this type. In particular, the high fan-in/fan-outs of biology impose connectivity requirements such that the electronic implementation of a highly interconnected biological neural network of just a few thousand neurons would require a level of connectivity which exceeds the current or even projected interconnection density of ULSI systems (Akers et al. 
1988). \n\nHighly layered, limited-interconnect architectures are, however, especially well suited for VLSI implementations. In previous work, we analyzed the generalization and fault-tolerance characteristics of a limited-interconnect perceptron architecture applied in three simple mappings between binary input space and binary output space, and proposed a CMOS architecture (Akers and Walker, 1988). This paper concentrates on developing an understanding of the limitations imposed on layered neural network architectures by hardware implementation, and a proposed solution. \n\nTRAINING CONSIDERATIONS FOR LIMITED-INTERCONNECT FEEDFORWARD NETWORKS \n\nThe symbolic layout of the limited fan-in network is shown in Fig. 1. Re-arranging of the individual input components is done to eliminate edge effects. Greater detail on the actual hardware architecture may be found in (Akers and Walker, 1988). As in linear filters, the total number of connections which fan in to a given processing element determines the degrees of freedom available for forming a hypersurface which implements the desired node output function (Widrow and Stearns, 1985). When processing elements with fixed, low fan-in are employed, the effects of reduced degrees of freedom must be considered in order to develop workable training methods which permit generalization of novel inputs. First, no direct or indirect relation exists between weight magnitudes obtained for a limited-interconnect, multilayered perceptron and those obtained for the fully connected case. Networks of these types adapted with identical exemplar sets must therefore form completely different functions on the input space. Second, low-level nonlinearities prevent direct internal coding of widely separated spatial features in the input set. A related problem arises when hyperplane nonlinearities are used. 
Multiple hyperplanes required on a subset of input space are impossible when no two second-level nodes address identical positions in the input space. Finally, adaptation methods like backpropagation which minimize output error with gradient descent are hindered, since the magnitude of the error is dissipated as it back-propagates through large numbers of hidden layers. The appropriate placement of linear summation elements or collection units is a proposed solution. \n\nFigure 1. Symbolic Layout of Limited-Interconnect Feedforward Architecture \n\nCOMPARISON OF WEIGHT VALUES IN FULLY CONNECTED AND LIMITED-INTERCONNECT NETWORKS \n\nFully connected and limited-interconnect feedforward structures may be functionally equivalent by virtue of identical training sets, but nonlinear node discriminant functions in a fully-connected perceptron network are generally not equivalent to those in a limited-interconnect, multilayered network. This may be shown by comparing the Taylor series expansion of the discriminant functions in the vicinity of the threshold for both types and then equating terms of equivalent order. A simple limited-interconnect network is shown in Fig. 2. \n\nFigure 2. Limited-Interconnect Feedforward Network \n\nA discriminant function with a fan-in of two may be represented with the following functional form, \n\ny = f(w1x1 + w2x2) \n\nwhere θ is the threshold and the function is assumed to be continuously differentiable. The Taylor series expansion of the discriminant in the vicinity of the threshold is, \n\ny ≈ f(θ) + f'(θ)(w1x1 + w2x2 - θ) + (f''(θ)/2)(w1x1 + w2x2 - θ)² \n\nExpanding output node three in Fig. 2 to second order, \n\ny3 ≈ f(θ) + f'(θ)(w5y1 + w6y2 - θ) + (f''(θ)/2)(w5y1 + w6y2 - θ)² \n\nwhere f(θ), f'(θ) and f''(θ) are constant terms and y1, y2 are the outputs of the two first-level nodes. 
Substituting similar expansions for y1 and y2 into y3 yields an expression for the limited-interconnect output node in terms of the inputs x1 through x4. \n\nThe output node in the fully-connected case (Fig. 3) may also be expanded, \n\ny3 = f(w1x1 + w2x2 + w3x3 + w4x4) \n\nwhere the wi here denote the weights of the fully connected node. Expanding to second order yields, \n\ny3 ≈ f(θ) + f'(θ)(w1x1 + w2x2 + w3x3 + w4x4 - θ) + (f''(θ)/2)(w1x1 + w2x2 + w3x3 + w4x4 - θ)² \n\nFigure 3. Fully Connected Network \n\nWe seek the necessary and sufficient conditions for the two nonlinear discriminant functions to be analytically equivalent. This is accomplished by comparing terms of equal order in the expansions of each output node in the two nets. Equating the constant terms yields, \n\nw5 = -w6 \n\nEquating the first order terms, \n\nw5 = w6 = 1/f'(θ) \n\nEquating the second order terms yields a third constraint on w5 and w6. \n\nThe first two conditions are obviously contradictory. In addition, solving for w5 or w6 using the first and second constraints, or the first and third constraints, yields the trivial result w5 = w6 = 0. Thus, no relation exists between discriminant functions occurring in the limited and fully connected feedforward networks. This eliminates the possibility that weights obtained for a fully connected network could be transformed and used in a limited-interconnect structure. More significant is the fact that full and limited-interconnect nets which are adapted with identical sets of exemplars must form completely different functions on the input space, even though they exhibit identical output behavior. For this reason, it is anticipated that the two network types could produce different responses to a novel input. \n\nNON-OVERLAPPING INPUT SUBSETS \n\nSignal routing becomes important for networks in which hidden units do not address identical subsets in the preceding layer. Figure 4 shows an odd-parity algorithm implemented with a limited-interconnect architecture. Large weight magnitudes are indicated by darker lines. 
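The term-matching contradiction above can be checked numerically. A minimal sketch, assuming a logistic sigmoid for the discriminant f and a threshold of zero (the paper fixes neither choice):

```python
# Sketch only: verify that the two equating conditions derived in the
# text cannot hold simultaneously for a nonzero weight pair.
# Assumptions (not from the paper): f is the logistic sigmoid, theta = 0.
import math

def f(x):
    return 1.0 / (1.0 + math.exp(-x))

def f_prime(x):
    s = f(x)
    return s * (1.0 - s)

theta = 0.0

# Constant-order condition:  w5 = -w6
# First-order condition:     w5 = w6 = 1 / f_prime(theta)
first_order_value = 1.0 / f_prime(theta)   # = 4.0 for the sigmoid at 0

# Imposing w5 = -w6 together with w5 = w6 forces 2*w5 = 0, so the only
# simultaneous solution is the trivial one:
w5 = w6 = 0.0
consistent = (w5 == -w6) and (w5 == w6)
print(first_order_value, w5, w6, consistent)
```

The first-order condition demands w5 = w6 = 4 here, while combining it with the constant-order condition forces the trivial solution, matching the conclusion in the text.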
Many nodes act as \"pass-through\" elements in that they have few dominant input and output connections. These node types are necessary to pass lower-level signals to common aggregation points. In general, the use of limited fan-in processing elements implementing a nonlinear discriminant function decreases the probability that a given correlation within the input data will be encoded, especially if the \"width\" of the feature set is greater than the fan-in, requiring encoding at a high level within the net. In addition, since lower-level connections determine the magnitudes of upper-level connections in any layered net when backpropagation is used, the set of points in weight space available to a limited-interconnect net for realizing a given function is further reduced by the greater number of weight dependencies occurring in limited-interconnect networks, all of which must be satisfied during training. Finally, since gradient descent is basically a shortcut through an NP-complete search in weight space, reduced redundancy and overlapping of internal representations reduces the probability of convergence to a near-optimal solution on the training set. \n\nFigure 4. Six-input odd parity function implemented with limited-interconnect \n\nDISSIPATION OF ERROR MAGNITUDE WITH INCREASING NUMBERS OF LAYERS \n\nFollowing the derivation of backpropagation in (Plaut, 1986), the magnitude change for a weight connecting a processing element in the m-layer with a processing element in the l-layer is given by, \n\nΔw(l-m) = -ε (∂E/∂xl) ym \n\nwhere the error gradient expands along every path from the l-layer to the output layer, \n\n∂E/∂xl = Σa ··· Σk (ya - da)(dya/dxa) w(b-a) ··· w(l-k)(dyk/dxk) \n\nwhere y is the output of the discriminant function, x is the activation level, d is the desired output, w is a connection magnitude, each sum runs over f elements, and f is the fan-in for each processing element. 
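The dissipation can be made concrete by bounding the factor that survives back-propagation with mean values; the derivative range follows the sigmoid discussion in the text, while the mean weight magnitude and fan-in below are made-up illustrative numbers:

```python
# Sketch only: order-of-magnitude decay of the back-propagated error
# after N intervening layers, using mean values as in the derivation.
mean_deriv = 0.25    # assumed mean of dy/dx (text cites a 0.0-0.5 range)
mean_weight = 0.1    # assumed mean |w| for small initial random weights
fan_in = 2           # limited fan-in: few terms per summation

for n_layers in (1, 5, 10, 20):
    # Each of the fan_in**(N-1) terms carries roughly N derivative
    # factors and N-1 weight factors.
    per_term = mean_deriv ** n_layers * mean_weight ** (n_layers - 1)
    total = fan_in ** (n_layers - 1) * per_term
    print(n_layers, total)
```

Even with these generous values, the factor multiplying the output error collapses toward zero within a handful of layers.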
If N layers of elements intervene between the m-layer and the output layer, then each of the f^(N-1) terms in the above summation consists of the product, \n\n(dya/dxa) w(b-a) ··· w(l-k)(dyk/dxk) \n\nIf we replace the weight magnitudes and the derivatives in each term with their mean values, each term becomes approximately \n\n(ya - da) ⟨dy/dx⟩^N ⟨w⟩^(N-1) \n\nThe value of the first derivative of the sigmoid discriminant function is distributed between 0.0 and 0.5. The weight values are typically initially distributed evenly between small positive and negative values. Thus, with more layers, the product of the derivatives occurring in each term approaches zero. The use of large numbers of perceptron layers therefore has the effect of dissipating the magnitude of the error. This is exacerbated by the low fan-in, which reduces the total number of terms in the summation. The use of linear collection units (McClelland, 1986), discussed in the following section, is a proposed solution to this problem. \n\nLINEAR COLLECTION UNITS \n\nAs shown in Fig. 5, the output of the limited-interconnect net employing collection units is given by the function, \n\ny3 = f(c1(w1x1 + w2x2) + c2(w3x3 + w4x4) - θ) \n\nwhere c1 and c2 are constants. \n\nFigure 5. Limited-interconnect network employing linear summations (square elements denote linear summations, round elements nonlinear discriminants) \n\nThe position of the summations may be determined by using Euclidean k-means clustering on the exemplar set to locate cluster centers and determine their widths a priori (Duda and Hart, 1973). The cluster members would be combined using linear elements until they reached a nonlinear discriminant, located higher in the net and at the cluster center. With this arrangement, weights obtained for a fully-connected net could be mapped using a linear transformation into the limited-interconnect network. 
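The clustering step of the placement procedure can be sketched with Lloyd's algorithm for k-means; the exemplar points and k below are made-up illustrative values, and the width estimate (RMS distance of members to their center) is one plausible reading of the cluster widths mentioned above:

```python
# Sketch only: Lloyd's k-means on an exemplar set to locate the cluster
# centers (and widths) where nonlinear discriminants would be placed.
import math
import random

def kmeans(points, k, iters=50, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assign each exemplar to its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            groups[i].append(p)
        # Move each center to the mean of its members.
        centers = [
            tuple(sum(cs) / len(g) for cs in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    # Width estimate: RMS distance of members to their center.
    widths = []
    for i, g in enumerate(groups):
        d2 = [sum((a - b) ** 2 for a, b in zip(p, centers[i])) for p in g]
        widths.append(math.sqrt(sum(d2) / len(d2)) if d2 else 0.0)
    return centers, widths

# Two well-separated exemplar clusters in a 2-D input space:
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (0.9, 1.0), (1.0, 0.9)]
centers, widths = kmeans(pts, k=2)
print(centers, widths)
```

Each recovered center marks where a nonlinear discriminant would sit, with linear elements combining the cluster's members below it.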
Alternatively, backpropagation could be used, since error dissipation would be reduced by setting the linear constant c of the summation elements to arbitrarily large values. \n\nCONCLUSIONS \n\nNo direct transformation of weights exists between fully and limited-interconnect nets which employ nonlinear discriminant functions. The use of gradient descent methods to minimize output error is hampered by error magnitude dissipation. In addition, low-level nonlinearities prevent the formation of internal representations of widely separated spatial features. The use of strategically placed linear summations or collection units is proposed as a means of overcoming difficulties in determining weight values in limited-interconnect perceptron architectures. K-means clustering is proposed as the method for determining placement. \n\nReferences \n\nL.A. Akers, M.R. Walker, D.K. Ferry & R.O. Grondin, \"Limited Interconnectivity in Synthetic Neural Systems,\" in R. Eckmiller and C. v.d. Malsburg, eds., Neural Computers, Springer-Verlag, 1988. \n\nL.A. Akers & M.R. Walker, \"A Limited-Interconnect Synthetic Neural IC,\" Proceedings of the IEEE International Conference on Neural Networks, p. II-151, 1988. \n\nB. Widrow & S.D. Stearns, Adaptive Signal Processing, Prentice-Hall, 1985. \n\nD.C. Plaut, S.J. Nowlan & G.E. Hinton, \"Experiments on Learning by Back Propagation,\" Carnegie-Mellon University, Dept. of Computer Science Technical Report, June 1986. \n\nJ.L. McClelland, \"Resource Requirements of Standard and Programmable Nets,\" in D.E. Rumelhart and J.L. McClelland, eds., Parallel Distributed Processing, Volume 1: Foundations, MIT Press, 1986. \n\nR.O. Duda & P.E. Hart, Pattern Classification and Scene Analysis, Wiley, 1973. 
\n", "award": [], "sourceid": 179, "authors": [{"given_name": "M.", "family_name": "Walker", "institution": null}, {"given_name": "S.", "family_name": "Haghighi", "institution": null}, {"given_name": "A.", "family_name": "Afghan", "institution": null}, {"given_name": "Larry", "family_name": "Akers", "institution": null}]}