{"title": "On Neural Networks with Minimal Weights", "book": "Advances in Neural Information Processing Systems", "page_first": 246, "page_last": 252, "abstract": null, "full_text": "On Neural Networks with Minimal \n\nWeights \n\nVasken Bohossian \n\nJ ehoshua Bruck \n\nCalifornia Institute of Technology \n\nMail Code 136-93 \n\nPasadena, CA 91125 \n\nE-mail: {vincent, bruck }\u00abIparadise. cal tech. edu \n\nAbstract \n\nLinear threshold elements are the basic building blocks of artificial \nneural networks. A linear threshold element computes a function \nthat is a sign of a weighted sum of the input variables. The weights \nare arbitrary integers; actually, they can be very big integers-(cid:173)\nexponential in the number of the input variables. However, in \npractice, it is difficult to implement big weights. In the present \nliterature a distinction is made between the two extreme cases: \nlinear threshold functions with polynomial-size weights as opposed \nto those with exponential-size weights. The main contribution of \nthis paper is to fill up the gap by further refining that separation. \nNamely, we prove that the class of linear threshold functions with \npolynomial-size weights can be divided into subclasses according \nto the degree of the polynomial. In fact, we prove a more general \nthat there exists a minimal weight linear threshold function \nresult-\nfor any arbitrary number of inputs and any weight size. To prove \nthose results we have developed a novel technique for constructing \nlinear threshold functions with minimal weights. \n\n1 \n\nIntroduction \n\nHuman brains are by far superior to computers for solving hard problems like combi(cid:173)\nnatorial optimization and image and speech recognition, although their basic build(cid:173)\ning blocks are several orders of magnitude slower. This observation has boosted \ninterest in the field of artificial neural networks [Hopfield 82]' [Rumelhart 82]. 
The latter are built by interconnecting multiple artificial neurons (or linear threshold gates), whose behavior is inspired by that of biological neurons. Artificial neural networks have found promising applications in pattern recognition, learning and other data-processing tasks. However, most of the research has been oriented towards the practical aspects of neural networks: simulating or building networks for particular tasks and then comparing their performance with that of more traditional methods on those tasks. To compare neural networks to other computational models one needs to develop the theoretical settings in which to estimate their capabilities and limitations. \n\n1.1 Linear Threshold Gate \n\nThe present paper focuses on the study of a single linear threshold gate (artificial neuron) with binary inputs and output as well as integer weights (synaptic coefficients). Such a gate is mathematically described by a linear threshold function. \n\nDefinition 1 (Linear Threshold Function) \nA linear threshold function of n variables is a Boolean function f : {-1, 1}^n -> {-1, 1} that can be written as \n\nf(x) = sgn(F(x)) = 1 for F(x) >= 0, and -1 otherwise, where F(x) = w . x = sum_{i=1}^n w_i x_i \n\nfor any x in {-1, 1}^n and a fixed w in Z^n. \n\nAlthough we could allow the weights w_i to be real numbers, it is known [Muroga 71], [Raghavan 88] that for a binary-input neuron O(n log n) bits per weight suffice, where n is the number of inputs. So in the rest of the paper we will assume, without loss of generality, that all weights are integers. 
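As a concrete illustration of Definition 1, the following minimal sketch evaluates a linear threshold gate over {-1, 1}^n (the weight values are illustrative only, not taken from the paper):

```python
# Sketch of Definition 1: a linear threshold gate with integer weights
# and inputs in {-1, 1}^n. The weights below are hypothetical examples.

def sgn(z):
    # The paper's convention: sgn(z) = 1 for z >= 0, and -1 otherwise.
    return 1 if z >= 0 else -1

def threshold_gate(w, x):
    # f(x) = sgn(w . x), where w . x = sum_i w_i * x_i.
    return sgn(sum(wi * xi for wi, xi in zip(w, x)))

w = [1, 2, 4]                          # hypothetical integer weight vector
print(threshold_gate(w, [1, -1, 1]))   # 1 - 2 + 4 = 3 >= 0  -> 1
print(threshold_gate(w, [1, 1, -1]))   # 1 + 2 - 4 = -1 < 0  -> -1
```

Note that with these weights the sign is always decided by the largest weight, a point the paper revisits in Example 1.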
\n\n1.2 Motivation \n\nMany experimental results in the area of neural networks have indicated that the magnitudes of the coefficients in the linear threshold elements grow very fast with the size of the inputs and therefore limit the practical use of the network. One natural question to ask is the following. How limited is the computational power of the network if one restricts oneself to threshold elements with only \"small\" growth in the size of the coefficients? To answer that question we have to define a measure of the magnitudes of the weights. Note that, given a function f, the weight vector w is not unique (see Example 1 below). \n\nDefinition 2 (Weight Space) \nGiven a linear threshold function f we define W as the set of all weights that satisfy Definition 1, that is, W = {w in Z^n : for all x in {-1, 1}^n, sgn(w . x) = f(x)}. \n\nThe following is a measure of the size of the weights. \n\nDefinition 3 (Minimal Weight Size) \nWe define the size of a weight vector as the sum of the absolute values of the weights. The minimal weight size of a linear threshold function is defined as \n\nS[f] = min_{w in W} sum_{i=1}^n |w_i|. \n\nThe particular vector that achieves the minimum is called a minimal weight vector. Naturally, S[f] is a function of n. \n\nIt has been shown [Hastad 94], [Myhill 61], [Shawe-Taylor 92], [Siu 91] that there exists a linear threshold function that can be implemented by a single threshold element with exponentially growing weights, S[f] ~ 2^n, but cannot be implemented by a threshold element with smaller, polynomially growing weights, S[f] ~ n^d, d constant. In light of that result the above question was dealt with by defining a class within the set of linear threshold functions: the class of functions with \"small\" (i.e. polynomially growing) weights [Siu 91]. 
Most of the recent research focuses on the power of circuits with small weights relative to circuits with arbitrary weights [Goldmann 92], [Goldman 93]. Rather than dealing with circuits, we are interested in studying a single threshold gate. The main contribution of the present paper is to further refine the division of small versus arbitrary weights. We separate the set of functions with small weights into classes indexed by d, the degree of polynomial growth, and show that all of them are non-empty. In particular, we develop a technique for proving that a weight vector is minimal. We use that technique to construct a function of size S[f] = s for an arbitrary s. \n\n1.3 Approach \n\nThe main difficulty in analyzing the size of the weights of a threshold element is due to the fact that a single linear threshold function can be implemented by different sets of weights, as shown in the following example. \n\nExample 1 (A Threshold Function with Minimal Weights) \nConsider the following two sets of weights (weight vectors): \n\nw1 = (1, 2, 4), F1(x) = x1 + 2x2 + 4x3 \nw2 = (2, 4, 8), F2(x) = 2x1 + 4x2 + 8x3 \n\nThey both implement the same threshold function \n\nf(x) = sgn(F2(x)) = sgn(2F1(x)) = sgn(F1(x)). \n\nA closer look reveals that f(x) = sgn(x3), implying that neither of the above weight vectors has minimal size. Indeed, the minimal one is w3 = (0, 0, 1) and S[f] = 1. \n\nIt is in general difficult to determine if a given set of weights is minimal [Amaldi 93], [Willis 63]. Our technique consists of limiting the study to a particular subset of linear threshold functions, one for which it is possible to prove that a given weight vector is minimal. That subset is loosely defined by the requirement that there exist input vectors for which f(x) = f(-x). The existence of such a vector, called a root of f, puts a constraint on the weight vector used to implement f. 
The larger the set of roots, the stronger the constraint on the set of weight vectors, which in turn helps determine the minimal one. A detailed description of the technique is given in Section 2. \n\n1.4 Organization \n\nHere follows a brief outline of the rest of the paper. Section 2 mathematically defines the setting of the problem and derives some basic results on the properties of functions that admit roots. Those results are used as building blocks for the proofs of the main results in Section 3. It also introduces a construction method for functions with minimal weights. Section 3 presents the main result: for any weight size s and any number of inputs n, there exists an n-input linear threshold function that requires weights of size S[f] = s. Section 4 presents some applications of the result of Section 3 and indicates future research directions. \n\n2 Construction of Minimal Threshold Functions \n\nThe present section defines the mathematical tools used to construct functions with minimal weights. \n\n2.1 Mathematical setting \n\nWe are interested in constructing functions for which the minimal weight is easily determined. Finding the minimal weight involves a search; we are therefore interested in finding functions with a constrained weight space. The following tools allow us to put constraints on W. \n\nDefinition 4 (Root Space of a Boolean Function) \nA vector v in {-1, 1}^n such that f(v) = f(-v) is called a root of f. We define the root space, R, as the set of all roots of f. \n\nDefinition 5 (Root Generator Matrix) \nFor a given weight vector w in W and a root v in R, the root generator matrix, G = (g_ij), is a (k x n)-matrix, with entries in {-1, 0, 1}, whose rows g are orthogonal to w and equal to v at all non-zero coordinates, namely, \n\n1. Gw = 0 \n2. g_ij = 0 or g_ij = v_j, for all i and j. 
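The two conditions of Definition 5 can be checked mechanically, and a row g of G turns a known root v into a new root r = v - 2g. The sketch below (our own transcription, using the weight vector and root of Example 2 below) verifies both conditions and generates a new root:

```python
# Verify the root-generator conditions of Definition 5 and generate a
# new root r = v - 2g. Data taken from Example 2 of the paper.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

w = [1, 1, 2, 4, 1, 1, 2, 4]
v = [1, 1, 1, 1, -1, -1, -1, -1]       # a root: w . v = 0
g = [1, 1, 0, 0, 0, 0, -1, 0]          # encodes w1 + w2 - w7 = 0

assert dot(w, v) == 0                  # v is a root of f
assert dot(w, g) == 0                  # condition 1: G w = 0
assert all(gi == 0 or gi == vi         # condition 2: g agrees with v
           for gi, vi in zip(g, v))    # at every non-zero coordinate

r = [vi - 2 * gi for vi, gi in zip(v, g)]
assert dot(w, r) == 0                  # so r is again a root
print(r)                               # [-1, -1, 1, 1, -1, -1, 1, -1]
```

Flipping the coordinates where g is non-zero is exactly what r = v - 2g does, since g matches v there; orthogonality of g to w then keeps w . r = 0.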
\n\nExample 2 (Root Generator Matrix) \nSuppose that we are given a linear threshold function specified by a weight vector w = (1, 1, 2, 4, 1, 1, 2, 4). By inspection we determine one root v = (1, 1, 1, 1, -1, -1, -1, -1). Notice that w1 + w2 - w7 = 0, which can be written as g . w = 0, where g = (1, 1, 0, 0, 0, 0, -1, 0) is a row of G. Set r = v - 2g. Since g is equal to v at all non-zero coordinates, r is in {-1, 1}^n. Also r . w = v . w - 2g . w = 0. We have generated a new root: r = (-1, -1, 1, 1, -1, -1, 1, -1). \n\nLemma 6 (Orthogonality of G and W) \nFor a given weight vector w in W and a root v in R, \n\nu G^T = 0 \n\nholds for any weight vector u in W. \n\nProof. For an arbitrary u in W and an arbitrary row g_i of G, let r = v - 2g_i. By definition of g_i, r is in {-1, 1}^n and r . w = 0. That implies f(r) = f(-r): r is a root of f. For any weight vector u in W, sgn(u . r) = sgn(-u . r), therefore u . r = u . (v - 2g_i) = 0, and finally, since v . u = 0, we get u . g_i = 0. \n\nLemma 7 (Minimality) \nFor a given weight vector w in W and a root v in R, if rank(G) = n - 1 (i.e. G has n - 1 independent rows) and |w_i| = 1 for some i, then w is the minimal weight vector. \n\nProof. From Lemma 6 any weight vector u satisfies u G^T = 0. rank(G) = n - 1 implies that dim(W) = 1, i.e. all possible weight vectors are integer multiples of each other. Since |w_i| = 1, all vectors are of the form u = kw, for k >= 1. Therefore w has the smallest size. \n\nWe complete Example 2 with an application of Lemma 7. \n\nExample 3 (Minimality) \nGiven w = (1, 1, 2, 4, 1, 1, 2, 4) and v = (1, 1, 1, 1, -1, -1, -1, -1) we can construct \n\nG = \n( 1 0 0 0 -1  0  0  0 ) \n( 0 1 0 0  0 -1  0  0 ) \n( 0 0 1 0  0  0 -1  0 ) \n( 0 0 0 1  0  0  0 -1 ) \n( 1 0 0 0  0 -1  0  0 ) \n( 1 1 0 0  0  0 -1  0 ) \n( 1 1 1 0  0  0  0 -1 ) \n\nIt is easy to verify that rank(G) = n - 1 = 7 and therefore, by Lemma 7, w is minimal and S[f] = 16. \n\n2.2 Construction of minimal weight vectors \n\nIn Example 3 we saw how, given a weight vector, one can show that it is minimal. In this section we present an example of a linear threshold function with minimal weight size for an arbitrary number of input variables. We would like to construct a weight vector and show that it is minimal. Let the number of inputs, n, be even. Let w consist of two identical blocks: (w1, w2, ..., w_{n/2}, w1, w2, ..., w_{n/2}). Clearly, v = (1, 1, ..., 1, -1, -1, ..., -1) is a root, and the n/2 rows g_i = e_i - e_{i+n/2} (a 1 in position i and a -1 in position i + n/2) form a corresponding generator matrix \n\nG = ( I | -I ), \n\nwhere I is the (n/2 x n/2) identity matrix. \n\n3 The Main Result \n\nThe following theorem states that given an integer s and a number of variables n there exists a function of n variables and minimal weight size s. \n\nTheorem 8 (Main Result) \nFor any pair (s, n) that satisfies \n\n1. n <= s <= 2^{n/2}, for n even (with the analogous bound for n odd) \n2. s even \n\nthere exists a linear threshold function of n variables, f, with minimal weight size S[f] = s. \n\nProof. Given a pair (s, n) that satisfies the above conditions we first construct a weight vector w that satisfies sum_{i=1}^n |w_i| = s, then show that it is the minimal weight vector of the function f(x) = sgn(w . x). The proof is shown only for n even. \n\nCONSTRUCTION. \n\n1. Define (a1, a2, ..., a_{n/2}) = (1, 1, ..., 1). 
\n\n2. If sum_{i=1}^{n/2} a_i < s/2 then increase by one the smallest a_i such that a_i < 2^{i-2} (in the case of a tie take the a_i with smallest index i). \n\n3. Repeat the previous step until sum_{i=1}^{n/2} a_i = s/2 or (a1, a2, ..., a_{n/2}) = (1, 1, 2, 4, ..., 2^{n/2 - 2}). \n\n4. Set w = (a1, a2, ..., a_{n/2}, a1, a2, ..., a_{n/2}). \n\nBecause we increase the size by one unit at a time, the algorithm will converge to the desired result for any integer s that satisfies n <= s <= 2^{n/2}. We have a construction for any valid (s, n) pair. Let us show that w is minimal. \n\nMINIMALITY. Given that w = (a1, a2, ..., a_{n/2}, a1, a2, ..., a_{n/2}) we find a root v = (1, 1, ..., 1, -1, -1, ..., -1) and n/2 rows of the generator matrix G corresponding to the equations w_i = w_{i+n/2}. To form additional rows note that the first k of the a_i are powers of two (where k depends on s and n). Those can be written as a_i = sum_{j=1}^{i-1} a_j and generate k - 1 rows. Finally, note that all other a_i, i > k, are no larger than 2^{k-1} and hence can be written as a binary expansion a_i = sum_{j=1}^{i-1} alpha_ij a_j, where alpha_ij is in {0, 1}. There are n/2 - k such weights. G thus has a total of n - 1 independent rows, so rank(G) = n - 1, and w1 = 1; therefore by Lemma 7, w is minimal and S[f] = s. \n\nExample 4 (A Function of 10 Variables and Size S[f] = 26) \nWe start with a = (1, 1, 1, 1, 1). We iterate: (1, 1, 2, 1, 1), (1, 1, 2, 2, 1), (1, 1, 2, 2, 2), (1, 1, 2, 3, 2), (1, 1, 2, 3, 3), (1, 1, 2, 4, 3), (1, 1, 2, 4, 4), and finally (1, 1, 2, 4, 5). The construction algorithm converges to a = (1, 1, 2, 4, 5). We claim that w = (a, a) = (1, 1, 2, 4, 5, 1, 1, 2, 4, 5) is minimal. 
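As a sanity check, the construction steps above can be transcribed directly into code (an informal sketch; the function and variable names are ours, not the paper's, and the valid-range conditions of Theorem 8 are assumed rather than checked):

```python
# Sketch of the paper's construction for n even: build the half-block a
# with sum(a) = s/2; the weight vector is then w = (a, a).
# Assumes s even and n <= s <= 2**(n//2), as in Theorem 8.

def construct_half(n, s):
    m = n // 2
    a = [1] * m
    while sum(a) < s // 2:
        # Indices still below their cap a_i < 2**(i-2) (1-based i;
        # 0-based index j has cap 2**(j-1), so a[0] and a[1] stay 1).
        cand = [j for j in range(m) if a[j] < 2 ** (j - 1)]
        # Increase the smallest entry, breaking ties by smallest index.
        j = min(cand, key=lambda j: (a[j], j))
        a[j] += 1
    return a

a = construct_half(10, 26)
print(a)                # [1, 1, 2, 4, 5], the half-block of Example 4
print(sum(a + a))       # 26
```

Running it reproduces the iteration of Example 4, ending at the half-block (1, 1, 2, 4, 5) of total weight size 26.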
Indeed, v = (1,1,1,1,1, -1, -1, -1, -1, -1) and \n\n1 0 0 0 0 -1 \n0 1 0 0 0 \n0 0 1 0 0 \n0 0 0 1 0 \n0 0 0 0 1 \n1 0 0 0 0 \n1 1 0 0 0 \n1 1 1 0 0 \n1 0 0 1 0 \n\n0 \n0 \n0 -1 \n0 \n0 -1 \n0 \n0 \n0 \n0 \n0 \n0 -1 \n0 \n0 \n0 \n0 \n0 \n0 \n\n0 \n0 \n0 \n0 -1 \n0 \n0 \n-1 \n\n0 \n0 \n0 \n0 \n0 -1 \n0 \n0 \n0 \n0 \n0 -1 \n0 \n0 -1 \n0 \n\nG= \n\nis a matrix of rank 9. \n\nExample 5 (Functions with Polynomial Size) \nThis example shows an application of Theorem 8. We define fred) as the set of \nlinear threshold functions for which S[I} ~ n d \u2022 The Theorem states that for any \neven n there exists a function 1 of n variables and minimum weight S[I] = nd \u2022 The \nimplication is that for all d, LT \n\nis a proper subset of LT \n\n-- (d- I) \n\n-- (d) \n\n4 Conclusions \n\nWe have shown that for any reasonable pair of integers (n, s), where s is even, there \nexists a linear threshold function of n variables with minimal weight size S[J} = s. \nWe have developed a novel technique for constructing linear threshold functions \nwith minimal weights that is based on the existence of root vectors. An interesting \napplication of our method is the computation of a lower bound on the number \nof linear threshold functions [Smith 66}. In addition, our technique can help in \nstudying the trade-otIs between a number of important parameters associated with \n\n\f252 \n\nV. BOHOSSIAN, 1. BRUCK \n\nlinear threshold (neural) circuits, including, the munber of elements, the number of \nlayers, the fan-in, fan-out and the size of the weights. 
\n\nAcknowledgements \n\nThis work was supported in part by the NSF Young Investigator Award CCR-9457811, by the Sloan Research Fellowship, by a grant from the IBM Almaden Research Center, San Jose, California, by a grant from the AT&T Foundation, by the Center for Neuromorphic Systems Engineering as a part of the National Science Foundation Engineering Research Center Program, and by the California Trade and Commerce Agency, Office of Strategic Technology. \n\nReferences \n\n[Amaldi 93] E. Amaldi and V. Kann. The complexity and approximability of finding maximum feasible subsystems of linear relations. Ecole Polytechnique Federale de Lausanne Technical Report, ORWP 93/11, August 1993. \n\n[Goldmann 92] M. Goldmann, J. Hastad, and A. Razborov. Majority gates vs. general weighted threshold gates. Computational Complexity, (2):277-300, 1992. \n\n[Goldman 93] M. Goldmann and M. Karpinski. Simulating threshold circuits by majority circuits. In Proc. 25th ACM STOC, pages 551-560, 1993. \n\n[Hastad 94] J. Hastad. On the size of weights for threshold gates. SIAM J. Disc. Math., 7:484-492, 1994. \n\n[Hopfield 82] J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proc. of the USA National Academy of Sciences, 79:2554-2558, 1982. \n\n[Muroga 71] M. Muroga. Threshold Logic and its Applications. Wiley-Interscience, 1971. \n\n[Myhill 61] J. Myhill and W. H. Kautz. On the size of weights required for linear-input switching functions. IRE Trans. Electronic Computers, (EC10):288-290, 1961. \n\n[Raghavan 88] P. Raghavan. Learning in threshold networks: a computational model and applications. Technical Report RC 13859, IBM Research, July 1988. \n\n[Rumelhart 82] D. Rumelhart and J. McClelland. Parallel distributed processing: Explorations in the microstructure of cognition. MIT Press, 1982. \n\n[Shawe-Taylor 92] J. S. Shawe-Taylor, M. H. G. Anthony, and W. Kern. Classes of feedforward neural networks and their circuit complexity. Neural Networks, 5:971-977, 1992. \n\n[Siu 91] K. Siu and J. Bruck. On the power of threshold circuits with small weights. SIAM J. Disc. Math., 4(3):423-435, August 1991. \n\n[Smith 66] D. R. Smith. Bounds on the number of threshold functions. IEEE Transactions on Electronic Computers, June 1966. \n\n[Willis 63] D. G. Willis. Minimum weights for threshold switches. In Switching Theory in Space Techniques. Stanford University Press, Stanford, Calif., 1963. \n", "award": [], "sourceid": 1066, "authors": [{"given_name": "Vasken", "family_name": "Bohossian", "institution": null}, {"given_name": "Jehoshua", "family_name": "Bruck", "institution": null}]}