{"title": "Optimal Depth Neural Networks for Multiplication and Related Problems", "book": "Advances in Neural Information Processing Systems", "page_first": 59, "page_last": 64, "abstract": null, "full_text": "Optimal Depth Neural Networks for Multiplication \n\nand Related Problems \n\nDept. of Electrical & Compo Engineering \n\nUniversity of California, Irvine \n\nKai-Yeung Siu \n\nIrvine, CA 92717 \n\nVwani Roychowdhury \n\nSchool of Electrical Engineering \n\nPurdue University \n\nWest Lafayette, IN 47907 \n\nAbstract \n\nAn artificial neural network (ANN) is commonly modeled by a threshold \ncircuit, a network of interconnected processing units called linear threshold \ngates. The depth of a network represents the number of unit delays or the \ntime for parallel computation. The SIze of a circuit is the number of gates \nand measures the amount of hardware . It was known that traditional logic \ncircuits consisting of only unbounded fan-in AND, OR, NOT gates would \nrequire at least O(log n/log log n) depth to compute common arithmetic \nfunctions such as the product or the quotient of two n-bit numbers, unless \nwe allow the size (and fan-in) to increase exponentially (in n). We show in \nthis paper that ANNs can be much more powerful than traditional logic \ncircuits. In particular, we prove that that iterated addition can be com(cid:173)\nputed by depth-2 ANN, and multiplication and division can be computed \nby depth-3 ANNs with polynomial size and polynomially bounded integer \nweights, respectively. Moreover, it follows from known lower bound re(cid:173)\nsults that these ANNs are optimal in depth. We also indicate that these \ntechniques can be applied to construct polynomial-size depth-3 ANN for \npowering, and depth-4 ANN for mUltiple product. \n\n1 \n\nIntroduction \n\nRecent interest in the application of artificial neural networks [10, 11] has spurred \nresearch interest in the theoretical study of such networks. 
In most models of neural networks, the basic processing unit is a Boolean gate that computes a linear threshold function, or an analog element that computes a sigmoidal function. Artificial neural networks can be viewed as circuits of these processing units that are massively interconnected. \n\nWhile neural networks have found wide application in many areas, the behavior and the limitations of these networks are far from being understood. One common model of a neural network is a threshold circuit. Incidentally, the study of threshold circuits, motivated by other complexity-theoretic issues, has also gained much interest in the area of computer science. Threshold circuits are Boolean circuits in which each gate computes a linear threshold function, whereas in the classical model of unbounded fan-in Boolean circuits only AND, OR, NOT gates are allowed. A Boolean circuit is usually arranged in layers such that all gates in the same layer are computed concurrently and the circuit is computed layer by layer in increasing depth order. We define the depth as the number of layers in the circuit. Thus each layer represents a unit delay, and the depth represents the overall delay in the computation of the circuit. \n\n2 Related Work \n\nTheoretical computer scientists have used unbounded fan-in Boolean circuits as a model to understand fundamental issues of parallel computation. More precisely, this computational model should be referred to as unbounded fan-in parallelism, since the number of inputs to each gate in the Boolean circuit is not bounded by a constant. The theoretical study of unbounded fan-in parallelism may give us insights into devising faster algorithms for various computational problems than would be possible with bounded fan-in parallelism. 
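The linear threshold gates described above admit a one-line description: output 1 exactly when a weighted sum of the Boolean inputs reaches a threshold. A minimal sketch (the particular weights and thresholds below are illustrative, not taken from the paper):

```python
def threshold_gate(x, w, t):
    """Linear threshold gate: output 1 iff sum_i w[i]*x[i] >= t."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= t else 0

# A majority gate on 3 inputs is the special case w = (1, 1, 1), t = 2.
majority3 = lambda x: threshold_gate(x, (1, 1, 1), 2)
```

A threshold circuit is then simply a layered circuit of such gates; its depth counts the layers, as defined above.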
In fact, any nondegenerate Boolean function of n variables requires depth at least \\Omega(\\log n) to compute in a bounded fan-in circuit. On the other hand, in some practical situations (for example, large fan-in circuits such as programmable logic arrays (PLAs), or multiple processors simultaneously accessing a shared bus), unbounded fan-in parallelism seems to be a natural model. For example, a PLA can be considered as a depth-2 AND/OR circuit. \n\nIn the Boolean circuit model, the amount of resources is usually measured by the number of gates, and is considered 'reasonable' as long as it is bounded by a polynomial (as opposed to exponential) in the number of inputs. For example, a Boolean circuit for computing the sum of two n-bit numbers with O(n^3) gates is 'reasonable', though circuit designers might consider the size of the circuit impractical for moderately large n. One of the most important theoretical issues in parallel computation is the following: given that the number of gates in the Boolean circuit is bounded by a polynomial in the size of the inputs, what is the minimum depth (i.e., number of layers) that is needed to compute certain functions? \n\nA first step toward answering this important question was taken by Furst et al. [4] and independently by Ajtai [2]. It follows from their results that for many basic functions, such as the parity and the majority of n Boolean variables, or the multiplication of two n-bit numbers, any constant-depth (i.e., independent of n) classical Boolean circuit of unbounded fan-in AND/OR gates computing these functions must have more than a polynomial (in n) number of gates. This lower bound on the size was subsequently improved by Yao [18] and Hastad [7]; it was proved that indeed an exponential number of AND/OR gates is needed. 
So functions such as parity and majority are computationally 'hard' with respect to constant-depth, polynomial-size classical Boolean circuits. Another way of interpreting these results is that circuits of AND/OR gates computing these 'hard' functions that use a polynomial amount of chip area must have unbounded delay (i.e., delay that increases with n). In fact, the lower bound results imply that the minimum possible delay for multipliers (with a polynomial number of AND/OR gates) is \\Omega(\\log n / \\log \\log n). These results also give a theoretical justification for why it is impossible for circuit designers to implement fast parity circuits or multipliers in a small chip area using AND, OR gates as the basic building blocks. \n\nOne of the 'hard' functions mentioned above is the majority function, a special case of a threshold function in which the weights or parameters are restricted. A natural extension is to study Boolean circuits that contain majority gates. This type of Boolean circuit is called a threshold circuit and is believed to capture some aspects of the computation in our brain [12]. In the rest of the paper, the term 'neural networks' refers to the threshold circuit model. \n\nWith the addition of majority gates, the resulting Boolean circuit model seems much more powerful than the classical one. Indeed, it was first shown by Muroga [13] three decades ago that any symmetric Boolean function (e.g. parity) can be computed by a two-layer neural network with (n + 1) gates. Recently, Chandra et al. [3] showed that multiplication of two n-bit numbers and sorting of n n-bit numbers can be computed by neural networks with 'constant' depth and polynomial size. These 'constants' have been significantly reduced by Siu and Bruck [14, 15] to 4 in both cases, whereas a lower bound of depth 3 was proved by Hajnal et al. [6] in the case of multiplication. 
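Muroga's two-layer construction mentioned above can be made concrete for parity: a hidden layer of n threshold gates T_k = [sum of inputs >= k], plus one output gate with alternating weights, gives parity with n + 1 gates in depth 2. A small sketch for intuition (the alternating-weight output gate is one standard way to realize the construction, not necessarily the form used in [13]):

```python
def parity_depth2(x):
    """Depth-2 threshold circuit for parity with n + 1 gates.

    Hidden layer: T_k = 1 iff sum(x) >= k, for k = 1..n.
    Output gate: 1 iff sum_k (-1)**(k+1) * T_k >= 1.
    If sum(x) = s, the alternating sum of the first s ones
    telescopes to s mod 2, so the output gate fires iff s is odd.
    """
    n = len(x)
    hidden = [1 if sum(x) >= k else 0 for k in range(1, n + 1)]
    weighted = sum((-1) ** (k + 1) * t for k, t in enumerate(hidden, 1))
    return 1 if weighted >= 1 else 0
```

Note that the output gate here uses only weights of magnitude 1, so the whole circuit uses small weights.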
It is now known [8] that the size of the depth-4 neural networks for multiplication can be reduced to O(n^2). However, the existence of depth-3, polynomial-size neural networks for multiplication had been left as an open problem [6, 5, 15] since the lower bound result in [6]. In [16], some depth-efficient neural networks were constructed for division and related arithmetic problems; however, the networks in [16] do not have optimal depth. \n\nOur main contribution in this paper is to show that small constant-depth neural networks for multiplication, division and related problems can be constructed. For problems such as iterated addition, multiplication, and division, the neural networks constructed can be shown to have optimal depth. These results have the following practical significance: suppose we can use analog devices to build threshold gates with a cost (in terms of delay and chip area) comparable to that of AND, OR logic gates; then we can compute many basic functions much faster than with traditional circuits. Clearly, the particular weighting of depth, fan-in, and size that gives a realistic measure of a network's cost and speed depends on the technology used to build it. One case where circuit depth would seem to be the most important parameter is when the circuit is implemented using optical devices. We refer those who are interested in the optical implementation of neural networks to [1]. \n\nDue to space limitations, we shall only state some of the important results; further results and detailed proofs will appear in the journal version of this paper [17]. \n\n3 Main Results \n\nDefinition 1 Given n n-bit integers Z_i = \\sum_{j=0}^{n-1} z_{i,j} 2^j, i = 1, ..., n, with z_{i,j} \\in \\{0, 1\\}, we define iterated addition to be the problem of computing the (n + \\log n)-bit sum \\sum_{i=1}^{n} Z_i of the n integers. 
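The output length in Definition 1 can be sanity-checked numerically: the sum of n n-bit integers is less than n * 2^n, so it always fits in n + ceil(log2 n) bits. A small illustrative sketch of the problem instance (this only checks the bit-length claim; it is not the circuit construction of Theorem 1):

```python
import math

def iterated_addition(zs):
    """Sum of n n-bit integers (n = len(zs), each 0 <= Z_i < 2**n).

    The result always fits in n + ceil(log2 n) bits, since it is
    less than n * 2**n <= 2**(n + ceil(log2 n)).
    """
    n = len(zs)
    assert all(0 <= z < 2 ** n for z in zs)
    total = sum(zs)
    assert total < 2 ** (n + math.ceil(math.log2(n)))
    return total
```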
\nDefinition 2 Given two n-bit integers x = \\sum_{j=0}^{n-1} x_j 2^j and y = \\sum_{j=0}^{n-1} y_j 2^j, we define multiplication to be the problem of computing the (2n)-bit product of x and y. \n\nUsing the notation of [15], let us denote by \\widehat{LT}_d the class of depth-d polynomial-size neural networks in which the (integer) weights are polynomially bounded, and by LT_d the corresponding class in which the weights are unrestricted. It is easy to see that if iterated addition can be computed in \\widehat{LT}_2, then multiplication can be computed in \\widehat{LT}_3. We first prove the result on iterated addition. Our result hinges on a recent striking result of Goldmann, Hastad and Razborov [5]. The key observation is that iterated addition can be computed as a sum of polynomially many linear threshold (LT_1) functions (with exponential weights). Let us first state the result of Goldmann, Hastad and Razborov [5]. \n\nLemma 1 [5] Let \\widetilde{LT}_d denote the class of depth-d polynomial-size neural networks in which the weights at the output gate are polynomially bounded integers (with no restriction on the weights of the other gates). Then \\widetilde{LT}_d = \\widehat{LT}_d for any fixed integer d \\geq 1. \n\nThe following lemma is a generalization of the result in [13]. Informally, the result says that if a function is 1 when a weighted sum (with possibly exponential weights) of its inputs lies in one of polynomially many intervals, and is 0 otherwise, then the function can be computed as a sum of polynomially many LT_1 functions. \n\nLemma 2 Let S = \\sum_{i=1}^{n} w_i x_i and let f(X) be a function such that f = 1 if S \\in [l_i, u_i] for some i = 1, ..., N, and f = 0 otherwise, where N is polynomially bounded. Then f can be computed as a sum of polynomially many LT_1 functions, and thus f \\in \\widetilde{LT}_2. \n\nCombining the above two lemmas yields a depth-2 neural network for iterated addition. \n\nTheorem 1 Iterated addition is in \\widehat{LT}_2. \n\nIt is also easy to see that iterated addition cannot be computed in LT_1. 
Simply observe that the first bit of the sum is the parity function, which does not belong to LT_1. Thus the above neural network for iterated addition has the minimum possible depth. \n\nTheorem 2 Multiplication of two n-bit integers can be computed in \\widehat{LT}_3. \n\nIt follows from the results in [6] that the depth-3 neural network for multiplication stated in the above theorem has optimal depth. \n\nWe can further apply the results in [5] to construct small-depth neural networks for division, powering and multiple product. Let us give formal definitions of these problems. \n\nDefinition 3 Let x be an input n-bit integer \\geq 0. We define powering to be the problem of computing the n^2-bit representation of x^n. \n\nDefinition 4 Given n n-bit integers Z_i, i = 1, ..., n, we define multiple product to be the problem of computing the n^2-bit representation of \\prod_{i=1}^{n} Z_i. \n\nSuppose we want to compute the quotient of two integers. Some quotients in binary representation might require infinitely many bits; a circuit, however, can only compute the most significant bits of the quotient. If a number has both a finite and an infinite binary representation (for example, 0.1 = 0.0111...), we shall always express the number in its finite binary representation. We are interested in computing the truncated quotient, defined below: \n\nDefinition 5 Let X and Y \\geq 1 be two input n-bit integers, and let X/Y = \\sum_{i=-\\infty}^{n-1} z_i 2^i be the quotient of X divided by Y. We define DIV_k(X/Y) to be X/Y truncated to the (n + k)-bit number, i.e. \n\nDIV_k(X/Y) = \\sum_{i=-k}^{n-1} z_i 2^i. \n\nIn particular, DIV_0(X/Y) is \\lfloor X/Y \\rfloor, the greatest integer \\leq X/Y. \n\nTheorem 3 \n\n1. Powering can be computed in \\widehat{LT}_3. \n2. DIV_k(X/Y) can be computed in \\widehat{LT}_3. \n3. Multiple product can be computed in \\widehat{LT}_4. \n\nIt can be shown from the lower-bound results in [9] that the neural networks for division are optimal in depth. \n\nReferences \n\n[1] Y. S. Abu-Mostafa and D. Psaltis. 
Optical Neural Computers. Scientific American, 256(3):88-95, 1987. \n\n[2] M. Ajtai. \\Sigma_1^1-formulae on finite structures. Annals of Pure and Applied Logic, 24:1-48, 1983. \n\n[3] A. K. Chandra, L. Stockmeyer, and U. Vishkin. Constant depth reducibility. SIAM J. Comput., 13:423-439, 1984. \n\n[4] M. Furst, J. B. Saxe, and M. Sipser. Parity, circuits and the polynomial-time hierarchy. IEEE Symp. Found. Comp. Sci., 22:260-270, 1981. \n\n[5] M. Goldmann, J. Hastad, and A. Razborov. Majority gates vs. general weighted threshold gates. Preprint, 1991. \n\n[6] A. Hajnal, W. Maass, P. Pudlak, M. Szegedy, and G. Turan. Threshold circuits of bounded depth. IEEE Symp. Found. Comp. Sci., 28:99-110, 1987. \n\n[7] J. Hastad and M. Goldmann. On the power of small-depth threshold circuits. In Proceedings of the 31st IEEE FOCS, pp. 610-618, 1990. \n\n[8] T. Hofmeister, W. Hohberg and S. Kohling. Some notes on threshold circuits and multiplication in depth 4. Information Processing Letters, 39:219-225, 1991. \n\n[9] T. Hofmeister and P. Pudlak. A proof that division is not in TC^0_2. Forschungsbericht Nr. 447, Universitat Dortmund, 1992. \n\n[10] J. J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79:2554-2558, 1982. \n\n[11] D. E. Rumelhart, J. L. McClelland and the PDP Research Group. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1. MIT Press, 1986. \n\n[12] W. S. McCulloch and W. Pitts. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5:115-133, 1943. \n\n[13] S. Muroga. The principle of majority decision logic elements and the complexity of their circuits. Intl. Conf. on Information Processing, Paris, France, June 1959. \n\n[14] K.-Y. Siu and J. Bruck. Neural computation of arithmetic functions. Proc. 
IEEE, 78(10):1669-1675, October 1990. Special Issue on Neural Networks. \n\n[15] K.-Y. Siu and J. Bruck. On the power of threshold circuits with small weights. SIAM J. Discrete Math., 4(3):423-435, August 1991. \n\n[16] K.-Y. Siu, J. Bruck, T. Kailath, and T. Hofmeister. Depth-efficient neural networks for division and related problems. To appear in IEEE Trans. Information Theory, 1993. \n\n[17] K.-Y. Siu and V. Roychowdhury. On optimal depth threshold circuits for multiplication and related problems. To appear in SIAM J. Discrete Math. \n\n[18] A. Yao. Separating the polynomial-time hierarchy by oracles. IEEE Symp. Found. Comp. Sci., pages 1-10, 1985.", "award": [], "sourceid": 657, "authors": [{"given_name": "Kai-Yeung", "family_name": "Siu", "institution": null}, {"given_name": "Vwani", "family_name": "Roychowdhury", "institution": null}]}