{"title": "Neural Computation with Winner-Take-All as the Only Nonlinear Operation", "book": "Advances in Neural Information Processing Systems", "page_first": 293, "page_last": 299, "abstract": null, "full_text": "Neural Computation with Winner-Take-All as the only Nonlinear Operation \n\nWolfgang Maass \n\nInstitute for Theoretical Computer Science \n\nTechnische Universit\u00e4t Graz \n\nA-8010 Graz, Austria \n\nemail: maass@igi.tu-graz.ac.at \n\nhttp://www.cis.tu-graz.ac.at/igi/maass \n\nAbstract \n\nEverybody \"knows\" that neural networks need more than a single layer \nof nonlinear units to compute interesting functions. We show that this is \nfalse if one employs winner-take-all as nonlinear unit: \n\n\u2022 Any boolean function can be computed by a single k-winner-take-all \nunit applied to weighted sums of the input variables. \n\n\u2022 Any continuous function can be approximated arbitrarily well by \na single soft winner-take-all unit applied to weighted sums of the \ninput variables. \n\n\u2022 Only positive weights are needed in these (linear) weighted sums. \nThis may be of interest from the point of view of neurophysiology, \nsince only 15% of the synapses in the cortex are inhibitory. In addition, \nit is widely believed that there are special microcircuits in the \ncortex that compute winner-take-all. 
\n\n\u2022 Our results support the view that winner-take-all is a very useful \nbasic computational unit in Neural VLSI: \n\no it is well known that winner-take-all of n input variables can \nbe computed very efficiently with 2n transistors (and a total \nwire length and area that is linear in n) in analog VLSI \n[Lazzaro et al., 1989] \n\no we show that winner-take-all is not just useful for special purpose \ncomputations, but may serve as the only nonlinear unit for \nneural circuits with universal computational power \n\no we show that any multi-layer perceptron needs quadratically many \ngates in n to compute winner-take-all for n input variables, \nhence winner-take-all provides a substantially more powerful \ncomputational unit than a perceptron (at about the same cost \nof implementation in analog VLSI). \n\nComplete proofs and further details on these results can be found in \n[Maass, 2000]. \n\n1 Introduction \n\nComputational models that involve competitive stages have so far been neglected in computational \ncomplexity theory, although they are widely used in computational brain models, \nartificial neural networks, and analog VLSI. The circuit of [Lazzaro et al., 1989] computes \nan approximate version of winner-take-all on n inputs with just 2n transistors and wires \nof length O(n), with lateral inhibition implemented by adding currents on a single wire of \nlength O(n). Numerous other efficient implementations of winner-take-all in analog VLSI \nhave subsequently been produced. Among them are circuits based on silicon spiking neurons \n([Meador and Hylander, 1994], [Indiveri, 1999]) and circuits that emulate attention in \nartificial sensory processing ([Horiuchi et al., 1997], [Indiveri, 1999]). Preceding analytical \nresults on winner-take-all circuits can be found in [Grossberg, 1973] and [Brown, 1991]. 
\n\nWe will analyze in section 4 the computational power of the most basic competitive computational \noperation: winner-take-all (= 1-WTA$_n$). In section 2 we will discuss the somewhat \nmore complex operation $k$-winner-take-all ($k$-WTA$_n$), which has also been implemented \nin analog VLSI [Urahama and Nagao, 1995]. Section 3 is devoted to soft winner-take-all, \nwhich has been implemented by [Indiveri, 1999] in analog VLSI via temporal coding of \nthe output. \n\nOur results show that winner-take-all is a surprisingly powerful computational module \nin comparison with threshold gates (= McCulloch-Pitts neurons) and sigmoidal gates. \nOur theoretical analysis also provides answers to two basic questions that have been \nraised by neurophysiologists in view of the well-known asymmetry between excitatory \nand inhibitory connections in cortical circuits: how much computational power of neural \nnetworks is lost if only positive weights are employed in weighted linear sums, and how \nmuch learning capability is lost if only the positive weights are subject to plasticity. \n\n2 Restructuring Neural Circuits with Digital Output \n\nWe investigate in this section the computational power of a $k$-winner-take-all gate computing \nthe function \n\n$k\\text{-WTA}_n : \\mathbb{R}^n \\rightarrow \\{0,1\\}^n$, \n\nwhich maps inputs $x_1, \\ldots, x_n \\in \\mathbb{R}$ to outputs $b_1, \\ldots, b_n \\in \\{0,1\\}$ \nwith \n$b_i = 1 \\Leftrightarrow x_i$ is among the $k$ largest of the inputs $x_1, \\ldots, x_n$ \n[precisely: $b_i = 1 \\Leftrightarrow x_j > x_i$ holds for at most $k - 1$ indices $j$]. \n\nTheorem 1. Any two-layer feedforward circuit $C$ (with $m$ analog or binary input \nvariables and one binary output variable) consisting of threshold gates (= perceptrons) \ncan be simulated by a circuit $W$ consisting of a single $k$-winner-take-all gate \n$k$-WTA$_n$ $^1$ applied to weighted sums of the input variables with positive weights. This holds \nfor all digital inputs, and for analog inputs except for some set $S \\subseteq \\mathbb{R}^m$ of inputs that has \nmeasure 0. 
\n\nIn particular, any boolean function \n\n$f : \\{0,1\\}^m \\rightarrow \\{0,1\\}$ \n\ncan be computed by a single $k$-winner-take-all gate applied to positive weighted sums of \nthe input bits. \n\nRemarks \n\n1. If $C$ has polynomial size and integer weights, whose size is bounded by a polynomial \nin $m$, then the number of linear gates S in $W$ can be bounded by a polynomial \nin $m$, and all weights in the simulating circuit $W$ are natural numbers whose size \nis bounded by a polynomial in $m$. \n\n2. The exception set of measure 0 in this result is a union of finitely many hyperplanes \nin $\\mathbb{R}^m$. One can easily show that this exception set $S$ of measure 0 in \nTheorem 1 is necessary. \n\n3. Any circuit that has the structure of $W$ can be converted back into a 2-layer threshold \ncircuit, with a number of gates that is quadratic in the number of weighted \nsums (= linear gates) in $W$. This relies on the construction in section 4. \n\nProof of Theorem 1: Since the outputs of the gates on the hidden layer of $C$ are from \n$\\{0,1\\}$, we can assume without loss of generality that the weights $\\alpha_1, \\ldots, \\alpha_n$ of the output \ngate $G$ of $C$ are from $\\{-1, 1\\}$ (see for example [Siu et al., 1995] for details; one first \nobserves that it suffices to use integer weights for threshold gates with binary inputs, one \ncan then normalize these weights to values in $\\{-1, 1\\}$ by duplicating gates on the hidden \nlayer of $C$). Thus for any circuit input $\\mathbf{z} \\in \\mathbb{R}^m$ we have $C(\\mathbf{z}) = 1 \\Leftrightarrow \\sum_{j=1}^{n} \\alpha_j G_j(\\mathbf{z}) \\geq \\Theta$, \nwhere $G_1, \\ldots, G_n$ are the threshold gates on the hidden layer of $C$, $\\alpha_1, \\ldots, \\alpha_n$ are from \n$\\{-1, 1\\}$, and $\\Theta$ is the threshold of the output gate $G$. In order to eliminate the negative \nweights in $G$ we replace each gate $G_j$ for which $\\alpha_j = -1$ by another threshold gate $\\hat{G}_j$ so \nthat $\\hat{G}_j(\\mathbf{z}) = 1 - G_j(\\mathbf{z})$ for all $\\mathbf{z} \\in \\mathbb{R}^m$ except on some hyperplane.$^2$ We set $\\hat{G}_j := G_j$ \nfor all $j \\in \\{1, \\ldots, n\\}$ with $\\alpha_j = 1$. 
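Then we have for all $\\mathbf{z} \\in \\mathbb{R}^m$, except for $\\mathbf{z}$ from some \nexception set $S$ consisting of up to $n$ hyperplanes, \n\n$\\sum_{j=1}^{n} \\alpha_j G_j(\\mathbf{z}) = \\sum_{j=1}^{n} \\hat{G}_j(\\mathbf{z}) - |\\{j \\in \\{1, \\ldots, n\\} : \\alpha_j = -1\\}|$. \n\nHence $C(\\mathbf{z}) = 1 \\Leftrightarrow \\sum_{j=1}^{n} \\hat{G}_j(\\mathbf{z}) \\geq k$ for all $\\mathbf{z} \\in \\mathbb{R}^m - S$, for some suitable $k \\in \\mathbb{N}$. \n\nLet $w_1^j, \\ldots, w_m^j \\in \\mathbb{R}$ be the weights and $\\Theta^j \\in \\mathbb{R}$ be the threshold of gate $\\hat{G}_j$, $j = 1, \\ldots, n$. \n\n$^1$ of which we only use its last output bit \n$^2$ We exploit here that $\\lnot (\\sum_{i=1}^{m} w_i z_i \\geq \\Theta) \\Leftrightarrow \\sum_{i=1}^{m} (-w_i) z_i > -\\Theta$ for arbitrary $w_i, z_i, \\Theta \\in \\mathbb{R}$. \n\n[Figure: the circuit $C$ with inputs $z_1, \\ldots, z_m$ and output $b$, where $G_1, \\ldots, G_n$ are arbitrary threshold gates and $G$ is a threshold gate with weights from $\\{-1, 1\\}$; and the simulating circuit $W$ with the same inputs and output $b$, where $S_1, \\ldots, S_{n+1}$ are linear gates (with positive weights only, which are sums of absolute values of weights from the gates $G_1, \\ldots, G_n$).] \n\nThus we have $\\hat{G}_j(\\mathbf{z}) = 1 \\Leftrightarrow \\sum_{i : w_i^j > 0} |w_i^j| z_i - \\sum_{i : w_i^j < 0} |w_i^j| z_i \\geq \\Theta^j$. Hence with \n\n$S_j := \\sum_{i : w_i^j < 0} |w_i^j| z_i + \\Theta^j + \\sum_{l \\neq j} \\sum_{i : w_i^l > 0} |w_i^l| z_i$ for $j = 1, \\ldots, n$ \n\nand \n\n$S_{n+1} := \\sum_{j=1}^{n} \\sum_{i : w_i^j > 0} |w_i^j| z_i$ \n\nwe have for every $j \\in \\{1, \\ldots, n\\}$ and every $\\mathbf{z} \\in \\mathbb{R}^m$: \n\n$S_{n+1} \\geq S_j \\Leftrightarrow \\sum_{i : w_i^j > 0} |w_i^j| z_i - \\sum_{i : w_i^j < 0} |w_i^j| z_i \\geq \\Theta^j \\Leftrightarrow \\hat{G}_j(\\mathbf{z}) = 1$. \n\nThis implies that the last output $b_{n+1}$ of the gate $(n-k+1)$-WTA$_{n+1}$ applied to $S_1, \\ldots, S_{n+1}$ satisfies \n\n$b_{n+1} = 1$ \n$\\Leftrightarrow |\\{j \\in \\{1, \\ldots, n+1\\} : S_j > S_{n+1}\\}| \\leq n - k$ \n$\\Leftrightarrow |\\{j \\in \\{1, \\ldots, n+1\\} : S_{n+1} \\geq S_j\\}| \\geq k + 1$ \n$\\Leftrightarrow |\\{j \\in \\{1, \\ldots, n\\} : S_{n+1} \\geq S_j\\}| \\geq k$ \n$\\Leftrightarrow \\sum_{j=1}^{n} \\hat{G}_j(\\mathbf{z}) \\geq k$ \n$\\Leftrightarrow C(\\mathbf{z}) = 1$. \n\nNote that all the coefficients in the sums $S_1, \\ldots, S_{n+1}$ are positive. \n\n\u25a0 \n\n3 Restructuring Neural Circuits with Analog Output \n\nIn order to approximate arbitrary continuous functions with values in $[0,1]$ by circuits that \nhave a structure similar to those in the preceding section, we consider here a variation of a \nwinner-take-all gate that outputs analog numbers between 0 and 1, whose values depend on \nthe rank of the corresponding input in the linear order of all the $n$ input numbers. 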
One may \nargue that such a gate is no longer a \"winner-take-all\" gate, but in agreement with common \nterminology we refer to it as a soft winner-take-all gate. Such a gate computes a function \nfrom $\\mathbb{R}^n$ into $[0,1]^n$, mapping inputs $x_1, \\ldots, x_n \\in \\mathbb{R}$ to outputs $r_1, \\ldots, r_n \\in [0,1]$, \nwhose $i$th output $r_i \\in [0,1]$ is roughly proportional to the rank of $x_i$ among the numbers \n$x_1, \\ldots, x_n$. More precisely: for some parameter $T \\in \\mathbb{N}$ we set \n\n$r_i = \\frac{|\\{j \\in \\{1, \\ldots, n\\} : x_i \\geq x_j\\}| - \\frac{n}{2}}{T}$, \n\nrounded to 0 or 1 if this value is outside $[0,1]$. Hence this gate focuses on those \ninputs $x_i$ whose rank among the $n$ input numbers $x_1, \\ldots, x_n$ belongs to the set \n$\\{\\frac{n}{2}, \\frac{n}{2} + 1, \\ldots, \\min\\{n, T + \\frac{n}{2}\\}\\}$. These ranks are linearly scaled into $[0,1]$.$^3$ \n\nTheorem 2. Circuits consisting of a single soft winner-take-all gate (of which we only use \nits first output $r_1$) applied to positive weighted sums of the input variables are universal \napproximators for arbitrary continuous functions from $\\mathbb{R}^m$ into $[0,1]$. \n\u25a0 \n\n$^3$ It is shown in [Maass, 2000] that actually any continuous monotone scaling into $[0,1]$ can be \nused instead. \n\nA circuit of the type considered in Theorem 2 (with a soft winner-take-all gate applied to \n$n$ positive weighted sums $S_1, \\ldots, S_n$) has a very simple geometrical interpretation: Over \neach point $\\mathbf{z}$ of the input \"plane\" $\\mathbb{R}^m$ we consider the relative heights of the $n$ hyperplanes \n$H_1, \\ldots, H_n$ defined by the $n$ positive weighted sums $S_1, \\ldots, S_n$. The circuit output depends \nonly on how many of the other hyperplanes $H_2, \\ldots, H_n$ are above $H_1$ at this point $\\mathbf{z}$. \n\n4 A Lower Bound Result for Winner-Take-All \n\nOne can easily see that any $k$-WTA gate with $n$ inputs can be computed by a 2-layer threshold \ncircuit consisting of $\\binom{n}{2} + n$ threshold gates: 
\n\n[Figure: a first layer of $\\binom{n}{2}$ threshold gates, each testing $x_i \\geq x_j$ for a pair of inputs from $x_1, \\ldots, x_n$, feeds a second layer of $n$ threshold gates, each testing $\\sum \\geq n - k$, with outputs $b_1, \\ldots, b_n$.] \n\nHence the following result provides an optimal lower bound. \n\nTheorem 3. Any feedforward threshold circuit (= multi-layer perceptron) that computes \n1-WTA for $n$ inputs needs to have at least $\\binom{n}{2} + n$ gates. \n\u25a0 \n\n5 Conclusions \n\nThe lower bound result of Theorem 3 shows that the computational power of winner-take-all \nis quite large, even if compared with the arguably most powerful gate commonly studied \nin circuit complexity theory: the threshold gate (also referred to as a McCulloch-Pitts neuron \nor perceptron). \n\nIt is well known ([Minsky and Papert, 1969]) that a single threshold gate is not able to \ncompute certain important functions, whereas circuits of moderate (i.e., polynomial) size \nconsisting of two layers of threshold gates with polynomial size integer weights have remarkable \ncomputational power (see [Siu et al., 1995]). We have shown in Theorem 1 that \nany such 2-layer (i.e., 1 hidden layer) circuit can be simulated by a single k-winner-take-all \ngate, applied to polynomially many weighted sums with positive integer weights of polynomial \nsize. \n\nWe have also analyzed the computational power of soft winner-take-all gates in the context \nof analog computation. It is shown in Theorem 2 that a single soft winner-take-all gate \nmay serve as the only nonlinearity in a class of circuits that have universal computational \npower in the sense that they can approximate any continuous function. 
\n\nFurthermore, our novel universal approximators require only positive linear operations besides \nsoft winner-take-all, thereby showing that in principle no computational power is lost \nif in a biological neural system inhibition is used exclusively for unspecific lateral inhibition, \nand no adaptive flexibility is lost if synaptic plasticity (i.e., \"learning\") is restricted to \nexcitatory synapses. \n\nOur somewhat surprising results regarding the computational power and universality of \nwinner-take-all point to further opportunities for low-power analog VLSI chips, since \nwinner-take-all can be implemented very efficiently in this technology. \n\nReferences \n\n[Brown, 1991] Brown, T. X. (1991). Neural Network Design for Switching Network Control. \nPh.D. thesis, Caltech. \n\n[Grossberg, 1973] Grossberg, S. (1973). Contour enhancement, short term memory, and \nconstancies in reverberating neural networks. Studies in Applied Mathematics, vol. 52, \n217-257. \n\n[Horiuchi et al., 1997] Horiuchi, T. K., Morris, T. G., Koch, C., DeWeerth, S. P. (1997). \nAnalog VLSI circuits for attention-based visual tracking. Advances in Neural Information \nProcessing Systems, vol. 9, 706-712. \n\n[Indiveri, 1999] Indiveri, G. (1999). Modeling selective attention using a neuromorphic \nanalog VLSI device, submitted for publication. \n\n[Lazzaro et al., 1989] Lazzaro, J., Ryckebusch, S., Mahowald, M. A., Mead, C. A. (1989). \nWinner-take-all networks of O(n) complexity. Advances in Neural Information Processing \nSystems, vol. 1, Morgan Kaufmann (San Mateo), 703-711. \n\n[Maass, 2000] Maass, W. (2000). On the computational power of winner-take-all, Neural \nComputation, in press. \n\n[Meador and Hylander, 1994] Meador, J. L., and Hylander, P. D. (1994). Pulse coded \nwinner-take-all networks. In: Silicon Implementation of Pulse Coded Neural Networks, \nZaghloul, M. E., Meador, J., and Newcomb, R. 
W., eds., Kluwer Academic Publishers \n(Boston), 79-99. \n\n[Minsky and Papert, 1969] Minsky, M. C., Papert, S. A. (1969). Perceptrons, MIT Press \n(Cambridge). \n\n[Siu et al., 1995] Siu, K.-Y., Roychowdhury, V., Kailath, T. (1995). Discrete Neural Computation: \nA Theoretical Foundation. Prentice Hall (Englewood Cliffs, NJ, USA). \n\n[Urahama and Nagao, 1995] Urahama, K., and Nagao, T. (1995). k-winner-take-all circuit \nwith O(N) complexity. IEEE Trans. on Neural Networks, vol. 6, 776-778. \n\n", "award": [], "sourceid": 1636, "authors": [{"given_name": "Wolfgang", "family_name": "Maass", "institution": null}]}