{"title": "Foundations for a Circuit Complexity Theory of Sensory Processing", "book": "Advances in Neural Information Processing Systems", "page_first": 259, "page_last": 265, "abstract": null, "full_text": "Foundations for a Circuit Complexity Theory of Sensory Processing*

Robert A. Legenstein & Wolfgang Maass
Institute for Theoretical Computer Science
Technische Universität Graz, Austria
{legi, maass}@igi.tu-graz.ac.at

Abstract

We introduce total wire length as a salient complexity measure for an analysis of the circuit complexity of sensory processing in biological neural systems and neuromorphic engineering. This new complexity measure is applied to a set of basic computational problems that apparently need to be solved by circuits for translation- and scale-invariant sensory processing. We exhibit new circuit design strategies for these new benchmark functions that can be implemented within realistic complexity bounds, in particular with linear or almost linear total wire length.

1 Introduction

Circuit complexity theory is a classical area of theoretical computer science that provides estimates for the complexity of circuits for computing specific benchmark functions, such as binary addition, multiplication and sorting (see, e.g., (Savage, 1998)). In recent years interest has grown in understanding the complexity of circuits for early sensory processing, both from the biological point of view and from the point of view of neuromorphic engineering (see (Mead, 1989)). However, classical circuit complexity theory has provided little insight into these questions, both because its focus lies on a different set of computational problems, and because its traditional complexity measures are not tailored to those resources that are of primary interest in the analysis of neural circuits in biological organisms and neuromorphic engineering.
This deficit is quite unfortunate since there is growing demand for energy-efficient hardware for sensory processing, and complexity issues become very important since the number n of parallel inputs which such circuits have to handle is typically quite large (for example n ≥ 10^6 in the case of many visual processing tasks). We will follow traditional circuit complexity theory in assuming that the underlying graph of each circuit is a directed graph without cycles.1 The most frequently considered complexity measures in traditional circuit complexity theory are the number (and types) of

*Research for this article was partially supported by the Fonds zur Förderung der wissenschaftlichen Forschung (FWF), Austria, project P12153, and the NeuroCOLT project of the EC.
1 Neural circuits in \"wetware\" as well as most circuits in analog VLSI contain, in addition to feedforward connections, also lateral and recurrent connections. This fact presents a serious obstacle for a direct mathematical analysis of such circuits. The standard mathematical approach is to model such circuits by larger feedforward circuits, where new \"virtual gates\" are introduced to represent the state of existing gates at later points in time.

gates, as well as the depth of a circuit. The latter is defined as the length of the longest directed path in the underlying graph, and is also interpreted as the computation time of the circuit. The focus lies in general on the classification of functions that can be computed by circuits whose number of gates can be bounded by a polynomial in the number n of input variables. This implicitly also provides a polynomial - typically quite large - bound on the number of \"wires\" (defined as the edges in the underlying graph of the circuit).
\n\nWe proceed on the assumption that the area (or volume in the case of neural circuits) oc(cid:173)\ncupied by wires is a severe bottleneck for physical implementations of circuits for sensory \nprocessing. Therefore we wiJI not just count wires, but consider a complexity measure that \nprovides an estimate for the total area or volume occupied by wires. In the cortex, neurons \noccupy an about 2 mm thick 3-dimensional sheet of \"grey \nmatter\". There exists a strikingly general upper bound on the \norder of 105 for the number of neurons under any mm2 of \ncortical surface, and the total length of wires (axons and den(cid:173)\ndrites, including those running in the sheet of \"white matter\" \nthat lies below the grey matter) under any mm2 of cortical \nsurface is estimated to be ~ 8km = 8\u00b7106mm (Koch, 1999). \nTogether this yields an upper bound of 8~~~6 n = 80 . n mm \nfor the wire length of the \"average\" cortical circuit involving \nn neurons. \nIn order to arrive at a concise mathematical model we project \neach 3D cortical circuit into 2D, and assume for simplicity \nthat its n gates (neurons) occupy the nodes of a grid. Then \nfor a circuit with n gates, the total length of the horizontal \ncomponents of all wires is on average ~ 80 . n mm = 80 . n \n.105/ 2 ~ 25300\u00b7 n grid units. Here, one grid unit is the distance between adjacent nodes on \nthe grid, which amounts to 1O-5 / 2mm for an assumed density of 105 neurons per mm2 of \ncortical surface. Thus we arrive at a simple test for checking whether the total wire length \nof a proposed circuit design has a chance to be biologically realistic: Check whether you \ncan arrange its n gates on the nodes of a grid in such a way that the total length of the \nhorizontal components of all wires is ~ 25300 . n grid units. 
\n\nMore abstractly, we define the following model: \nGates, input- and output-ports of a circuit are placed on different nodes of a 2-dimensional \ngrid (with unit distance 1 between adjacent nodes). These nodes can be connected by \n(unidirectional) wires that run through the plane in any way that the designer wants, in \nparticular wires may cross and need not run rectilinearly (wires are thought of as running \nin the 3 dimensional .Ipace above the plane, without charge for vertical wire segmentsp. \nWe refer to the minimal value of the sum of all wire lengths that can be achieved by any \nsuch arrangement as the total wire length of the circuit. \n\nThe attractiveness of this model lies in its mathematical simplicity, and in its generality. \nIt provides a rough estimate for the cost of connectivity both in artificial (basically 2-\ndimensional) circuits and in neural circuits, where 2-dimensional wire crossing problems \nare apparently avoided (at least on a small scale) since dendritic and axonal branches are \nrouted through 3-dimensional cortical tissue. \n\nThere exist quite reliable estimates for the order of magnitudes for the number n of inputs, \nthe number of neurons and the total wire length of biological neural circuits for sensory pro(cid:173)\ncessing, see (Abeles, 1998; Koch, 1999; Shepherd, 1998; Braitenberg and Schiiz, 1998).3 \n\n2We will allow that a wire from a gate may branch and provide input to several other gates. For \nreasonable bounds on the maximal fan-out (104 in the case of neural circuits) this is realistic both for \nneural circuits and for VLSI. 
\n\n3The number of neurons that transmit information from the retina (via the thalamus) to the cortex \n\n\fCollectively they suggest that only those circuit architectures for sensory processing are \nbiologically realistic that employ a number of gates that is almost linear in the number n \nof inputs, and a total wire length that is quadratic or subquadratic - with the additional re(cid:173)\nquirement that the constant factor in front of the asymptotic complexity bound has a value \nclose to 1. Since most asymptotic bounds in circuit complexity theory have constant fac(cid:173)\ntors in front that are much larger than 1, one really has to focus on circuit architectures \nwith clearly subquadratic bounds for the total wire length. The complexity bounds for cir(cid:173)\ncuits that can realistically be implemented in VLSI are typically even more severe than for \n\"wetware\", and linear or almost linear bounds for the total wire length are desirable for that \npurpose. \n\nIn this article we begin the investigation of algorithms for basic pattern recognition tasks \nthat can be implemented within this low-level complexity regime. The architecture of such \ncircuits has to differ strongly from most previously proposed circuits for sensory process(cid:173)\ning, which usually involve at least 2 completely connected layers, since already complete \nconnectivity between just two linear size 2-dimensionallayers of a feedforward neural net \nrequires a total wire length on the order of n 5 / 2 . Furthermore a circuit which first se(cid:173)\nlects a salient input segment consisting of a block of up to m adjacent inputs in some \n2-dimensional map, and then sends this block of ~ m inputs in parallel to some central \n\"pattern template matcher\", typically requires a total wire length of O(n3 / 2 \u2022 m) - even \nwithout taking the circuitry for the \"selection\" or the template matching into account. 
\n\n2 Global Pattern Detection in 2-Dimensional Maps \n\nFor many important sensory processing tasks - such as for vi(cid:173)\nsual or somatosensory input - the input variables are arranged in \na 2-dimensional map whose structure reflects spatial relationship \nin the outside world. We assume that local feature detectors are \nable to detect the presence of salient local features in their spe(cid:173)\ncific \"receptive field\", such as for example a center which emits \n\nis estimated to be around 106 (all estimates given are for primates, and they only reflect the order of \nmagnitude). The total number of neurons that transmit sensory (mostly somatosensory) information \nto the cortex is estimated to be around 108 . In the subsequent sections we assume that these inputs \nrepresent the outputs of various local feature detectors for n locations in some 2-dimensional map. \nThus, if one assumes for example that on average there are 10 different feature detectors for each \nlocation on this map, one arrives at biologically realistic estimates for n that lie between 105 and \n107 . \n\nThe total number of neurons in the primary visual cortex of primates is estimated to be around 109 , \noccupying an area of roughly 104 mm2 of cortical surface. There are up to 105 neurons under one \nmm2 of cortical surface, which yields a value of 10- 5 / 2 mm for the distance between adjacent grid \npoints in our model. The total length of axonal and dendritic branches below one mm 2 of cortical \nsurface is estimated to be between 1 and 10 km, yielding up to lOll mm total wire length for primary \nvisual cortex. Thus if one assumes that 100 separate circuits are implemented in primary visual \ncortex, each of them can use 107 neurons and a total wire length of 109 mm. 
Hence realistic bounds for the complexity of a single one of these circuits for visual pattern recognition are 10^7 = n^(7/5) neurons (for n = 10^5), and a total wire length of at most 10^(11.5) = n^(2.3) grid units in the framework of our model.

The whole cortex receives sensory input from about 10^8 neurons. It processes this input with about 10^10 neurons and less than 10^12 mm total wire length. If one assumes that 10^3 separate circuits process this sensory information in parallel, each of them processing about 1/10th of the input (where again 10 different local feature detectors report about every location in a map), one arrives at n = 10^6 for each circuit, and each circuit can use on average n^(7/6) neurons and a total wire length of 10^(11.5) < n^2 grid units in the sense of our model. The actual resources available for sensory processing are likely to be substantially smaller, since most cortical neurons and circuits are believed to have many other functions besides online sensory processing.

higher (or lower) intensity than its immediate surrounding, or a high-intensity line segment in a certain direction, the end of a line, a junction of line segments, or even more complex local visual patterns like an eye or a nose. The ultimate computational goal is to detect specific global spatial arrangements of such local patterns, such as the letter \"T\", or in the end also a human face, in a translation- and scale-invariant manner.

We formalize 2-dimensional global pattern detection problems by assuming that the input consists of arrays a = (a_1, ..., a_n), b = (b_1, ..., b_n), etc. of binary variables that are arranged on a 2-dimensional square grid.4 Each index i can be thought of as representing a location within some √n × √n square in the outside world. We assume that a_i = 1 if and only if feature a is detected at location i, and that b_i = 1 if and only if feature b is detected at location i.
In our formal model we can reserve a subsquare within the 2-dimensional grid for each index i, where the input variables a_i, b_i, etc. are given on adjacent nodes of this grid.5 Since we assume that this spatial arrangement of input variables reflects spatial relations in the outside world, many salient examples for global pattern detection problems require the computation of functions such as

P_D^n(a, b) = 1, if there exist i and j so that a_i = b_j = 1 and input location j is above and to the right of input location i; P_D^n(a, b) = 0, else.

Theorem 2.1 The function P_D^n can be computed - and witnesses i and j with a_i = b_j = 1 can be exhibited if they exist - by a circuit with total wire length O(n), consisting of O(n) Boolean gates of fan-in 2 (and fan-out 2) in depth O(log n · log log n).
The depth of the circuit can be reduced to O(log n) if one employs threshold gates6 with fan-in log n. This can also be done with total wire length O(n).

Proof (sketch) At first sight it seems that P_D^n needs complete connectivity on the plane because of its global character. However, we show that there exists a divide and conquer approach with rather small communication cost.
Divide the input plane into four sub-squares C_1, ..., C_4 (see Figure 1a). We write a^1, ..., a^4 and b^1, ..., b^4 for the restrictions of the input to these four sub-areas and assume that the following values have already been computed for each sub-square C_i:

• The x-coordinate of the leftmost occurrence of feature a in C_i
• The x-coordinate of the rightmost occurrence of feature b in C_i
• The y-coordinate of the lowest occurrence of feature a in C_i
• The y-coordinate of the highest occurrence of feature b in C_i
• The value of P_D^(n/4)(a^i, b^i)

We employ a merging algorithm that uses this information to compute corresponding values for the whole input plane.
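A minimal sequential sketch of this recursion may make the merging step concrete (plain Python, not a circuit layout; the function and variable names are ours). It carries exactly the five summary values listed above for each sub-square and combines them as described in the text.

```python
def p_d(A, B):
    """Sequential sketch of the divide-and-conquer for P_D (not a circuit
    layout). A and B are s x s boolean grids with s a power of two;
    A[x][y] is true iff feature a occurs at (x, y), with x growing to
    the right and y growing upward. Returns True iff a_i = b_j = 1 for
    some location j strictly above and to the right of location i."""

    def solve(x0, y0, w):
        # Summary tuple for the w x w sub-square with lower-left corner
        # (x0, y0): (leftmost a-x, lowest a-y, rightmost b-x,
        # highest b-y, P_D of the sub-square); None = feature absent.
        if w == 1:
            a, b = A[x0][y0], B[x0][y0]
            return (x0 if a else None, y0 if a else None,
                    x0 if b else None, y0 if b else None, False)
        h = w // 2
        ll, lr = solve(x0, y0, h), solve(x0 + h, y0, h)
        ul, ur = solve(x0, y0 + h, h), solve(x0 + h, y0 + h, h)
        quads = (ll, lr, ul, ur)

        def lt(u, v):  # strict comparison, tolerating absent features
            return u is not None and v is not None and u < v

        pd = (any(q[4] for q in quads)
              # horizontally adjacent squares: x already separated, compare y
              or lt(ll[1], lr[3]) or lt(ul[1], ur[3])
              # vertically adjacent squares: y already separated, compare x
              or lt(ll[0], ul[2]) or lt(lr[0], ur[2])
              # an a in the lower left plus a b in the upper right always works
              or (ll[0] is not None and ur[2] is not None))

        def agg(i, f):
            vals = [q[i] for q in quads if q[i] is not None]
            return f(vals) if vals else None

        return (agg(0, min), agg(1, min), agg(2, max), agg(3, max), pd)

    return solve(0, 0, len(A))[4]
```

For instance, a single a-feature at (0, 0) and a single b-feature at (1, 1) make p_d return True, while features at (0, 1) and (1, 0) do not.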
The first four values can be computed by comparison-like
4Whenever needed we assume for simplicity that n is such that √n, log n etc. are natural numbers. The arrangement of the input variables on the grid will in general leave many nodes empty, which can be occupied by gates of the circuit.
5To make this more formal one can assume that indices i and j represent pairs (i_1, i_2), (j_1, j_2) of coordinates. Then \"input location j is above and to the right of input location i\" means: i_1 < j_1 and i_2 < j_2. The circuit complexity of variations of the function P_D^n where one or both of the \"<\" are replaced by \"≤\" is the same.
6A threshold gate computes a Boolean function T : {0,1}^k → {0,1} of the form T(x_1, ..., x_k) = 1 ⇔ Σ_{i=1}^k w_i x_i ≥ w_0.

Figure 1: The 2-dimensional input plane. Occurrences of features in a are indicated by light squares, and occurrences of features in b are indicated by dark squares. Divide the input area into four sub-squares (a). Merging horizontally adjacent sub-squares (b). Merging vertically adjacent sub-squares (c).

Figure 2: The H-tree construction. Black squares represent sub-circuits for the merging algorithm. The shaded areas contain the leaves of the tree. The lightly striped areas represent busses of wires that run along the edges of the H-tree.
The H-tree H_1 divides the input area into four sub-squares (a). To construct H_2, replace the leaves of H_1 by H-trees H_1 (b). To construct H_k, replace the leaves of H_1 by H-trees H_(k-1) (c).

operations. The computation of P_D^n(a, b) can be sketched as follows: First, check whether P_D^(n/4)(a^i, b^i) = 1 for some i ∈ {1, ..., 4}. Then, check the spatial relationships between feature occurrences in adjacent sub-squares. When checking spatial relationships between features from two horizontally adjacent sub-squares, only the lowest and the highest feature occurrence is crucial for the value of P_D^n (see Figure 1b). This is true, since the x-coordinates are already separated. When checking spatial relationships of features from two vertically adjacent sub-squares, only the leftmost and the rightmost feature occurrence is crucial for the value of P_D^n (see Figure 1c). This is true, since the y-coordinates are already separated. When checking spatial relationships of features from the lower left and the upper right sub-squares, it suffices to check whether there is an a-feature occurrence in the lower left and a b-feature occurrence in the upper right sub-square. Hence, one can reduce the amount of information needed from each sub-square to O(log(n/4)) bits.
In the remaining part of the proof sketch, we present an efficient layout for a circuit that implements this recursive algorithm. We need a layout strategy that is compatible with the recursive two-dimensional division of the input plane. We adopt for this purpose a well known design strategy: the H-tree (see (Mead and Rem, 1979)). An H-tree is a recursive tree-layout on the 2-dimensional plane. Let H_k denote such a tree with 4^k leaves. The layout of H_1 is illustrated in Figure 2a. To construct an H-tree H_k, build an H-tree H_1 and replace its four leaves by H-trees H_(k-1) (see Figure 2b,c).
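That the log-width buses do not destroy linearity can be checked numerically by summing edge length times bus width over all levels of H_k. The sketch below is our own restatement with concrete modelling constants (segment lengths of 2^(j-1) and bus width ceil(log2 m) + 1 are assumptions, not values from the text); the point is only that the wire length per input stays bounded.

```python
import math

def h_tree_wire_length(k):
    """Total wire length (in grid units) of the H-tree H_k under simple
    modelling assumptions of ours: the three segments of the top-level
    'H' of a subtree with side 2^j have length 2^(j-1) each, and an edge
    serving a subtree with m = 4^j inputs is a bus of ceil(log2 m) + 1
    wires."""
    total = 0.0
    for j in range(1, k + 1):
        copies = 4 ** (k - j)              # subtrees whose top level is an H_j
        segment_length = 3 * 2 ** (j - 1)  # the three segments of that 'H'
        bus_width = math.ceil(math.log2(4 ** j)) + 1
        total += copies * segment_length * bus_width
    return total

# Wire length per input stays bounded as the tree grows, i.e. the layout
# uses only linear total wire length despite the log-width buses.
for k in (4, 6, 8, 10):
    n = 4 ** k
    print(k, round(h_tree_wire_length(k) / n, 2))
```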
\n\nWe need to modify the H-tree construction of Mead and Rum to make it applicable to \n\n\four problem. The inner nodes of the tree are replaced by sub-circuits that implement the \nmerging algorithm. Furthermore, each edge of the H-tree is replaced by a \"bus\" consisting \nof O(log m) wires if it originates in an area with m inputs. It is not difficult to show that \n\u2022 \nthis layout uses only linear total wire length. \n\nThe linear total wire length of this circuit is up to a constant factor optimal for any circuit \nwhose output depends on all of its n inputs. Note that most connections in this circuit are \nlocal, just like in a biological neural circuit. Thus, we see that minimizing total wire length \ntends to generate biology-like circuit structures. \n\nThe next theorem shows that one can compute PI) faster (i.e. by a circuit with smaller \ndepth) if one can afford a somewhat larger total wire length. This circuit construction, that \nis based on AND/OR gates of limited fan-in ~, has the additional advantage that it can not \njust exhibit some pair (i, j) as witness for PI) (g\" Q) = 1 (provided such witness exists), \nbut it can exhibit in addition all j that can be used as witness together with some i. This \nproperty allows us to \"chain\" the global pattern detection problem formalized through the \nfunction PI), and to decide within the same complexity bound whether for any fixed number \nk of input vectors g,(l), ... ,g,(k) from {a, 1}n there exist locations i(l), ... ,i(k) so that \nai;:\\ = 1 for m = 1, ... ,k and location i(m+1) lies to the right and above location i(m) \nfor m = 1, ... ,k - 1. In fact, one can also compute a k-tuple of witnesses i(l), ... ,i(k) \nwithin the same complexity bounds, provided it exists. This circuit design is based on an \nefficient layout for prefix computations. \n\nTheorem 2.2 For any given n and ~ E {2, ... 
,Vn} one can compute the function PI) in \ndepth O(:~:~) by a feed-forward circuit consisting ofO(n) AND/OR gates offan-in ~ ~, \nwith total wire length O(n . ~ . :~;~). \n\u2022 \n\nAnother essential ingredient of translation- and scale-invariant global pattern recognition \nis the capability to detect whether a local feature c occurs in the middle between locations \ni and j where the local features a and b occur. This global pattern detection problem is \nformalized through the following function PF : {a, 1 pn -t {a, 1}: \nIf LA = LQ = 1 thenPF(g\"Q,~) = 1, if and only if there existi,j,k so that input \nlocation k lies on the middle of the line between locations i and j, and ai = bj = Ck = 1. \nThis function PF can be computed very fast by circuits with the least possible total wire \nlength (up to a constant factor), using threshold gates of fan-in up to Vn: \n\nTheorem 2.3 The function PF can be computed - and witnesses can be exhibited - by a \ncircuit with total wire length and area O(n), consisting ofO(n) Boolean gates offan-in 2 \nand 0 (..jn) threshold gates of fan-in Vn in depth 7. \n\nThe design of the circuit exploits that the computation of PF can be reduced to the solution \n\u2022 \nof two closely related 1-dimensional problems. \n\n3 Discussion \n\nThere exists a very large literature on neural circuits for translation-invariant pattern \nrecognition see http://www.cn!.salk.edurwiskottiBibliographies/Invariances.htm!. Unfor(cid:173)\ntunately there exists substantial disagreement regarding the interpretation of existing ap(cid:173)\nproaches see http://www.ph.tn.tudelft.nIIPRInfo/shiftimaillist.html. Virtually all positive \nresults are based on computer simulations of small circuits, or on learning algorithms for \nconcrete neural networks with a fixed input size n on the order of 20 or 30, without an \nanalysis how the required number of gates and the area or volume occupied by wires scale \n\n\fup with the input size. 
The computational performance of these networks is often reported in an anecdotal manner.

The goal of this article is to show that circuit complexity theory may become a useful ingredient for understanding the computational strategies of biological neural circuits, and for extracting from them portable principles that can be applied to novel artificial circuits.7 For that purpose we have introduced the total wire length as an abstract complexity measure that appears to be among the most salient ones in this context, and which can in principle be applied both to neural circuits in the cortex and to artificial circuitry. We would like to argue that only those computational strategies that can be implemented with subquadratic total wire length have a chance to reflect aspects of cortical information processing, and only those with almost linear total wire length are implementable in special purpose VLSI chips for real-world sensory processing tasks.8 The relevance of the total wire length of cortical circuits has been emphasized by numerous neuroscientists, from Cajal (see for example p. 14 in (Cajal, 1995)) to (Chklovskii and Stevens, 2000). On the other hand, the total wire length of a circuit layout is also closely related to the area required by a VLSI implementation of such a circuit (see (Savage, 1998)).

We have formalized some basic computational problems, which appear to underlie various translation- and scale-invariant sensory processing tasks, as a first set of benchmark functions for a circuit complexity theory of sensory processing. We have presented designs for circuits that compute these benchmark functions with small - in most cases linear or almost linear - total wire length (and constant factors of moderate size).
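As a concrete, executable reference point for the second benchmark above, here is a brute-force (non-circuit) check of the midpoint condition behind P_F in plain Python. The encoding of locations as (x, y) pairs and the function name are our own choices, and the single-occurrence promise on a and b is not enforced; on promise inputs the result agrees with P_F.

```python
def p_f(a, b, c):
    """Brute-force reference for the midpoint condition behind P_F:
    true iff there are locations i, j, k with a_i = b_j = c_k = 1 such
    that k is the midpoint of the segment from i to j. Inputs are dicts
    mapping (x, y) grid locations to 0/1."""
    on = lambda v: [p for p, bit in v.items() if bit]
    for (xi, yi) in on(a):
        for (xj, yj) in on(b):
            # the midpoint must itself be a grid location
            if (xi + xj) % 2 == 0 and (yi + yj) % 2 == 0:
                if c.get(((xi + xj) // 2, (yi + yj) // 2), 0):
                    return True
    return False

print(p_f({(0, 0): 1}, {(4, 2): 1}, {(2, 1): 1}))  # True
```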
The computational strategies of these circuits differ strongly from those that have been considered in previous approaches, which failed to take the limitations imposed by the realistically available amount of total wire length into account.

References

Abeles, M. (1998). Corticonics: Neural Circuits of the Cerebral Cortex, Cambridge Univ. Press.
Braitenberg, V., Schüz, A. (1998). Cortex: Statistics and Geometry of Neuronal Connectivity, 2nd ed., Springer Verlag.
Cajal, S.R. (1995). Histology of the Nervous System, volumes 1 and 2, Oxford University Press (New York).
Chklovskii, D.B. and Stevens, C.F. (2000). Wiring optimization in the brain. Advances in Neural Information Processing Systems, vol. 12, MIT Press, 103-107.
Koch, C. (1999). Biophysics of Computation, Oxford Univ. Press.
Lazzaro, J., Ryckebusch, S., Mahowald, M. A., Mead, C. A. (1989). Winner-take-all networks of O(n) complexity. Advances in Neural Information Processing Systems, vol. 1, Morgan Kaufmann (San Mateo), 703-711.
Mead, C. and Rem, M. (1979). Cost and performance of VLSI computing structures. IEEE J. Solid-State Circuits SC-14 (1979), 455-462.
Mead, C. (1989). Analog VLSI and Neural Systems. Addison-Wesley (Reading, MA, USA).
Savage, J. E. (1998). Models of Computation: Exploring the Power of Computing. Addison-Wesley (Reading, MA, USA).
Shepherd, G. M. (1998). The Synaptic Organization of the Brain, 2nd ed., Oxford Univ. Press.

7We do not want to argue that learning plays no role in the design and optimization of circuits for specific sensory processing tasks; on the contrary. But one of the few points where the discussion from http://www.ph.tn.tudelft.nl/PRInfo/shift/maillist.html agreed is that translation- and scale-invariant pattern recognition is a task which is so demanding that learning algorithms have to be supported by pre-existing circuit structures.
\n\n80f course there are other important complexity measures for circuits - such as energy consump(cid:173)\n\ntion - besides those that have been addressed in this article. \n\n\f", "award": [], "sourceid": 1910, "authors": [{"given_name": "Robert", "family_name": "Legenstein", "institution": null}, {"given_name": "Wolfgang", "family_name": "Maass", "institution": null}]}