{"title": "On Properties of Networks of Neuron-Like Elements", "book": "Neural Information Processing Systems", "page_first": 41, "page_last": 51, "abstract": null, "full_text": "41 \n\nON PROPERTIES OF NETWORKS \n\nOF NEURON-LIKE ELEMENTS \n\nPierre Baldi\u00b7 and Santosh S. Venkatesht \n\n15 December 1987 \n\nAbstract \n\nThe complexity and computational capacity of multi-layered, feedforward \nneural networks is examined. Neural networks for special purpose (structured) \nfunctions are examined from the perspective of circuit complexity. Known re(cid:173)\nsults in complexity theory are applied to the special instance of neural network \ncircuits, and in particular, classes of functions that can be implemented in \nshallow circuits characterised. Some conclusions are also drawn about learning \ncomplexity, and some open problems raised. The dual problem of determining \nthe computational capacity of a class of multi-layered networks with dynamics \nregulated by an algebraic Hamiltonian is considered. Formal results are pre(cid:173)\nsented on the storage capacities of programmed higher-order structures, and \na tradeoff between ease of programming and capacity is shown. A precise de(cid:173)\ntermination is made of the static fixed point structure of random higher-order \nconstructs, and phase-transitions (0-1 laws) are shown. \n\n1 \n\nINTRODUCTION \n\nIn this article we consider two aspects of computation with neural networks. Firstly \nwe consider the problem of the complexity of the network required to compute classes \nof specified (structured) functions. We give a brief overview of basic known com(cid:173)\nplexity theorems for readers familiar with neural network models but less familiar \nwith circuit complexity theories. We argue that there is considerable computational \nand physiological justification for the thesis that shallow circuits (Le., networks with \nrelatively few layers) are computationally more efficient. 
We hence concentrate on structured (as opposed to random) problems that can be computed in shallow (constant depth) circuits with a relatively small number (polynomial) of elements, and demonstrate classes of structured problems that are amenable to such low cost solutions. We discuss an allied problem, the complexity of learning, and close with some open problems and a discussion of the observed limitations of the theoretical approach. \n\nWe next turn to a rigorous classification of how much a network of given structure can do; i.e., the computational capacity of a given construct. (This is, in a sense, the mirror image of the problem considered above, where we were seeking to design a minimal structure to perform a given task.) In this article we restrict ourselves to the analysis of higher-order neural structures obtained from polynomial threshold rules. We demonstrate that these higher-order networks are a special class of layered neural network, and present formal results on storage capacities for these constructs. Specifically, for the case of programmed interactions we demonstrate that the storage capacity is of the order of n^d where d is the interaction order. For the case of random interactions, a type of phase transition is observed in the distribution of fixed points as a function of attraction depth. \n\n*Department of Mathematics, University of California (San Diego), La Jolla, CA 92093 \n\n†Moore School of Electrical Engineering, University of Pennsylvania, Philadelphia, PA 19104 \n\n© American Institute of Physics 1988 \n\n2 COMPLEXITY \n\nThere exist two broad classes of constraints on computations. \n\n1. Physical constraints: These are related to the hardware in which the computation is embedded, and include among others time constants, energy limitations, volumes and geometrical relations in 3D space, and bandwidth capacities. \n\n2. 
Logical constraints: These can be further subdivided into \n\n• Computability constraints: for instance, there exist unsolvable problems, i.e., functions such as the halting problem which are not computable in an absolute sense. \n\n• Complexity constraints: usually giving upper and/or lower bounds on the amount of resources, such as the time or the number of gates, required to compute a given function. As an instance, the assertion \"There exists an exponential time algorithm for the Traveling Salesman Problem\" provides a computational upper bound. \n\nIf we view brains as computational devices, it is not unreasonable to think that in the course of the evolutionary process, nature may have been faced several times with problems related to physical, and perhaps to a minor degree logical, constraints on computations. If this is the case, then complexity theory in a broad sense could contribute in the future to our understanding of parallel computations and architectural issues both in natural and synthetic neural systems. \n\nA simple theory of parallel processing at the macro level (where the elements are processors) can be developed based on the ratio of the time spent on computations to the time spent on communications between processors [7] for different classes of problems and different processor architectures and interconnections. However, this approach does not seem to work for parallel processing at the level of circuits, especially if calculations and communications are intricately entangled. \n\nRecent neural or connectionist models are based on a common structure: highly interconnected networks of linear (or polynomial) threshold units (or units with sigmoid input-output functions) with adjustable interconnection weights. We shall therefore review the complexity theory of such circuits. 
In doing so, it will sometimes be helpful to contrast it with the similar theory based on Boolean (AND, OR, NOT) gates. The presentation will be rather informal and technical complements can easily be found in the references. \n\nConsider a circuit as an acyclic oriented graph connecting n Boolean inputs to one Boolean output. The nodes of the graph correspond to the gates (the n input units, the \"hidden\" units, and the output unit) of the circuit. The size of the circuit is the total number of gates and the depth is the length of the longest path connecting one input to the output. For a layered, feed-forward circuit, the width is the average number of computational units in the hidden (or interior) layers of elements. The first obvious observation when comparing Boolean and threshold logic is that they are equivalent in the sense that any Boolean function can be implemented using either logic. In fact, any such function can be computed in a circuit of depth two and exponential size. Simple counting arguments show that the fraction of functions requiring a circuit of exponential size approaches one as n → ∞ in both cases; i.e., a random function will in general require an exponential size circuit. (Paradoxically, it is very difficult to construct a family of functions for which we can prove that an exponential circuit is necessary.) Yet, threshold logic is more powerful than Boolean logic. A Boolean gate can compute only one function, whereas a threshold gate can compute on the order of 2^(αn^2) functions by varying the weights, with 1/2 ≤ α ≤ 1 (see [19] for the lower bound; the upper bound is a classical hyperplane counting argument, see for instance [20,30]). It would hence appear plausible that there exist wide classes of problems which can be computed by threshold logic with circuits substantially smaller than those required by Boolean logic. 
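The gate-counting contrast can be checked exhaustively for small n; the following is a minimal sketch for n = 2, where the integer weight range is an ad hoc choice that happens to suffice (ties at the threshold are resolved as firing):

```python
from itertools import product

def threshold_fn(w1, w2, b):
    # Truth table of the gate [w1*x1 + w2*x2 + b >= 0] over x in {-1,+1}^2.
    return tuple(1 if w1 * x1 + w2 * x2 + b >= 0 else 0
                 for x1, x2 in product((-1, 1), repeat=2))

# Enumerate all truth tables realisable by a single threshold gate
# with small integer weights and bias.
tables = {threshold_fn(w1, w2, b)
          for w1, w2, b in product(range(-2, 3), repeat=3)}

print(len(tables))  # 14 of the 16 Boolean functions of 2 variables
# The two missing tables are XOR and its complement, which no single
# threshold gate can compute; a single fixed Boolean gate computes one.
```

A single AND gate realises exactly one of these 16 tables; the single threshold gate realises 14 of them merely by reweighting, which is the n = 2 shadow of the 2^(αn^2) count quoted above.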
An important result which separates threshold and Boolean logic from this point of view has been demonstrated by Yao [31] (see [10,24] for an elegant proof). The result is that in order to compute a function such as parity in a circuit of constant depth k, at least exp(c·n^(1/2k)) Boolean gates with unbounded fan-in are required. As we shall demonstrate shortly, a circuit of depth two and linear size is sufficient for the computation of such functions using threshold logic. \n\nIt is not unusual to hear discussions about the tradeoffs between the depth and the width of a circuit. We believe that one of the main contributions of complexity analysis is to show that this tradeoff is in some sense minimal and that in fact there exists a very strong bias in favor of shallow (i.e., constant depth) circuits. There are multiple reasons for this. In general, for a fixed size, the number of different functions computable by a circuit of small depth exceeds the number of those computable by a deeper circuit. That is, if one had no a priori knowledge regarding the function to be computed and was given m hidden units, then the optimal strategy would be to choose a circuit of depth two with the m units in a single layer. In addition, if we view computations as propagating in a feedforward mode from the inputs to the output unit, then shallow circuits compute faster. And the deeper a circuit, the more difficult become the issues of time delays, synchronisation, and precision on the computations. Finally, it should be noticed that given overall response times of a few hundred milliseconds and given the known time scales for synaptic integration, biological circuitry must be shallow, at least within a \"module\", and this is corroborated by anatomical data. 
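A minimal sketch of such a depth-two, linear-size threshold circuit for parity follows. The alternating ±1 output weights are one standard choice, not necessarily the authors'; hidden unit i fires exactly when the input contains at least i entries equal to +1:

```python
from itertools import product

def parity_depth_two(x):
    # Depth-two threshold circuit computing the parity of a ±1 vector.
    # sum(x) = 2k - n where k is the number of +1 entries, so hidden
    # unit i (a single linear threshold on sum(x)) fires iff k >= i.
    n = len(x)
    s = sum(x)
    hidden = [1 if s >= 2 * i - n else 0 for i in range(1, n + 1)]
    # Exactly the first k hidden units fire; their alternating sum is
    # 1 when k is odd and 0 when k is even.
    out = sum((-1) ** i * h for i, h in enumerate(hidden))
    return 1 if out >= 0.5 else 0   # output threshold gate

# Exhaustive check against the parity of the number of +1 entries.
assert all(parity_depth_two(x) == sum(v == 1 for v in x) % 2
           for x in product((-1, 1), repeat=5))
```

The circuit uses n hidden units and one output unit, i.e., linear size at depth two, in contrast with the exp(c·n^(1/2k)) Boolean gates cited above for constant depth k.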
The relative slowness of neurons and their shallow circuit architecture are to be taken together with the \"analog factor\" and \"entropy factor\" [1] to understand the necessary high-connectivity requirements of neural systems. \n\nFrom the previous analysis emerges an important class of circuits in threshold logic characterised by polynomial size and shallow depth. We have seen that, in general, a random function cannot be computed by such circuits. However, many interesting functions, the structured problems, are far from random, and it is then natural to ask: what is the class of functions computable by such circuits? While a complete characterisation is probably difficult, there are several sub-classes of structured functions which are known to be computable in shallow poly-size circuits. \n\nThe symmetric functions, i.e., functions which are invariant under any permutation of the n input variables, are an important class of structured problems that can be implemented in shallow polynomial size circuits. In fact, any symmetric function can be computed by a threshold circuit of depth two and linear size (n hidden units and one output unit are always sufficient). We demonstrate the validity of this assertion by the following instructive construction. We consider n binary inputs, each taking on values -1 and 1 only, and threshold gates as units. Now array the 2^n possible inputs in n+1 rows with the elements in each row being permuted versions of each other (i.e., n-tuples in a row all have the same number of +1's) and with the rows going monotonically from zero +1's to n +1's. Any given symmetric Boolean function clearly assumes the same value for all elements (Boolean n-tuples) in a row, so that contiguous rows where the function assumes the value +1 form bands. (There are at most n/2 bands, the worst case occurring for the parity function.) 
The symmetric function can now be computed with 2B threshold gates (B being the number of bands) in a single hidden layer, with the topmost \"neuron\" being activated only if the number of +1's in the input exceeds the number of +1's at the lower edge of the lowest band, and, proceeding systematically, the lowest \"neuron\" being activated only if the number of +1's in the input exceeds the number of +1's at the upper edge of the highest band. An input string will be within a band if and only if an odd number of hidden neurons are activated starting contiguously from the top of the hidden layer, and conversely. Hence, a single output unit can compute the given symmetric function. \n\nIt is easy to see that arithmetic operations on binary strings can be performed with polysize small depth circuits. Reif [23] has shown that for a fixed degree of precision, any analytic function such as polynomials, exponentials, and trigonometric functions can be approximated with small and shallow threshold circuits. Finally, in many situations one is interested in the value of a function only for a vanishingly small (i.e., polynomial) fraction of the total number of possible inputs 2^n. These functions can be implemented by polysize shallow circuits, and one can relate the size and depth of the circuit to the cardinality of the set of interesting inputs. \n\nSo far we have only been concerned with the complexity of threshold circuits. We now turn to the complexity of learning, i.e., the problem of finding the weights required to implement a given function. Consider the problem of separating m points in R^l, coloured in two colours, using k hyperplanes so that any region contains only monochromatic points. If l and k are fixed the problem can be solved in polynomial time. If either l or k goes to infinity, the problem becomes NP-complete [18]. 
As a result, it is not difficult to see that the general learning problem is NP-complete (see also [12] for a different proof and [21] for a proof of the fact that it is already NP-complete in the case of one single threshold gate). \n\nSome remarks on the limitations of the complexity approach are apropos at this juncture: \n\n1. While a variety of structured Boolean functions can be implemented at relatively low cost with networks of linear threshold gates (McCulloch-Pitts neurons), the extension to different input-output functions and the continuous domain is not always straightforward. \n\n2. Even restricting ourselves to networks of relatively simple Boolean devices such as the linear threshold gate, in many instances only relatively weak bounds are available for computational cost and complexity. \n\n3. Time is probably the single most important ingredient which is completely absent from these threshold units and their interconnections [17,14]; there are, in addition, non-biological aspects of connectionist models [8]. \n\n4. Finally, complexity results (where available) are often asymptotic in nature and may not be meaningful in the range corresponding to a particular application. \n\nWe shall end this section with a few open questions and speculations. One problem has to do with the time it takes to learn. Learning is often seen as a very slow process both in artificial models (cf. back propagation, for instance) and biological systems (cf. human acquisition of complex skills). However, if we follow the standards of complexity theory, in order to be effective over a wide variety of scales, a single learning algorithm should run in polynomial time. We can therefore ask: what is learnable from examples in polynomial time by polynomial size shallow threshold circuits? The status of back propagation type algorithms with respect to this question is not very clear. 
\n\nThe existence of many tasks which are easily executed by biological organisms and for which no satisfactory computer program has been found so far leads to the question of the specificity of learning algorithms, i.e., whether there exists a complexity class of problems or functions for which a \"program\" can be found only by learning from examples as opposed to by traditional programming. There is some circumstantial evidence against such a conjecture. As pointed out by Valiant [25], cryptography can be seen in some sense as the opposite of learning. The conjectured existence of one-way functions, i.e., functions which can be computed in polynomial time but cannot be inverted (from examples) in polynomial time, suggests that learning algorithms may have strict limitations. In addition, for most of the artificial applications seen so far, the programs obtained through learning do not outperform the best already known software, though there may be many other reasons for that. However, even if such a complexity class does not exist, learning algorithms may still be very important because of their inexpensiveness and generality. The work of Valiant [26,13] on polynomial time learning of Boolean formulas in his \"distribution free model\" explores some additional limitations of what can be learned from examples without including any additional knowledge. \n\nLearning may therefore turn out to be a powerful, inexpensive but limited family of algorithms that need to be incorporated as \"sub-routines\" of more global programs, the structure of which may be harder to find. Should evolution be regarded as an \"exponential\" time learning process complemented by the \"polynomial\" time type of learning occurring in the lifetime of organisms? 
\n\n3 CAPACITY \n\nIn the previous section the focus of our investigation was on the structure and cost of minimal networks that would compute specified Boolean functions. We now consider the dual question: what is the computational capacity of a threshold network of given structure? As with the issues on complexity, it turns out that for fairly general networks, the capacity results favour shallow (but perhaps broad) circuits [29]. In this discourse, however, we shall restrict ourselves to a specified class of higher-order networks, and to problems of associative memory. We will just quote the principal rigorous results here, and present the involved proofs elsewhere [4]. \n\nWe consider systems of n densely interacting threshold units, each of which yields an instantaneous state -1 or +1. (This corresponds in the literature to a system of n Ising spins, or alternatively, a system of n neural states.) The state space is hence the set of vertices of the hypercube. We will in this discussion also restrict our attention throughout to symmetric interaction systems wherein the interconnections between threshold elements are bidirectional. \n\nLet I_d be the family of all subsets of cardinality d+1 of the set {1, 2, ..., n}. Clearly |I_d| = C(n, d+1). For any subset I of {1, 2, ..., n}, and for every state u = (u_1, u_2, ..., u_n) in B^n = {-1, 1}^n, set u_I = prod_{i in I} u_i. \n\nDefinition 1 A homogeneous algebraic threshold network of degree d is a network of n threshold elements with interactions specified by a set of C(n, d+1) real coefficients w_I indexed by I in I_d, and the evolution rule \n\nu_i^+ = sgn( sum_{I in I_d : i in I} w_I u_{I\\{i}} )   (1) \n\nThese systems can be readily seen to be natural generalisations to higher order of the familiar case d = 1 of linear threshold networks. 
The added degrees of freedom in the interaction coefficients can potentially result in enhanced flexibility and programming capability over the linear case, as has been noted independently by several authors recently [2,3,4,5,22,27]. Note that each d-wise product u_{I\\{i}} is just the parity of the corresponding d inputs, and by our earlier discussion, this can be computed with d hidden units in one layer followed by a single threshold unit. Thus the higher-order network can be realised by a network of depth three, where the first hidden layer has d·C(n, d) units, the second hidden layer has C(n, d) units, and there are n output units which feed back into the n input units. Note that the weights from the input to the first hidden layer, and from the first hidden layer to the second, are fixed (computing the various d-wise products), and the weights from the second hidden layer to the output are the coefficients w_I which are free parameters. \n\nThese systems can be identified either with long range interactions for higher-order spin glasses at zero temperature, or with higher-order neural networks. Starting from an arbitrary configuration or state, the system evolves asynchronously by a sequence of single \"spin\" flips involving spins which are misaligned with the instantaneous \"molecular field.\" The dynamics of these symmetric higher-order systems are regulated, analogously to the linear system, by higher-order extensions of the classical quadratic Hamiltonian. We define the homogeneous algebraic Hamiltonian of degree d by \n\nH_d(u) = - sum_{I in I_d} w_I u_I.   (2) \n\nThe algebraic Hamiltonians are functionals akin in behaviour to the classical quadratic Hamiltonian, as has been previously demonstrated [5]. \n\nProposition 1 The functional H_d is non-increasing under the evolution rule 1. 
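The evolution rule (1) and the Hamiltonian (2) can be sketched numerically, and the monotonicity asserted in Proposition 1 observed on a small random instance. This is a minimal illustration; the Gaussian coefficients, seed, and instance size are arbitrary choices, not taken from the paper:

```python
import math
import random
from itertools import combinations

n, d = 6, 2                     # 6 units, interaction order d = 2
random.seed(0)

# One real coefficient w_I per subset I of cardinality d + 1.
w = {I: random.gauss(0, 1) for I in combinations(range(n), d + 1)}

def field(u, i):
    # Molecular field at spin i: sum over I containing i of w_I * u_{I\{i}}.
    s = 0.0
    for I, wI in w.items():
        if i in I:
            s += wI * math.prod(u[j] for j in I if j != i)
    return s

def H(u):
    # Homogeneous algebraic Hamiltonian of degree d, equation (2).
    return -sum(wI * math.prod(u[j] for j in I) for I, wI in w.items())

u = [random.choice((-1, 1)) for _ in range(n)]
energies = [H(u)]
for _ in range(50):             # asynchronous single-spin dynamics
    i = random.randrange(n)
    u[i] = 1 if field(u, i) >= 0 else -1
    energies.append(H(u))

# Energy never increases along the trajectory, as Proposition 1 asserts.
assert all(b <= a + 1e-9 for a, b in zip(energies, energies[1:]))
```

Each update replaces the term -u_i·field_i(u) in H_d by -|field_i(u)|, which can only lower the energy; this is exactly the one-line argument behind Proposition 1.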
\n\nIn the terminology of spin glasses, the state trajectories of these higher-order networks can be seen to be essentially following a zero-temperature Monte Carlo (or Glauber) dynamics. Because of the monotonicity of the algebraic Hamiltonian given by equation 2 under the asynchronous evolution rule 1, the system always reaches a stable state (fixed point) where the relation 1 is satisfied for each of the n spins or neural states. The fixed points are hence the arbiters of system dynamics, and determine the computational capacity of the system. \n\nSystem behaviour and applications are somewhat different depending on whether the interactions are random or programmed. The case of random interactions lends itself to natural extensions of spin glass formulations, while programmed interactions yield applications of higher-order extensions of neural network models. We consider the two cases in turn. \n\n3.1 PROGRAMMED INTERACTIONS \n\nHere we query whether given sets of binary n-vectors can be stored as fixed points by a suitable selection of interaction coefficients. If such sets of prescribed vectors can be stored as stable states for some suitable choice of interaction coefficients, then proposition 1 will ensure that the chosen vectors are at the bottom of \"energy wells\" in the state space, with each vector exerting a region of attraction around it: all characteristics of a physical associative memory. In such a situation the dynamical evolution of the network can be interpreted in terms of computations: error-correction, nearest neighbour search, and associative memory. Of importance here is the maximum number of states that can be stored as fixed points for an appropriate choice of algebraic threshold network. This represents the maximal information storage capacity of such higher-order neural networks. \n\nLet d represent the degree of the algebraic threshold network. Let u^(1), ... 
, u^(m) be the m vectors which we require to store as fixed points in a suitable algebraic threshold network. We will henceforth refer to these prescribed vectors as memories. We define the storage capacity of an algebraic threshold network of degree d to be the maximal number m of arbitrarily chosen memories which can be stored with high probability for appropriate choices of coefficients in the network. \n\nTheorem 1 The maximal (algorithm independent) storage capacity of a homogeneous algebraic threshold network of degree d is less than or equal to 2·C(n, d). \n\nGeneralised Sum of Outer-Products Rule: The classical Hebbian rule for the linear case d = 1 (cf. [11] and quoted references) can be naturally extended to networks of higher order. The coefficients w_I, I in I_d, are constructed as the sum of generalised Kronecker outer-products, \n\nw_I = sum_{a=1}^{m} u_I^(a). \n\nTheorem 2 The storage capacity of the outer-product algorithm applied to a homogeneous algebraic threshold network of degree d is less than or equal to n^d / (2(d+1) log n) (also cf. [15,27]). \n\nGeneralised Spectral Rule: For d = 1 the spectral rule amounts to iteratively projecting states orthogonally onto the linear space generated by u^(1), ..., u^(m), and then taking the closest point on the hypercube to this projection (cf. [27,28]). This approach can be extended to higher orders, as we now describe. \n\nLet W denote the n × N(n,d) matrix of coefficients w_{i;J} arranged lexicographically; i.e., row i of W lists the coefficients w_{i;J} over the d-subsets J = {j_1 < ... < j_d} of {1, 2, ..., n} taken in lexicographic order, from w_{i;1,2,...,d} through w_{i;n-d+1,...,n}. Note that the symmetry and the \"zero-diagonal\" nature of the interactions have been relaxed to increase capacity. Let U be the n × m matrix of memories. 
Form the extended N(n,d) × m binary matrix whose a-th column lists the d-wise products of the components of u^(a) in lexicographic order, \n\n(u^(a)_{1,2,...,d}, u^(a)_{1,2,...,d-1,d+1}, ..., u^(a)_{n-d+1,...,n}). \n\nLet A = ..., where k_d(β) > 0 and 0 ≤ c_d(β) < 1 are parameters depending solely on β and the interaction order d. \n\n4 CONCLUSION \n\nIn fine, it appears possible to design shallow, polynomial size threshold circuits to compute a wide class of structured problems. The thesis that shallow circuits compute more efficiently than deep circuits is borne out. For the particular case of higher-order networks, all the garnered results appear to point in the same direction: for neural networks of fixed degree d, the maximal number of programmable states is essentially of the order of n^d. The total number of fixed points, however, appears to be exponential (at least for the random interaction case), though almost all of them have constant attraction depths. \n\nReferences \n\n[1] Y. S. Abu-Mostafa, \"Number of synapses per neuron,\" in Analog VLSI and Neural Systems, ed. C. Mead, Addison Wesley, 1987. \n\n[2] P. Baldi, Some Contributions to the Theory of Neural Networks, Ph.D. Thesis, California Institute of Technology, June 1986. \n\n[3] P. Baldi and S. S. Venkatesh, \"Number of stable points for spin glasses and neural networks of higher orders,\" Phys. Rev. Lett., vol. 58, pp. 913-916, 1987. \n\n[4] P. Baldi and S. S. Venkatesh, \"Fixed points of algebraic threshold networks,\" in preparation. \n\n[5] H. H. Chen, et al., \"Higher order correlation model of associative memory,\" in Neural Networks for Computing, AIP Conf. Proc., vol. 151, New York, 1986. \n\n[6] S. F. Edwards and F. Tanaka, \"Analytical theory of the ground state properties of a spin glass: I. Ising spin glass,\" Jnl. Phys. F, vol. 10, pp. 2769-2778, 1980. \n\n[7] G. C. Fox and S. W. 
Otto, \"Concurrent Computations and the Theory of \n\nComplex Systems,\" Caltech Concurrent Computation Program, March 1986. \n\n[8] F. H. Grick and C. Asanuma, ~'Certain aspects of the anatomy and physiology \nof the cerebral cortex,\" in Parallel Distributed Processing, vol. 2, eds. D. E. \nRumelhart and J. L. McCelland, pp. 333-371, MIT Press, 1986. \n\n[9] D. J. Gross and M. Mezard, \"The simplest spin glass,\" Nucl. Phys., vol. B240, \n\npp. 431-452, 1984. \n\n[10] J. Hasted, \"Almost optimal lower bounds for small depth circuits,\" Proc. 18-th \n\nACM STOC, pp. 6-20, 1986. \n\n[11] J. J. Hopfield, \"Neural networks and physical sytems with emergent collective \ncomputational abilities,\" Proc. Natl. Acad. Sci. USA, vol. 79, pp. 25.54-2558, \n1982. \n\n[12] J. S. Judd, \"Complexity of connectionist learning with various node functions,\" \nDept. of Computer and Information Science Technical Report, vol. 87-60, Univ. \nof Massachussetts, Amherst, 1987. \n\n[13] M. Kearns, M. Li, 1. Pitt, and L. Valiant, \"On the learnability of Boolean \n\nformulae,\" Proc. 19-th ACM STOC, 1987. \n\n[14] C. Koch, T. Poggio, and V. Torre, \"Retinal ganglion cells: A functional inter(cid:173)\n\npretation of dendritic morphology,\" Phil. Trans. R. Soc. London, vol. B 288, \npp. 227-264, 1982. \n\n\f51 \n\n[15] R. J. McEliece, E. C. Posner, E. R. Rodemich, and S. S. Venkatesh, \"The \ncapacity of the Hopfield associative memory,\" IEEE Trans. Inform. Theory, \nvol. IT-33, pp. 461-482, 1987. \n\n[16] R. J. McEliece and E. C. Posner, \"The number of stable points of an infinite(cid:173)\nrange spin glass memory,\" JPL Telecomm. and Data Acquisition Progress Re(cid:173)\nport, vol. 42-83, pp. 209-215, 1985. \n\n[17] C. A. Mead (ed.), Analog VLSI and Neural Systems, Addison Wesley, 1987. \n[18] N. Megiddo, \"On the complexity of polyhedral separability,\" to appear in Jnl. \n\nDiscrete and Computational Geometry, 1987. \n\n[19] S. 
Muroga, \"Lower bounds on the number of threshold functions,\" IEEE Trans. \n\nElec. Comp., vol. 15, pp. 805-806, 1966. \n\n[20] S. Muroga, Threshold Logic and its Applications, Wiley Interscience, 1971. \n[21] V. N. Peled and B. Simeone, \"Polynomial-time algorithms for regular set(cid:173)\n\ncovering and threshold synthesis,\" Discr. Appl. Math., vol. 12, pp. 57-69, 1985. \n[22] D. Psaltis and C. H. Park, \"Nonlinear discriminant functions and associative \nmemories,\" in Neural Networks for Computing. New York: AlP Conf. Proc., \nvol. 151, 1986. \n\n[23] J. Reif, \"On threshold circuits and polynomial computation,\" preprint. \n[24] R. Smolenski, \"Algebraic methods in the theory of lower bounds for Boolean \n\ncircuit complexity,\" Proc. J9-th ACM STOC, 1987. \n\n[25] L. G. Valiant, \"A theory of the learnable,\" Comm. ACM, vol. 27, pp. 1134-1142, \n\n1984. \n\n[26] L. G. Valiant, \"Deductive learning,\" Phil. Trans. R. Soc. London, vol. A 312, \n\npp. 441-446, 1984. \n\n[27] S. S. Venkatesh, Linear Maps with Point Rules: Applications to Pattern Clas(cid:173)\nsification and Associativ~ Memory. Ph.D. Thesis, California Institute of Tech(cid:173)\nnology, Aug. 1986. \n\n[28] S. S. Venkatesh and D. Psaltis, \"Linear and logarithmic capacities in associative \n\nneural networks,\" to appear IEEE Trans. Inform. Theory. \n\n[29] S. S. Venkatesh, D. Psaltis, and J. Yu, private communication. \n[30] R. O. Winder, \"Bounds on threshold gate realisability,\" IRE Trans. Elec. \n\nComp., vol. EC-12, pp. 561-564, 1963. \n\n[31] A. C. C. Yaa, \"Separating the poly-time hierarchy by oracles,\" Proc. 26-th \n\nIEEE FOCS, pp. 1-10, 1985. \n\n\f", "award": [], "sourceid": 66, "authors": [{"given_name": "Pierre", "family_name": "Baldi", "institution": null}, {"given_name": "Santosh", "family_name": "Venkatesh", "institution": null}]}