{"title": "Memory Capacity of Linear vs. Nonlinear Models of Dendritic Integration", "book": "Advances in Neural Information Processing Systems", "page_first": 157, "page_last": 163, "abstract": null, "full_text": "Memory Capacity of Linear vs. Nonlinear Models of Dendritic Integration \n\nPanayiota Poirazi* \n\nBiomedical Engineering Department \n\nUniversity of Southern California \n\nLos Angeles, CA 90089 \n\npoirazi@sc/.usc.edu \n\nBartlett W. Mel* \n\nBiomedical Engineering Department \n\nUniversity of Southern California \n\nLos Angeles, CA 90089 \n\nmel@lnc.usc.edu \n\nAbstract \n\nPrevious biophysical modeling work showed that nonlinear interactions among nearby synapses located on active dendritic trees can provide a large boost in the memory capacity of a cell (Mel, 1992a, 1992b). The aim of our present work is to quantify this boost by estimating the capacity of (1) a neuron model with passive dendritic integration, where inputs are combined linearly across the entire cell followed by a single global threshold, and (2) an active dendrite model in which a threshold is applied separately to the output of each branch, and the branch subtotals are combined linearly. We focus here on the limiting case of binary-valued synaptic weights, and derive expressions which measure model capacity by estimating the number of distinct input-output functions available to both neuron types. We show that (1) the application of a fixed nonlinearity to each dendritic compartment substantially increases the model's flexibility, (2) for a neuron of realistic size, the capacity of the nonlinear cell can exceed that of the same-sized linear cell by more than an order of magnitude, and (3) the largest capacity boost occurs for cells with a relatively large number of dendritic subunits of relatively small size. 
We validated the analysis by empirically measuring memory capacity with randomized two-class classification problems, where a stochastic delta rule was used to train both linear and nonlinear models. We found that the large capacity boosts predicted for the nonlinear dendritic model were readily achieved in practice. \n\n*http://lnc.usc.edu \n\n1 Introduction \n\nBoth physiological evidence and connectionist theory support the notion that in the brain, memories are stored in the pattern of learned synaptic weight values. Experiments in a variety of neuronal preparations, however, indicate that the efficacy of synaptic transmission can undergo substantial fluctuations up or down, or both, during brief trains of synaptic stimuli. Large fluctuations in synaptic efficacy on short time scales seem inconsistent with the conventional connectionist assumption of stable, high-resolution synaptic weight values. Furthermore, a recent experimental study suggests that excitatory synapses in the hippocampus, a region implicated in certain forms of explicit memory, may exist in only a few long-term stable states, where the continuous grading of synaptic strength seen in standard measures of long-term potentiation (LTP) may exist only in the average over a large population of two-state synapses with randomly staggered thresholds for learning (Petersen, Malenka, Nicoll, & Hopfield, 1998). According to conventional connectionist notions, the possibility that individual synapses hold only one or two bits of long-term state information would seem to have serious implications for the storage capacity of neural tissue. Exploration of this question is one of the main themes of this paper. 
\nIn a related vein, we have found in previous biophysical modeling studies that nonlinear interactions between synapses co-activated on the same branch of an active dendritic tree could provide an alternative form of long-term storage capacity. This capacity, which is largely orthogonal to that tied up in conventional synaptic weights, is contained instead in the spatial permutation of synaptic connections onto the dendritic tree, which could in principle be modified in the course of learning or development (Mel, 1992a, 1992b). In a more abstract setting, we recently showed that a large repository of model flexibility lies in the choice as to which of a large number of possible interaction terms available in high dimension is actually included in a learning machine's discriminant function, and that the excess capacity contained in this \"choice flexibility\" can be quantified using straightforward counting arguments (Poirazi & Mel, 1999). \n\n2 Two Alternative Models of Dendritic Integration \n\nIn this paper, we use a similar function-counting approach to address the more biologically relevant case of a neuron with multiple quasi-independent dendritic compartments (fig. 1). Our primary objective has been to compare the memory capacity of a cell assuming two different modes of dendritic integration. According to the linear model, the neuron's activation level aL(x) prior to thresholding is given by a weighted sum of its inputs over the cell as a whole. According to the nonlinear model, the k synaptic inputs to each branch are first combined linearly, a static (e.g. 
sigmoidal) nonlinearity is applied to each of the m branch subtotals, and the resulting branch outputs are summed to produce the cell's overall activity aN(x): \n\naL(x) = sum_{i=1..m} sum_{j=1..k} x_ij ,    aN(x) = sum_{i=1..m} g( sum_{j=1..k} x_ij )    (1) \n\nwhere x_ij denotes the input delivered to the j-th synaptic site on the i-th branch. The expressions for aL and aN were written in similar form to emphasize that the models have an identical number of synaptic weights, differing only in the presence or absence of a fixed nonlinear function g applied to the branch subtotals. Though individual synaptic weights in both models are constrained to have a value of 1, any of the d input lines may form multiple connections on the same or different branches as a means of representing graded synaptic strengths. Similarly, an input line which forms no connection has an implicit weight of 0. In light of this restriction to positive (or zero) weight values, both the linear and nonlinear models are split into two opponent channels a+ and a- dedicated to positive vs. negative coefficients, respectively. This leads to a final output for each model: \n\nyL(x) = sgn[ aL+(x) - aL-(x) ],    yN(x) = sgn[ aN+(x) - aN-(x) ]    (2) \n\nwhere the sgn operator maps the total activation level into a class label of {-1, 1}. \n\nFigure 1: A cell is modeled as a set of m identical branches connected to a soma, where each branch contains k synaptic contacts driven by one of d distinct input lines. \n\nIn the following, we derive expressions for the number of distinct parameter states available to the linear vs. nonlinear models, a measure which we have found to be a reliable predictor of storage capacity under certain restrictions (Poirazi & Mel, 1999). Based on these expressions, we compute the capacity boost provided by the branch nonlinearity as a function of the number of branches m, synaptic sites per branch k, and input space dimensionality d. 
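To make the two integration rules above concrete, the following sketch evaluates both thresholded outputs. It is a minimal illustration, not the simulation code: the function names, the wiring-table representation of binary unit weights, the tie-break sgn(0) = +1, and the default g(s) = s^10 (the subunit nonlinearity used in our figure 3 simulations) are our own choices here.

```python
def branch_sums(x, wiring):
    # wiring[i] lists the input-line indices driving the synapses on branch i;
    # repeated indices implement integer-valued (multi-contact) weights
    return [sum(x[j] for j in branch) for branch in wiring]

def y_linear(x, pos, neg):
    # Linear model: one global sum per opponent channel, single threshold
    a_plus, a_minus = sum(branch_sums(x, pos)), sum(branch_sums(x, neg))
    return 1 if a_plus - a_minus >= 0 else -1

def y_nonlinear(x, pos, neg, g=lambda s: s ** 10):
    # Nonlinear model: each branch subtotal passes through g before the somatic sum
    a_plus = sum(g(s) for s in branch_sums(x, pos))
    a_minus = sum(g(s) for s in branch_sums(x, neg))
    return 1 if a_plus - a_minus >= 0 else -1
```

Both models carry the same number of synaptic contacts; they differ only in whether g is applied to each branch subtotal before the somatic sum.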
Finally, we test the predictions of the analytical model by training both linear and nonlinear models on randomized classification problems using a stochastic delta rule, and empirically measure and compare the storage capacities of the two models. \n\n3 Results \n\n3.1 Counting Parameter States: Linear vs. Nonlinear Model \n\nWe derived expressions for BL and BN, which estimate the total number of parameter bits available to the linear vs. nonlinear models, respectively: \n\nBN = 2 log2[ C( C(k+d-1, k) + m - 1, m ) ],    BL = 2 log2[ C(s+d-1, s) ]    (3) \n\nwhere C(n, r) denotes the binomial coefficient and s = m \u00b7 k is the total number of synaptic sites. These expressions estimate the number of non-redundant states in each neuron type, i.e., those assignments of input lines to dendritic sites which yield distinct input-output functions yL or yN. \n\nThese formulae are plotted in figure 2A with d = 100, where each curve represents a cell with a fixed number of branches (indicated by m). In each case, the capacity increases steadily as the number of synapses per branch, k, is increased. The logarithmic growth in the capacity of the linear model (evident in an asymptotic analysis of the expression for BL) is shown at the bottom of the graph (circles), from which it may be seen that the boost in capacity provided by the dendritic branch nonlinearity increases steadily with the number of synaptic sites. For a cell with 100 branches containing 100 synaptic sites each, the capacity boost relative to the linear model exceeds a factor of 20. \n\nFigure 2B shows that for a given total number of synaptic sites, in this case s = m \u00b7 k = 10,000, the capacity of the nonlinear cell is maximized for a specific choice of m and k. The peak of each of the three curves (computed for different values of d) occurs for a cell containing 1,250 branches with 8 synapses each. 
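The counting expressions labeled (3) can be evaluated directly. A minimal sketch (the function names are ours; binomial coefficients are computed exactly with math.comb, and Python's log2 handles the resulting big integers):

```python
from math import comb, log2

def capacity_linear(s, d):
    # BL = 2 log2 C(s + d - 1, s): multisets of s contacts drawn from d input lines
    return 2 * log2(comb(s + d - 1, s))

def capacity_nonlinear(m, k, d):
    # f = C(k + d - 1, k): distinct functions a single k-site branch can realize;
    # BN = 2 log2 C(f + m - 1, m): multisets of m branch functions
    f = comb(k + d - 1, k)
    return 2 * log2(comb(f + m - 1, m))

# Boost for a cell with m = k = 100 and d = 100 input lines
boost = capacity_nonlinear(100, 100, 100) / capacity_linear(100 * 100, 100)
print(round(boost, 1))
```

With one synapse per branch (k = 1) the two expressions coincide exactly, matching the right edge of figure 2B where the nonlinear model degenerates to the linear one.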
However, the capacity is only moderately sensitive to the branch count: the capacity of a cell with 100 branches of 100 synapses each, for example, lies within a factor of two of the optimal configuration. The linear cell capacities can be found at the far right edge of the plot (m = 10,000), since a nonlinear model with one synapse per branch has a number of trainable states identical to that of a linear model. \n\n3.2 Validating the Analytical Model \n\nTo test the predictions of the analytical model, we trained both linear and nonlinear cells on randomized two-class classification problems. Training samples were drawn from a 40-dimensional spherical Gaussian distribution and were randomly assigned positive or negative labels; in some runs, training patterns were evenly divided between positive and negative labels, with similar results. Each of the 40 original input dimensions was recoded using a set of 10 1-dimensional binary, non-overlapping receptive fields with centers spaced along each dimension such that all receptive fields would be activated equally often. This manipulation mapped the original 40-dimensional learning problem into 400 dimensions, thereby increasing the discriminability of the training samples. The relative memory capacity of linear vs. nonlinear cells was then determined empirically by comparing the number of training patterns learnable at a fixed error rate of 2%. \n\nThe learning rule used for both cell types was similar to the \"clusteron\" learning rule described in (Mel, 1992a), and involved two mechanisms known to contribute to neural development: (1) random activity-independent synapse formation, and (2) activity-dependent synapse stabilization. In each iteration, a set of 25 synapses was chosen at random, and the \"worst\" synapse was identified based on the correlation over the training set of (i) the input's pre-synaptic activity, (ii) the post-synaptic activity (i.e. 
the local nonlinear branch response for the nonlinear model, or a constant of 1 for the linear model), and (iii) a global \"delta\" signal with a value of 0 if the cell responded correctly to the input pattern, or \u00b11 if the cell responded incorrectly. The poorest-performing synapse on the branch was then targeted for replacement with a new synapse drawn at random from the d input lines. The probability that the replacement actually occurred was given by a Boltzmann equation based on the difference in the training set error rates before and after the replacement. A \"temperature\" variable was gradually lowered over the course of the simulation, which was terminated when no further improvement in error rates was seen. \n\nResults of the learning runs are shown in fig. 3, where the analytical capacity (measured in bits) was scaled to the numerical capacity (measured in training patterns learned at 2% error). Two key features of the theoretical curves (dashed lines) are echoed in the empirical performance curves (solid lines), including the much larger storage capacity of the nonlinear cell model, and the specific cell geometry which maximizes the capacity boost. \n\nFigure 2: Comparison of linear vs. nonlinear model capacity as a function of branch geometry. A. Capacity in bits for linear and several nonlinear cells with different branch counts (for d = 100). For each curve indexed by branch count m, sites per branch k increases from left to right as indicated iconically beneath the x-axis. For all cells, capacity increases with an increasing number of sites, though the capacity of the linear model grows logarithmically, leading to an increasingly large capacity boost for the size-matched nonlinear cells. B. Capacity of a nonlinear model with 10,000 sites for different values of input space dimension d. Branch count m grows along the x-axis. Cells at the right edge of the plot contain only one synapse per branch, and thus have a number of modifiable parameters (and hence capacity) equivalent to that of the linear model. All three curves show that there exists an optimal geometry which maximizes the capacity of the nonlinear model (in this case 1,250 branches with 8 synapses each). \n\n4 Discussion \n\nWe found using both analytical and numerical methods that in the limit of low-resolution synaptic weights, application of a fixed output nonlinearity to each compartment of a dendritic tree leads to a significant boost in capacity relative to a cell whose post-synaptic integration is linear. For example, given a cell with 10,000 synaptic contacts originating from 400 distinct input lines, the analysis predicts a 23-fold increase in capacity for the nonlinear cell, while numerical simulations using a stochastic delta rule actually achieve a 15-fold boost. \n\nGiven that a linear and a nonlinear model have an identical number of synaptic contacts with uniform synaptic weight values, what accounts for the capacity boost? 
\nThe principal insight gained in this work is that the attachment of a fixed nonlinearity to each branch in a neuron substantially increases its underlying \"model flexibility\", i.e. confers upon the cell a much larger choice of distinct input-output relations from which to select during learning. \n\nFigure 3: Comparison of capacity boost predicted by analysis vs. that observed empirically when linear and nonlinear models were trained using the same stochastic delta rule. Dashed lines: analytical curves (capacity in bits, scaled by 1/14) for the linear vs. nonlinear model for a cell with 10,000 sites, showing capacity for varying cell geometries. Solid lines: empirical performance (in training patterns) for the same two cells at the 2% error criterion, using a subunit nonlinearity g(x) = x^10 (similar results were seen using a sigmoidal nonlinearity, though the parameters of the optimal sigmoid depended on the cell geometry). For both analytical and numerical curves, peak capacity is seen for a cell with 1,000 branches (10 synapses per branch). Capacity exceeds that of the same-sized linear model by a factor of 15 at the peak, and by more than a factor of 7 for cells ranging from about 3 to 60 synapses per branch (horizontal dotted line). \n\nThis may be illustrated as follows. For the linear model, branching structure is irrelevant, so that yL depends only on the number of input connections formed from each of the d input lines. 
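This symmetry, and its absence in the nonlinear model, is easy to check numerically: permuting synapses across branches leaves the linear activation unchanged but generally alters the nonlinear one. A minimal sketch (the input values and the choice g(s) = s^2 are illustrative, not taken from our simulations):

```python
def a_lin(branches):
    # Linear model: branch structure is irrelevant, only the global sum matters
    return sum(sum(b) for b in branches)

def a_nonlin(branches, g=lambda s: s ** 2):
    # Nonlinear model: each branch subtotal passes through g before summing
    return sum(g(sum(b)) for b in branches)

# Same multiset of inputs, two spatial arrangements (one input swapped between branches)
arrangement1 = [[1.0, 2.0], [3.0, 4.0]]
arrangement2 = [[1.0, 3.0], [2.0, 4.0]]

print(a_lin(arrangement1) == a_lin(arrangement2))      # True: permutations are redundant
print(a_nonlin(arrangement1), a_nonlin(arrangement2))  # 58.0 vs. 52.0: distinct responses
```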
All spatial permutations of a set of input connections are thus interchangeable and produce identical cell responses. This massive redundancy confines the capacity of the linear model to grow only logarithmically with an increasing number of synaptic sites (fig. 2A), an unfortunate limitation for a brain in which the formation of large numbers of synaptic contacts between neurons is routine. In contrast, the model with nonlinear subunits contains many fewer redundancies: most spatial permutations of the same set of input connections lead to non-identical values of yN, since an input x swapped from branch b1 to branch b2 leads to the elimination of the k - 1 interaction terms involving x on branch b1 and the creation of k - 1 new interaction terms on branch b2. \n\nInterestingly, the particular form of the branch nonlinearity has virtually no effect on the capacity of the cell as far as the counting arguments are concerned (though it can have a profound effect on the cell's \"representational bias\"; see below), since the principal effect of the nonlinearity in our capacity calculations is to break the symmetry among the different branches. \n\nThe issue of representational bias is a critical one, however, and must be considered when attempting to predict absolute or relative performance rates for particular classifiers confronted with specific learning problems. Thus, intrinsic differences in the geometry of linear vs. nonlinear discriminant functions mean that the parameters available to the two models may be better or worse suited to solve a given learning problem, even if the two models were equated for total parameter flexibility. 
\nWhile such biases are not taken into account in our analysis, they could nonetheless have a substantial effect on measured error rates, and could thus throw a performance advantage to one machine or the other. One danger is that performance differences measured empirically could be misinterpreted as arising from differences in underlying model capacity, when in fact they arise from differential suitability of the two classifiers for the learning problem at hand. To avoid this difficulty, the random classification problems we used to empirically assess memory capacity were chosen to level the playing field for the linear vs. nonlinear cells, since in a previous study we found that the coefficients on linear vs. nonlinear (quadratic) terms were about equally efficient as features for this task. In this way, differences in measured performance on these tasks were primarily attributable to underlying capacity differences, rather than differences in representational bias. This experimental control permitted more meaningful comparisons between our analytical and empirical tests (fig. 3). \n\nThe problem of representational bias crops up in a second guise, wherein the analytical expressions for capacity in eq. 3 can significantly overestimate the actual performance of the cell. This occurs when a particular ensemble of learning problems fails to utilize all of the entropy available in the cell's parameter space, for example, by requiring the cell to visit only a small subset of its parameter states relatively often. This invalidates the maximum parameter entropy assumption made in the derivation of eq. 3, so that measured performance will tend to fall below predicted values. The actual performance of either model when confronted with an ensemble of learning problems will thus be determined by (1) the number of trainable parameters available to the neuron (as measured by eq. 3), (2) the suitability of the neuron's parameters for solving the assigned learning problems, and (3) the utilization of parameters, which relates to the entropy in the joint probability of the parameter values averaged over the ensemble of learning problems. In our comparisons here of linear and nonlinear cells, we have calculated (1), and have attempted to control for (2) and (3). \n\nIn conclusion, our results build upon the results of earlier biophysical simulations, and indicate that in the limit of a large number of low-resolution synaptic weights, nonlinear dendritic processing could nonetheless have a major impact on the storage capacity of neural tissue. \n\nReferences \n\nMel, B. W. (1992a). The clusteron: Toward a simple abstraction for a complex neuron. In Moody, J., Hanson, S., & Lippmann, R. (Eds.), Advances in Neural Information Processing Systems, vol. 4, pp. 35-42. Morgan Kaufmann, San Mateo, CA. \n\nMel, B. W. (1992b). NMDA-based pattern discrimination in a modeled cortical neuron. Neural Comp., 4, 502-516. \n\nPetersen, C. C. H., Malenka, R. C., Nicoll, R. A., & Hopfield, J. J. (1998). All-or-none potentiation at CA3-CA1 synapses. Proc. Natl. Acad. Sci. USA, 95, 4732-4737. \n\nPoirazi, P., & Mel, B. W. (1999). Choice and value flexibility jointly contribute to the capacity of a subsampled quadratic classifier. Neural Comp., in press. \n", "award": [], "sourceid": 1646, "authors": [{"given_name": "Panayiota", "family_name": "Poirazi", "institution": null}, {"given_name": "Bartlett", "family_name": "Mel", "institution": null}]}