{"title": "The Capacity of the Kanerva Associative Memory is Exponential", "book": "Neural Information Processing Systems", "page_first": 184, "page_last": 191, "abstract": null, "full_text": "184 \n\nTHE CAPACITY OF THE KANERVA ASSOCIATIVE MEMORY IS EXPONENTIAL \n\nStanford University. Stanford. CA 94305 \n\nP. A. Choul \n\nABSTRACT \n\nThe capacity of an associative memory is defined as the maximum \n\nnumber of vords that can be stored and retrieved reliably by an address \nvithin a given sphere of attraction. It is shown by sphere packing \narguments that as the address length increases. the capacity of any \nassociati ve memory is limited to an exponential grovth rate of 1 - h2 ( 0). \nvhere h2(0) is the binary entropy function in bits. and 0 is the radius \nof the sphere of attraction. This exponential grovth in capacity can \nactually be achieved by the Kanerva associative memory. if its \nparameters are optimally set . Formulas for these op.timal values are \nprovided. The exponential grovth in capacity for the Kanerva \nassociative memory contrasts sharply vith the sub-linear grovth in \ncapacity for the Hopfield associative memory. \n\nASSOCIATIVE MEMORY AND ITS CAPACITY \n\nOur model of an associative memory is the folloving. Let ()(,Y) be \n\nan (address. datum) pair. vhere )( is a vector of n \u00b1ls and Y is a \nvector of m \u00b1ls. and let ()(l),y(I)), ... ,()(M) , y(M)). be M (address, \ndatum) pairs stored in an associative memory. \nis presented at the input vith an address )( that is close to some \nstored address )(W. then it should produce at the output a vord Y that \nis close to the corresponding contents y(j). To be specific, let us say \nthat an associative memory can correct fraction 0 errors if an )( vi thin \nHamming distance no of )((j) retrieves Y equal to y(j). The Hamming \nsphere around each )(W vill be called the sphere of attraction, and 0 \nviII be called the radius of attraction. 
\n\nIf the associative memory \n\nOne notion of the capacity of this associative memory is the \n\nmaximum number of vords that it can store vhile correcting fraction 0 \nerrors . Unfortunately. this notion of capacity is ill-defined. because \nit depends on exactly vhich (address. datum) pairs have been stored. \nClearly. no associative memory can correct fraction 0 errors for every \nsequence of stored (address, datum) pairs. Consider. for example, a \nsequence in vhich several different vords are vritten to the same \naddress . No memory can reliably retrieve the contents of the \novervritten vords. At the other extreme. any associative memory ' can \nstore an unlimited number of vords and retrieve them all reliably. if \ntheir contents are identical. \n\nA useful definition of capacity must lie somevhere betveen these \n\ntvo extremes. \nthat for most sequences of addresses XU), .. . , X(M) and most sequences of \ndata y(l), ... , y(M). the memory can correct fraction 0 errors. We define \n\nIn this paper. ve are interested in the largest M such \n\nIThis vork vas supported by the National Science Foundation under NSF \n\ngrant IST-8509860 and by an IBM Doctoral Fellovship. \n\n\u00a9 American Institute of Physics 1988 \n\n\f185 \n\nI most sequences' in a probabilistic sense, as some set of sequences yi th \ntotal probability greater than say, .99. When all sequences are \nequiprobab1e, this reduces to the deterministic version: \nsequences. \n\n991. of all \n\nIn practice it is too difficult to compute the capacity of a given \n\nassociative memory yith inputs of length n and outputs of length Tn. \nFortunately, though, it is easier to compute the asymptotic rate at \nwhich A1 increases, as n and Tn increase, for a given family of \nassociative memories. This is the approach taken by McEliece et al. [1] \ntoyards the capacity of the Hopfield associative memory. 
We take the same approach towards the capacity of the Kanerva associative memory, and towards the capacities of associative memories in general. In the next section we provide an upper bound on the rate of growth of the capacity of any associative memory fitting our general model. It is shown by sphere packing arguments that capacity is limited to an exponential rate of growth of 1 - h2(δ), where h2(δ) is the binary entropy function in bits, and δ is the radius of attraction. In a later section it will turn out that this exponential growth in capacity can actually be achieved by the Kanerva associative memory, if its parameters are optimally set. This exponential growth in capacity for the Kanerva associative memory contrasts sharply with the sub-linear growth in capacity for the Hopfield associative memory [1].

A UNIVERSAL UPPER BOUND ON CAPACITY

Recall that our definition of the capacity of an associative memory is the largest M such that for most sequences of addresses X(1), ..., X(M) and most sequences of data Y(1), ..., Y(M), the memory can correct fraction δ errors. Clearly, an upper bound to this capacity is the largest M for which there exists some sequence of addresses X(1), ..., X(M) such that for most sequences of data Y(1), ..., Y(M), the memory can correct fraction δ errors. We now derive an expression for this upper bound.

Let δ be the radius of attraction and let DH(X(j), d) be the sphere of attraction, i.e., the set of all Xs at most Hamming distance d = ⌊nδ⌋ from X(j). Since by assumption the memory corrects fraction δ errors, every address X ∈ DH(X(j), d) retrieves the word Y(j). The size of DH(X(j), d) is easily shown to be independent of X(j) and equal to Vn,d = Σ_{k=0}^{d} C(n,k), where C(n,k) is the binomial coefficient n!/k!(n-k)!. 
Thus out of a total of 2^n n-bit addresses, at least Vn,d addresses retrieve Y(1), at least Vn,d addresses retrieve Y(2), at least Vn,d addresses retrieve Y(3), and so forth. It follows that the total number of distinct Y(j)s can be at most 2^n / Vn,d. Now, from Stirling's formula it can be shown that if d ≤ n/2, then Vn,d = 2^{n h2(d/n) + O(log n)}, where h2(δ) = -δ log2 δ - (1 - δ) log2(1 - δ) is the binary entropy function in bits, and O(log n) is some function whose magnitude grows more slowly than a constant times log n. Thus the total number of distinct Y(j)s can be at most 2^{n(1 - h2(δ)) + O(log n)}. Since any set containing 'most sequences' of M m-bit words will contain a large number of distinct words (if m is sufficiently large; see [2] for details), it follows that

M ≤ 2^{n(1 - h2(δ)) + O(log n)}.   (1)

In general a function f(n) is said to be O(g(n)) if f(n)/g(n) is bounded, i.e., if there exists a constant a such that |f(n)| ≤ a|g(n)| for all n. Thus (1) says that there exists a constant a such that M ≤ 2^{n(1 - h2(δ)) + a log n}. It should be emphasized that since a is unknown, this bound has no meaning for fixed n. However, it indicates that asymptotically in n, the maximum exponential rate of growth of M is 1 - h2(δ).

Intuitively, only a sequence of addresses X(1), ..., X(M) that optimally pack the address space {-1,+1}^n can hope to achieve this upper bound. Remarkably, most such sequences are optimal in this sense, when n is large. The Kanerva associative memory can take advantage of this fact.

Figure 1: Neural net representation of the Kanerva associative memory. Signals propagate from the bottom (input) to the top (output). Each arc multiplies the signal by its weight; each node adds the incoming signals and then thresholds. 
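As an illustrative numerical sketch (not part of the original analysis), the sphere volume Vn,d and the Stirling estimate 2^{n h2(d/n)} can be compared directly for a modest n; all variable names here are hypothetical:

```python
import math

def h2(x: float) -> float:
    """Binary entropy in bits; h2(0) = h2(1) = 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def sphere_volume(n: int, d: int) -> int:
    """Vn,d: number of n-bit words within Hamming distance d of a fixed point."""
    return sum(math.comb(n, k) for k in range(d + 1))

n, delta = 200, 0.1
d = int(n * delta)

# Sphere-packing bound (1): the number of storable words is at most 2^n / Vn,d,
# whose exponent is n - log2(Vn,d) = n*(1 - h2(delta)) + O(log n).
bound_exponent = n - math.log2(sphere_volume(n, d))
print(bound_exponent, n * (1 - h2(delta)))
```

The two printed exponents differ only by an O(log n) term, consistent with the Stirling approximation above.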
THE KANERVA ASSOCIATIVE MEMORY

The Kanerva associative memory [3,4] can be regarded as a two-layer neural network, as shown in Figure 1, where the first layer is a preprocessor and the second layer is the usual Hopfield style array. The preprocessor essentially encodes each n-bit input address into a very large k-bit internal representation, k ≫ n, whose size will be permitted to grow exponentially in n. It does not seem surprising, then, that the capacity of the Kanerva associative memory can grow exponentially in n, for it is known that the capacity of the Hopfield array grows almost linearly in k, assuming the coordinates of the k-vector are drawn at random by independent flips of a fair coin [1].

Figure 2: Matrix representation of the Kanerva associative memory. Signals propagate from the right (input) to the left (output). Dimensions are shown in the box corners. Circles stand for functional composition; dots stand for matrix multiplication.

In this situation, however, such an assumption is ridiculous: since the k-bit internal representation is a function of the n-bit input address, it can contain at most n bits of information, whereas independent flips of a fair coin contain k bits of information. Kanerva's primary contribution is therefore the specification of the preprocessor, that is, the specification of how to map each n-bit input address into a very large k-bit internal representation.

The operation of the preprocessor is easily described. Consider the matrix representation shown in Figure 2. The matrix Z is randomly populated with ±1s. This randomness assumption is required to ease the analysis. The function fr is 1 in the ith coordinate if the ith row of Z is within Hamming distance r of X, and is 0 otherwise. This is accomplished by thresholding the ith input against n - 2r. The parameters r and k are two essential parameters in the Kanerva associative memory. 
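The thresholding step above can be sketched in a few lines (an illustrative toy with hypothetical sizes n, k, r; for ±1 vectors the inner product z·x equals n - 2h, where h is the Hamming distance, so h ≤ r is equivalent to z·x ≥ n - 2r):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, r = 64, 1024, 24          # address length, memory locations, access radius

# Z is randomly populated with +/-1s; its k rows are the location addresses.
Z = rng.choice([-1, 1], size=(k, n))
X = rng.choice([-1, 1], size=n)

# fr(ZX): coordinate i is 1 iff row i of Z is within Hamming distance r of X,
# computed by thresholding the inner product against n - 2r.
activations = (Z @ X >= n - 2 * r).astype(int)

# Sanity check against the direct Hamming-distance definition.
hamming = np.count_nonzero(Z != X, axis=1)
assert np.array_equal(activations, (hamming <= r).astype(int))
print(activations.sum(), "of", k, "locations selected")
```

With r well below n/2, only a small fraction of the k rows fire, giving the sparse internal representation described below.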
If r and k are set correctly, then the number of 1s in the representation fr(ZX) will be very small in comparison to the number of 0s. Hence fr(ZX) can be considered to be a sparse internal representation of X.

The second stage of the memory operates in the usual way, except on the internal representation of X. That is, Y = g(W fr(ZX)), where

W = Σ_{j=1}^{M} Y(j) [fr(ZX(j))]^t,   (2)

and g is the threshold function whose ith coordinate is +1 if the ith input is greater than 0 and -1 if the ith input is less than 0. The ith column of W can be regarded as a memory location whose address is the ith row of Z. Every X within Hamming distance r of the ith row of Z accesses this location. Hence r is known as the access radius, and k is the number of memory locations.

The approach taken in this paper is to fix the linear rate ρ at which r grows with n, and to fix the exponential rate κ at which k grows with n. It turns out that the capacity then grows at a fixed exponential rate Cρ,κ(δ), depending on ρ, κ, and δ. These exponential rates are sufficient to overcome the standard loose but simple polynomial bounds on the errors due to combinatorial approximations.

THE CAPACITY OF THE KANERVA ASSOCIATIVE MEMORY

Fix 0 ≤ κ ≤ 1, 0 ≤ ρ ≤ 1/2, and 0 ≤ δ ≤ min{2ρ, 1/2}. Let n be the input address length, and let m be the output word length. It is assumed that m is at most polynomial in n, i.e., m = exp{O(log n)}. Let r = ⌊ρn⌋ be the access radius, let k = 2^⌊κn⌋ be the number of memory locations, and let d = ⌊δn⌋ be the radius of attraction. Let Mn be the number of stored words. The components of the n-vectors X(1), ..., X(Mn), the m-vectors Y(1), ..., Y(Mn), and the k × n matrix Z are assumed to be IID equiprobable ±1 random variables. Finally, given an n-vector X, let Y = g(W fr(ZX)) where W = Σ_{j=1}^{Mn} Y(j) [fr(ZX(j))]^t. 
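A minimal end-to-end toy of the write rule (2) and the read map Y = g(W fr(ZX)) might look like the sketch below; all sizes are hypothetical and far from the asymptotic regime of the analysis:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, k, r, M = 64, 16, 20000, 24, 10   # toy sizes, chosen only for illustration

Z = rng.choice([-1, 1], size=(k, n))    # random location addresses

def f_r(x):
    # Sparse internal representation: 1 where a row of Z is within distance r of x.
    return (Z @ x >= n - 2 * r).astype(int)

def g(u):
    # Coordinate-wise threshold; ties at 0 broken toward +1 in this sketch.
    return np.where(u > 0, 1, -1)

X = rng.choice([-1, 1], size=(M, n))    # stored addresses
Y = rng.choice([-1, 1], size=(M, m))    # stored data

# Write rule (2): W = sum_j Y(j) [f_r(Z X(j))]^t, an m x k matrix.
W = sum(np.outer(Y[j], f_r(X[j])) for j in range(M))

# Read back at a noisy address: flip two coordinates of X(0).
x_noisy = X[0].copy()
x_noisy[:2] *= -1
y_hat = g(W @ f_r(x_noisy))
print("recalled", np.count_nonzero(y_hat == Y[0]), "of", m, "bits of Y(0)")
```

Because the activation sets of the noisy address and the stored address overlap heavily, while the activation sets of unrelated addresses overlap little, the signed sums are dominated by Y(0).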
Define the quantity

Cρ,κ(δ) = 2δ + 2(1 - δ) h2((ρ - δ/2)/(1 - δ)) + κ - 2h2(ρ),   if κ ≤ κ0(ρ),
Cρ,κ(δ) = Cρ,κ0(ρ)(δ),   if κ > κ0(ρ),   (3)

where

κ0(ρ) = 2h2(ρ) - 2δ̃ - 2(1 - δ̃) h2((ρ - δ̃/2)/(1 - δ̃)) + 1 - h2(δ̃)   (4)

and

δ̃ = 3/4 - √(9/16 - 2ρ(1 - ρ)).

Theorem: If

Mn ≤ 2^{n Cρ,κ(δ) + O(log n)},

then for all ε > 0, all sufficiently large n, all j ∈ {1, ..., Mn}, and all X ∈ DH(X(j), d),

P{Y ≠ Y(j)} < ε.

Proof: See [2].

Interpretation: If the exponential growth rate of the number of stored words Mn is asymptotically less than Cρ,κ(δ), then for every sufficiently large address length n, there is some realization of the 2^{nκ} × n preprocessor matrix Z such that the associative memory can correct fraction δ errors for most sequences of Mn (address, datum) pairs. Thus Cρ,κ(δ) is a lower bound on the exponential growth rate of the capacity of the Kanerva associative memory with access radius nρ and number of memory locations 2^{nκ}.

Figure 3 shows Cρ,κ(δ) as a function of the radius of attraction δ, for κ = κ0(ρ) and ρ = 0.1, 0.2, 0.3, 0.4, and 0.45. For any fixed access radius ρ, Cρ,κ0(ρ)(δ) decreases as δ increases. This reflects the fact that fewer (address, datum) pairs can be stored if a greater fraction of errors must be corrected. As ρ increases, Cρ,κ0(ρ)(δ) begins at a lower point but falls off less steeply. In a moment we shall see that ρ can be adjusted to provide the optimal performance for a given δ.

Not shown in Figure 3 is the behavior of Cρ,κ(δ) as a function of κ. However, the behavior is simple. For κ > κ0(ρ), Cρ,κ(δ) remains unchanged, while for κ ≤ κ0(ρ), Cρ,κ(δ) is simply shifted down by the difference κ0(ρ) - κ. This establishes the conditions under which the Kanerva associative memory is robust against random component failures. 
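The following sketch evaluates one consistent reading of (3) and (4) (helper names hypothetical) and checks numerically that at δ = δ̃ and κ = κ0(ρ) the growth rate meets the sphere-packing bound 1 - h2(δ), and that at δ = 0 it reduces to κ:

```python
import math

def h2(x: float) -> float:
    # Binary entropy in bits.
    return 0.0 if x in (0.0, 1.0) else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def delta_tilde(rho: float) -> float:
    # The radius at which C_{rho,kappa0(rho)}(delta) meets 1 - h2(delta).
    return 0.75 - math.sqrt(9 / 16 - 2 * rho * (1 - rho))

def kappa0(rho: float) -> float:
    # (4): growth rate of memory locations beyond which capacity saturates.
    dt = delta_tilde(rho)
    return 2 * h2(rho) - 2 * dt - 2 * (1 - dt) * h2((rho - dt / 2) / (1 - dt)) + 1 - h2(dt)

def capacity_rate(rho: float, kappa: float, delta: float) -> float:
    # (3): exponential growth rate of the Kanerva capacity (valid for delta <= 2*rho).
    kappa = min(kappa, kappa0(rho))
    return 2 * delta + 2 * (1 - delta) * h2((rho - delta / 2) / (1 - delta)) + kappa - 2 * h2(rho)

rho = 0.1
dt = delta_tilde(rho)
print(capacity_rate(rho, kappa0(rho), dt), 1 - h2(dt))
```

The two printed values agree, and capacity_rate(ρ, κ0(ρ), 0) returns κ0(ρ), matching the δ = 0 discussion below.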
Although increasing the number of memory locations beyond 2^{nκ0(ρ)} does not increase the capacity, it does increase robustness. Random component failures will not affect the capacity until so many components have failed that the number of surviving memory locations is less than 2^{nκ0(ρ)}.

Figure 3: Graphs of Cρ,κ0(ρ)(δ) as defined by (3). The upper envelope is 1 - h2(δ).

Perhaps the most important curve exhibited in Figure 3 is the sphere packing upper bound 1 - h2(δ), which is achieved for a particular ρ by δ = δ̃ = 3/4 - √(9/16 - 2ρ(1 - ρ)). Equivalently, the upper bound is achieved for a particular δ by ρ equal to

ρ0(δ) = 1/2 - √(1/4 - (3δ/4)(1 - 2δ/3)).   (5)

Thus (4) and (5) specify the optimal values of the parameters κ and ρ, respectively. These functions are shown in Figure 4. With these optimal values, (3) simplifies to 1 - h2(δ), the sphere packing bound.

It can also be seen that for δ = 0 in (3), the exponential growth rate of the capacity is asymptotically equal to κ, which is the exponential growth rate of the number of memory locations kn. That is, Mn = 2^{nκ + O(log n)} = kn · 2^{O(log n)}. Kanerva [3] and Keeler [5] have argued that the capacity at δ = 0 is proportional to the number of memory locations, i.e., Mn = kn · β for some constant β. Thus our results are consistent with those of Kanerva and Keeler, provided the 'polynomial' 2^{O(log n)} can be proved to be a constant. However, the usual statement of their result, M = k·β, that the capacity is simply proportional to the number of memory locations, is false, since in light of the universal upper bound, it is impossible for the capacity to grow without bound, with no dependence on the dimension n.

Figure 4: Graphs of κ0(ρ) and δ̃(ρ), the inverse of ρ0(δ), as defined by (4) and (5).

In our formulation, this difficulty does not arise because we have explicitly related the number of memory locations to the input dimension: kn = 2^{nκ}. In fact, our formulation provides explicit, coherent relationships between all of the following variables: the capacity M, the number of memory locations k, the input and output dimensions n and m, the radius of attraction δ, and the access radius ρ. We are therefore able to generalize the results of [3,5] to the case δ > 0, and provide explicit expressions for the asymptotically optimal values of ρ and κ as well.

CONCLUSION

We described a fairly general model of associative memory and selected a useful definition of its capacity. A universal upper bound on the growth of the capacity of such an associative memory was shown by a sphere packing argument to be exponential with rate 1 - h2(δ), where h2(δ) is the binary entropy function and δ is the radius of attraction. We reviewed the operation of the Kanerva associative memory, and stated a lower bound on the exponential growth rate of its capacity. This lower bound meets the universal upper bound for optimal values of the memory parameters ρ and κ. We provided explicit formulas for these optimal values. Previous results for δ = 0 stating that the capacity of the Kanerva associative memory is proportional to the number of memory locations cannot be strictly true. Our formulation corrects the problem and generalizes those results to the case δ > 0.

REFERENCES

1. R.J. McEliece, E.C. Posner, E.R. Rodemich, and S.S. Venkatesh, "The capacity of the Hopfield associative memory," IEEE Transactions on Information Theory, submitted.

2. P.A. Chou, "The capacity of the Kanerva associative memory," IEEE Transactions on Information Theory, submitted.

3. P. 
Kanerva, \"Self-propagating search: \n\na unified theory of \n\nmemory,\" Tech. Rep. CSLI-84-7, Stanford Center for the Study of \nLanguage and Information. Stanford. CA, March 1984. \n\n4. P. Kanerva, \"Parallel structures in human and computer memory,\" \n\nin Neural Networks for Computing, (J .S. Denker. ed.), Nev York: \nAmerican Institute of Physics. 1986. \n\n5 . J.D. Keeler. \"Comparison betveen sparsely distributed memory and \n\nHopfield-type neural netvork models,\" Tech . Rep. RIACS TR 86 . 31, \nNASA Research Institute for Advanced Computer Science, Mountain \nViev. CA, Dec. \n\n1986. \n\n\f", "award": [], "sourceid": 2, "authors": [{"given_name": "Philip", "family_name": "Chou", "institution": null}]}