{"title": "Bidirectional Retrieval from Associative Memory", "book": "Advances in Neural Information Processing Systems", "page_first": 675, "page_last": 681, "abstract": "", "full_text": "Bidirectional Retrieval from Associative \n\nMemory \n\nFriedrich T. Sommer and Gunther Palm \nDepartment of Neural Information Processing \n\nUniversity of Ulm, 89069 Ulm, Germany \n{sommer,palm}~informatik.uni-ulm.de \n\nAbstract \n\nSimilarity based fault tolerant retrieval in neural associative mem(cid:173)\nories (N AM) has not lead to wiedespread applications. A draw(cid:173)\nback of the efficient Willshaw model for sparse patterns [Ste61, \nWBLH69], is that the high asymptotic information capacity is of \nlittle practical use because of high cross talk noise arising in the \nretrieval for finite sizes. Here a new bidirectional iterative retrieval \nmethod for the Willshaw model is presented, called crosswise bidi(cid:173)\nrectional (CB) retrieval, providing enhanced performance. We dis(cid:173)\ncuss its asymptotic capacity limit, analyze the first step, and com(cid:173)\npare it in experiments with the Willshaw model. Applying the very \nefficient CB memory model either in information retrieval systems \nor as a functional model for reciprocal cortico-cortical pathways \nrequires more than robustness against random noise in the input: \nOur experiments show also the segmentation ability of CB-retrieval \nwith addresses containing the superposition of pattens, provided \neven at high memory load. \n\n1 \n\nINTRODUCTION \n\nFrom a technical point of view neural associative memories (N AM) provide data \nstorage and retrieval. Neural models naturally imply parallel implementation of \nstorage and retrieval algorithms by the correspondence to synaptic modification \nand neural activation. 
With distributed coding of the data, recall in NAM models is fault tolerant: it is robust against noise or superposition in the addresses and against local damage in the synaptic weight matrix. As biological models, NAM have been proposed as general working schemes of networks of pyramidal cells in many places of the cortex.

An important property of a NAM model is its information capacity, which measures how efficiently the synaptic weights are used. In the early sixties Steinbuch realized, under the name "Lernmatrix", a memory model with binary synapses which is now known as the Willshaw model [Ste61, WBLH69]. The great variety of NAM models proposed since then, many triggered by Hopfield's work [Hop82], do not reach the high asymptotic information capacity of the Willshaw model.

For finite network size, the Willshaw model does not optimally retrieve the stored information, since the inner product between matrix column and input pattern determines the activity of each output neuron independently. For autoassociative pattern completion, iterative retrieval can reduce cross-talk noise [GM76, GR92, PS92, SSP96]. A simple bidirectional iteration - as in the bidirectional associative memory (BAM) [Kos87] - cannot, however, improve heteroassociative pattern mapping. For this task we propose CB-retrieval, where each retrieval step forms the resulting activity pattern in an autoassociative process that uses the connectivity matrix twice before thresholding, thereby exploiting the stored information more efficiently.

2 WILLSHAW MODEL AND CB EXTENSION

Here pattern mapping tasks $x^\nu \to y^\nu$ are considered for a set of memory patterns $\{(x^\nu, y^\nu): x^\nu \in \{0,1\}^n, y^\nu \in \{0,1\}^m, \nu = 1, \ldots, M\}$. The number of 1-components in a pattern is called the pattern activity.
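A memory set of this form can be generated for experiments with a short sketch (pure Python; `make_sparse_patterns` and its parameter values are our own illustration, not part of the paper):

```python
import random

def make_sparse_patterns(M, n, m, a, b, seed=0):
    """Generate M pattern pairs (x^nu, y^nu): binary vectors with
    exactly a ones among n components (x) and b ones among m (y)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(M):
        x = [0] * n
        for i in rng.sample(range(n), a):  # choose the a active x-units
            x[i] = 1
        y = [0] * m
        for j in rng.sample(range(m), b):  # choose the b active y-units
            y[j] = 1
        pairs.append((x, y))
    return pairs

pairs = make_sparse_patterns(M=5, n=100, m=80, a=10, b=8)
```

With a << n and b << m these patterns are sparse in the sense used below.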
The Willshaw model works efficiently if the memories are sparse, i.e., if the memory patterns have the same activities: $|x^\nu| = \sum_{i=1}^{n} x_i^\nu = a$, $|y^\nu| = \sum_{j=1}^{m} y_j^\nu = b$ for all $\nu$, with $a \ll n$ and $b \ll m$. During learning the set of memory patterns is transformed into the weight matrix by

$$C_{ij} = \min(1, \sum_\nu x_i^\nu y_j^\nu) = \sup_\nu x_i^\nu y_j^\nu.$$

For a given initial pattern $\tilde{x}^\mu$ the retrieval yields the output pattern $\tilde{y}^\mu$ by forming in each neuron the dendritic sum $[C\tilde{x}^\mu]_j = \sum_i C_{ij}\tilde{x}_i^\mu$ and by calculating the activity value by threshold comparison

$$\tilde{y}_j^\mu = H([C\tilde{x}^\mu]_j - \Theta) \quad \forall j, \qquad (1)$$

with the global threshold value $\Theta$ and $H(x)$ denoting the Heaviside function.

For finite sizes and with high memory load, i.e., $0 \ll p_1 := \mathrm{Prob}[C_{ij} = 1] \le 0.5$, the Willshaw model provides no tolerance with respect to errors in the address, see Fig. 1 and 2. A bidirectional iteration of standard simple retrieval (1), as proposed in BAM models [Kos87], can therefore be ruled out for further retrieval error reduction [SP97]. In the energy function of the Willshaw BAM

$$E(x,y) = -\sum_{ij} C_{ij} x_i y_j + \Theta' \sum_i x_i + \Theta \sum_j y_j$$

we now introduce a factor accounting for the magnitudes of the dendritic potentials at activated neurons (2). Differentiating the energy function (2) yields the gradient descent equations

$$y_j^{new} = H\Big([Cx]_j + \sum_k \underbrace{\sum_i C_{ij} C_{ik} x_i}_{=:\, w_{jk}^x}\; y_k - \Theta\Big) \qquad (3)$$

$$x_i^{new} = H\Big([C^T y]_i + \sum_l \underbrace{\sum_j C_{ij} C_{lj} y_j}_{=:\, w_{il}^y}\; x_l - \Theta'\Big) \qquad (4)$$

As new terms in (3) and (4), sums over pattern components weighted with the quantities $w_{jk}^x$ and $w_{il}^y$ occur. $w_{jk}^x$ is the overlap between the matrix columns $j$ and $k$ conditioned by the pattern $x$, which we call a conditioned link between y-units.
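The clipped-Hebbian learning rule and the simple retrieval step (1) above can be sketched as follows (illustrative pure Python; the function names and the toy sizes are ours):

```python
def store(pairs, n, m):
    """Clipped Hebbian learning: C_ij = min(1, sum_nu x_i^nu y_j^nu)."""
    C = [[0] * m for _ in range(n)]
    for x, y in pairs:
        ones_y = [j for j in range(m) if y[j]]
        for i in range(n):
            if x[i]:
                for j in ones_y:
                    C[i][j] = 1  # binary synapse, saturates at 1
    return C

def simple_retrieve(C, x, theta):
    """Eq. (1): y_j = H([Cx]_j - theta), taking H(0) = 1, hence >=."""
    m = len(C[0])
    dend = [sum(C[i][j] for i in range(len(x)) if x[i]) for j in range(m)]
    return [1 if d >= theta else 0 for d in dend]

# store a single toy pair and retrieve it with theta = |x|
x = [1, 1, 0, 0, 0, 0]
y = [0, 1, 0, 1, 0]
C = store([(x, y)], n=6, m=5)
y_out = simple_retrieve(C, x, theta=sum(x))  # y_out == y here
```

With an error-free address and threshold equal to the address activity, every stored 1-unit reaches the threshold exactly; spurious ones appear only through cross talk with other stored patterns.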
Restriction to the conditioned link terms yields a new iterative retrieval scheme, which we denote as crosswise bidirectional (CB) retrieval:

$$y(r+1)_j = H\Big(\sum_{i \in x(r)} C_{ij}\, [C^T y(r-1)]_i - \Theta\Big) \qquad (5)$$

$$x(r+1)_i = H\Big(\sum_{j \in y(r)} C_{ij}\, [C x(r-1)]_j - \Theta'\Big) \qquad (6)$$

For r = 0 the pattern $y(r-1)$ has to be replaced by $H([Cx(0)] - \Theta)$; for $r \ge 2$ Boolean ANDing with results from timestep $r-1$ can be applied, which has been shown to improve iterative retrieval in the Willshaw model for autoassociation [SSP96].

3 MODEL EVALUATION

Two possible retrieval error types can be distinguished: a "miss" error converts a 1-entry in $\tilde{y}^\mu$ to '0' and an "add" error does the opposite.

[Figure 1: two diagrams of retrieval error rates.] Figure 1: Mean retrieval error rates for n = 2000, M = 15000, a = b = 10, corresponding to a memory load of $p_1 = 0.3$. The x-axes display the address activity: $|\tilde{x}^\mu| = 10$ corresponds to an error-free learning pattern; lower activities are due to miss errors, higher activities due to add errors. Left: Theory - add errors for simple retrieval, eq. (7) (upper curve), and lower bound for the first step of CB-retrieval, eq. (9). Right: Simulations - errors for simple and CB retrieval.

The analysis of simple retrieval from the address $\tilde{x}^\mu$ yields, with optimal threshold setting $\Theta = k$, the add error rate, i.e., the expectation of spurious ones:

$$\hat{\alpha} = (m - b)\,\mathrm{Prob}[r \ge k], \qquad (7)$$

with the binomial random variable $\mathrm{Prob}[r = l] = B(|\tilde{x}^\mu|, p_1)_l$, where $B(n,p)_l := \binom{n}{l} p^l (1-p)^{n-l}$. Here $\tilde{a}$ denotes the number of add errors in the address and $k = |\tilde{x}^\mu| - \tilde{a}$ the number of correct 1-s in the address.

For the first step of CB-retrieval a lower bound of the add error rate $\hat{\alpha}(1)$ can be derived by the analysis of CB-retrieval with fixed address $x(0) = \tilde{x}^\mu$
and the perfect learning pattern $y^\mu$ as starting patterns in the y-layer. In this case the add error rate is

$$\hat{\alpha}(1) = (m - b)\,\mathrm{Prob}[r_1 + r_2 \ge kb], \qquad (8)$$

where the random variables $r_1$ and $r_2$ have the distributions $\mathrm{Prob}[r_1 = lb] = B(k, p_1)_l$ and $\mathrm{Prob}[r_2 = l] = B(ab, (p_1)^2)_l$. Thus,

$$\hat{\alpha}(1) \ge (m - b) \sum_{s=0}^{k} B(k, p_1)_s\, BS[ab, (p_1)^2, (k-s)b], \qquad (9)$$

where $BS[n,p,t] := \sum_{l=t}^{n} B(n,p)_l$ is the binomial sum.

In Fig. 1 the analytic results for the first step, (7) and (9), can be compared with simulations (left versus right diagram). In the experiments simple retrieval is performed with threshold $\Theta = k$. CB-retrieval is iterated in the y-layer (with fixed address $x$), starting with three randomly chosen 1-s from the simple retrieval result $\tilde{y}^\mu$. The iteration is stopped if a stable pattern at threshold $\Theta = bk$ is reached.

The memory capacity can be calculated per pattern component under the assumption that in the memory patterns each component is independent, i.e., the probabilities for a 1 are p = a/n and q = b/m respectively, and the probabilities of an add and a miss error are simply the renormalized rates, denoted by $\alpha', \beta'$ (initial) and $\hat{\alpha}', \hat{\beta}'$ (retrieved) for x-patterns and by $\gamma', \delta'$ for y-patterns. The information about the stored pattern contained in noisy initial or retrieved patterns is then given by the transinformation $t(p, \alpha', \beta') := i(p) - i(p, \alpha', \beta')$, where $i(p)$ is the Shannon information and $i(p, \alpha', \beta')$ the conditional information. The heteroassociative mapping is evaluated by the output capacity $A(\alpha', \beta') := M m\, t(q, \gamma', \delta')/mn$ (in units bit/synapse). It depends on the initial noise, since the performance drops with growing initial errors and assumes its maximum if no fault tolerance is provided, that is, with noiseless initial patterns, see Fig. 2. Autoassociative completion of a distorted x-pattern is evaluated by the completion capacity $C(\alpha', \beta') := M n\, (t(p, \hat{\alpha}', \hat{\beta}') - t(p, \alpha', \beta'))/mn$.
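The transinformation of a binary component can be computed as the mutual information of a binary channel. The sketch below assumes the standard channel form, where a stored 1 is missed with probability beta' and a spurious 1 is added with probability alpha'; the paper's exact normalization of the renormalized rates may differ, and the helper names are ours:

```python
from math import log2

def i_shannon(p):
    """Shannon information i(p) of a binary component with P(1) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def transinformation(p, add, miss):
    """t(p, alpha', beta') for X ~ Bernoulli(p) through a channel that
    drops a 1 with prob `miss` and inserts a 1 with prob `add`.
    Computed as H(Y) - H(Y|X), equal to i(p) - i(p, alpha', beta')."""
    p_out = p * (1 - miss) + (1 - p) * add                  # P(output = 1)
    cond = p * i_shannon(miss) + (1 - p) * i_shannon(add)   # H(Y|X)
    return i_shannon(p_out) - cond

t_clean = transinformation(0.005, 0.0, 0.0)   # noiseless: full i(p)
t_noisy = transinformation(0.005, 0.01, 0.1)  # degraded by add/miss errors
```

For error-free patterns the transinformation reduces to the Shannon information i(p), and any nonzero error rates strictly decrease it, which is why the output capacity A falls with growing initial noise.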
A BAM maps and completes at the same time and should therefore be evaluated by the search capacity S := C + A.

The asymptotic capacity of the Willshaw model is strikingly high: the completion capacity (for autoassociation) is $C^+ = \ln 2/4$, the mapping capacity (for heteroassociation with input noise) is $A^+ = \ln 2/2$ bit/syn [Pal91], leading to a value for the search capacity of $(3 \ln 2)/4 = 0.52$ bit/syn. To estimate S for general retrieval procedures one can consider a recognition process of stored patterns in the whole space of sparse initial patterns; an initial pattern is "recognized" if it is invariant under a bidirectional retrieval cycle. The so-called recognition capacity of this process is an upper bound of the completion capacity, and it has been determined as $\ln 2/2$, see [PS92]. This is achieved again with parameters M, p, q providing $A = \ln 2/2$, yielding $\ln 2$ bit/syn as an upper bound of the asymptotic search capacity. In summary, we know about the asymptotic search capacity of the CB-model: $0.52 \le S^+ \le 0.69$ bit/syn. For experimental results, see Fig. 4.

4 EXPERIMENTAL RESULTS

The CB model has been tested in simulations and compared with the Willshaw model (simple retrieval) for addresses with random noise (Fig. 2) and for addresses composed of two learning patterns (Fig. 3). In Fig. 2 the widely enlarged range of high-quality retrieval in the CB-model is demonstrated for different system sizes.

[Figures 2 and 3: panels of output miss errors, output add errors, and transinformation in the output pattern (bit) for simple and CB retrieval.]

Fig. 2: Retrieval from addresses with random noise. The x-axis labeling is as in Fig. 1. Small system with n = 100, M = 35 (left); system size as in Fig. 1, two trials (right). Output activities adjusted near $|y| = k$ by threshold setting.

Fig. 3: Retrieval from addresses composed of two learning patterns. Parameters as in the right column of Fig. 2; for the explanation of the left and right columns, see text.

In Fig. 3 the address contains one learning pattern, with 1-components of a second learning pattern successively added with increasing abscissae. At the right end of each diagram both patterns are completely superimposed. Diagrams in the left column show errors and transinformation if retrieval results are compared with the learning pattern which is dominantly addressed for $|\tilde{x}^\mu| < 20$. Simple retrieval errors behave similarly as for random noise in the address (Fig. 2), while the error level of CB-retrieval rises faster if more than 7 adds from the second pattern are present.
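The retrieval procedure iterated in these experiments is the CB update of eq. (5): the matrix is used twice before thresholding, so each active address unit contributes its backward dendritic sum instead of a bare 1. A minimal sketch (pure Python; names and the toy matrix are our own illustration):

```python
def cb_step_y(C, x_active, y_prev, theta):
    """One CB-retrieval step in the y-layer, eq. (5):
    y_j = H( sum_{i in x} C_ij [C^T y_prev]_i - theta ).
    `x_active` lists the indices of active address units."""
    n, m = len(C), len(C[0])
    # backward dendritic sums [C^T y_prev]_i for every address unit i
    back = [sum(C[i][j] for j in range(m) if y_prev[j]) for i in range(n)]
    y_new = []
    for j in range(m):
        s = sum(C[i][j] * back[i] for i in x_active)  # matrix used twice
        y_new.append(1 if s >= theta else 0)
    return y_new

# toy matrix storing one pair: x active on {0,1}, y active on {1,3}
C = [[0, 1, 0, 1, 0] if i in (0, 1) else [0] * 5 for i in range(6)]
y_prev = [0, 1, 0, 1, 0]
# threshold b*k as in the stopping criterion above (b = 2, k = 2 here)
y_out = cb_step_y(C, x_active=[0, 1], y_prev=y_prev, theta=4)
```

At the stable-pattern threshold b*k, every correct y-unit of the stored pair reaches the threshold exactly, while spurious units need coincident cross talk in both matrix passes.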
Diagrams in the right column show the same quantities if the retrieval result is compared with the closest of the two learning patterns. It can be observed (i) that a learning pattern is retrieved even if the address is a complete superposition, and (ii) that if the second pattern is almost complete in the address, the retrieved pattern corresponds in some cases to the second pattern. However, in all cases CB-retrieval yields one of the learning pattern pairs, and it could be used to generate a good address for further retrieval of the other by deletion of the corresponding 1-components in the original address.

[Figure 4: output and search capacity curves.] Fig. 4: Output and search capacity of CB retrieval in bit/syn, with x-axis labeling as in Fig. 2, for n = m = 2000, a = b = 10, M = 20000. The difference between both curves is the contribution due to x-pattern completion, the completion capacity C. It is zero for $|x(0)| = 10$, i.e., if the initial pattern is error-free.

The search capacity of the CB model in Fig. 4 is close to the theoretical expectations from Sect. 3, increasing with input noise due to the address completion.

5 SPARSE CODING

To apply the proposed NAM model, for instance, in information retrieval, a coding of the data to be accessed into sparse binary patterns is required. A useful extraction of sparse features should take account of statistical data properties and the way the user is acting on them. There is evidence from cognitive psychology that such a coding is typically quite easy to find. Feature encoding, where a person extracts feature sets to characterize complex situations by a few present features, is one of the three basic classes of cognitive processes defined by Sternberg [Ste77].
Similarities in the data are represented by feature patterns having a large number of present features in common, that is, a high overlap $o(x, x') := \sum_i x_i x'_i$. For text retrieval, word fragments used in existing indexing techniques can be directly taken as sparse binary features [Geb87]. For image processing, sparse coding strategies [Zet90] and neural models for sparse feature extraction by anti-Hebbian learning [Föl90] have been proposed. Sparse patterns extracted from different data channels in heterogeneous data can simply be concatenated and processed simultaneously in NAM. If parts of the original data are to be held in a conventional memory, these addresses also have to be represented by distributed and sparse patterns in order to exploit the high performance of the proposed NAM.

6 CONCLUSION

A new bidirectional retrieval method (CB-retrieval) has been presented for the Willshaw neural associative memory model. Our analysis of the first CB-retrieval step indicates a high potential for error reduction and increased input fault tolerance. The asymptotic capacity for bidirectional retrieval in the binary Willshaw matrix has been determined to lie between 0.52 and 0.69 bit/syn. In experiments CB-retrieval showed significantly increased input fault tolerance with respect to the standard model, leading to a practical information capacity of the order of the theoretical expectations (0.5 bit/syn). The segmentation ability of CB-retrieval with ambiguous addresses has also been shown: even at high memory load such input patterns can be decomposed and the corresponding memory entries returned individually. The model improvement does not require sophisticated individual threshold setting [GW95], strategies proposed for BAM like more complex learning procedures, or "dummy augmentation" in the pattern coding [WCM90, LCL95].
\n\nThe demonstrated performance of the CB-model encourages applications as mas(cid:173)\nsively parallel search strategies in Information Retrieval. The sparse coding re(cid:173)\nquirement has been briefly discussed regarding technical strategies and psycholog(cid:173)\nical plausibility. Biologically plausible variants of CB-retrieval contribute to more \n\n\fBidirectional Retrieval from Associative Memory \n\n681 \n\nrefined cell assembly theories, see [SWP98]. \n\nAcknowledgement: One of the authors (F.T.S.) was supported by grant S0352/3-1 \nof the Deutsche Forschungsgemeinschaft. \n\nReferences \n\n[F6190] \n\n[Geb87] \n\n[GM76] \n\n[GR92] \n\n[GW95] \n\n[Hop82] \n\n[Kos87] \n\n[LCL95] \n\n[Pal91] \n\n[PS92] \n\n[SP97] \n\n[SSP96] \n\n[Ste61] \n\n[Ste77] \n\nP. F6ldiak. Forming sparse representations by local anti-hebbian learning. \nBiol. Cybern., 64:165-170, 1990. \n\nF. Gebhardt. Text signatures by superimposed coding of letter triplets and \nquadruplets. Information Systems, 12(2):151-156, 1987. \n\nA.R. Gardner-Medwin. The recall of events through the learning of associ(cid:173)\nations between their parts. Proceedings of the Royal Society of London B, \n194:375-402, 1976. \n\nW.G. Gibson and J. Robinson. Statistical analysis of the dynamics of a sparse \nassociative memory. Neural Networks, 5:645-662, 1992. \n\nB. Graham and D. Willshaw. Improving recall from an associative memory. \nBiological Cybernetics, 72:337-346, 1995. \n\nJ.J. Hopfield. Neural networks and physical systems with emergent collective \ncomputational abilities. Proceedings of the National Academy of Sciences, \nUSA, 79, 1982. \n\nB. Kosko. Adaptive bidirectional associative memories. Applied Optics, \n26(23):4947-4971, 1987. \n\nC.-S. Leung, L.-W. Chan, and E. Lai. Stability, capacity and statistical dy(cid:173)\nnamics of second-order bidirectional associative memory. IEEE Trans. Syst, \nMan Cybern., 25(10):1414-1424, 1995. \n\nG. Palm. 
Memory Capacities of Local Rules for Synaptic Modification. Con(cid:173)\ncepts in Neuroscience, 2:97-128, 1991. \n\nG. Palm and F. T. Sommer. Information capacity in recurrent McCulloch-Pitts \nnetworks with sparsely coded memory states. Network, 3:1-10, 1992. \n\nF. T. Sommer and G. Palm. Improved bidirectional retrieval of sparse patterns \nstored by Hebbian learning. Submitted to Neural Networks, 1997. \n\nF. Schwenker, F. T. Sommer, and G. Palm. Iterative retrieval of sparsely coded \nassociative memory patterns. Neural Networks, 9(3) :445 - 455, 1996. \n\nK. Steinbuch. Die Lernmatrix. Kybernetik, 1:36-45, 1961. \n\nR. J. Sternberg. Intelligence, information processing and analogical reasoning. \nHillsdale, NJ, 1977. \n\n[SWP98] \n\nF. T. Sommer, T. Wennekers, and G. Palm. Bidirectional completion of Cell \nAssemblies in the cortex. In Computational Neuroscience: Trends in Research. \nPlenum Press, 1998. \n\n[WBLH69] D. J. Willshaw, O. P. Buneman, and H. C. Longuet-Higgins. Nonholographic \n\nassociative memory. Nature, 222:960-962, 1969. \n\n[WCM90] Y. F. Wang, J. B. Cruz, and J. H. Mulligan. Two coding stragegies for bidirec(cid:173)\ntional associative memory. IEEE Trans. Neural Networks, 1(1):81-92, 1990. \n\n[Zet90] \n\nC. Zetsche. Sparse coding: the link between low level vision and associative \nmemory. In R. Eckmiller, G. Hartmann, and G. Hauske, editors, Parallel \nProcessing in Neural Systems a.nd Computers. Elsevier Science Publishers B. \nV. (North Holland), 1990. \n\n\f", "award": [], "sourceid": 1377, "authors": [{"given_name": "Friedrich", "family_name": "Sommer", "institution": null}, {"given_name": "G\u00fcnther", "family_name": "Palm", "institution": null}]}