{"title": "Scaling Properties of Coarse-Coded Symbol Memories", "book": "Neural Information Processing Systems", "page_first": 652, "page_last": 661, "abstract": null, "full_text": "652 \n\nScaling Properties of Coarse-Coded  Symbol Memories \n\nRonald  Rosenfeld \nDavid S.  Touretzky \n\nComputer Science Department \n\nCarnegie Mellon  University \n\nPittsburgh, Pennsylvania 15213 \n\nAbstract:  Coarse-coded symbol memories have appeared in several neural network \nsymbol  processing  models.  In  order  to  determine  how  these  models  would  scale,  one \nmust  first  have  some  understanding  of the  mathematics  of coarse-coded  representa(cid:173)\ntions.  We  define  the  general  structure  of coarse-coded  symbol  memories  and  derive \nmathematical relationships among  their essential parameters:  memory 8ize,  8ymbol-8et \nsize and capacity.  The computed capacity of one of the schemes agrees well with actual \nmeasurements oC  tbe coarse-coded working memory of DCPS, Touretzky and Hinton's \ndistributed connectionist production system. \n\n1 \n\nIntroduction \n\nA  di8tributed repre8entation is a memory scheme in which each entity (concept, symbol) \nis  represented  by a  pattern  of  activity  over many  units  [3].  If each  unit  participates \nin  the representation of many entities, it is said  to be  coar8ely  tuned,  and the memory \nitself is called  a  coar8e-coded memory. \n\nCoarse-coded memories have been used for storing symbols in several neural network \nsymbol  processing  models,  such  as  Touretzky  and  Hinton's  distributed  connectionist \nproduction  system  DCPS  [8,9],  Touretzky's distributed  implementation  of linked  list \nstructures on a  Boltzmann  machine,  BoltzCONS  [10],  and  St.  John  and McClelland's \nPDP model of case role defaults [6].  In all of these models,  memory capacity was mea(cid:173)\nsured empirically and parameters were adjusted by trial and error to obtain the desired \nbehavior.  We are now able to  give a  mathematical foundation to these experiments by \nanalyzing  the relationships among the fundamental memory parameters. \n\nThere  are several  paradigms for  coarse-coded  memories.  In  a  feature-based  repre-\n8entation,  each  unit stands for  some semantic feature.  Binary  units can code features \nwith  binary values, whereas more complicated units or groups of units are required  to \ncode  more  complicated  features,  such  as  multi-valued  properties  or  numerical  values \nfrom  a  continuous  scale.  The  units  that form  the  representation  of a  concept  define \nan  intersection  of features  that constitutes  that concept.  Similarity between concepts \ncomposed  of binary Ceatures can  be measured  by  the Hamming  distance between  their \nrepresentations.  In a  neural  network  implementation,  relationships  between  concepts \nare implemented via connections among the units forming their representations.  Certain \ntypes of generalization phenomena thereby emerge automatically. \n\nA different paradigm is used when representing points in a multidimensional contin(cid:173)\n\nuous space  [2,3].  Each  unit encodes values  in  some subset  of the space.  Typically  the \n\n@ American Institute of Physics 1988 \n\n\f653 \n\nsubsets are  hypercubes  or  hyperspheres,  but they  may  be  more  coarsely  tuned  along \nsome dimensions than others [1].  The point to be represented is in the subspace formed \nby the intersection of all active units.  
Yet another paradigm for coarse-coded memories, and the one we will deal with exclusively, does not involve features. Each concept, or symbol, is represented by an arbitrary subset of the units, called its pattern. Unlike in feature-based representations, the units in the pattern bear no relationship to the meaning of the symbol represented. A symbol is stored in memory by turning on all the units in its pattern. A symbol is deemed present if all the units in its pattern are active.¹ The receptive field of each unit is defined as the set of all symbols in whose pattern it participates. We call such memories coarse-coded symbol memories (CCSMs). We use the term "symbol" instead of "concept" to emphasize that the internal structure of the entity to be represented is not involved in its representation. In CCSMs, a short Hamming distance between two symbols does not imply semantic similarity, and is in general an undesirable phenomenon.

¹ This criterion can be generalized by introducing a visibility threshold: a fraction of the pattern that should be on in order for a symbol to be considered present. Our analysis deals only with a visibility criterion of 100%, but can be generalized to accommodate noise.

The efficiency with which CCSMs handle sparse memories is the major reason they have been used in many connectionist systems, and hence the major reason for studying them here. The unit-sharing strategy that gives rise to efficient encoding in CCSMs is also the source of their major weakness. Symbols share units with other symbols. As more symbols are stored, more and more of the units are turned on. At some point, some symbol may be deemed present in memory because all of its units are turned on, even though it was not explicitly stored: a "ghost" is born. Ghosts are an unwanted phenomenon arising out of the overlap among the representations of the various symbols. The emergence of ghosts marks the limits of the system's capacity: the number of symbols it can store simultaneously and reliably.

2 Definitions and Fundamental Parameters

A coarse-coded symbol memory in its most general form consists of:

• A set of N binary state units.

• An alphabet of α symbols to be represented. Symbols in this context are atomic entities: they have no constituent structure.

• A memory scheme, which is a function that maps each symbol to a subset of the units, called its pattern. The receptive field of a unit is defined as the set of all symbols to whose pattern it belongs (see Figure 1). The exact nature of the memory scheme mapping determines the properties of the memory, and is the central target of our investigation.

Figure 1: A memory scheme (N = 6, α = 8) defined in terms of units u_i and symbols s_j. The columns are the symbols' patterns. The rows are the units' receptive fields.
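A minimal Python sketch of these definitions follows. It is purely illustrative; the toy patterns are arbitrary and are not the scheme of Figure 1. Symbols are stored by activating their patterns, presence is tested with the 100% visibility criterion, and ghosts are symbols that test as present although they were never stored.

```python
# Minimal coarse-coded symbol memory (CCSM): a memory scheme maps each symbol
# to its pattern (a subset of the N units); a symbol is present iff every unit
# of its pattern is active (100% visibility criterion).

class CCSM:
    def __init__(self, n_units, memory_scheme):
        self.n_units = n_units
        self.scheme = {sym: frozenset(pat) for sym, pat in memory_scheme.items()}
        self.active = set()      # currently active units
        self.stored = set()      # symbols explicitly stored (the memory's history)

    def store(self, symbol):
        self.active |= self.scheme[symbol]
        self.stored.add(symbol)

    def present(self, symbol):
        return self.scheme[symbol] <= self.active

    def ghosts(self):
        # symbols deemed present although they were never stored
        return {s for s in self.scheme if self.present(s) and s not in self.stored}

# Toy scheme: N = 6 units, an alphabet of 4 symbols with arbitrary patterns.
scheme = {"s1": {0, 1, 2}, "s2": {1, 3, 4}, "s3": {0, 3, 5}, "s4": {2, 4, 5}}
mem = CCSM(6, scheme)
for s in ("s1", "s2", "s3"):
    mem.store(s)
print(mem.present("s4"), mem.ghosts())   # s4 is a ghost: units 2, 4 and 5 are all on
```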
As symbols are stored, the memory fills up and ghosts eventually appear. It is not possible to detect a ghost simply by inspecting the contents of memory, since there is no general way of distinguishing a symbol that was stored from one that emerged out of overlaps with other symbols. (It is sometimes possible, however, to conclude that there are no ghosts.) Furthermore, a symbol that emerged as a ghost at one time may not be a ghost at a later time if it was subsequently stored into memory. Thus the definition of a ghost depends not only on the state of the memory but also on its history.

Some memory schemes guarantee that no ghost will emerge as long as the number of symbols stored does not exceed some specified limit. In other schemes, the emergence of ghosts is an ever-present possibility, but its probability can be kept arbitrarily low by adjusting other parameters. We analyze systems of both types. First, two more bits of notation need to be introduced:

P_ghost: Probability of a ghost. The probability that at least one ghost will appear after some number of symbols have been stored.

k: Capacity. The maximum number of symbols that can be stored simultaneously before the probability of a ghost exceeds a specified threshold. If the threshold is 0, we say that the capacity is guaranteed.

A localist representation, where every symbol is represented by a single unit and every unit is dedicated to the representation of a single symbol, can now be viewed as a special case of coarse-coded memory, where k = N = α and P_ghost = 0. Localist representations are well suited for memories that are not sparse. In these cases, coarse-coded memories are at a disadvantage. In designing coarse-coded symbol memories we are interested in cases where k ≪ N ≪ α. The permissible probability for a ghost in these systems should be low enough so that its impact can be ignored.

3 Analysis of Four Memory Schemes

3.1 Bounded Overlap (guaranteed capacity)

If we want to construct the memory scheme with the largest possible α (given N and k) while guaranteeing P_ghost = 0, the problem can be stated formally as:

Given a set of size N, find the largest collection of subsets of it such that no union of k such subsets subsumes any other subset in the collection.

This is a well-known problem in Coding Theory, in slight disguise. Unfortunately, no complete analytical solution is known. We therefore simplify our task and consider only systems in which all symbols are represented by the same number of units (i.e., all patterns are of the same size). In mathematical terms, we restrict ourselves to constant weight codes. The problem then becomes:

Given a set of size N, find the largest collection of subsets of size exactly L such that no union of k such subsets subsumes any other subset in the collection.

There are no known complete analytical solutions for the size of the largest collection of patterns even when the patterns are of a fixed size. Nor is any efficient procedure for constructing such a collection known. We therefore simplify the problem further.
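For very small schemes, the subsumption condition in the problem statement above can simply be checked by brute force. The sketch below is illustrative only; it is exponential in the number of symbols and usable only for toy parameter values.

```python
# Brute-force check of guaranteed capacity: a scheme guarantees capacity k
# if no union of k stored patterns covers the pattern of any other symbol.
from itertools import combinations

def guarantees_capacity(patterns, k):
    """patterns: dict symbol -> set of units. Exponential check; tiny inputs only."""
    syms = list(patterns)
    for stored in combinations(syms, k):
        union = set().union(*(patterns[s] for s in stored))
        for other in syms:
            if other not in stored and patterns[other] <= union:
                return False            # 'other' would be a ghost
    return True

def guaranteed_capacity(patterns):
    """Largest k for which the guarantee holds."""
    k = 0
    while k + 1 < len(patterns) and guarantees_capacity(patterns, k + 1):
        k += 1
    return k

patterns = {"s1": {0, 1, 2}, "s2": {1, 3, 4}, "s3": {0, 3, 5}, "s4": {2, 4, 5}}
print(guaranteed_capacity(patterns))   # 2 for this toy scheme
```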
We now restrict our consideration to patterns whose pairwise overlap is bounded by a given number. For a given pattern size L and desired capacity k, we require that no two patterns overlap in more than m units, where:

$$m = \left\lfloor \frac{L-1}{k} \right\rfloor \tag{1}$$

Memory schemes that obey this constraint are guaranteed a capacity of at least k symbols, since any k symbols taken together can overlap at most L - 1 units in the pattern of any other symbol, one unit short of making it a ghost. Based on this constraint, our mathematical problem now becomes:

Given a set of size N, find the largest collection of subsets of size exactly L such that the intersection of any two such subsets is of size ≤ m (where m is given by Equation 1).

Coding theory has yet to produce a complete solution to this problem, but several methods of deriving upper bounds have been proposed (see for example [4]). The simple formula we use here is a variant of the Johnson bound. Let α_bo denote the maximum number of symbols attainable in memory schemes that use bounded overlap. Then

$$\alpha_{bo} \;\le\; \binom{N}{m+1} \Bigg/ \binom{L}{m+1} \tag{2}$$

The Johnson bound is known to be an exact solution asymptotically (that is, when N, L, m → ∞ and their ratios remain finite).

Since we are free to choose the pattern size, we optimize our memory scheme by maximizing the above expression over all possible values of L. For the parameter subspace we are interested in here (N < 1000, k < 50) we use numerical approximation to obtain:

$$\alpha_{bo} \;\le\; \max_{L \in [1,N]} \left( \frac{N}{L-m} \right)^{m+1} \;<\; e^{\,0.367\,\frac{N}{k}} \tag{3}$$

(Recall that m is a function of L and k.) Thus the upper bound we derived depicts a simple exponential relationship between α and N/k. Next, we try to construct memory schemes of this type. A Common Lisp program using a modified depth-first search constructed memory schemes for various parameter values, whose α's came within 80% to 90% of the upper bound. These results are far from conclusive, however, since only a small portion of the parameter space was tested.

In evaluating the viability of this approach, its apparent optimality should be contrasted with two major weaknesses. First, this type of memory scheme is hard to construct computationally. It took our program several minutes of CPU time on a Symbolics 3600 to produce reasonable solutions for cases like N = 200, k = 5, m = 1, with an exponential increase in computing time for larger values of m. Second, if CCSMs are used as models of memory in naturally evolving systems (such as the brain), this approach places too great a burden on developmental mechanisms.

The importance of the bounded overlap approach lies mainly in its role as an upper bound for all possible memory schemes, subject to the simplifications made earlier. All schemes with guaranteed capacities can be measured relative to Equation 3.
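Equations 1 and 2 are straightforward to evaluate numerically. The sketch below is illustrative only; it computes the bound, maximized over the pattern size L, and does not attempt to construct an actual code. The sample values of N and k are arbitrary.

```python
# Upper bound on alpha under the bounded-overlap constraint (Equations 1 and 2):
# for each pattern size L, take m = floor((L - 1) / k) and bound the number of
# symbols by C(N, m + 1) / C(L, m + 1); then maximize over L (cf. Equation 3).
from math import comb

def alpha_bo_bound(N, k):
    best, best_L = 0.0, None
    for L in range(1, N + 1):
        m = (L - 1) // k
        bound = comb(N, m + 1) / comb(L, m + 1)
        if bound > best:
            best, best_L = bound, L
    return best, best_L

for N in (100, 200, 400):
    k = 5
    bound, L = alpha_bo_bound(N, k)
    print(N, k, L, round(bound))   # the bound grows roughly exponentially in N/k
```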
3.2 Random Fixed-Size Patterns (a stochastic approach)

Randomly produced memory schemes are easy to implement and are attractive because of their naturalness. However, if the patterns of two symbols coincide, the guaranteed capacity will be zero (storing one of these symbols will render the other a ghost). We therefore abandon the goal of guaranteeing a certain capacity, and instead establish a tolerance level for ghosts, P_ghost. For large enough memories, where stochastic behavior is more robust, we may expect reasonable capacity even with very small P_ghost.

In the first stochastic approach we analyze, patterns are randomly selected subsets of a fixed size L. Unlike in the previous approach, choosing k does not bound α. We may define as many symbols as we wish, although at the cost of increased probability of a ghost (or, alternatively, decreased capacity). The probability of a ghost appearing after k symbols have been stored is given by Equation 4:

$$P_{ghost} \;=\; 1 - \sum_{c=L}^{N} T_{N,L}(k, c) \left[ 1 - \binom{c}{L} \Big/ \binom{N}{L} \right]^{\alpha - k} \tag{4}$$

T_{N,L}(k, c) is the probability that exactly c units will be active after k symbols have been stored. It is defined recursively by Equation 5:

$$
\begin{aligned}
T_{N,L}(0, 0) &= 1 \\
T_{N,L}(k, c) &= 0 \quad \text{for either } k = 0 \text{ and } c \ne 0 \text{, or } k > 0 \text{ and } c < L \\
T_{N,L}(k, c) &= \sum_{a=0}^{L} T_{N,L}(k-1,\, c-a) \cdot \binom{N - (c-a)}{a} \binom{c-a}{L-a} \Big/ \binom{N}{L}
\end{aligned}
\tag{5}
$$

We have constructed various coarse-coded memories with random fixed-size receptive fields and measured their capacities. The experimental results show good agreement with the above equation.

The optimal pattern size for fixed values of N, k, and α can be determined by binary search on Equation 4, since P_ghost(L) has exactly one minimum in the interval [1, N]. However, this may be expensive for large N. A computational shortcut can be achieved by estimating the optimal L and searching in a small interval around it. A good initial estimate is derived by replacing the summation in Equation 4 with a single term involving E[c], the expected value of the number of active units after k symbols have been stored. The latter can be expressed as:

$$E[c] \;=\; N \left[ 1 - \left( 1 - \frac{L}{N} \right)^{k} \right]$$

The estimated L is the one that maximizes the following expression:

$$\left[ 1 - \binom{E[c]}{L} \Big/ \binom{N}{L} \right]^{\alpha - k}$$

An alternative formula, developed by Joseph Tebelskis, produces very good approximations to Equation 4 and is much more efficient to compute. After storing k symbols in memory, the probability P_x that a single arbitrary symbol x has become a ghost is given by:

$$P_x(N, L, k) \;=\; \sum_{i=0}^{L} (-1)^i \binom{L}{i} \left[ \binom{N-i}{L} \Big/ \binom{N}{L} \right]^{k} \tag{6}$$

If we now assume that each symbol's P_x is independent of that of any other symbol, we obtain:

$$P_{ghost} \;=\; 1 - (1 - P_x)^{\alpha - k} \tag{7}$$

This assumption of independence is not strictly true, but the relative error was less than 0.1% for the parameter ranges we considered, when P_ghost was no greater than 0.01.

We have constructed the two-dimensional table T_{N,L}(k, c) for a wide range of (N, L) values (70 ≤ N ≤ 1000, 7 ≤ L ≤ 43), and produced graphs of the relationships between N, k, α, and P_ghost for optimum pattern sizes, as determined by Equation 4. The results show an approximately exponential relationship between α and N/k [5]. Thus, for a fixed number of symbols, the capacity is proportional to the number of units. Let α_rfp denote the maximum number of symbols attainable in memory schemes that use random fixed-size patterns. Some typical relationships, derived from the data, are:

$$
\begin{aligned}
\alpha_{rfp}(P_{ghost} = 0.01) &\;\approx\; 0.0086 \cdot e^{\,0.468\,\frac{N}{k}} \\
\alpha_{rfp}(P_{ghost} = 0.001) &\;\approx\; 0.0008 \cdot e^{\,0.473\,\frac{N}{k}}
\end{aligned}
\tag{8}
$$
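The figures in Equation 8 can be approximated with the Tebelskis formula. The sketch below implements Equations 6 and 7; it is illustrative only, and the demo parameters N = 200, k = 10, α = 1000 are arbitrary.

```python
# Tebelskis approximation for random fixed-size patterns (Equations 6 and 7):
# P_x is the probability that one fixed symbol has become a ghost after k
# symbols are stored; assuming independence across the alpha - k unstored
# symbols gives P_ghost.
from math import comb

def p_x(N, L, k):
    total = comb(N, L)
    return sum((-1) ** i * comb(L, i) * (comb(N - i, L) / total) ** k
               for i in range(L + 1))

def p_ghost(N, L, k, alpha):
    return 1.0 - (1.0 - p_x(N, L, k)) ** (alpha - k)

# Arbitrary demo values: sweep the pattern size L to locate the best choice.
N, k, alpha = 200, 10, 1000
best = min(range(2, N), key=lambda L: p_ghost(N, L, k, alpha))
print(best, p_ghost(N, best, k, alpha))
```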
3.3 Random Receptors (a stochastic approach)

A second stochastic approach is to have each unit assigned to each symbol with an independent fixed probability s. This method lends itself to easy mathematical analysis, resulting in a closed-form analytical solution.

After storing k symbols, the probability that a given unit is active is 1 - (1 - s)^k (independent of any other unit). For a given symbol to be a ghost, every unit must either be active or else not belong to that symbol's pattern. That will happen with a probability [1 - s(1 - s)^k]^N, and thus the probability of a ghost is:

$$P_{ghost}(\alpha, N, k, s) \;=\; 1 - \left\{ 1 - \left[ 1 - s\,(1-s)^k \right]^{N} \right\}^{\alpha - k} \tag{9}$$

Assuming P_ghost ≪ 1 and k ≪ α (both hold in our case), the expression can be simplified to:

$$P_{ghost}(\alpha, N, k, s) \;\approx\; \alpha \cdot \left[ 1 - s\,(1-s)^k \right]^{N}$$

from which α can be extracted:

$$\alpha_{rr}(N, k, s, P_{ghost}) \;\approx\; P_{ghost} \cdot \left[ 1 - s\,(1-s)^k \right]^{-N} \tag{10}$$

We can now optimize by finding the value of s that maximizes α, given any desired upper bound on the expected value of P_ghost. This is done straightforwardly by solving ∂α/∂s = 0. Note that s·N corresponds to L in the previous approach. The solution is s = 1/(k + 1), which yields, after some algebraic manipulation:

$$\alpha_{rr} \;=\; P_{ghost} \cdot e^{\,N \log\!\left[ \frac{(k+1)^{k+1}}{(k+1)^{k+1} - k^{k}} \right]} \tag{11}$$

A comparison of the results using the two stochastic approaches reveals an interesting similarity. For large k, with P_ghost = 0.01 the term 0.468/k of Equation 8 can be seen as a numerical approximation to the log term in Equation 11, and the multiplicative factor of 0.0086 in Equation 8 approximates P_ghost in Equation 11. This is hardly surprising, since the Law of Large Numbers implies that in the limit (N, k → ∞, with s fixed) the two methods are equivalent.

Finally, it should be noted that the stochastic approaches we analyzed generate a family of memory schemes, with non-identical ghost probabilities. P_ghost in our formulas is therefore better understood as an expected value, averaged over the entire family.

3.4 Partitioned Binary Coding (a reference point)

The last memory scheme we analyze is not strictly distributed. Rather, it is somewhere in between a distributed and a localist representation, and is presented for comparison with the previous results. For a given number of units N and desired capacity k, the units are partitioned into k equal-size "slots," each consisting of N/k units (for simplicity we assume that k divides N). Each slot is capable of storing exactly one symbol.

The most efficient representation for all possible symbols that may be stored into a slot is to assign them binary codes, using the N/k units of each slot as bits. This would allow 2^{N/k} symbols to be represented. Using binary coding, however, will not give us the required capacity of 1 symbol, since binary patterns subsume one another. For example, storing the code '10110' into one of the slots will cause the codes '10010', '10100' and '00010' (as well as several other codes) to become ghosts.

A possible solution is to use only half of the bits in each slot for a binary code, and set the other half to the binary complement of that code (we assume that N/k is even). This way, the codes are guaranteed not to subsume one another. Let α_pbc denote the number of symbols representable using a partitioned binary coding scheme. Then,

$$\alpha_{pbc} \;=\; 2^{\frac{N}{2k}} \;=\; e^{\,0.347\,\frac{N}{k}} \tag{12}$$
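The complement trick is easy to verify in code. The sketch below is illustrative only (the slot width of 8 units, i.e. 4 code bits, is an arbitrary choice); it builds the within-slot patterns and confirms that no pattern subsumes another.

```python
# Partitioned binary coding sketch: N units split into k slots of N/k units.
# Half of each slot holds a binary code, the other half its bitwise complement,
# so no code's set of active units is a subset of another's.

def slot_pattern(index, bits):
    """Within-slot pattern for symbol number `index`: code bits, then their complement."""
    code = [(index >> b) & 1 for b in range(bits)]
    ones = {b for b, v in enumerate(code) if v}
    zeros = {bits + b for b, v in enumerate(code) if not v}
    return ones | zeros

bits = 4                      # N/(2k) bits per slot -> 2**bits symbols per slot
pats = [slot_pattern(i, bits) for i in range(2 ** bits)]
subsumed = any(p <= q for i, p in enumerate(pats)
               for j, q in enumerate(pats) if i != j)
print(len(pats), subsumed)    # 16 representable symbols per slot, none subsumed
```

With N/k = 8 units per slot this gives 2^4 = 16 representable symbols per slot, matching Equation 12.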
Once again, α is exponential in N/k. The form of the result closely resembles the estimated upper bound on the Bounded Overlap method given in Equation 3. There is also a strong resemblance to Equations 8 and 11, except that the fractional multiplier in front of the exponential, corresponding to P_ghost, is missing. P_ghost is 0 for the Partitioned Binary Coding method, but this is enforced by dividing the memory into disjoint sets of units rather than adjusting the patterns to reduce overlap among symbols.

As mentioned previously, this memory scheme is not really distributed in the sense used in this paper, since there is no one pattern associated with a symbol. Instead, a symbol is represented by any one of a set of k patterns, each N/k bits long, corresponding to its appearance in one of the k slots. To check whether a symbol is present, all k slots must be examined. To store a new symbol in memory, one must scan the k slots until an empty one is found. Equation 12 should therefore be used only as a point of reference.

4 Measurement of DCPS

The three distributed schemes we have studied all use unstructured patterns, the only constraint being that patterns are at least roughly the same size. Imposing more complex structure on any of these schemes is likely to reduce the capacity somewhat. In order to quantify this effect, we measured the memory capacity of DCPS (BoltzCONS uses the same memory scheme) and compared the results with the theoretical models analyzed above.

DCPS's memory scheme is a modified version of the Random Receptors method [5]. The symbol space is the set of all triples over a 25 letter alphabet. Units have fixed-size receptive fields organized as 6 x 6 x 6 subspaces. Patterns are manipulated to minimize the variance in pattern size across symbols. The parameters for DCPS are: N = 2000, α = 25³ = 15625, and the mean pattern size is (6/25)³ × 2000 = 27.65 with a standard deviation of 1.5. When P_ghost = 0.01 the measured capacity was k = 48 symbols. By substituting for N in Equation 11 we find that the highest k value for which α_rr ≥ 15625 is 51. There does not appear to be a significant cost for maintaining structure in the receptive fields.
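This comparison is easy to reproduce from Equation 11. The sketch below is illustrative only; it evaluates the formula with s = 1/(k+1) and ignores the structure imposed on DCPS's receptive fields.

```python
# Random-receptors capacity (Equation 11) evaluated at the DCPS parameters:
# N = 2000 units, alphabet 25**3 = 15625, P_ghost = 0.01. We look for the
# largest k whose predicted alpha_rr still covers the whole alphabet.
from math import exp, log

def alpha_rr(N, k, p_ghost):
    # s = 1/(k+1) is the optimal assignment probability.
    return p_ghost * exp(N * log((k + 1) ** (k + 1) / ((k + 1) ** (k + 1) - k ** k)))

N, alphabet, p_ghost = 2000, 25 ** 3, 0.01
k = 1
while alpha_rr(N, k + 1, p_ghost) >= alphabet:
    k += 1
print(k, alpha_rr(N, k, p_ghost))   # the paper's threshold: k = 51 (measured capacity 48)
```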
5 Summary and Discussion

Table 1 summarizes the results obtained for the four methods analyzed.

Table 1: Summary of results for the various memory schemes.

  Bounded Overlap              α_bo(N, k) < e^{0.367 N/k}
  Random Fixed-Size Patterns   α_rfp(P_ghost = 0.01)  ≈ 0.0086 · e^{0.468 N/k}
                               α_rfp(P_ghost = 0.001) ≈ 0.0008 · e^{0.473 N/k}
  Random Receptors             α_rr = P_ghost · e^{N log[(k+1)^{k+1} / ((k+1)^{k+1} - k^k)]}
  Partitioned Binary Coding    α_pbc = 2^{N/2k} = e^{0.347 N/k}

Some differences must be emphasized:

• α_bo and α_pbc deal with guaranteed capacity, whereas α_rfp and α_rr are meaningful only for P_ghost > 0.

• α_bo is only an upper bound.

• α_rfp is based on numerical estimates.

• α_pbc is based on a scheme which is not strictly coarse-coded.

The similar functional form of all the results, although not surprising, is aesthetically pleasing. Some of the functional dependencies among the various parameters can be derived informally using qualitative arguments. Only a rigorous analysis, however, can provide the definite answers that are needed for a better understanding of these systems and their scaling properties.

Acknowledgments

We thank Geoffrey Hinton, Noga Alon and Victor Wei for helpful comments, and Joseph Tebelskis for sharing with us his formula for approximating P_ghost in the case of fixed pattern sizes.

This work was supported by National Science Foundation grants IST-8516330 and EET-8716324, and by the Office of Naval Research under contract number N00014-86-K-0678. The first author was supported by a National Science Foundation graduate fellowship.

References

[1] Ballard, D. H. (1986) Cortical connections and parallel processing: structure and function. Behavioral and Brain Sciences 9(1).

[2] Feldman, J. A., and Ballard, D. H. (1982) Connectionist models and their properties. Cognitive Science 6, pp. 205-254.

[3] Hinton, G. E., McClelland, J. L., and Rumelhart, D. E. (1986) Distributed representations. In D. E. Rumelhart and J. L. McClelland (eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, volume 1. Cambridge, MA: MIT Press.

[4] MacWilliams, F. J., and Sloane, N. J. A. (1978) The Theory of Error-Correcting Codes. North-Holland.

[5] Rosenfeld, R., and Touretzky, D. S. (1987) Four capacity models for coarse-coded symbol memories. Technical report CMU-CS-87-182, Carnegie Mellon University Computer Science Department, Pittsburgh, PA.

[6] St. John, M. F., and McClelland, J. L. (1986) Reconstructive memory for sentences: a PDP approach. Proceedings of the Ohio University Inference Conference.

[7] Sullins, J. (1985) Value cell encoding strategies. Technical report TR-165, Computer Science Department, University of Rochester, Rochester, NY.

[8] Touretzky, D. S., and Hinton, G. E. (1985) Symbols among the neurons: details of a connectionist inference architecture. Proceedings of IJCAI-85, Los Angeles, CA, pp. 238-243.

[9] Touretzky, D. S., and Hinton, G. E. (1986) A distributed connectionist production system. Technical report CMU-CS-86-172, Computer Science Department, Carnegie Mellon University, Pittsburgh, PA.

[10] Touretzky, D. S. (1986) BoltzCONS: reconciling connectionism with the recursive nature of stacks and trees. Proceedings of the Eighth Annual Conference of the Cognitive Science Society, Amherst, MA, pp. 522-530.