{"title": "An Attractor Neural Network Model of Recall and Recognition", "book": "Advances in Neural Information Processing Systems", "page_first": 642, "page_last": 648, "abstract": null, "full_text": "An Attractor Neural Network Model of Recall and Recognition\n\nEytan Ruppin\nDepartment of Computer Science\nSchool of Mathematical Sciences\nSackler Faculty of Exact Sciences\nTel Aviv University\n69978, Tel Aviv, Israel\n\nYechezkel Yeshurun\nDepartment of Computer Science\nSchool of Mathematical Sciences\nSackler Faculty of Exact Sciences\nTel Aviv University\n69978, Tel Aviv, Israel\n\nAbstract\n\nThis work presents an Attractor Neural Network (ANN) model of Recall and Recognition. It is shown that an ANN model can qualitatively account for a wide range of experimental psychological data pertaining to these two main aspects of memory access. Certain psychological phenomena are accounted for, including the effects of list length, word frequency, presentation time, context shift, and aging. Thereafter, the probabilities of successful Recall and Recognition are estimated, in order to possibly enable further quantitative examination of the model.\n\n1 Motivation\n\nThe goal of this paper is to demonstrate that a Hopfield-based [Hop82] ANN model can qualitatively account for a wide range of experimental psychological data pertaining to the two main aspects of memory access, Recall and Recognition. Recall is defined as the ability to retrieve an item from a list of items (words) originally presented during a previous learning phase, either given an appropriate cue (cued Recall) or spontaneously (free Recall). Recognition is defined as the ability to correctly acknowledge whether or not a certain item has appeared in the tutorial list learned before. 
\n\nThe main prospect of ANN modeling is that some parameter values which, in former 'classical' models of memory retrieval (see e.g. [GS84]), had to be explicitly assigned can now be shown to be emergent properties of the model.\n\n2 The Model\n\nThe model consists of a Hopfield ANN, in which distributed patterns representing the learned items are stored during the learning phase and are later presented as inputs during the test phase. Within this framework, successful Recall and Recognition are defined. Some additional components are added to the basic Hopfield model to enable the modeling of the relevant psychological phenomena.\n\n2.1 The Hopfield Model\n\nThe Hopfield model's dynamics consist of a non-linear, iterative, asynchronous transformation of the network state [Hop82]. The process may include stochastic noise, which is analogous to the 'temperature' $T$ in statistical mechanics. Formally, the Hopfield model is described as follows. Let neuron $i$'s state be a binary variable $S_i$, taking the values $\\pm 1$ denoting a firing or a resting state, respectively. Let the network's state be a vector $S$ specifying the binary values of all its neurons. Let $J_{ij}$ be the synaptic strength between neurons $i$ and $j$. Then $h_i$, the input 'field' of neuron $i$, is given by $h_i = \\sum_{j \\neq i} J_{ij} S_j$. The neuron's dynamic behavior is described by\n\n$$S_i(t+1) = \\begin{cases} 1, & \\text{with probability } \\frac{1}{2}\\left(1 + \\tanh\\left(\\frac{h_i}{T}\\right)\\right), \\\\ -1, & \\text{with probability } \\frac{1}{2}\\left(1 - \\tanh\\left(\\frac{h_i}{T}\\right)\\right). \\end{cases}$$\n\nStoring a new memory pattern $\\xi^{\\mu}$ in the network is performed by modifying every $ij$ element of the synaptic connection matrix according to\n\n$$J_{ij}^{\\text{new}} = J_{ij}^{\\text{old}} + \\frac{1}{n}\\, \\xi^{\\mu}_i \\xi^{\\mu}_j.$$
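To make the dynamics and storage rule of Section 2.1 concrete, here is a minimal sketch in plain Python (not the authors' code). It assumes the Hebbian rule with 1/n normalization and no self-coupling (J_ii = 0), and asynchronous updates in which one randomly chosen neuron is updated per step. At T = 0 the stochastic rule is replaced by its noise-free limit S_i = sign(h_i), with ties broken towards +1 (an arbitrary choice). The function names store, hamming, async_step and run are hypothetical.

```python
import math
import random


def store(patterns, n):
    # Hebbian storage, J_ij += xi_i * xi_j / n for every stored pattern xi,
    # with no self-coupling (J_ii stays 0), giving a symmetric matrix.
    J = [[0.0] * n for _ in range(n)]
    for xi in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    J[i][j] += xi[i] * xi[j] / n
    return J


def hamming(a, b):
    # Number of positions at which two +/-1 state vectors differ.
    return sum(1 for x, y in zip(a, b) if x != y)


def async_step(J, S, T, rng):
    # One asynchronous update of a randomly chosen neuron i (S is changed in place).
    i = rng.randrange(len(S))
    # Input field h_i = sum over j of J_ij * S_j (the j == i term is 0).
    h = sum(J[i][j] * S[j] for j in range(len(S)))
    if T == 0:
        # Noise-free limit of the rule below: S_i = sign(h_i), ties go to +1.
        S[i] = 1 if h >= 0 else -1
    else:
        # S_i = +1 with probability (1 + tanh(h_i / T)) / 2, otherwise -1.
        p_up = 0.5 * (1.0 + math.tanh(h / T))
        S[i] = 1 if rng.random() < p_up else -1
    return i


def run(J, S, T, steps, rng=None):
    # Apply `steps` single-neuron updates in place and return S.
    rng = rng if rng is not None else random.Random()
    for _ in range(steps):
        async_step(J, S, T, rng)
    return S
```

Starting from a stored pattern with a few of its bits flipped and running the noise-free dynamics for a few sweeps (a few times n steps) should return the network to that pattern as long as the load m/n stays well below 0.14.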
\nWith symmetric connections and noise-free asynchronous updates, a Hopfield network always converges to a stable state, and every stored memory is an attractor with a surrounding region termed its basin of attraction [Hop82]. In addition to the stored memories, other, non-memory states also exist as stable states (local minima) of the network [AGS85]. The maximal number $m$ of (randomly generated) memory patterns which can be stored in the basic Hopfield network of $n$ neurons is $m = \\alpha_c \\cdot n$, with $\\alpha_c \\approx 0.14$ [AGS85].\n\n2.2 Recall and Recognition in the model's framework\n\n2.2.1 Recall\n\nRecall is considered successful when, starting from an initial cue, the network converges to a stable state which corresponds to the learned memory nearest to the input pattern. Inter-pattern distance is measured by the Hamming distance between the input and the learned item encodings. If the network converges to a non-memory stable state, its output stands for a 'failure of recall' response.¹\n\n¹ The question of how such non-memory states bear the meaning of 'recall failure' is beyond the scope of this work. However, a possible explanation is that during the learning phase 'meaning' is assigned to the stored patterns via connections formed with external patterns; since non-memory states lack such associations with external patterns, they are 'meaningless', yielding the 'recall failure' response. Another possible mechanism is that every output pattern generated in the recall process also passes a recognition phase, so that non-memory states are rejected (see the following paragraph describing recognition in our model).\n\n2.2.2 Recognition\n\nRecognition is considered successful when the network arrives at a stable state within a time interval $\\Delta$, beginning from input presentation. 
In general, the shorter the distance between an input and its nearest memory, the faster its convergence [AM88, KP88, RY90]. Since non-memory (non-learned) stable states have higher energy levels and much shallower basins of attraction than memorized stable states [AGS85, LN89], convergence to such states takes significantly longer. Therefore, there exists a range of values of $\\Delta$ that enables successful recognition only of inputs similar to one of the stored memories.\n\n2.3 Other features of the model\n\n\u2022 The context of the psychological experiments is represented as a substring of the input's encoding. In order to minimize inter-pattern correlation, the size of the context encoding relative to the total size of the memory encoding is kept small.\n\n\u2022 The total associational linkage of a learned item is modeled as an external field vector $E$. When a learned memory pattern $\\xi^{\\mu}$ is presented to the network, the external field vector generated has components $E_i = h \\cdot \\xi^{\\mu}_i$, where $h$ is an 'orientation' coefficient expressing the association strength.\n\nAdditional features, including a modified storage equation accounting for learning that takes place during the test phase, and a storage decay parameter, are described in [RY90].\n\n3 The Modeling of experimental data\n\nFor every phenomenon discussed, a brief description of the psychological findings is followed by an account of its modeling. We rely on known results pertaining to Hopfield models to show that, qualitatively, the psychological phenomena reviewed are emergent properties of the model. Where such analytical evidence is lacking, simulations were performed in order to account for the experimental data. 
For a review of the psychological literature supporting the findings modeled, see [GS84].\n\nThe List-Length Effect: It is known that the probability of successful Recall or Recognition of a particular item decreases as the length of the list of learned items increases.\nList length is expressed as memory load. Since it has been shown that the width of the memories' basins of attraction decreases monotonically with memory load, following an approximately inverse parabolic curve [Wei85], Recall performance should decrease as memory load is increased. We have examined the convergence time of the same set of input patterns at different values of memory load. As demonstrated in Fig. 1, as the memory load is increased, successful convergence occurs (on average) only after an increasingly large number of asynchronous iterations. Hence, convergence takes more time and can result in Recognition failure, although the memories' stability is maintained until the critical capacity $\\alpha_c$ is reached.\n\n[Figure 1: Recognition speed (number of asynchronous iterations) as a function of memory load (number of stored memories). The network's parameters are $n = 500$, $T = 0.28$.]\n\nThe Word-Frequency Effect: The more frequent a word is in the language, the higher the probability of recalling it, but the lower the probability of recognizing it.\nA word's frequency in the language is assumed to affect its retrieval through the stored word's semantic relations and associations [Kat85, NCBK87]. 
It is assumed that, relative to low-frequency words, high-frequency words have more semantic relations and therefore more connections between the patterns representing them and other patterns stored in memory (i.e., in other networks). This one-to-many relationship is assumed to be reciprocal, i.e., each of the externally stored patterns also has connections projecting to several of the stored patterns in the allocated network.\nThe process leading to the formation of the external field $E$ (acting upon the allocated network), generated by an input pattern nearest to some stored memory pattern $\\xi^{\\mu}$, is assumed to have the following characteristics:\n\n1. There is a threshold degree of overlap $O_{\\min}$ such that $E > 0$ only when the allocated network's state overlap $H^{\\mu}$ is higher than $O_{\\min}$.\n\n2. At overlap values $H^{\\mu}$ only moderately larger than $O_{\\min}$, $h^{\\mu}$ increases monotonically, but as $H^{\\mu}$ continues to rise, a certain 'optimal' point is reached, beyond which $h^{\\mu}$ decreases monotonically.\n\n3. High-frequency words have lower $O_{\\min}$ values than low-frequency words.\n\nRecognition tests are characterized by a high initial overlap $H^{\\mu}$ to some memory $\\xi^{\\mu}$. For high-frequency words, the values of $h^{\\mu}$ and $E^{\\mu}$ generated are post-optimal, and therefore smaller than in the case of low-frequency words, which have higher $O_{\\min}$ values.\nIn Recall tests the initial situation is characterized by a low overlap $H^{\\mu}$ to the nearest memory $\\xi^{\\mu}$. Only the overlap of high-frequency words is sufficient for activating associated items, i.e., $H^{\\mu} > O_{\\min}$.\n\nPresentation Time: Increasing the presentation time of learned words is known to improve both their Recall and Recognition. 
\nThis is explained by the phenomenon of maintenance rehearsal: the memories' basins of attraction get deeper, since the 'energy' $E$ of a given state is proportional to $-\\sum_{\\mu=1}^{m} (H^{\\mu})^2$. Deeper basins of attraction are also wider [HFP83, KPKP90]. Therefore, the probability of successful Recall and Recognition of rehearsed items is increased. The effect of uniform rehearsal is equivalent to a temperature decrease. Hence, increasing presentation time will attenuate and delay the list-length phenomenon, up to a certain limit. The Test Delay phenomenon is accounted for in a similar way [RY90].\n\nContext Shift: The term Context Shift refers to the change in context from the tutorial period to the test period. Studies examining the effect of context shift have shown a decrement in Recall performance with context shift, but little change in Recognition performance.\nAs demonstrated in [RY90], when a context shift is simulated by flipping some of the context string's bits, Recall performance severely deteriorates while the memories' stability remains intact. No significant increase was found in the time (i.e., the number of asynchronous iterations) required for convergence, so the pre-shift probability of successful Recognition is maintained.\n\nAge differences in Recall and Recognition: It was found that older people perform more poorly on Recall tasks than on Recognition tasks [CM87]. These findings can be accounted for by the assumption that synapses are weakened and deleted with aging, an assumption which, although controversial, has gained some experimental support (see [RY90]). We have investigated retrieval performance as a function of the input's initial overlap, the level of synaptic dilution, and memory load. As demonstrated in Fig. 
2, when the synaptic dilution is increased, a 'critical' phase is reached in which the retrieval of far-away input patterns is impaired but the retrieval of input patterns with a high level of initial overlap remains intact. As the memory load is increased, this 'critical' phase begins at lower levels of synaptic dilution. On the other hand, only a mild increase (of 15%) in recognition speed (number of asynchronous iterations) was found.\n\n[Figure 2: The probability of successful retrieval as a function of memory load and the input pattern's initial overlap, at two different degrees of synaptic dilution (one per panel). The network's parameters are $n = 500$, $T = 0.05$.]\n\nThe interested reader can find a description of the modeling of additional phenomena, including test position, word fragment completion, and distractor similarity, in [RY90].\n\n4 On a quantitative test of the model\n\n4.1 Estimating Recall performance\n\nIn a given network with $n$ neurons and $m$ memories, the radius $r$ of the basins of attraction of the memories decreases as the memory load parameter $\\alpha = m/n$ is increased. According to [MPRV87], with $r$ expressed as a fraction of $n$, the quantities $n$, $m$, and $r$ are related by\n\n$$m = \\frac{(1-2r)^2}{4} \\cdot \\frac{n}{\\log n}.$$\n\nThe concept of basins of attraction implies a non-linear probability function, with a low probability of convergence when input vectors are further from a memory than the radius of attraction and a high probability otherwise. The slope of this non-linearity increases as the noise level $T$ is decreased. 
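The [MPRV87] relation can be evaluated and inverted directly to see how the basin radius shrinks with memory load. A minimal sketch, assuming log denotes the natural logarithm and r is the basin radius as a fraction of n; the function names capacity and radius are hypothetical:

```python
import math


def capacity(n, r):
    # m = (1 - 2r)^2 / 4 * n / ln(n), with r the basin radius as a fraction of n.
    if not 0 <= r < 0.5:
        raise ValueError('r must lie in [0, 0.5)')
    return (1 - 2 * r) ** 2 * n / (4 * math.log(n))


def radius(n, m):
    # Inverse relation: r = (1 - sqrt(4 m ln(n) / n)) / 2.
    # Returns None when m exceeds the r = 0 capacity n / (4 ln n).
    x = 4 * m * math.log(n) / n
    if x > 1:
        return None
    return (1 - math.sqrt(x)) / 2
```

For n = 500, radius(500, 10) is about 0.147 (roughly 74 of the 500 bits), and the value decreases as m grows, in line with the decrease of r with memory load described above.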
\nThe probability $P_c$ that a random input vector will converge to one of the stored memories can be estimated by $P_c \\approx m \\cdot 2^{-n} \\sum_{i=0}^{\\lfloor rn \\rfloor} \\binom{n}{i}$, i.e., the fraction of all $2^n$ states lying within the basin radius of some memory. It is interesting to note that the rates of change of $r$ and of $P_c$ have distinct forms: Recall tests beginning from randomly generated cues would yield a very low rate of successful Recall ($P_c$). Yet, if one examines Recall by picking a stored memory, flipping some of its encoding bits, and presenting it as an input to the network (thereby probing $r$), 'reasonable' levels of successful Recall can still be obtained even when a 'considerable' number of encoding bits are flipped. $P_c$ can also be estimated by considering the context representation [RY90].\n\n4.2 Estimating Recognition performance\n\nThe probability of correct Recognition depends mainly on the length of the interval $\\Delta$. Assume that after an input pattern is presented to a network of $n$ neurons, $k$ iteration steps of a Monte Carlo simulation are performed during the time interval $\\Delta$: in each such step, a neuron is randomly selected and then examines whether or not it should flip its state, according to its input.\nWe show that the probability $P_g\\{d\\}$ that an input pattern at Hamming distance $d$ from a stored memory will be successfully recognized is bounded by $P_g\\{d\\} \\geq 1 - d \\cdot e^{-k/n}$. It can be seen that Recognition's success depends strongly on the initial proximity of the input to a stored memory, and even more strongly on the number $k$ of allowed asynchronous iterations, determined by the length of $\\Delta$. For a selection of $k = n(\\ln d + c)$, one obtains $P_g \\geq 1 - e^{-c}$. The expected number of iterations until successful convergence is achieved (denoted $E(X)$) is $E(X) = \\sum_{i=1}^{d} E(X_i) = n \\cdot \\sum_{i=1}^{d} \\frac{1}{i} \\approx n \\cdot \\ln d$. 
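The bound and the expectation above follow a coupon-collector argument: the input is recognized once each of the d initially wrong neurons has been selected at least once. The sketch below (plain Python, not the authors' code) assumes idealized dynamics in which a wrong neuron is corrected the first time it is selected and correct neurons never flip. The optional delta parameter, which stops once at most delta wrong bits remain, and the function names recognition_bound, expected_iterations and simulate_iterations are hypothetical.

```python
import math
import random


def recognition_bound(n, d, k):
    # A given wrong neuron is never picked in k uniform draws with probability
    # (1 - 1/n)**k <= exp(-k/n); a union bound over the d wrong neurons gives
    # P_g{d} >= 1 - d * exp(-k/n).
    return 1.0 - d * math.exp(-k / n)


def expected_iterations(n, d, delta=0):
    # Coupon collector: while i wrong neurons remain, the next one is picked after
    # n / i steps on average. Stopping once only delta remain gives
    # n * (1/(delta+1) + ... + 1/d), which grows like n * ln(d) for delta = 0.
    return n * sum(1.0 / i for i in range(delta + 1, d + 1))


def simulate_iterations(n, d, rng=None, delta=0):
    # Idealized dynamics (an assumption): neurons 0..d-1 start wrong, each is
    # corrected the first time it is selected, and correct neurons never flip.
    rng = rng if rng is not None else random.Random()
    wrong = set(range(d))
    steps = 0
    while len(wrong) > delta:
        steps += 1
        wrong.discard(rng.randrange(n))
    return steps
```

For instance, expected_iterations(500, 20, 10) gives about 334 iterations, close to the n ln(d/delta) = 500 ln 2 (about 347) approximation; averaging simulate_iterations over many trials should approach expected_iterations for the same n, d and delta.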
\nIn the more general case, let $\\delta$ denote the Hamming distance (between the network's state $S$ and a stored memory) below which retrieval is considered successful. Then the corrected estimates of retrieval performance are $P_g \\geq 1 - \\binom{d}{\\delta} \\cdot e^{-k\\delta/n}$ and $E(X) \\approx n \\cdot \\ln(d/\\delta)$. In simulations we have performed ($n = 500$, $d = 20$, $\\delta = 10$), the average number of iterations until successful convergence was in the range of 300 to 400, in excellent correspondence with the predicted expectation, $E(X) = 500 \\cdot \\ln 2 \\approx 347$.\n\nReferences\n\n[AGS85] D. J. Amit, H. Gutfreund, and H. Sompolinsky. Storing infinite numbers of patterns in a spin-glass model of neural networks. Phys. Rev. Lett., 55:1530, 1985.\n\n[AM88] S. I. Amari and K. Maginu. Statistical neurodynamics of associative memory. Neural Networks, 1:63, 1988.\n\n[CM87] F. I. M. Craik and J. M. McDowd. Age differences in recall and recognition. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13(3):474, 1987.\n\n[GS84] G. Gillund and M. Shiffrin. A retrieval model for both recognition and recall. Psychological Review, 91:1, 1984.\n\n[HFP83] J. J. Hopfield, D. I. Feinstein, and R. G. Palmer. 'Unlearning' has a stabilizing effect in collective memories. Nature, 304:158, 1983.\n\n[Hop82] J. J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proc. Nat. Acad. Sci. USA, 79:2554, 1982.\n\n[Kat85] T. Kato. Semantic-memory sources of episodic retrieval failure. Memory & Cognition, 13(5):442, 1985.\n\n[KP88] J. Komlós and R. Paturi. Convergence results in an associative memory model. Neural Networks, 1:239, 1988.\n\n[KPKP90] B. Kamgar-Parsi and B. Kamgar-Parsi. On problem solving with Hopfield neural networks. Biol. Cybern., 62:415, 1990.\n\n[LN89] M. Lewenstein and A. Nowak. Fully connected neural networks with self-control of noise levels. Phys. Rev. Lett., 62(2):225, 1989.\n\n[MPRV87] R. J. McEliece, E. C. Posner, E. R. Rodemich, and S. S. Venkatesh. The capacity of the Hopfield associative memory. IEEE Transactions on Information Theory, IT-33(4):461, 1987.\n\n[NCBK87] D. L. Nelson, J. J. Canas, M. T. Bajo, and P. D. Keelan. Comparing word fragment completion and cued recall with letter cues. Journal of Experimental Psychology: Learning, Memory and Cognition, 13(4):542, 1987.\n\n[RY90] E. Ruppin and Y. Yeshurun. Recall and recognition in an attractor neural network model of memory retrieval. Technical report, Dept. of Computer Science, Tel-Aviv University, 1990.\n\n[Wei85] G. Weisbuch. Scaling laws for the attractors of Hopfield networks. J. Physique Lett., 46:L-623, 1985.\n", "award": [], "sourceid": 307, "authors": [{"given_name": "Eytan", "family_name": "Ruppin", "institution": null}, {"given_name": "Yehezkel", "family_name": "Yeshurun", "institution": null}]}