{"title": "A Method for the Associative Storage of Analog Vectors", "book": "Advances in Neural Information Processing Systems", "page_first": 590, "page_last": 595, "abstract": null, "full_text": "590 \n\nAtiya and Abu-Mostafa \n\nA Method for the Associative Storage \n\nof Analog Vectors \n\nAmir  Atiya (*)  and Yaser Abu-Mostafa (**) \n\n(*)  Department of Electrical Engineering \n\n(**)  Departments of Electrical  Engineering and Computer Science \n\nCalifornia Institute Technology \n\nPasadena,  Ca 91125 \n\nABSTRACT \n\nA method for  storing analog vectors in  Hopfield's continuous feed(cid:173)\nback model is proposed.  By  analog vectors we mean vectors whose \ncomponents  are  real-valued.  The  vectors  to  be  stored  are  set  as \nequilibria of the network.  The network model consists of one layer \nof visible  neurons  and  one  layer  of hidden  neurons.  We  propose \na  learning  algorithm,  which  results  in  adjusting  the  positions  of \nthe  equilibria,  as  well  as  guaranteeing  their  stability.  Simulation \nresults confirm the effectiveness of the method . \n\n1  INTRODUCTION \n\nThe  associative  storage  of binary  vectors  using  discrete  feedback  neural  nets  has \nbeen  demonstrated by Hopfield  (1982).  This  has attracted  a  lot  of attention, and \na  number  of  alternative  techniques  using  also  the  discrete  feedback  model  have \nappeared.  However,  the  problem of the  distributed  associative  storage  of analog \nvectors has  received  little  attention in  literature.  By analog vectors we  mean  vec(cid:173)\ntors  whose  components  are  real-valued.  This  problem  is  important  because  in  a \nvariety  of applications  of associative  memories  like  pattern  recognition  and vector \nquantization the patterns are  originally  in  analog form  and  therefore  one  can  save \nhaving the costly quantization step and therefore also save increasing the dimension \nof the vectors.  In dealing with analog vectors, we  consider feedback  networks of the \ncontinuous-time graded-output variety,  e.g.  Hopfield's model  (1984): \n\ndu dt = -u + Wf(u) + a, \n\nx = f(u), \n\n(1) \n\nwhere  u  = (Ul, ... , UN)T  is  the  vector of neuron  potentials, x = (x!, ... , XN)T  is  the \nvector of firing  rates,  W  is  the weight  matrix,  a  is  the  threshold  vector,  and f(u) \nmeans the vector (f( uI), ... , f( UN)) T,  where  f  is a  sigmoid-shaped function. \n\nThe vectors to be stored are set as  equilibria of the network.  Given a  noisy  version \nof any of the stored vectors as the initial state of the network, the network state has \n\n\fA Method for the Associative Storage of Analog Vectors \n\n591 \n\nto  reach eventually  the  equilibrium state  corresponding  to the correct  vector.  An \nimportant  requirement  is  that  these  equilibria  be  asymtotically  stable,  otherwise \nthe  attraction to  the equilibria will  not  be  guaranteed.  Indeed,  without  enforcing \nthis requirement, our numerical simulations show mostly unstable equilibria. \n\n2  THE  MODEL \n\nIt  can  be  shown  that  there  are  strong  limitations  on  the  set  of memory  vectors \nwhich  can  be  stored  using  Hopfield's  continuous  model  (Atiya  and  Abu-Mostafa \n1990).  To relieve these limitations,  we use an architecture consisting of both visible \nand hidden units.  The outputs of the visible units correspond to the components of \nthe stored vector.  
2 THE MODEL

It can be shown that there are strong limitations on the set of memory vectors which can be stored using Hopfield's continuous model (Atiya and Abu-Mostafa 1990). To relieve these limitations, we use an architecture consisting of both visible and hidden units. The outputs of the visible units correspond to the components of the stored vector. Our proposed architecture is close to the continuous version of the BAM (Kosko 1988). The model consists of one layer of visible units and another layer of hidden units (see Figure 1). The output of each layer is fed as an input to the other layer. No connections exist within either layer. Let y and x be the output vectors of the hidden layer and the visible layer respectively. Then, in our model,

du/dt = -u + W f(z) + a = e,    y = f(u),    (2a)
dz/dt = -z + V f(u) + b = h,    x = f(z),    (2b)

where W = [W_ij] and V = [V_ij] are the weight matrices, a and b are the threshold vectors, and f is a sigmoid function (monotonically increasing) in the range from -1 to 1, for example

f(u) = tanh(u).

[Figure 1: The model. A hidden layer and a visible layer, each feeding its output to the other; no connections within a layer.]

As we mentioned before, for a basin of attraction to exist around a given memory vector, the corresponding equilibrium has to be asymptotically stable. For the proposed architecture a condition for stability is given by the following theorem.

Theorem: An equilibrium point (u*, z*) satisfying

f'^{1/2}(u_i*) Σ_j |W_ij| f'^{1/2}(z_j*) < 1    (3a)
f'^{1/2}(z_i*) Σ_j |V_ij| f'^{1/2}(u_j*) < 1    (3b)

for all i is asymptotically stable.

Proof: We linearize (2a), (2b) around the equilibrium. We get

dq/dt = J q,

where

q_i = u_i - u_i*                 if i = 1, ..., N_1,
q_i = z_{i-N_1} - z_{i-N_1}*     if i = N_1 + 1, ..., N_1 + N_2,

N_1 and N_2 are the numbers of units in the hidden layer and the visible layer respectively, and J is the Jacobian matrix

J = ( ∂e/∂u   ∂e/∂z )
    ( ∂h/∂u   ∂h/∂z ),

with the partial derivatives evaluated at the equilibrium point. Let Λ_1 and Λ_2 be respectively the N_1 x N_1 and N_2 x N_2 diagonal matrices whose ith diagonal elements are respectively f'(u_i*) and f'(z_i*). Furthermore, let

Λ = ( Λ_1   0  )
    (  0   Λ_2 ).

From (2a), (2b) we have ∂e_i/∂u_j = -δ_ij, ∂e_i/∂z_j = W_ij f'(z_j*), ∂h_i/∂u_j = V_ij f'(u_j*), and ∂h_i/∂z_j = -δ_ij, so the Jacobian evaluates to

J = ( -I_{N_1}    W Λ_2    )
    (  V Λ_1     -I_{N_2}  ),

where I_L means the L x L identity matrix. Let

A = ( -Λ_1^{-1}     W        )
    (     V      -Λ_2^{-1}   ).

Then J = A Λ. The eigenvalues of A Λ are identical to the eigenvalues of Λ^{1/2} A Λ^{1/2}, because if λ is an eigenvalue of A Λ corresponding to eigenvector v, then

A Λ v = λ v,

and hence

Λ^{1/2} A Λ^{1/2} (Λ^{1/2} v) = λ (Λ^{1/2} v).

Now, we have

Λ^{1/2} A Λ^{1/2} = ( -I_{N_1}                  Λ_1^{1/2} W Λ_2^{1/2} )
                    ( Λ_2^{1/2} V Λ_1^{1/2}     -I_{N_2}              ).

By Gershgorin's theorem (Franklin 1968), an eigenvalue λ of J has to satisfy at least one of the inequalities

|λ + 1| ≤ f'^{1/2}(u_i*) Σ_j |W_ij| f'^{1/2}(z_j*),    i = 1, ..., N_1,
|λ + 1| ≤ f'^{1/2}(z_i*) Σ_j |V_ij| f'^{1/2}(u_j*),    i = 1, ..., N_2.

It follows that under conditions (3a), (3b) the eigenvalues of J have negative real parts, and hence the equilibrium of the original system (2a), (2b) is asymptotically stable.

Thus, if the hidden unit values are driven far enough into the saturation region (i.e. with values close to 1 or -1), then the corresponding equilibrium will be stable, because then f'(u_i*) will be very small, causing Inequalities (3) to be satisfied.
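Conditions (3a), (3b) are cheap to verify numerically for a candidate equilibrium. The following is a small sketch, assuming f = tanh (so f' = 1 - tanh^2); the function names are ours. Note that the test is sufficient, not necessary: a False return does not prove instability.

```python
import numpy as np

def f_prime(v):
    # f = tanh, so f'(v) = 1 - tanh(v)^2
    return 1.0 - np.tanh(v) ** 2

def satisfies_theorem(W, V, u_star, z_star):
    """Check the sufficient conditions (3a), (3b) at an equilibrium (u*, z*).
    W is N1 x N2 (visible -> hidden), V is N2 x N1 (hidden -> visible).
    Returns True when the theorem guarantees asymptotic stability."""
    fu = np.sqrt(f_prime(u_star))            # f'^{1/2}(u_i*), length N1
    fz = np.sqrt(f_prime(z_star))            # f'^{1/2}(z_j*), length N2
    cond_a = fu * (np.abs(W) @ fz) < 1.0     # (3a), one inequality per hidden unit
    cond_b = fz * (np.abs(V) @ fu) < 1.0     # (3b), one inequality per visible unit
    return bool(np.all(cond_a) and np.all(cond_b))
```

The row-sum structure mirrors the Gershgorin argument in the proof: each inequality bounds one row of Λ^{1/2} A Λ^{1/2}.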
Although there is nothing to rule out the existence of spurious equilibria and limit cycles, if they occur they will be far away from the memory vectors, because each memory vector has a basin of attraction around it. In our simulations we have never encountered limit cycles.

3 TRAINING ALGORITHM

Let x^m, m = 1, ..., M be the vectors to be stored. Each x^m should correspond to the visible layer component of one of the asymptotically stable equilibria. We design the network such that the hidden layer component of the equilibrium corresponding to x^m is far into the saturation region. The target hidden layer component y^m can be taken as a vector of 1's and -1's, chosen arbitrarily, for example by generating the components randomly. Then the weights have to satisfy

y_j^m = f( Σ_l W_jl x_l^m + a_j ),
x_i^m = f( Σ_j V_ij f( Σ_l W_jl x_l^m + a_j ) + b_i ).

Training is performed in two steps. In the first step we train the weights of the hidden layer, using steepest descent on the error function

E_1 = Σ_{m,j} [ y_j^m - f( Σ_l W_jl x_l^m + a_j ) ]^2.

In the second step we train the weights of the visible layer, using steepest descent on the error function

E_2 = Σ_{m,i} [ x_i^m - f( Σ_j V_ij f( Σ_l W_jl x_l^m + a_j ) + b_i ) ]^2.

We remark that in the first step convergence might be slow, since the targets are 1 or -1. A way to obtain fast convergence is to stop once the outputs are within some constant (say 0.2) of the targets. Then we multiply the weights and the thresholds of the hidden layer by a large positive constant, so as to force the outputs of the hidden layer to be close to 1 or -1.

4 IMPLEMENTATION

We consider a network with 10 visible and 10 hidden units. The memory vectors are randomly generated (the components are from -0.8 to 0.8 rather than the full range, to obtain faster convergence). Five memory vectors are considered. After learning, the memory is tested by presenting memory vectors plus noise (100 vectors for a given variance). Figure 2 shows the percentage of correct recall in terms of the signal to noise ratio. Although we found that we could store up to 10 vectors, working close to the full capacity is not recommended, as the recall accuracy deteriorates.

[Figure 2: Recall accuracy (% correct, 0-100) versus signal to noise ratio (dB), for SNR from -6 to 10 dB.]
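The two-step procedure of Section 3 and the experiment of Section 4 translate directly into code. Below is a minimal batch steepest-descent sketch under stated assumptions: f = tanh throughout, and the learning rate lr, epoch counts, and saturation factor boost are hypothetical choices (the paper specifies only the 0.2 stopping constant and a "large positive constant").

```python
import numpy as np

rng = np.random.default_rng(0)

def train(X, n_hidden=10, lr=0.1, epochs=5000, tol=0.2, boost=10.0):
    """Two-step training sketch for the two-layer model (2a), (2b).
    X is an M x N matrix of memory vectors with components in (-1, 1).
    lr, epochs, and boost are illustrative values, not from the paper."""
    M, N = X.shape
    Y = rng.choice([-1.0, 1.0], size=(M, n_hidden))   # random hidden targets y^m
    W = 0.1 * rng.standard_normal((n_hidden, N))
    a = np.zeros(n_hidden)
    V = 0.1 * rng.standard_normal((N, n_hidden))
    b = np.zeros(N)

    # Step 1: steepest descent on E1, stopping once every hidden output is
    # within tol of its +-1 target, then boosting into the saturation region.
    for _ in range(epochs):
        H = np.tanh(X @ W.T + a)                      # hidden outputs, M x n_hidden
        if np.max(np.abs(Y - H)) < tol:
            break
        G = -2.0 * (Y - H) * (1.0 - H**2)             # dE1 / d(pre-activation)
        W -= lr * (G.T @ X) / M
        a -= lr * G.mean(axis=0)
    W *= boost
    a *= boost                                        # drive hidden units toward +-1

    # Step 2: steepest descent on E2 with the hidden layer now fixed.
    H = np.tanh(X @ W.T + a)                          # saturated hidden outputs
    for _ in range(epochs):
        Xhat = np.tanh(H @ V.T + b)
        G = -2.0 * (X - Xhat) * (1.0 - Xhat**2)       # dE2 / d(pre-activation)
        V -= lr * (G.T @ H) / M
        b -= lr * G.mean(axis=0)
    return W, a, V, b

# Recall test in the spirit of Section 4: iterate the discretized dynamics
# x <- f(V f(W x + a) + b) from a noisy probe until it settles.
X = rng.uniform(-0.8, 0.8, size=(5, 10))              # 5 memories, 10 visible units
W, a, V, b = train(X)
probe = X[0] + 0.1 * rng.standard_normal(10)
for _ in range(100):
    probe = np.tanh(V @ np.tanh(W @ probe + a) + b)
print(np.max(np.abs(probe - X[0])))                   # small if recall succeeded
```

The fixed-point iteration at the end is a discrete stand-in for integrating (2a), (2b); its fixed points coincide with the equilibria of the continuous system.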
Acknowledgement

This work is supported by the Air Force Office of Scientific Research under grant AFOSR-88-0231.

References

J. Hopfield (1982), "Neural networks and physical systems with emergent collective computational abilities", Proc. Nat. Acad. Sci. USA, vol. 79, pp. 2554-2558.

J. Hopfield (1984), "Neurons with graded response have collective computational properties like those of two-state neurons", Proc. Nat. Acad. Sci. USA, vol. 81, pp. 3088-3092.

A. Atiya and Y. Abu-Mostafa (1990), "An analog feedback associative memory", to be submitted.

B. Kosko (1988), "Bidirectional associative memories", IEEE Trans. Syst. Man Cybern., vol. SMC-18, no. 1, pp. 49-60.

J. Franklin (1968), Matrix Theory, Prentice-Hall, Englewood Cliffs, New Jersey.
", "award": [], "sourceid": 206, "authors": [{"given_name": "Amir", "family_name": "Atiya", "institution": null}, {"given_name": "Yaser", "family_name": "Abu-Mostafa", "institution": null}]}