{"title": "Distributed Recursive Structure Processing", "book": "Advances in Neural Information Processing Systems", "page_first": 591, "page_last": 597, "abstract": null, "full_text": "Distributed Recursive Structure Processing \n\nGeraldine Legendre \nDepartment of \nLinguistics \n\nYoshiro Miyata \n\nOptoelectronic \n\nComputing Systems  Center \n\nUniversity  of Colorado \nBoulder,  CO 80309-0430\u00b7 \n\nPaul Smolensky \nDepartment of \nComputer Science \n\nAbstract \n\nHarmonic grammar (Legendre,  et al., 1990) is a connectionist theory of lin(cid:173)\nguistic  well-formed ness  based on the assumption  that the well-formedness \nof a  sentence  can  be  measured  by  the  harmony  (negative  energy)  of the \ncorresponding  connectionist  state.  Assuming  a  lower-level  connectionist \nnetwork that obeys a  few  general connectionist  principles  but is  otherwise \nunspecified,  we  construct  a  higher-level  network  with  an  equivalent  har(cid:173)\nmony function  that captures the most linguistically relevant global aspects \nof the  lower  level  network.  In  this  paper,  we  extend  the  tensor  product \nrepresentation  (Smolensky  1990)  to fully  recursive  representations  of re(cid:173)\ncursively  structured objects like  sentences  in  the lower-level  network.  We \nshow  theoretically  and  with  an  example  the  power  of the  new  technique \nfor  parallel distributed  structure  processing. \n\n1 \n\nIntroduction \n\nA  new  technique  is  presented  for  representing  recursive  structures in  connectionist \nnetworks.  It has  been  developed  in  the  context  of the  framework  of Harmonic \nGrammar  (Legendre  et  a1.  1990a,  1990b),  a  formalism  for  theories  of linguistic \nwell-formedness  which involves  two basic  levels:  At the lower level,  elements  of the \nproblem domain are represented  as distributed  patterns of activity in a  networkj At \nthe higher level,  the elements  in  the domain are represented  locally and connection \nweights are interpreted as soft rules involving these elements.  There are two aspects \nthat are central  to the  framework. \n\n-The authors  are listed in alphabetical order. \n\n591 \n\n\f592 \n\nLegendre, Miyata, and Smolensky \n\nFirst,  the  connectionist  well-formedness  measure  harmony (or negative  \"energy\"), \nwhich  we  use  to  model  linguistic  well-formed ness ,  has  the  properties  that  it  is  p(cid:173)\nreserved  between  the  lower  and  the  higher  levels  and  that  it  is  maximized  in  the \nnetwork processing.  Our previous work developed techniques for deriving harmonies \nat the higher level  from linguistic  data, which  allowed  us  to make contact  with  ex(cid:173)\nisting higher-level  analyses  of a  given linguistic  phenomenon. \nThis  paper  concentrates  on  the  second  aspect  of the  framework:  how  particular \nlinguistic  structures  such  as  sentences  can be efficiently  represented  and  processed \nat  the  lower  level.  The  next  section  describes  a  new  method  for  representing  tree \nstructures  in  a  network  which  is  an extension  of the tensor  product  representation \nproposed in  (Smolensky 1990) that allows recursive tree structures to be represented \nand various  tree  operations  to be performed in  parallel. 
\n\n2 Recursive tensor product representations \n\nA tensor product representation of a set of structures S assigns to each s ∈ S a vector built up by superposing role-sensitive representations of its constituents. A role decomposition of S specifies the constituent structure of s by assigning to it an unordered set of filler-role bindings. For example, if S is the set of strings from the alphabet {a, b, c} and s = cba, then we might choose a role decomposition in which the roles are absolute positions in the string (r_1 = first, r_2 = second, ...) and the constituents are the filler/role bindings {b/r_2, a/r_3, c/r_1}.¹ \nIn a tensor product representation a constituent - i.e., a filler/role binding - is represented by the tensor (or generalized outer) product of vectors representing the filler and role in isolation: f/r is represented by the vector v = f⊗r, which is in fact a second-rank tensor whose elements are conveniently labelled by two subscripts and defined simply by v_{φρ} = f_φ r_ρ. \nWhere do the filler and role vectors f and r come from? In the most straightforward case, each filler is a member of a simple set F (e.g. an alphabet) and each role is a member of a simple set R, and the designer of the representation simply specifies vectors representing all the elements of F and R. In more complex cases, one or both of the sets F and R might be sets of structures which in turn can be viewed as having constituents, and which in turn can be represented using a tensor product representation. This recursive construction of tensor product representations leads to tensor products of three or more vectors, creating tensors of rank three and higher, with elements conveniently labelled by three or more subscripts. \n\nThe recursive structure of trees leads naturally to such a recursive construction of a tensor product representation. (The following analysis builds on Section 3.7.2 of (Smolensky 1990).) We consider binary trees (in which every node has at most two children) since the techniques developed below generalize immediately to trees with higher branching factor, and since the power of binary trees is well attested, e.g., by the success of Lisp, whose basic data structure is the binary tree. Adopting the conventions and notations of Lisp, we assume for simplicity that the terminal nodes of the tree (those with no children), and only the terminal nodes, are labelled by symbols or atoms. The set of structures S we want to represent is the union of a set of atoms and the set of binary trees with terminal nodes labelled by these atoms. \n\n¹The other major kind of role decomposition considered in (Smolensky 1990) is contextual roles; under one such decomposition, one constituent of cba is \"b in the role 'preceded by c and followed by a'\".
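\n\nTo make the binding operation concrete, here is a minimal NumPy sketch (ours, not the authors' code) of this string example, with assumed basis vectors for the fillers {a, b, c} and the positional roles r_1, r_2, r_3; any linearly independent choices would do: \n
\nimport numpy as np \n
\n# assumed filler and role vectors for the string example s = cba \n
fillers = {'a': np.array([1., 0., 0.]), 'b': np.array([0., 1., 0.]), 'c': np.array([0., 0., 1.])} \n
roles = [np.array([1., 0., 0.]), np.array([0., 1., 0.]), np.array([0., 0., 1.])] \n
\ndef bind(f, r): \n
    # v = f⊗r: a rank-2 tensor with elements v_{φρ} = f_φ r_ρ \n
    return np.outer(f, r) \n
\n# s = cba superposes the bindings {c/r_1, b/r_2, a/r_3} \n
s = bind(fillers['c'], roles[0]) + bind(fillers['b'], roles[1]) + bind(fillers['a'], roles[2]) \n
\n# these roles are orthonormal, so each role vector is its own unbinding vector \n
print(s @ roles[1])  # [0. 1. 0.]: the filler b, recovered from second position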
\n\nOne way to view a binary tree, by analogy with how we viewed strings above, is as having a large number of positions with various locations relative to the root: we adopt positional roles r_x labelled by binary strings (or bit vectors) such as x = 0110, which is the position in a tree accessed by \"caddar = car(cdr(cdr(car)))\", that is, the left child (0; car) of the right child (1; cdr) of the right child of the left child of the root of the tree. Using this role decomposition, each constituent of a tree is an atom (the filler) bound to some role r_x specifying its location; so if a tree s has a set of atoms {f_i} at respective locations {x_i}, then the vector representing s is s = Σ_i f_i⊗r_{x_i}. \nA more recursive view of a binary tree sees it as having only two constituents: the atoms or subtrees which are the left and right children of the root. In this fully recursive role decomposition, fillers may either be atoms or trees: the set of possible fillers F is the same as the original set of structures S. \n\nThe fully recursive role decomposition can be incorporated into the tensor product framework by making the vector spaces and operations a little more complex than in (Smolensky 1990). The goal is a representation obeying, for all s, p, q ∈ S: \n\ns = cons(p, q)  =>  s = p⊗r_0 + q⊗r_1    (1) \n\nHere, s = cons(p, q) is the tree with left subtree p and right subtree q, while s, p and q are the vectors representing s, p and q. The only two roles in this recursive decomposition are r_0, r_1: the left and right children of the root. These roles are represented by two vectors r_0 and r_1. \n\nA fully recursive representation obeying Equation 1 can actually be constructed from the positional representation, by assuming that the (many) positional role vectors are constructed recursively from the (two) fully recursive role vectors according to: \n\nr_{x0} = r_x⊗r_0    r_{x1} = r_x⊗r_1. \n\nFor example, r_{0110} = r_0⊗r_1⊗r_1⊗r_0.² Thus the vectors representing positions at depth d in the tree are tensors of rank d (taking the root to be depth 0). As an example, the tree s = cons(A, cons(B, C)) = cons(p, q), where p = A and q = cons(B, C), is represented by \n\ns = A⊗r_0 + B⊗r_{01} + C⊗r_{11} = A⊗r_0 + B⊗r_0⊗r_1 + C⊗r_1⊗r_1 \n  = A⊗r_0 + (B⊗r_0 + C⊗r_1)⊗r_1 = p⊗r_0 + q⊗r_1, \n\nin accordance with Equation 1. \n\nThe complication in the vector spaces needed to accomplish this recursive analysis is one that allows us to add together the tensors of different ranks representing different depths in the tree. All we need do is take the direct sum of the spaces of tensors of different rank; in effect, concatenating into a long vector all the elements of the tensors. \n\n²By adopting this definition of r_x, we are essentially taking the recursive structure that is implicit in the subscripts x labelling the positional role vectors, and mapping it into the structure of the vectors themselves.
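\n\nThe recursive role construction above is a one-liner in the same sketch. The specific role vectors here are our assumption (a distributed orthonormal pair, of the kind used in the simulation discussed later, e.g. r_0 = (c, c) and r_1 = (-c, c) with c = 1/√2): \n
\nimport numpy as np \n
from functools import reduce \n
\nc = 2 ** -0.5 \n
r0, r1 = np.array([c, c]), np.array([-c, c])  # assumed distributed orthonormal roles \n
\ndef role(x): \n
    # r_{x0} = r_x⊗r_0, r_{x1} = r_x⊗r_1: fold outer products over the bits of x \n
    return reduce(np.multiply.outer, [r0 if b == '0' else r1 for b in x]) \n
\n# positions at depth d come out as tensors of rank d, e.g. r_{0110} = r_0⊗r_1⊗r_1⊗r_0: \n
assert role('0110').shape == (2, 2, 2, 2) \n
assert np.allclose(role('0110'), np.multiply.outer(role('011'), r0))  # the recursion itself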
For example, in s = cons(A, cons(B, C)), the depth-0 component is 0, since s isn't an atom; depth 1 contains A, represented by the tensor S^(1)_{φρ_1} = A_φ r_{0ρ_1}; and depth 2 contains B and C, represented by S^(2)_{φρ_1ρ_2} = B_φ r_{0ρ_1} r_{1ρ_2} + C_φ r_{1ρ_1} r_{1ρ_2}. The tree as a whole is then represented by the sequence of tensors s = {S^(0)_φ, S^(1)_{φρ_1}, S^(2)_{φρ_1ρ_2}, ...}, where the tensor for depth 0, S^(0), and the tensors for depths d > 2, S^(d)_{φρ_1...ρ_d}, are all zero. \nWe let V denote the vector space of such sequences of tensors of rank 0, rank 1, ..., up to some maximum depth D which may be infinite. Two elements of V are added (or \"superimposed\") simply by adding together the tensors of corresponding rank. This is our vector space for representing trees.³ \n\nThe vector operation cons for building the representation of a tree from that of its two subtrees is given by Equation 1. As an operation on V this can be written: \n\ncons : ({P^(0)_φ, P^(1)_{φρ_1}, P^(2)_{φρ_1ρ_2}, ...}, {Q^(0)_φ, Q^(1)_{φρ_1}, Q^(2)_{φρ_1ρ_2}, ...}) ↦ \n    {0, P^(0)_φ r_{0ρ_1} + Q^(0)_φ r_{1ρ_1}, P^(1)_{φρ_1} r_{0ρ_2} + Q^(1)_{φρ_1} r_{1ρ_2}, ...} \n\n(Here, 0 denotes the zero vector in the space representing atoms.) In terms of matrices multiplying vectors in V, this can be written \n\ncons(p, q) = W_cons0 p + W_cons1 q \n\n(parallel to Equation 1), where the non-zero elements of the matrix W_cons0 are \n\n(W_cons0)_{φρ_1ρ_2...ρ_d ρ_{d+1}, φρ_1ρ_2...ρ_d} = r_{0ρ_{d+1}} \n\nand W_cons1 is gotten by replacing r_0 with r_1. \nTaking the car or cdr of a tree - extracting the left or right child - in the recursive decomposition is equivalent to unbinding either r_0 or r_1. As shown in (Smolensky 1990, Section 3.1), if the role vectors are linearly independent, this unbinding can be performed accurately, via a linear operation, specifically, a generalized inner product (tensor contraction) of the vector representing the tree with an unbinding vector u_0 or u_1. In general, the unbinding vectors are the dual basis to the role vectors; equivalently, they are the vectors comprising the inverse matrix to the matrix of all role vectors. If the role vectors are orthonormal (as in the simulation discussed below), the unbinding vectors are the same as the role vectors. The car operation can be written explicitly as an operation on V: \n\ncar : {S^(0)_φ, S^(1)_{φρ_1}, S^(2)_{φρ_1ρ_2}, ...} ↦ {Σ_{ρ_1} S^(1)_{φρ_1} u_{0ρ_1}, Σ_{ρ_2} S^(2)_{φρ_1ρ_2} u_{0ρ_2}, Σ_{ρ_3} S^(3)_{φρ_1ρ_2ρ_3} u_{0ρ_3}, ...} \n\n(Replacing u_0 by u_1 gives cdr.) \n\n³In the connectionist implementation simulated below, there is one unit for each element of each tensor in the sequence. In the simulation we report, seven atoms are represented by (binary) vectors in a three-dimensional space, so φ = 0, 1, 2; r_0 and r_1 are vectors in a two-dimensional space, so ρ = 0, 1. The number of units representing the portion of V for depth d is thus 3·2^d, and the total number of units representing depths up to D is 3(2^{D+1} - 1). In tensor product representations, exact representation of deeply embedded structure does not come cheap.
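\n\nThese operations on V take only a few lines of NumPy. A sketch (ours; dimensions as in footnote 3, with an assumed maximum depth D = 4): a tree is a list of tensors indexed by depth, cons implements Equation 1, and car/cdr contract the last (root-level) role index with an unbinding vector: \n
\nimport numpy as np \n
\nPHI, D = 3, 4                                   # filler dimension; assumed maximum depth \n
c = 2 ** -0.5 \n
r0, r1 = np.array([c, c]), np.array([-c, c])    # distributed orthonormal roles \n
u0, u1 = r0, r1                                 # unbinding vectors = roles when orthonormal \n
\ndef zero(d): \n
    return np.zeros((PHI,) + (2,) * d)          # zero tensor with d role indices \n
\ndef atom(f): \n
    # an atom occupies only the depth-0 slot of the sequence \n
    return [np.array(f, dtype=float)] + [zero(d) for d in range(1, D + 1)] \n
\ndef cons(p, q): \n
    # Equation 1: s = p⊗r_0 + q⊗r_1; every constituent moves one level deeper, \n
    # with the new root-level role index appended as the last tensor axis \n
    return [zero(0)] + [np.multiply.outer(p[d], r0) + np.multiply.outer(q[d], r1) for d in range(D)] \n
\ndef unbind(s, u): \n
    # tensor contraction of the last role index with the unbinding vector u \n
    return [np.tensordot(s[d + 1], u, axes=([-1], [0])) for d in range(D)] + [zero(D)] \n
\ndef car(s): return unbind(s, u0) \n
def cdr(s): return unbind(s, u1) \n
\nA, B, C = atom([1, 0, 0]), atom([0, 1, 0]), atom([0, 0, 1]) \n
s = cons(A, cons(B, C)) \n
assert all(np.allclose(x, y) for x, y in zip(car(s), A))       # car(s) recovers A exactly \n
assert all(np.allclose(x, y) for x, y in zip(cdr(cdr(s)), C))  # cddr(s) recovers C exactly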
The operation car can be realized as a matrix W_car mapping V to V, with non-zero elements: \n\n(W_car)_{φρ_1ρ_2...ρ_d, φρ_1ρ_2...ρ_d ρ_{d+1}} = u_{0ρ_{d+1}}. \n\nW_cdr is the same matrix, with u_0 replaced by u_1.⁴ \nOne of the main points of developing this connectionist representation of trees is to enable massively parallel processing. Whereas in the traditional sequential implementation of Lisp, symbol processing consists of a long sequence of car, cdr, and cons operations, here we can compose together the corresponding sequence of W_car, W_cdr, W_cons0 and W_cons1 operations into a single matrix operation. Adding some minimal nonlinearity allows us to compose more complex operations incorporating the equivalent of conditional branching. We now illustrate this with a simple linguistically motivated example. \n\n3 An example \n\nThe symbol manipulation problem we consider is that of transforming a tree representation of a syntactic parse of an English sentence into a tree representation of a predicate-calculus expression for the meaning of the sentence. We considered two possible syntactic structures: simple active sentences of the form (A.(V.P)) and passive sentences of the form (P.((Aux.V).(by.A))). Each was to be transformed into a tree representing V(A,P), namely (V.(A.P)). Here, the agent A and patient P of the verb V are both arbitrarily complex noun phrase trees. (Actually, the network could handle arbitrarily complex V's as well.) Aux is a marker for passive (e.g. is in is feared). \n\nThe network was presented with an input tree of either type, represented as an activation vector using the fully recursive tensor product representation developed in the preceding section. The seven non-zero binary vectors of length three coded seven atoms; the role vectors were constructed by the technique described above. The desired output was the same tensorial representation of the tree representing V(A,P). The filler vectors for the verb and for the constituent words of the two noun phrases should be unbound from their roles in the input tree and then bound to the appropriate roles in the output tree. \n\nThis transformation was performed, for an active sentence, by the operation cons(cadr(s), cons(car(s), cddr(s))) on the input tree s, and for a passive sentence, by cons(cdadr(s), cons(cdddr(s), car(s))). These operations were implemented in the network as two weight matrices, W_a and W_p,⁵ connecting the input units to the output units as shown in Figure 1. In addition, the network had a circuit for determining whether the input sentence was active or passive. In this example, it simply computed, by a weight matrix, the caadr of the input tree (where a passive sentence should have an Aux), and if it was the marker Aux, gated (with sigma-pi connections) W_p, and otherwise gated W_a. \n\n[Figure 1: Recursive tensor product network processing a passive sentence. Input = cons(cons(A,B), cons(cons(Aux,V), cons(by,C))); Output = cons(V, cons(C, cons(A,B))).] \n\n⁴Note that in the case when {r_0, r_1} are orthonormal, and therefore u_0 = r_0, W_car = W_cons0^T; similarly, W_cdr = W_cons1^T. \n⁵The two weight matrices were constructed from the four basic matrices as W_a = W_cons0 W_car W_cdr + W_cons1 (W_cons0 W_car + W_cons1 W_cdr W_cdr) and W_p = W_cons0 W_cdr W_car W_cdr + W_cons1 (W_cons0 W_cdr W_cdr W_cdr + W_cons1 W_car).
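\n\nContinuing the sketch above (reusing its atom, cons, car and cdr), the two transformations and the gating circuit can be mimicked directly. The 3-bit atom codes below and the equality test standing in for the network's marker-matching circuit are our assumptions, not the authors' specification: \n
\ndef transform_active(s):   # cons(cadr(s), cons(car(s), cddr(s))) \n
    return cons(car(cdr(s)), cons(car(s), cdr(cdr(s)))) \n
\ndef transform_passive(s):  # cons(cdadr(s), cons(cdddr(s), car(s))) \n
    return cons(cdr(car(cdr(s))), cons(cdr(cdr(cdr(s))), car(s))) \n
\ndef process(s, aux_code): \n
    # unbind the caadr position, compare with the Aux marker, and gate the two \n
    # pathways multiplicatively, as the sigma-pi connections do in the network \n
    g = float(np.allclose(car(car(cdr(s)))[0], aux_code)) \n
    return [g * tp + (1.0 - g) * ta \n
            for tp, ta in zip(transform_passive(s), transform_active(s))] \n
\n# Figure 1's passive input ((A.B).((Aux.V).(by.C))), with assumed atom codes: \n
A, B, C = atom([1, 0, 0]), atom([0, 1, 0]), atom([0, 0, 1]) \n
Aux, V, by = atom([1, 1, 0]), atom([0, 1, 1]), atom([1, 0, 1]) \n
s_in = cons(cons(A, B), cons(cons(Aux, V), cons(by, C))) \n
out = process(s_in, np.array([1., 1., 0.])) \n
expected = cons(V, cons(C, cons(A, B)))   # the tree (V.(C.(A.B))) \n
assert all(np.allclose(x, y) for x, y in zip(out, expected))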
Given this setting, the network was able to process arbitrary input sentences of either type properly, up to a certain depth (4 in this example) limited by the size of the network, and generated correct case role assignments. Figure 1 shows the network processing a passive sentence ((A.B).((Aux.V).(by.C))), as in All connectionists are feared by Minsky, and generating (V.(C.(A.B))) as output. \n\n4 Discussion \n\nThe formalism developed here for the recursive representation of trees generates quite different representations depending on the choice of the two fundamental role vectors r_0 and r_1 and the vectors for representing the atoms. At one extreme is the trivial fully local representation in which one connectionist unit is dedicated to each possible atom in each possible position: this is the special case in which r_0 and r_1 are chosen to be the canonical basis vectors (1 0) and (0 1), and the vectors representing the n atoms are also chosen to be the canonical basis vectors of n-space. The example of the previous section illustrated the case of (a) linearly dependent vectors for atoms and (b) orthonormal vectors for the roles that were \"distributed\" in that both elements of both vectors were non-zero. Property (a) permits the representation of many more than n atoms with n-dimensional vectors, and could be used to enrich the usual notions of symbolic computation by letting \"similar atoms\" be represented by vectors that are closer to each other than are \"dissimilar atoms.\" Property (b) contributes no savings in units over the purely local case, amounting to a literal rotation in role space. But it does allow us to demonstrate that fully distributed representations are as capable as fully local ones at supporting massively parallel structure processing. This point has been denied (often rather loudly) by advocates of local representations and by such critics as (Fodor & Pylyshyn 1988) and (Fodor & McLaughlin 1990), who have claimed that only connectionist implementations that preserve the concatenative structure of language-like representations of symbolic structures could be capable of true structure-sensitive processing. \n\nThe case illustrated in our example is distributed in the sense that all units corresponding to depth d in the tree are involved in the representation of all the atoms at that depth. But different depths are kept separate in the formalism and in the network. We can go further by allowing the role vectors to be linearly dependent, sacrificing full accuracy and generality in structure processing for representation of greater depth in fewer units. This case is the subject of current research, but space limitations have prevented us from describing our preliminary results here.
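\n\nThe \"literal rotation in role space\" point above is easy to check: the distributed orthonormal roles assumed in our sketches are exactly a rotation of the canonical (fully local) ones, and unbinding is exact in either basis. A small verification (ours): \n
\nimport numpy as np \n
\ntheta = np.pi / 4 \n
rot = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]]) \n
e0, e1 = np.array([1., 0.]), np.array([0., 1.])   # canonical (local) roles \n
d0, d1 = rot @ e0, rot @ e1                       # the distributed r_0, r_1 used earlier \n
f = np.array([0., 1., 1.])                        # some atom's filler vector \n
for ra, rb in [(e0, e1), (d0, d1)]: \n
    s = np.outer(f, ra)                # bind f to the left-child role \n
    assert np.allclose(s @ ra, f)      # exact recovery in both bases \n
    assert np.allclose(s @ rb, 0)      # no crosstalk with the other role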
\n\nReturning to Harmonic Grammar, the next question is, having developed a fully recursive tensor product representation for lower-level representation of embedded structures such as those ubiquitous in syntax, what are the implications for well-formedness as measured by the harmony function? A first approximation to the natural language case is captured by context-free grammars, in which the well-formedness of a subtree is independent of its level of embedding. It turns out that such depth-independent well-formedness is captured by a simple equation governing the harmony function (or weight matrix). At the higher level where grammatical \"rules\" of Harmonic Grammar reside, this has the consequence that the numerical constant appearing in each soft constraint that constitutes a \"rule\" applies at all levels of embedding. This greatly constrains the parameters in the grammar. \n\nReferences \n\n[1] J. A. Fodor and B. P. McLaughlin. Connectionism and the problem of systematicity: Why Smolensky's solution doesn't work. Cognition, 35:183-204, 1990. \n\n[2] J. A. Fodor and Z. W. Pylyshyn. Connectionism and cognitive architecture: A critical analysis. Cognition, 28:3-71, 1988. \n\n[3] G. Legendre, Y. Miyata, and P. Smolensky. Harmonic grammar - a formal multi-level connectionist theory of linguistic well-formedness: Theoretical foundations. In Proceedings of the Twelfth Meeting of the Cognitive Science Society, 1990a. \n\n[4] G. Legendre, Y. Miyata, and P. Smolensky. Harmonic grammar - a formal multi-level connectionist theory of linguistic well-formedness: An application. In Proceedings of the Twelfth Meeting of the Cognitive Science Society, 1990b. \n\n[5] P. Smolensky. Tensor product variable binding and the representation of symbolic structures in connectionist networks. Artificial Intelligence, 46:159-216, 1990.", "award": [], "sourceid": 406, "authors": [{"given_name": "Geraldine", "family_name": "Legendre", "institution": null}, {"given_name": "Yoshiro", "family_name": "Miyata", "institution": null}, {"given_name": "Paul", "family_name": "Smolensky", "institution": null}]}