{"title": "Propagation Filters in PDS Networks for Sequencing and Ambiguity Resolution", "book": "Advances in Neural Information Processing Systems", "page_first": 233, "page_last": 240, "abstract": null, "full_text": "Propagation Filters in PDS Networks for Sequencing and Ambiguity Resolution\n\nRonald A. Sumida\nMichael G. Dyer\n\nArtificial Intelligence Laboratory\nComputer Science Department\nUniversity of California\nLos Angeles, CA, 90024\n\nsumida@cs.ucla.edu\n\nAbstract\n\nWe present a Parallel Distributed Semantic (PDS) Network architecture that addresses the problems of sequencing and ambiguity resolution in natural language understanding. A PDS Network stores phrases and their meanings using multiple PDP networks, structured in the form of a semantic net. A mechanism called Propagation Filters is employed: (1) to control communication between networks, (2) to properly sequence the components of a phrase, and (3) to resolve ambiguities. Simulation results indicate that PDS Networks and Propagation Filters can successfully represent high-level knowledge, can be trained relatively quickly, and provide for parallel inferencing at the knowledge level.\n\n1 INTRODUCTION\n\nBackpropagation has shown considerable potential for addressing problems in natural language processing (NLP). However, the traditional PDP [Rumelhart and McClelland, 1986] approach of using one (or a small number) of backprop networks for NLP has been plagued by a number of problems: (1) it has been largely unsuccessful at representing high-level knowledge, (2) the networks are slow to train, and (3) they are sequential at the knowledge level. A solution to these problems is to represent high-level knowledge structures over a large number of smaller PDP networks. 
Reducing the size of each network allows for much faster training, and since the different networks can operate in parallel, more than one knowledge structure can be stored or accessed at a time.\n\nIn using multiple networks, however, a number of important issues must be addressed: how the individual networks communicate with one another, how patterns are routed from one network to another, and how sequencing is accomplished as patterns are propagated. In previous papers [Sumida and Dyer, 1989] [Sumida, 1991], we have demonstrated how to represent high-level semantic knowledge and generate dynamic inferences using Parallel Distributed Semantic (PDS) Networks, which structure multiple PDP networks in the form of a semantic network. This paper discusses how Propagation Filters address communication and sequencing issues in using multiple PDP networks for NLP.\n\n2 PROPAGATION FILTERS\n\nPropagation Filters are inspired by the idea of skeleton filters, proposed by [Sejnowski, 1981, Hinton, 1981]. They are composed of: (1) sets of filter ensembles that gate the connection from a source to a destination and (2) a selector ensemble that decides which filter group to enable. Each filter group is sensitive to a particular pattern over the selector. When that pattern occurs, the source pattern is propagated to its destination. Figure 1 is an example of a propagation filter where the \"01\" pattern over units 2 and 3 of the selector opens up filter group1, thus permitting the pattern to be copied from source1 to destination1. The units of filter group2 do not respond to the \"01\" pattern and remain well below threshold, so the activation pattern over the source2 ensemble is not propagated.\n\n
Figure 1: A Propagation Filter architecture. The small circles indicate PDP units within an ensemble (oval), the black arrows represent full connectivity between two ensembles, and the dotted lines connecting units 2 and 3 of the selector to each filter group oval indicate total connectivity from selector units to filter units. The jagged lines are suggestive of temporary patterns of activation over an ensemble.\n\nThe units in a filter group receive input from units in the selector. The weights on these input connections are set so that when a specific pattern occurs over the selector, every unit in the filter group is driven above threshold. The filter units also receive input from the source units and provide output to the destination units. The weights on both these i/o connections can be set so that the filter merely copies the pattern from the source to the destination when its units exceed threshold (as in Figure 1). Alternatively, these weights can be set (e.g. using backpropagation) so that the filter transforms the source pattern to a desired destination pattern.\n\n3 PDS NETWORKS\n\nPDS Networks store syntactic and semantic information over multiple PDP networks, with each network representing a class of concepts and with related networks connected in the general manner of a semantic net. For example, Figure 2 shows a network for encoding a basic sentence consisting of a subject, verb and direct object. 
The network is connected to other PDP networks, such as HUMAN, VERB and ANIMAL, that store information about the content of the subject role (s-content), the filler for the verb role, and the content of the direct-object role (do-content). Each network functions as a type of encoder net, where: (1) the input and output layers have the same number of units and are presented with exactly the same pattern, (2) the weights of the network are modified so that the input pattern will recreate itself as output, and (3) the resulting hidden unit pattern represents a reduced description of the input. In the networks that we use, a single set of units is used for both the input and output layers. The net can thus be viewed as an encoder with the output layer folded back onto the input layer and with two sets of connections: one from the single input/output layer to the hidden layer, and one from the hidden layer back to the i/o layer. In Figure 2, for example, the subject-content, verb, and direct-object-content role-groups collectively represent the input/output layer, and the BASIC-S ensemble represents the hidden layer.\n\nFigure 2: The network that stores information about a basic sentence. The black arrows represent links from the input layer to the hidden layer and the grey arrows indicate links from the hidden layer to the output layer. The thick lines represent links between networks that propagate a pattern without changing it.\n\nA network stores information by learning to encode the items in its training set. 
For each item, the patterns that represent its features are presented to the input role groups, and the weights are modified so that the patterns recreate themselves as output. For example, in Figure 2, the MAN-\"hit\"-DOG pattern is presented to the BASIC-S network by propagating the MAN pattern from the HUMAN network to the s-content role, the \"hit\" pattern from the VERB network to the verb-content role, and the DOG pattern from the ANIMAL network to the do-content role. The BASIC-S network is then trained on this pattern by modifying the weights between the input/output role groups and the BASIC-S hidden units so that the MAN-\"hit\"-DOG pattern recreates itself as output. The network automatically generalizes by having the hidden units become sensitive to common features of the training patterns. When the network is tested on a new concept (i.e., one that is not in the training set), the pattern over the hidden units reflects its similarity to the items seen during training.\n\n3.1 SEQUENCING PHRASES\n\nTo illustrate how Propagation Filters sequence the components of a phrase, consider the following sentence, whose constituents occur in the standard subject-verb-object order: S1. The man hit the dog. We would like to recognize that the BASIC-S network of Figure 2 is applicable to the input by binding the roles of the network to the correct components. In order to generate the proper role bindings, the system must: (1) recognize the components of the sentence in the correct order (e.g. \"the man\" should be recognized as the subject, \"hit\" as the verb, and \"the dog\" as the direct object), and (2) associate each phrase of the input with its meaning (e.g. 
reading the phrase \"the man\" should cause the pattern for the concept MAN to appear over the HUMAN units). Figure 3 illustrates how Propagation Filters properly sequence the components of the sentence.\n\nFirst, the phrase \"the man\" is read by placing the pattern for \"the\" over the determiner network (Step 1) and the pattern for \"man\" over the noun network (Step 2). The \"the\" pattern is then propagated to the np-determiner input role units of the NP network (Step 3) and the \"man\" pattern to the np-noun role input units (Step 4). The pattern that results over the hidden NP units is then used to represent the entire phrase \"the man\" (Step 5). The filters connecting the NP units with the subject and direct object roles are not enabled, so the pattern is not yet bound to any role. Next, the word \"hit\" is read and a pattern for it is generated over the VERB units (Step 6). The BASIC-S network is now applicable to the input (for simplicity of exposition, we ignore passive constructions here). Since there are no restrictions (i.e., no filter) on the connection between the VERB units and the verb role of BASIC-S, the \"hit\" pattern is bound to the verb role (Step 7). The verb role units act as the selector of the Propagation Filter that connects the NP units to the subject units. The filter is constructed so that whenever any of the verb role units receive non-zero input (i.e., whenever the role is bound) it opens up the filter group connecting NP with the subject role (Step 8). Thus, the pattern for \"the man\" is copied from NP to the subject (Step 9) and deleted from the NP units. Similarly, the subject units act as the selector of a filter that connects NP with the direct object. 
Since the subject was just bound, the connection from the NP to direct object is enabled (Step 10). At this point, the system has generated the expectation that an NP will occur next. The phrase \"the dog\" is now read and its pattern is generated over the NP units (Steps 11-15). Finally, the pattern for \"the dog\" is copied across the open connection from NP to direct-object (Step 16).\n\nFigure 3: The figure shows how Propagation Filters sequence the components of the sentence \"The man hit the dog\". The numbers indicate the order of events. The dotted arrows indicate Propagation Filter connections from a selector to an open filter group (indicated by a black circle) and the dark arrows represent the connections from a source to a destination.\n\n3.2 ASSOCIATING PHRASES WITH MEANINGS\n\nThe next task is to associate lexical patterns with their corresponding semantic patterns and bind semantic patterns to the appropriate roles in the BASIC-S network. Figure 4 indicates how Propagation Filters: (1) transform the phrase \"the man\" into its meaning (i.e., MAN), and (2) bind MAN to the s-content role of BASIC-S.\n\nReading the word \"man\", by placing the \"man\" pattern into the noun units (Step 2), opens the filter connecting N to HUMAN (Step 5), while leaving the filters connecting N to other networks (e.g. ANIMAL) closed. 
The opened filter transforms the lexical pattern \"man\" over N into the semantic pattern MAN over HUMAN (Step 7). Binding \"the man\" to subject (Step 8) by the procedure shown in Figure 3 opens the filter connecting HUMAN to the s-content role of BASIC-S (Step 9). The s-content role is then bound to MAN (Step 10).\n\nThe do-content role is bound by a procedure similar to that shown in Figure 4. When \"dog\" is read, the filter connecting N with ANIMAL is opened while filters to other networks (e.g. HUMAN) remain closed. The \"dog\" pattern is then transformed into the semantic pattern DOG over the ANIMAL units. When \"the dog\" is bound to direct-object as in Figure 3, the filter from ANIMAL to do-content is opened, and DOG is propagated from ANIMAL to the do-content role of BASIC-S.\n\nFigure 4: The figure illustrates how the concept MAN is bound to the s-content role of BASIC-S, given the phrase \"the man\" as input. Black (white) circles indicate open (closed) filters.\n\n3.3 AMBIGUITY RESOLUTION AND INFERENCING\n\nThere are two forms that inference and ambiguity resolution can take: (1) routing patterns (e.g. propagation of role bindings) to the appropriate subnets and (2) pattern reconstruction from items seen during training.\n\n(1) Pattern Routing: Propagation Filters help resolve ambiguities by having the selector only open connections to the network containing the correct interpretation. As an example, consider the following sentence: S2. The singer hit the note. Both S2 and S1 (Sec. 3.1) have the same syntactic structure and are therefore represented over the BASIC-S ensemble of Figure 2. 
However, the meaning of the word \"hit\" in S1 refers to physically striking an object while in S2 it refers to singing a musical note. The pattern over the BASIC-S units that represents S1 differs significantly from the pattern that represents S2, due to the differences in the s-content and do-content roles. A Propagation Filter with the BASIC-S units as its selector uses the differences in the two patterns to determine whether to open connections to the HIT network or to the PERFORM-MUSIC network (Figure 5).\n\nFigure 5: The pattern over BASIC-S acts as a selector that determines whether to open the connections to HIT or to PERFORM-MUSIC. Since the input here is MAN-\"hit\"-DOG, the filters to HIT are opened while the filters to PERFORM-MUSIC remain closed. The black and grey arrows indicating connections between the input/output and hidden layers have been replaced by a single thin line.\n\nDuring training, the BASIC-S network was presented with sentences of the general form <MUSIC-PERFORMER \"hit\" MUSICAL-NOTE> and <ANIMATE \"hit\" OBJECT>. The BASIC-S hidden units generalize from the training sentences by developing a distinct pattern for each of the two types of \"hit\" sentences. The Propagation Filter is then constructed so that the hidden unit pattern for <MUSIC-PERFORMER \"hit\" MUSICAL-NOTE> opens up connections to PERFORM-MUSIC, while the pattern for <ANIMATE \"hit\" OBJECT> opens up connections to HIT. Thus, when S1 is presented, the BASIC-S hidden units develop the pattern classifying it as <ANIMATE \"hit\" OBJECT>, which enables connections to HIT. 
For example, Figure 5 shows how the MAN pattern is routed from the s-content role of BASIC-S to the actor role of HIT and the DOG pattern is routed from the do-content role of BASIC-S to the object role of HIT. If S2 is presented instead, the hidden units will classify it as <MUSIC-PERFORMER \"hit\" MUSICAL-NOTE> and open the connections to PERFORM-MUSIC.\n\nThe technique of using propagation filters to control pattern routing can also be applied to generate inferences. Consider the sentence, \"Douglas hit Tyson\". Since both are boxers, it is plausible they are involved in a competitive activity. In S1, however, punishing the dog is a more plausible motivation for HIT. The proper inference is generated in each case by training the HIT network (Figure 5) on a number of instances of boxers hitting one another and of people hitting dogs. The network learns two distinct sets of hidden unit patterns: <BOXER-HIT-BOXER> and <HUMAN-HIT-DOG>. A Propagation Filter (like that shown in Figure 5), with the HIT units as its selector, uses the differences in the two classes of patterns to route to either the network that stores competitive activities or to the network that stores punishment acts.\n\n(2) Pattern Reconstruction: The system also resolves ambiguities by reconstructing patterns that were seen during training. For example, the word \"note\" in sentence S2 is ambiguous and could refer to a message, as in \"The singer left the note\". Thus, when the word \"note\" is read in S2, the do-content role of BASIC-S can be bound to MESSAGE or to MUSICAL-NOTE. 
To resolve the ambiguity, the BASIC-S network uses the information that SINGER is bound to the s-content role and \"hit\" to the verb role to: (1) reconstruct the <MUSIC-PERFORMER \"hit\" MUSICAL-NOTE> pattern that it learned during training and (2) predict that the do-content will be MUSICAL-NOTE. Since the prediction is consistent with one of the possible meanings for the do-content role, the ambiguity is resolved. Similarly, if the input had been \"The singer left the note\", BASIC-S would use the binding of a human to the s-content role and the binding of \"left\" to the verb role to reconstruct the pattern <HUMAN \"left\" MESSAGE> and thus resolve the ambiguity.\n\n4 CURRENT STATUS AND CONCLUSIONS\n\nPDS Networks and Propagation Filters are implemented in OCAIN, a natural language understanding system that: (1) takes each word of the input sequentially, (2) binds the roles of the corresponding syntactic and semantic structures in the proper order, and (3) resolves ambiguities. In our simulations with OCAIN, we successfully represented high-level knowledge by structuring individual PDP networks in the form of a semantic net. Because the system's knowledge is spread over multiple subnetworks, each one is relatively small and can therefore be trained quickly. Since the subnetworks can operate in parallel, OCAIN is able to store and retrieve more than one knowledge structure simultaneously, thus achieving knowledge-level parallelism. Because PDP ensembles (versus single localist units) are used, the generalization, noise and fault-tolerance properties of the PDP approach are retained. At the same time, Propagation Filters provide control over the way patterns are routed (and transformed) between subnetworks. 
The PDS architecture, with its Propagation Filters, thus provides significant advantages over traditional PDP models for natural language understanding.\n\nReferences\n\n[Hinton, 1981] G. E. Hinton. Implementing Semantic Networks in Parallel Hardware. In Parallel Models of Associative Memory, Lawrence Erlbaum, Hillsdale, NJ, 1981.\n\n[Rumelhart and McClelland, 1986] D. E. Rumelhart and J. L. McClelland. Parallel Distributed Processing, Volume 1. MIT Press, Cambridge, Massachusetts, 1986.\n\n[Sejnowski, 1981] T. J. Sejnowski. Skeleton Filters in the Brain. In Parallel Models of Associative Memory, Lawrence Erlbaum, Hillsdale, NJ, 1981.\n\n[Sumida and Dyer, 1989] R. A. Sumida and M. G. Dyer. Storing and Generalizing Multiple Instances while Maintaining Knowledge-Level Parallelism. In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, Detroit, MI, 1989.\n\n[Sumida, 1991] R. A. Sumida. Dynamic Inferencing in Parallel Distributed Semantic Networks. In Proceedings of the Thirteenth Annual Conference of the Cognitive Science Society, Chicago, IL, 1991.\n", "award": [], "sourceid": 553, "authors": [{"given_name": "Ronald", "family_name": "Sumida", "institution": null}, {"given_name": "Michael", "family_name": "Dyer", "institution": null}]}