{"title": "Reinforcement Learning Predicts the Site of Plasticity for Auditory Remapping in the Barn Owl", "book": "Advances in Neural Information Processing Systems", "page_first": 125, "page_last": 132, "abstract": null, "full_text": "Reinforcement Learning Predicts the Site of Plasticity for Auditory Remapping in the Barn Owl\n\nAlexandre Pouget\nalex@salk.edu\n\nCedric Deffayet\ncedric@salk.edu\n\nTerrence J. Sejnowski\nterry@salk.edu\n\nHoward Hughes Medical Institute\nThe Salk Institute\nLa Jolla, CA 92037\n\nDepartment of Biology\nUniversity of California, San Diego\n\nand\n\nÉcole Normale Supérieure\n45 rue d'Ulm\n75005 Paris, France\n\nAbstract\n\nThe auditory system of the barn owl contains several spatial maps. In young barn owls raised with optical prisms over their eyes, these auditory maps are shifted to stay in register with the visual map, suggesting that the visual input imposes a frame of reference on the auditory maps. However, the optic tectum, the first site of convergence of visual with auditory information, is not the site of plasticity for the shift of the auditory maps; the plasticity occurs instead in the inferior colliculus, which contains an auditory map and projects into the optic tectum. We explored a model of owl remapping in which the delivery of a global reinforcement signal is controlled by visual foveation. A Hebbian learning rule gated by reinforcement learned to adjust the auditory maps appropriately. In addition, reinforcement learning preferentially adjusted the weights in the inferior colliculus, as in the owl brain, even though the weights were allowed to change throughout the auditory system. 
This observation raises the possibility that the site of learning does not have to be genetically specified, but could be determined by how the learning procedure interacts with the network architecture.\n\n[Figure 1 diagram; recoverable labels: Cochlea, Inferior Colliculus Central Nucleus (ICc), Inferior Colliculus External Nucleus (ICx), Optic Tectum, Visual System, Ovoidalis Nucleus (thalamic relay), Forebrain Field L]\n\nFigure 1: Schematic view of the auditory pathways in the barn owl.\n\n1 Introduction\n\nThe barn owl relies primarily on sounds to localize prey [6], with an accuracy vastly superior to that of humans. Figure 1 illustrates some of the nuclei involved in processing auditory signals. The barn owl determines the location of sound sources by comparing the time and amplitude differences of the sound wave between the two ears. These two cues are combined for the first time in the shell and core of the inferior colliculus (ICc), shown at the bottom of the diagram. Cells in the ICc are frequency tuned and subject to spatial aliasing, which prevents them from unambiguously encoding the position of objects. The first unambiguous auditory map is found at the next stage, in the external capsule of the inferior colliculus (ICx), which itself projects to the optic tectum (OT). The OT is the first subforebrain structure that contains a multimodal spatial map, in which cells typically have spatially congruent visual and auditory receptive fields.\n\nIn addition, these subforebrain auditory pathways send one major collateral toward the forebrain via a thalamic relay. 
These collaterals originate in the ICc and are thought to convey the spatial location of objects to the forebrain [3]. Within the forebrain, two major structures have been implicated in auditory processing: the archistriatum and field L. The archistriatum sends a projection to both the inferior colliculus and the optic tectum.\n\nKnudsen and Knudsen (1985) [4] have shown that these auditory maps can adapt to systematic changes in the sensory input. Furthermore, the adaptation appears to be under the control of visual input, which imposes a frame of reference on the incoming auditory signals. In owls raised with optical prisms, which introduce a systematic shift in part of the visual field, the visual map in the optic tectum was identical to that found in control animals, but the auditory map in the ICx was shifted by the amount of visual shift introduced by the prisms. This plasticity ensures that the visual and auditory maps stay in spatial register during growth and other perturbations that cause sensory mismatch.\n\nSince vision instructs audition, one might expect the auditory map to shift in the optic tectum, the first site of visual-auditory convergence. Surprisingly, Brainard and Knudsen (1993b) observed that the synaptic changes took place between the ICc and the ICx, one synapse before the site of convergence.\n\nThese observations raise two important questions. First, how does the animal know how to adapt the weights in the ICx in the absence of a visual teaching signal? Second, why does the change take place at this particular location and not in the OT, where a teaching signal would be readily available?\n\n
\n\nIn a previous model [7],  this shift was simulated using backpropagation to broadcast \nthe  error  back  through  the  layers  and  by  constraining  the  weights  changes  to the \nprojection  from  the  ICc  to  ICx.  There  is,  however,  no  evidence  for  a  feedback \nprojection  between  from  the  aT to  the  ICx  that  could  transmit  the  error  signal; \nnor is  there  evidence  to exclude  plasticity at other synapses  in  these pathways. \n\nIn this paper, we suggest an alternative approach in which vision guides the remap(cid:173)\nping of auditory  maps by  controlling the  delivery  of a  scalar reinforcement signal. \nThis learning proceeds  by generating random actions and increasing the probability \nof actions  that  are  consistently  reinforced  [1,  5] .  In  addition,  we  show  that  rein(cid:173)\nforcement  learning  correctly  predicts  the  site  of learning in  the  barn  owl,  namely \nat  the  ICx-ICc synapse,  whereas  backpropagation  [8]  does  not  favor  this  location \nwhen  plasticity is  allowed  at every  synapse.  This raises  a  general  issue:  the site of \nsynaptic adjustment might be  imposed by  the combination of the  architecture  and \nlearning rule,  without having to restrict  plasticity to a  particular synapse. \n\n2  Methods \n\n2.1  Network Architecture \n\nThe  network  architecture  of  the  model  based  on  the  barn  owl  auditory  system, \nshown  in figure  2A,  contains  two  parallel  pathways.  The input layer  was  an  8x21 \nmap corresponding to the ICc in which units responded  to frequency  and interaural \nphase  differences.  These  responses  were  pooled  together  to  create  auditory spatial \nmaps at subsequent  stages in  both  pathways.  The rest  of the network  contained  a \nseries  of similar auditory  maps,  which were  connected  topographically by  receptive \nfields  13  units wide.  
We did not distinguish between field L and the archistriatum in the forebrain pathway and simply used two auditory maps, both called FBr.\n\nWe used multiplicative (sigma-pi) units in the OT whose activities were determined according to:\n\n$$y_i = \\sum_j w_{ij}^{FBr}\\, y_j^{FBr}\\, w_{ij}^{ICx}\\, y_j^{ICx} \\qquad (1)$$\n\nThe multiplicative interaction between ICx and FBr activities was an important assumption of our model. It forced the ICx and FBr to agree on a particular position before the OT was activated. As a result, if the ICx-OT synapses were modified during learning, the ICx-FBr synapses had to be changed accordingly.\n\nFigure 2: Schematic diagram of weights (white blocks) in the barn owl auditory system. A) Diagram of the initial weights in the network. B) Pattern of weights after training with reinforcement learning on a prism-induced shift of four units. The remapping took place within the ICx and FBr. C) Pattern of weights after training with backpropagation. This time the ICx-OT and FBr-OT weights changed.\n\nWeights were clipped to the range [0.01, 5.0], except for the FBr-ICx connections, whose values were allowed to vary between 0.01 and 8.0. The minimum values were set to 0.01 instead of zero to prevent the network from getting trapped in unstable local minima, which are often associated with weight values of zero. The strong coupling between FBr and ICx was another important assumption of the model, whose consequences are discussed in the last section.\n\nExamples were generated by activating one unit in the ICc while keeping the others at zero, thereby simulating the pattern of activity that would be triggered by a single localized auditory stimulus. In all simulations, we modeled a prism-induced shift of four units.\n\n
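A minimal Python/NumPy sketch of the sigma-pi OT unit of Eq. (1) may help show why the multiplicative interaction ties the ICx and FBr pathways together. It assumes one-dimensional maps with one-to-one topographic weights instead of the 13-unit receptive fields, reuses the 0.01 to 5.0 clipping range as the weak and strong weight values, and the helper names topographic_weights, one_hot and ot_activity are hypothetical. Because Eq. (1) multiplies FBr and ICx activity at the same position j, an OT unit responds strongly only when both inputs agree and both weight sets carry the same shift.

```python
import numpy as np

W_MIN, W_MAX = 0.01, 5.0  # weak and strong weight values (clipping range of Section 2.1)


def topographic_weights(n, shift=0):
    # Hypothetical helper: OT unit j + shift gets a strong weight from input j,
    # all other weights sit at W_MIN (one-to-one instead of 13-unit receptive fields).
    w = np.full((n, n), W_MIN)
    for j in range(n - shift):
        w[j + shift, j] = W_MAX
    return w


def one_hot(position, n):
    # A single localized stimulus: one active unit, all others at zero.
    y = np.zeros(n)
    y[position] = 1.0
    return y


def ot_activity(w_fbr, y_fbr, w_icx, y_icx):
    # Eq. (1): y_i = sum_j (w_ij^FBr y_j^FBr)(w_ij^ICx y_j^ICx).
    # Terms are paired position by position (same j), so OT unit i is driven
    # only where FBr and ICx are active at the same input position.
    return (w_fbr * w_icx) @ (y_fbr * y_icx)


n = 21
w0 = topographic_weights(n)     # unshifted ICx-OT and FBr-OT weights
w4 = topographic_weights(n, 4)  # weights remapped by the four-unit prism shift
y = one_hot(5, n)
print(np.argmax(ot_activity(w0, y, w0, y)))         # both pathways agree at position 5
print(ot_activity(w0, one_hot(9, n), w0, y).max())  # FBr and ICx disagree: no OT drive
print(ot_activity(w4, y, w0, y).max())              # only FBr-OT shifted: weak, ambiguous
print(np.argmax(ot_activity(w4, y, w4, y)))         # both shifted: the peak moves by 4
```

In this toy setting, shifting only one of the two converging weight sets leaves the OT with a weak, two-peaked response, which illustrates why remapping at the OT requires both sets of weights to change.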
\n\n2.2  Reinforcement learning \n\nWe  used stochastic units and trained  the network  using  reinforcement learning [1]. \nThe weighted sum of the inputs, neti, passed through a sigmoid, f(x) , is interpreted \nas  the probability, Pi,  that the unit will be  active: \n\nPi  =  f(neti) * 0.99 + 0.01 \n\nwere  the output of the  unit Yi  was: \n\n. _ {a  with probability 1 - Pi \n\n1  with  probability Pi \n\ny,  -\n\n(2) \n\n(3) \n\n\fReinforcement Learning  Predicts  the  Site of Plasticity for Auditory Remapping \n\n129 \n\nBecause  of the  form  of the  equation  for  Pi,  all  units  in  the  network  had  a  small \nprobability (0.01) of being spontaneously  active in the absence  of any inputs.  This \nis  what  allowed  the  network  to  perform a  stochastic search  in  action space  to find \nwhich  actions were  consistently  associated  with positive reinforcement. \n\nWe  ensured  that  at  most  one  unit  was  active  per  trial by  using  a  winner-take-all \ncompetition in  each  layer. \nAdjustable weights  in  the network  were  updated  after each  training examples with \nhebb-like  rule gated by reinforcement: \n\n(4) \n\nA trial consisted  in choosing a  random target location for  auditory input (ICc)  and \nthe output of the OT was  used  to generate  a  head  movement.  The reinforcement, \nr , was  then set  to  1 for  head  movements resulting in the foveation  of the stimulus \nand to -0.05 otherwise. \n\n2.3  Backpropagation \n\nFor  the backpropagation network , we  used  deterministic units with sigmoid activa(cid:173)\ntion functions  in which the output of a  unit was given  by: \n\nwhere  neti  is  the weighted sum of the inputs  as  before. 
\n\nThe  chain  rule  was  used  to  compute  the  partial  derivatives  of the  squared  error, \nE , with  respect  to  each  weights  and  the  weights  were  updated  after  each  training \nexample according  to: \n\n(5) \n\n(6) \n\nThe target vectors  were similar to the input vectors,  namely only one OT units was \nrequired  to  be  activated for  a  given  pattern, but at  a position displaced  by  4  units \ncompared to the input. \n\n3  Results \n\n3.1  Learning site with reinforcement \n\nIn a first set of simulation we kept the ICc-ICx and ICc-FBr weights fixed.  Plasticity \nwas  allowed at these  site in later simulations. \n\nFigure 2A shows the initial set of weights  before learning starts.  The central diago(cid:173)\nnal lines  in  the  weight  diagrams illustrate the fact  that each  unit receives  only one \nnon-zero  weight  from the  unit in the layer below  at the same location. \n\n\f130 \n\nAlexandre Pouget,  Cedric Deffayet, Terrence J.  Sejnowski \n\nThere  are  two  solutions  to  the  remapping:  either  the  weights  change  within  the \nICx  and  FBr,  or  from  the  ICx  and  the  FBr  to  the  ~T. As  shown  in  figure  2B , \nreinforcement  learning  converged  to  the  first  solution. \nIn  contrast,  the  weights \nbetween  the other layers were  unaltered, even  though they  were  allowed to change. \n\nTo  prove  that  the  network  could  have  actually  learned  the  second  solution,  we \ntrained  a  network  in  which  the  ICc-ICx  weights  were  kept  fixed .  As  we  expected, \nthe network shifted its maps simultaneously in both sets of weights converging onto \nthe OT, and the resulting weights were  similar to the ones  illustrated in figure  2C. \nHowever, to reach this solution, three times as many training examples were needed. \n\nThe  reason  why  learning  in  the  ICx  and  FBr  were  favored  can  be  attributed  to \nprobabilistic  nature  of reinforcement  learning.  
If the probability of finding one solution is $p$, the probability of finding it twice independently is $p^2$. Learning in the ICx and FBr is not independent because of the strong connection from the FBr to the ICx. When the remapping is learned in the FBr, this connection automatically remaps the activities in the ICx, which in turn allows the ICx-ICx weights to remap appropriately. In the OT, on the other hand, the multiplicative interaction between the ICx and FBr weights prevents cooperation between these two sets of weights. Consequently, they have to change independently, a process that took much more training.\n\n3.2 Learning at the ICc-ICx and ICc-FBr synapses\n\nThe aliasing and sharp frequency tuning in the responses of ICc neurons greatly slow down learning at the ICc-ICx and ICc-FBr synapses. We found that when these synapses were free to change, the remapping still took place within the ICx or FBr (Figure 3).\n\n3.3 Learning site with backpropagation\n\nIn contrast to reinforcement learning, backpropagation adjusted the weights in two locations: between the ICx and the OT, and between the FBr and the OT (Figure 2C). This is a consequence of the tendency of the backpropagation algorithm to change first the weights closest to where the error is injected.\n\n3.4 Temporal evolution of weights\n\nWhether we used reinforcement or supervised learning, the map shifted in a very similar way. The original set of weights decreased while the new weights increased, such that both sets of weights coexisted halfway through learning. This indicates that the map shifted directly from the original setting to the new configuration without going through intermediate shifts.\n\n
\n\nThis  temporal  evolution of the  weights  is  consistent  the findings  of Brainard  and \nKnudsen  (1993a)  who found  that during the intermediate phase of the remapping, \ncells  in the inferior  colli cuI us  typically  have two  receptive  fields.  More  recent  work \nhowever  indicates  that for  some  cells  the  remapping is  more  continuous(Brainard \nand  Knudsen ,  personal  communication) ,  a  behavior  that  was  not  reproduced  by \neither of the learning rule. \n\n\fReinforcement Learning  Predicts  the  Site  of Plasticity for Auditory Remapping \n\n131 \n\nFigure  3:  Even  when  the ICc-ICx  weights  are free  to change,  the  network  update \nthe weights in the ICx first.  A separate weight matrix is shown for each isofrequency \nmap from  the ICc  to ICx.  The final  weight  matrices were  predominantly diagonal; \nin  contrast,  the weight  matrix in ICx was  shifted. \n\n4  Discussion \n\nOur  simulations  suggest  a  biologically  plausible  mechanism  by  which  vision  can \nguide  the  remapping of auditory spatial  maps in the owl's  brain.  Unlike  previous \napproaches,  which  relied  on  visual  signals  as  an  explicit  teacher  in  the  optic  tec(cid:173)\ntum [7],  our model uses  a global reinforcement signal whose delivery is controlled by \nthe foveal  representation  of the  visual  system.  Other global  reinforcement  signals \nwould  work  as  well.  For example,  a  part of the forebrain  might compare  auditory \nand visual patterns and report spatial mismatch between the two.  This signal could \nbe  easily  incorporated  in our  network  and  would  also  remap the  auditory  map in \nthe inferior colli cuI us. \n\nOur  model  demonstrates  that  the  site  of synaptic  plasticity  can  be  constrained \nby  the  interaction  between  reinforcement  learning  and  the  network  architecture. \nReinforcement learning converged  to the most probably solution through stochastic \nsearch.  
In the network, the strong lateral coupling between ICx and FBr and the multiplicative interaction in the OT favored a solution in which the remapping took place simultaneously in the ICx and FBr. A similar mechanism may be at work in the barn owl's brain. Collaterals from FBr to ICx are known to exist, but the multiplicative interaction has not been reported in the barn owl optic tectum.\n\nLearning mechanisms may also limit synaptic plasticity. NMDA receptors have been reported in the ICx, but they might not be expressed at other synapses. There may, however, be other mechanisms for plasticity.\n\nThe site of remapping in our model was somewhat different from the existing observations. We found that the change took place within the ICx, whereas Brainard and Knudsen [3] report that it occurs between the ICc and the ICx. A close examination of their data (Figure 11 in [3]) reveals that cells at the bottom of the ICx were not remapped, as predicted by our model; at the same time, however, there is little anatomical or physiological evidence for a functional and hierarchical organization within the ICx. Additional recordings are needed to resolve this issue. We conclude that for the barn owl's brain, as well as for our model, synaptic plasticity within the ICx was favored over changes between the ICc and the ICx. This supports the hypothesis that reinforcement learning is used for remapping in the barn owl auditory system.\n\nAcknowledgments\n\nWe thank Eric Knudsen and Michael Brainard for helpful discussions on plasticity in the barn owl auditory system and the results of unpublished experiments. Peter Dayan and P. 
Read Montague provided useful insights on the biological basis of reinforcement learning in the early stages of this project.\n\nReferences\n\n[1] A.G. Barto and M.I. Jordan. Gradient following without backpropagation in layered networks. Proc. IEEE Int. Conf. Neural Networks, 2:629-636, 1987.\n\n[2] M.S. Brainard and E.I. Knudsen. Dynamics of the visual calibration of the map of interaural time difference in the barn owl's optic tectum. In Society for Neuroscience Abstracts, volume 19, page 369.8, 1993.\n\n[3] M.S. Brainard and E.I. Knudsen. Experience-dependent plasticity in the inferior colliculus: a site for visual calibration of the neural representation of auditory space in the barn owl. The Journal of Neuroscience, 13:4589-4608, 1993.\n\n[4] E. Knudsen and P. Knudsen. Vision guides the adjustment of auditory localization in young barn owls. Science, 230:545-548, 1985.\n\n[5] P.R. Montague, P. Dayan, S.J. Nowlan, A. Pouget, and T.J. Sejnowski. Using aperiodic reinforcement for directed self-organization during development. In S.J. Hanson, J.D. Cowan, and C.L. Giles, editors, Advances in Neural Information Processing Systems, volume 5. Morgan Kaufmann, San Mateo, CA, 1993.\n\n[6] R.S. Payne. Acoustic location of prey by barn owls (Tyto alba). Journal of Experimental Biology, 54:535-573, 1970.\n\n[7] D.J. Rosen, D.E. Rumelhart, and E.I. Knudsen. A connectionist model of the owl's sound localization system. In Advances in Neural Information Processing Systems, volume 6. Morgan Kaufmann, San Mateo, CA, 1994.\n\n[8] D.E. Rumelhart, G.E. Hinton, and R.J. Williams. Learning internal representations by error propagation. In D.E. Rumelhart, J.L. 
McClelland, and the PDP Research Group, editors, Parallel Distributed Processing, volume 1, chapter 8, pages 318-362. MIT Press, Cambridge, MA, 1986.", "award": [], "sourceid": 928, "authors": [{"given_name": "Alexandre", "family_name": "Pouget", "institution": null}, {"given_name": "Cedric", "family_name": "Deffayet", "institution": null}, {"given_name": "Terrence", "family_name": "Sejnowski", "institution": null}]}