{"title": "Using Collective Intelligence to Route Internet Traffic", "book": "Advances in Neural Information Processing Systems", "page_first": 952, "page_last": 960, "abstract": null, "full_text": "USING COLLECTIVE INTELLIGENCE TO ROUTE INTERNET TRAFFIC\n\nDavid H. Wolpert\nNASA Ames Research Center\nMoffett Field, CA 94035\ndhw@ptolemy.arc.nasa.gov\n\nKagan Tumer\nNASA Ames Research Center\nMoffett Field, CA 94035\nkagan@ptolemy.arc.nasa.gov\n\nJeremy Frank\nNASA Ames Research Center\nMoffett Field, CA 94035\nfrank@ptolemy.arc.nasa.gov\n\nAbstract\n\nA COllective INtelligence (COIN) is a set of interacting reinforcement learning (RL) algorithms designed in an automated fashion so that their collective behavior optimizes a global utility function. We summarize the theory of COINs, then present experiments using that theory to design COINs to control internet traffic routing. These experiments indicate that COINs outperform all previously investigated RL-based, shortest path routing algorithms.\n\n1 INTRODUCTION\n\nCOllective INtelligences (COINs) are large, sparsely connected recurrent neural networks, whose \"neurons\" are reinforcement learning (RL) algorithms. The distinguishing feature of COINs is that their dynamics involve no centralized control, but only the collective effects of the individual neurons, each modifying its behavior via its own RL algorithm. This restriction holds even though the goal of the COIN concerns the system's global behavior. One naturally occurring COIN is a human economy, where the \"neurons\" consist of individual humans trying to maximize their reward, and the \"goal\", for example, can be viewed as having the overall system achieve high gross domestic product. 
This paper presents a preliminary investigation of designing and using artificial COINs as controllers of distributed systems. The domain we consider is routing of internet traffic.\n\nThe design of a COIN starts with a global utility function specifying the desired global behavior. Our task is to initialize and then update the neurons' \"local\" utility functions, without centralized control, so that as the neurons improve their utilities, global utility also improves. (We may also wish to update the local topology of the COIN.) In particular, we need to ensure that the neurons do not \"frustrate\" each other as they attempt to increase their utilities. The RL algorithms at each neuron that aim to optimize that neuron's local utility are microlearners. The learning algorithms that update the neurons' utility functions are macrolearners.\n\nFor robustness and breadth of applicability, we assume essentially no knowledge concerning the dynamics of the full system, i.e., the macrolearning and/or microlearning must \"learn\" that dynamics, implicitly or otherwise. This rules out any approach that models the full system. It also means that rather than using domain knowledge to hand-craft the local utilities, as is done in multi-agent systems, in COINs the local utility functions must be automatically initialized and updated using only the provided global utility and (locally) observed dynamics.\n\nThe problem of designing a COIN has never previously been addressed in full; hence the need for the new formalism described below. 
Nonetheless, this problem is related to previous work in many fields: distributed artificial intelligence, multi-agent systems, computational ecologies, adaptive control, game theory [6], computational markets [2], Markov decision theory, and ant-based optimization.\n\nFor the particular problem of routing, examples of relevant work include [4, 5, 8, 9, 10]. Most of that previous work uses microlearning to set the internal parameters of routers running conventional shortest path algorithms (SPAs). However the microlearning is done, these approaches do not address the problem of ensuring that the associated local utilities do not cause the microlearners to work at cross-purposes.\n\nThis paper concentrates on COIN-based setting of local utilities rather than on macrolearning. We used simulations to compare three algorithms. The first two are an SPA and a COIN. Both had \"full knowledge\" (FK) of the true reward-maximizing path, with reward being the routing time of the associated router's packets for the SPA, but set by COIN theory for the COIN. The third algorithm was a COIN using a memory-based (MB) microlearner [1] whose knowledge was limited to local observations.\n\nThe performance of the FK COIN was the theoretical optimum. The performance of the FK SPA was 12.5 \u00b1 3% worse than optimum. Despite limited knowledge, the MB COIN outperformed the FK SPA, achieving performance 36 \u00b1 8% closer to optimum (that is, closing that fraction of the FK SPA's gap to the optimum). Note that the performance of the FK SPA is an upper bound on the performance of any RL-based SPA, since such schemes use RL only to estimate the path information that the FK SPA is given exactly. Accordingly, the MB COIN is at least 36% closer to optimum than any RL-based SPA.\n\nSection 2 below presents a cursory overview of the mathematics behind COINs. 
Section 3 discusses how the network routing problem is mapped into the COIN formalism, and introduces our experiments. Section 4 presents the results of those experiments, which demonstrate the effectiveness of COINs on these routing problems. Finally, Section 5 presents conclusions and summarizes future research directions.\n\n2 MATHEMATICS OF COINS\n\nThe mathematical framework for COINs is quite extensive [11, 12]. This paper concentrates on four of the concepts from that framework: subworlds, factored systems, constraint-alignment, and the wonderful life utility function.\n\nWe consider the state of the system across a set of discrete time steps, t ∈ {0, 1, ...}. All characteristics of a neuron at time t, including its internal parameters at that time as well as its externally visible actions, are encapsulated in a real-valued vector ζ_{η,t}. We call this the \"state\" of neuron η at time t, and let ζ be the state of all neurons across all time. World utility, G(ζ), is a function of the state of all neurons across all time, and is potentially not expressible as a discounted sum.\n\nA subworld is a set of neurons. All neurons in the same subworld w share the same subworld utility function g_w(ζ). So when each subworld is a set of neurons that have the most effect on each other, neurons are unlikely to work at cross-purposes: all neurons that affect each other substantially share the same local utility.\n\nAssociated with subworlds is the concept of a (perfectly) constraint-aligned system. In such systems any change to the neurons in subworld w at time 0 will have no effect on the neurons outside of w at times later than 0. 
Intuitively, a system is constraint-aligned if the neurons in separate subworlds do not affect each other directly, so that the rationale behind the use of subworlds holds.\n\nA subworld-factored system is one where, for each subworld w considered by itself, a change at time 0 to the states of the neurons in that subworld results in an increased value for g_w(ζ) if and only if it results in an increased value for G(ζ). For a subworld-factored system, the side effects on the rest of the system of w increasing its own utility (which may decrease other subworlds' utilities) do not end up decreasing world utility. For these systems, the separate subworlds successfully pursuing their separate goals do not frustrate each other as far as world utility is concerned.\n\nThe desideratum of being subworld-factored is carefully crafted. In particular, it does not concern changes in the utility of subworlds other than the one changing its actions. Nor does it concern changes to the states of neurons in more than one subworld at once. Indeed, consider the following alternative desideratum: any change to the t = 0 state of the entire system that improves all subworld utilities simultaneously also improves world utility. Reasonable as it may appear, one can construct examples of systems that obey this desideratum and yet quickly evolve to a minimum of world utility [12].\n\nIt can be proven that for a subworld-factored system, when each of the neurons' reinforcement learning algorithms is performing as well as it can, given the others' behavior, world utility is at a critical point. Correct global behavior corresponds to learners reaching a (Nash) equilibrium [8, 13]. There can be no tragedy of the commons for a subworld-factored system [7, 11, 12].\n
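To make the subworld-factored definition concrete, the Python sketch below checks it by brute force on a hypothetical two-agent toy (not one of the paper's networks). Each agent is its own subworld and sends one packet either over a shared link or over a private link. The per-packet delay on the shared link is assumed to equal the number of agents using it, and the private link has an assumed delay of 2.5. World utility is taken to be minus the total delay. A selfish local utility (minus the agent's own delay) fails the check, because moving off the shared link can help the other agent more than it hurts the mover. The wonderful life utility defined in the next paragraph, G(ζ) - G(CL_w(ζ)), passes it. The clamped term does not depend on the changing subworld's own state, so any unilateral change shifts that subworld's utility by exactly the change in G. The link delays, the function names and the use of ABSENT as the clamped value are all illustrative assumptions.

```python
from itertools import product

# Hypothetical toy: two agents (each its own subworld) route one packet each.
SHARED, PRIVATE, ABSENT = 'shared', 'private', None
PRIVATE_DELAY = 2.5  # assumed per-packet delay on a private link


def delays(state):
    # Shared-link delay per packet equals the number of agents using it.
    n_shared = sum(1 for a in state.values() if a == SHARED)
    return {agent: (n_shared if a == SHARED else PRIVATE_DELAY)
            for agent, a in state.items() if a is not ABSENT}


def G(state):
    # World utility: minus the total delay (higher is better).
    return -sum(delays(state).values())


def selfish(agent, state):
    # Local utility that ignores the delay imposed on others.
    return -delays(state)[agent]


def wlu(agent, state):
    # Wonderful life utility: G minus G with this agent clamped to no traffic.
    clamped = dict(state)
    clamped[agent] = ABSENT
    return G(state) - G(clamped)


def is_factored(utility, agents, actions=(SHARED, PRIVATE)):
    # Every unilateral change must raise the local utility iff it raises G.
    for joint in product(actions, repeat=len(agents)):
        state = dict(zip(agents, joint))
        for agent in agents:
            for alt in actions:
                new = dict(state)
                new[agent] = alt
                dg = G(new) - G(state)
                du = utility(agent, new) - utility(agent, state)
                if (dg > 0) != (du > 0):
                    return False
    return True


agents = ['a', 'b']
print(is_factored(selfish, agents))  # False
print(is_factored(wlu, agents))      # True
```

With both agents on the shared link, wlu('a', ...) evaluates to -3. Agent a is charged for its own delay of 2 plus the extra unit of delay its presence imposes on b.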
\nLet CL_w(ζ) be defined as the vector ζ modified by clamping the states of all neurons in subworld w, across all time, to an arbitrary fixed value, here taken to be 0. The wonderful life subworld utility (WLU) is:\n\ng_w(ζ) = G(ζ) - G(CL_w(ζ)).   (1)\n\nWhen the system is constraint-aligned, so that, loosely speaking, subworld w's \"absence\" would not affect the rest of the system, we can view the WLU as analogous to the change in world utility that would have arisen if subworld w \"had never existed\". (Hence the name of this utility; cf. the Frank Capra movie.) Note, however, that CL is a purely mathematical operation. No assumption is made that CL_w(ζ) is consistent with the dynamics of the system: the sequence of states the neurons in w are clamped to in the definition of the WLU need not obey the system's dynamical laws.\n\nThis dynamics-independence is a crucial strength of the WLU. It means that to evaluate the WLU we do not try to infer how the system would have evolved if all neurons in w were set to 0 at time 0 and the system evolved from there. So long as we know ζ extending over all time, and so long as we know G, we know the value of the WLU. This is true even if we know nothing of the dynamics of the system.\n\nIn addition to assuring the correct equilibrium behavior, there exist many other theoretical advantages to having a system be subworld-factored. In particular, the experiments in this paper revolve around the following fact: a constraint-aligned system with wonderful life subworld utilities is subworld-factored. 
Combined with our earlier result that a subworld-factored system at Nash equilibrium is at a critical point of world utility, this fact leads us to expect that a constraint-aligned system using WL utilities in the microlearning will approach near-optimal values of the world utility. However, no such assurances accrue to WL utilities if the system is not constraint-aligned. Accordingly, our experiments constitute an investigation of how well a particular system performs when WL utilities are used but little attention is paid to ensuring that the system is constraint-aligned.\n\n3 COINS FOR NETWORK ROUTING\n\nIn our experiments we concentrated on the two networks in Figure 1, both slightly larger than those in [9]. To facilitate the analysis, traffic originated only at routers indicated with white boxes, and had only the routers indicated by dark boxes as ultimate destinations. Note that in both networks there is a bottleneck at router 2.\n\nFigure 1: Network Architectures. (a) Network A; (b) Network B.\n\nAs is standard in much of traffic network analysis [3], at any time all traffic at a router is a real-valued number together with an ultimate destination tag. At each timestep, each router sums all traffic received from upstream routers in this timestep to get a load. The router then decides which downstream router to send its load to, and the cycle repeats. A running average is kept of the total value of each router's load over a window of the previous L timesteps. This average is run through a load-to-delay function, W(x), to get the summed delay accrued at this timestep by all packets traversing this router at this timestep. 
Different routers had different W(x), to reflect the fact that real networks have differences in router software and hardware (response time, queue length, processing speed, etc.). In our experiments W(x) = x^3 for routers 1 and 3, and W(x) = log(x + 1) for router 2, for both networks. The global goal is to minimize the total delay encountered by all traffic.\n\nIn terms of the COIN formalism, we identified the neurons η as individual pairs of routers and ultimate destinations. So ζ_{η,t} was the vector of traffic sent along all links exiting η's router, tagged for η's ultimate destination, at time t. Each subworld consisted of the set of all neurons that shared a particular ultimate destination.\n\nIn the SPA each node η tries to set ζ_{η,t} to minimize the sum of the delays to be accrued by that traffic on the way to its ultimate destination. In contrast, in a COIN η tries to set ζ_{η,t} to optimize g_w for the subworld w containing η. For both algorithms, \"full knowledge\" means that at time t all of the routers know the window-averaged loads for all routers for time t - 1, and assume that those values will be the same at t. For large enough L, this assumption will be arbitrarily good, and will therefore allow the routers to make arbitrarily accurate estimates of how best to route their traffic, according to their respective routing criteria.\n\nIn contrast, having limited knowledge, the MB COIN could only predict the WLU value resulting from each routing decision. More precisely, for each router-ultimate-destination pair, the associated microlearner estimates the map from traffic on all outgoing links (the inputs) to WLU-based reward (the outputs; see below). This was done with a single-nearest-neighbor algorithm. 
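A minimal Python sketch of such a single-nearest-neighbor microlearner, one instance per router-ultimate-destination pair, is given here. It stores (outgoing-link traffic vector, observed reward) pairs and predicts the reward of a candidate traffic vector by returning the reward of the closest stored vector. Several details are our assumptions rather than specifics given in the paper: the Euclidean distance, the class and method names (NearestNeighborReward, observe, predict, best), and the convention that smaller predicted values are better. The last follows from the WLU here being built from delays, which the global goal minimizes.

```python
import math


class NearestNeighborReward:
    # Hypothetical single-nearest-neighbor microlearner for one
    # (router, ultimate destination) pair.

    def __init__(self):
        self.memory = []  # list of (traffic_vector, observed_reward)

    def observe(self, traffic, reward):
        # Store the traffic sent on each outgoing link and the
        # WLU-based reward that was later echoed back.
        self.memory.append((tuple(traffic), reward))

    def predict(self, traffic):
        # Return the reward recorded for the closest stored traffic vector.
        if not self.memory:
            raise ValueError('no observations yet')
        _, reward = min(self.memory,
                        key=lambda item: math.dist(item[0], traffic))
        return reward

    def best(self, candidates):
        # Smaller is better here, since the WLU-based signal is a delay.
        return min(candidates, key=self.predict)


learner = NearestNeighborReward()
learner.observe([1.0, 0.0], 0.8)    # all traffic on outgoing link 0
learner.observe([0.0, 1.0], 0.3)    # all traffic on outgoing link 1
print(learner.predict([0.9, 0.1]))  # 0.8
print(learner.best([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]))  # [0.0, 1.0]
```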
Next, each router could send the packets along the path that results in outbound traffic with the best (estimated) reward. However, to be conservative, in these experiments we instead had the router randomly select between that path and the path selected by the FK SPA.\n\nThe load at router r at time t is determined by ζ. Accordingly, we can encapsulate the load-to-delay functions at the nodes by writing the delay at node r at time t as W_{r,t}(ζ). In our experiments world utility was the total delay, i.e., G(ζ) = Σ_{r,t} W_{r,t}(ζ). So using the WLU, g_w(ζ) = Σ_{r,t} Δ_{w,r,t}(ζ), where Δ_{w,r,t}(ζ) = [W_{r,t}(ζ) - W_{r,t}(CL_w(ζ))]. At each time t, the MB COIN used Σ_r Δ_{w,r,t}(ζ) as the \"WLU-based\" reward signal for trying to optimize this full WLU.\n\nIn the MB COIN, evaluating this reward in a decentralized fashion was straightforward. Each packet has a header containing a running sum of the Δ's encountered in all the routers it has traversed so far. Each ultimate destination sums all such headers it receives and echoes that sum back to all routers that had routed to it. In this way each neuron is apprised of the WLU-based reward of its subworld.\n\n4 EXPERIMENTAL RESULTS\n\nThe networks discussed above were tested under light, medium and heavy traffic loads. Table 1 shows the associated destinations (cf. Figure 1).\n\nTable 1: Source-Destination Pairings for the Three Traffic Loads\n\nNetwork | Source || Dest. (Light) | Dest. (Medium) | Dest. (Heavy)\nA | 4 || 6 | 6,7 | 6,7\nA | 5 || 7 | 7 | 6,7\nB | 4 || 7,8 | 7,8,9 | 6,7,8,9\nB | 5 || 6,9 | 6,7,9 | 6,7,8,9\n\nIn our experiments one new packet was fed to each source router at each time step.\n
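The delay model and the WLU-based reward of Section 3 can be sketched in Python as follows. Each router keeps its per-timestep loads, broken down by ultimate destination, over a window of the last L timesteps. Its delay is W applied to the window-averaged total load, with W(x) = x^3 for routers 1 and 3 and W(x) = log(x + 1) for router 2, as in the experiments. Clamping subworld w (all traffic bound for destination w) to 0 across the window gives Δ_{w,r,t} = W_r(average load) - W_r(average load without w's traffic), and a packet's header accumulates these increments along its path. The paper does not specify the natural logarithm, the class and function names (Router, header_sum) or the per-destination bookkeeping; these are our assumptions.

```python
import math
from collections import deque

# Load-to-delay functions from the experiments (natural log assumed).
W = {
    1: lambda x: x ** 3,
    2: lambda x: math.log(x + 1),
    3: lambda x: x ** 3,
}


class Router:
    # Hypothetical router model keeping a window of per-destination loads.

    def __init__(self, rid, window):
        self.rid = rid
        self.history = deque(maxlen=window)  # last L timesteps

    def record(self, load_by_dest):
        # load_by_dest maps ultimate destination -> traffic this timestep.
        self.history.append(dict(load_by_dest))

    def avg_load(self, clamp=None):
        # Window-averaged total load; traffic for destination `clamp`
        # is set to 0 across the whole window, as in CL_w.
        if not self.history:
            return 0.0
        totals = [sum(v for d, v in step.items() if d != clamp)
                  for step in self.history]
        return sum(totals) / len(totals)

    def delay(self):
        # W_{r,t}: delay accrued at this timestep by traffic at this router.
        return W[self.rid](self.avg_load())

    def wlu_delta(self, dest):
        # Delta_{w,r,t} = W_{r,t}(zeta) - W_{r,t}(CL_w(zeta)) with w = dest.
        return self.delay() - W[self.rid](self.avg_load(clamp=dest))


def header_sum(path, routers, dest):
    # Running sum of Deltas that a packet bound for `dest` picks up on `path`.
    return sum(routers[r].wlu_delta(dest) for r in path)


routers = {r: Router(r, window=50) for r in (1, 2, 3)}
routers[1].record({6: 1.0, 7: 1.0})
routers[2].record({6: 1.0, 7: 1.0})
print(routers[1].delay())       # 8.0
print(routers[1].wlu_delta(6))  # 7.0
print(header_sum([1, 2], routers, 6))
```

Because G is a total delay, a router using this signal would prefer actions with smaller echoed sums. The paper does not say whether its implementation rescales or re-signs the reward.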
\nTable 2 reports the average total delay (i.e., the average per-packet time to traverse the total network) in each of the traffic regimes, for the shortest path algorithm with full knowledge, the COIN with full knowledge, and the MB COIN. Each table entry is based on 50 runs with a window size of 50, and the errors reported are errors in the mean.¹ All the entries in Table 2 are statistically different at the .05 level, including FK SPA vs. MB COIN for Network A under light traffic conditions.\n\nTable 2: Average Total Delay\n\nNetwork | Load || FK SPA | FK COIN | MB COIN\nA | light || 0.53 \u00b1 .007 | 0.45 \u00b1 .001 | 0.50 \u00b1 .008\nA | medium || 1.26 \u00b1 .010 | 1.10 \u00b1 .001 | 1.21 \u00b1 .009\nA | heavy || 2.17 \u00b1 .012 | 1.93 \u00b1 .001 | 2.06 \u00b1 .010\nB | light || 2.13 \u00b1 .012 | 1.92 \u00b1 .001 | 2.05 \u00b1 .010\nB | medium || 4.37 \u00b1 .014 | 3.96 \u00b1 .001 | 4.19 \u00b1 .012\nB | heavy || 6.94 \u00b1 .015 | 6.35 \u00b1 .001 | 6.82 \u00b1 .024\n\nTable 2 provides two important observations. First, the WLU-based COIN outperformed the SPA when both had full knowledge, demonstrating the advantage of the new routing strategy. By not having its routers greedily strive for the shortest paths for their packets, the COIN settles into a more desirable state that reduces the average total delay for all packets. Second, even when the WLU is estimated through a memory-based learner (using only information available to the local routers), the performance of the COIN still surpasses that of the FK SPA. 
This result not only establishes the feasibility of COIN-based routers, but also indicates that for this task COINs will outperform any algorithm that can only estimate the shortest path, since the performance of the FK SPA is a ceiling on the performance of any such RL-based SPA.\n\n¹The results are qualitatively identical for window sizes 20 and 100, and for runs of 100 and 500 total timesteps.\n\nFigure 2 shows how total delay varies with time for the medium traffic regime (each plot is based on 50 runs). The \"ringing\" is an artifact caused by the starting conditions and the window size (50). Note that for both networks the FK COIN not only provides the shortest delays, but also settles into that solution very rapidly.\n\nFigure 2: Total Delay. (a) Network A; (b) Network B. Curves: FK SPA, FK COIN, MB COIN; horizontal axis: Unit Time Steps.\n\n5 DISCUSSION\n\nMany distributed computational tasks are naturally addressed as recurrent neural networks of reinforcement learning algorithms (i.e., COINs). The difficulty in doing so is ensuring that, despite the absence of centralized communication and control, the reward functions of the separate neurons work in synchrony to foster good global performance, rather than cause their associated neurons to work at cross-purposes.\n
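As a check on the summary figures quoted in the introduction, the short Python calculation below recomputes them from Table 2. It reads the FK SPA gap as (FK SPA - FK COIN) / FK COIN, and the MB COIN improvement as the fraction of that gap it closes, (FK SPA - MB COIN) / (FK SPA - FK COIN). These two readings of the paper's percentages are our interpretation. Averaged over the six network/load settings they give about 12.6% and 36.2%, in line with the quoted 12.5% and 36%. The population spread across settings is also printed, for comparison with the quoted uncertainties.

```python
from statistics import mean, pstdev

# Table 2: (FK SPA, FK COIN, MB COIN) average total delay per setting.
TABLE2 = {
    ('A', 'light'): (0.53, 0.45, 0.50),
    ('A', 'medium'): (1.26, 1.10, 1.21),
    ('A', 'heavy'): (2.17, 1.93, 2.06),
    ('B', 'light'): (2.13, 1.92, 2.05),
    ('B', 'medium'): (4.37, 3.96, 4.19),
    ('B', 'heavy'): (6.94, 6.35, 6.82),
}

# Percent by which the FK SPA exceeds the FK COIN (the optimum).
spa_gap = [100 * (spa - fk) / fk for spa, fk, mb in TABLE2.values()]
# Percent of that gap closed by the MB COIN.
mb_closed = [100 * (spa - mb) / (spa - fk) for spa, fk, mb in TABLE2.values()]

print(f'FK SPA gap: {mean(spa_gap):.1f}% (spread {pstdev(spa_gap):.1f})')
print(f'MB COIN gap closed: {mean(mb_closed):.1f}% (spread {pstdev(mb_closed):.1f})')
```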
\nThe mathematical framework synopsized in this paper is a theoretical solution to this difficulty. To assess its real-world applicability, we employed it to design a full-knowledge (FK) COIN as well as a memory-based (RL-based) COIN for the task of packet routing on a network. We compared the performance of those algorithms to that of an FK shortest-path algorithm (SPA). Not only did the FK COIN beat the FK SPA, but the memory-based COIN, despite having only limited knowledge, also beat the full-knowledge SPA. This latter result is all the more remarkable in that the performance of the FK SPA is an upper bound on the performance of previously investigated RL-based routing schemes, which use RL to try to provide accurate knowledge to an SPA.\n\nThere are many directions for future work on COINs, even restricting attention to the domain of packet routing. Within that particular domain, we are currently extending our experiments to larger networks, using industrial event-driven network simulators. Concurrently, we are investigating the use of macrolearning for COIN-based packet routing, i.e., the run-time modification of the neurons' utility functions to improve the subworld-factoredness of the COIN.\n\nReferences\n\n[1] C. G. Atkeson, A. W. Moore, and S. Schaal. Locally weighted learning. Artificial Intelligence Review, submitted, 1996.\n[2] E. Baum. Manifesto for an evolutionary economics of intelligence. In C. M. Bishop, editor, Neural Networks and Machine Learning. Springer-Verlag, 1998.\n[3] D. Bertsekas and R. Gallager. Data Networks. Prentice Hall, NJ, 1992.\n[4] J. Boyan and M. Littman. Packet routing in dynamically changing networks: A reinforcement learning approach. In Advances in Neural Information Processing Systems 6, pages 671-678. Morgan Kaufmann, 1994.\n[5] S. 
P. M. Choi and D. Y. Yeung. Predictive Q-routing: A memory-based reinforcement learning approach to adaptive traffic control. In Advances in Neural Information Processing Systems 8, pages 945-951. MIT Press, 1996.\n[6] D. Fudenberg and J. Tirole. Game Theory. MIT Press, Cambridge, MA, 1991.\n[7] G. Hardin. The tragedy of the commons. Science, 162:1243-1248, 1968.\n[8] Y. A. Korilis, A. A. Lazar, and A. Orda. Achieving network optima using Stackelberg routing strategies. IEEE/ACM Transactions on Networking, 5(1):161-173, 1997.\n[9] P. Marbach, O. Mihatsch, M. Schulte, and J. Tsitsiklis. Reinforcement learning for call admission control and routing in integrated service networks. In Advances in Neural Information Processing Systems 10, pages 922-928. MIT Press, 1998.\n[10] D. Subramanian, P. Druschel, and J. Chen. Ants and reinforcement learning: A case study in routing in dynamic networks. In Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence, pages 832-838, 1997.\n[11] D. Wolpert and K. Tumer. Collective intelligence. In J. M. Bradshaw, editor, Handbook of Agent Technology. AAAI Press/MIT Press, 1999. To appear.\n[12] D. Wolpert, K. Wheeler, and K. Tumer. Automated design of multi-agent systems. In Proceedings of the Third International Conference on Autonomous Agents, 1999. To appear.\n[13] D. Wolpert, K. Wheeler, and K. Tumer. Collective intelligence for distributed control. 1999. Preprint.\n", "award": [], "sourceid": 1591, "authors": [{"given_name": "David", "family_name": "Wolpert", "institution": null}, {"given_name": "Kagan", "family_name": "Tumer", "institution": null}, {"given_name": "Jeremy", "family_name": "Frank", "institution": null}]}