{"title": "On the Computational Utility of Consciousness", "book": "Advances in Neural Information Processing Systems", "page_first": 11, "page_last": 18, "abstract": null, "full_text": "On the Computational Utility of Consciousness\n\nDonald W. Mathis and Michael C. Mozer\nmathis@cs.colorado.edu, mozer@cs.colorado.edu\n\nDepartment of Computer Science and Institute of Cognitive Science\nUniversity of Colorado, Boulder\nBoulder, CO 80309-0430\n\nAbstract\n\nWe propose a computational framework for understanding and modeling human consciousness. This framework integrates many existing theoretical perspectives, yet is sufficiently concrete to allow simulation experiments. We do not attempt to explain qualia (subjective experience), but instead ask what differences exist within the cognitive information processing system when a person is conscious of mentally represented information versus when that information is unconscious. The central idea we explore is that the contents of consciousness correspond to temporally persistent states in a network of computational modules. We describe three simulations illustrating that the behavior of persistent states in the models corresponds roughly to the behavior of the conscious states people experience when performing similar tasks. Our simulations show that periodic settling to persistent (i.e., conscious) states improves performance by cleaning up inaccuracies and noise, forcing decisions, and helping keep the system on track toward a solution.\n\n1 INTRODUCTION\n\nWe propose a computational framework for understanding and modeling consciousness. 
Although our ultimate goal is to explain psychological and brain imaging data with our theory and to make testable predictions, here we simply present the framework in the context of previous experimental and theoretical work and argue that it is sensible from a computational perspective. We do not attempt to explain qualia (subjective experience and feelings); it is not clear that qualia are amenable to scientific investigation. Rather, our aim is to understand the mechanisms underlying awareness and their role in cognition. We address three key questions:\n\n\u2022 What are the preconditions for a mental representation to reach consciousness?\n\n\u2022 What are the computational consequences of a representation reaching consciousness? Does a conscious state affect processing differently than an unconscious state?\n\n\u2022 What is the computational utility of consciousness? That is, what is the computational role of the mechanism(s) underlying consciousness?\n\n2 THEORETICAL FRAMEWORK\n\nModular Cognitive Architecture. We propose that the human cognitive architecture consists of a set of functionally specialized computational modules (e.g., Fodor, 1983). We imagine the modules to be organized at a somewhat coarse level and to implement processes such as visual object recognition, visual word-form recognition, auditory word and sound recognition, computation of spatial relationships, activation of semantic representations of words, sentences, and visual objects, construction of motor plans, etc. Cognitive behaviors require the coordination of many modules. 
For example, functional brain imaging studies indicate that several brain areas are used for different subtasks during cognitive tasks such as word recognition (Posner & Carr, 1992).\n\nModules Have Mapping And Cleanup Processes. We propose that modules perform an associative memory function in their domain and operate via a two-stage process: a fast, essentially feedforward input-output mapping [1] followed by a slower relaxation search (Figure 1). The computational justification for this two-stage process is as follows. We assume that, in general, the output space of a module can represent a large number of states relative to the number of states that are meaningful or well formed, i.e., states that are interpretable by other modules or (for output modules) that correspond to sensible motor primitives. If we know which representations are well formed, we can tolerate an inaccurate feedforward mapping and \"clean up\" noise in the output by constraining it to be one of the well-formed states. This is the purpose of the relaxation step: to clean up the output of the feedforward step, resulting in a well-formed state. The cleanup process knows nothing about which output state is the best response to the input; it acts solely to enforce well-formedness. Similar architectures have been used recently to model various neuropsychological data (Hinton & Shallice, 1991; Mozer & Behrmann, 1990; Plaut & Shallice, 1993). The empirical motivation for identifying consciousness with the results of relaxation search comes from studies indicating that the contents of consciousness tend to be coherent, or well formed (e.g., Baars, 1988; Crick, 1994; Damasio, 1989).\n\nPersistent States Enter Consciousness. 
In our model, module outputs enter consciousness if they persist for a sufficiently long time. What counts as long enough is not yet determined, but in order to model specific psychological data, we will be required to make this issue precise. At that time a specific commitment will need to be made, and this commitment must be maintained when modeling further data.\n\n[1] We do not propose that this process is feedforward at the neural level. Rather, we mean that any iterative refinement of the output over time is unimportant and irrelevant.\n\nFigure 1: Modules consist of two components: a feedforward mapping followed by a relaxation search.\n\nAn important property of our model is that there is no hierarchy of modules with respect to awareness, in contrast to several existing theories that propose that access to some particular module (or neural processing area) is required for consciousness (e.g., Baars, 1988). Rather, information in any module reaches awareness simply by persisting long enough. The persistence hypothesis is consistent with the theoretical perspectives of Smolensky (1988), Rumelhart et al. (1986), Damasio (1989), Crick and Koch (1990), and others.\n\n2.1 WHEN ARE MENTAL STATES CONSCIOUS?\n\nIn our framework, the output of any module will enter consciousness if it persists in time. The persistence of an output state of a module is assured if (1) it is a point attractor of the relaxation search (i.e., a well-formed state), and (2) the inputs to the module are relatively constant, i.e., they continue to be mapped into the same attractor basin. 
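A minimal sketch of the two-stage module and the persistence criterion, written in Python with NumPy and assuming a single pool of locally coded units. The feedforward mapping itself is not modeled; a hypothetical list of noisy output patterns stands in for it. `cleanup` is a hand-wired winner-take-all relaxation whose attractors are the well-formed states (one unit on, or all units off), using a leaky update with a piecewise-linear clip in place of a sigmoid. `persistent_runs` then reports which well-formed states stayed put for at least `PERSIST_STEPS` ticks. Every constant, including that threshold, is an illustrative assumption; the framework itself leaves open what counts as long enough.

```python
import numpy as np

# Illustrative constants for one hand-wired winner-take-all pool (assumptions).
SELF_EXCITE = 1.0   # extra self-excitation of each unit
INHIBITION = 1.0    # lateral inhibition from every other unit in the pool
BIAS = 0.1          # threshold that makes the all-off state an attractor
TAU = 0.5           # time constant of the leaky update, in [0, 1]
PERSIST_STEPS = 5   # hypothetical 'long enough' threshold, in ticks


def cleanup(x, steps=50):
    # Relaxation search: settle a noisy pool pattern toward a well-formed state
    # (exactly one unit on, or no unit on). It only enforces well-formedness;
    # it never sees the module's input.
    a = np.clip(np.asarray(x, dtype=float), 0.0, 1.0)
    for _ in range(steps):
        net = (1.0 + SELF_EXCITE) * a - INHIBITION * (a.sum() - a) - BIAS
        a = TAU * np.clip(net, 0.0, 1.0) + (1.0 - TAU) * a
    return a


def decode(a, threshold=0.5):
    # Index of the single active unit, None for the all-off (don't know) state,
    # or 'ill-formed' if several units are on.
    on = np.flatnonzero(a > threshold)
    if on.size == 0:
        return None
    return int(on[0]) if on.size == 1 else 'ill-formed'


def persistent_runs(states, min_steps=PERSIST_STEPS):
    # (state, start_tick, length) for each run of one well-formed, non-empty
    # state that lasts at least min_steps ticks.
    runs, start = [], 0
    for t in range(1, len(states) + 1):
        if t == len(states) or states[t] != states[start]:
            length = t - start
            if states[start] not in (None, 'ill-formed') and length >= min_steps:
                runs.append((states[start], start, length))
            start = t
    return runs


# Hypothetical feedforward outputs for a 3-unit pool over 20 ticks:
# unit 1 favored for 10 ticks, a 2-tick blip toward unit 2, then unit 0.
ff = [[0.2, 0.7, 0.1]] * 10 + [[0.1, 0.3, 0.6]] * 2 + [[0.6, 0.2, 0.1]] * 8
states = [decode(cleanup(x)) for x in ff]
print(persistent_runs(states))  # [(1, 0, 10), (0, 12, 8)]
```

Here the two-tick excursion toward unit 2 is still cleaned up into a well-formed state, but it is too brief to count as persistent, so only the runs for units 1 and 0 are reported.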
\n\nWhile our framework appears to make strong claims about the necessary and sufficient conditions for consciousness, without an exact specification of the modules forming the cognitive architecture it falls short of a rigorous, testable theory. A complete theory will require not only a specification of the modules, but will also have to avoid arbitrariness in claiming that certain cognitive operations or brain regions are modules while others are not. Ultimately, one must identify the neurophysiological and neuroanatomical properties of the brain that determine the module boundaries (see Crick, 1994, for a promising step in this regard).\n\n3 COMPUTATIONAL UTILITY OF CONSCIOUSNESS\n\nFor the moment, suppose that our framework provides a sensible account of awareness phenomena (demonstrating this is the goal of ongoing work). If one accepts this, and hence the notion that a cleanup process and the resulting persistent states are required for awareness, questions about the role of cleanup in the model become quite interesting, because they are equivalent to questions about the role of the mechanism underlying awareness in cognition. One might ask whether there is computational utility to achieving conscious states. That is, does a system that achieves persistent states perform better than a system that does not? Does a system that encourages settling to well-formed states perform better than a system that does not? We now show that the answer to this question is yes.\n\n3.1 ADDITION SIMULATION\n\nTo examine the utility of cleanup, we trained a module to perform a simple multistep cognitive task: adding a pair of two-digit numbers in three steps. 
[2] We tested the system with and without cleanup and compared the generalization performance.\n\nThe network architecture (Figure 2) consists of a single module. The inputs consist of the problem statement and the current partial solution state. The output is an updated solution state, which is fed back into the module's input. The problem statement is represented by four pools of units, one for each digit of each operand, where each pool uses a local encoding of digits. Partial solution states are represented by five pools: one for each of the three result digits and one for each of the two carry digits.\n\nFigure 2: Network architecture for the addition task. (Labeled components: Projection, input-output mapping, hidden units, the result 1-3 and carry 1-2 pools, and copy connections from the output pools back to the input.)\n\nEach addition problem was decomposed into three steps (Figure 3), each describing a transformation from one partial solution state to the next, and the mapping network was trained to perform each transformation individually.\n\nFigure 3: The sequence of steps in an example addition problem. For 48 + 62, the result starts as ??? with carries ??; step 1 produces result ??0 with carries ?1; step 2 produces result ?10 with carries 11; step 3 produces the final result 110.\n\nStep 1: Given the problem statement, activate the rightmost result digit and rightmost carry digit (comprising the first partial solution).\n\n[2] Of course, we don't believe that there is a brain module dedicated to addition problems. This choice was made because addition is an intuitive example of a multistep task. 
\n\nStep 2: Given the first partial solution, activate the next result and carry digits (second partial solution).\n\nStep 3: Given the second partial solution, activate the leftmost result digit (final solution).\n\nThe set of well-formed states in this domain consists of all possible combinations of digits and \"don't knows\" among the pools (\"don't knows\" are denoted by question marks in Figure 3). Local representations of digits are used within each pool, and a \"don't know\" is represented by the state in which no unit is active. Thus, the well-formed states are those in which either one unit or no unit is active in each pool. To make these states attractors of the cleanup network, the connections were hand-wired so that each pool was a winner-take-all pool with an additional attractor at the zero state.\n\nTo run the network, a problem statement pattern is clamped on the input units, and the network is allowed to update for 200 iterations. Unit activities were updated using an incremental rule approximating continuous dynamics:\n\n$$a_i(t) = \\tau\\, f\\Big(\\sum_j w_{ij}\\, a_j(t-1)\\Big) + (1-\\tau)\\, a_i(t-1)$$\n\nwhere $a_i(t)$ is the activity of unit $i$ at time $t$, $w_{ij}$ is the weight from unit $j$ to unit $i$, $\\tau$ is a time constant in the interval $[0,1]$, and $f(\\cdot)$ is the usual sigmoid squashing function.\n\nFigure 4 shows the average generalization performance of networks run with and without cleanup, as a function of training set size. Note that, in principle, the system does not need a cleanup process to learn the training set perfectly or to generalize perfectly. Thus, it is not simply the case that no solutions exist without cleanup. 
The generalization results were that, for every training set size, percent correct on the generalization set was better with cleanup than without. This indicates that although the mapping network often generalizes incorrectly, the output pattern often falls within the correct attractor basin. This is especially beneficial in multistep tasks, because cleanup can correct the inaccuracies introduced by the mapping network, preventing the system from gradually diverging from the desired trajectory.\n\nFigure 4: Cleanup improves generalization performance. (Legend: Projection, No projection; x-axis: training set size, % of all problems.)\n\nFigure 5 shows an example run of a trained network. There is one curve for each of the five result and carry pools, showing the degree of \"activity\" of the ultimate target pattern, $t$, for that pool as a function of time. Activity is defined to be $e^{-\\|t-a\\|^2}$, where $a$ is the current activity pattern and $t$ is the target. The network solves the problem by passing through the correct sequence of intermediate states, each of which is temporarily persistent. This resembles the sequence of conscious states a person might experience while performing this task: each step of the problem is performed by an unconscious process, and the result of each step appears in conscious awareness.\n\nFigure 5: Network solving the addition task in three steps. (y-axis: activation of target pattern in each pool; x-axis: time; curves: result digits 1-3 and carry digits 1-2.)\n\n3.2 CHOICE POINT SIMULATION\n\nIn many ordinary situations, people are required to make decisions, e.g., to drive straight through an intersection or turn left, or to order macaroni or a sandwich for lunch. At these choice points, any of the alternative actions is reasonable a priori. Contextual information determines which action is correct, e.g., whether you are trying to drive to work or to the supermarket. Conscious decision making often occurs at these choice points, except when the task is overlearned (Mandler, 1975).\n\nWe modeled a simple form of a choice point situation. We trained a module to output sequences of states, e.g., ABCD or EFGH, where states were represented by unique activity patterns over a set of units. If the sequences shared no elements, then presenting the first element of any sequence would be sufficient to regenerate the sequence. But when sequences overlap, choice points are created. For example, with the sequences ABCD and AEFG, state A can be followed by either B or E.\n\nWe show that cleanup allows the module to make a decision and complete one of the two sequences. Figure 6 shows the operation of the module with and without cleanup following presentation of an A after training on the sequence pair ABCD and AEFG. There is one curve for each state, showing the activation of that state (defined as before) as a function of time. When the network is run with cleanup, although both states B and E are initially partially activated, the cleanup process maps this ill-formed state to state B, and the network then correctly completes the sequence ABCD. 
Without cleanup, the initial activation of states B and E causes a blending of the two sequences ABCD and AEFG, and the state degenerates. [3]\n\nFigure 6: Decision-point task with and without cleanup. (y-axis: activity of state; curves: states B, C, D, and E.)\n\nThe arithmetic and choice point tasks may seem simple, in part because we predefined the set of well-formed states. However, because the architecture segregates knowledge of well-formedness from knowledge of how to solve problems in the domain, well-formedness could be learned simultaneously with, or prior to, learning the task. One could imagine training the cleanup network to autoassociate states it observes in the domain, before or during training, using an unsupervised or self-supervised procedure.\n\n[3] In this simulation, we are not modeling the role of context in helping to select one sequence or another; we are simply assuming that either sequence is valid in the current context. The nature of the model does not change when we consider context. Assuming that the domain is not highly overlearned, the context will not strongly evoke one alternative action or the other in the feedforward mapping, leading to partial activation of multiple states, and the cleanup process will be needed to force a decision.\n\n4 COMPUTATIONAL CONSEQUENCES OF PERSISTENT STATES\n\nIn a network of modules, a persistent well-formed state in one module exerts a larger influence on the state of other modules than do transient or ill-formed states. As a result, the dynamics of the system tend to be dominated by well-formed persistent states. We show this in a final simulation. 
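A toy sketch of this claim in Python with NumPy, offered under stated assumptions rather than as the authors' simulation: module B is a four-unit winner-take-all pool, each unit of module A excites the corresponding unit of B (a one-to-one mapping), and B evolves under a leaky update of the form a(t) = τ·f(net) + (1 − τ)·a(t − 1), with a piecewise-linear clip standing in for f. The gains, time constant, pool size, tick counts, and the choice to silence A once a transient ends are all hypothetical.

```python
import numpy as np

# Hypothetical constants; none of these values come from the paper.
TAU = 0.1      # time constant of the leaky update
SELF = 1.0     # extra self-excitation of each unit in B
INHIB = 1.0    # lateral inhibition from every other unit in B
BIAS = 0.1     # threshold that keeps inactive units off
GAIN = 1.5     # strength of the one-to-one A-to-B mapping


def step_b(b, x):
    # One leaky update of module B's winner-take-all pool, where unit k of B
    # receives GAIN * x[k] from unit k of module A.
    net = (1.0 + SELF) * b - INHIB * (b.sum() - b) - BIAS + GAIN * x
    return TAU * np.clip(net, 0.0, 1.0) + (1.0 - TAU) * b


def run(a_output, on_ticks, total_ticks=150):
    # Start B in beta_1 (unit 0 on), drive it with A's output for on_ticks
    # ticks, then assume A falls silent; return B's most active unit.
    x_on = np.asarray(a_output, dtype=float)
    b = np.zeros_like(x_on)
    b[0] = 1.0
    for t in range(total_ticks):
        x = x_on if t < on_ticks else np.zeros_like(x_on)
        b = step_b(b, x)
    return int(np.argmax(b))


alpha_2 = [0.0, 1.0, 0.0, 0.0]  # well-formed alpha_2 in module A
blend = [0.0, 0.5, 0.5, 0.0]    # ill-formed mix of alpha_2 and alpha_3

print(run(alpha_2, on_ticks=150))  # persistent, well-formed: 1 (beta_2)
print(run(alpha_2, on_ticks=5))    # transient, well-formed: 0 (still beta_1)
print(run(blend, on_ticks=150))    # persistent, ill-formed: 0 (still beta_1)
```

Under these settings only sustained, well-formed drive overturns B's incumbent attractor. Five ticks of input raise the competing unit only partway, after which it decays back to zero, and splitting the input at half strength across two units never lifts either of them above the inhibition coming from the active unit.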
\n\nThe network consisted of two modules, A and B, connected in a simple feedforward cascade. Each module's cleanup network was trained to have ten attractors, locally represented in a winner-take-all pool of units. The mapping network of module B was trained to map the attractors of module A to attractors in B in a one-to-one fashion. Thus, state \u03b11 in module A is mapped to \u03b21 in module B, \u03b12 to \u03b22, etc.\n\nModule B was initialized to a well-formed state \u03b21, and the output state of module A was varied, creating three conditions. In the persistent well-formed condition, module A was clamped in the well-formed state \u03b12 for 50 time steps. In the transient well-formed condition, module A was clamped in state \u03b12 for only 30 time steps. And in the ill-formed condition, module A was clamped in an ill-formed state in which two states, \u03b12 and \u03b13, were both partially active. Figure 7 shows the subsequent activation of state \u03b22 in module B as a function of time. Module B undergoes a transition from state \u03b21 to state \u03b22 only in the persistent well-formed condition. This indicates that the conjunction of well-formedness and persistence is required to effect a transition from one state to another.\n\nFigure 7: Well-formedness and persistence are both required for attractor transitions. (y-axis: activity of state \u03b22 in module B; panels: well-formed, persistent state \u03b12 in module A; well-formed, transient state \u03b12 in module A; ill-formed, persistent state in module A.)\n\n5 CONCLUSIONS\n\nOur computational framework and simulation results suggest the following answers to our three key questions:
\n\n\u2022 In order to reach consciousness, the output of a module must be both persistent and semantically well formed, and must not initiate an overlearned process.\n\n\u2022 The computational consequences of conscious (persistent) representations include exerting larger influences on the cognitive system, resulting in increased ability to drive response processes such as verbal report.\n\n\u2022 The computational utility of consciousness in our model lies in the ability of cleanup to \"focus\" cognition by keeping the system close to states that are semantically meaningful. Because the system has learned to process such states, performance is improved.\n\nReferences\n\nBaars, B. J. (1988) A Cognitive Theory of Consciousness. Cambridge University Press.\nCrick, F. (1994) The astonishing hypothesis: The scientific search for the soul. Scribner.\nCrick, F., & Koch, C. (1990) Towards a neurobiological theory of consciousness. Sem. Neuro., 2: 263-275.\nDamasio, A. (1989) The brain binds entities and events by multiregional activation from convergence zones. Neural Computation, 1: 123-132.\nFodor, J. A. (1983) The modularity of mind: An essay on faculty psychology. Cambridge, MA: MIT Press.\nHinton, G. E., & Shallice, T. (1991) Lesioning an attractor network: Investigations of acquired dyslexia. Psych. Rev., 98: 74-95.\nMandler, G. (1975) Consciousness: Respectable, useful and probably necessary. In Information Processing and Cognition: The Loyola Symposium, R. Solso (Ed.). Erlbaum.\nMozer, M. C., & Behrmann, M. (1990) On the interaction of spatial attention and lexical knowledge: A connectionist account of neglect dyslexia. Cognitive Neuroscience, 2: 96-123.\nPlaut, D. C., & Shallice, T. 
(1993) Perseverative and semantic influences on visual object naming errors in optic aphasia: A connectionist account. J. Cog. Neuro., 5(1): 89-117.\nPosner, M. I., & Carr, T. (1992) Lexical access and the brain: Anatomical constraints on cognitive models of word recognition. American Journal of Psychology, 105(1): 1-26.\nRumelhart, D. E., Smolensky, P., McClelland, J. L., & Hinton, G. E. (1986) Schemata and sequential thought in PDP models. In J. L. McClelland & D. E. Rumelhart (Eds.), Parallel Distributed Processing, Vol. 2. Cambridge, MA: MIT Press.\nSmolensky, P. (1988) On the proper treatment of connectionism. Behav. Brain Sci., 11: 1-74.\n", "award": [], "sourceid": 983, "authors": [{"given_name": "Donald", "family_name": "Mathis", "institution": null}, {"given_name": "Michael", "family_name": "Mozer", "institution": null}]}