{"title": "A Multiscale Adaptive Network Model of Motion Computation in Primates", "book": "Advances in Neural Information Processing Systems", "page_first": 349, "page_last": 355, "abstract": null, "full_text": "A Multiscale Adaptive Network Model of Motion Computation in Primates \n\nH. Taichi Wang \nScience Center, A18 \nRockwell International \n1049 Camino Dos Rios \nThousand Oaks, CA 91360 \n\nBimal Mathur \nScience Center, A 7 A \nRockwell International \n1049 Camino Dos Rios \nThousand Oaks, CA 91360 \n\nChristof Koch \nComputation & Neural Systems \nCaltech, 216-76 \nPasadena, CA 91125 \n\nAbstract \n\nWe demonstrate a multiscale adaptive network model of motion \ncomputation in primate area MT. The model consists of two stages: (1) \nlocal velocities are measured across multiple spatio-temporal channels, \nand (2) the optical flow field is computed by a network of direction-selective \nneurons at multiple spatial resolutions. This model embeds \nthe computational efficiency of multigrid algorithms within a parallel \nnetwork and adaptively computes the most reliable estimate of \nthe flow field across different spatial scales. Our model neurons show \nthe same nonclassical receptive field properties as Allman's type I MT \nneurons. Because local velocities are measured across multiple channels, \ndifferent channels often provide conflicting measurements to the \nnetwork. We have incorporated a veto scheme for conflict resolution. \nThis mechanism provides a novel explanation for the spatial frequency \ndependency of the psychophysical phenomenon called Motion Capture. \n\n1 MOTIVATION \nWe previously developed a two-stage model of motion computation in the visual system \nof primates (i.e., the magnocellular pathway from retina to V1 and MT; Wang, Mathur & \nKoch, 1989). 
This algorithm left two issues unresolved: (1) the optimal spatial scale \nfor velocity measurement, and (2) the optimal spatial scale for the smoothness of the \nmotion field. To address them, we have implemented a multiscale motion \nnetwork based on multigrid algorithms. \nAll methods of estimating optical flow make a basic assumption about the scale of the \nvelocity relative to the spatial neighborhood and to the temporal discretization step of \ndelay. Thus, if the velocity of the pattern is much larger than the ratio of the spatial to \ntemporal sampling step, an incorrect velocity value will be obtained (Battiti, Amaldi & \nKoch, 1991). Battiti et al. proposed a coarse-to-fine strategy for adaptively determining \nthe optimal discretization grid by evaluating the local estimate of the relative error in the \nflow field due to discretization. The optimal spatial grid is the one minimizing this error. \nThis strategy both yields a superior estimate of the optical flow field and \nachieves the speedups associated with multigrid methods. This is important, given the \nlarge number of iterations needed for relaxation-based algorithms and the remarkable speed \nwith which humans can reliably estimate velocity (on the order of 10 neuronal time \nconstants). \nOur previous model was based on the standard regularization approach, which involves \nsmoothing with weight $\\lambda$. This parameter controls the smoothness of the computed \nmotion field. The scale over which the velocity field is smooth depends on the size of the \nobject: the larger the object, the larger the value of $\\lambda$ has to be. Since a real-life vision \nsystem has to deal with objects of various sizes simultaneously, there is no single \n\"optimal\" smoothness parameter. Our network architecture allows us to circumvent this \nproblem by having the same smoothing weight $\\lambda$ 
at different resolution grids. \n\n2 NETWORK ARCHITECTURE \nThe overall architecture of the two-stage model is shown in Figure 1. In the first stage, \nlocal velocities are measured at multiple spatial resolutions. At each spatial resolution $p$, \nthe local velocities are represented by a set of direction-selective neurons, $u(i,j,k,p)$, \nwhose preferred direction is $\\theta_k$ (the component cells; Movshon, Adelson, \nGizzi & Newsome, 1985). In the second stage, the optical flow field is computed by a \nnetwork of direction-selective neurons (pattern cells) at multiple spatial resolutions, \n$v(i,j,k,p)$. In the following, we briefly summarize the network. \nWe have used a multiresolution population coding: \n\n$$V = \\sum_{k=0}^{N_{or}-1} \\sum_{p=0}^{N_{res}-1} \\Big( \\prod_{p'=1}^{p} I_{p'}^{p'-1} \\Big) v_k^p \\, \\theta_k \\qquad (1)$$ \n\nwhere $N_{or}$ is the number of directions in each grid, $N_{res}$ is the number of resolutions in \nthe network and $I$ is a 2-D linear interpolation operator (Brandt, 1982). \nIn our single-resolution model, the input source, $s_0(i,j,k)$, to a pattern cell $v(i,j,k)$ was: \n\n$$\\frac{\\partial v(i,j,k)}{\\partial t} = s_0(i,j,k) = \\sum_{k'} \\cos(\\theta_k - \\theta_{k'}) \\{ u(i,j,k') - (\\mathbf{u} \\cdot \\mathbf{V}(i,j)) \\} \\, e(i,j,k') \\qquad (2)$$ \n\nwhere $\\mathbf{u}$ is the unit vector in the direction of local velocity and $e(i,j,k')$ is the local \nedge strength. For our multiscale network, we use a convergent multi-channel \nsource term, $S_0^p$, to a pattern cell $v(i,j,k,p)$: \n\n$$S_0^p = \\sum_{p' \\le p} \\, \\prod_{p''=p'+1}^{p} R_{p''-1}^{p''} \\, s_0^{p'} \\qquad (3)$$ \n\nwhere $R$ is a 2-D restriction operator. We use the full weighting operator instead of the \ninjection operator because of the sparse nature of the input data. \nThe computational efficiency of the multigrid algorithms has been embedded in our \nmultiresolution network by a set of spatial-filtering synapses, 
$S_1$, written as: \n\n$$S_1^p = \\alpha R_{p-1}^{p} v^{p-1} - \\beta I_{p+1}^{p} R_{p}^{p+1} v^{p} \\qquad (4)$$ \n\nwhere $\\alpha$ and $\\beta$ are constants. \n\nFigure 1. The network architecture. [Diagram: retina $I(i,j)$; multichannel normal velocity measurement $u(i,j,k,p)$, $e(i,j,k,p)$; multiresolution motion field $v(i,j,k,p)$.] \n\nFigure 2. A coarse-to-fine veto scheme. \n\nAs discussed in Section 1, the scale over which the velocity field is smooth depends \non the size of the object. Consider, for example, an object of a certain size moving with \na given velocity across the field of view. The multiresolution representation and the \nspatial frequency filtering connections will force the velocity field to be represented \nmostly by a few neurons whose resolution grid matches the size of the object. Therefore, \nthe smoothness constraint should be enforced on the individual resolution grids. If \nmembrane potential is used, the source for the smoothness term, $S_2$, at resolution grid $p$ \ncan be written as: \n\n$$S_2^p(i,j,k) = \\lambda \\sum_{k'} \\cos(\\theta_k - \\theta_{k'}) \\{ v(i-1,j,k',p) + v(i+1,j,k',p) + v(i,j-1,k',p) + v(i,j+1,k',p) - 4 v(i,j,k',p) \\} \\qquad (5)$$ \n\nwhere $\\lambda$ is the smoothness parameter. The smoothing weight $\\lambda$ in our formulation is the \nsame for each grid and is independent of object size. \nThe network equation becomes \n\n$$\\frac{\\partial v(i,j,k,p)}{\\partial t} = S_0^p + S_1^p + S_2^p \\qquad (6)$$ \n\nAlthough the multiresolution network has a considerably more complicated synaptic \nconnection pattern, it needs only 33% more neurons than the single-resolution \nmodel, and convergence improves by about two orders of magnitude (as measured by \nthe number of iterations needed). \n\n3 CONFLICT RESOLUTION \nThe velocity estimated by our algorithm, or by any other motion algorithm, depends on the spatial \n($\\Delta x$) and temporal ($\\Delta t$) discretization steps used. 
Battiti et al. derived the following \nexpression for the relative error in velocity due to incorrect derivative estimation: \n\n$$\\delta = \\frac{|\\Delta u|}{u} \\approx \\frac{2\\pi^2}{3\\lambda^2} \\left[ (\\Delta x)^2 - (u \\Delta t)^2 \\right] \\qquad (7)$$ \n\nwhere $u$ is the velocity and $\\lambda$ is the spatial wavelength of the moving pattern. As the velocity $u$ \ndeviates from $\\Delta x = u \\Delta t$, the velocity measurement becomes less accurate. The scaling factor \nin (7) depends on the spatial filtering in the retina. Therefore, the choices of spatial \ndiscretization and spatial filtering bandwidth have to satisfy the requirements of both the \nsampling theorem and the velocity measurement accuracy. Although (7) was derived \nfrom the gradient model, we believe a similar constraint applies to correlation models. \nWe model the receptive field profiles of primate retinal ganglion cells by \nLaplacian-of-Gaussian (LOG) operators. If we require that the velocity measurement be accurate \nto within 10% over the range $u = 0$ to $u = 2(\\Delta x / \\Delta t)$, then the standard deviation, $\\sigma$, of the Gaussian \nmust be greater than or equal to Ux. \nWhat happens if velocity measurements at various scales give inconsistent results? \nConsider, for example, an object moving at a speed of 3 pixels/sec across the retina. As \nshown in Figure 2, channels p=1 and p=2 will give the correct measurement, since this speed lies within \nthe reliable ranges of these channels, as depicted by filled circles. The finest channel, p=0, \non the other hand, will give an erroneous reading. This suggests a coarse-to-fine veto \nscheme for conflict resolution. We have incorporated this strategy in our network \narchitecture by implementing a shunting term in Eq. (4). In this way, the erroneous input \nsignals from the component cells at grid p=0 are shunted out (the open circles in Figure \n2) by the component cells (the filled circles) at coarser grids. 
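
The veto rule of this section can be illustrated with a minimal sketch. It is not the shunting dynamics of the network; it only reproduces the bookkeeping of the 3 pixels/sec example. The function names reliable_limit and veto_mask are hypothetical, and the sketch assumes that grid p samples space every 2**p pixels, that the temporal step is one time unit, and that a channel is reliable up to a speed of 2 * dx / dt, the upper end of the range quoted in the text.

```python
# Hypothetical sketch of the coarse-to-fine veto; all names are illustrative.
# Assumptions: grid p samples space every 2**p pixels, the temporal step is
# one time unit, and a channel is reliable for speeds up to 2 * dx / dt.


def reliable_limit(p, dt=1.0):
    '''Largest speed that grid p is assumed to measure reliably.'''
    dx = 2.0 ** p
    return 2.0 * dx / dt


def veto_mask(speed, n_res, dt=1.0):
    '''Return one flag per grid: True if its input is kept, False if shunted.'''
    reliable = [speed <= reliable_limit(p, dt) for p in range(n_res)]
    keep = []
    for p in range(n_res):
        coarser_reliable = any(reliable[p + 1:])
        # A grid is shunted only when it is unreliable and some coarser grid
        # can supply a reliable measurement in its place.
        keep.append(reliable[p] or not coarser_reliable)
    return keep


if __name__ == '__main__':
    # Speed of 3 pixels per time unit on three grids (p = 0, 1, 2).
    print(veto_mask(3.0, 3))
```

Under these assumptions, a speed of 3 on three grids shunts only the finest grid, which corresponds to the open circle at p=0 in Figure 2.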
\n\n4 MOTION CAPTURE \nHow does the human visual system deal with the potential conflicts among various spatial \nchannels? Is there any evidence for the use of such a coarse-to-fine conflict resolution \nscheme? We believe that the well-known psychophysical phenomenon of Motion Capture \nis the manifestation of this strategy. \nWhen human subjects are presented with a sequence of randomly moving random-dot patterns, \nthey perceive random motion. Ramachandran and Anstis (1983) found, surprisingly, that \nthis percept can be greatly influenced by the movement of a superimposed low-contrast, \nlow-spatial-frequency grating. They found that subjects tend \nto perceive the random dots as moving with the grating, as if the random dots \nadhered to the grating. For a given spatial frequency of the grating, the percentage of \ncapture is highest when the phase shift between frames of the grating is about 90\u00b0. Even \nmore surprisingly, the lower the spatial frequency of the grating, the higher the percentage \nof capture. \nOther researchers (e.g., Yuille & Grzywacz, 1988) and we have attempted to explain this \nphenomenon based on the smoothness constraint on the velocity field. However, \nsmoothness alone cannot explain the dependencies on the spatial frequency and the phase \nshift of the gratings. The coarse-to-fine shunting scheme provides a natural explanation \nof these dependencies. \nWe have simulated the spatial frequency and phase shift dependency. The results are \nshown in Figure 3. In these simulations, we plotted the relative uniformity of the \nmotion-captured velocity fields. A uniformity of 1 signifies total capture. As can be seen \nclearly, for a given spatial frequency, the effect of capture increases with phase shift, and \nfor a given phase shift, the effect of capture also increases as the spatial frequency becomes \nlower. 
The lower-spatial-frequency gratings are more effective because the coarser the \nchannels are, the more of the finer component cells can be effectively shunted out, as is clear \nfrom the receptive field relationship shown in Figure 2. \n\nFigure 3. Spatial frequency dependency of Motion Capture. [Plot: uniformity versus Spatial Phase (Degree), one curve per grating lambda.] \n\n5 NONCLASSICAL RECEPTIVE FIELD \nTraditionally, physiologists use isolated bars and slits to map out the classical receptive \nfield (CRF) of a neuron, which is the portion of the visual field that can be directly \nstimulated. Recently, evidence has been mounting that in many visual neurons stimuli \npresented outside the CRF strongly and selectively influence neural responses to stimuli \npresented within the CRF. This is termed the nonclassical receptive field. \nAllman, Miezin & McGuinness (1985) found that the true receptive field of more \nthan 90% of neurons in the middle temporal (MT) area extends well beyond their CRF. \nThe surrounds commonly have direction- and velocity-selective influences that are \nantagonistic to the response from the CRF. Based on the surround selectivity, the MT \nneurons can be classified into three types. Our model neurons show the same type of \nnonclassical receptive field selectivity as Allman's type I neuron. We have performed a \nseries of simulations similar to Allman's original experiments. \n\nFigure 4. Simulation of Allman's type I non-classical receptive field properties. [Two panels comparing the model neuron with Allman's type I neuron: (a) center dots move and background dots are stationary, plotted against direction of movement of center dots; (b) center dots move in the optimum direction and background direction varies, plotted against direction of movement of background dots.] \n\nAfter the CRF of a model neuron is determined, the optimal motion stimulus is presented within \nthe CRF. The surround, however, is moved by the same amount but in \nvarious directions. Clearly, the motion in the surround has a profound effect on the activity of the \ncell we are monitoring. The effect of the surround motion on the cell as a function of \nthe direction of surround motion is plotted in Figure 4 (b). When the surround is moved \nin a direction similar to that of the center, the activity of the cell is almost totally \nsuppressed. On the other hand, when the surround is moved opposite to the center, the \ncell's activity is enhanced. Superimposed on Figure 4 are the corresponding plots from Allman's \npaper. 
\n\n6 CONCLUSION \nIn conclusion, we have developed a multi-channel, multi-resolution network model of \nmotion computation in primates. The model MT neurons show nonclassical \nsurround properties similar to those of Allman's type I cells. We also proposed a novel explanation of the \nMotion Capture phenomenon based on a coarse-to-fine strategy for conflict resolution \namong the various input channels. \n\nAcknowledgements \nCK acknowledges ONR, NSF and the James McDonnell Foundation for supporting this \nresearch. \n\nReferences \nAllman, J., Miezin, F., and McGuinness, E. (1985) \"Direction- and velocity-specific \nresponses from beyond the classical receptive field in the middle temporal visual area \n(MT)\", Perception, 14, 105-126. \nBattiti, R., Koch, C. and Amaldi, E. (1991) \"Computing optical flow across multiple \nscales: an adaptive coarse-to-fine approach\", to appear in Intl. J. Computer Vision. \nBrandt, A. (1982) \"Guide to multigrid development\". In: Multigrid Methods, Ed. Dold, \nA. and Eckmann, B., Springer-Verlag. \nMovshon, J.A., Adelson, E.H., Gizzi, M.S., and Newsome, W.T. (1985) \"The Analysis \nof Moving Visual Pattern\", In: Pattern Recognition Mechanisms, Ed. Chagas, C., Gattas, \nR., Gross, C.G., Rome: Vatican Press. \nRamachandran, V.S. and Anstis, S.M. (1983) \"Displacement thresholds for coherent \napparent motion in random dot-patterns\", Vision Res. 23 (12), 1719-1724. \nYuille, A.L. and Grzywacz, N.M. (1988) \"A computational theory for the perception of \ncoherent visual motion\", Nature, 333, 71-74. \nWang, H. T., Mathur, B. P. and Koch, C. (1989) \"Computing optical flow in the \nprimate visual system\", Neural Computation, 1(1), 92-103. 
\n\n\f", "award": [], "sourceid": 311, "authors": [{"given_name": "H.", "family_name": "Wang", "institution": null}, {"given_name": "Bimal", "family_name": "Mathur", "institution": null}, {"given_name": "Christof", "family_name": "Koch", "institution": null}]}