{"title": "Emergence of Topography and Complex Cell Properties from Natural Images using Extensions of ICA", "book": "Advances in Neural Information Processing Systems", "page_first": 827, "page_last": 833, "abstract": null, "full_text": "Emergence of Topography and Complex \n\nCell Properties from  Natural Images \n\nusing Extensions of ICA \n\nAapo Hyviirinen and Patrik Hoyer \n\nNeural Networks Research Center \nHelsinki University of Technology \n\nP.O.  Box 5400,  FIN-02015 HUT,  Finland \n\naapo.hyvarinen~hut.fi,  patrik.hoyer~hut.fi \n\nhttp://www.cis.hut.fi/projects/ica/ \n\nAbstract \n\nIndependent  component  analysis of natural images  leads to emer(cid:173)\ngence  of  simple  cell  properties,  Le.  linear  filters  that  resemble \nwavelets  or  Gabor  functions. \nIn  this  paper,  we  extend  ICA  to \nexplain further properties of VI cells.  First, we decompose natural \nimages  into  independent  subspaces  instead of scalar  components. \nThis  model  leads  to  emergence  of phase  and  shift  invariant  fea(cid:173)\ntures,  similar  to  those  in  VI  complex  cells.  Second,  we  define  a \ntopography between the linear components obtained by ICA.  The \ntopographic distance between  two components is  defined  by  their \nhigher-order correlations, so that two components are close to each \nother  in  the  topography  if  they  are  strongly  dependent  on  each \nother.  This  leads  to simultaneous emergence  of both  topography \nand invariances similar to complex cell  properties. \n\n1 \n\nIntroduction \n\nA  fundamental  approach  in  signal  processing  is  to design  a  statistical  generative \nmodel  of the  observed  signals.  Such  an  approach  is  also  useful  for  modeling  the \nproperties of neurons in primary sensory areas.  
The basic models that we consider here express a static monochrome image I(x, y) as a linear superposition of some features or basis functions b_i(x, y):\n\nI(x, y) = sum_{i=1}^{n} b_i(x, y) s_i    (1)\n\nwhere the s_i are stochastic coefficients, different for each image I(x, y). Estimation of the model in Eq. (1) consists of determining the values of s_i and b_i(x, y) for all i and (x, y), given a sufficient number of observations of images, or in practice, image patches I(x, y). We restrict ourselves here to the basic case where the b_i(x, y) form an invertible linear system. Then we can invert s_i = <w_i, I>, where the w_i denote the inverse filters, and <w_i, I> = sum_{x,y} w_i(x, y) I(x, y) denotes the dot-product. The w_i(x, y) can then be identified as the receptive fields of the model simple cells, and the s_i are their activities when presented with a given image patch I(x, y).\n\nIn the basic case, we assume that the s_i are nongaussian, and mutually independent. This type of decomposition is called independent component analysis (ICA) [3, 9, 1, 8], or sparse coding [13]. Olshausen and Field [13] showed that when this model is estimated with input data consisting of patches of natural scenes, the obtained filters w_i(x, y) have the three principal properties of simple cells in V1: they are localized, oriented, and bandpass (selective to scale/frequency). Van Hateren and van der Schaaf [15] compared quantitatively the obtained filters w_i(x, y) with those measured by single-cell recordings of the macaque cortex, and found a good match for most of the parameters.\n\nWe show in this paper that simple extensions of the basic ICA model explain emergence of further properties of V1 cells: topography and the invariances of complex cells.\n
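\n\nAs a concrete illustration, the generative model of Eq. (1) and its inversion can be sketched in a few lines of numpy. This is only a toy sketch under assumed conditions (a random invertible basis and Laplacian, i.e. sparse, coefficients), not the estimation procedure used in the paper:\n

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64                          # pixels per patch = number of basis functions

# Hypothetical invertible basis: columns are flattened basis images b_i(x, y).
B = rng.standard_normal((n, n))
W = np.linalg.inv(B)            # rows are the inverse filters w_i(x, y)

# Sparse (supergaussian) coefficients s_i, here Laplacian-distributed.
s = rng.laplace(size=n)

# Eq. (1): the patch is a linear superposition of the basis functions.
I = B @ s

# Inverting the system: s_i = <w_i, I> recovers the coefficients exactly.
s_hat = W @ I
assert np.allclose(s_hat, s)
```

\nIn the actual model the w_i are estimated from many natural image patches by maximizing independence; here they are simply the algebraic inverse of a known basis.\n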
Due to space limitations, we can only give the basic ideas in this paper. More details can be found in [6, 5, 7].\n\nFirst, using the method of feature subspaces [11], we model the response of a complex cell as the norm of the projection of the input vector (image patch) onto a linear subspace, which is equivalent to the classical energy models. Then we maximize the independence between the norms of such projections, or energies. Thus we obtain features that are localized in space, oriented, and bandpass, like those given by simple cells, or Gabor analysis. In contrast to simple linear filters, however, the obtained feature subspaces also show emergence of phase invariance and (limited) shift or translation invariance. Maximizing the independence, or equivalently, the sparseness of the norms of the projections to feature subspaces thus allows for the emergence of exactly those invariances that are encountered in complex cells.\n\nSecond, we extend this model of independent subspaces so that we have overlapping subspaces, and every subspace corresponds to a neighborhood on a topographic grid. This is called topographic ICA, since it defines a topographic organization between components. Components that are far from each other on the grid are independent, like in ICA. In contrast, components that are near to each other are not independent: they have strong higher-order correlations. This model shows emergence of both complex cell properties and topography from image data.\n\n2 Independent subspaces as complex cells\n\nIn addition to the simple cells that can be modelled by basic ICA, another important class of cells in V1 is complex cells. The two principal properties that distinguish complex cells from simple cells are phase invariance and (limited) shift invariance.
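\n\nA minimal numpy sketch may clarify the invariance of the energy-model response described above: the response is the squared norm of the projection onto a subspace, so any rotation of the input inside that subspace (for a quadrature pair, a phase shift) leaves it unchanged. The random orthonormal filters below are a stand-in for learned ones:\n

```python
import numpy as np

rng = np.random.default_rng(1)
npix, m = 256, 4                         # 16 x 16 patch, 4-D feature subspace

# Orthonormal filters w_i spanning one (hypothetical, random) subspace.
Wsub, _ = np.linalg.qr(rng.standard_normal((npix, m)))

def response(I):
    # Energy-model response F(I) = sum_i <w_i, I>^2.
    return float(np.sum((Wsub.T @ I) ** 2))

I = rng.standard_normal(npix)

# Rotate the patch's component *inside* the subspace; the out-of-subspace
# part is untouched. The energy response is invariant to this change.
theta = 0.7
R = np.eye(m)
R[:2, :2] = [[np.cos(theta), -np.sin(theta)],
             [np.sin(theta),  np.cos(theta)]]
c = Wsub.T @ I                           # in-subspace coordinates
I_rot = I - Wsub @ c + Wsub @ (R @ c)    # replace them by rotated ones
assert np.isclose(response(I), response(I_rot))
```

\nA single linear filter has no such invariance: its response to I_rot would generally differ from its response to I.\n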
\nThe purpose of the first model in this paper is to explain the emergence of such phase and shift invariant features using a modification of the ICA model. The modification is based on combining the principle of invariant-feature subspaces [11] and the model of multidimensional independent component analysis [2].\n\nInvariant feature subspaces. The principle of invariant-feature subspaces states that one may consider an invariant feature as a linear subspace in a feature space. The value of the invariant, higher-order feature is given by (the square of) the norm of the projection of the given data point on that subspace, which is typically spanned by lower-order features. A feature subspace, as any linear subspace, can always be represented by a set of orthogonal basis vectors, say w_i(x, y), i = 1, ..., m, where m is the dimension of the subspace. Then the value F(I) of the feature F with input vector I(x, y) is given by F(I) = sum_{i=1}^{m} <w_i, I>^2, where a square root might be taken. In fact, this is equivalent to computing the distance between the input vector I(x, y) and a general linear combination of the basis vectors (filters) w_i(x, y) of the feature subspace [11]. In [11], it was shown that this principle, when combined with competitive learning techniques, can lead to emergence of invariant image features.\n\nMultidimensional independent component analysis. In multidimensional independent component analysis [2] (see also [12]), a linear generative model as in Eq. (1) is assumed. In contrast to ordinary ICA, however, the components (responses) s_i are not assumed to be all mutually independent. 
Instead, it is assumed that the s_i can be divided into couples, triplets or in general m-tuples, such that the s_i inside a given m-tuple may be dependent on each other, but dependencies between different m-tuples are not allowed. Every m-tuple of s_i corresponds to m basis vectors b_i(x, y). The m-dimensional probability densities inside the m-tuples of s_i are not specified in advance in the general definition of multidimensional ICA [2]. In the following, let us denote by J the number of independent feature subspaces, and by S_j, j = 1, ..., J the set of the indices of the s_i belonging to the subspace of index j.\n\nIndependent feature subspaces. Invariant-feature subspaces can be embedded in multidimensional independent component analysis by considering probability distributions for the m-tuples of s_i that are spherically symmetric, i.e. depend only on the norm. In other words, the probability density p_j(.) of the m-tuple with index j in {1, ..., J} can be expressed as a function of the sum of the squares of the s_i, i in S_j, only. For simplicity, we assume further that the p_j(.) are equal for all j, i.e. for all subspaces.\n\nAssume that the data consists of K observed image patches I_k(x, y), k = 1, ..., K. Then the logarithm of the likelihood L of the data given the model can be expressed as\n\nlog L(w_i(x, y), i = 1, ..., n) = sum_{k=1}^{K} sum_{j=1}^{J} log p( sum_{i in S_j} <w_i, I_k>^2 ) + K log |det W|    (2)\n\nwhere p( sum_{i in S_j} s_i^2 ) = p_j(s_i, i in S_j) gives the probability density inside the j-th m-tuple of s_i, and W is a matrix containing the filters w_i(x, y) as its columns. As in basic ICA, prewhitening of the data allows us to consider the w_i(x, y) to be orthonormal, and this implies that log |det W| is zero [6]. Thus we see that the likelihood in Eq. 
(2) is a function of the norms of the projections of I_k(x, y) on the subspaces indexed by j, which are spanned by the orthonormal basis sets given by w_i(x, y), i in S_j. Since the norm of the projection of visual data on practically any subspace has a supergaussian distribution, we need to choose the probability density p in the model to be sparse [13], i.e. supergaussian [8]. For example, we could use the following probability distribution\n\nlog p( sum_{i in S_j} s_i^2 ) = -alpha [ sum_{i in S_j} s_i^2 ]^{1/2} + beta,    (3)\n\nwhich could be considered a multi-dimensional version of the exponential distribution. Now we see that the estimation of the model consists of finding subspaces such that the norms of the projections of the (whitened) data on those subspaces have maximally sparse distributions.\n\nThe introduced \"independent (feature) subspace analysis\" is a natural generalization of ordinary ICA. In fact, if the projections on the subspaces are reduced to dot-products, i.e. projections on 1-D subspaces, the model reduces to ordinary ICA (provided that, in addition, the independent components are assumed to have non-skewed distributions). It is to be expected that the norms of the projections on the subspaces represent some higher-order, invariant features. The exact nature of the invariances has not been specified in the model but will emerge from the input data, using only the prior information on their independence.\n\nWhen independent subspace analysis is applied to natural image data, we can identify the norms of the projections ( sum_{i in S_j} s_i^2 )^{1/2} as the responses of the complex cells. 
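\n\nThe likelihood of Eq. (2) with the sparse density of Eq. (3) is simple to evaluate for whitened data and orthonormal filters, since the K log |det W| term then vanishes. The following numpy sketch is illustrative only: the function name, the values of alpha and beta, and the random filters and patches are assumptions, and actual estimation would maximize this objective over orthonormal W (e.g. by gradient ascent), which is not shown:\n

```python
import numpy as np

def subspace_log_likelihood(W, patches, subspaces, alpha=1.0, beta=0.0):
    # Eq. (2) for orthonormal W on whitened data, using Eq. (3):
    # log p(e) = -alpha * sqrt(e) + beta, with e = sum_{i in S_j} <w_i, I_k>^2.
    S = patches @ W.T                 # responses s_i = <w_i, I_k>, shape (K, n)
    total = 0.0
    for idx in subspaces:             # each S_j is a tuple of component indices
        energy = np.sum(S[:, list(idx)] ** 2, axis=1)
        total += np.sum(-alpha * np.sqrt(energy) + beta)
    return total

rng = np.random.default_rng(2)
W, _ = np.linalg.qr(rng.standard_normal((16, 16)))   # 16 orthonormal filters
patches = rng.standard_normal((100, 16))             # 100 whitened toy patches
subspaces = [(0, 1, 2, 3), (4, 5, 6, 7), (8, 9, 10, 11), (12, 13, 14, 15)]
ll = subspace_log_likelihood(W, patches, subspaces)
```

\nMaximizing this quantity over orthonormal W makes the subspace energies maximally sparse, which is exactly the estimation principle stated above.\n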
If the individual filter vectors w_i(x, y) are identified with the receptive fields of simple cells, this can be interpreted as a hierarchical model where the complex cell response is computed from simple cell responses s_i, in a manner similar to the classical energy models for complex cells. Experiments (see below and [6]) show that the model does lead to emergence of those invariances that are encountered in complex cells.\n\n3 Topographic ICA\n\nThe independent subspace analysis model introduces a certain dependence structure for the components s_i. Let us assume that the distribution in the subspace is sparse, which means that the norm of the projection is most of the time very near to zero. This is the case, for example, if the densities inside the subspaces are specified as in (3). Then the model implies that two components s_i and s_j that belong to the same subspace tend to be nonzero simultaneously. In other words, s_i^2 and s_j^2 are positively correlated. This seems to be a preponderant structure of dependency in most natural data. For image data, this has also been noted by Simoncelli [14].\n\nNow we generalize the model defined by (2) so that it models this kind of dependence not only inside the m-tuples, but among all \"neighboring\" components. A neighborhood relation defines a topographic order [10]. (A different generalization based on an explicit generative model is given in [5].) We define the model by the following likelihood:\n\nlog L(w_i(x, y), i = 1, ..., n) = sum_{k=1}^{K} sum_{j=1}^{n} G( sum_{i=1}^{n} h(i, j) <w_i, I_k>^2 ) + K log |det W|    (4)\n\nHere, h(i, j) is a neighborhood function, which expresses the strength of the connection between the i-th and j-th units. The neighborhood function can be defined in the same way as with the self-organizing map [10]. 
Neighborhoods can thus be defined as one-dimensional or two-dimensional; 2-D neighborhoods can be square or hexagonal. A simple example is to define a 1-D neighborhood relation by\n\nh(i, j) = 1, if |i - j| <= m; 0, otherwise.    (5)\n\nThe constant m defines here the width of the neighborhood.\n\nThe function G has a similar role as the log-density of the independent components in classic ICA. For image data, or other data with a sparse structure, G should be chosen as in independent subspace analysis, see Eq. (3).\n\nProperties of the topographic ICA model. Here, we consider for simplicity only the case of sparse data. The first basic property is that all the components s_i are uncorrelated, as can be easily proven by symmetry arguments [5]. Moreover, their variances can be defined to be equal to unity, as in classic ICA. Second, components s_i and s_j that are near to each other, i.e. such that h(i, j) is significantly non-zero, tend to be active (non-zero) at the same time. In other words, their energies s_i^2 and s_j^2 are positively correlated. Third, latent variables that are far from each other are practically independent. Higher-order correlation decreases as a function of distance, assuming that the neighborhood is defined in a way similar to that in (5). For details, see [5].\n\nLet us note that our definition of topography by higher-order correlations is very different from the one used in practically all existing topographic mapping methods. Usually, the distance is defined by basic geometrical relations like Euclidean distance or correlation. Interestingly, our principle makes it possible to define a topography even among a set of orthogonal vectors whose Euclidean distances are all equal.
\nSuch orthogonal vectors are actually encountered in ICA, where the basis vectors and filters can be constrained to be orthogonal in the whitened space.\n\n4 Experiments with natural image data\n\nWe applied our methods on natural image data. The data was obtained by taking 16 x 16 pixel image patches at random locations from monochrome photographs depicting wild-life scenes (animals, meadows, forests, etc.). Preprocessing consisted of removing the DC component and reducing the dimension of the data to 160 by PCA. For details on the experiments, see [6, 5].\n\nFig. 1 shows the basis vectors of the 40 feature subspaces (complex cells), when the subspace dimension was chosen to be 4. It can be seen that the basis vectors associated with a single complex cell all have approximately the same orientation and frequency. Their locations are not identical, but close to each other. The phases differ considerably. Every feature subspace can thus be considered a generalization of a quadrature-phase filter pair as found in the classical energy models, enabling the cell to be selective to some given orientation and frequency, but invariant to phase and somewhat invariant to shifts. Using 4 dimensions instead of 2 greatly enhances the shift invariance of the feature subspace.\n\nIn topographic ICA, the neighborhood function was defined so that every neighborhood consisted of a 3 x 3 square of 9 units on a 2-D torus lattice [10]. The obtained basis vectors are shown in Fig. 2. The basis vectors are similar to those obtained by ordinary ICA of image data [13, 1]. In addition, they have a clear topographic organization. Moreover, the connection to independent subspace analysis is clear from Fig. 2. Two neighboring basis vectors in Fig. 2 tend to be of the same orientation and frequency. 
Their locations are near to each other as well. In contrast, their phases are very different. This means that a neighborhood of such basis vectors, i.e. simple cells, is similar to an independent subspace. Thus it functions as a complex cell. This was demonstrated in detail in [5].\n\n5 Discussion\n\nWe introduced here two extensions of ICA that are especially useful for image modelling. The first model uses a subspace representation to model invariant features. It turns out that the independent subspaces of natural images are similar to complex cells. The second model is a further extension of the independent subspace model. This topographic ICA model is a generative model that combines topographic mapping with ICA. As in all topographic mappings, the distance in the representation space (on the topographic \"grid\") is related to some measure of distance between represented components. In topographic ICA, the distance between represented components is defined by higher-order correlations, which gives the natural distance measure in the context of ICA.\n\nAn approach closely related to ours is given by Kohonen's Adaptive Subspace Self-Organizing Map [11]. However, the emergence of shift invariance in [11] was conditional to restricting consecutive patches to come from nearby locations in the image, giving the input data a temporal structure like in a smoothly changing image sequence. Similar developments were given by Földiák [4]. In contrast to these two theories, we formulated an explicit image model. 
This independent subspace analysis model shows that emergence of complex cell properties is possible using patches at random, independently selected locations, which proves that there is enough information in static images to explain the properties of complex cells. Moreover, by extending this subspace model to model topography, we showed that the emergence of both topography and complex cell properties can be explained by a single principle: neighboring cells should have strong higher-order correlations.\n\nReferences\n\n[1] A. J. Bell and T. J. Sejnowski. The 'independent components' of natural scenes are edge filters. Vision Research, 37:3327-3338, 1997.\n[2] J.-F. Cardoso. Multidimensional independent component analysis. In Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP'98), Seattle, WA, 1998.\n[3] P. Comon. Independent component analysis - a new concept? Signal Processing, 36:287-314, 1994.\n[4] P. Földiák. Learning invariance from transformation sequences. Neural Computation, 3:194-200, 1991.\n[5] A. Hyvärinen and P. O. Hoyer. Topographic independent component analysis. 1999. Submitted, available at http://www.cis.hut.fi/~aapo/.\n[6] A. Hyvärinen and P. O. Hoyer. Emergence of phase and shift invariant features by decomposition of natural images into independent feature subspaces. Neural Computation, 2000. (in press).\n[7] A. Hyvärinen, P. O. Hoyer, and M. Inki. The independence assumption: Analyzing the independence of the components by topography. In M. Girolami, editor, Advances in Independent Component Analysis. Springer-Verlag, 2000. (in press).\n[8] A. Hyvärinen and E. Oja. A fast fixed-point algorithm for independent component analysis. Neural Computation, 9(7):1483-1492, 1997.\n[9] C. Jutten and J. Herault. 
Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture. Signal Processing, 24:1-10, 1991.\n[10] T. Kohonen. Self-Organizing Maps. Springer-Verlag, Berlin, Heidelberg, New York, 1995.\n[11] T. Kohonen. Emergence of invariant-feature detectors in the adaptive-subspace self-organizing map. Biological Cybernetics, 75:281-291, 1996.\n[12] J. K. Lin. Factorizing multivariate function classes. In Advances in Neural Information Processing Systems, volume 10, pages 563-569. The MIT Press, 1998.\n[13] B. A. Olshausen and D. J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381:607-609, 1996.\n[14] E. P. Simoncelli and O. Schwartz. Modeling surround suppression in V1 neurons with a statistically-derived normalization model. In Advances in Neural Information Processing Systems 11, pages 153-159. MIT Press, 1999.\n[15] J. H. van Hateren and A. van der Schaaf. Independent component filters of natural images compared with simple cells in primary visual cortex. Proc. Royal Society ser. B, 265:359-366, 1998.\n\nFigure 1: Independent subspaces of natural image data. The model gives Gabor-like basis vectors for image windows. Every group of four basis vectors corresponds to one independent feature subspace, or complex cell. Basis vectors in a subspace are similar in orientation, location and frequency. In contrast, their phases are very different.
\n\nFigure 2: Topographic ICA of natural image data. This gives Gabor-like basis vectors as well. Basis vectors that are similar in orientation, location and/or frequency are close to each other. The phases of nearby basis vectors are very different, giving each neighborhood properties similar to a complex cell.\n", "award": [], "sourceid": 1670, "authors": [{"given_name": "Aapo", "family_name": "Hyv\u00e4rinen", "institution": null}, {"given_name": "Patrik", "family_name": "Hoyer", "institution": null}]}