{"title": "Statistics of Natural Images: Scaling in the Woods", "book": "Advances in Neural Information Processing Systems", "page_first": 551, "page_last": 558, "abstract": null, "full_text": "Statistics of Natural Images: Scaling in the Woods \n\nDaniel L. Ruderman* and William Bialek \n\nNEC Research Institute \n4 Independence Way \nPrinceton, N.J. 08540 \n\n* Current address: The Physiological Laboratory, Downing Street, Cambridge CB2 3EG, England. \n\nAbstract \n\nIn order to best understand a visual system one should attempt to characterize the natural images it processes. We gather images from the woods and find that these scenes possess an ensemble scale invariance. Further, they are highly non-Gaussian, and this non-Gaussian character cannot be removed through local linear filtering. We find that including a simple \"gain control\" nonlinearity in the filtering process makes the filter output quite Gaussian, meaning information is maximized at fixed channel variance. Finally, we use the measured power spectrum to place an upper bound on the information conveyed about natural scenes by an array of receptors. \n\n1 Introduction \n\nNatural stimuli are playing an increasingly important role in our understanding of sensory processing. This is because a sensory system's ability to perform a task is a statistical quantity which depends on the signal and noise characteristics. Recently several approaches have explored visual processing as it relates to natural images (Atick & Redlich '90, Bialek et al. '91, van Hateren '92, Laughlin '81, Srinivasan et al. '82). However, a good characterization of natural scenes is sorely lacking. In this paper we analyze images from the woods in an effort to close this gap. We 
further attempt to understand how a biological visual system should best encode these images. \n\n2 The Images \n\nOur images consist of 256 x 256 pixels $I(x)$ which are calibrated against luminance (see Appendix). We define the image contrast logarithmically as \n\n$$\\phi(x) = \\ln\\big(I(x)/I_0\\big),$$ \n\nwhere $I_0$ is a reference intensity defined for each image. We choose this constant such that $\\sum_x \\phi(x) = 0$; that is, the average contrast for each image is zero. Our analysis is of the contrast data $\\phi(x)$. \n\n3 Scaling \n\nRecent measurements (Field '87, Burton & Moorhead '87) suggest that ensembles of natural scenes are scale-invariant. This means that any quantity defined on a given scale has statistics which are invariant to any change in that scale. This seems sensible in light of the fact that the images are composed of objects at all distances, and so no particular angular scale should stand out. (Note that this does not imply that any particular image is fractal! Rather, the ensemble of scenes has statistics which are invariant to scale.) \n\n3.1 Distribution of Contrasts \n\nWe can test this scaling hypothesis directly by seeing how the statistics of various quantities change with scale. We define the contrast averaged over a box of size $N \\times N$ (pixels) to be \n\n$$\\phi_N = \\frac{1}{N^2} \\sum_{i,j=1}^{N} \\phi(i,j).$$ \n\nWe now ask: \"How does the probability $P(\\phi_N)$ change with $N$?\" \n\nIn the left graph of figure 1 we plot $\\log P(\\phi_N/\\phi_N^{\\mathrm{RMS}})$ for $N = 1, 2, 4, 8, 16, 32$ along with the parabola corresponding to a Gaussian of the same variance. By dividing out the RMS value we simply plot all the graphs on the same contrast scale.  
The graphs all lie atop one another, which means the contrast scales: the distribution's shape is invariant to a change in angular scale. Note that the probability is far from Gaussian, as the graphs have linear, and not parabolic, tails. Even after averaging nearly 1000 pixels (in the case of 32 x 32), it remains non-Gaussian. This breakdown of the central limit theorem implies that the pixels are correlated over very long distances. This is analogous to the physics of a thermodynamic system at a critical point. \n\nFigure 1: Left: Semi-log plot of $P(\\phi_N/\\phi_N^{\\mathrm{RMS}})$ for $N = 1, 2, 4, 8, 16, 32$ with a Gaussian of the same variance for comparison (solid line). Right: Semi-log plot of $P(G_N/\\bar{G}_N)$ for the same set of $N$'s with a Rayleigh distribution for comparison (solid line). \n\n3.2 Distribution of Gradients \n\nAs another example of scaling, we consider the probability distribution of image gradients. We define the magnitude of the gradient by a discrete approximation such that \n\n$$G(x) = |\\mathbf{G}(x)| \\approx |\\nabla\\phi(x)|.$$ \n\nWe examine this quantity over different scales by first rescaling the images as above and then evaluating the gradient at the new scale. We plot $\\log P(G_N/\\bar{G}_N)$ for $N = 1, 2, 4, 8, 16, 32$ in the right graph of figure 1, along with the Rayleigh distribution, $P \\sim G \\exp(-aG^2)$. If the images had Gaussian statistics, local gradients would be Rayleigh distributed. Note once again the scaling of the distribution. \n\n3.3 Power Spectrum \n\nScaling can also be demonstrated at the level of the power spectrum.  
If the ensemble is scale-invariant, then the spectrum should be of the form \n\n$$S(k) = \\frac{A}{k^{2-\\eta}},$$ \n\nwhere $k$ is measured in cycles/degree, and $S$ is the power spectrum averaged over orientations. \n\nThe spectrum is shown in figure 2 on log-log axes. It displays overlapping data from the two focal lengths, and shows that the spectrum scales over about 2.5 decades in spatial frequency. We determine the parameters as $A = (6.47 \\pm 0.13) \\times 10^{-3}\\,\\mathrm{deg}^{0.19}$ and $\\eta = 0.19 \\pm 0.01$. The integrated power spectrum up to 60 cycles/degree (the human resolution limit) gives an RMS contrast of about 30%. \n\nFigure 2: Power spectrum of the contrast of natural scenes (log-log plot). Horizontal axis: $\\log_{10}$ of spatial frequency (cycles/degree). \n\n4 Local Filtering \n\nThe early stages of vision consist of neurons which respond to local patches of images. What do the statistics of these local processing units look like? We convolve images with the filter shown in the left of figure 3, and plot the histogram of its output on a semi-log scale on the right of the figure. \n\nThe distribution is quite exponential over nearly 4 decades in probability. In fact, almost any local linear filter which passes no DC has this property, including center-surround receptive fields. Information theory tells us that it is best to send signals with Gaussian statistics down channels which have power constraints. It is of interest, then, to find some type of filtering which transforms the exponential distributions we find into Gaussian quantities. 
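As a rough illustration of the local filtering step in Section 4, the sketch below applies a 2 x 2 filter with no DC response to a contrast array and measures how far the output is from Gaussian. The checkerboard sign pattern, the function names, and the use of excess kurtosis as a summary statistic are all assumptions made for illustration; the text only states that the filter is local and passes no DC.

```python
# Hypothetical 2 x 2 filter whose weights sum to zero (no DC response).
# The sign pattern is an assumption; the paper does not list the weights.
KERNEL = [[1.0, -1.0],
          [-1.0, 1.0]]

def filter_2x2(phi):
    # phi: 2-D list of contrast values; returns the valid (unpadded) output.
    rows, cols = len(phi), len(phi[0])
    return [[sum(KERNEL[a][b] * phi[i + a][j + b]
                 for a in range(2) for b in range(2))
             for j in range(cols - 1)]
            for i in range(rows - 1)]

def excess_kurtosis(values):
    # 0 for a Gaussian, 3 for a two-sided exponential (Laplace) distribution.
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) - 3.0
```

Flattening the output of filter_2x2 and passing it to excess_kurtosis gives one number per image; values well above zero would be consistent with the exponential tails described above.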
\n\nFigure 3: Left: 2 x 2 local filter. Right: Semi-log plot of histogram of its output when filtering natural scenes. \n\nFigure 4: Left: Semi-log histogram of \"The Blue Danube\" with a Gaussian for comparison (dashed). Right: 5 x 5 center-surround filter region. \n\nMusic, as it turns out, has some similar properties. An amplitude histogram from 5 minutes of \"The Blue Danube\" is shown on the left of figure 4. It is almost precisely exponential over 4 decades in probability. We can guess what causes the excesses over a Gaussian distribution at the peak and the tails; it's the dynamics. When a quiet passage is played the amplitudes lie only near zero, and create the excess in the peak. When the music is loud the fluctuations are large, thus creating the tails. Most importantly, these quiet and loud passages extend coherently in time; so to remove the peak and tails, we can simply slowly adjust a \"volume knob\" to normalize the fluctuations. The images are made of objects which have coherent structure over space, and a similar localized dynamic occurs.  
To remove it, we need some sort of gain control. \n\nTo do this, we pass the images through a local filter and then normalize by the local standard deviation of the image (analogous to the volume of a sound passage): \n\n$$\\psi(x) = \\frac{\\phi(x) - \\bar{\\phi}(x)}{\\sigma(x)}.$$ \n\nHere $\\bar{\\phi}(x)$ is the mean image contrast in the $N \\times N$ region surrounding $x$, and $\\sigma(x)$ is the standard deviation within the same region (see the right of figure 4). \n\nFigure 5: Left: Semi-log plot of histogram of $\\psi$, with Gaussian for comparison (dashed). Right: Semi-log plot of histogram of gradients of $\\psi$ (in units of the mean), with Rayleigh distribution shown for comparison (dashed). \n\nWe find that for a value $N = 5$ (ratio of the negative surround to the positive center), the histograms of $\\psi$ are the closest to Gaussian (see the left of figure 5). Further, the histogram of gradients of $\\psi$ is very nearly Rayleigh (see the right of figure 5). These are both signatures of a Gaussian distribution. Functionally, this \"variance normalization\" procedure is similar to contrast gain control found in the retina and LGN (Benardete et al. '92). Could its role be in \"Gaussianizing\" the image statistics? 
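The variance normalization of Section 4 can be sketched as follows: each pixel has the mean of its surrounding window subtracted and is then divided by the standard deviation of that window. The function name, the square window centred on the pixel, the skipping of border pixels, and the treatment of flat windows are assumptions for illustration, not details taken from the paper.

```python
import math

def variance_normalize(phi, n=5):
    # psi(x) = (phi(x) - local mean) / local standard deviation, where both
    # statistics are taken over the n x n window centred on x (n odd).
    # Border pixels without a full window are skipped, and flat windows
    # (sigma = 0) are mapped to 0; both choices are assumptions.
    half = n // 2
    rows, cols = len(phi), len(phi[0])
    psi = []
    for i in range(half, rows - half):
        out_row = []
        for j in range(half, cols - half):
            window = [phi[a][b]
                      for a in range(i - half, i + half + 1)
                      for b in range(j - half, j + half + 1)]
            mean = sum(window) / len(window)
            var = sum((v - mean) ** 2 for v in window) / len(window)
            sigma = math.sqrt(var)
            out_row.append((phi[i][j] - mean) / sigma if sigma > 0 else 0.0)
        psi.append(out_row)
    return psi
```

Because the output is unchanged when the input contrast is offset or multiplied by a positive constant, the procedure behaves like the volume knob of the music analogy: loud and quiet regions are brought to the same scale before their statistics are pooled.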
\n\n5 Information in the Retina \n\nFrom the measured statistics we can place an upper bound on the amount of information an array of photoreceptors conveys about natural images. We make the following assumptions: \n\n\u2022 Images are Gaussian with the measured power spectrum. This places an upper bound on the entropy of natural scenes, and thus an upper bound on the information represented. \n\n\u2022 The receptors sample images in a hexagonal array with diffraction-limited optics. There is no aliasing. \n\n\u2022 Noise is additive, Gaussian, white, and independent of the image. \n\nThe output of the $n$th receptor is thus given by \n\n$$y_n = \\int d^2x\\, \\phi(x)\\, M(x - x_n) + \\eta_n,$$ \n\nwhere $x_n$ is the location of the receptor, $M(x)$ is the point-spread function of the optics, and $\\eta_n$ is the noise. For diffraction-limited optics, \n\n$$M(k) \\approx 1 - |k|/k_c,$$ \n\nwhere $k_c$ is the cutoff frequency of 60 cycles/degree. \n\nIn the limit of an infinite lattice, Fourier components are independent, and the total information is the sum of the information in each component: \n\n$$I = \\frac{A_c}{4\\pi} \\int_0^{k_c} dk\\, k \\log\\left[1 + \\frac{1}{A_c \\sigma^2} |M(k)|^2 S(k)\\right].$$ \n\nHere $I$ is the information per receptor, $A_c$ is the area of the unit cell in the lattice, and $\\sigma^2$ is the variance of the noise. \n\nWe take $S(k) = A/k^{2-\\eta}$, with $A$ and $\\eta$ taking their measured values, and express the noise level in terms of the signal-to-noise ratio in the receptor. In figure 6 we plot the information per receptor as a function of $SNR$ along with the information capacity (per receptor) of the photoreceptor lattice at that $SNR$, which is \n\n$$C = \\frac{1}{2} \\log\\left[1 + SNR\\right].$$ \n\nThe information conveyed is less than 2 bits per receptor per image, even at $SNR = 1000$.  
The redundancy of this representation is quite high, as seen by the gap between the curves; at least as much of the information capacity is being wasted as is being used. \n\nFigure 6: Information per receptor per image (in bits) as a function of $\\log_{10}(SNR)$ (lower line). Information capacity per receptor (upper line). \n\n6 Conclusions \n\nWe have shown that images from the forest have scale-invariant, highly non-Gaussian statistics. This is evidenced by the scaling of the non-Gaussian histograms and the power-law form of the power spectrum. Local linear filtering produces values with quite exponential probability distributions. In order to \"Gaussianize,\" we must use a nonlinear filter which acts as a gain control. This is analogous to contrast gain control, which is seen in the mammalian retina. Finally, an array of receptors which encodes these natural images conveys at most a few bits per receptor per image of information, even at high $SNR$. At an image rate of 50 per second, this places an information requirement of less than about 100 bits per second on a foveal ganglion cell. \n\nAppendix \n\nSnapshots were gathered using a Sony Mavica MVC-5500 still video camera equipped with a 9.5-123.5mm zoom lens. The red, green, and blue signals were combined according to the standard CIE formula $Y = 0.59\\,G + 0.30\\,R + 0.11\\,B$ to produce a grayscale value at each pixel. The quantity $Y$ was calibrated against incident luminance to produce the image intensity $I(x)$. The images were cropped to the central 256 x 256 region. 
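A minimal sketch of the preprocessing described in the Appendix and in Section 2: colour channels are combined with the quoted weights, and the intensity is converted to logarithmic contrast with zero mean over the image. The function names are hypothetical, and the calibration of Y against incident luminance is not reproduced here, so the sketch assumes the intensity values passed in are already calibrated and strictly positive.

```python
import math

def luminance(r, g, b):
    # Weights quoted in the Appendix: Y = 0.59 G + 0.30 R + 0.11 B.
    return 0.59 * g + 0.30 * r + 0.11 * b

def log_contrast(intensity):
    # phi(x) = ln(I(x) / I0), with I0 chosen so that phi sums to zero over
    # the image; that makes ln(I0) the mean of ln(I). Intensities must be > 0.
    logs = [[math.log(v) for v in row] for row in intensity]
    n = sum(len(row) for row in logs)
    log_i0 = sum(sum(row) for row in logs) / n
    return [[v - log_i0 for v in row] for row in logs]
```

Choosing the reference intensity as the geometric mean of the image is simply what the zero-sum condition on the contrast implies; no other normalization is assumed.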
\n\nThe dataset consists of 45 images taken at a 15mm focal length (images subtend 15\u00b0 of visual angle) and 25 images at an 80mm focal length (3\u00b0 of visual angle). All images were of distant objects to avoid problems of focus. Images were chosen by placing the camera at a random point along a path and rotating the field of view until no nearby objects appeared in the frame. The camera was tilted by less than 10\u00b0 up or down in an effort to avoid sky and ground. The forested environment (woods in New Jersey in springtime) consisted mainly of trees, rocks, hillside, and a stream. \n\nAcknowledgements \n\nWe thank H. B. Barlow, B. Gianulis, A. J. Libchaber, M. Potters, R. R. de Ruyter van Steveninck, and A. Schweitzer. Work was supported in part by a fellowship from the Fannie and John Hertz Foundation (to D.L.R.). \n\nReferences \n\nJ. J. Atick and N. Redlich. Towards a theory of early visual processing. Neural Computation, 2:308, 1990. \n\nE. A. Benardete, E. Kaplan, and B. W. Knight. Contrast gain control in the primate retina: P cells are not X-like, some M cells are. Vis. Neurosci., 8:483-486, 1992. \n\nW. Bialek, D. L. Ruderman, and A. Zee. The optimal sampling of natural images: a design principle for the visual system?, in Advances in Neural Information Processing Systems, 3, R. P. Lippmann, J. E. Moody and D. S. Touretzky, eds., 1991. \n\nG. J. Burton and I. R. Moorhead. Color and spatial structure in natural scenes. Applied Optics, 26:157-170, 1987. \n\nD. J. Field. Relations between the statistics of natural images and the response properties of cortical cells. J. Opt. Soc. Am. A, 4:2379, 1987. \n\nJ. H. van Hateren.  
Theoretical predictions of spatiotemporal receptive fields of fly LMCs, and experimental validation. J. Comp. Physiol. A, 171:157-170, 1992. \n\nS. B. Laughlin. A simple coding procedure enhances a neuron's information capacity. Z. Naturforsch., 36c:910-912, 1981. \n\nM. V. Srinivasan, S. B. Laughlin, and A. Dubs. Predictive coding: a fresh view of inhibition in the retina. Proc. R. Soc. Lond. B, 216:427-459, 1982. \n\n", "award": [], "sourceid": 835, "authors": [{"given_name": "Daniel", "family_name": "Ruderman", "institution": null}, {"given_name": "William", "family_name": "Bialek", "institution": null}]}