{"title": "Learning to Perceive Transparency from the Statistics of Natural Scenes", "book": "Advances in Neural Information Processing Systems", "page_first": 1271, "page_last": 1278, "abstract": null, "full_text": "Learning to Perceive Transparency from the Statistics of Natural Scenes\n\nAnat Levin\n\nAssaf Zomet\n\nYair Weiss\n\nSchool of Computer Science and Engineering\n\nThe Hebrew University of Jerusalem\n\n91904 Jerusalem, Israel\n\n{alevin,zomet,yweiss}@cs.huji.ac.il\n\nAbstract\n\nCertain simple images are known to trigger a percept of transparency:\nthe input image $I$ is perceived as the sum of two images\n$I(x, y) = I_1(x, y) + I_2(x, y)$. This percept is puzzling. First, why\ndo we choose the \"more complicated\" description with two images\nrather than the \"simpler\" explanation $I(x, y) = I_1(x, y) + 0$? Second,\ngiven the infinite number of ways to express $I$ as a sum of two\nimages, how do we compute the \"best\" decomposition?\nHere we suggest that transparency is the rational percept of a system\nthat is adapted to the statistics of natural scenes. We present\na probabilistic model of images based on the qualitative statistics\nof derivative filters and \"corner detectors\" in natural scenes and\nuse this model to find the most probable decomposition of a novel\nimage. The optimization is performed using loopy belief propagation.\nWe show that our model computes perceptually \"correct\"\ndecompositions on synthetic images and discuss its application to\nreal images.\n\n1 Introduction\n\nFigure 1a shows a simple image that evokes the percept of transparency. 
The\nimage is typically perceived as a superposition of two layers: either a light square\nwith a dark semitransparent square in front of it or a dark square with a light\nsemitransparent square in front of it.\n\nMathematically, our visual system is taking a single image $I(x, y)$ and representing\nit as the sum of two images:\n\n$$I_1(x, y) + I_2(x, y) = I(x, y) \\tag{1}$$\n\nWhen phrased this way, the decomposition is surprising. There is obviously an\ninfinite number of solutions to equation 1; how does our visual system choose one?\nWhy doesn\u2019t our visual system prefer the \"simplest\" explanation $I(x, y) = I_1(x, y) + 0$?\n\nFigure 1: a. A simple image that evokes the percept of transparency. b. A simple\nimage that does not evoke the percept of transparency.\n\nFigure 1b shows a similar image that does not evoke the percept of transparency.\nHere again there is an infinite number of solutions to equation 1 but our visual\nsystem prefers the single layer solution.\n\nStudies of the conditions for the percept of transparency go back to the very first\nresearch on visual perception (see [1] and references within). Research of this type has\nmade great progress in understanding the types of junctions and their effects (e.g.\nX junctions of a certain type trigger transparency, T junctions do not). However,\nit is not clear how to apply these rules to an arbitrary image.\n\nIn this paper we take a simple Bayesian approach. While equation 1 has an infinite\nnumber of possible solutions, if we have prior probabilities $P(I_1(x, y))$, $P(I_2(x, y))$\nthen some of these solutions will be more probable than others. We use the statistics\nof natural images to define simple priors and finally use loopy belief propagation\nto find the most probable decomposition. 
We show that while the model knows\nnothing about \"T junctions\" or \"X junctions\", it can generate perceptually correct\ndecompositions from a single image.\n\n2 Statistics of natural images\n\nA remarkably robust property of natural images that has received much attention\nlately is the fact that when derivative filters are applied to natural images, the filter\noutputs tend to be sparse [5, 7]. Figure 2 illustrates this fact: the histogram of the\nhorizontal derivative filter is peaked at zero and falls off much faster than a Gaussian.\nSimilar histograms are observed for vertical derivative filters and for the gradient\nmagnitude $|\\nabla I|$.\n\nThere are many ways to describe the non Gaussian nature of this distribution\n(e.g. high kurtosis, heavy tails). Figure 2b illustrates the observation made by\nMallat [4] and Simoncelli [8] that the distribution is similar to an exponential\ndensity with exponent less than 1. We show the log probability for densities of the\nform $p(x) \\propto e^{-x^\\alpha}$. We assume $x \\in [0, 100]$ and plot the log probabilities so that\nthey agree on $p(0)$ and $p(100)$. There is a qualitative difference between distributions\nfor which $\\alpha > 1$ (when the negative log probability is convex) and those for which $\\alpha < 1$\n(when it becomes concave). 
As figure 2d shows, the natural statistics for derivative\nfilters have the qualitative nature of a distribution $e^{-x^\\alpha}$ with $\\alpha < 1$.\n\n[Figure 2 panels: a. natural image; b. log prob against $x$ for Gaussian: $-x^2$, Laplacian: $-x$, $-x^{1/2}$ and $-x^{1/4}$; c, d. deriv filter; e, f. corner operator.]\n\nFigure 2: a. A natural image. c. Histogram of filter outputs. e. Histogram of corner\ndetector outputs. d, f. Log histograms.\n\nIn [9] the sparsity of derivative filters was used to decompose an image sequence as\na sum of two image sequences. Will this prior be sufficient for a single frame? Note\nthat decomposing the image in figure 1a into two layers does not change the output\nof derivative filters: exactly the same derivatives exist in the single layer solution\nas in the two layer solution. Thus we cannot appeal to the marginal histogram of\nderivative filters to explain the percept of transparency in this image.\n\nThere are two ways to go beyond marginal histograms of derivative filters. We\ncan either look at joint statistics of derivative filters at different locations or\norientations [6] or look at marginal statistics of more complicated feature detectors\n(e.g. [11]).\n\nWe looked at the marginal statistics of a \"corner detector\". 
The output of the\n\"corner detector\" at a given location $x_0, y_0$ is defined as:\n\n$$c(x_0, y_0) = \\det\\left( \\sum_{x,y} w(x, y) \\begin{pmatrix} I_x^2(x, y) & I_x(x, y) I_y(x, y) \\\\ I_x(x, y) I_y(x, y) & I_y^2(x, y) \\end{pmatrix} \\right) \\tag{2}$$\n\nwhere $w(x, y)$ is a small Gaussian window around $x_0, y_0$ and $I_x$, $I_y$ are the derivatives\nof the image.\n\nFigures 2e,f show the histogram of this corner operator on a typical natural image.\nAgain, note that it has the qualitative statistic of a distribution $e^{-x^\\alpha}$ for $\\alpha < 1$.\n\nTo get a more quantitative description of the statistics we used maximum likelihood\nto fit a distribution of the form $P(x) = \\frac{1}{Z} e^{-a x^\\alpha}$ to gradient magnitudes and corner\ndetector histograms in a number of images. We found that the histograms shown\nin figure 2 are typical: for both gradients and corner detectors the exponent was\nless than 1 and the exponent for the corner detector was smaller than that of the\ngradients. Typical exponents were 0.7 for the derivative filter and 0.25 for the corner\ndetector. The scaling parameter $a$ of the corner detector was typically larger than\nthat of the gradient magnitude.\n\n3 Simple prior predicts transparency\n\nMotivated by the qualitative statistics observed in natural images we now define a\nprobability distribution over images. We define the log probability of an image by\nmeans of a probability over its gradients:\n\n$$\\log P(I_x, I_y) = -\\log Z - \\sum_{x,y} \\left( |\\nabla I(x, y)|^\\alpha + \\eta c(x, y)^\\beta \\right) \\tag{3}$$\n\nwith $\\alpha = 0.7$, $\\beta = 0.25$. The parameter $\\eta$ was determined by the ratio of the scaling\nparameters in the corner and gradient distributions.\n\nGiven a candidate decomposition of an image $I$ into $I_1$ and $I_2 = I - I_1$ we define\nthe log probability of the decomposition as the sum of the log probabilities of the\ngradients of $I_1$ and $I_2$. 
Of course this is only an approximation: we are ignoring\ndependencies between the gradients across space and orientation. Although this is\na weak prior, one can ask: is this enough to predict transparency? That is, is the\nmost probable interpretation of figure 1a one with two layers and the most probable\ndecomposition of figure 1b one with a single layer?\n\nAnswering this question requires finding the global maximum of equation 3. To\ngain some intuition we calculated the log probability of a one dimensional family\nof solutions. We defined $s(x, y)$ as the image of a single white square in the same\nlocation as the bottom right square in figure 1a,b. We considered decompositions\nof the form $I_1 = \\gamma s(x, y)$, $I_2 = I - I_1$ and evaluated the log probability for values of\n$\\gamma$ between $-1$ and $2$.\n\nFigure 3a shows the result for figure 1a. The most probable decomposition is the\none that agrees with the percept: $\\gamma = 1$, one layer for the white square and another\nfor the gray square. Figure 3b shows the result for figure 1b. The most probable\ndecomposition again agrees with the percept: $\\gamma = 0$, so that one layer is zero and\nthe second contains the full image.\n\n3.1 The importance of being non Gaussian\n\nEquation 3 can be verbally described as preferring decompositions where the total\nedge and corner detector magnitudes are minimal. Would any cost function that\nhas this preference give the same result?\n\nFigure 3c shows the result with $\\alpha = \\beta = 2$ for the transparency figure (figure 1a).\nThis would be the optimal interpretation if the marginal histograms of edge and\ncorner detectors were Gaussian. Now the optimal interpretation indeed contains\ntwo layers but they are not the ones that humans perceive. Thus the non Gaussian\nnature of the histograms is crucial for getting the transparency percept. 
Similar\n\"non perceptual\" decompositions are obtained with other values of $\\alpha, \\beta > 1$.\n\nWe can get some intuition for the importance of having exponents smaller than\n1 from the following observation which considers the analog of the transparency\nproblem with scalars. We wish to solve the equation $a + b = 1$ and we have a prior\nover positive scalars of the form $P(x)$.\n\nObservation: The MAP solution to the scalar transparency problem is obtained\nwith $a = 1, b = 0$ or $a = 0, b = 1$ if and only if $-\\log P(x)$ is concave.\n\nThe proof follows directly from the definition of concavity.\n\n[Figure 3 panels a, b, c: $-\\log(\\mathrm{prob})$ plotted against $\\gamma$ from $-1$ to $2$.]\n\nFigure 3: a-b. Negative log probability (equation 3) for a sequence of decompositions\nof figure 1a,b respectively. The first layer is always a single square with\ncontrast $\\gamma$ and the second layer is shown in the insets. c. Negative log probability\n(equation 3) for a sequence of decompositions of figure 1a with $\\alpha = \\beta = 2$.\n\n4 Optimization using loopy BP\n\nFinding the most likely decomposition requires a highly nonlinear optimization. We\nchose to discretize the problem and use max-product loopy belief propagation to find\nthe optimum. We defined a graphical model in which every node $g_i$ corresponded to\na discretization of the gradient of one layer $I_1$ at that location, $g_i = (g_{ix}, g_{iy})^T$. For\nevery value of $g_i$ we defined $f_i$ which represents the gradient of the second layer at\nthat location: $f_i = (I_x, I_y)^T - g_i$. 
Thus the two gradient fields $\\{g_i\\}$, $\\{f_i\\}$ represent\na valid decomposition of the input image $I$.\n\nThe joint probability is given by:\n\n$$P(g) = \\frac{1}{Z} \\prod_i \\Psi_i(g_i) \\prod_{\\langle ijkl \\rangle} \\Psi_{ijkl}(g_i, g_j, g_k, g_l) \\tag{4}$$\n\nwhere $\\langle ijkl \\rangle$ refers to four adjacent pixels that form a 2x2 local square.\n\nThe local potential $\\Psi_i(g_i)$ is based on the histograms of derivative filters:\n\n$$\\Psi_i(g_i) = e^{(-|g_i|^\\alpha - |f_i|^\\alpha)/T} \\tag{5}$$\n\nwhere $T$ is an arbitrary system \"temperature\".\n\nThe fourway potential $\\Psi_{ijkl}(g_i, g_j, g_k, g_l)$ is based on the histogram of the corner\noperator:\n\n$$\\Psi_{ijkl}(g_i, g_j, g_k, g_l) = e^{-\\frac{\\eta}{T} \\left( \\det(g_i g_i^T + g_j g_j^T + g_k g_k^T + g_l g_l^T)^\\beta + \\det(f_i f_i^T + f_j f_j^T + f_k f_k^T + f_l f_l^T)^\\beta \\right)} \\tag{6}$$\n\nTo enforce integrability of the gradient fields the fourway potential is set to zero\nwhen $g_i, g_j, g_k, g_l$ violate the integrability constraint (cf. [3]).\n\nThe graphical model defined by equation 4 has many loops. Nevertheless, motivated\nby the recent results on similar graphs [2, 3], we ran the max-product belief propagation\nalgorithm on it. The max-product algorithm finds a gradient field $\\{g_i\\}$ that\nis a local maximum of equation 4 with respect to a large neighbourhood [10]. This\ngradient field also defines the complementary gradient field $\\{f_i\\}$ and finally we\nintegrate the two gradient fields to find the two layers. Since equation 4 is completely\nsymmetric in $\\{f\\}$ and $\\{g\\}$ we break the symmetry by requiring that the gradient\nin a single location $g_{i_0}$ belong to layer 1.\n\nIn order to run BP we need to somehow discretize the space of possible gradients\nat each pixel. Similar to the approach taken in [2] we use the local potentials to\nsample a small number of candidate gradients at each pixel. Since the local potential\npenalizes non zero gradients, the most probable candidates are $g_i = (I_x, I_y)$ and\n$g_i = (0, 0)$. We also added two more candidates at each pixel, $g_i = (I_x, 0)$ and\n$g_i = (0, I_y)$. With this discretization there are still an exponential number of\npossible decompositions of the image. We have found that the results are unchanged\nwhen more candidates are introduced at each pixel.\n\n[Figure 4 panels: Input $I$; Output $I_1$; Output $I_2$.]\n\nFigure 4: Output of the algorithm on synthetic images. The algorithm effectively\nsearches over an exponentially large number of possible decompositions and chooses\ndecompositions that agree with the percept.\n\nFigure 4 shows the output of the algorithm on the two images in figure 1. An\nanimation that illustrates the dynamics of BP on these images is available at\nwww.cs.huji.ac.il/~yweiss. Note that the algorithm is essentially searching\nexponentially many decompositions of the input images and knows nothing about \"X\njunctions\" or \"T junctions\" or squares. Yet it finds the decompositions that are\nconsistent with the human percept.\n\nWill our simple prior also allow us to decompose a sum of two real images? We\nfirst tried a one dimensional family of solutions as in figure 3. We found that for\nreal images that have very little texture (e.g. figure 5b) the maximal probability\nsolution is indeed obtained at the perceptually correct solution. However, nearly\nany other image that we tried had some texture and on such images the model failed\n(e.g. figure 5a). When there is texture in both layers, the model always prefers a one layer\ndecomposition: the input image plus a zero image. To understand this failure,\nrecall that the model prefers decompositions that have few corners and few edges.\nAccording to the simple \"edge\" and \"corner\" operators that we have used, real\nimages have edges and corners at nearly every pixel so the two layer decomposition\nhas twice as many edges and corners as the one layer decomposition. 
To decompose\ngeneral real images we need to use more sophisticated features to define our prior.\n\nFigure 5: When we sum two arbitrary images (e.g. in a.) the model usually prefers\nthe one layer solution. This is because of the texture that results in gradients and\ncorners at every pixel. For real images that are relatively texture free (e.g. in b.)\nthe model does prefer splitting into two layers (c. and d.).\n\nEven for images with little texture standard belief propagation with synchronous\nupdates did not converge. Significant manual tweaking was required to get BP to\nconverge. First, we manually divided the input image into smaller patches and ran\nBP separately on each patch. Second, to minimize discretization artifacts we used\na different number of gradient candidates at each pixel and always included the\ngradients of the original images in the list of candidates at that pixel. Third, to\navoid giving too much weight to corners and edges in textured regions, we increased\nthe temperature at pixels where the gradient magnitude was not a local maximum.\nThe results are shown at the bottom of figure 5. In preliminary experiments we have found\nthat similar results can be obtained with far less tweaking when we use generalized\nbelief propagation to do the optimization.\n\n5 Discussion\n\nThe percept of transparency is a paradigmatic example of the ill-posedness of vision:\nthe number of equations is half the number of unknowns. Nevertheless our visual\nsystems reliably and effectively compute a decomposition of a single image into\ntwo images. 
In this paper we have argued that this perceptual decomposition may\ncorrespond to the most probable decomposition using a simple prior over images\nderived from natural scene statistics.\n\nWe were surprised by the mileage we got out of the very simple prior we used: even\nthough it only looks at two operators (gradients and cornerness) it can generate\nsurprisingly powerful predictions. However, our experiments with real images show\nthat this simple prior is not powerful enough. In future work we would like to\nadd additional features. One way to do this is by defining features that look for\n\"texture edges\" and \"texture corners\" and measuring their statistics in real images.\nA second way to approach this is to use a full exponential family maximum likelihood\nalgorithm (e.g. [11]) that automatically learns which operators to look at as well\nas the weights on the histograms.\n\nReferences\n\n[1] E.H. Adelson. Lightness perception and lightness illusions. In M. Gazzaniga,\neditor, The new cognitive neurosciences, 2000.\n\n[2] W.T. Freeman and E.C. Pasztor. Learning to estimate scenes from images.\nIn M.S. Kearns, S.A. Solla, and D.A. Cohn, editors, Adv. Neural Information\nProcessing Systems 11. MIT Press, 1999.\n\n[3] B.J. Frey, R. Koetter, and N. Petrovic. Very loopy belief propagation for\nunwrapping phase images. In Adv. Neural Information Processing Systems 14,\n2001.\n\n[4] S. Mallat. A theory for multiresolution signal decomposition: the wavelet\nrepresentation. IEEE Trans. PAMI, 11:674\u2013693, 1989.\n\n[5] B.A. Olshausen and D.J. Field. Emergence of simple-cell receptive field properties\nby learning a sparse code for natural images. Nature, 381:607\u2013608, 1996.\n\n[6] J. Portilla and E.P. Simoncelli. A parametric texture model based on joint\nstatistics of complex wavelet coefficients. Int\u2019l J. Comput. Vision, 40(1):49\u201371,\n2000.\n\n[7] E.P. Simoncelli. 
Statistical models for images: compression, restoration and synthesis.\nIn Proc. Asilomar Conference on Signals, Systems and Computers, pages\n673\u2013678, 1997.\n\n[8] E.P. Simoncelli. Bayesian denoising of visual images in the wavelet domain. In\nP. M\u00fcller and B. Vidakovic, editors, Wavelet based models, 1999.\n\n[9] Y. Weiss. Deriving intrinsic images from image sequences. In Proc. Intl. Conf.\nComputer Vision, pages 68\u201375, 2001.\n\n[10] Y. Weiss and W.T. Freeman. On the optimality of solutions of the max-product\nbelief propagation algorithm in arbitrary graphs. IEEE Transactions\non Information Theory, 47(2):723\u2013735, 2001.\n\n[11] Song Chun Zhu, Ying Nian Wu, and David Mumford. Minimax entropy principle\nand its application to texture modeling. Neural Computation, 9(8):1627\u20131660, 1997.\n", "award": [], "sourceid": 2157, "authors": [{"given_name": "Anat", "family_name": "Levin", "institution": null}, {"given_name": "Assaf", "family_name": "Zomet", "institution": null}, {"given_name": "Yair", "family_name": "Weiss", "institution": null}]}