Bayesian Model of Surface Perception

Advances in Neural Information Processing Systems, pp. 787-793

William T. Freeman
MERL, Mitsubishi Electric Res. Lab.
201 Broadway
Cambridge, MA 02139
freeman@merl.com

Paul A. Viola
Artificial Intelligence Lab
Massachusetts Institute of Technology
Cambridge, MA 02139
viola@ai.mit.edu

Abstract

Image intensity variations can result from several different object surface effects, including shading from 3-dimensional relief of the object, or paint on the surface itself. An essential problem in vision, which people solve naturally, is to attribute the proper physical cause, e.g. surface relief or paint, to an observed image. We addressed this problem with an approach combining psychophysical and Bayesian computational methods.

We assessed human performance on a set of test images, and found that people made fairly consistent judgements of surface properties. Our computational model assigned simple prior probabilities to different relief or paint explanations for an image, and solved for the most probable interpretation in a Bayesian framework. The ratings of the test images by our algorithm compared surprisingly well with the mean ratings of our subjects.

1 Introduction

When people study a picture, they can judge whether it depicts a shaded, 3-dimensional surface, or simply a flat surface with markings or paint on it. The two images shown in Figure 1 illustrate this distinction [1]. To many observers Figure 1a appears to be a raised plateau lit from the left. Figure 1b is simply a re-arrangement of the local features of 1a, yet it does not give an impression of shape or depth. There is no simple correct answer for this problem; either of these images could be explained as marks on paper, or as illuminated shapes.
Nevertheless, people tend to make particular judgements of shape or reflectance. We seek an algorithm that arrives at those same judgements.

There are many reasons to study this problem. Disentangling shape and reflectance is a prototypical underdetermined vision problem, which biological vision systems routinely solve. Insights into this problem may apply to other vision problems as well. A machine that could interpret images as people do would have many applications, such as the interactive editing and manipulation of images. Finally, there is a large body of computer vision work on "shape from shading": inferring the 3-dimensional shape of a shaded object [4]. Virtually every such algorithm assumes that all image intensity changes are caused by shading; these algorithms fail for any image with reflectance changes. To bring this body of work into practical use, we need to be able to disambiguate shading from reflectance changes.

There has been very little work on this problem. Sinha and Adelson [9] examined a world of painted polyhedra, and used consistency constraints to identify regions of shape and reflectance changes. Their consistency constraints involved specific assumptions which need not always hold and may be better described in a probabilistic framework. In addition, we seek a solution for more general, greyscale images.

Our approach combines psychophysics and computational modeling. First we review the physics of image formation and describe the under-constrained surface perception problem. We then describe an experiment to measure the interpretations of surface shading and reflectance among different individuals. We will see that the judgements are fairly consistent across individuals and can be averaged to define "ground truth" for a set of test images. Our approach to modeling the human judgements is Bayesian.
We begin by formulating prior probabilities for shapes and reflectance images, in the spirit of recent work on the statistical modeling of images [5, 8, 11]. Using these priors, the algorithm then determines whether an image is more likely to have been generated by a 3D shape or as a pattern of reflectance. We compare our algorithm's performance to that of the human subjects.

Figure 1: Images (a) and (b), designed by Adelson [1], are nearly the same everywhere, yet give different percepts of shading and reflectance. (a) looks like a plateau, lit from the left; (b) looks like marks on paper. Illustrating the under-constrained nature of perception, both images can be explained either by reflectance changes on paper (they are), or, under appropriate lighting conditions, by the shapes (c) and (d), respectively (vertical scale exaggerated).

2 Physics of Imaging

One simple model for the generation of an image from a three-dimensional shape is the Lambertian model:

    I(x, y) = R(x, y) (l . n(x, y)),                                  (1)

where I(x, y) is an image indexed by pixel location, n(x, y) is the surface normal at every point on the surface, conveniently indexed by the pixel to which that surface patch projects, l is a unit vector that points in the direction of the light source, and R(x, y) is the reflectance at every point on the surface.¹ A patch of surface is brighter if the light shines onto it directly and darker if the light shines on it obliquely. A patch can also be dark simply because it is painted with a darker pigment. The shape of the object is probably more easily described as a depth map z(x, y), from which n(x, y) is computed.

The classical "shape from shading" task attempts to compute z from I given knowledge of l and assuming R is everywhere constant.
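As a concrete illustration (not part of the paper), the Lambertian model of Eq. (1) can be rendered from a depth map in a few lines of numpy. The depth map, reflectance, and light direction below are invented examples; `lambertian_render` is a hypothetical helper name.

```python
import numpy as np

def lambertian_render(z, reflectance, light):
    """Render Eq. (1): I(x, y) = R(x, y) * (l . n(x, y)).

    z           -- depth map, shape (H, W)
    reflectance -- R(x, y), shape (H, W)
    light       -- unit 3-vector pointing toward the light source
    """
    # Surface normals from depth gradients: n ~ (-dz/dx, -dz/dy, 1), normalized.
    dzdy, dzdx = np.gradient(z)
    n = np.stack([-dzdx, -dzdy, np.ones_like(z)], axis=-1)
    n /= np.linalg.norm(n, axis=-1, keepdims=True)
    # l . n, clamped at zero for surface patches facing away from the light.
    shading = np.clip(n @ light, 0.0, None)
    return reflectance * shading

# Example: a smooth raised bump with uniform "sculptor" reflectance,
# lit obliquely from the left (all values are made up for illustration).
y, x = np.mgrid[-1:1:64j, -1:1:64j]
z = np.exp(-4.0 * (x**2 + y**2))
R = np.ones_like(z)
light = np.array([-0.5, 0.0, 1.0])
light /= np.linalg.norm(light)
image = lambertian_render(z, R, light)
```

Because the light comes from the left, the left-facing slopes of the bump render brighter than the right-facing ones, giving the shaded appearance described for Figure 1a.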
Notice that the problem is "ill-posed": while I(x, y) does constrain n(x, y), it is not sufficient to uniquely determine the surface normal at each pixel. Some assumption about global properties of z is necessary to condition the problem. If R is allowed to vary, the problem becomes even more under-constrained. For example, R = I and n(x, y) = l is a valid solution for every image. This is the "all reflectance" hypothesis, where the inferred surface is flat and all of the image variation is due to reflectance. Interestingly, there is also an "all shape" solution for every image, where R = 1 and I(x, y) = l . n(x, y) (see Figure 1 for examples of such shapes).

Since the relationship between z and I is non-linear, "shape from shading" cannot be solved directly and requires a time-consuming search procedure. For our computational experiments we seek a rendering model for shapes which simplifies the mathematics, yet maintains the essential ambiguities of the problem. We use the approximations of linear shading [6]. This involves two approximations. First, that the rendered image I(x, y) is some function, G, only of the surface slope at any point:

    I(x, y) = G(∂z/∂x, ∂z/∂y).                                        (2)

The second approximation is that the rendering function G itself is a linear function of the surface slopes:

    G(∂z/∂x, ∂z/∂y) ≈ k1 + k2 ∂z/∂x + k3 ∂z/∂y.                       (3)

Under linear shading, finding a shape which explains a given image is a trivial integration along the direction of the assumed light source. Despite this simplicity, images rendered under linear shading appear fairly realistically shaded [6].

3 Psychophysics

We used a survey to assess subjects' image judgements. We made a set of 60 test images, using the Canvas and Photoshop programs to generate and manipulate the images. Our goal was to create a set of images with varying degrees of shadedness.
We sought to assess to what extent each subject saw each image as created by shading changes or reflectance changes. Each of our 18 naive observers was given a 4-page survey showing the images in a different random order.

¹ Note: we assume orthographic projection, a distant light source, and no shadowing.

To explain the problem of image interpretation quickly to naive subjects, we used a concrete story (Adelson's Theater Set Shop analogy [2] is a related didactic example). The survey instructions were as follows:

    Pretend that each of the following pictures is a photograph of work made by either a painter or a sculptor.

    The painter could use paint, markers, air brushes, computer, etc., to make any kind of mark on a flat canvas. The paint had no 3-dimensionality; everything was perfectly flat.

    The sculptor could make 3-dimensional objects, but could make no markings on them. She could mold, sculpt, and scrape her sculptures, but could not draw or paint. All the objects were made out of a uniform plaster material and were made visible by lighting and shading effects.

The subjects used a 5-point rating scale to indicate whether each image was made by the painter (P) or sculptor (S): S, S?, ?, P?, P.

3.1 Survey Results

We examined a non-parametric comparison of the image ratings, the rank order correlation (the linear correlation of image rankings, in order of shapeness, by each observer) [7]. Over all possible pairings of subjects, the rank order correlations ranged from 0.3 to 0.9, averaging 0.65. All of these correlations were statistically significant, most at the 0.0001 level. We concluded that for our set of test images, people do give a very similar set of interpretations of shading and reflectance.
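For reference, the rank order correlation used above is the Pearson correlation of the two rank vectors; a minimal sketch follows. The two subjects' rating vectors are invented stand-ins, not the survey data, and ties are assumed absent (a full implementation, as in [7], would average tied ranks).

```python
import numpy as np

def rank_order_correlation(a, b):
    """Spearman rank correlation: Pearson correlation of the rank vectors.

    Assumes no tied values; argsort-of-argsort converts values to ranks.
    """
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    ra -= ra.mean()
    rb -= rb.mean()
    return float((ra @ rb) / np.sqrt((ra @ ra) * (rb @ rb)))

# Hypothetical shapeness ratings by two subjects for six images.
subj1 = np.array([5.0, 3.0, 1.0, 2.0, 4.0, 6.0])
subj2 = np.array([6.0, 4.0, 2.0, 1.0, 3.0, 5.0])
rho = rank_order_correlation(subj1, subj2)
```

A rho near 1 means the two subjects order the images by shapeness almost identically, which is the sense in which the paper's subject pairs agreed (0.3 to 0.9, mean 0.65).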
We assigned a numerical value to each of the 5 survey responses (S=2; S?=1; ?=0; P?=-1; P=-2) and found the average numerical "shadedness" score for each image. Figure 2 shows a histogram of the survey responses for each image, ordered in decreasing order of shadedness. The two images of Figure 1 had average scores of 1.7 and -1.6, respectively, confirming the impressions of shading and reflectance. There was good consensus on the rankings of the most paint-like and most sculpture-like images; the middle images showed a higher score variance. The rankings by each individual showed a strong correlation with the rankings by the average of the remaining subjects, ranging from 0.6 to 0.9. Figure 4 shows the histogram of those correlations. The ordering of the images by the average of the subjects' responses provides a "ground truth" with which to compare the rankings of our algorithm. Figure 3, left, shows a randomly chosen subset of the sorted images, in decreasing order of assessed sculptureness.

4 Algorithm

We will assume that people are choosing the most probable interpretation of the observed image. We adopt a Bayesian approach and calculate the most probable interpretation for each image under a particular set of prior probabilities for images and shapes. To parallel the choices we gave our subjects, we choose between interpretations that account for the image entirely by shape changes, or entirely by reflectance changes. Thus, our images are either a rendered shape, multiplied by a uniform reflectance image, or a flat shape, multiplied by some non-uniform reflectance image.

Figure 2: Histogram of survey responses.
Intensity shows the number of responses of each score (vertical scale) for each image (horizontal, sorted in increasing order of shapeness).

To find the most probable interpretation, given an image, we need to assign prior probabilities to shape and reflectance configurations. There has been recent interest in characterizing the probabilities of images by the expected distributions of subband coefficient values [5, 8, 11]. The statistical distribution of bandpass linear filter outputs, for natural images, is highly kurtotic; the output is usually small, but in rare cases it takes on very large values. This non-gaussian behavior is not a property of the filter operation, because filtered "random" images appear gaussian. Rather, it is a property of the structure of natural images. An exponential distribution, P(c) ∝ e^(-|c|), where c is the filter coefficient value, is a reasonable model. These priors have been used in texture synthesis, noise removal, and receptive field modeling. Here, we apply them to the task of scene interpretation.

We explored using a very simple image prior:

    P(I) ∝ exp( - Σ_{x,y} sqrt( (∂I/∂x)² + (∂I/∂y)² ) ).              (4)

Here we treat the image derivative as an image subband corresponding to a very simple filter. We applied this image prior both to reflectance images, I(x, y), and to range images, z(x, y).

For any given picture, we seek to decide whether a shape or a reflectance explanation is more probable. The proper Bayesian approach would be to integrate the prior probabilities of all shapes which could explain the image in order to arrive at the total probability of a shape explanation. (The reflectance explanation R is unique: the image itself.)
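A minimal sketch of the prior in Eq. (4): the unnormalized log prior is the negative sum of gradient magnitudes, so images with sparse, small gradients score higher. The ramp and noise test images below are synthetic stand-ins, not the paper's stimuli.

```python
import numpy as np

def log_prior(im):
    """Unnormalized log of Eq. (4): -sum_xy sqrt((dI/dx)^2 + (dI/dy)^2).

    Applies equally to a reflectance image I(x, y) or a range image z(x, y).
    """
    dy, dx = np.gradient(im)
    return float(-np.sum(np.sqrt(dx**2 + dy**2)))

# A smooth ramp accumulates little total gradient magnitude; a noisy image
# pays a penalty for every pixel-to-pixel fluctuation, so it is far less
# probable under this prior.
ramp = np.tile(np.linspace(0.0, 1.0, 32), (32, 1))
noise = np.random.default_rng(0).uniform(0.0, 1.0, (32, 32))
```

Under this prior, `log_prior(ramp)` exceeds `log_prior(noise)`, the sense in which simple, smooth explanations are preferred.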
We employed a computationally simpler procedure, a very rough approximation to the proper calculation: we evaluated the prior probability, P(S), of the single most probable shape explanation, S, for the image. Using the ratio test of a binary hypothesis, we formed a shapeness index, J, from the ratio of the probabilities for the shape and reflectance explanations, J = P(S)/P(R). The index J was used to rank the test images by shapeness.

We need to find the most probable shape explanation. The overall log likelihood of a shape, z, given an image is, using the linear shading approximation of Eq. (3):

    log P(z, k1, k2, k3 | I) = log P(I | z, k1, k2, k3) + log P(z) + c
                             = - Σ_{x,y} (I - k1 - k2 ∂z/∂x - k3 ∂z/∂y)² - Σ_{x,y} sqrt( (∂z/∂x)² + (∂z/∂y)² ) + c,    (5)

where c is a normalization constant. We use a multi-scale gradient descent algorithm that simultaneously determines the optimal shape and illumination parameters for an image (similar to that used by [10]). The optimization procedure has three stages, starting with a quarter-resolution version of I, and moving to the half and then full resolution. The solution found at a low resolution is interpolated up to the next level and used as a starting point for the next step in the optimization. In our experiments images are 128x128 pixels. The optimization procedure takes 4000 descent steps at each resolution level.

Figure 3: 28 of the 60 test images, arranged in decreasing order of subjects' shapeness ratings. Left: Subjects' rankings. Right: Algorithm's rankings.

5 Results

Surprisingly, the simple prior probability of Eq. (4) accounts for much of the ratings of shading or paint by our human subjects. Figure 3 compares the rankings (shown in raster scan order) of a subset of the test images for our algorithm and the average of our subjects. The overall agreement is good.
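The ratio test behind the shapeness index can be sketched end to end. This is a deliberate simplification of the paper's method: instead of the multi-scale gradient descent over z and (k1, k2, k3), it fixes (k1, k2, k3) = (0, 1, 0) and uses the exact linear-shading inverse, a cumulative sum along the light direction. The bump and step test images are hypothetical.

```python
import numpy as np

def log_prior(im):
    # Eq. (4), applied to a 2-D array (image or depth map).
    dy, dx = np.gradient(im)
    return float(-np.sum(np.sqrt(dx**2 + dy**2)))

def log_shapeness_index(image, light_axis=1):
    """log J = log P(S) - log P(R).

    P(R): Eq. (4) prior on the image itself (the "all reflectance" hypothesis).
    P(S): the same prior on the shape that renders to the image under linear
    shading with assumed (k1, k2, k3) = (0, 1, 0), recovered by integrating
    the image along the light direction.
    """
    z = np.cumsum(image, axis=light_axis)
    return log_prior(z) - log_prior(image)

# A linearly shaded smooth bump vs. a painted step edge (both 16x16).
y, x = np.mgrid[0:16, 0:16]
bump = 3.0 * np.exp(-((x - 8.0)**2 + (y - 8.0)**2) / 8.0)
shaded = np.gradient(bump)[1]          # I = dz/dx: rendering of the bump
painted = (x >= 8).astype(float)       # reflectance change on a flat surface
```

The step image requires a long wedge-shaped "shape" whose gradient penalty is large, so its index is strongly negative (paint wins); the shaded bump integrates back to a smooth, high-prior shape and ranks as more shape-like, mirroring how the algorithm separates Figures 1a and 1b.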
Figure 4 compares two measures: (1) the correlations (dark bars) of the subjects' individual ratings to the mean subject rating, with (2) the correlation of our algorithm's ratings to the mean subject rating. Subjects show correlations between 0.6 and 0.9; our Bayesian algorithm showed a correlation of 0.64. Treating the mean subjects' ratings as the right answer, our algorithm did worse than most subjects but not as badly as some.

Figure 1 illustrates how our algorithm chooses an interpretation for an image. If a simple shape explains an image, such as the shape explanation (c) for image (a), the shape gradient penalties will be small, assigning a high prior probability to that shape. If a complicated shape (d) is required to explain a simple image (b), the low prior probability of the shape and the high prior probability of the reflectance image will favor a "paint" explanation.

We noted that many of the shapes inferred from paint-like images showed long ridges coincidentally aligned with the assumed light direction. The assumption of a generic light direction can be applied in a Bayesian framework [3] to penalize such coincidental alignments. We speculate that such a term would further penalize those unlikely shape interpretations and may improve algorithm performance.

Figure 4: Correlation of individual subjects' image ratings with the mean rating (bars), compared with the correlation of the algorithm's rating with the mean rating (dashed line).

Acknowledgements

We thank E. Adelson, D. Brainard, and J. Tenenbaum for helpful discussions.

References

[1] E. H. Adelson, 1995. Personal communication.

[2] E. H. Adelson and A. P. Pentland. The perception of shading and reflectance. In B.
Blum, editor, Channels in the Visual Nervous System: Neurophysiology, Psychophysics, and Models, pages 195-207. Freund Publishing, London, 1991.

[3] W. T. Freeman. The generic viewpoint assumption in a framework for visual perception. Nature, 368(6471):542-545, April 7, 1994.

[4] B. K. P. Horn and M. J. Brooks. Shape from Shading. MIT Press, Cambridge, MA, 1989.

[5] B. A. Olshausen and D. J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381:607-609, 1996.

[6] A. P. Pentland. Linear shape from shading. Intl. J. Comp. Vis., 1(4):153-162, 1990.

[7] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes in C. Cambridge Univ. Press, 1992.

[8] E. P. Simoncelli and E. H. Adelson. Noise removal via Bayesian wavelet coring. In 3rd Annual Intl. Conf. on Image Processing, Lausanne, Switzerland, 1996. IEEE.

[9] P. Sinha and E. H. Adelson. Recovering reflectance and illumination in a world of painted polyhedra. In Proc. 4th Intl. Conf. Computer Vision, pages 156-163. IEEE, 1993.

[10] D. Terzopoulos. Multilevel computational processes for visual surface reconstruction. Comp. Vis., Graphics, Image Proc., 24:52-96, 1983.

[11] S. C. Zhu and D. Mumford. Learning generic prior models for visual computation. Submitted to IEEE Trans. PAMI, 1997.