{"title": "Recovering Intrinsic Images from a Single Image", "book": "Advances in Neural Information Processing Systems", "page_first": 1367, "page_last": 1374, "abstract": null, "full_text": "Recovering Intrinsic Images from a Single Image\n\nMarshall F Tappen\n\nWilliam T Freeman\n\nEdward H Adelson\n\nMIT Arti\ufb01cial Intelligence Laboratory\n\nCambridge, MA 02139\n\nmtappen@ai.mit.edu, wtf@ai.mit.edu, adelson@ai.mit.edu\n\nAbstract\n\nWe present an algorithm that uses multiple cues to recover shading and\nre\ufb02ectance intrinsic images from a single image. Using both color in-\nformation and a classi\ufb01er trained to recognize gray-scale patterns, each\nimage derivative is classi\ufb01ed as being caused by shading or a change in\nthe surface\u2019s re\ufb02ectance. Generalized Belief Propagation is then used to\npropagate information from areas where the correct classi\ufb01cation is clear\nto areas where it is ambiguous. We also show results on real images.\n\n1 Introduction\n\nEvery image is the product of the characteristics of a scene. Two of the most important\ncharacteristics of the scene are its shading and re\ufb02ectance. The shading of a scene is the\ninteraction of the surfaces in the scene and the illumination. The re\ufb02ectance of the scene\ndescribes how each point re\ufb02ects light. The ability to \ufb01nd the re\ufb02ectance of each point\nin the scene and how it is shaded is important because interpreting an image requires the\nability to decide how these two factors affect the image. For example, the geometry of\nan object in the scene cannot be recovered without being able to isolate the shading of\nevery point. Likewise, segmentation would be simpler given the re\ufb02ectance of each point\nin the scene. In this work, we present a system which \ufb01nds the shading and re\ufb02ectance\nof each point in a scene by decomposing an input image into two images, one containing\nthe shading of each point in the scene and another image containing the re\ufb02ectance of\neach point. These two images are types of a representation known as intrinsic images [1]\nbecause each image contains one intrinsic characteristic of the scene.\n\nMost prior algorithms for \ufb01nding shading and re\ufb02ectance images can be broadly classi-\n\ufb01ed as generative or discriminative approaches. The generative approaches create possible\nsurfaces and re\ufb02ectance patterns that explain the image, then use a model to choose the\nmost likely surface. Previous generative approaches include modeling worlds of painted\npolyhedra [11] or constructing surfaces from patches taken out of a training set [3]. In\ncontrast, discriminative approaches attempt to differentiate between changes in the image\ncaused by shading and those caused by a re\ufb02ectance change. Early algorithms, such as\nRetinex [8], were based on simple assumptions, such as the assumption that the gradients\nalong re\ufb02ectance changes have much larger magnitudes than those caused by shading. That\nassumption does not hold for many real images, so recent algorithms have used more com-\nplex statistics to separate shading and re\ufb02ectance. Bell and Freeman [2] trained a classi\ufb01er\nto use local image information to classify steerable pyramid coef\ufb01cients as being due to\n\n\fshading or re\ufb02ectance. Using steerable pyramid coef\ufb01cients allowed the algorithm to clas-\nsify edges at multiple orientations and scales. However, the steerable pyramid decomposi-\ntion has a low-frequency residual component that cannot be classi\ufb01ed. Without classifying\nthe low-frequency residual, only band-pass \ufb01ltered copies of the shading and re\ufb02ectance\nimages can be recovered. In addition, low-frequency coef\ufb01cients may not have a natural\nclassi\ufb01cation.\n\nIn a different direction, Weiss [13] proposed using multiple images where the re\ufb02ectance\nis constant, but the illumination changes. This approach was able to create full frequency\nimages, but required multiple input images of a \ufb01xed scene.\n\nIn this work, we present a system which uses multiple cues to recover full-frequency shad-\ning and re\ufb02ectance intrinsic images from a single image. Our approach is discriminative,\nusing both a classi\ufb01er based on color information in the image and a classi\ufb01er trained to rec-\nognize local image patterns to distinguish derivatives caused by re\ufb02ectance changes from\nderivatives caused by shading. We also address the problem of ambiguous local evidence\nby using a Markov Random Field to propagate the classi\ufb01cations of those areas where the\nevidence is clear into ambiguous areas of the image.\n\n2 Separating Shading and Re\ufb02ectance\n\nOur algorithm decomposes an image into shading and re\ufb02ectance images by classifying\neach image derivative as being caused by shading or a re\ufb02ectance change. We assume that\nthe input image, I(x; y), can be expressed as the product of the shading image, S(x; y), and\nthe re\ufb02ectance image, R(x; y). Considering the images in the log domain, the derivatives\nof the input image are the sum of the derivatives of the shading and re\ufb02ectance images. It is\nunlikely that signi\ufb01cant shading boundaries and re\ufb02ectance edges occur at the same point,\nthus we make the simplifying assumption that every image derivative is either caused by\nshading or re\ufb02ectance. This reduces the problem of specifying the shading and re\ufb02ectance\nderivatives to that of binary classi\ufb01cation of the image\u2019s x and y derivatives.\nLabelling each x and y derivative produces estimates of the derivatives of the shading and\nre\ufb02ectance images. Each derivative represents a set of linear constraints on the image\nand using both derivative images results in an over-constrained system. We recover each\nintrinsic image from its derivatives by using the method introduced by Weiss in [13] to \ufb01nd\nthe pseudo-inverse of the over-constrained system of derivatives. If fx and fy are the \ufb01lters\nused to compute the x and y derivatives and Fx and Fy are the estimated derivatives of\nshading image, then the shading image, S(x; y) is:\n\nS(x; y) = g ? [(fx((cid:0)x; (cid:0)y) ? Fx) + (fy((cid:0)x; (cid:0)y) ? Fy)]\n\n(1)\n\nwhere ? is convolution, f ((cid:0)x; (cid:0)y) is a reversed copy of f (x; y), and g is the solution of\n\ng ? [(fx((cid:0)x; (cid:0)y) ? fx(x; y)) + (fy((cid:0)x; (cid:0)y) ? fx(x; y))] = (cid:14)\n\n(2)\n\nThe re\ufb02ectance image is found in the same fashion. One nice property of this technique is\nthat the computation can be done using the FFT, making it more computationally ef\ufb01cient.\n\n3 Classifying Derivatives\n\nWith an architecture for recovering intrinsic images, the next step is to create the classi\ufb01ers\nto separate the underlying processes in the image. Our system uses two classi\ufb01ers, one\nwhich uses color information to separate shading and re\ufb02ectance derivatives and a second\nclassi\ufb01er that uses local image patterns to classify each derivative.\n\n\fOriginal Image\n\nShape Image\n\nRe\ufb02ectance Image\n\nFigure 1: Example computed using only color information to classify derivatives. To fa-\ncilitate printing, the intrinsic images have been computed from a gray-scale version of the\nimage. The color information is used solely for classifying derivatives in the gray-scale\ncopy of the image.\n\n3.1 Using Color Information\n\nOur system takes advantage of the property that changes in color between pixels indicate\na re\ufb02ectance change [10]. When surfaces are diffuse, any changes in a color image due to\nshading should affect all three color channels proportionally. Assume two adjacent pixels\nin the image have values c1 and c2, where c1 and c2 are RGB triplets.\nIf the change\nbetween the two pixels is caused by shading, then only the intensity of the color changes\nand c2 = (cid:11)c1 for some scalar (cid:11). If c2 6= (cid:11)c1, the chromaticity of the colors has changed\nand the color change must have been caused by a re\ufb02ectance change. A chromaticity\nchange in the image indicates that the re\ufb02ectance must have changed at that point.\n\nTo \ufb01nd chromaticity changes, we treat each RGB triplet as a vector and normalize them\nto create ^c1 and ^c2. We then use the angle between ^c1 and ^c2 to \ufb01nd re\ufb02ectance changes.\nWhen the change is caused by shading, (^c1 (cid:1) ^c2) equals 1. If (^c1 (cid:1) ^c2) is below a threshold,\nthen the derivative associated with the two colors is classi\ufb01ed as a re\ufb02ectance derivative.\nUsing only the color information, this approach is similar to that used in [6]. The primary\ndifference is that our system classi\ufb01es the vertical and horizontal derivatives independently.\n\nFigure 1 shows an example of the results produced by the algorithm. The classi\ufb01er marked\nall of the re\ufb02ectance areas correctly and the text is cleanly removed from the bottle. This\nexample also demonstrates the high quality reconstructions that can be obtained by classi-\nfying derivatives.\n\n3.2 Using Gray-Scale Information\n\nWhile color information is useful, it is not suf\ufb01cient to properly decompose images. A\nchange in color intensity could be caused by either shading or a re\ufb02ectance change. Using\nonly local color information, color intensity changes cannot be classi\ufb01ed properly. Fortu-\nnately, shading patterns have a unique appearance which can be discriminated from most\ncommon re\ufb02ectance patterns. This allows us to use the local gray-scale image pattern sur-\nrounding a derivative to classify it.\n\nThe basic feature of the gray-scale classi\ufb01er is the absolute value of the response of a linear\n\ufb01lter. We refer to a feature computed in this manner as a non-linear \ufb01lter. The output of a\nnon-linear, F , given an input patch Ip is\n\nF = jIp ? wj\n\n(3)\n\nwhere ? is convolution and w is a linear \ufb01lter. The \ufb01lter, w is the same size as the image\npatch, I, and we only consider the response at the center of Ip. This makes the feature\na function from a patch of image data to a scalar response. This feature could also be\nviewed as the absolute value of the dot product of Ip and w. We use the responses of linear\n\n\fFigure 2: Example images from the training set. The \ufb01rst two are examples of re\ufb02ectance\nchanges and the last three are examples of shading\n\n(a) Original Image\n\n(b) Shading Image\n\n(c) Re\ufb02ectance Image\n\nFigure 3: Results obtained using the gray-scale classi\ufb01er.\n\n\ufb01lters as the basis for our feature, in part, because they have been used successfully for\ncharacterizing [9] and synthesizing [7] images of textured surfaces.\n\nThe non-linear \ufb01lters are used to classify derivatives with a classi\ufb01er similar to that used\nby Tieu and Viola in [12]. This classi\ufb01er uses the AdaBoost [4] algorithm to combine a\nset of weak classi\ufb01ers into a single strong classi\ufb01er. Each weak classi\ufb01er is a threshold\ntest on the output of one non-linear \ufb01lter. At each iteration of the AdaBoost algorithm, a\nnew weak classi\ufb01er is chosen by choosing a non-linear \ufb01lter and a threshold. The \ufb01lter\nand threshold are chosen greedily by \ufb01nding the combination that performs best on the\nre-weighted training set. The linear \ufb01lter in each non-linear \ufb01lter is chosen from a set of\noriented \ufb01rst and second derivative of Gaussian \ufb01lters.\n\nThe training set consists of a mix of images of rendered fractal surfaces and images of\nshaded ellipses placed randomly in the image. Examples of re\ufb02ectance changes were cre-\nated using images of random lines and images of random ellipse painted onto the image.\nSamples from the training set are shown in 2. In the training set, the illumination is always\ncoming from the right side of the image. When evaluating test images, the classi\ufb01er will\nassume that the test image is also lit from the right.\n\nFigure 3 shows the results of our system using only the gray-scale classi\ufb01er. The results\ncan be evaluated by thinking of the shading image as how the scene should appear if it\nwere made entirely of gray plastic. The re\ufb02ectance image should appear very \ufb02at, with the\nthe three-dimensional depth cues placed in the shading image. Our system performs well\non the image shown in Figure 3. The shading image has a very uniform appearance, with\nalmost all of the effects of the re\ufb02ectance changes placed in the re\ufb02ectance image.\n\nThe examples shown are computed without taking the log of the input image before pro-\ncessing it. The input images are uncalibrated and ordinary photographic tonescale is very\nsimilar to a log transformation. Errors from not taking log of the input image \ufb01rst would\n\n\f(a)\n\n(b)\n\n(c)\n\n(d)\n\nFigure 4: An example where propagation is needed. The smile from the pillow image\nin (a) has been enlarged in (b). Figures (c) and (d) contain an example of shading and a\nre\ufb02ectance change, respectively. Locally, the center of the mouth in (b) is as similar to the\nshading example in (c) as it is to the example re\ufb02ectance change in (d).\n\n(a) Original Image\n\n(b) Shading Image\n\n(c) Re\ufb02ectance Image\n\nFigure 5: The pillow from Figure 4. This is found by combining the local evidence from\nthe color and gray-scale classi\ufb01ers, then using Generalized Belief Propagation to propagate\nlocal evidence.\n\ncause one intrinsic image to modulate the local brightness of the other. However, this does\nnot occur in the results.\n\n4 Propagating Evidence\n\nWhile the classi\ufb01er works well, there are still areas in the image where the local infor-\nmation is ambiguous. An example of this is shown in Figure 4. When compared to the\nexample shading and re\ufb02ectance change in Figure 4(c) and 4(d), the center of the mouth in\nFigure 4(b) is equally well classi\ufb01ed with either label. However, the corners of the mouth\ncan be classi\ufb01ed as being caused by a re\ufb02ectance change with little ambiguity. Since the\nderivatives in the corner of the mouth and the center all lie on the same image contour, they\nshould have the same classi\ufb01cation. A mechanism is needed to propagate information from\nthe corners of the mouth, where the classi\ufb01cation is clear, into areas where the local evi-\ndence is ambiguous. This will allow areas where the classi\ufb01cation is clear to disambiguate\nthose areas where it is not.\n\nIn order to propagate evidence, we treat each derivative as a node in a Markov Random\nField with two possible states, indicating whether the derivative is caused by shading or\ncaused by a re\ufb02ectance change. Setting the compatibility functions between nodes correctly\nwill force nodes along the same contour to have the same classi\ufb01cation.\n\n\f4.1 Model for the Potential Functions\n\nEach node in the MRF corresponds to the classi\ufb01cation of a derivative. We constrain the\ncompatibility functions for two neighboring nodes, xi and xj, to be of the form\n\n (xi; xj) = (cid:20)\n\n(cid:12)\n\n1 (cid:0) (cid:12)\n\n1 (cid:0) (cid:12)\n\n(cid:12)\n\n(cid:21)\n\n(4)\n\nwith 0 (cid:20) (cid:12) (cid:20) 1.\nThe term (cid:12) controls how much the two nodes should in\ufb02uence each other. Since derivatives\nalong an image contour should have the same classi\ufb01cation, (cid:12) should be close to 1 when\ntwo neighboring derivatives are along a contour and should be 0.5 when no contour is\npresent.\nSince (cid:12) depends on the image at each point, we express it as (cid:12)(Ixy), where Ixy is the\nimage information at some point. To ensure (cid:12)(Ixy) between 0 and 1, it is modelled as\n(cid:12)(Ixy) = g(z(Ixy)), where g((cid:1)) is the logistic function and z(Ixy) has a large response\nalong image contours.\n\n4.2 Learning the Potential Functions\n\nThe function z(Ixy) is based on two local image features, the magnitude of the image and\nthe difference in orientation between the gradient and the orientation of the graph edge.\nThese features re\ufb02ect our heuristic that derivatives along an image contour should have the\nsame classi\ufb01cation.\nThe difference in orientation between a horizontal graph edge and image contour, ^(cid:30), is\nfound from the orientation of the image gradient, (cid:30). Assuming that (cid:0)(cid:25)=2 (cid:20) (cid:30) (cid:20) (cid:25)=2, the\nangle between a horizontal edge and the image gradient, ^(cid:30), is ^(cid:30) = j(cid:30)j. For vertical edges,\n^(cid:30) = j(cid:30)j (cid:0) (cid:25)=2.\nTo \ufb01nd the values of z((cid:1)) we maximize the probability of a set of the training examples\nover the parameters of z((cid:1)). The examples are taken from the same set used to train the\ngray-scale classi\ufb01ers. The probability of training samples is\n\nP =\n\n1\n\nZ Y(i;j)\n\n (xi; xj)\n\n(5)\n\nwhere all (i; j) are the indices of neighboring nodes in the MRF and Z is a normalization\nconstant. Note that each  ((cid:1)) is a function of z(Ixy).\nThe function relating the image features to  ((cid:1)), z((cid:1)), is chosen to be a linear function and\nis found by maximizing equation 5 over a set of training images similar to those used to\ntrain the local classi\ufb01er. In order to simplify the training process, we approximate the true\nprobability in Equation 5 by assuming that Z is constant. Doing so leads to the following\nvalue of z((cid:1)):\n\nz( ^(cid:30); jrIj) = (cid:0)1:2 (cid:2) ^(cid:30) + 1:62 (cid:2) jrIj + 2:3\n\n(6)\n\nwhere jrIj is the magnitude of the image gradient and both ^(cid:30) and jrIj have been nor-\nmalized to be between 0 and 1. These measures break down in areas with a weak gra-\ndient, so we set (cid:12)(Ixy) to 0.5 for regions of the image with a gradient magnitude less\nthan 0.05. Combined with the values learned for z((cid:1)), this effectively limits (cid:12) to the range\n0:5 (cid:20) (cid:12) (cid:20) 1.\nLarger values of z((cid:1)) correspond to a belief that the derivatives connected by the edge\nshould have the same value, while negative values signify that the derivatives should have\n\n\f(a) Original Image\n\n(b) Shading Image\n\n(c) Re\ufb02ectance Image\n\nFigure 6: Example generated by combining color and gray-scale information, along with\nusing propagation.\n\na different value. The values in equation 6 correspond with our expected results; two\nderivatives are constrained to have the same value when they are along an edge in the\nimage that has a similar orientation to the edge in the MRF connecting the two nodes.\n\n4.3 Inferring the Correct Labelling\n\nOnce the compatibility functions have been learned, the label of each derivative can be\ninferred. The local evidence for each node in the MRF is obtained from the results of the\ncolor classi\ufb01er and from the gray-scale classi\ufb01er by assuming that the two are statistically\nindependent. It is necessary to use the color information because propagation cannot help\nin areas where the gray-scale classi\ufb01er misses an edge altogether. In Figure 5, the cheek\npatches on the pillow, which are pink in the color image, are missed by the gray-scale\nclassi\ufb01er, but caught by the color classi\ufb01er. For the results shown, we used the results of\nthe AdaBoost classi\ufb01er to classify the gray-scale images and used the method suggested by\nFriedman et al. to obtain the probability of the labels [5].\n\nWe used the Generalized Belief Propagation algorithm [14] to infer the best label of each\nnode in the MRF because ordinary Belief Propagation performed poorly in areas with both\nweak local evidence and strong compatibility constraints. The results of using color, gray-\nscale information, and propagation can be seen in Figure 5. The ripples on the pillow are\ncorrectly identi\ufb01ed as being caused by shading, while the face is correctly identi\ufb01ed as\nhaving been painted on. In a second example, shown in Figure 6, the algorithm correctly\nidenti\ufb01es the change in re\ufb02ectance between the sweatshirt and the jersey and correctly iden-\nti\ufb01es the folds in the clothing as being caused by shading. There are some small shading\nartifacts in the re\ufb02ectance image, especially around the sleeves of the sweatshirt, presum-\nably caused by particular shapes not present in the training set. All of the examples were\ncomputed using ten non-linear \ufb01lters as input for the AdaBoost gray-scale classi\ufb01er.\n\n5 Discussion\n\nWe have presented a system that is able to use multiple cues to produce shading and re-\n\ufb02ectance intrinsic images from a single image. This method is also able to produce sat-\nisfying results for real images. The most computationally intense steps for recovering the\nshading and re\ufb02ectance images are computing the local evidence, which takes about six\nminutes on a 700MHz Pentium for a 256 (cid:2) 256 image, and running the Generalized Belief\nPropagation algorithm. Belief propagation was used on both the x and y derivative images\n\n\fand took around 6 minutes to run 200 iterations on each image. The pseudo-inverse process\ntook under 5 seconds.\n\nThe primary limitation of this method lies in the classi\ufb01ers. For each type of surface, the\nclassi\ufb01ers must incorporate knowledge about the structure of the surface and how it appears\nwhen illuminated. The present classi\ufb01ers operate at a single spatial scale, however the MRF\nframework allows the integration of information from multiple scales.\n\nAcknowledgments\n\nPortions of this work were completed while W.T.F was a Senior Research Scientist and\nM.F.T was a summer intern at Mitsubishi Electric Research Labs. This work was supported\nby an NDSEG fellowship to M.F.T, by NIH Grant EY11005-04 to E.H.A., by a grant from\nNTT to E.H.A., and by a contract with Unilever Research.\n\nReferences\n\n[1] H. G. Barrow and J. M. Tenenbaum. Recovering intrinsic scene characteristics from\n\nimages. In Computer Vision Systems, pages 3\u201326. Academic Press, 1978.\n\n[2] M. Bell and W. T. Freeman. Learning local evidence for shading and re\ufb02ection. In\n\nProceedings International Conference on Computer Vision, 2001.\n\n[3] W. T. Freeman, E. C. Pasztor, and O. T. Carmichael. Learning low-level vision. In-\n\nternational Journal of Computer Vision, 40(1):25\u201347, 2000.\n\n[4] Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning\nand an application to boosting. Journal of Computer and System Sciences, 55(1):119\u2013\n139, 1997.\n\n[5] J. Friedman, T. Hastie, and R. Tibshirami. Additive logistic regression: A statistical\n\nview of boosting. The Annals of Statistics, 38(2):337\u2013374, 2000.\n\n[6] B. V. Funt, M. S. Drew, and M. Brockington. Recovering shading from color images.\nIn G. Sandini, editor, ECCV-92: Second European Conference on Computer Vision,\npages 124\u2013132. Springer-Verlag, May 1992.\n\n[7] D. Heeger and J. Bergen. Pyramid-based texture analysis/synthesis.\n\nGraphics Proceeding, SIGGRAPH 95, pages 229\u2013238, August 1995.\n\nIn Computer\n\n[8] E. H. Land and J. J. McCann. Lightness and retinex theory. Journal of the Optical\n\nSociety of America, 61:1\u201311, 1971.\n\n[9] T. Leung and J. Malik. Recognizing surfaces using three-dimensional textons.\n\nIEEE International Conference on Computer Vision, 1999.\n\nIn\n\n[10] J. M. Rubin and W. A. Richards. Color vision and image intensities: When are\n\nchanges material. Biological Cybernetics, 45:215\u2013226, 1982.\n\n[11] P. Sinha and E. H. Adelson. Recovering re\ufb02ectance in a world of painted polyhedra.\nIn Fourth International Conference on Computer Vision, pages 156\u2013163. IEEE, 1993.\n[12] K. Tieu and P. Viola. Boosting image retrieval. In Proceedings IEEE Computer Vision\n\nand Pattern Recognition, volume 1, pages 228\u2013235, 2000.\n\n[13] Y. Weiss. Deriving intrinsic images from image sequences. In Proceedings Interna-\n\ntional Conference on Computer Vision, Vancouver, Canada, 2001. IEEE.\n\n[14] J. Yedidia, W. T. Freeman, and Y. Weiss. Generalized belief propagation. In Advances\n\nin Neural Information Processing Systems 13, pages 689\u2013695, 2001.\n\n\f", "award": [], "sourceid": 2286, "authors": [{"given_name": "Marshall", "family_name": "Tappen", "institution": null}, {"given_name": "William", "family_name": "Freeman", "institution": null}, {"given_name": "Edward", "family_name": "Adelson", "institution": null}]}