{"title": "Shape and Illumination from Shading using the Generic Viewpoint Assumption", "book": "Advances in Neural Information Processing Systems", "page_first": 226, "page_last": 234, "abstract": "The Generic Viewpoint Assumption (GVA) states that the position of the viewer or the light in a scene is not special. Thus, any estimated parameters from an observation should be stable under small perturbations such as object, viewpoint or light positions. The GVA has been analyzed and quantified in previous works, but has not been put to practical use in actual vision tasks. In this paper, we show how to utilize the GVA to estimate shape and illumination from a single shading image, without the use of other priors. We propose a novel linearized Spherical Harmonics (SH) shading model which enables us to obtain a computationally efficient form of the GVA term. Together with a data term, we build a model whose unknowns are shape and SH illumination. The model parameters are estimated using the Alternating Direction Method of Multipliers embedded in a multi-scale estimation framework. In this prior-free framework, we obtain competitive shape and illumination estimation results under a variety of models and lighting conditions, requiring fewer assumptions than competing methods.", "full_text": "Shape and Illumination from Shading using the\n\nGeneric Viewpoint Assumption\n\nDaniel Zoran \u2217\nCSAIL, MIT\n\ndanielz@mit.edu\n\nDilip Krishnan \u2217\nCSAIL, MIT\n\ndilipkay@mit.edu\n\nJose Bento\n\nBoston College\n\njose.bento@bc.edu\n\nWilliam T. Freeman\n\nCSAIL, MIT\n\nbillf@mit.edu\n\nAbstract\n\nThe Generic Viewpoint Assumption (GVA) states that the position of the viewer\nor the light in a scene is not special. Thus, any estimated parameters from an\nobservation should be stable under small perturbations such as object, viewpoint\nor light positions. 
The GVA has been analyzed and quanti\ufb01ed in previous works,\nbut has not been put to practical use in actual vision tasks. In this paper, we show\nhow to utilize the GVA to estimate shape and illumination from a single shading\nimage, without the use of other priors. We propose a novel linearized Spherical\nHarmonics (SH) shading model which enables us to obtain a computationally ef-\n\ufb01cient form of the GVA term. Together with a data term, we build a model whose\nunknowns are shape and SH illumination. The model parameters are estimated\nusing the Alternating Direction Method of Multipliers embedded in a multi-scale\nestimation framework. In this prior-free framework, we obtain competitive shape\nand illumination estimation results under a variety of models and lighting condi-\ntions, requiring fewer assumptions than competing methods.\n\n1\n\nIntroduction\n\nThe generic viewpoint assumption (GVA) [5, 9, 21, 22] postulates that what we see in the world\nis not seen from a special viewpoint, or lighting condition. Figure 1 demonstrates this idea with\nthe famous Necker cube example1. A three dimensional cube may be observed with two vertices\nor edges perfectly aligned, giving rise to a two dimensional interpretation. Another possibility is\na view that exposes only one of the faces of the cube, giving rise to a square. However, these 2D\nviews are unstable to slight perturbations in viewing position. Other examples in [9] and [22] show\nsituations where views are unstable to lighting rotations.\nWhile there has been interest in the GVA in the psychophysics community [22, 12], to the best of\nour knowledge, this principle seems to have been largely ignored in the computer vision community.\nOne notable exception is the paper by Freeman [9] which gives a detailed analytical account on how\nto incorporate the GVA in a Bayesian framework. 
In that paper, it is shown that using the GVA\nmodi\ufb01es the probability space of different explanations to a scene, preferring perceptually valid and\nstable solutions to contrived and unstable ones, even though all of these fully explain the observed\nimage. No algorithm incorporating the GVA, beyond exhaustive search, was proposed.\n\n\u2217Equal contribution\n1Taken from http://www.cogsci.uci.edu/\u02dcddhoff/three-cubes.gif\n\n1\n\n\fFigure 1: Illustration of the GVA principle using the Necker cube example. The cube in the middle\ncan be viewed in multiple ways. However, the views on the left and right require a very speci\ufb01c\nviewing angle. Slight rotations of the viewer around the exact viewing positions would dramatically\nchange the observed image. Thus, these views are unstable to perturbations. The middle view, on\nthe contrary, is stable to viewer rotations.\n\nShape from shading is a basic low-level vision task. Given an input shading image - an image of\na constant albedo object depicting only changes in illumination - we wish to infer the shape of the\nobjects in the image. In other words, we wish to recover the relative depth Zi at each pixel i in\nthe image. Given values of Z, local surface orientations are given by the gradients \u2207xZ and \u2207yZ\nalong the coordinate axes. A key component in estimating the shape is the illumination L. The\nparameters of L may be given with the image, or may need to be estimated from the image along\nwith the shape. The latter is a much harder problem due to the ambiguous nature of the problem, as\nmany different surface orientations and light combinations may explain the same image. While the\nnotion of a shading image may seem unnatural, extracting them from natural images has been an\nactive \ufb01eld of research. 
There are effective ways of decomposing images into shading and albedo\nimages (so called \u201cintrinsic images\u201d [20, 10, 1, 29]), and the output of those may be used as input to\nshape from shading algorithms.\nIn this paper we show how to effectively utilize the GVA for shape and illumination estimation from\na single shading image. The only terms in our optimization are the data term which explains the\nobservation and the GVA term. We propose a novel shading model which is a linearization of the\nspherical harmonics (SH) shading model [25]. The SH model has been gaining popularity in the\nvision and graphics communities in recent years [26, 17]) as it is more expressive than the pop-\nular single source Lambertian model. Linearizing this model allows us, as we show below, to get\nsimple expressions for our image and GVA terms, enabling us to use them effectively in an optimiza-\ntion framework. Given a shading image with an unknown light source, our optimization procedure\nsolves for the depth and illumination in the scene. We optimize using Alternating Direction Method\nof Multipliers (ADMM) [4, 6]. We show that this method is competitive with current shape and\nillumination from shading algorithms, without the use of other priors over illumination or geometry.\n\n2 Related Work\n\nClassical works on shape from shading include [13, 14, 15, 8, 23] and newer works include [3, 2,\n19, 30]. It is out of scope of this paper to give a full survey of this well studied \ufb01eld, and we refer the\nreader to [31] and [28] for good reviews. A large part of the research has been focused on estimating\nthe shape under known illumination conditions. While still a hard problem, it is more constrained\nthan estimating both the illumination and the shape.\nIn impressive recent work, Barron and Malik [3] propose a method for estimating not just the illu-\nmination and shape, but also the albedo of a given masked object from a single image. 
By using\na number of novel (and carefully balanced) priors over shape (such as smoothness and contour in-\nformation), albedo and illumination, it is shown that reasonable estimates of shape and illumination\nmay be extracted. These priors and the data term are combined in a novel multi-scale framework\nwhich weights coarser scale (lower frequency) estimates of shape more than \ufb01ner scale estimates.\nFurthermore, Barron and Malik use a spherical harmonics lighting model to provide for richer re-\ncovery of real world scenes and diffuse outdoor lighting conditions. Another contribution of their\nwork has been the observation that joint inference of multiple parameters may prove to be more\nrobust (although this is hard to prove rigorously). The expansion to the original MIT dataset [11]\nprovided in [3] is also a useful contribution.\n\n2\n\n\fAnother recent notable example is that of Xiong et al. [30]. In this thorough work, the distribution\nof possible shape/illumination combinations in a small image patch is derived, assuming a quadratic\ndepth model. It is shown that local patches may be quite informative, and that are only a few possible\nexplanations of light/shape pairs for each patch. A framework for estimating full model geometry\nwith known lighting conditions is also proposed.\n\n3 Using the Generic View Assumption for Shape from Shading\n\nIn [9], Freeman gave an analytical framework to use the GVA. However, the computational exam-\nples in the paper were restricted to linear shape from shading models. No inference algorithm was\npresented; instead the emphasis was on analyzing how the GVA term modi\ufb01es the posterior dis-\ntribution of candidate shape and illumination estimates. The key idea in [9] is to marginalize the\nposterior distribution over a set of \u201cnuisance\u201d parameters - these correspond to object or illumi-\nnation perturbations. 
This integration step corresponds to \ufb01nding a solution that is stable to these\nperturbations.\n\n3.1 A Short Introduction to the GVA\n\nHere we give a short summary of the derivations in [9], which we use in our model. We start\nwith a generative model f for images, which depends on scene parameters Q and a set of generic\nparameters w. The generative model we use is explained in Section 4. w are the parameters which\nwill eventually be marginalized. In our shape and illumination from shading case, f corresponds to\nour shading model in Eq. 14 (de\ufb01ned below). Q includes both surface depth at each point Z and the\nlight coef\ufb01cients vector L. Finally, the generic variable w corresponds to different object rotation\nangles around different axes of rotations (though there could be other generic variables, we only use\nthis one). Assuming measurement noise \u03b7 the result of the generative process would be:\n\nI = f (Q, w) + \u03b7\n\n(1)\n\nNow, given an image I we wish to infer scene parameters Q by marginalizing out the generic vari-\nables w. Using Bayes\u2019 theorem, this results in the following probability function:\n\nP (Q|I) =\n\nP (Q)\nP (I)\n\nP (w)P (I|Q, w)dw\n\n(2)\n\n(cid:90)\n\nw\n\nAssuming a low Gaussian noise model for \u03b7, the above integral can be approximated with a Laplace\napproximation, which involves expanding f using a Taylor expansion around w0. 
We get the following expression, aptly named in [9] the \u201cscene probability equation\u201d:\n\nP (Q|I) = C exp(\u2212\u2016I \u2212 f (Q, w0)\u2016\u00b2/(2\u03c3\u00b2)) P (Q)P (w0) (1/\u221adet A)\n\n(3)\n\nwhere C is a normalization constant, the exponential is the fidelity term, P (Q)P (w0) is the prior and 1/\u221adet A is the genericity term. A is a matrix whose i, j-th entry is:\n\nAi,j = (\u2202f (Q, w)/\u2202wi)T (\u2202f (Q, w)/\u2202wj)\n\n(4)\n\nand the derivatives are estimated at w0. A is often called the Fisher information matrix.\nEq. 3 has three terms: the fidelity term (sometimes called the likelihood term, data term or image term) tells us how close we are to the observed image. The prior tells us how likely our current parameter estimates are. The last term, genericity, tells us how much our observed image would change under perturbations of the different generic variables. This term is the one which penalizes unstable results w.r.t. the generic variables. From the form of A, it is clear why the genericity term helps; the determinant of A is large when the rendered image f changes rapidly with respect to w. This makes the genericity term small and the corresponding hypothesis Q less probable.\n\n3.2 Using the GVA for Shape and Illumination Estimation\n\nWe now show how to derive the GVA term for general object rotations by using the result in [9] and applying it to our linearized shading model. Due to lack of space, we provide the main results here; please refer to the supplementary material for full details. 
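As an illustration, the genericity factor 1/\u221adet A of Eq. 3 can be approximated numerically with finite differences. The following sketch is not the paper's implementation; the renderer `render` stands in for the generative model f, and the one-variable toy examples are purely illustrative:

```python
import numpy as np

def genericity(render, Q, w0, eps=1e-3):
    """Approximate the genericity factor 1/sqrt(det A) of Eq. 3.

    render(Q, w) stands in for the generative model f; A is the Fisher
    information matrix of Eq. 4, A[i, j] = (df/dw_i)^T (df/dw_j), with the
    derivative images taken at w0 by forward finite differences.
    """
    w0 = np.asarray(w0, dtype=float)
    base = render(Q, w0).ravel()
    derivs = []
    for i in range(w0.size):
        w = w0.copy()
        w[i] += eps
        derivs.append((render(Q, w).ravel() - base) / eps)
    D = np.stack(derivs)          # (num generic variables, num pixels)
    A = D @ D.T                   # Fisher information matrix (Eq. 4)
    return 1.0 / np.sqrt(np.linalg.det(A))

# Toy renderers: an image very sensitive to the generic variable w is
# penalized (low genericity) relative to a stable one.
f_unstable = lambda Q, w: np.array([[100.0 * w[0]]])
f_stable = lambda Q, w: np.array([[0.1 * w[0]]])
assert genericity(f_unstable, None, [0.0]) < genericity(f_stable, None, [0.0])
```

The toy check mirrors the argument in the text: a large derivative image makes det A large, shrinking the genericity term.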
Given an axis of rotation parametrized by angles \u03b8 and \u03b3, the derivative of f w.r.t. a rotation \u03c6 about the axis is:\n\n\u2202f/\u2202\u03c6 = aRx + bRy + cRz\n\n(5)\n\na = cos(\u03b8) sin(\u03b3), b = sin(\u03b8) sin(\u03b3), c = cos(\u03b3)\n\n(6)\n\nwhere Rx, Ry and Rz are three derivative images for rotations around the canonical axes, for which the i-th pixel is:\n\nRx_i = Ix_i Zi + \u03b1i\u03b2i kx_i + (1 + \u03b2\u00b2i) ky_i\n\n(7)\n\nRy_i = \u2212Iy_i Zi \u2212 \u03b1i\u03b2i ky_i \u2212 (1 + \u03b1\u00b2i) kx_i\n\n(8)\n\nRz_i = Ix_i Yi \u2212 Iy_i Xi + \u03b1i ky_i \u2212 \u03b2i kx_i\n\n(9)\n\nWe use these images to derive the GVA term for rotations around different axes, resulting in:\n\nGVA(Z, L) = \u2211_{\u03b8\u2208\u0398} \u2211_{\u03b3\u2208\u0393} 1/\u221a(2\u03c0\u03c3\u00b2\u2016\u2202f/\u2202\u03c6\u2016\u00b2)\n\n(10)\n\nwhere \u0398 and \u0393 are discrete sets of angles in [0, \u03c0) and [0, 2\u03c0) respectively. The terms in Eqs. 5\u201310 are quite daunting, especially considering that \u03b1 = \u2207xZ and \u03b2 = \u2207yZ are functions of the depth Z. In the next section we present our linearized shading model which makes these expressions more manageable.
Let Z be a depth map, with the depth at pixel i given by Zi. The surface slopes at pixel i are defined as \u03b1i = (\u2207xZ)i and \u03b2i = (\u2207yZ)i respectively. Given L and Z, the log shading at pixel i for a diffuse, Lambertian surface under the SH model is given by:\n\nlog Si = nT_i M ni\n\n(11)\n\nwhere ni is given by:\n\nni = [\u03b1i/\u221a(1+\u03b1\u00b2i+\u03b2\u00b2i), \u03b2i/\u221a(1+\u03b1\u00b2i+\u03b2\u00b2i), 1/\u221a(1+\u03b1\u00b2i+\u03b2\u00b2i), 1]T\n\n(12)\n\nand:\n\nM = [c1L9 c1L5 c1L8 c2L4; c1L5 \u2212c1L9 c1L6 c2L2; c1L8 c1L6 c3L7 c2L3; c2L4 c2L2 c2L3 c4L1 \u2212 c5L7]\n\n(13)\n\nc1 = 0.429043, c2 = 0.511664, c3 = 0.743125, c4 = 0.886227, c5 = 0.247708\n\nThe formation model in Eq. 11 is non-linear and non-convex in the surface slopes \u03b1 and \u03b2. In practice, this leads to optimization difficulties such as local minima, which have been noted by Barron and Malik in [3]. In order to overcome this, we linearize Eq. 11 around the local surface slope estimates \u03b10_i and \u03b20_i, such that:\n\nlog Si \u2248 kc(\u03b10_i, \u03b20_i, L) + kx(\u03b10_i, \u03b20_i, L)\u03b1i + ky(\u03b10_i, \u03b20_i, L)\u03b2i\n\n(14)\n\n2We will use the terms lighting and shading interchangeably\n\nwhere the local surface slopes are estimated in a local patch around each pixel in our current estimated surface. The derivation of the linearization is given in the supplementary material. For the sake of brevity, we will omit the dependence on the \u03b10_i, \u03b20_i and L terms, and denote the coefficients at each location as kc_i, kx_i and ky_i respectively for the remainder of the paper.\nA natural question is the accuracy of the linearized model Eq. 14. The linearization is accurate in most situations where the depth Z changes gradually, such that the change in slope is linear or small in magnitude. 
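For concreteness, the SH log-shading model of Eqs. 11\u201313 can be evaluated directly. This is a minimal sketch, not the authors' code; the function names are our own, while the matrix layout and the c1\u2013c5 constants follow the equations above:

```python
import numpy as np

# SH shading constants c1..c5 (given with Eq. 13).
c1, c2, c3, c4, c5 = 0.429043, 0.511664, 0.743125, 0.886227, 0.247708

def sh_matrix(L):
    """The 4x4 matrix M of Eq. 13, from the nine SH light coefficients L[0..8]."""
    L1, L2, L3, L4, L5, L6, L7, L8, L9 = L  # 1-indexed names, as in the paper
    return np.array([
        [c1 * L9,  c1 * L5, c1 * L8, c2 * L4],
        [c1 * L5, -c1 * L9, c1 * L6, c2 * L2],
        [c1 * L8,  c1 * L6, c3 * L7, c2 * L3],
        [c2 * L4,  c2 * L2, c2 * L3, c4 * L1 - c5 * L7],
    ])

def log_shading(alpha, beta, L):
    """Log shading of Eq. 11 for surface slopes alpha, beta (scalars or arrays)."""
    norm = np.sqrt(1.0 + alpha ** 2 + beta ** 2)
    # Homogeneous normal n_i = [alpha, beta, 1, norm] / norm, matching Eq. 12.
    n = np.stack([alpha / norm, beta / norm, 1.0 / norm,
                  np.ones_like(norm)], axis=-1)
    return np.einsum('...i,ij,...j->...', n, sh_matrix(L), n)

# A flat surface (alpha = beta = 0) lit by the DC component alone: log S = c4 * L1.
flat = log_shading(0.0, 0.0, [1.0, 0, 0, 0, 0, 0, 0, 0, 0])
```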
In [30], locally quadratic shapes are assumed; this leads to linear changes in slopes, and in such situations, the linearization is highly accurate. We tested the accuracy of the linearization by computing the difference between the estimates in Eq. 14 and Eq. 11, over ground truth shape and illumination estimates. We found it to be highly accurate for the models in our experiments. The linearization in Eq. 14 leads to a quadratic formation model for the image term (described in Section 5.2.1), leading to more efficient updates for \u03b1 and \u03b2. Furthermore, this allows us to effectively incorporate the GVA even with the spherical harmonics framework.\n\n5 Optimization using the Alternating Direction Method of Multipliers\n\n5.1 The Cost Function\n\nFollowing Eq. 3, we can now derive the cost function we will optimize w.r.t. the scene parameters Z and L. To derive a MAP estimate, we take the negative log of Eq. 3 and use constant priors over both the scene parameters and the generic variables; thus we have a prior-free cost function. This results in the following cost:\n\ng(Z, L) = \u03bbimg\u2016I \u2212 log S(Z, L)\u2016\u00b2 \u2212 \u03bbGVA log GVA(Z, L)\n\n(15)\n\nwhere f (Z, L) = log S(Z, L) is our linearized shading model Eq. 14 and the GVA term is defined in Eq. 10. \u03bbimg and \u03bbGVA are hyper-parameters which we set to 2 and 1 respectively for all experiments. Because of the dependence of \u03b1 and \u03b2 on Z, directly optimizing this cost function is hard, as it results in a large, non-linear differential system for Z. 
In order to make this more tractable, we introduce \u02dc\u03b1 and \u02dc\u03b2, the surface spatial derivatives, as auxiliary variables, and solve for the following cost function, which constrains the resulting surface to be integrable:\n\n\u02dcg(Z, \u02dc\u03b1, \u02dc\u03b2, L|I) = \u03bbimg\u2016I \u2212 log S(\u02dc\u03b1, \u02dc\u03b2, L)\u2016\u00b2 \u2212 \u03bbGVA log GVA(Z, \u02dc\u03b1, \u02dc\u03b2, L)\ns.t. \u02dc\u03b1 = \u2207xZ, \u02dc\u03b2 = \u2207yZ, \u2207y\u2207xZ = \u2207x\u2207yZ\n\n(16)\n\nADMM allows us to subdivide the cost into relatively simple subproblems, solve each one independently and then aggregate the results. We briefly review the message passing variant of ADMM [7] in the supplementary material.\n\n5.2 Subproblems\n\n5.2.1 Image Term\n\nThis subproblem ties our solution to the input log shading image. The participating variables are the slopes \u02dc\u03b1 and \u02dc\u03b2 and the illumination L. We minimize the following cost:\n\narg min_{\u02dc\u03b1, \u02dc\u03b2, L} \u03bbimg \u2211_i (Ii \u2212 kc_i \u2212 kx_i \u02dc\u03b1i \u2212 ky_i \u02dc\u03b2i)\u00b2 + (\u03c1/2)\u2016\u02dc\u03b1 \u2212 n\u02dc\u03b1\u2016\u00b2 + (\u03c1/2)\u2016\u02dc\u03b2 \u2212 n\u02dc\u03b2\u2016\u00b2 + (\u03c1/2)\u2016L \u2212 nL\u2016\u00b2\n\n(17)\n\nwhere n\u02dc\u03b1, n\u02dc\u03b2 and nL are the incoming messages for the corresponding variables as described above. We solve this subproblem iteratively: for \u02dc\u03b1 and \u02dc\u03b2 we keep L constant (and as a result the k-s are constant). A closed-form solution exists since this is just a quadratic due to our linearized model. In order to solve for L we do a few (5 to 10) steps of L-BFGS [27].\n\n5.2.2 GVA Term\nThe participating variables here are the depth values Z, the slopes \u02dc\u03b1 and \u02dc\u03b2 and the light L. 
We look for the parameters which minimize:\n\narg min_{Z, \u02dc\u03b1, \u02dc\u03b2, L} \u2212(\u03bbGVA/2) log GVA(Z, \u02dc\u03b1, \u02dc\u03b2, L) + (\u03c1/2)\u2016\u02dc\u03b1 \u2212 n\u02dc\u03b1\u2016\u00b2 + (\u03c1/2)\u2016\u02dc\u03b2 \u2212 n\u02dc\u03b2\u2016\u00b2 + (\u03c1/2)\u2016L \u2212 nL\u2016\u00b2\n\n(18)\n\nHere, though the expression for the GVA term (Eq. 10) is greatly simplified due to the shading model linearization, we have to resort to numerical optimization. We solve for the parameters using a few steps of L-BFGS [27].\n\n5.2.3 Depth Integrability Constraint\n\nShading only depends on local slope (regardless of the choice of shading model, as long as there are no shadows in the scene), hence the image term only gives us information about surface slopes. Using this information we need to find an integrable surface Z [8]. Finding integrable surfaces from local slope measurements has been a long-standing research question and there are several ways of doing this [8, 14, 18]. By finding such a surface we will satisfy both constraints in Eq. 16 automatically. Enforcing integrability through message passing was performed in [24], where it was shown to be helpful in recovering smooth surfaces. In that work, belief propagation based message passing was used. The cost for this subproblem is:\n\narg min_{Z, \u02dc\u03b1, \u02dc\u03b2} (\u03c1/2)\u2016Z \u2212 nZ\u2016\u00b2 + (\u03c1/2)\u2016\u02dc\u03b1 \u2212 n\u02dc\u03b1\u2016\u00b2 + (\u03c1/2)\u2016\u02dc\u03b2 \u2212 n\u02dc\u03b2\u2016\u00b2\ns.t. \u02dc\u03b1 = \u2207xZ, \u02dc\u03b2 = \u2207yZ, \u2207y\u2207xZ = \u2207x\u2207yZ\n\n(19)\n\nWe solve for the surface Z given the messages for the slopes n\u02dc\u03b1 and n\u02dc\u03b2 by solving a least squares system to get the integrable surface. 
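The least-squares integration step can be sketched as follows. This is a toy stand-in for the paper's solver, assuming the simplest dense formulation with forward differences; the proximal term pulling Z toward its message nZ is omitted:

```python
import numpy as np

def integrate_slopes(alpha, beta):
    """Least-squares integrable surface from slope fields: find Z minimizing
    ||grad_x Z - alpha||^2 + ||grad_y Z - beta||^2 with forward differences.
    """
    h, w = alpha.shape
    n = h * w
    idx = np.arange(n).reshape(h, w)
    rows, targets = [], []
    for y in range(h):                 # grad_x: Z[y, x+1] - Z[y, x] = alpha[y, x]
        for x in range(w - 1):
            r = np.zeros(n)
            r[idx[y, x + 1]], r[idx[y, x]] = 1.0, -1.0
            rows.append(r)
            targets.append(alpha[y, x])
    for y in range(h - 1):             # grad_y: Z[y+1, x] - Z[y, x] = beta[y, x]
        for x in range(w):
            r = np.zeros(n)
            r[idx[y + 1, x]], r[idx[y, x]] = 1.0, -1.0
            rows.append(r)
            targets.append(beta[y, x])
    Z, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return (Z - Z.mean()).reshape(h, w)    # depth is recovered up to a constant
```

For constant slope fields (a planar surface) the constraints are exactly satisfiable, and the recovered Z matches the true plane up to its mean.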
Then, the solution for \u02dc\u03b1 and \u02dc\u03b2 is just the spatial derivative of the resulting surface, satisfying all the constraints and minimizing the cost simultaneously.\n\n5.3 Relinearization\n\nAfter each ADMM iteration, we perform re-linearization of the kc, kx and ky coefficients. We take the current estimates for Z and L and use them as input to our linearization procedure (see the supplementary material for details). These coefficients are then used for the next ADMM iteration, and this process is repeated.\n\n6 Experiments and Results\n\n(a) Models from [30] using \u201clab\u201d lights\n\n(b) MIT models using \u201cnatural\u201d lights\n\n(c) Average result over all models and lights\n\nFigure 2: Summary of results: Our performance is quite similar to that of SIFS [3], although we do not use contour normals, nor any shape or illumination priors, unlike [3]. We outperform SIFS on models from [30], while SIFS performs well on the MIT models. On average, we are comparable to SIFS in N-MAE and slightly better at light estimation.\n\nWe use the GVA algorithm to estimate shape and illumination from synthetic, grayscale shading images, rendered using 18 different models from the MIT/Berkeley intrinsic images dataset [3] and 7 models from the Harvard dataset in [30]. Each of these models is rendered using several different light sources: the MIT models are lit with a \u201cnatural\u201d light dataset which comes with each model, and we use 2 lights from the \u201clab\u201d dataset in order to light the models from [30], resulting in 32 different images. We use the provided mask only in the image term, where we solve only for pixels within the mask. We do not use any other contour information as in [3]. Models were downscaled to a quarter of their original size. 
[Figure 3 grid: rows show Ground Truth, Ours-GVA, SIFS and Ours-NoGVA; columns show Viewpoint 1, Viewpoint 2, Estimated Light and Rendered Image]\n\nFigure 3: Example of our results. Note that the vertical scales of the mesh plots differ between the plots and have been rescaled for display (specifically, the SIFS result is 4 times deeper). Our method preserves features such as the legs and belly while SIFS smooths them out. The GVA light estimate is also quite reasonable. Unlike SIFS, no contour normals, nor tuned shape or lighting priors are needed for GVA.\n\nRunning times for our algorithm are roughly 7 minutes per image with the GVA term and about 1 minute without the GVA term. This is with unoptimized MATLAB code. We compare to the SIFS algorithm of [3], which is a subset of their algorithm that does not estimate albedo. We use their publicly released code.\nWe initialize with an all-zeros depth (corresponding to a flat surface) and the light is initialized to the mean light from the \u201cnatural\u201d dataset in [3]. We perform the estimation in multiple scales using V-sweeps: solving at a coarse scale, upscaling, solving at a finer scale, then downsampling the result, repeating the process 3 times. The same parameter settings were used in all cases3.\nWe use the same error measures as in [3]. The error for the normals is measured using Median Angular Error (MAE) in radians. For the light, we take the resulting light coefficients and render a sphere lit by this light. We look for a DC shift which minimizes the distance between this image and the rendered ground truth light and shift the two images. 
Then the final error for the light is the L2 distance of the two images, normalized by the number of pixels. The error measure for depth Z used in [3] is quite sensitive to the absolute scaling of the results. We have decided to omit it from the main paper (even though our performance under this measure is much better than that of [3]).\nA summary of the results can be seen in Figure 2. The GVA term helps significantly in estimation results. This is especially true for light estimation. On average, our performance is similar to that of [3]. Our light estimation results are somewhat better, while our geometry estimation results are slightly poorer. It seems that [3] is somewhat overfit to the models in the MIT dataset. When tested on the models from [30], it gets poorer results.\nFigure 3 shows an example of the results we get, compared to those of SIFS [3], our algorithm with no GVA term, and the ground truth. As can be seen, the light we estimate is quite close to the ground truth. The geometry we estimate certainly captures the main structures of the ground truth. Even though we use no smoothness prior, the resulting mesh is acceptable, though a smoothness prior such as the one used in [3] would help significantly. The result by [3] misses a lot of the large-scale structures, such as the hippo\u2019s belly and feet, but it is certainly smooth and aesthetic.\n\n3We will make our code publicly available at http://dilipkay.wordpress.com/sfs/\n\n[Figure 4 grid: rows show Ground Truth, Ours-GVA, SIFS and Ours-NoGVA; columns show Viewpoint 1, Viewpoint 2, Estimated Light and Rendered Image]\n\nFigure 4: Another example. Note how we manage to recover some of the dominant structure like the neck and feet, while SIFS mostly smooths features (albeit resulting in a more pleasing surface).\n\n
It is\nseen that without the GVA term, the resulting light is highly directed and the recovered shape has\nsnake-like structures which precisely line up with the direction of the light. These are very speci\ufb01c\nlocal minima which satisfy the observed image well, in agreement with the results in [9]. Figure 4\nshows some more results on a different model where the general story is similar.\n7 Discussion\nIn this paper, we have presented a shape and illumination from shading algorithm which makes use\nof the Generic View Assumption. We have shown how to utilize the GVA within an optimization\nframework. We achieve competitive results on shape and illumination estimation without the use of\nshape or illumination priors. The central message of our work is that the GVA can be a powerful\nregularizing term for the shape from shading problem. While priors for scene parameters can be very\nuseful, balancing the effect of different priors can be hard and inferred results may be biased towards\na wrong solution. One may ask: is the GVA just another prior? The GVA is a prior assumption,\nbut a very reasonable one: it merely states that all viewpoints and lighting directions are equally\nlikely. Nevertheless, there may exist multiple stable solutions and priors may be necessary to enable\nchoosing between these solutions [16]. A classical example of this is the convex/concave ambiguity\nin shape and light.\nFuture directions for this work are applying the GVA to more vision tasks, utilizing better optimiza-\ntion techniques and investigating the coexistence of priors and GVA terms.\nAcknowledgments\nThis work was supported by NSF CISE/IIS award 1212928 and by the Qatar Computing Research\nInstitute. We would like to thank Jonathan Yedidia for fruitful discussions.\n\nReferences\n[1] J. T. Barron and J. Malik. Color constancy, intrinsic images, and shape estimation. In Computer Vision\u2013\n\nECCV 2012, pages 57\u201370. Springer, 2012.\n\n8\n\n\f[2] J. T. Barron and J. 
Malik. Shape, albedo, and illumination from a single image of an unknown object. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 334\u2013341. IEEE, 2012.\n\n[3] J. T. Barron and J. Malik. Shape, illumination, and reflectance from shading. Technical Report UCB/EECS-2013-117, EECS, UC Berkeley, May 2013.\n\n[4] J. Bento, N. Derbinsky, J. Alonso-Mora, and J. S. Yedidia. A message-passing algorithm for multi-agent trajectory planning. In Advances in Neural Information Processing Systems, pages 521\u2013529, 2013.\n\n[5] T. O. Binford. Inferring surfaces from images. Artificial Intelligence, 17(1):205\u2013244, 1981.\n\n[6] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1\u2013122, 2011.\n\n[7] N. Derbinsky, J. Bento, V. Elser, and J. S. Yedidia. An improved three-weight message-passing algorithm. arXiv preprint arXiv:1305.1961, 2013.\n\n[8] R. T. Frankot and R. Chellappa. A method for enforcing integrability in shape from shading algorithms. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 10(4):439\u2013451, 1988.\n\n[9] W. T. Freeman. Exploiting the generic viewpoint assumption. International Journal of Computer Vision, 20(3):243\u2013261, 1996.\n\n[10] P. V. Gehler, C. Rother, M. Kiefel, L. Zhang, and B. Sch\u00f6lkopf. Recovering intrinsic images with a global sparsity prior on reflectance. In NIPS, volume 2, page 4, 2011.\n\n[11] R. Grosse, M. K. Johnson, E. H. Adelson, and W. T. Freeman. Ground truth dataset and baseline evaluations for intrinsic image algorithms. In Computer Vision, 2009 IEEE 12th International Conference on, pages 2335\u20132342. IEEE, 2009.\n\n[12] D. D. Hoffman. Genericity in spatial vision. 
Geometric Representations of Perceptual Phenomena: Papers in Honor of Tarow Indow on His 70th Birthday, page 95, 2013.\n\n[13] B. K. Horn. Obtaining shape from shading information. MIT Press, 1989.\n\n[14] B. K. Horn and M. J. Brooks. The variational approach to shape from shading. Computer Vision, Graphics, and Image Processing, 33(2):174\u2013208, 1986.\n\n[15] K. Ikeuchi and B. K. Horn. Numerical shape from shading and occluding boundaries. Artificial Intelligence, 17(1):141\u2013184, 1981.\n\n[16] A. D. Jepson. Comparing stories. Perception as Bayesian Inference, pages 478\u2013488, 1995.\n\n[17] J. Kautz, P.-P. Sloan, and J. Snyder. Fast, arbitrary BRDF shading for low-frequency lighting using spherical harmonics. In Proceedings of the 13th Eurographics Workshop on Rendering, pages 291\u2013296. Eurographics Association, 2002.\n\n[18] P. Kovesi. Shapelets correlated with surface normals produce surfaces. In Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on, volume 2, pages 994\u20131001. IEEE, 2005.\n\n[19] B. Kunsberg and S. W. Zucker. The differential geometry of shape from shading: Biology reveals curvature structure. In Computer Vision and Pattern Recognition Workshops (CVPRW), 2012 IEEE Computer Society Conference on, pages 39\u201346. IEEE, 2012.\n\n[20] Y. Li and M. S. Brown. Single image layer separation using relative smoothness. CVPR, 2014.\n\n[21] J. Malik. Interpreting line drawings of curved objects. International Journal of Computer Vision, 1(1):73\u2013103, 1987.\n\n[22] K. Nakayama and S. Shimojo. Experiencing and perceiving visual surfaces. Science, 257(5075):1357\u20131363, 1992.\n\n[23] A. P. Pentland. Linear shape from shading. International Journal of Computer Vision, 4(2):153\u2013162, 1990.\n\n[24] N. Petrovic, I. Cohen, B. J. Frey, R. Koetter, and T. S. Huang. Enforcing integrability for surface reconstruction algorithms using belief propagation in graphical models. 
In Computer Vision and Pattern\nRecognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, vol-\nume 1, pages I\u2013743. IEEE, 2001.\n\n[25] R. Ramamoorthi and P. Hanrahan. An ef\ufb01cient representation for irradiance environment maps. In Pro-\nceedings of the 28th annual conference on Computer graphics and interactive techniques, pages 497\u2013500.\nACM, 2001.\n\n[26] R. Ramamoorthi and P. Hanrahan. A signal-processing framework for inverse rendering. In Proceedings\nof the 28th annual conference on Computer graphics and interactive techniques, pages 117\u2013128. ACM,\n2001.\n\n[27] M. Schmidt. Minfunc, 2005.\n[28] R. Szeliski. Computer vision: algorithms and applications. Springer, 2010.\n[29] Y. Weiss. Deriving intrinsic images from image sequences.\n\nIn Computer Vision, 2001. ICCV 2001.\n\nProceedings. Eighth IEEE International Conference on, volume 2, pages 68\u201375. IEEE, 2001.\n\n[30] Y. Xiong, A. Chakrabarti, R. Basri, S. J. Gortler, D. W. Jacobs, and T. Zickler. From shading to local\n\nshape. http://arxiv.org/abs/1310.2916, 2014.\n\n[31] R. Zhang, P.-S. Tsai, J. E. Cryer, and M. Shah. Shape-from-shading: a survey. Pattern Analysis and\n\nMachine Intelligence, IEEE Transactions on, 21(8):690\u2013706, 1999.\n\n9\n\n\f", "award": [], "sourceid": 166, "authors": [{"given_name": "Daniel", "family_name": "Zoran", "institution": "Massachusetts Institute of Technology"}, {"given_name": "Dilip", "family_name": "Krishnan", "institution": "Massachusetts Institute of Technology"}, {"given_name": "Jos\u00e9", "family_name": "Bento", "institution": "Boston College"}, {"given_name": "Bill", "family_name": "Freeman", "institution": "Massachusetts Institute of Technology"}]}