{"title": "Efficient Inference of Continuous Markov Random Fields with Polynomial Potentials", "book": "Advances in Neural Information Processing Systems", "page_first": 936, "page_last": 944, "abstract": "In this paper, we prove that every multivariate polynomial with even degree can be decomposed into a sum of convex and concave polynomials. Motivated by this property, we exploit the concave-convex procedure to perform inference on continuous Markov random fields with polynomial potentials. In particular, we show that the concave-convex decomposition of polynomials can be expressed as a sum-of-squares optimization, which can be efficiently solved via semidefinite programming. We demonstrate the effectiveness of our approach in the context of 3D reconstruction, shape from shading and image denoising, and show that our approach significantly outperforms existing approaches in terms of efficiency as well as the quality of the retrieved solution.", "full_text": "Ef\ufb01cient Inference of Continuous Markov Random\n\nFields with Polynomial Potentials\n\nShenlong Wang\n\nUniversity of Toronto\n\nAlexander G. Schwing\nUniversity of Toronto\n\nRaquel Urtasun\n\nUniversity of Toronto\n\nslwang@cs.toronto.edu\n\naschwing@cs.toronto.edu\n\nurtasun@cs.toronto.edu\n\nAbstract\n\nIn this paper, we prove that every multivariate polynomial with even degree can\nbe decomposed into a sum of convex and concave polynomials. Motivated by\nthis property, we exploit the concave-convex procedure to perform inference on\ncontinuous Markov random \ufb01elds with polynomial potentials. In particular, we\nshow that the concave-convex decomposition of polynomials can be expressed as\na sum-of-squares optimization, which can be ef\ufb01ciently solved via semide\ufb01nite\nprograming. 
We demonstrate the effectiveness of our approach in the context\nof 3D reconstruction, shape from shading and image denoising, and show that\nour method signi\ufb01cantly outperforms existing techniques in terms of ef\ufb01ciency as\nwell as quality of the retrieved solution.\n\n1\n\nIntroduction\n\nGraphical models are a convenient tool to illustrate the dependencies among a collection of random\nvariables with potentially complex interactions. Their widespread use across domains from com-\nputer vision and natural language processing to computational biology underlines their applicability.\nMany algorithms have been proposed to retrieve the minimum energy con\ufb01guration, i.e., maximum\na-posteriori (MAP) inference, when the graphical model describes energies or distributions de\ufb01ned\non a discrete domain. Although this task is NP-hard in general, message passing algorithms [16] and\ngraph-cuts [4] can be used to retrieve the global optimum when dealing with tree-structured models\nor binary Markov random \ufb01elds composed out of sub-modular energy functions.\nIn contrast, graphical models with continuous random variables are much less well understood. A\nnotable exception is Gaussian belief propagation [31], which retrieves the optimum when the poten-\ntials are Gaussian for arbitrary graphs under certain conditions of the underlying system. Inspired\nby discrete graphical models, message-passing algorithms based on discrete approximations in the\nform of particles [6, 17] or non-linear functions [27] have been developed for general potentials.\nThey are, however, computationally expensive and do not perform well when compared to dedi-\ncated algorithms [20]. Fusion moves [11] are a possible alternative, but they rely on the generation\nof good proposals, a task that is often dif\ufb01cult in practice. 
Other related work focuses on representing\nrelations on pairwise graphical models [24], or marginalization rather than MAP [13].\nIn this paper we study the case where the potentials are polynomial functions. This is a very general\nfamily of models as many applications such as collaborative \ufb01ltering [8], surface reconstruction [5]\nand non-rigid registration [30] can be formulated in this way. Previous approaches rely on either\npolynomial equation system solvers [20], semi-de\ufb01nite programming relaxations [9, 15] or approxi-\nmate message-passing algorithms [17, 27]. Unfortunately, existing methods either cannot cope with\nlarge-scale graphical models, and/or do not have global convergence guarantees.\nIn particular, we exploit the concave-convex procedure (CCCP) [33] to perform inference on con-\ntinuous Markov random \ufb01elds (MRFs) with polynomial potentials. Towards this goal, we \ufb01rst show\nthat an arbitrary multivariate polynomial function can be decomposed into a sum of a convex and\n\n1\n\n\fa concave polynomial. Importantly, this decomposition can be expressed as a sum-of-squares opti-\nmization [10] over polynomial Hessians, which is ef\ufb01ciently solvable via semide\ufb01nite programming.\nGiven the decomposition, our inference algorithm proceeds iteratively as follows: at each iteration\nwe linearize the concave part and solve the resulting subproblem ef\ufb01ciently to optimality. Our algo-\nrithm inherits the global convergence property of CCCP [25].\nWe demonstrate the effectiveness of our approach in the context of 3D reconstruction, shape from\nshading and image denoising. 
Our method proves superior in terms of both computational cost and the energy of the solutions retrieved when compared to approaches such as dual decomposition [20], fusion moves [11] and particle belief propagation [6].

2 Graphical Models with Continuous Variables and Polynomial Functions

In this section we first review inference algorithms for graphical models with continuous random variables, as well as the concave-convex procedure. We then prove existence of a concave-convex decomposition for polynomials and provide a construction. Based on this decomposition and construction, we propose a novel inference algorithm for continuous MRFs with polynomial potentials.

2.1 Graphical Models with Polynomial Potentials

The MRFs we consider represent distributions defined over a continuous domain X = Π_i X_i, which is a product-space assembled from continuous sub-spaces X_i ⊂ R. Let x ∈ X be the output configuration of interest, e.g., a 3D mesh or a denoised image. Note that each output configuration tuple x = (x_1, ···, x_n) subsumes a set of random variables. Graphical models describe the energy of the system as a sum of local scoring functions, i.e., f(x) = Σ_{r∈R} f_r(x_r). Each local function f_r(x_r) : X_r → R depends on a subset of variables x_r = (x_i)_{i∈r} defined on a domain X_r ⊆ X, which is specified by the restriction often referred to as region r ⊆ {1, ..., n}, i.e., X_r = Π_{i∈r} X_i. We refer to R as the set of all restrictions required to compute the energy of the system.

We tackle the problem of maximum a-posteriori (MAP) inference, i.e., we want to find the configuration x* having the minimum energy. This is formally expressed as

x* = arg min_x Σ_{r∈R} f_r(x_r).    (1)

Solving this program for general functions is hard. In this paper we focus on energies composed of polynomial functions. 
This is a fairly general case, as the energies employed in many applications obey this assumption. Furthermore, for well-behaved continuous non-polynomial functions (e.g., k-th order differentiable) polynomial approximations could be used (e.g., via a Taylor expansion). Let us define polynomials more formally:

Definition 1. A d-degree multivariate polynomial f(x) : R^n → R is a finite linear combination of monomials, i.e.,

f(x) = Σ_{m∈M} c_m x_1^{m_1} x_2^{m_2} ··· x_n^{m_n},

where we let the coefficient c_m ∈ R and the tuple m = (m_1, ..., m_n) ∈ M ⊆ N^n with Σ_{i=1}^n m_i ≤ d ∀m ∈ M. The set M subsumes all tuples relevant to define the function f.

We are interested in minimizing Eq. (1) where the potential functions f_r are polynomials with arbitrary degree. This is a difficult problem as polynomial functions are in general non-convex. Moreover, for many applications of interest we have to deal with a large number of variables, e.g., more than 60,000 when reconstructing shape from shading of a 256 × 256 image. Optimal solutions exist under certain conditions when the potentials are Gaussian [31], i.e., polynomials of degree 2. Message passing algorithms have not been very successful for general polynomials due to the fact that the messages are continuous functions. Discrete [6, 17] and non-parametric [27] approximations have been employed with limited success. Furthermore, polynomial system solvers [20] and moment-based methods [9] cannot scale up to such a large number of variables. Dual decomposition provides a plausible approach for tackling large-scale problems by dividing the task into many small sub-problems [20]. However, solving a large number of smaller systems is still a bottleneck, and decoding the optimal solution from the sub-problems might be difficult. 
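To make Definition 1 concrete, a polynomial in this coefficient/multi-index form can be evaluated directly. A minimal sketch (the dictionary representation mapping exponent tuples to coefficients is our own choice, not from the paper):

```python
# Evaluate f(x) = sum_{m in M} c_m * x_1^{m_1} * ... * x_n^{m_n},
# storing the polynomial as {exponent tuple m: coefficient c_m}.
def poly_eval(coeffs, x):
    total = 0.0
    for m, c in coeffs.items():
        term = c
        for xi, mi in zip(x, m):
            term *= xi ** mi
        total += term
    return total

# f(x1, x2) = 2*x1^2*x2 - 3*x2^4 + 1, a d = 4 polynomial in n = 2 variables.
f = {(2, 1): 2.0, (0, 4): -3.0, (0, 0): 1.0}
print(poly_eval(f, [1.0, 2.0]))  # 4 - 48 + 1 = -43.0
```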
In contrast, we propose to use the Concave-Convex Procedure (CCCP) [33], which we now briefly review.

2.2 Inference via CCCP

CCCP is a majorization-minimization framework for optimizing non-convex functions that can be written as the sum of a convex and a concave part, i.e., f(x) = f_vex(x) + f_cave(x). This framework has recently been used to solve a wide variety of machine learning tasks, such as learning in structured models with latent variables [32, 22], kernel methods with missing entries [23] and sparse principal component analysis [26]. In CCCP, f is optimized by iteratively computing a linearization of the concave part at the current iterate x^(i) and solving the resulting convex problem

x^(i+1) = arg min_x f_vex(x) + x^T ∇f_cave(x^(i)).    (2)

This process is guaranteed to monotonically decrease the objective and it converges globally, i.e., from any starting point x (see Theorem 2 of [33] and Theorem 8 of [25]). Moreover, Salakhutdinov et al. [19] showed that the convergence rate of CCCP, which is between super-linear and linear, depends on the curvature ratio between the convex and concave parts. In order to take advantage of CCCP to solve our problem, we need to decompose the energy function into a sum of convex and concave parts. In the next section we show that this decomposition always exists. Furthermore, we provide a procedure to perform this decomposition given general polynomials.

2.3 Existence of a Concave-Convex Decomposition of Polynomials

Theorem 1 in [33] shows that for all arbitrary continuous functions with bounded Hessian a decomposition into convex and concave parts exists. However, Hessians of polynomial functions are not bounded in R^n. Furthermore, [33] did not provide a construction for the decomposition. In this section we show that for polynomials this decomposition always exists and we provide a construction. 
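To make the CCCP update of Eq. (2) concrete, consider the univariate polynomial f(x) = x^4 - 3x^2 with the hand-chosen split f_vex(x) = x^4, f_cave(x) = -3x^2; the convex subproblem then has a closed-form minimizer. A minimal sketch (the toy polynomial and split are ours, not from the paper):

```python
import math

# f(x) = x^4 - 3x^2, split as f_vex(x) = x^4 (convex), f_cave(x) = -3x^2 (concave).
f = lambda x: x**4 - 3 * x**2

def cccp_step(x):
    # Linearize the concave part at x_i: minimize x^4 + x * f_cave'(x_i) = x^4 - 6*x_i*x.
    # Setting the derivative to zero: 4x^3 = 6*x_i, i.e. x = cbrt(1.5 * x_i).
    s = 1.5 * x
    return math.copysign(abs(s) ** (1.0 / 3.0), s)

x, energies = 2.0, []
for _ in range(60):
    energies.append(f(x))
    x = cccp_step(x)

print(round(x, 6))  # 1.224745, i.e. sqrt(1.5), a stationary point of f
# The objective is monotonically non-increasing, as guaranteed by CCCP:
assert all(a >= b - 1e-12 for a, b in zip(energies, energies[1:]))
```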
Note that since odd degree polynomials are unbounded from below, i.e., not proper, we only focus on even degree polynomials in the following. Let us therefore consider the space spanned by polynomial functions with an even degree d.

Proposition 1. The set of polynomial functions f(x) : R^n → R with even degree d, denoted P^n_d, is a topological vector space. Furthermore, its dimension is dim(P^n_d) = C(n + d − 1, d).

Proof. (Sketch) According to the definition of vector spaces, we know that the set of polynomial functions forms a vector space over R. We can then show that addition and multiplication over the polynomial ring P^n_d are continuous. Finally, computing dim(P^n_d) is equivalent to computing a d-combination with repetition from n elements [3].

Next we investigate the geometric properties of convex even degree polynomials.

Lemma 1. Let the set of convex polynomial functions c(x) : R^n → R with even degree d be C^n_d. This subset of P^n_d is a convex cone.

Proof. Given two arbitrary convex polynomial functions f, g ∈ C^n_d, let h = af + bg with positive scalars a, b ∈ R_+. ∀x, y ∈ R^n, ∀λ ∈ [0, 1], we have:

h(λx + (1 − λ)y) = af(λx + (1 − λ)y) + bg(λx + (1 − λ)y)
                 ≤ a(λf(x) + (1 − λ)f(y)) + b(λg(x) + (1 − λ)g(y))
                 = λh(x) + (1 − λ)h(y).

Therefore, ∀f, g ∈ C^n_d and ∀a, b ∈ R_+, we have af + bg ∈ C^n_d, i.e., C^n_d is a convex cone.

We now show that the eigenvalues of the Hessian of f (hence the smallest one) continuously depend on f ∈ P^n_d.

Proposition 2. For any polynomial function f ∈ P^n_d with d ≥ 2, the eigenvalues of its Hessian eig(∇²f(x)) are continuous w.r.t. f in the polynomial space P^n_d.

Proof. ∀f ∈ P^n_d, given a basis {g_i} of P^n_d, we obtain the representation f = Σ_i c_i g_i, linear in the coefficients c_i. It is easy to see that ∀f ∈ P^n_d, the Hessian ∇²f(x) is a polynomial matrix, linear in the c_i, i.e., ∇²f(x) = Σ_i c_i ∇²g_i(x). Let M(c_1, ···, c_n) = ∇²f(x) = Σ_i c_i ∇²g_i(x) define the Hessian as a function of the coefficients (c_1, ···, c_n). The eigenvalues eig(M(c_1, ···, c_n)) are 
It follows trivially that\n\ni \u2208 P n\n\ni x2\n1 + 2, d(d \u2212 1)xd\u22122\n\n2 + 2,\u00b7\u00b7\u00b7 , d(d \u2212 1)xd\u22122\n\nn + 2(cid:3)(cid:1) (cid:31) 0 \u2200x.\n\nd is P n\n\nd\n\nd ) = dim(P n\nd ).\n\nd and P n\n\nd is identical.\n\nd . This concludes the proof.\n\nGiven the above two propositions it follows that the dimensionality of Cn\nLemma 2. The dimension of the polynomial vector space is equal to the dimension of the convex\neven degree polynomial cone having the same degree d and the same number of variables n, i.e.,\ndim(Cn\nProof. According to Proposition 3, there exists a function f \u2208 P n\nd , with strictly positive de\ufb01nite\nHessian, i.e., \u2200x \u2208 Rn, eig(\u22072f (x)) > 0. Consider a polynomial basis {gi} of P n\nd . Consider\nthe vector of eigenvalues E(\u02c6ci) = eig(\u22072(f (x) + \u02c6cigi)). According to Proposition 2, E(\u02c6ci) is\ncontinuous w.r.t. \u02c6ci, and E(0) is an all-positive vector. According to the de\ufb01nition of continuity,\nthere exists an \u0001 > 0, such that E(\u02c6ci) > 0, \u2200\u02c6ci \u2208 {c : |c| < \u0001}. Hence, there exists a nonzero\nconstant \u02c6ci such that the polynomial f + \u02c6cigi is also strictly convex. We can construct such a strictly\nconvex polynomial \u2200gi. Therefore the polynomial set f + \u02c6cigi is linearly independent and hence a\nbasis of Cn\nLemma 3. The linear span of the basis of Cn\nProof. Suppose P n\n{g1, g2,\u00b7\u00b7\u00b7 gN} a basis of Cn\nby {g1, g2,\u00b7\u00b7\u00b7 gN}. We have {g1, g2,\u00b7\u00b7\u00b7 , gN , h} are N +1 linear independent vectors in P n\nis in contradiction with P n\nTheorem 1. \u2200f \u2208 P n\nProof. Let the basis of Cn\nd be {g1, g2,\u00b7\u00b7\u00b7 , gN}. According to Lemma 3, there exist coef\ufb01cients\nc1,\u00b7\u00b7\u00b7 , cN , such that f = c1g1 + c2g2 + \u00b7\u00b7\u00b7 + cN gN . We can partition the coef\ufb01cients into\nci\u22650 cigi and\n\nd is also N-dimensional. 
Denote\nd such that h cannot be linearly represented\nd , which\n\nd . Assume there exists h \u2208 P n\nd being N-dimensional.\n\ncj <0 cjgj. Let h = (cid:80)\n\nd is N-dimensional. According to Lemma 2, Cn\n\ntwo sets, according to their sign, i.e., f = (cid:80)\ng = \u2212(cid:80)\n(cid:18) n + d \u2212 1\n\nci\u22650 cigi +(cid:80)\n(cid:19)\n\ncj <0 cjgj. We have f = h \u2212 g, while both h and g are convex polynomials.\n\nAccording to Theorem 1 there exists a concave-convex decomposition given any polynomial, where\nboth the convex and concave parts are also polynomials with degree no greater than the original\npolynomial. As long as we can \ufb01nd\nlinearly independent convex polynomial basis\nfunctions for any arbitrary polynomial function f \u2208 P n\nd , we obtain a valid decomposition by looking\nat the sign of the coef\ufb01cients. It is however worth noting that the concave-convex decomposition\nis not unique. In fact, there is an in\ufb01nite number of decompositions, trivially seen by adding and\nsubtracting an arbitrary convex polynomial to an existing decomposition.\nFinding a convex basis is however not an easy task, mainly due to the dif\ufb01culties on checking\nconvexity and the exponentially increasing dimension. Recently, Ahmadi et al. [1] proved that even\ndeciding on the convexity of quartic polynomials is NP-hard.\n\nd\n\nd , there exist convex polynomials h, g \u2208 Cn\n\nd such that f = h \u2212 g.\n\n4\n\n\fAlgorithm 1 CCCP Inference on Continuous MRFs with Polynomial Potentials\nInput: Initial estimation x0\n\u2200r \ufb01nd fr(xr) = fr,vex(xr) + fr,cave(xr) via Eq. 
(4) or via a polynomial basis (Theorem 1)
repeat
  solve x^(i+1) = arg min_x Σ_r f_{r,vex}(x_r) + x^T ∇_x (Σ_{r∈R} f_{r,cave}(x_r^(i))) with L-BFGS.
until convergence
Output: x*

2.4 Constructing a Concave-Convex Decomposition of Polynomials

In this section we derive an algorithm to construct the concave-convex decomposition of arbitrary polynomials. Our algorithm first constructs the convex basis of the polynomial vector space P^n_d before extracting a convex polynomial containing the target polynomial via a sum-of-squares (SOS) program. More formally, given a non-convex polynomial f(x) we are interested in constructing a convex function h(x) = f(x) + Σ_i c_i g_i(x), with g_i(x), i ∈ {1, ..., m}, the set of all convex monomials with degree no greater than deg(f(x)). From this it follows that f_vex = h(x) and f_cave = −Σ_i c_i g_i(x). In particular, we want a convex function h(x) with coefficients c_i as small as possible:

min_c w^T c  s.t.  ∇²f(x) + Σ_i c_i ∇²g_i(x) ≻ 0  ∀x ∈ R^n,    (3)

with the objective function being a weighted sum of coefficients. The weight vector w can encode preferences in the minimization, e.g., smaller coefficients for larger degrees. This minimization problem is NP-hard: if it were not, we could decide whether an arbitrary polynomial f(x) is convex by solving such a program, which contradicts the NP-hardness result of [1]. Instead, we utilize a tighter set of constraints, i.e., sum-of-squares constraints, which are easier to solve [14].

Definition 2. For an even degree polynomial f(x) ∈ P^n_d with d = 2m, f is an SOS polynomial if and only if there exist g_1, ..., g_k ∈ P^n_m such that f(x) = Σ_{i=1}^k g_i(x)².

Thus, instead of solving the NP-hard program stated in Eq. (3), we optimize:

min_c w^T c  s.t.  ∇²f(x) + Σ_i c_i ∇²g_i(x) ∈ SOS.    (4)

The set of SOS Hessians is a subset of the positive definite Hessians [9]. Hence, every solution of this problem can be considered a valid construction. Furthermore, the sum-of-squares optimization in Eq. (4) can be formulated as an efficiently solvable semi-definite program (SDP) [10, 9]. It is important to note that the gap between the SOS Hessians and the positive definite Hessians increases as the degree of the polynomials grows. Hence, using SOS constraints we might not find a solution even though there exists one for the original program given in Eq. (3). In practice, SOS optimization works well for monomials and low-degree polynomials. For pairwise graphical models with arbitrary degree polynomials, as well as for graphical models of order up to four with maximum fourth order degree polynomials, we are guaranteed to find a decomposition. This is due to the fact that SOS convexity and polynomial convexity coincide in these cases (Theorem 5.2 in [2]). Most practical graphical models are within this set. Known counter-examples [2] are typically found using specific tools.

We summarize our algorithm in Alg. 1. Given a graphical model with polynomial potentials of degree at most d, we obtain a concave-convex decomposition by solving Eq. (4). This can be done for the full polynomial or for each non-convex monomial. We then apply CCCP in order to perform inference, where we solve a convex problem at each iteration. In particular, we employ L-BFGS, mainly due to its super-linear convergence and its storage efficiency [12]. 
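As a univariate analogue of the construction in Eqs. (3)-(4): for f(x) = a·x^4 + b·x^2 with a > 0, the smallest coefficient c of the added convex monomial g(x) = x^2 that makes f + c·g convex can be found in closed form, with no SDP needed. A minimal sketch of this special case (the example and helper names are ours, not from the paper):

```python
# For f(x) = a*x^4 + b*x^2 with a > 0, f''(x) = 12a*x^2 + 2b attains its minimum
# 2b at x = 0. With g(x) = x^2 (so g''(x) = 2), convexity of h = f + c*g requires
# 12a*x^2 + 2b + 2c >= 0 for all x, i.e. the smallest valid choice is c = max(0, -b).
def decompose_quartic(a, b):
    c = max(0.0, -b)
    f_vex = lambda x: a * x**4 + (b + c) * x**2   # convex part h = f + c*g
    f_cave = lambda x: -c * x**2                  # concave remainder
    return c, f_vex, f_cave

c, f_vex, f_cave = decompose_quartic(1.0, -3.0)   # f(x) = x^4 - 3x^2
print(c)  # 3.0
# The split reproduces f, and h'' = 12x^2 + 2*(b + c) is nonnegative everywhere:
assert f_vex(2.0) + f_cave(2.0) == 1.0 * 2.0**4 - 3.0 * 2.0**2
assert all(12 * x**2 + 2 * (-3.0 + c) >= 0 for x in [i / 10.0 - 5.0 for i in range(101)])
```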
In each L-BFGS step, we apply a line search scheme based on the Wolfe conditions [12].

2.5 Extensions

Dealing with very large graphs: Motivated by recent progress on accelerating graphical model inference [7, 21, 20], we can handle large-scale problems by employing dual decomposition and using our approach to solve the sub-problems.

Non-polynomial cases: We have described our method in the context of graphical models with polynomial potentials. It can be extended to the non-polynomial case if the involved functions have bounded Hessians, since we can still construct the concave-convex decomposition. For instance, for the Lorentzian regularizer ρ(x) = log(1 + x²/2), we note that ρ(x) = {log(1 + x²/2) + x²/8} − x²/8 is a valid concave-convex decomposition. We refer the reader to the supplementary material for a detailed proof. Alternatively, we can approximate any continuous function with polynomials by employing a Taylor expansion around the current iterate, and updating the solution via one CCCP step within a trust region.

              L-BFGS   PCBP    FusionMove  ADMM-Poly  Ours
Energy        10736.4  6082.7  4317.7      3221.1     3062.8
RMSE (mm)     4.98     4.50    2.95        3.82       3.07
Time (second) 0.11     56.60   0.12        18.32      8.70 (×2)
Table 1: 3D Reconstruction on 3 × 3 meshes with noise variance σ = 2.

(a) Synthetic meshes  (b) Cardboard meshes  (c) Shape-from-Shading  (d) Denoising
Figure 1: Average energy evolution curve for different applications.

3 Experimental Evaluation

We demonstrate the effectiveness of our approach using three different applications: non-rigid 3D reconstruction, shape from shading and image denoising. 
We refer the reader to the supplementary material for more figures as well as an additional toy experiment on a densely connected graph with box constraints.

3.1 Non-rigid 3D Reconstruction

We tackle the problem of deformable surface reconstruction from a single image. Following [30], we parameterize the 3D shape via the depth of keypoints. Let x ∈ R^N be the depth of N points. We follow the locally isometric deformation assumption [20], i.e., the distance between neighboring keypoints remains constant as the non-rigid surface deforms. The 3D reconstruction problem is then formulated as

min_x Σ_{(i,j)∈N} (‖x_i q_i − x_j q_j‖² − d_{i,j}²)²,    (5)

where d_{i,j} is the distance between keypoints (given as input), N is the set of all neighboring pixels, x_i is the unknown depth of point i, and q_i = A^{−1}(u_i, v_i, 1)^T is the line-of-sight of pixel i, with A denoting the known internal camera parameters. We consider a six-neighborhood system, i.e., up, down, left, right, upper-left and lower-right. Note that each pairwise potential is a four-degree non-convex polynomial in two random variables. We can easily decompose it into 15 monomials, and perform a concave-convex decomposition given the corresponding convex polynomials (see supplementary material for an example).

We first conduct reconstruction experiments on the 100 randomly generated 3 × 3 meshes of [20], where zero-mean Gaussian noise with standard deviation σ = 2 is added to each observed keypoint coordinate. We compare our approach to Fusion Moves [30], particle convex belief propagation (PCBP) [17], L-BFGS, as well as dual decomposition with the alternating direction method of multipliers using a polynomial solver (ADMM-Poly) [20]. We employ three different metrics: energy at convergence, running time and root mean square error (RMSE). 
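The objective of Eq. (5) is straightforward to evaluate; a minimal sketch on a toy chain of three keypoints (the toy geometry and helper names are ours), which also confirms that the energy vanishes at the ground-truth depths:

```python
import numpy as np

def recon_energy(depths, rays, edges, dists):
    # Eq. (5): sum over neighbor pairs (i, j) of (||x_i q_i - x_j q_j||^2 - d_ij^2)^2.
    e = 0.0
    for (i, j), d in zip(edges, dists):
        diff = depths[i] * rays[i] - depths[j] * rays[j]
        e += (diff @ diff - d * d) ** 2
    return e

# Toy setup: q_i = A^{-1}(u_i, v_i, 1)^T with A = I, three keypoints on a chain.
rays = np.array([[0.0, 0.0, 1.0], [0.1, 0.0, 1.0], [0.2, 0.1, 1.0]])
gt_depths = np.array([10.0, 11.0, 12.0])
edges = [(0, 1), (1, 2)]
dists = [np.linalg.norm(gt_depths[i] * rays[i] - gt_depths[j] * rays[j]) for i, j in edges]

print(round(recon_energy(gt_depths, rays, edges, dists), 10))   # 0.0 at the ground truth
print(recon_energy(gt_depths + 0.5, rays, edges, dists) > 0.0)  # True: perturbed depths cost energy
```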
For L-BFGS and our method, we use a flat mesh as initialization with two rotation angles, (0, 0, 0) and (π/4, 0, 0). The convergence criterion is an energy decrease of less than 10^−5 or reaching a maximum of 500 iterations. As shown in Table 1, our algorithm achieves lower energy, lower RMSE, and a faster running time than ADMM-Poly and PCBP. Furthermore, as shown in Fig. 1(a), the time for running our algorithm to convergence is similar to a single iteration of ADMM-Poly, while we achieve much lower energy.

              L-BFGS  CLVM  ADMM-Poly  Ours
Energy        905.37  N/A   736.98     687.21
RMSE (mm)     5.68    7.23  4.16       3.29
Time (second) 0.3406  N/A   314.8      10.16
Table 2: 3D Reconstruction on Cardboard sequences.

Figure 2: 3D reconstruction results on Cardboard. Left to right: sample comparison, energy curve, groundtruth, ADMM-Poly and our reconstruction.

Figure 3: Shape-from-Shading results on Penny. Left to right: energy curve, inferred shape, rendered image with inferred shape, groundtruth image.

We next reconstruct the real-world 9 × 9 Cardboard sequence [20]. We compare with both ADMM-Poly and L-BFGS in terms of energy, time and RMSE. We also compare with the constrained latent variable model of [29] in terms of RMSE; we cannot compare the energy value since the energy function is different. Again, we use a flat mesh as initialization. As shown in Table 2, our algorithm outperforms all baselines. 
Furthermore, it is more than 20 times faster than ADMM-Poly, which is the second best algorithm. Average energy as a function of time is shown in Fig. 1(b). We refer the reader to Fig. 2 and the video in the supplementary material for a visual comparison between ADMM-Poly and our method. From the first subfigure we observe that our method achieves lower energy for most samples. The second subfigure illustrates that our approach monotonically decreases the energy, and that our method is much faster than ADMM-Poly.

3.2 Shape-from-Shading

Following [5, 20], we formulate the shape from shading problem with 3rd-order, 4th-degree polynomial functions. Let x_{i,j} = (u_{i,j}, v_{i,j}, w_{i,j})^T be the 3D coordinates of each triangle vertex. Under the Lambertian model assumption, the intensity of a triangle r is represented as

I_r = (l_1 p_r + l_2 q_r + l_3) / √(p_r² + q_r² + 1),

where l = (l_1, l_2, l_3)^T is the direction of the light, and p_r and q_r are the x and y coordinates of the normal vector n_r = (p_r, q_r, 1)^T, computed as

p_r = [(v_{i,j+1} − v_{i,j})(w_{i+1,j} − w_{i,j}) − (v_{i+1,j} − v_{i,j})(w_{i,j+1} − w_{i,j})] / [(u_{i,j+1} − u_{i,j})(v_{i+1,j} − v_{i,j}) − (u_{i+1,j} − u_{i,j})(v_{i,j+1} − v_{i,j})]

and

q_r = [(u_{i,j+1} − u_{i,j})(w_{i+1,j} − w_{i,j}) − (u_{i+1,j} − u_{i,j})(w_{i,j+1} − w_{i,j})] / [(u_{i,j+1} − u_{i,j})(v_{i+1,j} − v_{i,j}) − (u_{i+1,j} − u_{i,j})(v_{i,j+1} − v_{i,j})],

respectively. Each clique r represents a triangle, which is constructed from three neighboring points on the grid, i.e., either (x_{i,j}, x_{i,j+1}, x_{i+1,j}) or (x_{i,j}, x_{i,j−1}, x_{i+1,j}). 
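The normal and intensity model above can be sanity-checked on a single toy triangle: computing p_r and q_r from the vertex differences and rendering I_r under the Lambertian model makes the shading residual of the objective below vanish. A minimal sketch (the toy vertices are ours; the sign convention follows the formulas as given):

```python
import numpy as np

def triangle_pq(x_ij, x_ij1, x_i1j):
    # p_r and q_r from the vertex differences, following the formulas in the text:
    # first edge (i,j) -> (i,j+1), second edge (i,j) -> (i+1,j).
    du1, dv1, dw1 = x_ij1 - x_ij
    du2, dv2, dw2 = x_i1j - x_ij
    denom = du1 * dv2 - du2 * dv1
    p = (dv1 * dw2 - dv2 * dw1) / denom
    q = (du1 * dw2 - du2 * dw1) / denom
    return p, q

l = np.array([0.0, 0.0, 1.0])  # frontal light direction, as in the experiments
x_ij, x_ij1, x_i1j = np.array([0.0, 0.0, 2.0]), np.array([1.0, 0.0, 2.5]), np.array([0.0, 1.0, 2.2])
p, q = triangle_pq(x_ij, x_ij1, x_i1j)
I = (l[0] * p + l[1] * q + l[2]) / np.sqrt(p * p + q * q + 1.0)

# A consistently rendered intensity makes (p^2 + q^2 + 1) I^2 - (l1 p + l2 q + l3)^2 vanish:
residual = (p * p + q * q + 1.0) * I * I - (l[0] * p + l[1] * q + l[2]) ** 2
print(round(abs(residual), 12))  # 0.0
```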
Given the rendered image and lighting direction, shape from shading is formulated as

min_w Σ_{r∈R} ((p_r² + q_r² + 1) I_r² − (l_1 p_r + l_2 q_r + l_3)²)².    (6)

           Ours   L-BFGS  GradDesc
Energy     29413  29547   29598
PSNR       31.43  30.96   31.56
Time (sec) 384.5  189.5   1122.5
Table 3: FoE Energy Minimization Results.

Figure 4: FoE based image denoising results on Cameraman, σ = 15.

We tested our algorithm on the Vase, Penny and Mozart datasets, where Vase and Penny are 128 × 128 images and Mozart is a 256 × 256 image, with light direction l = (0, 0, 1)^T. The energy evolution curve, the inferred shape, as well as the rendered and ground-truth images are illustrated in Fig. 3. See the supplementary material for more figures on Penny and Mozart. Our algorithm achieves very low energy, producing very accurate results in only 30 seconds. ADMM-Poly hardly runs on such large-scale data due to the computational cost of the polynomial system solver (more than 2 hours per iteration). In order to compare with ADMM-Poly, we also conduct the shape from shading experiment on a scaled 16 × 16 version of the Vase data. 
Both methods retrieve a shape that is very close to the global optimum (an energy of 0.00027 for ADMM-Poly and 0.00032 for our approach); however, our algorithm is over 150 times faster than ADMM-Poly (2250 seconds for ADMM-Poly and 13.29 seconds for our proposed method). The energy evolution curve on the 16 × 16 re-scaled image is shown in Fig. 1(c).

3.3 Image Denoising

We formulate image denoising via minimizing the Fields-of-Experts (FoE) energy [18]. The data term encodes the fact that the recovered image should be close to the noisy input, where closeness is weighted by the noise level σ. Given a pre-learned linear filterbank of 'experts' {J_i}_{i=1,...,K}, the image prior encodes the fact that natural images are Gibbs distributed via p(x) = (1/Z) Π_{r∈R} Π_{i=1}^K (1 + (J_i^T x_r)²/2)^{−α_i}. Thus we formulate denoising as

min_x (λ/σ²) ‖x − y‖²₂ + Σ_{r∈R} Σ_{i=1}^K α_i log(1 + (J_i^T x_r)²/2),    (7)

where y is the noisy image input, x is the clean image estimate, r indexes 5 × 5 cliques and i is the index of each FoE filter. Note that this energy function is not a polynomial function. However, for each FoE expert, the Hessian of log(1 + (J_i^T x_r)²/2) is lower bounded by −J_i J_i^T / 8 (proof in the supplementary material). Therefore, we simply add an extra term γ x_r^T x_r with γ > J_i^T J_i / 8 to obtain the concave-convex decomposition log(1 + (J_i^T x_r)²/2) = {log(1 + (J_i^T x_r)²/2) + γ x_r^T x_r} − γ x_r^T x_r. We utilize a pre-trained 5 × 5 filterbank with 24 filters, and conduct experiments on the BM3D benchmark¹ with noise level σ = 15. In addition to the other baselines, we compare to the original FoE inference algorithm, which essentially is a first-order gradient descent method with a fixed step size [18]. 
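The FoE objective of Eq. (7) can be evaluated directly; a minimal sketch on a toy image with random 5 × 5 filters (the random filterbank, weights and image are placeholders, not the learned filters of [18]):

```python
import numpy as np

def foe_energy(x, y, filters, alphas, lam, sigma):
    # Eq. (7): lam/sigma^2 * ||x - y||^2 + sum_r sum_i alpha_i * log(1 + (J_i^T x_r)^2 / 2).
    e = lam / sigma**2 * np.sum((x - y) ** 2)
    h, w = x.shape
    for r0 in range(h - 4):          # every 5x5 clique x_r
        for c0 in range(w - 4):
            patch = x[r0:r0 + 5, c0:c0 + 5].ravel()
            for j, a in zip(filters, alphas):
                e += a * np.log1p(0.5 * (j @ patch) ** 2)
    return e

rng = np.random.default_rng(0)
y = rng.random((12, 12))                               # toy noisy input
filters = [rng.standard_normal(25) for _ in range(3)]  # stand-ins for learned 5x5 experts
alphas = [0.5, 0.5, 0.5]

# With x = y the data term vanishes and only the prior remains; with x = 0 only the data term.
e_y = foe_energy(y, y, filters, alphas, lam=1.0, sigma=15.0)
e_0 = foe_energy(np.zeros_like(y), y, filters, alphas, lam=1.0, sigma=15.0)
print(e_y > 0.0, e_0 > 0.0)  # True True
```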
For L-BFGS, we set the maximum number of iterations to 10,000 to make sure that the algorithm converges. As shown in Table 3 and Fig. 1(d), our algorithm achieves lower energy than L-BFGS and first-order gradient descent. Furthermore, we see that lower energy does not translate to higher PSNR, showing the limitation of FoE as an image prior.

4 Conclusions

We investigated the properties of polynomials, and proved that every multivariate polynomial with even degree can be decomposed into a sum of convex and concave polynomials with degree no greater than the original one. Motivated by this property, we exploited the concave-convex procedure to perform inference on continuous Markov random fields with polynomial potentials. Our algorithm is especially well suited for inference on continuous graphical models with a large number of variables. Experiments on non-rigid reconstruction, shape-from-shading and image denoising validate the effectiveness of our approach. We plan to investigate continuous inference with arbitrary differentiable functions, by making use of polynomial approximations as well as tighter concave-convex decompositions.

¹ http://www.cs.tut.fi/~foi/GCF-BM3D/

[Figure 4 panels: Clean Image; Noisy Image, PSNR: 24.5952; GradDesc, PSNR: 31.0689; Ours, PSNR: 30.9311; L-BFGS, PSNR: 30.7695; energy evolution curve for FoE (GradDesc, L-BFGS, Ours).]

References

[1] A. A. Ahmadi, A. Olshevsky, P. A. Parrilo, and J. N. Tsitsiklis. NP-hardness of deciding convexity of quartic polynomials and related problems. Mathematical Programming, 2013.

[2] A. A. Ahmadi and P. A. Parrilo. A complete characterization of the gap between convexity and SOS-convexity. SIAM Journal on Optimization, 2013.

[3] K. Batselier, P. Dreesen, and B. D. Moor.
The geometry of multivariate polynomial division and elimination. SIAM Journal on Matrix Analysis and Applications, 2013.

[4] Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. PAMI, 2001.

[5] A. Ecker and A. D. Jepson. Polynomial shape from shading. In CVPR, 2010.

[6] A. T. Ihler and D. A. McAllester. Particle belief propagation. In AISTATS, 2009.

[7] N. Komodakis, N. Paragios, and G. Tziritas. MRF energy minimization and beyond via dual decomposition. PAMI, 2011.

[8] Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. Computer, 2009.

[9] J. B. Lasserre. Global optimization with polynomials and the problem of moments. SIAM Journal on Optimization, 2001.

[10] J. B. Lasserre. Convergent SDP-relaxations in polynomial optimization with sparsity. SIAM Journal on Optimization, 2006.

[11] V. Lempitsky, C. Rother, S. Roth, and A. Blake. Fusion moves for Markov random field optimization. PAMI, 2010.

[12] J. Nocedal and S. J. Wright. Numerical Optimization, 2nd ed. Springer-Verlag, 2006.

[13] N. Noorshams and M. J. Wainwright. Belief propagation for continuous state spaces: Stochastic message-passing with quantitative guarantees. JMLR, 2013.

[14] A. Papachristodoulou, J. Anderson, G. Valmorbida, S. Prajna, P. Seiler, and P. Parrilo. SOSTOOLS version 3.00: sum of squares optimization toolbox for MATLAB. arXiv:1310.4716, 2013.

[15] P. A. Parrilo. Structured semidefinite programs and semialgebraic geometry methods in robustness and optimization. PhD thesis, Caltech, 2000.

[16] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.

[17] J. Peng, T. Hazan, D. McAllester, and R. Urtasun. Convex max-product algorithms for continuous MRFs with applications to protein folding. In ICML, 2011.

[18] S. Roth and M. J. Black. Fields of experts. IJCV, 2009.

[19] R.
Salakhutdinov, S. Roweis, and Z. Ghahramani. On the convergence of bound optimization algorithms. In UAI, 2002.

[20] M. Salzmann. Continuous inference in graphical models with polynomial energies. In CVPR, 2013.

[21] A. G. Schwing, T. Hazan, M. Pollefeys, and R. Urtasun. Distributed message passing for large scale graphical models. In CVPR, 2011.

[22] A. G. Schwing, T. Hazan, M. Pollefeys, and R. Urtasun. Efficient structured prediction with latent variables for general graphical models. In ICML, 2012.

[23] A. Smola, S. Vishwanathan, and T. Hofmann. Kernel methods for missing variables. In AISTATS, 2005.

[24] L. Song, A. Gretton, D. Bickson, Y. Low, and C. Guestrin. Kernel belief propagation. In AISTATS, 2011.

[25] B. Sriperumbudur and G. Lanckriet. On the convergence of the concave-convex procedure. In NIPS, 2009.

[26] B. Sriperumbudur, D. Torres, and G. Lanckriet. Sparse eigen methods by DC programming. In ICML, 2007.

[27] E. B. Sudderth, A. T. Ihler, M. Isard, W. T. Freeman, and A. S. Willsky. Nonparametric belief propagation. Communications of the ACM, 2010.

[28] D. J. Uherka and A. M. Sergott. On the continuous dependence of the roots of a polynomial on its coefficients. American Mathematical Monthly, 1977.

[29] A. Varol, M. Salzmann, P. Fua, and R. Urtasun. A constrained latent variable model. In CVPR, 2012.

[30] S. Vicente and L. Agapito. Soft inextensibility constraints for template-free non-rigid reconstruction. In ECCV, 2012.

[31] Y. Weiss and W. T. Freeman. Correctness of belief propagation in Gaussian graphical models of arbitrary topology. Neural Computation, 2001.

[32] C. N. Yu and T. Joachims. Learning structural SVMs with latent variables. In ICML, 2009.

[33] A. L. Yuille and A. Rangarajan. The concave-convex procedure.
Neural Computation, 2003.