{"title": "An Analysis of Convex Relaxations for MAP Estimation", "book": "Advances in Neural Information Processing Systems", "page_first": 1041, "page_last": 1048, "abstract": "", "full_text": "An Analysis of Convex Relaxations for MAP Estimation\r\n\r\nM. Pawan Kumar Dept. of Computing Oxford Brookes University\r\npkmudigonda@brookes.ac.uk\r\n\r\nV. Kolmogorov Computer Science University College London\r\nvnk@adastral.ucl.ac.uk\r\n\r\nP.H.S. Torr Dept. of Computing Oxford Brookes University\r\nphiliptorr@brookes.ac.uk\r\n\r\nAbstract\r\nThe problem of obtaining the maximum a posteriori estimate of a general discrete random field (i.e. a random field defined using a finite and discrete set of labels) is known to be N P-hard. However, due to its central importance in many applications, several approximate algorithms have been proposed in the literature. In this paper, we present an analysis of three such algorithms based on convex relaxations: (i) L P - S: the linear programming (L P) relaxation proposed by Schlesinger [20] for a special case and independently in [4, 12, 23] for the general case; (ii) Q P - R L: the quadratic programming (Q P) relaxation by Ravikumar and Lafferty [18]; and (iii) S O C P - M S: the second order cone programming (S O C P) relaxation first proposed by Muramatsu and Suzuki [16] for two label problems and later extended in [14] for a general label set. We show that the S O C P - M S and the Q P - R L relaxations are equivalent. Furthermore, we prove that despite the flexibility in the form of the constraints/objective function offered by Q P and S O C P, the L P - S relaxation strictly dominates (i.e. provides a better approximation than) Q P - R L and S O C P - M S. We generalize these results by defining a large class of S O C P (and equivalent Q P) relaxations which is dominated by the L P - S relaxation. 
Based on these results we propose some novel SOCP relaxations which strictly dominate the previous approaches.\r\n\r\n1 Introduction\r\nDiscrete random fields are a powerful tool for obtaining a probabilistic formulation of various applications in Computer Vision and related areas [3]. Hence, developing accurate and efficient algorithms for performing inference on a given discrete random field is of fundamental importance. In this work, we focus on the problem of maximum a posteriori (MAP) estimation. MAP estimation is a key step in obtaining the solutions to many applications such as stereo, image stitching and segmentation [21]. Furthermore, it is closely related to many important Combinatorial Optimization problems such as MAXCUT [6], multi-way cut [5], metric labelling [3, 11] and 0-extension [3, 9]. Given data D, a discrete random field models the distribution (i.e. either the joint or the conditional probability) of a labelling for a set of random variables. Each of the variables $v = \{v_0, v_1, \ldots, v_{n-1}\}$ can take a label from a discrete set $l = \{l_0, l_1, \ldots, l_{h-1}\}$. A particular labelling of the variables v is specified by a function f whose domain corresponds to the indices of the random variables and whose range is the set of indices of the labels, i.e. $f : \{0, 1, \ldots, n-1\} \rightarrow \{0, 1, \ldots, h-1\}$. In other words, random variable $v_a$ takes label $l_{f(a)}$. For convenience, we assume the model to be a conditional random field (CRF), while noting that all the results of this paper also apply to Markov random fields (MRF). A CRF specifies a neighbourhood relationship E between the random variables, i.e. $(a,b) \in E$ if, and only if, $v_a$ and $v_b$ are neighbouring random variables. Within this framework, the conditional probability of a labelling f given data D is specified as $\Pr(f|D, \theta) = \frac{1}{Z(\theta)} \exp(-Q(f; D, \theta))$.
Here $\theta$ represents the parameters of the CRF and $Z(\theta)$ is a normalization constant which ensures that the probability sums to one (also known as the partition function). The energy $Q(f; D, \theta)$ is given by $Q(f; D, \theta) = \sum_{v_a} \theta^1_{a;f(a)} + \sum_{(a,b) \in E} \theta^2_{ab;f(a)f(b)}$. The term $\theta^1_{a;f(a)}$ is called a unary potential since its value depends on the labelling of one random variable at a time. Similarly, $\theta^2_{ab;f(a)f(b)}$ is called a pairwise potential as it depends on the labelling of a pair of random variables. For simplicity, we assume that $\theta^2_{ab;f(a)f(b)} = w(a,b)\, d(f(a), f(b))$, where $w(a,b)$ is a weight that indicates the strength of the pairwise relationship between variables $v_a$ and $v_b$, with $w(a,b) = 0$ if $(a,b) \notin E$, and $d(\cdot,\cdot)$ is a distance function on the labels. As will be seen later, this formulation of the pairwise potentials allows us to describe our results concisely.\r\n\r\nThe problem of MAP estimation is well known to be NP-hard in general. Since it plays a central role in several applications, many approximate algorithms have been proposed in the literature. In this work, we analyze three such algorithms which are based on convex relaxations. Specifically, we consider: (i) LP-S, the linear programming (LP) relaxation of [4, 12, 20, 23]; (ii) QP-RL, the quadratic programming (QP) relaxation of [18]; and (iii) SOCP-MS, the second order cone programming (SOCP) relaxation of [14, 16]. In order to provide an outline of these relaxations, we formulate the problem of MAP estimation as an Integer Program (IP).\r\n\r\n1.1 Integer Programming Formulation\r\nWe define a binary variable vector x of length nh. We denote the element of x at index $ah + i$ as $x_{a;i}$, where $v_a \in v$ and $l_i \in l$. These elements $x_{a;i}$ specify a labelling f such that $x_{a;i} = 1$ if $f(a) = i$ and $x_{a;i} = -1$ otherwise. We say that the variable $x_{a;i}$ belongs to variable $v_a$ since it defines which label $v_a$ does (or does not) take. Let $X = x x^\top$.
We refer to the $(ah+i, bh+j)$th element of the matrix X as $X_{ab;ij}$, where $v_a, v_b \in v$ and $l_i, l_j \in l$. Clearly, the following IP finds the labelling with the minimum energy, i.e. it is equivalent to the MAP estimation problem:\r\n\r\nIP: $x^* = \arg\min_x \sum_{v_a, l_i} \theta^1_{a;i} \frac{1 + x_{a;i}}{2} + \sum_{(a,b) \in E, l_i, l_j} \theta^2_{ab;ij} \frac{1 + x_{a;i} + x_{b;j} + X_{ab;ij}}{4}$\r\ns.t. $x \in \{-1, 1\}^{nh}$, (1)\r\n$\sum_{l_i \in l} x_{a;i} = 2 - h$, (2)\r\n$X = x x^\top$. (3)\r\n\r\nConstraints (1) and (3) specify that the variables x and X are binary such that $X_{ab;ij} = x_{a;i} x_{b;j}$. We will refer to them as the integer constraints. Constraint (2), which specifies that each variable should be assigned exactly one label, is known as the uniqueness constraint. Note that one uniqueness constraint is specified for each variable $v_a$. Solving the above IP is in general NP-hard. It is therefore common practice to obtain an approximate solution using convex relaxations. We describe four such convex relaxations below.\r\n\r\n1.2 Linear Programming Relaxation\r\nThe LP relaxation (proposed by Schlesinger [20] for a special case and independently in [4, 12, 23] for the general case), which we call LP-S, is given as follows:\r\n\r\nLP-S: $x^* = \arg\min_x \sum_{v_a, l_i} \theta^1_{a;i} \frac{1 + x_{a;i}}{2} + \sum_{(a,b) \in E, l_i, l_j} \theta^2_{ab;ij} \frac{1 + x_{a;i} + x_{b;j} + X_{ab;ij}}{4}$\r\ns.t. $x \in [-1, 1]^{nh}$, $X \in [-1, 1]^{nh \times nh}$, (4)\r\n$\sum_{l_i \in l} x_{a;i} = 2 - h$, (5)\r\n$\sum_{l_j \in l} X_{ab;ij} = (2 - h) x_{a;i}$, (6)\r\n$X_{ab;ij} = X_{ba;ji}$, (7)\r\n$1 + x_{a;i} + x_{b;j} + X_{ab;ij} \geq 0$. (8)\r\n\r\nIn the LP-S relaxation only those elements $X_{ab;ij}$ of X are used for which $(a,b) \in E$ and $l_i, l_j \in l$. Unlike the IP, the feasibility region of the above problem is relaxed such that the variables $x_{a;i}$ and $X_{ab;ij}$ lie in the interval [-1, 1]. Further, the constraint (3) is replaced by equation (6), which is called the marginalization constraint [23]. One marginalization constraint is specified for each $(a,b) \in E$ and $l_i \in l$. Constraint (7) specifies that X is symmetric. Constraint (8) ensures that $\theta^2_{ab;ij}$ is multiplied by a number between 0 and 1 in the objective function.
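As a concrete illustration, the LP-S relaxation for a toy CRF with two variables and two labels can be assembled and solved with an off-the-shelf LP solver. This is a minimal sketch: the potential values and the variable ordering are assumptions of this example, not taken from the paper. A single edge forms a tree, so the relaxation should be tight here (cf. Theorem 4).

```python
# LP-S on a toy CRF: n = 2 variables, h = 2 labels, one edge (0, 1),
# using the {-1, 1} encoding of the text. Potentials are illustrative.
import itertools
import numpy as np
from scipy.optimize import linprog

h = 2
theta1 = np.array([[0.5, 1.0],   # theta^1_{0;i}
                   [1.2, 0.3]])  # theta^1_{1;i}
theta2 = np.array([[0.0, 1.0],   # theta^2_{01;ij} for the single edge
                   [1.0, 0.0]])

# Variable vector z = [x_{0;0}, x_{0;1}, x_{1;0}, x_{1;1},
#                      X_{01;00}, X_{01;01}, X_{01;10}, X_{01;11}].
c = np.zeros(8)
const = theta1.sum() / 2 + theta2.sum() / 4      # constant part of the objective
for i in range(h):
    c[i] = theta1[0, i] / 2 + theta2[i, :].sum() / 4
    c[2 + i] = theta1[1, i] / 2 + theta2[:, i].sum() / 4
for i in range(h):
    for j in range(h):
        c[4 + 2 * i + j] = theta2[i, j] / 4

# Uniqueness (5) and marginalization (6) constraints; here 2 - h = 0.
A_eq = [[1, 1, 0, 0, 0, 0, 0, 0],                # sum_i x_{0;i} = 0
        [0, 0, 1, 1, 0, 0, 0, 0]]                # sum_i x_{1;i} = 0
for i in range(h):                               # sum_j X_{01;ij} = 0
    row = [0] * 8; row[4 + 2 * i] = row[4 + 2 * i + 1] = 1; A_eq.append(row)
for j in range(h):                               # sum_i X_{01;ij} = 0
    row = [0] * 8; row[4 + j] = row[6 + j] = 1; A_eq.append(row)

# Non-negativity (8): 1 + x_{0;i} + x_{1;j} + X_{01;ij} >= 0.
A_ub, b_ub = [], []
for i in range(h):
    for j in range(h):
        row = [0] * 8
        row[i] = row[2 + j] = row[4 + 2 * i + j] = -1
        A_ub.append(row); b_ub.append(1.0)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=np.zeros(len(A_eq)),
              bounds=[(-1, 1)] * 8, method="highs")
lp_value = res.fun + const

# Brute-force MAP energy; a single edge is a tree, so LP-S is exact here.
map_energy = min(theta1[0, i] + theta1[1, j] + theta2[i, j]
                 for i, j in itertools.product(range(h), repeat=2))
print(lp_value, map_energy)
```

For this tree-structured instance the LP optimum coincides with the MAP energy; on graphs with frustrated cycles the LP value can drop strictly below it.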
These constraints (7) and (8) are defined for all $(a,b) \in E$ and $l_i, l_j \in l$. Note that the above constraints are not exhaustive, i.e. it is possible to specify other constraints for the problem of MAP estimation (as will be seen in the different relaxations described in the subsequent sections).\r\n\r\n1.3 Quadratic Programming Relaxation\r\nWe now describe the QP relaxation for the MAP estimation IP which was proposed by Ravikumar and Lafferty [18]. To this end, it is convenient to reformulate the objective function of the IP using a vector of unary potentials of length nh (denoted by $\hat\theta^1$) and a matrix of pairwise potentials of size $nh \times nh$ (denoted by $\hat\theta^2$). The element of the unary potential vector at index $ah+i$ is defined as $\hat\theta^1_{a;i} = \theta^1_{a;i} - \sum_{v_c \in v} \sum_{l_k \in l} |\theta^2_{ac;ik}|$, where $v_a \in v$ and $l_i \in l$. The $(ah+i, bh+j)$th element of the pairwise potential matrix $\hat\theta^2$ is defined as\r\n$\hat\theta^2_{ab;ij} = \begin{cases} \sum_{v_c \in v} \sum_{l_k \in l} |\theta^2_{ac;ik}| & \text{if } a = b, i = j, \\ \theta^2_{ab;ij} & \text{otherwise}, \end{cases}$ (9)\r\nwhere $v_a, v_b \in v$ and $l_i, l_j \in l$. In other words, the potentials are modified by defining a pairwise potential $\hat\theta^2_{aa;ii}$ and subtracting the value of that potential from the corresponding unary potential $\hat\theta^1_{a;i}$. The advantage of this reformulation is that the matrix $\hat\theta^2$ is guaranteed to be positive semidefinite, i.e. $\hat\theta^2 \succeq 0$. Using the fact that for $x_{a;i} \in \{-1,1\}$, $\left(\frac{1+x_{a;i}}{2}\right)^2 = \frac{1+x_{a;i}}{2}$, it can be shown that the following is equivalent to the MAP estimation problem [18]:\r\nQP-RL: $x^* = \arg\min_x \left(\frac{1+x}{2}\right)^\top \hat\theta^1 + \left(\frac{1+x}{2}\right)^\top \hat\theta^2 \left(\frac{1+x}{2}\right)$, (10)\r\ns.t. $\sum_{l_i \in l} x_{a;i} = 2 - h, \quad \forall v_a \in v$, (11)\r\n$x \in \{-1, 1\}^{nh}$, (12)\r\nwhere 1 is a vector of appropriate dimensions whose elements are all equal to 1. By relaxing the feasibility region of the above problem to $x \in [-1,1]^{nh}$, the resulting QP can be solved in polynomial time since $\hat\theta^2 \succeq 0$ (i.e. the relaxation of the QP (10)-(12) is convex). We call the above relaxation QP-RL. Note that in [18], the QP-RL relaxation was described using the variable $y = \frac{1+x}{2}$. However, the above formulation can easily be shown to be equivalent to the one presented in [18].\r\n\r\n1.4 Semidefinite Programming Relaxation\r\nThe SDP relaxation of the MAP estimation problem replaces the non-convex constraint $X = x x^\top$ by the convex semidefinite constraint $X - x x^\top \succeq 0$ [6, 15], which can be expressed as\r\n$\begin{pmatrix} 1 & x^\top \\ x & X \end{pmatrix} \succeq 0$, (13)\r\nusing Schur's complement [2]. Further, like LP-S, it relaxes the integer constraints by allowing the variables $x_{a;i}$ and $X_{ab;ij}$ to lie in the interval [-1, 1], with $X_{aa;ii} = 1$ for all $v_a \in v$, $l_i \in l$. The SDP relaxation is a well-studied approach which provides accurate solutions for the MAP estimation problem (e.g. see [25]). However, due to its computational inefficiency, it is not practically useful for large scale problems with nh > 1000. See however [17, 19, 22].\r\n\r\n1.5 Second Order Cone Programming Relaxation\r\nWe now describe the SOCP relaxation that was proposed by Muramatsu and Suzuki [16] for the MAXCUT problem (i.e. MAP estimation with h = 2) and later extended to a general label set [14]. This relaxation, which we call SOCP-MS, is based on the technique of Kim and Kojima [10], who observed that the SDP constraint can be further relaxed to second order cone (SOC) constraints. For this purpose, it employs a set of matrices $S = \{C^k \,|\, C^k = U^k (U^k)^\top \succeq 0, k = 1, 2, \ldots, n_C\}$. Using the fact that the Frobenius dot product of two semidefinite matrices is non-negative, we get\r\n$\|(U^k)^\top x\|^2 \leq C^k \cdot X, \quad k = 1, \ldots, n_C$. (14)\r\nEach of the above SOC constraints may involve some or all variables $x_{a;i}$ and $X_{ab;ij}$. For example, if $C^k_{ab;ij} = 0$, then the kth SOC constraint will not involve $X_{ab;ij}$ (since its coefficient will be 0). In order to describe the SOCP-MS relaxation, we consider a pair of neighbouring variables $v_a$ and $v_b$, i.e. $(a,b) \in E$, and a pair of labels $l_i$ and $l_j$. These two pairs define the following variables: $x_{a;i}$, $x_{b;j}$, $X_{aa;ii} = X_{bb;jj} = 1$ and $X_{ab;ij} = X_{ba;ji}$ (since X is symmetric).
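The QP-RL reformulation can be checked numerically on a toy model with two variables and two labels. This is a sketch under stated assumptions: the potential values are illustrative, and the off-diagonal entries are set to $\theta^2/2$ so that the symmetric quadratic form counts each edge exactly once (a convention of this sketch).

```python
# Sketch of the QP-RL reformulation: build hat-theta^1 and hat-theta^2 for a
# 2-variable, 2-label CRF with one edge, verify positive semidefiniteness,
# and verify that integer labellings reproduce the original energy.
import numpy as np

h, n = 2, 2
theta1 = np.array([0.5, 1.0, 1.2, 0.3])          # theta^1, indexed by a*h + i
theta2 = np.array([[0.0, 1.0], [1.0, 0.0]])      # theta^2_{01;ij}

P = np.zeros((n * h, n * h))
for i in range(h):
    for j in range(h):
        # theta^2 / 2 on both sides so each edge is counted once (assumption).
        P[i, h + j] = P[h + j, i] = theta2[i, j] / 2
D = np.concatenate([np.abs(theta2).sum(axis=1),  # sum_k |theta^2_{0 1;ik}|
                    np.abs(theta2).sum(axis=0)])
P += np.diag(D)                                  # hat-theta^2
hat1 = theta1 - D                                # hat-theta^1

# Diagonal dominance makes the modified pairwise matrix PSD.
assert np.linalg.eigvalsh(P).min() >= -1e-9

# On an integer labelling, the QP objective equals the original energy:
# the added diagonal cancels against the subtracted unary term.
x = np.array([-1.0, 1.0, -1.0, 1.0])             # f(0) = 1, f(1) = 1
y = (1 + x) / 2
qp_value = hat1 @ y + y @ P @ y
energy = theta1[1] + theta1[3] + theta2[1, 1]
print(qp_value, energy)
```

Relaxing `x` to the box [-1, 1] then leaves a convex QP, which is the polynomial-time solvability argument of the text.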
For each such pair of variables and labels, the SOCP-MS relaxation specifies two SOC constraints which involve only the above variables [14, 16]. In order to specify the exact form of these SOC constraints, we need the following definitions. Using the variables $v_a$ and $v_b$ (where $(a,b) \in E$) and the labels $l_i$ and $l_j$, we define the subvector $x^{(a,b,i,j)}$ of x and the submatrix $X^{(a,b,i,j)}$ of X as:\r\n$x^{(a,b,i,j)} = \begin{pmatrix} x_{a;i} \\ x_{b;j} \end{pmatrix}, \quad X^{(a,b,i,j)} = \begin{pmatrix} X_{aa;ii} & X_{ab;ij} \\ X_{ba;ji} & X_{bb;jj} \end{pmatrix}$. (15)\r\nThe SOCP-MS relaxation specifies SOC constraints of the form (14) for all pairs of neighbouring variables $(a,b) \in E$ and labels $l_i, l_j \in l$. To this end, it uses the following two matrices:\r\n$C^1_{MS} = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}, \quad C^2_{MS} = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$.\r\nHence, in the SOCP-MS formulation, the MAP estimation IP is relaxed to\r\nSOCP-MS: $x^* = \arg\min_x \sum_{v_a, l_i} \theta^1_{a;i} \frac{1 + x_{a;i}}{2} + \sum_{(a,b) \in E, l_i, l_j} \theta^2_{ab;ij} \frac{1 + x_{a;i} + x_{b;j} + X_{ab;ij}}{4}$\r\ns.t. $x \in [-1, 1]^{nh}$, $X \in [-1, 1]^{nh \times nh}$, (16)\r\n$\sum_{l_i \in l} x_{a;i} = 2 - h$, (17)\r\n$(x_{a;i} - x_{b;j})^2 \leq 2 - 2X_{ab;ij}$, (18)\r\n$(x_{a;i} + x_{b;j})^2 \leq 2 + 2X_{ab;ij}$, (19)\r\n$X_{ab;ij} = X_{ba;ji}$. (20)\r\nWe refer the reader to [14, 16] for details.\r\n\r\n2 Comparing Relaxations\r\nIn order to compare the relaxations described above, we require the following definitions. We say that a relaxation A dominates the relaxation B (alternatively, B is dominated by A) if, and only if,\r\n$\min_{(x,X) \in F(A)} e(x, X; \theta) \geq \min_{(x,X) \in F(B)} e(x, X; \theta), \quad \forall \theta$, (21)\r\nwhere $F(A)$ and $F(B)$ are the feasibility regions of the relaxations A and B respectively. The term $e(x, X; \theta)$ denotes the value of the objective function at $(x, X)$ (i.e. the energy of the possibly fractional labelling $(x, X)$) for the MAP estimation problem defined over the CRF with parameter $\theta$. Thus the optimal value of the dominating relaxation A is always greater than or equal to the optimal value of relaxation B.
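The SOCP-MS constraints (18) and (19) above follow from applying (14) to the 2x2 submatrices of (15) with the matrices C1_MS and C2_MS. A quick numeric check of this identity (the test values below are arbitrary assumptions):

```python
# Verify that the Frobenius form C . X - x^T C x of (14), applied to the
# 2x2 submatrices of (15), reproduces constraints (18) and (19).
import numpy as np

C1 = np.array([[1.0, -1.0], [-1.0, 1.0]])  # C^1_MS
C2 = np.array([[1.0, 1.0], [1.0, 1.0]])    # C^2_MS

xa, xb = 0.4, -0.2                         # arbitrary x_{a;i}, x_{b;j}
Xab = 0.1                                  # arbitrary X_{ab;ij}
xs = np.array([xa, xb])
Xs = np.array([[1.0, Xab], [Xab, 1.0]])    # X_{aa;ii} = X_{bb;jj} = 1

# C1 gives (18): slack of (14) equals slack of (x_a - x_b)^2 <= 2 - 2 X_ab.
slack1 = np.sum(C1 * Xs) - xs @ C1 @ xs
assert abs(slack1 - ((2 - 2 * Xab) - (xa - xb) ** 2)) < 1e-12
# C2 gives (19): slack of (14) equals slack of (x_a + x_b)^2 <= 2 + 2 X_ab.
slack2 = np.sum(C2 * Xs) - xs @ C2 @ xs
assert abs(slack2 - ((2 + 2 * Xab) - (xa + xb) ** 2)) < 1e-12

# At an integer point with X = x x^T both constraints hold with equality.
xi = np.array([1.0, -1.0])
Xi = np.outer(xi, xi)
assert np.sum(C1 * Xi) == (xi[0] - xi[1]) ** 2
assert np.sum(C2 * Xi) == (xi[0] + xi[1]) ** 2
print("SOCP-MS constraints verified")
```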
We note here that the concept of domination has been used previously in [4] (to compare LP-S with the linear programming relaxation in [11]). Relaxations A and B are said to be equivalent if A dominates B and B dominates A, i.e. their optimal values are equal to each other for all CRFs. A relaxation A is said to strictly dominate relaxation B if A dominates B but B does not dominate A. In other words, there exists at least one CRF with parameter $\theta$ such that\r\n$\min_{(x,X) \in F(A)} e(x, X; \theta) > \min_{(x,X) \in F(B)} e(x, X; \theta)$. (22)\r\nNote that, by definition, the optimal value of any relaxation is always less than or equal to the energy of the optimal (i.e. the MAP) labelling. Hence, the optimal value of a strictly dominating relaxation A is closer to the optimal value of the MAP estimation IP than that of relaxation B. In other words, A provides a tighter lower bound for MAP estimation than B.\r\n\r\nOur Results: We prove that LP-S strictly dominates SOCP-MS (see section 3). Further, in section 4, we show that QP-RL is equivalent to SOCP-MS. This implies that LP-S strictly dominates the QP-RL relaxation. In section 5 we generalize the above results by proving that a large class of SOCP (and equivalent QP) relaxations is dominated by LP-S. Based on these results, we propose a novel set of constraints which result in SOCP relaxations that dominate LP-S, QP-RL and SOCP-MS. These relaxations introduce SOC constraints on cycles and cliques formed by the neighbourhood relationship of the CRF. Note that we only provide the statements of the results here due to the page limit. All the proofs are described in [13].\r\n\r\n3 LP-S vs. SOCP-MS\r\nWe now show that for the MAP estimation problem the linear constraints of LP-S are stronger than the SOCP-MS constraints.
In other words, the feasibility region of LP-S is a strict subset of the feasibility region of SOCP-MS (i.e. $F(\text{LP-S}) \subset F(\text{SOCP-MS})$). This in turn allows us to prove the following theorem.\r\n\r\nTheorem 1: The LP-S relaxation strictly dominates the SOCP-MS relaxation.\r\n\r\n4 QP-RL vs. SOCP-MS\r\nWe now prove that QP-RL and SOCP-MS are equivalent (i.e. their optimal values are equal for MAP estimation problems defined over all CRFs). Specifically, we consider a vector x which lies in the feasibility regions of the QP-RL and SOCP-MS relaxations, i.e. $x \in [-1,1]^{nh}$. For this vector, we show that the values of the objective functions of the QP-RL and SOCP-MS relaxations are equal. This implies that if $x^*$ is an optimal solution of QP-RL for some CRF with parameter $\theta$, then there exists an optimal solution $(x^*, X^*)$ of the SOCP-MS relaxation. Further, if $e_Q$ and $e_S$ are the optimal values of the objective functions obtained using the QP-RL and SOCP-MS relaxations, then $e_Q = e_S$.\r\n\r\nTheorem 2: The QP-RL relaxation and the SOCP-MS relaxation are equivalent.\r\n\r\nTheorems 1 and 2 prove that the LP-S relaxation strictly dominates the QP-RL and SOCP-MS relaxations. A natural question that now arises is whether the additive bound of QP-RL (proved in [18]) is applicable to the LP-S and SOCP-MS relaxations. Our next theorem answers this question in the affirmative.\r\n\r\nTheorem 3: Using the rounding scheme of [18], LP-S and SOCP-MS provide the same additive bound as the QP-RL relaxation, i.e. $\frac{S}{4}$ where $S = \sum_{(a,b) \in E} \sum_{l_i, l_j \in l} |\theta^2_{ab;ij}|$ (i.e. the sum of the absolute values of all pairwise potentials). Furthermore, this bound is tight. The above bound was proved for the case of binary variables (i.e.
h = 2) in [8] using a slightly different rounding scheme.\r\n\r\n5 QP and SOCP Relaxations over Trees and Cycles\r\nWe now generalize the results of Theorem 1 by defining a large class of SOCP relaxations which is dominated by LP-S. Specifically, we consider the SOCP relaxations which relax the non-convex constraint $X = x x^\top$ using a set of second order cone (SOC) constraints of the form\r\n$\|(U^k)^\top x\|^2 \leq C^k \cdot X, \quad k = 1, \ldots, n_C$, (23)\r\nwhere $C^k = U^k (U^k)^\top \succeq 0$ for all $k = 1, \ldots, n_C$. Note that each SOCP relaxation belonging to this class defines an equivalent QP relaxation (similar to the equivalent QP-RL relaxation defined by the SOCP-MS relaxation). Hence, all these QP relaxations are also dominated by the LP-S relaxation. Before we describe our results in detail, we need to set up some notation.\r\n\r\nFigure 1: (a) An example CRF defined over four variables which form a cycle. Note that the observed nodes are not shown for the sake of clarity. (b) The set $E^k$ specified by the matrix $C^k$ shown in equation (25), i.e. $E^k = \{(a,b), (b,c), (c,d)\}$. (c) The set $V^k = \{a, b, c, d\}$. See text for definitions of these sets.\r\n\r\nNotation: We consider an SOC constraint of the form described in equation (23), i.e.\r\n$\|(U^k)^\top x\|^2 \leq C^k \cdot X$, (24)\r\nwhere $k \in \{1, \ldots, n_C\}$. In order to help the reader understand the notation better, we use the example CRF shown in Fig. 1(a). This CRF is defined over four variables $v = \{v_a, v_b, v_c, v_d\}$ (connected to form a cycle of length 4), each of which takes a label from the set $l = \{l_0, l_1\}$.
For this CRF we specify a constraint using a matrix $C^k \succeq 0$ which is 0 everywhere, except for the following $4 \times 4$ submatrix:\r\n$\begin{pmatrix} C^k_{aa;00} & C^k_{ab;00} & C^k_{ac;00} & C^k_{ad;00} \\ C^k_{ba;00} & C^k_{bb;00} & C^k_{bc;00} & C^k_{bd;00} \\ C^k_{ca;00} & C^k_{cb;00} & C^k_{cc;00} & C^k_{cd;00} \\ C^k_{da;00} & C^k_{db;00} & C^k_{dc;00} & C^k_{dd;00} \end{pmatrix} = \begin{pmatrix} 2 & 1 & 1 & 0 \\ 1 & 2 & 1 & 1 \\ 1 & 1 & 2 & 1 \\ 0 & 1 & 1 & 2 \end{pmatrix}$. (25)\r\nUsing the SOC constraint shown in equation (24) we define the following two sets: (i) The set $E^k$ is defined such that $(a,b) \in E^k$ if, and only if, it satisfies the following conditions:\r\n$(a,b) \in E$, (26)\r\n$\exists\, l_i, l_j \in l$ such that $C^k_{ab;ij} \neq 0$. (27)\r\nRecall that E specifies the neighbourhood relationship of the given CRF. In other words, $E^k$ is the subset of the edges in the graphical model of the CRF such that $C^k$ specifies constraints for the random variables corresponding to those edges. For the example CRF (shown in Fig. 1(a)) and $C^k$ matrix (in equation (25)), the resulting set $E^k$ is shown in Fig. 1(b). (ii) The set $V^k$ is defined as $a \in V^k$ if, and only if, there exists a $v_b \in v$ such that $(a,b) \in E^k$. In other words, $V^k$ is the subset of hidden nodes in the graphical model of the CRF such that $C^k$ specifies constraints for the random variables corresponding to those hidden nodes. Fig. 1(c) shows the set $V^k$ for our example SOC constraint. We also define a weighted graph $G^k = (V^k, E^k)$ whose vertices are specified by the set $V^k$ and whose edges are specified by the set $E^k$. The weight of an edge $(a,b) \in E^k$ is given by $w(a,b)$. Recall that $w(a,b)$ specifies the strength of the pairwise relationship between two neighbouring variables $v_a$ and $v_b$. Thus, for our example SOC constraint, the vertices of this graph are given in Fig. 1(c) while the edges are shown in Fig. 1(b). This graph can be viewed as a subgraph of the graphical model representation of the given CRF.
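The construction of the sets $E^k$ and $V^k$ from conditions (26)-(27) can be sketched in a few lines. The node-to-index mapping below is an assumption of this sketch; since only the $C^k_{\cdot\cdot;00}$ entries of (25) are non-zero, a scalar check per edge suffices here.

```python
# Extract E^k (conditions (26)-(27)) and V^k from the example matrix (25),
# for the 4-cycle CRF of Fig. 1(a). Nodes a, b, c, d are mapped to 0..3.
import numpy as np

E = {(0, 1), (1, 2), (2, 3), (3, 0)}    # CRF edges: the 4-cycle
Ck = np.array([[2, 1, 1, 0],            # C^k_{..;00} submatrix of (25)
               [1, 2, 1, 1],
               [1, 1, 2, 1],
               [0, 1, 1, 2]], dtype=float)

def soc_edge_set(C, edges):
    """(a, b) is in E^k iff (a, b) is a CRF edge (26) and some
    C_{ab;ij} is non-zero (27)."""
    return {(a, b) for (a, b) in edges if C[a, b] != 0}

Ek = soc_edge_set(Ck, E)
Vk = {a for edge in Ek for a in edge}

# C^k_{ac;00} != 0, but (a, c) is not a CRF edge, so it does not enter E^k;
# (d, a) is a CRF edge, but C^k_{da;00} = 0, so it is excluded too.
assert Ek == {(0, 1), (1, 2), (2, 3)}   # matches Fig. 1(b)
assert Vk == {0, 1, 2, 3}               # matches Fig. 1(c)
print(Ek, Vk)
```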
Theorem 4: SOCP relaxations (and the equivalent QP relaxations) which define constraints only using graphs $G^k = (V^k, E^k)$ which form (arbitrarily large) trees are dominated by the LP-S relaxation.\r\n\r\nWe note that the above theorem can be proved using the results of [24] on moment constraints (which imply that LP-S provides the exact solution for MAP estimation problems defined over tree-structured random fields). However, our alternative proof presented in [13] allows us to generalize the results of Theorem 4 to certain cycles as follows.\r\n\r\nTheorem 5: When $d(i,j) \geq 0$ for all $l_i, l_j \in l$, the SOCP relaxations which define constraints only using non-overlapping graphs $G^k$ which form (arbitrarily large) even cycles with all positive or all negative weights are dominated by the LP-S relaxation.\r\n\r\nThe above theorem can be proved for cycles of any length whose weights are all negative by a similar construction. Further, it also holds true for odd cycles (i.e. cycles with an odd number of variables) which have only one positive or only one negative weight. However, as will be seen in the next section, unlike trees it is not possible to extend these results to any general cycle.\r\n\r\n6 Some Useful SOC Constraints\r\nWe now describe two SOCP relaxations which include all the marginalization constraints specified in LP-S. Note that the marginalization constraints can be incorporated within the SOCP framework but not in the QP framework.\r\n\r\n6.1 The SOCP-C Relaxation\r\nThe SOCP-C relaxation (where C denotes cycles) defines second order cone (SOC) constraints using positive semidefinite matrices C such that the graph G (defined in section 5) forms a cycle. Let the variables corresponding to the vertices of one such cycle G of length c be denoted as $v_C = \{v_b \,|\, b \in \{a_1, a_2, \ldots, a_c\}\}$. Further, let $l_C = \{l_j \,|\, j \in \{i_1, i_2, \ldots, i_c\}\}$ be a set of labels for the variables $v_C$.
In addition to the marginalization constraints, the SOCP-C relaxation specifies the following SOC constraint:\r\n$\|U^\top x\|^2 \leq C \cdot X$, (28)\r\nsuch that the graph G defined by the above constraint forms a cycle. The matrix C is 0 everywhere except for the following elements:\r\n$C_{a_k a_l; i_k i_l} = \begin{cases} \lambda_c & \text{if } k = l, \\ D_c(k,l) & \text{otherwise}. \end{cases}$ (29)\r\nHere $D_c$ is a $c \times c$ matrix which is defined as follows:\r\n$D_c(k,l) = \begin{cases} 1 & \text{if } |k-l| = 1, \\ (-1)^{c-1} & \text{if } |k-l| = c-1, \\ 0 & \text{otherwise}, \end{cases}$ (30)\r\nand $\lambda_c$ is the absolute value of the smallest eigenvalue of $D_c$. In other words, the submatrix of C defined by $v_C$ and $l_C$ has diagonal elements equal to $\lambda_c$ and off-diagonal elements equal to the elements of $D_c$. Clearly, $C = U U^\top \succeq 0$ since its only non-zero submatrix $\lambda_c I + D_c$ (where I is the $c \times c$ identity matrix) is positive semidefinite. This allows us to define a valid SOC constraint as shown in inequality (28). We choose to define the SOC constraint (28) only for those sets of labels $l_C$ which satisfy the following:\r\n$\sum_{(a_k, a_l) \in E} D_c(k,l)\, \theta^2_{a_k a_l; i_k i_l} \leq \sum_{(a_k, a_l) \in E} D_c(k,l)\, \theta^2_{a_k a_l; j_k j_l}, \quad \forall \{j_1, j_2, \ldots, j_c\}$. (31)\r\nNote that this choice is motivated by the fact that the variables $X_{a_k a_l; i_k i_l}$ corresponding to these sets $v_C$ and $l_C$ are assigned trivial values by the LP-S relaxation in the presence of non-submodular terms. Since the marginalization constraints are included in the SOCP-C relaxation, the value of the objective function obtained by solving this relaxation is at least equal to the value obtained by the LP-S relaxation (i.e. SOCP-C dominates LP-S, see Case II in section 2). We can further show that in the case where |l| = 2 and the constraint (28) is defined over a frustrated cycle (i.e. a cycle with an odd number of non-submodular terms), SOCP-C strictly dominates LP-S. One such example is given in [13]. Note that if the given CRF contains no frustrated cycle, then it can be solved exactly using the method described in [7].
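The cycle matrix $D_c$ of (30) and the positive semidefiniteness of $\lambda_c I + D_c$ can be checked numerically. This is a minimal sketch: since $D_c$ has zero trace, its smallest eigenvalue is non-positive, so shifting by $\lambda_c$ always lands exactly on the PSD boundary.

```python
# Build D_c from (30), compute lambda_c as the absolute value of its
# smallest eigenvalue, and check that lambda_c I + D_c (the non-zero
# submatrix of C in the SOCP-C constraint) is positive semidefinite.
import numpy as np

def cycle_matrix(c):
    D = np.zeros((c, c))
    for k in range(c):
        for l in range(c):
            if abs(k - l) == 1:
                D[k, l] = 1.0
            elif abs(k - l) == c - 1:
                D[k, l] = (-1.0) ** (c - 1)   # sign flip closes even cycles
    return D

for c in range(3, 8):
    D = cycle_matrix(c)
    lam = abs(np.linalg.eigvalsh(D).min())    # lambda_c
    assert np.linalg.eigvalsh(lam * np.eye(c) + D).min() >= -1e-9

# For c = 4 the construction gives a frustrated 4-cycle with
# eigenvalues +/- sqrt(2), hence lambda_4 = sqrt(2).
D4 = cycle_matrix(4)
assert abs(abs(np.linalg.eigvalsh(D4).min()) - np.sqrt(2)) < 1e-9
print("lambda_c I + D_c is PSD for c = 3..7")
```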
The constraint defined in equation (28) is similar to the (linear) cycle inequality constraints [1], which are given by\r\n$\sum_{k,l} D_c(k,l)\, X_{a_k a_l; i_k i_l} \geq 2 - c$. (32)\r\nWe believe that the feasibility region defined by the cycle inequalities is a strict subset of the feasibility region defined by equation (28). In other words, a relaxation defined by adding cycle inequalities to LP-S would strictly dominate SOCP-C. However, we are not aware of a formal proof for this. We now describe the SOCP-Q relaxation.\r\n\r\n6.2 The SOCP-Q Relaxation\r\nIn the previous section we saw that LP-S dominates SOCP relaxations whose constraints are defined on trees. However, the SOCP-C relaxation, which defines its constraints using cycles, strictly dominates LP-S. This raises the question of whether matrices C which result in more complicated graphs G would provide an even better relaxation for the MAP estimation problem. In this section, we answer this question in the affirmative. To this end, we define an SOCP relaxation which specifies constraints such that the resulting graph G forms a clique. We denote this relaxation by SOCP-Q (where Q indicates cliques).\r\n\r\nThe SOCP-Q relaxation contains the marginalization constraints and the cycle inequalities (defined above). In addition, it also defines SOC constraints on graphs G which form a clique. We denote the variables corresponding to the vertices of a clique G of size q as $v_Q = \{v_b \,|\, b \in \{a_1, a_2, \ldots, a_q\}\}$. Let $l_Q = \{l_j \,|\, j \in \{i_1, i_2, \ldots, i_q\}\}$ be a set of labels for these variables $v_Q$. Given this set of variables $v_Q$ and labels $l_Q$, we define an SOC constraint using a matrix $C = u u^\top$ of size $nh \times nh$ which is zero everywhere except for the elements $C_{a_k a_l; i_k i_l} = 1$. Clearly, C is a rank 1 matrix whose non-zero eigenvector u is zero everywhere except $u_{a_k; i_k} = 1$, where $v_{a_k} \in v_Q$ and $l_{i_k} \in l_Q$. This implies that $C \succeq 0$, which enables us to obtain the following SOC constraint:\r\n$\left( \sum_k x_{a_k; i_k} \right)^2 \leq q + \sum_{k \neq l} X_{a_k a_l; i_k i_l}$. (33)\r\nAgain, this choice is motivated by the fact that the variables $X_{a_k a_l; i_k i_l}$ corresponding to these sets $v_Q$ and $l_Q$ are assigned trivial values by the LP-S relaxation in the presence of non-submodular pairwise potentials. When the clique contains a frustrated cycle, it can be shown that SOCP-Q dominates the LP-S relaxation (similar to SOCP-C). Further, using a counter-example, it can be proved that the feasibility region given by the cycle inequalities is not a subset of the feasibility region defined by constraint (33). One such example is given in [13].\r\n\r\nWe choose to specify the above constraint only for the sets of labels $l_Q$ which satisfy the following condition:\r\n$\sum_{(a_k, a_l) \in E} \theta^2_{a_k a_l; i_k i_l} \leq \sum_{(a_k, a_l) \in E} \theta^2_{a_k a_l; j_k j_l}, \quad \forall \{j_1, j_2, \ldots, j_q\}$. (34)\r\n\r\n7 Discussion\r\nWe presented an analysis of approximate algorithms for MAP estimation which are based on convex relaxations. The surprising result of our work is that despite the flexibility in the form of the objective function/constraints offered by QP and SOCP, the LP-S relaxation dominates a large class of QP and SOCP relaxations. It appears that the authors who have previously used SOCP relaxations in the Combinatorial Optimization literature [16] and those who have reported QP relaxations in the Machine Learning literature [18] were unaware of this result. We also proposed two new SOCP relaxations (SOCP-C and SOCP-Q) and presented some examples to prove that they provide a better approximation than LP-S. An interesting direction for future research would be to determine the best SOC constraints for a given MAP estimation problem (e.g. with truncated linear pairwise potentials).\r\n\r\nAcknowledgments: We thank Pradeep Ravikumar and John Lafferty for carefully reading the manuscript and for pointing out an error in our description of the SOCP-MS relaxation.\r\n\r\nReferences\r\n[1] F. Barahona and A. Mahjoub.
On the cut polytope. Mathematical Programming, 36:157-173, 1986.\r\n[2] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.\r\n[3] Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. PAMI, 23(11):1222-1239, 2001.\r\n[4] C. Chekuri, S. Khanna, J. Naor, and L. Zosin. Approximation algorithms for the metric labelling problem via a new linear programming formulation. In SODA, 2001.\r\n[5] E. Dahlhaus, D. Johnson, C. Papadimitriou, P. Seymour, and M. Yannakakis. The complexity of multiterminal cuts. SICOMP, 23(4):864-894, 1994.\r\n[6] M. Goemans and D. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM, 42:1115-1145, 1995.\r\n[7] P. Hammer, P. Hansen, and B. Simeone. Roof duality, complementation and persistency in quadratic 0-1 optimization. Mathematical Programming, 28:121-155, 1984.\r\n[8] P. Hammer and B. Kalantari. A bound on the roof duality gap. Technical Report RRR 46, Rutgers Center for Operations Research, Rutgers University, 1987.\r\n[9] A. Karzanov. Minimum 0-extension of graph metrics. European Journal of Combinatorics, 19:71-101, 1998.\r\n[10] S. Kim and M. Kojima. Second-order cone programming relaxation of nonconvex quadratic optimization problems. Technical report, Tokyo Institute of Technology, 2000.\r\n[11] J. Kleinberg and E. Tardos. Approximation algorithms for classification problems with pairwise relationships: Metric labeling and Markov random fields. In STOC, pages 14-23, 1999.\r\n[12] A. Koster, C. van Hoesel, and A. Kolen. The partial constraint satisfaction problem: Facets and lifting theorems. Operations Research Letters, 23(3-5):89-97, 1998.\r\n[13] M. P. Kumar, V. Kolmogorov, and P. H. S. Torr. An analysis of convex relaxations for MAP estimation. Technical report, Oxford Brookes University, 2007. Available at http://cms.brookes.ac.uk/staff/PawanMudigonda/.\r\n[14] M. P. Kumar, P. H. S. Torr, and A. Zisserman.
Solving Markov random fields using second order cone programming relaxations. In CVPR, volume I, pages 1045-1052, 2006.\r\n[15] J. Lasserre. Global optimization with polynomials and the problem of moments. SIAM Journal on Optimization, 11:796-817, 2001.\r\n[16] M. Muramatsu and T. Suzuki. A new second-order cone programming relaxation for max-cut problems. Journal of the Operations Research Society of Japan, 43:164-177, 2003.\r\n[17] C. Olsson, A. Eriksson, and F. Kahl. Solving large scale binary quadratic problems: Spectral methods vs. semidefinite programming. In CVPR, pages 1-8, 2007.\r\n[18] P. Ravikumar and J. Lafferty. Quadratic programming relaxations for metric labelling and Markov random field MAP estimation. In ICML, 2006.\r\n[19] C. Schellewald and C. Schnörr. Subgraph matching with semidefinite programming. In IWCIA, 2003.\r\n[20] M. Schlesinger. Sintaksicheskiy analiz dvumernykh zritelnikh signalov v usloviyakh pomekh (Syntactic analysis of two-dimensional visual signals in noisy conditions). Kibernetika, 4:113-130, 1976.\r\n[21] R. Szeliski, R. Zabih, D. Scharstein, O. Veksler, V. Kolmogorov, A. Agarwala, M. Tappen, and C. Rother. A comparative study of energy minimization methods for Markov random fields. In ECCV, pages II:16-29, 2006.\r\n[22] P. H. S. Torr. Solving Markov random fields using semidefinite programming. In AISTATS, 2003.\r\n[23] M. Wainwright, T. Jaakkola, and A. Willsky. MAP estimation via agreement on trees: Message passing and linear programming. IEEE Transactions on Information Theory, 51(11):3697-3717, 2005.\r\n[24] M. Wainwright and M. Jordan. Graphical models, exponential families, and variational inference. Technical Report 649, University of California, Berkeley, 2003.\r\n[25] M. Wainwright and M. Jordan. Treewidth-based conditions for exactness of the Sherali-Adams and Lasserre relaxations.
Technical Report 671, University of California, Berkeley, 2004.\r\n", "award": [], "sourceid": 3373, "authors": [{"given_name": "Pawan", "family_name": "Mudigonda", "institution": null}, {"given_name": "Vladimir", "family_name": "Kolmogorov", "institution": null}, {"given_name": "Philip", "family_name": "Torr", "institution": null}]}