{"title": "Generalized roof duality and bisubmodular functions", "book": "Advances in Neural Information Processing Systems", "page_first": 1144, "page_last": 1152, "abstract": "Consider a convex relaxation $\\hat f$ of a pseudo-boolean function $f$. We say that the relaxation is {\\em totally half-integral} if $\\hat f(\\bx)$ is a polyhedral function with half-integral extreme points $\\bx$, and this property is preserved after adding an arbitrary combination of constraints of the form $x_i=x_j$, $x_i=1-x_j$, and $x_i=\\gamma$ where $\\gamma\\in\\{0,1,\\frac{1}{2}\\}$ is a constant. A well-known example is the {\\em roof duality} relaxation for quadratic pseudo-boolean functions $f$. We argue that total half-integrality is a natural requirement for generalizations of roof duality to arbitrary pseudo-boolean functions. Our contributions are as follows. First, we provide a complete characterization of totally half-integral relaxations $\\hat f$ by establishing a one-to-one correspondence with {\\em bisubmodular functions}. Second, we give a new characterization of bisubmodular functions. Finally, we show some relationships between general totally half-integral relaxations and relaxations based on the roof duality.", "full_text": "Generalized roof duality and bisubmodular functions\n\nVladimir Kolmogorov\n\nDepartment of Computer Science\nUniversity College London, UK\n\nv.kolmogorov@cs.ucl.ac.uk\n\nAbstract\n\nConsider a convex relaxation \u02c6f of a pseudo-boolean function f. We say that\nthe relaxation is totally half-integral if \u02c6f (x) is a polyhedral function with half-\nintegral extreme points x, and this property is preserved after adding an arbitrary\ncombination of constraints of the form xi = xj, xi = 1 \u2212 xj, and xi = \u03b3 where\n\u03b3 \u2208 {0, 1, 1\n2} is a constant. A well-known example is the roof duality relaxation\nfor quadratic pseudo-boolean functions f. 
We argue that total half-integrality is a\nnatural requirement for generalizations of roof duality to arbitrary pseudo-boolean\nfunctions.\nOur contributions are as follows. First, we provide a complete characterization\nof totally half-integral relaxations \u02c6f by establishing a one-to-one correspondence\nwith bisubmodular functions. Second, we give a new characterization of bisubmodular\nfunctions. Finally, we show some relationships between general totally\nhalf-integral relaxations and relaxations based on the roof duality.\n\n1 Introduction\n\nLet V be a set of |V| = n nodes and B \u2282 K1/2 \u2282 K be the following sets:\n\n$B = \\{0, 1\\}^V \\qquad K_{1/2} = \\{0, \\tfrac{1}{2}, 1\\}^V \\qquad K = [0, 1]^V$\n\nA function f : B \u2192 R is called pseudo-boolean. In this paper we consider convex relaxations\n\u02c6f : K \u2192 R of f which we call totally half-integral:\nDefinition 1. (a) Function \u02c6f : P \u2192 R where P \u2286 K is called half-integral if it is a convex\npolyhedral function such that all extreme points of the epigraph {(x, z) | x \u2208 P, z \u2265 \u02c6f (x)} have\nthe form (x, \u02c6f (x)) where x \u2208 K1/2. 
(b) Function \u02c6f : K \u2192 R is called totally half-integral if\nrestrictions \u02c6f : P \u2192 R are half-integral for all subsets P \u2286 K obtained from K by adding an\narbitrary combination of constraints of the form $x_i = x_j$, $x_i = \\bar{x}_j$, and $x_i = \\gamma$ for points x \u2208 K.\nHere i, j denote nodes in V, \u03b3 denotes a constant in $\\{0, 1, \\tfrac{1}{2}\\}$, and $\\bar{z} \\equiv 1 - z$.\n\nA well-known example of a totally half-integral relaxation is the roof duality relaxation for quadratic\npseudo-boolean functions $f(x) = \\sum_i c_i x_i + \\sum_{(i,j)} c_{ij} x_i x_j$ studied by Hammer, Hansen and\nSimeone [13]. It is known to possess the persistency property: for any half-integral minimizer\n\u02c6x \u2208 arg min \u02c6f (\u02c6x) there exists a minimizer x \u2208 arg min f (x) such that xi = \u02c6xi for all nodes i with\nintegral component \u02c6xi. This property is quite important in practice as it allows one to reduce the size\nof the minimization problem when $\\hat{x} \\ne \\tfrac{1}{2}$. The set of nodes with guaranteed optimal solution can\nsometimes be increased further using the PROBE technique [6], which also relies on persistency.\nThe goal of this paper is to generalize the roof duality approach to arbitrary pseudo-boolean\nfunctions. Total half-integrality is a very natural requirement of such generalizations, as discussed\nlater in this section. As we prove, total half-integrality implies persistency.\n\nWe provide a complete characterization of totally half-integral relaxations. Namely, we prove in\nsection 2 that if \u02c6f : K \u2192 R is totally half-integral then its restriction to K1/2 is a bisubmodular function,\nand conversely any bisubmodular function can be extended to a totally half-integral relaxation.\nDefinition 2. 
Function f : K1/2 \u2192 R is called bisubmodular if\n\n$f(x \\sqcap y) + f(x \\sqcup y) \\le f(x) + f(y) \\qquad \\forall x, y \\in K_{1/2}$ (1)\n\nwhere binary operators \u2293, \u2294 : K1/2 \u00d7 K1/2 \u2192 K1/2 are defined component-wise as follows:\n\n \u2293  |  0   1/2   1\n----+---------------\n 0  |  0   1/2  1/2\n1/2 | 1/2  1/2  1/2\n 1  | 1/2  1/2   1\n\n \u2294  |  0   1/2   1\n----+---------------\n 0  |  0    0   1/2\n1/2 |  0   1/2   1\n 1  | 1/2   1    1\n\n(2)\n\nAs our second contribution, we give a new characterization of bisubmodular functions (section 3).\nUsing this characterization, we then prove several results showing links with the roof duality\nrelaxation (section 4).\n\n1.1 Applications\n\nThis work has been motivated by computer vision applications. A fundamental task in vision is\nto infer pixel properties from observed data. These properties can be the type of object to which\nthe pixel belongs, distance to the camera, pixel intensity before being corrupted by noise, etc. The\npopular MAP-MRF approach casts the inference task as an energy minimization problem with the\nobjective function of the form $f(x) = \\sum_C f_C(x)$ where C \u2282 V are subsets of neighboring pixels\nof small cardinality (|C| = 1, 2, 3, . . .) and terms fC(x) depend only on labels of pixels in C.\nFor some vision applications the roof duality approach [13] has shown a good performance [30,\n32, 23, 24, 33, 1, 16, 17].1 Functions with higher-order terms are steadily gaining popularity in\ncomputer vision [31, 33, 1, 16, 17]; it is generally accepted that they correspond to better image\nmodels. Therefore, studying generalizations of roof duality to arbitrary pseudo-boolean functions\nis an important task. In such generalizations the total half-integrality property is essential. Indeed,\nin practice, the relaxation \u02c6f is obtained as the sum of relaxations \u02c6fC constructed for each term\nindependently. 
Some of these terms can be c|xi \u2212 xj| and c|xi + xj \u2212 1|. If c is sufficiently\nlarge, then applying the roof duality relaxation to these terms would yield constraints $x_i = x_j$ and\n$x_i = \\bar{x}_j$ present in the definition of total half-integrality. Constraints $x_i = \\gamma \\in \\{0, 1, \\tfrac{1}{2}\\}$ can also\nbe simulated via the roof duality, e.g. $x_i = x_j$, $x_i = \\bar{x}_j$ for the same pair of nodes i, j implies\n$x_i = x_j = \\tfrac{1}{2}$.\n\n1.2 Related work\n\nHalf-integrality There is a vast literature on using half-integral relaxations for various combinatorial\noptimization problems. In many cases these relaxations lead to 2-approximation algorithms.\nBelow we list a few representative papers.\nThe earliest work recognizing half-integrality of polytopes with certain pairwise constraints was\nperhaps by Balinski [3], while the persistency property goes back to Nemhauser and Trotter [28]\nwho considered the vertex cover problem. Hammer, Hansen and Simeone [13] established that these\nproperties hold for the roof duality relaxation for quadratic pseudo-boolean functions. Their work\nwas generalized to arbitrary pseudo-boolean functions by Lu and Williams [25]. (The relaxation\nin [25] relied on converting function f to a multinomial representation; see section 4 for more\ndetails.) Hochbaum [14, 15] gave a class of integer problems with half-integral relaxations. Very\nrecently, Iwata and Nagano [18] formulated a half-integral relaxation for the problem of minimizing\na submodular function f (x) under constraints of the form xi + xj \u2265 1.\n\n1 In many vision problems variables xi are not binary. However, such problems are often reduced to\na sequence of binary minimization problems using iterative move-making algorithms, e.g. 
using expansion\nmoves [9] or fusion moves [23, 24, 33, 17].\n\nIn computer vision, several researchers considered the following scheme: given a function\n$f(x) = \\sum_C f_C(x)$, convert terms fC(x) to quadratic pseudo-boolean functions by introducing auxiliary\nbinary variables, and then apply the roof duality relaxation to the latter. Woodford et al. [33] used\nthis technique for the stereo reconstruction problem, while Ali et al. [1] and Ishikawa [16] explored\ndifferent conversions to quadratic functions.\nTo the best of our knowledge, all examples of totally half-integral relaxations proposed so far belong\nto the class of submodular relaxations, which is defined in section 4. They form a subclass of more\ngeneral bisubmodular relaxations.\nBisubmodularity Bisubmodular functions were introduced by Chandrasekaran and Kabadi as rank\nfunctions of (poly-)pseudomatroids [10, 19]. Independently, Bouchet [7] introduced the concept of\n\u2206-matroids which is equivalent to pseudomatroids. Bisubmodular functions and their generalizations\nhave also been considered by Qi [29], Nakamura [27], Bouchet and Cunningham [8] and\nFujishige [11]. The notion of the Lov\u00e1sz extension of a bisubmodular function introduced by Qi [29]\nwill be of particular importance for our work (see next section).\nIt has been shown that some submodular minimization algorithms can be generalized to bisubmodular\nfunctions. Qi [29] showed the applicability of the ellipsoid method. A weakly polynomial combinatorial\nalgorithm for minimizing bisubmodular functions was given by Fujishige and Iwata [12],\nand a strongly polynomial version was given by McCormick and Fujishige [26].\nRecently, we introduced strongly and weakly tree-submodular functions [22] that generalize bisubmodular\nfunctions.\n\n2 Total half-integrality and bisubmodularity\n\nThe first result of this paper is the following theorem.\nTheorem 3. 
If \u02c6f : K \u2192 R is a totally half-integral relaxation then its restriction to K1/2 is bisub-\nmodular. Conversely, if function f : K1/2 \u2192 R is bisubmodular then it has a unique totally half-\nintegral extension \u02c6f : K \u2192 R.\nThis section is devoted to the proof of theorem 3. Denote L = [\u22121, 1]V , L1/2 = {\u22121, 0, 1}V . It\nwill be convenient to work with functions \u02c6h : L \u2192 R and h : L1/2 \u2192 R obtained from \u02c6f and f via\na linear change of coordinates xi (cid:55)\u2192 2xi \u2212 1. Under this change totally half-integral relaxations are\ntransformed to totally integral relaxations:\nDe\ufb01nition 4. Let \u02c6h : L \u2192 R be a function of n variables. (a) \u02c6h is called integral if it is a convex\npolyhedral function such that all extreme points of the epigraph {(x, z)| x \u2208 L, z \u2265 \u02c6h(x)} have the\nform (x, \u02c6h(x)) where x \u2208 L1/2. (b) \u02c6h is called totally integral if it is integral and for an arbitrary\nordering of nodes the following functions of n \u2212 1 variables (if n > 1) are totally integral:\n\n\u02c6h(cid:48)(x1, . . . , xn\u22121) = \u02c6h(x1, . . . , xn\u22121, xn\u22121)\n\u02c6h(cid:48)(x1, . . . , xn\u22121) = \u02c6h(x1, . . . , xn\u22121,\u2212xn\u22121)\n\u02c6h(cid:48)(x1, . . . , xn\u22121) = \u02c6h(x1, . . . , xn\u22121, \u03b3)\n\nfor any constant \u03b3 \u2208 {\u22121, 0, 1}\n\nThe de\ufb01nition of a bisubmodular function is adapted as follows: function h : L1/2 \u2192 R is bisub-\nmodular if inequality (1) holds for all x, y \u2208 L1/2 where operations (cid:117),(cid:116) are de\ufb01ned by tables (2)\nafter replacements 0 (cid:55)\u2192 \u22121, 1\n2 (cid:55)\u2192 0, 1 (cid:55)\u2192 1. To prove theorem 3, it suf\ufb01ces to establish a link\nbetween totally integral relaxations \u02c6h : L \u2192 R and bisubmodular functions h : L1/2 \u2192 R. 
We can\nassume without loss of generality that \u02c6h(0) = h(0) = 0, since adding a constant to the functions\ndoes not affect the theorem.\nA pair \u03c9 = (\u03c0, \u03c3) where \u03c0 : V \u2192 {1, . . . , n} is a permutation of V and $\\sigma \\in \\{-1, 1\\}^V$ will be\ncalled a signed ordering. Let us rename nodes in V so that \u03c0(i) = i. To each signed ordering \u03c9 we\nassociate labelings $x^0, x^1, \\ldots, x^n \\in L_{1/2}$ as follows:\n\n$x^0 = (0, 0, \\ldots, 0) \\qquad x^1 = (\\sigma_1, 0, \\ldots, 0) \\qquad \\ldots \\qquad x^n = (\\sigma_1, \\sigma_2, \\ldots, \\sigma_n)$ (3)\n\nwhere nodes are ordered according to \u03c0.\nConsider function h : L1/2 \u2192 R with h(0) = 0. Its Lov\u00e1sz extension $\\hat{h} : R^V \\to R$ is defined in\nthe following way [29]. Given a vector $x \\in R^V$, select a signed ordering \u03c9 = (\u03c0, \u03c3) as follows:\n(i) choose \u03c0 so that values |xi|, i \u2208 V are non-increasing, and rename nodes accordingly so that\n|x1| \u2265 . . . \u2265 |xn|; (ii) if xi \u2260 0 set \u03c3i = sign(xi), otherwise choose \u03c3i \u2208 {\u22121, 1} arbitrarily. It\nis not difficult to check that\n\n$x = \\sum_{i=1}^{n} \\lambda_i x^i$ (4a)\n\nwhere labelings $x^i$ are defined in (3) (with respect to the selected signed ordering) and \u03bbi = |xi| \u2212\n|xi+1| for i = 1, . . . , n \u2212 1, \u03bbn = |xn|. The value of the Lov\u00e1sz extension is now defined as\n\n$\\hat{h}(x) = \\sum_{i=1}^{n} \\lambda_i h(x^i)$ (4b)\n\nTheorem 5 ([29]). Function h is bisubmodular if and only if its Lov\u00e1sz extension \u02c6h is convex on\nL. 2\nLet L\u03c9 be the set of vectors in L for which signed ordering \u03c9 = (\u03c0, \u03c3) can be selected. Clearly,\nL\u03c9 = {x \u2208 L | |x1| \u2265 . . . \u2265 |xn|, xi\u03c3i \u2265 0 \u2200i \u2208 V }. It is easy to check that L\u03c9 is the convex hull\nof n + 1 points (3). Equations (4) imply that \u02c6h is linear on L\u03c9 and coincides with h in each corner\nx0, . . . 
, xn.\nLemma 6. Suppose function \u02dch : L \u2192 R is totally integral. Then \u02dch is linear on simplex L\u03c9 for each\nsigned ordering \u03c9 = (\u03c0, \u03c3).\nProof. We use induction on n = |V |. For n = 1 the claim is straightforward; suppose that n \u2265 2.\nConsider signed ordering \u03c9 = (\u03c0, \u03c3). We need to prove that \u02dch is linear on the boundary \u2202L\u03c9; this\nwill imply that \u02dch is linear on L\u03c9 since otherwise \u02dch would have an extreme point in the interior\nL\u03c9\\\u2202L\u03c9 which cannot be integral.\nLet $X = \\{x^0, \\ldots, x^n\\}$ be the set of extreme points of L\u03c9 defined by (3). The boundary \u2202L\u03c9 is the\nunion of n + 1 facets $L^0_\\omega, \\ldots, L^n_\\omega$ where $L^i_\\omega$ is the convex hull of points in $X \\setminus \\{x^i\\}$. Let us prove\nthat \u02dch is linear on $L^0_\\omega$. All points $x \\in X \\setminus \\{x^0\\}$ satisfy $x_1 = \\sigma_1$, therefore $L^0_\\omega = \\{x \\in L_\\omega \\mid x_1 = \\sigma_1\\}$.\nConsider function of n \u2212 1 variables $\\tilde{h}'(x_2, \\ldots, x_n) = \\tilde{h}(\\sigma_1, x_2, \\ldots, x_n)$, and let $L'^0_\\omega$ be the\nprojection of $L^0_\\omega$ to $R^{V \\setminus \\{1\\}}$. By the induction hypothesis $\\tilde{h}'$ is linear on $L'^0_\\omega$, and thus \u02dch is linear on\n$L^0_\\omega$.\nThe fact that \u02dch is linear on other facets can be proved in a similar way. Note that for i = 2, . . . , n \u2212 1\nthere holds $L^i_\\omega = \\{x \\in L_\\omega \\mid x_i = \\sigma_{i-1}\\sigma_i x_{i-1}\\}$, and for i = n we have $L^n_\\omega = \\{x \\in L_\\omega \\mid x_n = 0\\}$.\n\nCorollary 7. Suppose function \u02dch : L \u2192 R with \u02dch(0) = 0 is totally integral. Let h be the restriction\nof \u02dch to L1/2 and \u02c6h be the Lov\u00e1sz extension of h. Then \u02dch and \u02c6h coincide on L.\n\nTheorem 5 and corollary 7 imply the first part of theorem 3. The second part will follow from\nLemma 8. 
If h : L1/2 \u2192 R with h(0) = 0 is bisubmodular then its Lov\u00b4asz extension \u02c6h : L \u2192 R is\ntotally integral.\n\n2Note, Qi formulates this result slightly differently: \u02c6h is assumed to be convex on RV rather than on L.\nHowever, it is easy to see that convexity of \u02c6h on L implies convexity of \u02c6h on RV . Indeed, it can be checked\nthat \u02c6h is positively homogeneous, i.e. \u02c6h(\u03b3x) = \u03b3\u02c6h(x) for any \u03b3 \u2265 0, x \u2208 RV . Therefore, for any x, y \u2208 RV\nand \u03b1, \u03b2 \u2265 0 with \u03b1 + \u03b2 = 1 there holds\n\n\u02c6h(\u03b1x + \u03b2y) =\n\n1\n\u03b3\n\n\u02c6h(\u03b1\u03b3x + \u03b2\u03b3y) \u2264 \u03b1\n\u03b3\n\n\u02c6h(\u03b3x) +\n\n\u03b2\n\u03b3\n\n\u02c6h(\u03b3y) = \u03b1\u02c6h(x) + \u03b2\u02c6h(y)\n\nwhere the inequality in the middle follows from convexity of \u02c6h on L, assuming that \u03b3 is a suf\ufb01ciently small\nconstant.\n\n4\n\n\fProof. We use induction on n = |V |. For n = 1 the claim is straightforward; suppose that n \u2265 2.\nBy theorem 5, \u02c6h is convex on L. Function \u02c6h is integral since it is linear on each simplex L\u03c9 and\nvertices of L\u03c9 belong to L1/2. It remains to show that functions \u02c6h(cid:48) considered in de\ufb01nition 4 are\ntotally integral. Consider the following functions h(cid:48) : {\u22121, 0, 1}V \\{n} \u2192 R:\n\nh(cid:48)(x1, . . . , xn\u22121) = h(x1, . . . , xn\u22121, xn\u22121)\nh(cid:48)(x1, . . . , xn\u22121) = h(x1, . . . , xn\u22121,\u2212xn\u22121)\nh(cid:48)(x1, . . . , xn\u22121) = h(x1, . . . , xn\u22121, \u03b3) , \u03b3 \u2208 {\u22121, 0, 1}\n\nIt can be checked that these functions are bisubmodular, and their Lov\u00b4asz extensions coincide with\nrespective functions \u02c6h(cid:48) used in de\ufb01nition 4. 
The claim now follows from the induction hypothesis.\n\n3 A new characterization of bisubmodularity\n\nIn this section we give an alternative definition of bisubmodularity; it will be helpful later for describing\na relationship to the roof duality. As is often done for bisubmodular functions, we will\nencode each half-integral value $x_i \\in \\{0, 1, \\tfrac{1}{2}\\}$ via two binary variables $(u_i, u_{i'})$ according to the\nfollowing rules:\n\n$0 \\leftrightarrow (0, 1) \\qquad \\tfrac{1}{2} \\leftrightarrow (0, 0) \\qquad 1 \\leftrightarrow (1, 0)$\n\nThus, labelings in K1/2 will be represented via labelings in the set\n\n$X^- = \\{u \\in \\{0, 1\\}^{\\bar{V}} \\mid (u_i, u_{i'}) \\ne (1, 1) \\quad \\forall i \\in V\\}$\n\nwhere $\\bar{V} = \\{i, i' \\mid i \\in V\\}$ is a set with 2n nodes. The node i\u2032 for i \u2208 V is called the \u201cmate\u201d of\ni; intuitively, variable $u_{i'}$ corresponds to the complement of $u_i$. We define (i\u2032)\u2032 = i for i \u2208 V.\nLabelings in $X^-$ will be denoted either by a single letter, e.g. u or v, or by a pair of letters, e.g.\n(x, y). In the latter case we assume that the two components correspond to labelings of V and\n$\\bar{V} \\setminus V$, respectively, and the order of variables in both components match. Using this convention, the\none-to-one mapping $X^- \\to K_{1/2}$ can be written as $(x, y) \\mapsto \\tfrac{1}{2}(x + \\bar{y})$. Accordingly, instead of\nfunction f : K1/2 \u2192 R we will work with the function $g : X^- \\to R$ defined by\n\n$g(x, y) = f\\left(\\frac{x + \\bar{y}}{2}\\right)$ (5)\n\nNote that the set of integer labelings B \u2282 K1/2 corresponds to the set $X^\\circ = \\{u \\in X^- \\mid (u_i, u_{i'}) \\ne (0, 0)\\}$,\nso function $g : X^- \\to R$ can be viewed as a discrete relaxation of function $g : X^\\circ \\to R$.\nDefinition 9. 
Function f : $X^-$ \u2192 R is called bisubmodular if\n\n$f(u \\sqcap v) + f(u \\sqcup v) \\le f(u) + f(v) \\qquad \\forall u, v \\in X^-$ (6)\n\nwhere u \u2293 v = u \u2227 v, u \u2294 v = REDUCE(u \u2228 v) and REDUCE(w) is the labeling obtained from\nw by changing labels $(w_i, w_{i'})$ from (1, 1) to (0, 0) for all i \u2208 V.\nTo describe a new characterization, we need to introduce some additional notation. We denote\n$X = \\{0, 1\\}^{\\bar{V}}$ to be the set of all binary labelings of $\\bar{V}$. For a labeling u \u2208 X, define labeling u\u2032 by\n$(u')_i = \\overline{u_{i'}}$. Labels $(u_i, u_{i'})$ are transformed according to the rules\n\n$(0, 1) \\to (0, 1) \\qquad (1, 0) \\to (1, 0) \\qquad (0, 0) \\to (1, 1) \\qquad (1, 1) \\to (0, 0)$ (7)\n\nEquivalently, this mapping can be written as $(x, y)' = (\\bar{y}, \\bar{x})$. Note that u\u2032\u2032 = u, (u \u2227 v)\u2032 = u\u2032 \u2228 v\u2032\nand (u \u2228 v)\u2032 = u\u2032 \u2227 v\u2032 for u, v \u2208 X. Next, we define sets\n\n$X^- = \\{u \\in X \\mid u \\le u'\\} = \\{u \\in X \\mid (u_i, u_{i'}) \\ne (1, 1) \\ \\forall i \\in V\\}$\n$X^+ = \\{u \\in X \\mid u \\ge u'\\} = \\{u \\in X \\mid (u_i, u_{i'}) \\ne (0, 0) \\ \\forall i \\in V\\}$\n$X^\\circ = \\{u \\in X \\mid u = u'\\} = \\{u \\in X \\mid (u_i, u_{i'}) \\in \\{(0, 1), (1, 0)\\} \\ \\forall i \\in V\\} = X^- \\cap X^+$\n$X^\\star = X^- \\cup X^+$\n\nClearly, $u \\in X^-$ if and only if $u' \\in X^+$. Also, any function $g : X^- \\to R$ can be uniquely extended\nto a function $g : X^\\star \\to R$ so that the following condition holds:\n\n$g(u') = g(u) \\qquad \\forall u \\in X^\\star$ (8)\n\nProposition 10. Let $g : X^\\star \\to R$ be a function satisfying (8). The following conditions are equivalent:\n\n(a) g is bisubmodular, i.e. 
it satisfies (6).\n\n(b) g satisfies the following inequalities:\n\n$g(u \\wedge v) + g(u \\vee v) \\le g(u) + g(v) \\qquad \\text{if } u, v, u \\wedge v, u \\vee v \\in X^\\star$ (9)\n\n(c) g satisfies those inequalities in (6) for which u = w \u2228 $e^i$, v = w \u2228 $e^j$ where w = u \u2227 v\nand i, j are distinct nodes in $\\bar{V}$ with wi = wj = 0. Here $e^k$ for node $k \\in \\bar{V}$ denotes the\nlabeling in X with $e^k_k = 1$ and $e^k_{k'} = 0$ for $k' \\in \\bar{V} \\setminus \\{k\\}$.\n\n(d) g satisfies those inequalities in (9) for which u = w \u2228 $e^i$, v = w \u2228 $e^j$ where w = u \u2227 v\nand i, j are distinct nodes in $\\bar{V}$ with wi = wj = 0.\n\nA proof is given in [20]. Note, an equivalent of characterization (c) was given by Ando et al. [2]; we\nstate it here for completeness.\nRemark 1 In order to compare characterizations (b,d) to existing characterizations (a,c), we need\nto analyze the sets of inequalities in (b,d) modulo eq. (8), i.e. after replacing terms g(w), $w \\in X^+$\nwith g(w\u2032). It can be seen that the inequalities in (a) are neither a subset nor a superset of those in (b) (see footnote 3),\nso (b) is a new characterization. It is also possible to show that from this point of view (c) and (d)\nare equivalent.\n\n4 Submodular relaxations and roof duality\nConsider a submodular function g : X \u2192 R satisfying the following \u201csymmetry\u201d condition:\n\n$g(u') = g(u) \\qquad \\forall u \\in X$ (10)\n\nWe call such function g a submodular relaxation of function $f(x) = g(x, \\bar{x})$. Clearly, it satisfies\nconditions of proposition 10, so g is also a bisubmodular relaxation of f. 
Furthermore, minimizing\ng is equivalent to minimizing its restriction g : X \u2212 \u2192 R; indeed, if u \u2208 X is a minimizer of g then\nso are u(cid:48) and u \u2227 u(cid:48) \u2208 X \u2212.\nIn this section we will do the following: (i) prove that any pseudo-boolean function f : B \u2192 R has\na submodular relaxation g : X \u2192 R; (ii) show that the roof duality relaxation for quadratic pseudo-\nboolean functions is a submodular relaxation, and it dominates all other bisubmodular relaxations;\n(iii) show that for non-quadratic pseudo-boolean functions bisubmodular relaxations can be tighter\nthan submodular ones; (iv) prove that similar to the roof duality relaxation, bisubmodular relaxations\npossess the persistency property.\nReview of roof duality Consider a quadratic pseudo-boolean function f : B \u2192 R:\n\nf (x) =\n\nfi(xi) +\n\nfij(xi, xj)\n\n(11)\n\n(cid:88)\n\n(i,j)\u2208E\n\nwhere (V, E) is an undirected graph and xi \u2208 {0, 1} for i \u2208 V are binary variables. Hammer,\nHansen and Simeone [13] formulated several linear programming relaxations of this function and\n\n3Denote u =\n\n0\n0\n\n1\n0\n\nand v =\n\n0\n1\n\n0\n0\n\nwhere the top and bottom rows correspond to the labelings\n\nof V and V \\V respectively, with |V | = 4. Plugging pair (u, v) into (6) gives the following inequality:\n\n(cid:17) \u2264 g\n(cid:17) \u2264 g\n\n0\n\n(cid:16) 1\n(cid:16) 1\n\n0\n\n(cid:17)\n(cid:17)\n\n0\n\n(cid:16) 0\n(cid:16) 0\n\n0\n\n(cid:17)\n(cid:17)\n\ng\n\n0\n0\n\n0\n0\n\n0\n0\n\n+ g\n\n1\n0\n\n0\n0\n\n0\n0\n\n0\n0\n\n1\n0\n\n0\n0\n\n+ g\n\n1\n0\n\n0\n1\n\n0\n0\n\nThis inequality is a part of (a), but it is not present in (b): pairs (u, v) and (u(cid:48), v(cid:48)) do not satisfy the RHS\nof (9), while pairs (u, v(cid:48)) and (u(cid:48), v) give a different inequality:\n0\n0\n\n+ g\n\n+ g\n\n0\n0\n\n1\n0\n\n0\n0\n\n1\n0\n\n0\n0\n\n0\n0\n\n0\n0\n\n0\n0\n\n0\n0\n\n0\n1\n\n1\n0\n\ng\n\n0\n\nwhere we used condition (8). 
Conversely, the second inequality is a part of (b) but it is not present in (a).\n\n6\n\n(cid:88)\n\ni\u2208V\n\n(cid:17)\n\n0\n\n1\n0\n\n(cid:16) 0\n(cid:16) 1\n(cid:16) 0\n\n0\n\n(cid:17)\n(cid:17)\n\n(cid:16) 1\n\n0\n\n0\n0\n\n(cid:17)\n(cid:16) 0\n(cid:16) 1\n\n0\n\n0\n\n\fshowed their equivalence. One of these formulations was called a roof dual. An ef\ufb01cient max\ufb02ow-\nbased method for solving the roof duality relaxation was given by Hammer, Boros and Sun [5, 4].\nWe will rely on this algorithmic description of the roof duality approach [4]. The method\u2019s idea\ncan be summarized as follows. Each variable xi is replaced with two binary variables ui and ui(cid:48)\ncorresponding to xi and 1 \u2212 xi respectively. The new set of nodes is V = {i, i(cid:48) | i \u2208 V }. Next,\nfunction f is transformed to a function g : X \u2192 R by replacing each term according to the following\nrules:\n\nfi(xi)\n\nfij(xi, xj)\n\nfij(xi, xj)\n\n(cid:55)\u2192 1\n2\n(cid:55)\u2192 1\n2\n(cid:55)\u2192 1\n2\n\n[fi(ui) + fi(ui(cid:48))]\n\n[fij(ui, uj) + fij(ui(cid:48), uj(cid:48))]\n\n[fij(ui, uj(cid:48)) + fij(ui(cid:48), uj)]\n\nif fij(\u00b7,\u00b7) is submodular\nif fij(\u00b7,\u00b7) is not submodular\n\n(12a)\n\n(12b)\n\n(12c)\n\n2 (ui + ui(cid:48)) [4].\n\ng is a submodular quadratic pseudo-boolean function, so it can be minimized via a max\ufb02ow al-\ngorithm. If u \u2208 X is a minimizer of g then the roof duality relaxation has a minimizer \u02c6x with\n\u02c6xi = 1\nIt is easy to check that g(u) = g(u(cid:48)) for all u \u2208 X , therefore g is a submodular relaxation. Also, f\nand g are equivalent when ui(cid:48) = ui for all i \u2208 V , i.e.\ng(x, x) = f (x)\n\n\u2200x \u2208 B\n\n(13)\n\nInvariance to variable \ufb02ipping Suppose that g is a (bi-)submodular relaxation of function f :\nB \u2192 R. 
Let i be a fixed node in V, and consider function f\u2032(x) obtained from f (x) by a change of\ncoordinates $x_i \\mapsto \\bar{x}_i$ and function g\u2032(u) obtained from g(u) by swapping variables $u_i$ and $u_{i'}$. It is\neasy to check that g\u2032 is a (bi-)submodular relaxation of f\u2032. Furthermore, if f is a quadratic pseudo-boolean\nfunction and g is its submodular relaxation constructed by the roof duality approach, then\napplying the roof duality approach to f\u2032 yields function g\u2032. We will sometimes use such a \u201cflipping\u201d\noperation for reducing the number of considered cases.\nConversion to roof duality Let us now consider a non-quadratic pseudo-boolean function f : B \u2192\nR. Several papers [33, 1, 16] proposed the following scheme: (1) Convert f to a quadratic pseudo-boolean\nfunction \u02dcf by introducing k auxiliary binary variables so that $f(x) = \\min_{\\alpha \\in \\{0,1\\}^k} \\tilde{f}(x, \\alpha)$\nfor all labelings x \u2208 B. (2) Construct submodular relaxation \u02dcg(x, \u03b1, y, \u03b2) of \u02dcf by applying the roof\nduality relaxation to \u02dcf; then\n\n$\\tilde{g}(x, \\alpha, y, \\beta) = \\tilde{g}(\\bar{y}, \\bar{\\beta}, \\bar{x}, \\bar{\\alpha}), \\qquad \\tilde{g}(x, \\alpha, \\bar{x}, \\bar{\\alpha}) = \\tilde{f}(x, \\alpha) \\qquad \\forall x, y \\in B, \\ \\alpha, \\beta \\in \\{0, 1\\}^k$\n\n(3) Obtain function g by minimizing out auxiliary variables: $g(x, y) = \\min_{\\alpha, \\beta \\in \\{0,1\\}^k} \\tilde{g}(x, \\alpha, y, \\beta)$.\nOne can check that $g(x, y) = g(\\bar{y}, \\bar{x})$, so g is a submodular relaxation (see footnote 4).\nIn general, however,\nit may not be a relaxation of function f, i.e. (13) may not hold; we are only guaranteed to have\n$g(x, \\bar{x}) \\le f(x)$ for all labelings x \u2208 B.\nExistence of submodular relaxations It is easy to check that if f : B \u2192 R is submodular\nthen function $g(x, y) = \\tfrac{1}{2}[f(x) + f(\\bar{y})]$ is a submodular relaxation of f (see footnote 5). Thus, monomials of\nthe form $c \\prod_{i \\in A} x_i$ where c \u2264 0 and A \u2286 V have submodular relaxations. 
Using the \u201cflipping\u201d\noperation $x_i \\mapsto \\bar{x}_i$, we conclude that submodular relaxations also exist for monomials of the form\n$c \\prod_{i \\in A} x_i \\prod_{i \\in B} \\bar{x}_i$ where c \u2264 0 and A, B are disjoint subsets of V. It is known that any pseudo-boolean\nfunction f can be represented as a sum of such monomials (see e.g. [4]; we need to represent\n\u2212f as a posiform and take its negative). This implies that any pseudo-boolean function f has a\nsubmodular relaxation.\nNote that this argument is due to Lu and Williams [25] who converted function f to a sum of\nmonomials of the form $c \\prod_{i \\in A} x_i$ and $c \\bar{x}_k \\prod_{i \\in A} x_i$, c \u2264 0, k \u2209 A. It is possible to show that the\nrelaxation proposed in [25] is equivalent to the submodular relaxation constructed by the scheme\nabove (we omit the derivation).\n\n4 It is well-known that minimizing variables out preserves submodularity. Indeed, suppose that $h(x) = \\min_\\alpha \\tilde{h}(x, \\alpha)$\nwhere \u02dch is a submodular function. Then h is also submodular since\n\n$h(x) + h(y) = \\tilde{h}(x, \\alpha) + \\tilde{h}(y, \\beta) \\ge \\tilde{h}(x \\wedge y, \\alpha \\wedge \\beta) + \\tilde{h}(x \\vee y, \\alpha \\vee \\beta) \\ge h(x \\wedge y) + h(x \\vee y)$\n\n5 In fact, it dominates all other bisubmodular relaxations $\\bar{g} : X^- \\to R$ of f. Indeed, consider labeling\n$(x, y) \\in X^-$. It can be checked that (x, y) = u \u2293 v = u \u2294 v where $u = (x, \\bar{x})$ and $v = (\\bar{y}, y)$, therefore\n$\\bar{g}(x, y) \\le \\tfrac{1}{2}[\\bar{g}(u) + \\bar{g}(v)] = \\tfrac{1}{2}[f(x) + f(\\bar{y})] = g(x, y)$.\n\nSubmodular vs. bisubmodular relaxations An important question is whether bisubmodular\nrelaxations are more \u201cpowerful\u201d compared to submodular ones. The next theorem gives a class of\nfunctions for which the answer is negative; its proof is given in [20].\nTheorem 11. 
Let g be the submodular relaxation of a quadratic pseudo-boolean function f defined\nby (12), and assume that the set E does not have parallel edges. Then g dominates any other\nbisubmodular relaxation $\\bar{g}$ of f, i.e. $g(u) \\ge \\bar{g}(u)$ for all $u \\in X^-$.\nFor non-quadratic pseudo-boolean functions, however, the situation can be different. In [20] we\ngive an example of a function f of n = 4 variables which has a tight bisubmodular relaxation g (i.e.\ng has a minimizer in $X^\\circ$), but none of its submodular relaxations is tight.\nPersistency Finally, we show that bisubmodular functions possess the autarky property, which\nimplies persistency.\nProposition 12. Let f : K1/2 \u2192 R be a bisubmodular function and x \u2208 K1/2 be its minimizer.\n[Autarky] Let y be a labeling in B. Consider labeling z = (y \u2294 x) \u2294 x. Then z \u2208 B and\nf (z) \u2264 f (y).\n[Persistency] Function f : B \u2192 R has a minimizer $x^* \\in B$ such that $x^*_i = x_i$ for nodes i \u2208 V\nwith integral xi.\n\nProof. It can be checked that $z_i = y_i$ if $x_i = \\tfrac{1}{2}$ and $z_i = x_i$ if $x_i \\in \\{0, 1\\}$. Thus, z \u2208 B. For\nany w \u2208 K1/2 there holds f (w \u2294 x) \u2264 f (w) + [f (x) \u2212 f (w \u2293 x)] \u2264 f (w). This implies that\nf ((y \u2294 x) \u2294 x) \u2264 f (y). Applying the autarky property to a labeling y \u2208 arg min{f (x) | x \u2208 B }\nyields persistency.\n\n5 Conclusions and future work\n\nWe showed that bisubmodular functions can be viewed as a natural generalization of the roof duality\napproach to higher-order cliques. As mentioned in the introduction, this work has been motivated\nby computer vision applications that use functions of the form $f(x) = \\sum_C f_C(x)$. An important\nopen question is how to construct bisubmodular relaxations \u02c6fC for individual terms. For terms of\nlow order, e.g. 
with |C| = 3, this potentially could be done by solving a small linear program.\nAnother important question is how to minimize such functions. Algorithms in [12, 26] are unlikely\nto be practical for most vision problems, which typically have tens of thousands of variables. However,\nin our case we need to minimize a bisubmodular function which has a special structure: it\nis represented as a sum of low-order bisubmodular terms. We recently showed [21] that a sum of\nlow-order submodular terms can be optimized more efficiently using maxflow-like techniques. We\nconjecture that similar techniques can be developed for bisubmodular functions as well.\n\nReferences\n[1] Asem M. Ali, Aly A. Farag, and Georgy L. Gimel\u2019Farb. Optimizing binary MRFs with higher order cliques. In ECCV, 2008.\n[2] Kazutoshi Ando, Satoru Fujishige, and Takeshi Naitoh. A characterization of bisubmodular functions. Discrete Mathematics, 148:299\u2013303, 1996.\n[3] M. L. Balinski. Integer programming: Methods, uses, computation. Management Science, 12(3):253\u2013313, 1965.\n[4] E. Boros and P. L. Hammer. Pseudo-boolean optimization. Discrete Applied Mathematics, 123(1-3):155\u2013225, November 2002.\n[5] E. Boros, P. L. Hammer, and X. Sun. Network flows and minimization of quadratic pseudo-Boolean functions. Technical Report RRR 17-1991, RUTCOR, May 1991.\n[6] E. Boros, P. L. Hammer, and G. Tavares. Preprocessing of unconstrained quadratic binary optimization. Technical Report RRR 10-2006, RUTCOR, 2006.\n[7] A. Bouchet. Greedy algorithm and symmetric matroids. Math. Programming, 38:147\u2013159, 1987.\n[8] A. Bouchet and W. H. Cunningham. Delta-matroids, jump systems and bisubmodular polyhedra. SIAM J. Discrete Math., 8:17\u201332, 1995.\n[9] Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. PAMI, 23(11), November 2001.\n[10] R. Chandrasekaran and Santosh N. Kabadi. Pseudomatroids. 
Discrete Math., 71:205\u2013217, 1988.\n[11] S. Fujishige. Submodular Functions and Optimization. North-Holland, 1991.\n[12] Satoru Fujishige and Satoru Iwata. Bisubmodular function minimization. SIAM J. Discrete Math., 19(4):1065\u20131073, 2006.\n[13] P. L. Hammer, P. Hansen, and B. Simeone. Roof duality, complementation and persistency in quadratic 0-1 optimization. Mathematical Programming, 28:121\u2013155, 1984.\n[14] D. Hochbaum. Instant recognition of half integrality and 2-approximations. In 3rd International Workshop on Approximation Algorithms for Combinatorial Optimization, 1998.\n[15] D. Hochbaum. Solving integer programs over monotone inequalities in three variables: A framework for half integrality and good approximations. European Journal of Operational Research, 140(2):291\u2013321, 2002.\n[16] H. Ishikawa. Higher-order clique reduction in binary graph cut. In CVPR, 2009.\n[17] H. Ishikawa. Higher-order gradient descent by fusion-move graph cut. In ICCV, 2009.\n[18] Satoru Iwata and Kiyohito Nagano. Submodular function minimization under covering constraints. In FOCS, October 2009.\n[19] Santosh N. Kabadi and R. Chandrasekaran. On totally dual integral systems. Discrete Appl. Math., 26:87\u2013104, 1990.\n[20] V. Kolmogorov. Generalized roof duality and bisubmodular functions. Technical Report arXiv:1005.2305v2, September 2010.\n[21] V. Kolmogorov. Minimizing a sum of submodular functions. Technical Report arXiv:1006.1990v1, June 2010.\n[22] V. Kolmogorov. Submodularity on a tree: Unifying L\u266e-convex and bisubmodular functions. Technical Report arXiv:1007.1229v2, July 2010.\n[23] Victor Lempitsky, Carsten Rother, and Andrew Blake. LogCut - efficient graph cut optimization for Markov random fields. In ICCV, 2007.\n[24] Victor Lempitsky, Carsten Rother, Stefan Roth, and Andrew Blake. Fusion moves for Markov random field optimization. PAMI, July 2009.\n[25] S. H. Lu and A. C. 
Williams. Roof duality for polynomial 0-1 optimization. Math. Programming, 37(3):357\u2013360, 1987.\n[26] S. Thomas McCormick and Satoru Fujishige. Strongly polynomial and fully combinatorial algorithms for bisubmodular function minimization. Math. Program., Ser. A, 122:87\u2013120, 2010.\n[27] M. Nakamura. A characterization of greedy sets: universal polymatroids (I). In Scientific Papers of the College of Arts and Sciences, volume 38(2), pages 155\u2013167. The University of Tokyo, 1998.\n[28] G. L. Nemhauser and L. E. Trotter. Vertex packings: Structural properties and algorithms. Mathematical Programming, 8:232\u2013248, 1975.\n[29] Liqun Qi. Directed submodularity, ditroids and directed submodular flows. Mathematical Programming, 42:579\u2013599, 1988.\n[30] A. Raj, G. Singh, and R. Zabih. MRF\u2019s for MRI\u2019s: Bayesian reconstruction of MR images via graph cuts. In CVPR, 2006.\n[31] Stefan Roth and Michael J. Black. Fields of experts. IJCV, 82(2):205\u2013229, 2009.\n[32] C. Rother, V. Kolmogorov, V. Lempitsky, and M. Szummer. Optimizing binary MRFs via extended roof duality. In CVPR, June 2007.\n[33] O. Woodford, P. Torr, I. Reid, and A. Fitzgibbon. Global stereo reconstruction under second order smoothness priors. In CVPR, 2008.\n", "award": [], "sourceid": 59, "authors": [{"given_name": "Vladimir", "family_name": "Kolmogorov", "institution": null}]}