{"title": "Approximating MAP by Compensating for Structural Relaxations", "book": "Advances in Neural Information Processing Systems", "page_first": 351, "page_last": 359, "abstract": "We introduce a new perspective on approximations to the maximum a posteriori (MAP) task in probabilistic graphical models, that is based on simplifying a given instance, and then tightening the approximation. First, we start with a structural relaxation of the original model. We then infer from the relaxation its deficiencies, and compensate for them. This perspective allows us to identify two distinct classes of approximations. First, we find that max-product belief propagation can be viewed as a way to compensate for a relaxation, based on a particular idealized case for exactness. We identify a second approach to compensation that is based on a more refined idealized case, resulting in a new approximation with distinct properties. We go on to propose a new class of algorithms that, starting with a relaxation, iteratively yields tighter approximations.", "full_text": "Relax then Compensate:\n\nOn Max-Product Belief Propagation and More\n\nArthur Choi\n\nComputer Science Department\n\nUniversity of California, Los Angeles\n\nLos Angeles, CA 90095\n\nAdnan Darwiche\n\nComputer Science Department\n\nUniversity of California, Los Angeles\n\nLos Angeles, CA 90095\n\naychoi@cs.ucla.edu\n\ndarwiche@cs.ucla.edu\n\nAbstract\n\nWe introduce a new perspective on approximations to the maximum a posteriori\n(MAP) task in probabilistic graphical models, that is based on simplifying a given\ninstance, and then tightening the approximation. First, we start with a structural\nrelaxation of the original model. We then infer from the relaxation its de\ufb01cien-\ncies, and compensate for them. This perspective allows us to identify two distinct\nclasses of approximations. 
First, we \ufb01nd that max-product belief propagation can\nbe viewed as a way to compensate for a relaxation, based on a particular idealized\ncase for exactness. We identify a second approach to compensation that is based\non a more re\ufb01ned idealized case, resulting in a new approximation with distinct\nproperties. We go on to propose a new class of algorithms that, starting with a\nrelaxation, iteratively seeks tighter approximations.\n\n1\n\nIntroduction\n\nRelaxations are a popular approach for tackling intractable optimization problems. Indeed, for \ufb01nd-\ning the maximum a posteriori (MAP) assignment in probabilistic graphical models, relaxations play\na key role in a variety of algorithms. For example, tree-reweighted belief propagation (TRW-BP) can\nbe thought of as a linear programming relaxation of an integer program for a given MAP problem\n[1, 2]. Branch-and-bound search algorithms for \ufb01nding optimal MAP solutions, such as [3, 4], rely\non structural relaxations, such as mini-bucket approximations, to provide upper bounds [4, 5].\n\nWhether a relaxation is used as an approximation on its own, or as a guide for \ufb01nding optimal\nsolutions, a trade-off is typically made between the quality of an approximation and the complexity\nof computing it. We illustrate here instead how it is possible to tighten a given relaxation itself,\nwithout impacting its structural complexity.\n\nMore speci\ufb01cally, we propose here an approach to approximating a given MAP problem by perform-\ning two steps. First, we relax the structure of a given probabilistic graphical model, which results in\na simpler model whose MAP solution provides an upper bound on that of the original. Second, we\ncompensate for the relaxation by introducing auxiliary parameters, which we use to restore certain\nproperties, leading to a tighter approximation. We shall in fact propose two distinct properties on\nwhich a compensation can be based. 
The first is based on a simplified case where a compensation can be guaranteed to yield exact results. The second is based on a notion of an ideal compensation that seeks to correct for a relaxation more directly. As we shall see, the first approach leads to a new semantics for the max-product belief propagation algorithm. The second approach leads to another approximation that further yields upper bounds on the MAP solution. We further propose an algorithm for finding such a compensation, which starts with a relaxation and iteratively provides monotonically decreasing upper bounds on the MAP solution (at least empirically).

Proofs of results are given in the auxiliary Appendix.

2 MAP Assignments

Let M be a factor graph over a set of variables X, inducing a distribution Pr(x) ∝ ∏_a ψ_a(x_a), where x = {X1 = x1, . . . , Xn = xn} is an assignment of factor graph variables Xi to states xi, and where a is an index to the factor ψ_a(X_a) over the domain X_a ⊆ X. We seek the maximum a posteriori (MAP) assignment x⋆ = argmax_x ∏_a ψ_a(x_a). We denote the log of the value of a MAP assignment x⋆ by:

map⋆ = log max_x ∏_a ψ_a(x_a) = max_x Σ_a log ψ_a(x_a)

which we refer to more simply as the MAP value. Note that there may be multiple MAP assignments x⋆, so we may refer to just the value map⋆ when the particular assignment is not relevant. Next, if z is an assignment over variables Z ⊆ X, then let x ∼ z denote that x and z are compatible assignments, i.e., they set their common variables to the same states. Consider then the MAP value under a partial assignment z:

map(z) = max_{x∼z} Σ_a log ψ_a(x_a).

We will, in particular, be interested in the MAP value map(X = x), where we assume a single variable X is set to a particular state x.
We shall also refer to these MAP values more generally as map(.), without reference to any particular assignment.

3 Relaxation

The structural relaxations that we consider here are based on the relaxation of equivalence constraints from a model M, where an equivalence constraint Xi ≡ Xj is a factor ψ_eq(Xi, Xj) over two variables Xi and Xj that have the same states. Further, ψ_eq(xi, xj) is 1 if xi = xj and 0 otherwise. We call an assignment x valid, with respect to an equivalence constraint Xi ≡ Xj, if it sets variables Xi and Xj to the same state, and invalid otherwise. Note that when we remove an equivalence constraint from a model M, the values map(x) for valid configurations x do not change, since log 1 = 0. However, the values map(x) for invalid configurations can increase, since they are −∞ prior to the removal. In fact, they could overtake the optimal value map⋆. Thus, the MAP value after relaxing an equivalence constraint in M is an upper bound on the original MAP value.

It is straightforward to augment a model M to another where equivalence constraints can be relaxed. Consider, for example, a factor ψ1(A, B, C). We can replace the variable C in this factor with a clone variable C′, resulting in a factor ψ′1(A, B, C′). When we now add the factor ψ2(C, C′) for the equivalence constraint C ≡ C′, we have a new model M′ which is equivalent to the original model M, in that an assignment x in M corresponds to an assignment x′ in M′, where assignment x′ sets a variable and its clone to the same state. Moreover, the value map(x) in model M is the same as the value map′(x′) in model M′.

We note that a number of structural relaxations can be reduced to the removal of equivalence constraints, including relaxations found by deleting edges [6, 7], as well as mini-bucket approximations [5, 4].
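A minimal brute-force sketch of this upper-bound property. The chain model, factor names, and clone variable below are invented purely for illustration, following the clone construction just described:

```python
import itertools
import math
import random

random.seed(1)
S = [0, 1]  # states of every variable

# Invented toy model: a factor psi1(A, B) and a factor psi2(Bp, C),
# where Bp is a clone of variable B, tied to it by B ≡ Bp.
psi1 = {(a, b): random.uniform(0.1, 1.0) for a in S for b in S}
psi2 = {(bp, c): random.uniform(0.1, 1.0) for bp in S for c in S}

def log_value(a, b, bp, c):
    return math.log(psi1[a, b]) + math.log(psi2[bp, c])

# Original model: the equivalence constraint forces b == bp; invalid
# assignments have log value -inf, so we simply enumerate valid ones.
map_star = max(log_value(a, b, b, c)
               for a, b, c in itertools.product(S, S, S))

# Relaxed model: drop B ≡ Bp, so b and bp range independently.
r_map_star = max(log_value(a, b, bp, c)
                 for a, b, bp, c in itertools.product(S, S, S, S))

assert map_star <= r_map_star  # the relaxation yields an upper bound
```

The relaxation maximizes over a superset of the valid assignments, which is why r-map⋆ can never fall below map⋆.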
In fact, the example above can be considered a relaxation where we delete a factor graph\nedge C \u2192 \u03c81, substituting clone C \u2032 in place of variable C. Note that mini-bucket approximations\nin particular have enabled algorithms for solving MAP problems via branch-and-bound search [3, 4].\n\n4 Compensation\n\nSuppose that we have a model M with MAP values map(.). Say that we remove the equivalence\nconstraints in M, resulting in a relaxed model with MAP values r-map(.). Our goal is to identify\na compensated model M\u2032 with MAP values c-map(.) that is as tractable to compute as the values\nr-map(.), but yielding tighter approximations of the original values map(.).\nTo this end, we introduce into the relaxation additional factors \u03c8ij;i(Xi) and \u03c8ij;j(Xj) for each\nequivalence constraint Xi \u2261 Xj that we remove. Equivalently, we can introduce the log factors\n\u03b8(Xi) = log \u03c8ij;i(Xi) and \u03b8(Xj) = log \u03c8ij;j(Xj) (we omit the additional factor indices, as they\n\n2\n\n\fwill be unambiguous from the context). These new factors add new parameters into the approxima-\ntion, which we shall use to recover a weaker notion of equivalence into the model. More speci\ufb01cally,\ngiven a set of equivalence constraints Xi \u2261 Xj to relax, we have the original MAP values map(.),\nthe relaxation r-map(.) 
and the compensation c-map(.), where:

• map(z) = max_{x∼z} Σ_a log ψ_a(x_a) + Σ_{Xi≡Xj} log ψ_eq(Xi = xi, Xj = xj)
• r-map(z) = max_{x∼z} Σ_a log ψ_a(x_a)
• c-map(z) = max_{x∼z} Σ_a log ψ_a(x_a) + Σ_{Xi≡Xj} [θ(Xi = xi) + θ(Xj = xj)]

Note that the auxiliary factors θ of the compensation do not introduce additional complexity to the relaxation, in the sense that the treewidth of the resulting model is the same as that of the relaxation.

Consider then the case where an optimal assignment x⋆ for the relaxation happens to set variables Xi and Xj to the same state x, for each equivalence constraint Xi ≡ Xj that we relaxed. In this case, the optimal solution for the relaxation is also an optimal solution for the original model, i.e., r-map⋆ = map⋆. On the other hand, if a relaxation's optimal assignment sets Xi and Xj to different states, then it is not a valid assignment for the original model M, as it violates the equivalence constraint and thus has log probability −∞.

Consider, for a given equivalence constraint Xi ≡ Xj, the relaxation's MAP values r-map(Xi = x) and r-map(Xj = x) when we set, respectively, a single variable Xi or Xj to a state x. If for all states x we find that r-map(Xi = x) ≠ r-map(Xj = x), then we can infer that the MAP assignment sets variables Xi and Xj to different states: the MAP value when we set Xi to a state x is different from the MAP value when we set Xj to the same state. We can then ask of a compensation, for all states x, that c-map(Xi = x) = c-map(Xj = x), enforcing a weaker notion of equivalence.
In this case, if there is a MAP assignment that sets variable Xi to a state x, then there is at least a MAP assignment that sets variable Xj to the same state, even if there is no MAP assignment that sets both Xi and Xj to the same state at the same time.

We now want to identify parameters θ(Xi) and θ(Xj) to compensate for a relaxation in this manner. We propose two approaches: (1) based on a condition for exactness in a special case, and (2) based on a notion of ideal compensations. To get the intuitions behind these approaches, we consider first the simplified case where a single equivalence constraint is relaxed.

4.1 Intuitions: Splitting a Model into Two

Consider the case where relaxing a single equivalence constraint Xi ≡ Xj splits a model M into two independent sub-models, Mi and Mj, where sub-model Mi contains variable Xi and sub-model Mj contains variable Xj. Intuitively, we would like the parameters added in one sub-model to summarize the relevant information about the other sub-model. In this way, each sub-model could independently identify its optimal sub-assignment. For example, we can use the parameters:

θ(Xi = x) = map_j(Xj = x)  and  θ(Xj = x) = map_i(Xi = x).

Since sub-models Mi and Mj become independent after relaxing the single equivalence constraint Xi ≡ Xj, computing these parameters is sufficient to reconstruct the MAP solution for the original model M. In particular, we have that θ(Xi = x) + θ(Xj = x) = map(Xi = x, Xj = x), and further that map⋆ = max_x [θ(Xi = x) + θ(Xj = x)].

We propose then that the parameters of a compensation, with MAP values c-map(.), should satisfy the following condition:

c-map(Xi = x) = c-map(Xj = x) = θ(Xi = x) + θ(Xj = x) + γ    (1)

for all states x. Here γ is an arbitrary normalization constant, but the choice γ = 1/2 · c-map⋆ results in simpler semantics.
The following proposition confirms that this choice of parameters does indeed reflect our earlier intuitions, showing that it allows us to recover exact solutions in the idealized case when a model is split into two.

Proposition 1 Let map(.) denote the MAP values of a model M, and let c-map(.) denote the MAP values of a compensation that results from relaxing an equivalence constraint Xi ≡ Xj that splits M into two independent sub-models. Then the compensation has parameters satisfying Equation 1 iff c-map(Xi = x) = c-map(Xj = x) = map(Xi = x, Xj = x) + γ.

Note that the choice γ = 1/2 · c-map⋆ implies that θ(Xi = x) + θ(Xj = x) = map(Xi = x, Xj = x) in the case where relaxing an equivalence constraint splits a model into two.

In the case where relaxing an equivalence constraint does not split a model into two, a compensation satisfying Equation 1 at least satisfies a weaker notion of equivalence. We might expect that such a compensation may lead to more meaningful, and hopefully more accurate, approximations than a relaxation. Indeed, this compensation will eventually lead to a generalized class of belief propagation approximations. Thus, we call a compensation satisfying Equation 1 a REC-BP approximation.

4.2 Intuitions: An Ideal Compensation

In the case where a single equivalence constraint Xi ≡ Xj is relaxed, we may imagine the possibility of an "ideal" compensation where, as far as computing the MAP solution is concerned, a compensated model is as good as one where the equivalence constraint was not relaxed. Consider then the following proposal of an ideal compensation, which has two properties. First, it has valid configurations:

c-map(Xi = x) = c-map(Xj = x) = c-map(Xi = x, Xj = x)

for all states x.
Second, it has scaled values for valid configurations:

c-map(Xi = x, Xj = x) = κ · map(Xi = x, Xj = x)

for all states x, and for some κ > 1. If a compensation has valid configurations, then its optimal solution sets variables Xi and Xj to the same state, and is thus a valid assignment for the original instance (it satisfies the equivalence constraint). Moreover, if it has scaled values, then the compensation further allows us to recover the MAP value as well. A compensation having valid configurations and scaled values is thus ideal, as it is sufficient for us to recover the exact solution.

It may not always be possible to find parameters that lead to an ideal compensation. However, we propose that a compensation's parameters should satisfy:

c-map(Xi = x) = c-map(Xj = x) = 2 · [θ(Xi = x) + θ(Xj = x)]    (2)

for all states x, where we choose κ = 2. As the following proposition tells us, if a compensation is an ideal one, then it must at least satisfy Equation 2.

Proposition 2 Let map(.) denote the MAP values of a model M, and let c-map(.) denote the MAP values of a compensation that results from relaxing an equivalence constraint Xi ≡ Xj in M. If c-map(.) has valid configurations and scaled values, then c-map(.) satisfies Equation 2.

We thus call a compensation satisfying Equation 2 a REC-I compensation.

We note that other values of κ > 1 can be used, but the choice κ = 2 given above results in simpler semantics.
In particular, if a compensation happens to satisfy c-map(Xi = x) = c-map(Xj = x) = c-map(Xi = x, Xj = x) for some state x, we have that θ(Xi = x) + θ(Xj = x) = map(Xi = x, Xj = x) (i.e., the parameters alone can recover an original MAP value).

Before we discuss the general case where we relax multiple equivalence constraints, we highlight first a few properties shared by both REC-BP and REC-I compensations, which follow from more general results that we shall present. First, if the optimal assignment x⋆ for a compensation sets the variables Xi and Xj to the same state, then: (1) the assignment x⋆ is also optimal for the original model M; and (2) 1/2 · c-map⋆ = map⋆. In the case where x⋆ does not set variables Xi and Xj to the same state, the value c-map⋆ gives at least an upper bound that is no worse than the bound given by the relaxation alone. In particular:

map⋆ ≤ 1/2 · c-map⋆ ≤ r-map⋆.

Thus, at least in the case where a single equivalence constraint is relaxed, the compensations implied by Equations 1 and 2 do indeed tighten a relaxation (see the auxiliary Appendix for further details).

4.3 General Properties

In this section, we identify the conditions that compensations should satisfy in the more general case where multiple equivalence constraints are relaxed, and further highlight some of their properties.

Suppose that k equivalence constraints Xi ≡ Xj are relaxed from a given model M. Then compensations REC-BP and REC-I seek to recover into the relaxation two weaker notions of equivalence.

First, a REC-BP compensation has auxiliary parameters satisfying:

c-map(Xi = x) = c-map(Xj = x) = θ(Xi = x) + θ(Xj = x) + γ    (3)

where γ = k/(1+k) · c-map⋆. We then approximate the exact MAP value map⋆ by the value 1/(1+k) · c-map⋆.

The following theorem relates REC-BP to max-product belief propagation.

Theorem 1 Let map(.) denote the MAP values of a model M, and let c-map(.) denote the MAP values of a compensation that results from relaxing enough equivalence constraints Xi ≡ Xj in M to render it fully disconnected. Then a compensation whose parameters satisfy Equation 3 has values exp{c-map(Xi = x)} that correspond to the max-marginals of a fixed point of max-product belief propagation run on M, and vice versa.

Loopy max-product belief propagation is thus the degenerate case of a REC-BP compensation, when the approximation is fully disconnected (by deleting every factor graph edge, as defined in Section 3). Approximations need not be this extreme, and more structured approximations correspond to instances in the more general class of iterative joingraph propagation approximations [8, 6].

Next, a REC-I compensation has parameters satisfying:

c-map(Xi = x) = c-map(Xj = x) = (1 + k) · [θ(Xi = x) + θ(Xj = x)]    (4)

We again approximate the exact MAP value map⋆ with the value 1/(1+k) · c-map⋆.

In both compensations, it is possible to determine if the optimal assignment x⋆ of a compensation is an optimal assignment for the original model M: we need only check that it is a valid assignment.

Theorem 2 Let map(.) denote the MAP values of a model M, and let c-map(.) denote the MAP values of a compensation that results from relaxing k equivalence constraints Xi ≡ Xj. If the compensation has parameters satisfying either Eqs.
3 or 4, and if x⋆ is an optimal assignment for the compensation that is also valid, then: (1) x⋆ is optimal for the model M, and (2) 1/(1+k) · c-map⋆ = map⋆.

This result is analogous to results for max-product BP, TRW-BP, and related algorithms [9, 2, 10].

A REC-I compensation has additional properties over a REC-BP compensation. First, a REC-I compensation yields upper bounds on the MAP value, whereas REC-BP does not yield a bound in general.

Theorem 3 Let map(.) denote the MAP values of a model M, and let c-map(.) denote the MAP values of a compensation that results from relaxing k equivalence constraints Xi ≡ Xj. If the compensation has parameters satisfying Equation 4, then map⋆ ≤ 1/(1+k) · c-map⋆.

We remark now that a relaxation alone has analogous properties. If an assignment x⋆ is optimal for a relaxation with MAP values r-map(.), and it is also a valid assignment for a model M (i.e., it does not violate the equivalence constraints Xi ≡ Xj), then x⋆ is also optimal for M, where r-map(x⋆) = map(x⋆) (since they are composed of the same factor values). If an assignment x⋆ of a relaxation is not valid for model M, then the MAP value of the relaxation is an upper bound on the original MAP value. On the other hand, REC-I compensations are tighter approximations than the corresponding relaxation, at least in the case when a single equivalence constraint is relaxed: map⋆ ≤ 1/2 · c-map⋆ ≤ r-map⋆. When we relax multiple equivalence constraints we find, at least empirically, that REC-I bounds are never worse than relaxations, although we leave this point open.

The following theorem has implications for MAP solvers that rely on relaxations for upper bounds.

Theorem 4 Let map(.) denote the MAP values of a model M, and let c-map(.)
denote the MAP values of a compensation that results from relaxing k equivalence constraints Xi ≡ Xj. If the compensation has parameters satisfying Eq. 4, and if z is a partial assignment that sets the same state to variables Xi and Xj, for any equivalence constraint Xi ≡ Xj relaxed, then: map(z) ≤ 1/(1+k) · c-map(z).

Algorithms, such as those in [3, 4], perform a depth-first branch-and-bound search to find an optimal MAP solution. They rely on upper bounds of a MAP solution, under partial assignments, in order to prune the search space. Thus, any method capable of providing upper bounds tighter than those of a relaxation can potentially have an impact on the performance of a branch-and-bound MAP solver.

Algorithm 1 RelaxEq-and-Compensate (REC)
input: a model M with k equivalence constraints Xi ≡ Xj
output: a compensation M′_t
main:
1: M′_0 ← result of relaxing all Xi ≡ Xj in M
2: add to M′_0 the factors θ(Xi), θ(Xj), for each Xi ≡ Xj
3: initialize all parameters θ_0(Xi = x), θ_0(Xj = x), e.g., to 1/2 r-map⋆
4: t ← 0
5: while parameters have not converged do
6:   t ← t + 1
7:   for each equivalence constraint Xi ≡ Xj do
8:     update parameters θ_t(Xi = x), θ_t(Xj = x), computed using compensation M′_{t−1}, by:
9:       for REC-BP: Equations 5 & 6
10:      for REC-I: Equations 7 & 8
11:   θ_t(Xi) ← q · θ_t(Xi) + (1 − q) · θ_{t−1}(Xi) and θ_t(Xj) ← q · θ_t(Xj) + (1 − q) · θ_{t−1}(Xj)
12: return M′_t

5 An Algorithm to Find Compensations

Up to this point, we have not discussed how to actually find the auxiliary parameters θ(Xi = x) and θ(Xj = x) of a compensation. However, Equations 3 and 4 naturally suggest iterative algorithms for finding REC-BP and REC-I compensations.
Consider, for the case of REC-BP, the fact that parameters satisfy Equation 3 iff they satisfy:

θ(Xi = x) = c-map(Xj = x) − θ(Xj = x) − γ
θ(Xj = x) = c-map(Xi = x) − θ(Xi = x) − γ

This suggests an iterative fixed-point procedure for finding the parameters of a compensation that satisfy Equation 3. First, we start with an initial compensation with MAP values c-map_0(.), where parameters have been initialized to some value. For an iteration t > 0, we can update our parameters using the compensation from the previous iteration:

θ_t(Xi = x) = c-map_{t−1}(Xj = x) − θ_{t−1}(Xj = x) − γ_{t−1}    (5)
θ_t(Xj = x) = c-map_{t−1}(Xi = x) − θ_{t−1}(Xi = x) − γ_{t−1}    (6)

where γ_{t−1} = k/(1+k) · c-map⋆_{t−1}. If at some point the parameters of one iteration do not change in the next, then we can say that the iterations have converged, and that the compensation satisfies Equation 3. Similarly, for REC-I compensations, we use the update equations:

θ_t(Xi = x) = 1/(1+k) · c-map_{t−1}(Xj = x) − θ_{t−1}(Xj = x)    (7)
θ_t(Xj = x) = 1/(1+k) · c-map_{t−1}(Xi = x) − θ_{t−1}(Xi = x)    (8)

to identify compensations that satisfy Equation 4.

Algorithm 1 summarizes our proposal to compensate for a relaxation, using the iterative procedures for REC-BP and REC-I. We refer to this algorithm more generically as RelaxEq-and-Compensate (REC). Note that in Line 11, we further damp the updates by q, which is typical for such algorithms (we use q = 1/2). Note also that in Line 3, we suggest that we initialize parameters to 1/2 r-map⋆. The consequence of this is that our initial compensation has the MAP value 1/(1+k) · c-map⋆_0 = r-map⋆.¹ That is, the initial compensation is equivalent to the relaxation, for both REC-BP and REC-I.
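As a minimal sketch of the REC-I variant of Algorithm 1, the following brute-force implementation runs the damped updates of Equations 7 and 8 on an invented three-variable chain whose single relaxed equivalence constraint splits it into two sub-models. All names (psi1, psi2, c_map, th_i, th_j) are ours, and MAP values are computed by enumeration rather than anything scalable:

```python
import itertools
import math
import random

random.seed(0)
S = [0, 1]  # binary states

# Hypothetical toy chain A - B - C: relaxing the single equivalence
# constraint B ≡ Bp (Bp a clone of B) splits it into {A, B} and {Bp, C}.
psi1 = {(a, b): random.uniform(0.1, 1.0) for a in S for b in S}
psi2 = {(bp, c): random.uniform(0.1, 1.0) for bp in S for c in S}

def exact_map_star():
    # MAP value of the original model, where B and its clone must agree.
    return max(math.log(psi1[a, b]) + math.log(psi2[b, c])
               for a in S for b in S for c in S)

def c_map(th_i, th_j, fix_i=None, fix_j=None):
    # MAP value of the compensated relaxation, optionally clamping B or Bp.
    best = -math.inf
    for a, b, bp, c in itertools.product(S, S, S, S):
        if fix_i is not None and b != fix_i:
            continue
        if fix_j is not None and bp != fix_j:
            continue
        best = max(best, math.log(psi1[a, b]) + math.log(psi2[bp, c])
                         + th_i[b] + th_j[bp])
    return best

k, q = 1, 0.5  # one relaxed constraint; damping q = 1/2
r_map_star = c_map({x: 0.0 for x in S}, {x: 0.0 for x in S})  # relaxation
th_i = {x: 0.5 * r_map_star for x in S}  # Line 3 of Algorithm 1
th_j = {x: 0.5 * r_map_star for x in S}

for t in range(500):
    # REC-I updates (Equations 7 and 8), damped as in Line 11
    new_i = {x: q * (c_map(th_i, th_j, fix_j=x) / (1 + k) - th_j[x])
                + (1 - q) * th_i[x] for x in S}
    new_j = {x: q * (c_map(th_i, th_j, fix_i=x) / (1 + k) - th_i[x])
                + (1 - q) * th_j[x] for x in S}
    th_i, th_j = new_i, new_j

approx = c_map(th_i, th_j) / (1 + k)  # approximates map*
assert exact_map_star() <= approx + 1e-6  # Theorem 3: an upper bound
```

At a fixed point, 1/(1+k) · c-map⋆ upper-bounds map⋆ by Theorem 3; on a model that a single relaxed constraint splits in two, the compensation can recover the exact MAP value, as in Section 4.1.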
Typically, both algorithms tend to have compensations with decreasing MAP values. REC-BP may eventually have MAP values that oscillate, however, and may not converge. On the other hand, by Theorem 3, we know that a REC-I compensation must yield an upper bound on the true MAP value map⋆. Starting with an initial upper bound r-map⋆ from the relaxation, REC-I yields, at least empirically, monotonically decreasing upper bounds on the true MAP value from iteration to iteration. We explore this point further in the following section.

¹That is, c-map⋆_0 = max_x c-map_0(x) = max_x [r-map(x) + Σ_{Xi≡Xj} (θ(Xi = xi) + θ(Xj = xj))] = max_x [r-map(x) + k · r-map⋆] = r-map⋆ + k · r-map⋆, so that 1/(1+k) · c-map⋆_0 = r-map⋆.

[Six plot panels: approximation error (y-axis) vs. iterations, 0 to 5000 (x-axis).]

Figure 1: The REC algorithm in 10 × 10 grids. Left column: random grids, using REC-BP (top) and REC-I (bottom).
Center column: frustrated grids, using REC-I with p = 1/2 (top), p = 1/3 (bottom). Right column: frustrated grids, using REC-BP (top) with a fully disconnected relaxation, and REC-I (bottom) with a relaxation with max cluster size 3.

6 Experiments

Our goal in this section is to highlight the degree to which different types of compensations can tighten a relaxation, as well as to highlight the differences in the iterative algorithms to find them. We evaluated our compensations using randomly parametrized 10 × 10 grid networks. We judge the quality of an approximation by the degree to which a compensation is able to improve a relaxation. In particular, we measured the error

E = (1/(1+k) · c-map⋆ − map⋆) / (r-map⋆ − map⋆),

which is zero when the compensation is exact, and one when the compensation is equivalent to the relaxation (remember that we initialize the REC algorithm, for both types of compensations, with parameters that lead to an initial compensation with an optimal MAP value 1/(1+k) · c-map⋆_0 = r-map⋆). Note also that we use no instances where the error E is undefined, i.e., where r-map⋆ − map⋆ = 0 and the relaxation alone was able to recover the exact solution.

We first consider grid networks where factors ψ_a(x_i, x_j) were assigned to grid edges (i, j), with values drawn uniformly at random from 0 to 1 (we assigned no factors to nodes). We assumed first the coarsest possible relaxation, one that results in a fully disconnected approximation, and where the MAP value is found by maximizing factors independently.² We expect a relaxation's upper bound to be quite loose in this case.

Consider first Figure 1 (left), where we generated ten random grid networks (we plotted only ten for clarity) and plotted the compensation errors (y-axis) as they evolved over iterations (x-axis).
At iteration 0, the MAP value of each compensation is equivalent to that of the relaxation (by design). We see that, once we start iterating, both methods of compensation can tighten the approximation of our very coarse relaxation. For REC-BP, we do so relatively quickly (in fewer iterations), and to exact or near-exact levels (note that the 10 instances plotted behave similarly). For REC-I, convergence is slower, but the compensation is still a significant improvement over the relaxation. Moreover, it is apparent that further iterations would benefit the compensation further.

We next generated random grid networks with frustrated interactions. In particular, each edge was given either an attractive factor or a repulsive factor, at random each with probability 1/2. An attractive factor ψ_a(Xi, Xj) was given a value at random from 1 − p to 1 if xi = xj and a value from 0 to p if xi ≠ xj, which favors configurations xi = xj when p ≤ 1/2. Similarly for repulsive factors, which favor instead configurations where xi ≠ xj. It is well known that belief propagation tends not to converge in networks with frustrated interactions [11]. Non-convergence is the primary failure mode for belief propagation, and in such cases, we may try to use REC-I instead. We generated 10 random grid networks with p = 1/2 and another 10 networks with p = 1/3. Although the frustration in these networks is relatively mild, REC-BP did not converge in any of these cases.

²For each factor ψ_a and for each variable X in ψ_a, we replaced variable X with a unique clone X̂ and introduced the equivalence constraint X ≡ X̂. When we then relax all equivalence constraints, the resulting factor graph is fully disconnected. This corresponds to deleting all factor graph edges, as described in Section 3.
On the other hand, REC-I compensations were relatively well behaved, and produced monotonically decreasing upper bounds on the MAP value; see Figure 1 (center). Although the degree of compensation is not as dramatic, we note that we are compensating for a very coarse relaxation (fully disconnected).

In Figure 1 (right), we considered frustrated grid networks where p = 1/10, where REC-BP converged in only one of the 10 networks generated. Moreover, we can see that in that one instance, REC-BP converges below the true MAP value; remember that by Theorem 3, REC-I compensations always yield upper bounds. In the case of REC-I, the compensations did not improve significantly on the fully disconnected relaxations (not shown). It is, however, straightforward to try less extreme relaxations. For example, we used the mini-buckets-based approach to relaxation proposed in [4], and identified relaxed models M′ with jointrees that had a maximum cluster size of 3 (cf. [12], which re-introduced constraints over triples). Surprisingly, this was enough for REC-I to compensate for the relaxation completely (to within 10⁻⁸) in 7 of the 10 instances plotted. REC-BP benefits from added structure as well, converging and compensating completely (to within 10⁻⁴) in 9 of 10 instances (not plotted).

7 Discussion

There are two basic concepts underlying our proposed framework. The first is to relax a problem by dropping equivalence constraints. The second is that of compensating for a relaxation in ways that can capture existing algorithms as special cases, and in ways that allow us to design new algorithms. The idea of using structural relaxations for upper-bounding MAP solutions in probabilistic graphical models goes back to mini-bucket approximations [13], which can be considered to be a particular way of relaxing equivalence constraints from a model [4].
In this paper, we further propose a way to compensate for these relaxations, by restoring a weaker notion of equivalence. One approach to compensation identified a generalized class of max-product belief propagation approximations. We then identified a second approach that led to another class of approximations, which we have observed to yield tighter upper bounds on MAP solutions than a relaxation alone.

An orthogonal approach to upper-bounding MAP solutions is based on linear programming (LP) relaxations, which have seen significant interest in recent years [1, 2]. This perspective is based on formulating MAP problems as integer programs, whose solutions are upper-bounded by tractable LP relaxations. A related approach based on Lagrangian relaxations is further capable of incorporating structural simplifications [14]. Indeed, there has been significant interest in identifying a precise connection between belief propagation and LP relaxations [2, 10].

In contrast to the above approaches, compensations further guarantee, by Theorem 4, upper bounds on MAP solutions under any partial assignment (without rerunning the algorithm). This property has the potential to impact algorithms, such as [3, 4], that rely on such upper bounds under partial assignments to perform a branch-and-bound search for optimal MAP solutions.3 Further, since we approximate MAP by computing it exactly in a compensated model, we avoid the difficulties faced by max-product BP and related algorithms, which infer MAP assignments from max-marginals (which may not have unique maximal states) and thus rely on local information only [1]. The perspective that we propose further allows us to identify the intuitive differences between belief propagation and an upper-bound approximation, namely that they arise from different notions of compensation.
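The role that partial-assignment upper bounds play in branch-and-bound can be illustrated generically. This is our own sketch, not code from the paper; `upper_bound` stands in for any admissible bound on the best completion of a partial assignment, such as the MAP value of a compensated model under that assignment.

```python
def branch_and_bound(variables, domains, score, upper_bound):
    """Depth-first MAP search. `upper_bound(partial)` must never
    underestimate the best completion of `partial`; subtrees whose
    bound does not exceed the incumbent can then be pruned safely.
    """
    best = [float("-inf"), None]  # incumbent value and assignment

    def search(i, partial):
        if i == len(variables):
            s = score(partial)
            if s > best[0]:
                best[0], best[1] = s, dict(partial)
            return
        for value in domains[variables[i]]:
            partial[variables[i]] = value
            if upper_bound(partial) > best[0]:  # prune otherwise
                search(i + 1, partial)
            del partial[variables[i]]

    search(0, {})
    return best[1], best[0]
```

Note that pruning on a tie (`upper_bound(partial) == best[0]`) is safe, since an upper bound equal to the incumbent means no completion can strictly improve on it.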
We hope that this perspective will enable the design of new approximations, especially in domains where specific notions of compensation may suggest themselves.

Acknowledgments

This work has been partially supported by NSF grant #IIS-0916161.

3We investigated the use of REC-I approximations in depth-first branch-and-bound search for solving weighted Max-SAT problems, where we were able to use a more specialized iterative algorithm [15].

References

[1] Martin J. Wainwright, Tommi Jaakkola, and Alan S. Willsky. MAP estimation via agreement on trees: message-passing and linear programming. IEEE Transactions on Information Theory, 51(11):3697–3717, 2005.

[2] Amir Globerson and Tommi Jaakkola. Fixing max-product: Convergent message passing algorithms for MAP LP-relaxations. In NIPS, pages 553–560, 2008.

[3] Radu Marinescu, Kalev Kask, and Rina Dechter. Systematic vs. non-systematic algorithms for solving the MPE task. In UAI, pages 394–402, 2003.

[4] Arthur Choi, Mark Chavira, and Adnan Darwiche. Node splitting: A scheme for generating upper bounds in Bayesian networks. In UAI, pages 57–66, 2007.

[5] Rina Dechter and Irina Rish. Mini-buckets: A general scheme for bounded inference. J. ACM, 50(2):107–153, 2003.

[6] Arthur Choi and Adnan Darwiche. An edge deletion semantics for belief propagation and its practical impact on approximation quality. In AAAI, pages 1107–1114, 2006.

[7] Arthur Choi and Adnan Darwiche. Approximating the partition function by deleting and then correcting for model edges. In UAI, pages 79–87, 2008.

[8] Rina Dechter, Kalev Kask, and Robert Mateescu. Iterative join-graph propagation. In UAI, pages 128–136, 2002.

[9] Martin J. Wainwright, Tommi Jaakkola, and Alan S. Willsky. Tree consistency and bounds on the performance of the max-product algorithm and its generalizations.
Statistics and Computing, 14:143–166, 2004.

[10] Yair Weiss, Chen Yanover, and Talya Meltzer. MAP estimation, linear programming and belief propagation with convex free energies. In UAI, 2007.

[11] Gal Elidan, Ian McGraw, and Daphne Koller. Residual belief propagation: Informed scheduling for asynchronous message passing. In UAI, 2006.

[12] David Sontag, Talya Meltzer, Amir Globerson, Tommi Jaakkola, and Yair Weiss. Tightening LP relaxations for MAP using message passing. In UAI, pages 503–510, 2008.

[13] Rina Dechter. Mini-buckets: A general scheme for approximation in automated reasoning. In Proc. International Joint Conference on Artificial Intelligence (IJCAI), pages 1297–1302, 1997.

[14] Jason K. Johnson, Dmitry M. Malioutov, and Alan S. Willsky. Lagrangian relaxation for MAP estimation in graphical models. In Proceedings of the 45th Allerton Conference on Communication, Control and Computing, pages 672–681, 2007.

[15] Arthur Choi, Trevor Standley, and Adnan Darwiche. Approximating weighted Max-SAT problems by compensating for relaxations. In Proceedings of the 15th International Conference on Principles and Practice of Constraint Programming (CP), pages 211–225, 2009.