{"title": "Lifted Weighted Mini-Bucket", "book": "Advances in Neural Information Processing Systems", "page_first": 10329, "page_last": 10337, "abstract": "Many graphical models, such as Markov Logic Networks (MLNs) with evidence, possess highly symmetric substructures but no exact symmetries. Unfortunately, there are few principled methods that exploit these symmetric substructures to perform efficient approximate inference. In this paper, we present a lifted variant of the Weighted Mini-Bucket elimination algorithm which provides a principled way to (i) exploit the highly symmetric substructure of MLN models, and (ii) incorporate high-order inference terms which are necessary for high quality approximate inference. Our method has significant control over the accuracy-time trade-off of the approximation, allowing us to generate any-time approximations. Experimental results demonstrate the utility of this class of approximations, especially in models with strong repulsive potentials.", "full_text": "Lifted Weighted Mini-Bucket\n\nNicholas Gallo\n\nUniversity of California Irvine\n\nIrvine, CA 92637-3435\n\nngallo1@uci.edu\n\nAlexander Ihler\n\nUniversity of California Irvine\n\nIrvine, CA 92637-3435\nihler@ics.uci.edu\n\nAbstract\n\nMany graphical models, such as Markov Logic Networks (MLNs) with evidence,\npossess highly symmetric substructures but no exact symmetries. Unfortunately,\nthere are few principled methods that exploit these symmetric substructures to\nperform ef\ufb01cient approximate inference. In this paper, we present a lifted variant of\nthe Weighted Mini-Bucket elimination algorithm which provides a principled way\nto (i) exploit the highly symmetric substructure of MLN models, and (ii) incorporate\nhigh-order inference terms which are necessary for high quality approximate\ninference. Our method has signi\ufb01cant control over the accuracy-time trade-off of\nthe approximation, allowing us to generate any-time approximations. 
Experimental\nresults demonstrate the utility of this class of approximations, especially in models\nwith strong repulsive potentials.\n\n1\n\nIntroduction\n\nMany applications require computing likelihoods and marginal probabilities over a distribution\nde\ufb01ned by a graphical model, tasks which are intractable in general [24]. This has motivated the\ndevelopment of approximate inference techniques with controlled computational cost. Inference in\nthese settings often involves reasoning over a set of regions (subsets of variables), with larger regions\nproviding higher accuracy at a higher cost. This paper utilizes the Weighted Mini-Bucket (WMB)\n[10] algorithm which employs a simple heuristic method of region selection that mimics a variable\nelimination procedure.\nRecently, there has been interest in modeling large problems with repeated potentials and structure,\noften described with a Markov Logic Network (MLN) language [16]. Such models arise in many\nsettings such as social network analysis (e.g. estimating voting habits), collective classi\ufb01cation (e.g.\nclassifying text in connected web-pages), and many others [16]. In these settings, lifted inference\nrefers to a broad class of techniques, both exact [3, 15, 21] and approximate [4, 11, 13, 5, 12, 9],\nthat exploit model symmetries. Most of these methods work well when the model possesses well-\nde\ufb01ned symmetries [4, 11, 13, 5, 12, 14], but break down in unpredictable ways in the presence of\nunstructured model perturbations present in most practical settings. The problem of asymmetries in\nthe approximate inference structure is compounded when higher order inference terms (for which\nlifted inference requires higher order model symmetry) are incorporated [14, 20, 5].\nMethods to control computational cost in the presence of asymmetries are largely heuristic, such\nas [19] which presents a belief propagation procedure that approximates messages in a symmetric\nform. 
Other works create an over-symmetric approximate model [23, 22] on which inference is\nrun, but provide no guarantees on its relation to the original problem. Similar to our work, [17]\nemploy (non-weighted) mini-bucket inference; however, they too rely on over-symmetric model\napproximation heuristics to control computational cost.\nThis paper addresses the shortcomings of the methods described above with a lifted variant of\nWeighted mini-bucket (LWMB) that is able to (i) trade-off inference cost with accuracy in a controlled\nmanner in the presence of asymmetries, and (ii) incorporate higher-order approximate inference\nterms, which are crucial for high quality inference. This work can be seen as a high-order (property\n(ii)) extension of [9] (which is qualitatively identical in property (i)). Additionally, this work employs\nefficient region selection and representation for MLN models, and hence never grounds the graph as\nmany others are required to do for symmetry detection (e.g. [9, 20, 14]).\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00e9al, Canada.\n\n2 Background\n\nA Markov random field (MRF) over n discrete random variables (RVs) X = [X1 . . . Xn] taking\nvalues x = [x1 . . . xn] \u2208 (X 1 \u00d7 . . . \u00d7 X n) has probability density function\n\n$$p(X = x) = \\frac{1}{Z} \\prod_{\\alpha \\in I} f_\\alpha(x_\\alpha); \\qquad Z = \\sum_{x_n} \\cdots \\sum_{x_1} \\prod_{\\alpha \\in I} f_\\alpha(x_\\alpha)$$\n\nwhere I indexes subsets of variables and each \u03b1 \u2208 I is associated with potential table f\u03b1. The\npartition function Z normalizes the distribution. 
Calculating Z is a central problem in many learning\nand inference tasks, but exact evaluation of the summation is exponential in n, and hence intractable.\n\n2.1 Bucket and Mini-Bucket Elimination\n\nBucket Elimination (BE) [6] is an exact inference algorithm that directly eliminates RVs along a\nsequence o called the elimination order. Without loss of generality, we assume that each factor\nindex \u03b1 \u2208 I is ordered according to o. BE operates by performing the summation that defines Z along each\nRV in sequence. The computation is organized with a set of buckets B1 . . . Bn where initially each\nBv = {f\u03b1 | \u03b11 = v} is the set of model factors whose earliest eliminated RV index is v. Proceeding\nsequentially along o, we multiply the factors in Bv, then sum over xv producing a message\n\n$$m_{v \\to w}(x_{pa(v)}) = \\sum_{x_v} \\prod_{\\alpha \\in B_v} f'_\\alpha(x_\\alpha) \\tag{1}$$\n\nwhere pa(v) are the arguments of factors in Bv not including v. The message is then placed in bucket\nBw. If w = \u2205 the message is a scalar. All such messages are multiplied to form Z. The computational\ncost is exponential in the scope of the largest message, which is prohibitive in most applications.\n\nMini-Bucket Elimination (MBE). Mini-Bucket Elimination [7] avoids the complexity of BE\nby upper (or lower) bounding the message (1) as the product of terms each over a controlled\nnumber iBound of RVs. During elimination, factors in bucket Bv are grouped into partitions\n$Q_v = \\{q^1_v \\ldots q^k_v\\}$, where each $q^j_v \\in Q_v$ is called a mini-bucket and is associated with factors that\n(collectively) use at most iBound + 1 RVs. The true message is bounded using the inequality\n\n$$\\sum_{x_v} \\prod_{\\alpha \\in B_v} f'_\\alpha(x_\\alpha) \\le \\sum_{x_v} \\prod_{\\alpha \\in q^1_v} f'_\\alpha(x_\\alpha) \\cdot \\prod_{j=2}^{|Q_v|} \\max_{x_v} \\prod_{\\alpha \\in q^j_v} f'_\\alpha(x_\\alpha) \\tag{2}$$\n\nEach message is an upper bound on the exact message, hence the full procedure yields an upper\nbound on Z.\n\n2.2 Weighted mini-bucket elimination (WMB)\n\nWMB [10] generalizes MBE by using a tighter bound based on H\u00f6lder\u2019s inequality\n\n$$\\sum_x g(x) \\cdot h(x) \\le \\sum_x^{w} g(x) \\cdot \\sum_x^{1-w} h(x), \\quad \\text{where} \\quad \\sum_x^{w} f(x) = \\Big[ \\sum_x f(x)^{1/w} \\Big]^{w} \\tag{3}$$\n\nis the power-sum operator and w \u2265 0, h(x) \u2265 0, g(x) \u2265 0. The power-sum reduces to standard\nsum when w = 1 and approaches $\\max_x f(x)$ as w \u2192 0+. Thus, H\u00f6lder\u2019s inequality generalizes the\ninequality $\\sum_x g(x) \\cdot h(x) \\le \\sum_x g(x) \\cdot \\max_x h(x)$ used by mini-bucket elimination (MBE).\n\nWMB associates a weight wq \u2265 0 with each mini-bucket q \u2208 Qv where $\\sum_{q \\in Q_v} w_q = 1$ for all v,\nthen forms the bound\n\n$$\\sum_{x_v} \\prod_{\\alpha \\in B_v} f'_\\alpha(x_\\alpha) \\le \\prod_{q \\in Q_v} \\sum_{x_v}^{w_q} \\prod_{\\alpha \\in q} f'_\\alpha(x_\\alpha). \\tag{4}$$\n\nVariational Optimization. The weights can be optimized to provide tighter bounds. Additionally,\napplying WMB to any parameterization of the distribution yields a bound on Z, thus it makes sense\nto optimize over all valid parameterizations as well. Each parameterization is obtained by shifting\nfactor potentials between mini-buckets. 
That is, for each v, associated with each mini-bucket q \u2208 Qv\nis the cost-shifting parameter \u03c6q(xv) such that\n\n$$m_{q \\to q'}(x_{q'}) = \\sum_{x_v}^{w_q} f^{\\phi}_q(x_q) \\quad \\text{where} \\quad f^{\\phi}_q(x_q) = \\phi_q(x_v)^{-1} \\prod_{\\alpha \\in q} f'_\\alpha(x_\\alpha) \\qquad (\\forall q \\in Q_v) \\tag{5}$$\n\nis the reparameterized potential of bucket q. The cost-shifting terms that were divided out of each\nq \u2208 Qv are multiplied into an aggregated cost-shifting term\n\n$$\\phi^0_v(x_v) = \\prod_{q \\in Q_v} \\phi_q(x_v) \\tag{6}$$\n\nsuch that $\\prod_{q \\in Q_v} f_q(x_q) = \\prod_{q \\in Q_v} f^{\\phi}_q(x_q) \\cdot \\phi^0_v(x_v)$. We then have the following bound (rather than\n(4)) on the exact BE message\n\n$$\\sum_{x_v} \\prod_{\\alpha \\in B_v} f'_\\alpha(x_\\alpha) \\le \\Big[ \\sum_{x_v}^{w_v} \\prod_{q \\in Q_v} \\phi_q(x_v) \\Big] \\prod_{q \\in Q_v} m_{q \\to q'}(x_{q'}) \\tag{7}$$\n\nAugmented with wv \u2265 0, this is simply another term in the product that was bounded with H\u00f6lder\u2019s\ninequality, so we require $w_v + \\sum_{q \\in Q_v} w_q = 1$. We search for the tightest bound by performing convex\noptimization over (\u03b4,w) where log(\u03c6q) = \u03b4q for all q. Gradients can be computed and a black-box\nsolver can be used, or a fixed point iteration [10] can be used.\n\n2.3 Symmetric models and lifted inference\n\nMany models of interest, such as MLNs, are defined by repeated potentials organized in a symmetric\nstructure. Lifted inference refers to a broad class of techniques that exploit this structure for exact or\napproximate inference. 
The basic idea is to represent identical terms in the model and identical terms\ngenerated during ground inference implicitly with a single template.\nThe simplest form of symmetry used for lifted inference is based on the stable vertex coloring of a\ngraph [1] in which two vertices of the same color have identically colored neighborhoods. In the\ncontext of lifted inference, we require a stable coloring of the ground factor graph where factor nodes\nof the same color are required to have the same ordered node neighborhood and factor potential table.\nNodes of the same color behave identically during approximate inference (e.g. [18, 12]).\nRV nodes of the same color are grouped together to form the index Vi \u2282 {1 . . . n} and we denote\n\u00afV = {V1 . . . VN}. Factor nodes of the same color are grouped together to form Aj \u2282 I and we denote\n\u00afI = {A1 . . . AM}. Thus, given a stable partition (coloring) we can define\nDefinition 2.1. The lifted scope of A \u2208 \u00afI (relative to \u00afV) is $\\sigma^A = [V_{b_1} \\ldots V_{b_k}]$ where each \u03b1 \u2208 A\nhas |\u03b1| = k and $\\alpha_i \\in \\sigma^A_i$ for i = 1 . . . k.\nThis simple symmetry will help us organize higher order symmetries throughout the LWMB algorithm.\nNote that the lifted scope (unlike the scope of a ground factor) may have repeated elements. This\noccurs, for example, in a complete symmetric graph, where N = 1 and $\\sigma^A = [V_1, V_1]$.\n\n2.3.1 Markov Logic Networks\n\nA Markov Logic Network (MLN) [16] defines a large symmetric model implicitly via a compact first\norder logic (FOL) language. The MLN predicates define the set of RVs, and each MLN formula\ndefines a set of factors with identical potential. 
Both are parameterized compactly by a set of logical\nvariables (LVs) taking values in a finite domain (for example the set of all people \u2206P ).\nThe predicates of an MLN represent an attribute associated with domain elements or a relationship\namong domain elements. The instantiation of a predicate with a domain element is the index of a\nmodel RV. For example, the attribute predicate \u201cSm\u201d (for smokes) over the domain of all people (\u2206P )\ncorresponds to the set of ground RV indices {Sm(y) | y \u2208 \u2206P} (meaning that Sm(Ana) indexes\nxSm(Ana)). An example relationship predicate \u201cFr\u201d (for friends) among all pairs of people is the set\nof indices {Fr(x, y) | x \u2260 y \u2208 \u2206P}.1\nA formula specifies a soft-logic rule applied identically to all people (or groups of people). An example\nrelating smoking habits between friends is \u201c(\u2200 y \u2260 z \u2208 \u2206P ) Fr(y, z) \u2227 (Sm(y) \u21d4 Sm(z)), \u03b3\u201d.\nThis corresponds to the set R = { [Fr(y, z), Sm(y), Sm(z)] | y \u2260 z \u2208 \u2206P}, where for\neach \u03b1 \u2208 R, f\u03b1(x\u03b1) = fR(\u00afxR) and fR(\u00afxR) is a template with log potential taking value \u03b3 if\nFr(y, z) \u2227 (Sm(y) \u21d4 Sm(z)) is true and 0 otherwise (\u201cFr(y,z)\u201d corresponds to \u00afx1 in the template).\nThe FOL expressions defining formulas can be arbitrary, but they often have the form of simple\ndomain constraints on LVs, with all-diff constraints on LVs ranging over identical domain (note\nall-diff(y,z) is equivalent to y \u2260 z used in the Fr-Smoker formula above). The stable coloring of the factor\ngraph groups predicate RVs together and factors associated with each formula together. 
Formulas of\nthis form also possess many higher order symmetries which we exploit later.\n\n3 Lifted Weighted Mini-Bucket (LWMB)\n\nThis section presents a variant of WMB that operates on lifted factors, each of which is a group of\nidentical ground factors, and eliminates blocks of random variables simultaneously. A key difficulty is\nchoosing an approximating structure that guarantees symmetric messages (which can be represented\nas a lifted factor) are produced and, furthermore, that allows forming high order (high iBound)\nsymmetric inference terms. We first discuss these operations in models that possess the necessary\nsymmetric structure, then discuss modifications that allow us to control the size of the LWMB\ngraph in the presence of model asymmetries. Algorithm 1 summarizes the LWMB tree construction\nalgorithm (similar to ground mini-bucket construction [7]) developed in this section.\n\n3.1 LWMB in Symmetric Models\nFirst order symmetries specified by a stable partition of variables \u00afV = {V1 . . . VN} and model factors\n\u00afI = {A1 . . . AM} are necessary to provide a LWMB bound. We further require that the lifted scope\n\u03c3A not have repeated elements (we relax both of these restrictions later). The computations for\nLWMB will be described by their equivalence to a set of ground operations. To this end, we define\nDefinition 3.1. A lifted factor $F_G(x_G) = \\prod_{\\alpha \\in G} f_G(x_\\alpha)$ is the product of the template potential\nfG(\u00afx) applied to all sets of ground RVs indexed by elements of G, which range over the same domain\nas \u00afx used to define the template.2\nBlocks of ground RVs indexed by V \u2208 \u00afV are eliminated simultaneously, along a lifted elimination\norder O. We assume the lifted scope \u03c3A of all lifted factors in the input and generated during\ninference are ordered by O. The computation is organized with a set of buckets $\\{B_{V_1} \\ldots B_{V_N}\\}$\nwhere initially each $B_V = \\{F_A \\mid \\sigma^A_1 = V\\}$ is the set of lifted model factors whose earliest eliminated\nlifted RV block is V .\n\n1 In general, a predicate can have > 2 LVs and > 1 domain type.\n2 xG abuses notation, referring to xG\u2032 where $G' = \\cup_{\\alpha \\in G} \\alpha$ is the set of all RVs used by elements of G.\n\nLifted multiplication. Having collected lifted factors in lifted buckets, an RV partition V will be\nprocessed by first forming lifted mini-buckets $\\bar{Q}_V = \\{q^1_V \\ldots q^k_V\\}$ each of which groups together\nand multiplies lifted factors. The lifted product corresponds to a product of ground terms and may,\nin general, not have a lifted factor representation. This situation can cause symmetries to break\nin arbitrarily complex ways (e.g., during lifted variable elimination; see [15],[21]). We need to\nunderstand lifted multiplication to design LWMB bounds that avoid this situation.\n\nAlgorithm 1 LWMB Tree Build\nInput: Lifted model factors $\\bar{I}'$, RV partition $\\bar{V}'$, lifted elimination order $O'$, iBound\n$B_V = \\{ F_A \\mid \\sigma^A_1 = V, A \\in \\bar{I}' \\}$\nInitialize empty set of messages and empty $Q_V$ $\\forall V \\in \\bar{V}'$\nfor ($V$ = First($O'$); $V \\neq \\emptyset$; $V \\leftarrow$ Next($V, O'$)) do\n    repeat\n        Select $q, q' \\in Q_V$ s.t. $\\prod_{A \\in q \\cup q'} F_A(x_A)$ is valid and has template size $\\le$ iBound + 1    \u25b7 Lifted multiply\n        $B_V \\leftarrow B_V \\cup (q \\cup q') \\setminus \\{q, q'\\}$    \u25b7 Merge MBs, delete old\n        For all $b$, replace $M_{b \\to q}$ or $M_{b \\to q'}$ with $M_{b \\to (q \\cup q')}$    \u25b7 Re-route incoming messages\n    until $q = q' = \\emptyset$\n    for $q \\in B_V$ do    \u25b7 Simulate mini-bucket message pass\n        $F_C = \\prod_{A \\in q} F_A(x_A)$    \u25b7 Get C: ground regions associated with product\n        Set $p$ to indices where $\\sigma^C \\neq \\sigma^C_1$\n        $D = \\{c_p \\mid c \\in C\\}$, $\\sigma^D = \\sigma^C_p$    \u25b7 D: scope indices of ground messages\n        Add $\\{D\\}$ to $B_{\\sigma^D_1}$ and add message pointer $m_{q \\to \\{D\\}}$\n    end for\nend for\nOutput: Mini-Buckets $\\{Q_V \\mid V \\in \\bar{V}'\\}$, and messages structure $\\{m_{q \\to q'} \\mid \\forall q \\in \\cup_{V \\in \\bar{V}'} Q_V\\}$\n\nTo compute the lifted product FT (xT ) = FR(xR)FS(xS), we require a symmetric join, for given\nindex vectors p and q, to exist. This means there exists a T where for each t \u2208 T , tp \u2208 R and\ntq \u2208 S, and furthermore that for each r \u2208 R, |{t | tp = r, t \u2208 T}| = |T|/|R|, meaning that each r\nparticipates in the same number of elements of T in position p (and similarly for S). We then have\n\n$$F_T(x_T) = \\prod_{t \\in T} f_T(x_t) = \\prod_{t \\in T} f_R(x_{t_p})^{|R|/|T|} f_S(x_{t_q})^{|S|/|T|} \\tag{8}$$\n\nwhere fT (\u00afx) = fR(\u00afxp)fS(\u00afxq). In the simplest case, |R| = |S| = |T| and there is a one-to-one\nmapping, corresponding to a series of standard ground multiplications. Otherwise, it corresponds to\n\u201cspreading\u201d a ground factor across many identical (up to renaming of RVs) ground multiplications. If\nthe symmetric join does not exist, we say the lifted multiplication is invalid.\nOnly one set of p and q can be valid when the lifted scopes have unique elements. The lifted scope of\nthe multiplication (if valid) will be $\\sigma^T = [\\sigma^R, \\sigma^S]$. Hence we set p and q such that $\\sigma^T_p = \\sigma^R$ and\n$\\sigma^T_q = \\sigma^S$. This matches the lifted factors on first order symmetries, which is a necessary but not\nsufficient condition for higher order symmetries.\n\nSymmetric join with FOL formulas. If R and S are represented with FOL formulas and each\ncontains only domain constraints, the symmetric join can be performed quickly (or it can be quickly determined that none\nexists). 
If the domain constraints between R and S are either disjoint or identical, we simply match\nthe two on their LVs with identical domain.\nFor example, multiplying factors defined by {[A(x), B(x)] | x \u2208 \u22061} and {[A(x), C(x)] | x \u2208 \u22061}\nproduces {[A(x), B(x), C(x)] | x \u2208 \u22061}. As another example, {[A(x), B(x)] | x \u2208 \u22061} and\n{[A(x), C(z)] | x \u2208 \u22061, z \u2208 \u22063} will produce {[A(x), B(x), C(z)] | x \u2208 \u22061, z \u2208 \u22063} if\n\u22061 \u2229 \u22063 = \u2205. The main algorithm in section 4 uses many symmetric lifted factors of this form.\nLifted cost-shifting. For each V , each lifted mini-bucket q \u2208 BV is associated with a weight Wq\nand cost-shifting lifted factor $\\Phi^q_V(x_V)$ with template \u03c6q(\u00afx). We form (analogous to (5))\n\n$$F^{\\Phi}_Q(x_Q) = \\Phi^q_V(x_V)^{-1} \\prod_{A \\in q} F'_A(x_A) \\tag{9}$$\n\nwhere $F'_A$ represents lifted factors arising from the model or from a message in the mini-bucket, and Q represents the\nset of ground factors associated with the product of all of mini-bucket q\u2019s lifted factors.\n\n[Figure 1 panels: (a) Ground graph, (b) Initial LWMB graph, (c) LWMB graph after split]\n\nFigure 1: (a) Symmetric graph with potential \u00aff on each edge and a distinct unary potential at each\nnode, (b) LWMB graph with partition A = {a(1), a(2)}, B = {b(1), b(2), b(3)} and lifted elimination\norder (A, B), (c) LWMB graph with B partitioned into B(1) = {b(1)} and B(2) = {b(2), b(3)}. Each\nRV partition is connected to its associated ground RVs with a horizontal edge through a solid square\nnode. Each other horizontally oriented edge (e.g. between nodes (A) and (A, B) in panel (b)) is\nassociated with a cost-shifting term.\n\nLifted message passing. The message Q sends should be a lifted factor equal to the product of\nidentical messages sent from each \u03b1 \u2208 Q. 
The result will be a lifted factor over D = {\u03b1\\\u03b11 | \u03b1 \u2208 Q}.\nSince each v \u2208 V appears |Q|/|V | times in Q (due to symmetry), each ground factor should be\neliminated with weight wq = Wq/(|V |/|Q|) yielding the template\n\n$$m'_{q \\to q'}(\\bar{x} \\setminus \\bar{x}_1) = \\Big( \\sum_{\\bar{x}_1}^{w_q} f^{\\Phi}_Q(\\bar{x}) \\Big)^{|Q|/|D|} \\tag{10}$$\n\nThe power |Q|/|D| arises since each d \u2208 D receives (by symmetry) |Q|/|D| copies of the message.\nThe lifted factor message, denoted $M_{q \\to q'}(x_D)$, has template $m'_{q \\to q'}$ applied at all indices in D.\n\n3.2 Handling asymmetries\n\nThe exact symmetries necessary to perform lifted inference rarely exist in practice. In the extreme\ncase, model asymmetries cause lifted algorithms to ground the model. Here, we extend LWMB\nto handle asymmetries (i) induced by the elimination order (to prevent, for example, grounding a\ncomplete symmetric graph), and (ii) induced by unstructured unary evidence.\nSequential Asymmetries. Suppose for a lifted factor FG there are $K = |\\sigma^G = \\sigma^G_1| > 1$ copies\nof the earliest variable partition in the lifted scope \u03c3G. In this case, any ground elimination order will\ntreat RVs in the partition differently (hence requires grounding). The way around this is to eliminate\nK RVs simultaneously, $\\sum_{x_1}^{w_q} \\cdots \\sum_{x_K}^{w_q} f_G(\\bar{x})$, where wq = Wq/(|V |/|Q| \u00b7 K). This can be justified\nby applying H\u00f6lder\u2019s inequality with any elimination order and appropriately tied weights, noting\nthat power-sums with tied weights commute (details omitted for space).\nDistinct Unary Evidence. The LWMB bound can be modified to incorporate distinct single-RV\npotentials. The trick (similar to [9]) is to aggregate the lifted cost-shifting terms and multiply the\nresult with the ground unary terms. That is, define\n\n$$m_{V \\to \\emptyset} = \\prod_{v \\in V} \\sum_{x_v}^{W_V} f_v(x_v) \\cdot \\phi_V(x_v) \\tag{11}$$\n\nwhere $\\phi_V(\\bar{x}) = \\prod_{q \\in Q_V} \\phi^q_V(\\bar{x})$.3 Figure 1b illustrates a LWMB graph with aggregated approximate\nevidence terms.\n\n3 Identical potentials, such as hard evidence, can be grouped into a single computation (omitted for space).\n\n4 Coarse to fine LWMB for MLN models\n\nIn this section we build a sequence of LWMB approximations of gradually increasing accuracy and\ncomputational cost. Starting with a LWMB tree using the coarsest possible partition, we iteratively\nimprove the approximation with Splitting and Joining operations. Splitting partitions the cost-shifting\nparameters (eq. (11)) into finer groupings, allowing a more flexible interaction with evidence. Joining\nincorporates high-order inference terms by performing LWMB Tree Build with high iBound. This\nprocedure is summarized in Algorithm 2 and described in this section. An example of the effect of\nsplitting on the LWMB graph in Figure 1b is shown in Figure 1c.\n\nAlgorithm 2 Coarse To Fine LWMB for MLNs\nInitialize: Choose elimination order O on MLN predicates\nInitialize: Build LWMB tree with MLN predicates and formulas\nrepeat\n    Associate unary evidence with lifted RVs    \u25b7 To compute (11) during inference\n    Optimize bound over (\u00af\u03b4, W )\n    $F_Q \\leftarrow F^{\\Phi}_Q / (M_{q \\to q'})$ $\\forall q$    \u25b7 See \u201cMaintaining Monotonicity\u201d\n    Set domain partition $\\Delta_d \\to \\{\\Delta^1_d, \\Delta^2_d\\}$    \u25b7 via gradient cluster or another method\n    Split (\u00af\u03b4, W ), \u00afV, and lifted factors that use domain \u2206d    \u25b7 Section 4.1\n    Update lifted elimination order O\n    Build LWMB tree with new lifted MB regions and RVs as input\nuntil Exact answer computed\n\n4.1 Splitting\n\nLWMB splitting is similar to the splitting operation presented in [9] for factor-graph models, but\noperates by partitioning a group of MLN domain elements rather than a group of variational parameters\n(as in [9]). That is, we partition a single domain \u2206 \u2208 \u00af\u2206 into two disjoint domains (\u2206(1) \u222a \u2206(2) = \u2206\nand \u2206(1) \u2229 \u2206(2) = \u2205). Then, we split all lifted factors and RV partitions that use M \u2265 1 LVs with\ndomain \u2206 into $2^M$ finer lifted factors. An important property of this splitting scheme is that lifted\nfactors with FOL form are split into lifted factors with FOL form.\nExample 1. A variable partition V = {Sm(y) | y \u2208 \u2206} splits into {{Sm(y) | y \u2208 \u2206(i)} | i \u2208\n{1, 2}}. A lifted factor FR with R = {[Fr(y, z), Sm(y), Sm(z)] | \u2200y \u2260 z \u2208 \u2206} splits into 4 lifted\nfactors FR\u2032 where fR\u2032 = fR, wR\u2032 = wR and where R\u2032 \u2208 {{[Fr(x, y), Sm(x), Sm(y)] | \u2200x \u2260\ny, x \u2208 \u2206(i), y \u2208 \u2206(j)} | (i, j) \u2208 $\\{1, 2\\}^2$}.\nAnother important property is that lifted factors with M > 1 LVs with the same domain split into\nlifted factors with LVs of all distinct domains. Lifted factors of this form can participate in lifted\nmultiplications resulting in higher order joins (section 3). For example, in Example 1 when i \u2260 j, the\nx \u2260 y constraint is superfluous and can be dropped.\nUpdating the lifted elimination order. The domain split causes an RV partition V \u2208 \u00afV to split\ninto $V'_1, \\ldots, V'_{2^M}$. Since the previous LWMB bound eliminated RVs xV simultaneously, we must\neliminate RVs sequentially $V'_1, \\ldots, V'_{2^M}$ in the same position in O relative to other RV partitions. For\nexample, in Figure 1c (A, B(1), B(2)) is a valid elimination order while (B(2), A, B(1)) is not.\nMaintaining Monotonicity. Modifications to the LWMB tree (either splitting or joining) can cause\nunpredictable changes in the message structure. Qualitatively, we must ensure that each new region\ncan send a forward message, creating new regions as necessary. Such structural modifications to\nthe flow of messages can cause an increase in the bound. To guarantee a monotonically improving\nbound, we replace each lifted factor with the cost-shifted mini-bucket functions divided by their\nforward message $F_Q \\leftarrow F^{\\Phi}_Q / (M_{q \\to q'})$ (a similar idea was used in the ground case in [8]). This is\nsimply a reparameterization of the model, but ensures that each node in the LWMB tree sends a\nuniform message of all 1\u2019s. Hence, after split or join we can simply call LWMB Tree Build with the\nreparameterized terms and guarantee a monotonic bound improvement. In practice, we apply this\ntechnique only to nodes in the tree affected by a split or join operation, leaving the message structure\nof other nodes unchanged.\nChoosing the domain partition. The goal is to cheaply find a reasonably good split grouping. A\nsimilar problem was considered in [9]. They compute a 2-way clustering of the gradient of their inference\nobjective with respect to variational parameters. The main idea is that if parameters are constrained\nto be identical (for computational improvement with lifting) but their greedy unconstrained next step\nwould have been similar, then the lifting restriction incurs little error. Here, we perform a similar\noperation, but a 2-way split of a domain partition \u2206d induces a split of many parameters, associated\nwith lifted RVs that use domain \u2206d, into 2 groups. Our clustering objective is a sum over squared\nerror of all these terms (details omitted for space).\n\n[Figure 2 panels: (a) \u03b3 = \u22120.5, (b) \u03b3 = \u22120.05, (c) \u03b3 = +0.05]\n\nFigure 2: Repulsive (\u03b3 < 0) vs. attractive (\u03b3 > 0) collective classification example. (a)-(b) High\niBound is extremely important in the presence of strong repulsive potentials, (c) but performs slightly\nworse than the baseline (\u201cNo join\u201d) in the attractive case. Dashed black lines indicate when a batch\nof splitting occurs for the blue curve (split transitions for other curves occur at similar locations).\n\n5 Experiments\n\nThis section provides an empirical illustration of our LWMB algorithm. We demonstrate the superiority\nof utilizing high order LWMB approximations for models with repulsive potentials. In models\nwith strictly attractive potentials, low order approximations work slightly better, likely due to their\nability to split more, obtaining better approximations of the evidence.\nSetup. We consider a standard collective classification MLN with formula (\u2200x \u2260 y \u2208 \u2206) L(x, y) \u2227\n(C(x) \u21d4 C(y)), \u03b3. If \u03b3 < 0, a hard true observation on link L(x, y) induces a repulsive potential\nbetween C(x) and C(y). \u03b3 > 0 induces attractive potentials. We run experiments with N =\n|\u2206| = 512, with clustered evidence. We randomly assign elements of \u2206 to one of K = 16 clusters.\nEvidence on C predicate has potentials [0; a]. Each cluster generates a (scalar) center on N (0, 2);\neach member of the cluster is then perturbed from its center by N (0, 0.4) noise. Relational evidence\non L is generated as follows: each of the K blocks has all true evidence with each other block with\nprobability 0.25. We then flip 25% of the evidence uniformly at random.\nOptimization. We call a black-box convex optimizer (using non-linear conjugate gradients)\nallowing a maximum of 1000 function evaluations. 
The gradient of the LWMB objective (derivation\nomitted for space) is computable in time roughly equal to the cost of evaluating the objective.\nTiming. We report only time spent doing inference (optimization), and allow each method 250\nseconds total. Inference is the algorithmic bottleneck, and code has been written in C++. The rest\nof Algorithm 2 simply updates the LWMB structure (performing less work than a single inference\niteration) but is coded in MATLAB and thus yields unreliable timing. We note that other works on\nlifted inference report only inference time [12, 2, 9], yet incur significant overhead of symmetry\ndetection (that could require touching the ground model) which we never do.\nResults. Figure 2 shows results for (a) strongly repulsive case, (b) weakly repulsive case, (c)\nweakly attractive case. Strongly attractive (\u03b3 = 0.5) was qualitatively similar to (c) and omitted\nfor space.4 We see in (a) that lifted inference with higher order terms significantly outperforms the\nfully relaxed (\u201cNo Join\u201d) method. In the attractive case (c) higher order performs slightly worse. We\nbelieve this is because the cheaper inference method builds approximations of finer resolution. For\npanel (c), by the end, Blue performed 31 splits, Red 60 splits, and Green 73 splits.\n\n4 Bumps in the curves are due to optimization initialization. Numerical issues arise when power-sum weights\nare small. Hence at the beginning of each optimization we floor them at $10^{-4}$.\n\n[Figure 2 plots: x-axis \u201cInference time (seconds)\u201d, y-axis \u201clogZ Upper Bound\u201d; curves iBound=5, iBound=2, No Join]\n\nAcknowledgements\n\nThis work is sponsored in part by NSF grants IIS-1526842, IIS-1254071, by the United States Air\nForce under Contract No. FA9453-16-C-0508, and DARPA Contract No. 
W911NF-18-C-0015.\n\nReferences\n[1] C. Berkholz, P. Bonsma, and M. Grohe. Tight lower and upper bounds for the complexity of canonical colour refinement. In European Symposium on Algorithms, pages 145\u2013156. Springer, 2013.\n[2] G. Van den Broeck and M. Niepert. Lifted probabilistic inference for asymmetric graphical models. arXiv preprint arXiv:1412.0315, 2014.\n[3] H. H. Bui, T. N. Huynh, and R. de Salvo Braz. Exact lifted inference with distinct soft evidence on every object. In AAAI, 2012.\n[4] H. H. Bui, T. N. Huynh, and S. Riedel. Automorphism groups of graphical models and lifted variational inference. arXiv preprint arXiv:1207.4814, 2012.\n[5] H. H. Bui, T. N. Huynh, and D. Sontag. Lifted tree-reweighted variational inference. arXiv preprint arXiv:1406.4200, 2014.\n[6] R. Dechter. Bucket elimination: A unifying framework for reasoning. Artificial Intelligence, 113(1-2):41\u201385, 1999.\n[7] R. Dechter and I. Rish. A scheme for approximating probabilistic inference. In Proc. Uncertainty in Artificial Intelligence, pages 132\u2013141. Morgan Kaufmann Publishers Inc., 1997.\n[8] S. Forouzan and A. T. Ihler. Incremental region selection for mini-bucket elimination bounds. In UAI, pages 268\u2013277, 2015.\n[9] N. Gallo and A. Ihler. Lifted generalized dual decomposition. In AAAI. AAAI Press, 2018.\n[10] Q. Liu and A. T. Ihler. Bounding the partition function using H\u00f6lder\u2019s inequality. In ICML-11, pages 849\u2013856, 2011.\n[11] M. Mladenov, B. Ahmadi, and K. Kersting. Lifted linear programming. In AISTATS, pages 788\u2013797, 2012.\n[12] M. Mladenov, A. Globerson, and K. Kersting. Lifted message passing as reparametrization of graphical models. In UAI, pages 603\u2013612, 2014.\n[13] M. Mladenov and K. Kersting. Equitable partitions of concave free energies. In UAI, pages 602\u2013611, 2015.\n[14] M. Mladenov, K. Kersting, and A. Globerson. 
Efficient lifting of MAP LP relaxations using k-locality. In AISTATS, pages 623\u2013632, 2014.\n[15] D. Poole. First-order probabilistic inference. In IJCAI, volume 3, pages 985\u2013991, 2003.\n[16] M. Richardson and P. Domingos. Markov logic networks. Machine Learning, 62(1):107\u2013136, 2006.\n[17] P. Sen, A. Deshpande, and L. Getoor. Bisimulation-based approximate lifted inference. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, pages 496\u2013505. AUAI Press, 2009.\n[18] P. Singla and P. M. Domingos. Lifted first-order belief propagation. In AAAI, volume 8, pages 1094\u20131099, 2008.\n[19] P. Singla, A. Nath, and P. M. Domingos. Approximate lifting techniques for belief propagation. In AAAI, pages 2497\u20132504, 2014.\n[20] D. Smith, P. Singla, and V. Gogate. Lifted region-based belief propagation. arXiv preprint arXiv:1606.09637, 2016.\n[21] N. Taghipour, D. Fierens, J. Davis, and H. Blockeel. Lifted variable elimination with arbitrary constraints. In International Conference on Artificial Intelligence and Statistics, pages 1194\u20131202, 2012.\n[22] G. Van den Broeck and A. Darwiche. On the complexity and approximation of binary evidence in lifted inference. In Advances in Neural Information Processing Systems, pages 2868\u20132876, 2013.\n[23] D. Venugopal and V. Gogate. Evidence-based clustering for scalable inference in Markov logic. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 258\u2013273. Springer, 2014.\n[24] M. J. Wainwright, M. I. Jordan, et al. 
Graphical models, exponential families, and variational inference. Foundations and Trends\u00ae in Machine Learning, 1(1\u20132):1\u2013305, 2008.\n", "award": [], "sourceid": 6609, "authors": [{"given_name": "Nicholas", "family_name": "Gallo", "institution": "UC Irvine"}, {"given_name": "Alexander", "family_name": "Ihler", "institution": "UC Irvine"}]}