{"title": "Fast Lifted MAP Inference via Partitioning", "book": "Advances in Neural Information Processing Systems", "page_first": 3240, "page_last": 3248, "abstract": "Recently, there has been growing interest in lifting MAP inference algorithms for Markov logic networks (MLNs). A key advantage of these lifted algorithms is that they have much smaller computational complexity than propositional algorithms when symmetries are present in the MLN and these symmetries can be detected using lifted inference rules. Unfortunately, lifted inference rules are sound but not complete and can often miss many symmetries. This is problematic because when symmetries cannot be exploited, lifted inference algorithms ground the MLN, and search for solutions in the much larger propositional space. In this paper, we present a novel approach, which cleverly introduces new symmetries at the time of grounding. Our main idea is to partition the ground atoms and force the inference algorithm to treat all atoms in each part as indistinguishable. We show that by systematically and carefully refining (and growing) the partitions, we can build advanced any-time and any-space MAP inference algorithms. Our experiments on several real-world datasets clearly show that our new algorithm is superior to previous approaches and often finds useful symmetries in the search space that existing lifted inference rules are unable to detect.", "full_text": "Fast Lifted MAP Inference via Partitioning\n\nSomdeb Sarkhel\n\nThe University of Texas at Dallas\n\nParag Singla\nI.I.T. Delhi\nAbstract\n\nVibhav Gogate\n\nThe University of Texas at Dallas\n\nRecently, there has been growing interest in lifting MAP inference algorithms for\nMarkov logic networks (MLNs). A key advantage of these lifted algorithms is that\nthey have much smaller computational complexity than propositional algorithms\nwhen symmetries are present in the MLN and these symmetries can be detected\nusing lifted inference rules. 
Unfortunately, lifted inference rules are sound but\nnot complete and can often miss many symmetries. This is problematic because\nwhen symmetries cannot be exploited, lifted inference algorithms ground the MLN,\nand search for solutions in the much larger propositional space. In this paper, we\npresent a novel approach, which cleverly introduces new symmetries at the time of\ngrounding. Our main idea is to partition the ground atoms and force the inference\nalgorithm to treat all atoms in each part as indistinguishable. We show that by\nsystematically and carefully re\ufb01ning (and growing) the partitions, we can build\nadvanced any-time and any-space MAP inference algorithms. Our experiments\non several real-world datasets clearly show that our new algorithm is superior to\nprevious approaches and often \ufb01nds useful symmetries in the search space that\nexisting lifted inference rules are unable to detect.\n\nMarkov logic networks (MLNs) [5] allow application designers to compactly represent and reason\nabout relational and probabilistic knowledge in a large number of application domains including\ncomputer vision and natural language understanding using a few weighted \ufb01rst-order logic formulas.\nThese formulas act as templates for generating large Markov networks \u2013 the undirected probabilistic\ngraphical model. A key reasoning task over MLNs is maximum a posteriori (MAP) inference, which\nis de\ufb01ned as the task of \ufb01nding an assignment of values to all random variables in the Markov network\nthat has the maximum probability. This task can be solved using propositional (graphical model)\ninference techniques. Unfortunately, these techniques are often impractical because the Markov\nnetworks can be quite large, having millions of variables and features.\nRecently, there has been growing interest in developing lifted inference algorithms [4, 6, 17, 22]\nfor solving the MAP inference task [1, 2, 3, 7, 13, 14, 16, 18, 19]. 
These algorithms work, as much\nas possible, on the much smaller \ufb01rst-order speci\ufb01cation, grounding or propositionalizing only as\nnecessary and can yield signi\ufb01cant complexity reductions in practice. At a high level, lifted algorithms\ncan be understood as algorithms that identify symmetries in the \ufb01rst-order speci\ufb01cation using lifted\ninference rules [9, 13, 19], and then use these symmetries to simultaneously infer over multiple\nsymmetric objects. Unfortunately, in a vast majority of cases, the inference rules are unable to identify\nseveral useful symmetries (the rules are sound but not complete), either because the symmetries are\napproximate or because the symmetries are domain-speci\ufb01c and do not belong to a known type. In\nsuch cases, lifted inference algorithms partially ground some atoms in the MLN and search for a\nsolution in this much larger partially propositionalized space.\nIn this paper, we propose the following straight-forward yet principled approach for solving this\npartial grounding problem [21, 23]: partition the ground atoms into groups and force the inference\nalgorithm to treat all atoms in each group as indistinguishable (symmetric). For example, consider\na \ufb01rst-order atom R(x) and assume that x can be instantiated to the following set of constants:\n{1, 2, 3, 4, 5}. If the atom possesses the so-called non-shared or single-occurrence symmetry [13, 19],\nthen the lifted inference algorithm will search over only two assignments: all \ufb01ve groundings of R(x)\nare either all true or all false, in order to \ufb01nd the MAP solution. 
When no identifiable symmetries exist, the lifted algorithm will inefficiently search over all possible 32 truth assignments to the 5 ground atoms and will be equivalent in terms of (worst-case) complexity to a propositional algorithm. In our approach, we would partition the domain, say as {{1, 3}, {2, 4, 5}}, and search over only the following 4 assignments: all groundings in each part can be either all true or all false. Thus, if we are lucky and the MAP solution is one of the 4 assignments, our approach will yield significant reductions in complexity even though no identifiable symmetries exist in the problem.
Our approach is quite general and includes the fully lifted and fully propositional approaches as special cases. For instance, setting the partition size k to 1 and n respectively, where n is the number of constants, will yield exactly the same solution as the one output by the fully lifted and fully propositional approach. Setting k to values other than 1 and n yields a family of inference schemes that systematically explores the regime between these two extremes. Moreover, by controlling the size k of each partition we can control the size of the ground theory, and thus the space and time complexity of our algorithm.
We prove properties of and improve upon our basic idea in several ways. First, we prove that our proposed approach yields a consistent assignment that is a lower bound on the MAP value. Second, we show how to improve the lower bound, and thus the quality of the MAP solution, by systematically refining the partitions. Third, we show how to further improve the complexity of our refinement procedure by exploiting the exchangeability property of successive refinements.
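The search-space reduction described in the example above (domain {1, ..., 5}, partition {{1, 3}, {2, 4, 5}}) can be sketched in a few lines of Python. This is a toy illustration, not the paper's system: the score function standing in for the weighted-formula objective is our own invention, chosen so that the optimum assigns different truth values to R(1) and R(2) and no exact symmetry exists.

```python
from itertools import product

def map_under_partition(partition, score):
    """Brute-force MAP search in which every ground atom in a cell is
    forced to take the same truth value (2^|partition| assignments)."""
    best_val, best = float("-inf"), None
    for values in product([False, True], repeat=len(partition)):
        assign = {c: v for cell, v in zip(partition, values) for c in cell}
        s = score(assign)
        if s > best_val:
            best_val, best = s, assign
    return best_val, best

# Hypothetical objective over the groundings R(1), ..., R(5): one point per
# true atom, plus a penalty when R(1) and R(2) agree (so no exact symmetry).
def score(assign):
    return sum(assign.values()) - 10.0 * (assign[1] == assign[2])

# Fully propositional search: singleton cells, 2^5 = 32 assignments.
exact_val, _ = map_under_partition([[1], [2], [3], [4], [5]], score)
# Partitioned search over {{1, 3}, {2, 4, 5}}: only 2^2 = 4 assignments.
part_val, _ = map_under_partition([[1, 3], [2, 4, 5]], score)
print(exact_val, part_val)  # 4.0 3.0
```

With singleton cells the search is exhaustive; with the 2-cell partition only 4 assignments are examined, and the value found is a lower bound on the true MAP value, matching the guarantee discussed above.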
Speci\ufb01cally, we show\nthat the exchangeable re\ufb01nements can be arranged on a lattice, which can then be searched via a\nheuristic search procedure to yield an ef\ufb01cient any-time, any-space algorithm for MAP inference.\nFinally, we demonstrate experimentally that our method is highly scalable and yields close to optimal\nsolutions in a fraction of the time as compared to existing approaches. In particular, our results show\nthat for even small values of k (k bounds the partition size), our algorithm yields close to optimal\nMAP solutions, clearly demonstrating the power of our approach.\n1 Notation And Background\nPartition of a Set. A collection of sets C is a partition of a set X if and only if each set in C is\nnonempty, pairwise disjoint and the union of all sets equals X. The sets in C are called the cells or\nparts of the partition. If two elements, a, b, of the set appear in a same cell of a partition \u03c1 we denote\nthem by the operator \u2018\u223c\u03c1\u2019, i.e., a \u223c\u03c1 b. A partition \u03b1 of a set X is a re\ufb01nement of a partition \u03c1 of\nX if every element of \u03b1 is a subset of some element of \u03c1. Informally, this means that \u03b1 is a further\nfragmentation of \u03c1. We say that \u03b1 is \ufb01ner than \u03c1 (or \u03c1 is coarser than \u03b1) and denote it as \u03b1 \u227a \u03c1. We\nwill also use the notation \u03b1 (cid:22) \u03c1 to denote that either \u03b1 is \ufb01ner than \u03c1, or \u03b1 is the same as \u03c1. For\nexample, let \u03c1 = {{1, 2},{3}} be a partition of the set X = {1, 2, 3} containing two cells {1, 2} and\n{3} and let \u03b1 = {{1},{2},{3}} be another partition of X, then \u03b1 is a re\ufb01nement \u03c1, namely, \u03b1 \u227a \u03c1.\nFirst-order logic. We will use a strict subset of \ufb01rst-order logic that has no function symbols,\nequality constraints or existential quanti\ufb01ers. 
Our subset consists of (1) constants, denoted by upper case letters (e.g., X, Y, etc.), which model objects in the domain; (2) logical variables, denoted by lower case letters (e.g., x, y, etc.), which can be substituted with objects; (3) logical operators such as ∨ (disjunction), ∧ (conjunction), ⇒ (implication) and ⇔ (equivalence); (4) universal (∀) and existential (∃) quantifiers; and (5) predicates, which model properties of and relationships between objects. A predicate consists of a predicate symbol, denoted by typewriter fonts (e.g., Friends, R, etc.), followed by a parenthesized list of arguments. A term is a logical variable or a constant. A literal is a predicate or its negation. A formula in first-order logic is an atom (a predicate), or any complex sentence that can be constructed from atoms using logical operators and quantifiers. For example, ∀x Smokes(x) ⇒ Asthma(x) is a formula. A clause is a disjunction of literals. Throughout, we will assume that all formulas are clauses and their variables are standardized apart.
A ground atom is an atom containing only constants. A ground formula is a formula obtained by substituting all of its variables with constants, namely a formula containing only ground atoms. For example, the groundings of ¬Smokes(x) ∨ Asthma(x), where Δx = {Ana, Bob}, are the two propositional formulas ¬Smokes(Ana) ∨ Asthma(Ana) and ¬Smokes(Bob) ∨ Asthma(Bob).
Markov logic. A Markov logic network (MLN) is a set of weighted clauses in first-order logic.
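The grounding operation just defined is purely mechanical, so a small sketch may help. The encoding below (a literal as a (sign, predicate, arguments) tuple) is our own toy representation, not code from the paper:

```python
from itertools import product

def ground_clause(literals, domains):
    """Enumerate all groundings of a clause.

    literals: list of (sign, predicate, args) tuples; args may mix logical
    variables (keys of `domains`) and constants.
    domains:  dict mapping each logical variable to its finite domain.
    """
    variables = sorted({a for _, _, args in literals for a in args if a in domains})
    groundings = []
    for values in product(*(domains[v] for v in variables)):
        theta = dict(zip(variables, values))  # the substitution
        groundings.append([(sign, pred, tuple(theta.get(a, a) for a in args))
                           for sign, pred, args in literals])
    return groundings

# not-Smokes(x) or Asthma(x), with domain(x) = {Ana, Bob}: two ground clauses.
clause = [(False, "Smokes", ("x",)), (True, "Asthma", ("x",))]
for g in ground_clause(clause, {"x": ["Ana", "Bob"]}):
    print(g)
```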
We will assume that all logical variables in all formulas are universally quantified (and therefore we will drop the quantifiers from all formulas), are typed and can be instantiated to a finite set of constants (for a variable x, this set will be denoted by Δx), and that there is a one-to-one mapping between the constants and the objects in the domain (Herbrand interpretations). Note that the class of MLNs we are assuming is not restrictive at all, because almost all MLNs used in application domains such as natural language processing and the Web fall in this class. Given a finite set of constants, the MLN represents a (ground) Markov network that has one random variable for each ground atom in its Herbrand base and a weighted feature for each ground clause in the Herbrand base. The weight of each feature is the weight of the corresponding first-order clause. Given a world ω, which is a truth assignment to all the ground atoms, the Markov network represents the probability distribution P(ω) = Z⁻¹ exp(Σ_i w_i N(f_i, ω)), where (f_i, w_i) is a weighted first-order formula, N(f_i, ω) is the number of true groundings of f_i in ω and Z is the partition function.
For simplicity, we will assume that the MLN is in normal form, which is defined as an MLN that satisfies the following two properties: (i) there are no constants in any formula; and (ii) if two distinct atoms of predicate R have variables x and y as the same argument of R, then Δx = Δy. Because of the second condition, in normal MLNs we can associate domains with each argument of a predicate. Let iR denote the i-th argument of predicate R and let D(iR) denote the number of elements in the domain of iR. We will also assume that all domains are of the form {1, . . . , D(iR)}.
Since domain sizes are finite, any domain can be converted to this form.
A common optimization inference task over MLNs is finding the most probable state of the world ω, that is, finding a complete assignment to all ground atoms which maximizes the probability. Formally,

  argmax_ω P_M(ω) = argmax_ω (1/Z(M)) exp(Σ_i w_i N(f_i, ω)) = argmax_ω Σ_i w_i N(f_i, ω)    (1)

From Eq. (1), we can see that the MAP problem reduces to finding a truth assignment that maximizes the sum of the weights of the satisfied clauses. Therefore, any weighted satisfiability solver such as MaxWalkSAT [20] can be used to solve it. However, MaxWalkSAT is a propositional solver and is unable to exploit symmetries in the first-order representation, and as a result can be quite inefficient. Alternatively, the MAP problem can be solved in a lifted manner by leveraging various lifted inference rules such as the decomposer, the binomial rule [6, 9, 22] and the recently proposed single occurrence rule [13, 19]. A schematic of such a procedure is given in Algorithm 1. Before presenting the algorithm, we will describe some required definitions. Let iR denote the i-th argument of predicate R. Given an MLN, two arguments iR and jS of its predicates R and S, respectively, are called unifiable if they share a logical variable in an MLN formula. Being symmetric and transitive, the unifiable relation splits the arguments of all the predicates into a set of domain equivalence classes.
Example 1. Consider a normal MLN M having two weighted formulas (R(x) ∨ S(x, y), w1) and (R(z) ∨ T(z), w2). Here, we have two domain equivalence classes: {1R, 1S, 1T} and {2S}.
Algorithm 1 has five recursive steps and returns the optimal MAP value.
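As a concrete (if exponential) illustration of the reduction in Eq. (1), the following brute-force sketch maximizes the total weight of satisfied ground clauses. The encoding of ground clauses as lists of (atom, sign) literals and the weights are our own illustration; real solvers such as MaxWalkSAT search this space heuristically rather than exhaustively:

```python
from itertools import product

def map_value(atoms, weighted_clauses):
    """Solve argmax_w sum_i w_i N(f_i, w) by exhaustive search over worlds.

    weighted_clauses: list of (clause, weight), where a clause is a list of
    (atom, sign) literals and is satisfied if any literal matches the world.
    """
    best = float("-inf")
    for values in product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        total = sum(w for clause, w in weighted_clauses
                    if any(world[a] == s for a, s in clause))
        best = max(best, total)
    return best

# Ground network for: not-Smokes(x) or Asthma(x) (w = 1.5) and Smokes(x)
# (w = 0.8), with domain {Ana, Bob}. The weights are arbitrary toy values.
atoms = ["S_Ana", "A_Ana", "S_Bob", "A_Bob"]
wcs = [([("S_Ana", False), ("A_Ana", True)], 1.5),
       ([("S_Bob", False), ("A_Bob", True)], 1.5),
       ([("S_Ana", True)], 0.8),
       ([("S_Bob", True)], 0.8)]
print(map_value(atoms, wcs))  # 4.6: everyone smokes and has asthma
```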
Algorithm 1 LMAP(MLN M)
  // base case
  if M is empty return 0
  Simplify(M)
  // Propositional decomposition
  if M has disjoint MLNs M1, . . . , Mk then
    return Σ_{i=1}^{k} LMAP(Mi)
  // Lifted decomposition
  if M has a liftable domain equivalence class U then
    return LMAP(M|U)
  // Lifted conditioning
  if M has a singleton atom A then
    return max_{i=0}^{D(1A)} LMAP(M|(A, i)) + w(A, i)
  // Partial grounding
  Heuristically select a domain equivalence class U and ground it, yielding a new MLN M′
  return LMAP(M′)

The first two lines are the base case and the simplification step, in which the MLN is simplified by deleting redundant formulas, rewriting predicates by removing constants (so that lifted conditioning can be applied) and assigning values to ground atoms whose values can be inferred from the assignments made so far. The second step is the propositional decomposition step, in which the algorithm recurses over disjoint MLNs (if any) and returns the sum of their MAP values. In the lifted decomposition step, the algorithm finds a domain equivalence class U such that in the MAP solution all ground atoms of the predicates that have elements of U as arguments are either all true or all false. To find such a class, the rules given in [9, 13, 19] can be used. In the algorithm, M|U denotes the MLN obtained by setting the domain of all elements of U to 1 and updating the formula weights accordingly. In the lifted conditioning step, if there is an atom A having just one argument (a singleton atom), then the algorithm partitions the possible truth assignments to the groundings of A such that, in each part, all truth assignments have the same number of true atoms. In the algorithm, M|(A, i) denotes the MLN obtained by setting i groundings of A to true and the remaining to false, and w(A, i) is the total weight of ground formulas satisfied by the
The \ufb01nal step in LMAP is the partial grounding step and is executed only when the\nalgorithm is unable to apply lifted inference rules. In this step, the algorithm heuristically selects a\ndomain equivalence class U and grounds it completely. For example,\nExample 2. Consider an MLN with two formulas: R(x, y) \u2228 S(y, z), w1 and S(a, b) \u2228 T(a, c), w2.\nLet D(2R) = 2. After grounding the equivalence class {2R, 1S, 1T}, we get an MLN having four\nformulas: (R(x1, 1)\u2228S(1, z1), w1), (R(x2, 2)\u2228S(1, z2), w1), (S(1, b1)\u2228T(1, c1), w2) and (S(2, b2)\u2228\nT(2, c2), w2).1\n\nAlgorithm 2 Constrained-Ground\n(MLN M, Size k and domain equivalence class U)\n\nM(cid:48) = M\nCreate a partition \u03c0 of size k of \u2206iR where iR \u2208 U\nforeach predicate R such that \u2203 iR \u2208 U do\n\nreturn M(cid:48)\n\nforeach cell \u03c0j of \u03c0 do\n\nAdd all possible hard formulas of the form\nR(x1, . . . , xr) \u21d4 R(y1, . . . , yr)\nsuch that xi = yi if iR /\u2208 U and\nxi = Xa, yi = Xb if iR \u2208 U where Xa, Xb \u2208 \u03c0j.\n\n2 Scaling up the Partial Grounding Step using Set Partitioning\nPartial grounding often yields a much big-\nger MLN than the original MLN and is the\nchief reason for the inef\ufb01ciency and poor\nscalability of Algorithm LMAP. To address\nthis problem, we propose a novel approach\nto speed up inference by adding additional\nconstraints to the existing lifted MAP for-\nmulation. Our idea is as follows: reduce the\nnumber of ground atoms by partitioning them\nand treating all atoms in each part as indistin-\nguishable. Thus, instead of introducing O(tn)\nnew ground atoms where t is the cardinality\nof the domain equivalence class and n is the number of constants, our approach will only introduce\nO(tk) ground atoms where k << n.\nOur new, approximate partial grounding method (which will replace the partial grounding step in\nAlgorithm 1) is formally described in Algorithm 2. 
The algorithm takes as input an MLN M, an integer k > 0 and a domain equivalence class U, and outputs a new MLN M′. The algorithm first partitions the domain of the class U into k cells, yielding a partition π. Then, for each cell πj of π and each predicate R such that one or more of its arguments is in U, the algorithm adds all possible constraints of the form R(x1, . . . , xr) ⇔ R(y1, . . . , yr) such that for each i: (1) we add the equality constraint between the logical variables xi and yi if the i-th argument of the predicate is not in U; and (2) we set xi = Xa and yi = Xb if the i-th argument of R is in U, where Xa, Xb ∈ πj. Since adding constraints restricts the feasible solutions of the optimization problem, it is easy to show that:
Proposition 1. Let M′ = Constrained-Ground(M, k, U), where M is an MLN and k > 0 is an integer, be the MLN used in the partial grounding step of Algorithm 1 (instead of the partial grounding step described in the algorithm). Then, the MAP value returned by the modified algorithm will be smaller than or equal to the one returned by Algorithm 1.
The following example demonstrates how Algorithm 2 constructs a new MLN.
Example 3. Consider the MLN in Example 2. Let {{1, 2}} be a 1-partition of the domain of U. Then, after applying Algorithm 2, the new MLN will have the following three hard formulas in addition to the formulas given in Example 2: (1) R(x3, 1) ⇔ R(x3, 2), (2) S(1, x4) ⇔ S(2, x4) and (3) T(1, x5) ⇔ T(2, x5).
Although adding constraints reduces the search space of the MAP problem, Algorithm 2 still needs to ground the MLN. This can be time consuming. Alternatively, we can group indistinguishable atoms together without grounding the MLN using the following definition:
Definition 1. Let U be a domain equivalence class and let π be its partition.
Two ground atoms R(x1, . . . , xr) and R(y1, . . . , yr) of a predicate R such that ∃ iR ∈ U are equivalent if xi = yi when iR ∉ U, and xi = Xa, yi = Xb when iR ∈ U, where Xa, Xb ∈ πj. We denote this by R(x1, . . . , xr) ⊥π R(y1, . . . , yr).
Notice that the relation ⊥π is symmetric and reflexive. Thus, we can group all the ground atoms corresponding to the transitive closure of this relation, yielding a "meta ground atom" such that if the meta atom is assigned true (false), all the ground atoms in the transitive closure will be true (false). This yields the partition-ground algorithm, described as Algorithm 3. The algorithm starts by creating a k-partition of the domain of U. It then updates the domain of U so that it only contains k values, grounds all arguments of predicates that are in the set U and updates the formula weights appropriately. The formula weights should be updated because, when the domain is compressed, several ground formulas are replaced by just one ground formula. Intuitively, if t (partially) ground formulas having weight w are replaced by one (partially) ground formula (f, w′), then w′ should be equal to wt. The two for loops in Algorithm 3 accomplish this.
¹The constants can be removed by renaming the predicates, yielding a normal MLN. For example, we can rename R(x1, 1) as R1(x1). This renaming occurs in the simplification step.
We can show that:
Proposition 2.
The MAP value output by replacing the partial grounding step in Algorithm 1 with Algorithm Partition-Ground is the same as the one output by replacing the partial grounding step in Algorithm 1 with Algorithm Constrained-Ground.

Algorithm 3 Partition-Ground(MLN M, size k and domain equivalence class U)
  M′ = M
  Create a partition π of size k of ΔiR where iR ∈ U
  Update the domain ΔiR to {1, . . . , k} in M′
  Ground all predicates R such that iR ∈ U
  foreach formula (f′, w′) in M′ such that f′ contains an atom of R where iR ∈ U do
    Let f be the formula in M from which f′ was derived
    foreach logical variable in f that was substituted by the j-th value in ΔiR to yield f′ do
      w′ = w′ × |πj|, where πj is the j-th cell of π
  return M′

The key advantage of using Algorithm Partition-Ground is that the lifted algorithm (LMAP) will have much smaller space complexity than the one using Algorithm Constrained-Ground. Specifically, unlike the latter, which yields O(n|U|) ground atoms (assuming each predicate has only one argument in U), where n is the number of constants in the domain of U, the former generates only O(k|U|) ground atoms, where k << n.
The following example illustrates how Algorithm Partition-Ground constructs a new MLN.
Example 4. Consider an MLN M with two formulas: (R(x, y) ∨ S(y, z), w1) and (S(a, b) ∨ T(a, c), w2). Let D(2R) = 3 and π = {{1, 2}, {3}} = {ν1, ν2}. After grounding 2R with respect to π, we get an MLN M′ having four formulas: (Rν1(x1) ∨ Sν1(z1), 2w1), (Rν2(x2) ∨ Sν2(z2), w1), (Sν1(b1) ∨ Tν1(c1), 2w2) and (Sν2(b2) ∨ Tν2(c2), w2).
The total weight of the groundings in M is 3w1 D(1R) D(2S) + 3w2 D(2T) D(2S), which is the same as in M′.
The following example illustrates how the algorithm constructs a new MLN in the presence of self-joins.
Example 5. Consider an MLN M with the single formula (¬R(x, y) ∨ R(y, x), w). Let D(1R) = D(2R) = 3 and π = {{1, 2}, {3}} = {ν1, ν2}. After grounding 1R (and also 2R, as they belong to the same domain equivalence class) with respect to π, we get an MLN M′ having the following four formulas: (¬Rν1,ν1 ∨ Rν1,ν1, 4w), (¬Rν1,ν2 ∨ Rν2,ν1, 2w), (¬Rν2,ν1 ∨ Rν1,ν2, 2w) and (¬Rν2,ν2 ∨ Rν2,ν2, w).
2.1 Generalizing the Partition Grounding Approach
Algorithm Partition-Ground allows us to group the equivalent atoms with respect to a partition and has much smaller space and time complexity than the partial grounding strategy described in Algorithm 1. However, it yields a lower bound on the MAP value. In this section, we show how to improve the lower bound using refinements of the partition. The basis of our generalization is the following theorem:
Theorem 1. Given two partitions π and φ of U such that φ ⪯ π, the MAP value of the partially ground MLN with respect to π is less than or equal to the MAP value of the partially ground MLN with respect to φ.
Proof. Sketch: Since the partition φ is a refinement of π, the candidate MAP assignments of the MLN obtained via φ already include all the candidate assignments of the MLN obtained via π, and since the MAP values of both of these MLNs are lower bounds on the original MAP value, the theorem follows.
We can use Theorem 1 to devise a new any-time MAP algorithm which refines the partitions to get a better estimate of the MAP value.
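The effect of refinement can be checked by brute force on a toy objective. The score function below is an arbitrary stand-in for the weight of satisfied ground formulas (our own construction, not the paper's benchmark); partitioned_map restricts the search exactly as Partition-Ground does, by forcing all ground atoms in a cell to one truth value:

```python
from itertools import product

def partitioned_map(partition, score):
    """MAP value when all ground atoms in a cell must share one truth value,
    as in the MLN produced by Partition-Ground."""
    best = float("-inf")
    for values in product([False, True], repeat=len(partition)):
        assign = {c: v for cell, v in zip(partition, values) for c in cell}
        best = max(best, score(assign))
    return best

# Toy objective over groundings R(1), R(2), R(3): a bonus for exactly one
# true atom (an asymmetric optimum that no coarse partition can represent).
def score(assign):
    n_true = sum(assign.values())
    return 5.0 * (n_true == 1) + n_true

coarse = partitioned_map([[1, 2, 3]], score)          # pi
fine = partitioned_map([[1, 2], [3]], score)          # phi, a refinement of pi
exact = partitioned_map([[1], [2], [3]], score)       # fully propositional
print(coarse, fine, exact)  # 3.0 6.0 6.0
```

Here the coarse partition yields 3.0, its refinement improves the lower bound to 6.0, and the fully refined (propositional) search also yields 6.0: refining can only increase the value, consistent with the proof sketch.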
Our approach is presented in Algorithm 4.

Algorithm 4 Refine-MAP(MLN M)
  Let U = {Ui} be the non-liftable domains
  Set πi = {ΔjR} where jR ∈ Ui, for all Ui ∈ U
  µ = −∞
  while timeout has not occurred do
    µ = LMAP(M)   /* LMAP uses the pair (Ui, πi) and Algorithm Partition-Ground for its i-th partial grounding step. */
    Heuristically select a partition πj and refine it
  return µ

The algorithm begins by identifying all non-liftable domains, namely domains Ui that will be partially grounded during the execution of Algorithm 1, and associating a 1-partition πi with each domain. Then, until a timeout occurs, it iterates through the following two steps. First, it runs the LMAP algorithm, which uses the pair (Ui, πi) in Algorithm Partition-Ground during the i-th partial grounding step, yielding a MAP solution µ. Second, it heuristically selects a partition πj and refines it. From Theorem 1, it is clear that as the number of iterations increases, the MAP solution will either improve or remain the same. Thus, Algorithm Refine-MAP is an anytime algorithm.
Alternatively, we can also devise an any-space algorithm using the following idea. We first determine k, the maximum size of a partition that we can fit in memory. As different partitions of size k will give us different MAP values, we can search through them to find the best possible MAP solution. A drawback of the any-space approach is that it explores a prohibitively large search space. In particular, the number of possible partitions of size k of a set of size n is given by the Stirling number of the second kind S(n, k), which grows exponentially with n. (The total number of partitions of a set is given by the Bell number, B_n = Σ_{k=1}^{n} S(n, k).) Clearly, searching over all the possible partitions of size k is not practical.
Luckily, we can exploit symmetries in the MLN representation to substantially reduce the number of partitions we have to consider, since many of them will give us the same MAP value. Formally,
Theorem 2. Given two k-partitions π = {π1, . . . , πk} and φ = {φ1, . . . , φk} of U such that |πi| = |φi| for all i, the MAP value of the partially ground MLN with respect to π is equal to the MAP value of the partially ground MLN with respect to φ.
Proof. Sketch: A formula f, when ground on an argument iR with respect to a partition π, creates |π| copies of the formula. Since |φ| = |π| = k, grounding on iR with respect to φ creates the same number of formulas, which are identical up to a renaming of constants. Furthermore, since |πi| = |φi| (each pair of corresponding cells has identical cardinality) and the weight of a ground formula is determined by the cell sizes (see Algorithm Partition-Ground), the ground formulas obtained using φ and π will have the same weights as well. As a result, the MLNs obtained by grounding on any argument iR with respect to φ and π are indistinguishable (subject to renaming of variables and constants) and the proof follows.
From Theorem 2, it follows that the number of cells of a partition and the number of elements in each cell are sufficient to define a partially ground MLN with respect to that partition. Consecutive refinements of such partitions will thus yield a lattice, which we will refer to as the Exchangeable Partition Lattice.
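The savings from exchangeability are easy to quantify: only the multiset of cell sizes matters, so the lattice contains one node per integer partition of n rather than one per set partition. The sketch below uses two textbook constructions (the Bell triangle for B_n and a standard integer-partition enumeration), not code from the paper:

```python
def bell(n):
    """Bell number B_n via the Bell triangle: total set partitions of n elements."""
    row = [1]
    for _ in range(n):
        nxt = [row[-1]]
        for v in row:
            nxt.append(nxt[-1] + v)
        row = nxt
    return row[0]

def integer_partitions(n, max_part=None):
    """Enumerate multisets of cell sizes (integer partitions of n) in
    non-increasing order; each one is a class of exchangeable partitions."""
    if n == 0:
        return [[]]
    if max_part is None:
        max_part = n
    result = []
    for k in range(min(n, max_part), 0, -1):
        for rest in integer_partitions(n - k, k):
            result.append([k] + rest)
    return result

# For the domain {1, 2, 3, 4}: 15 set partitions but only 5 exchangeable classes.
print(bell(4), len(integer_partitions(4)))  # 15 5
```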
The term 'exchangeable' refers to the fact that two partitions containing the same number of cells, with the same cell cardinalities, are exchangeable with each other (in terms of MAP solution quality). Figure 1 shows the Exchangeable Partition Lattice corresponding to the domain {1, 2, 3, 4}. If we did not use exchangeability, the number of partitions in the lattice would have been B4 = S(4, 1) + S(4, 2) + S(4, 3) + S(4, 4) = 1 + 7 + 6 + 1 = 15. On the other hand, the lattice has 5 elements.

Figure 1: Exchangeable Partition Lattice corresponding to the domain {1, 2, 3, 4}. (Nodes: {{1},{2},{3},{4}}, {{1},{2},{3,4}}, {{1},{2,3,4}}, {{1,2},{3,4}}, {{1,2,3,4}}.)

Different traversal strategies of this exchangeable partition lattice will give rise to different lifted MAP algorithms. For example, a greedy depth-first traversal of the lattice yields Algorithm 4. We can also explore the lattice using systematic depth-limited search and return the maximum solution found for a particular depth limit d. This yields an improved version of our any-space approach described earlier. We can even combine the two strategies by traversing the lattice in some heuristic order. For our experiments, we use greedy depth-limited search, because full depth-limited search was very expensive. Note that although our algorithm assumes normal MLNs, which are pre-shattered, we can easily extend it to use shattering as needed [10]. Moreover, by clustering evidence atoms together [21, 23] we can further reduce the size of the shattered theory [4].
3 Experiments
We implemented our algorithm on top of the lifted MAP algorithm of Sarkhel et al. [18], which reduces lifted MAP inference to an integer polynomial program (IPP). We will call our algorithm P-IPP (which stands for partition-based IPP). We performed two sets of experiments.
The \ufb01rst set\nmeasures the impact of increasing the partition size k on the quality of the MAP solution output\nby our algorithm. The second set compares the performance and scalability of our algorithm with\nseveral algorithms from literature. All of our experiments were run on a third generation i7 quad-core\nmachine having 8GB RAM.\n(1) An MLN which we call Equiva-\nWe used following \ufb01ve MLNs in our experimental study:\nlence that consists of following three formulas: Equals(x,x), Equals(x,y) \u2192 Equals(y,x), and\nEquals(x,y) \u2227 Equals(y,z) \u2192 Equals(x,z); (2) The Student MLN from [18, 19], consisting\nof four formulas and three predicates; (3) The Relationship MLN from [18], consisting of four\nformulas and three predicates; (4) WebKB MLN [11] from the Alchemy web page, consisting of\nthree predicates and seven formulas; and (5) Citation Information-Extraction (IE) MLN from the\nAlchemy web page [11], consisting of \ufb01ve predicates and fourteen formulas .\nWe compared the solution quality and scalability of our approach with the following algorithms\nand systems: Alchemy (ALY) [11], Tuffy (TUFFY) [15], ground inference based on integer linear\nprogramming (ILP) and the IPP algorithm of Sarkhel et al. [18]. Alchemy and Tuffy are two state-\nof-the-art open source software packages for learning and inference in MLNs. Both of them ground\nthe MLN and then use an approximate solver, MaxWalkSAT [20] to compute the MAP solution.\nUnlike Alchemy, Tuffy uses clever Database tricks to speed up computation and in principle can be\nmuch more scalable than Alchemy. ILP is obtained by converting the MAP problem over the ground\nMarkov network to an Integer Linear Program. We ran each algorithm on the aforementioned MLNs\nfor varying time-bounds and recorded the solution quality, which is measured using the total weight\nof the false clauses in the (approximate) MAP solution, also referred to as the cost. Smaller the cost,\nbetter the MAP solution. 
For a fair comparison, we used a parallelized integer linear programming solver called Gurobi [8] to solve the integer linear programs generated by our algorithm as well as by the other competing algorithms.
Figure 2 shows our experimental results. Note that if the curve for an algorithm is not present in a plot, the corresponding algorithm ran out of either memory or time on that MLN and did not output any solution. We observe that Tuffy and Alchemy are the worst-performing systems in terms of both solution quality and scalability. ILP scales slightly better than Tuffy and Alchemy; however, it is unable to handle MLNs having more than 30K clauses. We can see that our new algorithm P-IPP, run as an anytime scheme that progressively refines partitions, not only finds higher-quality MAP solutions but also scales better in terms of time complexity than IPP. In particular, IPP could not scale to the Equivalence MLN having roughly 1 million ground clauses and the Relationship MLN having roughly 125.8M ground clauses. The reason is that these MLNs have self-joins (the same predicate appearing multiple times in a formula), which IPP is unable to lift. On the other hand, our new approach is able to find useful approximate symmetries in these hard MLNs.
To measure the impact of varying the partition size on the MAP solution quality, we conducted the following experiment. We first ran the IPP algorithm until completion to compute the optimum MAP value. Then, we ran our algorithm multiple times, until completion as well, and recorded the solution quality achieved in each run for different partition sizes. Figure 3 plots the average cost across runs as a function of k (the error bars show the standard deviation). For brevity, we only show results for the IE and Equivalence MLNs. The optimum solutions for the three MLNs were found in (a) 20 minutes, (b) 6 hours, and (c) 8 hours, respectively.
On the other hand, our new approach P-IPP yields close-to-optimal solutions in a fraction of the time, and for relatively small values of k (≈ 5–10).

Figure 2: Cost vs. Time: cost of unsatisfied clauses (smaller is better) vs. time for different domain sizes. Panels: (a) IE(3.2K,1M), (b) IE(380K,15.6B), (c) IE(3.02M,302B), (d) Equivalence(100,1.2K), (e) Equivalence(900,28.8K), (f) Equivalence(10K,1.02M), (g) WebKB(3.2K,1M), (h) Student(3M,1T), (i) Relation(750K,125.8M). Notation used to label each panel: MLN(numvariables, numclauses). Note: the quantities reported are for the ground Markov network associated with the MLN. Standard deviation is plotted as error bars.

Figure 3: Cost vs. Partition Size. Panels: (a) IE(3.2K,1M), (b) IE(82.8K,731.6M), (c) Equivalence(100,1.2K). Notation used to label each panel: MLN(numvariables, numclauses).

4 Summary and Future Work
Lifted inference techniques have gained popularity in recent years and have quickly become the approach of choice for scaling up inference in MLNs. A pressing issue with existing lifted inference technology is that most algorithms exploit only exact, identifiable symmetries and resort to grounding or propositional inference when such symmetries are not present. This is problematic because grounding can blow up the search space. In this paper, we proposed a principled, approximate approach to solve this grounding problem. The main idea in our approach is to partition the ground atoms into a small number of groups and then treat all ground atoms in a group as indistinguishable (from each other). This simple idea introduces new, approximate symmetries which can help speed up the inference process. Although our proposed approach is inherently approximate, we proved that it has nice theoretical properties in that it is guaranteed to yield a consistent assignment that is a lower bound on the MAP value.
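A minimal sketch of this lower-bound property (toy weighted clauses of our own, with brute-force search standing in for the actual lifted solver): forcing all atoms in a group to take the same truth value restricts the search space, so the best restricted assignment is still a consistent assignment and its value can only lower-bound the true MAP value.

```python
from itertools import product

def map_value(clauses, atoms, groups=None):
    """Max total weight of satisfied clauses, found by brute force.
    If `groups` partitions the atoms, all atoms in a group are tied to the
    same truth value (the indistinguishability restriction)."""
    if groups is None:
        groups = [[a] for a in atoms]  # singleton groups: unrestricted MAP
    best = float("-inf")
    for values in product([False, True], repeat=len(groups)):
        assignment = {a: v for g, v in zip(groups, values) for a in g}
        # a clause (weight, literals) is satisfied if any literal holds
        weight = sum(w for w, lits in clauses
                     if any(assignment[a] == sign for a, sign in lits))
        best = max(best, weight)
    return best

# Toy theory: 2.0: A, 1.0: !B, 1.5: C
clauses = [(2.0, [("A", True)]), (1.0, [("B", False)]), (1.5, [("C", True)])]
atoms = ["A", "B", "C"]

exact = map_value(clauses, atoms)                            # A, !B, C
tied = map_value(clauses, atoms, groups=[["A", "B", "C"]])   # one shared value
assert tied <= exact  # tying atoms can only lower (or preserve) the value
print(exact, tied)  # → 4.5 3.5
```

Refining the partition (splitting groups) relaxes the restriction, so the bound improves monotonically toward the exact MAP value as the groups approach singletons, which is what the any-time refinement scheme exploits.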
We further described an any-time algorithm which can improve this lower bound through systematic refinement of the partitions. Finally, based on the exchangeability property of the refined partitions, we demonstrated a method for organizing the partitions in a lattice structure which can be traversed heuristically to yield efficient any-time as well as any-space lifted MAP inference algorithms. Our experiments on a wide variety of benchmark MLNs clearly demonstrate the power of our new approach. Future work includes connecting this work to the work on the Sherali–Adams hierarchy [2]; deriving a variational principle for our method [14]; and developing novel branch-and-bound [12] as well as weight-learning algorithms based on our partitioning approach.
Acknowledgments: This work was supported in part by the DARPA Probabilistic Programming for Advanced Machine Learning Program under AFRL prime contract number FA8750-14-C-0005.
References
[1] U. Apsel and R. Brafman. Exploiting uniform assignments in first-order MPE. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, pages 74–83, 2012.
[2] U. Apsel, K. Kersting, and M. Mladenov. Lifting Relational MAP-LPs Using Cluster Signatures. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014.
[3] H. Bui, T. Huynh, and S. Riedel. Automorphism groups of graphical models and lifted variational inference. In Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, 2013.
[4] R. de Salvo Braz. Lifted First-Order Probabilistic Inference. PhD thesis, University of Illinois, Urbana-Champaign, IL, 2007.
[5] P. Domingos and D. Lowd. Markov Logic: An Interface Layer for Artificial Intelligence. Morgan & Claypool, 2009.
[6] V. Gogate and P. Domingos. Probabilistic Theorem Proving. In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, pages 256–265. AUAI Press, 2011.
[7] F. Hadiji and K. Kersting. Reduce and Re-Lift: Bootstrapped Lifted Likelihood Maximization for MAP. In Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, 2013.
[8] Gurobi Optimization Inc. Gurobi Optimizer Reference Manual, 2014.
[9] A. Jha, V. Gogate, A. Meliou, and D. Suciu. Lifted Inference from the Other Side: The Tractable Features. In Proceedings of the 24th Annual Conference on Neural Information Processing Systems, 2010.
[10] J. Kisynski and D. Poole. Constraint Processing in Lifted Probabilistic Inference.
In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, pages 293–302, 2009.
[11] S. Kok, M. Sumner, M. Richardson, P. Singla, H. Poon, D. Lowd, J. Wang, and P. Domingos. The Alchemy System for Statistical Relational AI. Technical report, Department of Computer Science and Engineering, University of Washington, Seattle, WA, 2008. http://alchemy.cs.washington.edu.
[12] R. Marinescu and R. Dechter. AND/OR Branch-and-Bound Search for Combinatorial Optimization in Graphical Models. Artificial Intelligence, 173(16-17):1457–1491, 2009.
[13] H. Mittal, P. Goyal, V. Gogate, and P. Singla. New Rules for Domain Independent Lifted MAP Inference. In Advances in Neural Information Processing Systems, 2014.
[14] M. Mladenov, A. Globerson, and K. Kersting. Efficient Lifting of MAP LP Relaxations Using k-Locality. In Proceedings of the 17th International Conference on Artificial Intelligence and Statistics, 2014.
[15] F. Niu, C. Ré, A. Doan, and J. Shavlik. Tuffy: Scaling up Statistical Inference in Markov Logic Networks Using an RDBMS. Proceedings of the VLDB Endowment, 2011.
[16] J. Noessner, M. Niepert, and H. Stuckenschmidt. RockIt: Exploiting Parallelism and Symmetry for MAP Inference in Statistical Relational Models. In Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, 2013.
[17] D. Poole. First-Order Probabilistic Inference. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, pages 985–991, Acapulco, Mexico, 2003. Morgan Kaufmann.
[18] S. Sarkhel, D. Venugopal, P. Singla, and V. Gogate. An Integer Polynomial Programming Based Framework for Lifted MAP Inference. In Advances in Neural Information Processing Systems, 2014.
[19] S. Sarkhel, D. Venugopal, P. Singla, and V. Gogate.
Lifted MAP Inference for Markov Logic Networks. In Proceedings of the 17th International Conference on Artificial Intelligence and Statistics, 2014.
[20] B. Selman, H. Kautz, and B. Cohen. Local Search Strategies for Satisfiability Testing. In Cliques, Coloring, and Satisfiability: Second DIMACS Implementation Challenge, 1996.
[21] G. Van den Broeck and A. Darwiche. On the Complexity and Approximation of Binary Evidence in Lifted Inference. In Advances in Neural Information Processing Systems, 2013.
[22] G. Van den Broeck, N. Taghipour, W. Meert, J. Davis, and L. De Raedt. Lifted Probabilistic Inference by First-Order Knowledge Compilation. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, pages 2178–2185, 2011.
[23] D. Venugopal and V. Gogate. Evidence-based Clustering for Scalable Inference in Markov Logic. In Machine Learning and Knowledge Discovery in Databases, 2014.