{"title": "Scaling-up Importance Sampling for Markov Logic Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 2978, "page_last": 2986, "abstract": "Markov Logic Networks (MLNs) are weighted first-order logic templates for generating large (ground) Markov networks. Lifted inference algorithms for them bring the power of logical inference to probabilistic inference. These algorithms operate as much as possible at the compact first-order level, grounding or propositionalizing the MLN only as necessary. As a result, lifted inference algorithms can be much more scalable than propositional algorithms that operate directly on the much larger ground network. Unfortunately, existing lifted inference algorithms suffer from two interrelated problems, which severely affects their scalability in practice. First, for most real-world MLNs having complex structure, they are unable to exploit symmetries and end up grounding most atoms (the grounding problem). Second, they suffer from the evidence problem, which arises because evidence breaks symmetries, severely diminishing the power of lifted inference. In this paper, we address both problems by presenting a scalable, lifted importance sampling-based approach that never grounds the full MLN. Specifically, we show how to scale up the two main steps in importance sampling: sampling from the proposal distribution and weight computation. Scalable sampling is achieved by using an informed, easy-to-sample proposal distribution derived from a compressed MLN-representation. Fast weight computation is achieved by only visiting a small subset of the sampled groundings of each formula instead of all of its possible groundings. We show that our new algorithm yields an asymptotically unbiased estimate. 
Our experiments on several MLNs clearly demonstrate the promise of our approach.", "full_text": "Scaling-up Importance Sampling for Markov Logic Networks

Deepak Venugopal
Department of Computer Science
University of Texas at Dallas
dxv021000@utdallas.edu

Vibhav Gogate
Department of Computer Science
University of Texas at Dallas
vgogate@hlt.utdallas.edu

Abstract

Markov Logic Networks (MLNs) are weighted first-order logic templates for generating large (ground) Markov networks. Lifted inference algorithms for them bring the power of logical inference to probabilistic inference. These algorithms operate as much as possible at the compact first-order level, grounding or propositionalizing the MLN only as necessary. As a result, lifted inference algorithms can be much more scalable than propositional algorithms that operate directly on the much larger ground network. Unfortunately, existing lifted inference algorithms suffer from two interrelated problems which severely affect their scalability in practice. First, for most real-world MLNs having complex structure, they are unable to exploit symmetries and end up grounding most atoms (the grounding problem). Second, they suffer from the evidence problem, which arises because evidence breaks symmetries, severely diminishing the power of lifted inference. In this paper, we address both problems by presenting a scalable, lifted importance sampling-based approach that never grounds the full MLN. Specifically, we show how to scale up the two main steps in importance sampling: sampling from the proposal distribution and weight computation. Scalable sampling is achieved by using an informed, easy-to-sample proposal distribution derived from a compressed MLN representation. Fast weight computation is achieved by visiting only a small subset of the sampled groundings of each formula instead of all of its possible groundings.
We show that our new algorithm yields an asymptotically unbiased estimate. Our experiments on several MLNs clearly demonstrate the promise of our approach.

1 Introduction

Markov Logic Networks (MLNs) [5] are powerful template models that define Markov networks by instantiating first-order formulas with objects from their domain. Designing scalable inference for MLNs is a challenging task because, as the domain size increases, the Markov network underlying the MLN can become extremely large. Lifted inference algorithms [1, 2, 3, 7, 8, 13, 15, 18] try to tackle this challenge by exploiting symmetries in the relational representation. However, current lifted inference approaches face two interrelated problems. First, most of these techniques have the grounding problem: unless the MLN has a specific symmetric, liftable structure [3, 4, 9], most algorithms tend to ground most formulas in the MLN, which is infeasible for large domains. Second, lifted inference algorithms have an evidence problem: even if the MLN is liftable, in the presence of arbitrary evidence, symmetries are broken and, once again, lifted inference is only as scalable as propositional inference [16]. Both problems are severe because practical applications often require arbitrarily structured MLNs that can handle arbitrary evidence. A promising way to handle this problem is to approximate/bias the MLN distribution such that inference is less expensive on the biased MLN. This idea has been explored in recent work such as [16], which introduces new symmetries, and [19], which uses unsupervised learning to reduce the number of objects in the domain. However, in both approaches, it may turn out that for certain cases the bias skews the MLN distribution to a large extent.
Here, we propose a general-purpose importance sampling based algorithm that retains the scalability of the aforementioned biased approaches but has theoretical guarantees, i.e., it yields asymptotically unbiased estimates.
Importance sampling, a widely used sampling approach, has two steps: we first sample from a proposal distribution and then, for each sample, we compute its importance weight. It turns out that for MLNs, both steps can be computationally expensive. Therefore, we scale up each of these steps. Specifically, to scale up step one, based on the recently proposed MLN approximation approach [19], we design an informed proposal distribution using a "compressed" representation of the ground MLN. We then compile a symbolic counting formula where each symbol is lifted, i.e., it represents multiple assignments to multiple ground atoms. The compilation allows us to sample each lifted symbol efficiently using Gibbs sampling. Importantly, the state space of the sampler depends upon the number of symbols, allowing us to trade off the accuracy of the proposal against efficiency.
Step two requires iterating over all ground formulas to compute the number of groundings satisfied by a sample. Though this operation can be made space-efficient (for bounded formula length), i.e., we can go over each grounding independently, the time complexity is prohibitively large and is equivalent to the grounding problem. For example, consider a simple relationship, Friends(x, y) ∧ Likes(y, z) ⇒ Likes(x, z). If the domain-size of each variable is 100, then to obtain the importance weight of a single sample, we need to process 1 million ground formulas, which is practically infeasible. Therefore, to make this weight-computation step feasible, we propose the following approach. We use a second sampler to sample ground formulas in the MLN and compute the importance weight based on the sampled groundings.
We show that this method yields asymptotically unbiased estimates. Further, by taking advantage of first-order structure, we reduce the variance of the estimates in many cases through Rao-Blackwellization [11].
We perform experiments on varied MLN structures (Alchemy benchmarks [10]) with arbitrary evidence to illustrate the generality of our approach. We show that, using our approach, we can systematically trade off accuracy with efficiency and can scale up inference to extremely large domain-sizes which cannot be handled by state-of-the-art MLN systems such as Alchemy.

2 Preliminaries

2.1 Markov Logic

In this paper, we assume a strict subset of first-order logic called finite Herbrand logic. Thus, we assume that we have no function constants and finitely many object constants. We also assume that each argument of each predicate is typed and can only be assigned to a fixed subset of constants. By extension, each logical variable in each formula is also typed. The domain of a term x in any formula refers to the set of constants that can be substituted for x and is represented as Δx. We further assume that all first-order formulas are disjunctive (clauses), have no free logical variables (namely, each logical variable is quantified), and have only universally quantified logical variables (CNF). Note that all first-order formulas can easily be converted to this form. A ground atom is an atom that contains no logical variables.
Markov logic extends FOL by softening the hard constraints expressed by the formulas. A soft formula or a weighted formula is a pair (f, w) where f is a formula in FOL and w is a real number. An MLN, denoted by M, is a set of weighted formulas (fi, wi). Given a set of constants that represent objects in the domain, an MLN defines a Markov network or a log-linear model.
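This log-linear semantics can be checked by brute force on a toy model; the following sketch is our own illustration (the formula, weight, and all names are assumptions, not from the paper):

```python
import itertools, math

# Toy MLN: one formula f = R(x) ∨ S(x) with weight w, domain Δx = {A, B}.
w = 1.0
domain = ["A", "B"]

def n_satisfied(world):
    """N(f, ω): number of groundings of R(x) ∨ S(x) that are true in world ω."""
    return sum(1 for c in domain if world[("R", c)] or world[("S", c)])

# Enumerate all 2^4 worlds over the ground atoms R(A), R(B), S(A), S(B).
atoms = [(p, c) for p in ("R", "S") for c in domain]
worlds = [dict(zip(atoms, bits))
          for bits in itertools.product([False, True], repeat=len(atoms))]

# P_M(ω) = exp(w · N(f, ω)) / Z(M), with Z(M) = Σ_ω exp(w · N(f, ω)).
Z = sum(math.exp(w * n_satisfied(wld)) for wld in worlds)
P = {i: math.exp(w * n_satisfied(wld)) / Z for i, wld in enumerate(worlds)}
```

The most probable world satisfies both groundings of f and the least probable satisfies none, so their probability ratio is exp(2w).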
The Markov network is obtained by grounding the weighted first-order knowledge base and represents the following probability distribution:

P_M(ω) = (1 / Z(M)) exp( Σ_i wi N(fi, ω) )    (1)

where ω is a world, N(fi, ω) is the number of groundings of fi that evaluate to True in the world ω, and Z(M) is a normalization constant or the partition function.
In this paper, we assume that the input MLN to our algorithm is in normal form [9, 12]. A normal MLN [9] is an MLN that satisfies the following two properties: (1) there are no constants in any formula, and (2) if two distinct atoms with the same predicate symbol have variables x and y in the same position, then Δx = Δy. An important distinction here is that, unlike previous work on lifted inference that uses normal forms [7, 9] and requires the MLN along with the associated evidence to be normalized, here we only require the MLN to be in normal form. This is important because normalizing the MLN along with evidence typically requires grounding the MLN and blows up its size. In contrast, normalizing without evidence typically does not change the MLN; for instance, in all the benchmarks in Alchemy, the MLNs are already normalized.
The two main inference problems in MLNs are computing the partition function and computing the marginal probabilities of query atoms given evidence. In this paper, we focus on the latter.

2.2 Importance Sampling

Importance sampling [6] is a standard sampling-based approach in which we draw samples from a proposal distribution H that is easier to sample from than the true distribution P. Each sample is then weighted with its importance weight to correct for the fact that it is drawn from the wrong distribution.
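A minimal sketch of this weighting scheme as a self-normalized (ratio) estimator; the function and variable names are ours, not the paper's:

```python
def ratio_estimate(samples, weights, query_value):
    """Self-normalized importance sampling estimate:
    Σ_t δ(s̄(t)) w(s̄(t)) / Σ_t w(s̄(t)).
    `samples` holds the sampled value of the query atom in each world;
    `weights` may be unnormalized, since any constant cancels in the ratio."""
    num = sum(w for s, w in zip(samples, weights) if s == query_value)
    den = sum(weights)
    return num / den

# Rescaling all weights by a constant leaves the estimate unchanged.
samples = [True, False, True, True]
weights = [0.5, 1.0, 2.0, 0.5]
```

This invariance to a global constant in the weights is exactly the property the paper leverages throughout.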
To compute the marginal probabilities from the weighted samples, we use the following estimator:

P'(Q̄) = Σ_{t=1}^{T} δ_Q̄(s̄(t)) w(s̄(t)) / Σ_{t=1}^{T} w(s̄(t))    (2)

where s̄(t) is the t-th sample drawn from H, δ_Q̄(s̄(t)) = 1 iff the query atom Q is assigned Q̄ in s̄(t) and 0 otherwise, and w(s̄(t)) is the importance weight of the sample, given by P(s̄(t)) / H(s̄(t)).
P'(Q̄) computed from Eq. (2) is an asymptotically unbiased estimate of P_M(Q̄); namely, as T → ∞, P'(Q̄) almost surely converges to P_M(Q̄). Eq. (2) is called a ratio estimate or a normalized estimate because we only need to know each sample's importance weight up to a normalizing constant. We will leverage this property throughout the paper.

2.3 Compressed MLN Representation

Recently, we [19] proposed an approach to generate a "compressed" approximation of the MLN using unsupervised learning. Specifically, for each unique domain in the MLN, the objects in that domain are clustered into groups based on approximate symmetries. To learn the clusters effectively, we use standard clustering algorithms and a distance function based on the evidence structure presented to the MLN. The distance function is constructed to ensure that objects that are approximately symmetrical to each other (from an inference perspective) are placed in a common cluster.
Formally, given an MLN M, let D denote the set of all domains in M, where each D ∈ D is a set of objects that belong to the same domain. To compress M, we consider each D ∈ D independently and learn a new domain D' where |D'| ≤ |D| and g : D → D' is a surjective mapping, i.e., ∀ µ ∈ D', ∃ C ∈ D such that g(C) = µ.
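A sketch of such a surjective compression mapping g, together with its inverse image ζ. The evidence-count signature used to group objects here is a crude stand-in for the clustering scheme and distance function of [19], and all names are our own assumptions:

```python
from collections import defaultdict

def compress_domain(objects, evidence):
    """Group approximately symmetric objects; return a surjective mapping
    g: D -> D' and the inverse image zeta: D' -> subset of D.

    Two objects are treated as approximately symmetric when they appear in
    the same number of true evidence atoms per predicate (an illustrative
    simplification of the evidence-based distance function in [19])."""
    signature = {}
    for obj in objects:
        counts = defaultdict(int)
        for pred, args in evidence:
            if obj in args:
                counts[pred] += 1
        signature[obj] = tuple(sorted(counts.items()))

    clusters = defaultdict(list)
    for obj in objects:
        clusters[signature[obj]].append(obj)

    g, zeta = {}, {}
    for members in clusters.values():
        center = members[0]          # pick a representative as cluster center
        zeta[center] = members       # zeta(mu) = {C in D | g(C) = mu}
        for obj in members:
            g[obj] = center          # surjective mapping g: D -> D'
    return g, zeta

# A1 and B1 share evidence structure; C1 and D1 appear in no evidence.
objects = ["A1", "B1", "C1", "D1"]
evidence = [("R", ("A1",)), ("R", ("B1",))]
g, zeta = compress_domain(objects, evidence)
```

Every object maps to exactly one cluster center, and every cluster center has at least one pre-image, so g is surjective onto the reduced domain.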
In other words, each cluster of objects is replaced by its cluster center in the reduced domain.
In this paper, we utilize the above approach to build an informed proposal distribution for importance sampling.

3 Scalable Importance Sampling

In this section, we describe the two main steps in our new importance sampling algorithm: (a) constructing and sampling the proposal distribution, and (b) computing the sample weight. We carefully design each step, ensuring that we never ground the full MLN. As a result, the computational complexity of our method is much smaller than that of existing importance sampling approaches [8].

3.1 Constructing and Sampling the Proposal Distribution

We first compress the domains of the given MLN, say M, using the method in [19]. Let M̂ be the network obtained by grounding M with its reduced domains (which correspond to the cluster centers) and let MG be the ground Markov network of M using the original domains.

Formulas: R(x) ∨ S(x, y), w
Domains: Δx = {A1, B1, C1, D1}; Δy = {A2, B2, C2, D2}
(a)

Formulas: R1(µ1) ∨ S(µ1, µ3), w; R1(µ2) ∨ S(µ2, µ3), w; R1(µ1) ∨ S(µ1, µ4), w; R1(µ2) ∨ S(µ2, µ4), w
Domains: ζ(µ1) = {A1, B1}; ζ(µ2) = {C1, D1}; ζ(µ3) = {A2, B2}; ζ(µ4) = {C2, D2}
(b)

Figure 1: (a) an example MLN M and (b) the MLN M̂ obtained from M by grounding each logical variable in M by the cluster centers µ1, . . ., µ4.

M̂ and MG are related as follows. We can think of M̂ as an MLN in which the logical variables are the cluster centers. If we set the domain of each logical variable corresponding to cluster center µ ∈ D' to ζ(µ), where ζ(µ) = {C ∈ D | g(C) = µ}, then the ground Markov network of M̂ is MG.
Figure 1 shows an example MLN M and its corresponding compressed MLN M̂. Notice that the Markov network obtained by grounding M is the same as the one obtained by grounding M̂.
Next, we describe how to generate samples from M̂. Let M̂ contain K̂ predicates, for which we assume some ordering. Let E and U represent the counts of true (evidence) and unknown ground atoms respectively. Let Ei (Ui) ∈ E (U) represent the number of true (unknown) ground atoms corresponding to the i-th predicate in M̂. To keep the equations more readable, we assume that we only have positive evidence (i.e., an assertion that the ground atom is true). Note that it is straightforward to extend the equations to the general case in which we have both positive and negative evidence.
Without loss of generality, let the j-th formula in M̂, denoted by fj, contain the atoms p1, . . ., pk, where pi is an instance of the pi-th predicate and has a positive sign if i ≤ m and a negative sign otherwise. The task is now to count the total number of satisfied groundings of fj symbolically, without actually going over the ground formulas. Unfortunately, this task is #P-hard. Therefore, we make the following approximation. Let N(p1, . . ., pk) denote the number of satisfied groundings of fj based on the assignments to all groundings of the predicates indexed by p1, . . ., pk. Then, we approximate N(p1, . . ., pk) using Σ_{i=1}^{k} N(pi), thereby independently counting the number of satisfied groundings for each predicate. Clearly, this approximation overestimates the number of satisfied formulas because it ignores the joint dependencies between the atoms of fj. To compensate for this, we scale down each count by a scaling factor (γ), which is the ratio of the actual number of ground formulas of fj to the assumed number of ground formulas.
Next, we define these counting equations formally. Given the j-th formula fj and a set of indexes k, where k ∈ k corresponds to the k-th atom in fj, let #Gfj(k) denote the number of ground formulas of fj when all the terms in all atoms specified by k are replaced by constants. For instance, in the example shown in Fig. 1, let f be R1(µ1) ∨ S1(µ1, µ3); then #Gf(∅) = 4, #Gf({1}) = 2 and #Gf({2}) = 1. We now count fj's satisfied groundings symbolically as follows:

S'_j = γ Σ_{i=1}^{m} E_{pi} #Gfj({i})    (3)

where γ = #Gfj(∅) / (m #Gfj(∅)) = 1/m and S'_j is rounded to the nearest integer, and

S_j = γ ( Σ_{i=1}^{m} Ŝ_{pi} #Gfj({i}) + Σ_{i=m+1}^{k} (U_{pi} − Ŝ_{pi}) #Gfj({i}) )    (4)

where γ = max(#Gfj(∅) − S'_j, 0) / (k #Gfj(∅)), Ŝ_{pi} is a lifted symbol representing the total number of true ground atoms (among the unknown atoms) of the pi-th predicate, and S_j is rounded to the nearest integer.
The symbolic (un-normalized) proposal probability is given by the following equation:

H(Ŝ, E) = exp( Σ_{j=1}^{C} wj Sj )    (5)

Algorithm 1: Compute-Marginals
Input: M̂, ζ, evidence E, query Q, sampling threshold β, thinning parameter p, iterations T
Output: Marginal probabilities P for Q
begin
    Construct the symbolic counting formula, Eq. (5)
    // Outer sampler
    for t = 1 to T do
        Sample Ŝ(t) using Gibbs sampling on Eq.
(5)
        After burn-in, for every p-th sample, generate s̄(t) from Ŝ(t)
        for each formula fi do
            // Inner sampler
            for c = 1 to β do
                // Rao-Blackwellization
                f'_i = partially ground formula created by sampling assignments to shared variables in fi
                Compute the satisfied groundings in f'_i
            Compute the sample weight using Eq. (7)
        Update the marginal probability estimates using Eq. (2)

where C is the number of formulas in M̂ and wj is the weight of the j-th formula.
Given the symbolic equation (5), we sample the set of lifted symbols Ŝ using randomized Gibbs sampling. For this, we initialize all symbols to a random value. We then choose a random symbol Ŝi and substitute it in Eq. (5) for each value between 0 and Ui, yielding a conditional distribution on Ŝi given the assignments to Ŝ−i, where Ŝ−i refers to all symbols other than the i-th one. We then sample from this conditional distribution, taking into account that there are (Ui choose v) different assignments corresponding to the v-th value in the distribution, which corresponds to setting exactly v groundings of the i-th predicate to True. After the Markov chain has mixed, to reduce the dependency between successive Gibbs samples, we thin the samples and use only every p-th sample for estimation.
Note that during the process of sampling from the proposal, we only had to compute M̂, namely ground the original MLN with the cluster centers; we never ground M̂ itself, so the representation remains lifted. This helps us scale the sampling step to large domain-sizes (since we can control the number of clusters).

3.2 Computing the Importance Weight

In order to compute the marginal probabilities as in Eq. (2), given a sample, we need to compute (up to a normalization constant) the weight of that sample.
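The lifted Gibbs update of Section 3.1, in which each candidate value v of a symbol is weighted by the (Ui choose v) ground assignments it represents, can be sketched as follows; the log-score callback is a stand-in for evaluating Eq. (5) with Ŝi set to v, and all names are ours:

```python
import math, random

def gibbs_step(U_i, log_score, rng):
    """One lifted Gibbs update for a symbol with U_i unknown groundings.
    Candidate value v (the number of true groundings) gets probability
    proportional to C(U_i, v) * exp(log_score(v)), reflecting the
    C(U_i, v) ground assignments the lifted value v represents."""
    logw = [math.log(math.comb(U_i, v)) + log_score(v) for v in range(U_i + 1)]
    m = max(logw)
    probs = [math.exp(x - m) for x in logw]   # subtract max for stability
    total = sum(probs)
    probs = [p / total for p in probs]
    return rng.choices(range(U_i + 1), weights=probs)[0], probs
```

As a sanity check, with a constant score the conditional reduces to a Binomial(Ui, 1/2) distribution over v.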
It is easy to see that a sample from the proposal (an assignment to all symbols) has multiple possible assignments in the original MLN. For instance, suppose in our running example in Fig. 1 the symbol corresponding to R(µ1) has a value equal to 1; this corresponds to two different assignments in M: either R(A1) is true or R(B1) is true. Formally, a sample from the proposal has Π_{i=1}^{K̂} (Ui choose Ŝi) different assignments in the original distribution. We assume that all these assignments are equi-probable (have the same weight) in the proposal. Thus, to compute the (un-normalized) probability of a sample w.r.t. M, we first convert the assignment on a specific sample Ŝ(t) into one of the equi-probable assignments s̄ by randomly choosing one of the assignments. Then, we compute the (un-normalized) probability P(s̄, E). The importance weight (up to a multiplicative constant) for the t-th sample is given by the ratio

w(Ŝ(t), E) = P(s̄(t), E) / H(Ŝ(t), E)    (6)

Plugging the weight computed by Eq. (6) into Eq. (2) yields an asymptotically unbiased estimate of the query marginal probabilities [11]. However, in the case of MLNs, computing Eq. (6) turns out to be a hard problem. Specifically, to compute P(s̄(t), E), given a sample, we need to go over each ground formula in M and check whether it is satisfied. The combined complexity [17] (domain-size as well as formula-size are assumed to be variable) of this operation for each formula is #P-complete (cf. [5]). However, the data complexity (fixed formula-size, variable domain-size) is polynomial: for k variables in a formula where the domain-size of each variable is d, the complexity is clearly O(d^k) to go over every grounding.
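The earlier step of expanding a lifted sample into one of its Π_i (Ui choose Ŝi) equi-probable ground assignments amounts to choosing, per predicate, a uniformly random subset of the unknown atoms to set true; a sketch under our own naming, not the authors' code:

```python
import math, random

def expand_sample(unknown_atoms, lifted_counts, rng):
    """Map a lifted sample (one count per predicate) to a concrete truth
    assignment over the unknown ground atoms, chosen uniformly among the
    prod_i C(U_i, S_i) assignments consistent with the counts."""
    assignment = {}
    for pred, atoms in unknown_atoms.items():
        true_atoms = set(rng.sample(atoms, lifted_counts[pred]))
        for a in atoms:
            assignment[a] = a in true_atoms
    return assignment

def n_consistent(unknown_atoms, lifted_counts):
    """Number of ground assignments a lifted sample represents."""
    out = 1
    for pred, atoms in unknown_atoms.items():
        out *= math.comb(len(atoms), lifted_counts[pred])
    return out
```

Each expansion respects the sampled counts exactly, so averaging over expansions matches the equi-probability assumption made for the proposal.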
However, in the case of MLNs, notice that polynomial data-complexity is equivalent to the complexity of the grounding problem, which is precisely what we are trying to avoid, and is therefore intractable for all practical purposes. To make this weight-computation step tractable, we use an additional sampler which samples a bounded number of groundings of a formula in M and approximates the importance weight based on these sampled groundings. Formally, let Ui be a proposal distribution defined on the groundings of the i-th formula. Here, we define this distribution as a product of |Vi| uniform distributions, where Vi = Vi1, . . ., Vik is the set of distinct variables in the i-th formula. Formally, Ui = Π_{j=1}^{|Vi|} Uij, where Uij is a uniform distribution over the domain of Vij. A sample from Ui contains a grounding for every variable in the i-th formula. Using this, we can approximate the importance weight using the following equation:

w(s̄(t), E, ū(t)) = exp( Σ_{i=1}^{M} wi N'_i(s̄(t), E, ūi(t)) / (β Π_{j=1}^{|Vi|} Uij) ) / H(Ŝ(t), E)    (7)

where M is the number of formulas in M, ūi(t) is a set of β groundings of the i-th formula drawn from Ui, and N'_i(s̄(t), E, ūi(t)) is the count of satisfied groundings among the ūi(t) groundings of the i-th formula.
Proposition 1. Using the importance weights shown in Eq. (7) in a normalized estimator (see Eq.
(2)) yields an asymptotically unbiased estimate of the query marginals, i.e., as the number of samples T → ∞, the estimated marginal probabilities almost surely converge to the true marginal probabilities.
We skip the proof for lack of space, but the idea is that, for each unique sample of the outer sampler, each of the importance-weight estimates computed using a subset of formula groundings converges towards the true importance weight (the one obtained if all groundings of the formulas were used). Specifically, the weights computed by the "inner" sampler by considering partial groundings of formulas add up to the true weight as T → ∞, and therefore each importance weight is asymptotically unbiased. Eq. (2) is thus a ratio of asymptotically unbiased quantities, and the above proposition follows.
We now show how we can leverage MLN structure to improve the weight estimate in Eq. (7). Specifically, we Rao-Blackwellize the "inner" sampler as follows. We partition the variables in each formula into two sets, V1 and V2, such that we sample a grounding for the variables in V1 and, for each sample, we tractably compute the exact number of satisfied groundings over all possible groundings of V2. We illustrate this with the following example.
Example 1. Consider a formula ¬R(x, y) ∨ S(y, z) where each variable has domain-size equal to d. The data-complexity of computing the satisfied groundings of this formula is clearly d³. However, for any specific value of y, say y = A, the number of satisfied groundings can be computed in closed form as n1·d + n2·d − n1·n2, where n1 is the number of false groundings of R(x, A) and n2 is the number of true groundings of S(A, z). Computing this for all possible values of y has a complexity of O(d²).
Generalizing the above example, for any formula f with variables V, we say that a variable v ∈ V is shared if it occurs more than once in that formula.
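Example 1's closed form can be checked against brute-force enumeration; a small verification sketch of our own, with truth values chosen arbitrarily:

```python
import itertools

d = 4  # domain size for x, y, z; any small value works

# Fix y = A and pick arbitrary truth values for the atoms R(x, A) and S(A, z).
R = {x: (x % 3 == 0) for x in range(d)}   # R(x, A)
S = {z: (z % 2 == 0) for z in range(d)}   # S(A, z)

# Brute force: count (x, z) pairs for which ¬R(x, A) ∨ S(A, z) holds.
brute = sum(1 for x, z in itertools.product(range(d), repeat=2)
            if (not R[x]) or S[z])

# Closed form: n1·d + n2·d − n1·n2, with
# n1 = #false groundings of R(x, A), n2 = #true groundings of S(A, z).
n1 = sum(1 for x in range(d) if not R[x])
n2 = sum(1 for z in range(d) if S[z])
closed = n1 * d + n2 * d - n1 * n2
```

Both counts agree because a grounding is violated exactly when R(x, A) is true and S(A, z) is false, and there are (d − n1)(d − n2) such pairs.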
For instance, in the above example y is a shared variable. Sarkhel et al. [14] showed that for a formula f in which no terms are shared, given an assignment to its ground atoms, it is always possible to compute the number of satisfied groundings of f in closed form. Using this, we have the following proposition.
Proposition 2. Given assignments to all ground atoms of a formula f with no shared terms, the combined complexity of computing the number of satisfied groundings of f is O(d^K), where d is an upper bound on the domain-size of the non-shared variables in f and K is the maximum number of non-shared variables in an atom of f.
Algorithm 1 illustrates our complete sampler. It assumes that M̂ and ζ are provided as input. First, we construct the symbolic equation, Eq. (5), that computes the weight of the proposal. In the outer sampler, we sample the symbols from Eq. (5) using Gibbs sampling. After the chain has mixed, for each sample from the outer sampler and for every formula in M, we construct an inner sampler that uses Rao-Blackwellization to approximate the sample weight. Specifically, for a formula f, we sample an assignment to each shared variable to create a partially ground formula f' and compute the exact number of satisfied groundings of f'. Finally, we compute the sample weight as in Eq. (7) and update the normalized estimator in Eq. (2).

(a) Smokers (b) Relation (c) HMM (d) LogReg
Figure 2: Tradeoff between computational efficiency and accuracy. The y-axis plots the average KL-divergence between the true marginals and the approximated ones for different values of Ns. Larger Ns implies a weaker proposal and faster sampling. For this experiment, we set β (sampling bound) to 0.2. Note that changing β did not affect our results very significantly.

4 Experiments

We run two sets of experiments.
First, to illustrate the trade-off between accuracy and complexity, we experiment with MLNs which can be solved exactly. Our test MLNs include Smokers and HMM (with few states) from the Alchemy website [10] and two additional MLNs, Relation (R(x, y) ⇒ S(y, z)) and LogReg (randomly generated formulas with singletons). Next, to illustrate scalability, we use two Alchemy benchmarks that are far larger, namely Hypertext classification with 1 million ground formulas and Entity Resolution (ER) with 8 million ground formulas. For all MLNs, we randomly set 25% of the groundings as true and 25% as false. For clustering, we used the scheme in [19] with KMeans++ as the clustering method. For Gibbs sampling, we set the thinning parameter to 5 and use a burn-in of 50 samples. We ran all experiments on a quad-core, 6GB RAM, Ubuntu laptop.
Fig. 2 shows our results on the first set of experiments, where the y-axis plots the average KL-divergence between the true marginals for the query atoms and the marginals generated by our algorithm. The values are shown for varying values of Ns = #Groundings(M) / #Formulas(M̂). Intuitively, Ns indicates the amount by which M has been compressed to form the proposal. As illustrated in Fig. 2, as Ns increases, the accuracy becomes lower in all cases because the proposal is a weaker approximation of the true distribution. However, at the same time, the complexity decreases, allowing us to trade off accuracy with efficiency. Further, the MLN structure also determines the proposal accuracy.
For example, LogReg, which contains singletons, yields an accurate estimate even for high values of Ns, while for Relation a smaller Ns yields such accuracy. This is because singletons have symmetries [4, 7] which are exploited by the clustering scheme when building the proposal.

[Figure 2 legend: Ns = 40, 10, 5 (Smokers); 32, 16, 10 (Relation); 400, 56, 16 (HMM); 150, 60 (LogReg).]

(a) Hypertext (1M groundings) (b) ER (8M groundings)
Figure 3: Scalability experiments. C-Time indicates the time in seconds to generate the proposal. I-SRate is the sampling rate measured as samples/minute.

Fig. 3 shows the results on the second set of experiments, where we measure the computational time required by our algorithm during all its operational steps, namely proposal creation, sampling and weight estimation. Note that, for both the MLNs used here, we tried to compare the results with Alchemy, but we were unable to get any results due to the grounding problem. As Fig. 3 shows, we could scale to these large domains because the complexity of sampling the proposal is feasible even when generating the ground MLN is infeasible. Specifically, we show the time taken to generate the proposal distribution (C-Time) and the number of weighted samples generated per minute during inference (I-SRate). As expected, decreasing Ns, or increasing β (sampling bound), lowers I-SRate because the complexity of sampling increases. At the same time, we also expect the quality of the samples to be better.
Importantly, these results show that by addressing the evidence/grounding problems, we can process large, arbitrarily structured MLNs and evidence, in a reasonable amount of time, without running out of memory.

[Figure 3 data, as (Ns, β): C-Time, I-SRate. Hypertext (1M groundings): (210, 0.1): 3, 1200; (210, 0.25): 3, 250; (210, 0.5): 3, 150; (25, 0.1): 8, 650; (25, 0.25): 8, 180; (25, 0.5): 8, 100; (23, 0.1): 15, 600; (23, 0.25): 15, 150; (23, 0.5): 15, 90. ER (8M groundings): (10K, 0.1): 25, 125; (10K, 0.25): 65, 45; (10K, 0.5): 65, 15; (1K, 0.1): 65, 125; (1K, 0.25): 65, 45; (1K, 0.5): 65, 15; (25, 0.1): 150, 15; (25, 0.25): 150, 8; (25, 0.5): 150, 4.]

5 Conclusion

Inference algorithms for Markov logic encounter two interrelated problems that hinder scalability: the grounding and evidence problems. Here, we proposed an approach based on importance sampling that avoids these problems in every step of its operation. Further, we showed that our approach yields asymptotically unbiased estimates. Our evaluation showed that our approach can systematically trade off complexity with accuracy and can therefore scale up to large domains.
Future work includes clustering strategies using better similarity measures such as graph-based similarity, applying our technique to MCMC algorithms, etc.

Acknowledgments

This work was supported in part by the AFRL under contract number FA8750-14-C-0021, by the ARO MURI grant W911NF-08-1-0242, and by the DARPA Probabilistic Programming for Advanced Machine Learning Program under AFRL prime contract number FA8750-14-C-0005. Any opinions, findings, conclusions, or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views or official policies, either expressed or implied, of DARPA, AFRL, ARO or the US government.

References

[1] Babak Ahmadi, Kristian Kersting, Martin Mladenov, and Sriraam Natarajan. Exploiting symmetries for scaling loopy belief propagation and relational training. Machine Learning, 92(1):91–132, 2013.

[2] H. Bui, T. Huynh, and R. de Salvo Braz.
Exact lifted inference with distinct soft evidence on every object. In AAAI, 2012.

[3] R. de Salvo Braz. Lifted First-Order Probabilistic Inference. PhD thesis, University of Illinois, Urbana-Champaign, IL, 2007.

[4] Guy Van den Broeck. On the completeness of first-order knowledge compilation for lifted probabilistic inference. In NIPS, pages 1386–1394, 2011.

[5] P. Domingos and D. Lowd. Markov Logic: An Interface Layer for Artificial Intelligence. Morgan & Claypool, San Rafael, CA, 2009.

[6] J. Geweke. Bayesian inference in econometric models using Monte Carlo integration. Econometrica, 57(6):1317–1339, 1989.

[7] V. Gogate and P. Domingos. Probabilistic theorem proving. In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, pages 256–265. AUAI Press, 2011.

[8] V. Gogate, A. Jha, and D. Venugopal. Advances in lifted importance sampling. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, 2012.

[9] A. Jha, V. Gogate, A. Meliou, and D. Suciu. Lifted inference from the other side: The tractable features. In Proceedings of the 24th Annual Conference on Neural Information Processing Systems (NIPS), pages 973–981, 2010.

[10] S. Kok, M. Sumner, M. Richardson, P. Singla, H. Poon, D. Lowd, J. Wang, and P. Domingos. The Alchemy system for statistical relational AI. Technical report, Department of Computer Science and Engineering, University of Washington, Seattle, WA, 2008. http://alchemy.cs.washington.edu.

[11] J. S. Liu. Monte Carlo Strategies in Scientific Computing. Springer Publishing Company, Incorporated, 2001.

[12] B. Milch, L. S. Zettlemoyer, K. Kersting, M. Haimes, and L. P. Kaelbling. Lifted probabilistic inference with counting formulas. In Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence, pages 1062–1068, 2008.

[13] D. Poole. First-order probabilistic inference. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, pages 985–991, Acapulco, Mexico, 2003. Morgan Kaufmann.

[14] Somdeb Sarkhel, Deepak Venugopal, Parag Singla, and Vibhav Gogate. Lifted MAP inference for Markov logic networks. In Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics (AISTATS), pages 859–867, 2014.

[15] G. Van den Broeck, N. Taghipour, W. Meert, J. Davis, and L. De Raedt. Lifted probabilistic inference by first-order knowledge compilation. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, pages 2178–2185, 2011.

[16] Guy Van den Broeck and Adnan Darwiche. On the complexity and approximation of binary evidence in lifted inference. In Advances in Neural Information Processing Systems 26, pages 2868–2876, 2013.

[17] Moshe Y. Vardi. The complexity of relational query languages (extended abstract). In Proceedings of the Fourteenth Annual ACM Symposium on Theory of Computing, pages 137–146, 1982.

[18] D. Venugopal and V. Gogate. On lifting the Gibbs sampling algorithm. In Proceedings of the 26th Annual Conference on Neural Information Processing Systems (NIPS), pages 1664–1672, 2012.

[19] Deepak Venugopal and Vibhav Gogate. Evidence-based clustering for scalable inference in Markov logic. In Machine Learning and Knowledge Discovery in Databases – European Conference, ECML PKDD 2014, Nancy, France, September 15–19, 2014, Proceedings, Part III, pages 258–273, 2014.