{"title": "Solving Marginal MAP Problems with NP Oracles and Parity Constraints", "book": "Advances in Neural Information Processing Systems", "page_first": 1127, "page_last": 1135, "abstract": "Arising from many applications at the intersection of decision-making and machine learning, Marginal Maximum A Posteriori (Marginal MAP) problems unify the two main classes of inference, namely maximization (optimization) and marginal inference (counting), and are believed to have higher complexity than both of them. We propose XOR_MMAP, a novel approach to solve the Marginal MAP problem, which represents the intractable counting subproblem with queries to NP oracles, subject to additional parity constraints. XOR_MMAP provides a constant factor approximation to the Marginal MAP problem, by encoding it as a single optimization in a polynomial size of the original problem. We evaluate our approach in several machine learning and decision-making applications, and show that our approach outperforms several state-of-the-art Marginal MAP solvers.", "full_text": "Solving Marginal MAP Problems with NP Oracles\n\nand Parity Constraints\n\nDepartment of Computer Science\n\nInstitute of Interdisciplinary Information Sciences\n\nYexiang Xue\n\nCornell University\n\nyexiang@cs.cornell.edu\n\nZhiyuan Li\u2217\n\nTsinghua University\n\nlizhiyuan13@mails.tsinghua.edu.cn\n\nStefano Ermon\n\nDepartment of Computer Science\n\nStanford University\n\nermon@cs.stanford.edu\n\nCarla P. Gomes, Bart Selman\nDepartment of Computer Science\n\nCornell University\n\n{gomes,selman}@cs.cornell.edu\n\nAbstract\n\nArising from many applications at the intersection of decision-making and machine\nlearning, Marginal Maximum A Posteriori (Marginal MAP) problems unify the\ntwo main classes of inference, namely maximization (optimization) and marginal\ninference (counting), and are believed to have higher complexity than both of\nthem. 
We propose XOR_MMAP, a novel approach to solve the Marginal MAP\nproblem, which represents the intractable counting subproblem with queries to\nNP oracles, subject to additional parity constraints. XOR_MMAP provides a constant\nfactor approximation to the Marginal MAP problem, by encoding it as a single\noptimization in a polynomial size of the original problem. We evaluate our approach\nin several machine learning and decision-making applications, and show that our\napproach outperforms several state-of-the-art Marginal MAP solvers.\n\n1\n\nIntroduction\n\nTypical inference queries to make predictions and learn probabilistic models from data include the\nmaximum a posteriori (MAP) inference task, which computes the most likely assignment of a set\nof variables, as well as the marginal inference task, which computes the probability of an event\naccording to the model. Another common query is the Marginal MAP (MMAP) problem, which\ninvolves both maximization (optimization over a set of variables) and marginal inference (averaging\nover another set of variables).\nMarginal MAP problems arise naturally in many machine learning applications. For example, learning\nlatent variable models can be formulated as a MMAP inference problem, where the goal is to optimize\nover the model\u2019s parameters while marginalizing all the hidden variables. MMAP problems also arise\nnaturally in the context of decision-making under uncertainty, where the goal is to \ufb01nd a decision\n(optimization) that performs well on average across multiple probabilistic scenarios (averaging).\nThe Marginal MAP problem is known to be NPPP-complete [18], which is commonly believed to be\nharder than both MAP inference (NP-hard) and marginal inference (#P-complete). As supporting\nevidence, MMAP problems are NP-hard even on tree structured probabilistic graphical models\n[13]. 
Aside from attempts to solve MMAP problems exactly [17, 15, 14, 16], previous approximate\napproaches fall into two categories, in general. The core idea of approaches in both categories is\n\n\u2217This research was done when Zhiyuan Li was an exchange student at Cornell University.\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\n\fto effectively approximate the intractable marginalization, which often involves averaging over an\nexponentially large number of scenarios. One class of approaches [13, 11, 19, 12] use variational\nforms to represent the intractable sum. Then the entire problem can be solved with message passing\nalgorithms, which correspond to searching for the best variational approximation in an iterative\nmanner. As another family of approaches, Sample Average Approximation (SAA) [20, 21] uses a\n\ufb01xed set of samples to represent the intractable sum, which then transforms the entire problem into\na restricted optimization, only considering a \ufb01nite number of samples. Both approaches treat the\noptimization and marginalizing components separately. However, we will show that by solving these\ntwo tasks in an integrated manner, we can obtain signi\ufb01cant computational bene\ufb01ts.\nErmon et al. [8, 9] recently proposed an alternative approach to approximate intractable counting\nproblems. Their key idea is a mechanism to transform a counting problem into a series of optimization\nproblems, each corresponding to the original problem subject to randomly generated XOR constraints.\nBased on this mechanism, they developed an algorithm providing a constant-factor approximation to\nthe counting (marginalization) problem.\nWe propose a novel algorithm, called XOR_MMAP, which approximates the intractable sum with a\nseries of optimization problems, which in turn are folded into the global optimization task. 
Therefore,\nwe effectively reduce the original MMAP inference to a single joint optimization of polynomial size\nof the original problem.\nWe show that XOR_MMAP provides a constant factor approximation to the Marginal MAP problem.\nOur approach also provides upper and lower bounds on the \ufb01nal result. The quality of the bounds can\nbe improved incrementally with increased computational effort.\nWe evaluate our algorithm on unweighted SAT instances and on weighted Markov Random Field\nmodels, comparing our algorithm with variational methods, as well as sample average approximation.\nWe also show the effectiveness of our algorithm on applications in computer vision with deep neural\nnetworks and in computational sustainability. Our sustainability application shows how MMAP\nproblems are also found in scenarios of searching for optimal policy interventions to maximize the\noutcomes of probabilistic models. As a \ufb01rst example, we consider a network design application to\nmaximize the spread of cascades [20], which include modeling animal movements or information\ndiffusion in social networks. In this setting, the marginals of a probabilistic decision model represent\nthe probabilities for a cascade to reach certain target states (averaging), and the overall network\ndesign problem is to make optimal policy interventions on the network structure to maximize the\nspread of the cascade (optimization). As a second example, in a crowdsourcing domain, probabilistic\nmodels are used to model people\u2019s behavior. 
The organizer would like to find an optimal incentive mechanism (optimization) to steer people's effort towards crucial tasks, taking into account the probabilistic behavioral model (averaging) [22].
We show that XOR_MMAP is able to find considerably better solutions than those found by previous methods, as well as provide tighter bounds.

2 Preliminaries

Problem Definition Let A = {0, 1}^m be the set of all possible assignments to binary variables a1, . . . , am and X = {0, 1}^n be the set of assignments to binary variables x1, . . . , xn. Let w(x, a) : X × A → R+ be a function that maps every assignment to a non-negative value. Typical queries over a probabilistic model include the maximization task, which requires the computation of max_{a∈A} w(a), and the marginal inference task Σ_{x∈X} w(x), which sums over X.
Arising naturally from many machine learning applications, the following Marginal Maximum A Posteriori (Marginal MAP) problem is a joint inference task, which combines the two aforementioned inference tasks:

max_{a∈A} Σ_{x∈X} w(x, a).   (1)

We consider the case where the counting problem Σ_{x∈X} w(x, a) and the maximization problem max_{a∈A} #w(a) are defined over sets of exponential size, therefore both are intractable in general.
Counting by Hashing and Optimization Our approach is based on a recent theoretical result that transforms a counting problem to a series of optimization problems [8, 9, 2, 1]. A family of functions H = {h : {0, 1}^n → {0, 1}^k} is said to be pairwise independent if the following two conditions hold for any function h randomly chosen from the family H: (1) ∀x ∈ {0, 1}^n, the random variable h(x) is uniformly distributed in {0, 1}^k, and (2) ∀x1, x2 ∈ {0, 1}^n with x1 ≠ x2, the random variables h(x1) and h(x2) are independent.
We sample matrices A ∈ {0, 1}^{k×n} and vector b ∈ {0, 1}^k uniformly at random to form the function family H_{A,b} = {h_{A,b} : h_{A,b}(x) = Ax + b mod 2}. It is possible to show that H_{A,b} is pairwise independent [8, 9]. Notice that in this case, each function h_{A,b}(x) = Ax + b mod 2 corresponds to k parity constraints. One useful way to think about pairwise independent functions is to imagine them as functions that randomly project elements in {0, 1}^n into 2^k buckets. Define B_h(g) = {x ∈ {0, 1}^n : h_{A,b}(x) = g} to be a "bucket" that includes all elements in {0, 1}^n whose mapped value h_{A,b}(x) is vector g (g ∈ {0, 1}^k). Intuitively, if we randomly sample a function h_{A,b} from a pairwise independent family, then we get the following: x ∈ {0, 1}^n has an equal probability to be in any bucket B_h(g), and the bucket locations of any two different elements x, y are independent.

3 XOR_MMAP Algorithm

3.1 Binary Case

We first solve the Marginal MAP problem for the binary case, in which the function w : A × X → {0, 1} outputs either 0 or 1. 
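As an illustration, the pairwise independent family H_{A,b} described above can be sketched in a few lines of Python. The helper name sample_hash and the experimental setup below are ours, not the paper's; the check is only an empirical sanity test on a tiny domain, not a proof of pairwise independence.

```python
import random

def sample_hash(n, k, rng):
    """Sample h_{A,b}(x) = (Ax + b) mod 2, i.e. k random parity constraints."""
    A = [[rng.randint(0, 1) for _ in range(n)] for _ in range(k)]
    b = [rng.randint(0, 1) for _ in range(k)]
    def h(x):
        # x is a tuple of n bits; each output bit is the parity of a random subset of x
        return tuple((sum(A[i][j] * x[j] for j in range(n)) + b[i]) % 2 for i in range(k))
    return h

# Empirical sanity check on a tiny domain: over many sampled h, the pair
# (h(x1), h(x2)) for two fixed distinct points should be close to uniform
# over all 2^k * 2^k bucket pairs, matching the two defining conditions.
rng = random.Random(0)
n, k, trials = 4, 2, 20000
x1, x2 = (0, 0, 0, 0), (1, 0, 1, 1)
joint = {}
for _ in range(trials):
    h = sample_hash(n, k, rng)
    key = (h(x1), h(x2))
    joint[key] = joint.get(key, 0) + 1
max_dev = max(abs(c / trials - 1 / 16) for c in joint.values())
print(len(joint), round(max_dev, 4))  # all 16 bucket pairs occur, deviations are small
```

Note that sampling one h costs only k(n + 1) random bits, and conditioning on h(x) = 0 is exactly what adds the k XOR constraints used throughout the algorithm.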
We will extend the result to the weighted case in the next section. Since a ∈ A often represents decision variables when MMAP problems are used in decision making, we call a fixed assignment to vector a = a0 a "solution strategy". To simplify the notation, we use W(a0) to represent the set {x ∈ X : w(a0, x) = 1}, and use W(a0, hk) to represent the set {x ∈ X : w(a0, x) = 1 and hk(x) = 0}, in which hk is sampled from a pairwise independent function family that maps X to {0, 1}^k. We write #w(a0) as shorthand for the count |{x ∈ X : w(a0, x) = 1}| = Σ_{x∈X} w(a0, x).

Algorithm 1: XOR_Binary(w : A × X → {0, 1}, a0, k)
Sample function hk : X → {0, 1}^k from a pairwise independent function family;
Query an NP Oracle on whether W(a0, hk) = {x ∈ X : w(a0, x) = 1, hk(x) = 0} is empty;
Return true if W(a0, hk) ≠ ∅, otherwise return false.

Our algorithm depends on the following result:
Theorem 3.1. (Ermon et al. [8]) For a fixed solution strategy a0 ∈ A,
• Suppose #w(a0) ≥ 2^{k0}; then for any k ≤ k0, with probability 1 − 2^c/(2^c − 1)^2, Algorithm XOR_Binary(w, a0, k − c) returns true.
• Suppose #w(a0) < 2^{k0}; then for any k ≥ k0, with probability 1 − 2^c/(2^c − 1)^2, Algorithm XOR_Binary(w, a0, k + c) returns false.

To understand Theorem 3.1 intuitively, we can think of hk as a function that maps every element in set W(a0) into 2^k buckets. Because hk comes from a pairwise independent function family, each element in W(a0) will have an equal probability to be in any one of the 2^k buckets, and the buckets in which any two elements end up are mutually independent. Suppose the count of solutions for a fixed strategy #w(a0) is 2^{k0}; then with high probability, there will be at least one element located in a randomly selected bucket if the number of buckets 2^k is less than 2^{k0}. Otherwise, with high probability there will be no element in a randomly selected bucket.
Theorem 3.1 provides us with a way to obtain a rough count on #w(a0) via a series of tests on whether W(a0, hk) is empty, subject to extra parity functions hk. 
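To make this counting-by-hashing idea concrete, the following is a minimal sketch in the spirit of XOR_Binary and Theorem 3.1. The NP-oracle query is replaced by brute-force enumeration, which is only viable for tiny n, and the function names and the majority-vote scan are our illustrative choices, not the paper's implementation.

```python
import itertools
import random

def parity_ok(n, k, rng):
    """Sample k random parity constraints; return a predicate testing h(x) == 0."""
    A = [[rng.randint(0, 1) for _ in range(n)] for _ in range(k)]
    b = [rng.randint(0, 1) for _ in range(k)]
    return lambda x: all((sum(A[i][j] * x[j] for j in range(n)) + b[i]) % 2 == 0
                         for i in range(k))

def xor_binary(w, n, k, rng):
    """Brute-force stand-in for the NP-oracle query: is W(a0, h_k) non-empty?"""
    keep = parity_ok(n, k, rng)
    return any(w(x) == 1 and keep(x) for x in itertools.product((0, 1), repeat=n))

def estimate_log2_count(w, n, rng, trials=9):
    """Largest k at which a majority of the emptiness tests still succeed."""
    for k in range(n, 0, -1):
        succ = sum(xor_binary(w, n, k, rng) for _ in range(trials))
        if succ > trials // 2:
            return k
    return 0

# Toy model with a known count: w(x) = 1 iff the first two bits are 0, so #w = 2^(n-2).
n = 8
w = lambda x: 1 if (x[0] == 0 and x[1] == 0) else 0
print(estimate_log2_count(w, n, random.Random(1)))  # typically near n - 2 = 6,
# up to the constant-factor slack that Theorem 3.1 allows
```

The returned exponent is only a constant-factor estimate of the true count, which is exactly the guarantee the theorem provides; each individual test, however, needs only a single (NP-oracle) feasibility query rather than exact counting.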
This transforms a counting problem into a series of NP queries, which can also be thought of as optimization queries. This transformation is extremely helpful for the Marginal MAP problem. As noted earlier, the main challenge for the Marginal MAP problem is the intractable sum embedded in the maximization. Nevertheless, the whole problem can be re-written as a single optimization if the intractable sum can be approximated well by solving an optimization problem over the same domain.
We therefore design Algorithm XOR_MMAP, which is able to provide a constant factor approximation to the Marginal MAP problem. The whole algorithm is shown in Algorithm 3.

Algorithm 2: XOR_K(w : A × X → {0, 1}, k, T)
Sample T pairwise independent hash functions h^(1)_k, h^(2)_k, . . . , h^(T)_k : X → {0, 1}^k;
Query Oracle: max_{a∈A, x^(i)∈X} Σ_{i=1}^T w(a, x^(i)) s.t. h^(i)_k(x^(i)) = 0, i = 1, . . . , T.   (2)
Return true if the max value is larger than ⌈T/2⌉, otherwise return false.

Algorithm 3: XOR_MMAP(w : A × X → {0, 1}, n = log2 |X|, m = log2 |A|, T)
k = n;
while k > 0 do
  if XOR_K(w, k, T) then Return 2^k;
  k ← k − 1;
end
Return 1;

In its main procedure XOR_K, the algorithm transforms the Marginal MAP problem into an optimization over the sum of T replicates of the original function w. Here, x^(i) ∈ X is a replicate of the original x, and w(a, x^(i)) is the original function w but takes x^(i) as one of the inputs. All replicates share the common input a. In addition, each replicate is subject to an independent set of parity constraints on x^(i). Theorem 3.2 states that XOR_MMAP provides a constant-factor approximation to the Marginal MAP problem:
Theorem 3.2. 
For T ≥ (m ln 2 + ln(n/δ))/α*(c), with probability 1 − δ, XOR_MMAP(w, log2 |X|, log2 |A|, T) outputs a 2^{2c}-approximation to the Marginal MAP problem max_{a∈A} #w(a). Here α*(c) is a constant.
Let us first understand the theorem in an intuitive way. Without loss of generality, suppose the optimal value is max_{a∈A} #w(a) = 2^{k0}. Denote by a* the optimal solution, i.e., #w(a*) = 2^{k0}. According to Theorem 3.1, the set W(a*, hk) has a high probability of being non-empty for any function hk that contains k < k0 parity constraints. In this case, the optimization problem max_{x^(i)∈X, h^(i)_k(x^(i))=0} w(a*, x^(i)) for one replicate x^(i) almost always returns 1. Because the h^(i)_k (i = 1, . . . , T) are sampled independently, the sum Σ_{i=1}^T w(a*, x^(i)) is likely to be larger than ⌈T/2⌉, since each term in the sum is likely to be 1 (under the fixed a*). Furthermore, since XOR_K maximizes this sum over all possible strategies a ∈ A, the sum it finds will be at least as good as the one attained at a*, which is already over ⌈T/2⌉. Therefore, we conclude that when k < k0, XOR_K will return true with high probability.
We can develop similar arguments to conclude that XOR_K will return false with high probability when more than k0 XOR constraints are added. Notice that replications and an additional union bound argument are necessary to establish the probabilistic guarantee in this case. As a counter-example, suppose function w(x, a) = 1 if and only if x = a, and otherwise w(x, a) = 0 (m = n in this case). If we set the number of replicates to T = 1, then XOR_K will almost always return true when k < n, which suggests that there are 2^n solutions to the MMAP problem. Nevertheless, in this case the true optimal value of max_{a∈A} #w(a) is 1, which is far away from 2^n. This suggests that at least two replicates are needed.
Lemma 3.3. 
For T ≥ (ln 2 · m + ln(n/δ))/α*(c), procedure XOR_K(w, k, T) satisfies:
• Suppose ∃a* ∈ A s.t. #w(a*) ≥ 2^k; then with probability 1 − δ/(n 2^m), XOR_K(w, k − c, T) returns true.
• Suppose ∀a0 ∈ A, #w(a0) < 2^k; then with probability 1 − δ/n, XOR_K(w, k + c, T) returns false.

Proof. Claim 1: If there exists such an a* satisfying #w(a*) ≥ 2^k, pick a0 = a*. Let X^(i)(a0) = max_{x^(i)∈X, h^(i)_{k−c}(x^(i))=0} w(a0, x^(i)), for i = 1, . . . , T. From Theorem 3.1, X^(i)(a0) = 1 holds with probability 1 − 2^c/(2^c − 1)^2. Let α*(c) = D(1/2 ∥ 2^c/(2^c − 1)^2). By the Chernoff bound, we have

Pr[ max_{a∈A} Σ_{i=1}^T X^(i)(a) ≤ T/2 ] ≤ Pr[ Σ_{i=1}^T X^(i)(a0) ≤ T/2 ] ≤ e^{−D(1/2 ∥ 2^c/(2^c−1)^2) T} = e^{−α*(c) T},   (3)

where D(1/2 ∥ 2^c/(2^c − 1)^2) = 2 ln(2^c − 1) − ln 2 − (1/2) ln(2^c) − (1/2) ln((2^c − 1)^2 − 2^c) ≥ (c/2 − 2) ln 2.

For T ≥ (ln 2 · m + ln(n/δ))/α*(c), we have e^{−α*(c)T} ≤ δ/(n 2^m). Thus, with probability 1 − δ/(n 2^m), max_{a∈A} Σ_{i=1}^T X^(i)(a) > T/2, which implies that XOR_K(w, k − c, T) returns true.
Claim 2: The proof is almost the same as for Claim 1, except that we need a union bound to let the property hold for all a ∈ A simultaneously. As a result, the success probability will be 1 − δ/n instead of 1 − δ/(n 2^m). The proof is left to the supplementary materials.

Proof. (Theorem 3.2) With probability 1 − n · (δ/n) = 1 − δ, the outputs of the n calls of XOR_K(w, k, T) (with different k = 1 . . . n) all satisfy the two claims in Lemma 3.3 simultaneously. Suppose max_{a∈A} #w(a) ∈ [2^{k0}, 2^{k0+1}); then (i) ∀k ≥ k0 + c + 1, XOR_K(w, k, T) returns false, and (ii) ∀k ≤ k0 − c, XOR_K(w, k, T) returns true. Therefore, with probability 1 − δ, the output of XOR_MMAP is guaranteed to be between 2^{k0−c} and 2^{k0+c}.

The approximation bound in Theorem 3.2 is a worst-case guarantee. We can obtain a tight bound (e.g., a 16-approximation) with a large number T of replicates. Nevertheless, we keep a small T, and therefore a loose bound, in our experiments, after trading off the formal guarantee against the empirical complexity. In practice, our method performs well, even with loose bounds. Moreover, XOR_K procedures with different input k are not uniformly hard, so we can run them in parallel, and we can obtain a looser bound at any given time based on all completed XOR_K procedures. Finally, if we have access to a polynomial approximation algorithm for the optimization problem in XOR_K, we can propagate this bound through the analysis, and again get a guaranteed, albeit looser, bound for the MMAP problem.
Reduce the Number of Replicates We further develop a few variants of XOR_MMAP in the supplementary materials to reduce the number of replicates, as well as the number of calls to the XOR_K procedure, while preserving the same approximation bound.
Implementation We solve the optimization problem in XOR_K using Mixed Integer Programming (MIP). Without loss of generality, we assume w(a, x) is an indicator variable, which is 1 iff (a, x) satisfies constraints represented in Conjunctive Normal Form (CNF). We introduce extra variables to represent the sum Σ_i w(a, x^(i)); the encoding is left to the supplementary materials. The XORs in
The XORs in\n\nto represent the sum(cid:80)\n\nEquation 2 are encoded as MIP constraints using the Yannakakis encoding, similar as in [7].\n\n3.2 Extension to the Weighted Case\n\nIn this section, we study the more general case, where w(a, x) takes non-negative real numbers\ninstead of integers in {0, 1}. Unlike in [8], we choose to build our proof from the unweighted case\nbecause it can effectively avoid modeling the median of an array of numbers [6], which is dif\ufb01cult\nto encode in integer programming. We noticed recent work [4]. It is related but different from our\napproach. Let w : A \u00d7 X \u2192 R+, and M = maxa,x w(a, x).\nDe\ufb01nition 3.4. We de\ufb01ne the embedding Sa(w, l) of X in X \u00d7 {0, 1}l as:\n\nSa(w, l) =\n\n(x, y)|\u22001 \u2264 i \u2264 l,\n\nw(a, x)\n\nM \u2264\n\n2i\u22121\n2l \u21d2 yi = 0\n\n.\n\n(4)\n\n(cid:27)\n\n(cid:26)\n\nl(a, x, y) be an indicator variable which is 1 if and only if (x, y) is in Sa(w, l),\n\nLemma 3.5. Let w(cid:48)\ni.e., w(cid:48)\n\nl(a, x, y) = 1(x,y)\u2208Sa(w,l). We claim that\nw(cid:48)\nl(a, x, y) \u2264 2 max\n\nM\n2l max\n\nw(a, x) \u2264\n\nmax\n\na\n\na\n\na\n\n(cid:88)\n\nx\n\n(cid:88)\n\n(x,y)\n\n(cid:88)\n\nx\n\nw(a, x) + M 2n\u2212l.2\n\n(5)\n\nSa(w, l, x0) = {(x, y) \u2208 Sa(w, l) : x = x0}. It is not hard to see that(cid:80)\n(cid:80)\n|Sa(w, l, x)| and w(a, x). Then we use the result to show the relationship between(cid:80)\nx |Sa(w, l, x)|.\n2 If w satisfy the property that mina,x w(a, x) \u2265 2\u2212l\u22121M, we don\u2019t have the M 2n\u2212l term.\n\nProof. De\ufb01ne Sa(w, l, x0) as the set of (x, y) pairs within the set Sa(w, l) and x = x0, ie,\nl(a, x, y) =\nIn the following, \ufb01rst we are going to establish the relationship between\nx |Sa(w, l, x)|\n\n(x,y) w(cid:48)\n\n5\n\n\fand(cid:80)\n\nx w(x, a). Case (i): If w(a, x) is sandwiched between two exponential levels: M\n\n2l 2i\u22121 <\n2l 2i for i \u2208 {0, 1, . . . 
, l}, according to De\ufb01nition 3.4, for any (x, y) \u2208 Sa(w, l, x), we\n\nw(a, x) \u2264 M\nhave yi+1 = yi+2 = . . . = yl = 0. This makes |Sa(w, l, x)| = 2i, which further implies that\n\nM\n\n2l \u00b7 |Sa(w, l, x)|\n\n2\n\n< w(a, x) \u2264\n\nM\n2l \u00b7 |Sa(w, l, x)|,\n\nor equivalently,\n\nCase (ii): If w(a, x) \u2264 M\n\nw(a, x) \u2264\n\nM\n2l \u00b7 |Sa(w, l, x)| < 2w(a, x).\n2l+1 , we have |Sa(w, l, x)| = 1. In other words,\n\n(6)\n\n(7)\n\nM\n2l |Sa(w, l, x)|.\n\nM\n2l+1|Sa(w, l, x)| =\n\nw(a, x) \u2264 2w(a, x) \u2264 2\n\n(8)\nAlso, M 2\u2212l|Sa(w, l, x)| = M 2\u2212l \u2264 2w(a, x) + M 2\u2212l. Hence, the following bound holds in both\ncases (i) and (ii):\n(9)\n\n2l |Sa(w, l, x)| \u2264 2w(a, x) + M 2\u2212l.\n\nw(a, x) \u2264\n\nThe lemma holds by summing up over X and maximizing over A on all sides of Inequality 9.\nWith the result of Lemma 3.5, we are ready to prove the following approximation result:\nTheorem 3.6. Suppose there is an algorithm that gives a c-approximation to solve the unweighted\nproblem: maxa\nl(a, x, y), then we have a 3c-approximation algorithm to solve the weighted\nMarginal MAP problem maxa\n\n(cid:80)\n(x,y) w(cid:48)\n\nM\n\nx w(a, x).\n\nProof. Let l = n in Lemma 3.5. 
By definition, M = max_{a,x} w(a, x) ≤ max_a Σ_x w(a, x), so we have:

max_a Σ_x w(a, x) ≤ (M/2^l) max_a Σ_{(x,y)} w'_l(a, x, y) ≤ 2 max_a Σ_x w(a, x) + M ≤ 3 max_a Σ_x w(a, x).

This is equivalent to:

(1/3) · (M/2^l) max_a Σ_{(x,y)} w'_l(a, x, y) ≤ max_a Σ_x w(a, x) ≤ (M/2^l) max_a Σ_{(x,y)} w'_l(a, x, y).

4 Experiments

We evaluate our proposed algorithm XOR_MMAP against two baselines: the Sample Average Approximation (SAA) [20] and Mixed Loopy Belief Propagation (Mixed LBP) [13]. These two baselines are selected to represent the two most widely used classes of methods that approximate the embedded sum in MMAP problems in two different ways: SAA approximates the intractable sum with a finite number of samples, while Mixed LBP uses a variational approximation. We obtained the Mixed LBP implementation from the authors of [13] and use their default parameter settings.
Since Marginal MAP problems are in general very hard and there is currently no exact solver that scales to reasonably large instances, our main comparison is on the relative optimality gap: we first obtain the solution a_method for each approach, and then compare the difference in objective value log Σ_{x∈X} w(a_method, x) − log Σ_{x∈X} w(a_best, x), in which a_best is the best solution among the three methods. Clearly a better algorithm will find a vector a which yields a larger objective value. The counting problem under a fixed solution a is solved using an exact counter, ACE [5], which is used only for comparing the results of different MMAP solvers.
Our first experiment is on unweighted random 2-SAT instances. Here, w(a, x) is an indicator variable on whether the 2-SAT instance is satisfiable. 
The SAT instances have 60 variables, 20 of which are randomly selected to form set A, and the remaining ones form set X. The number of clauses varies from 1 to 70. For a fixed number of clauses, we randomly generate 20 instances, and the left panel of Figure 1 shows the median objective function Σ_{x∈X} w(a_method, x) of the solutions found by the three approaches. We tune the constants of our XOR_MMAP so it gives a 2^{10} = 1024-approximation (2^{−5} · sol ≤ OPT ≤ 2^5 · sol, δ = 10^{−3}). The upper and lower bounds are shown in dashed lines. SAA uses 10,000 samples. On average, the running time of our algorithm is reasonable.

Figure 1: (Left) On the median case, the solutions a0 found by the proposed Algorithm XOR_MMAP have higher objective Σ_{x∈X} w(a0, x) than the solutions found by SAA and Mixed LBP, on random 2-SAT instances with 60 variables and various numbers of clauses. Dashed lines represent the proved bounds from XOR_MMAP. (Right) The percentage of instances on which each algorithm finds a solution that is at least 1/8 of the value of the best solution among the 3 algorithms, for different numbers of clauses.

Figure 2: On the median case, the solutions a0 found by the proposed Algorithm XOR_MMAP are better than the solutions found by SAA and Mixed LBP, on weighted 12-by-12 Ising models with mixed coupling strength. (Up) Field strength 0.01. (Down) Field strength 0.1. (Left) 20% of variables are randomly selected for maximization. (Mid) 50% for maximization. (Right) 80% for maximization.

When enforcing the 1024-approximation bound, the median time for a single XOR_K procedure is in seconds, although we occasionally have long runs (no more than the 30-minute timeout).
As we can see from the left panel of Figure 1, both Mixed LBP and SAA match the performance of our proposed XOR_MMAP on easy instances. However, as the number of clauses increases, their performance quickly deteriorates. 
In fact, for instances with more than 20 (60) clauses, the a vectors returned by Mixed LBP (SAA) typically do not yield non-zero solution counts, so we are not able to plot their performance beyond those points. At the same time, our algorithm XOR_MMAP can still find a vector a yielding over 2^{20} solutions on larger instances with more than 60 clauses, while providing a 1024-approximation.
Next, we look at the performance of the three algorithms on weighted instances. Here, we set the number of replicates to T = 3 for our algorithm XOR_MMAP, and we repeatedly start the algorithm with an increasing number of XOR constraints k, until it completes for all k or times out in an hour. For SAA, we use 1,000 samples, which is the largest number we can use within the memory limit. All algorithms are given a one-hour time limit and a 4GB memory limit.
The solutions found by XOR_MMAP are considerably better than the ones found by Mixed LBP and SAA on weighted instances. Figure 2 shows the performance of the three algorithms on 12-by-12 Ising models with mixed coupling strength, different field strengths, and different numbers of variables forming set A. All values in the figure are median values across 20 instances (in log10). In all 6 cases in Figure 2, our algorithm XOR_MMAP is the best among the three approximate algorithms. In general, the difference in performance increases as the coupling strength increases. These instances are challenging for state-of-the-art complete solvers. 
For example, the state-of-the-art exact solver AOBB with mini-bucket heuristics and moment matching [14] runs out of 4GB of memory on 60% of the instances with 20% of variables randomly selected as max variables.

Figure 3: (Left) The image completion task. Solvers are given the upper parts of digits, as shown in the first row, and need to complete the digits based on a two-layer deep belief network and the upper part. (2nd Row) Completion given by XOR_MMAP. (3rd Row) SAA. (4th Row) Mixed Loopy Belief Propagation. (Middle) Graphical illustration of the network cascade problem. Red circles are nodes to purchase. Lines represent cascade probabilities. See main text. (Right) Our XOR_MMAP performs better than SAA on a set of network cascade benchmarks, with different budgets.

We also notice that the solution found by our XOR_MMAP is already close to the ground truth. 
On smaller 10-by-10 Ising models\nwhich the exact AOBB solver can complete within the memory limit, the median difference between\nthe log10 count of the solutions found by XOR_MMAP and those found by the exact solver is 0.3, while\nthe differences between the solution values of XOR_MMAP against those of the Mixed BP or SAA are\non the order of 10.\nWe also apply the Marginal MAP solver to an image completion task. We \ufb01rst learn a two-layer deep\nbelief network [3, 10] from a 14-by-14 MNIST dataset. Then for a binary image that only contains\nthe upper part of a digit, we ask the solver to complete the lower part, based on the learned model.\nThis is a Marginal MAP task, since one needs to integrate over the states of the hidden variables, and\nquery the most likely states of the lower part of the image. Figure 3 shows the result of a few digits.\nAs we can see, SAA performs poorly. In most cases, it only manages to come up with a light dot for\nall 10 different digits. Mixed Loopy Belief Propagation and our proposed XOR_MMAP perform well.\nThe good performance of Mixed LBP may be due to the fact that the weights on pairwise factors in\nthe learned deep belief network are not very combinatorial.\nFinally, we consider an application that applies decision-making into machine learning models. This\nnetwork design application maximizes the spread of cascades in networks, which is important in\nthe domain of social networks and computational sustainability. In this application, we are given a\nstochastic graph, in which the source node at time t = 0 is affected. For a node v at time t, it will\nbe affected if one of its ancestor nodes at time t \u2212 1 is affected, and the con\ufb01guration of the edge\nconnecting the two nodes is \u201con\u201d. An edge connecting node u and v has probability pu,v to be turned\non. A node will not be affected if it is not purchased. 
Our goal is to purchase a set of nodes within a finite budget, so as to maximize the probability that the target node is affected. We refer the reader to [20] for more background. This application cannot be captured by graphical models, due to its global constraints, so we are not able to run Mixed LBP on this problem. We consider a set of synthetic networks and compare the performance of SAA and our XOR_MMAP under different budgets. As we can see from the right panel of Figure 3, the nodes that our XOR_MMAP decides to purchase result in higher probabilities of the target node being affected, compared to SAA. Each dot in the figure is the median value over 30 networks generated in a similar way.

5 Conclusion

We propose XOR_MMAP, a novel constant-factor approximation algorithm for the Marginal MAP problem. Our approach represents the intractable counting subproblem with queries to NP oracles, subject to additional parity constraints. In our algorithm, the entire problem can be solved by a single optimization. We evaluate our approach on several machine learning and decision-making applications and show that XOR_MMAP outperforms several state-of-the-art Marginal MAP solvers. XOR_MMAP provides a new angle on the Marginal MAP problem, opening the door to new research directions and applications in real-world domains.

Acknowledgments

This research was supported by the National Science Foundation (Awards #0832782, 1522054, 1059284, 1649208) and the Future of Life Institute (Grant 2015-143902).

References

[1] Dimitris Achlioptas and Pei Jiang. Stochastic integration via error-correcting codes. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), 2015.

[2] Vaishak Belle, Guy Van den Broeck, and Andrea Passerini. Hashing-based approximate probabilistic inference in hybrid domains.
In Proceedings of the 31st UAI Conference, 2015.

[3] Yoshua Bengio, Pascal Lamblin, Dan Popovici, and Hugo Larochelle. Greedy layer-wise training of deep networks. In Advances in Neural Information Processing Systems 19, 2006.

[4] Supratik Chakraborty, Dror Fried, Kuldeep S. Meel, and Moshe Y. Vardi. From weighted to unweighted model counting. In Proceedings of the 24th International Joint Conference on AI (IJCAI), 2015.

[5] Mark Chavira, Adnan Darwiche, and Manfred Jaeger. Compiling relational Bayesian networks for exact inference. Int. J. Approx. Reasoning, 2006.

[6] Stefano Ermon, Carla P. Gomes, Ashish Sabharwal, and Bart Selman. Embed and project: Discrete sampling with universal hashing. In Advances in Neural Information Processing Systems (NIPS), pages 2085–2093, 2013.

[7] Stefano Ermon, Carla P. Gomes, Ashish Sabharwal, and Bart Selman. Optimization with parity constraints: From binary codes to discrete integration. In Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, UAI, 2013.

[8] Stefano Ermon, Carla P. Gomes, Ashish Sabharwal, and Bart Selman. Taming the curse of dimensionality: Discrete integration by hashing and optimization. In Proceedings of the 30th International Conference on Machine Learning, ICML, 2013.

[9] Stefano Ermon, Carla P. Gomes, Ashish Sabharwal, and Bart Selman. Low-density parity constraints for hashing-based discrete integration. In Proceedings of the 31st International Conference on Machine Learning, ICML, 2014.

[10] Geoffrey Hinton and Ruslan Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006.

[11] Jiarong Jiang, Piyush Rai, and Hal Daumé III. Message-passing for approximate MAP inference with latent variables. In Advances in Neural Information Processing Systems 24, 2011.

[12] Junkyu Lee, Radu Marinescu, Rina Dechter, and Alexander T. Ihler.
From exact to anytime solutions for marginal MAP. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI, 2016.

[13] Qiang Liu and Alexander T. Ihler. Variational algorithms for marginal MAP. Journal of Machine Learning Research, 14, 2013.

[14] Radu Marinescu, Rina Dechter, and Alexander Ihler. Pushing forward marginal MAP with best-first search. In Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI), 2015.

[15] Radu Marinescu, Rina Dechter, and Alexander T. Ihler. AND/OR search for marginal MAP. In Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence, UAI, 2014.

[16] Denis Deratani Mauá and Cassio Polpo de Campos. Anytime marginal MAP inference. In Proceedings of the 29th International Conference on Machine Learning, ICML, 2012.

[17] James D. Park and Adnan Darwiche. Solving MAP exactly using systematic search. In Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI), 2003.

[18] James D. Park and Adnan Darwiche. Complexity results and approximation strategies for MAP explanations. J. Artif. Intell. Res. (JAIR), 2004.

[19] Wei Ping, Qiang Liu, and Alexander T. Ihler. Decomposition bounds for marginal MAP. In Advances in Neural Information Processing Systems 28, 2015.

[20] Daniel Sheldon, Bistra N. Dilkina, Adam N. Elmachtoub, Ryan Finseth, Ashish Sabharwal, Jon Conrad, Carla P. Gomes, David B. Shmoys, William Allen, Ole Amundsen, and William Vaughan. Maximizing the spread of cascades using network design. In UAI, 2010.

[21] Shan Xue, Alan Fern, and Daniel Sheldon. Scheduling conservation designs for maximum flexibility via network cascade optimization. J. Artif. Intell. Res. (JAIR), 2015.

[22] Yexiang Xue, Ian Davies, Daniel Fink, Christopher Wood, and Carla P. Gomes. Avicaching: A two-stage game for bias reduction in citizen science.
In Proceedings of the 15th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2016.