{"title": "Tractable Operations for Arithmetic Circuits of Probabilistic Models", "book": "Advances in Neural Information Processing Systems", "page_first": 3936, "page_last": 3944, "abstract": "We consider tractable representations of probability distributions and the polytime operations they support. In particular, we consider a recently proposed arithmetic circuit representation, the Probabilistic Sentential Decision Diagram (PSDD). We show that PSDD supports a polytime multiplication operator, while they do not support a polytime operator for summing-out variables. A polytime multiplication operator make PSDDs suitable for a broader class of applications compared to arithmetic circuits, which do not in general support multiplication. As one example, we show that PSDD multiplication leads to a very simple but effective compilation algorithm for probabilistic graphical models: represent each model factor as a PSDD, and then multiply them.", "full_text": "Tractable Operations for\n\nArithmetic Circuits of Probabilistic Models\n\nYujia Shen and Arthur Choi and Adnan Darwiche\n\n{yujias,aychoi,darwiche}@cs.ucla.edu\n\nComputer Science Department\n\nUniversity of California\nLos Angeles, CA 90095\n\nAbstract\n\nWe consider tractable representations of probability distributions and the polytime\noperations they support. In particular, we consider a recently proposed arithmetic\ncircuit representation, the Probabilistic Sentential Decision Diagram (PSDD). We\nshow that PSDDs support a polytime multiplication operator, while they do not\nsupport a polytime operator for summing-out variables. A polytime multiplication\noperator makes PSDDs suitable for a broader class of applications compared to\nclasses of arithmetic circuits that do not support multiplication. 
As one example,\nwe show that PSDD multiplication leads to a very simple but effective compilation\nalgorithm for probabilistic graphical models: represent each model factor as a\nPSDD, and then multiply them.\n\n1\n\nIntroduction\n\nArithmetic circuits (ACs) have been a central representation for probabilistic graphical models,\nsuch as Bayesian networks and Markov networks. On the reasoning side, some state-of-the-art\napproaches for exact inference are based on compiling probabilistic graphical models into arithmetic\ncircuits [Darwiche, 2003]; see also Darwiche [2009, chapter 12]. Such approaches can exploit\nparametric structure (such as determinism and context-speci\ufb01c independence), allowing inference to\nscale sometimes to models with very high treewidth, which are beyond the scope of classical inference\nalgorithms such as variable elimination and jointree. For example, the ace system for compiling\nACs [Chavira and Darwiche, 2008] was the only system in the UAI\u201908 evaluation of probabilistic\nreasoning systems to exactly solve all 250 networks in a challenging (very high-treewidth) suite of\nrelational models [Darwiche et al., 2008].\nOn the learning side, arithmetic circuits have become a popular representation for learning from\ndata, as they are tractable for certain probabilistic queries. For example, there are algorithms for\nlearning ACs of Bayesian networks [Lowd and Domingos, 2008], ACs of Markov networks [Lowd\nand Rooshenas, 2013, Bekker et al., 2015] and Sum-Product Networks (SPNs) [Poon and Domingos,\n2011], among other related representations.1\nDepending on their properties, different classes of ACs are tractable for different queries and opera-\ntions. 
Among these queries are maximum a posteriori (MAP) inference,2 which is an NP-complete\nproblem, and evaluating the partition function, which is a PP-complete problem (more intractable).\nAmong operations, the multiplication of two ACs stands out as particularly important, being a primi-\ntive operation in some approaches to incremental or adaptive inference [Delcher et al., 1995, Acar\net al., 2008], bottom-up compilation of probabilistic graphical models [Choi et al., 2013], and some\nsearch-based approaches to structure learning [Bekker et al., 2015].\n\n1SPNs can be converted into ACs (and vice-versa) with linear size and time [Rooshenas and Lowd, 2014].\n2This is also known as most probable explanation (MPE) inference [Pearl, 1988].\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\n\fIn this paper, we investigate the tractability of two fundamental operations on arithmetic circuits:\nmultiplying two ACs and summing out a variable from an AC. We show that both operations are\nintractable for some in\ufb02uential ACs that have been employed in the probabilistic reasoning and\nlearning literatures. We then consider a recently proposed sub-class of ACs, called the Probabilistic\nSentential Decision Diagram (PSDD) [Kisa et al., 2014]. We show that PSDDs support a polytime\nmultiplication operation, which makes them suitable for a broader class of applications. We also show\nthat PSDDs do not support a polytime summing-out operation (a primitive operation for message-\npassing inference algorithms). We empirically illustrate the advantages of PSDDs compared to\nother AC representations, for compiling probabilistic graphical models. 
Previous approaches for compiling probabilistic models into ACs are based on encoding these models into auxiliary logical representations, such as Sentential Decision Diagrams (SDDs) or deterministic DNNF circuits, which are then converted to an AC [Chavira and Darwiche, 2008, Choi et al., 2013]. PSDDs are a direct representation of probability distributions, bypassing the overhead of intermediate logical representations, and leading to more efficient compilations in some cases. Most importantly though, this approach lends itself to a significantly simpler compilation algorithm: represent each factor of a given model as a PSDD, and then multiply the factors using PSDD multiplication.\nThis paper is organized as follows. In Section 2, we review arithmetic circuits (ACs) as a representation of probability distributions, including PSDDs in particular. In Section 3, we introduce a polytime multiplication operator for PSDDs, and in Section 4, we show that there is no polytime sum-out operator for PSDDs. In Section 5, we propose a simple compilation algorithm for PSDDs based on the multiply operator, which we evaluate empirically. We discuss related work in Section 6 and finally conclude in Section 7. Proofs of theorems are available in the Appendix.\n\n2 Representing Distributions Using Arithmetic Circuits\n\nWe start with the definition of factors, which include distributions as a special case.\n\nDefinition 1 (Factor) A factor f(X) over variables X maps each instantiation x of variables X into a non-negative number f(x). The factor represents a distribution when Σx f(x) = 1.\nWe define the value of a factor at a partial instantiation y, where Y ⊆ X, as f(y) = Σz f(yz), where Z = X \ Y. When the factor is a distribution, f(y) corresponds to the probability of evidence y. 
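Definition 1 can be made concrete with a small tabular sketch (hypothetical Python; the class name and representation are illustrative, not from the paper): the value at a partial instantiation y sums f over all completions of y.

```python
# Illustrative tabular factor over named binary variables. The table
# maps full instantiations (tuples of 0/1, one per variable) to
# non-negative numbers.
class TabularFactor:
    def __init__(self, variables, table):
        self.variables = list(variables)
        self.table = dict(table)

    def value(self, partial):
        # f(y) = sum over z of f(y, z), where z completes the partial
        # instantiation y into a full instantiation (Definition 1).
        total = 0.0
        for x, fx in self.table.items():
            assign = dict(zip(self.variables, x))
            if all(assign[v] == val for v, val in partial.items()):
                total += fx
        return total

# A distribution over A, B: entries sum to 1, so value() at a partial
# instantiation is a probability of evidence.
f = TabularFactor(["A", "B"], {(0, 0): 0.1, (0, 1): 0.2,
                               (1, 0): 0.3, (1, 1): 0.4})
evidence_prob = f.value({"A": 1})  # Pr(A = 1), i.e. 0.3 + 0.4
```

This table is exponential in the number of variables, which is exactly the limitation that the circuit representations discussed next address.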
We also define the MAP instantiation of a factor as argmaxx f(x), which corresponds to the most likely instantiation when the factor is a distribution.\nThe classical, tabular representation of a factor f(X) is exponential in the number of variables X. However, one can represent such factors more compactly using arithmetic circuits.\nDefinition 2 (Arithmetic Circuit) An arithmetic circuit AC(X) over variables X is a rooted DAG whose internal nodes are labeled with + or ∗ and whose leaf nodes are labeled with either indicator variables λx or non-negative parameters θ. The value of the circuit at instantiation x, denoted AC(x), is obtained by assigning indicator λx the value 1 if x is compatible with instantiation x and 0 otherwise, then evaluating the circuit in the standard way. The circuit AC(X) represents factor f(X) iff AC(x) = f(x) for each instantiation x.\nA tractable arithmetic circuit allows one to efficiently answer certain queries about the factor it represents. We next discuss two properties that lead to tractable arithmetic circuits. The first is decomposability [Darwiche, 2001b], which was used for probabilistic reasoning in [Darwiche, 2003].\n\nDefinition 3 (Decomposability) Let n be a node in an arithmetic circuit AC(X). The variables of n, denoted vars(n), are the variables X ∈ X with some indicator λx appearing at or under node n. 
An arithmetic circuit is decomposable iff every pair of children c1 and c2 of a ∗-node satisfies vars(c1) ∩ vars(c2) = ∅.\nThe second property is determinism [Darwiche, 2001a], which was also employed for probabilistic reasoning in Darwiche [2003].\nDefinition 4 (Determinism) An arithmetic circuit AC(X) is deterministic iff each +-node has at most one non-zero input when the circuit is evaluated under any instantiation x of the variables X.\n\nA third property called smoothness is also desirable as it simplifies the statement of certain AC algorithms, but is less important for tractability as it can be enforced in polytime [Darwiche, 2001a].\nDefinition 5 (Smoothness) An arithmetic circuit AC(X) is smooth iff it contains at least one indicator for each variable in X, and for each child c of +-node n, we have vars(n) = vars(c).\n\nDecomposability and determinism lead to tractability in the following sense. Let Pr(X) be a distribution represented by a decomposable, deterministic and smooth arithmetic circuit AC(X). Then one can compute the following queries in time that is linear in the size of circuit AC(X): the probability of any partial instantiation, Pr(y), where Y ⊆ X [Darwiche, 2003], and the most likely instantiation, argmaxx Pr(x) [Chan and Darwiche, 2006]. The decision problems of these queries are known to be PP-complete and NP-complete for Bayesian networks [Roth, 1996, Shimony, 1994].\nA number of methods have been proposed for compiling a Bayesian network into a decomposable, deterministic and smooth AC that represents its distribution [Darwiche, 2003]. Figure 1 depicts such a circuit that represents the distribution of Bayesian network A → B. One method ensures that the size of the AC is proportional to the size of a jointree for the network. 
Another method yields circuits that can sometimes be exponentially smaller, and is implemented in the publicly available ace system [Chavira and Darwiche, 2008]; see also Darwiche et al. [2008]. Additional methods are discussed in Darwiche [2009, chapter 12].\n\nFigure 1: An AC for a Bayesian network A → B.\n\nThis work is motivated by the following limitation of these tractable circuits, which may narrow their applicability in probabilistic reasoning and learning.\nDefinition 6 (Multiplication) The product of two arithmetic circuits AC1(X) and AC2(X) is an arithmetic circuit AC(X) such that AC(x) = AC1(x)AC2(x) for every instantiation x.\nTheorem 1 Computing the product of two decomposable ACs is NP-hard if the product is also decomposable. Computing the product of two decomposable and deterministic ACs is NP-hard if the product is also decomposable and deterministic.\n\nWe now investigate a newly introduced class of tractable ACs, called the Probabilistic Sentential Decision Diagram (PSDD) [Kisa et al., 2014]. In particular, we show that this class of circuits admits a tractable product operation and then explore an application of this operation to exact inference in probabilistic graphical models.\nPSDDs were motivated by the need to represent probability distributions Pr(X) with many instantiations x attaining zero probability, Pr(x) = 0. Consider the distribution Pr(X) in Figure 2(a) for an example. The first step in constructing a PSDD for this distribution is to construct a special Boolean circuit that captures its zero entries; see Figure 2(b). The Boolean circuit captures zero entries in the following sense. 
For each instantiation x, the circuit evaluates to 0 at instantiation x iff Pr(x) = 0. The second and final step of constructing a PSDD amounts to parameterizing this Boolean circuit (e.g., by learning the parameters from data), by including a local distribution on the inputs of each or-gate; see Figure 2(c).\nThe Boolean circuit underlying a PSDD is known as a Sentential Decision Diagram (SDD) [Darwiche, 2011]. These circuits satisfy specific syntactic and semantic properties based on a binary tree, called a vtree, whose leaves correspond to variables; see Figure 2(d). The following definition of SDD circuits is based on the one given by Darwiche [2011] and uses a different notation.\n\nDefinition 7 (SDD) An SDD normalized for a vtree v is a Boolean circuit defined as follows. If v is a leaf node labeled with variable X, then the SDD is either X, ¬X, ⊥ or an or-gate with inputs X and ¬X. If v is an internal vtree node, then the SDD has the structure in Figure 3, where p1, . . . , pn are SDDs normalized for the left child vl and s1, . . . , sn are SDDs normalized for the right child vr. Moreover, the circuits p1, . . . , pn are consistent, mutually exclusive and exhaustive.\n\n[Figure 2 panels: (a) Distribution over A, B, C; (b) SDD; (c) PSDD; (d) Vtree.]\n\nFigure 2: A probability distribution and its SDD/PSDD representation. Note that the numbers annotating or-gates in (b) & (c) correspond to vtree node IDs in (d). 
Further, note that while the circuit appears to be a tree, the input variables are shared and hence the circuit is not a tree.\n\nFigure 3: Each (pi, si, αi) is called an element of the or-gate, where the pi's are called primes and the si's are called subs. Moreover, Σi αi = 1 and exactly one pi evaluates to 1 under any circuit input.\n\nSDD circuits alternate between or-gates and and-gates. Their and-gates have two inputs each. The or-gates of these circuits are such that at most one input will be high under any circuit input. An SDD circuit may produce a 1-output for every possible input (i.e., the circuit represents the function true). These circuits arise when representing strictly positive distributions (with no zero entries).\nA PSDD is obtained by including a distribution α1, . . . , αn on the inputs of each or-gate; see Figure 3. The semantics of PSDDs are given in [Kisa et al., 2014].3 We next provide an alternative semantics, which is based on converting a PSDD into an arithmetic circuit.\n\nDefinition 8 (ACs of PSDDs) The arithmetic circuit of a PSDD is obtained as follows. Leaf nodes x and ⊥ are converted into λx and 0, respectively. Each and-gate is converted into a ∗-node. Each or-node with children c1, . . . , cn and corresponding parameters α1, . . . , αn is converted into a +-node with children α1 ∗ c1, . . . , αn ∗ cn.\nTheorem 2 The arithmetic circuit of a PSDD represents the distribution induced by the PSDD. Moreover, the arithmetic circuit is decomposable and deterministic.4\n\nThe PSDD is a complete and canonical representation of probability distributions. That is, PSDDs can represent any distribution, and there is a unique PSDD for that distribution (under some conditions). 
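Definition 8's semantics can be sketched as a small evaluator (hypothetical Python; the tuple encoding of PSDD nodes is ours, not the paper's). Literal leaves become indicators, ⊥ becomes 0, and each parameterized or-gate becomes a weighted sum over the prime-sub products of its elements; evaluating the induced circuit at a complete instantiation x yields Pr(x).

```python
# Node encodings (illustrative):
#   ("lit", var, phase)              literal var / ¬var -> indicator leaf
#   ("bot",)                         the ⊥ node         -> constant 0
#   ("leafdist", var, p_true)        or-gate over a leaf vtree variable
#   ("or", [(prime, sub, alpha), ...])  or-gate over an internal vtree node
def ac_value(node, x):
    """Evaluate the arithmetic circuit of a PSDD at instantiation x."""
    kind = node[0]
    if kind == "lit":
        _, var, phase = node
        return 1.0 if x[var] == phase else 0.0
    if kind == "bot":
        return 0.0
    if kind == "leafdist":
        _, var, p_true = node
        return p_true if x[var] else 1.0 - p_true
    # or-gate: +-node whose children are alpha_i * (prime_i * sub_i)
    return sum(alpha * ac_value(p, x) * ac_value(s, x)
               for p, s, alpha in node[1])

# Distribution over A, B with Pr(A)=0.6, Pr(B|A)=0.75, Pr(B|¬A)=0.25.
root = ("or", [(("lit", "A", True),  ("leafdist", "B", 0.75), 0.6),
               (("lit", "A", False), ("leafdist", "B", 0.25), 0.4)])
p = ac_value(root, {"A": True, "B": True})  # 0.6 * 0.75
```

The primes A and ¬A are mutually exclusive, so at most one element of the or-gate contributes to the sum at any input: this is the determinism of Theorem 2.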
A variety of probabilistic queries are tractable on PSDDs, including that of computing the probability of a partial variable instantiation and the most likely instantiation. Moreover, the maximum likelihood parameter estimates of a PSDD are unique given complete data, and these parameters can be computed efficiently using closed-form estimates; see [Kisa et al., 2014] for details. Finally, PSDDs have been used to learn distributions over combinatorial objects, including rankings and permutations [Choi et al., 2015], and paths and games [Choi et al., 2016]. In these applications, the Boolean circuit underlying a PSDD captures variable instantiations that correspond to combinatorial objects, while its parameterization induces a distribution over these objects.\nAs a concrete example, PSDDs were used to induce distributions over the permutations of n items as follows. We have a variable Xij for each i, j ∈ {1, . . . , n} denoting that item i is at position j in the permutation. Clearly, not all instantiations of these variables correspond to (valid) permutations. An SDD circuit is then constructed, which outputs 1 iff its input corresponds to a valid permutation. Each parameterization of this SDD circuit leads to a distribution on permutations, and these parameterizations can be learned from data; see Choi et al. [2015].\n\n3Let x be an instantiation of PSDD variables. If the SDD circuit outputs 0 at input x, then Pr(x) = 0. Otherwise, traverse the circuit top-down, visiting the (unique) high input of each visited or-node, and all inputs of each visited and-node. Then Pr(x) is the product of parameters visited during the traversal process.\n4The arithmetic circuit also satisfies a minor weakening of smoothness with the same effect as smoothness.\n\n3 Multiplying Two PSDDs\n\nFactors and their operations are fundamental to probabilistic inference, whether exact or approximate [Darwiche, 2009, Koller and Friedman, 2009]. 
Consider two of the most basic operations on factors:\n(1) computing the product of two factors and (2) summing out a variable from a factor. With these\noperations, one can directly implement various inference algorithms, including variable elimination,\nthe jointree algorithm, and message-passing algorithms such as loopy belief propagation. Typically,\ntabular representations (and their sparse variations) are used to represent factors and implement\nthe above algorithms; see Larkin and Dechter [2003], Sanner and McAllester [2005], Chavira and\nDarwiche [2007] for some alternatives.\nMore generally, factor multiplication is useful for online or incremental reasoning with probabilistic\nmodels. In some applications, we may not have access to all factors of a model beforehand, to\ncompile as a jointree or an arithmetic circuit. For example, when learning the structure of a Markov\nnetwork from data [Bekker et al., 2015], we may want to introduce and remove candidate factors from\na model, while evaluating the changes to the log likelihood. 
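For reference, the tabular baseline for operation (1) takes only a few lines (hypothetical Python; the representation and names are illustrative): the product table ranges over the union of the two variable sets, which is why repeated multiplication blows up for tables.

```python
from itertools import product

# Multiply two tabular factors over binary variables. A factor is a
# pair: a list of variable names and a dict from instantiation tuples
# to non-negative numbers.
def multiply(vars1, f1, vars2, f2):
    out_vars = list(dict.fromkeys(vars1 + vars2))  # ordered union
    out = {}
    for x in product([0, 1], repeat=len(out_vars)):
        assign = dict(zip(out_vars, x))
        v1 = f1[tuple(assign[v] for v in vars1)]   # f1 on its variables
        v2 = f2[tuple(assign[v] for v in vars2)]   # f2 on its variables
        out[x] = v1 * v2                           # pointwise product
    return out_vars, out

f1 = {(0,): 0.5, (1,): 0.5}                                # over A
f2 = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0}  # over A, B
vars3, f3 = multiply(["A"], f1, ["A", "B"], f2)            # over A, B
```

The output table has 2^|vars1 ∪ vars2| entries regardless of structure; circuit representations aim to avoid precisely this blow-up.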
Certain realizations of generalized belief propagation also require the multiplication of factors [Yedidia et al., 2005, Choi and Darwiche, 2011]. In these realizations, one can use factor multiplication to enforce dependencies between factors that have been relaxed to make inference more tractable, albeit less accurate.\nWe next discuss PSDD multiplication, while deferring summing out to the following section.\n\nAlgorithm 1 Multiply(n1, n2, v)\ninput: PSDDs n1, n2 normalized for vtree v\noutput: PSDD n and constant κ\nmain:\n1: n, κ ← cachem(n1, n2), cachec(n1, n2) ▷ check if previously computed\n2: if n ≠ null then return (n, κ) ▷ return previously cached result\n3: else if v is a leaf then (n, κ) ← BaseCase(n1, n2) ▷ n1, n2 are literals, ⊥ or simple or-gates\n4: else ▷ n1 and n2 have the structure in Figure 3\n5: γ ← {}, κ ← 0 ▷ initialization\n6: for all elements (p, s, α) of n1 do ▷ see Figure 3\n7: for all elements (q, r, β) of n2 do ▷ see Figure 3\n8: (m1, k1) ← Multiply(p, q, vl) ▷ recursively multiply primes p and q\n9: if k1 ≠ 0 then ▷ if (m1, k1) is not a trivial factor\n10: (m2, k2) ← Multiply(s, r, vr) ▷ recursively multiply subs s and r\n11: η ← k1 · k2 · α · β ▷ compute weight of element (m1, m2)\n12: κ ← κ + η ▷ aggregate weights of elements\n13: add (m1, m2, η) to γ\n14: γ ← {(m1, m2, η/κ) | (m1, m2, η) ∈ γ} ▷ normalize parameters of γ\n15: n ← unique PSDD node with elements γ ▷ cache lookup for unique nodes\n16: cachem(n1, n2) ← n ▷ store results in cache\n17: cachec(n1, n2) ← κ\n18: return (n, κ)\n\nOur first observation is that the product of two distributions is generally not a distribution, but a factor. Moreover, a factor f(X) can always be represented by a distribution Pr(X) and a constant κ such that f(x) = κ · Pr(x). 
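On tables, this observation is just normalization: a factor splits into its normalizing constant κ and a distribution (a minimal dict-based sketch, not from the paper).

```python
# Split a tabular factor into (kappa, distribution) with
# f(x) = kappa * Pr(x), mirroring the (constant, PSDD) pairs that
# a multiplication operator of this kind would return.
def normalize(f):
    kappa = sum(f.values())
    if kappa == 0.0:
        return 0.0, {x: 0.0 for x in f}  # trivial factor
    return kappa, {x: fx / kappa for x, fx in f.items()}

f = {(0,): 2.0, (1,): 6.0}
kappa, pr = normalize(f)   # kappa = 8.0, pr = {(0,): 0.25, (1,): 0.75}
```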
Hence, our proposed multiplication method will output a PSDD together with a constant, as given in Algorithm 1. This algorithm uses three caches, one for storing constants (cachec), another for storing circuits (cachem), and a third used to implement Line 15.5 This line ensures that the PSDD has no duplicate structures of the form given in Figure 3. The description of function BaseCase() on Line 3 is available in the Appendix. It appears inside the proof of the following theorem, which establishes the soundness and complexity of the given algorithm.\n\nTheorem 3 Algorithm 1 outputs a PSDD n normalized for vtree v. Moreover, if Pr1(X) and Pr2(X) are the distributions of input PSDDs n1 and n2, and Pr(X) is the distribution of output PSDD n, then Pr1(x)Pr2(x) = κ · Pr(x) for every instantiation x. Finally, Algorithm 1 takes time O(s1s2), where s1 and s2 are the sizes of the input PSDDs.\n\n5The cache key of a PSDD node in Figure 3 is based on the (unique) IDs of nodes pi/si and parameters αi.\n\nWe will later discuss an application of PSDD multiplication to probabilistic inference, in which we cascade these multiplication operations. In particular, we end up multiplying two factors f1 and f2, represented by PSDDs n1 and n2 and the corresponding constants κ1 and κ2. We use Algorithm 1 for this purpose, multiplying PSDDs n1 and n2 (distributions), to yield a PSDD n (distribution) and a constant κ. The factor f1f2 will then correspond to PSDD n and constant κ · κ1 · κ2.\n\nFigure 4: A vtree and two of its projections.\n\nAnother observation is that Algorithm 1 assumes that the input PSDDs are over the same vtree and, hence, the same set of variables. A more detailed version of this algorithm can multiply two PSDDs over different sets of variables as long as the PSDDs have compatible vtrees. 
We omit this version here to simplify the presentation, but mention that it has the same complexity as Algorithm 1. Two vtrees over variables X and Y are compatible iff they can be obtained by projecting some vtree on variables X and Y, respectively.\n\nDefinition 9 (Vtree Projection) Let v be a vtree over variables Z. The projection of v on variables X ⊆ Z is obtained as follows. Successively remove every maximal subtree v′ whose variables are outside X, while replacing the parent of v′ with its sibling.\n\nFigure 4 depicts a vtree and two of its projections. When compiling a probabilistic graphical model into a PSDD, we first construct a vtree v over all variables in the model. We then compile each factor f(X) into a PSDD, using the projection of v on variables X. We finally multiply the PSDDs of these factors. We will revisit these steps later.\n\n4 Summing-Out a Variable in a PSDD\n\nWe now discuss the summing out of variables from distributions represented by arithmetic circuits.\n\nDefinition 10 (Sum Out) Summing out a variable X ∈ X from factor f(X) results in another factor over variables Y = X \ {X}, denoted by ΣX f and defined as (ΣX f)(y) := Σx f(x, y), where x ranges over the values of variable X.\n\nWhen the factor is a distribution (i.e., normalized), the sum-out operation corresponds to marginalization. Together with multiplication, summing out provides a direct implementation of algorithms such as variable elimination and those based on message passing.\nJust like multiplication, summing out is also intractable for a common class of arithmetic circuits.\n\nTheorem 4 The sum-out operation on decomposable and deterministic ACs is NP-hard, assuming the output is also decomposable and deterministic.\n\nThis theorem does not preclude the possibility that the resulting AC is of polynomial size with respect to the size of the input AC; it just says that the computation is intractable. 
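On tabular factors, Definition 10 is straightforward (a dict-based Python sketch with illustrative names); it is this operation, cheap on tables, that turns out to have no efficient counterpart on the circuit representations.

```python
# Sum out one variable from a tabular factor: group instantiations
# that agree on the remaining variables and add their values.
def sum_out(variables, f, var):
    i = variables.index(var)
    out_vars = variables[:i] + variables[i + 1:]
    out = {}
    for x, fx in f.items():
        y = x[:i] + x[i + 1:]            # drop the summed-out value
        out[y] = out.get(y, 0.0) + fx    # (sum_X f)(y) = sum_x f(x, y)
    return out_vars, out

f = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}  # over A, B
vars_y, g = sum_out(["A", "B"], f, "B")  # marginal over A
```

Since the input f here is a distribution, the result g is the marginal over A, illustrating the remark above that sum-out corresponds to marginalization.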
Summing out is also intractable on PSDDs, but the result is stronger here, as the size of the output can be exponential.\n\nTheorem 5 There exists a class of factors f(X) and variable X ∈ X, such that n = |X| can be arbitrarily large, f(X) has a PSDD whose size is linear in n, while the PSDD of ΣX f has size exponential in n for every vtree.\n\nOnly the multiplication operation is needed to compile probabilistic graphical models into arithmetic circuits. Even for inference algorithms that require summing out variables, such as variable elimination, summing out can still be useful, even if intractable, since the size of the resulting arithmetic circuit will not be larger than a tabular representation.\n\n5 Compiling Probabilistic Graphical Models into PSDDs\n\nEven though PSDDs form a strict subclass of decomposable and deterministic ACs (and satisfy stronger properties), one can still provide the following classical guarantee on PSDD size.\n\nTheorem 6 The interaction graph of factors f1(X1), . . . , fn(Xn) has nodes corresponding to variables X1 ∪ · · · ∪ Xn and an edge between two variables iff they appear in the same factor. There is a PSDD for the product f1 · · · fn whose size is O(m · exp(w)), where m is the number of variables and w is the treewidth of the interaction graph.\n\nThis theorem provides an upper bound on the size of PSDD compilations for both Bayesian and Markov networks. An analogous guarantee is available for SDD circuits of propositional models, using a special type of vtree known as a decision vtree [Oztok and Darwiche, 2014]. We next discuss our experiments, which focused on the compilation of Markov networks using decision vtrees.\nTo compile a Markov network, we first construct a decision vtree using a known technique.6 For each factor of the network, we project the vtree on the factor variables, and then compile the factor into a PSDD. This can be done in time linear in the factor size, but we omit the details here. 
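The vtree projection step used in this pipeline (Definition 9) has a simple recursive sketch (hypothetical Python; vtrees are encoded as nested pairs with variable names at the leaves): a subtree with no kept variables disappears, and its parent is replaced by the surviving sibling.

```python
# Project a vtree onto a subset of its variables (Definition 9).
# A vtree is either a leaf (a variable name) or a pair (left, right).
def project(vtree, keep):
    if isinstance(vtree, str):                 # leaf
        return vtree if vtree in keep else None
    left = project(vtree[0], keep)
    right = project(vtree[1], keep)
    if left is None:
        return right    # parent replaced by its surviving sibling
    if right is None:
        return left
    return (left, right)

v = (("A", "B"), ("C", "D"))
proj = project(v, {"A", "C"})   # -> ("A", "C")
```

Because every factor's PSDD is normalized for a projection of the same global vtree, the projections are pairwise compatible and the factors can be multiplied with the (more detailed version of the) multiply operator of Section 3.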
We \ufb01nally\nmultiply the obtained PSDDs. The order of multiplication is important to the overall ef\ufb01ciency of the\ncompilation approach. The order we used is as follows. We assign each PSDD to the lowest vtree\nnode containing the PSDD variables, and then multiply PSDDs in the order that we encounter them\nas we traverse the vtree bottom-up (this is analogous to compiling CNFs in Choi et al. [2013]).\nTable 1 summarizes our results. We compiled Markov networks into three types of arithmetic circuits.\nThe \ufb01rst compilation (AC1) is to decomposable and deterministic ACs using ace [Chavira and\nDarwiche, 2008].7 The second compilation (AC2) is also to decomposable and deterministic ACs, but\nusing the approach proposed in Choi et al. [2013]. The third compilation is to PSDDs as discussed\nabove. The \ufb01rst two approaches are based on reducing the inference problem into a weighted model\ncounting problem. In particular, these approaches encode the network using Boolean expressions,\nwhich are compiled to logical representations (d-DNNF or SDD), from which an arithmetic circuit is\ninduced. The systems underlying these approaches are quite complex and are the result of many years\nof engineering. In contrast, the proposed compilation to PSDDs does not rely on an intermediate\nrepresentation or additional boxes, such as d-DNNF or SDD compilers.\nThe benchmarks in Table 1 are from the UAI-14 Inference Competition.8 We selected all networks\nover binary variables in the MAR track, and report a network only if at least one approach successfully\ncompiled it (given time and space limits of 30 minutes and 16GB). We report the size (the number\nof edges) and time spent for each compilation. First, we note that for all benchmarks that compiled\nto both PSDD and AC2 (based on SDDs), the PSDD size is always smaller. 
This can be attributed\nin part to the fact that reductions to weighted model counting represent parameters explicitly as\nvariables, which are retained throughout the compilation process. In contrast, PSDD parameters are\nannotated on its edges. More interestingly, when we multiply two PSDD factors, the parameters of\nthe inputs may not persist in the output PSDD. That is, the PSDD only maintains enough parameters\nto represent the resulting distribution, which further explains the size differences.\nIn the Promedus benchmarks, we also see that in all but 5 cases, the compiled PSDD is smaller than\nAC1. However, several Grids benchmarks were compilable to AC1, but failed to compile to AC2\nor PSDD, given the time and space limits. On the other hand, we were able to compile some of the\nrelational benchmarks to PSDD, which did not compile to AC1 and compiled partially to AC2.\n\n6 Related Work\n\nTabular representations and their sparse variations (e.g., Larkin and Dechter [2003]) are typically\nused to represent factors for probabilistic inference and learning. Rules and decision trees are more\nsuccinct representations for modeling context-speci\ufb01c independence, although they are not much more\namenable to exact inference compared to tabular representations [Boutilier et al., 1996, Friedman and\nGoldszmidt, 1998]. 
6We used the minic2d package which is available at reasoning.cs.ucla.edu/minic2d/.\n7The ace system is publicly available at http://reasoning.cs.ucla.edu/ace/.\n8http://www.hlt.utdallas.edu/~vgogate/uai14-competition/index.html\n\n[Table 1: AC compilation size (number of edges) and time (in seconds), reporting per-network results for the AC1, AC2 and PSDD compilations on the Alchemy, Grids, Segmentation, relational and Promedus benchmarks.]\n\nDomain specific representations have been proposed, e.g., in computer vision [Felzenszwalb and Huttenlocher, 2006], to allow for more efficient factor operations. Algebraic Decision Diagrams (ADDs) and Algebraic Sentential Decision Diagrams (ASDDs) can also be used to multiply two factors in polytime [Bahar et al., 1993, Herrmann and de Barros, 2013], but their sizes can grow quickly with repeated multiplications: ADDs have a distinct node for each possible value of a factor/distribution. Since ADDs also support a polytime summing-out operation, ADDs are more commonly used in the context of variable elimination [Sanner and McAllester, 2005, Chavira and Darwiche, 2007], and in message passing algorithms [Gogate and Domingos, 2013]. 
Probabilistic Decision Graphs (PDGs) and AND/OR Multi-Valued Decision Diagrams (AOMDDs) support a polytime multiply operator, and also come with treewidth upper bounds when compiled from probabilistic graphical models [Jaeger, 2004, Mateescu et al., 2008]. Both PDGs and AOMDDs can be viewed as sub-classes of PSDDs that branch on variables instead of on sentences (as PSDDs do); this distinction can lead to exponential reductions in size [Xue et al., 2012, Bova, 2016].

7 Conclusion

We considered the tractability of multiplication and summing-out operators for arithmetic circuits (ACs), as tractable representations of factors and distributions. We showed that both operations are intractable for deterministic and decomposable ACs (under standard complexity-theoretic assumptions). We also showed that a sub-class of ACs, known as PSDDs, supports a polytime multiplication operator. Moreover, we showed that PSDDs do not support summing-out in polytime (unconditionally). Finally, we illustrated the utility of PSDD multiplication by providing a relatively simple but effective algorithm for compiling probabilistic graphical models into PSDDs.

Acknowledgments

This work was partially supported by NSF grant #IIS-1514253 and ONR grant #N00014-15-1-2339.

References

U. A. Acar, A. T. Ihler, R. R. Mettu, and Ö. Sümer. Adaptive inference on general graphical models. In UAI, pages 1–8, 2008.

R. I. Bahar, E. A. Frohm, C. M. Gaona, G. D. Hachtel, E. Macii, A. Pardo, and F. Somenzi. Algebraic decision diagrams and their applications. In ICCAD, pages 188–191, 1993.

J. Bekker, J. Davis, A. Choi, A. Darwiche, and G. Van den Broeck. Tractable learning for complex probability queries. In NIPS, 2015.

C. Boutilier, N. Friedman, M. Goldszmidt, and D. Koller. Context-specific independence in Bayesian networks. In UAI, pages 115–123, 1996.

S. Bova. SDDs are exponentially more succinct than OBDDs. In AAAI, pages 929–935, 2016.

H. Chan and A. Darwiche. On the robustness of most probable explanations. In UAI, 2006.

M. Chavira and A. Darwiche. Compiling Bayesian networks using variable elimination. In IJCAI, 2007.

M. Chavira and A. Darwiche. On probabilistic inference by weighted model counting. AIJ, 172(6–7):772–799, April 2008.

A. Choi and A. Darwiche. Relax, compensate and then recover. In T. Onada, D. Bekki, and E. McCready, editors, NFAI, volume 6797 of LNCS, pages 167–180. Springer, 2011.

A. Choi, D. Kisa, and A. Darwiche. Compiling probabilistic graphical models using sentential decision diagrams. In ECSQARU, pages 121–132, 2013.

A. Choi, G. Van den Broeck, and A. Darwiche. Tractable learning for structured probability spaces: A case study in learning preference distributions. In IJCAI, 2015.

A. Choi, N. Tavabi, and A. Darwiche. Structured features in naive Bayes classification. In AAAI, 2016.

A. Darwiche. On the tractable counting of theory models and its application to truth maintenance and belief revision. Journal of Applied Non-Classical Logics, 11(1–2):11–34, 2001a.

A. Darwiche. Decomposable negation normal form. J. ACM, 48(4):608–647, 2001b.

A. Darwiche. A differential approach to inference in Bayesian networks. J. ACM, 50(3):280–305, 2003.

A. Darwiche. Modeling and Reasoning with Bayesian Networks. Cambridge University Press, 2009.

A. Darwiche. SDD: A new canonical representation of propositional knowledge bases. In IJCAI, pages 819–826, 2011.

A. Darwiche and P. Marquis. A knowledge compilation map. JAIR, 17:229–264, 2002.

A. Darwiche, R. Dechter, A. Choi, V. Gogate, and L. Otten. Results from the probabilistic inference evaluation of UAI-08. 2008.

A. L. Delcher, A. J. Grove, S. Kasif, and J. Pearl. Logarithmic-time updates and queries in probabilistic networks. In UAI, pages 116–124, 1995.

P. F. Felzenszwalb and D. P. Huttenlocher. Efficient belief propagation for early vision. IJCV, 70(1):41–54, 2006.

N. Friedman and M. Goldszmidt. Learning Bayesian networks with local structure. In Learning in Graphical Models, pages 421–459. Springer, 1998.

V. Gogate and P. M. Domingos. Structured message passing. In UAI, 2013.

R. G. Herrmann and L. N. de Barros. Algebraic sentential decision diagrams in symbolic probabilistic planning. In Proceedings of the Brazilian Conference on Intelligent Systems (BRACIS), pages 175–181, 2013.

M. Jaeger. Probabilistic decision graphs – combining verification and AI techniques for probabilistic inference. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 12:19–42, 2004.

D. Kisa, G. Van den Broeck, A. Choi, and A. Darwiche. Probabilistic sentential decision diagrams. In KR, 2014.

D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.

D. Larkin and R. Dechter. Bayesian inference in the presence of determinism. In AISTATS, 2003.

D. Lowd and P. M. Domingos. Learning arithmetic circuits. In UAI, pages 383–392, 2008.

D. Lowd and A. Rooshenas. Learning Markov networks with arithmetic circuits. In AISTATS, pages 406–414, 2013.

R. Mateescu, R. Dechter, and R. Marinescu. AND/OR multi-valued decision diagrams (AOMDDs) for graphical models. J. Artif. Intell. Res. (JAIR), 33:465–519, 2008.

U. Oztok and A. Darwiche. On compiling CNF into decision-DNNF. In CP, pages 42–57, 2014.

J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.

H. Poon and P. M. Domingos. Sum-product networks: A new deep architecture. In UAI, pages 337–346, 2011.

A. Rooshenas and D. Lowd. Learning sum-product networks with direct and indirect variable interactions. In ICML, pages 710–718, 2014.

D. Roth. On the hardness of approximate reasoning. Artif. Intell., 82(1–2):273–302, 1996.

S. Sanner and D. A. McAllester. Affine algebraic decision diagrams (AADDs) and their application to structured probabilistic inference. In IJCAI, pages 1384–1390, 2005.

S. E. Shimony. Finding MAPs for belief networks is NP-hard. Artif. Intell., 68(2):399–410, 1994.

D. Sieling and I. Wegener. NC-algorithms for operations on binary decision diagrams. Parallel Processing Letters, 3:3–12, 1993.

Y. Xue, A. Choi, and A. Darwiche. Basing decisions on sentences in decision diagrams. In AAAI, pages 842–849, 2012.

J. Yedidia, W. Freeman, and Y. Weiss. Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Transactions on Information Theory, 51(7):2282–2312, 2005.