{"title": "Tractability in Structured Probability Spaces", "book": "Advances in Neural Information Processing Systems", "page_first": 3477, "page_last": 3485, "abstract": "Recently, the Probabilistic Sentential Decision Diagram (PSDD) has been proposed as a framework for systematically inducing and learning distributions over structured objects, including combinatorial objects such as permutations and rankings, paths and matchings on a graph, etc. In this paper, we study the scalability of such models in the context of representing and learning distributions over routes on a map. In particular, we introduce the notion of a hierarchical route distribution and show how it can be leveraged to construct tractable PSDDs over route distributions, allowing them to scale to larger maps. We illustrate the utility of our model empirically, in a route prediction task, showing how accuracy can be increased significantly compared to Markov models.", "full_text": "Tractability in Structured Probability Spaces\n\nArthur Choi\n\nUniversity of California\nLos Angeles, CA 90095\naychoi@cs.ucla.edu\n\nYujia Shen\n\nUniversity of California\nLos Angeles, CA 90095\nyujias@cs.ucla.edu\n\nAdnan Darwiche\n\nUniversity of California\nLos Angeles, CA 90095\ndarwiche@cs.ucla.edu\n\nAbstract\n\nRecently, the Probabilistic Sentential Decision Diagram (PSDD) has been proposed\nas a framework for systematically inducing and learning distributions over structured\nobjects, including combinatorial objects such as permutations and rankings,\npaths and matchings on a graph, etc. In this paper, we study the scalability of such\nmodels in the context of representing and learning distributions over routes on\na map. In particular, we introduce the notion of a hierarchical route distribution\nand show how it can be leveraged to construct tractable PSDDs over route\ndistributions, allowing them to scale to larger maps. 
We illustrate the utility of\nour model empirically, in a route prediction task, showing how accuracy can be\nincreased significantly compared to Markov models.\n\n1 Introduction\n\nA structured probability space is one where members of the space correspond to structured or\ncombinatorial objects, such as permutations, partial rankings, or routes on a map [Choi et al., 2015,\n2016]. Structured spaces have come into focus recently, given their large number of applications\nand the lack of systematic methods for inducing and learning distributions over such spaces. Some\nstructured objects are supported by specialized distributions, e.g., the Mallows distribution over\npermutations [Mallows, 1957, Lu and Boutilier, 2011]. For other types of objects, one is basically\non one's own as far as developing representations and corresponding algorithms for inference and\nlearning. Standard techniques, such as probabilistic graphical models, are not suitable for these kinds\nof distributions since the constraints on such objects often lead to almost fully connected graphical\nmodels, which are not amenable to inference or learning.\nA framework known as PSDD was proposed recently for systematically inducing and learning\ndistributions over structured objects [Kisa et al., 2014a,b, Shen et al., 2016, Liang et al., 2017].\nAccording to this framework, one first describes members of the space using propositional logic,\nthen compiles these descriptions into Boolean circuits with specific properties (a circuit encodes a\nstructured space by evaluating to 1 precisely on inputs corresponding to members of the space). By\nparameterizing these Boolean circuits, one can induce a tractable distribution over objects in the\nstructured space. The only domain-specific investment in this framework corresponds to the encoding\nof objects using propositional logic. 
Moreover, the only computational bottleneck in this framework\nis the compilation of propositional logic descriptions to circuits with specific properties, which\nare known as SDD circuits (for Sentential Decision Diagrams) [Darwiche, 2011, Xue et al., 2012].\nParameterized SDD circuits are known as PSDDs (for Probabilistic SDDs) and have attractive\nproperties, including tractable inference and closed-form parameter estimation under complete data\n[Kisa et al., 2014a].\nMost of the focus on PSDDs has been dedicated to showing how they can systematically induce and\nlearn distributions over various structured objects. Case studies have been reported relating to total\nand partial rankings [Choi et al., 2015], game traces, and routes on a map [Choi et al., 2016]. The\nscalability of these studies varied. For partial rankings, experiments have been reported for hundreds\nof items. \n\n31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.\n\nA B C Pr\n0 0 0 0.2\n0 0 1 0.2\n0 1 0 0.0\n0 1 1 0.1\n1 0 0 0.0\n1 0 1 0.3\n1 1 0 0.1\n1 1 1 0.1\n(a) Distribution\n\n[(b) SDD, (c) PSDD, and (d) Vtree: circuit and vtree diagrams]\n\nFigure 1: A probability distribution and its SDD/PSDD representation. The numbers annotating\nor-gates in (b) & (c) correspond to vtree node IDs in (d). While the circuit appears to be a tree, the\ninput variables are shared and hence the circuit is not a tree.\n\n
However, for total rankings and routes, the experimental studies were more of a proof of\nconcept, showing for example how the learned PSDD distributions can be superior to ones learned\nusing specialized or baseline methods [Choi et al., 2015].\nIn this paper, we study a particular structured space, while focusing on computational considerations.\nThe space we consider is that of routes on a map, leading to what we call route distributions. These\ndistributions are of great practical importance as they can be used to estimate traffic jams, predict\nspecific routes, and even project the impact of interventions, such as closing certain routes on a map.\nThe main contribution on this front is the notion of hierarchical simple-route distributions, which\ncorrespond to a hierarchical map representation that forces routes to be simple (no loops) at different\nlevels of the hierarchy. We show in particular how this advance leads to the notion of hierarchical\nPSDDs, allowing one to control the size of component PSDDs by introducing more levels of the\nhierarchy. This guarantees a representation of polynomial size, but at the expense of losing exactness\non some route queries. Not only does this advance the state-of-the-art for learning distributions over\nroutes, but it also suggests a technique that can potentially be applied in other contexts as well.\nThis paper is structured as follows. In Section 2, we review SDD circuits and PSDDs, and in Section 3\nwe turn to routes as a structured space and their corresponding distributions. Hierarchical distributions\nare treated in Section 4, with complexity and correctness guarantees. In Section 5, we discuss new\ntechniques for encoding and compiling a PSDD in a hierarchy. 
We present empirical results in\nSection 6, and finally conclude with some remarks in Section 7.\n\n2 Probabilistic SDDs\n\nPSDDs are a class of tractable probabilistic models, which were originally motivated by the need\nto represent probability distributions Pr (X) with many instantiations x attaining zero probability,\ni.e., a structured space [Kisa et al., 2014a, Choi et al., 2015, 2016]. Consider the distribution Pr (X)\nin Figure 1(a) for an example. To construct a PSDD for such a distribution, we perform the two\nfollowing steps. We first construct a special Boolean circuit that captures the zero entries in the\nfollowing sense; see Figure 1(b). For each instantiation x, the circuit evaluates to 0 at instantiation x\niff Pr (x) = 0. We then parameterize this Boolean circuit by including a local distribution on the\ninputs of each or-gate; see Figure 1(c). Such parameters are often learned from data.\nThe Boolean circuit underlying a PSDD is known as a Sentential Decision Diagram (SDD) [Darwiche,\n2011]. These circuits satisfy specific syntactic and semantic properties based on a binary tree, called a\nvtree, whose leaves correspond to variables; see Figure 1(d). SDD circuits alternate between or-gates\nand and-gates. Their and-gates have two inputs each and satisfy a property called decomposability:\neach input depends on a different set of variables. The or-gates satisfy a property called determinism:\nat most one input will be high under any circuit input. The role of the vtree is (roughly) to determine\nwhich variables will appear as inputs for gates.\n\nFigure 2: Two paths connecting s and t in a graph.\n\nA PSDD is obtained by including a distribution \u03b11, . . . , \u03b1n on the inputs of each or-gate; see\nagain Figure 1(c). The semantics of PSDDs are given in [Kisa et al., 2014a].1 The PSDD is a\ncomplete and canonical representation of probability distributions. 
That is, PSDDs can represent\nany distribution, and there is a unique PSDD for that distribution (under some conditions). A variety\nof probabilistic queries are tractable on PSDDs, including that of computing the probability of a\npartial variable instantiation and the most likely instantiation. Moreover, the maximum likelihood\nparameter estimates of a PSDD are unique given complete data, and these parameters can be computed\nefficiently using closed-form estimates; see [Kisa et al., 2014a] for details. Finally, PSDDs have been\nused to learn distributions over combinatorial objects, including rankings and permutations [Choi\net al., 2015], as well as paths and games [Choi et al., 2016]. In these applications, the Boolean circuit\nunderlying a PSDD captures variable instantiations that correspond to combinatorial objects, while\nits parameterization induces a distribution over these objects.\nAs a concrete example, PSDDs were used to induce distributions over the permutations of n items as\nfollows. We have a variable Xij for each i, j \u2208 {1, . . . , n} denoting that item i is at position j in the\npermutation. Clearly, not all instantiations of these variables correspond to (valid) permutations. An\nSDD circuit is then constructed, which outputs 1 iff the corresponding input corresponds to a valid\npermutation. Each parameterization of this SDD circuit leads to a distribution on permutations and\nthese parameterizations can be learned from data; see Choi et al. [2015].\n\n3 Route Distributions\n\nWe consider now the structured space of simple routes on a map, which correspond to connected and\nloop-free paths on a graph. Our ultimate goal here is to learn distributions over simple routes and use\nthem for reasoning about traffic, but we first discuss how to represent such distributions.\nConsider a map in the form of an undirected graph G and let X be a set of binary variables, which\nare in one-to-one correspondence with the edges of graph G. 
For example, the graph in Figure 2 will\nlead to 12 binary variables, one for each edge in the graph. A variable instantiation x will then be\ninterpreted as a set of edges in graph G. In particular, instantiation x includes edge e iff the edge\nvariable is set to true in instantiation x. As such, some of the instantiations x will correspond to\nroutes in G and others will not.2 In Figure 2, the left route corresponds to a variable instantiation in\nwhich 4 variables are set to true, while all other 8 variables are set to false.\nLet \u03b1G be a Boolean formula obtained by disjoining all instantiations x that correspond to routes\nin graph G. A probability distribution Pr (X) is called a route distribution iff it assigns a zero\nprobability to every instantiation x that does not correspond to a route, i.e., Pr (x) = 0 if x \u22ad \u03b1G.\nOne can systematically induce a route distribution over graph G by simply compiling the Boolean\nformula \u03b1G into an SDD, and then parameterizing the SDD to obtain a PSDD. This approach was\nactually proposed in Choi et al. [2016], where empirical results were shown for routes on grids of\nsize at most 8 nodes by 8 nodes.\nLet us now turn to simple routes, which are routes that do not contain loops. The path on the left of\nFigure 2 is simple, while the one on the right is not simple. Among the instantiations x corresponding\nto routes, some are simple routes and others are not. Let \u03b2G be a Boolean formula obtained by\ndisjoining all instantiations x that correspond to simple routes. We then have \u03b2G |= \u03b1G.\n\n1Let x be an instantiation of PSDD variables. If the SDD circuit outputs 0 at input x, then Pr (x) = 0.\nOtherwise, traverse the circuit top-down, visiting the (unique) high input of each visited or-node, and all inputs\nof each visited and-node. 
Then Pr (x) is the product of parameters visited during the traversal process.\n\n2An instantiation x corresponds to a route iff the edges it mentions positively can be ordered as a sequence\n(n1, n2), (n2, n3), (n3, n4), . . . , (nk\u22121, nk).\n\nFigure 3: The set of all s-t paths corresponds to concatenating edge (s, a) with all a-t paths and\nconcatenating edge (s, b) with all b-t paths.\n\nFigure 4: Partitioning a map into three regions (intersections are nodes of the graph and roads between\nintersections are edges of the graph). Regions have black boundaries. Red edges cross regions and\nblue edges are contained within a region.\n\nA simple-route distribution Pr (X) is a distribution such that Pr (x) = 0 if x \u22ad \u03b2G. Clearly, simple-route\ndistributions are a subclass of route distributions. One can also systematically represent and\nlearn simple-route distributions using PSDDs. In this case, one must compile the Boolean formula\n\u03b2G into an SDD whose parameters are then learned from data. Figure 3 shows one way to encode\nthis Boolean formula (recursively), as discussed in Choi et al. [2016]. More efficient approaches are\nknown, based on Knuth\u2019s Simpath algorithm [Knuth, 2009, Minato, 2013, Nishino et al., 2017].\nTo give a sense of current scalability when compiling simple routes into SDD circuits, Nishino et al.\n[2017] reported results on graphs with as many as 100 nodes and 140 edges for a single source and\ndestination pair. To put these results in perspective, we point out that we are not aware of how one\nmay obtain similar results using a standard probabilistic graphical model\u2014for example, a Bayesian or\na Markov network. 
Imposing complex constraints, such as the simple-route constraint, typically leads\nto highly-connected networks with high treewidths.3\nWhile PSDD scalability is favorable in this case\u2014when compared to probabilistic graphical models\u2014\nour goal is to handle problems that are significantly larger in scale. The classical direction for\nachieving this goal is to advance current circuit compilation technology, which would allow us\nto compile propositional logic descriptions that cannot be compiled today. We next propose an\nalternative, yet complementary, direction, which is based on the notion of hierarchical maps and the\ncorresponding notion of hierarchical distributions.\n\n4 Hierarchical Route Distributions\n\nA route distribution can be represented hierarchically if one imposes a hierarchy on the underlying\nmap, leading to a representation that is polynomial in size if one includes enough levels in the\nhierarchy. Under some conditions which we discuss later, the hierarchical representation can also\nsupport inference in time polynomial in its size. The penalty incurred due to this hierarchical\nrepresentation is a loss of exactness on some queries, which can be controlled as we discuss later.\n\n3If we can represent a uniform distribution of simple routes on a map, then we can count the number of\nsimple paths on a graph, which is a #P-complete problem [Valiant, 1979]. Hence, we do not in general expect a\nBayesian or Markov network for such a distribution to have bounded treewidth.\n\nWe start by discussing hierarchical maps, where a map is represented by a graph G as discussed\nearlier. Let N1, . . . , Nm be a partitioning of the nodes in graph G and let us call each Ni a region.\nThese regions partition edges X into B, A1, . . . , Am, where B are edges that cross regions and Ai\nare edges inside region Ni. 
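The edge partition just described can be sketched in a few lines of Python. This is an illustrative helper, not code from the paper; the name `partition_edges` and the `region_of` map (node to region index) are hypothetical, following the notation B, A1, . . . , Am above.

```python
def partition_edges(edges, region_of):
    """Split the edge variables X into the cross-region edges B and
    the per-region edge sets Ai, as described in the text."""
    B = []   # edges whose endpoints lie in different regions
    A = {}   # region index -> edges contained within that region
    for u, v in edges:
        if region_of[u] == region_of[v]:
            A.setdefault(region_of[u], []).append((u, v))
        else:
            B.append((u, v))
    return B, A
```

For example, with nodes {1, 2} in one region and {3, 4} in another, the edge (2, 3) lands in B while (1, 2) and (3, 4) land in their respective Ai.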
Consider the following decomposition for distributions over routes:\n\nPr (x) = Pr (b) \u220f_{i=1}^{m} Pr (ai | bi).   (1)\n\nWe refer to such a distribution as a decomposable route distribution.4 Here, Bi are edges that cross\nout of region Ni, and b, ai and bi are partial instantiations that are compatible with instantiation x.\nTo discuss the main insight behind this hierarchical representation, we need to first define a graph\nGB that is obtained from G by aggregating each region Ni into a single node. We also need to\ndefine subgraphs Gbi, obtained from G by keeping only edges Ai and the edges set positively in\ninstantiation bi (the positive edges of bi denote the edges used to enter and exit the region Ni).\nHence, graph GB is an abstraction of graph G, while each graph Gbi is a subset of G. Moreover, one\ncan think of each subgraph Gbi as a local map (for region i) together with a particular set of edges\nthat connects it to other regions. We can now state the following key observations. The distribution\nPr (B) is a route distribution for the aggregated graph GB. Moreover, each distribution Pr (Ai | bi)\nis a distribution over (sets of) routes for subgraph Gbi (in general, we may enter and exit a region\nmultiple times).\nHence, we are able to represent the route distribution Pr (X) using a set of smaller route distributions.\nOne of these distributions Pr (B) captures routes across regions. The others, Pr (Ai | bi), capture\nroutes that are within a region. The count of these smaller distributions is 1 + \u2211_{i=1}^{m} 2^{|Bi|}, which\nis exponential in the size of variable sets B1, . . . , Bm. We will later see that this count can be\npolynomial for some simple-route distributions.\nWe used \u03b1G to represent the instantiations corresponding to routes, and \u03b2G to represent the instantiations\ncorresponding to simple routes, with \u03b2G |= \u03b1G. 
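To make the space that \u03b2G describes concrete, the membership test for a simple route (a set of positively-set edges forming one connected, loop-free path) can be sketched as follows. This is an illustrative check only, not the encoding used for compilation; the function name is hypothetical.

```python
from collections import defaultdict

def is_simple_route(edges):
    """Return True iff the given undirected edges form a single
    connected, loop-free path (a simple route)."""
    if not edges:
        return False
    degree = defaultdict(int)
    adj = defaultdict(list)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        adj[u].append(v)
        adj[v].append(u)
    # A simple path has exactly two endpoints of degree 1,
    # and no node of degree greater than 2.
    endpoints = [n for n, d in degree.items() if d == 1]
    if len(endpoints) != 2 or any(d > 2 for d in degree.values()):
        return False
    # Check connectivity by walking from one endpoint.
    seen, stack = set(), [endpoints[0]]
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        stack.extend(adj[n])
    return len(seen) == len(degree)
```

Under this test, the left path of Figure 2 passes, while a set of edges containing a loop or a disconnected pair of segments fails.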
Some of these simple routes are also simple\nwith respect to the aggregated graph GB (i.e., they will not visit a region Ni more than once), while\nother simple routes are not simple with respect to graph GB. Let \u03b3G be the Boolean expression\nobtained by disjoining instantiations x that correspond to simple routes that are also simple (and\nnon-empty) with respect to graph GB.5 We then have \u03b3G |= \u03b2G |= \u03b1G and the following result.\nTheorem 1 Consider graphs G, GB and Gbi as indicated above. Let Pr (B) be a simple-route\ndistribution for graph GB, and Pr (Ai | bi) be a simple-route distribution for graph Gbi. Then the\nresulting distribution Pr (X), as defined by Equation 1, is a simple-route distribution for graph G.\n\nThis theorem will not hold if Pr (B) were not a simple-route distribution for graph GB. That is,\nhaving each distribution Pr (Ai | bi) be a simple-route distribution for graph Gbi is not sufficient\nfor the hierarchical distribution to be a simple-route distribution for G.\nHierarchical distributions that satisfy the conditions of Theorem 1 will be called hierarchical simple-route\ndistributions.\n\nTheorem 2 Let Pr (X) be a hierarchical simple-route distribution for graph G and let \u03b3G be as\nindicated above. We then have Pr (x) = 0 if x \u22ad \u03b3G.\nThis means that the distribution will assign a zero probability to all instantiations x |= \u03b2G \u2227 \u00ac\u03b3G.\nThese instantiations correspond to routes that are simple for graph G but not simple for graph\nGB. Hence, simple-route hierarchical distributions correspond to a subclass of the simple-route\ndistributions for graph G. This subclass, however, is interesting for the following reason.\n\nTheorem 3 Consider a hierarchical simple-route distribution Pr (X) and let x be an instantiation\nthat sets more than two variables in some Bi to true. 
Then Pr (x) = 0.\n\n4Note that not all route distributions can be decomposed as such: the decomposition implies the independence\nof routes on edges Ai given the route on edges B.\n\n5For most practical cases, the independence assumption of the hierarchical decomposition will dictate that\nroutes on GB be non-empty. An empty route on GB corresponds to a route contained within a single region,\nwhich we can accommodate using a route distribution for the single region.\n\nBasically, a route that is simple for graph GB cannot enter and leave a region more than once.\nCorollary 1 The hierarchical simple-route distribution Pr (X) can be constructed from distribution\nPr (B) and distributions Pr (Ai | bi) for which bi sets no more than two variables to true.\nCorollary 2 The hierarchical simple-route distribution Pr (X) can be represented by a data structure\nwhose size is O(2^{|B|} + \u2211_{i=1}^{m} 2^{|Ai|} |Bi|^2).\n\nIf we choose our regions Ni to be small enough, then 2^{|Ai|} can be treated as a constant. A tabular\nrepresentation of the simple-route distribution Pr (B) has size O(2^{|B|}). If representing this table is\npractical, then inference is also tractable (via variable elimination). However, this distribution can\nitself be represented by a simple-route hierarchical distribution. This process can continue until we\nreach a simple-route distribution that admits an efficient representation. We can therefore obtain a\nfinal representation which is polynomial in the number of variables X and, hence, polynomial in the\nsize of graph G (however, inference may no longer be tractable).\nIn our approach, we represent the distributions Pr (B) and Pr (Ai | bi) using PSDDs. 
This allows\nthese distributions to be over a relatively large number of variables (on the order of hundreds), which\nwould not be feasible if we used more classical representations, such as graphical models.\nThis hierarchical representation, which is both small and admits polytime inference, is an approximation\nas shown by the following theorem.\n\nTheorem 4 Consider a decomposable route distribution Pr (X) (as in Equation 1), the corresponding\nhierarchical simple-route distribution Pr (X | \u03b3G), and a query \u03b1 over variables X. The error of\nthe query Pr (\u03b1 | \u03b3G), relative to Pr (\u03b1), is:\n\n(Pr (\u03b1 | \u03b3G) \u2212 Pr (\u03b1)) / Pr (\u03b1 | \u03b3G) = Pr (\u03baG) [ 1 \u2212 Pr (\u03b1 | \u03baG) / Pr (\u03b1 | \u03b3G) ]\n\nwhere \u03baG = \u03b2G \u2227 \u00ac\u03b3G denotes simple routes in G that are not simple routes in GB.\nThe conditions of this theorem basically require the two distributions to agree on the relative probabilities\nof simple routes that are also simple in GB. Note also that Pr (\u03b3G) + Pr (\u03baG) = 1. Hence, if\nPr (\u03b3G) \u2248 1, then we expect the hierarchical distribution to be accurate. This happens when most\nsimple routes are also simple in GB, a condition that may be met by a careful choice of map regions.6\nAt one extreme, if each region has at most two edges crossing out of it, then Pr (\u03b3G) = 1 and the\nhierarchical distribution is exact.\nHierarchical simple-route distributions will assign a zero probability to routes x that are simple in G\nbut not in GB. 
However, under a mild condition on the hierarchy, we can guarantee that if there is a\nsimple route between nodes s and t in G, there is also a simple route that is simple for GB.\n\nProposition 1 If the subgraphs Gbi are connected, then there is a simple route connecting s and t\nin G iff there is a simple route connecting s and t in G that is also a simple route for GB.\n\nUnder this condition, hierarchical simple-route distributions will provide an approximation for any\nsource/destination query.\nOne can compute marginal and MAP queries in polytime on a hierarchical distribution, assuming\nthat one can (in polytime) multiply and sum out variables from its component distributions\u2014we\nbasically need to sum out variables Bi from each Pr (Ai|bi), then multiply the results with Pr (B).\nIn our experiments, however, we follow a more direct approach to inference, in which we multiply all\ncomponent distributions (PSDDs), to yield one PSDD for the hierarchical distribution. This is not\nalways guaranteed to be efficient, but leads to a much simpler implementation.\n\n5 Encoding and Compiling Routes\n\nRecall that constructing a PSDD involves two steps: constructing an SDD that represents the\nstructured space, and then parameterizing the SDD. In this section, we discuss how to construct\nan SDD that represents the structured space of hierarchical, simple routes. Subsequently, in our\nexperiments, we shall learn the parameters of the PSDD from data.\n\n6If \u03b1 is independent of \u03b3G (and hence \u03b1 is independent of \u03baG), then the approximation is also exact. At this\npoint, however, we do not know of an intuitive characterization of queries \u03b1 that satisfy this property.\n\nFigure 5: Partitioning of the area around the Financial District of San Francisco into regions.\n\nWe first consider the space of simple routes that are not necessarily hierarchical. 
Note here that an\nSDD of a Boolean formula can be constructed bottom-up, starting with elementary SDDs representing\nliterals and constants, and then constructing more complex SDDs from them using conjoin, disjoin,\nand negation operators implemented by an SDD library. This approach can be used to construct an\nSDD that encodes simple routes, using the idea from Figure 3, which is discussed in more detail in\nChoi et al. [2016]. The GRAPHILLION library can be used to construct a Zero-suppressed Decision\nDiagram (ZDD) representing all simple routes for a given source/destination pair [Inoue et al., 2014].\nThe ZDDs can then be disjoined across all source and destination pairs, and then converted to an\nSDD. An even more efficient algorithm was proposed recently for compiling simple routes to ZSDDs,\nwhich we used in our experiments [Nishino et al., 2016, 2017].\nConsider now the space of hierarchical simple routes induced by regions N1, . . . , Nm of graph\nG, with a corresponding partition of edges into B, A1, . . . , Am, as discussed earlier. To compile\nan SDD for the hierarchical, simple routes of G, we first compile an SDD representing the simple\nroutes over each region. That is, for each region Ni, we take the graph induced by the edges Ai\nand Bi, and compile an SDD representing all its simple routes (as described above). Similarly, we\ncompile an SDD representing the simple routes of the abstracted graph GB. At this point, we have a\nhierarchical, simple-route distribution in which components are represented as PSDDs and that we\ncan do inference on using multiplication and summing-out as discussed earlier.\nIn our experiments, however, we take the extra step of multiplying all the m + 1 component PSDDs,\nto yield a single PSDD over the structured space of hierarchical, simple routes. 
This simplifies\ninference and learning as we can now use the linear-time inference and learning procedures known\nfor PSDDs [Kisa et al., 2014a].7\n\n6 Experimental Results\n\nIn our experiments, we considered a dataset consisting of GPS data collected from taxicab routes\nin San Francisco.8 We acquired public map data from http://www.openstreetmap.org/, i.e.,\nthe undirected graph representing the streets (edges) and intersections (nodes) of San Francisco.\nWe projected the GPS data onto the San Francisco graph using the map-matching API of the\ngraphhopper package.9 For more on map-matching, see, e.g., [Froehlich and Krumm, 2008].\n\n7In our experiments, we use an additional simplification. Recall from Footnote 5 that if bi sets all variables\nnegatively (i.e., no edges), then Gbi is empty. We now allow the case that Gbi contains all edges Ai (by disjoining\nthe corresponding SDDs). Intuitively, this optionally allows a simple path to exist strictly in region Ni. While\nthe global SDD no longer strictly represents hierarchical simple paths (it may allow sets of independent simple\npaths at once), we do not have to treat simple paths that are confined to a single region as a special case.\n\n8Available at http://crawdad.org/epfl/mobility/20090224/.\n9Available at https://www.graphhopper.com.\n\nTo partition the graph of San Francisco into regions, we obtained a publicly available dataset of\ntraffic analysis zones, produced by the California Metropolitan Transportation Commission.10 These\nzones correspond to small area neighborhoods and communities of the San Francisco Bay Area. To\nfacilitate the compilation of regions into SDDs, we further split these zones in half until each region\nwas compilable (horizontally if the region was taller than it was wide, or vertically otherwise). 
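The recursive zone-splitting heuristic just described can be sketched as follows. This is a hypothetical illustration: zones are modeled as (width, height) rectangles, and `is_compilable`, `split_h`, and `split_v` stand in for the actual compilability test and geometric splits.

```python
def split_until_compilable(zone, is_compilable, split_h, split_v):
    """Recursively halve a zone until every piece passes is_compilable:
    split horizontally when taller than wide, vertically otherwise."""
    width, height = zone
    if is_compilable(zone):
        return [zone]
    halves = split_h(zone) if height > width else split_v(zone)
    pieces = []
    for half in halves:
        pieces.extend(split_until_compilable(half, is_compilable, split_h, split_v))
    return pieces
```

For instance, with a compilability test based on area, a 2-by-2 zone would be split into four unit-sized pieces.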
Finally,\nwe restricted our attention to areas around the Financial District of San Francisco, which we were\nable to compile into a hierarchical distribution using one level of abstraction; see Figure 5.\nGiven the routes over the graph of San Francisco, we first filtered out any routes that did not\ncorrespond to a simple path on the San Francisco graph. We next took all routes that were contained\nsolely in the region under consideration. We further took any sub-route that passed through this\nregion, as a route for our region. In total, we were left with 87,032 simple routes. We used half for\ntraining, and the other half for testing. For the training set, we also removed all simple routes that\nwere not simple in the hierarchy. We did not remove such routes for the purposes of testing. We\nfirst compiled an SDD of hierarchical simple routes over the region, leading to an SDD with 62,933\nnodes and 152,140 free parameters. We then learned the parameters of our PSDD from the training\nset, assuming Laplace smoothing [Kisa et al., 2014a].\nWe considered a route prediction task where we predict the next road segment, given the route taken\nso far; see, e.g., [Letchner et al., 2006, Simmons et al., 2006, Krumm, 2008]. That is, for each route\nof the testing set, we consider one edge at a time and try to predict the next edge, given the edges\nobserved so far. We consider three approaches: (1) a naive baseline that uses the relative frequency of\nedges to predict the next edge, while discounting the last-used edge, (2) a Markov model that predicts,\ngiven the last-used edge, what edge would be the most likely one to be traversed next, and (3) a PSDD\nthat predicts the next edge given the current partial route as well as the destination. The last assumption is often the situation in\nreality, given the ubiquity of GPS routing applications on mobile phones. 
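For concreteness, the Markov baseline (2) can be sketched as follows. This is a minimal, hypothetical version (the paper does not give code): routes are treated as sequences of edge identifiers, and prediction picks the most frequent successor of the last-used edge.

```python
from collections import Counter, defaultdict

def train_markov(routes):
    """Count edge-to-edge transitions over a corpus of routes."""
    counts = defaultdict(Counter)
    for route in routes:
        for prev_edge, next_edge in zip(route, route[1:]):
            counts[prev_edge][next_edge] += 1
    return counts

def predict_next(counts, last_edge):
    """Most frequent successor of the last-used edge, or None if unseen."""
    successors = counts.get(last_edge)
    if not successors:
        return None
    return successors.most_common(1)[0][0]
```

Unlike the PSDD, this predictor conditions only on the last-used edge; it has no way to incorporate the destination or the earlier history of the route.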
We remark that Markov\nmodels and HMMs are less amenable to accepting a destination as an observation.\nFor the PSDD, the current partial route and the last edge to be used (i.e., the destination) are given as\nevidence e. The evidence for an endpoint (source or destination) is the edge used (set positively),\nwhere the remaining edges are assumed to be unused (and set negatively). For internal nodes on\na route, two edges (entering and exiting a node) are set positively and the remaining edges are\nset negatively in the evidence. To predict the next edge on a partial route, we consider the edges\nX incident to the current node and compute their marginal probabilities Pr (X | e) according to\nthe PSDD. The probability of the last edge used in the partial route is 1, which we ignore. The\nremaining edges have a probability that sums to a value less than one; one minus this probability is\nthe probability that the route ends at the current node. Among all these options, we pick the most\nlikely as our prediction (either navigate to a new edge, or stop).\nNote that for the purposes of training our PSDD, we removed those simple routes that were not simple\non the hierarchy. When testing, such routes have a probability of zero on our PSDD. Moreover, partial\nroutes may also have zero probability if they cannot be extended to a hierarchical simple route. In\nthis case, we cannot compute the marginals Pr (X | e). 
Hence, we simply unset our evidence, one edge at a time in the order that we set it (unsetting negative edges before positive edges), until the evidence becomes consistent again relative to the PSDD.

We summarize the relative accuracies over 43,516 total testing routes:

model       naive                      Markov                     PSDD
accuracy    0.736 (326,388/443,481)    0.820 (363,536/443,481)    0.931 (412,958/443,481)

For each model, we report the accuracy averaged over all steps on all paths, ignoring those steps where the prediction is trivial (i.e., there is only one edge or no edge available to be used next). We find that the PSDD is much more accurate at predicting the next road segment, compared to the Markov model and the naive baseline. Indeed, this could be expected as (1) the PSDD uses the history of the route so far and, perhaps more importantly, (2) it utilizes knowledge of the destination.

10. Available at https://purl.stanford.edu/fv911pc4805.

7 Conclusion

In this paper, we considered Probabilistic Sentential Decision Diagrams (PSDDs) representing distributions over routes on a map, or equivalently, simple paths on a graph. We considered a hierarchical approximation of simple-route distributions, and examined its relative tractability and its accuracy. We showed how this perspective can be leveraged to represent and learn more scalable PSDDs for simple-route distributions. In a route prediction task, we showed that PSDDs can take advantage of the available observations, such as the route taken so far and the destination of a trip, to make more accurate predictions.

Acknowledgments

We thank Noah Hadfield-Menell and Andy Shih for their contributions, and Eunice Chen for helpful discussions. This work has been partially supported by NSF grant #IIS-1514253, ONR grant #N00014-15-1-2339, and DARPA XAI grant #N66001-17-2-4032.

References

A. Choi, G. Van den Broeck, and A. Darwiche.
Tractable learning for structured probability spaces: A case study in learning preference distributions. In Proceedings of IJCAI, 2015.

A. Choi, N. Tavabi, and A. Darwiche. Structured features in naive Bayes classification. In Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI), 2016.

A. Darwiche. SDD: A new canonical representation of propositional knowledge bases. In Proceedings of IJCAI, pages 819–826, 2011.

J. Froehlich and J. Krumm. Route prediction from trip observations. Technical report, SAE Technical Paper, 2008.

T. Inoue, H. Iwashita, J. Kawahara, and S.-i. Minato. Graphillion: Software library for very large sets of labeled graphs. International Journal on Software Tools for Technology Transfer, pages 1–10, 2014.

D. Kisa, G. Van den Broeck, A. Choi, and A. Darwiche. Probabilistic sentential decision diagrams. In KR, 2014a.

D. Kisa, G. Van den Broeck, A. Choi, and A. Darwiche. Probabilistic sentential decision diagrams: Learning with massive logical constraints. In ICML Workshop on Learning Tractable Probabilistic Models (LTPM), 2014b.

D. E. Knuth. The Art of Computer Programming, Volume 4, Fascicle 1: Bitwise Tricks & Techniques; Binary Decision Diagrams. Addison-Wesley Professional, 2009.

J. Krumm. A Markov model for driver turn prediction. Technical report, SAE Technical Paper, 2008.

J. Letchner, J. Krumm, and E. Horvitz. Trip router with individualized preferences (TRIP): Incorporating personalization into route planning. In AAAI, pages 1795–1800, 2006.

Y. Liang, J. Bekker, and G. Van den Broeck. Learning the structure of probabilistic sentential decision diagrams. In Proceedings of the 33rd Conference on Uncertainty in Artificial Intelligence (UAI), 2017.

T. Lu and C. Boutilier. Learning Mallows models with pairwise preferences. In Proceedings of ICML, pages 145–152, 2011.

C. L. Mallows. Non-null ranking models.
Biometrika, 1957.

S. Minato. Techniques of BDD/ZDD: Brief history and recent activity. IEICE Transactions, 96-D(7):1419–1429, 2013.

M. Nishino, N. Yasuda, S. Minato, and M. Nagata. Zero-suppressed sentential decision diagrams. In AAAI, pages 1058–1066, 2016.

M. Nishino, N. Yasuda, S. Minato, and M. Nagata. Compiling graph substructures into sentential decision diagrams. In Proceedings of the Thirty-First Conference on Artificial Intelligence (AAAI), 2017.

Y. Shen, A. Choi, and A. Darwiche. Tractable operations for arithmetic circuits of probabilistic models. In Advances in Neural Information Processing Systems 29 (NIPS), 2016.

R. Simmons, B. Browning, Y. Zhang, and V. Sadekar. Learning to predict driver route and destination intent. In Intelligent Transportation Systems Conference, pages 127–132, 2006.

L. G. Valiant. The complexity of enumeration and reliability problems. SIAM J. Comput., 8(3):410–421, 1979.

Y. Xue, A. Choi, and A. Darwiche. Basing decisions on sentences in decision diagrams. In AAAI, pages 842–849, 2012.