{"title": "Constraint-based Causal Structure Learning with Consistent Separating Sets", "book": "Advances in Neural Information Processing Systems", "page_first": 14257, "page_last": 14266, "abstract": "We consider constraint-based methods for causal structure learning, such as the PC algorithm or any PC-derived algorithms whose \ufb01rst step consists in pruning a complete graph to obtain an undirected graph skeleton, which is subsequently oriented. All constraint-based methods perform this \ufb01rst step of removing dispensable edges, iteratively, whenever a separating set and corresponding conditional independence can be found. Yet, constraint-based methods lack robustness over sampling noise and are prone to uncover spurious conditional independences in \ufb01nite datasets. In particular, there is no guarantee that the separating sets identi\ufb01ed during the iterative pruning step remain consistent with the \ufb01nal graph. In this paper, we propose a simple modi\ufb01cation of PC and PC-derived algorithms so as to ensure that all separating sets identi\ufb01ed to remove dispensable edges are consistent with the \ufb01nal graph,thus enhancing the explainability of constraint-basedmethods. It is achieved by repeating the constraint-based causal structure learning scheme, iteratively, while searching for separating sets that are consistent with the graph obtained at the previous iteration. Ensuring the consistency of separating sets can be done at a limited complexity cost, through the use of block-cut tree decomposition of graph skeletons, and is found to increase their validity in terms of actual d-separation. It also signi\ufb01cantly improves the sensitivity of constraint-based methods while retaining good overall structure learning performance. 
Finally and foremost, ensuring sepset consistency improves the interpretability of constraint-based models for real-life applications.", "full_text": "Constraint-based Causal Structure Learning with\n\nConsistent Separating Sets\n\nHonghao Li, Vincent Cabeli, Nadir Sella, Herv\u00e9 Isambert\u2217\nInstitut Curie, PSL Research University, CNRS UMR168, Paris\n\n{honghao.li, vincent.cabeli, nadir.sella, herve.isambert}@curie.fr\n\nAbstract\n\nWe consider constraint-based methods for causal structure learning, such as the PC\nalgorithm or any PC-derived algorithms whose \ufb01rst step consists in pruning a com-\nplete graph to obtain an undirected graph skeleton, which is subsequently oriented.\nAll constraint-based methods perform this \ufb01rst step of removing dispensable edges,\niteratively, whenever a separating set and corresponding conditional independence\ncan be found. Yet, constraint-based methods lack robustness over sampling noise\nand are prone to uncover spurious conditional independences in \ufb01nite datasets. In\nparticular, there is no guarantee that the separating sets identi\ufb01ed during the itera-\ntive pruning step remain consistent with the \ufb01nal graph. In this paper, we propose\na simple modi\ufb01cation of PC and PC-derived algorithms so as to ensure that all\nseparating sets identi\ufb01ed to remove dispensable edges are consistent with the \ufb01nal\ngraph, thus enhancing the explainability of constraint-based methods. It is achieved\nby repeating the constraint-based causal structure learning scheme, iteratively,\nwhile searching for separating sets that are consistent with the graph obtained at\nthe previous iteration. 
Ensuring the consistency of separating sets can be done at a\nlimited complexity cost, through the use of block-cut tree decomposition of graph\nskeletons, and is found to increase their validity in terms of actual d-separation.\nIt also signi\ufb01cantly improves the sensitivity of constraint-based methods while\nretaining good overall structure learning performance. Finally and foremost, ensur-\ning sepset consistency improves the interpretability of constraint-based models for\nreal-life applications.\n\n1 Introduction\n\nWhile the oracle versions of constraint-based methods have been demonstrated to be sound and\ncomplete (Zhang, 2008; Spirtes, Glymour, and Scheines, 2000; Pearl, 2009), a major limitation of\nthese methods is their lack of robustness with respect to sampling noise for \ufb01nite datasets. This\nhas largely limited their use to analyze real-life data so far, although important advances have been\nmade lately, in particular, to limit the order-dependency of constraint-based methods (Colombo\nand Maathuis, 2014) or to improve their robustness to sampling noise by recasting them within a\nmaximum likelihood framework (Affeldt and Isambert, 2015; Affeldt, Verny, and Isambert, 2016).\nHowever, it remains that constraint-based methods still lack graph consistency, in practice, as they do\nnot guarantee that the learnt structures belong to their presumed class of graphical models, such as a\ncompleted partially directed acyclic graph (CPDAG) model for the PC (Spirtes and Glymour, 1991;\nKalisch and B\u00fchlmann, 2008; Kalisch et al., 2012) or IC (Pearl and Verma, 1991) algorithms, or a\npartial ancestral graph (PAG) for FCI or related constraint-based algorithms allowing for unobserved\nlatent variables (Spirtes, Meek, and Richardson, 1999; Richardson and Spirtes, 2002; Colombo et\nal., 2012; Verny et al., 2017; Sella et al., 2018). 
\u2217corresponding author

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

By contrast, search-and-score structure learning methods (Koller and Friedman, 2009) inherently enforce graph consistency by searching for structures within the assumed class of graphs, e.g., the class of directed acyclic graphs (DAGs). Similarly, hybrid methods such as MMHC (Tsamardinos, Brown, and Aliferis, 2006) can also ensure graph-class consistency by maximizing the likelihood of edge orientations within the class of DAGs.
This paper concerns, more specifically, the inconsistency of separating sets used to remove dispensable edges, iteratively, based on conditional independence tests. This inconsistency arises because some separating sets may no longer be compatible with the final graph, if they were not already incompatible with the current skeleton when the conditional independence was tested during the pruning process. It occurs, for instance, when a node in a separating set does not lie on any indirect path linking the extremities of a removed edge, as noted in (Spirtes, Glymour, and Scheines, 2000). Such inconsistencies are a major shortcoming of constraint-based methods, as the primary motivation for learning and visualizing graphical models is arguably to be able to read off conditional independences directly from the graph structure (Spirtes, Glymour, and Scheines, 2000; Pearl, 2009).
In the following, we propose a simple modification of PC or PC-derived algorithms that ensures that all conditional independences identified and used to remove dispensable edges are consistent with the final graph. It is achieved by repeating the constraint-based causal structure learning scheme, iteratively, while searching for separating sets that are consistent with the graph obtained at the previous iteration, until a limit cycle of successive graphs is reached. 
The union of the graphs over this limit cycle is then guaranteed to be consistent with the separating sets and corresponding conditional independences used to remove all dispensable edges from the initial complete graph. Enforcing sepset consistency of constraint-based methods is found to limit their tendency to uncover spurious conditional independences early on in the pruning process, when the combinatorial space of possible separating sets is still large. As a result, enforcing sepset consistency reduces the large number of false negative edges usually produced by constraint-based methods (Colombo and Maathuis, 2014) and thereby achieves a better balance between their sensitivity and precision. Ensuring the consistency of separating sets is also found to increase their validity in terms of actual d-separation and, therefore, to improve the interpretability of constraint-based models for real-life applications. Moreover, ensuring the consistency of separating sets can be done at a limited complexity cost, through the use of block-cut tree decomposition of graph skeletons, which makes it possible to learn causal structures with consistent separating sets for a few hundred nodes. By contrast, earlier methods aiming at reducing the number of d-separation conflicts or other structural inconsistencies through SAT-based approaches, e.g. (Hyttinen et al., 2013), have a much larger complexity burden, which limits their applications to very small networks in practice.

2 Result

2.1 Background

2.1.1 Terminology
A graph G(V, E) consists of a vertex set V = {X1, ..., Xp} and an edge set E. All graphs considered here have at most one edge between any pair of vertices. A walk is a sequence of edges joining a sequence of vertices. A trail is a walk without repeated edges. A path is a trail without repeated vertices. 
A cycle is a trail in which the only repeated vertices are the first and last vertices. Vertices are said to be adjacent if there is an edge between them. If all pairs of vertices in a graph are adjacent, it is called a complete graph and is denoted by Gc. By contrast, an empty graph, denoted by G∅, consists of isolated vertices with no edges. The adjacency set of a vertex Xi in a graph G, denoted by adj(G, Xi), is the set of all vertices in V that are adjacent to Xi in G. If an edge is directed, as X → Y, X is a parent of Y and Y a child of X. A collider is a triple (Xi, Xj, Xk) in a graph where the edges are oriented as Xi → Xk ← Xj. A v-structure is a collider for which Xi and Xj are not adjacent. Given a statistical significance level α, the conditional independence of a pair of variables (Xi, Xj) given a set of variables C is denoted by (Xi ⊥⊥ Xj | C)α, where C is called a separating set or "sepset" for (Xi, Xj).

2.1.2 The PC and PC-stable Algorithms

The PC algorithm (Spirtes and Glymour, 1991), outlined in algorithm 1, is the archetype of constraint-based structure learning methods (Spirtes, Glymour, and Scheines, 2000; Pearl, 2009), as illustrated in Figure 1. Given a dataset over a set of variables (vertices), it starts from a complete graph G. By a series of statistical tests on each pair of variables, all dispensable edges X − Y are removed if a (conditional) independence and separating set C can be found, i.e. (X ⊥⊥ Y | C) (step 1). The resulting undirected graph is called the skeleton. V-structures are then identified, X → Z ← Y, if (X ⊥⊥ Y | C) and Z ∉ C (step 2). 
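As a minimal illustration of step 2 (a sketch, not the implementation used in the paper; the `skeleton`/`sepset` encodings and the function name are our assumptions), unshielded triples can be oriented as follows:

```python
def orient_v_structures(skeleton, sepset):
    """Step 2 of the PC scheme: for every unshielded triple X - Z - Y
    (X, Y non-adjacent), orient X -> Z <- Y when Z is absent from the
    separating set recorded for (X, Y)."""
    arrows = set()  # directed edges as (parent, child) pairs
    for z, nbrs in skeleton.items():
        for x in nbrs:
            for y in nbrs:
                if x < y and y not in skeleton[x]:  # unshielded triple
                    if z not in sepset.get(frozenset((x, y)), set()):
                        arrows.add((x, z))
                        arrows.add((y, z))
    return arrows
```

Here `skeleton` maps each vertex to its neighbor set and `sepset` maps unordered vertex pairs to the separating set used to remove their edge.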
Additional assumptions (e.g., acyclicity) allow for the propagation of v-structure orientations to some of the remaining undirected edges (Zhang, 2008) (step 3).

Algorithm 1 The PC Algorithm
Require: V, D(V), significance level α
Step 1: Find the graph skeleton and separating sets of removed edges
Step 2: Orient v-structures based on separating sets
Step 3: Propagate orientations of v-structures to as many remaining undirected edges as possible
return Output graph

Figure 1: General procedure of constraint-based structure learning. [Four panels on a six-node example: the complete graph, the skeleton after Step 1, the identified v-structures after Step 2, and the propagated orientations after Step 3.]

While the oracle version of the PC algorithm has been shown to be sound and complete, its application is known to be sensitive to the finite size of real-life datasets. In particular, the PC algorithm in its original implementation (Spirtes, Glymour, and Scheines, 2000) is known to be order-dependent, in the sense that the output depends on the lexicographic order of the variables. 
This issue can be circumvented, however, for the first step of algorithm 1 with a simple modification given in algorithm 2 and referred to as Step 1 of the PC-stable algorithm (Colombo and Maathuis, 2014).

Algorithm 2 Find skeleton and separating sets (Step 1 of PC-stable algorithm)
Require: Conditional independence assessment between all variables V with significance level α
G ← Gc
ℓ ← −1
repeat
  ℓ ← ℓ + 1
  for all vertices Xi ∈ G do
    a(Xi) ← adj(G, Xi)
  end for
  repeat
    select a new pair of vertices (Xi, Xj) adjacent in G and satisfying |a(Xi)\{Xj}| ≥ ℓ
    repeat
      choose a new C ⊆ a(Xi)\{Xj} with |C| = ℓ
      if (Xi ⊥⊥ Xj | C)α then
        Delete edge Xi − Xj from G
        Sepset(Xi, Xj | G) = Sepset(Xj, Xi | G) ← C
      end if
    until Xi and Xj are no longer adjacent in G or all C ⊆ a(Xi)\{Xj} with |C| = ℓ have been considered
  until all pairs of adjacent vertices (Xi, Xj) in G with |a(Xi)\{Xj}| ≥ ℓ have been considered
until all pairs of adjacent vertices (Xi, Xj) in G satisfy |a(Xi)\{Xj}| ≤ ℓ
return G, sepsets

2.2 The Consistent PC Algorithm

2.2.1 Lack of Robustness and Consistency of Constraint-based Methods
Beyond the order-dependence of the PC algorithm, the general lack of robustness of constraint-based methods stems from their tendency to uncover spurious conditional independences (false negatives) between variables. 
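(For concreteness, the level-wise skeleton search of algorithm 2 can be sketched as follows; this is a minimal sketch, not the paper's implementation, where `indep(x, y, cond)` is an assumed stand-in for the α-level conditional independence test.)

```python
from itertools import combinations

def pc_stable_skeleton(nodes, indep):
    """Sketch of algorithm 2: prune a complete graph level by level,
    recording the separating set of every removed edge. Adjacency sets
    are frozen at the start of each level, which makes the output
    independent of the order in which pairs are visited (PC-stable)."""
    adj = {x: set(nodes) - {x} for x in nodes}  # start from the complete graph
    sepset = {}
    level = 0
    while any(len(adj[x] - {y}) >= level for x in nodes for y in adj[x]):
        frozen = {x: frozenset(adj[x]) for x in nodes}  # a(Xi) fixed per level
        for x in nodes:
            for y in sorted(adj[x]):
                if y not in adj[x]:
                    continue  # edge already removed during this level
                for cond in combinations(sorted(frozen[x] - {y}), level):
                    if indep(x, y, cond):
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sepset[frozenset((x, y))] = set(cond)
                        break
        level += 1
    return adj, sepset
```

Freezing `a(Xi)` per level is the modification that removes the order-dependence discussed above.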
This trend originates from the fact that conditioning on other variables amounts to "slicing" the available data into smaller and smaller subsets, corresponding to different combinations of categories or discrete values of the conditioning variables, over which independence tests are essentially "averaged" to assess conditional independence.
Hence, by making sure that all separating sets are actually consistent with the final graph, one expects to reduce the number of false negative edges due to spurious conditional independences inferred during the edge pruning process and, thereby, to improve the sensitivity (or recall) of the PC or PC-stable algorithms.
The inconsistency of separating sets can be of different forms, regarding either the skeleton (type I) or the final (partially) oriented graph (type II), as illustrated on Figure 1.
A type I inconsistency corresponds to a conditional independence relation such as (2 ⊥⊥ 6 | 3) in Figure 1, for which there is no path between vertices 2 and 6 that passes through 3. This type of inconsistency often involves edges evaluated early on in the pruning process when few edges have been removed, and thus the combinatorial space of possible separating sets is still large. In particular, edge 3 − 6, which is eventually removed in the final graph, may still exist when the edge 2 − 6 is under consideration.
A type II inconsistency is a different kind of incompatibility originating from the orientation of the skeleton. It occurs, in particular, when a conditional independence relation is conditioned on at least one common descendant of the pair of interest in the final graph, e.g. (3 ⊥⊥ 6 | 1) in Figure 1. Since it stems from the orientation of edges (steps 2&3), the origin of type II inconsistencies is generally more complex and results from a cascade of errors in both conditional independence tests and orientations.
These two types of inconsistency help define the following consistent set of candidate nodes for separating sets in the absence of latent variables:
Definition 1 (Consistent set). Given a graph G(V, E) and a set of variables { X, Y, Z } ⊆ V,

Consist(X, Y | G) = { Z ∈ adj(X) \ { Y } | 1. at least one path γ^Z_XY exists in G; 2. Z is not a child of X in G }

where γ^Z_XY is a path from X to Y passing through Z. Note that for an undirected graph, the second condition is always satisfied.

2.2.2 Consistent PC Pseudocodes
Definition 2. NewStep1(G1|G2) is a modified version of PC-stable step 1 (algorithm 2) where,

1. Gc is replaced by G1, and
2. a(Xi) \ {Xj} is replaced by a(Xi) \ {Xj} ∩ Consist(Xi, Xj | G2)

Note that algorithm NewStep1(Gc|Gc) corresponds to the unmodified step 1 of the original PC-stable algorithm 2. By contrast, algorithm NewStep1(Gc|G∅) removes all edges corresponding to independence without conditioning, as no separating set is involved. This unconditional independence search will be denoted step 1a, while the subsequent conditional independence search will be referred to as step 1b, hereafter.
Definition 3. 
S(G1|G2) is a modified version of the PC-stable algorithm, where step 1 in algorithm 1 is replaced by NewStep1(G1|G2) from definition 2.
Then, definition 3 allows us to define algorithm 3, which ensures a consistent constraint-based algorithm through an iterative call of S algorithms, (Sk)k∈ℕ*, following an initial step 1a, NewStep1(Gc|G∅). As illustrated on Figure 2 and proved below, algorithm 3 achieves separating set consistency by repeating step 1b and steps 2&3, iteratively, while searching for separating sets that are consistent with the graph obtained at the previous iteration, until a limit cycle of successive graphs is reached.

Figure 2: Illustration of the iterative procedure to learn graphical models with orientation-consistent (algorithm 3) or skeleton-consistent (algorithm 4) separating sets. [Starting from the complete graph Gc, step 1a yields G0 = NewStep1(Gc|G∅); algorithm 4 then iterates step 1b, Gk = NewStep1(G0|Gk−1), while algorithm 3 iterates steps 1b and 2&3, Gk = Sk(G0|Gk−1).] Dashed edges mark the difference between two successive iterations. Proof of separating set consistency is given in theorem 4.

Algorithm 3 Sepset consistent PC algorithm (1st version, orientation consistency)
Require: V, D(V), significance level α
Ensure: G with consistent separating sets
G0 ← NewStep1(Gc|G∅)
k ← 0
repeat
  k ← k + 1
  Gk ← Sk(G0|Gk−1)
until loop detected, i.e., ∃n > 0, Gk−n = Gk
G ← ∪ (Gj) for j = k−n, ..., k, with discarded conflicting orientations
return G and consistent separating sets

Alternatively, one may require separating set consistency at the level of the skeleton only, i.e., before the orientation steps, which corresponds to algorithm 4, below. 
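The limit-cycle iteration shared by algorithms 3 and 4 can be sketched as follows (a minimal sketch; graphs are encoded as frozensets of edges, and `step` is an assumed stand-in for NewStep1(G0|Gk−1) or Sk(G0|Gk−1)):

```python
def union_over_limit_cycle(step, g0):
    """Iterate g_k = step(g_{k-1}) until a previously seen graph recurs;
    termination is guaranteed because step is deterministic and the set
    of possible graphs is finite. Returns the union of edges over the
    detected limit cycle, as in algorithms 3 and 4."""
    seen = [g0]
    g = g0
    while True:
        g = step(g)
        if g in seen:                     # limit cycle detected
            cycle = seen[seen.index(g):]  # g_{k-n}, ..., g_{k-1}
            break
        seen.append(g)
    return frozenset().union(*cycle)
```

A fixed point is the special case of a cycle of length one, in which case the union is simply the converged graph.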
Indeed, early sepset inconsistencies at the level of the skeleton might cause orientation errors, which in turn can lead to the rejection of valid consistent separating sets in algorithm 3. As outlined in Figure 2, the modification of algorithm 4 only concerns step 1b, which is called iteratively until a limit cycle is reached. Then, the orientation steps 2&3 are performed as for classical PC or PC-derived algorithms, but using consistent separating sets with respect to the union of skeletons returned by the iterative call of step 1b in algorithm 4. However, as the orientation steps 2&3 might induce additional type II inconsistencies, algorithm 4 requires a final consistency check of all separating sets with respect to the final graph G.

Algorithm 4 Sepset consistent PC algorithm (2nd version, skeleton consistency)
Require: V, D(V), significance level α
Ensure: G with consistent separating sets
G0 ← NewStep1(Gc|G∅)
k ← 0
repeat
  k ← k + 1
  Gk ← NewStep1(G0|Gk−1)
until loop detected, i.e., ∃n > 0, Gk−n = Gk
G ← ∪ (Gj) for j = k−n, ..., k, and consistent separating sets with respect to the graph skeleton G
Step 2 (orientation of v-structures in G)
Step 3 (propagation of orientations in G)
for all removed edges (X, Y) in G do
  Sepset(X, Y | G) ← Sepset(X, Y | Gk)
  if Sepset(X, Y | G) ⊈ Consist(X, Y | G) and Sepset(X, Y | G) ⊈ Consist(Y, X | G) then
    Add undirected edge (X, Y) to G
  end if
end for
return G and consistent separating sets

Theorem 4. The separating sets returned by algorithms 3 and 4 are consistent with respect to the final graph G.
Proof. 
Firstly, the limit cycles in algorithms 3 and 4 are guaranteed to be finite by the deterministic nature of these algorithms and the finite set of possible graphs Gj.
In algorithm 3, as the union of graphs ∪ (Gj), j = k−n, ..., k, does not remove any edge from the last graph Gk and discards all conflicting orientations with previous graphs Gj, j ∈ { k − n, ..., k − 1 }, taking the union of graphs does not create any new conditional independence relation, nor any inconsistency regarding the final separating sets. More precisely, all removed edges in Gk have separating sets consistent with respect to at least one graph in the union (Gk−1), which is thus also consistent with respect to the union of graphs G.
In algorithm 4, the consistency of separating sets is guaranteed by similar arguments, but only with respect to the skeleton. As the orientation and propagation steps 2&3 might induce additional type II inconsistencies, algorithm 4 requires a final consistency check of all separating sets. Adding back edges with inconsistent separating sets in the final graph G then guarantees that all the separating sets are consistent with respect to definition 1.

2.2.3 Tests of Consistency
A unitary operation of algorithms 3 and 4 is to test, for a vertex Z ∈ adj(X) \ { Y } in G, whether Z ∈ Consist(X, Y | G), which requires that 1) at least one path from X to Y passing through Z (i.e. γ^Z_XY) exists in G and 2) Z is not a child of X in G (definition 1).
To test the first condition, it is conceptually simple to first get all paths between X and Y, then check whether Z lies on at least one of them. This is, however, infeasible, as the number of paths between two vertices can be large, depending on the edge density of the graph. Fortunately, it is possible to obtain directly the set of all Z for which at least one path γ^Z_XY exists. 
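One way to obtain this set at once, sketched below with networkx on an undirected skeleton, exploits the fact that a vertex lies on some simple path between X and Y if and only if it belongs to a biconnected component located on the block-cut tree path between X and Y (the function name `path_candidates` is ours; this is an illustrative sketch, not the paper's implementation):

```python
import networkx as nx

def path_candidates(G, x, y):
    """Vertices Z lying on at least one simple path between x and y in the
    undirected graph G (condition 1 of the Consist set)."""
    if x == y or not nx.has_path(G, x, y):
        return set()
    comps = [frozenset(c) for c in nx.biconnected_components(G)]
    cuts = set(nx.articulation_points(G))
    # Block-cut tree: one node per biconnected component and per cut vertex.
    T = nx.Graph()
    for comp in comps:
        T.add_node(comp)
        for v in comp & cuts:
            T.add_edge(comp, v)
    # Map an ordinary vertex to its (unique) biconnected component;
    # a cut vertex is already a node of the tree.
    def anchor(v):
        return v if v in cuts else next(c for c in comps if v in c)
    on_path = set()
    for node in nx.shortest_path(T, anchor(x), anchor(y)):
        on_path |= node if isinstance(node, frozenset) else {node}
    return on_path - {x, y}
```

Intersecting this set with adj(X) \ { Y } and removing the children of X (condition 2) then yields Consist(X, Y | G).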
This can be done very efficiently with the help of biconnected component analysis based on block-cut tree decomposition, as detailed in Supplementary Material.
The second condition assumes the absence of latent variables, which allows for conditional independence tests on adjacent nodes only in algorithm 2. It is thus straightforward to test without additional complexity burden.
Hence, the overall complexity of the consistency tests of separating sets relies on the block-cut tree decomposition, which can be done beforehand within a single depth-first search with complexity O(|V| + |E|). Thus, for each pair (X, Y), the complexity of finding all candidate Z depends on the size of the block-cut tree, which is in the worst case (when the underlying skeleton is a forest) linear in the size of the graph, O(|V| + |E|), see Supplementary Material.

2.3 Empirical Evaluation

We conducted a series of benchmark structure learning simulations to study the differences between the original PC-stable algorithm and the proposed modifications ensuring consistent separating sets. For each simulation setting, we first quantified the fraction of inconsistent separating sets predicted by the original PC-stable algorithm, Figure 3. We then compared the performance of the original PC-stable (algorithm 1 and algorithm 2), orientation-consistent PC-stable (algorithm 3) and skeleton-consistent PC-stable (algorithm 4), for different significance levels α, in terms of the precision and recall of the adjacencies found in the inferred graph with respect to the true skeleton, Figures 4 and 5. Figure 4 highlights situations for which the original PC manages to recover a DAG that is already closely related to the ground truth but produces inconsistent separating sets, as shown in Figure 3. 
By contrast, Figure 5 highlights standard benchmarks from the BNlearn repository (Scutari, 2010) for which the original PC shows poor recall due to too many spurious conditional independences, and ultimately outputs a graph with only a few obvious edges. Finally, we also measured the fraction of the separating sets used for discarding edges by the three approaches that correspond to true d-separation in the ground-truth DAG, Figure 6.

2.3.1 Data generation and benchmarks

The data-sets used for the numerical experiments were generated with the following scheme. The underlying DAGs were generated with TETRAD (Scheines et al., 1998) as scale-free DAGs with 50 nodes (α = 0.05, β = 0.4, average total degree d(G) = 1.6), using a preferential attachment model and orienting the edges based on a random topological ordering of the vertices. Data-sets were simulated with linear structural equation models for three settings: strong, medium and weak interactions (with respective coefficient ranges [0.2, 0.7], [0.1, 0.5], and [0, 0.3] and covariance ranges [0.5, 1.5], [0.5, 1], and [0.2, 0.7]). In addition, we also generated data-sets for the classical benchmark Insurance (27 nodes, 52 links, 984 parameters), Hepar2 (70 nodes, 123 links, 1453 parameters) and Barley (48 nodes, 84 links, 114005 parameters) networks from the Bayesian Network repository (Scutari, 2010).
Reconstruction benchmarks were performed with pcalg's (Kalisch et al., 2012) PC-stable implementation, modified to enforce separating set consistency either taking orientations into account (algorithm 3) or at the level of the skeleton (algorithm 4). The (conditional) independence test used in all simulations is a linear (partial) correlation with Fisher's z-transformation. 
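Such a test can be sketched as follows (our own minimal sketch, not the pcalg implementation; the function name and interface are assumptions):

```python
import numpy as np
from math import erf, log, sqrt

def fisher_z_test(data, i, j, cond, alpha=0.05):
    """Partial-correlation independence test with Fisher's z-transform.
    data: (n, p) sample matrix; i, j: tested columns; cond: conditioning
    columns. Returns (is_independent_at_alpha, p_value)."""
    n = data.shape[0]
    sub = np.corrcoef(data[:, [i, j] + list(cond)], rowvar=False)
    prec = np.linalg.pinv(sub)                       # precision matrix
    r = -prec[0, 1] / sqrt(prec[0, 0] * prec[1, 1])  # partial correlation
    r = max(-0.9999999, min(0.9999999, r))           # guard against |r| = 1
    z = 0.5 * log((1.0 + r) / (1.0 - r))             # Fisher z-transform
    stat = sqrt(n - len(cond) - 3.0) * abs(z)        # ~ N(0, 1) under H0
    p_value = 2.0 * (1.0 - 0.5 * (1.0 + erf(stat / sqrt(2.0))))
    return p_value > alpha, p_value
```

The statistic sqrt(n − |C| − 3) z is approximately standard normal under the null hypothesis of vanishing partial correlation, which yields the two-sided p-value above.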
Performance is measured relative to the true skeleton via the Precision (positive predictive value), Prec = TP/(TP + FP), and the Recall or Sensitivity (true positive rate), Rec = TP/(TP + FN), where TP is a correctly predicted adjacency, FP an incorrectly predicted adjacency and FN an incorrectly discarded adjacency.

2.3.2 Benchmark Results

The fraction of inconsistent separating sets that were used to remove edges was first estimated for increasing sample size and varying parent-child interaction strength, using the original PC-stable algorithm for random and scale-free DAGs of 50 nodes, Figure 3. We note that in typical settings, a significant fraction of the separating sets that were used to remove edges during Step 1 of the PC-stable algorithm cannot be "read off" the returned graph, either because there is no path containing Z that connects X and Y (skeleton inconsistency, green in Figure 3) or because there is a conditioning on an invalid child node (orientation inconsistency, i.e., difference between blue and green inconsistencies in Figure 3). Both increasing the sample size and increasing the interaction strength reduce the number of inconsistent sepsets. We attribute this in part to the severity of the PC-stable algorithm, which tends to remove too many edges (false negatives) because of spurious independencies. With a larger sample size N and stronger interactions, consistent separating sets are still not guaranteed by the original algorithm, but these settings decrease the number of spurious independencies and lead to denser reconstructed graphs, thus making it more likely for potential separating sets to be consistent. Orientation consistency is particularly difficult to obtain with respect to the returned CPDAG, as the orientation and propagation steps generally suffer even more from sampling noise and previous mistakes than the skeleton reconstruction (Step 1). 
Notably, the orientation depends on the order in which separating sets are tested in PC-stable (in pcalg it depends on the ordering of the variables in the data-set).

Figure 3: Sepset inconsistency of the original PC-stable algorithm. In each subplot the fraction of inconsistent separating sets with respect to the skeleton (green) or CPDAG (blue) obtained with the original PC-stable algorithm with a fixed α = 0.05 is displayed for increasing sample size N. Data-sets were generated from 100 scale-free graphs of 50 nodes and d(G) = 1.6 with different parent-child interaction strengths: strong (left), medium (middle) and weak (right).

We then compared the performance of the original PC-stable (algorithm 1 and algorithm 2), orientation-consistent PC-stable (algorithm 3) and skeleton-consistent PC-stable (algorithm 4), for different significance levels α, in terms of the precision and recall of the adjacencies found in the inferred graph with respect to the true skeleton, Figures 4, 5 and S1. 
Enforcing sepset consistency is shown to significantly improve the sensitivity of constraint-based methods, for a given α, while achieving equivalent or better overall structure learning performance.
It is particularly the case for standard benchmark networks from the BNlearn repository (Scutari, 2010), Figure 5, for which the original PC-stable algorithm shows good precision but poor recall (Rec < 0.15-0.35 and Prec > 0.65 at maximum Fscore, see iso-Fscore dotted lines in Figure 5), while consistent PC-stable achieves a better balance between precision and recall (Rec ≃ 0.5 and Prec ≃ 0.5-0.6 at maximum Fscore, Figure 5).

Figure 4: Precision-recall curves for the original PC-stable (yellow), skeleton-consistent PC-stable (green) and orientation-consistent PC-stable (blue). The mean performances and standard deviations (error bars) obtained over 100 networks are shown for 7 values of the (conditional) independence significance threshold α between 10^-5 and 0.2. Data-sets with N=500 samples were generated from the same graphs as in Figure 3 with strong (left), medium (middle) and weak (right) interactions. 
See Figure S1 for N=100, 1000.

Figure 5: Precision-recall curves for the original PC-stable (yellow), skeleton-consistent PC-stable (green) and orientation-consistent PC-stable (blue). The mean performances and standard deviations (error bars) obtained over 100 networks are shown for 12 values of the (conditional) independence significance threshold α between 10^-25 and 0.5 (1e-25, 1e-20, 1e-17, 1.0e-15, 1.0e-13, 1.0e-10, 8.7e-09, 7.6e-07, 6.6e-05, 5.7e-03, 5.0e-02, 5.0e-01). Data-sets with N=1000 samples were generated for the standard benchmark Hepar2 (left), Insurance (middle) and Barley (right) networks from the BNlearn repository (Scutari, 2010).

Finally, we also compared the fraction of valid separating sets used for discarding edges, which entail true d-separation in the ground-truth DAG, Figures 6 and S2. Ensuring the consistency of separating sets tends to increase, although not guarantee, their validity in terms of actual d-separation. Consistent sepsets with invalid d-separation are primarily caused by edge mis-orientations rather than skeleton errors. In particular, skeleton-consistent separating sets yield better performance in terms of valid d-separation than orientation-consistent separating sets with the settings of the PC-stable algorithm used here. 
This is, however, expected to depend on the specific settings used for the conditional independence test and the orientation and propagation rules in different constraint-based methods.

Figure 6: Proportion of valid d-separation sepsets among edge-removing sepsets. Top row shows the proportion of sepsets used for removing edges during Step 1 of the original, orientation-consistent and skeleton-consistent PC-stable algorithms that correspond to a valid d-separation in the true DAG, for all tested α. Bottom row shows the average proportion of valid d-separations for a given average recall over all tested values of α. Datasets with N=500 were generated from 100 DAGs with linear SEMs with strong (left), medium (middle) and weak (right) interactions (see Figure S2 for N=100, 1000).

3 Conclusion

In this paper, we propose and implement simple modifications of the PC algorithm, also applicable to any PC-derived constraint-based method, in order to enforce the consistency of the separating sets of discarded edges with respect to the final graph, which is an actual shortcoming of constraint-based approaches, Figure 3. Enforcing sepset consistency is shown to significantly improve the sensitivity of constraint-based methods, while achieving equivalent or better overall structure learning performance, Figures 4, 5 and S1. In addition, ensuring the consistency of separating sets also increases their validity in terms of actual d-separation, Figures 6 and S2.

The existence of sepset inconsistencies in constraint-based methods originates from their tendency to uncover spurious conditional independences early in the pruning process, when the combinatorial space of possible separating sets is still large, unlike in the final, typically sparse, skeleton. Such spurious conditional independences are responsible, in particular, for the large number of false negative edges and, therefore, the frequently poor sensitivity of constraint-based methods (Colombo and Maathuis, 2014). By contrast, enforcing sepset consistency makes it possible to achieve a better balance between sensitivity and precision.

To circumvent this inconsistency issue during the skeleton step, we have shown that one can either use sepset consistency taking orientations into account to help reject inconsistent sepsets (algorithm 3), or use sepset consistency of the skeleton to help determine the orientations (algorithm 4).
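Schematically, both variants repeat the constraint-based scheme until the sepsets found at one iteration are consistent with the graph obtained at the previous iteration, i.e., until a fixed point is reached. A minimal outline, assuming placeholder callbacks `learn_skeleton` and `orient` that stand in for the chosen PC variant (not the paper's actual API):

```python
def consistent_structure_learning(data, learn_skeleton, orient, max_iter=10):
    """Iterate skeleton pruning + orientation until a fixed point.

    `learn_skeleton(data, reference)` should prune a complete graph while
    accepting only separating sets consistent with the `reference` graph
    (None on the first pass); `orient` applies the orientation and
    propagation rules. Both are hypothetical placeholders.
    """
    graph, sepsets = None, {}
    for _ in range(max_iter):
        skeleton, sepsets = learn_skeleton(data, reference=graph)
        new_graph = orient(skeleton, sepsets)
        if new_graph == graph:  # sepsets now consistent with the output graph
            break
        graph = new_graph
    return graph, sepsets

# Toy stand-ins, for illustration only: keep all edges on the first pass,
# then return the reference unchanged so the loop converges immediately.
def toy_skeleton(data, reference):
    return (frozenset(data) if reference is None else reference), {}

def toy_orient(skeleton, sepsets):
    return skeleton

graph, sepsets = consistent_structure_learning(
    [("a", "b"), ("b", "c")], toy_skeleton, toy_orient)
print(graph == frozenset([("a", "b"), ("b", "c")]))  # True
```

The fixed-point test is what distinguishes this scheme from a single PC pass: a sepset accepted at iteration k is, by construction, consistent with the graph of iteration k-1, and at convergence with the returned graph itself.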
The latter approach tends to yield slightly better performance with the setting of the PC-stable algorithm used here, but this is expected to depend on the specific settings used for the conditional independence test and the orientation and propagation rules in different constraint-based methods.

Indeed, the methods and algorithmic implementations presented here are not primarily meant to outcompete a specific PC or PC-derived algorithm, but rather to improve the explainability of constraint-based methods by ensuring the consistency of all separating sets in the final causal graphs. The approach is very general and applicable to the large variety of constraint-based methods that start with a complete graph and iteratively discard dispensable edges based on conditional independence search. Beyond the formal interest of guaranteeing sepset consistency, this is also especially important, in practice, for the interpretability of constraint-based models in real-life applications.

Acknowledgements

The authors acknowledge financial support from the French Ministry of Higher Education and Research, PSL Research University and Sorbonne University.

References

Affeldt, S., and Isambert, H. 2015. Robust reconstruction of causal graphical models based on conditional 2-point and 3-point information. In Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence, UAI 2015, 42–51.

Affeldt, S.; Verny, L.; and Isambert, H. 2016. 3off2: A network reconstruction algorithm based on 2-point and 3-point information statistics. BMC Bioinformatics 17(S2).

Colombo, D., and Maathuis, M. H. 2014. Order-independent constraint-based causal structure learning. Journal of Machine Learning Research 15:3741–3782.

Colombo, D.; Maathuis, M. H.; Kalisch, M.; and Richardson, T. S. 2012. Learning high-dimensional directed acyclic graphs with latent and selection variables. Ann. Statist.
40(1):294–321.

Hyttinen, A.; Hoyer, P. O.; Eberhardt, F.; and Järvisalo, M. 2013. Discovering cyclic causal models with latent variables: A general SAT-based procedure. In Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, UAI'13, 301–310. Arlington, Virginia, United States: AUAI Press.

Kalisch, M., and Bühlmann, P. 2008. Robustification of the PC-algorithm for directed acyclic graphs. Journal of Computational and Graphical Statistics 17(4):773–789.

Kalisch, M.; Mächler, M.; Colombo, D.; Maathuis, M. H.; and Bühlmann, P. 2012. Causal inference using graphical models with the R package pcalg. J. Stat. Softw. 47(11):1–26.

Koller, D., and Friedman, N. 2009. Probabilistic Graphical Models: Principles and Techniques. MIT Press.

Pearl, J., and Verma, T. 1991. A theory of inferred causation. In Proceedings of the Second International Conference on Principles of Knowledge Representation and Reasoning, 441–452. Morgan Kaufmann Publishers Inc.

Pearl, J. 2009. Causality: Models, Reasoning and Inference. Cambridge University Press, 2nd edition.

Richardson, T., and Spirtes, P. 2002. Ancestral graph Markov models. Ann. Statist. 30(4):962–1030.

Scheines, R.; Spirtes, P.; Glymour, C.; Meek, C.; and Richardson, T. 1998. The TETRAD project: Constraint based aids to causal model specification. Multivariate Behavioral Research 33(1):65–117.

Scutari, M. 2010. Learning Bayesian networks with the bnlearn R package. Journal of Statistical Software 35(3):1–22.

Sella, N.; Verny, L.; Uguzzoni, G.; Affeldt, S.; and Isambert, H. 2018. MIIC online: a web server to reconstruct causal or non-causal networks from non-perturbative data. Bioinformatics 34(13):2311–2313.

Spirtes, P., and Glymour, C. 1991. An algorithm for fast recovery of sparse causal graphs.
Social Science Computer Review 9:62–72.

Spirtes, P.; Glymour, C.; and Scheines, R. 2000. Causation, Prediction, and Search. The MIT Press, Cambridge, Massachusetts, 2nd edition.

Spirtes, P.; Meek, C.; and Richardson, T. 1999. An algorithm for causal inference in the presence of latent variables and selection bias. In Computation, Causation, and Discovery. Menlo Park, CA: AAAI Press. 211–252.

Tsamardinos, I.; Brown, L. E.; and Aliferis, C. F. 2006. The max-min hill-climbing Bayesian network structure learning algorithm. Machine Learning 65(1):31–78.

Verny, L.; Sella, N.; Affeldt, S.; Singh, P. P.; and Isambert, H. 2017. Learning causal networks with latent variables from multivariate information in genomic data. PLoS Comput. Biol. 13(10):e1005662.

Zhang, J. 2008. On the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias. Artif. Intell. 172(16-17):1873–1896.