{"title": "Experimental Design for Cost-Aware Learning of Causal Graphs", "book": "Advances in Neural Information Processing Systems", "page_first": 5279, "page_last": 5289, "abstract": "We consider the minimum cost intervention design problem: Given the essential graph of a causal graph and a cost to intervene on a variable, identify the set of interventions with minimum total cost that can learn any causal graph with the given essential graph. We first show that this problem is NP-hard. We then prove that we can achieve a constant factor approximation to this problem with a greedy algorithm. We then constrain the sparsity of each intervention. We develop an algorithm that returns an intervention design that is nearly optimal in terms of size for sparse graphs with sparse interventions and we discuss how to use it when there are costs on the vertices.", "full_text": "Experimental Design for Cost-Aware\n\nLearning of Causal Graphs\n\nErik M. Lindgren\n\nUniversity of Texas at Austin\n\nerikml@utexas.edu\n\nMurat Kocaoglu\n\nMIT-IBM Watson AI Lab\n\nmurat@ibm.com\n\nAlexandros G. Dimakis\n\nUniversity of Texas at Austin\n\ndimakis@austin.utexas.edu\n\nSriram Vishwanath\n\nUniversity of Texas at Austin\nsriram@ece.utexas.edu\n\nAbstract\n\nWe consider the minimum cost intervention design problem: Given the essential\ngraph of a causal graph and a cost to intervene on a variable, identify the set of\ninterventions with minimum total cost that can learn any causal graph with the\ngiven essential graph. We \ufb01rst show that this problem is NP-hard. We then prove\nthat we can achieve a constant factor approximation to this problem with a greedy\nalgorithm. We then constrain the sparsity of each intervention. 
We develop an algorithm that returns an intervention design that is nearly optimal in terms of size for sparse graphs with sparse interventions, and we discuss how to use it when there are costs on the vertices.

1 Introduction

Causality is a fundamental concept in science and an essential tool for multiple disciplines such as engineering, medical research, and economics [28, 27, 29]. Discovering causal relations has been studied extensively under different frameworks and under various assumptions [25, 15]. To learn the cause-effect relations between variables without any assumptions other than basic modeling assumptions, it is essential to perform experiments. Experimental data combined with observational data has been successfully used for recovering causal relationships in different domains [30].

There is significant cost and time required to set up experiments. Often there are many ways to design experiments to discover cause-and-effect relationships. Considering cost when designing experiments can critically change the total cost needed to learn the same causal system. King et al. [18] created a robot scientist that would automatically perform experiments to learn how a yeast gene functions. Different experiments required different materials with large variations in cost. By considering material cost when defining interventions, their robot scientist was able to learn the same causal structure at significantly lower cost.

Since the work of King et al., there have been a number of papers on automated and cost-sensitive experiment design for causal learning in biological systems. Sverchkov and Craven [33] discuss some aspects of how to design costs. Ness et al.
[24] develop an active learning strategy for cost-aware experiments in protein networks.

We study the problem of cost-aware causal learning in Pearl's framework of causality [25] under the causal sufficiency assumption, i.e., when there are no latent confounders. In this framework, there is a directed acyclic graph (DAG) called the causal graph that describes the causal relationships between the variables in our system. Learning direct causal relations between the variables in the system is equivalent to learning the directed edges of this graph. From observational data, we can learn the existence of a causal edge, as well as some of the edge directions; however, in general we cannot learn the direction of every edge. To learn the remaining causal edges, we need to perform experiments and collect additional data from these experiments [3, 10, 13].

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

An intervention is an experiment where we force a variable to take a particular value. An intervention is called a stochastic intervention when the value of the intervened variable is assigned to another independent random variable. Interventions can be performed on a single variable, or on a subset of variables simultaneously. In the non-adaptive setting, which is what we consider here, all interventions are performed in parallel. In this setting, we can only guarantee that an edge direction is learned when there is an intervention such that exactly one of the endpoints is included [19].

In the minimum cost intervention design problem, as first formalized by Kocaoglu et al. [19], there is a cost to intervene on each variable. We want to learn the causal direction of every edge in the graph with minimum total cost.
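The rule stated above — an intervention teaches us exactly the edge directions in the cut between the intervened set and the rest — can be sketched in a few lines. This is an illustrative helper of ours, not code from the paper:

```python
def learned_edges(intervention, edges):
    """Edge directions guaranteed to be learned by intervening on the set
    `intervention`: exactly the edges with one endpoint inside the set and
    one endpoint outside it (the cut), as described in the text."""
    return [(u, v) for (u, v) in edges
            if (u in intervention) != (v in intervention)]
```

For example, on a triangle a-b-c, intervening on {a} orients the two edges touching a but not the edge b-c, while intervening on all three vertices orients nothing.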
This becomes a combinatorial optimization problem, and so two natural questions that have not yet been addressed are whether the problem is NP-hard and whether the greedy algorithm proposed by [19] has any approximation guarantees.

Our contributions:

• We show that the minimum cost intervention design problem is NP-hard.

• We modify the greedy coloring algorithm proposed in [19]. We establish that our modified algorithm is a (2 + ε)-approximation algorithm for the minimum cost intervention design problem. Our proof makes use of a connection to submodular optimization.

• We consider the sparse intervention setup where each experiment can include at most k variables. We show a lower bound on the minimum number of interventions and create an algorithm which is a (1 + o(1))-approximation to this problem for sparse graphs with sparse interventions.

• We introduce the minimum cost k-sparse intervention design problem and develop an algorithm that is essentially optimal for the unweighted variant of this problem on sparse graphs. We then discuss how to extend this algorithm to the weighted problem.

2 Minimum Cost Intervention Design

2.1 Relevant Graph Theory Concepts

We first discuss some graph theory concepts that we utilize in this work.

A proper coloring of a graph G = (V, E) is an assignment of colors c : V → {1, 2, . . . , t} to the vertices V such that for all edges uv ∈ E we have c(u) ≠ c(v). The chromatic number is the minimum number of colors needed for a proper coloring to exist and is denoted by χ.

An independent set of a graph G = (V, E) is a subset of the vertices S ⊆ V such that for all pairs of vertices u, v ∈ S we have that uv ∉ E. The independence number is the size of the maximum independent set and is denoted by α.
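The coloring notions above can be made concrete. For chordal graphs — which, as discussed later, are exactly the graphs that arise as undirected essential-graph components — greedy coloring along a maximum cardinality search order uses exactly χ colors. A minimal sketch (function names are ours, not the paper's):

```python
def maximum_cardinality_search(adj):
    """Return vertices of the graph `adj` (dict: vertex -> set of neighbors)
    in maximum cardinality search order; on a chordal graph the reverse of
    this order is a perfect elimination ordering."""
    weights = {v: 0 for v in adj}
    visited, order = set(), []
    for _ in range(len(adj)):
        # Pick an unvisited vertex with the most visited neighbors.
        v = max((u for u in adj if u not in visited), key=lambda u: weights[u])
        visited.add(v)
        order.append(v)
        for u in adj[v]:
            if u not in visited:
                weights[u] += 1
    return order

def greedy_color(adj, order):
    """Color vertices in `order`, giving each the smallest color not used by
    an already-colored neighbor. On an MCS order of a chordal graph this
    uses exactly the chromatic number of colors."""
    color = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color
```

For example, on a triangle with one pendant vertex (a chordal graph with χ = 3), the coloring produced uses exactly three colors.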
If there is a weight function on the vertices, a maximum weight independent set is an independent set with the largest total weight.

A vertex cover of a graph G = (V, E) is a subset of vertices S such that for every edge uv ∈ E, at least one of u or v is in S. Vertex covers are closely related to independent sets: if S is a vertex cover then V \ S is an independent set, and vice versa. Further, if S is a minimum weight vertex cover then V \ S is a maximum weight independent set. The size of the smallest vertex cover of G is denoted τ.

A chordal graph is a graph such that any cycle v1, v2, . . . , vt with t ≥ 4 has a chord, which is an edge between two vertices that are not adjacent in the cycle. There are linear complexity algorithms for finding a minimum coloring, maximum weight independent set, and minimum weight vertex cover of a chordal graph. Any induced subgraph of a chordal graph is also a chordal graph.

Given a graph G = (V, E) and a subset of vertices I ⊆ V, the cut δ(I) is the set of edges uv ∈ E such that u ∈ I and v ∈ V \ I.

2.2 Causal Graphs and Interventional Learning

Consider two variables X, Y of a system. If every time we change the value of X, the value of Y changes but not vice versa, then we suspect that variable X causes Y. If we have a set of variables, the same intuition carries through while defining causality. This asymmetry in the directional influence between variables is at the core of causality.

Pearl [25] and Spirtes et al. [32] formalized the notion of causality using directed acyclic graphs (DAGs). DAGs are suitable to encode asymmetric relations. Consider a system of n random variables V = {V1, V2, . . . , Vn}. The structural causal model of Pearl models the causal relations between variables as follows: each variable Vi can be written as a deterministic function of a set of other variables Si and an unobserved variable Ei as Vi = fi(Si, Ei).
We assume that Ei, called an exogenous random variable, is independent from everything else, i.e., every variable in V and all other exogenous variables Ej. The graph that captures these directional relations is called the causal graph between the variables in V. We restrict the graph created to be acyclic, so that if we replace the value of a variable we potentially change the descendant variables but the ancestor variables will not change.

Given a causal graph, a variable is said to be caused by its set of parents.¹ This is precisely Si in the structural causal model. It is known that the joint distribution induced on V by a structural causal model factorizes with respect to the causal graph. Thus, the causal graph D is a valid Bayesian network for the observed joint distribution.

There are two main approaches for learning causal graphs from the observational distribution: i) score based [6, 11], and ii) constraint based [25, 32]. Score based approaches optimize a score (e.g., likelihood) over all Bayesian networks to recover the most likely graph. Constraint-based approaches, such as the IC and PC algorithms, use conditional independence tests to identify the causal edges that are invariant across every graph consistent with the observed data. The remaining mixed graph is called the essential graph. The undirected components of the essential graph are always chordal [32, 9]. Although PC runs in time exponential in the maximum degree of the graph, various extensions make it feasible to run it even on graphs with 30,000 nodes with maximum degree up to 12 [26]. To learn the rest of the causal edge directions without additional assumptions, we need to use interventions on the undirected, chordal components.² An intervention is an experiment where a random variable is forced to take a certain value. Due to the acyclicity assumption on the graph, if X →
Y, then intervening on Y should not change the distribution of X; however, intervening on X will change the distribution of Y. Running observational learning algorithms like PC/IC after an intervention on a set S of variables, we can learn the new skeleton after the intervention, which allows us to identify the immediate children and immediate parents of the intervened variables. Therefore, if we perform a randomized experiment on a set S of vertices in the causal graph, we can learn the direction of all the edges cut between S and V \ S. This approach has been heavily used in the literature [13, 10, 31].

2.3 Graph Separating Systems and Minimum Cost Intervention Design

Given a causal DAG D = (V, E), we observe the essential graph E(D). Kocaoglu et al. [19] established that if we want to guarantee learning the direction of the undirected edges with nonadaptive interventions, it is necessary and sufficient for our intervention design I = {I1, I2, . . . , Im} to be a graph separating system on the undirected component G of the graph.

Definition 1 (Graph Separating System). Given an undirected graph G = (V, E), a graph separating system of size m is a collection of m subsets of vertices I = {I1, I2, . . . , Im} such that every edge is cut at least once, that is, ⋃_{I∈I} δ(I) = E.

Recall that the undirected component of the essential graph of a causal DAG is always a chordal graph. We can now define the minimum cost intervention design problem.

Definition 2 (Minimum Cost Intervention Design). Given a chordal graph G = (V, E), a set of weights wv for all v ∈ V, and a size constraint m ≥ ⌈log χ⌉, the minimum cost intervention design problem is to find a graph separating system I of size at most m that minimizes the cost

¹ To be more precise, parent nodes are said to directly cause a variable whereas ancestors cause indirectly through parents.
In this paper, we will not make this distinction since we do not use indirect causal relations for graph discovery.

² It is known that the edges identified in a chordal component of the skeleton do not help identify edges in another component [9]. Thus, each chordal component learning task can be treated as an individual problem.

cost(I) = Σ_{I∈I} Σ_{v∈I} wv.

Graph separating systems are tightly related to graph colorings. Mao-Cheng [22] proved that the smallest graph separating system has size m = ⌈log χ⌉, where χ is the chromatic number. To see this, for each vertex we create a binary vector c(v), where c(v)_i = 1 if v ∈ Ii and c(v)_i = 0 if v ∉ Ii. Since for two neighboring vertices u and v there must be some intervention Ii containing exactly one of u or v, the assignment of vectors to vertices c : V → {0, 1}^m is a proper coloring. With a size m graph separating system we are able to create 2^m different colors, proving that the size of the smallest separating system is exactly m = ⌈log χ⌉.

The equivalence between graph separating systems and colorings allows us to define an equivalent coloring version of the minimum cost intervention design problem, which was first developed in [19].

Definition 3 (Minimum Cost Intervention Design, Coloring Version). Given a chordal graph G = (V, E), a set of weights wv for all v ∈ V, and the colors C = {0, 1}^m such that |C| ≥ χ, the coloring version of the minimum cost intervention design problem is to find a proper coloring c : V → C that minimizes the total cost

cost(c) = Σ_{v∈V} ‖c(v)‖₁ wv.

Given a minimum cost coloring from the coloring variant of the minimum cost intervention design problem, we can create a minimum cost intervention design. Further, the reduction is approximation preserving.

In practice, it can sometimes be difficult to intervene on a large number of variables. A variant of intervention design of interest is when every intervention can only involve k variables.
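The correspondence above between binary color vectors and interventions can be sketched directly: intervention Ii is the set of vertices whose color has bit i set, and a proper coloring guarantees that every edge is cut by some bit. This is an illustrative helper of ours, not code from the paper:

```python
def coloring_to_separating_system(color, m):
    """Map a proper coloring with colors in {0, ..., 2^m - 1} to m
    intervention sets: I_i holds the vertices whose color has bit i set.
    Vertices with color 0 are never intervened on (the zero-weight color)."""
    interventions = [set() for _ in range(m)]
    for v, c in color.items():
        for i in range(m):
            if (c >> i) & 1:
                interventions[i].add(v)
    return interventions

def cuts_all_edges(interventions, edges):
    """Check the graph separating system property: every edge has some
    intervention containing exactly one of its endpoints."""
    return all(any((u in I) != (v in I) for I in interventions)
               for u, v in edges)
```

Since neighboring vertices get distinct colors, their color vectors differ in some bit i, and intervention Ii cuts the edge between them.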
For this problem, we want our interventions to be a k-sparse graph separating system.

Definition 4 (k-Sparse Graph Separating System). Given an undirected graph G = (V, E), a k-sparse graph separating system of size m is a collection of m subsets of vertices I = {I1, I2, . . . , Im} such that all subsets Ii satisfy |Ii| ≤ k and every edge is cut at least once, that is, ⋃_{I∈I} δ(I) = E.

We consider two optimization problems related to k-sparse graph separating systems. In the first one we want to find a graph separating system of minimum size.

Definition 5 (Minimum Size k-Sparse Intervention Design). Given a chordal graph G = (V, E) and a sparsity constraint k, the minimum size k-sparse intervention design problem is to find a k-sparse graph separating system for G of minimum size, that is, we want to minimize the cost

cost(I) = |I|.

For the next problem, we want to find the k-sparse intervention design of minimum cost when there is a cost to intervene on every variable.

Definition 6 (Minimum Cost k-Sparse Intervention Design). Given a chordal graph G = (V, E), a set of weights wv for all v ∈ V, a sparsity constraint k, and a size constraint m, the minimum cost k-sparse intervention design problem is to find a k-sparse graph separating system I of size m that minimizes the cost

cost(I) = Σ_{I∈I} Σ_{v∈I} wv.

3 Related Work

One problem of interest is to find the intervention design with the smallest number of interventions. Eberhardt et al. [2] established that ⌈log n⌉ interventions are sufficient and, in the worst case, necessary. Eberhardt [3] established that graph separating systems are necessary across all graphs (the example he used is the complete graph). Hauser and Bühlmann [10] establish the connection between graph colorings and intervention designs by using the key observation of Mao-Cheng [22] that graph colorings can be used to construct graph separating systems, and vice versa.
This leads to the requirement and sufficiency of ⌈log χ⌉ experiments, where χ is the chromatic number of the graph. Since graph coloring can be done efficiently for chordal graphs, we can efficiently create a minimum size intervention design when given a chordal skeleton as input. In contrast, if we are given an arbitrary graph as input, perhaps due to side information on some edge directions, it is NP-hard to find a minimum size intervention design [13, 22].

Hu et al. [12] proposed a randomized algorithm that requires only O(log log n) experiments and learns the causal graph with high probability.

Closer to our setup, Hyttinen et al. [13] consider a special case of the minimum cost intervention design problem where every vertex has cost 1 and the input is the complete graph. They were able to optimally solve this special case. Kocaoglu et al. [19] were the first to formalize the minimum cost intervention design problem on general chordal graphs and its relationship to the coloring variant. They used the coloring variant to develop a greedy algorithm that finds a maximum weighted independent set and colors this set with the available color of lowest weight. However, their work did not establish approximation guarantees for this algorithm, and it is not clear how many iterations the greedy algorithm needs to fully color the graph — we address these issues in this paper. Further, it was unknown until our work that the minimum cost intervention design problem is NP-hard.

There has been a lot of prior work when every intervention is constrained to be of size at most k. Eberhardt et al. [2] were the first to consider the minimum size k-sparse intervention design problem and established sufficient conditions on the number of interventions needed for the complete graph. Hyttinen et al.
[13] showed how k-sparse separating system constructions can be used for intervention designs on the complete graph using the construction of Katona [17]. They establish the necessary and sufficient number of k-sparse interventions needed to learn all causal directions in the complete graph. Shanmugam et al. [31] illustrate that for the complete graph, separating systems are necessary even under the constraint that each intervention has size at most k. They also identify an information theoretic lower bound on the necessary number of experiments and propose a new optimal k-sparse separating system construction for the complete graph. To the best of our knowledge, there were no graph dependent bounds on the size of a k-sparse graph separating system until our work.

Ghassami et al. considered the dual problem of maximizing the number of learned causal edges for a given number of interventions [7]. They show that this problem is a submodular maximization problem when only interventions involving a single variable are allowed. We note that their connection to submodularity is different from the one we discover in our work.

Graph coloring has been extensively studied in the literature, and there are various versions of the graph coloring problem. We identify a connection between the minimum cost intervention design problem and the general optimum cost chromatic partition problem (GOCCP). GOCCP is a graph coloring problem where there are t colors and a cost to color vertex v with color i; it is a more general version of the minimum cost intervention design problem. Jansen [16] established that for graphs with bounded treewidth r, the GOCCP can be solved exactly in time O(t^r n). Since a chordal graph with maximum degree Δ has treewidth at most Δ and the coloring version uses t = 2^m colors, this implies that for graphs with maximum degree Δ we can solve the minimum cost intervention design problem exactly in time O(2^{mΔ} n).
Note that m is at least log χ, so this running time is at least χ^Δ n; thus this algorithm is not practical even for Δ = 12.

4 Hardness of Minimum Cost Intervention Design

In this section, we show that the minimum cost intervention design problem is NP-hard.

We assume that the input graph is chordal, since it is obtained as an undirected component of a causal graph skeleton. We note that every chordal graph can be realized by this process.

Proposition 7. For any undirected chordal graph G, there is a causal graph D such that the essential graph E(D) = G.

Thus every chordal graph is the undirected subgraph of the essential graph for some causal DAG. This validates the problem definition of the minimum cost intervention design, as any chordal graph can be given as input. We now state our hardness result.

Theorem 8. The minimum cost intervention design problem is NP-hard.

Please see Appendix D for the proof. Our proof is based on the reduction of Kroon et al. [21] from numerical 3D matching to a graph coloring problem on interval graphs that is more general than the minimum cost intervention problem. Our hardness proof holds even if the vertex costs are all equal to 1 and the input graph is an interval graph — a subclass of chordal graphs that often admits efficient algorithms for problems that are hard on general graphs.

It is worth comparing this to complexity results on the related minimum size intervention design problem. The minimum size intervention design problem on a graph can be solved by finding a minimum coloring of the same graph [22, 10]. For chordal graphs, graph coloring can be solved efficiently, so the minimum size intervention design problem can also be solved efficiently. In contrast, the minimum cost intervention design problem is NP-hard, even on chordal graphs.
Both problems are hard on general graphs, which can arise as inputs when there is side information.

5 Approximation Guarantees for Minimum Cost Intervention Design

Since the input graph is chordal, we can find maximum weighted independent sets in polynomial time using Frank's algorithm [4]. Further, a chordal graph remains chordal after removing a subset of the vertices. The authors of [19] use these facts to construct a greedy algorithm for this weighted coloring problem. Let G0 = G. On iteration t, find the maximum weighted independent set in Gt and assign these vertices the available color with the smallest cost. Then let Gt+1 be the graph after removing the colored vertices from Gt. Repeat this until all vertices are colored. Convert the coloring to a graph separating system and return this design.

One issue with this algorithm is that it is not clear how many iterations the greedy algorithm will take until the graph is fully colored. This is important, as we want to satisfy the size constraint on the graph separating system. We therefore introduce a quantization step to reduce the number of iterations the greedy algorithm requires to completely color the graph. In Figure 3 of Appendix A, we see an example of a (non-chordal) graph where, without quantization, the greedy algorithm requires n/2 colors but with quantization it only requires 4 colors.

Specifically, we first find the maximum independent set of the input graph and remove it. We then find the maximum cost vertex of the new graph, with weight wmax. For all vertices v in the new graph, we replace the cost wv with ⌊wv n³ / wmax⌋. See Algorithm 1 for pseudocode describing our algorithm.

The reason we first remove the maximum independent set before quantizing is that the maximum independent set will be colored with the color of weight 0, and thus does not contribute to the cost. We want the quantized costs to not be arbitrarily far from the original costs, except for the vertices that are not intervened on. For example, if there is a vertex with a weight of infinity, we will never intervene on it. However, if we were to quantize it, the optimal solution to the quantized problem could be arbitrarily far from the true optimal solution. Our method of quantization allows us to show that a good solution for the quantized weights is also a good solution for the true weights.

Algorithm 1 Greedy Coloring Algorithm with Quantization

Input: A chordal graph G = (V, E), positive integral weights wi for all i ∈ V.
Quantize the vertex weights:
  S0 ← maximum weighted independent set of G
  wmax ← max_{i ∈ V \ S0} wi
  wi ← ⌊wi n³ / wmax⌋
Greedy weighted coloring algorithm:
  Assign S0 color 0
  G1 ← G − S0
  t ← 1
  while Gt is not empty:
    St ← maximum weight independent set of Gt
    color all vertices of St with the color t
    Gt+1 ← Gt − St
    t ← t + 1
  convert the coloring of G to a graph separating system I
  return I

We now state our main theorem, which guarantees that the greedy algorithm with quantization returns a solution that is a (2 + ε)-approximation to the optimal solution while only using log χ + O(log log n) interventions. Our algorithm thus returns a good solution to the minimum cost intervention design problem whenever the allowed number of interventions satisfies m ≥ log χ + O(log log n). Note that m ≥ log χ is required for any graph separating system to exist.

Theorem 9. If the number of interventions m satisfies m ≥ log χ + log log n + 5, then the greedy coloring algorithm with quantization for the minimum cost intervention design problem creates a graph separating system Igreedy such that

cost(Igreedy) ≤ (2 + ε) OPT,

where ε = exp(−Ω(m)) + n⁻¹.

See Appendix B for the proof of the theorem.
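Algorithm 1 can be sketched compactly in code. This is an illustrative sketch only: we use a brute-force maximum weight independent set, whereas the paper assumes Frank's linear-time algorithm for chordal graphs, and the function names are ours:

```python
from itertools import combinations

def max_weight_independent_set(adj, weights, vertices):
    """Brute-force MWIS over `vertices` (for small illustration graphs;
    on chordal graphs this step is polynomial via Frank's algorithm)."""
    vs = list(vertices)
    best, best_w = frozenset(), -1
    for mask in range(1, 1 << len(vs)):
        S = [v for i, v in enumerate(vs) if (mask >> i) & 1]
        if all(u not in adj[v] for u, v in combinations(S, 2)):
            w = sum(weights[v] for v in S)
            if w > best_w:
                best, best_w = frozenset(S), w
    return best

def greedy_coloring_with_quantization(adj, weights):
    """Algorithm 1 sketch: color the maximum weight independent set with
    the free color 0, quantize the remaining weights, then repeatedly peel
    off maximum weight independent sets with colors 1, 2, ..."""
    n = len(adj)
    remaining = set(adj)
    S0 = max_weight_independent_set(adj, weights, remaining)
    color = {v: 0 for v in S0}
    remaining -= S0
    if remaining:
        wmax = max(weights[v] for v in remaining)
        # Quantization step: w_v <- floor(w_v * n^3 / w_max).
        q = {v: weights[v] * n ** 3 // wmax for v in remaining}
        t = 1
        while remaining:
            St = max_weight_independent_set(adj, q, remaining)
            color.update({v: t for v in St})
            remaining -= St
            t += 1
    return color  # colors map to interventions via their binary expansions
```

On a path a-b-c with weights (5, 1, 5), the heavy endpoints form the weight-0 class and only the cheap middle vertex receives a nonzero color, i.e., is ever intervened on.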
We present a brief sketch of our proof. To show that the greedy algorithm uses a small number of colors, we first define a submodular, monotone, and non-negative function such that every vertex has been colored if and only if this particular submodular function is maximized. This is an instance of the submodular cover problem. Wolsey established that the greedy algorithm for the submodular cover problem returns a set whose cardinality is close to that of the optimal solution when the values of the submodular function are bounded by a polynomial [35]. This is why we need to quantize the weights.

To show that the greedy algorithm returns a solution with small value, we first define a new class of functions which we call supermodular chain functions. We then show that the minimum cost intervention design problem is an instance of a supermodular chain function. Using results on submodular optimization from [23, 20] and some nice properties of the minimum cost intervention design problem, we are able to show that the greedy algorithm returns an approximately optimal solution. To relate the quantized weights back to the original weights, we use an analysis similar to the one used to show the approximation guarantees for the knapsack problem [14].

Finally, we remark on how our algorithm performs when there are vertices with infinite cost. These vertices can be interpreted as variables that cannot be intervened on. If these variables form an independent set, then they can be colored with the color of weight zero. We maintain our theoretical guarantees in this case, since our quantization procedure first removes the maximum weight independent set.
If the variables with infinite cost do not form an independent set, then no valid graph separating system has finite cost.

6 Algorithms for k-Sparse Intervention Design Problems

We first establish a lower bound on how large a k-sparse graph separating system must be for a graph G, based on the size of the smallest vertex cover of the graph, τ.

Proposition 10. For any graph G, the size of the smallest k-sparse graph separating system m*_k satisfies m*_k ≥ τ/k, where τ is the size of the smallest vertex cover in the graph G.

See Appendix C for the proof.

Algorithm 2 Algorithm for Min Size and Unweighted Min Cost k-Sparse Intervention Design

Input: A chordal graph G, a sparsity constraint k.
  S ← minimum size vertex cover of G.
  GS ← induced graph of S in G.
  Find an optimal coloring of GS.
  Split the color classes of GS into size k intervention sets I1, I2, . . . , Im.
  Return I = {I1, I2, . . . , Im}.

We use Algorithm 2 to find a small k-sparse graph separating system. It first finds the minimum cardinality vertex cover S. It then finds an optimal coloring of the graph induced by the vertices of S. It then partitions the color classes into independent sets of size k and performs an intervention for each of these partitions. Since the set of vertices not in a vertex cover is an independent set, this is a valid k-sparse graph separating system.

When the sparsity k and the maximum degree Δ are small, Algorithm 2 is nearly optimal. Using Proposition 10, we can establish the following approximation guarantee on the size of the graph separating system created.

Theorem 11. Given a chordal graph G with maximum degree Δ, Algorithm 2 finds a k-sparse graph separating system of size mk such that

mk ≤ (1 + k(Δ + 1)/n) OPT,
If the sparsity constraint k and the maximum degree of the graph \nboth satisfy k, = o(n1/3), then Theorem 11 implies that we have a 1 + o(1) approximation to the\noptimal solution.\nOne interesting aspect of Algorithm 2 is that every vertex is only intervened on once and the set of\nelements not intervened on is the maximum cardinality independent set. By a similar argument to\nTheorem 2 of [19], we have that this algorithm is optimal in the unweighted case.\nCorollary 12. Given an instance of the minimum cost k-sparse intervention design problem with\nchordal graph G with maximum degree and vertex cover of size \u2327, sparsity constraint k, size\n), and all vertex weights wv = 1, Algorithm 2 returns a solution with\nconstraint m \u2327\noptimal cost.\n\nk (1 + k(+1)\n\nn\n\nWe show one way to extend Algorithm 2 to the weighted case. There is a trade off between the size\nand the weight of the independent set of vertices that are never intervened on. We can trade these off\nby adding a penalty to every vertex weight, i.e., the new weight w\nv = wv + .\nLarger values of will encourage independent sets of larger size. See Algorithm 3 for the pseudocode\ndescribing this algorithm. We can run Algorithm 3 for various values of to explore the trade off\nbetween cost and size.\n\nv of a vertex v is w\n\nAlgorithm 3 Algorithm for Weighted Min Cost k-Sparse Intervention Design\n\nInput: chordal graph G, sparsity constraint k, vertex weights wv, penalty parameter \nS minimum weight vertex cover S using weights w\nGS induced graph of S in G.\nFind an optimal coloring of GS.\nSplit the color classes of GS into size k intervention sets I1, I2, . . . , Im.\nReturn I = {I1, I2, . . . , Im}.\n\nv = wv + .\n\n7 Experiments\n\nWe generate chordal graphs following the procedure of [31], however we modify the sampling\nalgorithm so that we can control the maximum degree. First we order the vertices {v1, v2, . . . , vn}.\nFor vertex vi we choose a vertex from {vib, vib+1, . . . 
, v_{i−1}} uniformly at random and add it to the neighborhood of vi. We then go through the vertices {v_{i−b}, v_{i−b+1}, . . . , v_{i−1}} and add them to the neighborhood of vi with probability d/b. We then add edges so that the neighbors of vi in {v1, v2, . . . , v_{i−1}} form a clique. This is guaranteed to be a connected chordal graph with maximum degree bounded by 2b.

In our first experiment we compare the greedy algorithm to two other algorithms. One first assigns the maximum weight independent set the weight-0 color, then finds a minimum coloring of the remaining vertices, sorts the independent sets by weight, and then assigns the cheapest colors to the independent sets of the highest weight. The other algorithm finds the optimal solution with integer programming using the Gurobi solver [8]. The integer programming formulation is standard (see, e.g., [1]).

We compare the cost of the different algorithms when we (a) adjust the number of vertices while maintaining the average degree and (b) adjust the average degree while maintaining the number of vertices. We see that the greedy coloring algorithm performs almost optimally. We also see that it is able to find a proper coloring even with only m = 5 interventions and no quantization. See Figure 1 for the complete results.

In our second experiment we see how Algorithm 3 allows us to trade off the number of interventions and the cost of the interventions in the k-sparse minimum cost intervention design problem. See Figure 2 for the results.

Finally, we observe the empirical running time of the greedy algorithm. We generate graphs on 10,000 vertices with maximum degree 20 and use 5 interventions. The greedy algorithm terminates in 5 seconds.
In contrast, the integer programming solution takes 128 seconds using the Gurobi solver [8].

[Figure 1: two plots, "Num Vertices vs Cost in Min Cost Intervention Design" and "Average Degree vs Cost in Min Cost Intervention Design", each comparing the cost of the Baseline, Greedy, and Optimal algorithms.]

(a) We adjust the number of vertices. The average degree stays close to 10 for all values of the number of vertices.

(b) The number of vertices is fixed at 500. We adjust the sparsity parameter in the graph generator to see how the algorithms perform for varying graph densities.

Figure 1: We generate random chordal graphs such that the maximum degree is bounded by 20. The node weights are generated by the heavy-tailed Pareto distribution with scale parameter 2.0. The number of interventions m is fixed to 5. We compare the greedy algorithm to the optimal solution and the baseline algorithm mentioned in the experimental setup. We see that the greedy algorithm is close to optimal and outperforms the baseline. We also see that the greedy algorithm is able to find a solution with the available number of colors, even without quantization.

[Figure 2: plot, "Num Interventions vs. Cost in k-Sparse Min Cost Intervention Design".]

Figure 2: We sample graphs of size 10,000 such that the maximum degree is bounded by 20 and the average degree is 3. We draw the weights from the heavy-tailed Pareto distribution with scale parameter 2.0. We restrict all interventions to be of size 10. We adjust the penalty parameter λ in Algorithm 3 to see how the size of the k-sparse graph separating system relates to the cost. Costs are normalized so that the largest cost is 1.0. We see that with 561 interventions we can achieve a cost of 0.78 compared to a cost of 1.0 with 510 interventions. Our lower bound implies that we need 506 interventions on average.

Acknowledgments

This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE-1110007. This research has been supported by NSF Grants CCF 1422549, 1618689, DMS 1723052, CCF 1763702, ARO YIP W911NF-14-1-0258 and research gifts by Google, Western Digital, and NVIDIA.

References

[1] Diego Delle Donne and Javier Marenco. Polyhedral studies of vertex coloring problems: The standard formulation. Discrete Optimization, 21:1–13, 2016.

[2] Frederick Eberhardt, Clark Glymour, and Richard Scheines. On the number of experiments sufficient and in the worst case necessary to identify all causal relations among n variables. In Uncertainty in Artificial Intelligence, 2005.

[3] Frederick Eberhardt. Causation and intervention. Ph.D. thesis, 2007.

[4] András Frank. Some polynomial algorithms for certain graphs and hypergraphs. In British Combinatorial Conference, 1975.

[5] Michael R. Garey and David S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., 1979.

[6] Dan Geiger and David Heckerman. Learning Gaussian networks. In Uncertainty in Artificial Intelligence, 1994.

[7] AmirEmad Ghassami, Saber Salehkaleybar, Negar Kiyavash, and Elias Bareinboim. Budgeted experiment design for causal structure learning. In International Conference on Machine Learning, 2018.

[8] Gurobi Optimization, LLC. Gurobi optimizer reference manual, 2018.

[9] Alain Hauser and Peter Bühlmann. Characterization and greedy learning of interventional Markov equivalence classes of directed acyclic graphs. Journal of Machine Learning Research, 13(1):2409–2464, 2012.

[10] Alain Hauser and Peter Bühlmann.
Two optimal strategies for active learning of causal networks from interventional data. In European Workshop on Probabilistic Graphical Models, 2012.

[11] David Heckerman, Dan Geiger, and David Chickering. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20(3):197–243, 1995.

[12] Huining Hu, Zhentao Li, and Adrian Vetta. Randomized experimental design for causal graph discovery. In Neural Information Processing Systems, 2014.

[13] Antti Hyttinen, Frederick Eberhardt, and Patrik Hoyer. Experiment selection for causal discovery. Journal of Machine Learning Research, 14:3041–3071, 2013.

[14] Oscar H Ibarra and Chul E Kim. Fast approximation algorithms for the knapsack and sum of subset problems. Journal of the ACM, 22(4):463–468, 1975.

[15] Guido W. Imbens and Donald B. Rubin. Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press, 2015.

[16] Klaus Jansen. The optimum cost chromatic partition problem. In Italian Conference on Algorithms and Complexity, 1997.

[17] Gyula Katona. On separating systems of a finite set. Journal of Combinatorial Theory, 1(2):174–194, 1966.

[18] Ross D King, Kenneth E Whelan, Ffion M Jones, Philip GK Reiser, Christopher H Bryant, Stephen H Muggleton, Douglas B Kell, and Stephen G Oliver. Functional genomic hypothesis generation and experimentation by a robot scientist. Nature, 427(6971):247, 2004.

[19] Murat Kocaoglu, Alexandros G. Dimakis, and Sriram Vishwanath. Cost-optimal learning of causal graphs. In International Conference on Machine Learning, 2017.

[20] Andreas Krause and Daniel Golovin. Submodular function maximization, 2014.

[21] Leo G Kroon, Arunabha Sen, Haiyong Deng, and Asim Roy. The optimal cost chromatic partition problem for trees and interval graphs.
In International Workshop on Graph-Theoretic Concepts in Computer Science, 1996.

[22] Cai Mao-Cheng. On separating systems of graphs. Discrete Mathematics, 49:15–20, 1984.

[23] George L Nemhauser, Laurence A Wolsey, and Marshall L Fisher. An analysis of approximations for maximizing submodular set functions—I. Mathematical Programming, 14(1):265–294, 1978.

[24] Robert Osazuwa Ness, Karen Sachs, Parag Mallick, and Olga Vitek. A Bayesian active learning experimental design for inferring signaling networks. In International Conference on Research in Computational Molecular Biology, 2017.

[25] Judea Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2009.

[26] Joseph Ramsey, Madelyn Glymour, Ruben Sanchez-Romero, and Clark Glymour. A million variables and more: The fast greedy equivalence search algorithm for learning high-dimensional graphical causal models, with an application to functional magnetic resonance images. International Journal of Data Science and Analytics, 3(2):121–129, 2017.

[27] Joseph D Ramsey, Stephen José Hanson, Catherine Hanson, Yaroslav O Halchenko, Russell A Poldrack, and Clark Glymour. Six problems for causal inference from fMRI. Neuroimage, 49(2):1545–1558, 2010.

[28] Maya Rotmensch, Yoni Halpern, Abdulhakim Tlimat, Steven Horng, and David Sontag. Learning a health knowledge graph from electronic medical records. Scientific Reports, 7(1):5994, 2017.

[29] Donald B Rubin and Richard P Waterman. Estimating the causal effects of marketing interventions using propensity score methodology. Statistical Science, pages 206–222, 2006.

[30] Karen Sachs, Omar Perez, Dana Pe'er, Douglas A Lauffenburger, and Garry P Nolan. Causal protein-signaling networks derived from multiparameter single-cell data. Science, 308(5721):523–529, 2005.

[31] Karthikeyan Shanmugam, Murat Kocaoglu, Alexandros G.
Dimakis, and Sriram Vishwanath. Learning causal graphs with small interventions. In Neural Information Processing Systems, 2015.

[32] Peter Spirtes, Clark Glymour, and Richard Scheines. Causation, Prediction, and Search. A Bradford Book, 2001.

[33] Yuriy Sverchkov and Mark Craven. A review of active learning approaches to experimental design for uncovering biological networks. PLoS Computational Biology, 13(6):e1005466, 2017.

[34] David P Williamson and David B Shmoys. The Design of Approximation Algorithms. Cambridge University Press, 2011.

[35] Laurence A Wolsey. An analysis of the greedy algorithm for the submodular set covering problem. Combinatorica, 2(4):385–393, 1982.