{"title": "Transportability from Multiple Environments with Limited Experiments: Completeness Results", "book": "Advances in Neural Information Processing Systems", "page_first": 280, "page_last": 288, "abstract": "This paper addresses the problem of $mz$-transportability, that is, transferring causal knowledge collected in several heterogeneous domains to a target domain in which only passive observations and limited experimental data can be collected. The paper first establishes a necessary and sufficient condition for deciding the feasibility of $mz$-transportability, i.e., whether causal effects in the target domain are estimable from the information available. It further proves that a previously established algorithm for computing transport formula is in fact complete, that is, failure of the algorithm implies non-existence of a transport formula. Finally, the paper shows that the do-calculus is complete for the $mz$-transportability class.", "full_text": "Transportability from Multiple Environments\n\nwith Limited Experiments: Completeness Results\n\nElias Bareinboim\nComputer Science\n\nUCLA\n\neb@cs.ucla.edu\n\nJudea Pearl\n\nComputer Science\n\nUCLA\n\njudea@cs.ucla.edu\n\nAbstract\n\nThis paper addresses the problem of mz-transportability, that is, transferring\ncausal knowledge collected in several heterogeneous domains to a target domain\nin which only passive observations and limited experimental data can be collected.\nThe paper \ufb01rst establishes a necessary and suf\ufb01cient condition for deciding the\nfeasibility of mz-transportability, i.e., whether causal effects in the target domain\nare estimable from the information available. It further proves that a previously\nestablished algorithm for computing transport formula is in fact complete, that is,\nfailure of the algorithm implies non-existence of a transport formula. 
Finally, the\npaper shows that the do-calculus is complete for the mz-transportability class.\n\n1 Motivation\n\nThe issue of generalizing causal knowledge is central in scienti\ufb01c inferences since experiments are\nconducted, and conclusions that are obtained in a laboratory setting (i.e., speci\ufb01c population, do-\nmain, study) are transported and applied elsewhere, in an environment that differs in many aspects\nfrom that of the laboratory. If the target environment is arbitrary, or drastically different from the\nstudy environment, no causal relations can be learned and scienti\ufb01c progress will come to a stand-\nstill. However, the fact that scienti\ufb01c experimentation continues to provide useful information about\nour world suggests that certain environments share common characteristics and that, owed to these\ncommonalities, causal claims would be valid even where experiments have never been performed.\nRemarkably, the conditions under which this type of extrapolation can be legitimized have not been\nformally articulated until very recently. Although the problem has been extensively discussed in\nstatistics, economics, and the health sciences, under rubrics such as \u201cexternal validity\u201d [1, 2], \u201cmeta-\nanalysis\u201d [3], \u201cquasi-experiments\u201d [4], \u201cheterogeneity\u201d [5], these discussions are limited to verbal\nnarratives in the form of heuristic guidelines for experimental researchers \u2013 no formal treatment of\nthe problem has been attempted to answer the practical challenge of generalizing causal knowledge\nacross multiple heterogeneous domains with disparate experimental data as posed in this paper. 
The\nlack of sound mathematical machinery in such settings precludes one of the main goals of machine\nlearning (and by and large computer science), which is automating the process of discovery.\nThe class of problems of causal generalizability is called transportability and was first formally\narticulated in [6]. We consider the most general instance of transportability known to date, namely\nthe problem of transporting experimental knowledge from heterogeneous settings to a certain specific\ntarget. [6] introduced a formal language for encoding differences and commonalities between\ndomains, accompanied by necessary or sufficient conditions under which transportability of empirical\nfindings is feasible between two domains, a source and a target; then, these conditions were\nextended to a complete characterization of transportability in one domain with unrestricted experimental\ndata [7, 8]. Subsequently, assumptions were relaxed to consider settings in which only limited\nexperiments are available in the source domain [9, 10], then settings in which multiple source domains\nwith unrestricted experimental information are available [11, 12], and then multiple heterogeneous\nsources with limited and distinct experiments [13], which was called \u201cmz-transportability\u201d.1\nSpecifically, the mz-transportability problem concerns the transfer of causal knowledge from a\nheterogeneous collection of source domains \u03a0 = {\u03c01, ..., \u03c0n} to a target domain \u03c0\u2217. In each domain\n\u03c0i \u2208 \u03a0, experiments over a set of variables Zi can be performed, and causal knowledge gathered.\nIn \u03c0\u2217, potentially different from \u03c0i, only passive observations can be collected (this constraint will\nbe weakened). 
The problem is to infer a causal relationship R in \u03c0\u2217 using knowledge obtained in \u03a0.\nThe problem studied here generalizes the one-dimensional version of transportability with limited\nscope and the multi-dimensional version with unlimited scope previously studied. Interestingly, while\ncertain effects might not be individually transportable to the target domain from the experiments in\nany of the available sources, combining different pieces from the various sources may enable their\nestimation. Conversely, it is also possible that effects are not estimable from multiple experiments in\nindividual domains, but they are from experiments scattered throughout domains (discussed below).\nThe goal of this paper is to formally understand the conditions under which causal effects in the target\ndomain are (non-parametrically) estimable from the available data. Sufficient conditions for\n\u201cmz-transportability\u201d were given in [13], but this treatment falls short of providing guarantees as to whether\nthese conditions are also necessary, should be augmented, or even replaced by more general ones.\nThis paper establishes the following results:\n\n\u2022 A necessary and sufficient condition for deciding when causal effects in the target domain\nare estimable from both the statistical information available and the causal information\ntransferred from the experiments in the domains.\n\u2022 A proof that the algorithm proposed in [13] is in fact complete for computing the transport\nformula, that is, the strategy devised for combining the empirical evidence to synthesize\nthe target relation cannot be improved upon.\n\u2022 A proof that the do-calculus is complete for the mz-transportability class.\n\n2 Background in Transportability\nIn this section, we consider other transportability instances and discuss the relationship with the\nmz-transportability setting. Consider Fig. 
1(a) in which the node S represents factors that produce\ndifferences between source and target populations. We conduct a randomized trial in Los Angeles\n(LA) and estimate the causal effect of treatment X on outcome Y for every age group Z = z,\ndenoted by P(y|do(x), z). We now wish to generalize the results to the population of New York\nCity (NYC), but we find the distribution P(x, y, z) in LA to be different from the one in NYC (call\nthe latter P\u2217(x, y, z)). In particular, the average age in NYC is significantly higher than that in LA.\nHow are we to estimate the causal effect of X on Y in NYC, denoted R = P\u2217(y|do(x))? 2 3\nThe selection diagram \u2013 overlapping of the diagrams in LA and NYC \u2013 for this example (Fig. 1(a))\nconveys the assumption that the only differences between the two populations are the factors determining\nage distributions, shown as S \u2192 Z, while age-specific effects P\u2217(y|do(x), Z = z) are invariant\nacross populations. Difference-generating factors are represented by a special set of variables called\nselection variables S (or simply S-variables), which are graphically depicted as square nodes (\u25a0).\nFrom this assumption, the overall causal effect in NYC can be derived as follows:\n\nR = \u2211_z P\u2217(y|do(x), z) P\u2217(z)\n  = \u2211_z P(y|do(x), z) P\u2217(z)    (1)\n\nThe last line constitutes a transport formula for R; it combines experimental results obtained in\nLA, P(y|do(x), z), with observational aspects of the NYC population, P\u2217(z), to obtain a causal claim\nP\u2217(y|do(x)) about NYC. 
In this trivial example, the transport formula amounts to a simple\nrecalibration (or re-weighting) of the age-specific effects to account for the new age distribution. In\ngeneral, however, a more involved mixture of experimental and observational findings would be\nnecessary to obtain an unbiased estimate of the target relation R. In certain cases there is no way\nto synthesize a transport formula; for instance, Fig. 1(b) depicts the smallest example in which\ntransportability is not feasible (even with X randomized). Our goal is to characterize these cases.\n\n1 Traditionally, the machine learning literature has been concerned about discrepancies among domains in\nthe context, almost exclusively, of predictive or classification tasks as opposed to learning causal or\ncounterfactual measures [14, 15]. Interestingly, recent work on anticausal learning leverages knowledge about invariances\nof the underlying data-generating structure across domains, moving the literature towards more general\nmodalities of learning [16, 17].\n2 We will use P_x(y | z) interchangeably with P(y | do(x), z).\n3 We use the structural interpretation of causal diagrams as described in [18, pp. 205] (see also Appendix 1).\n\nFigure 1: (a) Selection diagram illustrating when transportability of R = P\u2217(y|do(x)) between two\ndomains is trivially solved through simple recalibration. (b) The smallest diagram in which a causal\nrelation is not transportable. (c,d) Selection diagrams illustrating the impossibility of estimating\nR through individual transportability from \u03c0a and \u03c0b even when Z = {Z1, Z2}. If experiments\nover {Z2} are available in \u03c0a and over {Z1} in \u03c0b, R is transportable. (e,f) Selection diagrams\nillustrating the opposite phenomenon \u2013 transportability through multiple domains is not feasible, but it\nis if experiments over Z = {Z1, Z2} are available in one domain. The selection variables S are depicted as square nodes (\u25a0).\n\nIn real world applications, it may happen that only a limited amount of experimental information can\nbe gathered at the source environment. 
The question arises whether an investigator in possession of\na limited set of experiments would still be able to estimate the desired effects at the target domain.\nTo illustrate some of the subtle issues that mz-transportability entails, consider Fig. 1(c,d) which\nconcerns the transport of experimental results from two sources ({\u03c0a, \u03c0b}) to infer the effect of X\non Y in \u03c0\u2217, R = P\u2217(y|do(x)). In these diagrams, X may represent the treatment (e.g., cholesterol\nlevel), Z1 represents a pre-treatment variable (e.g., diet), Z2 represents an intermediate variable (e.g.,\nbiomarker), and Y represents the outcome (e.g., heart failure). Assume that experimental studies\nrandomizing {Z2} can be conducted in domain \u03c0a and {Z1} in domain \u03c0b. A simple analysis can\nshow that R cannot be transported from either source alone (even when experiments are available\nover both variables) [9]. Still, combining experiments from both sources allows one to determine\nthe effect in the target through the following transport formula [13]:\n\nP\u2217(y|do(x)) = \u2211_{z2} P^(b)(z2|x, do(Z1)) P^(a)(y|do(z2))    (2)\n\nThis transport formula is a mixture of the experimental result over {Z1} from \u03c0b,\nP^(b)(z2|x, do(Z1)), with the result of the experiment over {Z2} in \u03c0a, P^(a)(y|do(z2)), and\nconstitutes a consistent estimand of the target relation in \u03c0\u2217. Further consider Fig. 1(e,f) which illustrates\nthe opposite phenomenon. In this case, if experiments over {Z2} are available in domain \u03c0a and\nover {Z1} in \u03c0b, R is not transportable. 
However, if experiments over {Z1, Z2} are available in the same domain, say\n\u03c0a, R is transportable and equals P^(a)(y|x, do(Z1, Z2)), independently of the values of Z1 and Z2.\nThese intriguing results raise two fundamental issues that will be answered throughout this paper.\nFirst, whether the do-calculus is complete relative to such problems, that is, whether it would always\nfind a transport formula whenever one exists. Second, even assuming that there exists a sequence of\napplications of do-calculus that achieves the reduction required by mz-transportability, finding such a\nsequence may be computationally intractable, so an efficient way is needed for obtaining such a formula.\n\n3 A Graphical Condition for mz-transportability\nThe basic semantical framework in our analysis rests on structural causal models as defined in [18,\npp. 205], also called data-generating models. In the structural causal framework [18, Ch. 7], actions\nare modifications of functional relationships, and each action do(x) on a causal model M produces\na new model M_x = \u27e8U, V, F_x, P(U)\u27e9, where V is the set of observable variables, U is the set of\nunobservable variables, and F_x is obtained after replacing f_X \u2208 F for every X \u2208 X with a new\nfunction that outputs a constant value x given by do(x).\nWe follow the conventions given in [18]. We denote variables by capital letters and their realized\nvalues by small letters. Similarly, sets of variables will be denoted by bold capital letters, sets\nof realized values by bold small letters. We use the typical graph-theoretic terminology with the\ncorresponding abbreviations De(Y)_G, Pa(Y)_G, and An(Y)_G, which will denote respectively the\nset of observable descendants, parents, and ancestors of the node set Y in G. A graph G_Y will\ndenote the induced subgraph of G containing nodes in Y and all arrows between such nodes. 
Finally,\nG_{X\u0304Z\u0332} stands for the edge subgraph of G where all arrows incoming into X and all arrows outgoing\nfrom Z are removed.\nKey to the analysis of transportability is the notion of identifiability [18, pp. 77], which expresses\nthe requirement that causal effects are computable from a combination of non-experimental data P\nand assumptions embodied in a causal diagram G. Causal models and their induced diagrams are\nassociated with one particular domain (i.e., setting, population, environment), and this representation\nis extended in transportability to capture properties of two domains simultaneously. This is possible\nif we assume that the structural equations share the same set of arguments, though the functional\nforms of the equations may vary arbitrarily [7]. 4\nDefinition 1 (Selection Diagrams). Let \u27e8M, M\u2217\u27e9 be a pair of structural causal models relative to\ndomains \u27e8\u03c0, \u03c0\u2217\u27e9, sharing a diagram G. \u27e8M, M\u2217\u27e9 is said to induce a selection diagram D if D is\nconstructed as follows: every edge in G is also an edge in D; D contains an extra edge Si \u2192 Vi\nwhenever there might exist a discrepancy fi \u2260 f\u2217_i or P(Ui) \u2260 P\u2217(Ui) between M and M\u2217.\nIn words, the S-variables locate the mechanisms where structural discrepancies between the two\ndomains are suspected to take place.5 Armed with the concept of identifiability and selection diagrams,\nmz-transportability of causal effects can be defined as follows [13]:\nDefinition 2 (mz-Transportability). Let D = {D(1), ..., D(n)} be a collection of selection diagrams\nrelative to source domains \u03a0 = {\u03c01, ..., \u03c0n}, and target domain \u03c0\u2217, respectively, and Zi (and Z\u2217)\nbe the variables in which experiments can be conducted in domain \u03c0i (and \u03c0\u2217). 
Let \u27e8P^i, I^i_z\u27e9 be the\npair of observational and interventional distributions of \u03c0i, where I^i_z = \u22c3_{Z\u2032\u2286Zi} P^i(v|do(z\u2032)), and\nin an analogous manner, \u27e8P\u2217, I\u2217_z\u27e9 be the observational and interventional distributions of \u03c0\u2217. The\ncausal effect R = P\u2217_x(y) is said to be mz-transportable from \u03a0 to \u03c0\u2217 in D if P\u2217_x(y) is uniquely\ncomputable from \u22c3_{i=1,...,n} \u27e8P^i, I^i_z\u27e9 \u222a \u27e8P\u2217, I\u2217_z\u27e9 in any model that induces D.\n\nWhile this definition might appear convoluted, it is nothing more than a formalization of the\nstatement \u201cR needs to be uniquely computable from the information set IS alone.\u201d Naturally, when IS\nhas many components (multiple observational and interventional distributions), it becomes lengthy.\nThis requirement of computability from \u27e8P\u2217, I\u2217_z\u27e9 and \u27e8P^i, I^i_z\u27e9 from all sources has a syntactic image\nin the do-calculus, which is captured by the following sufficient condition:\nTheorem 1 ([13]). Let D = {D(1), ..., D(n)} be a collection of selection diagrams relative to source\ndomains \u03a0 = {\u03c01, ..., \u03c0n}, and target domain \u03c0\u2217, respectively, and Si represents the collection of\nS-variables in the selection diagram D(i). Let {\u27e8P^i, I^i_z\u27e9} and \u27e8P\u2217, I\u2217_z\u27e9 be respectively the pairs\nof observational and interventional distributions in the sources \u03a0 and target \u03c0\u2217. 
The effect R =\nP\u2217(y|do(x)) is mz-transportable from \u03a0 to \u03c0\u2217 in D if the expression P(y|do(x), S1, ..., Sn) is\nreducible, using the rules of the do-calculus, to an expression in which (1) do-operators that apply\nto subsets of I^i_z have no Si-variables or (2) do-operators apply only to subsets of I\u2217_z.\n\nIt is not difficult to see that in Fig. 1(c,d) (and also in Fig. 1(e,f)) a sequence of applications of\nthe rules of do-calculus indeed reaches the reduction required by the theorem and yields a transport\nformula as shown in Section 2. It is not obvious, however, whether such a sequence exists in Fig.\n2(a,b) when experiments over {X} are available in \u03c0a and {Z} in \u03c0b, and if it does not exist, it is\nalso not clear whether this would imply the inability to transport. It turns out that in this specific\nexample there is no such sequence and the target relation R is not transportable, which means\nthat there exist two models that are equally compatible with the data (i.e., both could generate the\nsame dataset) while each model entails a different answer for the effect R (violating the uniqueness\nrequirement of Def. 2). 
6 To demonstrate this fact formally, we show the existence of two structural\nmodels M1 and M2 such that the following equalities and inequality between distributions hold,\n\nP^(a)_M1(X, Z, Y) = P^(a)_M2(X, Z, Y),\nP^(b)_M1(X, Z, Y) = P^(b)_M2(X, Z, Y),\nP^(a)_M1(Z, Y|do(X)) = P^(a)_M2(Z, Y|do(X)),\nP^(b)_M1(X, Y|do(Z)) = P^(b)_M2(X, Y|do(Z)),\nP\u2217_M1(X, Z, Y) = P\u2217_M2(X, Z, Y),    (3)\n\nfor all values of X, Z, and Y, and\n\nP\u2217_M1(Y|do(X)) \u2260 P\u2217_M2(Y|do(X)),    (4)\n\nfor some value of X and Y.\n\n4 As discussed in the reference, the assumption of no structural changes between domains can be relaxed,\nbut some structural assumptions regarding the discrepancies between domains must still hold (e.g., acyclicity).\n5 Transportability assumes that enough structural knowledge about both domains is known in order to\nsubstantiate the production of their respective causal diagrams. In the absence of such knowledge, causal discovery\nalgorithms might be used to infer the diagrams from data [19, 18].\n6 This is usually an indication that the current state of scientific knowledge about the problem (encoded in\nthe form of a selection diagram) does not constrain the observed distributions in such a way that an answer is\nentailed independently of the details of the functions and probability over the exogenous variables.\n\nFigure 2: (a,b) Selection diagrams in which it is not possible to transport R = P\u2217(y|do(x)) with\nexperiments over {X} in \u03c0a and {Z} in \u03c0b. (c,d) Example of diagrams in which some paths need\nto be extended for satisfying the definition of mz\u2217-shedge.\n\n
Let us assume that all variables in U \u222a V are binary. Let U1, U2 \u2208 U be the common causes of X\nand Y and Z and Y, respectively; let U3, U4 \u2208 U be the random disturbances exclusive to Z and\nY, respectively, and U5, U6 \u2208 U be extra random disturbances exclusive to Y. Let Sa and Sb index\nthe model in the following way: the tuples \u27e8Sa = 1, Sb = 0\u27e9, \u27e8Sa = 0, Sb = 1\u27e9, \u27e8Sa = 0, Sb = 0\u27e9\nrepresent domains \u03c0a, \u03c0b, and \u03c0\u2217, respectively. 
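As a sanity check on this construction, the following Python sketch enumerates all 64 equally likely settings of (U1, ..., U6) for the two models M1 and M2 defined in this section and compares the distributions that appear in Eqs. (3) and (4). The names m1, m2, dist and p_y1, and the encoding of do(X) and do(Z) as optional arguments, are illustrative assumptions and not part of the paper. The last operator of Y in M1 is written here as exclusive-or, which coincides with the printed sum because its two terms can never both equal 1.

```python
from itertools import product
from collections import Counter

# Domain indices (Sa, Sb) as in the text: pi_a = (1, 0), pi_b = (0, 1), target pi* = (0, 0).
PI_A, PI_B, PI_STAR = (1, 0), (0, 1), (0, 0)


def m1(u, sa, sb, do_x=None, do_z=None):
    # Model M1; do_x / do_z replace the equation of X / Z by a constant.
    u1, u2, u3, u4, u5, u6 = u
    x = u1 if do_x is None else do_x
    z = (u2 ^ (u3 & sa)) if do_z is None else do_z
    y = ((x ^ z ^ u1 ^ u2 ^ (u4 & sb)) & u5) ^ ((1 - u5) & u6)
    return x, z, y


def m2(u, sa, sb, do_x=None, do_z=None):
    # Model M2; identical to M1 except that Y does not depend on X or U1.
    u1, u2, u3, u4, u5, u6 = u
    x = u1 if do_x is None else do_x
    z = (u2 ^ (u3 & sa)) if do_z is None else do_z
    y = ((z ^ u2 ^ (u4 & sb)) & u5) ^ ((1 - u5) & u6)
    return x, z, y


def dist(model, domain, **do):
    # Exact joint distribution of (X, Z, Y), using P(Ui = 1) = 1/2 for every i.
    sa, sb = domain
    counts = Counter(model(u, sa, sb, **do) for u in product((0, 1), repeat=6))
    return {xzy: n / 64 for xzy, n in counts.items()}


def p_y1(d):
    # Marginal probability that Y = 1 under the joint distribution d.
    return sum(p for (x, z, y), p in d.items() if y == 1)


if __name__ == '__main__':
    for name, dom in (('pi_a', PI_A), ('pi_b', PI_B), ('pi*', PI_STAR)):
        print(name, 'observational distributions agree:', dist(m1, dom) == dist(m2, dom))
    for v in (0, 1):
        print('do(X=%d) in pi*:' % v,
              p_y1(dist(m1, PI_STAR, do_x=v)), 'vs', p_y1(dist(m2, PI_STAR, do_x=v)))
```

Under these assumptions the two models agree on every distribution listed in Eq. (3), while P∗(Y = 1|do(x)) is 1/2 in M1 and 1/4 in M2 for both values of x, which is the disagreement required by Eq. (4).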
Define the two models as follows:\n\nM1:\n  X = U1\n  Z = U2 \u2295 (U3 \u2227 Sa)\n  Y = ((X \u2295 Z \u2295 U1 \u2295 U2 \u2295 (U4 \u2227 Sb)) \u2227 U5) + (\u00acU5 \u2227 U6)\n\nM2:\n  X = U1\n  Z = U2 \u2295 (U3 \u2227 Sa)\n  Y = ((Z \u2295 U2 \u2295 (U4 \u2227 Sb)) \u2227 U5) \u2295 (\u00acU5 \u2227 U6)\n\nwhere \u2295 represents the exclusive or function. Both models agree with respect to P(U), which is\ndefined as P(Ui) = 1/2, i = 1, ..., 6. It is not difficult to evaluate these models and note that the\nconstraints given in Eqs. (3) and (4) are indeed satisfied (including positivity); the result follows. 7\nGiven that our goal is to demonstrate the converse of Theorem 1, we collect different examples of\nnon-transportability, such as the previous one, and try to understand whether there is a pattern in such\ncases and how to generalize them towards a complete characterization of mz-transportability.\nOne syntactic subtask of mz-transportability is to determine whether certain effects are identifiable\nin some source domains where interventional data is available. There are two fundamental results\ndeveloped for identifiability that will be relevant for mz-transportability as well. First, we should\nconsider confounded components (or c-components), which were defined in [20] and stand for a\ncluster of variables connected through bidirected edges (which are not separable through the\nobservables in the system). One key result is that each causal graph (and subgraphs) induces a unique\nC-component decomposition ([20, Lemma 11]). This decomposition was indeed instrumental for\na series of conditions for ordinary identification [21] and the inability to recursively decompose a\ncertain graph was later used to prove completeness.\nDefinition 3 (C-component). 
Let G be a causal diagram such that a subset of its bidirected arcs\nforms a spanning tree over all vertices in G. Then G is a C-component (confounded component).\n\nSubsequently, [22] proposed an extension of C-components called C-forests, essentially enforcing\nthat each C-component has to be a spanning forest and closed under ancestral relations [20].\n\n7 For a more sophisticated argument on how to evaluate these models, see the proofs in Appendix 3.\n\nDefinition 4 (C-forest). Let G be a causal diagram where Y is the maximal root set. Then G is a\nY-rooted C-forest if G is a C-component and all observable nodes have at most one child.\nFor concreteness, consider Fig. 1(c) and note that there exists a C-forest over nodes {Z1, X, Z2} and\nrooted in {Z2}. There exists another C-forest over nodes {Z1, X, Z2, Y} rooted in {Y}. It is also\nthe case that {Z2} and {Y} are themselves trivial C-forests. A pair of C-forests such as\n{Z1, X, Z2} and {Z2}, or {Z1, X, Z2, Y} and {Y}, in which the root set does not intersect the treatment\nvariables, is called a hedge, and identifiability was shown to be infeasible whenever\na hedge exists [22]. Clearly, despite the existence of hedges in Fig. 1(c,d), the effects of interest\nwere shown to be mz-transportable. This example is an indication that hedges do not capture in an\nimmediate way the structure needed for characterizing mz-transportability \u2013 i.e., a graph might be a\nhedge (or have a hedge as an edge subgraph) but the target quantity might still be mz-transportable.\nBased on these observations, we propose the following definition that may lead to the boundaries of\nthe class of mz-transportable relations:\nDefinition 5 (mz\u2217-shedge). Let D = (D(1), ..., D(n)) be a collection of selection diagrams\nrelative to source domains \u03a0 = (\u03c01, ..., \u03c0n) and target domain \u03c0\u2217, respectively, Si represents the\ncollection of S-variables in the selection diagram D(i), and let D(\u2217) be the causal diagram of \u03c0\u2217.\n
Let {\u27e8P^i, I^i_z\u27e9} be the collection of pairs of observational and interventional distributions of {\u03c0i},\nwhere I^i_z = \u22c3_{Z\u2032\u2286Zi} P^i(v|do(z\u2032)), and in an analogous manner, \u27e8P\u2217, I\u2217_z\u27e9 be the observational and\ninterventional distributions of \u03c0\u2217, for Zi the set of experimental variables in \u03c0i. Consider a pair of\nR-rooted C-forests F = \u27e8F, F\u2032\u27e9 such that F\u2032 \u2282 F, F\u2032 \u2229 X = \u2205, F \u2229 X \u2260 \u2205, and R \u2286 An(Y)_{G_X\u0304}\n(called a hedge [22]). We say that the induced collection of pairs of R-rooted C-forests over each\ndiagram, \u27e8F(\u2217), F(1), ..., F(n)\u27e9, is an mz-shedge for P\u2217_x(y) relative to experiments (I\u2217_z, I^1_z, ..., I^n_z)\nif they are all hedges and one of the following conditions holds for each domain \u03c0i, i = {\u2217, 1, ..., n}:\n\n1. There exists at least one variable of Si pointing to the induced diagram F\u2032(i), or\n2. (F(i) \\ F\u2032(i)) \u2229 Zi is an empty set, or\n3. The collection of pairs of C-forests induced over diagrams, \u27e8F(\u2217), F(1), ..., F(i) \\ Z\u2217_i, ..., F(n)\u27e9,\nis also an mz-shedge relative to (I\u2217_z, I^1_z, ..., I^i_{z\\z\u2217_i}, ..., I^n_z), where Z\u2217_i = (F(i) \\ F\u2032(i)) \u2229 Zi.\n\nFurthermore, we call mz\u2217-shedge the mz-shedge in which there exists one directed path from R \\\n(R \u2229 De(X)_F) to (R \u2229 De(X)_F) not passing through X (see also Appendix 3).\nThe definition of mz\u2217-shedge might appear involved, but it is nothing more than the articulation\nof the computability requirement of Def. 
2 (and implicitly the syntactic goal of Thm. 1) in a more\nexplicit graphical fashion. Specifically, for a certain factor Q\u2217_i needed for the computation of the\neffect Q\u2217 = P\u2217(y|do(x)), in at least one domain, (i) it should be enforced that the S-nodes are\nseparable from the inducing root set of the component to which Q\u2217_i belongs, and further, (ii) the\nexperiments available in this domain are sufficient for solving Q\u2217_i. For instance, assuming we want\nto compute Q\u2217 = P\u2217(y|do(x)) in Fig. 1(c, d), Q\u2217 can be decomposed into two factors, Q\u2217_1 =\nP\u2217_{z1,x}(z2) and Q\u2217_2 = P\u2217_{z1,x,z2}(y). It is the case that for factor Q\u2217_1, (i) holds true in \u03c0b and (ii)\nthe experiments available over Z1 are enough to guarantee the computability of this factor (similar\nanalysis applies to Q\u2217_2) \u2013 i.e., there is no mz\u2217-shedge and Q\u2217 is computable from the available data.\n\nPROCEDURE TRmz(y, x, P, I, S, W, D)\nINPUT: x, y: value assignments; P: local distribution relative to domain S (S = 0 indexes \u03c0\u2217) and active\nexperiments I; W: weighting scheme; D: backbone of selection diagram; Si: selection nodes in \u03c0i (S0 = \u2205\nrelative to \u03c0\u2217); [The following set and distributions are globally defined: Zi, P\u2217, P^(i)_{Zi}.]\nOUTPUT: P\u2217_x(y) in terms of P\u2217, P\u2217_Z, P^(i)_{Zi} or FAIL(D, C0).\n\n1 if x = \u2205, return \u2211_{V\\Y} P.\n2 if V \\ An(Y)_D \u2260 \u2205, return TRmz(y, x \u2229 An(Y)_D, \u2211_{V\\An(Y)_D} P, I, S, W, D_{An(Y)}).\n3 set W = (V \\ X) \\ An(Y)_{D_X\u0304}.\n    if W \u2260 \u2205, return TRmz(y, x \u222a w, P, I, S, W, D).\n4 if C(D \\ X) = {C0, C1, ..., Ck}, return \u2211_{V\\{Y,X}} \u220f_i TRmz(ci, v \\ ci, P, I, S, W, D).\n5 if C(D \\ X) = {C0},\n6     if C(D) \u2260 {D},\n7         if C0 \u2208 C(D), return \u220f_{i|Vi\u2208C0} \u2211_{V\\V_D^(i)} P / \u2211_{V\\V_D^(i\u22121)} P.\n8         if (\u2203C\u2032) C0 \u2282 C\u2032 \u2208 C(D),\n              for {i|Vi \u2208 C\u2032}, set \u03bai = \u03bai \u222a v_D^(i\u22121) \\ C\u2032.\n              return TRmz(y, x \u2229 C\u2032, \u220f_{i|Vi\u2208C\u2032} P(Vi|V_D^(i\u22121) \u2229 C\u2032, \u03bai), I, S, W, C\u2032).\n9     else,\n10        if I = \u2205, for i = 0, ..., |D|,\n              if ((Si \u22a5\u22a5 Y | X)_{D^(i)_X\u0304} \u2227 (Zi \u2229 X \u2260 \u2205)), Ei = TRmz(y, x \\ zi, P, Zi \u2229 X, i, W, D \\ {Zi \u2229 X}).\n11        if |E| > 0, return \u2211_{i=1}^{|E|} w^(j)_i Ei.\n12        else, FAIL(D, C0).\n\nFigure 3: Modified version of identification algorithm capable of recognizing mz-transportability.\n\nDef. 5 also asks for the explicit existence of a path from the nodes in the root set R \\ (R \u2229 De(X)_F)\nto (R \u2229 De(X)_F); a simple example can help to illustrate this requirement. Consider Fig. 2(c)\nand the goal of computing Q = P\u2217(y|do(x)) without extra experimental information. There\nexists a hedge for Q induced over {X, Z, Y} without the node W (note that {W} is a c-component\nitself) and the induced graph G_{X,Z,Y} indeed leads to a counter-example for the computability of\nP\u2217(z, y|do(x)). Using this subgraph alone, however, it would not be possible to construct a\ncounter-example for the marginal effect P\u2217(y|do(x)). Despite the fact that P\u2217(z, y|do(x)) is not computable\nfrom P\u2217(x, z, y), the quantity P\u2217(y|do(x)) is identifiable in G_{X,Z,Y}, and so any structural model\ncompatible with this subgraph will generate the same value under the marginalization over Z from\nP\u2217(z, y|do(x)). Also, it might happen that the root set R must be augmented (Fig. 
2(d)), so we\nprefer to add this requirement explicitly to the definition. (There are more involved scenarios that\nwe prefer to omit for the sake of presentation.) 
After adding the directed path from Z to Y that\npasses through W, we can construct the following counter-example for Q:\n\nM1:\n  X = U1\n  Z = U1 \u2295 U2\n  W = ((Z \u2295 U3) \u2228 B) \u2295 (B \u2227 (1 \u2295 Z))\n  Y = ((X \u2295 W \u2295 U2) \u2227 A) \u2295 (A \u2228 (1 \u2295 X \u2295 W \u2295 U2)),\n\nM2:\n  X = U1\n  Z = U2\n  W = ((Z \u2295 U3) \u2228 B) \u2295 (B \u2227 (1 \u2295 Z))\n  Y = ((W \u2295 U2) \u2227 A) \u2295 (A \u2228 (1 \u2295 W \u2295 U2)),\n\nwith P(Ui) = 1/2, \u2200i, P(A) = P(B) = 1/2. It is not immediate to show that the two models\nproduce the desired property. Refer to Appendix 2 for a formal proof of this statement.\nGiven that the definition of mz\u2217-shedge is justified and well-understood, we can now state the\nconnection between hedges and mz\u2217-shedges more directly (the proof can be found in Appendix 3):\nTheorem 2. If there is a hedge for P\u2217_x(y) in G and no experimental data is available (i.e., I\u2217_z = {}),\nthere exists an mz\u2217-shedge for P\u2217_x(y) in G.\n\nWhenever one domain is considered and no experimental data is available, this result states that an\nmz\u2217-shedge can always be constructed from a hedge, which implies that we can operate with\nmz\u2217-shedges from now on (the converse holds for Z = {}). Finally, we can concentrate on the most\ngeneral case of mz\u2217-shedges with experimental data in multiple domains as stated in the sequel:\nTheorem 3. Let D = {D(1), ..., D(n)} be a collection of selection diagrams relative to source\ndomains \u03a0 = {\u03c01, ..., \u03c0n}, and target domain \u03c0\u2217, respectively, and {I^i_z}, for i = {\u2217, 1, ..., n}\ndefined appropriately. 
If there is an mz\u2217-shedge for the effect R = P\u2217_x(y) relative to experiments\n(I\u2217_z, I^1_z, ..., I^n_z) in D, R is not mz-transportable from \u03a0 to \u03c0\u2217 in D.\n\nThis is a powerful result that states that the existence of an mz\u2217-shedge precludes mz-transportability.\n(The proof of this statement is somewhat involved; see the supplementary material for more details.)\nFor concreteness, let us consider the selection diagrams D = (D(a), D(b)) relative to domains \u03c0a\nand \u03c0b in Fig. 2(a,b). Our goal is to mz-transport Q = P\u2217(y|do(x)) with experiments over {X} in\n\u03c0a and {Z} in \u03c0b. It is the case that there exists an mz\u2217-shedge relative to the given experiments.\nTo witness, first note that F\u2032 = {Y, Z} and F = F\u2032 \u222a {X}, and also that there exists a selection\nvariable S pointing to F\u2032 in both domains \u2013 the first condition of Def. 5 is satisfied. This is a trivial\ngraph with 3 variables that can be solved by inspection, but it is somewhat involved to efficiently\nevaluate the conditions of the definition in more intricate structures, which motivates the search for\na procedure for recognizing mz\u2217-shedges that can be coupled with the previous theorem.\n\n4 Complete Algorithm for mz-transportability\nThere exists an extensive literature concerned with the problem of computability of causal relations\nfrom a combination of assumptions and data [21, 22, 7, 13]. In this section, we build on the works\nthat treat this problem by graphical means, and we concentrate particularly on the algorithm called\nTRmz constructed in [13] (see Fig. 3) that followed some of the results in [21, 22, 7].\nThe algorithm TRmz takes as input a collection of selection diagrams with the corresponding\nexperimental data from the corresponding domains, and it returns a transport formula whenever it is\nable to produce one. 
The main idea of the algorithm is to leverage the c-component factorization\n[20] and recursively decompose the target relation into manageable pieces (line 4), so as to try to\nsolve each of them separately. Whenever this standard evaluation fails in the target domain \u03c0\u2217 (line\n6), TRmz tries to use the experimental information available from the target and source domains\n(line 10). (For a concrete view of how TRmz works, see the running example in [13, p. 7].)\nIn a systematic fashion, the algorithm basically implements the declarative condition delineated in\nTheorem 1. TRmz was shown to be sound [13, Thm. 3], but there is no theoretical guarantee on\nwhether failure in finding a transport formula implies its non-existence and, perhaps, the complete\nlack of transportability. This guarantee is precisely what we state in the sequel.\nTheorem 4. Assume TRmz fails to transport the effect P\u2217_x(y) (exits with failure executing line 12).\nThen there exist X\u2032 \u2286 X, Y\u2032 \u2286 Y, such that the graph pair D, C0 returned by the fail condition\nof TRmz contains as edge subgraphs C-forests F, F\u2032 that span an mz\u2217-shedge for P\u2217_x\u2032(y\u2032).\n\nProof. Let D be the subgraph local to the call in which TRmz failed, and R be the root set of D. It\nis possible to remove some directed arrows from D while preserving R as root, which results in an\nR-rooted C-forest F. Since by construction F\u2032 = F \u2229 C0 is closed under descendants and only directed\narrows were removed, both F, F\u2032 are C-forests. Also, by construction, R \u2282 An(Y)_{G_X\u0304}, and\nX and Y from the recursive call are clearly subsets of the original input. 
Before failure,\nTRmz evaluated false consecutively at lines 6, 10, and 11, and it is not difficult to see that an S-node\npoints to F\u2032 or the respective experiments were not able to break the local hedge (lines 10 and 11).\nIt remains to be shown that this mz-shedge can be stretched to generate an mz\u2217-shedge, but now the\nsame construction given in Thm. 2 can be applied (see also supplementary material).\n\nFinally, we are ready to state the completeness of the algorithm and the graphical condition.\nTheorem 5 (completeness). TRmz is complete.\nCorollary 1 (mz\u2217-shedge characterization). P\u2217_x(y) is mz-transportable from \u03a0 to \u03c0\u2217 in D if and\nonly if there is no mz\u2217-shedge for P\u2217_x\u2032(y\u2032) in D for any X\u2032 \u2286 X and Y\u2032 \u2286 Y.\nFurthermore, we show below that the do-calculus is complete for establishing mz-transportability,\nwhich means that failure in the exhaustive application of its rules implies the non-existence of a\nmapping from the available data to the target relation (i.e., there is no mz-transport formula),\nindependently of the method used to obtain such a mapping.\nCorollary 2 (do-calculus characterization). The rules of do-calculus together with standard\nprobability manipulations are complete for establishing mz-transportability of causal effects.\n\n5 Conclusions\nIn this paper, we provided a complete characterization in the form of a graphical condition for\ndeciding mz-transportability. We further showed that the procedure introduced in [13] for computing\nthe transport formula is complete, which means that the set of transportable instances identified by\nthe algorithm cannot be broadened without strengthening the assumptions. Finally, we showed that\nthe do-calculus is complete for this class of problems, which means that finding a proof strategy in\nthis language suffices to solve the problem. 
The non-parametric characterization established in this\npaper gives rise to a new set of research questions. While our analysis aimed at achieving unbiased\ntransport under asymptotic conditions, additional considerations need to be taken into account when\ndealing with finite samples. Specifically, when sample sizes vary significantly across studies,\nstatistical power considerations need to be invoked along with bias considerations. Furthermore, when\nno transport formula exists, approximation techniques must be resorted to, for example, replacing\nthe requirement of non-parametric analysis with assumptions about linearity or monotonicity of\ncertain relationships in the domains. The non-parametric characterization provided in this paper should\nserve as a guideline for such approximation schemes.\n\nReferences\n[1] D. Campbell and J. Stanley. Experimental and Quasi-Experimental Designs for Research. Wadsworth Publishing, Chicago, 1963.\n[2] C. Manski. Identification for Prediction and Decision. Harvard University Press, Cambridge, Massachusetts, 2007.\n[3] L. V. Hedges and I. Olkin. Statistical Methods for Meta-Analysis. Academic Press, January 1985.\n[4] W.R. Shadish, T.D. Cook, and D.T. Campbell. Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Houghton-Mifflin, Boston, second edition, 2002.\n[5] S. Morgan and C. Winship. Counterfactuals and Causal Inference: Methods and Principles for Social Research (Analytical Methods for Social Research). Cambridge University Press, New York, NY, 2007.\n[6] J. Pearl and E. Bareinboim. Transportability of causal and statistical relations: A formal approach. In W. Burgard and D. Roth, editors, Proceedings of the Twenty-Fifth National Conference on Artificial Intelligence, pages 247\u2013254. AAAI Press, Menlo Park, CA, 2011.\n[7] E. Bareinboim and J. Pearl. Transportability of causal effects: Completeness results. In J. Hoffmann and B. Selman, editors, Proceedings of the Twenty-Sixth National Conference on Artificial Intelligence, pages 698\u2013704. AAAI Press, Menlo Park, CA, 2012.\n[8] E. Bareinboim and J. Pearl. A general algorithm for deciding transportability of experimental results. Journal of Causal Inference, 1(1):107\u2013134, 2013.\n[9] E. Bareinboim and J. Pearl. Causal transportability with limited experiments. In M. desJardins and M. Littman, editors, Proceedings of the Twenty-Seventh National Conference on Artificial Intelligence, pages 95\u2013101, Menlo Park, CA, 2013. AAAI Press.\n[10] S. Lee and V. Honavar. Causal transportability of experiments on controllable subsets of variables: z-transportability. In A. Nicholson and P. Smyth, editors, Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI), pages 361\u2013370. AUAI Press, 2013.\n[11] E. Bareinboim and J. Pearl. Meta-transportability of causal effects: A formal approach. In C. Carvalho and P. Ravikumar, editors, Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics (AISTATS), pages 135\u2013143. JMLR W&CP 31, 2013.\n[12] S. Lee and V. Honavar. m-transportability: Transportability of a causal effect from multiple environments. In M. desJardins and M. Littman, editors, Proceedings of the Twenty-Seventh National Conference on Artificial Intelligence, pages 583\u2013590, Menlo Park, CA, 2013. AAAI Press.\n[13] E. Bareinboim, S. Lee, V. Honavar, and J. Pearl. Transportability from multiple environments with limited experiments. In C.J.C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems 26, pages 136\u2013144. Curran Associates, Inc., 2013.\n[14] H. Daume III and D. Marcu. Domain adaptation for statistical classifiers. Journal of Artificial Intelligence Research, 26:101\u2013126, 2006.\n[15] A.J. Storkey. When training and test sets are different: characterising learning transfer. In J. Candela, M. Sugiyama, A. Schwaighofer, and N.D. Lawrence, editors, Dataset Shift in Machine Learning, pages 3\u201328. MIT Press, Cambridge, MA, 2009.\n[16] B. Sch\u00f6lkopf, D. Janzing, J. Peters, E. Sgouritsa, K. Zhang, and J. Mooij. On causal and anticausal learning. In J Langford and J Pineau, editors, Proceedings of the 29th International Conference on Machine Learning (ICML), pages 1255\u20131262, New York, NY, USA, 2012. Omnipress.\n[17] K. Zhang, B. Sch\u00f6lkopf, K. Muandet, and Z. Wang. Domain adaptation under target and conditional shift. In Proceedings of the 30th International Conference on Machine Learning (ICML). JMLR: W&CP volume 28, 2013.\n[18] J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, New York, 2000. 2nd edition, 2009.\n[19] P. Spirtes, C.N. Glymour, and R. Scheines. Causation, Prediction, and Search. MIT Press, Cambridge, MA, 2nd edition, 2000.\n[20] J. Tian. Studies in Causal Reasoning and Learning. PhD thesis, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, November 2002.\n[21] J. Tian and J. Pearl. A general identification condition for causal effects. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 567\u2013573. AAAI Press/The MIT Press, Menlo Park, CA, 2002.\n[22] I. Shpitser and J. Pearl. Identification of joint interventional distributions in recursive semi-Markovian causal models. In Proceedings of the Twenty-First National Conference on Artificial Intelligence, pages 1219\u20131226. AAAI Press, Menlo Park, CA, 2006.\n", "award": [], "sourceid": 207, "authors": [{"given_name": "Elias", "family_name": "Bareinboim", "institution": "UCLA"}, {"given_name": "Judea", "family_name": "Pearl", "institution": "UCLA"}]}