{"title": "Specific and Shared Causal Relation Modeling and Mechanism-Based Clustering", "book": "Advances in Neural Information Processing Systems", "page_first": 13510, "page_last": 13521, "abstract": "State-of-the-art approaches to causal discovery usually assume a fixed underlying causal model. However, it is often the case that causal models vary across domains or subjects, due to possibly omitted factors that affect the quantitative causal effects. As a typical example, causal connectivity in the brain network has been reported to vary across individuals, with significant differences across groups of people, such as autistics and typical controls. In this paper, we develop a unified framework for causal discovery and mechanism-based group identification. In particular, we propose a specific and shared causal model (SSCM), which takes into account the variabilities of causal relations across individuals/groups and leverages their commonalities to achieve statistically reliable estimation. The learned SSCM gives the specific causal knowledge for each individual as well as the general trend over the population. In addition, the estimated model directly provides the group information of each individual. Experimental results on synthetic and real-world data demonstrate the efficacy of the proposed method.", "full_text": "Speci\ufb01c and Shared Causal Relation Modeling and\n\nMechanism-Based Clustering\n\nBiwei Huang 1 \u2217, Kun Zhang1, Pengtao Xie2, Mingming Gong3, Eric Xing2,4, Clark Glymour1\n\n1Department of Philosophy, Carnegie Mellon University, Pittsburgh, PA, USA.\n\n3School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia.\n4Department of Machine Learning, Carnegie Mellon University, Pittsburgh, PA, USA.\n\n2Petuum Inc., USA.\n\nAbstract\n\nState-of-the-art approaches to causal discovery usually assume a \ufb01xed underly-\ning causal model. 
However, it is often the case that causal models vary across domains or subjects, due to possibly omitted factors that affect the quantitative causal effects. As a typical example, causal connectivity in the brain network has been reported to vary across individuals, with significant differences across groups of people, such as autistics and typical controls. In this paper, we develop a unified framework for causal discovery and mechanism-based group identification. In particular, we propose a specific and shared causal model (SSCM), which takes into account the variabilities of causal relations across individuals/groups and leverages their commonalities to achieve statistically reliable estimation. The learned SSCM gives the specific causal knowledge for each individual as well as the general trend over the population. In addition, the estimated model directly provides the group information of each individual. Experimental results on synthetic and real-world data demonstrate the efficacy of the proposed method.

1 Introduction

Learning causal relations from observational data automatically, known as causal discovery, has shown its increasing importance and efficacy. State-of-the-art approaches to causal discovery usually assume a fixed causal model [33, 2, 13, 31, 14, 39, 16]; that is, causal mechanisms are invariant across instances in the data set. Under this assumption, causal relations can be identified by leveraging the conditional independence between observed variables [33] or the asymmetrical independence between the estimated noise term and hypothetical causes, implied by suitable functional causal models [31, 37, 14, 39].

In real-world scenarios, it is often the case that causal relations over the considered set of variables may vary across individuals or individual groups, and meanwhile they also share many commonalities.
For example, in healthcare, individuals may show different responses to the same treatment. The varying responses may be due to some (unmeasured) factors, such as nutrition and health status. At the same time, although the effect may differ across individuals, a large proportion may still show a similar trend, while others may show very distinct effects. This suggests that, to understand causal effects, it is helpful to properly divide these subjects into different groups: within groups, the variation of the treatment effect should be small, while it may be large across groups. When examining whether a treatment is effective and should be adopted as standard practice, one should not only care about its effect in the general population, but also account for the response to the treatment of each individual or each properly divided group. The brain network is another example. There is ample evidence that heterogeneity in brain processes exists across individuals [34].

*Correspondence to: Biwei Huang, email: biweih@andrew.cmu.edu.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

More specifically, there exist significant differences in brain information flows across different groups of people. For example, it has been shown that cases of autism are associated with atypical brain connectivities [5, 17], and that the differences provide a good criterion for autism diagnosis and help to localize neuropathology biomarkers.

To find the differences across individuals, a typical solution is to analyze data from each subject separately and then make a comparison. However, this approach may suffer from low statistical reliability, because the size of samples from one subject may be small while the data dimension may be high. For example, in healthcare data, each patient may have only a few records due to resource and time constraints, while many clinical variables are measured.
Fortunately, although individuals may not have the same causal model, they usually share many commonalities, which can be leveraged to achieve more reliable estimation results. This reasoning is motivated by multi-task learning [1], where multiple learning tasks are solved jointly in a principled way, thus exploiting their commonalities while at the same time preserving useful information for each individual task. On the other hand, if we ignore the differences and concatenate the data to estimate a causal graph, spurious edges or incorrect causal directions may be introduced [38].

Recently, some approaches have been developed for causal discovery in the case where causal relations over the observed variables change across domains. For example, causal discovery from heterogeneous/nonstationary data (CD-NOD) [38, 19] concatenates data from different domains and considers the domain index as a surrogate to characterize the variability of causal mechanisms, and finally recovers fixed as well as varying causal relations. In the linear case, invariant causal relations are found based on invariant predictions [25], and some other methods can directly estimate varying causal relations [18, 20, 11, 36, 15]. Despite their success on the considered problem, these approaches do not explicitly provide the group-level information regarding which individuals are similar to each other and can be grouped together. In addition, they do not allow opposite causal directions in different domains. The Group Iterative Multiple Model Estimation (GIMME) approach [22, 9] tries to recover causal structure at both group and individual levels in a heuristic way. It first learns a group model by selecting adjacencies that improve the majority of individuals' maps in an iterative forward-selection procedure, and then selects individual-level adjacencies that optimally improve that model.
As a heuristic method, GIMME does not have theoretical guarantees of the identifiability of the learned causal structure.

Motivated by the above real-world scenarios in healthcare and neuroscience, we propose a Specific and Shared Causal Model (SSCM) to achieve the following goals: (1) Discover a general trend of causal relations over the population. (2) Identify specific causal relations for each individual or each automatically determined group. (3) Exploit variations and commonalities of causal relations to cluster individuals into different groups. In particular, for each individual, the causal model is formalized with a linear non-Gaussian model. The learned specific and shared causal model gives the information of the specific causal knowledge for each individual, as well as the general trend over the population. Each individual can be grouped by directly using the learned causal model. Moreover, the proposed causal model is theoretically identifiable under mild conditions.

2 Specific and Shared Causal Models

Suppose there are n individuals, which can be divided into q groups; we do not know which group each individual belongs to. All individuals have the same m observed variables under investigation, but their causal relations may be different. For the s-th individual (s = 1, ..., n), we observe l_s data points for the m variables; the l_s data points can be either independent and identically distributed (i.i.d.) or from a stationary time series. Consider the brain connectivity problem. We have n subjects, which are expected to be from q groups; for the s-th subject we record l_s fMRI data points over m variables. We aim to learn a shared causal model over the m variables, which is shared across the population, and also a specific causal model for each individual.
Moreover, we cluster these n individuals into q groups by leveraging the learned causal model.

Suppose the m observed variables from the s-th individual, X^s(t) = (x^s_1(t), ..., x^s_m(t))^T, satisfy the following generating process:

x^s_i(t) = Σ_{j ∈ P^s_i} b^s_{ij} x^s_j(t)  [instantaneous]  +  Σ_{j ∈ L^s_i} Σ_{p=1}^{pl} a^s_{ij,p} x^s_j(t − p)  [time-lagged]  +  e^s_i(t),    (1)

for i = 1, ..., m, where b^s_{ij} represents the instantaneous causal influence from variable j to variable i in the s-th subject, a^s_{ij,p} the p-lagged causal influence, P^s_i the set of indices of instantaneous direct causes of x^s_i, and L^s_i the set of indices of lagged direct causes of x^s_i. Each individual has fixed causal coefficients b^s_{ij} and a^s_{ij,p}, while they may change across individuals. The noise term e^s_i(t) is non-Gaussian, representing some unmeasured factors. It is independent of x^s_j(t) and x^s_j(t − p), for all j, p ∈ N+. Note that for i.i.d. samples, we only consider instantaneous causal relations, while for stationary time series, we allow both instantaneous and time-lagged causal relations. Eq. (1) can be represented in the matrix form:

X^s(t) = B^s X^s(t) + Σ_{p=1}^{pl} A^s_p X^s(t − p) + E^s(t),    (2)

where B^s is the m×m instantaneous causal adjacency matrix with entries b^s_{ij}, A^s_p the m×m lagged causal adjacency matrix with entries a^s_{ij,p}, and E^s(t) = (e^s_1(t), ..., e^s_m(t))^T.

We allow both the instantaneous causal relations b^s_{ij} and the lagged relations a^s_{ij,p} to change across groups or individuals. More specifically, they vary across different groups, while there are also slight differences across individuals within the group.
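As an illustration of the generating process (1)-(2), the following sketch simulates i.i.d. (instantaneous-only) data for one individual from a hypothetical two-group model; the coefficient values, the uniform noise law, and the within-group perturbation scale are illustrative choices, not the settings used in the experiments of Section 6.

```python
import numpy as np

def simulate_individual(B, l, rng):
    """Draw l i.i.d. samples of X(t) = B X(t) + E(t), i.e. X = (I - B)^{-1} E,
    with non-Gaussian (here uniform) noise; B must be acyclic so that I - B
    is invertible."""
    m = B.shape[0]
    E = rng.uniform(-1.0, 1.0, size=(m, l))        # non-Gaussian noise
    return np.linalg.solve(np.eye(m) - B, E)       # solves (I - B) X = E

rng = np.random.default_rng(0)
# Hypothetical group-level instantaneous coefficients (m = 3, chain 1 -> 2 -> 3).
B_group = {0: np.array([[0, 0, 0], [0.9, 0, 0], [0, 0.9, 0]]),   # "typical" group
           1: np.array([[0, 0, 0], [0.1, 0, 0], [0, 0.9, 0]])}   # edge 1 -> 2 inhibited

# For subject s: pick a group z_s, slightly perturb the nonzero coefficients
# (the within-group Gaussian variation of the MoG prior), then sample l_s points.
z_s = 1
B_s = B_group[z_s] + 0.05 * rng.standard_normal((3, 3)) * (B_group[z_s] != 0)
X_s = simulate_individual(B_s, l=40, rng=rng)
print(X_s.shape)  # prints (3, 40)
```

The reconstructed noise (I − B_s) X_s recovers the uniform draws exactly, which is a convenient sanity check for such simulators.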
Intuitively, one may estimate corresponding causal relationships for each individual separately. However, the limited sample size from each individual limits statistical efficiency or even makes causal discovery impossible, especially when the data dimension is high and the causal graph is dense. Although individuals may not have the same causal model, they usually share many commonalities, which can be leveraged to achieve more reliable estimation results.

To exploit both variations and commonalities across groups, as well as across individuals, and meanwhile perform mechanism-based clustering in a principled way, we propose specific and shared causal relation modeling. Specifically, we take the instantaneous causal influence as a random variable b_ij, where b^s_ij can be seen as an instance of b_ij. To encode the variation across groups, as well as that within each group, we assume that in each group b_ij follows a Gaussian distribution, while in different groups the Gaussian distributions are different. Therefore, we impose a mixture of Gaussians (MoG) prior on b_ij. Let a q-dimensional binary vector Z denote the index of the Gaussian component in the MoG for b_ij (i.e., which group the individual belongs to), where a particular element z_k = 1 and all other elements are zero. Then the distribution of b_ij can be represented as

P(b_ij) = Σ_{k=1}^q P(z_k = 1) P(b_ij | μ_{k,ij}, σ_{k,ij}),    (3)

where P(b_ij | μ_{k,ij}, σ_{k,ij}) = N(b_ij | μ_{k,ij}, σ²_{k,ij}), N(·) denotes a Gaussian distribution, P(z_k = 1) = π_k, and Σ_{k=1}^q π_k = 1. Similarly, for lagged causal influences, we have

P(a_ij,p) = Σ_{k=1}^q P(z_k = 1) P(a_ij,p | ν_{k,ij,p}, ω_{k,ij,p}),    (4)

where P(a_ij,p | ν_{k,ij,p}, ω_{k,ij,p}) = N(a_ij,p | ν_{k,ij,p}, ω²_{k,ij,p}). Different causal coefficients (a_ij,p and b_ij, for all i, j, p) in the same group share the same P(Z).

Figure 1: Graphical representation of a two-variable case: x1 and x2 are two observed variables, b12 is the instantaneous causal strength from x1 to x2, e1 and e2 denote the noise terms w.r.t. x1 and x2, respectively, Z is the group indicator, and Z^E is the indicator of the MoG of e1 and e2.

We also allow the noise distribution to vary across groups while remaining the same within the group. More specifically, we model the non-Gaussian noise in each group with an MoG, and in different groups the noise may have different MoG distributions. Denote by Z^E the indicator of the MoG of E, with Z^E = (z^E_1, ..., z^E_{q'}); thus, in group k with z_k = 1, the distribution of E is:

P(E | z_k = 1) = Σ_{k'=1}^{q'} P(z^E_{k'} = 1 | z_k = 1) P(E | z^E_{k'} = 1, z_k = 1),    (5)

where P(E | z^E_{k'} = 1, z_k = 1) = N(E | μ^E_{k,k'}, Σ^E_{k,k'}), P(z^E_{k'} = 1 | z_k = 1) = π^E_{k,k'}, and Σ_{k'=1}^{q'} π^E_{k,k'} = 1. Thus, P(E) = Σ_{k=1}^q P(E | z_k = 1) P(z_k = 1).

Figure 1 shows the graphical representation of the entire model in a two-variable case, with only instantaneous causal relations. Because in our model we consider b_ij as a random variable, there is a causal edge from b12 to x2.
Therefore, the specific and shared causal model is represented as

X(t) = B X(t) + Σ_{p=1}^{pl} A_p X(t − p) + E(t),    (6)

with

P(b_ij) = Σ_{k=1}^q π_k N(b_ij | μ_{k,ij}, σ²_{k,ij}),
P(a_ij,p) = Σ_{k=1}^q π_k N(a_ij,p | ν_{k,ij,p}, ω²_{k,ij,p}),    (7)
P(E) = Σ_{k=1}^q π_k Σ_{k'=1}^{q'} π^E_{k,k'} N(E | μ^E_{k,k'}, Σ^E_{k,k'}).

In the next section, we will discuss its identifiability; the identifiability applies to both the causal structure and the model parameters.

3 Model Identifiability

Theorem 1 shows the identifiability in a specific case, where there are different groups, and causal relations are different across groups but identical within each group.

Theorem 1. The proposed causal model in (6) and (7), including the causal structure and model parameters, is identifiable, as n → ∞, under the following conditions:

1. The parameters σ_{k,ij} = 0 and ω_{k,ij,p} = 0, for all i, j, k, p ∈ N+.
2. The sample size of each individual l_s > 2q − 1, where q is the number of groups.
3. The instantaneous causal structure for each individual is acyclic.

Note that although the instantaneous causal structure for each individual is acyclic, we allow that across different groups, some causal directions are reversed. For instance, in the brain network, different directions may be activated across subjects or states, which is hard to handle with traditional methods. In addition, if there are cycles in the instantaneous causal structure, identifiability requires two more conditions [23]: (1) the cycles are disjoint, and (2) the causal model is stable, i.e., lim_{k→∞} B^k = 0. As an unsupervised method, the order of the group index is not identifiable; i.e., it can be arbitrarily permuted.
We are aware that the above result assumes that there is no variation within groups. For the general case, a proof of the identifiability results does not seem immediate, but our empirical results suggest that the causal model is also identifiable. In the following, we give a proof sketch of Theorem 1; for detailed proofs, please refer to the supplementary material.

Proof Sketch. Condition 1 means that b_ij and a_ij,p take a degenerate Gaussian distribution in each group; their distributions can be represented as follows:

P(b_ij) = Σ_{k=1}^q π_k δ_{μ_{k,ij}}(b_ij),    P(a_ij,p) = Σ_{k=1}^q π_k δ_{ν_{k,ij,p}}(a_ij,p),

where δ_{μ_{k,ij}}(b_ij) = 1 if b_ij = μ_{k,ij}, and 0 otherwise; δ_{ν_{k,ij,p}}(a_ij,p) = 1 if a_ij,p = ν_{k,ij,p}, and 0 otherwise. With condition 1, the identifiability of the proposed causal model can be seen from the view of finite mixture models with grouped samples. "Grouped samples" means that for each individual there are several samples, and it is known in advance that they are identically distributed samples from the same component. Note that the identifiability of finite mixture models with grouped samples is easier to achieve (see [35]), compared to the case where the observations are drawn i.i.d. from a mixture model. Imagine an extreme case: if there are enough samples for each individual, then the corresponding components can be identified from each individual directly. We first show that under conditions 1 and 2, where l_s > 2q − 1 [35], the cumulative distribution function of each mixture component, as well as the mixture proportion, is identifiable; that is, P(X | z_k = 1) is identifiable for k = 1, ..., q, and P(Z) is identifiable.
Next, thanks to the identifiability of independent component analysis-based models [7, 31], we can show that in each group, the instantaneous causal relations b_ij and the lagged causal relations a_ij,p are identifiable [21].

4 Model Identification

The specific and shared causal model defined above can be regarded as a latent variable model, with U = {{b_ij}_{i,j=1}^m, {a_ij,p}} as the latent variables that we are interested in, and θ = {{π_k}, {μ_{k,ij}}, {σ_{k,ij}}, {ν_{k,ij,p}}, {ω_{k,ij,p}}, {π^E_{k,k'}}, {μ^E_{k,k'}}, {Σ^E_{k,k'}}} as the free parameters that need to be estimated. In particular, we exploit a stochastic approximation expectation maximization (SAEM) algorithm [4], combined with Gibbs sampling in the E step and an EM algorithm in the M step, for model estimation.

4.1 Parameter Estimation with SAEM

For a traditional EM algorithm, the procedure is initialized at some θ_0 ∈ Θ and then iterates between two steps, expectation (E) and maximization (M):

(E) Compute P_{θ_{r−1}}(U | X) and the lower bound of the log-likelihood, Q(θ, θ_{r−1}), with

Q(θ, θ_{r−1}) = ∫ P_{θ_{r−1}}(U | X) log P_θ(X, U) dU.

(M) Compute θ_r = argmax_{θ ∈ Θ} Q(θ, θ_{r−1}).

In the E step, we need to compute the expectation under the posterior P_{θ_{r−1}}(U | X), which is intractable in our case, since P(X, U) is not Gaussian.
To address this issue, SAEM computes the E step by Monte Carlo integration and uses a stochastic approximation update of the quantity Q at the r-th iteration:

Q̃_r(θ) = (1 − λ_r) Q̃_{r−1}(θ) + λ_r (1/M) Σ_{j=1}^M log P_θ(X_{1:n}, ˚U^{(1:n,r,j)}),    (8)

where ˚U indicates sampled particles of U, M is the number of generated particles, X_{1:n} = {X_s}_{s=1}^n with X_s = (X^s(1), ..., X^s(l_s)), ˚U^{(1:n,r,j)} = {˚U^{(s,r,j)}}_{s=1}^n, and {λ_r}_{r≥1} is a decreasing sequence of positive step sizes, with Σ_r λ_r = ∞ and Σ_r λ_r² < ∞. More specifically, given the learned parameters in the current iteration, the values of the latent variables are first sampled under the posterior density. Then these sampled data are used to update the value of the conditional expectation of the complete log-likelihood with stochastic approximation. The E-step is thus replaced by the following:

(E') At each iteration, generate M particles ˚U^{(1:n,r,j)} from P_{θ_{r−1}}(U | X) and compute Q̃_r(θ) according to (8). A method for sampling from P_{θ_{r−1}}(U | X) is introduced in the following.

Gibbs Sampling in E-step  Since the dimension of the latent variables U may be high, especially when m is large, we use Gibbs sampling to sample particles ˚U from the posterior distribution, and within Gibbs sampling, we use independent doubly adaptive rejection Metropolis sampling (IA2RMS) [24].

The idea in Gibbs sampling is to generate posterior samples by sweeping through each variable to sample from its conditional distribution with the remaining variables fixed to their current values. At each iteration, perform

b_ij ~ P(b_ij | X_{1:n}, U\b_ij),    a_ij,p ~ P(a_ij,p | X_{1:n}, U\a_ij,p),    (9)

for all i, j, p ∈ N+, where U\b_ij and U\a_ij,p denote all variables in U except b_ij and a_ij,p, respectively. In each sampling, we use IA2RMS.
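The Gibbs sweep of Eq. (9) can be sketched as follows. This is a simplified stand-in, not the sampler used here: a plain random-walk Metropolis step replaces IA2RMS, and a generic log-posterior callback replaces the model-specific conditionals.

```python
import numpy as np

def gibbs_sweep(coeffs, log_post, rng, step=0.1):
    """One Gibbs sweep over the latent coefficients U (cf. Eq. (9)):
    each coefficient is updated in turn with the rest held fixed. Each
    conditional draw is a single Metropolis step (a stand-in for IA2RMS);
    log_post(coeffs) evaluates the unnormalized log posterior."""
    for idx in np.ndindex(coeffs.shape):
        current = coeffs[idx]
        lp_cur = log_post(coeffs)
        coeffs[idx] = current + step * rng.standard_normal()
        lp_prop = log_post(coeffs)
        if np.log(rng.uniform()) >= lp_prop - lp_cur:   # reject: restore value
            coeffs[idx] = current
    return coeffs

# Toy target: independent Gaussian posteriors centered at 1 for each b_ij.
rng = np.random.default_rng(0)
log_post = lambda b: -0.5 * np.sum((b - 1.0) ** 2) / 0.2 ** 2
b = np.zeros((2, 2))
for _ in range(500):
    b = gibbs_sweep(b, log_post, rng)
print(b.mean())  # close to 1.0 at stationarity
```

In the actual sampler, log_post would be the conditional in (9), evaluated from Eq. (10) below with all other coefficients fixed.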
It differs from adaptive rejection Metropolis sampling by an additional adaptive step that improves the proposal probability density function.

EM Algorithm in M-step  In the M step, we compute θ_r = argmax_{θ ∈ Θ} Q(θ, θ_{r−1}). This is achieved by an inner EM algorithm; see the supplementary materials for detailed derivations.

The computational complexity of SAEM in each iteration is O(m² n M T_0), where m is the number of variables, n the number of subjects, M the number of sampled particles (we used M = 30), and T_0 the number of iterations needed in the Gibbs sampling for each variable, which depends on the number of rejections and the number of supporting points that need to be calculated in the adaptive rejection sampling.

4.2 Specific and Shared Causal Relation Determination

After estimating the parameters, we can derive the posterior distribution of {A_p}_{p=1}^{pl} and B, with

P({A_p}, B | X_{1:n}) ∝ P(X_{1:n} | {A_p}, B) Π_{i,j,p} P(a_ij,p) P(b_ij),    (10)

where

P(X_{1:n} | {A_p}, B) = |det(I − B)|^{Σ_{s=1}^n l_s} · P_E((I − B) ˇX_0 − Σ_p A_p ˇX_p),
P(b_ij) = Σ_{k=1}^q π_k N(b_ij | μ_{k,ij}, σ²_{k,ij}),    P(a_ij,p) = Σ_{k=1}^q π_k N(a_ij,p | ν_{k,ij,p}, ω²_{k,ij,p}),
P_E((I − B) ˇX_0 − Σ_p A_p ˇX_p) = Σ_{k=1}^q π_k Σ_{k'=1}^{q'} π^E_{k,k'} N((I − B) ˇX_0 − Σ_p A_p ˇX_p | μ^E_{k,k'}, Σ^E_{k,k'}),

with ˇX_0 = (X^1_{p+1:l_1}, ..., X^n_{p+1:l_n}) and ˇX_p = (X^1_{1:l_1−p}, ..., X^n_{1:l_n−p}).

Then the estimated specific causal relationships are implied by the posterior distribution of A_p and B given the data from the s-th individual.
More specifically, one may take the maximum a posteriori (MAP) estimate as a point estimator of A_p and B:

{{Â^s_p}, B̂^s} = argmax_{{A_p}, B} P({A_p}, B | X_s).    (11)

The estimated shared causal relationships are implied by the posterior distribution of A_p and B given the data from all individuals, and its point estimator is

{{Â_p}, B̂} = argmax_{{A_p}, B} P({A_p}, B | X_{1:n}).    (12)

Recall that the linear non-Gaussian acyclic model (LiNGAM [31]) first estimates W = (I − B)^{−1} and then recovers the underlying adjacency matrix B by performing extra permutation and rescaling, since W is only identified up to permutation and scale. In our model, we directly model the causal process B, with the following advantages: (1) It is easy to add prior knowledge of causal connections; in practice, experts may have domain knowledge about some causal edges. (2) One can directly enforce sparsity constraints on causal adjacencies. (3) The estimation procedure directly outputs the causal adjacency matrix, without the additional steps of permutation and rescaling, which are usually expensive.

5 Mechanism-based Clustering with Specific and Shared Causal Model

After estimating the specific and shared causal model, we can immediately cluster the individuals into q groups, by estimating P(z_k = 1 | X_s), for k = 1, ..., q, where

P(z_k = 1 | X_s) ∝ P(X_s | z_k = 1) P(z_k = 1),    (13)

and

P(X_s | z_k = 1) = ∫∫ P(X_s | {A_p}, B, z_k = 1) P({A_p}, B | z_k = 1) d{A_p} dB.    (14)

The above integration does not have a closed form, and thus we use Monte Carlo integration. We sample M values of {A_p} and B from P({A_p}, B | z_k = 1), and thus

P(X_s | z_k = 1) ≈ (1/M) Σ_{i=1}^M |det(I − B^(i))|^{l_s} · Σ_{k'=1}^{q'} π^E_{k,k'} N((I − B^(i)) X^s_{p+1:l_s} − Σ_p A^(i)_p X^s_{1:l_s−p} | μ^E_{k,k'}, Σ^E_{k,k'}),

where A^(i)_p and B^(i) denote the i-th sampled values from P({A_p}, B | z_k = 1).
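As an illustration, the Monte Carlo approximation above can be sketched for the instantaneous-only case with a single Gaussian noise component per group; the B samples, noise parameters, and mixing weights below are placeholders rather than draws from an actual learned posterior.

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_lik_given_B(X, B, noise_mean, noise_cov):
    """log P(X | B) for the instantaneous model X = B X + E:
    sum_t log N(E_t | noise_mean, noise_cov) + l * log|det(I - B)|,
    where E = (I - B) X and l is the number of samples."""
    m, l = X.shape
    E = (np.eye(m) - B) @ X
    ll = multivariate_normal.logpdf(E.T, mean=noise_mean, cov=noise_cov).sum()
    return ll + l * np.log(abs(np.linalg.det(np.eye(m) - B)))

def group_posterior(X, B_samples_per_group, pi, noise_means, noise_covs):
    """Monte Carlo estimate of P(z_k = 1 | X) (cf. Eqs. (13)-(14)): average the
    likelihood over sampled B's for each group and weight by pi_k."""
    log_terms = []
    for k, B_samples in enumerate(B_samples_per_group):
        lls = np.array([log_lik_given_B(X, B, noise_means[k], noise_covs[k])
                        for B in B_samples])
        # log of the Monte Carlo average, computed stably (log-sum-exp)
        log_mean_lik = np.logaddexp.reduce(lls) - np.log(len(lls))
        log_terms.append(np.log(pi[k]) + log_mean_lik)
    log_terms = np.asarray(log_terms)
    w = np.exp(log_terms - log_terms.max())
    return w / w.sum()

# Toy demo: two groups differing in the sign of the edge x1 -> x2.
rng = np.random.default_rng(0)
m, l, M = 2, 50, 20
B0 = np.array([[0.0, 0.0], [0.9, 0.0]])
B1 = np.array([[0.0, 0.0], [-0.9, 0.0]])
X = np.linalg.solve(np.eye(m) - B0, rng.laplace(size=(m, l)))  # data from group 0
B_samples = [[Bk + 0.02 * rng.standard_normal((m, m)) for _ in range(M)]
             for Bk in (B0, B1)]
post = group_posterior(X, B_samples, pi=[0.5, 0.5],
                       noise_means=[np.zeros(m)] * 2,
                       noise_covs=[2.0 * np.eye(m)] * 2)
print(post.argmax())  # prints 0
```

The log-sum-exp step matters in practice: raw likelihoods of l_s samples underflow quickly as the number of variables or samples grows.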
Therefore,

P(z_k = 1 | X_s) ∝ (π_k / M) Σ_{i=1}^M |det(I − B^(i))|^{l_s} · Σ_{k'=1}^{q'} π^E_{k,k'} N((I − B^(i)) X^s_{p+1:l_s} − Σ_p A^(i)_p X^s_{1:l_s−p} | μ^E_{k,k'}, Σ^E_{k,k'}).

Denote by c_s the group that individual s belongs to. The estimated group index for individual s is finally given by:

ĉ_s = argmax_k P(z_k = 1 | X_s).    (15)

6 Experimental Results

To show the efficacy of the proposed approach to specific and shared causal relation discovery and its performance in mechanism-based clustering, we apply it to both synthetic and real-world data.

Synthetic Data  We randomly generated acyclic causal structures according to the Erdos-Renyi model [6] with parameter 0.3. We denote by G the graph structure. Each generated graph has 5 variables. To show the generality of the proposed method, we varied the number of groups with q = 2, 3, the number of samples for each individual with l_s = 20, 40, 60, and the number of individuals with n = 60, 80, 100. Motivated by the real-world scenario that brain connectivities may be enhanced or inhibited in individuals with mental disorders, such as autism and schizophrenia, compared to typical controls, the parameters were set in the following way:

• In the 2-group case (q = 2), when the group index k = 1 (e.g., the typical control group), we set μ_{k,ij} ~ U(0.8, 1) for all i, j where G_ij = 1; when k = 2 (e.g., the autism group), we randomly sampled pairs i', j' where G_{i'j'} = 1 and set μ_{k,i'j'} ~ U(0, 0.2) to model the situation that some causal edges are inhibited, and for the remaining i, j where G_ij = 1, μ_{k,ij} ~ U(0.8, 1).

• In the 3-group case (q = 3), when k = 1 or k = 2, μ_{k,ij} was the same as above; when k = 3 (e.g., the schizophrenia group), we randomly sampled pairs i', j' where G_{i'j'} = 1 and set μ_{3,i'j'} ~ U(1.8, 2) to model the situation that some causal edges are enhanced.

Other parameters were set as follows: σ²_{k,ij} ~ U(0.01, 0.1), ω²_{k,ij,p} ~ U(0.01, 0.1), each entry of μ^E_{k,k'} ~ U(−0.6, −0.4) ∪ U(0.4, 0.6), each entry of Σ^E_{k,k'} ~ U(0.2, 0.5), π_k ~ U(0.3, 0.6) with Σ_{k=1}^q π_k = 1, and π^E_{k,k'} ~ U(0.3, 0.6) with Σ_{k'=1}^{q'} π^E_{k,k'} = 1, where U(l, u) denotes a uniform distribution between l and u. For each setting (a particular group number q, a particular sample size l_s for each individual, and a particular number of individuals n), we generated 30 realizations.

For causal discovery, we identified specific causal relations with the proposed approach. We compared it with other well-known approaches to causal discovery, including LiNGAM [31], the minimal change method (MC) [11], the identical boundaries method (IB) [11], and GIMME [22, 9]. In particular, we applied LiNGAM to each individual separately, because it assumes a fixed causal model.
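As a side note on the synthetic-data setup, one standard way to generate Erdos-Renyi-style acyclic structures (edge probability 0.3, as above) is to sample edges and orient each one along a random node ordering; the sketch below illustrates this construction and is not necessarily the exact generator used here.

```python
import numpy as np

def random_dag(m, p, rng):
    """Random acyclic structure: for each node pair, include an edge with
    probability p and orient it from earlier to later in a random ordering,
    which guarantees acyclicity of the resulting graph."""
    order = rng.permutation(m)
    G = np.zeros((m, m), dtype=int)
    for a in range(m):
        for b in range(a + 1, m):
            if rng.uniform() < p:
                G[order[b], order[a]] = 1   # cause order[a] -> effect order[b]
    return G

rng = np.random.default_rng(0)
G = random_dag(5, 0.3, rng)
# Acyclicity check: the adjacency matrix of a DAG is nilpotent, so G^5 = 0.
print(np.all(np.linalg.matrix_power(G, 5) == 0))  # prints True
```

Group-specific coefficient means μ_{k,ij} can then be drawn on the nonzero entries of G as described above.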
Both MC and IB leverage the minimal change principle to identify the causal structure. GIMME is a heuristic method, which is designed to learn both specific and averaged causal relations. Since the state-of-the-art baselines, such as LiNGAM, MC, and IB, only consider instantaneous causal relations, we report the identification results of instantaneous causal relations. For time-lagged causal relations, the causal direction is fixed (from past to future), and thus the task reduces to a parameter identification problem.

In our method, we initialized the parameters in the following way: we first estimated the correlation matrix for each individual and clustered the estimated correlation matrices with K-means clustering, and then we used the estimated centroids of each group as the initial values of μ_{k,ij}. The other parameters were initialized randomly. In our experiments, the number of groups was given. If there is a large number of candidate groups, one may use an information criterion, such as the Minimal Message Length [8], to determine it. We denote by Ĝ^s the estimated causal graph for the s-th individual. It was determined as follows: Ĝ^s_ij = 1 if |b̂^s_ij| > 0.1, and Ĝ^s_ij = 0 otherwise. Alternatively, one may use a Wald test to examine the significance of edges, as in [31]. The simulation was conducted on a 2.9 GHz Intel Core i5, and each trial took about 5 minutes.

In Figure 2 (Upper), we report the F1 score to measure the accuracy of the learned causal graphs. Specifically, sub-figure (a) shows the F1 score (y-axis) for the number of groups q = 2, the sample size of each individual l_s = 20, and the number of individuals n = 60, 80, 100 (x-axis); (b) for q = 2, n = 100, and l_s = 20, 40, 60 (x-axis); (c) for q = 3, l_s = 20, and n = 60, 80, 100 (x-axis); and (d) for q = 3, n = 100, and l_s = 20, 40, 60 (x-axis).
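The edge-thresholding rule and the F1 evaluation on recovered graphs are straightforward to compute; a minimal sketch (the 0.1 threshold is the one used above, the example matrices are illustrative):

```python
import numpy as np

def estimated_graph(B_hat, thresh=0.1):
    """Binarize estimated coefficients: G_ij = 1 iff |b_ij| > thresh."""
    return (np.abs(B_hat) > thresh).astype(int)

def f1_score_graph(G_true, G_est):
    """F1 of recovered edges against the ground-truth adjacency matrix."""
    tp = np.sum((G_true == 1) & (G_est == 1))
    fp = np.sum((G_true == 0) & (G_est == 1))
    fn = np.sum((G_true == 1) & (G_est == 0))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

G_true = np.array([[0, 0], [1, 0]])
B_hat = np.array([[0.0, 0.05], [0.9, 0.0]])  # the 0.05 entry falls below threshold
print(f1_score_graph(G_true, estimated_graph(B_hat)))  # prints 1.0
```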
We can see that our proposed method SSCM has the best performance (the highest F1) in all cases, and the accuracy slightly increases with the number of individuals and with the sample size per individual. MC, IB, and LiNGAM show similar performance and are less accurate than SSCM. MC and IB perform less well because they only take into account the first two orders of the noise distributions. The performance of LiNGAM may be affected by the small sample size. GIMME does not perform well, possibly because it uses a greedy, heuristic strategy without theoretical guarantees.

Figure 2: (Upper) F1 score of the recovered causal structure; (Lower) L2 distance of the estimated causal strength.

Besides the causal structure, which only takes into account the existence of edges, we also compared the accuracy of the estimated causal strength quantitatively. It is important to compare the estimated causal strength, because in different groups the causal strength may be enhanced or inhibited while the causal structure remains the same. In particular, we computed the L2 distance between the true causal strength and the estimate for each individual, i.e., ||B^s − B̂^s||_2. Figure 2 (Lower) reports the estimated L2 distance with the proposed method in different settings, compared to that with LiNGAM, IB, MC, and GIMME. Our SSCM gives the most accurate estimate of the causal strength (the smallest L2 distance). MC and IB have the second-best accuracy, while LiNGAM and GIMME perform less well. LiNGAM fails to estimate the quantitative causal strength accurately, although its estimated qualitative causal graph has an accuracy similar to MC and IB.

Figure 3: Adjusted Rand Index.

Next, we performed mechanism-based clustering by directly leveraging the estimated specific and shared causal model.
Figure 3 gives the clustering performance in different settings, measured by the Adjusted Rand Index (ARI [27]), which measures the similarity between the estimated groups and the ground truth (the higher, the more accurate). We compared our method with GIMME, LiNGAM-K-Means, MC-K-Means, and IB-K-Means. In particular, GIMME directly estimates the group index for each individual; LiNGAM-K-Means, MC-K-Means, and IB-K-Means use K-means to cluster the causal relations estimated by LiNGAM, MC, and IB, respectively. We also performed a paired, one-sided Wilcoxon signed-rank test [12] on the estimated ARI between our method and each of the remaining ones, across different settings. Our method significantly outperforms LiNGAM and GIMME, with p-values less than 0.002, and achieves performance that is comparable to MC and IB.

fMRI Hippocampus We applied our method to the fMRI hippocampus data [26], which contain signals from six separate brain regions: perirhinal cortex (PRC), parahippocampal cortex (PHC), entorhinal cortex (EC), subiculum (Sub), CA1, and CA3/Dentate Gyrus (CA3), recorded in the resting state from the same person on 84 successive days. We used anatomical connections [3, 38] as a reference. Biological evidence has shown that in the resting state, the effective pathways in the hippocampus may change, depending on unmeasured intrinsic states [10]. We assume that the causal relations are fixed within the same day but may change across days.
With the proposed method, we found that the causal relations between these six regions can be divided into two groups: in one group, the edge Sub → EC is inhibited; in the other group, EC → CA3 and CA1 → Sub are inhibited. This result is consistent with the finding that EC → CA3 and CA1 → Sub are usually involved in the consolidation of long-term memory [28], while Sub → EC is usually implicated in working memory [29]. The edge CA3 → CA1 is robust, existing in both groups, which coincides with current findings in neuroscience [32].

Table 1: Clustering performance on flow cytometry data

Methods  SSCM  LiNGAM  IB    MC    GIMME  Plain K-Means
ARI      0.92  0.21    0.78  0.25  0.10   0.87

Cellular Signaling Networks We applied the proposed method to multivariate flow cytometry data, measured from 11 phosphorylated proteins and phospholipids [30]. A series of stimulatory cues and inhibitory interventions were performed, leading to different conditions. With different interventions across conditions, the causal relations over the 11 variables may change. The data from each condition mimic a group, and within each condition we segmented the data into subsets of 30 samples, each subset mimicking an individual. Table 1 reports the clustering performance, measured by ARI, on the data from the condition phorbol myristate acetate and the condition anti-CD3 + anti-CD28 + LY294002. Besides the comparisons used in the simulations, we also compared the clustering performance with plain K-means, i.e., directly applying K-means to the original data. Our method achieves the best performance, with ARI 0.92. Compared to the former condition, the causal strengths of the following edges in the latter condition are inhibited: PIP2 → PIP3, Erk → Pka, Jnk → Pkc; and the following edges are enhanced: Raf → Mek, Mek → Raf, Akt → Pka, Pkc → P38.
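The ARI used in Table 1 (and throughout the clustering comparisons) and the paired one-sided Wilcoxon signed-rank test from the simulations are standard routines. The sketch below uses invented group labels and ARI scores, not the paper's results.

```python
# Toy illustration of the clustering evaluation: Adjusted Rand Index against
# ground-truth groups, plus a paired one-sided Wilcoxon signed-rank test
# comparing two methods' ARI scores across settings. All numbers are made up.
from scipy.stats import wilcoxon
from sklearn.metrics import adjusted_rand_score

true_groups = [0, 0, 0, 1, 1, 1]
est_groups = [1, 1, 1, 0, 0, 0]  # ARI is invariant to label permutation
ari = adjusted_rand_score(true_groups, est_groups)
print(ari)  # 1.0

# Hypothetical per-setting ARI scores for two methods.
ari_ours = [0.95, 0.90, 0.92, 0.88, 0.93, 0.91]
ari_base = [0.41, 0.35, 0.50, 0.30, 0.45, 0.38]
stat, p = wilcoxon(ari_ours, ari_base, alternative="greater")
print(p < 0.05)  # significant improvement on this toy data
```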
For the estimated cellular signaling networks under each condition, please see the supplementary material.

7 Conclusions and Future Work

In this paper, we proposed a unified framework for causal relation discovery and mechanism-based clustering. In particular, we developed a specific and shared causal model, which takes into account the variabilities of causal relations across individuals/groups and leverages commonalities to achieve statistically reliable estimation. Experimental results on synthetic and real-world data show that the learned SSCM gives the specific causal knowledge for each individual as well as the general trend over the population, and that the estimated model directly provides the group information of each individual. Our current implementation relies on maximum likelihood estimation with SAEM, which does not generally scale well: currently we can handle 10 variables with 200 subjects within 1 hour. To improve scalability, one line of future work is to use likelihood-free frameworks for parameter estimation with, e.g., adversarial learning. Moreover, we will extend our method to cover nonlinear causal relationships, partially observable processes, and data with selection bias [40].

Acknowledgements

We thank Petar Stojanov for helping to revise the paper. We would like to acknowledge the support by the National Institutes of Health under Contract No. NIH-1R01EB022858-01, FAIN-R01EB022858, NIH-1R01LM012087, NIH-5U54HG008540-02, and FAIN-U54HG008540, by the United States Air Force under Contract No. FA8650-17-C-7715, and by National Science Foundation EAGER Grant No. IIS-1829681. The National Institutes of Health, the U.S. Air Force, and the National Science Foundation are not responsible for the views reported in this article. KZ also benefited from funding from the Living Analytics Research Center and Singapore Management University.

References

[1] R. Caruana.
Multi-task learning. Machine Learning, 28:41–75, 1997.

[2] D. M. Chickering. Optimal structure identification with greedy search. Journal of Machine Learning Research, 3:507–554, 2003.

[3] C. M. Bird and N. Burgess. The hippocampus and memory: insights from spatial processing. Nature Reviews Neuroscience, 9(3):182–194, 2008.

[4] B. Delyon, M. Lavielle, and E. Moulines. Convergence of a stochastic approximation version of the EM algorithm. The Annals of Statistics, 27(1):94–128, 1999.

[5] S. J. H. Ebisch, V. Gallese, R. M. Willems, D. Mantini, W. B. Groen, G. L. Romani, et al. Altered intrinsic functional connectivity of anterior and posterior insula regions in high-functioning participants with autism spectrum disorder. Human Brain Mapping, 32(7):1013–1028, 2011.

[6] P. Erdős and A. Rényi. On random graphs I. Publicationes Mathematicae, 6:290–297, 1959.

[7] J. Eriksson and V. Koivunen. Identifiability, separability, and uniqueness of linear ICA models. IEEE Signal Processing Letters, 11(7):601–604, 2004.

[8] M. A. T. Figueiredo and A. K. Jain. Unsupervised learning of finite mixture models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(3):381–396, 2002.

[9] K. M. Gates, P. C. Molenaar, F. G. Hillary, N. Ram, and M. J. Rovine. Automatic search for fMRI connectivity mapping: An alternative to Granger causality testing using formal equivalences among SEM path modeling, VAR, and unified SEM. NeuroImage, 50:1118–1125, 2010.

[10] M. S. Gazzaniga, R. B. Ivry, and G. R. Mangun. Cognitive Neuroscience: The Biology of the Mind. W. W. Norton & Company, New York, 2014.

[11] A. E. Ghassami, N. Kiyavash, B. Huang, and K. Zhang. Multi-domain causal structure learning in linear systems. In Advances in Neural Information Processing Systems (NeurIPS), 2018.

[12] J. D. Gibbons and S. Chakraborti.
Nonparametric Statistical Inference. Chapman & Hall/CRC Press, 2011.

[13] D. Heckerman, D. Geiger, and D. M. Chickering. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20:197–243, 1995.

[14] P. O. Hoyer, D. Janzing, J. Mooij, J. Peters, and B. Schölkopf. Nonlinear causal discovery with additive noise models. In Advances in Neural Information Processing Systems 21, Vancouver, B.C., Canada, 2009.

[15] B. Huang, K. Zhang, M. Gong, and C. Glymour. Causal discovery and forecasting in nonstationary environments with state-space models. In International Conference on Machine Learning (ICML), 2019.

[16] B. Huang, K. Zhang, Y. Lin, B. Schölkopf, and C. Glymour. Generalized score functions for causal discovery. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pages 1551–1560, 2018.

[17] B. Huang, K. Zhang, R. Sanchez-Romero, J. Ramsey, M. Glymour, and C. Glymour. Diagnosis of autism spectrum disorder by causal influence strength learned from resting-state fMRI data. arXiv preprint arXiv:1902.10073, 2019.

[18] B. Huang, K. Zhang, and B. Schölkopf. Identification of time-dependent causal model: A Gaussian process treatment. In the 24th International Joint Conference on Artificial Intelligence, pages 3561–3568, 2015.

[19] B. Huang, K. Zhang, J. Zhang, J. Ramsey, R. Sanchez-Romero, C. Glymour, and B. Schölkopf. Causal discovery from heterogeneous/nonstationary data. arXiv preprint arXiv:1902.10073, 2019.

[20] B. Huang, K. Zhang, J. Zhang, R. Sanchez-Romero, C. Glymour, and B. Schölkopf. Behind distribution shift: Mining driving forces of changes and causal arrows. In IEEE International Conference on Data Mining (ICDM), pages 913–918, 2017.

[21] A. Hyvärinen, K. Zhang, S. Shimizu, and P. Hoyer. Estimation of a structural vector autoregression model using non-Gaussianity.
Journal of Machine Learning Research, 11:1709–1731, 2010.

[22] J. Kim, W. Zhu, L. Chang, P. Bentler, and T. Ernst. Unified structural equation modeling approach for the analysis of multisubject, multivariate functional MRI data. Human Brain Mapping, 28:85–93, 2007.

[23] G. Lacerda, P. Spirtes, J. Ramsey, and P. O. Hoyer. Discovering cyclic causal models by independent components analysis. In Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence (UAI), Helsinki, Finland, 2008.

[24] L. Martino, J. Read, and D. Luengo. Independent doubly adaptive rejection Metropolis sampling within Gibbs sampling. IEEE Transactions on Signal Processing, 63(12):3123–3138, 2015.

[25] J. Peters, P. Bühlmann, and N. Meinshausen. Causal inference using invariant prediction: identification and confidence intervals. Journal of the Royal Statistical Society: Series B, 2016.

[26] Poldrack and Laumann. https://openfmri.org/dataset/ds000031/, 2015.

[27] W. M. Rand. Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66(336):846–850, 1971.

[28] M. Remondes and E. M. Schuman. Role for a cortical input to hippocampal area CA1 in the consolidation of a long-term memory. Nature, 431(7009):699–703, 2004.

[29] C. Riegert, R. Galani, S. Heilig, C. Lazarus, B. Cosquer, and J. C. Cassel. Electrolytic lesions of the ventral subiculum weakly alter spatial memory but potentiate amphetamine-induced locomotion. Behavioural Brain Research, 152(1):23–34, 2004.

[30] K. Sachs, O. Perez, D. Pe'er, D. A. Lauffenburger, and G. P. Nolan. Causal protein signaling networks derived from multiparameter single-cell data. Science, 308:523–529, 2005.

[31] S. Shimizu, P. O. Hoyer, A. Hyvärinen, and A. J. Kerminen. A linear non-Gaussian acyclic model for causal discovery.
Journal of Machine Learning Research, 7:2003–2030, 2006.

[32] D. Song, M. C. Hsiao, I. Opris, R. E. Hampson, V. Z. Marmarelis, G. A. Gerhardt, S. A. Deadwyler, and T. W. Berger. Hippocampal microcircuits, functional connectivity, and prostheses. Recent Advances on the Modular Organization of the Cortex, pages 385–405, 2015.

[33] P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction, and Search. Springer-Verlag Lecture Notes in Statistics, 1993.

[34] L. Q. Uddin, K. Supekar, and V. Menon. Typical and atypical development of functional human brain networks: insights from resting-state fMRI. Frontiers in Systems Neuroscience, 4(21), 2010.

[35] R. A. Vandermeulen and C. D. Scott. On the identifiability of mixture models from grouped samples. arXiv preprint arXiv:1502.06644, 2015.

[36] Y. Wang, C. Squires, A. Belyaeva, and C. Uhler. Direct estimation of differences in causal graphs. In Advances in Neural Information Processing Systems (NeurIPS), 2018.

[37] K. Zhang and L. Chan. Extensions of ICA for causality discovery in the Hong Kong stock market. In Proceedings of the 13th International Conference on Neural Information Processing (ICONIP), 2006.

[38] K. Zhang, B. Huang, J. Zhang, C. Glymour, and B. Schölkopf. Causal discovery from nonstationary/heterogeneous data: Skeleton estimation and orientation determination. In IJCAI, 2017.

[39] K. Zhang and A. Hyvärinen. On the identifiability of the post-nonlinear causal model. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI), Montreal, Canada, 2009.

[40] K. Zhang, J. Zhang, B. Huang, B. Schölkopf, and C. Glymour.
On the identifiability and estimation of functional causal models in the presence of outcome-dependent selection. In Proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence (UAI), 2016.