{"title": "The Noisy-Logical Distribution and its Application to Causal Inference", "book": "Advances in Neural Information Processing Systems", "page_first": 1673, "page_last": 1680, "abstract": "We describe a novel noisy-logical distribution for representing the distribution of a binary output variable conditioned on multiple binary input variables. The distribution is represented in terms of noisy-or's and noisy-and-not's of causal features which are conjunctions of the binary inputs. The standard noisy-or and noisy-and-not models, used in causal reasoning and artificial intelligence, are special cases of the noisy-logical distribution. We prove that the noisy-logical distribution is complete in the sense that it can represent all conditional distributions provided a sufficient number of causal factors are used. We illustrate the noisy-logical distribution by showing that it can account for new experimental findings on how humans perform causal reasoning in more complex contexts. Finally, we speculate on the use of the noisy-logical distribution for causal reasoning and artificial intelligence.", "full_text": "The Noisy-Logical Distribution and its Application to Causal Inference\n\nHongjing Lu\nDepartment of Psychology\nUniversity of California at Los Angeles\nLos Angeles, CA 90095\nhongjing@ucla.edu\n\nAlan Yuille\nDepartment of Statistics\nUniversity of California at Los Angeles\nLos Angeles, CA 90095\nyuille@stat.ucla.edu\n\nAbstract\n\nWe describe a novel noisy-logical distribution for representing the distribution of a binary output variable conditioned on multiple binary input variables. The distribution is represented in terms of noisy-or's and noisy-and-not's of causal features which are conjunctions of the binary inputs. The standard noisy-or and noisy-and-not models, used in causal reasoning and artificial intelligence, are special cases of the noisy-logical distribution. 
We prove that the noisy-logical distribution is complete in the sense that it can represent all conditional distributions provided a sufficient number of causal factors are used. We illustrate the noisy-logical distribution by showing that it can account for new experimental findings on how humans perform causal reasoning in complex contexts. We speculate on the use of the noisy-logical distribution for causal reasoning and artificial intelligence.\n\n1 Introduction\n\nThe noisy-or and noisy-and-not conditional probability distributions are frequently studied in cognitive science for modeling causal reasoning [1], [2], [3] and are also used as probabilistic models for artificial intelligence [4]. It has been shown, for example, that human judgments of the power of causal cues in experiments involving two cues [1] can be interpreted in terms of maximum likelihood estimation and model selection using these types of models [3].\n\nBut the noisy-or and noisy-and-not distributions are limited in the sense that they can only represent a restricted set of all possible conditional distributions. This restriction is sometimes an advantage because there may not be sufficient data to determine the full conditional distribution. Nevertheless, it would be better to have a representation that can expand to represent the full conditional distribution if sufficient data is available, but that can be reduced to simpler forms (e.g. the standard noisy-or) if there is only limited data.\n\nThis motivates us to define the noisy-logical distribution. It is defined in terms of noisy-or's and noisy-and-not's of causal features which are conjunctions of the basic input variables (inspired by the use of conjunctive features in [2] and the extensions in [5]). By restricting the choice of causal features we can obtain the standard noisy-or and noisy-and-not models. 
We prove that the noisy-logical distribution is complete in the sense that it can represent any conditional distribution provided we use all the causal features. Overall, it gives a distribution whose complexity can be adjusted by restricting the number of causal features.\n\nTo illustrate the noisy-logical distribution we apply it to modeling some recent human experiments on causal reasoning in complex environments [6]. We show that noisy-logical distributions involving causal factors are able to account for human performance. By contrast, an alternative linear model gives predictions which are the opposite of the observed trends in human causal judgments.\n\nSection (2) presents the noisy-logical distribution for the case with two input causes (the case commonly studied in causal reasoning). In section (3) we specify the full noisy-logical distribution and we prove its completeness in section (4). Section (5) illustrates the noisy-logical distribution by showing that it accounts for recent experimental findings in causal reasoning.\n\n2 The Case with N = 2 causes\n\nIn this section we study the simple case when the binary output effect E depends only on two binary-valued causes C1, C2. This covers most of the work reported in the cognitive science literature [1], [3]. In this case, the probability distribution is specified by the four numbers P(E = 1|C1, C2), for C1 ∈ {0, 1}, C2 ∈ {0, 1}.\n\nTo define the noisy-logical distribution over two variables P(E = 1|C1, C2), we introduce three concepts. Firstly, we define four binary-valued causal features Ψ0(.), Ψ1(.), Ψ2(.), Ψ3(.) which are functions of the input state C = (C1, C2). They are defined by Ψ0(C) = 1, Ψ1(C) = C1, Ψ2(C) = C2, Ψ3(C) = C1 ∧ C2, where ∧ denotes the logical-and operation (i.e. C1 ∧ C2 = 1 if C1 = C2 = 1 and C1 ∧ C2 = 0 otherwise). 
Ψ3(C) is the conjunction of C1 and C2. Secondly, we introduce binary-valued hidden states E0, E1, E2, E3 which are caused by the corresponding features Ψ0, Ψ1, Ψ2, Ψ3. We define P(Ei = 1|Ψi; ωi) = ωiΨi with ωi ∈ [0, 1], for i = 0, ..., 3, with ω = (ω0, ω1, ω2, ω3). Thirdly, we define the output effect E to be a logical combination of the states E0, E1, E2, E3, which we write in the form δ_{E,f(E0,E1,E2,E3)}, where f(., ., ., .) is a logic function formed by a combination of the three logic operations AND, OR, NOT. This induces the noisy-logical distribution\n\nPnl(E|C; ω) = ∑_{E0,...,E3} δ_{E,f(E0,E1,E2,E3)} ∏_{i=0}^{3} P(Ei|Ψi(C); ωi).\n\nThe noisy-logical distribution is characterized by the parameters ω0, ..., ω3 and the choice of the logic function f(., ., ., .). We can represent the distribution by a circuit diagram where the output E is a logical function of the hidden states E0, ..., E3 and each state is caused probabilistically by the corresponding causal feature Ψ0, ..., Ψ3, as shown in Figure (1).\n\nFigure 1: Circuit diagram in the case with N = 2 causes.\n\nThe noisy-logical distribution includes the commonly known distributions, noisy-or and noisy-and-not, as special cases. To obtain the noisy-or, we set E = E1 ∨ E2 (i.e. E1 ∨ E2 = 0 if E1 = E2 = 0 and E1 ∨ E2 = 1 otherwise). 
A simple calculation shows that the noisy-logical distribution reduces to the noisy-or Pnor(E|C1, C2; ω1, ω2) [4], [1]:\n\nPnl(E = 1|C1, C2; ω1, ω2) = ∑_{E1,E2} δ_{1,E1∨E2} P(E1|Ψ1(C); ω1) P(E2|Ψ2(C); ω2)\n= ω1C1(1 − ω2C2) + (1 − ω1C1)ω2C2 + ω1ω2C1C2\n= ω1C1 + ω2C2 − ω1ω2C1C2 = Pnor(E = 1|C1, C2; ω1, ω2). (1)\n\nTo obtain the noisy-and-not, we set E = E1 ∧ ¬E2 (i.e. E1 ∧ ¬E2 = 1 if E1 = 1, E2 = 0 and E1 ∧ ¬E2 = 0 otherwise). The noisy-logical distribution reduces to the noisy-and-not Pn-and-not(E|C1, C2; ω1, ω2) [4], [?]:\n\nPnl(E = 1|C1, C2; ω1, ω2) = ∑_{E1,E2} δ_{1,E1∧¬E2} P(E1|Ψ1(C); ω1) P(E2|Ψ2(C); ω2)\n= ω1C1(1 − ω2C2) = Pn-and-not(E = 1|C1, C2; ω1, ω2). (2)\n\nWe claim that noisy-logical distributions of this form can represent any conditional distribution P(E|C). The logical function f(E0, E1, E2, E3) will be expressed as a combination of the logic operations AND-NOT, OR. The parameters of the distribution are given by ω0, ω1, ω2, ω3.\n\nThe proof of this claim will be given for the general case in the next section. To get some insight, we consider the special case where we only know the values P(E|C1 = 1, C2 = 0) and P(E|C1 = 1, C2 = 1). This situation is studied in cognitive science where C1 is considered to be a background cause which always takes value 1, see [1], [3]. In this case, the only causal features considered are Ψ1(C) = C1 and Ψ2(C) = C2.\n\nResult. The noisy-or and the noisy-and-not models, given by equations (1,2), are sufficient to fit any values of P(E = 1|1, 0) and P(E = 1|1, 1). 
(In this section we use P(E = 1|1, 0) to denote P(E = 1|C1 = 1, C2 = 0) and P(E = 1|1, 1) to denote P(E = 1|C1 = 1, C2 = 1).) The noisy-or and noisy-and-not fit the cases when P(E = 1|1, 1) ≥ P(E = 1|1, 0) and P(E = 1|1, 1) ≤ P(E = 1|1, 0) respectively. (In Cheng's terminology [1], C2 is respectively a generative or preventative cause.)\n\nProof. We can fit both the noisy-or and noisy-and-not models to P(E|1, 0) by setting ω1 = P(E = 1|1, 0), so it remains to fit the models to P(E|1, 1). There are three cases to consider: (i) P(E = 1|1, 1) > P(E = 1|1, 0), (ii) P(E = 1|1, 1) < P(E = 1|1, 0), and (iii) P(E = 1|1, 1) = P(E = 1|1, 0). It follows directly from equations (1,2) that Pnor(E = 1|1, 1) ≥ Pnor(E = 1|1, 0) and Pn-and-not(E = 1|1, 1) ≤ Pn-and-not(E = 1|1, 0), with equality only if P(E = 1|1, 1) = P(E = 1|1, 0). Hence we must fit a noisy-or and a noisy-and-not model to cases (i) and (ii) respectively. For case (i), this requires solving P(E = 1|1, 1) = ω1 + ω2 − ω1ω2 to obtain ω2 = {P(E = 1|1, 1) − P(E = 1|1, 0)}/{1 − P(E = 1|1, 0)} (note that the condition P(E = 1|1, 1) > P(E = 1|1, 0) ensures that ω2 ∈ [0, 1]). For case (ii), we must solve P(E = 1|1, 1) = ω1 − ω1ω2, which gives ω2 = {P(E = 1|1, 0) − P(E = 1|1, 1)}/P(E = 1|1, 0) (the condition P(E = 1|1, 1) < P(E = 1|1, 0) ensures that ω2 ∈ [0, 1]). For case (iii), we can fit either model by setting ω2 = 0.\n\n3 The Noisy-Logical Distribution for N causes\n\nWe next consider representing probability distributions of the form P(E|C), where E ∈ {0, 1} and C = (C1, ..., CN) with Ci ∈ {0, 1}, ∀i = 1, ..., N. 
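The fitting procedure from the two-cause Result above translates directly into code. The following is our illustrative sketch (function names are ours, not part of the paper): it selects a noisy-or or noisy-and-not model from the two conditional probabilities and solves for ω2 exactly as in the proof.\n\n```python\n# Sketch (ours) of the two-cause fitting Result: given P(E=1|1,0) and\n# P(E=1|1,1), choose noisy-or or noisy-and-not and solve for w2.\n\ndef fit_two_cause(p10, p11):\n    w1 = p10                      # fits both models to P(E=1|1,0)\n    if p11 > p10:                 # case (i): C2 generative, noisy-or\n        return 'noisy-or', w1, (p11 - p10) / (1.0 - p10)\n    if p11 < p10:                 # case (ii): C2 preventative, noisy-and-not\n        return 'noisy-and-not', w1, (p10 - p11) / p10\n    return 'either', w1, 0.0      # case (iii): either model with w2 = 0\n\ndef predict(model, w1, w2, c1, c2):\n    if model == 'noisy-or':       # w1*C1 + w2*C2 - w1*w2*C1*C2, equation (1)\n        return w1 * c1 + w2 * c2 - w1 * w2 * c1 * c2\n    return w1 * c1 * (1.0 - w2 * c2)   # w1*C1*(1 - w2*C2), equation (2)\n\nmodel, w1, w2 = fit_two_cause(0.6, 0.9)\nprint(model, w2)   # fitted model reproduces P(E=1|1,1) = 0.9\n```\n\nRunning `fit_two_cause` on the three cases recovers the ω2 formulas given in the proof.\n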
These distributions can be characterized by the values of P(E = 1|C) for all possible 2^N values of C.\n\nWe define the set of 2^N binary-valued causal features {Ψi(C) : i = 0, ..., 2^N − 1}. These features are ordered so that Ψ0(C) = 1, Ψi(C) = Ci for i = 1, ..., N, ΨN+1(C) = C1 ∧ C2 is the conjunction of C1 and C2, and so on. The feature Ψ(C) = Ca ∧ Cb ∧ ... ∧ Cg will take value 1 if Ca = Cb = ... = Cg = 1 and value 0 otherwise.\n\nWe define binary variables {Ei : i = 0, ..., 2^N − 1} which are related to the causal features {Ψi : i = 0, ..., 2^N − 1} by the distributions P(Ei = 1|Ψi; ωi) = ωiΨi, specified by parameters {ωi : i = 0, ..., 2^N − 1}.\n\nThen we define the output variable E to be a logical (i.e. deterministic) function of the {Ei : i = 0, ..., 2^N − 1}. This can be thought of as a circuit diagram. In particular, we define E = f(E0, ..., E_{2^N−1}) = ((((E0 ⊗ E1) ⊗ E2) ⊗ E3) ⊗ ...), where Ei ⊗ Ej can be Ei ∨ Ej or Ei ∧ ¬Ej (where ¬E denotes logical negation). This gives the general noisy-logical distribution, as shown in Figure (2):\n\nP(E = 1|C; ω) = ∑_{E0,...,E_{2^N−1}} δ_{E,f(E0,...,E_{2^N−1})} ∏_{i=0}^{2^N−1} P(Ei = 1|Ψi; ωi). (3)\n\n4 The Completeness Result\n\nThis section proves that the noisy-logical distribution is capable of representing any conditional distribution. This is the main theoretical result of this paper.\n\n
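As a concrete illustration of equation (3), a brute-force evaluation (our minimal sketch, not code from the paper) enumerates all joint settings of the hidden states and sums the probability of those where the logic function outputs 1. For N = 2 with f = E1 ∨ E2 this recovers the noisy-or closed form of equation (1):\n\n```python\nfrom itertools import product\n\n# Brute-force evaluation of the noisy-logical distribution, equation (3):\n# sum over all joint hidden-state settings, weighting each setting by\n# prod_i P(Ei | Psi_i(C); w_i), and keep the terms where f(E) = 1.\n\ndef noisy_logical(c, w, features, f):\n    # c: input tuple; w: weights; features: functions C -> {0,1};\n    # f: deterministic logic function of the hidden states\n    total = 0.0\n    for hidden in product([0, 1], repeat=len(w)):\n        p = 1.0\n        for ei, wi, psi in zip(hidden, w, features):\n            q = wi * psi(c)              # P(Ei = 1 | Psi_i(C); w_i)\n            p *= q if ei == 1 else (1.0 - q)\n        if f(hidden) == 1:\n            total += p\n    return total\n\n# N = 2, features Psi_1 = C1 and Psi_2 = C2, with f = E1 OR E2 (noisy-or).\nfeats = [lambda c: c[0], lambda c: c[1]]\nw = [0.3, 0.6]\nf_or = lambda e: 1 if (e[0] or e[1]) else 0\nbrute = noisy_logical((1, 1), w, feats, f_or)\nclosed = 0.3 + 0.6 - 0.3 * 0.6           # w1*C1 + w2*C2 - w1*w2*C1*C2\nprint(abs(brute - closed) < 1e-9)        # True\n```\n\nSwapping `f_or` for a function returning `e[0] and not e[1]` recovers the noisy-and-not of equation (2) in the same way.\n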
All conditional distributions can be represented in this form if we use all possible 2^N causal features Ψ, choose the correct parameters ω, and select the correct logical combinations ⊗.\n\nResult. We can represent any conditional distribution P(E|C) defined on binary variables in terms of a noisy-logical distribution given by equation (3).\n\nProof. The proof is constructive. We show that any distribution P(E|C) can be expressed as a noisy-logical distribution.\n\nWe order the states C0, ..., C_{2^N−1}. This ordering must obey Ψi(Ci) = 1 and Ψi(Cj) = 0, ∀j < i. This ordering can be obtained by setting C0 = (0, ..., 0), then selecting the terms with a single conjunction (i.e. only one Ci is non-zero), then those with two conjunctions (i.e. two Ci's are non-zero), then with three conjunctions, and so on.\n\nThe strategy is to use induction to build a noisy-logical distribution which agrees with P(E|C) for all values of C. We loop over the states and incrementally construct the logical function f(E0, ..., E_{2^N−1}) and estimate the parameters ω0, ..., ω_{2^N−1}. It is convenient to recursively define a variable E^{i+1} = E^i ⊗ Ei+1, so that f(E0, ..., E_{2^N−1}) = E^{2^N−1}.\n\nWe start the induction using the feature Ψ0(C) = 1. Set E^0 = E0 and ω0 = P(E = 1|0, ..., 0). Then P(E^0 = 1|C0; ω0) = P(E = 1|C0), so the noisy-logical distribution fits the data for input C0.\n\nNow proceed by induction to determine E^{M+1} and ωM+1, assuming that we have determined E^M and ω0, ..., ωM such that P(E^M = 1|Ci; ω0, ..., ωM) = P(E = 1|Ci), for i = 0, ..., M. There are three cases to consider, which are analogous to the cases considered in the section with two causes.\n\nCase 1. 
If P(E = 1|CM+1) > P(E^M = 1|CM+1; ω0, ..., ωM) we need ΨM+1(C) to be a generative feature. Set E^{M+1} = E^M ∨ EM+1 with P(EM+1 = 1|ΨM+1; ωM+1) = ωM+1ΨM+1. Then we obtain:\n\nP(E^{M+1} = 1|CM+1; ω0, ..., ωM+1) = P(E^M = 1|CM+1; ω0, ..., ωM) + P(EM+1 = 1|ΨM+1(C); ωM+1) − P(E^M = 1|CM+1; ω0, ..., ωM) P(EM+1 = 1|ΨM+1(C); ωM+1)\n= P(E^M = 1|CM+1; ω0, ..., ωM) + ωM+1ΨM+1(C) − P(E^M = 1|CM+1; ω0, ..., ωM) ωM+1ΨM+1(C).\n\nIn particular, we see that P(E^{M+1} = 1|Ci; ω0, ..., ωM+1) = P(E^M = 1|Ci; ω0, ..., ωM) = P(E = 1|Ci) for i < M + 1 (using ΨM+1(Ci) = 0, ∀i < M + 1). To determine the value of ωM+1, we must solve P(E = 1|CM+1) = P(E^M = 1|CM+1; ω0, ..., ωM) + ωM+1 − P(E^M = 1|CM+1; ω0, ..., ωM) ωM+1 (using ΨM+1(CM+1) = 1). This gives ωM+1 = {P(E = 1|CM+1) − P(E^M = 1|CM+1; ω0, ..., ωM)}/{1 − P(E^M = 1|CM+1; ω0, ..., ωM)} (the conditions ensure that ωM+1 ∈ [0, 1]).\n\nCase 2. If P(E = 1|CM+1) < P(E^M = 1|CM+1; ω0, ..., ωM) we need ΨM+1(C) to be a preventative feature. 
Set E^{M+1} = E^M ∧ ¬EM+1 with P(EM+1 = 1|ΨM+1; ωM+1) = ωM+1ΨM+1. Then we obtain:\n\nP(E^{M+1} = 1|CM+1; ω0, ..., ωM+1) = P(E^M = 1|CM+1; ω0, ..., ωM){1 − ωM+1ΨM+1(C)}. (4)\n\nAs for the first case, P(E^{M+1} = 1|Ci; ω0, ..., ωM+1) = P(E^M = 1|Ci; ω0, ..., ωM) = P(E = 1|Ci) for i < M + 1 (because ΨM+1(Ci) = 0, ∀i < M + 1). To determine the value of ωM+1 we must solve P(E = 1|CM+1) = P(E^M = 1|CM+1; ω0, ..., ωM){1 − ωM+1} (using ΨM+1(CM+1) = 1). This gives ωM+1 = {P(E^M = 1|CM+1; ω0, ..., ωM) − P(E = 1|CM+1)}/P(E^M = 1|CM+1; ω0, ..., ωM) (the conditions ensure that ωM+1 ∈ [0, 1]).\n\nCase 3. If P(E = 1|CM+1) = P(E^M = 1|CM+1; ω0, ..., ωM), then we do nothing.\n\n5 Cognitive Science Human Experiments\n\nWe illustrate noisy-logical distributions by applying them to model two recent cognitive science experiments by Liljeholm and Cheng which involve causal reasoning in complex environments [6]. In these experiments, the participants are asked questions about the causal structure of the data. But the participants are not given enough data to determine the full distribution (i.e. not enough to determine the causal structure with certainty). Instead, the experimental design forces them to choose between two different causal structures.\n\nWe formulate this as a model selection problem [3]. Formally, we specify distributions P(D|ω, Graph) for generating the data D from a causal model specified by Graph and parameterized by ω. These distributions will be of simple noisy-logical form. 
We set the prior distributions P(ω|Graph) on the parameter values to be the uniform distribution. The evidence for the causal model is given by:\n\nP(D|Graph) = ∫ dω P(D|ω, Graph) P(ω|Graph). (5)\n\nWe then evaluate the log-likelihood ratio log P(D|Graph1)/P(D|Graph2) between two causal models Graph1 and Graph2, called the causal support [3], and use this to predict the performance of the participants. This gives good fits to the experimental results.\n\nAs an alternative theoretical model, we consider the possibility that the participants use the same causal structures, specified by Graph1 and Graph2, but use a linear model to combine cues. Formally, this corresponds to a model P(E = 1|C1, ..., CN) = ω1C1 + ... + ωNCN (with ωi ≥ 0, ∀i = 1, ..., N and ω1 + ... + ωN ≤ 1). This model corresponds [1, 3] to the classic Rescorla-Wagner learning model [8]. It cannot be expressed in simple noisy-logical form. Our simulations show that this model does not account for human participant performance.\n\nWe note that previous attempts to model experiments with multiple causes and conjunctions by Novick and Cheng [2] can be interpreted as performing maximum likelihood estimation of the parameters of noisy-logical distributions (their paper helped inspire our work). Those experiments, however, were simpler than those described here, and model selection was not used. The extensive literature on the two-cause case [1, 3] can also be interpreted in terms of noisy-logical models.\n\n5.1 Experiment I: Multiple Causes\n\nIn Experiment 1 of [6], the cover story involves a set of allergy patients who either did or did not have a headache, and either had or had not received allergy medicines A and B. The experimental participants were informed that two independent studies had been conducted in different labs using different patient groups. 
In the first study, patients were administered medicine A, whereas in the second study patients were administered both medicines A and B. A simultaneous presentation format [7] was used to display the specific contingency conditions used in both studies to the experimental subjects. The participants were then asked whether medicine B caused the headache.\n\nWe represent this experiment as follows using binary-valued variables E, B1, B2, C1, C2. The variable E indicates whether a headache has occurred (E = 1) or not (E = 0). B1 = 1 and B2 = 1 denote background causes for the two studies (which are always present). C1 and C2 indicate whether medicines A and B are present, respectively (e.g. C1 = 1 if A is present, C1 = 0 otherwise). The data D shown to the subjects can be expressed as D = (D1, D2), where D1 is the contingency table Pd(E = 1|B1 = 1, C1 = 0, C2 = 0), Pd(E = 1|B1 = 1, C1 = 1, C2 = 0) for the first study and D2 is the contingency table Pd(E = 1|B2 = 1, C1 = 0, C2 = 0), Pd(E = 1|B2 = 1, C1 = 1, C2 = 1) for the second study.\n\nThe experimental design forces the participants to choose between the two causal models shown on the left of figure (3). These causal models differ by whether C2 (i.e. medicine B) can have an effect or not. We set P(D|ω, Graph) = P(D1|ω1, Graph)P(D2|ω2, Graph), where Di = {(Eμ, Cμi)} (for i = 1, 2) is the contingency data. We express these distributions in the form P(Di|ωi, Graph) = ∏μ Pi(Eμ|Cμi, ωi, Graph). For Graph1, P1(.) and P2(.) are P(E|B1, C1, ωB1, ωC1) and P(E|B2, C1, ωB2, ωC1). For Graph2, P1(.) and P2(.) are P(E|B1, C1, ωB1, ωC1) and P(E|B2, C1, C2, ωB2, ωC1, ωC2). All these P(E|.) are noisy-or distributions.\n\nFor Experiment 1 there are two conditions [6], see table (1). 
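The evidence terms in this model-selection computation can be approximated numerically. The sketch below is ours, not code from the paper: it estimates the causal support by Monte Carlo integration over the uniform prior of equation (5), assuming a binomial likelihood for each 24-patient contingency condition and our own encoding of which noisy-or weights are active in each graph.\n\n```python\nimport random\nfrom math import comb, log\n\nrandom.seed(0)\n\n# Monte Carlo sketch (ours) of the evidence P(D|Graph), equation (5):\n# average the likelihood P(D|w, Graph) over samples from the uniform prior.\n# Each condition is a (successes, trials) pair; active[j] lists which\n# noisy-or weights are switched on in condition j.\n\ndef noisy_or(ws):\n    p = 1.0\n    for w in ws:\n        p *= 1.0 - w\n    return 1.0 - p\n\ndef evidence(data, n_weights, active, samples=20000):\n    total = 0.0\n    for _ in range(samples):\n        w = [random.random() for _ in range(n_weights)]\n        like = 1.0\n        for (k, n), idx in zip(data, active):\n            p = noisy_or([w[i] for i in idx])\n            like *= comb(n, k) * p**k * (1.0 - p)**(n - k)\n        total += like\n    return total / samples\n\n# Condition (1) contingencies: B1 alone, B1+A, then B2 alone, B2+A(+B).\ndata = [(16, 24), (22, 24), (0, 24), (18, 24)]\n# Weight indices (our encoding): 0 = B1, 1 = B2, 2 = medicine A, 3 = medicine B.\ngraph1 = [[0], [0, 2], [1], [1, 2]]       # B has no effect\ngraph2 = [[0], [0, 2], [1], [1, 2, 3]]    # B gets its own weight\nsupport = log(evidence(data, 3, graph1) / evidence(data, 4, graph2))\nprint(round(support, 2))   # positive support favors Graph1 (B has no effect)\n```\n\nThe estimate is noisy; more samples, or analytic integration of the polynomial likelihood, would sharpen it.\n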
In the first, power-constant condition [6], the data is consistent with the causal structure for Graph1 (i.e. C2 has no effect) using noisy-or distributions. In the second, ΔP-constant condition [6], the data is consistent with the causal structure for Graph1 but with the noisy-or replaced by the linear distribution (e.g. P(E = 1|C1, ..., Cn) = ω1C1 + ... + ωnCn).\n\nTable 1: Experimental conditions (1) and (2) for Experiment 1\n(1) Pd(E = 1|B1 = 1, C1 = 0, C2 = 0), Pd(E = 1|B1 = 1, C1 = 1, C2 = 0): 16/24, 22/24\nPd(E = 1|B2 = 1, C1 = 0, C2 = 0), Pd(E = 1|B2 = 1, C1 = 1, C2 = 1): 0/24, 18/24\n(2) Pd(E = 1|B1 = 1, C1 = 0, C2 = 0), Pd(E = 1|B1 = 1, C1 = 1, C2 = 0): 0/24, 6/24\nPd(E = 1|B2 = 1, C1 = 0, C2 = 0), Pd(E = 1|B2 = 1, C1 = 1, C2 = 1): 16/24, 22/24\n\n5.2 Experiment I: Results\n\nWe compare Liljeholm and Cheng's experimental results with our theoretical simulations. These comparisons are shown on the right-hand side of figure (3). The left panel shows the proportion of participants who decide that medicine B causes a headache for the two conditions. The right panel shows the predictions of our model (labeled “noisy-logical”) together with the predictions of a model that replaces the noisy-logical distributions by a linear model (labeled “linear”). The simulations show that the noisy-logical model correctly predicts that participants (on average) judge that medicine B has no effect in the first experimental condition, but that B does have an effect in the second condition. By contrast, the linear model makes the opposite (wrong) prediction. In summary, model selection comparing two noisy-logical models gives a good prediction of participant performance.\n\nFigure 3: Causal model and results for Experiment I. Left panel: two alternative causal models for the two studies. 
Right panel: the experimental results (proportion of participants who think medicine B causes headaches) for the power-constant and ΔP-constant conditions [6]. Far right: the causal support for the noisy-logical and linear models.\n\n5.3 Experiment II: Causal Interaction\n\nLiljeholm and Cheng [6] also investigated causal interactions. The experimental design was identical to that used in Experiment 1, except that participants were presented with three studies in which only one medicine (A) was tested. Participants were asked to judge whether medicine A interacts with background causes that vary across the three studies. We define the background causes as B1, B2, B3 for the three studies, and C1 for medicine A. This experiment was also run under two different conditions, see table (2). The first, power-constant condition [6] was consistent with a noisy-logical model, but the second, power-varying condition [6] was not.\n\nTable 2: Experimental conditions (1) and (2) for Experiment 2\n(1) P(E = 1|B1 = 1, C1 = 0), P(E = 1|B1 = 1, C1 = 1): 16/24, 22/24\nP(E = 1|B2 = 1, C1 = 0), P(E = 1|B2 = 1, C1 = 1): 8/24, 20/24\nP(E = 1|B3 = 1, C1 = 0), P(E = 1|B3 = 1, C1 = 1): 0/24, 18/24\n(2) P(E = 1|B1 = 1, C1 = 0), P(E = 1|B1 = 1, C1 = 1): 0/24, 6/24\nP(E = 1|B2 = 1, C1 = 0), P(E = 1|B2 = 1, C1 = 1): 0/24, 12/24\nP(E = 1|B3 = 1, C1 = 0), P(E = 1|B3 = 1, C1 = 1): 0/24, 18/24\n\nThe experimental design caused participants to choose between the two causal models shown on the left panel of figure (4). The probability of generating the data is given by P(D|ω, Graph) = P(D1|ω1, Graph)P(D2|ω2, Graph)P(D3|ω3, Graph). For Graph1, the P(Di|.) are noisy-or distributions P(E|B1, C1, ωB1, ωC1), P(E|B2, C1, ωB2, ωC1), P(E|B3, C1, ωB3, ωC1). For Graph2, the P(Di|.) 
are P(E|B1, C1, ωB1, ωC1), P(E|B2, C1, B2C1, ωB2, ωC1, ωB2C1) and P(E|B3, C1, B3C1, ωB3, ωC1, ωB3C1).\n\nAll the distributions are noisy-or on the unary causal features (e.g. B, C1), but the nature of the conjunctive cause B ∧ C1 is unknown (i.e. not specified by the experimental design). Hence our theory considers the possibilities that it is a noisy-or (e.g. it can produce headaches) or a noisy-and-not (e.g. it can prevent headaches), see graph 2 of Figure (4).\n\n5.4 Results of Experiment II\n\nFigure (4) shows human and model performance for the two experimental conditions. Our noisy-logical model is in agreement with human performance: there is no interaction between causes in the power-constant condition, but there is interaction in the power-varying condition. By contrast, the linear model predicts interaction in both conditions and hence fails to model human performance.\n\nFigure 4: Causal model and results for Experiment II. Left panel: two alternative causal models (one involving conjunctions) for the three studies. Right panel: the proportion of participants who think that there is an interaction (conjunction) between medicine A and the background for the power-constant and power-varying conditions [6]. Far right: the causal support for the noisy-logical and linear models.\n\n6 Summary\n\nThe noisy-logical distribution gives a new way to represent conditional probability distributions defined over binary variables. The complexity of the distribution can be adjusted by restricting the set of causal factors. If all the causal factors are allowed, then the distribution can represent any conditional distribution. 
But by restricting the set of causal factors we can obtain standard distributions such as the noisy-or and noisy-and-not.\n\nWe illustrated the noisy-logical distribution by modeling experimental findings on causal reasoning. Our results showed that this distribution fitted the experimental data and, in particular, accounted for the major trends (unlike the linear model). This is consistent with the success of noisy-or and noisy-and-not models in accounting for experiments involving two causes [1], [2], [3]. This suggests that humans may make use of noisy-logical representations for causal reasoning.\n\nOne attraction of the noisy-logical representation is that it helps clarify the relationship between logic and probabilities. Standard logical relationships between causes and effects arise in the limit as the ωi take values 0 or 1. We can, for example, bias the data towards a logical form by using a prior on the ω. This may be useful, for example, when modeling human cognition: evidence suggests that humans first learn logical relationships and only later move to probabilities.\n\nIn summary, the noisy-logical distribution is a novel way to represent conditional probability distributions defined on binary variables. We hope this class of distributions will be useful for modeling cognitive phenomena and for applications to artificial intelligence.\n\nAcknowledgements\n\nWe thank Mimi Liljeholm, Patricia Cheng, Adnan Darwiche, Keith Holyoak, Iasonas Kokkinos, and YingNian Wu for helpful discussions. Mimi and Patricia kindly gave us access to their experimental data. We acknowledge funding support from the W.M. Keck Foundation and from NSF 0413214.\n\nReferences\n\n[1] P. W. Cheng. From covariation to causation: A causal power theory. Psychological Review, 104, 367-405, 1997.\n\n[2] L. R. Novick and P. W. Cheng. Assessing interactive causal influence. Psychological Review, 111, 455-485, 2004.\n\n[3] T. L. 
Griffiths and J. B. Tenenbaum. Structure and strength in causal induction. Cognitive Psychology, 51, 334-384, 2005.\n\n[4] J. Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, 1988.\n\n[5] C. N. Glymour. The Mind's Arrow: Bayes Nets and Graphical Causal Models in Psychology. MIT Press, 2001.\n\n[6] M. Liljeholm and P. W. Cheng. When is a Cause the “Same”? Coherent Generalization across Contexts. Psychological Science, in press, 2007.\n\n[7] M. J. Buehner, P. W. Cheng, and D. Clifford. From covariation to causation: A test of the assumption of causal power. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 1119-1140, 2003.\n\n[8] R. A. Rescorla and A. R. Wagner. A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black and W. F. Prokasy (Eds.), Classical conditioning II: Current theory and research (pp. 64-99). New York: Appleton-Century-Crofts, 1972.\n", "award": [], "sourceid": 664, "authors": [{"given_name": "Hongjing", "family_name": "Lu", "institution": null}, {"given_name": "Alan", "family_name": "Yuille", "institution": null}]}