{"title": "PC-Fairness: A Unified Framework for Measuring Causality-based Fairness", "book": "Advances in Neural Information Processing Systems", "page_first": 3404, "page_last": 3414, "abstract": "A recent trend of fair machine learning is to define fairness as causality-based notions which concern the causal connection between protected attributes and decisions. However, one common challenge of all causality-based fairness notions is identifiability, i.e., whether they can be uniquely measured from observational data, which is a critical barrier to applying these notions to real-world situations. In this paper, we develop a framework for measuring different causality-based fairness. We propose a unified definition that covers most of previous causality-based fairness notions, namely the path-specific counterfactual fairness (PC fairness). Based on that, we propose a general method in the form of a constrained optimization problem for bounding the path-specific counterfactual fairness under all unidentifiable situations. Experiments on synthetic and real-world datasets show the correctness and effectiveness of our method.", "full_text": "PC-Fairness: A Uni\ufb01ed Framework for Measuring\n\nCausality-based Fairness\n\nYongkai Wu\n\nUniversity of Arkansas\n\nyw009@uark.edu\n\nXintao Wu\n\nUniversity of Arkansas\nxintaowu@uark.edu\n\nLu Zhang\n\nUniversity of Arkansas\n\nlz006@uark.edu\n\nHanghang Tong\n\nUniversity of Illinois at Urbana-Champaign\n\nhtong@illinois.edu\n\nAbstract\n\nA recent trend of fair machine learning is to de\ufb01ne fairness as causality-based\nnotions which concern the causal connection between protected attributes and\ndecisions. 
However, one common challenge of all causality-based fairness notions\nis identi\ufb01ability, i.e., whether they can be uniquely measured from observational\ndata, which is a critical barrier to applying these notions to real-world situations.\nIn this paper, we develop a framework for measuring different causality-based fair-\nness. We propose a uni\ufb01ed de\ufb01nition that covers most of previous causality-based\nfairness notions, namely the path-speci\ufb01c counterfactual fairness (PC fairness).\nBased on that, we propose a general method in the form of a constrained opti-\nmization problem for bounding the path-speci\ufb01c counterfactual fairness under all\nunidenti\ufb01able situations. Experiments on synthetic and real-world datasets show\nthe correctness and effectiveness of our method.\n\n1\n\nIntroduction\n\nFair machine learning is now an important research \ufb01eld which studies how to develop predictive\nmachine learning models such that decisions made with their assistance fairly treat all groups of\npeople irrespective of their protected attributes such as gender, race, etc. A recent trend in this \ufb01eld is\nto de\ufb01ne fairness as causality-based notions which concern the causal connection between protected\nattributes and decisions. Based on Pearl\u2019s structural causal models [8], a number of causality-based\nfairness notions have been proposed for capturing fairness in different situations, including total effect\n[19, 16, 20], direct/indirect discrimination [19, 16, 7, 20], and counterfactual fairness [5, 14, 15, 9].\nOne common challenge of all causality-based fairness notions is identi\ufb01ability, i.e., whether they can\nbe uniquely measured from observational data. 
As causality-based fairness notions are de\ufb01ned based\non different types of causal effects, such as total effect on interventions, direct/indirect discrimination\non path-speci\ufb01c effects, and counterfactual fairness on counterfactual effects, their identi\ufb01ability\ndepends on the identi\ufb01ability of these causal effects. Unfortunately, in many situations these causal\neffects are in general unidenti\ufb01able, referred to as unidenti\ufb01able situations [12]. Identi\ufb01ability is a\ncritical barrier for the causality-based fairness to be applied to real applications. In previous works,\nsimplifying assumptions are proposed to evade this problem [5, 19, 4]. However, these simpli\ufb01cations\nmay severely damage the performance of predictive models. In [20] the authors propose a method\nto bound indirect discrimination as the path-speci\ufb01c effect in unidenti\ufb01able situations, and in [14] a\nmethod is proposed to bound counterfactual fairness. Nevertheless, the tightness of these methods is\nnot analyzed. In addition, it is not clear whether these methods can be applied to other unidenti\ufb01able\nsituations, and more importantly, a combination of multiple unidenti\ufb01able situations.\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\n\fIn this paper, we propose a framework for handling different causality-based fairness notions. We \ufb01rst\npropose a general representation of all types of causal effects, i.e., the path-speci\ufb01c counterfactual\neffect, based on which we de\ufb01ne a uni\ufb01ed fairness notion that covers most previous causality-based\nfairness notions, namely the path-speci\ufb01c counterfactual fairness (PC fairness). We summarize all\nunidenti\ufb01able situations that are discovered in the causal inference literature. 
Then, we develop a\nconstrained optimization problem for bounding the PC fairness, which is motivated by the method\nproposed in [2] for bounding confounded causal effects. The key idea is to parameterize the causal\nmodel using so-called response-function variables, whose distribution captures all randomness\nencoded in the causal model, so that we can explicitly traverse all possible causal models to \ufb01nd\nthe tightest possible bounds. In the experiments, we evaluate the proposed method and compare it\nwith previous bounding methods using both synthetic and real-world datasets. The results show that\nour method is capable of bounding causal effects under any unidenti\ufb01able situation or combinations.\nWhen only path-speci\ufb01c effect or counterfactual effect is considered, our method provides tighter\nbounds than methods in [20] or [14]. The proposed framework settles a general theoretical foundation\nfor causality-based fairness. We make no assumption about the hidden confounders so that hidden\nconfounders are allowed to exist in the causal model. We also make no assumption about the data\ngenerating process and whether the observation data is generated by linear or non-linear functions\nwould not introduce bias into our results. We only assume that the causal graph is given, which is a\ncommon assumption in structural causal models.\nRelationship to other work. In [3], the author introduces the term \u201cpath-speci\ufb01c counterfactual\nfairness\u201d, which states that a decision is fair toward an individual if it coincides with the one\nthat would have been taken in a counterfactual world in which the sensitive attribute along the\nunfair pathways were different. 
They develop a correction method called PSCF for eliminating the\nindividual-level unfair information contained in the observations while retaining fair information.\nCompared to [3], we formally de\ufb01ne a general fairness notion which, besides the individual-level\nfairness, is also applied to fairness in any sub-group of the population. In addition, we further\nconsider the identi\ufb01ability issue in causal inference that is inevitably brought by conditioning on the\nindividual level. Unidenti\ufb01able situation means that there exist two causal models which exactly\nagree with the same observational distribution (hence cannot be distinguished using statistic methods\nsuch as maximum likelihood), but lead to very different causal effects. In our paper, we address\nvarious unidenti\ufb01able situations by developing a general bounding method. The authors in [6]\nstudy the conditional path-speci\ufb01c effect and develop a complete identi\ufb01cation algorithm with the\napplication to the problem of algorithmic fairness. Similar to our proposed notion, their notion is\nalso quanti\ufb01ed via conditional distributions over the interventional variant. However, the conditional\npath-speci\ufb01c effect generalizes the conditional causal effect, where the factual condition is assumed\nto be \u201cnon-contradictory\u201d (such as age in measuring the effect of smoking on lung cancer) [12]. The\npath-speci\ufb01c counterfactual effect, on the other hand, generalizes the counterfactual effect, where\nthe factual condition can be contradictory to the observation. 
Formally, in the conditional path-specific effect, the condition is performed on the pre-intervention distribution, but in the path-specific counterfactual effect, the condition is performed on the post-intervention distribution.

2 Preliminaries

In our notation, an uppercase letter denotes a variable, e.g., X; a bold uppercase letter denotes a set of variables, e.g., X; and a lowercase letter denotes a value or a set of values of the variables, e.g., x and x.

2.1 Causal Model and Causal Graph

Definition 1 (Structural Causal Model [8]). A structural causal model M is represented by a quadruple ⟨U, V, F, P(U)⟩ where

1. U is a set of exogenous variables that are determined by factors outside the model.
2. P(U) is a joint probability distribution defined over U.
3. V is a set of endogenous variables that are determined by variables in U ∪ V.
4. F is a set of structural equations from U ∪ V to V. Specifically, for each V ∈ V, there is a function fV ∈ F mapping from U ∪ (V\V) to V, i.e., v = fV(paV, uV), where paV is a realization of the set of endogenous variables PAV ⊆ V\{V} that directly determine V, and uV is a realization of the set of exogenous variables that directly determine V.

Figure 1: Causal graphs of a Markovian model and a semi-Markovian model

In general, fV(·) can be an equation of any type. In some cases, people may assume that fV(·) is of a specific type, e.g., a nonlinear additive function if v = fV(paV) + uV. On the other hand, if all exogenous variables in U are assumed to be mutually independent, then the causal model is called a Markovian model; otherwise, it is called a semi-Markovian model.
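To ground Definition 1, here is a minimal sketch of a Markovian SCM with two binary endogenous variables. It is our illustration, not code from the paper, and the numeric thresholds (0.4, 0.8, 0.3) are arbitrary choices: exogenous noise is sampled independently, each endogenous variable is a deterministic function of its parents and its noise, and the do-operator of Section 2.2 simply replaces one structural equation by a constant.

```python
import random

def sample_scm(n, seed=0):
    """Draw n samples from a toy Markovian SCM (Definition 1):
    X = f_X(U_X), Y = f_Y(X, U_Y), with U_X, U_Y mutually independent."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        u_x, u_y = rng.random(), rng.random()   # exogenous draws from P(U)
        x = int(u_x < 0.4)                      # structural equation f_X
        y = int(u_y < (0.8 if x else 0.3))      # structural equation f_Y
        data.append((x, y))
    return data

def do_x(x_val, n, seed=0):
    """The intervention do(X = x_val): replace f_X by the constant x_val
    while keeping f_Y and P(U) untouched."""
    rng = random.Random(seed)
    ys = []
    for _ in range(n):
        rng.random()                            # u_x is drawn but no longer used
        u_y = rng.random()
        ys.append(int(u_y < (0.8 if x_val else 0.3)))
    return ys

# Monte Carlo estimate of the total causal effect P(y_{x1}) - P(y_{x0}),
# which approaches 0.8 - 0.3 = 0.5 as n grows.
n = 20000
tce_hat = sum(do_x(1, n)) / n - sum(do_x(0, n)) / n
```

With the full model in hand the intervention is trivial; the point of the later sections is that, with U unknown, such effects must instead be bounded from the observational distribution alone.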
In this paper, we don\u2019t make\nassumptions about the type of equations and independence relationships among exogenous variables.\nThe causal model M is associated with a causal graph G = (cid:104)V,E(cid:105) where V is a set of nodes and E is\na set of edges. Each node of V corresponds to a variable of V in M. Each edge in E, denoted by a\ndirected arrow \u2192, points from a node X \u2208 U \u222a V to a different node Y \u2208 V if fY uses values of\nX as input. A causal path from X to Y is a directed path which traces arrows directed from X to\nY . The causal graph is usually simpli\ufb01ed by removing all exogenous variables from the graph. In a\nMarkovian model, exogenous variables can be directly removed without loss of information. In a\nsemi-Markovian model, after removing exogenous variables we also need to add dashed bi-directed\nedges between the children of correlated exogenous variables to indicate the existence of unobserved\ncommon cause factors, i.e., hidden confounders. Examples are demonstrated in Figure 1.\n\n2.2 Causal Effects\n\nQuantitatively measuring causal effects in the causal model is facilitated with the do-operator [8]\nwhich forces some variable X to take certain value x, formally denoted by do(X = x) or do(x).\nIn a causal model M, the intervention do(x) is de\ufb01ned as the substitution of structural equation\nX = fX (PAX , UX ) with X = x. For an observed variable Y (Y (cid:54)= X) which is affected by the\nintervention, its interventional variant is denoted by Yx. The distribution of Yx, also referred to as the\npost-intervention distribution of Y under do(x), is denoted by P (Yx = y) or simply P (yx).\nBy using the do-operator, the total causal effect is de\ufb01ned as follows.\nDe\ufb01nition 2 (Total Causal Effect [8]). 
The total causal effect of the value change of X from x0 to x1 on Y = y is given by

TCE(x1, x0) = P(y_{x1}) − P(y_{x0}).

The total causal effect is defined as the effect of X on Y where the intervention is transferred along all causal paths from X to Y. If we force the intervention to be transferred only along a subset of all causal paths from X to Y, the causal effect is then called the path-specific effect, defined as follows.

Definition 3 (Path-specific Effect [1]). Given a causal path set π, the π-specific effect of the value change of X from x0 to x1 on Y = y through π (with reference x0) is given by

PE_π(x1, x0) = P(y_{x1|π, x0|π̄}) − P(y_{x0}),

where P(Y_{x1|π, x0|π̄}) represents the post-intervention distribution of Y in which the effect of the intervention do(x1) is transmitted only along π while the effect of the reference intervention do(x0) is transmitted along the other paths.

Definitions 2 and 3 consider the average causal effect over the entire population without any prior observations. If we have certain observations about a subset of attributes O = o and use them as conditions when inferring the causal effect, then the causal inference problem becomes a counterfactual inference problem, meaning that the causal inference is performed only on the sub-population specified by O = o. Symbolically, the distribution of Yx conditioning on the factual observation O = o is denoted by P(y_x|o). The counterfactual effect is defined as follows.

Definition 4 (Counterfactual Effect [12]).
Given a factual condition O = o, the counterfactual effect of the value change of X from x0 to x1 on Y = y is given by

CE(x1, x0|o) = P(y_{x1}|o) − P(y_{x0}|o).

Table 1: Connection between previous fairness notions and PC fairness

Description | References | Relating to PC fairness
Total effect | [19, 16] | O = ∅ and π = Π
(System) Direct discrimination | [19, 7, 16] | O = ∅ or {S} and π = πd = {S → Ŷ}
(System) Indirect discrimination | [19, 7, 16] | O = ∅ or {S} and π = πi ⊂ Π
Individual direct discrimination | [17] | O = {S, X} and π = πd = {S → Ŷ}
Group direct discrimination | [18] | O = Q = PA_Y\{S} and π = πd = {S → Ŷ}
Counterfactual fairness | [5, 9, 14] | O = {S, X} and π = Π
Counterfactual error rate | [15] | O = {S, Y} and π = πd or πi

3 Path-specific Counterfactual Fairness

In this section, we define a unified fairness notion for representing different causality-based fairness notions. The key component of our notion is a general representation of causal effects. Consider an intervention on X which is transmitted along a subset of causal paths π to Y, conditioning on observation O = o. Based on that, we define the path-specific counterfactual effect as follows.

Definition 5 (Path-specific Counterfactual Effect). Given a factual condition O = o and a causal path set π, the path-specific counterfactual effect of the value change of X from x0 to x1 on Y = y through π (with reference x0) is given by

PCE_π(x1, x0|o) = P(y_{x1|π, x0|π̄}|o) − P(y_{x0}|o).

In the context of fair machine learning, we use S ∈ {s+, s−} to denote the protected attribute, Y ∈ {y+, y−} to denote the decision, and X to denote a set of non-protected attributes.
The underlying mechanism of the population over the space S × X × Y is represented by a causal model M, which is associated with a causal graph G. A historical dataset D is drawn from the population and is used to construct a predictor h : X, S → Ŷ. The causal model for the population over the space S × X × Ŷ can be considered the same as M except that the function fY is replaced with the predictor h. We use Π to denote all causal paths from S to Ŷ in the causal graph.

Then, we define path-specific counterfactual fairness based on Definition 5.

Definition 6 (Path-specific Counterfactual Fairness (PC Fairness)). Given a factual condition O = o where O ⊆ {S, X, Y} and a causal path set π, predictor Ŷ achieves PC fairness if PCE_π(s1, s0|o) = 0 where s1, s0 ∈ {s+, s−}. We also say that Ŷ achieves τ-PC fairness if |PCE_π(s1, s0|o)| ≤ τ.

We show that previous causality-based fairness notions can be expressed as special cases of PC fairness. Their connections are summarized in Table 1, where πd contains the direct edge from S to Ŷ, and πi is a path set that contains all causal paths passing through any redlining attributes (i.e., a set of attributes in X that cannot be legally justified if used in decision-making). Based on whether O equals ∅ or not, the previous notions can be categorized into the ones that deal with the system level (O = ∅) and the ones that have certain conditions (O ≠ ∅).
Based on whether π equals Π or not, the previous notions can be categorized into the ones that deal with the total causal effect (π = Π), the ones that consider direct discrimination (π = πd), and the ones that consider indirect discrimination (π = πi).

In addition to unifying the existing notions, the notion of PC fairness also captures new types of fairness that the previous notions cannot. One example is individual indirect discrimination, i.e., discrimination along the indirect paths for a particular individual. Individual indirect discrimination has not been studied yet in the literature, probably due to the difficulty in its definition and identification. However, it can be directly defined and analyzed using PC fairness by letting O = {S, X} and π = πi.

4 Measuring Path-specific Counterfactual Fairness

In this section, we develop a general method for bounding the path-specific counterfactual effect in any unidentifiable situation.

Figure 2: The "bow graph". Figure 3: The "kite graph". Figure 4: The "w graph".

In the causal inference field, researchers have studied the reasons for unidentifiability under different cases. When O = ∅ and π ⊂ Π, the reason for unidentifiability can be the existence of the "kite graph" (see Figure 3) in the causal graph [1]. When O ≠ ∅ and π = Π, the reason for unidentifiability can be the existence of the "w graph" (see Figure 4) [11]. In any situation, as long as there exists a "hedge graph" (the simplest case of which is the "bow graph" shown in Figure 2), the causal effect is unidentifiable [12].
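To make the unidentifiability of the bow graph concrete, the following sketch (our illustration with toy numbers, using the response-function idea formalized in Section 4.1) constructs two causal models that induce exactly the same observational distribution P(X, Y) yet have different total causal effects, so no statistical procedure on the data alone can tell them apart.

```python
# Response functions for the bow graph X -> Y (binary X and Y):
# g_X(r_X) = r_X, and g_Y(x, r_Y) with r_Y in {0: always 0, 1: y = x,
# 2: y = 1 - x, 3: always 1}.
def g_y(x, r_y):
    return [0, x, 1 - x, 1][r_y]

def observational(joint):
    """Observational distribution P(x, y) implied by a joint
    distribution over the response variables (r_X, r_Y)."""
    p = {(x, y): 0.0 for x in (0, 1) for y in (0, 1)}
    for (rx, ry), pr in joint.items():
        p[(rx, g_y(rx, ry))] += pr
    return p

def tce(joint):
    """Total causal effect P(y1 | do(x1)) - P(y1 | do(x0))."""
    return sum(pr * (g_y(1, ry) - g_y(0, ry)) for (_, ry), pr in joint.items())

# Model A: r_X and r_Y independent and uniform (no confounding).
model_a = {(rx, ry): 0.125 for rx in (0, 1) for ry in range(4)}
# Model B: strongly correlated r_X and r_Y (a hidden confounder).
model_b = {(0, 1): 0.25, (0, 3): 0.25, (1, 0): 0.25, (1, 1): 0.25}

assert observational(model_a) == observational(model_b)  # identical data
print(tce(model_a), tce(model_b))  # 0.0 0.5 -- different causal effects
```

Both models place probability 0.25 on each observed cell (x, y), yet the effect of do(x) on Y differs by 0.5; this is exactly the gap that the bounding method of Section 4 quantifies.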
Obviously, all above unidenti\ufb01able\nsituations can exist in the path-speci\ufb01c counterfactual effect.\nOur method is motivated by [2] which formulates the bounding problem as a constrained optimization\nproblem. The general idea is to parameterize the causal model and use the observational distribution\nP (V) to impose constraints on the parameters. Then, the path-speci\ufb01c counterfactual effect of\ninterest is formulated as an objective function of maximization or minimization for estimating its\nupper or lower bound. The bounds are guaranteed to be tight as we traverse all possible causal models\nwhen solving the optimization problem. Thus, a byproduct of the method is a unique estimation of\nthe path-speci\ufb01c counterfactual effect in the identi\ufb01able situation.\nFor presenting our method, we \ufb01rst introduce a key concept called the response-function variable.\n\n4.1 Response-function Variable\n\nResponse-function variables are proposed in [2] for parameterizing the causal model. Consider\nan arbitrary endogenous variable denoted by V \u2208 V, its endogenous parents denoted by PAV , its\nexogenous parents denoted by UV , and its associated structural function in the causal model denoted\nby v = fV (paV , uV ). In general, UV can be a variable of any type with any domain size, and fV can\nbe any function, making the causal model very dif\ufb01cult to be handled. However, we can note that, for\neach particular value uV of UV , the functional mapping from PAV to V is a particular deterministic\nresponse function. Thus, we can map each value of UV to a deterministic response function. Although\nthe domain size of UV is unknown which might be very large or even in\ufb01nite, the number of different\ndeterministic response functions is known and limited, given the domain sizes of PAV and V . This\nmeans that the domain of UV can be divided into several equivalent regions, each corresponding to\nthe same response function. 
As a result, we can transform the original non-parameterized structural function into a limited number of parameterized functions.

Formally, we represent the equivalent regions of each endogenous variable V by the response-function variable RV = {0, ···, NV − 1}, where NV = |V|^|PAV| is the total number of different deterministic response functions mapping from PAV to V (NV = |V| if V has no parent). Each value rV represents a pre-defined response function. We also denote the mapping from UV to RV by rV = ℓV(uV). Then, any fV(paV, uV) can be re-formulated as

fV(paV, uV) = fV(paV, ℓV⁻¹(rV)) = (fV ∘ ℓV⁻¹)(paV, rV) = gV(paV, rV),

where gV is the composition of fV and ℓV⁻¹, and denotes the response functions represented by rV. We denote the set of all response-function variables by R = {RV : V ∈ V}.

Next, we show how the joint distribution P(v) can be expressed as a linear function of P(r). According to [13], P(v) can be expressed as the summation over the probabilities of those values u of U that satisfy the following requirements: for each V ∈ V, we must have fV(paV, uV) = v, where v and paV are specified by v and uV is specified by u. In other words, denoting by V(u) the value that V would obtain if U = u, we have P(v) = Σ_{u:V(u)=v} P(u). Then, by mapping from U to R, we accordingly obtain P(v) = Σ_{r:V(r)=v} P(r), where for each V ∈ V, V(r) = v means that gV(paV, rV) = v. As a result, by defining an indicator function

I(v; paV, rV) = 1 if gV(paV, rV) = v, and 0 otherwise,

we obtain

P(v) = Σ_r P(r) ∏_{V∈V} I(v; paV, rV),    (1)

which is a linear expression of P(r).

[Figure annotation: in the kite graph of Figure 3, π = {X → W → Z → Y}.]

Example 1.
Consider the causal graph shown in Figure 1 with two endogenous variables X and Y, and two exogenous variables UX and UY with unknown domains. Assume that both X and Y are binary, i.e., X ∈ {x0, x1} and Y ∈ {y0, y1}, and denote their response-function variables by RX and RY. For Y, since there is a total number of 2² = 4 response functions, the response-function variable RY and response function gY can be defined as follows:

gY(x, rY) =
  y0 if rY = 0;
  y0 if x = x0, rY = 1;    y1 if x = x1, rY = 1;
  y1 if x = x0, rY = 2;    y0 if x = x1, rY = 2;
  y1 if rY = 3;

rY = ℓY(uY) =
  0 if fY(x0, uY) = y0, fY(x1, uY) = y0;
  1 if fY(x0, uY) = y0, fY(x1, uY) = y1;
  2 if fY(x0, uY) = y1, fY(x1, uY) = y0;
  3 if fY(x0, uY) = y1, fY(x1, uY) = y1.

Similarly, the response-function variable RX and response function gX can be defined as

gX(rX) = x0 if rX = 0; x1 if rX = 1;    rX = ℓX(uX) = 0 if fX(uX) = x0; 1 if fX(uX) = x1.

As a result, the joint distribution over X, Y is given by

P(x, y) = Σ_{rX, rY} P(rX, rY) I(x; rX) I(y; x, rY).

4.2 Expressing Path-specific Counterfactual Fairness

For bounding the path-specific counterfactual effect, i.e., PCE_π(s1, s0|o) = P(ŷ_{s1|π, s0|π̄}|o) − P(ŷ_{s0}|o), we also apply response-function variables to express it. We focus on the expression of P(ŷ_{s1|π, s0|π̄}|o); the expression of P(ŷ_{s0}|o) can be obtained similarly as a simpler case. As in the previous section, we first express P(ŷ_{s1|π, s0|π̄}|o) as the summation over the probabilities of certain values of U that satisfy corresponding requirements.
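The counting argument NV = |V|^|PAV| can be checked mechanically. The sketch below (our illustration, not code from the paper) enumerates all deterministic response functions of a variable given its parents' domains; for Example 1 it recovers the 2 functions for X and the 4 functions for Y, in the same order as in Example 1 (rY = 1 is the identity y = x).

```python
from itertools import product

def response_functions(parent_domains, domain):
    """Enumerate all deterministic response functions from the parents'
    joint domain to `domain`. There are |domain| ** k of them, where k is
    the number of joint parent assignments, matching N_V in the text."""
    pa_values = list(product(*parent_domains))        # joint parent assignments
    tables = product(domain, repeat=len(pa_values))   # one output per assignment
    return [dict(zip(pa_values, t)) for t in tables]

g_x = response_functions([], [0, 1])        # X has no parents: 2 functions
g_y = response_functions([[0, 1]], [0, 1])  # Y has parent X: 2**2 = 4 functions
print(len(g_x), len(g_y))      # 2 4
print(g_y[1])                  # {(0,): 0, (1,): 1} -- the "y = x" function
```

The same enumeration underlies Eq. (1): once every variable's response functions are indexed this way, P(v) is a linear function of the joint distribution over the indices.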
However, as described below, the requirements are much more complicated than the previous ones due to the integration of intervention, path-specific effect, and counterfactual.

Firstly, since the path-specific counterfactual effect is under a factual condition O = o, values u must satisfy O(u) = o, i.e., for each O ∈ O, we must have fO(paO, uO) = o. Secondly, the path-specific counterfactual effect is transmitted only along some path set π. According to [20], for the variables of X that lie on both π and π̄, referred to as witness variables/nodes [1], we need to consider two sets of values, one obtained by treating them on π and the other obtained by treating them on π̄. Formally, the non-protected attributes X are divided into three disjoint sets: we denote by W the set of witness variables, by A the set of non-witness variables on π, and by B the set of non-witness variables on π̄. A simple example is given in Figure 5. We denote the interventional variant of A by A_{s1|π}, the interventional variant of B by B_{s0|π̄}, the interventional variant of W treated on π by W_{s1|π}, and the interventional variant of W treated on π̄ by W_{s0|π̄}. Then, P(ŷ_{s1|π, s0|π̄}|o) can be written as

P(ŷ_{s1|π, s0|π̄}|o) = Σ_{a, b, w1, w0} P(Ŷ_{s1|π, s0|π̄} = y, A_{s1|π} = a, B_{s0|π̄} = b, W_{s1|π} = w1, W_{s0|π̄} = w0 | o).

To obtain the above joint distribution, in addition to O(u) = o, values u must also satisfy that:

1. A_{s1|π}(u) = a, which means for each A ∈ A we must have fA(pa1_A, uA) = a, where pa1_A means that if PA_A contains S or any witness node W, its value is specified by s1 or w1 if the edge from S/W belongs to a path in π, and by s0 or w0 otherwise;

2.
B_{s0|π̄}(u) = b, which means for each B ∈ B we must have fB(pa0_B, uB) = b, where pa0_B means that if PA_B contains S or any witness node W, its value is specified by s0 or w0;

3. W_{s1|π}(u) = w1, which means for each W ∈ W we must have fW(pa1_W, uW) = w1;

4. W_{s0|π̄}(u) = w0, which means for each W ∈ W we must have fW(pa0_W, uW) = w0.

Then, by mapping from U to R, we can obtain the requirements for R accordingly. Finally, denoting by r_o the values of R that satisfy O(r) = o, we obtain

P(ŷ_{s1|π, s0|π̄}|o) = Σ_{a, b, w1, w0, r∈r_o} [P(r)/P(o)] I(ŷ; pa1_Ŷ, rŶ) ∏_{A∈A} I(a; pa1_A, rA) ∏_{B∈B} I(b; pa0_B, rB) ∏_{W∈W} I(w1; pa1_W, rW) I(w0; pa0_W, rW),    (2)

which is still a linear expression of P(r). Similarly, we can obtain

P(ŷ_{s0}|o) = Σ_{v′, r∈r_o} [P(r)/P(o)] I(ŷ; paŶ, rŶ) ∏_{V∈V′} I(v; paV, rV),    (3)

where V′ = V\{S, Y}.

Example 2. Consider the causal graphs shown in Figures 2, 3, and 4 and the following unidentifiable causal effects: the total causal effect TCE(x1, x0) in Figure 2, the path-specific effect PE_π(x1, x0) in Figure 3, and the counterfactual effect CE(x1, x0|x0, y0) in Figure 4.
By similarly defining response functions as in Example 1: for Figure 2 with R = {RX, RY}, we have

TCE(x1, x0) = Σ_{rX, rY} P(rX, rY) I(y; x1, rY) − Σ_{rX, rY} P(rX, rY) I(y; x0, rY);

for Figure 3 with R = {RX, RW, RZ, RY}, we have

PE_π(x1, x0) = Σ_{z, w1, w0, r} P(r) I(y; z, w0, rY) I(z; w1, rZ) I(w1; x1, rW) I(w0; x0, rW) − Σ_{z, w, r} P(r) I(y; z, w, rY) I(z; w, rZ) I(w; x0, rW);

and for Figure 4 with R = {RX, RY}, we have

CE(x1, x0|x0, y0) = Σ_{rX, rY ∈ r_o} [P(rX, rY)/P(x0, y0)] I(y; x1, rY) − Σ_{rX, rY ∈ r_o} [P(rX, rY)/P(x0, y0)] I(y; x0, rY).

Note that in Figure 2, the total causal effect is identifiable if UX and UY are independent. This is reflected in our formulation: when RX and RY are independent, we have P(y_{x1}) = Σ_{rX, rY} P(rX) P(rY) I(y; x1, rY) = P(y|x1), which can be directly measured from observational data. Similar phenomena can be observed in other identifiable situations.

Figure 5: A causal graph with unidentifiable path-specific counterfactual fairness.

Example 3. Consider the causal graph shown in Figure 5 and the path-specific counterfactual effect PCE_π(s1, s0|o), where π = {S → Ŷ, S → W → A → Ŷ} and o = {s0, w′, a′, b′}. Any pair of exogenous variables can be correlated. The response-function variables are given by R = {RS, RW, RA, RB, RŶ}.
By similarly defining response functions as in Example 1, we can obtain

P(ŷ_{s1|π, s0|π̄}|o) = Σ_{a, b, w1, w0, r∈r_o} [P(r)/P(o)] I(ŷ; a, b, s1, rŶ) I(a; w1, rA) I(b; w0, rB) I(w1; s1, rW) I(w0; s0, rW),

and

P(ŷ_{s0}|o) = Σ_{a, b, w, r∈r_o} [P(r)/P(o)] I(ŷ; a, b, s0, rŶ) I(a; w, rA) I(b; w, rB) I(w; s0, rW).

4.3 Bounding Path-specific Counterfactual Fairness

In the above two sections we expressed both the joint distribution P(v) and the path-specific counterfactual effect as linear functions of P(r). All causal models (represented by different P(r)) that agree with the distribution of the observational data D cannot be distinguished and should all be considered in bounding PC fairness. Therefore, finding the lower or upper bound of the path-specific counterfactual effect is equivalent to finding the P(r) that minimizes or maximizes the path-specific counterfactual effect, subject to the constraint that the derived joint distribution P(v) agrees with the observational distribution P(D). This leads to the following linear programming problem for deriving the lower/upper bound of the path-specific counterfactual effect:

min/max  P(ŷ_{s1|π, s0|π̄}|o) − P(ŷ_{s0}|o),
s.t.  P(V) = P(D),  Σ_r P(r) = 1,  P(r) ≥ 0,    (4)

where P(ŷ_{s1|π, s0|π̄}|o) is given by Eq. (2), P(ŷ_{s0}|o) is given by Eq.
(3), and P(v) is given by Eq. (1).

The lower and upper bounds derived by solving the above optimization problem are guaranteed to be the tightest: the response-function parameterization is an equivalent mapping that covers all possible causal models, so we explicitly traverse all of them when solving the optimization problem. Thus, a byproduct of the method is a unique estimation of the path-specific counterfactual effect in the identifiable situation.

We use the derived bounds for examining τ-PC fairness: if the upper bound is less than τ and the lower bound is greater than −τ, then τ-PC fairness must be satisfied; if the upper bound is less than −τ or the lower bound is greater than τ, τ-PC fairness must not be satisfied; otherwise, it is uncertain and cannot be determined from the data.

5 Experiments

Datasets. For the synthetic datasets, we manually build a causal model with complete knowledge of the exogenous variables and equations using Tetrad [10], according to the causal graphs. The causal model consists of 4 endogenous variables, S, W, A, Ŷ, all of which have two domain values. We then consider two versions of the causal model: (1) we assume a shared exogenous variable, i.e., a hidden confounder, with 100 domain values (the causal graph is shown in Figure 6); (2) we assume all exogenous variables are mutually independent (the causal graph is omitted due to the space limit). The distribution of the exogenous variables and the structural equations of the endogenous variables are randomly assigned. Finally, we generate one dataset from each version of the causal model, denoted by D1 and D2 respectively.

For the real-world dataset, we adopt the Adult dataset, which consists of 65,123 records with 11 attributes including edu, sex, income, etc. Similar to [14], we select 7 attributes, binarize their values, and build the causal graph. The fairness threshold τ is set to 0.1. The datasets and implementation are available at http://tiny.cc/pc-fairness-code.

Bounding Path-specific Counterfactual Fairness. We use D1 to validate our method in Eq.
(4) for bounding PCE_π(s+, s−|o), where O = {S, W, A} and π = {S → W → A → Ŷ, S → Ŷ}. The ground truth can be computed by exactly executing the intervention under the given conditions using the complete causal model. The results are shown in Table 2, where the first column indicates the indices of o's value combinations. As can be seen, the true values of PCE_π(s+, s−|o) fall within our bounds for all value combinations of O, which validates our method.

Comparing with previous bounding methods. We use D2 to compare against the previous methods [20, 14], which are derived under the Markovian assumption. We compare with [20] for bounding PE_π(s+, s−) with π = {S → W → A → Ŷ, S → Ŷ}. We also compare with [14] for bounding CE(s+, s−|o) with O = {S, W, A}. The results are shown in Table 3, where bold indicates that our method makes a different judgment in discrimination detection due to its tighter bounds. As can be seen, our method achieves much tighter bounds than the previous methods, which can be used to examine fairness more accurately. For example, when measuring indirect discrimination using PE_π(s+, s−) (Row 1 in Table 3), the result is uncertain for [20] since its lower and upper bounds are −0.2605 and 0.2656, but our method can guarantee that the decision is discriminatory as the lower
As another example, when measuring counterfactual fairness of\nthe 2nd groups of o using CE(s+, s\u2212|o) (Row 3 in Table 3), the method in [14] is uncertain since\nthe lower and upper bounds are \u22120.4383,\u22120.0212 but our method can guarantee that the decision is\nfair due to the range of [\u22120.0783,\u22120.0212].\nWe also use the Adult datset to compare with the method in [14] for bounding CE(s+, s\u2212|o) with\nO = {age, edu, marital-status} and obtain similar results, which are shown in Table 4.\n\nTable 2: Bounds and ground\ntruth of PC fairness on D1.\nPCE\u03c0(s+, s\u2212|o)\n# of o\nlb\n\nub\n\n1\n2\n3\n4\n\n-0.4548 0.5452\n-0.5565 0.4435\n-0.5065 0.4935\n-0.4598 0.5402\n\nPE\n\nCE\n\nT ruth\n0.1507\n-0.0928\n0.0561\n0.0548\n\nTable 3: Compare with existing methods in [20, 14] on D2.\nOur method\nub\nlb\n\nPrevious methods\n\nub\n\n# of o T ruth\n0.1793\nN/A\n0.3438\n1\n-0.0557\n2\n0.2318\n3\n0.0800\n4\n\nlb\n\n-0.2605\n0.0878\n-0.4383\n-0.1192\n-0.2101\n\n0.2656\n0.5049\n-0.0212\n0.2979\n0.2070\n\n0.1772\n0.0878\n-0.0783\n0.1282\n0.0110\n\n0.1836\n0.5049\n-0.0212\n0.2847\n0.1499\n\nTable 4: Compare with the existing method in [14] on\nthe Adult dataset.\n\nMethod in [14]\nlb\nub\n\nOur Method\nlb\nub\n\n0.0541\n-0.1314\n0.1878\n-0.0356\n0.1676\n-0.1634\n0.1290\n-0.1808\n\n0.2946\n0.1091\n0.3210\n0.0976\n0.5289\n0.1979\n0.4689\n0.1591\n\n0.1498\n-0.1314\n0.2507\n-0.0356\n0.4419\n-0.0731\n0.3942\n0.0014\n\n0.1944\n0.1091\n0.2890\n0.0976\n0.5289\n0.1979\n0.4689\n0.1591\n\n# of o\n\n0\n1\n2\n3\n4\n5\n6\n7\n\nFigure 6: The causal graph for the\nsynthetic dataset D1.\n\n6 Conclusion\n\nIn this paper, we develop a general framework for measuring causality-based fairness. We propose\na uni\ufb01ed de\ufb01nition that covers most of previous causality-based fairness notions, namely the path-\nspeci\ufb01c counterfactual fairness (PC fairness). 
Then, we formulate a linear programming problem to bound PC fairness, which produces the tightest possible bounds. Experiments on synthetic and real-world datasets show that our method can bound causal effects under any unidentifiable situation or combination thereof, and achieves tighter bounds than previous methods.
Regarding scalability, the domain size of each response variable is exponential in the number of its parents, which means that the joint domain size of all response variables is exponential in the total in-degree of the causal graph. However, we notice that not all response variables are needed in the formulation; only those that directly lead to unidentifiability are. For example, when a hidden confounder causes unidentifiability, only the children of the hidden confounder need to have response variables in the formulation; and when a "kite graph" causes unidentifiability, only the witness variable needs to have a response variable in the formulation. As a result, the total complexity of the problem formulation can be significantly decreased. How to construct fair predictive models based on the derived bounds is another future research direction. One possible approach is to incorporate the bounding formulation into a post-processing method. The new formulation would be a min-max optimization problem, whose optimization variables include the response variables P(r) as well as a post-processing mapping P(ỹ|ŷ, pa_Y). The inner optimization maximizes the path-specific counterfactual effect to find the upper bound, and the outer optimization minimizes both the loss function and this upper bound. We will explore these ideas in future work.

Acknowledgments

This work was supported in part by NSF 1646654, 1920920, and 1940093.

References

[1] Chen Avin, Ilya Shpitser, and Judea Pearl. Identifiability of path-specific effects.
In IJCAI-05, Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence, Edinburgh, Scotland, UK, July 30 - August 5, 2005, pages 357-363, 2005.

[2] Alexander Balke and Judea Pearl. Counterfactual probabilities: Computational methods, bounds and applications. In UAI '94: Proceedings of the Tenth Annual Conference on Uncertainty in Artificial Intelligence, Seattle, Washington, USA, July 29-31, 1994, pages 46-54, 1994.

[3] Silvia Chiappa. Path-specific counterfactual fairness. In The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, January 27 - February 1, 2019, pages 7801-7808, 2019.

[4] Niki Kilbertus, Mateo Rojas-Carulla, Giambattista Parascandolo, Moritz Hardt, Dominik Janzing, and Bernhard Schölkopf. Avoiding discrimination through causal reasoning. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pages 656-666, 2017.

[5] Matt J. Kusner, Joshua R. Loftus, Chris Russell, and Ricardo Silva. Counterfactual fairness. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pages 4066-4076, 2017.

[6] Daniel Malinsky, Ilya Shpitser, and Thomas S. Richardson. A potential outcomes calculus for identifying conditional path-specific effects. In The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019, 16-18 April 2019, Naha, Okinawa, Japan, pages 3080-3088, 2019.

[7] Razieh Nabi and Ilya Shpitser. Fair inference on outcomes.
In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), the 30th Innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018, pages 1931-1940, 2018.

[8] Judea Pearl. Causality: Models, Reasoning and Inference. Cambridge University Press, New York, NY, USA, 2nd edition, 2009.

[9] Chris Russell, Matt J. Kusner, Joshua R. Loftus, and Ricardo Silva. When worlds collide: Integrating different counterfactual assumptions in fairness. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pages 6414-6423, 2017.

[10] Richard Scheines, Peter Spirtes, Clark Glymour, Christopher Meek, and Thomas Richardson. The TETRAD Project: Constraint Based Aids to Causal Model Specification. Multivariate Behavioral Research, 33(1):65-117, January 1998.

[11] Ilya Shpitser and Judea Pearl. What counterfactuals can be tested. In UAI 2007, Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence, Vancouver, BC, Canada, July 19-22, 2007, pages 352-359, 2007.

[12] Ilya Shpitser and Judea Pearl. Complete identification methods for the causal hierarchy. J. Mach. Learn. Res., 9:1941-1979, 2008.

[13] Jin Tian and Judea Pearl. Probabilities of causation: Bounds and identification. In UAI '00: Proceedings of the 16th Conference in Uncertainty in Artificial Intelligence, Stanford University, Stanford, California, USA, June 30 - July 3, 2000, pages 589-598, 2000.

[14] Yongkai Wu, Lu Zhang, and Xintao Wu. Counterfactual fairness: Unidentification, bound and algorithm.
In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, August 10-16, 2019, pages 1438-1444, 2019.

[15] Junzhe Zhang and Elias Bareinboim. Equality of opportunity in classification: A causal approach. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada, pages 3675-3685, 2018.

[16] Junzhe Zhang and Elias Bareinboim. Fairness in decision-making - the causal explanation formula. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), the 30th Innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018, pages 2037-2045, 2018.

[17] Lu Zhang, Yongkai Wu, and Xintao Wu. Situation testing-based discrimination discovery: A causal inference approach. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI 2016, New York, NY, USA, 9-15 July 2016, pages 2718-2724, 2016.

[18] Lu Zhang, Yongkai Wu, and Xintao Wu. Achieving non-discrimination in data release. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, August 13 - 17, 2017, pages 1335-1344, 2017.

[19] Lu Zhang, Yongkai Wu, and Xintao Wu. A causal framework for discovering and removing direct and indirect discrimination. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, August 19-25, 2017, pages 3929-3935, 2017.

[20] Lu Zhang, Yongkai Wu, and Xintao Wu. Causal Modeling-Based Discrimination Discovery and Removal: Criteria, Bounds, and Algorithms.
IEEE Transactions on Knowledge and Data Engineering, pages 1-1, 2018.