{"title": "Counterfactual Fairness", "book": "Advances in Neural Information Processing Systems", "page_first": 4066, "page_last": 4076, "abstract": "Machine learning can impact people with legal or ethical consequences when it is used to automate decisions in areas such as insurance, lending, hiring, and predictive policing. In many of these scenarios, previous decisions have been made that are unfairly biased against certain subpopulations, for example those of a particular race, gender, or sexual orientation. Since this past data may be biased, machine learning predictors must account for this to avoid perpetuating or creating discriminatory practices. In this paper, we develop a framework for modeling fairness using tools from causal inference. Our definition of counterfactual fairness captures the intuition that a decision is fair towards an individual if it the same in (a) the actual world and (b) a counterfactual world where the individual belonged to a different demographic group. We demonstrate our framework on a real-world problem of fair prediction of success in law school.", "full_text": "Counterfactual Fairness\n\nChris Russell \u2217\n\nThe Alan Turing Institute and\n\nUniversity of Surrey\n\ncrussell@turing.ac.uk\n\nMatt Kusner \u2217\n\nThe Alan Turing Institute and\n\nUniversity of Warwick\n\nmkusner@turing.ac.uk\n\nJoshua Loftus \u2217\n\nNew York University\nloftus@nyu.edu\n\nRicardo Silva\n\nThe Alan Turing Institute and\nUniversity College London\nricardo@stats.ucl.ac.uk\n\nAbstract\n\nMachine learning can impact people with legal or ethical consequences when\nit is used to automate decisions in areas such as insurance, lending, hiring, and\npredictive policing. In many of these scenarios, previous decisions have been made\nthat are unfairly biased against certain subpopulations, for example those of a\nparticular race, gender, or sexual orientation. 
Since this past data may be biased,\nmachine learning predictors must account for this to avoid perpetuating or creating\ndiscriminatory practices. In this paper, we develop a framework for modeling\nfairness using tools from causal inference. Our de\ufb01nition of counterfactual fairness\ncaptures the intuition that a decision is fair towards an individual if it is the same in\n(a) the actual world and (b) a counterfactual world where the individual belonged\nto a different demographic group. We demonstrate our framework on a real-world\nproblem of fair prediction of success in law school.\n\n1 Contribution\n\nMachine learning has spread to \ufb01elds as diverse as credit scoring [20], crime prediction [5], and loan\nassessment [25]. Decisions in these areas may have ethical or legal implications, so it is necessary for\nthe modeler to think beyond the objective of maximizing prediction accuracy and consider the societal\nimpact of their work. For many of these applications, it is crucial to ask if the predictions of a model\nare fair. Training data can contain unfairness for reasons having to do with historical prejudices or\nother factors outside an individual\u2019s control. In 2016, the Obama administration released a report2\nwhich urged data scientists to analyze \u201chow technologies can deliberately or inadvertently perpetuate,\nexacerbate, or mask discrimination.\"\nThere has been much recent interest in designing algorithms that make fair predictions [4, 6, 10,\n12, 14, 16\u201319, 22, 24, 36\u201339]. In large part, the literature has focused on formalizing fairness\ninto quantitative de\ufb01nitions and using them to solve a discrimination problem in a certain dataset.\nUnfortunately, for a practitioner, law-maker, judge, or anyone else who is interested in implementing\nalgorithms that control for discrimination, it can be dif\ufb01cult to decide which de\ufb01nition of fairness to\nchoose for the task at hand. 
Indeed, we demonstrate that depending on the relationship between a\nprotected attribute and the data, certain de\ufb01nitions of fairness can actually increase discrimination.\n\n\u2217Equal contribution. This work was done while JL was a Research Fellow at the Alan Turing Institute.\n2https://obamawhitehouse.archives.gov/blog/2016/05/04/big-risks-big-opportunities-intersection-big-data-\n\nand-civil-rights\n\n31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.\n\n\fIn this paper, we introduce the \ufb01rst explicitly causal approach to address fairness. Speci\ufb01cally, we\nleverage the causal framework of Pearl [30] to model the relationship between protected attributes\nand data. We describe how techniques from causal inference can be effective tools for designing fair\nalgorithms and argue, as in DeDeo [9], that it is essential to properly address causality in fairness. In\nperhaps the most closely related prior work, Johnson et al. [15] make similar arguments but from a\nnon-causal perspective. An alternative use of causal modeling in the context of fairness is introduced\nindependently by [21].\nIn Section 2, we provide a summary of basic concepts in fairness and causal modeling. In Section 3,\nwe provide the formal de\ufb01nition of counterfactual fairness, which enforces that a distribution over\npossible predictions for an individual should remain unchanged in a world where an individual\u2019s\nprotected attributes had been different in a causal sense. In Section 4, we describe an algorithm to\nimplement this de\ufb01nition, while distinguishing it from existing approaches. In Section 5, we illustrate\nthe algorithm with a case of fair assessment of law school success.\n\n2 Background\n\nThis section provides a basic account of two separate areas of research in machine learning, which\nare formally uni\ufb01ed in this paper. We suggest Berk et al. [1] and Pearl et al. 
[29] as references.\nThroughout this paper, we will use the following notation. Let A denote the set of protected attributes\nof an individual, variables that must not be discriminated against in a formal sense de\ufb01ned differently\nby each notion of fairness discussed. The decision of whether an attribute is protected or not is taken\nas a primitive in any given problem, regardless of the de\ufb01nition of fairness adopted. Moreover, let\nX denote the other observable attributes of any particular individual, U the set of relevant latent\nattributes which are not observed, and let Y denote the outcome to be predicted, which itself might\nbe contaminated with historical biases. Finally, \u02c6Y is the predictor, a random variable that depends on\nA, X and U, and which is produced by a machine learning algorithm as a prediction of Y .\n\n2.1 Fairness\n\nThere has been much recent work on fair algorithms. These include fairness through unawareness\n[12], individual fairness [10, 16, 24, 38], demographic parity/disparate impact [36], and equality of\nopportunity [14, 37]. For simplicity we often assume A is encoded as a binary attribute, but this can\nbe generalized.\nDe\ufb01nition 1 (Fairness Through Unawareness (FTU)). An algorithm is fair so long as any protected\nattributes A are not explicitly used in the decision-making process.\nAny mapping \u02c6Y : X \u2192 Y that excludes A satis\ufb01es this. Initially proposed as a baseline, the approach\nhas found favor recently with more general approaches such as Grgic-Hlaca et al. [12]. Despite its\ncompelling simplicity, FTU has a clear shortcoming as elements of X can contain discriminatory\ninformation analogous to A that may not be obvious at \ufb01rst. The need for expert knowledge in\nassessing the relationship between A and X was highlighted in the work on individual fairness:\nDe\ufb01nition 2 (Individual Fairness (IF)). An algorithm is fair if it gives similar predictions to similar\nindividuals. 
Formally, given a metric d(\u00b7,\u00b7), if individuals i and j are similar under this metric (i.e.,\nd(i, j) is small) then their predictions should be similar: \u02c6Y (X (i), A(i)) \u2248 \u02c6Y (X (j), A(j)).\nAs described in [10], the metric d(\u00b7,\u00b7) must be carefully chosen, requiring an understanding of the\ndomain at hand beyond black-box statistical modeling. This can also be contrasted against population\nlevel criteria such as\nDe\ufb01nition 3 (Demographic Parity (DP)). A predictor \u02c6Y satis\ufb01es demographic parity if P ( \u02c6Y |A =\n0) = P ( \u02c6Y |A = 1).\nDe\ufb01nition 4 (Equality of Opportunity (EO)). A predictor \u02c6Y satis\ufb01es equality of opportunity if\nP ( \u02c6Y = 1|A = 0, Y = 1) = P ( \u02c6Y = 1|A = 1, Y = 1).\nThese criteria can be incompatible in general, as discussed in [1, 7, 22]. Following the motivation of\nIF and [15], we propose that knowledge about relationships between all attributes should be taken\ninto consideration, even if strong assumptions are necessary. Moreover, it is not immediately clear\nfor any of these approaches in which ways historical biases can be tackled. We approach such issues\nfrom an explicit causal modeling perspective.\n\n2.2 Causal Models and Counterfactuals\n\nWe follow Pearl [28], and de\ufb01ne a causal model as a triple (U, V, F ) of sets such that\n\n\u2022 U is a set of latent background variables, which are factors not caused by any variable in\nthe set V of observable variables;\n\u2022 F is a set of functions {f1, . . . , fn}, one for each Vi \u2208 V , such that Vi = fi(pai, Upai),\npai \u2286 V \\{Vi} and Upai \u2286 U. Such equations are also known as structural equations [2].\n\nThe notation \u201cpai\u201d refers to the \u201cparents\u201d of Vi and is motivated by the assumption that the model\nfactorizes as a directed graph, here assumed to be a directed acyclic graph (DAG). 
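To make the triple (U, V, F ) and the notion of intervention concrete, here is a toy two-variable model (all variable names, coefficients, and functions below are illustrative inventions, not taken from the paper) implementing two structural equations and an intervention on Z:

```python
import random

random.seed(0)

# A minimal causal model (U, V, F): observables V = {Z, Y},
# background variables U = {u_z, u_y}, structural equations F = {f_z, f_y}.
def sample_u():
    return {"u_z": random.gauss(0, 1), "u_y": random.gauss(0, 1)}

def f_z(u):               # Z has no observable parents: pa_Z is empty
    return u["u_z"]

def f_y(z, u):            # pa_Y = {Z}
    return 2.0 * z + u["u_y"]

def solve(u, do_z=None):
    # An intervention substitutes the equation Z = f_z(u) with Z = v.
    z = f_z(u) if do_z is None else do_z
    return {"Z": z, "Y": f_y(z, u)}

u = sample_u()
factual = solve(u)                   # the actual world for this unit
counterfactual = solve(u, do_z=1.0)  # "Y had Z been 1": same u, new equation for Z
```

The counterfactual solution keeps the background variables fixed at u and only swaps the equation for the intervened variable, matching the description of an external agent forcefully assigning a value.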
The model is causal\nin that, given a distribution P (U ) over the background variables U, we can derive the distribution of a\nsubset Z \u2286 V following an intervention on V \\ Z. An intervention on variable Vi is the substitution\nof equation Vi = fi(pai, Upai ) with the equation Vi = v for some v. This captures the idea of an\nagent, external to the system, modifying it by forcefully assigning value v to Vi, for example as in a\nrandomized experiment.\nThe speci\ufb01cation of F is a strong assumption but allows for the calculation of counterfactual\nquantities. In brief, consider the following counterfactual statement, \u201cthe value of Y if Z had taken\nvalue z\u201d, for two observable variables Z and Y . By assumption, the state of any observable variable is\nfully determined by the background variables and structural equations. The counterfactual is modeled\nas the solution for Y for a given U = u where the equations for Z are replaced with Z = z. We\ndenote it by YZ\u2190z(u) [28], and sometimes as Yz if the context of the notation is clear.\nCounterfactual inference, as speci\ufb01ed by a causal model (U, V, F ) given evidence W , is the computa-\ntion of probabilities P (YZ\u2190z(U ) | W = w), where W , Z and Y are subsets of V . Inference proceeds\nin three steps, as explained in more detail in Chapter 4 of Pearl et al. [29]: 1. Abduction: for a given\nprior on U, compute the posterior distribution of U given the evidence W = w; 2. Action: substitute\nthe equations for Z with the interventional values z, resulting in the modi\ufb01ed set of equations Fz;\n3. 
Prediction: compute the implied distribution on the remaining elements of V using Fz and the\nposterior P (U |W = w).\n\n3 Counterfactual Fairness\n\nGiven a predictive problem with fairness considerations, where A, X and Y represent the protected\nattributes, remaining attributes, and output of interest respectively, let us assume that we are given a\ncausal model (U, V, F ), where V \u2261 A \u222a X. We postulate the following criterion for predictors of Y .\nDe\ufb01nition 5 (Counterfactual fairness). Predictor \u02c6Y is counterfactually fair if under any context\nX = x and A = a,\n\nP ( \u02c6YA\u2190a(U ) = y | X = x, A = a) = P ( \u02c6YA\u2190a\u2032(U ) = y | X = x, A = a),\n\n(1)\n\nfor all y and for any value a\u2032 attainable by A.\nThis notion is closely related to actual causes [13], or token causality in the sense that, to be fair,\nA should not be a cause of \u02c6Y in any individual instance. In other words, changing A while holding\nthings which are not causally dependent on A constant will not change the distribution of \u02c6Y . We also\nemphasize that counterfactual fairness is an individual-level de\ufb01nition. This is substantially different\nfrom comparing different individuals that happen to share the same \u201ctreatment\u201d A = a and coincide\non the values of X, as discussed in Section 4.3.1 of [29] and the Supplementary Material. Differences\nbetween Xa and Xa\u2032 must be caused by variations on A only. Notice also that this de\ufb01nition is\nagnostic with respect to how good a predictor \u02c6Y is, which we discuss in Section 4.\nRelation to individual fairness. IF is agnostic with respect to its notion of similarity metric, which\nis both a strength (generality) and a weakness (no uni\ufb01ed way of de\ufb01ning similarity). Counterfactuals\nand similarities are related, as in the classical notion of distances between \u201cworlds\u201d corresponding to\ndifferent counterfactuals [23]. 
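As an illustration only (the linear model X = C * A + U and the constant C below are hypothetical, chosen so that the abduction step is exact), Definition 5 can be checked mechanically by running the three inferential steps of Section 2.2 under both A = a and A = a':

```python
# Toy check of Definition 5 in a fully deterministic model (illustrative
# only): X = C * A + U, so abduction recovers U exactly from (x, a).
C = 2.0  # assumed structural coefficient of A on X

def abduct_u(x, a):
    return x - C * a              # posterior over U collapses to a point

def x_under(a_cf, u):
    return C * a_cf + u           # action + prediction: X when A <- a_cf

def counterfactually_fair(predictor, x, a, a_cf):
    u = abduct_u(x, a)
    return predictor(x, u) == predictor(x_under(a_cf, u), u)

uses_descendant = lambda x, u: x  # depends on X, a descendant of A
uses_background = lambda x, u: u  # a function of U only
```

A predictor that reads off the descendant X fails the check, while a predictor built only on the background variable passes it, mirroring the discussion of Lemma 1 later in the paper.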
If \u02c6Y is a deterministic function of W \u2282 A \u222a X \u222a U, as in several of our examples to follow, then IF\ncan be de\ufb01ned by treating equally two individuals with the same W in a way that is also\ncounterfactually fair.\n\nFigure 1: (a), (b) Two causal models for different real-world fair prediction scenarios. See Section 3.1\nfor discussion. (c) The graph corresponding to a causal model with A being the protected attribute and\nY some outcome of interest, with background variables assumed to be independent. (d) Expanding\nthe model to include an intermediate variable indicating whether the individual is employed with\ntwo (latent) background variables Prejudiced (if the person offering the job is prejudiced) and\nQuali\ufb01cations (a measure of the individual\u2019s quali\ufb01cations). (e) A twin network representation of\nthis system [28] under two different counterfactual levels for A. This is created by copying nodes\ndescending from A, which inherit unaffected parents from the factual world.\n\nRelation to Pearl et al. [29]. In Example 4.4.4 of [29], the authors condition instead on X, A, and\nthe observed realization of \u02c6Y , and calculate the probability of the counterfactual realization \u02c6YA\u2190a\u2032\ndiffering from the factual. This example con\ufb02ates the predictor \u02c6Y with the outcome Y , of which\nwe remain agnostic in our de\ufb01nition but which is used in the construction of \u02c6Y as in Section 4. Our\nframing makes the connection to machine learning more explicit.\n\n3.1 Examples\n\nTo provide an intuition for counterfactual fairness, we will consider two real-world fair prediction\nscenarios: insurance pricing and crime prediction. Each of these corresponds to one of the two causal\ngraphs in Figure 1(a),(b). The Supplementary Material provides a more mathematical discussion of\nthese examples with more detailed insights.\n\nScenario 1: The Red Car. 
A car insurance company wishes to price insurance for car owners\nby predicting their accident rate Y . They assume there is an unobserved factor corresponding to\naggressive driving U, that (a) causes drivers to be more likely to have an accident, and (b) causes\nindividuals to prefer red cars (the observed variable X). Moreover, individuals belonging to a\ncertain race A are more likely to drive red cars. However, these individuals are no more likely to be\naggressive or to get in accidents than anyone else. We show this in Figure 1(a). Thus, using the\nred car feature X to predict accident rate Y would seem to be an unfair prediction because it may\ncharge individuals of a certain race more than others, even though no race is more likely to have an\naccident. Counterfactual fairness agrees with this notion: changing A while holding U \ufb01xed will also\nchange X and, consequently, \u02c6Y . Interestingly, we can show (Supplementary Material) that in a linear\nmodel, regressing Y on A and X is equivalent to regressing on U, so off-the-shelf regression here is\ncounterfactually fair. Regressing Y on X alone obeys the FTU criterion but is not counterfactually\nfair, so omitting A (FTU) may introduce unfairness into an otherwise fair world.\n\nScenario 2: High Crime Regions. A city government wants to estimate crime rates by neighborhood\nto allocate policing resources. Its analyst constructed training data by merging (1) a registry of\nresidents containing their neighborhood X and race A, with (2) police records of arrests, giving each\nresident a binary label with Y = 1 indicating a criminal arrest record. Due to historically segregated\nhousing, the location X depends on A. Locations X with more police resources have larger numbers\nof arrests Y . 
And \ufb01nally, U represents the totality of socioeconomic factors and policing practices\nthat both in\ufb02uence where an individual may live and how likely they are to be arrested and charged.\nThis can all be seen in Figure 1(b).\nIn this example, higher observed arrest rates in some neighborhoods are due to greater policing there,\nnot because people of different races are any more or less likely to break the law. The label Y = 0\ndoes not mean someone has never committed a crime, but rather that they have not been caught. If\nindividuals in the training data have not already had equal opportunity, algorithms enforcing EO will\nnot remedy such unfairness. In contrast, a counterfactually fair approach would model differential\nenforcement rates using U and base predictions on this information rather than on X directly.\nIn general, we need a multistage procedure in which we \ufb01rst derive latent variables U, and then based\non them we minimize some loss with respect to Y . This is the core of the algorithm discussed next.\n\n3.2 Implications\n\nOne simple but important implication of the de\ufb01nition of counterfactual fairness is the following:\nLemma 1. Let G be the causal graph of the given model (U, V, F ). Then \u02c6Y will be counterfactually\nfair if it is a function of the non-descendants of A.\n\nProof. Let W be any non-descendant of A in G. Then WA\u2190a(U ) and WA\u2190a\u2032(U ) have the same\ndistribution by the three inferential steps in Section 2.2. Hence, the distribution of any function \u02c6Y of\nthe non-descendants of A is invariant with respect to the counterfactual values of A.\n\nThis does not exclude using a descendant W of A as a possible input to \u02c6Y . 
However, this will only\nbe possible in the case where the overall dependence of \u02c6Y on A disappears, which will not happen in\ngeneral. Hence, Lemma 1 provides the most straightforward way to achieve counterfactual fairness.\nIn some scenarios, it is desirable to de\ufb01ne path-speci\ufb01c variations of counterfactual fairness that allow\nfor the inclusion of some descendants of A, as discussed by [21, 27] and the Supplementary Material.\nAncestral closure of protected attributes. Suppose that a parent of a member of A is not in A.\nCounterfactual fairness allows for the use of it in the de\ufb01nition of \u02c6Y . If this seems counterintuitive,\nthen we argue that the fault should be at the postulated set of protected attributes rather than with the\nde\ufb01nition of counterfactual fairness, and that typically we should expect set A to be closed under\nancestral relationships given by the causal graph. For instance, if Race is a protected attribute, and\nMother\u2019s race is a parent of Race, then it should also be in A.\nDealing with historical biases and an existing fairness paradox. The explicit difference between\n\u02c6Y and Y allows us to tackle historical biases. For instance, let Y be an indicator of whether a client\ndefaults on a loan, while \u02c6Y is the actual decision of giving the loan. Consider the DAG A \u2192 Y ,\nshown in Figure 1(c) with the explicit inclusion of set U of independent background variables. Y is\nthe objectively ideal measure for decision making, the binary indicator of the event that the individual\ndefaults on a loan. If A is postulated to be a protected attribute, then the predictor \u02c6Y = Y = fY (A, U )\nis not counterfactually fair, with the arrow A \u2192 Y being (for instance) the result of a world that\npunishes individuals in a way that is out of their control. 
Figure 1(d) shows a \ufb01ner-grained model,\nwhere the path is mediated by a measure of whether the person is employed, which is itself caused\nby two background factors: one representing whether the person hiring is prejudiced, and the other\nthe employee\u2019s quali\ufb01cations. In this world, A is a cause of defaulting, even if mediated by other\nvariables3. The counterfactual fairness principle however forbids us from using Y : using the twin\nnetwork4 of Pearl [28], we see in Figure 1(e) that Ya and Ya\u2032 need not be identically distributed\ngiven the background variables.\nIn contrast, any function of variables not descendants of A can be used as a basis for fair decision\nmaking. This means that any variable \u02c6Y de\ufb01ned by \u02c6Y = g(U ) will be counterfactually fair for any\nfunction g(\u00b7). Hence, given a causal model, the functional de\ufb01ned by the function g(\u00b7) minimizing\nsome predictive error for Y will satisfy the criterion, as proposed in Section 4.1. We are essentially\nlearning a projection of Y into the space of fair decisions, removing historical biases as a by-product.\nCounterfactual fairness also provides an answer to some problems on the incompatibility of fairness\ncriteria. 
3For example, if the function determining employment is fE(A, P, Q) \u2261 I(Q > 0, P = 0 or A \u2260 a), then an individual\nwith suf\ufb01cient quali\ufb01cations and a prejudiced potential employer may have a different counterfactual employment\nvalue for A = a compared to A = a\u2032, and a different chance of default.\n\n4In a nutshell, this is a graph that simultaneously depicts \u201cmultiple worlds\u201d parallel to the factual realizations.\nIn this graph, all multiple worlds share the same background variables, but with different consequences in the\nremaining variables depending on which counterfactual assignments are provided.\n\nIn particular, consider the following problem raised independently by different authors (e.g.,\n[7, 22]), illustrated below for the binary case: ideally, we would like our predictors to obey both\nEquality of Opportunity and the predictive parity criterion de\ufb01ned by satisfying\n\nP (Y = 1 | \u02c6Y = 1, A = 1) = P (Y = 1 | \u02c6Y = 1, A = 0),\n\nas well as the corresponding equation for \u02c6Y = 0. It has been shown that if Y and A are marginally\nassociated (e.g., recidivism and race are associated) and Y is not a deterministic function of \u02c6Y ,\nthen the two criteria cannot be reconciled. Counterfactual fairness sheds light on this scenario,\nsuggesting that both EO and predictive parity may be insuf\ufb01cient if Y and A are associated: assuming\nthat A and Y are unconfounded (as expected for demographic attributes), this is the result of A being\na cause of Y . By counterfactual fairness, we should not want to use Y as a basis for our decisions,\ninstead aiming at some function Y\u22a5A of variables which are not caused by A but are predictive of Y .\n\u02c6Y is de\ufb01ned in such a way that it is an estimate of the \u201cclosest\u201d Y\u22a5A to Y according to some preferred\nrisk function. 
This makes the incompatibility between EO and predictive parity irrelevant, as A and\nY\u22a5A will be independent by construction given the model assumptions.\n\n4 Implementing Counterfactual Fairness\n\nAs discussed in the previous Section, we need to relate \u02c6Y to Y if the predictor is to be useful, and we\nrestrict \u02c6Y to be a (parameterized) function of the non-descendants of A in the causal graph following\nLemma 1. We next introduce an algorithm, then discuss assumptions that can be used to express\ncounterfactuals.\n\n4.1 Algorithm\n\nLet \u02c6Y \u2261 g\u03b8(U, X\u2281A) be a predictor parameterized by \u03b8, such as a logistic regression or a neural\nnetwork, where X\u2281A \u2286 X are the non-descendants of A. Given a loss function l(\u00b7,\u00b7) such as\nsquared loss or log-likelihood, and training data D \u2261 {(A(i), X(i), Y(i))} for i = 1, 2, . . . , n, we\nde\ufb01ne L(\u03b8) \u2261 (1/n) \u2211_{i=1}^{n} E[l(y(i), g\u03b8(U(i), x(i)\u2281A)) | x(i), a(i)] as the empirical loss to be minimized\nwith respect to \u03b8. Each expectation is with respect to random variable U(i) \u223c PM(U | x(i), a(i)),\nwhere PM(U | x, a) is the conditional distribution of the background variables as given by a causal\nmodel M that is available by assumption. If this expectation cannot be calculated analytically,\nMarkov chain Monte Carlo (MCMC) can be used to approximate it as in the following algorithm.\n\n1: procedure FAIRLEARNING(D, M) \u25b7 Learned parameters \u02c6\u03b8\n2:    For each data point i \u2208 D, sample m MCMC samples u(i)_1, . . . , u(i)_m \u223c PM(U | x(i), a(i)).\n3:    Let D\u2032 be the augmented dataset where each point (a(i), x(i), y(i)) in D is replaced with the corresponding m points {(a(i), x(i), y(i), u(i)_j)}.\n4:    \u02c6\u03b8 \u2190 argmin_\u03b8 \u2211_{i\u2032\u2208D\u2032} l(y(i\u2032), g\u03b8(u(i\u2032), x(i\u2032)\u2281A)).\n5: end procedure\n\nAt prediction time, we report \u02dcY \u2261 E[ \u02c6Y (U\u22c6, x\u22c6\u2281A) | x\u22c6, a\u22c6] for a new data point (a\u22c6, x\u22c6).\nDeconvolution perspective. The algorithm can be understood as a deconvolution approach that,\ngiven observables A \u222a X, extracts its latent sources and pipelines them into a predictive model. We\nadvocate that counterfactual assumptions must underlie all approaches that claim to extract the\nsources of variation of the data as \u201cfair\u201d latent components. As an example, Louizos et al. [24] start\nfrom the DAG A \u2192 X \u2190 U to extract P (U | X, A). As U and A are not independent given X in this\nrepresentation, a type of penalization is enforced to create a posterior Pfair(U | A, X) that is close\nto the model posterior P (U | A, X) while satisfying Pfair(U | A = a, X) \u2248 Pfair(U | A = a\u2032, X).\nBut this is neither necessary nor suf\ufb01cient for counterfactual fairness. The model for X given A\nand U must be justi\ufb01ed by a causal mechanism, and that being the case, P (U | A, X) requires no\npostprocessing. As a matter of fact, model M can be learned by penalizing empirical dependence\nmeasures between U and pai for a given Vi (e.g. Mooij et al. 
[26]), but this concerns M and not \u02c6Y ,\nand is motivated by explicit assumptions about structural equations, as described next.\n\n4.2 Designing the Input Causal Model\n\nModel M must be provided to algorithm FAIRLEARNING. Although this is well understood, it is\nworthwhile remembering that causal models always require strong assumptions, even more so when\nmaking counterfactual claims [8]. Counterfactual assumptions such as structural equations are in\ngeneral unfalsi\ufb01able even if interventional data for all variables is available. This is because there\nare in\ufb01nitely many structural equations compatible with the same observable distribution [28], be it\nobservational or interventional. Having passed testable implications, the remaining components of a\ncounterfactual model should be understood as conjectures formulated according to the best of our\nknowledge. Such models should be deemed provisional and prone to modi\ufb01cations if, for example,\nnew data containing measurements of variables previously hidden contradict the current model.\nWe point out that we do not need to specify a fully deterministic model, and structural equations can\nbe relaxed as conditional distributions. In particular, the concept of counterfactual fairness holds\nunder three levels of assumptions of increasing strength:\nLevel 1. Build \u02c6Y using only the observable non-descendants of A. This only requires partial\ncausal ordering and no further causal assumptions, but in many problems there will be few, if any,\nobservables which are not descendants of protected demographic factors.\nLevel 2. Postulate background latent variables that act as non-deterministic causes of observable\nvariables, based on explicit domain knowledge and learning algorithms5. Information about X is\npassed to \u02c6Y via P (U | x, a).\nLevel 3. Postulate a fully deterministic model with latent variables. 
For instance, the distribution\nP (Vi | pai) can be treated as an additive error model, Vi = fi(pai) + ei [31]. The error term ei then\nbecomes an input to \u02c6Y as calculated from the observed variables. This maximizes the information\nextracted by the fair predictor \u02c6Y .\n\n4.3 Further Considerations on Designing the Input Causal Model\n\nOne might ask what we can lose by de\ufb01ning causal fairness measures involving only\nnon-counterfactual causal quantities, such as enforcing P ( \u02c6Y = 1 | do(A = a)) = P ( \u02c6Y = 1 | do(A = a\u2032))\ninstead of our counterfactual criterion. The reason is that the above equation is only a constraint\non an average effect. Obeying this criterion provides no guarantees against, for example, having\nhalf of the individuals being strongly \u201cnegatively\u201d discriminated and half of the individuals strongly\n\u201cpositively\u201d discriminated. We advocate that, for fairness, society should not be satis\ufb01ed in pursuing\nonly counterfactually-free guarantees. While one may be willing to claim posthoc that the equation\nabove masks no balancing effect so that individuals receive approximately the same distribution of\noutcomes, that itself is just a counterfactual claim in disguise. Our approach is to make counterfactual\nassumptions explicit. When unfairness is judged to follow only some \u201cpathways\u201d in the causal graph\n(in a sense that can be made formal, see [21, 27]), nonparametric assumptions about the independence\nof counterfactuals may suf\ufb01ce, as discussed by [27]. In general, nonparametric assumptions may not\nprovide identi\ufb01able adjustments even in this case, as also discussed in our Supplementary Material.\nIf competing models with different untestable assumptions are available, there are ways of\nsimultaneously enforcing a notion of approximate counterfactual fairness in all of them, as introduced by us in\n[32]. 
Other alternatives include exploiting bounds on the contribution of hidden variables [29, 33].\nAnother issue is the interpretation of causal claims involving demographic variables such as race\nand sex. Our view is that such constructs are the result of translating complex events into random\nvariables and, despite some controversy, we consider it counterproductive to claim that e.g. race and sex\ncannot be causes. An idealized intervention on some A at a particular time can be seen as a notational\nshortcut to express a conjunction of more speci\ufb01c interventions, which may be individually doable\nbut jointly impossible in practice. It is the plausibility of complex, even if impossible to practically\nmanipulate, causal chains from A to Y that allows us to claim that unfairness is real [11]. Experiments\nfor constructs exist, such as randomizing names in job applications to make them race-blind. They do\nnot contradict the notion of race as a cause, and can be interpreted as an intervention on a particular\naspect of the construct \u201crace,\u201d such as \u201crace perception\u201d (e.g. Section 4.4.4 of [29]).\n\n5In some domains, it is actually common to build a model entirely around latent constructs with few or no\nobservable parents nor connections among observed variables [2].\n\n5 Illustration: Law School Success\n\nWe illustrate our approach on a practical problem that requires fairness, the prediction of success in\nlaw school. A second problem, understanding the contribution of race to police stops, is described in\nthe Supplementary Material. 
Following closely the usual framework for assessing causal models in the machine learning literature, the goal of this experiment is to quantify how our algorithm behaves with finite sample sizes while assuming a ground truth compatible with a synthetic model.

Problem definition: Law school success
The Law School Admission Council conducted a survey across 163 law schools in the United States [35]. It contains information on 21,790 law students, such as their entrance exam scores (LSAT), their grade-point average (GPA) collected prior to law school, and their first-year average grade (FYA). Given this data, a school may wish to predict whether an applicant will have a high FYA. The school would also like to make sure these predictions are not biased by an individual's race and sex. However, the LSAT, GPA, and FYA scores may be biased due to social factors. We compare our framework with two unfair baselines: 1. Full: the standard technique of using all features, including sensitive features such as race and sex, to make predictions; 2. Unaware: fairness through unawareness, where we do not use race and sex as features. For comparison, we generate predictors Ŷ for all models using logistic regression.

Fair prediction. As described in Section 4.2, there are three ways in which we can model a counterfactually fair predictor of FYA. Level 1 uses any features which are not descendants of race and sex for prediction. Level 2 models latent "fair" variables which are parents of observed variables; these variables are independent of both race and sex. Level 3 models the data using an additive error model and uses the independent error terms to make predictions. These models make increasingly strong assumptions, corresponding to increased predictive power.
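To make the two unfair baselines concrete, here is a minimal sketch using ordinary least squares on synthetic stand-ins for the law-school variables. The column names and coefficients are invented for illustration; the paper fits the real LSAC data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Synthetic stand-ins for the variables (not the LSAC data).
race = rng.integers(0, 2, n).astype(float)  # binary encodings for illustration
sex = rng.integers(0, 2, n).astype(float)
gpa = 0.5 * race + 0.3 * sex + rng.normal(size=n)
lsat = 0.4 * race + 0.2 * sex + rng.normal(size=n)
fya = 0.6 * gpa + 0.5 * lsat + rng.normal(size=n)

def fit_predict(features, y):
    """Least-squares fit with an intercept; returns in-sample predictions."""
    X = np.column_stack([np.ones(len(y))] + list(features))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coef

# Full: every feature, including the protected attributes race and sex.
pred_full = fit_predict([gpa, lsat, race, sex], fya)
# Unaware: drops race and sex but keeps their descendants GPA and LSAT.
pred_unaware = fit_predict([gpa, lsat], fya)

def rmse(pred, y):
    return float(np.sqrt(np.mean((pred - y) ** 2)))
```

Since the Full design nests the Unaware one, its in-sample RMSE can only be lower; neither baseline is counterfactually fair, because GPA and LSAT are themselves descendants of race and sex.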
We split the dataset 80/20 into train/test sets, preserving label balance, to evaluate the models.
As we believe LSAT, GPA, and FYA are all biased by race and sex, we cannot use any observed features to construct a counterfactually fair predictor as described in Level 1.
In Level 2, we postulate that a latent variable, a student's knowledge (K), affects GPA, LSAT, and FYA scores. The causal graph corresponding to this model is shown in Figure 2 (Level 2). This is a short-hand for the distributions:

GPA ∼ N(b_G + w^K_G K + w^R_G R + w^S_G S, σ_G)
LSAT ∼ Poisson(exp(b_L + w^K_L K + w^R_L R + w^S_L S))
FYA ∼ N(w^K_F K + w^R_F R + w^S_F S, 1)
K ∼ N(0, 1)

We perform inference on this model using the observed training set to estimate the posterior distribution of K. We use the probabilistic programming language Stan [34] to learn K. We call the predictor constructed using K "Fair K".

Figure 2: Left: A causal model for the problem of predicting law school success fairly. Right: Density plots of predicted FYA_a and FYA_a′.

In Level 3, we model GPA, LSAT, and FYA as continuous variables with additive error terms independent of race and sex (that may in turn be correlated with one another).
This model is shown in Figure 2 (Level 3), and is expressed by:

GPA = b_G + w^R_G R + w^S_G S + ε_G,  ε_G ∼ p(ε_G)
LSAT = b_L + w^R_L R + w^S_L S + ε_L,  ε_L ∼ p(ε_L)
FYA = b_F + w^R_F R + w^S_F S + ε_F,  ε_F ∼ p(ε_F)

We estimate the error terms ε_G, ε_L by first fitting two models that each use race and sex to individually predict GPA and LSAT. We then compute the residuals of each model (e.g., ε_G = GPA − Ŷ_GPA(R, S)). We use these residual estimates of ε_G, ε_L to predict FYA. We call this "Fair Add".

Table 1: Prediction results using logistic regression. Note that we must sacrifice a small amount of accuracy to ensure counterfactually fair prediction (Fair K, Fair Add), versus the models that use the unfair features GPA, LSAT, race, and sex (Full, Unaware).

       Full    Unaware   Fair K   Fair Add
RMSE   0.873   0.894     0.929    0.918

Accuracy. We compare the RMSE achieved by logistic regression for each of the models on the test set in Table 1. The Full model achieves the lowest RMSE, as it uses race and sex to more accurately reconstruct FYA.
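A minimal sketch of the Fair Add construction on synthetic stand-ins (R and S playing the roles of race and sex; the invented coefficients are for illustration only, not fitted to the LSAC data):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Synthetic stand-ins: R and S play the roles of race and sex.
R = rng.integers(0, 2, n).astype(float)
S = rng.integers(0, 2, n).astype(float)
gpa = 0.5 * R + 0.3 * S + rng.normal(size=n)
lsat = 0.4 * R + 0.2 * S + rng.normal(size=n)
fya = 0.6 * gpa + 0.5 * lsat + rng.normal(size=n)

def residual(y, A):
    """Residual of y after a least-squares fit on protected attributes A."""
    X = np.column_stack([np.ones(len(y)), A])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef

A = np.column_stack([R, S])
eps_G = residual(gpa, A)   # epsilon_G = GPA - Yhat_GPA(R, S)
eps_L = residual(lsat, A)  # epsilon_L = LSAT - Yhat_LSAT(R, S)

# Predict FYA from the residuals only: the "Fair Add" predictor.
Xf = np.column_stack([np.ones(n), eps_G, eps_L])
coef_f, *_ = np.linalg.lstsq(Xf, fya, rcond=None)
fya_hat = Xf @ coef_f
```

The residuals are orthogonal to R and S by construction, so the predictor built on them satisfies counterfactual fairness under the Level 3 additive-error assumptions.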
Note that in this case this model is not fair, even if the data were generated by one of the models shown in Figure 2, as it corresponds to Scenario 3. The (also unfair) Unaware model still uses the unfair variables GPA and LSAT, but because it does not use race and sex it cannot match the RMSE of the Full model. As our models satisfy counterfactual fairness, they trade off some accuracy. Our first model, Fair K, uses weaker assumptions and thus its RMSE is highest. Using the Level 3 assumptions, as in Fair Add, we produce a counterfactually fair model that trades slightly stronger assumptions for lower RMSE.

Counterfactual fairness. We would like to empirically test whether the baseline methods are counterfactually fair. To do so, we will assume the true model of the world is given by Figure 2 (Level 2). We can fit the parameters of this model using the observed data and evaluate counterfactual fairness by sampling from it. Specifically, we will generate samples from the model given either the observed race and sex or counterfactual race and sex variables. We will fit models to both the original and counterfactual sampled data and plot how the distribution of predicted FYA changes for both baseline models. Figure 2 shows this, where each row corresponds to a baseline predictor and each column corresponds to the counterfactual change. In each plot, the blue distribution is the density of predicted FYA for the original data and the red distribution is this density for the counterfactual data. If a model is counterfactually fair, we would expect these distributions to lie exactly on top of each other. Instead, we note that the Full model exhibits counterfactual unfairness for all counterfactuals except sex. We see a similar trend for the Unaware model, although it is closer to being counterfactually fair. To see why these models seem to be fair w.r.t. sex, we can look at the weights of the DAG which generates the counterfactual data.
Specifically, the DAG weights from (male, female) to GPA are (0.93, 1.06) and from (male, female) to LSAT are (1.1, 1.1). Thus, these models are fair w.r.t. sex simply because of a very weak causal link between sex and GPA/LSAT.

6 Conclusion

We have presented a new model of fairness we refer to as counterfactual fairness. It allows us to propose algorithms that, rather than simply ignoring protected attributes, are able to take into account the different social biases that may arise towards individuals based on ethically sensitive attributes and compensate for these biases effectively. We experimentally contrasted our approach with previous fairness approaches and showed that our explicit causal models capture these social biases and make clear the implicit trade-off between prediction accuracy and fairness in an unfair world. We propose that fairness should be regulated by explicitly modeling the causal structure of the world. Criteria based purely on probabilistic independence cannot satisfy this and are unable to address how unfairness is occurring in the task at hand. By providing such causal tools for addressing fairness questions we hope we can provide practitioners with customized techniques for solving a wide array of fairness modeling problems.

Acknowledgments

This work was supported by the Alan Turing Institute under the EPSRC grant EP/N510129/1. CR acknowledges additional support under the EPSRC Platform Grant EP/P022529/1. We thank Adrian Weller for insightful feedback, and the anonymous reviewers for helpful comments.

References

[1] Berk, R., Heidari, H., Jabbari, S., Kearns, M., and Roth, A. Fairness in criminal justice risk assessments: The state of the art. arXiv:1703.09207v1, 2017.

[2] Bollen, K. Structural Equations with Latent Variables. John Wiley & Sons, 1989.

[3] Bollen, K. and Long, J. S. (eds.). Testing Structural Equation Models.
SAGE Publications, 1993.

[4] Bolukbasi, Tolga, Chang, Kai-Wei, Zou, James Y, Saligrama, Venkatesh, and Kalai, Adam T. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In Advances in Neural Information Processing Systems, pp. 4349–4357, 2016.

[5] Brennan, Tim, Dieterich, William, and Ehret, Beate. Evaluating the predictive validity of the COMPAS risk and needs assessment system. Criminal Justice and Behavior, 36(1):21–40, 2009.

[6] Calders, Toon and Verwer, Sicco. Three naive Bayes approaches for discrimination-free classification. Data Mining and Knowledge Discovery, 21(2):277–292, 2010.

[7] Chouldechova, A. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data, 2:153–163, 2017.

[8] Dawid, A. P. Causal inference without counterfactuals. Journal of the American Statistical Association, pp. 407–448, 2000.

[9] DeDeo, Simon. Wrong side of the tracks: Big data and protected categories. arXiv preprint arXiv:1412.4643, 2014.

[10] Dwork, Cynthia, Hardt, Moritz, Pitassi, Toniann, Reingold, Omer, and Zemel, Richard. Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, pp. 214–226. ACM, 2012.

[11] Glymour, C. and Glymour, M. R. Commentary: Race and sex are causes. Epidemiology, 25(4):488–490, 2014.

[12] Grgic-Hlaca, Nina, Zafar, Muhammad Bilal, Gummadi, Krishna P, and Weller, Adrian. The case for process fairness in learning: Feature selection for fair decision making. NIPS Symposium on Machine Learning and the Law, 2016.

[13] Halpern, J. Actual Causality. MIT Press, 2016.

[14] Hardt, Moritz, Price, Eric, Srebro, Nati, et al. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems, pp. 3315–3323, 2016.
[15] Johnson, Kory D, Foster, Dean P, and Stine, Robert A. Impartial predictive modeling: Ensuring fairness in arbitrary models. arXiv preprint arXiv:1608.00528, 2016.

[16] Joseph, Matthew, Kearns, Michael, Morgenstern, Jamie, Neel, Seth, and Roth, Aaron. Rawlsian fairness for machine learning. arXiv preprint arXiv:1610.09559, 2016.

[17] Kamiran, Faisal and Calders, Toon. Classifying without discriminating. In 2nd International Conference on Computer, Control and Communication (IC4 2009), pp. 1–6. IEEE, 2009.

[18] Kamiran, Faisal and Calders, Toon. Data preprocessing techniques for classification without discrimination. Knowledge and Information Systems, 33(1):1–33, 2012.

[19] Kamishima, Toshihiro, Akaho, Shotaro, and Sakuma, Jun. Fairness-aware learning through regularization approach. In 2011 IEEE 11th International Conference on Data Mining Workshops (ICDMW), pp. 643–650. IEEE, 2011.

[20] Khandani, Amir E, Kim, Adlar J, and Lo, Andrew W. Consumer credit-risk models via machine-learning algorithms. Journal of Banking & Finance, 34(11):2767–2787, 2010.

[21] Kilbertus, N., Carulla, M. R., Parascandolo, G., Hardt, M., Janzing, D., and Schölkopf, B. Avoiding discrimination through causal reasoning. Advances in Neural Information Processing Systems 30, 2017.

[22] Kleinberg, J., Mullainathan, S., and Raghavan, M. Inherent trade-offs in the fair determination of risk scores. Proceedings of the 8th Innovations in Theoretical Computer Science Conference (ITCS 2017), 2017.

[23] Lewis, D. Counterfactuals. Harvard University Press, 1973.

[24] Louizos, Christos, Swersky, Kevin, Li, Yujia, Welling, Max, and Zemel, Richard. The variational fair autoencoder. arXiv preprint arXiv:1511.00830, 2015.

[25] Mahoney, John F and Mohen, James M. Method and system for loan origination and underwriting, October 23, 2007. US Patent 7,287,008.
[26] Mooij, J., Janzing, D., Peters, J., and Schölkopf, B. Regression by dependence minimization and its application to causal inference in additive noise models. In Proceedings of the 26th Annual International Conference on Machine Learning, pp. 745–752, 2009.

[27] Nabi, R. and Shpitser, I. Fair inference on outcomes. arXiv:1705.10378v1, 2017.

[28] Pearl, J. Causality: Models, Reasoning and Inference. Cambridge University Press, 2000.

[29] Pearl, J., Glymour, M., and Jewell, N. Causal Inference in Statistics: A Primer. Wiley, 2016.

[30] Pearl, Judea. Causal inference in statistics: An overview. Statistics Surveys, 3:96–146, 2009.

[31] Peters, J., Mooij, J. M., Janzing, D., and Schölkopf, B. Causal discovery with continuous additive noise models. Journal of Machine Learning Research, 15:2009–2053, 2014. URL http://jmlr.org/papers/v15/peters14a.html.

[32] Russell, C., Kusner, M., Loftus, J., and Silva, R. When worlds collide: Integrating different counterfactual assumptions in fairness. Advances in Neural Information Processing Systems, 31, 2017.

[33] Silva, R. and Evans, R. Causal inference through a witness protection program. Journal of Machine Learning Research, 17(56):1–53, 2016.

[34] Stan Development Team. RStan: the R interface to Stan, 2016. R package version 2.14.1.

[35] Wightman, Linda F. LSAC National Longitudinal Bar Passage Study. LSAC Research Report Series, 1998.

[36] Zafar, Muhammad Bilal, Valera, Isabel, Rodriguez, Manuel Gomez, and Gummadi, Krishna P. Learning fair classifiers. arXiv preprint arXiv:1507.05259, 2015.

[37] Zafar, Muhammad Bilal, Valera, Isabel, Rodriguez, Manuel Gomez, and Gummadi, Krishna P. Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. arXiv preprint arXiv:1610.08452, 2016.
[38] Zemel, Richard S, Wu, Yu, Swersky, Kevin, Pitassi, Toniann, and Dwork, Cynthia. Learning fair representations. ICML (3), 28:325–333, 2013.

[39] Zliobaite, Indre. A survey on measuring indirect discrimination in machine learning. arXiv preprint arXiv:1511.00148, 2015.