{"title": "Inherent Tradeoffs in Learning Fair Representations", "book": "Advances in Neural Information Processing Systems", "page_first": 15675, "page_last": 15685, "abstract": "With the prevalence of machine learning in high-stakes applications, especially the ones regulated by anti-discrimination laws or societal norms, it is crucial to ensure that the predictive models do not propagate any existing bias or discrimination. Due to the ability of deep neural nets to learn rich representations, recent advances in algorithmic fairness have focused on learning fair representations with adversarial techniques to reduce bias in data while preserving utility simultaneously. In this paper, through the lens of information theory, we provide the first result that quantitatively characterizes the tradeoff between demographic parity and the joint utility across different population groups. Specifically, when the base rates differ between groups, we show that any method aiming to learn fair representations admits an information-theoretic lower bound on the joint error across these groups. To complement our negative results, we also prove that if the optimal decision functions across different groups are close, then learning fair representations leads to an alternative notion of fairness, known as the accuracy parity, which states that the error rates are close between groups. Finally, our theoretical findings are also confirmed empirically on real-world datasets.", "full_text": "Inherent Tradeoffs in Learning Fair Representations\n\nHan Zhao\u2217\n\nMachine Learning Department\nSchool of Computer Science\nCarnegie Mellon University\nhan.zhao@cs.cmu.edu\n\nGeoffrey J. 
Gordon\n\nMicrosoft Research, Montreal\nMachine Learning Department\nCarnegie Mellon University\n\ngeoff.gordon@microsoft.com\n\nAbstract\n\nWith the prevalence of machine learning in high-stakes applications, especially the\nones regulated by anti-discrimination laws or societal norms, it is crucial to ensure\nthat the predictive models do not propagate any existing bias or discrimination. Due\nto the ability of deep neural nets to learn rich representations, recent advances in\nalgorithmic fairness have focused on learning fair representations with adversarial\ntechniques to reduce bias in data while preserving utility simultaneously. In this\npaper, through the lens of information theory, we provide the \ufb01rst result that\nquantitatively characterizes the tradeoff between demographic parity and the joint\nutility across different population groups. Speci\ufb01cally, when the base rates differ\nbetween groups, we show that any method aiming to learn fair representations\nadmits an information-theoretic lower bound on the joint error across these groups.\nTo complement our negative results, we also prove that if the optimal decision\nfunctions across different groups are close, then learning fair representations leads\nto an alternative notion of fairness, known as the accuracy parity, which states that\nthe error rates are close between groups. 
Finally, our theoretical findings are also confirmed empirically on real-world datasets.

1 Introduction

With the prevalence of machine learning applications in high-stakes domains, e.g., criminal judgement, medical testing, online advertising, etc., it is crucial to ensure that automated decision making systems do not propagate existing bias or discrimination that might exist in historical data [3, 4, 28]. Among many recent proposals for achieving different notions of algorithmic fairness [10, 14, 31-33], learning fair representations has received increasing attention due to recent advances in learning rich representations with deep neural networks [5, 11, 24, 26, 30, 34]. In fact, a line of work has proposed to learn group-invariant representations with adversarial learning techniques in order to achieve statistical parity, also known as demographic parity in the literature. This line of work dates at least back to Zemel et al. [33], where the authors proposed to learn predictive models that are independent of the group membership attribute. At a high level, the underlying idea is that if representations of instances from different groups are similar to each other, then any predictive model on top of them will necessarily make decisions independent of group membership.

On the other hand, it has long been observed that there is an underlying tradeoff between utility and demographic parity:

    "All methods have in common that to some extent accuracy must be traded-off for lowering the dependency." [6]

In particular, it is easy to see that in an extreme case where the group membership coincides with the target task, a call for exact demographic parity will inevitably remove the perfect predictor [14].

*Part of this work was done when Han Zhao was visiting Microsoft Research, Montreal.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

Empirically, it has also been observed that a tradeoff exists between accuracy and fairness in binary classification [38]. Clearly, methods based on learning fair representations are also bound by such an inherent tradeoff between utility and fairness. But how exactly does the fairness constraint trade off against utility? Will learning fair representations help to achieve other notions of fairness besides demographic parity? If yes, what is the fundamental limit on the utility that we can hope to achieve under such a constraint?

To answer the above questions, through the lens of information theory, in this paper we provide the first result that quantitatively characterizes the tradeoff between demographic parity and the joint utility across different population groups. Specifically, when the base rates differ between groups, we provide a tight information-theoretic lower bound on the joint error across these groups.
Our lower bound is algorithm-independent, so it holds for all methods aiming to learn fair representations. When only approximate demographic parity is achieved, we also present a family of lower bounds to quantify the tradeoff in utility introduced by such an approximate constraint. As a side contribution, our proof technique is simple but general, and we expect it to have broader applications in other learning problems using adversarial techniques, e.g., unsupervised domain adaptation [12, 36], privacy-preservation under attribute inference attacks [13, 35] and multilingual machine translation [16].

To complement our negative results, we show that if the optimal decision functions across different groups are close, then learning fair representations helps to achieve an alternative notion of fairness, i.e., accuracy parity, which states that the error rates are close between groups. Empirically, we conduct experiments on a real-world dataset that corroborate both our positive and negative results. We believe our theoretical insights contribute to a better understanding of the tradeoff between utility and different notions of fairness, and they are also helpful in guiding the future design of representation learning algorithms to achieve algorithmic fairness.

2 Preliminary

We first introduce the notation used throughout the paper and formally describe the problem setup. We then briefly discuss some information-theoretic concepts that will be used in our analysis.

Notation We use X ⊆ R^d and Y = {0, 1} to denote the input and output space. Accordingly, we use X and Y to denote the random variables which take values in X and Y, respectively. Lower case letters x and y are used to denote the instantiation of X and Y. To simplify the presentation, we use A ∈ {0, 1} as the sensitive attribute, e.g., race, gender, etc.² Let H be the hypothesis class of classifiers.
In other words, for h ∈ H, h : X → Y is the predictor that outputs a prediction. Note that even if the predictor does not explicitly take the sensitive attribute A as input, this fairness through blindness mechanism can still be biased due to the potential correlations between X and A. In this work we study the stochastic setting where there is a joint distribution D over X, Y and A from which the data are sampled. To keep the notation consistent, for a ∈ {0, 1}, we use Da to mean the conditional distribution of D given A = a. For an event E, D(E) denotes the probability of E under D. In particular, in the literature of fair machine learning, we call D(Y = 1) the base rate of distribution D and we use ∆BR(D, D′) := |D(Y = 1) − D′(Y = 1)| to denote the difference of the base rates between two distributions D and D′ over the same sample space. Given a feature transformation function g : X → Z that maps instances from the input space X to feature space Z, we define g♯D := D ◦ g⁻¹ to be the induced (pushforward) distribution of D under g, i.e., for any event E′ ⊆ Z, g♯D(E′) := D(g⁻¹(E′)) = D({x ∈ X | g(x) ∈ E′}).

Problem Setup Given a joint distribution D, the error of a predictor h under D is defined as ErrD(h) := ED[|Y − h(X)|]. Note that for binary classification problems, when h(X) ∈ {0, 1}, ErrD(h) reduces to the true error rate of binary classification. To make the notation more compact, we may drop the subscript D when it is clear from the context. In this work we focus on group fairness where the group membership is given by the sensitive attribute A. Even in this context there are many possible definitions of fairness [27], and in what follows we provide a brief review of the ones that are most relevant to this work.

Definition 2.1 (Demographic Parity). Given a joint distribution D, a classifier Ŷ satisfies demographic parity if Ŷ is independent of A.

²Our main results could also be straightforwardly extended to the setting where A is a categorical variable.

Demographic parity reduces to the requirement that D0(Ŷ = 1) = D1(Ŷ = 1), i.e., the positive outcome is given to the two groups at the same rate. When exact equality does not hold, we use the absolute difference between them as an approximate measure:

Definition 2.2 (DP Gap). Given a joint distribution D, the demographic parity gap of a classifier Ŷ is ∆DP(Ŷ) := |D0(Ŷ = 1) − D1(Ŷ = 1)|.

Demographic parity is also known as statistical parity, and it has been adopted as the definition of fairness in a series of work [6, 11, 15, 17, 18, 24, 26, 33]. However, as we shall quantify precisely in Section 3, demographic parity may cripple the utility that we hope to achieve, especially in the common scenario where the base rates differ between the two groups, e.g., D0(Y = 1) ≠ D1(Y = 1) [14]. In light of this, an alternative definition is accuracy parity:

Definition 2.3 (Accuracy Parity). Given a joint distribution D, a classifier h satisfies accuracy parity if ErrD0(h) = ErrD1(h).

In the literature, a violation of accuracy parity is also known as disparate mistreatment [32]. Again, when h is a deterministic binary classifier, accuracy parity reduces to D0(h(X) = Y) = D1(h(X) = Y). Different from demographic parity, the definition of accuracy parity does not eliminate the perfect predictor when Y = A and the base rates differ between the two groups. When the costs of different error types matter, more refined definitions exist:

Definition 2.4 (Positive Rate Parity). Given a joint distribution D, a deterministic classifier h satisfies positive rate parity if D0(h(X) = 1 | Y = y) = D1(h(X) = 1 | Y = y), ∀y ∈ {0, 1}.

Positive rate parity is also known as equalized odds [14], which essentially requires equal true positive and false positive rates between different groups. Furthermore, Hardt et al. [14] also defined true positive parity, or equal opportunity, to be D0(h(X) = 1 | Y = 1) = D1(h(X) = 1 | Y = 1), for settings where the positive outcome is desirable. Last but not least, predictive rate parity, also known as test fairness [7], asks for an equal chance of positive outcomes across groups given the prediction:

Definition 2.5 (Predictive Rate Parity). Given a joint distribution D, a probabilistic classifier h satisfies predictive rate parity if D0(Y = 1 | h(X) = c) = D1(Y = 1 | h(X) = c), ∀c ∈ [0, 1].

When h is a deterministic binary classifier that only takes values in {0, 1}, Chouldechova [7] showed an intrinsic tradeoff between predictive rate parity and positive rate parity:

Theorem 2.1 (Chouldechova [7]). Assume D0(Y = 1) ≠ D1(Y = 1). Then for any deterministic classifier h : X → {0, 1} that is not perfect, i.e., h(X) ≠ Y, positive rate parity and predictive rate parity cannot hold simultaneously.

A similar tradeoff result for probabilistic classifiers has also been observed by Kleinberg et al. [21], where the authors showed that for any non-perfect predictor, calibration and positive rate parity cannot be achieved simultaneously if the base rates differ across groups. Here a classifier h is said to be calibrated if D(Y = 1 | h(X) = c) = c, ∀c ∈ [0, 1], i.e., if we look at the set of data that receive a predicted probability of c by h, we would like a c-fraction of them to be positive instances according to Y [29].

f-divergence Introduced by Ali and Silvey [2] and Csiszár [8, 9], f-divergence, also known as the Ali-Silvey distance, is a general class of statistical divergences that measure the difference between two probability distributions P and Q over the same measurable space.

Definition 2.6 (f-divergence). Let P and Q be two probability distributions over the same space and assume P is absolutely continuous w.r.t. Q (P ≪ Q). Then for any convex function f : (0, ∞) → R that is strictly convex at 1 with f(1) = 0, the f-divergence of Q from P is defined as

    Df(P || Q) := EQ[f(dP/dQ)].    (1)

The function f is called the generator function of Df(· || ·).

Different choices of the generator function f recover popular statistical divergences as special cases, e.g., the KL-divergence. From Jensen's inequality it is easy to verify that Df(P || Q) ≥ 0 and Df(P || Q) = 0 iff P = Q almost surely. Note that f-divergence does not necessarily lead to a distance metric, and it is not symmetric in general, i.e., Df(P || Q) ≠ Df(Q || P) provided that P ≪ Q and Q ≪ P. We list some common choices of the generator function f and their corresponding properties in Table 1. Notably, Khosravifard et al. [20] proved that total variation is the only f-divergence that serves as a metric, i.e., satisfies the triangle inequality.

Table 1: List of different f-divergences and their corresponding properties. DKL(P || Q) denotes the KL-divergence of Q from P and M := (P + Q)/2 is the average distribution of P and Q. Symm. stands for Symmetric and Tri.
stands for Triangle Inequality.

    Name               Df(P || Q)                                     Generator f(t)                     Symm.  Tri.
    Kullback-Leibler   DKL(P || Q)                                    t log t                            ✗      ✗
    Reverse-KL         DKL(Q || P)                                    − log t                            ✗      ✗
    Jensen-Shannon     DJS(P, Q) := (DKL(P || M) + DKL(Q || M))/2     t log t − (t + 1) log((t + 1)/2)   ✓      ✗
    Squared Hellinger  H²(P, Q) := (1/2) ∫ (√dP − √dQ)²               (1 − √t)²/2                        ✓      ✗
    Total Variation    dTV(P, Q) := sup_E |P(E) − Q(E)|               |t − 1|/2                          ✓      ✓

3 Main Results

As we briefly mentioned in Section 2, it is impossible to have an imperfect predictor that is both calibrated and satisfies positive rate parity when the base rates differ between the two groups. A similar impossibility result also holds between positive rate parity and predictive rate parity. On the other hand, while it has long been observed that demographic parity may eliminate the perfect predictor [14], and previous work has empirically verified that a tradeoff exists between accuracy and demographic parity [6, 17, 38] on various datasets, so far a quantitative characterization of the exact tradeoff between accuracy and various notions of parity is still missing. In what follows we shall prove a family of information-theoretic lower bounds on the accuracy that hold for all algorithms.

3.1 Tradeoff between Fairness and Utility

Essentially, every prediction function induces a Markov chain X −g→ Z −h→ Ŷ, where g is the feature transformation, h is the classifier on the feature space, Z is the feature, and Ŷ is the predicted target variable given by h ◦ g. Note that simple models, e.g., linear classifiers, are also included by specifying g to be the identity map. With this notation, we first state the following theorem that quantifies an inherent tradeoff between fairness and utility.

Theorem 3.1. Let Ŷ = h(g(X)) be the predictor. If Ŷ satisfies demographic parity, then ErrD0(h ◦ g) + ErrD1(h ◦ g) ≥ ∆BR(D0, D1).

Remark First of all, ∆BR(D0, D1) is the difference of base rates across groups, and it achieves its maximum value of 1 iff Y = A almost surely, i.e., Y indicates group membership. On the other hand, if Y is independent of A, then ∆BR(D0, D1) = 0, so the lower bound places no constraint on the joint error. Second, Theorem 3.1 applies to every possible feature transformation g and predictor h. In particular, if we choose g to be the identity map, then Theorem 3.1 says that when the base rates differ, no algorithm can achieve a small joint error on both groups, and it also recovers the previous observation that demographic parity can eliminate the perfect predictor [14]. Third, the lower bound in Theorem 3.1 is insensitive to the marginal distribution of A, i.e., it treats the errors from both groups equally. As a comparison, let α := D(A = 1); then ErrD(h ◦ g) = (1 − α)ErrD0(h ◦ g) + αErrD1(h ◦ g). In this case ErrD(h ◦ g) could still be small even if the minority group suffers a large error.

Furthermore, by the pigeonhole principle, the following corollary holds:

Corollary 3.1. If the predictor Ŷ = h(g(X)) satisfies demographic parity, then max{ErrD0(h ◦ g), ErrD1(h ◦ g)} ≥ ∆BR(D0, D1)/2.

In words, this means that for fair predictors in the demographic parity sense, at least one of the subgroups has to incur an error of at least ∆BR(D0, D1)/2, which could be large in settings like criminal justice where ∆BR(D0, D1) is large.

Before we give the proof, we first present a useful lemma that lower bounds the prediction error by the total variation distance.

Lemma 3.1. Let Ŷ = h(g(X)) be the predictor. Then for a ∈ {0, 1}, dTV(Da(Y), Da(Ŷ)) ≤ ErrDa(h ◦ g).

Proof. For a ∈ {0, 1}, we have:

    dTV(Da(Y), Da(Ŷ)) = |Da(Y = 1) − Da(h(g(X)) = 1)| = |EDa[Y] − EDa[h(g(X))]|
                       ≤ EDa[|Y − h(g(X))|] = ErrDa(h ◦ g).  □

Now we are ready to prove Theorem 3.1:

Proof of Theorem 3.1. First of all, we show that if Ŷ = h(g(X)) satisfies demographic parity, then:

    dTV(D0(Ŷ), D1(Ŷ)) = max{|D0(Ŷ = 0) − D1(Ŷ = 0)|, |D0(Ŷ = 1) − D1(Ŷ = 1)|}
                       = |D0(Ŷ = 1) − D1(Ŷ = 1)|
                       = |D(Ŷ = 1 | A = 0) − D(Ŷ = 1 | A = 1)| = 0,

where the last equality follows from the definition of demographic parity. Now, since dTV(·, ·) is symmetric and satisfies the triangle inequality (Table 1), we have:

    dTV(D0(Y), D1(Y)) ≤ dTV(D0(Y), D0(Ŷ)) + dTV(D0(Ŷ), D1(Ŷ)) + dTV(D1(Ŷ), D1(Y))
                      = dTV(D0(Y), D0(Ŷ)) + dTV(D1(Ŷ), D1(Y)).    (2)

The last step is to bound dTV(Da(Y), Da(Ŷ)) in terms of ErrDa(h ◦ g) for a ∈ {0, 1} using Lemma 3.1:

    dTV(D0(Y), D0(Ŷ)) ≤ ErrD0(h ◦ g),    dTV(D1(Y), D1(Ŷ)) ≤ ErrD1(h ◦ g).

Combining the above two inequalities with (2) completes the proof.  □

It is not hard to show that our lower bound in Theorem 3.1 is tight. To see this, consider the case A = Y, where the lower bound achieves its maximum value of 1. Now consider a constant predictor Ŷ ≡ 1 or Ŷ ≡ 0, which clearly satisfies demographic parity by definition. But in this case either ErrD0(h ◦ g) = 1, ErrD1(h ◦ g) = 0 or ErrD0(h ◦ g) = 0, ErrD1(h ◦ g) = 1, hence ErrD0(h ◦ g) + ErrD1(h ◦ g) ≡ 1, achieving the lower bound.

To conclude this section, we point out that the choice of total variation in the lower bound is not unique. As we will see shortly in Section 3.2, similar lower bounds can be attained using specific choices of the general f-divergence with some desired properties.

3.2 Tradeoff in Fair Representation Learning

In the last section we showed that there is an inherent tradeoff between fairness and utility when a predictor exactly satisfies demographic parity. In practice we may not be able to achieve demographic parity exactly. Instead, most algorithms [1, 5, 11, 24] build an adversarial discriminator that takes as input the feature vector Z = g(X), and the goal is to learn fair representations such that it is hard for the adversarial discriminator to infer the group membership from Z. In this sense, due to the limited capacity of the adversarial discriminator, only approximate demographic parity can be achieved in the equilibrium. Hence it is natural to ask: what is the tradeoff between fair representations and accuracy in this scenario? In this section we answer this question by generalizing our previous analysis with f-divergences to prove a family of lower bounds on the joint target prediction error. Our results also show how approximate DP helps to reconcile, but not remove, the tradeoff between fairness and utility. Before we state and prove the main results of this section, we first introduce the following lemma by Liese and Vajda [22], a generalization of the data processing inequality to f-divergence:

Lemma 3.2 (Liese and Vajda [22]).
Let µ(Z) be the space of all probability distributions over Z. Then for any f-divergence Df(· || ·), any stochastic kernel κ : X → µ(Z), and any distributions P and Q over X, Df(κP || κQ) ≤ Df(P || Q).

Roughly speaking, Lemma 3.2 says that data processing cannot increase discriminating information. Define dJS(P, Q) := √DJS(P, Q) and H(P, Q) := √H²(P, Q). It is well-known in information theory that both dJS(·, ·) and H(·, ·) form a bounded distance metric over the space of probability distributions. Realize that dTV(·, ·), H²(·, ·) and DJS(·, ·) are all f-divergences. The following corollary holds:

Corollary 3.2. Let h : Z → Y be any hypothesis, and let g♯Da be the pushforward distribution of Da by g, ∀a ∈ {0, 1}. Let Ŷ = h(g(X)) be the predictor. Then all of the following inequalities hold:

    1. dTV(D0(Ŷ), D1(Ŷ)) ≤ dTV(g♯D0, g♯D1)
    2. H(D0(Ŷ), D1(Ŷ)) ≤ H(g♯D0, g♯D1)
    3. dJS(D0(Ŷ), D1(Ŷ)) ≤ dJS(g♯D0, g♯D1)

Now we are ready to present the main theorem of this section:

Theorem 3.2. Let Ŷ = h(g(X)) be the predictor. Assume dJS(g♯D0, g♯D1) ≤ dJS(D0(Y), D1(Y)) and H(g♯D0, g♯D1) ≤ H(D0(Y), D1(Y)). Then the following three inequalities hold:

    1. Total variation lower bound:
       ErrD0(h ◦ g) + ErrD1(h ◦ g) ≥ dTV(D0(Y), D1(Y)) − dTV(g♯D0, g♯D1).

    2. Jensen-Shannon lower bound:
       ErrD0(h ◦ g) + ErrD1(h ◦ g) ≥ (dJS(D0(Y), D1(Y)) − dJS(g♯D0, g♯D1))² / 2.

    3. Hellinger lower bound:
       ErrD0(h ◦ g) + ErrD1(h ◦ g) ≥ (H(D0(Y), D1(Y)) − H(g♯D0, g♯D1))² / 2.

Remark All three lower bounds in Theorem 3.2 imply a tradeoff between the joint error across demographic subgroups and learning group-invariant feature representations. In a nutshell:

    For fair representations, it is not possible to construct a predictor that simultaneously minimizes the errors on both demographic subgroups.

When g♯D0 = g♯D1, which also implies D0(Ŷ) = D1(Ŷ), all three lower bounds attain their largest values; in this case we have

    max{dTV(D0(Y), D1(Y)), d²JS(D0(Y), D1(Y))/2, H²(D0(Y), D1(Y))/2} = dTV(D0(Y), D1(Y)) = ∆BR(D0, D1),

and this reduces to Theorem 3.1. Now we give a sketch of the proof of Theorem 3.2:

Proof Sketch of Theorem 3.2. We prove the three inequalities in turn. The total variation lower bound follows the same idea as the proof of Theorem 3.1, together with the inequality dTV(D0(Ŷ), D1(Ŷ)) ≤ dTV(g♯D0, g♯D1) from Corollary 3.2. To prove the Jensen-Shannon lower bound, realize that dJS(·, ·) is a distance metric over probability distributions. Combining this with the inequality dJS(D0(Ŷ), D1(Ŷ)) ≤ dJS(g♯D0, g♯D1) from Corollary 3.2, we have:

    dJS(D0(Y), D1(Y)) ≤ dJS(D0(Y), D0(Ŷ)) + dJS(g♯D0, g♯D1) + dJS(D1(Ŷ), D1(Y)).

Now by Lin's lemma [23, Theorem 3], for any two distributions P and Q we have d²JS(P, Q) ≤ dTV(P, Q). Combining Lin's lemma with Lemma 3.1, we get the following lower bound:

    √ErrD0(h ◦ g) + √ErrD1(h ◦ g) ≥ dJS(D0(Y), D1(Y)) − dJS(g♯D0, g♯D1).

Applying the AM-GM inequality, we can further bound the L.H.S. by

    √(2(ErrD0(h ◦ g) + ErrD1(h ◦ g))) ≥ √ErrD0(h ◦ g) + √ErrD1(h ◦ g).

Under the assumption that dJS(g♯D0, g♯D1) ≤ dJS(D0(Y), D1(Y)), squaring both sides then completes the proof of the second inequality. The proof of the Hellinger lower bound follows exactly as the one for the Jensen-Shannon lower bound, except that instead of Lin's lemma we use the fact that H²(P, Q) ≤ dTV(P, Q) ≤ √2 H(P, Q), ∀P, Q.  □

As a simple corollary of Theorem 3.2, the following result shows how approximate DP (in terms of the DP gap) helps to reconcile the tradeoff between fairness and utility:

Corollary 3.3. Let Ŷ = h(g(X)) be the predictor. Then ErrD0(h ◦ g) + ErrD1(h ◦ g) ≥ ∆BR(D0, D1) − ∆DP(Ŷ).

In a sense, Corollary 3.3 means that in order to lower the joint error, the DP gap of the predictor cannot be too small. Of course, since the above inequality is a lower bound, it only serves as a necessary condition for a small joint error. Hence an interesting question is whether there exists a sufficient condition that guarantees a small joint error while keeping the DP gap of the predictor no larger than that of the perfect predictor, i.e., ∆BR(D0, D1). We leave this as future work.

3.3 Fair Representations Lead to Accuracy Parity

In the previous sections we proved a family of information-theoretic lower bounds that demonstrate an inherent tradeoff between fair representations and the joint error across groups. A natural question to ask, then, is: what kind of parity can fair representations bring us?
To complement our negative results, in this section we show that learning group-invariant representations helps to reduce the discrepancy of errors (utilities) across groups.

First of all, since we work in the stochastic setting where Da is a joint distribution over X and Y conditioned on A = a, any function h : X → Y will inevitably incur an error due to the noise that exists in the distribution Da. Formally, for a ∈ {0, 1}, define the optimal function h*a : X → Y under the absolute error to be h*a(X) := mDa(Y | X), where mDa(Y | X) denotes the median of Y given X under the distribution Da. Now define the noise of the distribution Da to be nDa := EDa[|Y − h*a(X)|]. With these notations, we are ready to present the following theorem:

Theorem 3.3. For any hypothesis H ∋ h : X → Y, the following inequality holds:

    |ErrD0(h) − ErrD1(h)| ≤ nD0 + nD1 + dTV(D0(X), D1(X)) + min{ED0[|h*0 − h*1|], ED1[|h*0 − h*1|]}.

Remark Theorem 3.3 upper bounds the discrepancy of accuracy across groups by three terms: the noise, the distance between the representations across groups, and the discrepancy between the optimal decision functions. In an ideal setting where both distributions are noiseless, i.e., the same people in the same group are always treated equally, the upper bound simplifies to the latter two terms:

    |ErrD0(h) − ErrD1(h)| ≤ dTV(D0(X), D1(X)) + min{ED0[|h*0 − h*1|], ED1[|h*0 − h*1|]}.

If we further require that the optimal decision functions h*0 and h*1 are close to each other, i.e., optimal decisions are insensitive to the group membership, then Theorem 3.3 implies that a sufficient condition to guarantee accuracy parity is to find a group-invariant representation that minimizes dTV(D0(X), D1(X)). We now present the proof of Theorem 3.3:

Proof of Theorem 3.3. First, we show that for a ∈ {0, 1}, ErrDa(h) cannot deviate much from the noise if h is close to h*a:

    |ErrDa(h) − nDa| = |ErrDa(h) − ErrDa(h*a)| = |EDa[|Y − h(X)|] − EDa[|Y − h*a(X)|]|
                     ≤ EDa[|h(X) − h*a(X)|],

where the inequality is due to the triangle inequality. Next, we bound |ErrD0(h) − ErrD1(h)| by:

    |ErrD0(h) − ErrD1(h)| ≤ nD0 + nD1 + |ED0[|h(X) − h*0(X)|] − ED1[|h(X) − h*1(X)|]|.

In order to bound the last term, define εa(h, h′) := EDa[|h(X) − h′(X)|] so that

    |ED0[|h(X) − h*0(X)|] − ED1[|h(X) − h*1(X)|]| = |ε0(h, h*0) − ε1(h, h*1)|.

To bound |ε0(h, h*0) − ε1(h, h*1)|, realize that |h(X) − h*a(X)| ∈ {0, 1}. On one hand, we have:

    |ε0(h, h*0) − ε1(h, h*1)| = |ε0(h, h*0) − ε0(h, h*1) + ε0(h, h*1) − ε1(h, h*1)|
                              ≤ |ε0(h, h*0) − ε0(h, h*1)| + |ε0(h, h*1) − ε1(h, h*1)|
                              ≤ ε0(h*0, h*1) + dTV(D0(X), D1(X)),

where the last inequality is due to |ε0(h, h*1) − ε1(h, h*1)| = |D0(|h − h*1| = 1) − D1(|h − h*1| = 1)| ≤ sup_E |D0(E) − D1(E)| = dTV(D0(X), D1(X)). Similarly, by subtracting and adding back ε1(h, h*0) instead, we can also show that

    |ε0(h, h*0) − ε1(h, h*1)| ≤ ε1(h*0, h*1) + dTV(D0(X), D1(X)).

Combining the above two inequalities yields

    |ε0(h, h*0) − ε1(h, h*1)| ≤ min{ε0(h*0, h*1), ε1(h*0, h*1)} + dTV(D0(X), D1(X)).

Incorporating the two noise terms back into the above inequality then completes the proof.  □

4 Empirical Validation

Our theoretical results on the lower bound imply that over-training the feature transformation function to achieve group-invariant representations will inevitably lead to large joint errors. On the other hand, our upper bound also implies that group-invariant representations help to achieve accuracy parity. To verify these theoretical implications, in this section we conduct experiments on a real-world benchmark dataset, the UCI Adult dataset, and present empirical results with various metrics.

Dataset The Adult dataset contains 30,162/15,060 training/test instances for income prediction. Each instance in the dataset describes an adult from the 1994 US Census. Attributes include gender, education level, age, etc. In this experiment we use gender (binary) as the sensitive attribute, and we preprocess the dataset to convert categorical variables into one-hot representations. The processed data contain 114 attributes. The target variable (income) is also binary: 1 if ≥ 50K/year, otherwise 0. For the sensitive attribute A, A = 0 means Male, otherwise Female. In this dataset, the base rates differ across groups: Pr(Y = 1 | A = 0) = 0.310 while Pr(Y = 1 | A = 1) = 0.113.
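These base rates make the bounds of Section 3 concrete. A minimal sketch, with the numbers taken from the text above (the helper function and variable names are our own, for illustration only):

```python
# Instantiate Theorem 3.1 / Corollary 3.1 with the Adult base rates quoted above.
# Helper names below are illustrative, not from the paper.

def base_rate_gap(p0: float, p1: float) -> float:
    """Delta_BR(D0, D1) = |D0(Y=1) - D1(Y=1)| for binary Y."""
    return abs(p0 - p1)

p_male, p_female = 0.310, 0.113       # Pr(Y=1 | A=0), Pr(Y=1 | A=1)
gap = base_rate_gap(p_male, p_female)

# Theorem 3.1: any predictor satisfying exact demographic parity obeys
#   Err_D0 + Err_D1 >= Delta_BR(D0, D1)
joint_error_lb = gap

# Corollary 3.1 (pigeonhole): at least one group incurs error >= Delta_BR / 2
worst_group_lb = gap / 2

print(f"Delta_BR          = {gap:.3f}")             # 0.197
print(f"joint error    >= {joint_error_lb:.3f}")    # 0.197
print(f"worst group    >= {worst_group_lb:.4f}")    # 0.0985
```

So on Adult, any representation-classifier pair that achieves exact demographic parity must incur a joint error of at least 0.197, and at least one gender group suffers an error of roughly 0.099 or more, regardless of the learning algorithm.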
Also, the group ratios are different: Pr(A = 0) = 0.673.

Experimental Protocol To validate the effect of learning group-invariant representations with adversarial debiasing techniques [5, 26, 34], we perform a controlled experiment by fixing the baseline network architecture to be a three-hidden-layer feed-forward network with ReLU activations. The numbers of units in the hidden layers are 500, 200, and 100, respectively. The output layer corresponds to a logistic regression model. This baseline without debiasing is denoted as NoDebias. For debiasing with adversarial learning techniques, the adversarial discriminator network takes the features from the last hidden layer as input and connects them to a hidden layer with 50 units, followed by a binary classifier whose goal is to predict the sensitive attribute A. This model is denoted as AdvDebias. Compared with NoDebias, the only difference of AdvDebias in terms of the objective function is that, besides the cross-entropy loss for target prediction, AdvDebias also contains a classification loss from the adversarial discriminator for predicting the sensitive attribute A. In the experiment, all other factors are fixed to be the same between these two methods, including the learning rate, optimization algorithm, number of training epochs, and batch size. To see how the adversarial loss affects the joint error, demographic parity, and accuracy parity, we vary the coefficient ρ of the adversarial loss over 0.1, 1.0, 5.0, and 50.0.

Results and Analysis The experimental results are listed in Table 2. Note that in the table, |ErrD0 − ErrD1| can be understood as measuring an approximate version of accuracy parity, and similarly ∆DP(Ŷ) measures how closely the classifier satisfies demographic parity. From the table, it is clear that with increasing ρ, both the overall error ErrD (sensitive to the marginal distribution of A) and the joint error ErrD0 + ErrD1 (insensitive to the imbalance of A) increase. As expected, ∆DP(Ŷ) decreases drastically as ρ increases. Furthermore, |ErrD0 − ErrD1| also decreases gradually, but much more slowly than ∆DP(Ŷ). This is due to the existing noise in the data as well as the shift between the optimal decision functions across groups, as indicated by our upper bound. To conclude, all the empirical results are consistent with our theoretical findings.

Table 2: Adversarial debiasing on demographic parity, joint error across groups, and accuracy parity.

                         ErrD    ErrD0 + ErrD1   |ErrD0 − ErrD1|   ∆DP(Ŷ)
    NoDebias             0.157   0.275           0.115             0.189
    AdvDebias, ρ = 0.1   0.159   0.278           0.116             0.190
    AdvDebias, ρ = 1.0   0.162   0.286           0.106             0.113
    AdvDebias, ρ = 5.0   0.166   0.295           0.106             0.032
    AdvDebias, ρ = 50.0  0.201   0.360           0.112             0.028

5 Related Work

Fairness Frameworks Two central notions of fairness have been extensively studied, namely group fairness and individual fairness. In a seminal work, Dwork et al. [10] define individual fairness as a measure of smoothness of the classification function. Under the assumption that the number of individuals is finite, the authors proposed a linear programming framework to maximize utility under their fairness constraint. However, their framework requires a priori a distance function that computes the similarity between individuals, and their optimization formulation does not produce an inductive rule that generalizes to unseen data. Based on the definition of positive rate parity, Hardt et al. [14] proposed a post-processing method to achieve fairness by taking as input the prediction and the sensitive attribute. In a concurrent work, Kleinberg et al.
[21] offer a calibration technique to achieve the corresponding fairness criterion as well. However, both of the aforementioned approaches require the sensitive attribute during the inference phase, which is not available in many real-world scenarios.

Regularization Techniques The line of work on fairness-aware learning through regularization dates back at least to Kamishima et al. [19], where the authors argue that simple deletion of sensitive features in the data is insufficient for eliminating biases in automated decision making, due to possible correlations among attributes and sensitive information [25]. In light of this, the authors proposed a prejudice remover regularizer that essentially penalizes the mutual information between the predicted goal and the sensitive information. In a more recent approach, Zafar et al. [31] leveraged a measure of decision boundary fairness and incorporated it via constraints into the objective functions of logistic regression as well as support vector machines. As discussed in Section 2, both approaches essentially reduce to achieving demographic parity through regularization.

Representation Learning In a pioneering work, Zemel et al. [33] proposed to preserve both group and individual fairness through the lens of representation learning, where the main idea is to find a good representation of the data with two competing goals: to encode the data for utility maximization while at the same time obfuscating any information about membership in the protected group. Due to the power of learning rich representations offered by deep neural nets, recent advances in building fair automated decision making systems focus on using adversarial techniques to learn fair representations that also preserve enough information for the prediction vendor to achieve its utility [1, 5, 11, 24, 30, 34, 37]. Madras et al.
[26] further extended this approach by incorporating a reconstruction loss given by an autoencoder into the objective function to preserve demographic parity, equalized odds, and equal opportunity.

6 Conclusion

In this paper we theoretically and empirically study the important problem of quantifying the tradeoff between utility and fairness in learning group-invariant representations. Specifically, we prove a novel lower bound characterizing the tradeoff between demographic parity and the joint utility across different population groups when the base rates differ between groups. In particular, our results imply that any method aiming to learn fair representations admits an information-theoretic lower bound on the joint error, and the better the representation, the larger the joint error. Complementary to our negative results, we also show that learning fair representations leads to accuracy parity if the optimal decision functions across different groups are close. These theoretical findings are also confirmed empirically on real-world datasets. We believe our results take an important step towards better understanding the tradeoff between utility and different notions of fairness. Inspired by our lower bound, one interesting direction for future work is to design instance-weighting algorithms to balance the base rates during representation learning.

Acknowledgments

HZ and GG would like to acknowledge support from the DARPA XAI project, contract #FA87501720152, and an Nvidia GPU grant. HZ would also like to thank Jianfeng Chi for helpful discussions on the relationship between algorithmic fairness and privacy-preserving learning.

References

[1] Tameem Adel, Isabel Valera, Zoubin Ghahramani, and Adrian Weller. One-network adversarial fairness. In 33rd AAAI Conference on Artificial Intelligence, 2019.

[2] Syed Mumtaz Ali and Samuel D Silvey.
A general class of coefficients of divergence of one distribution from another. Journal of the Royal Statistical Society: Series B (Methodological), 28(1):131–142, 1966.

[3] Solon Barocas and Andrew D Selbst. Big data's disparate impact. Calif. L. Rev., 104:671, 2016.

[4] Richard Berk, Hoda Heidari, Shahin Jabbari, Michael Kearns, and Aaron Roth. Fairness in criminal justice risk assessments: The state of the art. Sociological Methods & Research, page 0049124118782533, 2018.

[5] Alex Beutel, Jilin Chen, Zhe Zhao, and Ed H Chi. Data decisions and theoretical implications when adversarially learning fair representations. arXiv preprint arXiv:1707.00075, 2017.

[6] Toon Calders, Faisal Kamiran, and Mykola Pechenizkiy. Building classifiers with independency constraints. In 2009 IEEE International Conference on Data Mining Workshops, pages 13–18. IEEE, 2009.

[7] Alexandra Chouldechova. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data, 5(2):153–163, 2017.

[8] Imre Csiszár. Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten. Magyar Tud. Akad. Mat. Kutató Int. Közl., 8:85–108, 1964.

[9] Imre Csiszár. Information-type measures of difference of probability distributions and indirect observation. Studia Scientiarum Mathematicarum Hungarica, 2:229–318, 1967.

[10] Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, pages 214–226. ACM, 2012.

[11] Harrison Edwards and Amos Storkey. Censoring representations with an adversary. arXiv preprint arXiv:1511.05897, 2015.

[12] Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, and Victor Lempitsky.
Domain-adversarial training of neural networks. The Journal of Machine Learning Research, 17(1):2096–2030, 2016.

[13] Jihun Hamm. Minimax filter: Learning to preserve privacy from inference attacks. The Journal of Machine Learning Research, 18(1):4704–4734, 2017.

[14] Moritz Hardt, Eric Price, Nati Srebro, et al. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems, pages 3315–3323, 2016.

[15] James E Johndrow, Kristian Lum, et al. An algorithm for removing sensitive information: application to race-independent recidivism prediction. The Annals of Applied Statistics, 13(1):189–220, 2019.

[16] Melvin Johnson, Mike Schuster, Quoc V Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viégas, Martin Wattenberg, Greg Corrado, et al. Google's multilingual neural machine translation system: Enabling zero-shot translation. Transactions of the Association for Computational Linguistics, 5:339–351, 2017.

[17] Faisal Kamiran and Toon Calders. Classifying without discriminating. In 2009 2nd International Conference on Computer, Control and Communication, pages 1–6. IEEE, 2009.

[18] Toshihiro Kamishima, Shotaro Akaho, and Jun Sakuma. Fairness-aware learning through regularization approach. In 2011 IEEE 11th International Conference on Data Mining Workshops, pages 643–650. IEEE, 2011.

[19] Toshihiro Kamishima, Shotaro Akaho, Hideki Asoh, and Jun Sakuma. Fairness-aware classifier with prejudice remover regularizer. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 35–50. Springer, 2012.

[20] Mohammadali Khosravifard, Dariush Fooladivanda, and T Aaron Gulliver. Confliction of the convexity and metric properties in f-divergences.
IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 90(9):1848–1853, 2007.

[21] Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan. Inherent trade-offs in the fair determination of risk scores. arXiv preprint arXiv:1609.05807, 2016.

[22] Friedrich Liese and Igor Vajda. On divergences and informations in statistics and information theory. IEEE Transactions on Information Theory, 52(10):4394–4412, 2006.

[23] Jianhua Lin. Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1):145–151, 1991.

[24] Christos Louizos, Kevin Swersky, Yujia Li, Max Welling, and Richard Zemel. The variational fair autoencoder. arXiv preprint arXiv:1511.00830, 2015.

[25] Kristian Lum and James Johndrow. A statistical framework for fair predictive algorithms. arXiv preprint arXiv:1610.08077, 2016.

[26] David Madras, Elliot Creager, Toniann Pitassi, and Richard Zemel. Learning adversarially fair and transferable representations. In International Conference on Machine Learning, pages 3381–3390, 2018.

[27] Arvind Narayanan. Translation tutorial: 21 fairness definitions and their politics. In Proc. Conf. Fairness Accountability Transp., New York, USA, 2018.

[28] Executive Office of the President. Big data: A report on algorithmic systems, opportunity, and civil rights. Executive Office of the President, 2016.

[29] Geoff Pleiss, Manish Raghavan, Felix Wu, Jon Kleinberg, and Kilian Q Weinberger. On fairness and calibration. In Advances in Neural Information Processing Systems, pages 5680–5689, 2017.

[30] Jiaming Song, Pratyusha Kalluri, Aditya Grover, Shengjia Zhao, and Stefano Ermon. Learning controllable fair representations.
In Artificial Intelligence and Statistics, pages 2164–2173, 2019.

[31] Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez, and Krishna P Gummadi. Fairness constraints: Mechanisms for fair classification. arXiv preprint arXiv:1507.05259, 2015.

[32] Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez, and Krishna P Gummadi. Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. In Proceedings of the 26th International Conference on World Wide Web, pages 1171–1180. International World Wide Web Conferences Steering Committee, 2017.

[33] Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. Learning fair representations. In International Conference on Machine Learning, pages 325–333, 2013.

[34] Brian Hu Zhang, Blake Lemoine, and Margaret Mitchell. Mitigating unwanted biases with adversarial learning. In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, pages 335–340. ACM, 2018.

[35] Han Zhao, Jianfeng Chi, Yuan Tian, and Geoffrey J. Gordon. Adversarial privacy preservation under attribute inference attack. arXiv preprint arXiv:1906.07902, 2019.

[36] Han Zhao, Remi Tachet des Combes, Kun Zhang, and Geoffrey J Gordon. On learning invariant representations for domain adaptation. In International Conference on Machine Learning, 2019.

[37] Han Zhao, Amanda Coston, Tameem Adel, and Geoffrey J. Gordon. Conditional learning of fair representations. arXiv preprint arXiv:1910.07162, 2019.

[38] Indre Zliobaite. On the relation between accuracy and fairness in binary classification. arXiv preprint arXiv:1505.05723, 2015.