{"title": "Equal Opportunity in Online Classification with Partial Feedback", "book": "Advances in Neural Information Processing Systems", "page_first": 8974, "page_last": 8984, "abstract": "We study an online classification problem with partial feedback in which individuals arrive one at a time from a fixed but unknown distribution, and must be classified as positive or negative. Our algorithm only observes the true label of an individual if they are given a positive classification. This setting captures many classification problems for which fairness is a concern: for example, in criminal recidivism prediction, recidivism is only observed if the inmate is released; in lending applications, loan repayment is only observed if the loan is granted. We require that our algorithms satisfy common statistical fairness constraints (such as equalizing false positive or negative rates --- introduced as \"equal opportunity\" in Hardt et al. (2016)) at every round, with respect to the underlying distribution. We give upper and lower bounds characterizing the cost of this constraint in terms of the regret rate (and show that it is mild), and give an oracle efficient algorithm that achieves the upper bound.", "full_text": "Equal Opportunity in Online Classi\ufb01cation with\n\nPartial Feedback\n\nYahav Bechavod\nHebrew University\n\nyahav.bechavod@cs.huji.ac.il\n\nKatrina Ligett\n\nHebrew University\n\nkatrina@cs.huji.ac.il\n\nAaron Roth\n\nUniversity of Pennsylvania\naaroth@cis.upenn.edu\n\nBo Waggoner\n\nUniversity of Colorado\nbwag@colorado.edu\n\nZhiwei Steven Wu\n\nUniversity of Minnesota\n\nzsw@umn.edu\n\nAbstract\n\nWe study an online classi\ufb01cation problem with partial feedback in which indi-\nviduals arrive one at a time from a \ufb01xed but unknown distribution, and must be\nclassi\ufb01ed as positive or negative. Our algorithm only observes the true label of an\nindividual if they are given a positive classi\ufb01cation. 
This setting captures many\nclassification problems for which fairness is a concern: for example, in criminal\nrecidivism prediction, recidivism is only observed if the inmate is released; in\nlending applications, loan repayment is only observed if the loan is granted. We\nrequire that our algorithms satisfy common statistical fairness constraints (such as\nequalizing false positive or negative rates \u2014 introduced as \u201cequal opportunity\u201d in\n[18]) at every round, with respect to the underlying distribution. We give upper\nand lower bounds characterizing the cost of this constraint in terms of the regret\nrate (and show that it is mild), and give an oracle-efficient algorithm that achieves\nthe upper bound.1\n\n1 Introduction\n\nMany real-world prediction tasks in which fairness concerns arise \u2014 such as online advertising,\nshort-term hiring, lending micro-loans, and predictive policing \u2014 are naturally modeled as online\nbinary classification problems, but with an important twist: feedback is only received for one of the\ntwo classification outcomes. Clickthrough is only observed if the advertisement is shown; worker\nperformance is only observed for candidates who were actually hired; those who are denied a loan\nnever have an opportunity to demonstrate that they would have repaid; and unreported crimes in a\nprecinct can only be detected if police are dispatched there. 
Applying standard techniques for\nenforcing statistical fairness constraints on the gathered data can thus lead to pernicious feedback\nloops that can lead to classi\ufb01ers that badly violate these constraints on the underlying distribution.\nThis kind of failure to \u201cexplore\u201d has been highlighted as an important source of algorithmic unfairness\n\u2014 for example, in predictive policing settings [26, 13, 14].\nTo avoid this problem, it is important to explicitly manage the exploration/exploitation tradeoff that\ncharacterizes learning in partial feedback settings, which is what we study in this paper. We ask for\nalgorithms that enforce well-studied statistical fairness constraints across two protected populations\n(we focus on the \u201cequal opportunity\u201d constraint of [18], which enforces equalized false positive\nrates or false negative rates, but our techniques also apply to other statistical fairness constraints\nlike \u201cstatistical parity\u201d [12]). In particular, we ask for algorithms that satisfy these constraints (with\n\n1The full version of this paper is available at https://arxiv.org/abs/1902.02242.\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\n\frespect to the unknown underlying distribution) at every round of the learning procedure. The result\nis that the fairness constraints restrict how our algorithms can explore, not just how they can exploit,\nwhich makes the problem of fairness-constrained online learning substantially different from in the\nbatch setting. The main question that we explore in this paper is: \u201chow much does the constraint of\nfairness impact the regret bound of learning algorithms?\u201d\n\n1.1 Our Model and Results\nIn our setting, there is an unknown distribution D over examples, which are triples (\u02c6x, a, y) 2\nX \u21e5 {1, 1} \u21e5 {1, 1}. 
Here x̂ ∈ X represents a vector of features in some arbitrary feature space,\na ∈ A = {±1} is the group to which this example belongs (which we also call the sensitive feature),\nand y ∈ Y = {±1} is a binary label. We write x to denote a pair (x̂, a) – the set of all features\n(including the sensitive one) that the learner has access to.\nIn each round t ∈ [T], our learner selects hypotheses from a hypothesis class H consisting of\nfunctions h : X × A → Y recommending an action (or label) as a function of the features (potentially\nincluding the sensitive feature). We take the positive label to be the one that corresponds to observing\nfeedback (hiring a worker, admitting a student, approving a loan, releasing an inmate, etc.). We allow\nalgorithms that randomize over H. Let Δ(H) be the set of probability distributions over H. We refer\nto a π ∈ Δ(H) as a convex combination of classifiers.\nDefinition 1.1 (False positive rate). For a fixed distribution D on examples, we define the false\npositive rate (FPR) of a convex combination of classifiers π ∈ Δ(H) on group j ∈ {±1} to be\n\nFPR_j(π) = P(π(x) = +1 | a = j, y = −1) = E_{h∼π}[ P_{(x,y)∼D}(h(x) = +1 | a = j, y = −1) ].\n\nWe denote the difference between false positive rates between populations as\n\nΔFPR(π) := FPR_{+1}(π) − FPR_{−1}(π).\n\nThe fairness constraint we impose on our classifiers in this paper asks that false positive rates be\napproximately equalized across populations at every round t. Throughout, analogous results hold for\nfalse negative rates. These constraints were called equal opportunity constraints in [18].\nDefinition 1.2 (δ-equalized rates [18]). Fix a distribution D. A convex combination π ∈ Δ(H)\nsatisfies the δ-equalized false positive rate (δ-EFP) constraint if |ΔFPR(π)| ≤ δ. 
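To make Definitions 1.1 and 1.2 concrete, the following sketch estimates the FPR gap and checks the δ-EFP constraint on a finite sample standing in for D. (This is an illustrative sketch only: the array-based classifier interface and all function names are ours, not the paper's.)

```python
import numpy as np

def fpr(h, X, a, y, group):
    """Empirical false positive rate of classifier h on one group:
    P(h(x) = +1 | a = group, y = -1), estimated from the sample."""
    mask = (a == group) & (y == -1)
    if not mask.any():
        return 0.0
    return float(np.mean(h(X[mask]) == 1))

def delta_fpr(pi, X, a, y):
    """FPR gap of a convex combination pi = [(weight, h), ...]:
    DeltaFPR(pi) = FPR_{+1}(pi) - FPR_{-1}(pi), linear in the weights."""
    return sum(w * (fpr(h, X, a, y, +1) - fpr(h, X, a, y, -1))
               for w, h in pi)

def satisfies_efp(pi, X, a, y, delta):
    """Check the delta-EFP constraint |DeltaFPR(pi)| <= delta."""
    return abs(delta_fpr(pi, X, a, y)) <= delta
```

Note that `delta_fpr` is linear in the mixture weights, which is the property the paper's later structural arguments exploit.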
We informally use\nthe term δ-fair to refer to such a classifier or combination of classifiers.\n\nAs we will see in Definition 2.1, we will actually allow our algorithm to have a tiny probability of\never breaking the fairness constraint.\nRemark 1.3. The sources of unfairness we deal with here are the differential abilities of models\nin H to predict on different populations (which we inherit from the batch setting), and the biased\ndata collection inherent in online partial information settings. We use equal opportunity constraints\nonly as a canonical example of a statistical fairness constraint and do not take the position that it is\nalways the right one. Our techniques also apply to other constraints like statistical parity.\nNote that the fairness constraint is defined with respect to the true underlying distribution D. One of\nthe primary difficulties we face is that in early rounds, the learner has very little information about D,\nand yet is required to satisfy the fairness constraint with respect to D.\nIt is straightforward to see (and a consequence of a more general lower bound that we prove) that a\nδ-fair algorithm cannot in general achieve non-trivial regret to the set of δ-fair convex combinations\nof classifiers, because of ongoing statistical uncertainty about the fairness level for all non-trivial\nclassifiers. Thus our goal is to minimize our regret to the δ-fair convex combination of classifiers that\nhas the lowest classification error on D, while guaranteeing that our algorithm only deploys convex\ncombinations of classifiers that guarantee fairness level δ′ for some δ′ > δ. Clearly, the optimal\nregret bound will be a function of the gap (δ′ − δ), and one of our aims is to characterize this tradeoff.\nOur results. We show that the tradeoff achieved by the inefficient algorithm is tight by proving\na lower bound in Section 4. 
In some sense, the computational inef\ufb01ciency of the simple bandits\nreduction above is unavoidable, because we measure the regret of our learner with respect to 0/1\n\n2\n\n\fclassi\ufb01cation error, which is computationally hard to minimize, even for very simple classes H\n(see, e.g., [22, 16, 11]). However, we can still hope to give an oracle ef\ufb01cient algorithm for our\nproblem. This approach, which is common in the contextual bandits literature, assumes access to\nan \u201coracle\u201d which can in polynomial time solve the empirical risk minimization problem over H\n(absent fairness constraints), and is an attractive way to isolate the \u201chard part\u201d of the problem that is\noften tractable in practice. Our main result, to which we devote the body of the paper, is to show that\naccess to such an oracle is suf\ufb01cient to give a polynomial-time algorithm for the fairness-constrained\nlearning problem, matching the simple information theoretically optimal bounds described above.\nTo do this, we use two tools. Our high-level strategy is to apply the oracle ef\ufb01cient stochastic\ncontextual bandit algorithm from [2]. In order to do this, we need to supply it with an of\ufb02ine learning\noracle for the set of classi\ufb01ers that can with high probability be certi\ufb01ed to satisfy our fairness\nconstraints given the data so far. We construct an approximate oracle for this problem (given a\nlearning oracle for H) using the oracle-ef\ufb01cient reduction for of\ufb02ine fair classi\ufb01cation from [1]. We\nneed to overcome a number of technical dif\ufb01culties stemming from the fact that the fair oracle that we\ncan construct is only an approximate empirical risk minimizer, whereas the oracle assumed in [2] is\nexact. Moreover, the algorithm from [2] assumes a \ufb01nite hypothesis class, whereas we need to obtain\nno regret to a continuous family of distributions over hypotheses. 
The \ufb01nal result is an oracle-ef\ufb01cient\nalgorithm trading off between regret and fairness, allowing for a regret bound of O(T 2\u21b5) to the best\n-fair classi\ufb01er while satisfying 0-fairness at every round, with a gap of 0 = O(T \u21b5) for\n\u21b5 2 [0.25, 0.5].\n1.2 Additional Related Work\nWe build on two lines of work in the fair machine learning literature. First, batch (non-online)\nclassi\ufb01cation under a variety of statistical fairness constraints: raw classi\ufb01cation rates [8, 23, 15]\n(statistical parity [12]), positive predictive value [24, 9], and false positive and false negative rates [24,\n9, 18] ; see [5] for more examples. Second, fair online classi\ufb01cation and regression in the contextual\nbandit setting [20, 21, 25]. Unlike some of this prior work that demands stringent individual fairness\nconstraints at every round, which requires strong realizability assumptions to avoid lower bounds\n[20], this paper interpolates by requiring statistical fairness constraints to be enforced, but they still\nmust hold for every round. This allows much stronger positive results. Recently, [7] considered the\nproblem of enforcing statistical fairness in online learning, but from a very different perspective. That\nwork studies the full information and adversarial setting with false positive and error rates averaged\nacross rounds. Here, we have partial and bandit feedback, distributional assumptions, and require\nstatistical fairness guarantees at every round.\n\n2 Additional Preliminaries\nThroughout the paper, we assume +1,1 2H , where +1 and 1 are the two constant classi\ufb01ers\n(that is, +1(x) = 1 and 1(x) = 1 for all x). In some cases, we will additionally assume\n+a,a 2H , where +a and a are the identity function (and its negation) on the sensitive feature\n(that is, +a(\u02c6x, a) = a and a(\u02c6x, a) = a for all \u02c6x, a).\nThe Online Setting: The learner interacts with the environment as follows. For each round\nt = 1, . . . 
, T, the learner chooses some convex combination π_t ∈ Δ(H). The environment draws\n(x_t, y_t) ∼ D independently; the learner observes x_t. The learner labels the point ŷ_t = h_t(x_t), where\nh_t ∼ π_t. If ŷ_t = +1, the learner observes y_t; otherwise, there is no feedback for this round.\nWe measure a learner\u2019s performance using 0-1 loss, ℓ(ŷ_t, y_t) = 1[ŷ_t ≠ y_t]. Given a class of\ndistributions P over H′ ⊆ H and a sequence of T examples, the optimal convex combination of\nhypotheses from H in hindsight is defined as π*(P) = argmin_{π∈P} Σ_{t=1}^{T} E_{h∼π}[ℓ(h(x_t), y_t)].\nA learner\u2019s (pseudo)-regret with respect to P is\n\nRegret = Σ_{t=1}^{T} E_{(x_t,y_t)∼D, h_t∼π_t}[ℓ(h_t(x_t), y_t)] − Σ_{t=1}^{T} E_{(x_t,y_t)∼D, h∼π*(P)}[ℓ(h(x_t), y_t)].\n\nIn particular, when P = {π ∈ Δ(H) : π satisfies δ-EFP}, we call this the learner\u2019s δ-EFP regret.\nFinally, we ask for online learning algorithms that satisfy the following notion of fairness:\n\nDefinition 2.1 (A δ-EFP(γ) online learning algorithm). An online learning algorithm is said to satisfy\nδ-EFP(γ) fairness (for γ ∈ [0, 1]) if, with probability 1 − γ over the draw of {(x_t, a_t, y_t)}_{t=1}^{T} ∼ D^T,\nsimultaneously for all rounds t ∈ [T]: π_t satisfies δ-EFP.\nCost Sensitive Classification Algorithms: We aim to give oracle-efficient online learning algorithms \u2014 that is, algorithms that run in polynomial time per round, assuming access to an oracle\nwhich can solve the corresponding offline empirical risk minimization problem. Concretely, we\nassume oracles for solving cost sensitive classification (CSC) problems over H, which are defined by\na set of examples x_j and a set of weights c_j^{−1}, c_j^{+1} ∈ R corresponding to the cost of a negative and\npositive classification respectively.\nDefinition 2.2. 
Given an instance of a CSC problem S = {x_j, c_j^{(−1)}, c_j^{(+1)}}_{j=1}^{n}, a CSC oracle O for H\nreturns O(S) ∈ argmin_{h∈H} Σ_{j=1}^{n} c_j^{(h(x_j))}. From these oracles, we will construct ν-approximate\nCSC oracles that may have restricted ranges Π ⊆ Δ(H). Such oracles return O_ν(S) = π ∈ Π such\nthat E_{h∼π}[Σ_{j=1}^{n} c_j^{(h(x_j))}] ≤ min_{π′∈Π} E_{h∼π′}[Σ_{j=1}^{n} c_j^{(h(x_j))}] + ν.\n\nFrom \u201cApple Tasting\u201d to Contextual Bandits: Online classification problems under the feedback\nmodel we study were first described as \u201cApple Tasting\u201d problems [19]. The algorithm\u2019s loss at each\nround accumulates according to the following loss matrix:\n\n            y = +1    y = −1\nŷ = +1        0         1\nŷ = −1        1         0\n\nbut feedback is only observed for positive classifications (when ŷ = +1). This is a different feedback\nmodel than the more commonly studied contextual bandits setting. In that setting, the learner always\ngets to observe the loss of the selected action (regardless of a positive or a negative classification). We\nwill defer the formal description of contextual bandits to the appendix.\nIt is nevertheless straightforward to transform the apple tasting setting into the contextual bandits\nsetting (similar observations have been previously made [4]).\nProposition 2.3. Let A be a contextual bandits algorithm that guarantees a regret bound R(T) with\nprobability 1 − γ. There exists a transformation that maps feedback for apple tasting to feedback for\ncontextual bandits such that A guarantees a regret bound of 2R(T) with probability 1 − γ when run\non the transformed feedback on any apple tasting instance.\n\nBaseline approaches. Given the reduction above, we can draw on standard methods from contextual bandits to solve our fair online learning problem. 
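As a toy rendering of the CSC oracle interface in Definition 2.2 (exhaustive search over a small finite H stands in for the polynomial-time learner the paper assumes; the callable-based hypothesis representation is ours):

```python
def csc_oracle(H, S):
    """Exhaustive CSC oracle for a small finite class H (cf. Definition 2.2):
    given S = [(x_j, c_minus_j, c_plus_j), ...], return a hypothesis in
    argmin_{h in H} sum_j c_j^{(h(x_j))}.  Brute force, so only viable
    for toy classes; a real oracle would be a learning algorithm."""
    def total_cost(h):
        # pay c_plus_j when h labels x_j positive, c_minus_j otherwise
        return sum(c_plus if h(x) == 1 else c_minus
                   for x, c_minus, c_plus in S)
    return min(H, key=total_cost)
```

A ν-approximate oracle in the sense above would instead be allowed to return a mixture over a restricted range Π whose expected cost is within ν of optimal.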
A simple baseline approach that is oracle-efficient is to perform \u201cexploration-then-exploitation\u201d: the learner first \u201cexplores\u201d by predicting\n+1 for roughly T^{2/3} rounds, then \u201cexploits\u201d what it has learned by deploying the (empirically)\nbest performing fair policy. This approach would guarantee a sub-optimal regret bound of Õ(T^{2/3})\nto the best δ-fair classifier, while satisfying a δ′-fairness constraint at every round with a gap of\n(δ′ − δ) = O(T^{−1/3}).\nA more sophisticated approach starts with the observation (Lemma B.1) that although the set of \u201cfair\ndistributions over classifiers\u201d is continuously large, the \u201cfair empirical risk minimization\u201d problem\nonly has a single constraint, and so we may without loss of generality consider distributions over\nhypotheses H that have support of size 2. By an appropriate discretization, this allows us to restrict\nattention to a finite net of classifiers whenever H itself is finite. From this observation, one could\nemploy a simple strategy to obtain an information theoretic result: Fix any parameter α ∈ [1/4, 1/2].\nThe learner first predicts +1 for roughly T^{2α} rounds, then uses the collected data to define a set of fair\npolicies according to the observed empirical distribution, and lastly runs the EXP4 algorithm [3, 6]\nover the set of fair policies. Such an algorithm obtains a regret bound of O(T^{2α}) to the best δ-fair\nclassifier, while satisfying a δ′-fairness constraint at every round with a gap of (δ′ − δ) = O(T^{−α}).\nHowever, this algorithm needs to maintain a distribution of exponential size, and our goal is to match\nits regret rate with an oracle-efficient algorithm.\n\n3 An Oracle-Efficient Algorithm\n\nOur algorithm proceeds in two phases. First, during the first T0 rounds, the algorithm performs pure\nexploration and always predicts +1 to collect labelled data. 
Because constant classifiers exactly\nequalize the false positive rates across populations, each exploration round satisfies our fairness\nconstraint. The algorithm then uses the collected data to form empirical fairness constraints, which we\nuse to define our construction of a fair CSC oracle, given a CSC oracle unconstrained by fairness.\nThen, in the remaining rounds, we will run an adaptive contextual bandit algorithm that minimizes\ncumulative regret, while satisfying the empirical fairness constraint at every round.\nWe make two mild assumptions to simplify our analysis and the statement of our final bounds. First,\nwe assume that negative examples from each of the two protected groups have constant probability\nmass: Pr[a = 1, y = −1], Pr[a = −1, y = −1] ∈ Ω(1). Second, we assume that the hypothesis\nclass H contains the two constant classifiers and the identity function and its negation on the protected\nattribute: {h_{+1}, h_{−1}, h_{+a}, h_{−a}} ⊆ H.\nOur main theorem is as follows:\nTheorem 3.1. For any H and data distribution satisfying the two mild assumptions above, there\nexists an oracle-efficient algorithm that takes parameters γ ∈ [0, 1/√T] and δ ≥ 0 as input and satisfies\n(δ + ε)-EFP(γ) fairness and has an expected regret at most Õ(√(T ln(|H|/γ))) with respect to the\nclass of δ-EFP fair policies, where ε = O(√(ln(|H|/γ))/T^{1/4}).\nRemark 3.2. More generally, we can extend Theorem 3.1 to give an algorithm that satisfies (δ + ε)-\nEFP(γ) for any ε > 0, and achieves an expected regret at most Õ( ln(|H|/γ)/ε² + √(T ln(|H|/γ)) ) with\nrespect to the class of δ-EFP fair policies.\nRemark 3.3. We state our theorem in what we believe is the most attractive parametric regime:\nwhen it can obtain a regret bound of Õ(√T). 
But it is straightforward, by modifying the length of the\nexploration round, to obtain a more general tradeoff\u2014a regret bound of O(T 2\u21b5) with respect to the\nset of -EFP fair policies, while satisfying ( + O(T \u21b5))-EFP() fairness, for any \u21b5 2 [1/4, 1/2].\nThis tradeoff is tight, as we show in Section 4.\n\nAlgorithm. The outline of our algorithm is as follows.\n1. Label the \ufb01rst T0 arrivals as \u02c6yt = 1; observe their true labels.\n2. Based on this data, construct an ef\ufb01cient FairCSC oracle. The oracle will be given a cost-sensitive\nclassi\ufb01cation objective. It returns an approximately-optimal convex combination \u21e1 of hypotheses\nsubject to the linear constraint of ( + T 1/4)-EFP on the empirical distribution of data. We show the\nalgorithm can be implemented to always return a member of \u21e7, de\ufb01ned to be the set of mixtures on\nH with support size two whose empirical fairness on the exploration data is at most + \u02dcO(T 1/4).\n3. Instantiate a bandit algorithm with policy class \u21e7. The bandit algorithm, a modi\ufb01cation of [2], is\ndescribed in detail in the next sections. In order to select its hypotheses, the bandit algorithm makes\ncalls to the FairCSC oracle we implemented above.\n4. For the remaining rounds t > T0, choose labels \u02c6yt selected by the bandit algorithm and provide\nfeedback to the bandit algorithm via the reduction given by Proposition 2.3.\n\nAnalysis.\nIn the remainder of this section, we present our analysis in three main steps. First, we\nstudy the empirical fairness constraint given by the data collected during the exploration phase and\ngive a reduction from a cost-sensitive classi\ufb01cation problem subject to such fairness constraint to a\nstandard cost-sensitive classi\ufb01cation problem absent the constraint, based on [1]. We need to perform\ntwo modi\ufb01cations on the reduction method in [1]. 
First, we allow our algorithm to handle fairness\nconstraints de\ufb01ned by a separate data set that is different from the one de\ufb01ning the cost objective.\nSecondly, we also provide a fair approximate CSC oracle that returns a sparse solution, a distribution\nover H with support size of at most 2. This will be useful for establishing uniform convergence.\nNext, we present the algorithm run in the second phase: at each round t > T0, the algorithm makes a\nprediction based on a randomized policy \u21e1t 2 (H), which is a solution to a feasibility program\ngiven by [2]. We show how to rely on an approximate fair CSC oracle to solve this program ef\ufb01ciently.\nConsequently, we generalize the results of [2] to the setting in which the given oracle may only\noptimize the cost sensitive objective approximately. This may be of independent interest.\n\n5\n\n\fFinally, we bound the deviation between the algorithm\u2019s empirical regret and true expected regret.\nThis in particular requires uniform convergence over the entire class of fair randomized policies,\nwhich we show by leveraging the sparsity of the fair distributions.\nWe now give the proof of Theorem 3.1, with forward references to needed theorems and lemmas.\n\nProof of Theorem 3.1. We set T0 =\u21e5( pT ln(|H|/)). First, Lemma 3.4 shows that given our\nempirical EFP constraint, there exists an optimal policy of support size at most 2. Next, Lemma B.2\nshows that, with probability 1 over arrivals 1, . . . , T0, all convex combinations \u21e1 2 \u21e7 satisfy\n\u02c6-EFP for \u02c6 = + , = O\u21e3pln(|H|/)/T 1/4\u2318. It also implies that the optimal -fair policy\nis in the class. Theorem 3.5 shows that, given a CSC oracle for H, we can implement an ef\ufb01cient\napproximate CSC oracle for this class \u21e7. 
Theorem 3.11 shows that, given an approximate CSC oracle\nfor any class, there is an ef\ufb01cient bandit algorithm that plays from this class and achieves expected\nregret O\u21e3ln (|H|T /)pT\u2318.\nFairness: In the \ufb01rst T0 rounds we play +1 which is 0-fair, and in the remaining rounds we play only\npolicies from \u21e7. With probability 1 over the exploration data, every member of \u21e7 is ( + )-fair.\nRegret: The algorithm\u2019s regret is at most T0 plus its regret, on rounds T0 + 1, . . . , T , to the optimal\npolicy in \u21e7. By Proposition 2.3, this is at most twice the bandit algorithm\u2019s regret on those rounds.\nSo our expected regret totals at most O\u21e3ln (|H|T /)pT\u2318 to the best policy in \u21e7. With probability\n1 , \u21e7 contains the optimal -fair classi\ufb01er; with the remaining probability, the algorithm\u2019s regret\nto the best -fair classi\ufb01er can be bounded by T . Choosing \uf8ff 1pT gives the result.\n3.1 Step 1: Constructing a Fair CSC Oracle From Exploration Data\nLet SE denote the set of T0 labeled examples {zi = (xi, ai, yi)}T0\ni=1 collected from the initial\nexploration phase, and let DE denote the empirical distribution over SE. We will use DE as a proxy\nfor the true distribution to form an empirical fairness constraint. To support the learning algorithm in\nthe second phase, we need to construct an oracle that solves CSC problems subject to the empirical\nfairness constraint. Formally, an instance of the FairCSC problem for the class H is given by a set of\nn tuples {(xj, c(1)\nj=1 as before, along with a fairness parameter and an approximation\n)}n\nparameter \u232b. We wish to solve the following fair CSC problem:\n\n, c(+1)\n\nj\n\nj\n\nwhere F P R(\u21e1, DE) = F P R1(\u21e1, DE) F P R1(\u21e1, DE) and each F P Rj(\u21e1, DE) denotes the\nfalse positive rate of \u21e1 on distribution DE. 
The fair CSC problem is then:\n\nmin_{π∈Δ(H)} E_{h∼π}[ Σ_{j=1}^{n} c_j^{(h(x_j))} ]   such that   |ΔFPR(π, D_E)| ≤ δ     (1)\n\nWe show a useful structural property: there always\nexists a small-support optimal solution; the proof appears in Appendix B.1.\nLemma 3.4. There exists an optimal solution for the FairCSC problem that is a distribution over H with\nsupport size no greater than 2.\n\nWe therefore consider the set of sparse convex combinations:\n\nΠ = {π ∈ Δ(H) | |Supp(π)| ≤ 2, |ΔFPR(π, D_E)| ≤ δ + ε}\n\nand focus on algorithms that only play policies from Π and measure their performance with respect to\nΠ. For any π ∈ Π, we will write π(h) to denote the probability π places on h. Applying a standard\nconcentration inequality, we can show (Lemma B.2) that each policy in Π is also approximately fair\nwith respect to the underlying distribution.\nWe provide a reduction from FairCSC problems to standard CSC problems as follows: 1) We\nfirst apply a standard transformation on the input CSC objective to derive an equivalent weighted\nclassification problem, in which each example j has importance weight |c_j^{(−1)} − c_j^{(+1)}|. 2) We\nthen run the fair classification algorithm due to [1] that solves the weighted classification problem\napproximately using a polynomial number of CSC oracle calls. 3) Finally, we follow an approach\nsimilar to that of [10] to shrink the support size of the solution returned by the fair classification\nalgorithm down to at most 2, which can be done in polynomial time.\n\nTheorem 3.5 (Reduction from FairCSC to CSC). For any 0 < ν [...]\n\nFor rounds t > T0, we utilize a bandit algorithm\nto make predictions. 
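The support-size-2 structure of Lemma 3.4 ultimately rests on ΔFPR being linear in the mixture weights, so a single linear fairness constraint can be met by mixing just two hypotheses. A minimal sketch of that linearity step (the helper and its two-classifier interface are our own illustration, not the paper's algorithm):

```python
def two_point_fair_mixture(g1, g2, delta):
    """Given two classifiers h1, h2 whose FPR gaps (DeltaFPR values) are
    g1 and g2, return a weight w on h1 so that the support-2 mixture
    w*h1 + (1-w)*h2 satisfies |DeltaFPR| <= delta.  Uses the linearity of
    DeltaFPR in the mixture weights, the fact behind Lemma 3.4."""
    if abs(g1) <= delta:      # h1 alone is already feasible
        return 1.0
    if abs(g2) <= delta:      # h2 alone is already feasible
        return 0.0
    if g1 * g2 > 0:
        raise ValueError("gaps on the same side of the band: "
                         "no feasible mixture of h1 and h2 alone")
    # gaps straddle zero: solve w*g1 + (1-w)*g2 = 0 exactly
    return -g2 / (g1 - g2)
```

For instance, gaps of 0.6 and −0.2 yield weight 0.25 on the first classifier, driving the mixture's gap to zero.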
We now describe the algorithm, which closely follows the ILOVETOCONBANDITS algorithm by [2] but with important modifications that are necessary to handle approximation error in the FairCSC oracle.\nAt each round t > T0, the bandit algorithm produces a distribution Q_t over policies π. Each policy π\nis a convex combination of two classifiers in H and satisfies approximate fairness. The algorithm\nthen draws π from Q_t, draws h from π, and labels ŷ_t = h(x_t). To choose Q_t, the algorithm places\nsome constraints on Q and runs a short coordinate descent algorithm to find a Q satisfying those\nconstraints. Finally, it mixes in a small amount of the uniform distribution over labels (which can\nbe realized by mixing between h_{+1} and h_{−1}). We will see that the constraints, called the feasibility\nprogram, correspond to roughly bounding the expected regret of the algorithm along with bounding\nthe variance in regret of each possible π.\n\nFeasibility program. To describe the feasibility program, we first introduce some notation. For\neach t, we will write p_t to denote the probability that prediction ŷ_t is selected by the learner, and ℓ_t\nto denote the incurred (contextual bandit) loss given by the transformation in Proposition 2.3. For\neach policy π ∈ Π, let\n\nL̂_t(π) = (1/t) Σ_{s=1}^{t} (ℓ_s / p_s) · Pr[π(x_s) = ŷ_s],    L(π) = E_{(x,a,y)∼D}[ E_{h∼π}[ 1[h(x) ≠ y] ] ]\n\ndenote the estimated average loss given by the inverse propensity score (IPS) estimator and the true\nexpected loss for π, respectively. Similarly, let\n\ndReg_t(π) = L̂_t(π) − min_{π′∈Π} L̂_t(π′),    Reg(π) = L(π) − min_{π′∈Π} L(π′)\n\ndenote the estimated average regret and the true expected regret. In order to bound the variance of\nthe IPS estimators, we will ensure that the learner predicts each label with minimum probability\nμ_t at each round t. In particular, given a solution Q for the program and a minimum probability\nparameter μ_t, the learner will predict according to the mixture distribution Q^{μ_t}(· | x) (a distribution\nthat predicts each label with probability at least μ_t, and otherwise predicts according to Q):\n\nQ^{μ_t}(ŷ | x) = μ_t + (1 − 2μ_t) ∫_{π∈Π} Q(π) Pr[π(x) = ŷ] dπ.\n\nNote that this can be represented as a convex combination of classifiers from H since we assume that\nh_{+1}, h_{−1} ∈ H. We define for each π ∈ Π, b_t(π) = dReg_t(π) / (4(e − 2) μ_t ln(T)), and also\ninitialize b_0(π) = 0.\nWe describe the feasibility problem solved at each step. The approach and analysis directly follow\nand extend that of [2]. In that work, the first step at each round is to compute the best policy so far,\nwhich lets us compute dReg_t(π) and b_t(π) for any policy π. Here, our FairCSC oracle only computes\napproximate solutions, and so we can only compute regret relative to the approximately best policy\nso far, which leads to corresponding approximations gReg_t(π) and b̃_t(π). Then, our algorithm solves\nthe same feasibility program (although a few more technicalities must be handled): given history H_t\n(in the second phase) and minimum probability μ_t, find a probability distribution Q over Π such that\n\n(Low regret)      ∫_{π∈Π} Q(π) · b̃_{t−1}(π) dπ ≤ 4\n\n(Low variance)    ∀ π ∈ Π:  E_{x∼H_t}[ 1 / Q^{μ_t}(π(x) | x) ] ≤ 4 + b̃_{t−1}(π)\n\nIntuitively, the first constraint ensures that the estimated regret (based on historical data) of the\nsolution is at most Õ(1/√t). 
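The IPS loss estimate used in the feasibility program above can be computed directly from the interaction history; here is an illustrative sketch (the data layout and all names are ours, not the paper's code):

```python
def ips_loss_estimate(history, policy_prob):
    """Inverse propensity score (IPS) estimate of a policy's average loss.
    history: list of (x_s, yhat_s, loss_s, p_s), where p_s > 0 is the
    probability the learner placed on its observed prediction yhat_s.
    policy_prob(x, yhat): probability the evaluated policy pi predicts
    yhat on x.  Returns (1/t) * sum_s (loss_s / p_s) * Pr[pi(x_s) = yhat_s],
    an unbiased estimate of pi's expected loss."""
    t = len(history)
    return sum(loss * policy_prob(x, yhat) / p
               for x, yhat, loss, p in history) / t
```

The smaller the propensities p_s are allowed to get, the larger the variance of this estimate, which is exactly why the algorithm enforces the minimum prediction probability μ_t.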
The second constraint bounds the variance of the resulting IPS loss\nestimator for policies in \u21e7, which in turn allows us to bound the deviation between the empirical\nregret and the true regret for each policy over time. Importantly, we impose a tighter variance\nconstraint on policies that have lower empirical regret so far, which prioritizes their regret estimation.\nTo solve the feasibility program using our FairCSC oracle, we will run a coordinate descent algorithm,\nsimilar to [2] (full description in Section B.3 as Algorithm 1). The FairCSC oracle is used to identify\nand \ufb01x violated constraints. Via a potential argument similar to the one of [2], we can show that the\nalgorithm halts in a small number of iterations. We will also bound the additional error in the output\nsolution due to the approximation in the FairCSC oracle. In the following, let \u21e40 = 0 and for any\nt 1,\n\nt ln(T )\nwhere \u232b is the approximation parameter of the FairCSC oracle.\nLemma 3.6. Algorithm 1 halts in a number of iterations (and oracle calls) that is polynomial in 1\n\u00b5t\nand outputs a weight vector Q that is a probability distribution with the following guarantee:\n\n,\n\n\u21e4t :=\n\n\u232b\n4(e 2)\u00b52\n\n.\n\nZ\u21e12\u21e7\n\n8\u21e1 2 \u21e7:\n\nQ(\u21e1)(4 + bt1(\u21e1))d\u21e1 \uf8ff 4 +\u21e4 t\nE\n\nQ\u00b5t(\u21e1(x) | x) \uf8ff 4 + bt1(\u21e1) +\u21e4 t.\n\nx\u21e0Ht\uf8ff\n\n1\n\n3.3 Step 3: Regret Analysis\nThe key step in our regret analysis is to establish a tight relationship between the estimated regret and\n\nThe \ufb01nal regret guarantee then essentially follows from the guarantee of Lemma 3.6 that the estimated\nregret of our policy is bounded by \u02dcO (1/t) with proper setting of \u00b5t.\n\nthe true expected regret and show that for any \u21e1 2 \u21e7, Reg(\u21e1) \uf8ff 2dReg(\u21e1) + \u270ft, with \u270ft = \u02dcO(1/pt).\nTo bound the deviation between Reg(\u21e1) and dRegt(\u21e1), we need to bound the variance of our 
IPS estimators. Let us define the following for any probability distribution $P$ over $\Pi$ and any $\pi \in \Pi$:
$$\hat{V}_t(P, \pi, \mu) := \mathbb{E}_{x \sim H_t}\!\left[\frac{1}{P^{\mu}(\pi(x) \mid x)}\right], \qquad V(P, \pi, \mu) := \mathbb{E}_{x \sim \mathcal{D}}\!\left[\frac{1}{P^{\mu}(\pi(x) \mid x)}\right].$$
Recall that through the feasibility program, we can directly bound $\hat{V}_t(Q_t, \pi, \mu_t)$ for each round. However, to apply a concentration inequality to the IPS estimator, we need to bound the population variance $V(Q_t, \pi, \mu_t)$. We do that through a deviation bound between $\hat{V}_t(Q_t, \pi, \mu_t)$ and $V(Q_t, \pi, \mu_t)$ for all $\pi \in \Pi$. In particular, we rely on the sparsity of $\Pi$ and apply a covering argument. Let $\Pi_\eta \subset \Pi$ denote an $\eta$-cover such that for every $\pi$ in $\Pi$, $\min_{\pi' \in \Pi_\eta} \|\pi(h) - \pi'(h)\|_1 \le \eta$ for any $h \in \mathcal{H}$. Since $\Pi$ consists of distributions with support size at most 2, we can take the cardinality of $\Pi_\eta$ to be bounded by $\lceil |\mathcal{H}|^2 / \eta \rceil$.

Claim 3.7. Let $P$ be any distribution over the policy set $\Pi$, and let $\pi$ be any policy in $\Pi$. Then there exists $\pi' \in \Pi_\eta$ such that
$$|V(P, \pi, \mu) - V(P, \pi', \mu)|,\ |\hat{V}_t(P, \pi, \mu) - \hat{V}_t(P, \pi', \mu)| \le \frac{\eta}{\mu(\mu + \eta)}.$$

Lemma 3.8. Suppose that $\mu_t \ge \sqrt{\frac{\ln(2|\Pi_\eta| t^2 / \delta)}{2t}}$ and $t \ge 8 \ln(2|\Pi_\eta| t^2 / \delta)$. Then with probability $1 - \delta$,
$$V(P, \pi, \mu_t) \le 6.4\, \hat{V}_t(P, \pi, \mu_t) + 162.6 + \frac{2\eta}{\mu_t(\mu_t + \eta)}.$$

Next we bound the deviation between the estimated loss and the true expected loss for every $\pi \in \Pi$.

Lemma 3.9. Assume that the algorithm solves the per-round feasibility program with the accuracy guarantee of Lemma 3.6.
With probability at least $1 - \delta$, we have for all $t \in [T]$, all policies $\pi \in \Pi$, all $\eta \in [0, \mu_t]$, and all $t \ge 8 \ln(2|\Pi_\eta| t^2 / \delta)$,
$$|L(\pi) - \hat{L}_t(\pi)| \le (e - 2)\left(188.2 + \frac{1}{t} \sum_{s=1}^{t} \left(6.4\, b_{s-1}(\pi) + 6.4\, \Lambda_{s-1} + \frac{2\eta}{\mu_s(\mu_s + \eta)}\right)\right) + \sqrt{\frac{\ln(|\Pi_\eta| T / \delta)}{t}}.$$

To bound the difference between $\mathrm{Reg}(\pi)$ and $\widehat{\mathrm{Reg}}_t(\pi)$, we will set $\eta = 1/T^2$, $\mu_t = \frac{3.2 \ln(|\Pi_\eta| T / \delta)}{\sqrt{t}}$, and the approximation parameter $\nu$ of FairCSC to be $1/T$.

Lemma 3.10. Assume that the algorithm solves the per-round feasibility program with the accuracy guarantee of Lemma 3.6. With probability at least $1 - \delta$, we have for all $t \in [T]$, all policies $\pi \in \Pi$, and for all $t \ge 8 \ln(2|\mathcal{H}|^2 T^3 / \delta)$,
$$\mathrm{Reg}(\pi) \le 2\, \widehat{\mathrm{Reg}}_t(\pi) + \epsilon_t \quad \text{and} \quad \widehat{\mathrm{Reg}}_t(\pi) \le 2\, \mathrm{Reg}(\pi) + \epsilon_t,$$
with $\epsilon_t = \frac{1000 \ln(|\mathcal{H}|^2 T^2 / \delta)}{\sqrt{t}}$.

Theorem 3.11. The bandit algorithm, given access to an approximate-CSC oracle, runs in time polynomial in $T$ and achieves expected regret at most $O\!\left(\ln(|\mathcal{H}| T / \delta)\, \sqrt{T}\right)$.

4 Lower Bound

In this section we show that the tradeoff that our algorithm exhibits between its regret bound and the "fairness gap" $\Delta - \Delta_0$ (i.e. our algorithm is $\Delta_0$-fair, but competes with the best $\Delta$-fair classifier when measuring regret) is optimal. We do this by constructing a lower bound instance consisting of two very similar distributions, $\mathcal{D}_1$ and $\mathcal{D}_2$, defined as a function of our algorithm's fairness target $\Delta$. Roughly, there are not enough samples to distinguish the distributions until at least $\Theta(1/\Delta^2)$ rounds elapse, but in order to equalize false positive rates on both distributions, an algorithm must "play it safe" and incur constant regret per round during this time.

Theorem 4.1. Fix any $\alpha \in (0, 0.5)$ and let $T^{\alpha} \ge \sqrt{16}$. Fix any $\delta \le 0.24$.
There exists a hypothesis class $\mathcal{H}$ containing $\{\pm 1\}$ such that any algorithm satisfying a $T^{-\alpha}$-EFP($\delta$) fairness constraint has expected regret with respect to the set of $0$-EFP fair policies of $\Omega(T^{1 - 2\alpha})$.

Acknowledgments

We thank Nati Srebro for a conversation leading to the question we study here. We thank Michael Kearns for helpful discussions at an early stage of this work. YB and KL were funded in part by Israel Science Foundation (ISF) grant 1044/16, the United States Air Force and DARPA under contract FA8750-16-C-0022, and the Federmann Cyber Security Center in conjunction with the Israel national cyber directorate. AR was funded in part by NSF grant CCF-1763307 and the United States Air Force and DARPA under contract FA8750-16-C-0022. ZSW was supported in part by a Google Faculty Research Award, a J.P. Morgan Faculty Award, a Mozilla research grant, and a Facebook Research Award. Part of this work was done while KL and ZSW were visiting the Simons Institute for the Theory of Computing, and BW was a postdoc at the University of Pennsylvania's Warren Center and at Microsoft Research, New York City. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of J.P. Morgan, the United States Air Force, or DARPA.

References

[1] Alekh Agarwal, Alina Beygelzimer, Miroslav Dudík, John Langford, and Hanna M. Wallach. A reductions approach to fair classification. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, pages 60–69, 2018.

[2] Alekh Agarwal, Daniel J. Hsu, Satyen Kale, John Langford, Lihong Li, and Robert E. Schapire. Taming the monster: A fast and simple algorithm for contextual bandits.
In Proceedings of the 31st International Conference on Machine Learning, ICML 2014, Beijing, China, 21-26 June 2014, pages 1638–1646, 2014.

[3] Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, and Robert E. Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48–77, 2002.

[4] Gábor Bartók, Dávid Pál, and Csaba Szepesvári. Toward a classification of finite partial-monitoring games. In International Conference on Algorithmic Learning Theory, pages 224–238. Springer, 2010.

[5] Richard Berk, Hoda Heidari, Shahin Jabbari, Michael Kearns, and Aaron Roth. Fairness in criminal justice risk assessments: The state of the art. Sociological Methods & Research, 0(0):0049124118782533, 2018.

[6] Alina Beygelzimer, John Langford, Lihong Li, Lev Reyzin, and Robert Schapire. Contextual bandit algorithms with supervised learning guarantees. In Geoffrey Gordon, David Dunson, and Miroslav Dudík, editors, Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, volume 15 of Proceedings of Machine Learning Research, pages 19–26, Fort Lauderdale, FL, USA, 11–13 Apr 2011. PMLR.

[7] Avrim Blum, Suriya Gunasekar, Thodoris Lykouris, and Nati Srebro. On preserving non-discrimination when combining expert advice. In Advances in Neural Information Processing Systems, pages 8386–8397, 2018.

[8] Toon Calders and Sicco Verwer. Three naive Bayes approaches for discrimination-free classification. Data Mining and Knowledge Discovery, 21(2):277–292, 2010.

[9] Alexandra Chouldechova. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data, 5(2):153–163, 2017.

[10] Andrew Cotter, Heinrich Jiang, and Karthik Sridharan. Two-player games for efficient non-convex constrained optimization.
CoRR, abs/1804.06500, 2018.

[11] Amit Daniely, Nati Linial, and Shai Shalev-Shwartz. From average case complexity to improper learning complexity. arXiv preprint arXiv:1311.2272, 2013.

[12] Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, pages 214–226. ACM, 2012.

[13] Danielle Ensign, Sorelle A. Friedler, Scott Neville, Carlos Scheidegger, and Suresh Venkatasubramanian. Runaway feedback loops in predictive policing. In Conference on Fairness, Accountability and Transparency, pages 160–171, 2018.

[14] Danielle Ensign, Sorelle A. Friedler, Scott Neville, Carlos Scheidegger, and Suresh Venkatasubramanian. Decision making with limited feedback. In Firdaus Janoos, Mehryar Mohri, and Karthik Sridharan, editors, Proceedings of Algorithmic Learning Theory, volume 83 of Proceedings of Machine Learning Research, pages 359–367. PMLR, 07–09 Apr 2018.

[15] Michael Feldman, Sorelle A. Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian. Certifying and removing disparate impact. In KDD, 2015.

[16] Vitaly Feldman, Venkatesan Guruswami, Prasad Raghavendra, and Yi Wu. Agnostic learning of monomials by halfspaces is hard. SIAM Journal on Computing, 41(6):1558–1590, 2012.

[17] M. Grötschel, L. Lovász, and A. Schrijver. The ellipsoid method and its consequences in combinatorial optimization. Combinatorica, 1(2):169–197, Jun 1981.

[18] Moritz Hardt, Eric Price, and Nathan Srebro. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems, pages 3315–3323, 2016.

[19] David P. Helmbold, Nicholas Littlestone, and Philip M. Long. Apple tasting. Information and Computation, 161(2):85–139, 2000.

[20] Matthew Joseph, Michael Kearns, Jamie H. Morgenstern, and Aaron Roth.
Fairness in learning: Classic and contextual bandits. In Advances in Neural Information Processing Systems, pages 325–333, 2016.

[21] Matthew Joseph, Michael J. Kearns, Jamie Morgenstern, Seth Neel, and Aaron Roth. Meritocratic fairness for infinite and contextual bandits. In Jason Furman, Gary E. Marchant, Huw Price, and Francesca Rossi, editors, Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, AIES 2018, New Orleans, LA, USA, February 02-03, 2018, pages 158–163. ACM, 2018.

[22] Adam Tauman Kalai, Adam R. Klivans, Yishay Mansour, and Rocco A. Servedio. Agnostically learning halfspaces. SIAM Journal on Computing, 37(6):1777–1805, 2008.

[23] Toshihiro Kamishima, Shotaro Akaho, and Jun Sakuma. Fairness-aware learning through regularization approach. In Data Mining Workshops (ICDMW), 2011 IEEE 11th International Conference on, pages 643–650. IEEE, 2011.

[24] Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan. Inherent trade-offs in the fair determination of risk scores. arXiv preprint arXiv:1609.05807, 2016.

[25] Yang Liu, Goran Radanovic, Christos Dimitrakakis, Debmalya Mandal, and David C. Parkes. Calibrated fairness in bandits. arXiv preprint arXiv:1707.01875, 2017.

[26] Kristian Lum and William Isaac. To predict and serve? Significance, 13(5):14–19, 2016.