{"title": "Paradoxes in Fair Machine Learning", "book": "Advances in Neural Information Processing Systems", "page_first": 8342, "page_last": 8352, "abstract": "Equalized odds is a statistical notion of fairness in machine learning that ensures that classification algorithms do not discriminate against protected groups. We extend equalized odds to the setting of cardinality-constrained fair classification, where we have a bounded amount of a resource to distribute. This setting coincides with classic fair division problems, which allows us to apply concepts from that literature in parallel to equalized odds. In particular, we consider the axioms of resource monotonicity, consistency, and population monotonicity, all three of which relate different allocation instances to prevent paradoxes. Using a geometric characterization of equalized odds, we examine the compatibility of equalized odds with these axioms. We empirically evaluate the cost of allocation rules that satisfy both equalized odds and axioms of fair division on a dataset of FICO credit scores.", "full_text": "Paradoxes in Fair Machine Learning\n\nPaul G\u00f6lz, Anson Kahng, and Ariel D. Procaccia\n\nComputer Science Department\nCarnegie Mellon University\n\n{pgoelz, akahng, arielpro}@cs.cmu.edu\n\nAbstract\n\nEqualized odds is a statistical notion of fairness in machine learning that ensures\nthat classi\ufb01cation algorithms do not discriminate against protected groups. We\nextend equalized odds to the setting of cardinality-constrained fair classi\ufb01cation,\nwhere we have a bounded amount of a resource to distribute. This setting coincides\nwith classic fair division problems, which allows us to apply concepts from that\nliterature in parallel to equalized odds. In particular, we consider the axioms\nof resource monotonicity, consistency, and population monotonicity, all three of\nwhich relate different allocation instances to prevent paradoxes. 
Using a geometric\ncharacterization of equalized odds, we examine the compatibility of equalized odds\nwith these axioms. We empirically evaluate the cost of allocation rules that satisfy\nboth equalized odds and axioms of fair division on a dataset of FICO credit scores.\n\n1\n\nIntroduction\n\nThroughout most of human history, the question \u201cwho deserves what?\u201d could only be answered by\npeople. As such, questions of fairly allocating resources among groups of people were historically\ndictated by common sense, enforced by law, or suggested by social conventions. In the age of big data,\nhowever, machine learning algorithms increasingly dictate decisions about distributing resources in a\nwide range of domains [15, 19]. Machine learning classi\ufb01ers have been trained to determine which\napplicants deserve bank loans [19], which students merit acceptance from a particular school [23], or\nwhich prisoners should receive parole [15]. The prevalence of algorithmic intervention has led to\na widespread call for accountability in machine learning: in order to ensure that algorithms do not\ndisproportionately affect different constituent subpopulations, researchers must be able to provide\nfairness guarantees of the resulting classi\ufb01cation algorithms. This call, in turn, has led to much\nprior work on measuring and ensuring statistical notions of fairness, notably through metrics like\ndemographic parity and equalized odds [8, 10, 11, 13, 16, 18, 24\u201326].\nThe statistical notion of fairness that we will consider throughout this paper is that of equalized\nodds, which states that a classi\ufb01er must have equal true positive and false positive rates for all\ngroups. While equalized odds has been extensively studied as a metric of fairness in machine\nlearning [10, 11, 13, 16, 18, 24], it has not been considered in settings where a desired number of\npositive labels is given. 
This constraint is natural and ubiquitous whenever agents labeled as positive\nobtain a limited resource. For instance, a school can only offer admission to a \ufb01xed number of\nstudents, a police department\u2019s staff dictates the number of suspects they can stop and frisk, and a\nbank might only have a \ufb01nite amount of available loans. In the unconstrained setting, the quality\nof a classi\ufb01er is computed by adding a given utility per true positive and subtracting a given cost\nper false positive. In the cardinality-constrained setting, the ef\ufb01ciency that we seek to maximize is\nsimply the number of true positives (e.g., people who repay loans or students who will graduate from\nschool). Since we \ufb01x the number of overall positives, optimizing for any choice of (positive) utility\nand (positive) cost coincides with maximizing our notion of ef\ufb01ciency.\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\n\fWhile fair classi\ufb01cation has not previously been studied from this perspective, the task of fairly\nallocating a \ufb01nite resource is central to the \ufb01eld of fair division [6, 17]. Indeed, it is natural to directly\nformulate the problem of fairly classifying agents, where exactly k must be labeled as positive, as\nthe fair division problem of awarding k identical items to k applicants in a way that satis\ufb01es certain\nfairness constraints.\nThat being said, the notions of fairness differ between the fairness in machine learning and fair\ndivision communities. On one side, the machine learning literature studies statistical notions of\nfairness that hold over groups and which are usually mutually exclusive. In contrast, the fair division\nliterature includes a whole toolbox of fairness axioms, most of which can be understood as precluding\na paradox that, if present, would clearly violate intuitive notions of fairness. 
Combinations of these\naxioms then induce families of allocation rules that are immune to these types of paradoxes. To the\nbest of our knowledge, there has been no prior work that relates statistical measures of fairness to\nclassical axioms of fairness. This is unfortunate, since it would certainly be desirable to prevent the\ncorresponding types of paradoxes when applying fair machine learning. This motivates our main\nresearch question: To what extent is equalized odds compatible with axioms of fairness prevalent in\nfair division?\nOur contributions are twofold: First, we introduce the setting of cardinality constraints and study\noptimal classi\ufb01cation algorithms that satisfy equalized odds in this setting. In particular, we present a\ngeometric characterization of the optimal allocation rule that satis\ufb01es equalized odds given cardinality\nconstraints.\nSecond, in the cardinality-constrained model, we examine the relationship between equalized odds\nand the following three standard fair-division axioms. Resource monotonicity captures the intuition\nthat, given more of a resource to distribute among a population, no agent should be worse off than\nbefore. Consistency says that, if an agent leaves with her allocation, then running the same allocation\nrule on the remaining agents and resources should result in the remaining agents receiving the same\nallocations as before. Population monotonicity states that, if an agent joins the division process, then\nall previous agents should receive at most what they previously received.\nFor resource monotonicity, we achieve a positive result: resource monotonicity can be implemented\nalongside equalized odds without cost to ef\ufb01ciency, which requires careful consideration of how\ngoods are allocated inside of each group. For consistency, we prove a strikingly negative result \u2014 the\nonly allocation rule that satis\ufb01es equalized odds and consistency is uniform allocation. 
In the case of
population monotonicity, compatibility with equalized odds is also severely limited. More precisely,
no allocation rule that achieves a constant approximation of the optimal equalized-odds efficiency
can satisfy population monotonicity. To complement these theoretical results, we use a dataset of
FICO credit scores to study the efficiency of allocation rules that satisfy equalized odds and each of
the three axioms.
Our results are related to, but conceptually and technically distinct from, previous work showing
that equalized odds is incompatible with other statistical notions of fairness, notably the property of
calibration. Intuitively, calibration states that if a classifier assigns a probability label of p to a set of
n people, then p · n of them should actually be positive [10, 16, 18]. It has been shown [16, 18] that
when groups have different base rates, i.e., probabilities that they belong to the positive class, the only
classifier that satisfies equalized odds and calibration is the perfect classifier. Note that our approach
is not in conflict with these results; we assume a calibrated, unfair classifier and produce a fair, but
uncalibrated classifier. Indeed, our final classifier should not be expected to be calibrated since the
sum of allocations is determined by the cardinality constraint, not by the fraction of positive agents
in the population. Additionally, work by Corbett-Davies et al. [11] establishes a trade-off between
achieving equalized odds and the natural fairness notion of holding all agents in all groups to the
same standard.

2 Our Model

We consider settings with at least two groups, and let g range over these groups by default. Each
group is composed of positive and negative agents; allocating a good to a positive agent is preferable
to allocating it to a negative agent. 
For example, if we distribute loans, positive agents might be
those who will not default if they are given the loan, or, if we select whom to stop and frisk, positive
individuals might be those who indeed carry an illegal weapon. To exclude trivial cases, we assume
that some positive and negative agents exist, even if not necessarily in the same group.

We assume the existence of a calibrated classifier on each group. Thus, for every group g, there is
a finite set Pg of probabilities of, say, repaying a loan. When simultaneously ranging over g and p,
we implicitly only refer to p ∈ Pg. For each p ∈ Pg, d^p_g > 0 gives the number of agents to whom
the classifier assigns probability p of being positive. We refer to the set of agents in the same group
classified with the same probability as being in one bucket. By the calibration assumption, p · d^p_g
agents in a bucket (g, p) are positive and (1 − p) · d^p_g are negative. Denote the total number of
positive agents in g by D^+_g := Σ_{p∈Pg} p d^p_g and the total number of negative agents by
D^-_g := Σ_{p∈Pg} (1 − p) d^p_g. The total cardinality of a group is Dg := Σ_{p∈Pg} d^p_g, and the
total cardinality over all groups is D := Σ_g Dg.

2.1 From Classification with Cardinality Constraints to Allocation of Divisible Goods

An allocation algorithm is given the output of the classifier and a real-number cardinality constraint
k ∈ [0, D]. The algorithm must allocate k units of a divisible good to the agents, where each agent
can receive at most one unit of the good. The objective in this allocation is to maximize efficiency,
i.e., the amount of goods allocated to positive agents.
Note that this setting, where we allocate a divisible good, generalizes binary classification with
cardinality constraints. 
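To make these definitions concrete, the bucket statistics above can be computed as in the following minimal Python sketch. All identifiers are ours, not from the paper's published code; exact rationals avoid floating-point issues, and the example group is the one that appears later in Figure 1:

```python
from fractions import Fraction
from typing import Dict

# A group is a map from probability label p to bucket size d^p_g > 0.
Group = Dict[Fraction, int]

def cardinality(group: Group) -> int:
    """D_g: the total number of agents in the group."""
    return sum(group.values())

def positives(group: Group) -> Fraction:
    """D^+_g: the number of positive agents, i.e., the sum over p of p * d^p_g."""
    return sum(p * d for p, d in group.items())

def negatives(group: Group) -> Fraction:
    """D^-_g: the number of negative agents, i.e., the sum over p of (1 - p) * d^p_g."""
    return sum((1 - p) * d for p, d in group.items())

# The example group that appears later in Figure 1:
g = {Fraction(1, 3): 9, Fraction(1, 2): 4, Fraction(3, 4): 8, Fraction(1): 9}
assert cardinality(g) == 30
assert positives(g) == 20  # 3 + 2 + 6 + 9
assert negatives(g) == 10  # 6 + 2 + 2 + 0
```

By calibration, these group totals are expectations that the paper treats as exact counts; the two totals always sum to the group's cardinality.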
Indeed, the latter problem is equivalent to distributing k discrete indivisible
items. If we choose our good as the probability of receiving an item, we can immediately apply our
framework to this setting. Using the Birkhoff-von Neumann theorem [5, 22], the individual allocation
probabilities can be translated into a lottery over the items that guarantees that exactly k items are
distributed at any time.
That said, the increased expressive power does allow us to capture additional settings of interest. For
example, in the context of the fair allocation of financial aid, colleges typically provide different
amounts of aid to different students, rather than making binary decisions.
Returning to our model, for each bucket (g, p), let ℓ^p_g ∈ [0, d^p_g] denote the amount of goods allocated
to the agents in the bucket. Since the algorithm does not possess more detailed information than
the classifier output, we may without loss of generality assume that the allocation equally spreads a
bucket's allocation between its members. Indeed, if 0 < p < 1, any unbalanced allocation inside the
bucket would make mean allocations in the definition of equalized odds depend on which agents will
be positive, which means that equalized odds cannot be guaranteed. For the probabilities 0 and 1, all
agents in the bucket have the same type, and the algorithm can, in principle, arbitrarily discriminate
between them. However, since the agents in the bucket are indistinguishable, assuming a balanced
allocation does not change our analyses.
With these observations, we know that the total allocation to positive agents in group g is
L^+_g := Σ_{p∈Pg} p ℓ^p_g and that the total allocation to negative agents is
L^-_g := Σ_{p∈Pg} (1 − p) ℓ^p_g. Let the cardinality of the group allocation be Lg := L^+_g + L^-_g.
Each allocation is decomposable into allocations for each group. For a group g, we call a group
allocation (ℓ^p_g)_p uniform if ℓ^p_g = α d^p_g for some α ∈ [0, 1] and all p ∈ Pg. Another important class
of group allocations are threshold allocations, which do not give any goods to agents in a bucket p
until every agent in a higher-p bucket of the same group receives a full unit of the good. Formally,
there must be a threshold probability p∗ such that ℓ^p_g = d^p_g for all p > p∗ and such that ℓ^p_g = 0 for all
p < p∗, where ℓ^{p∗}_g can be arbitrary.

2.2 Equalized Odds

Throughout the paper, allocations must satisfy equalized odds, which means that (a) the mean
allocation over the positive agents in g is equal between all groups g that have any positive agents;
and (b) the mean allocation over the negative agents in g is equal between all groups g that have any
negative agents. We refer to the pair (L^+_g/D^+_g, L^-_g/D^-_g) — the mean allocation to positive agents
and the mean allocation to negative agents — as the signature of the allocation.

2.3 Fair-Division Axioms for Allocation Algorithms

There have been many decades of work on fair division, spanning settings with both divisible
and indivisible goods [6, 7, 12, 14, 17, 20]. Throughout this literature, desirable properties are
encoded via axioms, which can be either punctual or relational. Punctual axioms such as equitability,
proportionality [20], and envy-freeness [12] apply to each instance of the fair division problem
separately, and ensure that each agent's allocation satisfies global or relative valuations. 
For example,
proportionality states that, given n agents, each agent should receive at least a 1/n fraction of her
value for the entire resource, and envy-freeness states that all agents should value their own allocations
at least as much as any allocation given to another agent [21]. By contrast, relational axioms such as
resource monotonicity, consistency, and population monotonicity link separate instances of the fair
division problem together and can be thought of as well-behavedness properties. In our model, each
agent desires as much of the good as possible, which means that only uniform allocation satisfies
punctual axioms such as equitability, proportionality, and envy-freeness.¹ For this reason, we focus
on the abovementioned relational axioms.
An allocation algorithm satisfies resource monotonicity if increasing k does not decrease any agent's
allocation. In our model, we assume that the amount of goods to be allocated is fixed a priori. In
practice, however, additional resources might become available during the allocation phase. If the
availability of more resources were to decrease an agent's allocation, the allocator might find it
difficult to recuperate goods that have already been promised (or distributed). Resource monotonicity
avoids these bad situations.
Consistency says that allocations can be computed separately for subsets, using the share of the good
allocated to them. Formally, for a given classification (d̂^p_g)_{g,p} and allocation cardinality k̂, let
(ℓ̂^p_g)_{g,p} define the allocations of an allocation algorithm. Consider a second instance, in which we
remove some agents, i.e., have a classification (d^p_g)_{g,p} such that d^p_g ≤ d̂^p_g for all buckets
(buckets might also be removed, which we represent by setting d^p_g = 0). In addition, we reduce the
allocation cardinality to what these agents together received in the previous instance, i.e., have a new
allocation cardinality k := Σ_{g,p} (d^p_g/d̂^p_g) ℓ̂^p_g.
Consistency requires that every agent of the second instance receive the same
allocation as in the first, i.e., that ℓ^p_g = (d^p_g/d̂^p_g) ℓ̂^p_g. Notably, assuming both consistency and equalized
odds implies that equalized odds must hold over subpopulations. For instance, fairness between racial
groups would be preserved when considering only the female, senior, or foreign-born subpopulations,
ruling out fairness analogues of Simpson's paradox. While this would certainly be desirable, we will
show that it comes at an unreasonable price in efficiency.
Finally, population monotonicity mandates that, if we remove some of the agents without changing
the allocation cardinality, the allocation to any remaining agent cannot decrease. In our example of
allocating financial aid, for instance, it is quite likely that students will join another school or drop out
after enrollment. If we want to preserve equalized odds, and if our allocation rule violates population
monotonicity, the departure of a student from the application pool might reduce another student's
allocation, which will be hard to justify.
Note that consistency and resource monotonicity together imply population monotonicity. Indeed, if
we remove some agents together with their allocation, the allocation to the remaining agents does not
change by consistency. Adding the removed goods back can only increase allocations by resource
monotonicity.

3 Geometric Interpretation of Equalized Odds

As observed by Hardt et al. [13], the axiom of equalized odds is most easily understood through
the lens of a geometric interpretation. We adapt and extend their interpretation to our setting and
prove that it encompasses all equalized-odds allocations (which Hardt et al. do not do). 
The resulting
characterization is employed to prove our axiomatic theorems in Section 4, and gives an algorithm
used in Section 5.
For the time being, focus on a single group g and ignore the cardinality constraint. An allocation to
this group (ℓ^p_g)_p is now only constrained by 0 ≤ ℓ^p_g ≤ d^p_g for all p.
Let fg be a function mapping every group allocation to its signature (L^+_g/D^+_g, L^-_g/D^-_g) in [0, 1]².
Denote the image of fg by Sg, which marks the set of implementable signatures.² For the example
group specified in Table 1a, the shape of Sg is shown in Fig. 1b. Sg is convex as the image of
the convex space of allocations under the linear function fg. Furthermore, the diagonal (x, x) for
0 ≤ x ≤ 1 is a subset of Sg as the image of the uniform allocations.

¹In a recent paper [1], Balcan et al. argue for envy-freeness as a new notion of individual fairness for
classification when preferences are heterogeneous.
²If a group possesses only positive or only negative agents, all average allocations for its type are possible
and it imposes no constraint on the other type. We will thus set Sg := [0, 1]² in these cases.

p      d^p_g
1/3    9
1/2    4
3/4    8
1      9

(a) Buckets for group g. The colors link the buckets to vectors in Fig. 1b.
(b) The border of Sg is built from vectors corresponding to the p.
(c) fg(ag(k)) traces the lower border of S^+_g, k being the cardinality of the allocation.
(d) Each (x, y) ∈ S^+_g has points on the upper and lower border of S^+_g with equal cardinality.

Figure 1: Example group g. D^+_g = 20 and D^-_g = 10.

Figure 2: Superimposition of S0 and S1. The cardinality line is drawn in green, the optimal solution
is marked by a star.
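As a sanity check on this geometry, one can verify in code that uniform allocations map onto the diagonal. The sketch below is ours (not the paper's implementation) and assumes, as in footnote 2, that the group contains agents of both types:

```python
from fractions import Fraction
from typing import Dict, Tuple

Group = Dict[Fraction, int]            # probability label p -> bucket size d^p_g
Allocation = Dict[Fraction, Fraction]  # probability label p -> amount l^p_g

def signature(group: Group, alloc: Allocation) -> Tuple[Fraction, Fraction]:
    """f_g: map a group allocation (l^p_g)_p to (L^+_g / D^+_g, L^-_g / D^-_g).
    Assumes the group has both positive and negative agents (cf. footnote 2)."""
    d_pos = sum(p * d for p, d in group.items())        # D^+_g
    d_neg = sum((1 - p) * d for p, d in group.items())  # D^-_g
    l_pos = sum(p * alloc[p] for p in group)            # L^+_g
    l_neg = sum((1 - p) * alloc[p] for p in group)      # L^-_g
    return (l_pos / d_pos, l_neg / d_neg)

# The example group of Figure 1 (Table 1a):
g = {Fraction(1, 3): 9, Fraction(1, 2): 4, Fraction(3, 4): 8, Fraction(1): 9}

# A uniform allocation l^p_g = alpha * d^p_g lands on the diagonal (x, x):
alpha = Fraction(2, 5)
uniform = {p: alpha * d for p, d in g.items()}
assert signature(g, uniform) == (alpha, alpha)
```

Allocating everything (ℓ^p_g = d^p_g for all p) yields the signature (1, 1), the upper-right corner of Sg.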
We will restrict our investigation
to the area S^+_g "right of" that line, i.e., the set {(x, y) ∈ Sg | x ≥ y}, since the arguments for the
other half of Sg are symmetric. As the intersection of convex sets, S^+_g is still convex.
Consider a specific value of p ∈ Pg and the allocation that gives d^p_g units to bucket p and none to
all other buckets. Applying fg to this allocation gives us a vector vp := (p d^p_g/D^+_g, (1 − p) d^p_g/D^-_g).
For example, as described in Table 1a, the bucket 3/4 has size 8, which means that it contains 6
positive agents and 2 negative agents. In Fig. 1b, this bucket is represented by the green vector
v_{3/4} = (6/20, 2/10). Since, in each vp, both components are nonnegative, all these vectors point in
a direction between right (p = 1) and up (p = 0). Because the slope is proportional to (1 − p)/p,
the slope of the vp decreases monotonically in p.³ As hinted at in Fig. 1b, we want to show that the
upper border of S^+_g is the line (x, x), whereas the lower border can be constructed by appending
the vp in order of decreasing p. Formally, let ag be a function from the interval [0, Dg] into the set
of allocations. For every k, ag(k) is the unique threshold allocation of cardinality k. Thus, ag(k)
determines the smallest p∗ ∈ Pg such that Σ_{p>p∗} d^p_g ≤ k. Then, ag(k) sets ℓ^p_g := d^p_g for all p > p∗,
ℓ^p_g := 0 for all p < p∗, and ℓ^{p∗}_g := k − Σ_{p>p∗} d^p_g. As illustrated in Fig. 1c, fg ∘ ag walks along the
sequence of the vp. This allows us to formally describe the shape of S^+_g.

³For p = 0, we consider the slope to be infinite.

Theorem 1. S^+_g is the convex set whose border is the union of the diagonal line {(x, x) | 0 ≤ x ≤ 1}
and the image of fg ∘ ag.

Proof. Clearly, the image of fg ∘ ag lies within Sg = im(fg). Moreover, it intersects the line
(x, x) in the points fg(ag(0)) = (0, 0) and fg(ag(Dg)) = (1, 1). Since the slope of the vectors
increases in their layout from left to right, im(fg ∘ ag) must lie under (x, x), just like a function
with increasing slope is convex. Thus, im(fg ∘ ag) ⊆ S^+_g. Because of the rising slopes of the lower
border and the previous observations, the closed curve induced by walking counter-clockwise along
im(fg ∘ ag) ∪ {(x, x) | 0 ≤ x ≤ 1} only has left turns. Thus, the interior of the curve is convex.
It remains to show that the convex hull of im(fg ∘ ag) ∪ {(x, x) | 0 ≤ x ≤ 1} encompasses
S^+_g. Indeed, let (x, y) ∈ S^+_g be given, and let the allocation (ℓ^p_g)_p be a preimage under fg. By
assumption, x ≥ y, and we may assume without loss of generality that x > y. Let (ℓ̂^p_g)_p be the
uniform allocation that sets ℓ̂^p_g := (Lg/Dg) d^p_g for all p. Clearly, this allocation is mapped by fg to the
signature (x̂, ŷ) := (Lg/Dg, Lg/Dg). Finally, let (ℓ̃^p_g)_p be the allocation produced by ag(Lg) and let
(x̃, ỹ) be its image under fg.
As Fig. 
1d shows, the images under fg of all three allocations lie on a line because they all satisfy

D^+_g x + D^-_g y = Lg.    (1)

To show that (x, y) lies inside of the convex hull, it is enough to show that it lies in between the two
other points. Since x̂ = ŷ and x > y, since both points satisfy Eq. (1), and since D^+_g and D^-_g are
positive, we know that x̂ < x. It remains to show that x̃ ≥ x. Let p∗ be the probability used in the
definition of ag(Lg). If ℓ^p_g = d^p_g for all p > p∗ and ℓ^p_g = 0 for all p < p∗, the allocations coincide
and we are done. Else, by going from (ℓ^p_g)_p to (ℓ̃^p_g)_p, we just move parts of the allocation from
probabilities p ≤ p∗ to probabilities p ≥ p∗. The cardinality of the allocation must stay the same by
Eq. (1). Since, to calculate L^+_g, the allocation for every probability p is counted with weight p, this
moving can only increase L^+_g, thus x̃ ≥ x. Thus, (x, y) lies in the convex hull, and im(fg ∘ ag) is
indeed the lower border of S^+_g.

Let us return to the full setting with multiple groups, and draw the subsets Sg in the same coordinate
system, as illustrated in Fig. 2. For any global allocation satisfying equalized odds, the group
allocations must be mapped to the same signatures by the corresponding functions fg. Thus, all
these allocations must have a signature in the intersection of the Sg. Conversely, for any point in
the intersection, we can take preimages of that point for each group and obtain an allocation that is
well-formed and satisfies equalized odds.
The remaining constraint is the cardinality constraint on the allocation. Any point (x, y) corresponds
to allocations that allocate (Σ_g D^+_g) x units to positive agents and (Σ_g D^-_g) y units to negative
agents. Thus, the total cardinality (Σ_g D^+_g) x + (Σ_g D^-_g) y of such an allocation must equal k. This
is equivalent to a constraint y = (k − (Σ_g D^+_g) x)/(Σ_g D^-_g). Geometrically, this constraint has the
shape of a line with negative, finite slope, which we refer to as the cardinality line (see Fig. 2). The
cardinality line must intersect the line (x, x) at x = k/(Σ_g Dg), and thus intersects ⋂_g Sg (even
⋂_g S^+_g). This demonstrates that an equalized-odds allocation with the given cardinality always exists.
Note that efficiency, Σ_g L^+_g, is proportional to the x coordinate of a point. Thus, efficiency is
optimized by selecting an allocation corresponding to the rightmost point in the intersection of the
cardinality line and ⋂_g S^+_g. If we trace the lower border of ⋂_g S^+_g, i.e., we keep following the
uppermost of the lower borders of the Sg, we obtain a convex monotone maximum curve. The
signature of the most efficient allocation is then simply defined by the intersection of the cardinality
line and this curve.⁴ This description directly translates into a polynomial-time algorithm.

4 Combining Equalized Odds with Fair Division Axioms

We investigate the compatibility between equalized odds and the three fair division axioms formally
introduced in Section 2.3. All four properties can be satisfied simultaneously by allocating uniformly
across all groups. 
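Throughout this section, efficiency is compared against the optimal equalized-odds allocation computed geometrically in Section 3. As a minimal illustration of the first step of that computation (our own sketch, not the implementation from the linked repository), the corners of one group's lower border can be traced by appending the vectors vp in order of decreasing p:

```python
from fractions import Fraction
from typing import Dict, List, Tuple

Group = Dict[Fraction, int]  # probability label p -> bucket size d^p_g

def lower_border(group: Group) -> List[Tuple[Fraction, Fraction]]:
    """Corners of the lower border of S^+_g: the signatures f_g(a_g(k)) of
    threshold allocations at bucket boundaries, obtained by appending the
    vectors v_p in order of decreasing p (assumes both agent types exist)."""
    d_pos = sum(p * d for p, d in group.items())        # D^+_g
    d_neg = sum((1 - p) * d for p, d in group.items())  # D^-_g
    x, y = Fraction(0), Fraction(0)
    corners = [(x, y)]
    for p in sorted(group, reverse=True):  # decreasing p
        x += p * group[p] / d_pos          # horizontal component of v_p
        y += (1 - p) * group[p] / d_neg    # vertical component of v_p
        corners.append((x, y))
    return corners

# For the example group of Figure 1, the border runs from (0, 0) to (1, 1).
g = {Fraction(1, 3): 9, Fraction(1, 2): 4, Fraction(3, 4): 8, Fraction(1): 9}
corners = lower_border(g)
assert corners[0] == (0, 0) and corners[-1] == (1, 1)
assert corners[1] == (Fraction(9, 20), 0)  # v_1 = (9/20, 0): bucket p = 1
```

The maximum curve is then the upper envelope of these per-group polylines, and the optimal signature is its intersection with the cardinality line; those remaining steps are omitted here.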
Thus, the compatibility must be measured in terms of how much efficiency must be
sacrificed to simultaneously guarantee the properties.
This is particularly interesting since, if we do not insist on equalized odds, the most efficient allocation
algorithm (which simply allocates to buckets in order of decreasing p) immediately satisfies resource
monotonicity, consistency, and population monotonicity. Thus, the fair division axioms in question
do not have an inherent cost to efficiency, in contrast to punctual axioms in related settings [4, 9].
However, two of them will drastically lower efficiency when imposed in addition to equalized odds,
measured with respect to the optimal equalized-odds allocation as a baseline.

⁴This point is unique because of the possible slopes for the cardinality line and the line segments making up
the maximum curve.

(a) Original situation. (b) First step, recurse on part right of x = 0.38. (c) Final permutation after recursion.

Figure 3: Illustration for the proof of Theorem 2. The maximum curve is black, and the lower border
of Sg is colored, which makes it possible to track how the curve is permuted.

4.1 Resource Monotonicity

Fortunately, we can find equalized-odds allocations that satisfy resource monotonicity for free, i.e.,
while retaining maximum efficiency.
Theorem 2. There is an allocation algorithm that satisfies equalized odds and resource monotonicity,
which, on every input, leads to maximum efficiency among all equalized-odds algorithms.
Proof sketch. We sketch the argument here, and relegate the formal proof to Appendix A of the
supplementary material. As we described in Section 3, the signature of the optimal equalized-odds
allocation is defined by the intersection of the cardinality line and the maximum curve. Increasing
the allocation cardinality shifts the cardinality line to a parallel position further to the right. 
Since\nthe maximum curve is monotone increasing, and since the cardinality line has negative slope, this\nwill shift the intersection further to the right on the maximum curve. This implies that the average\nallocations to either type cannot decrease, but we need to ensure that the allocation inside of each\ngroup does not reduce the allocation to any single bucket. This does not hold for most natural ways\nof implementing the signature.\nIt suf\ufb01ces to focus on a single group. We need to associate points on the maximum curve with group\nallocations of matching signature such that the allocation to any bucket increases monotonically along\nthe curve. It is suf\ufb01cient to do so for the corners of the maximum curve; convex combinations of the\ncorner allocations directly implement the signatures of a line segment while preserving monotonicity.\nGeometrically, we can specify such group allocations as a permutation of the fg \u25e6 ag curve, where\npermutation means that we cut the curve into \ufb01nitely many segments, reorder them, and translate\nthem to form a single connected curve. For example, the colored curves in Figs. 3b and 3c are\npermutations of the one in Fig. 3a. The permuted curve should touch all corners of the maximum\ncurve. Then, at a corner of the maximum curve, allocate to each bucket p its demand multiplied by\nthe fraction of line segments with corresponding slope that appear before the vertex on the permuted\ncurve. This ensures that the allocation implements the desired signature, and that the allocations\nincrease bucket-wise between corners.\nIn Lemma 5 in Appendix A of the supplementary material, we describe a recursive algorithm that\nproduces such a reordering. Figure 3 demonstrates this algorithm on an example. 
In every recursion step, it finds a section of the lower curve that matches the first line segment of the maximum curve, swaps this section to the left, and then recurses on the subcurves to the right of the intersection. The middle section can be found efficiently without resorting to numerical search; an implementation of the algorithm is included in our code at https://github.com/pgoelz/equalized.

4.2 Consistency

Unfortunately, the situation is less rosy for consistency: the only allocation rule that satisfies both consistency and equalized odds is the uniform allocation.

Theorem 3. Let A be an algorithm that guarantees equalized odds and consistency. Then, A will allocate uniformly on any given instance.

Proof. We refer to the given instance as Instance I. Obtain Instance II by adding two agents with probability label 1/2 to each group and by setting the new cardinality constraint to k · (n + 2 · #groups)/n, such that the average allocation per agent remains the same. Now, every group contains positive and negative agents, and the average allocations ρ+_g := L+_g / D+_g and ρ−_g := L−_g / D−_g exist. By equalized odds, all ρ+_g equal a single constant ρ+, and all ρ−_g equal a single constant ρ−.
Fix any bucket (g, p) with a probability label p > 0. We want to show that this bucket will be allocated ρ+ · d^p_g units in Instance II: Construct an Instance III_{g,p} from II by removing all buckets except for (g, p) from g, along with their allocations. By consistency, this does not change the allocation to any other group; thus, the ρ+ of the other groups are unchanged. Because (g, p) is now the only partially positive bucket, ρ+_g is just the per-agent allocation of (g, p). By equalized odds, (g, p) is allocated ρ+ · d^p_g units in III_{g,p}.
By consistency, (g, p) receives the same amount in Instance II. Symmetrically, any bucket (g, p) with probability p < 1 is allocated ρ− · d^p_g units in Instance II.
In any given group g, fix the bucket with label 1/2 and let its allocation be ℓ^{1/2}_g. Since 0 < 1/2 < 1, by the above, ρ+ = ℓ^{1/2}_g / d^{1/2}_g = ρ−. It follows that every single bucket (g, p) in Instance II is allocated ρ+ · d^p_g = ρ− · d^p_g units, so the allocation is uniform. If we remove the inserted agents along with their allocation, we recover Instance I with the original budget k. By consistency, the allocation in Instance I was uniform.

Intuitively, the incompatibility between equalized odds and consistency is not particularly surprising. By nature, equalized odds is sensitive to the composition of the total applicant pool, whereas consistency rules out such dependencies. For example, if we remove applicants from one group along with their allocations, this likely changes the mean allocation to positive and negative agents in that group. As a result, the classification of the remaining agents must adapt to still satisfy equalized odds.

4.3 Population Monotonicity

For population monotonicity, the situation is also fairly bad, albeit less so than for consistency. In the following theorem, whose proof we defer to Appendix B.1 of the supplementary material, we show that any algorithm satisfying population monotonicity and equalized odds will, on certain inputs, incur arbitrarily high loss in efficiency over the optimal equalized-odds allocation.

Theorem 4. Let A denote an allocation algorithm satisfying equalized odds and population monotonicity. Then, A does not give a constant-factor approximation to the efficiency of the optimal equalized-odds algorithm.

Let us compare this result with Theorem 3, whose assertion holds for any instance.
By contrast, Theorem 4 is a worst-case result, and so it leaves room for algorithms satisfying population monotonicity and equalized odds that are significantly more efficient than a uniform allocation in practice. In fact, in Appendix B.2 of the supplementary material, we do construct a non-uniform algorithm with these axiomatic properties that (slightly) outperforms uniform allocations. However, we will shortly see that, on a real dataset, requiring population monotonicity and equalized odds inevitably leads to efficiency close to that of uniform allocations.

5 Empirical Results

We evaluate our approach on a dataset relating the FICO credit scores of 174,047 individuals to credit delinquency. The dataset is based on TransUnion's TransRisk scores, and was originally published by the Federal Reserve [2]. We use a cleaned and aggregated version made publicly available by Barocas et al. [3] at https://github.com/fairmlbook/fairmlbook.github.io/tree/master/code/creditscore. For each of four races (white, black, Hispanic, Asian), the individuals are partitioned into buckets for 198 credit score values. For each bucket, we can compute its size and fraction of non-defaulters. Our code is publicly available at https://github.com/pgoelz/equalized.
For different numbers k of loans to be given out, Fig. 4 shows the efficiency loss entailed by insisting on certain fairness properties. As a baseline, we use the optimal non-fair allocation that greedily allocates to agents in descending order of p, regardless of their race. Insisting on equalized odds — and, by Theorem 2, even additionally insisting on resource monotonicity — only incurs a small efficiency penalty of less than 3.5%.
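To fix ideas, the greedy non-fair baseline just described can be sketched in a few lines. This is a minimal illustration under our own assumptions (each bucket represented as a (size, p) pair), not the authors' implementation:

```python
# Minimal sketch of the non-fair greedy baseline: allocate the k units to
# buckets in descending order of probability label p, ignoring group
# membership. The (size, p) bucket representation is an assumption.

def greedy_allocation(buckets, k):
    """Return per-bucket allocations (same order as `buckets`) and the
    resulting efficiency, i.e., the expected number of positive agents served."""
    order = sorted(range(len(buckets)), key=lambda i: buckets[i][1], reverse=True)
    alloc = [0] * len(buckets)
    remaining = k
    for i in order:
        size, _ = buckets[i]
        alloc[i] = min(size, remaining)  # serve the bucket as far as budget allows
        remaining -= alloc[i]
    efficiency = sum(a * p for a, (_, p) in zip(alloc, buckets))
    return alloc, efficiency
```

With buckets [(10, 0.9), (5, 0.5), (20, 0.2)] and k = 12, for example, the first bucket is served fully, the remaining two units go to the second bucket, and the efficiency is 10 · 0.9 + 2 · 0.5 = 10.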
Even uniform allocation loses at most 30% efficiency since 70% of agents in the dataset do not become delinquent. The higher k becomes, the more even the optimal non-fair algorithm is forced to allocate to agents that might default, and the lower the relative loss of uniform allocation becomes. Nevertheless, as long as k is not a large fraction of the number of agents, we suspect the price of consistency to be unacceptably high — as is evident from the fact that banks use credit scoring at all.

Figure 4: Efficiency of three different equalized-odds algorithms on the FICO dataset, as a function of k and as a fraction of the optimal allocation without fairness constraints.

The most interesting line is the third algorithm. Since we do not have a characterization of the best algorithms satisfying equalized odds and population monotonicity, we test an algorithm that, on every instance, will be at least as efficient as every such algorithm. This algorithm is based on the observation that, if we remove all buckets from a group except for one with probability in (0, 1), any algorithm satisfying equalized odds must give this bucket its proportional share of k in the resulting instance. If population monotonicity is satisfied, this gives us an upper bound on the allocation to the bucket in the original instance. By maximizing for efficiency subject to these constraints and equalized odds with a linear program, we obtain the desired upper bound on every equalized-odds algorithm that satisfies population monotonicity. As the graph shows, insisting on population monotonicity forces us into an efficiency dynamic that is essentially that of uniform allocation. While there is a gap of a few percentage points between the two curves, part of it might be explained by the looseness of our upper bound.
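As an illustration of the linear-programming step, the following sketch computes a maximum-efficiency allocation subject to equalized odds. It is our own reading, not the authors' code: the (size, p) bucket format, the encoding of equalized odds via two auxiliary rate variables, and the optional `caps` argument (standing in for the per-bucket upper bounds described in the text) are all assumptions.

```python
# Sketch of a maximum-efficiency allocation subject to equalized odds as an LP.
# Not the authors' implementation; the input format and the optional `caps`
# argument (per-bucket upper bounds) are our assumptions.
from scipy.optimize import linprog

def max_efficiency_equalized_odds(groups, k, caps=None):
    """groups: dict mapping group name -> list of (d, p) buckets.

    Variables are the bucket allocations x_j plus two rates rho+ and rho-.
    Equalized odds is encoded by requiring, for every group g,
      sum_{j in g} p_j x_j       = rho+ * sum_{j in g} p_j d_j       (equal TPR),
      sum_{j in g} (1-p_j) x_j   = rho- * sum_{j in g} (1-p_j) d_j   (equal FPR).
    """
    buckets = [(g, d, p) for g, bs in groups.items() for d, p in bs]
    m = len(buckets)
    c = [-p for _, _, p in buckets] + [0.0, 0.0]   # maximize sum_j p_j x_j
    A_eq = [[1.0] * m + [0.0, 0.0]]                # cardinality: sum_j x_j = k
    b_eq = [float(k)]
    for g in groups:
        pos = [p if bg == g else 0.0 for bg, _, p in buckets]
        neg = [1.0 - p if bg == g else 0.0 for bg, _, p in buckets]
        d_pos = sum(d * p for bg, d, p in buckets if bg == g)
        d_neg = sum(d * (1.0 - p) for bg, d, p in buckets if bg == g)
        A_eq.append(pos + [-d_pos, 0.0]); b_eq.append(0.0)  # group TPR = rho+
        A_eq.append(neg + [0.0, -d_neg]); b_eq.append(0.0)  # group FPR = rho-
    ub = [d for _, d, _ in buckets] if caps is None else \
         [min(d, cap) for (_, d, _), cap in zip(buckets, caps)]
    bounds = [(0.0, u) for u in ub] + [(0.0, 1.0), (0.0, 1.0)]
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return list(res.x[:m]) if res.success else None
```

With a single bucket per group, the constraints force the proportional allocation x_g = k · d_g / n; more generally, the uniform allocation is always feasible here, which is consistent with the uniform curve in Fig. 4 lying below the upper bound.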
Just as in the case of consistency, population monotonicity seems to be unacceptably costly unless we can satisfy a large fraction of the demand.

6 Discussion

We have shown that equalized odds in a setting with cardinality-constrained resources is perfectly compatible with the classic fair division axiom of resource monotonicity. However, our theoretical and empirical results imply that equalized odds is grossly incompatible with consistency and (more importantly) population monotonicity.
Why is that a problem? On a practical level, the paradoxes these axioms are meant to prevent can lead to real difficulties. For example, as mentioned in Section 2.3, a violation of population monotonicity may give rise to a situation where we need to decrease a student's financial aid because another student declined to accept aid. On a conceptual level, it is hard to justify and explain the design of allocation algorithms that behave in such counter-intuitive ways.
In summary, our results tease out new tradeoffs between notions of fairness. We also believe our work strengthens the case against equalized odds as a tenable standard for fair machine learning.

Acknowledgments

This work was partially supported by the National Science Foundation under grants IIS-1350598, IIS-1714140, CCF-1525932, and CCF-1733556; by the Office of Naval Research under grants N00014-16-1-3075 and N00014-17-1-2428; by a J.P. Morgan AI Research Award; and by a Guggenheim Fellowship.

References

[1] M.-F. Balcan, T. Dick, R. Noothigattu, and A. D. Procaccia. 2018. Envy-Free Classification. In Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS).

[2] The Federal Reserve Board. 2007.
Report to the Congress on Credit Scoring and Its Effects on the Availability and Affordability of Credit. https://www.federalreserve.gov/boarddocs/rptcongress/creditscore/

[3] S. Barocas, M. Hardt, and A. Narayanan. 2018. Fairness and Machine Learning. http://www.fairmlbook.org.

[4] D. Bertsimas, V. F. Farias, and N. Trichakis. 2011. The Price of Fairness. Operations Research 59, 1 (2011), 17–31.

[5] G. Birkhoff. 1946. Three Observations on Linear Algebra. Universidad Nacional de Tucumán, Revista A 5 (1946), 147–151.

[6] S. J. Brams and A. D. Taylor. 1996. Fair Division: From Cake-Cutting to Dispute Resolution. Cambridge University Press.

[7] F. Brandt and A. D. Procaccia. 2016. Handbook of Computational Social Choice. Cambridge University Press.

[8] T. Calders, F. Kamiran, and M. Pechenizkiy. 2009. Building Classifiers with Independency Constraints. In Proceedings of the 9th IEEE International Conference on Data Mining (ICDM). 13–18.

[9] I. Caragiannis, C. Kaklamanis, P. Kanellopoulos, and M. Kyropoulou. 2009. The Efficiency of Fair Division. In Proceedings of the 5th Workshop on Internet and Network Economics (WINE). 475–482.

[10] A. Chouldechova. 2017. Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments. Big Data 5, 2 (2017), 153–163.

[11] S. Corbett-Davies, E. Pierson, A. Feller, S. Goel, and A. Huq. 2017. Algorithmic Decision Making and the Cost of Fairness. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). 797–806.

[12] D. Foley. 1967. Resource Allocation and the Public Sector. Yale Economics Essays 7 (1967), 45–98.

[13] M. Hardt, E. Price, and N. Srebro. 2016. Equality of Opportunity in Supervised Learning. In Proceedings of the 29th Conference on Neural Information Processing Systems (NeurIPS), Vol. 29. 3315–3323.

[14] C. Klamler.
2010. Fair Division. In Handbook of Group Decision and Negotiation, D. M. Kilgour and C. Eden (Eds.). Springer, 183–202.

[15] J. Kleinberg, H. Lakkaraju, J. Leskovec, J. Ludwig, and S. Mullainathan. 2017. Human Decisions and Machine Predictions. The Quarterly Journal of Economics 133, 1 (2017), 237–293.

[16] J. Kleinberg, S. Mullainathan, and M. Raghavan. 2017. Inherent Trade-Offs in the Fair Determination of Risk Scores. In 8th Innovations in Theoretical Computer Science Conference (ITCS). 43:1–43:23.

[17] H. Moulin. 2004. Fair Division and Collective Welfare. MIT Press.

[18] G. Pleiss, M. Raghavan, F. Wu, J. Kleinberg, and K. Q. Weinberger. 2017. On Fairness and Calibration. In Proceedings of the 30th Conference on Neural Information Processing Systems (NeurIPS), Vol. 30. 5680–5689.

[19] N. Siddiqi. 2012. Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring. Vol. 3. John Wiley & Sons.

[20] H. Steinhaus. 1948. The Problem of Fair Division. Econometrica 16 (1948), 101–104.

[21] W. Thomson. 2011. Fair Allocation Rules. In Handbook of Social Choice and Welfare (1 ed.), K. J. Arrow, A. K. Sen, and K. Suzumura (Eds.). Vol. 2. Elsevier, Chapter 21, 393–506.

[22] J. von Neumann. 1953. A Certain Zero-Sum Two-Person Game Equivalent to the Optimal Assignment Problem. In Contributions to the Theory of Games, H. W. Kuhn and A. W. Tucker (Eds.). Vol. 2. Princeton University Press, 5–12.

[23] A. Waters and R. Miikkulainen. 2014. GRADE: Machine Learning Support for Graduate Admissions. AI Magazine 35, 1 (2014), 64–75.

[24] M. B. Zafar, I. Valera, M. Gomez Rodriguez, and K. P. Gummadi. 2017. Fairness Beyond Disparate Treatment & Disparate Impact: Learning Classification without Disparate Mistreatment. In Proceedings of the 26th International Conference on World Wide Web (WWW). 1171–1180.

[25] M. B. Zafar, I. Valera, M. G. Rodriguez, and K. P.
Gummadi. 2017. Fairness Constraints: Mechanisms for Fair Classification. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS). 962–970.

[26] R. Zemel, Y. Wu, K. Swersky, T. Pitassi, and C. Dwork. 2013. Learning Fair Representations. In Proceedings of the 30th International Conference on Machine Learning (ICML). 325–333.