{"title": "Multi-Criteria Dimensionality Reduction with Applications to Fairness", "book": "Advances in Neural Information Processing Systems", "page_first": 15161, "page_last": 15171, "abstract": "Dimensionality reduction is a classical technique widely used for data analysis. One foundational instantiation is Principal Component Analysis (PCA),\nwhich minimizes the average reconstruction error. In this paper, we introduce the multi-criteria dimensionality reduction problem where we are given multiple objectives that need to be optimized simultaneously. As an application, our model captures several fairness criteria for dimensionality reduction such as the Fair-PCA problem introduced by Samadi et al. [NeurIPS18] and the Nash Social Welfare (NSW) problem. In the Fair-PCA problem, the input data is divided into k groups, and the goal is to find a single d-dimensional representation for all groups for which the maximum reconstruction error of any one group is minimized. In NSW the goal is to maximize the product of the individual variances of the groups achieved by the common low-dimensinal space.\n\nOur main result is an exact polynomial-time algorithm for the two-criteria dimensionality reduction problem when the two criteria are increasing concave functions. As an application of this result, we obtain a polynomial time algorithm for Fair-PCA for k=2 groups, resolving an open problem of Samadi et al.[NeurIPS18], and a polynomial time algorithm for NSW objective for k=2 groups. We also give approximation algorithms for k>2. Our technical contribution in the above results is to prove new low-rank properties of extreme point solutions to semi-definite programs. 
We conclude with the results of several experiments indicating improved performance and generalized application of our algorithm on real-world datasets.", "full_text": "Multi-Criteria Dimensionality Reduction with Applications to Fairness

Uthaipon (Tao) Tantipongpipat*†, Samira Samadi*‡, Mohit Singh*†, Jamie Morgenstern*, Santosh Vempala*‡

Abstract

Dimensionality reduction is a classical technique widely used for data analysis. One foundational instantiation is Principal Component Analysis (PCA), which minimizes the average reconstruction error. In this paper, we introduce the multi-criteria dimensionality reduction problem where we are given multiple objectives that need to be optimized simultaneously. As an application, our model captures several fairness criteria for dimensionality reduction such as the Fair-PCA problem introduced by Samadi et al. [2018] and the Nash Social Welfare (NSW) problem. In the Fair-PCA problem, the input data is divided into k groups, and the goal is to find a single d-dimensional representation for all groups for which the maximum reconstruction error of any one group is minimized. In NSW the goal is to maximize the product of the individual variances of the groups achieved by the common low-dimensional space.

Our main result is an exact polynomial-time algorithm for the two-criteria dimensionality reduction problem when the two criteria are increasing concave functions. As an application of this result, we obtain a polynomial time algorithm for Fair-PCA for k = 2 groups, resolving an open problem of Samadi et al. [2018], and a polynomial time algorithm for the NSW objective for k = 2 groups. We also give approximation algorithms for k > 2. Our technical contribution in the above results is to prove new low-rank properties of extreme point solutions to semi-definite programs.
We conclude with experiments indicating the effectiveness of algorithms based on extreme point solutions of semi-definite programs on several real-world datasets.

1 Introduction

Dimensionality reduction is the process of choosing a low-dimensional representation of a large, high-dimensional data set. It is a core primitive for modern machine learning and is used in image processing, biomedical research, time series analysis, etc. Dimensionality reduction can be used during the preprocessing of the data to reduce the computational burden as well as at the final stages of data analysis to facilitate data summarization and data visualization [Raychaudhuri et al., 1999; Iezzoni and Pritts, 1991]. Among the most ubiquitous and effective dimensionality reduction techniques in practice are Principal Component Analysis (PCA) [Pearson, 1901; Jolliffe, 1986; Hotelling, 1933], multidimensional scaling [Kruskal, 1964], Isomap [Tenenbaum et al., 2000], locally linear embedding [Roweis and Saul, 2000], and t-SNE [Maaten and Hinton, 2008].

*Georgia Institute of Technology. {tao,ssamadi6}@gatech.edu, mohit.singh@isye.gatech.edu, jamiemmt.cs@gatech.edu, vempala@cc.gatech.edu
†Supported by NSF-AF:1910423 and NSF-AF:1717947.
‡Supported in part by NSF awards CCF-1563838 and CCF-1717349.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

One of the major obstacles to dimensionality reduction tasks in practice is complex high-dimensional data structures that lie on multiple different low-dimensional subspaces. For example, Maaten and Hinton [2008] address this issue for low-dimensional visualization of images of objects from diverse classes seen from various viewpoints, and Samadi et al. [2018] study PCA on human data when different groups in the data (e.g., high-educated vs low-educated or men vs women) have an inherently different structure.
Although these two contexts might seem unrelated, our work presents a general framework that addresses both issues. In both settings, a single criterion for the dimensionality reduction might not be sufficient to capture the different structures in the data. This motivates our study of multi-criteria dimensionality reduction.

As an illustration, consider applying PCA on high-dimensional data to do a visualization analysis in low dimensions. Standard PCA aims to minimize the single criterion of average reconstruction error over the whole data. But the reconstruction error on different parts of the data can be widely different. In particular, Samadi et al. [2018] show that on real-world data sets, PCA has higher reconstruction error on images of women than on images of men. A similar phenomenon is also noticed on other data sets when groups are formed based on education. Unbalanced average reconstruction error, or equivalently unbalanced variance, could have implications of representational harms [Crawford, 2017] in early stages of data analysis.

Multi-criteria dimensionality reduction. Multi-criteria dimensionality reduction could be used as an umbrella term with specifications changing based on the applications and the metrics that the machine learning researcher has in mind. Aiming for an output with a balanced error over different subgroups seems to be a natural choice, as reflected by minimizing the maximum of average reconstruction errors, studied by Samadi et al. [2018], and maximizing the geometric mean of the variances of the groups, which is the well-studied Nash social welfare (NSW) objective [Kaneko and Nakamura, 1979; Nash Jr, 1950]. Motivated by these settings, the more general question that we would like to study is as follows.

Question 1.
How might one redefine dimensionality reduction to produce projections which optimize different groups’ representation in a balanced way?

For simplicity of explanation, we first describe our framework for PCA, but the approach is general and applies to a much wider class of dimensionality reduction techniques. Consider the data points as rows of an m × n matrix A. For PCA, the objective is to find an n × d projection matrix P that maximizes the Frobenius norm ||AP||_F^2 (this is equivalent to minimizing the reconstruction error). Suppose that the rows of A belong to different groups, based on demographics or some other semantically meaningful clustering. The definition of these groups need not be a partition; each group could be defined as a different weighting of the data set (rather than a subset, which is a 0/1 weighting). Multi-criteria dimensionality reduction can then be viewed as simultaneously considering objectives on the different weightings of A. One way to balance multiple objectives is to find a projection P that maximizes the minimum objective value over each of the groups (weightings), i.e.,

(FAIR-PCA)   max_{P : P^T P = I_d} min_{1 ≤ i ≤ k} ||A_i P||_F^2 = ⟨A_i^T A_i, P P^T⟩.

(We note that our FAIR-PCA is different from the one in Samadi et al. [2018], but equivalent by additive and multiplicative scalings.) More generally, let P_d denote the set of all n × d projection matrices P, i.e., matrices with d orthonormal columns. For each group A_i, we associate a function f_i : P_d → R that denotes the group’s objective value for a particular projection. For any g : R^k → R, we define the (f, g)-multi-criteria dimensionality reduction problem as finding a d-dimensional projection P which optimizes

g(f_1(P), f_2(P), . . .
, f_k(P)) over P ∈ P_d.

(MULTI-CRITERIA-DIMENSION-REDUCTION)

In the above example of max-min Fair-PCA, g is simply the min function and f_i(P) = ||A_i P||^2 is the total squared norm of the projection of vectors in A_i. Other examples include: defining each f_i as the average squared norm of the projections rather than the total, or the marginal variance — the difference in total squared norm when using P rather than the best possible projection for that group. One could also choose the product function g(y_1, . . . , y_k) = ∏_i y_i for the accumulating function g. This is also a natural choice, famously introduced in Nash’s solution to the bargaining problem [Nash Jr, 1950; Kaneko and Nakamura, 1979]. This framework can also describe the p-th power mean of the projections, e.g. f_i(P) = ||A_i P||^2 and g(y_1, . . . , y_k) = (∑_{i ∈ [k]} y_i^{p/2})^{1/p}.

The appropriate weighting of k objectives often depends on the context and application. The central motivating questions of this paper are the following:

• What is the complexity of FAIR-PCA?
• More generally, what is the complexity of MULTI-CRITERIA-DIMENSION-REDUCTION?

Framed another way, we ask whether these multi-criteria optimization problems force us to incur substantial computational cost compared to optimizing g over A alone. Samadi et al. [2018] introduced the problem of FAIR-PCA and showed how to use the natural semi-definite relaxation to find a rank-(d + k − 1) approximation whose cost is at most that of the optimal rank-d approximation. For k = 2 groups, this is an increase of 1 in the dimension (as opposed to the naive bound of 2d, obtained by taking the span of the optimal d-dimensional subspaces for the two groups). The computational complexity of finding the exact optimal solution to FAIR-PCA was left as an open question.

1.1 Results and techniques

Let us first focus on FAIR-PCA for ease of exposition.
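As a concrete illustration of these objectives, here is a minimal NumPy sketch (ours, not code from the paper; the function and variable names are illustrative) that evaluates f_i(P) = ||A_i P||_F^2 for each group and aggregates with g = min (max-min FAIR-PCA) or g = product (NSW):

```python
import numpy as np

def group_objectives(groups, P):
    # f_i(P) = ||A_i P||_F^2 = <A_i^T A_i, P P^T> for each group's data matrix A_i.
    return [np.linalg.norm(A @ P, "fro") ** 2 for A in groups]

def fair_pca_objective(groups, P):
    # g = min: the value max-min FAIR-PCA assigns to a fixed projection P.
    return min(group_objectives(groups, P))

def nsw_objective(groups, P):
    # g = product: the Nash social welfare objective.
    return float(np.prod(group_objectives(groups, P)))

rng = np.random.default_rng(0)
A1 = rng.standard_normal((30, 5))
A2 = rng.standard_normal((40, 5))
P = np.eye(5)[:, :2]  # a valid projection: two orthonormal columns, P^T P = I_2
print(fair_pca_objective([A1, A2], P), nsw_objective([A1, A2], P))
```

The sketch only evaluates a candidate P; the hard part, optimizing over all P with P^T P = I_d, is exactly what the algorithms in this paper address.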
The problem can be reformulated as the following mathematical program, where we denote P P^T by X. A natural approach to solving this problem is to consider the SDP relaxation obtained by relaxing the rank constraint to a bound on the trace.

Exact FAIR-PCA:
  max z
  subject to ⟨A_i^T A_i, X⟩ ≥ z for i ∈ {1, . . . , k}
  rank(X) ≤ d
  0 ⪯ X ⪯ I

SDP Relaxation of FAIR-PCA:
  max z
  subject to ⟨A_i^T A_i, X⟩ ≥ z for i ∈ {1, . . . , k}
  tr(X) ≤ d
  0 ⪯ X ⪯ I

Our first main result is that the SDP relaxation is exact when there are two groups. Thus finding an extreme point of this SDP gives an exact algorithm for FAIR-PCA for two groups. Previously, only approximation algorithms were known for this problem. This result also resolves the open problem posed by Samadi et al. [2018].

Theorem 1.1. Any optimal extreme point solution to the SDP relaxation for FAIR-PCA with two groups has rank at most d. Therefore, 2-group FAIR-PCA can be solved in polynomial time.

Given m datapoints partitioned into k ≤ n groups in n dimensions, the algorithm runs in O(mnk + n^{6.5}) time: O(mnk) is from computing the matrices A_i^T A_i and O(n^{6.5}) is from solving an SDP over n × n PSD matrices [Ben-Tal and Nemirovski, 2001]. Our results also hold for MULTI-CRITERIA-DIMENSION-REDUCTION when g is monotone nondecreasing in any one coordinate and concave, and each f_i is an affine function of P P^T (and thus a special case of a quadratic function in P).

Theorem 1.2. There is a polynomial time algorithm for the 2-group MULTI-CRITERIA-DIMENSION-REDUCTION problem when g is concave and monotone nondecreasing in at least one of its two arguments, and each f_i is linear in P P^T, i.e., f_i(P) = ⟨B_i, P P^T⟩ for some matrix B_i(A).

As indicated in the theorem, the core idea is that extreme-point solutions of the SDP, in fact, have rank d, not just trace equal to d.

For k > 2, the SDP need not recover a rank-d solution. In fact, the SDP may be inexact even for k = 3 (see Section 8).
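A quick sanity check on the relaxation in the single-group case (a NumPy sketch of ours, not the paper's code): with one group, relaxing rank(X) ≤ d to tr(X) ≤ d loses nothing, since the optimum of ⟨A^T A, X⟩ over {tr(X) ≤ d, 0 ⪯ X ⪯ I} is the sum of the top d eigenvalues of A^T A, attained by the rank-d projector onto the top eigenvectors:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 6))
B = A.T @ A                    # PSD matrix appearing in the objective <B, X>
d = 2

vals, vecs = np.linalg.eigh(B)  # eigenvalues in ascending order
V = vecs[:, -d:]                # top-d eigenvectors
X = V @ V.T                     # rank-d projector: tr(X) = d, 0 <= X <= I

# Value of the relaxation at X equals the sum of the top-d eigenvalues,
# which upper-bounds <B, X'> for every feasible X'.
sdp_value = float(np.trace(B @ X))
print(sdp_value, float(vals[-d:].sum()))
```

For k ≥ 2 groups no single group's eigenvectors suffice, which is where the extreme-point analysis of the SDP comes in.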
Nonetheless, we show that we can bound the rank of a solution to the SDP and obtain the following result. We state it for FAIR-PCA, though the same bound holds for MULTI-CRITERIA-DIMENSION-REDUCTION under the same assumptions as in Theorem 1.1. Note that this result generalizes Theorem 1.1.

Theorem 1.3. For any concave g that is monotone nondecreasing in at least one of its arguments, there exists a polynomial time algorithm for FAIR-PCA with k groups that returns a (d + ⌊√(2k + 1/4) − 3/2⌋)-dimensional embedding whose objective value is at least that of the optimal d-dimensional embedding. If g is only concave, then the solution lies in at most d + 1 dimensions.

This strictly improves and generalizes the bound of d + k − 1 for FAIR-PCA. Moreover, if the dimensionality of the solution is a hard constraint, instead of tolerating s = O(√k) extra dimensions in the solution, one may solve FAIR-PCA for target dimension d − s to guarantee a solution of rank at most d. Thus, we obtain an approximation algorithm for FAIR-PCA of factor 1 − O(√k)/d.

Theorem 1.4. Let A_1, . . . , A_k be data sets of k groups and suppose s := ⌊√(2k + 1/4) − 3/2⌋ < d. Then, there exists a polynomial-time approximation algorithm of factor 1 − s/d = 1 − O(√k)/d for the FAIR-PCA problem.

That is, the algorithm returns a projection P ∈ P_d of exact rank d with objective at least 1 − s/d times the optimal objective. More details on the approximation result are in Section 4. The runtime of Theorems 1.2 and 1.3 depends on access to a first order oracle for g, and a standard application of the ellipsoid algorithm would take Õ(n^2) oracle calls.

We now focus our attention on the marginal loss function. This measures the maximum over the groups of the difference between the variance of a common solution for the k groups and an optimal solution for an individual group (“the marginal cost of sharing a common subspace”).
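The marginal loss of a shared projection can be computed directly, since the per-group optimum max_{Q ∈ P_d} ||A_i Q||_F^2 is the sum of the top d eigenvalues of A_i^T A_i. A short sketch (ours; names are illustrative):

```python
import numpy as np

def best_variance(A, d):
    # max over rank-d projections Q of ||A Q||_F^2 = sum of top-d eigenvalues of A^T A.
    return float(np.linalg.eigvalsh(A.T @ A)[-d:].sum())

def marginal_loss(groups, P):
    # max_i ( best d-dim variance for group i - variance group i gets under shared P ).
    d = P.shape[1]
    return max(best_variance(A, d) - np.linalg.norm(A @ P, "fro") ** 2 for A in groups)

rng = np.random.default_rng(2)
A = rng.standard_normal((40, 6))
d = 3
# For a single group, projecting onto its own top-d eigenvectors gives zero marginal loss.
P_opt = np.linalg.eigh(A.T @ A)[1][:, -d:]
print(marginal_loss([A], P_opt))
```

With several groups no single P can, in general, drive every term to zero, which is what makes the objective nontrivial.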
For this problem, the above scaling method could substantially harm the objective value, since the target function is nonlinear. MULTI-CRITERIA-DIMENSION-REDUCTION captures the marginal loss function by setting the utility f_i(P) = ||A_i P||_F^2 − max_{Q ∈ P_d} ||A_i Q||_F^2 for each group i and g(f_1, f_2, . . . , f_k) := min{f_1, f_2, . . . , f_k}, giving the optimization problem

(1)   min_{P ∈ P_d} max_{i ∈ [k]} ( max_{Q ∈ P_d} ||A_i Q||_F^2 − ||A_i P||_F^2 )

and the marginal loss objective is indeed the objective of this problem.

In Section 5, we develop a general rounding framework for SDPs with eigenvalue upper bounds and k other linear constraints. This algorithm gives a solution of desired rank that violates each constraint by a bounded amount. The precise statement is Theorem 1.8. It implies that for FAIR-PCA with marginal loss as the objective, the additive error is

Δ(A) := max_{S ⊆ [k]} ∑_{i=1}^{⌊√(2|S|+1)⌋} σ_i(A_S),   where A_S = (1/|S|) ∑_{i ∈ S} A_i.

It is natural to ask whether FAIR-PCA is NP-hard to solve exactly. The following result implies that it is, even for the target dimension d = 1.

Theorem 1.5. The max-min FAIR-PCA problem for target dimension d = 1 is NP-hard when the number of groups k is part of the input.

This raises the question of the complexity for constant k ≥ 3 groups. For k groups, we would have k constraints, one for each group, plus the eigenvalue constraint and the trace constraint; now the tractability of the problem is far from clear. In fact, as we show in Section 8, the SDP has an integrality gap even for k = 3, d = 1. We therefore consider an approach beyond SDPs, one that involves solving non-convex problems. Thanks to the powerful algorithmic theory of quadratic maps, developed by Grigoriev and Pasechnik [2005], it is polynomial-time solvable to check feasibility of a set of quadratic constraints for any fixed k.
As we discuss next, their algorithm can check for zeros of a function of a set of k quadratic functions, and can be used to optimize the function. Using this result, we show that for d = k = O(1), there is a polynomial-time algorithm for rather general functions g of the values of individual groups.

Theorem 1.6. Let the fairness objective be g : R^k → R where g is a degree-ℓ polynomial in some computable subring of R^k and each f_i is quadratic for 1 ≤ i ≤ k. Then there is an algorithm to solve the fair dimensionality reduction problem in time (ℓdn)^{O(k + d^2)}.

By choosing g to be the product polynomial over the usual (×, +) ring, or the min function, which is degree k in the (min, +) ring, this applies to the variants of FAIR-PCA discussed above and various other problems.

SDP extreme points. For k = 2, the underlying structural property we show is that extreme point solutions of the SDP have rank exactly d. First, for k = d = 1, this is the largest eigenvalue problem, since the maximum obtained by a matrix of trace equal to 1 can also be obtained by one of the extreme points in the convex decomposition of this matrix. This extends to trace equal to any d, i.e., the optimal solution must be given by the top d eigenvectors of A^T A. Second, without the eigenvalue bound, for any SDP with k constraints, there is an upper bound of O(√k) on the rank of any extreme point, a seminal result of Pataki [1998] (see also Barvinok [1995]). However, we cannot apply this directly as we have the eigenvalue upper bound constraint. The complication here is that we have to take into account the constraint X ⪯ I without increasing the rank.

Theorem 1.7. Let C and A_1, . . . , A_m be n × n real matrices, d ≤ n, and b_1, . . . , b_m ∈ R. Suppose the semi-definite program SDP(I):

  min ⟨C, X⟩ subject to   (2)
  ⟨A_i, X⟩ ⋄_i b_i for all 1 ≤ i ≤ m   (3)
  tr(X) ≤ d   (4)
  0 ⪯ X ⪯ I_n   (5)

where ⋄_i ∈ {≤, ≥, =}, has a nonempty feasible set.
Then, all extreme optimal solutions X* to SDP(I) have rank at most r* := d + ⌊√(2m + 9/4) − 3/2⌋. Moreover, given a feasible optimal solution, an extreme optimal solution can be found in polynomial time.

To prove the theorem, we extend Pataki [1998]’s characterization of the rank of SDP extreme points with minimal loss in the rank. We show that the constraints 0 ⪯ X ⪯ I can be interpreted as a generalization of restricting variables to lie between 0 and 1 in the case of linear programming relaxations. From a technical perspective, our results give new insights into structural properties of extreme points of semi-definite programs and more general convex programs. Since the result of Pataki [1998] has been studied from the perspective of fast algorithms [Boumal et al., 2016; Burer and Monteiro, 2003, 2005] and applied in community detection and phase synchronization [Bandeira et al., 2016], we expect our extension of the result to have further applications in many of these areas.

SDP iterative rounding. Using Theorem 1.7, we extend the iterative rounding framework for linear programs (see Lau et al. [2011] and references therein) to semi-definite programs, where the 0, 1 constraints are generalized to eigenvalue bounds. The algorithm has a remarkably similar flavor. In each iteration, we fix the subspaces spanned by eigenvectors with 0 and 1 eigenvalues, and argue that one of the constraints can be dropped while bounding the total violation in the constraint over the course of the algorithm. While this applies directly to the FAIR-PCA problem, it is, in fact, a general statement for SDPs, which we give below.

Let A = {A_1, . . . , A_m} be a collection of n × n matrices. For any set S ⊆ {1, . . . , m}, let σ_i(S) be the i-th largest singular value of the average of matrices A_S = (1/|S|) ∑_{i ∈ S} A_i. We let

Δ(A) := max_{S ⊆ [m]} ∑_{i=1}^{⌊√(2|S|+1)⌋} σ_i(S).

Theorem 1.8. Let C be an n × n matrix and A = {A_1, . . . , A_m} be a collection of n × n real matrices, d ≤ n, and b_1, . . . , b_m ∈ R. Suppose the semi-definite program SDP:

  min ⟨C, X⟩ subject to
  ⟨A_i, X⟩ ≥ b_i for all 1 ≤ i ≤ m
  tr(X) ≤ d
  0 ⪯ X ⪯ I_n

has a nonempty feasible set and let X* denote an optimal solution. The algorithm ITERATIVE-SDP (see Figure 2 in the Appendix) returns a matrix X̃ such that

1. the rank of X̃ is at most d,
2. ⟨C, X̃⟩ ≤ ⟨C, X*⟩, and
3. ⟨A_i, X̃⟩ ≥ b_i − Δ(A) for each 1 ≤ i ≤ m.

The time complexity of Theorems 1.7 and 1.8 is analyzed in Sections 2 and 5. Both algorithms introduce rounding procedures that do not contribute significant computational cost; rather, solving the SDP is the bottleneck for running time both in theory and practice.

1.2 Related work

As mentioned earlier, Pataki [1998] (see also Barvinok [1995]) showed that low-rank solutions to semi-definite programs with a small number of affine constraints can be obtained efficiently. Restricting the feasible region of certain SDP relaxations with low-rank constraints has been shown to avoid spurious local optima [Bandeira et al., 2016] and to reduce the runtime due to known heuristics and analysis [Burer and Monteiro, 2003, 2005; Boumal et al., 2016]. We also remark that methods based on the Johnson-Lindenstrauss lemma can be applied to obtain bi-criteria results for the FAIR-PCA problem. For example, So et al. [2008] give algorithms that produce low-rank solutions for SDPs with affine constraints without the upper bound on eigenvalues. Here we have focused on the single-criterion setting, with violation either in the number of dimensions or in the objective, but not both.
We also remark that extreme point solutions to linear programming have played an important role in the design of approximation algorithms [Lau et al., 2011], and our result adds to the comparatively small, but growing, number of applications utilizing extreme points of semi-definite programs.

A closely related area, especially to the MULTI-CRITERIA-DIMENSION-REDUCTION problem, is multi-objective optimization, which has a vast literature. We refer the reader to Deb [2014] and references therein. We also remark that properties of extreme point solutions of linear programs [Ravi and Goemans, 1996; Grandoni et al., 2014] have been utilized to obtain approximation algorithms for multi-objective problems. For semi-definite programming based methods, the closest works are on simultaneous max-cut [Bhangale et al., 2015, 2018], which utilize the sum of squares hierarchy to obtain improved approximation algorithms.

The applications of multi-criteria dimensionality reduction in fairness are closely related to studies on representational bias in machine learning [Crawford, 2017; Noble, 2018; Bolukbasi et al., 2016] and fair resource allocation in game theory [Wei et al., 2010; Fang and Bensaou, 2004]. There have been various mathematical formulations suggested for representational bias in ML [Chierichetti et al., 2017; Celis et al., 2018; Kleindessner et al., 2019; Samadi et al., 2018], among which our model covers unbalanced reconstruction error in PCA suggested by Samadi et al. [2018]. From the game theory literature, our model covers the Nash social welfare objective [Kaneko and Nakamura, 1979; Nash Jr, 1950] and others [Kalai et al., 1975; Kalai, 1977].

2 Low-rank solutions of MULTI-CRITERIA-DIMENSION-REDUCTION

In this section, we show that all extreme solutions of the SDP relaxation of MULTI-CRITERIA-DIMENSION-REDUCTION have low rank, proving Theorems 1.1-1.3. Before we state the results, we make the following assumptions.
In this section, we let g : R^k → R be a concave function which is monotonic in at least one coordinate, and mildly assume that g can be accessed with a polynomial-time subgradient oracle and is polynomially bounded by its input. We are explicitly given functions f_1, f_2, . . . , f_k which are affine in P P^T, i.e., we are given real n × n matrices B_1, . . . , B_k and constants α_1, α_2, . . . , α_k ∈ R, and f_i(P) = ⟨B_i, P P^T⟩ + α_i.

We assume g to be G-Lipschitz. For functions f_1, . . . , f_k, g that are L_1, . . . , L_k, G-Lipschitz, we define an ε-optimal solution to the (f, g)-MULTI-CRITERIA-DIMENSION-REDUCTION problem as a projection matrix X ∈ R^{n×n}, 0 ⪯ X ⪯ I_n, of rank d whose objective value is within Gε(∑_{i=1}^k L_i^2)^{1/2} of the optimum. In the context where an optimization problem has affine constraints F_i(X) ≤ b_i where F_i is L_i-Lipschitz, we also define an ε-solution as a projection matrix X ∈ R^{n×n}, 0 ⪯ X ⪯ I_n, of rank d that violates the i-th affine constraint by at most εL_i. Note that the feasible region of the problem is implicitly bounded by the constraint X ⪯ I_n.

In this section, the algorithm may involve solving an optimization problem under a matrix linear inequality, which may not give an answer representable in a finite number of bits. However, we give algorithms that return an ε-close solution whose running time depends polynomially on log(1/ε) for any ε > 0. This is standard for computational tractability in convex optimization (see, for example, Ben-Tal and Nemirovski [2001]). Therefore, for ease of exposition, we omit the computational error dependent on this ε, obtaining an ε-feasible and ε-optimal solution, and define polynomial running time as polynomial in n, k and log(1/ε).

We first prove Theorem 1.7 below.
To prove Theorems 1.1-1.3, we first show that extreme point solutions in the semi-definite cone under affine constraints and X ⪯ I have low rank. The statement builds on a result of Pataki [1998]. We then apply our result to the MULTI-CRITERIA-DIMENSION-REDUCTION problem, which contains the FAIR-PCA problem. Finally, we show that the existence of a low-rank solution leads to an approximation algorithm for the FAIR-PCA problem.

Proof of Theorem 1.7: Let X* be an extreme point optimal solution to SDP(I). Suppose the rank of X*, say r, is more than r*. Then we show a contradiction to the fact that X* is extreme. Let 0 ≤ l ≤ r of the eigenvalues of X* be equal to one. If l ≥ d, then we have l = r = d since tr(X) ≤ d, and we are done. Thus we assume that l ≤ d − 1. In that case, there exist matrices Q_1 ∈ R^{n×(r−l)}, Q_2 ∈ R^{n×l} and a symmetric matrix Λ ∈ R^{(r−l)×(r−l)} such that

X* = (Q_1 Q_2) diag(Λ, I_l) (Q_1 Q_2)^T = Q_1 Λ Q_1^T + Q_2 Q_2^T,

where 0 ≺ Λ ≺ I_{r−l}, Q_1^T Q_1 = I_{r−l}, Q_2^T Q_2 = I_l, and the columns of Q_1 and Q_2 are orthogonal, i.e., Q = (Q_1 Q_2) has orthonormal columns. Now, we have

⟨A_i, X*⟩ = ⟨A_i, Q_1 Λ Q_1^T + Q_2 Q_2^T⟩ = ⟨Q_1^T A_i Q_1, Λ⟩ + ⟨A_i, Q_2 Q_2^T⟩

and tr(X*) = ⟨Q_1^T Q_1, Λ⟩ + tr(Q_2 Q_2^T), so that ⟨A_i, X*⟩ and tr(X*) are linear in Λ.

Observe that the set of s × s symmetric matrices forms a vector space of dimension s(s+1)/2 with the above inner product, where we consider the matrices as long vectors. If m + 1 < (r−l)(r−l+1)/2, then there exists an (r−l) × (r−l) symmetric matrix Δ ≠ 0 such that ⟨Q_1^T A_i Q_1, Δ⟩ = 0 for each 1 ≤ i ≤ m and ⟨Q_1^T Q_1, Δ⟩ = 0.

But then we claim that Q_1(Λ ± δΔ)Q_1^T + Q_2 Q_2^T is feasible for small δ > 0, which implies a contradiction to X* being extreme. Indeed, it satisfies all the linear constraints by construction of Δ. Thus it remains to check the eigenvalues of the newly constructed matrix. Observe that

Q_1(Λ ± δΔ)Q_1^T + Q_2 Q_2^T = Q diag(Λ ± δΔ, I_l) Q^T

with orthonormal Q. Thus it is enough to consider the eigenvalues of diag(Λ ± δΔ, I_l). The eigenvalues of this matrix are exactly l ones together with the eigenvalues of Λ ± δΔ. Since the eigenvalues of Λ are bounded away from 0 and 1, one can find a small δ such that the eigenvalues of Λ ± δΔ are bounded away from 0 and 1 as well, so we are done. Therefore, we must have m + 1 ≥ (r−l)(r−l+1)/2, which implies r − l ≤ √(2m + 9/4) − 1/2. By l ≤ d − 1, we have r ≤ r*.

For the algorithmic version, given a feasible X̄, we iteratively reduce r − l by at least one until m + 1 ≥ (r−l)(r−l+1)/2. While m + 1 < (r−l)(r−l+1)/2, we obtain Δ by using Gaussian elimination. Now we want to find the correct value of ±δ so that Λ' = Λ ± δΔ takes one of the eigenvalues to zero or one. First, determine the sign of ⟨C, Q_1 Δ Q_1^T⟩ to find the correct direction to move Λ that keeps the objective non-increasing, say it is the positive direction. Since the set of feasible X is convex and bounded, the ray f(t) = Q_1(Λ + tΔ)Q_1^T + Q_2 Q_2^T, t ≥ 0, intersects the boundary of the feasible region at a unique t_0 > 0. Perform binary search for the correct value of t_0 and set δ = t_0 up to the desired accuracy. Since ⟨Q_1^T A_i Q_1, Δ⟩ = 0 for each 1 ≤ i ≤ m and ⟨Q_1^T Q_1, Δ⟩ = 0, the additional tight constraint from moving Λ' = Λ + δΔ to the boundary of the feasible region must be an eigenvalue constraint from 0 ⪯ X ⪯ I_n, i.e., at least one additional eigenvalue is now at 0 or 1, as desired. We apply an eigenvalue decomposition to Λ' and update Q_1 accordingly, and repeat.

The algorithm involves at most n rounds of reducing r − l, each of which involves Gaussian elimination and several iterations (from binary search) of checking 0 ⪯ X ⪯ I_n, which can be done by eigenvalue decomposition.
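The decomposition used repeatedly in this argument, separating the eigenvalue-1 block Q_2 from the fractional block Q_1 with 0 ≺ Λ ≺ I, is straightforward to carry out numerically. A sketch of just this splitting step (ours, not the paper's implementation; the tolerance is illustrative):

```python
import numpy as np

def split_spectrum(X, tol=1e-9):
    # Write a feasible 0 <= X <= I as X = Q1 @ Lam @ Q1.T + Q2 @ Q2.T,
    # where Q2 spans the eigenvalue-1 eigenspace and Lam holds the
    # fractional eigenvalues (strictly between 0 and 1).
    vals, vecs = np.linalg.eigh(X)
    ones = vals >= 1 - tol
    frac = (vals > tol) & ~ones
    Q2 = vecs[:, ones]
    Q1 = vecs[:, frac]
    Lam = np.diag(vals[frac])
    return Q1, Lam, Q2

# Example: l = 2 eigenvalues at one, two fractional eigenvalues, one at zero.
Q, _ = np.linalg.qr(np.random.default_rng(3).standard_normal((5, 5)))
X = Q @ np.diag([1.0, 1.0, 0.6, 0.3, 0.0]) @ Q.T
Q1, Lam, Q2 = split_spectrum(X)
print(Q1.shape, Q2.shape)
```

Each rounding iteration then works only inside the (r − l)-dimensional fractional block, which is what makes the dimension-counting argument go through.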
Gaussian elimination and eigenvalue decomposition can be done in O(n^3) time, and therefore the total runtime of the SDP rounding is Õ(n^4), which is polynomial.

In practice, one may initially reduce the rank of a given feasible X̄ using an LP rounding (in O(n^{3.5}) time) introduced in Samadi et al. [2018], so that the number of rounds of reducing r − l is further bounded by k − 1. The runtime complexity is then O(n^{3.5}) + Õ(kn^3).

The next corollary is obtained from the bound r − l ≤ √(2m + 9/4) − 1/2 in the proof of Theorem 1.7.

Corollary 2.1. The number of fractional eigenvalues in any extreme point solution X to SDP(I) is at most √(2m + 9/4) − 1/2, and hence at most ⌊√(2m + 1)⌋.

We are now ready to state the main result of this section: we can find a low-rank solution for MULTI-CRITERIA-DIMENSION-REDUCTION. Recall that P_d is the set of all n × d projection matrices P, i.e., matrices with d orthonormal columns, and the (f, g)-MULTI-CRITERIA-DIMENSION-REDUCTION problem is to solve

(6)   max_{P ∈ P_d} g(f_1(P), f_2(P), . . . , f_k(P)).

Theorem 2.2. There exists a polynomial-time algorithm for (f, g)-MULTI-CRITERIA-DIMENSION-REDUCTION that returns a solution X̂ of rank at most r* := d + ⌊√(2k + 1/4) − 3/2⌋ whose objective value is at least that of the optimal d-dimensional embedding.

The proof of Theorem 2.2 appears in the Appendix. If the assumption that g is monotonic in at least one coordinate is dropped, Theorem 2.2 still holds, with r* obtained by indexing constraints (11) in SDP(II) over all k groups instead of k − 1 groups.

3 Experiments

First, we note that experiments for two groups were done in Samadi et al. [2018]. Their algorithm outputs optimal solutions with exact rank, despite the weaker guarantee that the rank may be violated by at most 1. Hence, our result of Theorem 1.1 is a mathematical explanation of their empirical finding for two groups.
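Exactness of the rank of an SDP solution can be verified numerically by counting eigenvalues strictly between 0 and 1, which Corollary 2.1 bounds by ⌊√(2m + 1)⌋. A small helper for such a check (ours, not from the paper's repository; the tolerance is illustrative):

```python
import math
import numpy as np

def count_fractional_eigenvalues(X, tol=1e-6):
    # Eigenvalues of an extreme point of SDP(I) lie in [0, 1]; any rank violation
    # beyond the trace bound comes from eigenvalues strictly inside (0, 1).
    vals = np.linalg.eigvalsh(X)
    return int(np.sum((vals > tol) & (vals < 1 - tol)))

# Example: a PSD matrix with eigenvalues (1, 1, 0.5, 0) has one fractional eigenvalue.
Q, _ = np.linalg.qr(np.random.default_rng(4).standard_normal((4, 4)))
X = Q @ np.diag([1.0, 1.0, 0.5, 0.0]) @ Q.T
m = 3  # e.g., one linear constraint per group for k = 3 groups
print(count_fractional_eigenvalues(X), math.floor(math.sqrt(2 * m + 1)))
```

A count of zero means the solution is an exact projection of the stated rank, which is what the experiments below repeatedly observe.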
We extend their experiments to a larger number of groups and more objectives, as follows (see the Appendix for results on the NSW objective and an additional dataset). We perform experiments using the algorithm outlined in Section 2 on the Default Credit data set [Yeh and Lien, 2009] for different target dimensions $d$. The data is partitioned into $k = 4, 6$ groups by education and gender, and preprocessed to have mean zero and the same variance over features. We instantiate our algorithm with two objectives for the MULTI-CRITERIA-DIMENSION-REDUCTION problem introduced earlier: the marginal loss function and Nash social welfare. The code is publicly available at https://github.com/SDPforAll/multiCriteriaDimReduction. Figure 1 shows the marginal loss of our algorithms compared to standard PCA. Our algorithms significantly reduce the "unfairness", in terms of marginal loss, that standard PCA introduces.

In the experiments, extreme point solutions from the SDPs enjoy lower rank violation than our worst-case guarantee. Indeed, while the guarantee is that the rank violation is at most $s = 1, 2$ for $k = 4, 6$, respectively, almost all SDP solutions have exact rank, and in the rare cases when the solutions are not exact, the rank violation is only one. While we know that our rank violation guarantee cannot be improved in general (due to the integrality gap in Section 8), this opens the question of whether the guarantee is better for instances that arise in practice.

Figure 1: Marginal loss function (see (1)) of standard PCA compared to our SDP-based algorithms on Default Credit data. SDPRoundNSW and SDPRoundMar-Loss are two runs of the SDP-based algorithms maximizing NSW and minimizing marginal loss. Left: k = 4 groups. Right: k = 6.

References

Afonso S. Bandeira, Nicolas Boumal, and Vladislav Voroninski. On the low-rank approach for semidefinite programs arising in synchronization and community detection.
In Conference on Learning Theory, pages 361–382, 2016.

Alexander I. Barvinok. Feasibility testing for systems of real quadratic equations. Discrete & Computational Geometry, 10(1):1–13, 1993.

Alexander I. Barvinok. Problems of distance geometry and convex properties of quadratic maps. Discrete & Computational Geometry, 13(2):189–202, 1995.

Ahron Ben-Tal and Arkadi Nemirovski. Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications, volume 2. SIAM, 2001.

Amey Bhangale, Swastik Kopparty, and Sushant Sachdeva. Simultaneous approximation of constraint satisfaction problems. In International Colloquium on Automata, Languages, and Programming, pages 193–205. Springer, 2015.

Amey Bhangale, Subhash Khot, Swastik Kopparty, Sushant Sachdeva, and Devanathan Thimvenkatachari. Near-optimal approximation algorithm for simultaneous Max-Cut. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1407–1425. Society for Industrial and Applied Mathematics, 2018.

Tolga Bolukbasi, Kai-Wei Chang, James Y. Zou, Venkatesh Saligrama, and Adam T. Kalai. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In Advances in Neural Information Processing Systems, pages 4349–4357, 2016.

Nicolas Boumal, Vlad Voroninski, and Afonso Bandeira. The non-convex Burer-Monteiro approach works on smooth semidefinite programs. In Advances in Neural Information Processing Systems, pages 2757–2765, 2016.

Samuel Burer and Renato D. C. Monteiro. A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Mathematical Programming, 95(2):329–357, 2003.

Samuel Burer and Renato D. C. Monteiro. Local minima and convergence in low-rank semidefinite programming. Mathematical Programming, 103(3):427–444, 2005.

L. Elisa Celis, Vijay Keswani, Damian Straszak, Amit Deshpande, Tarun Kathuria, and Nisheeth K. Vishnoi. Fair and diverse DPP-based data summarization. arXiv preprint arXiv:1802.04023, 2018.

Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, and Sergei Vassilvitskii. Fair clustering through fairlets. In Advances in Neural Information Processing Systems, pages 5029–5037, 2017.

Kate Crawford. The trouble with bias, 2017. URL http://blog.revolutionanalytics.com/2017/12/the-trouble-with-bias-by-kate-crawford.html. Invited talk by Kate Crawford at NIPS 2017, Long Beach, CA.

Kalyanmoy Deb. Multi-objective optimization. In Search Methodologies, pages 403–449. Springer, 2014.

Zuyuan Fang and Brahim Bensaou. Fair bandwidth sharing algorithms based on game theory frameworks for wireless ad hoc networks. In IEEE INFOCOM, volume 2, pages 1284–1295. Citeseer, 2004.

Fabrizio Grandoni, R. Ravi, Mohit Singh, and Rico Zenklusen. New approaches to multi-objective optimization. Mathematical Programming, 146(1-2):525–554, 2014.

D. Yu. Grigor'ev and N. N. Vorobjov Jr. Solving systems of polynomial inequalities in subexponential time. Journal of Symbolic Computation, 5(1-2):37–64, 1988.

Dima Grigoriev and Dmitrii V. Pasechnik. Polynomial-time computing over quadratic maps I: sampling in real algebraic sets. Computational Complexity, 14(1):20–52, 2005.

Harold Hotelling. Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24(6):417, 1933.

Amy F. Iezzoni and Marvin P. Pritts. Applications of principal component analysis to horticultural research. HortScience, 26(4):334–338, 1991.

Ian T. Jolliffe. Principal component analysis and factor analysis. In Principal Component Analysis, pages 115–128. Springer, 1986.

Ehud Kalai. Proportional solutions to bargaining situations: interpersonal utility comparisons. Econometrica: Journal of the Econometric Society, pages 1623–1630, 1977.

Ehud Kalai, Meir Smorodinsky, et al. Other solutions to Nash's bargaining problem. Econometrica, 43(3):513–518, 1975.

Mamoru Kaneko and Kenjiro Nakamura. The Nash social welfare function. Econometrica: Journal of the Econometric Society, pages 423–435, 1979.

Matthäus Kleindessner, Samira Samadi, Pranjal Awasthi, and Jamie Morgenstern. Guarantees for spectral clustering with fairness constraints. arXiv preprint arXiv:1901.08668, 2019.

Joseph B. Kruskal. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29(1):1–27, 1964.

Lap Chi Lau, Ramamoorthi Ravi, and Mohit Singh. Iterative Methods in Combinatorial Optimization, volume 46. Cambridge University Press, 2011.

Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov):2579–2605, 2008.

John F. Nash Jr. The bargaining problem. Econometrica: Journal of the Econometric Society, pages 155–162, 1950.

Safiya Umoja Noble. Algorithms of Oppression: How Search Engines Reinforce Racism. NYU Press, 2018.

Gábor Pataki. On the rank of extreme matrices in semi-definite programs and the multiplicity of optimal eigenvalues. Mathematics of Operations Research, 23(2):339–358, 1998.

Karl Pearson. On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2(11):559–572, 1901.

Ram Ravi and Michel X. Goemans. The constrained minimum spanning tree problem. In Scandinavian Workshop on Algorithm Theory, pages 66–75. Springer, 1996.

Soumya Raychaudhuri, Joshua M. Stuart, and Russ B. Altman. Principal components analysis to summarize microarray experiments: application to sporulation time series. In Biocomputing 2000, pages 455–466. World Scientific, 1999.

UCI Machine Learning Repository. Adult data set. https://archive.ics.uci.edu/ml/datasets/adult. Accessed May 2019.

Sam T. Roweis and Lawrence K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, 2000.

Samira Samadi, Uthaipon Tantipongpipat, Jamie H. Morgenstern, Mohit Singh, and Santosh Vempala. The price of fair PCA: One extra dimension. In Advances in Neural Information Processing Systems, pages 10976–10987, 2018.

Anthony Man-Cho So, Yinyu Ye, and Jiawei Zhang. A unified theorem on SDP rank reduction. Mathematics of Operations Research, 33(4):910–920, 2008.

Joshua B. Tenenbaum, Vin De Silva, and John C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319–2323, 2000.

Guiyi Wei, Athanasios V. Vasilakos, Yao Zheng, and Naixue Xiong. A game-theoretic method of fair resource allocation for cloud computing services. The Journal of Supercomputing, 54(2):252–269, 2010.

I-Cheng Yeh and Che-hui Lien. The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Systems with Applications, 36(2):2473–2480, 2009.