{"title": "Multiclass Performance Metric Elicitation", "book": "Advances in Neural Information Processing Systems", "page_first": 9356, "page_last": 9365, "abstract": "Metric Elicitation is a principled framework for selecting the performance metric that best reflects implicit user preferences. However, available strategies have so far been limited to binary classification. In this paper, we propose novel strategies for eliciting multiclass classification performance metrics using only relative preference feedback. We also show that the strategies are robust to both finite sample and feedback noise.", "full_text": "Multiclass Performance Metric Elicitation\n\nGaurush Hiranandani\n\nDepartment of Computer Science\n\nShant Boodaghians\n\nDepartment of Computer Science\n\nUniversity of Illinois at Urbana-Champaign\n\nUniversity of Illinois at Urbana-Champaign\n\ngaurush2@illinois.edu\n\nboodagh2@illinois.edu\n\nRuta Mehta\n\nDepartment of Computer Science\n\nOluwasanmi Koyejo\n\nDepartment of Computer Science\n\nUniversity of Illinois at Urbana-Champaign\n\nUniversity of Illinois at Urbana-Champaign\n\nrutameht@illinois.edu\n\nsanmi@illinois.edu\n\nAbstract\n\nMetric Elicitation is a principled framework for selecting the performance metric\nthat best re\ufb02ects implicit user preferences. However, available strategies have so\nfar been limited to binary classi\ufb01cation. In this paper, we propose novel strategies\nfor eliciting multiclass classi\ufb01cation performance metrics using only relative pref-\nerence feedback. We also show that the strategies are robust to both \ufb01nite sample\nand feedback noise.\n\n1\n\nIntroduction\n\nConsider a machine learning model for cancer diagnosis and treatment support where the doctor\napplies a cost-sensitive predictive model to classify patients into cancer categories [23, 24]. 
It is clear that the chosen costs directly determine the model decisions, and thus dictate the patient outcomes. This raises an obvious question: how should the cost tradeoffs be chosen so that they reflect the expert's decision-making? As it turns out, going from expert intuition to precise quantitative cost tradeoffs is often difficult. Needless to say, this is not only true for medical applications; there are a plethora of domains where the question of 'what to measure' poses a serious ongoing challenge [3].
To address this issue, Hiranandani et al. [7] recently formalized the problem of Metric Elicitation (ME), which aims to determine the user's performance metric based on preference feedback. The motivation behind ME is that employing performance metrics which reflect innate user tradeoffs allows one to learn models that best capture user preferences. As humans are often inaccurate in providing absolute quality feedback [17], Hiranandani et al. [7] propose to use pairwise comparison queries, where the user (oracle) is asked to compare two classifiers and provide an indicator of relative preference. They show that in various settings, the user's innate metric can be elicited based on this preference feedback. Figure 1 (reproduced from Hiranandani et al. [7]) illustrates this framework.
Conceptually, ME is applicable to any learning setting. However, Hiranandani et al. [7] only proposed methods for eliciting binary classification performance metrics. This manuscript extends prior work by proposing ME strategies for the more complicated multiclass classification setting – thus significantly increasing the use cases for ME. 
Similar to the binary case, we also consider the most common families of performance metrics, which are functions of the confusion matrix [15]; however, in our case, the elements of the confusion matrix summarize multiclass error statistics.
In order to perform efficient multiclass performance metric elicitation, we study novel geometric properties of the space of multiclass confusion matrices. Our analysis reveals that due to structural differences between the spaces of binary and multiclass confusions, we cannot trivially extend the elicitation procedure used for the binary case to the multiclass case. Instead, we provide novel strategies for

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

Table 1: The Bayes Optimal (BO) and Restricted-Bayes Optimal (RBO).

Name | Definition
BO confusion c over a subset S ⊆ C | argmax_{c ∈ S ⊆ C} φ(c)
RBO classifier h_{k1,k2} | argmax_{h ∈ H_{k1,k2}} ψ(d(h))
RBO diagonal confusion d_{k1,k2} | argmax_{d ∈ D_{k1,k2}} ψ(d)

Figure 1: Metric Elicitation framework [7].

eliciting linear functions of the multiclass confusion matrix, and we extend elicitation to more complicated yet popular functional forms such as linear-fractional functions of the confusion matrix elements [14]. Specifically, the elicitation procedures involve binary-search type algorithms that are robust to both finite sample and oracle feedback noise. In addition, the proposed methods can be applied either by querying pairwise classifier preferences or pairwise confusion matrix preferences. We find that this equivalence is crucial for practical applications.
In summary, our main contributions are novel query-efficient metric elicitation algorithms for multiclass classification. 
We study ME for linear functions of the confusion matrix and then briefly discuss extensions to more complicated functional forms such as linear-fractional and arbitrary monotonic functions of the confusion matrix (with details in the appendix). Lastly, we show that the proposed procedures are robust to finite sample and feedback noise, and thus are useful in practice.
Notation. Matrices and vectors are denoted by bold upper case and bold lower case letters, respectively. Let R and Z+ denote the set of reals and positive integers, respectively. For k ∈ Z+, we denote the index set {1, 2, ···, k} by [k]. Δ_k denotes the (k − 1)-dimensional simplex. ‖·‖_1, ‖·‖_2, and ‖·‖_∞ denote the ℓ_1-norm, ℓ_2-norm, and ℓ_∞-norm, respectively. We denote the inner product of two vectors by ⟨·,·⟩. Given a matrix A, off-diag(A) returns a vector of the off-diagonal elements of A in row-major form, and diag(A) returns a vector of the diagonal elements of A.

2 Preliminaries
The standard multiclass classification setting comprises k classes, with X ∈ X and Y ∈ [k] representing the input and output random variables, respectively. We have access to a dataset of size n, denoted by {(x, y)_i}_{i=1}^n, generated iid from a distribution P(X, Y). Let η_i(x) = P(Y = i | X = x) and ζ_i = P(Y = i) for i ∈ [k] be the conditional and the unconditional probabilities of the k classes, respectively. Let H = {h : X → Δ_k} be the set of all classifiers. 
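As a concrete reading of this notation, the off-diag and diag operators might be sketched as follows (a minimal illustration; the helper names are ours, not the paper's):

```python
import numpy as np

def off_diag(A):
    """Off-diagonal elements of a square matrix, in row-major order."""
    A = np.asarray(A)
    k = A.shape[0]
    mask = ~np.eye(k, dtype=bool)   # True everywhere except the diagonal
    return A[mask]                  # boolean indexing preserves row-major order

def diag_vec(A):
    """Diagonal elements of a square matrix, as a vector."""
    return np.diag(np.asarray(A)).copy()
```

For a k × k matrix this yields the q = k² − k off-diagonal entries in the order A[0,1], A[0,2], ..., A[k−1,k−2], matching the row-major convention above.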
A confusion matrix for a classifier h is denoted by C(h, P) ∈ R^{k×k}, where its elements are given by:

C_ij(h, P) = P(Y = i, h = j)  for i, j ∈ [k].   (1)

Under the population law P, it is useful to keep the following decomposition in mind:

P(Y = i, h = i) = ζ_i − P(Y = i, h ≠ i)  ⟹  C_ii(h, P) = ζ_i − Σ_{j=1, j≠i}^k C_ij(h, P).   (2)

Using this decomposition, any confusion matrix is uniquely represented by its q := (k² − k) off-diagonal elements. Hence, we will represent a confusion matrix C(h, P) by a vector c(h, P) = off-diag(C(h, P)), and interchangeably refer to the confusion matrix as a vector of 'off-diagonal confusions'. The space of off-diagonal confusions is denoted by C = {c(h, P) = off-diag(C(h, P)) : h ∈ H}. For clarity, we will suppress the dependence on P and h when it is clear from the context.
The performance of a classifier is often determined by just the misclassification and not the type of misclassification, especially when the number of classes is large. Therefore, we will also consider metrics that only depend on correct and incorrect predictions, namely P(Y = i, h = i) and P(Y = i, h ≠ i). Following the decomposition in (2), such metrics require only the diagonal elements of the original confusion matrices. Given a confusion matrix C, we will denote its diagonal by d = diag(C) and refer to it as the vector of 'diagonal confusions'. The space of diagonal confusions is represented by D = {d = diag(C(h)) : h ∈ H}.

Let φ : [0, 1]^q → R and ψ : [0, 1]^k → R be the performance metrics for a classifier h determined by its corresponding off-diagonal and diagonal confusion entries c(h) and d(h), respectively. 
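A sketch of how an empirical confusion matrix, its off-diagonal representation, and the decomposition (2) might be computed from finite samples (function names are illustrative, not from the paper):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, k):
    """Empirical estimate of C_ij = P(Y = i, h = j), as in eq. (1)."""
    C = np.zeros((k, k))
    for i, j in zip(y_true, y_pred):
        C[i, j] += 1.0
    return C / len(y_true)

def off_diag_confusions(C):
    """Vector c of the q = k^2 - k off-diagonal confusions, row-major."""
    k = C.shape[0]
    return C[~np.eye(k, dtype=bool)]

def diag_from_off_diag(C, zeta):
    """Recover the diagonal via eq. (2): C_ii = zeta_i - sum_{j != i} C_ij."""
    row_off = C.sum(axis=1) - np.diag(C)   # off-diagonal row sums
    return zeta - row_off
```

This illustrates why the q off-diagonal entries (plus the class priors ζ) uniquely determine the full confusion matrix.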
Without loss of generality (wlog), we assume the metrics φ and ψ are utilities, so that larger values are preferred. Furthermore, the metrics are scale invariant, as global scale does not affect the learning problem [15]. For this manuscript, we assume the following regularity assumption on the data distribution.

Assumption 1. We assume that the functions g_ij(r) = P[η_i(X)/η_j(X) ≥ r] ∀ i, j ∈ [k] are continuous and strictly decreasing for r ∈ [0, ∞).

Intuitively, this weak assumption ensures that when the cost or reward tradeoffs for the classes change, the preferred confusion matrices for those cost or reward tradeoffs also change (and vice-versa).

2.1 Bayes Optimal and Restricted Bayes Optimal Confusions and Classifiers

As illustrated in Table 1, the Bayes Optimal (BO) confusion c represents the optimal value of the off-diagonal confusions according to the metric φ over a subset S ⊆ C. This is analogously defined for ψ and D. The Restricted Bayes Optimal (RBO) entities are of interest for diagonal metrics ψ, and indicate the case where classifiers are 'restricted' to predict only classes k1, k2 ∈ [k]. Thus H_{k1,k2} and D_{k1,k2} denote the space of classifiers which exclusively predict either k1 or k2, and the associated space of diagonal confusions, respectively. Note that for such restricted classifiers h, C_ii(h) = d_i(h) evaluates to zero at every index i ≠ k1, k2.

2.2 Performance Metrics

We first discuss elicitation for the following two major types of metrics used in classification.
Definition 1. Diagonal Linear Performance Metric (DLPM): We denote this family by ϕ_DLPM. Given a ∈ R^k such that ‖a‖_1 = 1 (wlog, due to scale invariance), the metric is defined as: ψ(d) := ⟨a, d⟩. 
This is also called weighted accuracy [15] and focuses on correct classification.
Definition 2. Linear Performance Metric (LPM): We denote this family by ϕ_LPM. Given a ∈ R^q such that ‖a‖_2 = 1 (wlog, due to scale invariance), the metric is defined as: φ(c) := ⟨a, c⟩. Cost-sensitive linear metrics belong to ϕ_LPM [1] and focus on the types of misclassifications.

The difference of norms in the two definitions is only for simplicity of exposition, chosen to best complement the underlying metric elicitation algorithm and vice-versa. Moreover, notice that the elements of diagonal confusions (d's) and off-diagonal confusions (c's) reflect correct and incorrect classification, respectively. Thus, following standard practice, wlog, we focus on eliciting monotonically increasing DLPMs and monotonically decreasing LPMs in their respective arguments.

2.3 Metric Elicitation; Problem Setup

This section describes the problem of Metric Elicitation and the associated oracle query. Our definitions follow from Hiranandani et al. [7], extended so that the confusion elements and the performance metrics correspond to the multiclass classification setting. The following definitions hold analogously for the diagonal case by replacing φ, c, and C with ψ, d, and D, respectively.
Definition 3 (Oracle Query). Given two classifiers h, h′ (equivalent to off-diagonal confusions c, c′, respectively), a query to the oracle (with metric φ) is represented by:

Γ(h, h′) = Ω(c, c′) = 1[φ(c) > φ(c′)] =: 1[c ≻ c′],   (3)

where Γ : H × H → {0, 1} and Ω : C × C → {0, 1}. 
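In code, the linear metrics of Definitions 1 and 2 and a comparison oracle in the spirit of (3) might look like the following sketch (the oracle's weight vector a is of course hidden in practice; normalization follows the wlog conventions above):

```python
import numpy as np

def dlpm(d, a):
    """Diagonal Linear Performance Metric: psi(d) = <a, d>, with ||a||_1 = 1."""
    a = np.asarray(a, dtype=float)
    a = a / np.abs(a).sum()          # enforce the l1 normalization
    return float(np.dot(a, d))

def lpm(c, a):
    """Linear Performance Metric: phi(c) = <a, c>, with ||a||_2 = 1."""
    a = np.asarray(a, dtype=float)
    a = a / np.linalg.norm(a)        # enforce the l2 normalization
    return float(np.dot(a, c))

def oracle_query(c, c_prime, a, metric=lpm):
    """Omega(c, c') = 1[phi(c) > phi(c')], as in eq. (3)."""
    return int(metric(c, a) > metric(c_prime, a))
```

For a monotonically decreasing LPM (all weights negative), a confusion vector with uniformly smaller errors is preferred, as expected.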
The query asks whether h is preferred to h′ (equivalently, whether c is preferred to c′), as measured by φ.
We elicit metrics which are functions of the confusion matrix; thus comparison queries using classifiers are indistinguishable from comparison queries using confusions. Henceforth, for simplicity of notation, we denote any query as a confusion-based query. Next, we formally state the ME problem.
Definition 4 (Metric Elicitation with Pairwise Queries (given {(x, y)_i}_{i=1}^n)). Suppose that the oracle's (unknown) performance metric is φ. Using oracle queries of the form Ω(ĉ, ĉ′), where ĉ, ĉ′ are the estimated off-diagonal confusions from samples, recover a metric φ̂ such that ‖φ − φ̂‖ < κ under a suitable norm ‖·‖ for sufficiently small error tolerance κ > 0.

Figure 2: (a) Geometry of the space of diagonal confusions D for k = 3: a strictly convex space. Notice that each of the three axis-aligned faces is equivalent in geometry to the figure in (b); (b) Geometry of diagonal confusions when restricted to classifiers predicting only classes k1 and k2, i.e. D_{k1,k2}; (c) A sphere S_λ centered at o with radius λ, contained in the convex space of off-diagonal confusions C. 
f∗(c) denotes the distance of c from the hyperplane ℓ∗ tangent at c∗.

The performance of ME is evaluated both by the fidelity of the recovered metric and by the query complexity. Given the formal definitions, we can now proceed. As is standard in the decision theory literature [13, 7], we present our ME solution by first assuming access to population quantities such as the population confusions c(h, P), and then examine practical implementation by considering the estimation error from finite samples, e.g. with empirical confusions ĉ(h, {(x, y)_i}_{i=1}^n).

3 Geometry and Parametrizations of the Query Spaces
For any query-based approach, it is important to understand the structure of the query space. Thus, we first study the properties of the query spaces and then develop the parametrizations required for efficient elicitation. Readers may find these properties independently useful in other applications as well.
3.1 Geometry of the space of diagonal confusions D and parametrization of its boundary
Let v_i ∈ R^k for i ∈ [k] be the vectors with ζ_i at the i-th index and zero everywhere else. Notice that the v_i's are the diagonal confusions of the trivial classifiers predicting only class i on the entire space X.
Proposition 1 (Geometry of D – Figure 2 (a)). Under Assumption 1, the space of diagonal confusions D is strictly convex, closed, and contained in the box [0, ζ_1] × ··· × [0, ζ_k]. The diagonal confusions v_i ∀ i ∈ [k] are the only vertices of D. Moreover, for any k1, k2 ∈ [k], the 2-dimensional (k1, k2) axes-aligned face of D is D_{k1,k2} (Figure 2 (b)), which is equivalent to the space of binary classification confusion matrices confined to classes k1, k2. In particular, D_{k1,k2} is strictly convex.
Proposition 1 characterizes the geometry of the space of diagonal confusions D. 
Figure 2(a) illustrates this geometry when k = 3. Interestingly, the 2-dimensional axes-aligned faces of D (Figure 2 (b)) have exactly the same geometry as the space of binary classification confusion matrices (compare this with Figure 2(a) of Hiranandani et al. [7]), where recall that a binary classification confusion matrix is uniquely determined by its two diagonal elements due to (2). We will exploit the set D_{k1,k2} (more specifically, its boundary) for the elicitation task. Now notice that for ψ ∈ ϕ_DLPM, the RBO classifier restricted to predict classes k1, k2 predicts the label (out of the two possible choices) that maximizes the expected utility conditioned on the instance. This is discussed below.
Proposition 2. Let ψ ∈ ϕ_DLPM be parametrized by a such that ‖a‖_1 = 1, and let k1, k2 ∈ [k]. Then

h_{k1,k2}(x) = k1 if a_{k1} η_{k1}(x) ≥ a_{k2} η_{k2}(x), and k2 otherwise,

is the Restricted Bayes Optimal classifier (restricted to classes k1, k2) with respect to ψ.
For a metric ψ ∈ ϕ_DLPM, Proposition 2 provides RBO classifiers in H_{k1,k2}, which further give us RBO diagonal confusions d_{k1,k2} using (1). We know that this d_{k1,k2} is unique, since any linear metric over a strictly convex domain (D_{k1,k2}) is maximized at a unique point on the boundary [2]. So, given a DLPM, we have access to a unique point in the query space. This allows us to define and then parametrize a subset of the query space, specifically the upper boundary of D_{k1,k2}, through DLPMs.

Definition 5. The upper boundary of D_{k1,k2}, denoted by ∂D+_{k1,k2}, constitutes the RBO diagonal confusions confined to classes k1, k2 ∈ [k] for monotonically increasing DLPMs (a_i ≥ 0 ∀ i ∈ [k]) such that at least one of a_{k1} or a_{k2} is non-zero (i.e. 
a_{k1} + a_{k2} > 0).
Parametrizing the upper boundary ∂D+_{k1,k2}. Let m ∈ [0, 1]. Construct a DLPM by setting a_{k1} = m, a_{k2} = 1 − m, and a_i = 0 for i ≠ k1, k2. By using Proposition 2 and (1), obtain its RBO diagonal confusions, which by definition lie on the upper boundary. Thus, varying m in this process parametrizes the upper boundary ∂D+_{k1,k2}. We denote this parametrization by ν(m; k1, k2), where ν : ([0, 1]; k1, k2) → ∂D+_{k1,k2}.
3.2 Geometry of the space C and parametrization of the enclosed sphere
Recall that, unlike the diagonal case, we focus on eliciting LPMs monotonically decreasing in the elements of the off-diagonal confusions (Section 2.2). To this end, let u_i ∈ C for i ∈ [k] be the off-diagonal confusions achieved by trivial classifiers predicting only class i on the entire space X.
Proposition 3 (Geometry of C – Figure 2 (c)). The space of off-diagonal confusions C is convex and contained in the box [0, ζ_1]^{k−1} × ··· × [0, ζ_k]^{k−1}. {u_i}_{i=1}^k belong to the set of vertices of C. C always contains the point o = (1/k) Σ_{i=1}^k u_i, which corresponds to the off-diagonal confusions of the trivial classifier that randomly predicts each class with equal probability on the entire space X.
We find that the space of off-diagonal confusions C has quite different geometry than the diagonal case. For instance, C is not strictly convex. Nevertheless, since C is convex and always contains the point o, we may make the following assumption. Please see Figure 2(c) for an illustration.
Assumption 2. There exists a q-dimensional sphere S_λ ⊂ C of radius λ > 0 centered at o.
Such a sphere always exists as long as the class-conditional distributions are not completely overlapping, i.e. there is some signal for non-trivial classification. A method to obtain S_λ is discussed in Section 5. 
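The center o of Proposition 3 has a closed form in the class priors: the trivial classifier that always predicts class i yields a confusion matrix whose i-th column equals ζ, so its off-diagonal confusions u_i are determined by ζ alone. A sketch (assuming the row-major off-diagonal ordering from the notation section; helper names are ours):

```python
import numpy as np

def trivial_off_diag(zeta, i):
    """Off-diagonal confusions u_i of the classifier that always predicts class i."""
    k = len(zeta)
    C = np.zeros((k, k))
    C[:, i] = zeta                       # all of P(Y = j) lands in column i
    return C[~np.eye(k, dtype=bool)]     # drop the diagonal, row-major

def sphere_center(zeta):
    """o = (1/k) * sum_i u_i: off-diagonal confusions of uniform random prediction."""
    k = len(zeta)
    return sum(trivial_off_diag(zeta, i) for i in range(k)) / k
```

Each coordinate of o corresponding to a cell in row j equals ζ_j / k, consistent with a classifier that predicts every class with probability 1/k.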
Now recall that a linear function optimized over a sphere attains its optimum on the boundary, at the point obtained by moving from the center along the slope of the function, scaled by the radius of the sphere. This is formalized as a trivial lemma below.
Lemma 1. Let φ ∈ ϕ_LPM be parametrized by a such that ‖a‖_2 = 1. Then the unique optimal off-diagonal confusion c over the sphere S_λ is a point on the boundary of S_λ given by c = λa + o.
Given an LPM, Lemma 1 provides a unique point in the query space S_λ ⊂ C. This gives us an opportunity to characterize and then parametrize a subset of the query space through LPMs. Since we focus on eliciting monotonically decreasing LPMs, we parametrize the lower boundary of S_λ.
Definition 6. The lower boundary of S_λ, denoted by ∂S−_λ, constitutes the set of optimal off-diagonal confusions over the sphere S_λ for LPMs with a_i ≤ 0 ∀ i ∈ [q] (the monotonically decreasing condition).
Parametrizing the lower boundary of the enclosed sphere ∂S−_λ. We follow the standard method for parametrizing points on the surface of a sphere via angles. Let θ be a (q − 1)-dimensional vector of angles, where all the angles except the primary angle are in the second quadrant, i.e. {θ_i ∈ [π/2, π]}_{i=1}^{q−2}, and the primary angle is in the third quadrant, i.e. θ_{q−1} ∈ [π, 3π/2]. Construct an LPM (‖a‖_2 = 1) by setting a_i = (Π_{j=1}^{i−1} sin θ_j) cos θ_i for i ∈ [q − 1] and a_q = Π_{j=1}^{q−1} sin θ_j. The choice of quadrants ensures the monotonically decreasing condition, i.e. {a_i ≤ 0}_{i=1}^q. By using Lemma 1, obtain its BO off-diagonal confusions over the sphere S_λ, which clearly lie on the lower boundary. Thus, varying θ in this procedure parametrizes the lower boundary ∂S−_λ. 
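This angular construction might be sketched as follows, verifying the norm and sign constraints (a minimal sketch; variable names are ours):

```python
import numpy as np

def mu(theta, lam, o):
    """Map angles theta (length q-1) to a point on the lower boundary of S_lambda.

    Assumed ranges: theta[0..q-3] in [pi/2, pi]; primary angle theta[q-2]
    in [pi, 3*pi/2]. These quadrant choices force every coordinate of a
    to be non-positive.
    """
    q = len(theta) + 1
    a = np.empty(q)
    sin_prod = 1.0
    for i in range(q - 1):
        a[i] = sin_prod * np.cos(theta[i])   # a_i = (prod_{j<i} sin th_j) cos th_i
        sin_prod *= np.sin(theta[i])
    a[q - 1] = sin_prod                      # a_q = prod_j sin th_j
    return lam * a + np.asarray(o), a        # BO confusion over S_lambda (Lemma 1)
```

By construction ‖a‖_2 = 1 (spherical coordinates), so λa + o lies on the boundary of the sphere.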
We denote this parametrization by µ(θ), where µ : [π/2, π]^{q−2} × [π, 3π/2] → ∂S−_λ.

4 Metric Elicitation
Using the outlined parametrizations {ν, µ}, we propose efficient binary-search type algorithms to elicit the oracle's implicit performance metric. We first discuss elicitation procedures with no feedback noise from the oracle. We later show robustness to noisy feedback in Section 5.

4.1 DLPM Elicitation

The following lemma concerning a broader family of metrics is the route to our elicitation procedures. Since both linear and linear-fractional functions are quasiconcave, the lemma applies to both.

Algorithm 1: DLPM Elicitation
Input: ε > 0, oracle Ω, â_1 = 1
For i = 2, ···, k do
  Initialize: m_a = 0, m_b = 1.
  While |m_b − m_a| > ε do
    • Set m_c = (3m_a + m_b)/4, m_d = (m_a + m_b)/2, and m_e = (m_a + 3m_b)/4.
    • Set d^a_{1,i} = ν(m_a; 1, i) (i.e. the parametrization of ∂D+_{1,i} in Section 3.1). Similarly, set d^c_{1,i}, d^d_{1,i}, d^e_{1,i}, d^b_{1,i}.
    • Query Ω(d^a_{1,i}, d^c_{1,i}), Ω(d^c_{1,i}, d^d_{1,i}), Ω(d^d_{1,i}, d^e_{1,i}), and Ω(d^e_{1,i}, d^b_{1,i}).
    • [m_a, m_b] ← ShrinkInterval-1(responses).
  Set m_d = (m_a + m_b)/2. Then set â_i = ((1 − m_d)/m_d) â_1.
Output: â = (â_1/‖â‖_1, ···, â_k/‖â‖_1).

Algorithm 2: LPM Elicitation
Input: ε > 0, oracle Ω, λ, and θ = θ^(1)
For t = 1, 2, ···, T do
  Set θ^a = θ^c = θ^d = θ^e = θ^b = θ^(t).
  if (t % (q − 1)) Set j = t % (q − 1); else j = q − 1.
  if (j == q − 1) Initialize: θ^a_j = π, θ^b_j = 3π/2; else Initialize: θ^a_j = π/2, θ^b_j = π.
  While |θ^b_j − θ^a_j| > ε do
    • Set θ^c_j = (3θ^a_j + θ^b_j)/4, θ^d_j = (θ^a_j + θ^b_j)/2, and θ^e_j = (θ^a_j + 3θ^b_j)/4.
    • Set c^a = µ(θ^a) (i.e. the parametrization of ∂S−_λ in Section 3.2). Similarly, set c^c, c^d, c^e, c^b.
    • Query Ω(c^c, c^a), Ω(c^d, c^c), Ω(c^e, c^d), Ω(c^b, c^e).
    • [θ^a_j, θ^b_j] ← ShrinkInterval-2(responses).
  Set θ^d_j = (θ^a_j + θ^b_j)/2 and then set θ^(t) = θ^d.
Output: â_i = (Π_{j=1}^{i−1} sin θ^(T)_j) cos θ^(T)_i ∀ i ∈ [q − 1], â_q = Π_{j=1}^{q−1} sin θ^(T)_j.

Lemma 2. Let ψ : D → R be a quasiconcave metric which is monotone increasing in all {d_i}_{i=1}^k. For k1, k2 ∈ [k], let ρ+ : [0, 1] → ∂D+_{k1,k2} be a continuous, bijective parametrization of the upper boundary. Then the composition ψ ◦ ρ+ : [0, 1] → R is quasiconcave and thus unimodal on [0, 1].
Remark 1. 
Under Assumption 1, every supporting hyperplane of D_{k1,k2} supports a unique point on the boundary ∂D+_{k1,k2} and vice-versa (Proposition 1); therefore, the composition ψ ◦ ρ+ has no flat regions. In other words, the function ψ ◦ ρ+ is concave.
The proof of Lemma 2 first shows that any quasiconcave metric ψ defined on the space D is also quasiconcave on the restricted space D_{k1,k2}, and then shows the quasiconcavity and thus the unimodality (due to the one-dimensional parametrization of ∂D+_{k1,k2}) of ψ on the further restricted space ∂D+_{k1,k2}. Furthermore, Remark 1 reveals that the function ψ ◦ ρ+ is concave, allowing us to devise the following binary-search type method for elicitation.
Suppose that the oracle's metric is ψ∗ ∈ ϕ_DLPM parametrized by a∗, where ‖a∗‖_1 = 1 and {a∗_i}_{i=1}^k ≥ 0 (Section 2.2). Using the parametrization ν, Algorithm 1 returns an estimate â of a∗. It takes two classes at a time, class 1 and class i. Since the metric is unimodal on ∂D+_{1,i} (Lemma 2), the algorithm applies binary search in the inner while-loop to estimate the ratio a∗_i / a∗_1. The ShrinkInterval-1 subroutine shrinks the interval [m_a, m_b] into half based on the oracle responses, in the usual binary-search way for locating the optimum (Figure 4, Appendix A). The algorithm repeats this (k − 1) times to estimate the ratios {a∗_2/a∗_1, ..., a∗_k/a∗_1}. Finally, it outputs a normalized metric estimate â.

4.2 LPM Elicitation

We now discuss LPM elicitation, where the metrics are assumed to be monotonically decreasing in the off-diagonal confusions. Unfortunately, ∂C may have flat regions due to the lack of strict convexity, so the algorithm for the diagonal case does not apply. 
Instead, we consider a query space given by the sphere S_λ ⊂ C and propose a coordinate-wise binary-search style algorithm, which is an outcome of our novel geometric characterization and the approach in Derivative-Free Optimization (DFO) [9].
Suppose that the oracle's metric is φ∗ ∈ ϕ_LPM parametrized by a∗, where ‖a∗‖_2 = 1 and {a∗_i}_{i=1}^q ≤ 0 (Section 2.2). Using the parametrization µ(θ) of ∂S−_λ (Section 3.2), Algorithm 2 returns an estimate â of a∗. In each iteration, the algorithm updates one angle θ_j, keeping the other angles fixed, by a binary-search procedure, where again the ShrinkInterval-2 subroutine shrinks the interval [θ^a_j, θ^b_j] by half based on the oracle responses (Figure 5, Appendix A). Then the algorithm cyclically updates each angle until it converges to a metric sufficiently close to the true metric. Convergence is assured because, intuitively, the algorithm via a dual interpretation minimizes a smooth, strongly convex function f∗(c) measuring the distance of the boundary points from a hyperplane ℓ∗, whose slope is given by a∗ and which is tangent at the BO confusion c∗ (see Figure 2(c)).

Table 2: DLPM elicitation at ε = 0.01 for synthetic data. #Q denotes the number of queries.

Classes k = 3:
ψ∗ = a∗ | ψ̂ = â | #Q
(0.21, 0.59, 0.20) | (0.21, 0.60, 0.20) | 56
(0.23, 0.15, 0.62) | (0.23, 0.15, 0.62) | 56

Classes k = 4:
ψ∗ = a∗ | ψ̂ = â | #Q
(0.22, 0.13, 0.14, 0.52) | (0.22, 0.13, 0.14, 0.52) | 84
(0.58, 0.17, 0.08, 0.18) | (0.58, 0.17, 0.08, 0.18) | 84

5 Guarantees
We discuss robustness under the following feedback model, which is useful in practical scenarios.
Definition 7 (Oracle Feedback Noise: ε_Ω ≥ 0). 
The oracle responds correctly as long as |φ(c) − φ(c′)| > ε_Ω (analogously |ψ(d) − ψ(d′)| > ε_Ω). Otherwise, it may provide incorrect answers.
In other words, the oracle may respond incorrectly if the confusions are too close as measured by the metric φ (analogously ψ). Next, we discuss elicitation guarantees for DLPM and LPM elicitation.
Theorem 1. Given ε, ε_Ω ≥ 0 and a 1-Lipschitz DLPM ψ∗ parametrized by a∗, the output â of Algorithm 1 after O((k − 1) log(1/ε)) queries to the oracle satisfies ‖a∗ − â‖_∞ ≤ O(ε + √ε_Ω), which is equivalent to ‖a∗ − â‖_2 ≤ O(√k (ε + √ε_Ω)) using standard norm bounds.
The following theorem guarantees LPM elicitation when the sphere radius dominates the oracle noise.
Theorem 2. Given ε, ε_Ω ≥ 0 and a 1-Lipschitz LPM φ∗ parametrized by a∗, suppose λ ≫ ε_Ω. Then the output â of Algorithm 2 after O(z_1 log(z_2/(qε²)) (q − 1) log(π/2ε)) queries satisfies ‖a∗ − â‖_2 ≤ O(√q (ε + √(ε_Ω/λ))), where z_1, z_2 are constants independent of ε and q.
We see that the algorithms are robust to noise, and their query complexity depends linearly on the unknown entities. The term z_1 log(z_2/(qε²)) may be attributed to the number of cycles in Algorithm 2, but due to the curvature of the sphere, we observe that it is not a dominating factor in the query complexity. For instance, we find that when ε = 10^{-2}, two cycles (i.e. T = 2(q − 1) in Algorithm 2) are sufficient for achieving elicitation up to the error tolerance √q ε. 
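At the core of both algorithms is the same primitive: a four-point, interval-halving search for the maximizer of a unimodal one-dimensional function, driven only by pairwise comparisons. A generic noiseless sketch of this primitive (the actual ShrinkInterval subroutines are in Appendix A of the paper; this is our simplification, not the authors' code):

```python
def unimodal_search(compare, lo, hi, eps):
    """Locate the maximizer of an (implicit) unimodal f on [lo, hi] up to eps,
    using only pairwise comparisons compare(x, y) = 1 iff f(x) > f(y)."""
    while hi - lo > eps:
        c = (3 * lo + hi) / 4       # quarter point
        d = (lo + hi) / 2           # midpoint
        e = (lo + 3 * hi) / 4       # three-quarter point
        if compare(c, d):           # f rises to the left: maximizer in [lo, d]
            hi = d
        elif compare(e, d):         # f rises to the right: maximizer in [d, hi]
            lo = d
        else:                       # d beats both neighbors: maximizer in [c, e]
            lo, hi = c, e
    return (lo + hi) / 2
```

Each iteration halves the interval using a constant number of comparisons, giving the O(log(1/ε)) per-coordinate query counts appearing in Theorems 1 and 2.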
One remaining question for LPM elicitation is how to select a sufficiently large value of λ. Algorithm 3 (Appendix D) provides an offline procedure to compute a λ ≥ r̃/k, where r̃ is the radius of the largest ball contained in the set C.
ME with Finite Samples: As a final step, we consider the following questions when working with finite samples: (a) do we get the correct feedback from querying Ω(ĉ, ĉ′) instead of querying Ω(c, c′)? (b) what is the effect of the η̂_i's when used in place of the true η_i's? The answers are straightforward. Since the sample estimates of confusion matrices are consistent estimators and the metrics discussed are 1-Lipschitz with respect to the confusion matrices, with high probability we gather correct oracle feedback as long as we have sufficient samples. Furthermore, subject to regularity assumptions, Lemma 3 of Hiranandani et al. [7] shows that the errors due to using η̂ affect the (binary) confusion matrices on the boundary in a controlled manner. Since Algorithm 1 uses pairwise RBO (binary) classifiers, it inherits the error guarantees in the multiclass case. Due to limited space, we do not repeat the details here. On the other hand, since Algorithm 2 does not use the boundary, its results are agnostic to finite sample error as long as the sphere is contained within the feasible region C.

6 Experiments
In this section, we empirically validate the results of Theorems 1 and 2 and investigate sensitivity due to finite sample estimates.1 For ease of interpretation, we show results for k = 3 and k = 4 classes.

6.1 Synthetic Data Experiments
We assume a joint distribution for X = [−1, 1] and Y = [k]. 
This is given by the marginal distribution f_X = U[−1, 1] and η_i(x) = 1/(1 + e^{p_i x}) for i ∈ [k], where U[−1, 1] is the uniform distribution on [−1, 1] and {p_i}_{i=1}^k are parameters controlling the degree of noise in the labels. We fix (p1, p2, p3) = (1, 3, 5) and (p1, p2, p3, p4) = (1, 3, 6, 10) for the experiments with three and four classes, respectively. To verify elicitation, we first define a true metric ψ∗ or φ∗. This specifies the query outputs of Algorithm 1 or Algorithm 2. Then we run the algorithms to check whether or not we recover the same metric. Some results are shown in Table 2 and Table 3. The results verify that we elicit the true metrics even for small ε = 0.01, and as predicted, this requires only 4(k − 1)⌈log(1/ε)⌉ and 4T⌈log(π/2ε)⌉ queries for DLPM and LPM elicitation, respectively, where ⌈·⌉ is the ceiling function and T = 2(q − 1).

1 A subset of results is shown here. Refer to Appendix F for more results.

Table 3: LPM elicitation at ε = 0.01 for synthetic data. #Q denotes the number of queries.

Classes | φ∗ = a∗ | φ̂ = â | #Q
3 | (−0.37, −0.89, −0.09, −0.23, −0.04, −0.03) | (−0.37, −0.89, −0.09, −0.23, −0.04, −0.03) | 320
3 | (−0.80, −0.55, −0.18, −0.08, −0.14, −0.05) | (−0.80, −0.55, −0.18, −0.08, −0.14, −0.05) | 320
4 | (−0.90, −0.28, −0.10, −0.31, −0.04, −0.05, −0.03, −0.04, −0.02, −0.01, −0.01, −0.01) | (−0.90, −0.28, −0.10, −0.31, −0.04, −0.05, −0.03, −0.04, −0.02, −0.01, −0.01, −0.01) | 704
4 | (−0.54, −0.10, −0.62, −0.52, −0.03, −0.07, −0.11, −0.07, −0.14, −0.03, −0.03, −0.04) | (−0.55, −0.11, −0.62, −0.51, −0.03, −0.07, −0.11, −0.07, −0.14, −0.03, −0.03, −0.04) | 704

Figure 3: DLPM elicitation on real data for ε = 0.01. For one hundred randomly chosen a∗, we show the proportion of times our estimates â, obtained with 4(k − 1)⌈log(1/ε)⌉ queries, satisfy ‖a∗ − â‖∞ ≤ ω.

6.2 Real-World Data Experiments
Finite samples may affect the size of the sphere S_λ in LPM elicitation, but we observe that as long as λ is greater than ε_Ω, LPMs can be elicited (Appendix F.2). Thus, here we empirically validate only DLPM elicitation with finite samples. We consider two real-world datasets: (a) the SensIT (Acoustic) dataset [5] (78823 instances, 3 classes), and (b) the Vehicle dataset [21] (846 instances, 4 classes). From each dataset, we create two other datasets containing randomly chosen 50% and 75% of the datapoints, so we have six datasets in total. For all the datasets, we standardize the features and split the dataset into two parts S1 and S2. On S1, we learn {η̂_i(x)}_{i=1}^k using a regularized softmax regression model. We use S2 for making predictions and computing sample confusions.

We randomly selected 100 DLPMs, i.e., a∗'s. We then used Algorithm 1 with ε = 0.01 to recover the estimates â's. In Figure 3, we show the proportion of times ‖a∗ − â‖∞ ≤ ω for different values of ω. We see improved elicitation as we increase the number of datapoints in both datasets, suggesting that ME improves with larger datasets. In particular, for the full SensIT (Acoustic) dataset, we elicit all the metrics within ω = 0.12. We also observe that ω ∈ [0.04, 0.08] is an overly tight evaluation criterion that can result in failures. This is because the elicitation routine gets stuck at the closest achievable sample confusions, which need not be optimal within the (small) search tolerance ε.

7 Discussion Points and Future Work

• Extensions.
The family of human evaluation metrics is believed to be large, and now that we have discussed elicitation and guarantees for linear metrics, we can aim to elicit broader metric families.

(a) Linear-fractional metrics, e.g., F-measure [15], are common in classification problems because one often measures classification quality using proportions of predictions with respect to different classes. For eliciting linear-fractional metrics, we exploit their quasiconcave and quasiconvex nature. Intuitively, we aim to obtain a supporting hyperplane ℓ∗ at the maximizer c∗ and a supporting hyperplane ℓ̄∗ at the minimizer c̄∗ (see Figure 2(c)), which results in two non-linear systems of equations. We then find a common solution to both systems, which yields the true metric in just twice the number of queries required in the linear case. Due to limited space, we defer the details of diagonal and full linear-fractional elicitation to Appendices E.1 and E.2, respectively.

(b) When the oracle's metric is merely monotonically increasing in the diagonal confusions, without a restricted functional form, Algorithm 1 can return a first-order approximation at the BO diagonal confusion. Notice that even this may be of high importance to practitioners. The elicitation details are discussed in Appendix E.3.

• Practical Convenience. Our procedures can also be applied by posing pairwise classifier comparisons directly. One way is to use A/B testing [22], where the user population acts as an oracle. Another way is to use comparisons from a single expert, perhaps combined with interpretable machine learning techniques [19, 4].
We suggest the approach proposed by Narasimhan [14] for estimating the classifier associated with a given confusion matrix.

• Advantage of Algorithm 1. When there is reason to restrict the metric search to DLPMs, e.g., due to prior knowledge, Algorithm 1 is preferred for its lower query complexity.

• Future Work. We conjecture that our query complexity bounds are tight; however, we leave a formal proof to future work. We also plan to extend our procedures to oracles that are only probably correct. This can be done easily by applying majority voting over repeated queries [11].

8 Related Work
The closest line of work to ours is Hiranandani et al. [7], who proposed the problem of ME but solved it only for the simpler setting of binary classification. As we move to multiclass performance ME, both the form of the metrics and the complexity of the query space increase, resulting in stark differences in the elicitation algorithms. Algorithm 1, which is closest to the binary approach, only works for Restricted Bayes Optimal classifiers, and Algorithm 2 requires a coordinate-wise binary-search approach. As a result, novel methods are also required to provide query complexity guarantees. The LPM elicitation problem can, to a certain extent, be posed as Derivative-Free Optimization [9], but only after exploiting the geometry as we have. In addition, passively learning linear functions using pairwise comparisons has been studied before [6, 10, 16], but these approaches do not control the sample (i.e., query) complexity and end up using more queries than active approaches [20, 8, 12]. Papers that actively control the query samples for linear elicitation, e.g., [18], exploit the query space as we do in order to achieve lower query complexity.
However, unlike our approach, [18] provides no theoretical bounds and applies to a different query space.

9 Conclusion
We study the space of multiclass confusions and propose robust, efficient algorithms to elicit diagonal-linear and linear performance metrics using preference feedback. We extend elicitation to other families, e.g., linear-fractional metrics, thus covering a wide range of metrics encountered in practice.

Acknowledgments

Gaurush Hiranandani and Oluwasanmi Koyejo thank Microsoft Azure for providing computing credits. Shant Boodaghians and Ruta Mehta acknowledge the support of NSF via CCF 1750436.

References
[1] N. Abe, B. Zadrozny, and J. Langford. An iterative method for multi-class cost-sensitive learning. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 3–11. ACM, 2004.
[2] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[3] P. Dmitriev and X. Wu. Measuring metrics. In CIKM, 2016.
[4] F. Doshi-Velez and B. Kim. Towards a rigorous science of interpretable machine learning. ArXiv e-prints:1702.08608, 2017.
[5] M. F. Duarte and Y. H. Hu. Vehicle classification in distributed sensor networks. Journal of Parallel and Distributed Computing, 64(7):826–838, 2004.
[6] R. Herbrich. Large margin rank boundaries for ordinal regression. In Advances in Large Margin Classifiers, pages 115–132. The MIT Press, 2000.
[7] G. Hiranandani, S. Boodaghians, R. Mehta, and O. Koyejo. Performance metric elicitation from pairwise classifier comparisons. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 371–379, 2019.
[8] K. G. Jamieson and R. Nowak. Active ranking using pairwise comparisons. In NIPS, pages 2240–2248, 2011.
[9] K. G. Jamieson, R. Nowak, and B. Recht. Query complexity of derivative-free optimization.
In Advances in Neural Information Processing Systems, pages 2672–2680, 2012.
[10] T. Joachims. Optimizing search engines using clickthrough data. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 133–142. ACM, 2002.
[11] M. Kääriäinen. Active learning in the non-realizable case. In International Conference on Algorithmic Learning Theory, pages 63–77. Springer, 2006.
[12] D. M. Kane, S. Lovett, S. Moran, and J. Zhang. Active classification with comparison queries. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 355–366. IEEE, 2017.
[13] O. O. Koyejo, N. Natarajan, P. K. Ravikumar, and I. S. Dhillon. Consistent multilabel classification. In NIPS, pages 3321–3329, 2015.
[14] H. Narasimhan. Learning with complex loss functions and constraints. In International Conference on Artificial Intelligence and Statistics, pages 1646–1654, 2018.
[15] H. Narasimhan, H. Ramaswamy, A. Saha, and S. Agarwal. Consistent multiclass algorithms for complex performance measures. In ICML, pages 2398–2407, 2015.
[16] M. Peyrard, T. Botschen, and I. Gurevych. Learning to score system summaries for better content selection evaluation. In Proceedings of the Workshop on New Frontiers in Summarization, pages 74–84, 2017.
[17] B. Qian, X. Wang, F. Wang, H. Li, J. Ye, and I. Davidson. Active learning from relative queries. In IJCAI, pages 1614–1620, 2013.
[18] L. Qian, J. Gao, and H. Jagadish. Learning user preferences by adaptive pairwise comparison. Proceedings of the VLDB Endowment, 8(11):1322–1333, 2015.
[19] M. T. Ribeiro, S. Singh, and C. Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In ACM SIGKDD, pages 1135–1144. ACM, 2016.
[20] B. Settles. Active learning literature survey.
Technical report, University of Wisconsin-Madison Department of Computer Sciences, 2009.
[21] J. P. Siebert. Vehicle recognition using rule based methods. 1987.
[22] G. Tamburrelli and A. Margara. Towards automated A/B testing. In International Symposium on Search Based Software Engineering, pages 184–198. Springer, 2014.
[23] S. Yang and D. Q. Naiman. Multiclass cancer classification based on gene expression comparison. Statistical Applications in Genetics and Molecular Biology, 13(4):477–496, 2014.
[24] Z.-H. Zhou and X.-Y. Liu. On multi-class cost-sensitive learning. Computational Intelligence, 26(3):232–257, 2010.