{"title": "MaxGap Bandit: Adaptive Algorithms for Approximate Ranking", "book": "Advances in Neural Information Processing Systems", "page_first": 11047, "page_last": 11057, "abstract": "This paper studies the problem of adaptively sampling from K distributions (arms) in order to identify the largest gap between any two adjacent means. We call this the MaxGap-bandit problem. This problem arises naturally in approximate ranking, noisy sorting, outlier detection, and top-arm identification in bandits. The key novelty of the MaxGap bandit problem is that it aims to adaptively determine the natural partitioning of the distributions into a subset with larger means and a subset with smaller means, where the split is determined by the largest gap rather than a pre-specified rank or threshold. Estimating an arm\u2019s gap requires sampling its neighboring arms in addition to itself, and this dependence results in a novel hardness parameter that characterizes the sample complexity of the problem. We propose elimination and UCB-style algorithms and show that they are minimax optimal. Our experiments show that the UCB-style algorithms require 6-8x fewer samples than non-adaptive sampling to achieve the same error.", "full_text": "MaxGap Bandit: Adaptive Algorithms for\n\nApproximate Ranking\n\nSumeet Katariya \u21e4\n\nUW-Madison and Amazon\nsumeetsk@gmail.com\n\nArdhendu Tripathy \u21e4\n\nUW-Madison\n\nastripathy@wisc.edu\n\nRobert Nowak\nUW-Madison\n\nrdnowak@wisc.edu\n\nAbstract\n\nThis paper studies the problem of adaptively sampling from K distributions (arms)\nin order to identify the largest gap between any two adjacent means. We call\nthis the MaxGap-bandit problem. This problem arises naturally in approximate\nranking, noisy sorting, outlier detection, and top-arm identi\ufb01cation in bandits. 
The key novelty of the MaxGap bandit problem is that it aims to adaptively determine the natural partitioning of the distributions into a subset with larger means and a subset with smaller means, where the split is determined by the largest gap rather than a pre-specified rank or threshold. Estimating an arm's gap requires sampling its neighboring arms in addition to itself, and this dependence results in a novel hardness parameter that characterizes the sample complexity of the problem. We propose elimination and UCB-style algorithms and show that they are minimax optimal. Our experiments show that the UCB-style algorithms require 6-8x fewer samples than non-adaptive sampling to achieve the same error.

1 Introduction

Consider an algorithm that can draw i.i.d. samples from K unknown distributions. The goal is to partially rank the distributions according to their (unknown) means. This model encompasses many problems including best-arms identification (BAI) in multi-armed bandits, noisy sorting and ranking, and outlier detection. Partial ranking is often preferred to complete ranking because correctly ordering distributions with nearly equal means is an expensive task (in terms of the number of required samples). Moreover, in many applications it is arguably unnecessary to resolve the order of such close distributions. This observation motivates algorithms that aim to recover a partial ordering of groups of distributions having similar means. This entails identifying large "gaps" in the ordered sequence of means. The focus of this paper is the fundamental problem of finding the largest gap by sampling adaptively. Identification of the largest gap separates the distributions into two groups, and recursive application can identify any desired number of groupings in a partial order.

As an illustration, consider a subset of images from the Chicago streetview dataset [17] shown in Fig. 1.
In this study, people were asked to judge how safe each scene looks [18], and a larger mean indicates a safer looking scene. While each person has a different sense of how safe an image looks, when aggregated there are clear trends in the safety scores (denoted by µ(i)) of the images. Fig. 1 schematically shows the distribution of scores given by people as a bell curve below each image. Assuming the sample means are close to their true means, one can nominally classify them as 'safe', 'maybe unsafe' and 'unsafe' as indicated in Fig. 1. Here we have implicitly used the large gaps µ(2) − µ(3) and µ(4) − µ(5) to mark the boundaries. Note that finding the safest image (BAI) is hard, as we need a lot of human responses to decide the larger mean between the two rightmost distributions; it is also arguably unnecessary. A common way to address this problem is to specify a tolerance ε [7] and stop sampling if the means are less than ε apart; however, determining this can require Ω(1/ε²) samples. Distinguishing the top 2 distributions from the rest is easy and can be efficiently done using top-m arm identification [15]; however, this requires the experimenter to prescribe the location m = 2 where a large gap exists, which is unknown. Automatically identifying natural splits in the set of distributions is the aim of the new theory and algorithms we propose.

* Authors contributed equally and are listed alphabetically.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

Figure 1: Six representative images from Chicago streetview dataset and their safety (Borda) scores. Labels under the images, left to right: µ(6), µ(5) (unsafe); µ(4), µ(3) (maybe unsafe); a large gap; µ(2), µ(1) (safe).
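To fix ideas, the largest-gap split used above is easy to compute when the means are known. The following sketch (our illustrative Python, not part of the paper's code release [19]) clusters arms at the largest gap between adjacent sorted means:

```python
import numpy as np

def maxgap_clusters(mu):
    """Given means mu (one per arm), return the rank m of the largest
    adjacent gap and the clusters C1 (larger means) and C2 (smaller means).
    Arms are reported by their 0-based original indices."""
    order = np.argsort(mu)[::-1]                     # arms sorted by decreasing mean
    sorted_mu = mu[order]
    adjacent_gaps = sorted_mu[:-1] - sorted_mu[1:]   # mu_(j) - mu_(j+1), j = 1..K-1
    m = int(np.argmax(adjacent_gaps)) + 1            # rank at which the largest gap occurs
    C1, C2 = set(order[:m].tolist()), set(order[m:].tolist())
    return m, C1, C2

# Example: one clearly dominant gap between the 2nd and 3rd largest means.
mu = np.array([0.2, 0.9, 0.25, 0.85, 0.3])
m, C1, C2 = maxgap_clusters(mu)
print(m, C1, C2)   # → 2 {1, 3} {0, 2, 4}
```

The bandit problem studied in the paper is exactly this computation when `mu` is unknown and must be estimated by adaptive sampling.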
We call this problem of adaptive sampling to find the largest gap the MaxGap-bandit problem.

1.1 Notation and Problem Statement

We will use multi-armed bandit terminology and notation throughout the paper. The K distributions will be called arms, and drawing a sample from a distribution will be referred to as sampling the arm. Let µ_i ∈ ℝ denote the mean of the i-th arm, i ∈ {1, 2, . . . , K} =: [K]. We add a parenthesis around the subscript j to indicate the j-th largest mean, i.e., µ_(K) ≤ µ_(K−1) ≤ · · · ≤ µ_(1). For the i-th arm, we define its gap Δ_i to be the maximum of its left and right gaps, i.e.,

    Δ_i = max{ µ_(ℓ) − µ_(ℓ+1) , µ_(ℓ−1) − µ_(ℓ) }   where µ_i = µ_(ℓ).    (1)

We define µ_(0) = −∞ and µ_(K+1) = ∞ to account for the fact that extreme arms have only one gap. The goal of the MaxGap-bandit problem is to (adaptively) sample the arms and return two clusters

    C1 = {(1), (2), . . . , (m)}   and   C2 = {(m + 1), . . . , (K)},

where m is the rank of the arm with the largest gap between adjacent means, i.e.,

    m = arg max_{j ∈ [K−1]} ( µ_(j) − µ_(j+1) ).    (2)

The mean values are unknown, as is the ordering of the arms according to their means. A solution to the MaxGap-bandit problem is an algorithm which, given a probability of error δ > 0, samples the arms and upon stopping partitions [K] into two clusters Ĉ1 and Ĉ2 such that

    P( Ĉ1 ≠ C1 ) ≤ δ.    (3)

This setting is known as the fixed-confidence setting [10], and the goal is to achieve the probably correct clustering using as few samples as possible. In the sequel, we assume that m is uniquely defined, and let Δ_max = Δ_{i*} where µ_{i*} = µ_(m).

1.2 Comparison to a Naive Algorithm: Sort then search for MaxGap

The MaxGap-bandit problem is not equivalent to BAI on the (K choose 2) gaps, since the MaxGap-bandit problem requires identifying the largest gap between adjacent arm means (BAI on the (K choose 2) gaps would always identify µ_(1) − µ_(K) as the largest gap). This suggests a naive two-step algorithm: we first sample the arms enough times to identify all pairs of adjacent arms (i.e., we sort the arms according to their means), and then run a BAI bandit algorithm [13] on the (K − 1) gaps between adjacent arms to identify the largest gap (an unbiased sample of a gap can be obtained by taking the difference of the samples of the two arms forming the gap).

We analyze the sample complexity of this naive algorithm in Appendix A, and discuss the results here for an example. Consider the arrangement of means shown in Fig. 2, where there is one large gap Δ_max and all the other gaps are equal to Δ_min ≪ Δ_max. The naive algorithm's sample complexity is Ω(K/Δ_min²), as the first sorting step requires this many samples, which can be very large.

Figure 2: Arm means with one large gap.

Is this sorting of the arm means necessary? For instance, we do not need to sort K real numbers in order to cluster them according to the largest gap.¹ The algorithms we propose in this paper solve the MaxGap-bandit problem without necessarily sorting the arm means. For the configuration in Fig. 2 they require Õ(K/Δ_max²) samples, giving a saving of approximately (Δ_max/Δ_min)² samples.

The analysis of our algorithms suggests a novel hardness parameter for the MaxGap-bandit problem that we discuss next. We let Δ_{i,j} := µ_j − µ_i for all i, j ∈ [K]. We show in Section 5 that the number of samples taken from distribution i due to its right gap is inversely proportional to the square of

    Δ_i^r := max_{j : Δ_{i,j} > 0} min{ Δ_{i,j} , Δ_max − Δ_{i,j} }.    (4)

For the left gap of i we define Δ_i^l analogously. The total number of samples drawn from distribution i is inversely proportional to the square of Δ_i := min{Δ_i^r, Δ_i^l}. The intuition for Eq. (4) is that distribution i can be eliminated quickly if there is another distribution j that has a moderately large gap from i (so that this gap can be quickly detected), but not too large (so that the gap is easy to distinguish from Δ_max), and (4) chooses the best j. We discuss (4) in detail in Section 5, where we show that our algorithms use Õ( Σ_{i ∈ [K] \ {(m),(m+1)}} Δ_i^{−2} log(K/δΔ_i) ) samples to find the largest gap with probability at least 1 − δ. This sample complexity is minimax optimal.

1.3 Summary of Main Results and Paper Organization

In addition to motivating and formulating the MaxGap-bandit problem, we make the following contributions. First, we design elimination and UCB-style algorithms as solutions to the MaxGap-bandit problem that do not require sorting the arm means (Section 3). These algorithms require computing upper bounds on the gaps Δ_i, which can be formulated as a mixed integer optimization problem. We design a computationally efficient dynamic programming subroutine to solve this optimization problem, and this is our second contribution (Section 4). Third, we analyze the sample complexity of our proposed algorithms, and discover a novel problem-hardness parameter (Section 5). This parameter arises because of the arm interactions in the MaxGap-bandit problem where, in order to reduce uncertainty in the value of an arm's gap, we not only need to sample the said arm but also its neighboring arms.
Fourth, we show that this sample complexity is minimax optimal (Section 6). Finally, we evaluate the empirical performance of our algorithms on simulated and real datasets and observe that they require 6-8x fewer samples than non-adaptive sampling to achieve the same error (Section 7).

2 Related Work

One line of related research is best-arm identification (BAI) in multi-armed bandits. A typical goal in this setting is to identify the top-m arms with the largest means, where m is a prespecified number [15, 16, 1, 3, 9, 4, 14, 7, 20]. As explained in Section 1, our motivation behind formulating the MaxGap-bandit problem is to have an adaptive algorithm which finds the "natural" set of top arms as delineated by the largest gap in consecutive mean values. Our work can also be used to automatically detect "outlier" arms [23].

The MaxGap-bandit problem is different from the standard multi-armed bandit because of the local dependence of an arm's gap on other arms. Other best-arm settings where an arm's reward can inform the quality of other arms include linear bandits [22] and combinatorial bandits [5, 11]. In these problems, the decision space is known to the learner: the vectors corresponding to the arms in linear bandits, and the subsets of arms over which the objective function is to be optimized in combinatorial bandits, are known to the learner. However, in our problem we do not know the sorted order of the arm means, i.e., the set of all valid gaps is unknown a priori. Our problem does not reduce to these settings.

¹ First find the smallest and largest numbers, say a and b respectively. Divide the interval [a, b] into K + 1 equal-width bins and map each number to its corresponding bin, while maintaining the smallest and largest number in each bin. Since at least one bin is empty by the pigeonhole principle, the largest gap is between two numbers belonging to different bins.
Calculate all gaps between bins and cluster based on the largest of those.

Another related problem is noisy sorting and ranking. Here the typical goal is to sort a list using noisy pairwise comparisons. Our framework encompasses noisy ranking based on Borda scores [1]. The Borda score of an item is the probability that it is ranked higher in a pairwise comparison with another item chosen uniformly at random. In our setting, the Borda score is the mean of each distribution. Much of the theoretical computer science literature on this topic assumes a bounded noise model for comparisons (i.e., comparisons are probably correct with a positive margin) [8, 6, 2, 21]. This is unrealistic in many real-world applications, since near equals or outright ties are not uncommon. The largest gap problem we study can be used to (partially) order items into two natural groups, one with large means and one with small means. Previous related work considered a similar problem with prescribed (non-adaptive) quantile groupings [18].

3 MaxGap Bandit Algorithms

We propose elimination [7] and UCB [13] style algorithms for the MaxGap-bandit problem. These algorithms operate on the arm gaps instead of the arm means. The subroutine to construct confidence intervals on the gaps (denoted by U_a(t)) using confidence intervals on the arm means (denoted by [l_a(t), r_a(t)]) is described in Algorithm 4 in Section 4, and this subroutine is used by all three algorithms described in this section.

3.1 Elimination Algorithm: MaxGapElim

At each time step, MaxGapElim (Algorithm 1) samples all arms in an active set consisting of arms a whose gap upper bound U_a is larger than the global lower bound L on the maximum gap, and stops when there are only two arms in the active set.

Algorithm 1 MaxGapElim
1: Initialize active set A = [K]
2: for t = 1, 2, . . . do                                         // rounds
3:   ∀ a ∈ A, sample arm a, compute [l_a(t), r_a(t)] using (5).   // arm confidence intervals
4:   ∀ a ∈ A, compute U_a(t) using Algorithm 4.                   // upper bound on arm max gap
5:   Compute L(t) using (9).                                      // lower bound on max gap
6:   ∀ a ∈ A, if U_a(t) ≤ L(t), A = A \ a.                        // elimination
7:   If |A| = 2, stop. Return clusters using the max gap in the empirical means.   // stopping condition

3.2 UCB algorithms: MaxGapUCB and MaxGapTop2UCB

MaxGapUCB (Algorithm 2) is motivated by the principle of "optimism in the face of uncertainty". It samples all arms with the highest gap upper bound. Note that there are at least two arms with the highest gap upper bound, because any gap is shared by at least two arms (one on the right and one on the left). The stopping condition is akin to the stopping condition in Jamieson et al. [13].

Algorithm 2 MaxGapUCB
1: Initialize U = [K].
2: for t = 1, 2, . . . do
3:   ∀ a ∈ U, sample a and update [l_a(t), r_a(t)] using (5).
4:   ∀ a ∈ [K], compute U_a(t) using Algorithm 4.
5:   Let M1(t) = max_{j ∈ [K]} U_j(t). Set U = {a : U_a(t) = M1(t)}.   // highest gap-UCB arms
6:   If ∃ i, j such that T_i(t) + T_j(t) ≥ c ∑_{a ∉ {i,j}} T_a(t), stop.   // stopping condition

Alternatively, we can use an LUCB [16]-type algorithm that samples arms which have the two highest gap upper bounds, and stops when the second-largest gap upper bound is smaller than the global lower bound L(t). We refer to this algorithm as MaxGapTop2UCB (Algorithm 3).

Algorithm 3 MaxGapTop2UCB
1: Initialize U1 ∪ U2 = [K].
2: for t = 1, 2, . . . do
3:   ∀ a ∈ U1 ∪ U2, sample a and update [l_a(t), r_a(t)] using (5).
4:   ∀ a ∈ [K], compute U_a(t) using Algorithm 4.
5:   Let M1(t) = max_{j ∈ [K]} U_j(t). Set U1 = {a : U_a(t) = M1(t)}.        // highest gap-UCB arms
6:   Let M2(t) = max_{j ∈ [K] \ U1} U_j(t). Set U2 = {a : U_a(t) = M2(t)}.   // 2nd highest gap-UCB
7:   Compute L(t) using (9). If M2(t) < L(t), stop.

Algorithm 4 Procedure to find U_a(t)
1: Set P_a^r = {i : l_i(t) ∈ [l_a(t), r_a(t)]}.
2: U_a^r(t) = max_{i ∈ P_a^r} { G_a^r(l_i(t), t) }, where G_a^r(x, t) is given by (7).   // eqn. (8)
3: Set P_a^l = {i : r_i(t) ∈ [l_a(t), r_a(t)]}.
4: U_a^l(t) = max_{i ∈ P_a^l} { G_a^l(r_i(t), t) }, where G_a^l(x, t) is given by (19).  // eqn. (20)
5: return U_a(t) = max{ U_a^r(t), U_a^l(t) }

4 Confidence Bounds for Gaps

In this section we explain how to construct confidence bounds for the arm gaps (denoted by U_a and L) using confidence bounds for the arm means (denoted by [l_a, r_a]). These bounds are key ingredients for the algorithms described in Section 3.

Given i.i.d. samples from arm a, an empirical mean µ̂_a and a confidence interval on the arm mean can be constructed using standard methods. Let T_a(t) denote the number of samples from arm a after t time steps of the algorithm. Throughout our analysis and experimentation we use confidence intervals on the mean of the form

    l_a(t) = µ̂_a(t) − c_{T_a(t)}   and   r_a(t) = µ̂_a(t) + c_{T_a(t)},   where   c_s = √( log(4Ks²/δ) / s ).    (5)

The confidence intervals are chosen so that [12]

    P( ∀ t ∈ ℕ, ∀ a ∈ [K], µ_a ∈ [l_a(t), r_a(t)] ) ≥ 1 − δ.    (6)

Conceptually, the confidence intervals on the arm means can be used to construct upper confidence bounds on the mean gaps {Δ_i}_{i ∈ [K]} in the following manner. Consider all possible configurations of the arm means that satisfy the confidence interval constraints in (5). Each configuration fixes the gaps associated with any arm a ∈ [K]. Then the maximum gap value over all configurations is the upper confidence bound on arm a's gap; we denote it as U_a. The above procedure can be formulated as a mixed integer linear program (see Appendix B.1).
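For concreteness, the anytime intervals in (5) can be maintained as follows. This is our own minimal Python rendering (function and variable names are ours), not the released implementation [19]:

```python
import math

def conf_radius(s, K, delta):
    """c_s in Eq. (5): anytime confidence radius after s samples of an arm."""
    return math.sqrt(math.log(4 * K * s**2 / delta) / s)

def interval(samples, K, delta):
    """[l_a(t), r_a(t)] for one arm, given its list of i.i.d. samples."""
    s = len(samples)
    mu_hat = sum(samples) / s
    c = conf_radius(s, K, delta)
    return mu_hat - c, mu_hat + c

# Four samples of one arm among K = 10, with error probability delta = 0.1.
l, r = interval([0.4, 0.6, 0.5, 0.5], K=10, delta=0.1)
print(round(l, 3), round(r, 3))
```

With so few samples the interval is wide; the radius shrinks roughly as √(log s / s) as sampling continues, which is what drives the elimination and UCB rules above.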
In the algorithms in Section 3, this optimization problem needs to be solved at every time t and for every arm a ∈ [K] before querying a new sample, which can be practically infeasible. In Algorithm 4, we give an efficient O(K²) time dynamic programming algorithm to compute U_a. We next explain the main ideas used in this algorithm, and refer the reader to Appendix B.2 for the proofs.

Each arm a has a right and left gap, Δ_a^r := µ_(ℓ−1) − µ_(ℓ) and Δ_a^l := µ_(ℓ) − µ_(ℓ+1), where ℓ is the rank of a, i.e., µ_a = µ_(ℓ). We construct separate upper bounds U_a^r(t) and U_a^l(t) for these gaps and then define U_a(t) = max{U_a^r(t), U_a^l(t)}. Here we provide an intuitive description for how the bounds are computed, focusing on U_a^r(t) as an example. To start, suppose the true mean of arm a is known exactly, while the means of other arms are only known to lie within their confidence intervals. If there exist arms that cannot go to the left of arm a, one can see that the largest right gap for a is obtained by placing all arms that can go to the left of a at their leftmost positions, and all remaining arms at their rightmost positions, as shown in Fig. 3(a). If however all arms can go to the left of arm a, the configuration that gives the largest right gap for a is obtained by placing the arm with the largest upper bound at its right boundary, and all other arms at their left boundaries, as illustrated in Fig. 3(b).
Figure 3: Computing the maximum right gap of the blue arm when its true mean is known (at the position indicated by the blue ×), while the other means are known only to lie within their confidence intervals. (a) If there exist arms that cannot go to the left of blue (red, green, purple), the largest right gap for blue is obtained by placing all arms that can go to the left of blue at their left boundaries and the remaining arms at their rightmost positions. (b) If all arms can go to the left of blue, the largest right gap for blue is obtained by placing the arm with the largest right confidence bound (purple) at its right boundary and all other arms at their left boundaries.

We define a function G_a^r(x, t) that takes as input a known position x for the mean of arm a and the confidence intervals of all other arms at time t, and returns the maximum right gap for arm a using the above idea as follows.

    G_a^r(x, t) = min_{j : l_j(t) > x} r_j(t) − x    if {j : l_j(t) > x} ≠ ∅,
                = max_{j ≠ a} r_j(t) − x             otherwise.    (7)

However, the true mean of arm a is not known exactly; we only know that it lies within its confidence interval. The insight that helps here is that G_a^r(x, t) must achieve its maximum when x is at one of the finite locations in {l_j(t) : l_a(t) ≤ l_j(t) ≤ r_a(t)}. We define P_a^r := {j : l_a(t) ≤ l_j(t) ≤ r_a(t)} as the set of arms relevant for the right gap of a, and then the maximum possible right gap of a is

    U_a^r(t) = max{ G_a^r(l_j(t), t) : j ∈ P_a^r }.    (8)

An upper bound for the left gap, U_a^l, can be similarly obtained. We explain this and give a proof of correctness in Appendix B.2.

The algorithms also use a single global lower bound on the maximum gap. To do so, we sort the items according to their empirical means, and find partitions of items that are clearly separated in terms of their confidence intervals.
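Equations (7) and (8) map directly to code. Below is our own Python sketch of the right-gap bound (a direct transcription, less optimized than Algorithm 4); the left-gap bound is symmetric:

```python
def right_gap_ucb(a, l, r):
    """Upper confidence bound on arm a's right gap, following Eqs. (7)-(8).
    l, r: lists where [l[i], r[i]] is the confidence interval of arm i."""
    K = len(l)

    def G_r(x):
        # Eq. (7): if some arms surely lie to the right of position x, the gap
        # is to the nearest of them placed at its rightmost point; otherwise
        # only the arm with the largest upper endpoint is placed to the right.
        right = [r[j] for j in range(K) if l[j] > x]
        if right:
            return min(right) - x
        return max(r[j] for j in range(K) if j != a) - x

    # Eq. (8): it suffices to evaluate G_r at the left endpoints l_j that fall
    # inside arm a's own interval (the set P_a^r, which always contains a).
    P = [j for j in range(K) if l[a] <= l[j] <= r[a]]
    return max(G_r(l[j]) for j in P)

# Three arms: arms 0 and 1 overlap near 0, arm 2 sits far to the right.
print(right_gap_ucb(0, [0.0, 0.1, 0.9], [0.2, 0.3, 1.1]))  # → 1.0
```

In the example, the bound 1.0 is attained by placing arm 0 at x = 0.1 (so arm 1 can sit to its left) and arm 2 at its right endpoint 1.1.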
At time t, let (i)_t denote the arm with the i-th largest empirical mean, i.e., µ̂_{(K)_t}(t) ≤ . . . ≤ µ̂_{(2)_t}(t) ≤ µ̂_{(1)_t}(t) (this can be different from the true ranking, which is denoted by (·) without the subscript t). We detect a nonzero gap at arm k if max_{a ∈ {(k+1)_t, ..., (K)_t}} r_a(t) < min_{a ∈ {(1)_t, ..., (k)_t}} l_a(t). Thus, a lower bound on the largest gap is

    L(t) = max_{k ∈ [K−1]} ( min_{a ∈ {(1)_t, ..., (k)_t}} l_a(t) − max_{a ∈ {(k+1)_t, ..., (K)_t}} r_a(t) ).    (9)

5 Analysis

In this section, we first state the accuracy and sample complexity guarantees for MaxGapElim and MaxGapUCB, and then discuss our results. The proofs can be found in the Supplementary material.

Theorem 1. With probability 1 − δ, MaxGapElim, MaxGapUCB and MaxGapTop2UCB cluster the arms according to the maximum gap, i.e., they satisfy (3).

The number of times arm a is sampled by both algorithms depends on Δ_a = min{Δ_a^l, Δ_a^r}, where

    Δ_a^r = max_{j : 0 < Δ_{a,j}} min{ Δ_{a,j} , Δ_max − Δ_{a,j} },    (10)
    Δ_a^l = max_{j : Δ_{a,j} < 0} min{ −Δ_{a,j} , Δ_max + Δ_{a,j} }.    (11)

If there is a large gap in the vicinity of arm a, then the upper bound U_a^r(t) > L(t) and the arm cannot be eliminated. Thus arm a could have a small gap with respect to its adjacent arms, but if there is a large gap in its vicinity, it cannot be eliminated quickly. This illustrates that the maximum gap identification problem is not equivalent to best-arm identification (BAI) on gaps. Section 6 formalizes this intuition.

Key Differences compared to BAI Analysis: The analysis of MaxGapUCB is very different from the standard UCB analysis. On a high level, in BAI, the number of samples of a sub-optimal arm i is bounded by observing that

    Arm i is pulled  ⟹  µ_i + 2c_{T_i(t)} ≥ µ̂_i + c_{T_i(t)} ≥ µ̂_(1) + c_{T_(1)(t)} ≥ µ_(1)
                     ⟹  2c_{T_i(t)} ≥ µ_(1) − µ_i = Δ_i.    (12)

The last inequality directly bounds the number of samples T_i(t) of a sub-optimal arm i.
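The global lower bound (9) above can be computed in one pass over the empirically sorted arms. The sketch below is our illustrative Python (not the paper's code), using suffix maxima so the scan is linear after sorting:

```python
def maxgap_lower_bound(mu_hat, l, r):
    """L(t) in Eq. (9): for every split k of the empirically sorted arms,
    compare the smallest lower endpoint of the top group with the largest
    upper endpoint of the bottom group, and keep the best separation."""
    K = len(mu_hat)
    order = sorted(range(K), key=lambda i: mu_hat[i], reverse=True)
    # suffix_max_r[i] = max upper endpoint among arms ranked i..K-1
    suffix_max_r = [0.0] * (K + 1)
    suffix_max_r[K] = float("-inf")
    for i in range(K - 1, -1, -1):
        suffix_max_r[i] = max(suffix_max_r[i + 1], r[order[i]])
    best, min_l_top = float("-inf"), float("inf")
    for k in range(K - 1):
        min_l_top = min(min_l_top, l[order[k]])
        best = max(best, min_l_top - suffix_max_r[k + 1])
    return best

# Three arms with intervals of radius 0.05 around means 1.0, 0.9, 0.1:
# the split after the second arm certifies a gap of at least 0.85 - 0.15 = 0.7.
print(maxgap_lower_bound([1.0, 0.9, 0.1], [0.95, 0.85, 0.05], [1.05, 0.95, 0.15]))
```

A negative value of L(t) simply means no split is yet certified by the intervals; the elimination rule U_a(t) ≤ L(t) is then vacuous for all arms.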
In MaxGapUCB, the gap upper bound is obtained using the confidence intervals of two arms, and the fact that a sub-optimal gap (i, j) has the highest gap-UCB implies that

    ( µ_j + 2c_{T_j(t)} ) − ( µ_i − 2c_{T_i(t)} )  ≥  ( µ̂_j + c_{T_j(t)} ) − ( µ̂_i − c_{T_i(t)} )  ≥  Δ_max
    ⟹  2( c_{T_j(t)} + c_{T_i(t)} )  ≥  Δ_max − Δ_{ij}.

Thus, unlike the reasoning in (12), the number of samples from arm i is coupled to the number of samples from arm j, and T_i(t) → ∞ if j is not sampled enough. We show in our analysis that this cannot happen in MaxGapUCB. Furthermore, any arm i is coupled with multiple other arms since the ordering of the arms is unknown, and may have to be sampled even if its own gap is small, a phenomenon absent in standard BAI analysis because of the independence of the arm means. In our proof, we account for all samples of an arm by defining states the arm can belong to (called levels), and arguing about the confidence intervals of the arms in unison.

6 Minimax Lower Bound

In this section, we demonstrate that the MaxGap problem is fundamentally different from best-arm identification (BAI) on gaps. We construct a problem instance and prove a lower bound on the number of samples needed by any probably correct algorithm. The lower bound matches the upper bounds in the previous section for this instance.

Lemma 1. Consider a model B with K = 4 normal distributions P_i = N(µ_i, 1), where

    µ_4 = 0,   µ_3 = ε,   µ_2 = ν + 2ε,   µ_1 = 2ν + 2ε,

for some ν ≥ ε > 0. Then any algorithm that is correct with probability at least 1 − δ must collect Ω(1/ε²) samples of arm 4 in expectation.

Figure 5: Changing the original bandit model B to B′. µ_4 is shifted to the right by 2.1ε. As a result, the maximum gap in B′ is between green and purple.

Proof Outline: The proof uses a standard change of measure argument [10].
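The Ω(1/ε²) rate can be anticipated with a back-of-the-envelope change-of-measure calculation (ours; the formal argument is in Appendix E). Arm 4 is the only arm whose law differs between B and the alternative instance B′ in which µ_4 is shifted right by 2.1ε:

```latex
% Under B, arm 4 has law \mathcal{N}(0,1); under B', \mathcal{N}(2.1\epsilon, 1).
\mathrm{KL}\bigl(\mathcal{N}(0,1)\,\|\,\mathcal{N}(2.1\epsilon,1)\bigr)
  = \frac{(2.1\epsilon)^2}{2}.
% A standard transportation inequality [10] then gives, for any algorithm that
% outputs the correct clustering with probability at least 1-\delta on both instances,
\mathbb{E}[T_4]\cdot\frac{(2.1\epsilon)^2}{2} \;\ge\; \mathrm{kl}(\delta,\,1-\delta)
\;\ge\; \log\frac{1}{2.4\delta}
\quad\Longrightarrow\quad
\mathbb{E}[T_4] \;=\; \Omega\!\left(\frac{1}{\epsilon^2}\log\frac{1}{\delta}\right).
```

Here kl(·,·) is the binary relative entropy; the constant 2.4 is the usual one from the cited change-of-measure toolbox and is illustrative rather than essential.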
We construct another problem instance B′ which has a different maximum gap clustering compared to B (see Fig. 5; the maxgap clustering in B is {4, 3} ∪ {2, 1}, while the maxgap clustering in B′ is {4, 3, 2} ∪ {1}), and show that in order to distinguish between B and B′, any probably correct algorithm must collect at least Ω(1/ε²) samples of arm 4 in expectation (see Appendix E for details). From the definition of Δ_a using (10), (11), it is easy to check that Δ_4 = ε. Therefore, for problem instance B our algorithms find the maxgap clustering using at most O(log(ε/δ)/ε²) samples of arm 4 (c.f. Theorem 2). This essentially matches the lower bound above.

This example illustrates why the maximum gap identification problem is different from a simple BAI on gaps. Suppose an oracle told a BAI algorithm the ordering of the arm means. Using the ordering, it can convert the 4-arm maximum gap problem B to a BAI problem on 3 gaps, with distributions P_{4,3} = N(ε, 2), P_{3,2} = N(ν + ε, 2), and P_{2,1} = N(ν, 2). The BAI algorithm can sample arms i and i + 1 to get a sample of the gap (i + 1, i). We know from standard BAI analysis [13] that the gap (4, 3) can be eliminated from being the largest by sampling it (and hence arm 4) O(1/ν²) times, which can be arbitrarily lower than the 1/ε² lower bound in Lemma 1. Thus the ordering information given to the BAI algorithm is crucial for it to quickly identify the larger gaps. The problem we solve in this paper is identifying the maximum gap when the ordering information is not available.

7 Experiments

We conduct three experiments. First, we verify the validity of our sample complexity bounds in Section 7.1. We then study the performance of our adaptive algorithms on simulated data in Section 7.2, and on the Streetview dataset in Section 7.3. The code for all experiments is publicly available [19].

7.1 Sample Complexity

In Fig. 6(b) and Fig. 6(c), we plot the empirical stopping time against the theoretical sample complexity (Theorem 2) for different arm configurations. We choose the arm configuration in Fig. 6(a) containing K = 15 arms and three unique gaps: a small gap Δ_3 and two large gaps Δ_2 < Δ_1 = Δ_max = 0.4. The hardness parameter is changed by increasing Δ_2 (from 0.35 to 0.39) and bringing it closer to Δ_1. The rewards are normally distributed with σ = 0.05. We see a linear relationship in Fig. 6(b), which suggests that the sample complexity expression in Theorem 2 is correct up to constants. In Fig. 6(c) we include random sampling and see that our adaptive algorithms require up to 5x fewer samples when run until completion. Fig. 6(c) also shows that our adaptive algorithms always outperform random sampling, and the gains increase with hardness. We used a lower bound based stopping condition for Random, Elimination and Top2UCB, and set c = 5 in the UCB stopping condition (value of c chosen empirically as in [13]).

Figure 6: Stopping time experiments.

7.2 Simulated Data

In the second experiment, we study the performance on a simulated set of means containing two large gaps. The mean distribution plotted in Fig. 7(a) has K = 24 arms (N(·, 1)), with two large mean gaps Δ_{10,9} = 0.98 and Δ_{19,18} = 1.0, and remaining small gaps (Δ_{i+1,i} = 0.2 for i ∉ {9, 18}). We expect to see a big advantage for adaptive sampling in this example because almost every sub-optimal arm has a helper arm (see Section 5) which can help eliminate it quickly, and adaptive algorithms can then focus on distinguishing the two large gaps. A non-adaptive algorithm on the other hand would continue sampling all arms.

Figure 7: (a) Two large gaps. (b) Clustering error probability for the means shown in Fig. 7(a). (c) The profile of samples allocated by MaxGapUCB to each arm in (a) at different time steps.
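To make the Section 7.2 setup concrete, the sketch below (our Python, 0-indexed arms; the paper's actual experiment code is in [19]) builds the 24-arm instance and runs the naive non-adaptive baseline that samples every arm equally and splits at the largest empirical gap:

```python
import numpy as np

rng = np.random.default_rng(0)

# Means for the K = 24 instance of Section 7.2: adjacent gaps of 0.2,
# except 0.98 (between ranks 9 and 10) and 1.0 (between ranks 18 and 19).
gaps = np.full(23, 0.2)
gaps[8] = 0.98
gaps[17] = 1.0
mu = np.concatenate(([0.0], -np.cumsum(gaps)))   # mu[0] is the largest mean

def cluster_by_empirical_maxgap(n_per_arm):
    """Non-adaptive baseline: sample every arm n_per_arm times (N(mu, 1)),
    then split at the largest gap of the empirical means; return top cluster."""
    mu_hat = mu + rng.standard_normal((n_per_arm, 24)).mean(axis=0)
    order = np.argsort(mu_hat)[::-1]
    split = int(np.argmax(mu_hat[order][:-1] - mu_hat[order][1:])) + 1
    return set(order[:split].tolist())

# How often does uniform sampling miss C1 = {arms ranked 1..18}?
errors = sum(cluster_by_empirical_maxgap(200) != set(range(18))
             for _ in range(50))
print("errors out of 50 runs:", errors)
```

Because the two large gaps differ by only 0.02, the uniform baseline needs very precise mean estimates at every arm before the split is reliable, which is exactly the inefficiency the adaptive algorithms avoid.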
We plot the fraction of times Ĉ_1 ≠ {1, . . . , 18} in 120 runs in Fig. 7(b), and see that the active algorithms identify the largest gap in 8x fewer samples. To visualize the adaptive allocation of samples to the arms, we plot in Fig. 7(c) the number of samples queried for each arm at different time steps by MaxGapUCB. Initially, MaxGapUCB allocates samples uniformly over all the arms. After a few time steps, we see a bi-modal profile in the number of samples. Since all arms that achieve the largest U are sampled, we see that several arms that are near the pairs (10, 9) and (19, 18) are also sampled frequently. As time progresses, only the pairs (10, 9) and (19, 18) get sampled, and eventually more samples are allocated to the larger gap (19, 18) among the two.

7.3 Streetview Dataset

For our third experiment we study performance on the Streetview dataset [17, 18], whose means are plotted in Fig. 8(a). We have K = 90 arms, where each arm is a normal distribution with mean equal to the Borda safety score of the image and standard deviation σ = 0.05. The largest gap of 0.029 is between arms 2 and 3, and the second largest gap is 0.024. In Fig. 8(b), we plot the fraction of times Ĉ_1 ≠ {1, 2} in 120 runs as a function of the number of samples, for four algorithms, viz., random (non-adaptive) sampling, MaxGapElim, MaxGapUCB, and MaxGapTop2UCB. The error bars denote standard deviation over the runs. MaxGapUCB and MaxGapTop2UCB require 6-7x fewer samples than random sampling.

Figure 8: (a) Borda safety scores for Streetview images. (b) Probability of returning a wrong cluster.

8 Conclusion

In this paper, we proposed the MaxGap-bandit problem: a novel maximum-gap identification problem that can be used as a basic primitive for clustering and approximate ranking. Our analysis shows a novel hardness parameter for the problem, and our experiments show 6-8x gains compared to non-adaptive algorithms.
We use simple Hoeffding based confidence intervals in our analysis for simplicity, but better bounds can be obtained using tighter confidence intervals [13]. Several extensions of this basic problem are possible. An ε-relaxation of the MaxGap Bandit is useful when the largest and second-largest gaps are close to each other. Other possibilities include identifying the largest gap within a top quantile of the arms, or clustering with a constraint that the returned clusters are of similar cardinality. All of these extensions will likely require new ideas, as it is unclear how to obtain a lower bound for the gap associated with every arm. Finding an instance-dependent lower bound for MaxGap-bandit is an intriguing problem. Finally, one way to cluster the distributions into more than two clusters is to apply the max-gap identification algorithms recursively; however, it would be interesting to come up with algorithms that can perform this clustering directly.

Acknowledgments

Ardhendu Tripathy would like to thank Ervin Tánczos for helpful discussions. The authors would also like to thank the reviewers for their comments and suggestions. This work was partially supported by AFOSR/AFRL grants FA8750-17-2-0262 and FA9550-18-1-0166.

References

[1] Arpit Agarwal, Shivani Agarwal, Sepehr Assadi, and Sanjeev Khanna. Learning with limited rounds of adaptivity: Coin tossing, multi-armed bandits, and ranking from pairwise comparisons. In Conference on Learning Theory, pages 39–75, 2017.

[2] Mark Braverman, Jieming Mao, and S Matthew Weinberg. Parallel algorithms for select and partition with noisy comparisons. In Proceedings of the forty-eighth annual ACM symposium on Theory of Computing, pages 851–862. ACM, 2016.

[3] Sebastien Bubeck, Tengyao Wang, and Nitin Viswanathan.
Multiple identifications in multi-armed bandits. In Proceedings of the 30th International Conference on Machine Learning, ICML'13, pages I-258–I-265. JMLR.org, 2013. URL http://dl.acm.org/citation.cfm?id=3042817.3042848.

[4] Lijie Chen, Jian Li, and Mingda Qiao. Nearly instance optimal sample complexity bounds for top-k arm selection. In Artificial Intelligence and Statistics, pages 101–110, 2017.

[5] Shouyuan Chen, Tian Lin, Irwin King, Michael R Lyu, and Wei Chen. Combinatorial pure exploration of multi-armed bandits. In Advances in Neural Information Processing Systems 27, pages 379–387. Curran Associates, Inc., 2014. URL http://papers.nips.cc/paper/5433-combinatorial-pure-exploration-of-multi-armed-bandits.pdf.

[6] Susan Davidson, Sanjeev Khanna, Tova Milo, and Sudeepa Roy. Top-k and clustering with noisy comparisons. ACM Transactions on Database Systems (TODS), 39(4):35, 2014.

[7] Eyal Even-Dar, Shie Mannor, and Yishay Mansour. Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems. Journal of Machine Learning Research, 7(Jun):1079–1105, 2006.

[8] Uriel Feige, Prabhakar Raghavan, David Peleg, and Eli Upfal. Computing with noisy information. SIAM Journal on Computing, 23(5):1001–1018, 1994.

[9] Victor Gabillon, Mohammad Ghavamzadeh, and Alessandro Lazaric. Best arm identification: A unified approach to fixed budget and fixed confidence. In Advances in Neural Information Processing Systems, pages 3212–3220, 2012.

[10] Aurélien Garivier and Emilie Kaufmann. Optimal best arm identification with fixed confidence. In Conference on Learning Theory, pages 998–1027, 2016.

[11] Weiran Huang, Jungseul Ok, Liang Li, and Wei Chen.
Combinatorial pure exploration with continuous and separable reward functions and its applications. In IJCAI, volume 18, pages 2291–2297, 2018.

[12] Kevin Jamieson and Robert Nowak. Best-arm identification algorithms for multi-armed bandits in the fixed confidence setting. In 2014 48th Annual Conference on Information Sciences and Systems (CISS), pages 1–6. IEEE, 2014.

[13] Kevin Jamieson, Matthew Malloy, Robert Nowak, and Sébastien Bubeck. lil'UCB: An optimal exploration algorithm for multi-armed bandits. In Conference on Learning Theory, pages 423–439, 2014.

[14] Kwang-Sung Jun, Kevin G Jamieson, Robert D Nowak, and Xiaojin Zhu. Top arm identification in multi-armed bandits with batch arm pulls. In AISTATS, pages 139–148, 2016.

[15] Shivaram Kalyanakrishnan and Peter Stone. Efficient selection of multiple bandit arms: Theory and practice. In ICML, volume 10, pages 511–518, 2010.

[16] Shivaram Kalyanakrishnan, Ambuj Tewari, Peter Auer, and Peter Stone. PAC subset selection in stochastic multi-armed bandits. In ICML, volume 12, pages 655–662, 2012.

[17] Sumeet Katariya, Lalit Jain, Nandana Sengupta, James Evans, and Robert Nowak. Chicago streetview dataset. 2018. URL https://github.com/sumeetsk/coarse_ranking/.

[18] Sumeet Katariya, Lalit Jain, Nandana Sengupta, James Evans, and Robert Nowak. Adaptive sampling for coarse ranking. In International Conference on Artificial Intelligence and Statistics, pages 1839–1848, 2018.

[19] Sumeet Katariya, Ardhendu Tripathy, and Robert Nowak. Code for MaxGap bandit algorithms and experiments. 2019. URL https://github.com/sumeetsk/maxgap_bandit.

[20] Shie Mannor and John N Tsitsiklis. The sample complexity of exploration in the multi-armed bandit problem. Journal of Machine Learning Research, 5(Jun):623–648, 2004.

[21] Cheng Mao, Jonathan Weed, and Philippe Rigollet.
Minimax rates and efficient algorithms for noisy sorting. In Proceedings of Algorithmic Learning Theory, volume 83 of Proceedings of Machine Learning Research, pages 821–847. PMLR, 2018. URL http://proceedings.mlr.press/v83/mao18a.html.

[22] Marta Soare, Alessandro Lazaric, and Rémi Munos. Best-arm identification in linear bandits. In Advances in Neural Information Processing Systems, pages 828–836, 2014.

[23] Honglei Zhuang, Chi Wang, and Yifan Wang. Identifying outlier arms in multi-armed bandit. In Advances in Neural Information Processing Systems, pages 5204–5213, 2017.