{"title": "On Bootstrapping the ROC Curve", "book": "Advances in Neural Information Processing Systems", "page_first": 137, "page_last": 144, "abstract": "This paper is devoted to thoroughly investigating how to bootstrap the ROC curve, a widely used visual tool for evaluating the accuracy of test/scoring statistics in the bipartite setup. The issue of confidence bands for the ROC curve is considered and a resampling procedure based on a smooth version of the empirical distribution called the smoothed bootstrap\" is introduced. Theoretical arguments and simulation results are presented to show that the \"smoothed bootstrap\" is preferable to a \"naive\" bootstrap in order to construct accurate confidence bands.\"", "full_text": "On Bootstrapping the ROC Curve\n\nPatrice Bertail\n\nCREST (INSEE) & MODAL\u2019X - Universit\u00b4e Paris 10\n\npbertail@u-paris10.fr\n\nSt\u00b4ephan Cl\u00b4emenc\u00b8on\n\nTelecom Paristech (TSI) - LTCI UMR Institut Telecom/CNRS 5141\n\nstephan.clemencon@telecom-paristech.fr\n\nNicolas Vayatis\n\nENS Cachan & UniverSud - CMLA UMR CNRS 8536\n\nvayatis@cmla.ens-cachan.fr\n\nAbstract\n\nThis paper is devoted to thoroughly investigating how to bootstrap the ROC curve,\na widely used visual tool for evaluating the accuracy of test/scoring statistics in\nthe bipartite setup. The issue of con\ufb01dence bands for the ROC curve is considered\nand a resampling procedure based on a smooth version of the empirical distribu-\ntion called the \u201dsmoothed bootstrap\u201d is introduced. 
Theoretical arguments and simulation results are presented to show that the “smoothed bootstrap” is preferable to a “naive” bootstrap in order to construct accurate confidence bands.

1 Introduction

Since the seminal contribution of [14], so-called ROC curves (ROC standing for Receiver Operating Characteristic) have been extensively used in a wide variety of applications (anomaly detection in signal analysis, medical diagnosis, search engines, credit-risk screening) as a visual tool for evaluating the performance of a test statistic regarding its capacity to discriminate between two populations, see [8]. Whereas the statistical properties of their empirical counterparts have only lately been studied from the asymptotic angle, see [18, 13, 11, 16], ROC curves have also recently received much attention in the machine-learning literature through the development of statistical learning procedures tailored for the ranking problem, see [10, 2]. The latter consists of determining, based on training data, a test statistic s(X) (also called a scoring function) with a ROC curve “as high as possible” at all points of the ROC space. Given a candidate s(X), it is thus of prime importance to assess its performance by computing a confidence band for the corresponding ROC curve, preferably in a data-driven fashion. Indeed, in such a functional setup, resampling-based procedures should naturally be preferred to those relying on computing/simulating the (gaussian) limiting distribution, as first observed in [19, 21, 20], where the use of the bootstrap is promoted for building confidence bands in the ROC space.

Building on recent works, see [17, 12], the purpose of this paper is to investigate how the bootstrap approach should be practically implemented, based on a thorough analysis of the asymptotic properties of empirical ROC curves. 
Beyond the pointwise analysis developed in the studies mentioned above, here we tackle the problem from a functional angle, considering the entire ROC curve or parts of it. This viewpoint indeed appears particularly relevant in scoring applications. Although the asymptotic results established in this paper are of a theoretical nature, they are considerably meaningful from a computational perspective: it turns out that smoothing is the key ingredient for the bootstrap confidence band to be accurate, whereas a naive bootstrap approach would yield bands of low coverage probability in this case and should consequently be avoided by practitioners for analyzing ROC curves.

The rest of the paper is organized as follows. In Section 2, notations are first set out and certain key notions of ROC analysis are briefly recalled. The choice of an adequate (pseudo-)metric on the ROC space, a crucial point of the analysis, is also considered. The smoothed bootstrap algorithm is presented in Section 3, together with the theoretical results establishing its asymptotic accuracy, as well as preliminary simulation results illustrating the impact of smoothing on the bootstrap performance. In Section 4, the gain in terms of convergence rate acquired through the smoothing step is thoroughly discussed. We refer to [1] for technical proofs.

2 Background

Here we briefly recall basic concepts of the bipartite ranking problem as well as key results related to the statistical estimation of ROC curves. We also set out the notations that shall be needed throughout the paper. 
Although the results contained in this paper can be formulated without referring to the bipartite ranking framework, for the purpose of motivating the present analysis we intentionally connect them to this major statistical learning problem, which has recently revived the interest in assessing the accuracy of empirical ROC curves, see [4].

2.1 Assumptions and notation

In the bipartite ranking problem, the goal is to order all the elements X of a set X by degree of relevance, when relevance may be observed through some binary indicator variable Y. Precisely, one has a system consisting of a binary random output Y, taking its values in {−1, 1} say, and a random input X, taking its values in a (generally high-dimensional) feature space X, which models some observation for predicting Y. The probabilistic model is the same as for standard binary classification, but the prediction task is different. In the case of information retrieval for instance, the goal is to order all documents x of the list X by degree of relevance for a particular request (rather than simply classifying them as relevant or not, as in classification). This amounts to assigning to each document x in X a score s(x) indicating its degree of relevance for this specific query. The challenge is thus to build a scoring function s : X → R from sampling data, so as to rank the observations x by increasing order of their score s(x) as accurately as possible: the higher the score s(X) is, the more likely one should observe Y = +1.

True ROC curves. A standard way of measuring the ranking performance consists of plotting the ROC curve, namely the graph of the mapping

ROC_s : α ∈ (0, 1) ↦ 1 − G_s ∘ H_s^{-1}(1 − α),

where G_s (respectively H_s) denotes s(X)'s cdf conditioned on Y = +1 (resp. 
conditioned on Y = −1) and F^{-1}(α) = inf{x ∈ R : F(x) ≥ α} denotes the generalized inverse of any cdf F on R. It boils down to plotting the true positive rate versus the false positive rate when testing the assumption “H0 : Y = −1” based on the statistic s(X). This functional performance measure induces a partial order on the set of scoring functions, according to which it may be shown, by standard Neyman-Pearson arguments, that increasing transforms of the regression function η(x) = P(Y = +1 | X = x) are the optimal scoring functions (the test statistic η(X) is uniformly more powerful, i.e. ∀α ∈ (0, 1), ROC_η(α) ≥ ROC_s(α), for any scoring function s(x)).

Empirical ROC curve estimates. Practical learning strategies for selecting a good scoring function are based on training data D_n = {(X_i, Y_i)}_{1≤i≤n} and should thus rely on accurate empirical estimates of the true ROC curves. Let p = P(Y = +1). For any scoring function candidate s(X), an empirical counterpart of ROC_s is naturally obtained by computing

∀α ∈ (0, 1), ROĈ_s(α) = 1 − Ĝ_s ∘ Ĥ_s^{-1}(1 − α)

from empirical cdf estimates:

Ĝ_s(x) = (1/n_+) Σ_{i=1}^n I{Y_i = +1} K(x − s(X_i))  and  Ĥ_s(x) = (1/n_−) Σ_{i=1}^n I{Y_i = −1} K(x − s(X_i)),

where n_+ = Σ_{i=1}^n I{Y_i = +1} = n − n_− is the (random) number of positive instances among the sample (distributed as the binomial Bin(n, p)) and K(u) denotes the step function I{u ≥ 0}. In order to obtain smoothed versions G̃_s(x) and H̃_s(x) of the latter cdfs, a typical choice consists of picking instead a function K(u) of the form ∫_{v ≥ 0} K_h(u − v)dv, with K_h(u) = h^{-1} K(h^{-1}·u), where K ≥ 0 is a regularizing Parzen-Rosenblatt kernel (i.e. a bounded square integrable function such that ∫ K(v)dv = 1) and h > 0 is the smoothing bandwidth; see Remark 1 for a practical view of smoothing. Here and throughout, I{A} denotes the indicator function of any event A.

Metrics on the ROC space. When it comes to measuring closeness between curves in the ROC space, various metrics may be used, see [9]. Viewing the ROC space as a subset of the Skorohod space D([0, 1]) of càd-làg functions f : [0, 1] → R, the standard metric induced by the sup norm ||.||∞ appears as a natural choice. As shall be seen below, asymptotic arguments for grounding the bootstrapping of the empirical ROC curve fluctuations, when measured in terms of the sup norm ||.||∞, are rather straightforward. However, given the geometry of empirical ROC curves, this metric is not always convenient for our purpose and may produce very wide, and thus non informative, confidence bands. For analyzing stepwise graphs, such as empirical ROC curves, we shall consider the closely related pseudo-metric defined as follows:

∀(f_1, f_2) ∈ D([0, 1])², d_B(f_1, f_2) = sup_{t ∈ [0,1]} d_B(f_1, f_2; t),

where d_B(f_1, f_2; t) = min{ |f_1(t) − f_2(t)|, |f_1^{-1} ∘ f_2(t) − t|, |f_2^{-1} ∘ f_1(t) − t| }. We clearly have d_B(f_1, f_2) ≤ ||f_1 − f_2||∞. The major advantage of considering this pseudo-metric is that it provides a control on vertical and horizontal jumps of ROC curves both at the same time, treating both types of error in a symmetric fashion. Equipped with this pseudo-metric, two piecewise constant ROC curves may be close to each other, even if their jumps do not exactly match. 
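As a quick numerical check of this property (an illustrative sketch of ours, not code from the paper; the grid representation of the curves and the use of numpy are our own implementation choices), the pseudo-metric d_B can be computed for two step curves whose unit jumps are shifted horizontally: the sup norm sees the whole jump, while d_B only sees the small horizontal offset.

```python
import numpy as np

def gen_inverse(t, f, y):
    """Generalized inverse f^{-1}(y) = inf{t : f(t) >= y} of a non-decreasing
    curve given by its values f on the grid t (vectorized in y)."""
    idx = np.clip(np.searchsorted(f, y, side="left"), 0, len(t) - 1)
    return t[idx]

def d_B(t, f1, f2):
    """d_B(f1, f2): at each grid point, the minimum of the vertical gap
    |f1(t) - f2(t)| and the two horizontal gaps |f2^{-1}(f1(t)) - t| and
    |f1^{-1}(f2(t)) - t|; then the supremum over the grid."""
    vert = np.abs(f1 - f2)
    horiz1 = np.abs(gen_inverse(t, f2, f1) - t)   # f2^{-1} o f1
    horiz2 = np.abs(gen_inverse(t, f1, f2) - t)   # f1^{-1} o f2
    return np.min(np.vstack([vert, horiz1, horiz2]), axis=0).max()

t = np.linspace(0.0, 1.0, 1001)
f1 = (t >= 0.45).astype(float)   # one-step "ROC" curve jumping at t = 0.45
f2 = (t >= 0.55).astype(float)   # the same step shifted horizontally by 0.1
sup_gap = np.abs(f1 - f2).max()  # sup norm sees the full unit jump
db_gap = d_B(t, f1, f2)          # d_B only sees the 0.1 horizontal shift
```

Here `sup_gap` equals 1 while `db_gap` is close to 0.1, illustrating why d_B yields tighter bands for stepwise curves with mismatched jump locations.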
This is clearly appropriate for describing the fluctuations of the empirical ROC curve (and the deviation between the latter and its bootstrap counterpart as well). This way, d_B permits the construction of bands of reasonable size, well adapted to the stepwise shape of empirical ROC curves, with better coverage probabilities. In this respect, the closely related Hausdorff distance (i.e. the distance between the graphs completed by linear segments at jump points) would also be a pertinent choice. However, providing a theoretical basis in the case of the Hausdorff distance is very challenging and will not be addressed in this paper, owing to space limitations.

As the goal pursued in the present paper is to build, in the ROC space viewed as a subspace of the Skorohod space D([0, 1]) equipped with a proper (pseudo-)metric, a confidence band for the ROC curve of a given diagnostic test statistic s(X), we shall omit to index by s the quantities considered and denote by Z the r.v. s(X) (and by Z_i, 1 ≤ i ≤ n, the s(X_i)'s) for notational simplicity. Throughout the paper, we assume that H(dx) and G(dx) are continuous probability distributions, with densities h(x) and g(x) respectively. Eventually, denote by P the joint distribution of (Z, Y) on R × {−1, +1} and by P_n its empirical version based on the sample D_n = {(Z_i, Y_i)}_{1≤i≤n}. 
Equipped with the notations above, one may write P(dz, y) = p·I{y = +1}·G(dz) + (1 − p)·I{y = −1}·H(dz).

2.2 Asymptotic law - Gaussian approximation

In the situation described above, the next theorem establishes the strong consistency of the empirical ROC curve in sup norm and provides a strong approximation at the rate 1/√n, up to logarithmic factors, for the fluctuation process:

r_n(α) = √n (ROĈ_n(α) − ROC(α)), α ∈ [0, 1].

This (gaussian) approximation plays a crucial role in understanding the asymptotic behavior of the empirical ROC curve and of its bootstrap counterpart. The following assumptions are required.

H1 The slope of the ROC curve is bounded: sup_{α ∈ [0,1]} { g(H^{-1}(α)) / h(H^{-1}(α)) } < ∞.

H2 H is twice differentiable on [0, 1]. Furthermore, ∀α ∈ [0, 1], h(α) > 0 and there exists γ > 0 such that sup_{α ∈ [0,1]} { α(1 − α) · d log(h ∘ H^{-1}(α))/dα } ≤ γ < ∞.

Theorem 1 (FUNCTIONAL LIMIT THEOREM) Suppose that H1 − H2 are fulfilled. Then,

(i) the empirical ROC curve is strongly consistent:

sup_{α ∈ [0,1]} |ROĈ_n(α) − ROC(α)| → 0 a.s. as n → ∞,

(ii) there exists a sequence of pairs of independent brownian bridges {(B_1^{(n)}(α), B_2^{(n)}(α))}_{α ∈ [0,1]} such that we almost surely have, uniformly over [0, 1],

r_n(α) = z^{(n)}(α) + o( (log log n)^{ρ1(γ)} (log n)^{ρ2(γ)} / √n ),   (1)

where

z^{(n)}(α) = (1 − p)^{-1/2} [ g(H^{-1}(1 − α)) / h(H^{-1}(1 − α)) ] B_1^{(n)}(α) + p^{-1/2} B_2^{(n)}(ROC(α))

and

ρ1(γ) = 0, ρ2(γ) = 1, if γ < 1;  ρ1(γ) = 0, ρ2(γ) = 2, if γ = 1;  ρ1(γ) = γ, ρ2(γ) = γ − 1 + ε with ε > 0, if γ > 1.

These results may be immediately derived from classical strong approximations for the empirical and quantile processes, see [5, 18]. Incidentally, we mention that the approximation rate is not always log²(n)/√n, contrarily to what is claimed in [18].

We point out that, owing to the presence of the term (g/h)(H^{-1}(1 − α)) in it, the gaussian approximant can hardly be used for constructing ROC confidence bands. To avoid explicit computation of density estimates, bootstrap confidence sets should certainly be preferred in practice.

3 Bootstrapping empirical ROC curves

Beyond consistency of the empirical curve in sup norm and the asymptotic normality of the fluctuation process, we now tackle the question of constructing confidence bands for the true ROC curve via the bootstrap approach introduced by [6], extending pointwise results established in [17]. 
The latter suggests to consider, as an estimate of the law of the fluctuation process r_n = {r_n(α)}_{α ∈ [0,1]}, the conditional law given D_n of the bootstrapped fluctuation process

r*_n = { √n (ROC*(α) − ROĈ(α)) }_{α ∈ [0,1]},   (2)

where ROC* is the ROC curve corresponding to a sample D*_n = {(Z*_i, Y*_i)}_{1≤i≤n} of i.i.d. random pairs with a common distribution P̃_n close to P_n. We shall also consider

d*_n = √n d_B(ROC*, ROĈ),   (3)

whose random fluctuations, given D_n, are expected to mimic those of d_n = √n d_B(ROĈ, ROC). The difficulty is twofold. Firstly, the target of the bootstrap procedure is here a distribution on a path space, the ROC space being viewed as a subspace of D([0, 1]), equipped with either ||.||∞ or else d_B(., .). Secondly, both r_n and d_n are functionals of the quantile process {Ĥ^{-1}(α)}_{α ∈ [0,1]}. It is well-known that the naive bootstrap (i.e. resampling from the raw empirical distribution) generally provides bad approximations of the distribution of empirical quantiles in practice: the rate of convergence for a given quantile is indeed of order O_P(n^{-1/4}), see [7], whereas the rate of the gaussian approximation is n^{-1/2}. As shall be seen below, the same phenomenon may be naturally observed for ROC curves. In a similar fashion to what is generally recommended for empirical quantiles, we suggest to implement a smoothed version of the bootstrap algorithm in order to improve the approximation rate of ||r_n||∞'s distribution, respectively of d_n's distribution. In short, this boils down to resampling the data from a smoothed version of the empirical distribution P_n.

3.1 The Algorithm

Here we describe the algorithm for building a confidence band at level 1 − ε in the ROC space from sampling data D_n = {(Z_i, Y_i); 1 ≤ i ≤ n}. Set n_+ = Σ_{1≤i≤n} I{Y_i = +1} = n − n_−. It is performed in four steps as follows.

ALGORITHM - SMOOTHED ROC BOOTSTRAP

1. Based on D_n, compute the empirical class cdf estimates Ĝ and Ĥ, as well as their smoothed versions G̃ and H̃. Plot the ROC curve estimate:

ROĈ(α) = 1 − Ĝ ∘ Ĥ^{-1}(1 − α), α ∈ [0, 1].

2. From the smooth distribution estimate

P̃_n(dz, y) = (n_+/n)·I{y = +1}·G̃(dz) + (n_−/n)·I{y = −1}·H̃(dz),

draw a bootstrap sample D*_n = {(Z*_i, Y*_i)}_{1≤i≤n} conditioned on D_n.

3. Based on D*_n, compute the bootstrap versions of the empirical class cdf estimates G* and H*. Plot the bootstrap ROC curve

ROC*(α) = 1 − G* ∘ H*^{-1}(1 − α), α ∈ [0, 1].

4. Eventually, get the bootstrap confidence band at level 1 − ε defined by the ball of center ROĈ and radius δ_ε/√n in D([0, 1]), where δ_ε is defined by P*(||r*_n||∞ ≤ δ_ε) = 1 − ε in the case of the sup norm, or by P*(d*_n ≤ δ_ε) = 1 − ε when considering the d_B distance, denoting by P*(.) the conditional probability given the original data D_n.

Before turning to the theoretical properties of this algorithm and related numerical experiments, a few remarks are in order.

Remark 1 (MONTE-CARLO APPROXIMATION) From a computational angle, the true smoothed bootstrap distribution must in its turn be approximated, using a Monte-Carlo approximation scheme. A convenient way of doing this in practice, while reproducing the theoretical advantages of smoothing, consists of drawing B bootstrap samples, of size n, with replacement from the original data and then perturbing each drawn datum by an independent centered gaussian random variable of variance h² (this procedure is equivalent to drawing bootstrap data from a smooth estimate P̃_n(dz, dy) computed using a gaussian kernel K_h(u) = (2πh²)^{-1/2} exp(−u²/(2h²))), see [22]. Regarding the choice of the number of bootstrap replications, picking B = n does not modify the rate of convergence. However, choosing B of magnitude comparable to n so that (1 + B)ε is an integer may be more appropriate: the ε-quantile of the approximate bootstrap distribution is then uniquely defined, and this will not modify the rate of convergence either, see [15].

Remark 2 (ON TUNING PARAMETERS) The primary tuning parameters of the Algorithm are those related to the smoothing stage. When using a gaussian regularizing kernel, one should typically choose a bandwidth h_n of order n^{-1/5} in order to minimize the mean square error.

Remark 3 (ON RECENTERING) From the asymptotic analysis viewpoint, it would be fairly equivalent to recenter by a smoothed version of the original empirical curve, ROC̃(.) = 1 − G̃ ∘ H̃^{-1}(1 − .), in the computation of the bootstrap fluctuation process. 
However, numerically speaking, computing the sup norm of the estimate (2) is much more tractable, insofar as it solely requires to evaluate the distance between piecewise constant curves over the pooled set of jump points. It should also be noticed that smoothing the original curve, as proposed in [17], should be avoided in practice, since it hides the jump locations, which constitute the essential part of the information.

3.2 Asymptotic analysis

We now investigate the accuracy of the bootstrap estimate output by the Algorithm. The results stated in the next theorem extend those established in [17] in the pointwise framework. The functional nature of the approximation result below is essential: it should be emphasized that, in most ranking applications, assessing the uncertainty about the whole estimated ROC curve, or at least some part of it, is what really matters. In the sequel, we assume that the kernel K used in the smoothing step is “pyramidal” (e.g. gaussian or of the form I{u ∈ [−1, +1]}).

Theorem 2 (ASYMPTOTIC ACCURACY) Suppose that the hypotheses of Theorem 1 are fulfilled. Assume further that the smoothed versions G̃ and H̃ of the cdfs are computed at step 1 using a scaled kernel K_{h_n}(u) with h_n ↓ 0 as n → ∞ in a way that n·h_n³ → ∞ and n·h_n⁵·log²n → 0. Then, the bootstrap distribution estimates output by the Algorithm are such that

sup_{t ∈ R} |P*(||r*_n||∞ ≤ t) − P(||r_n||∞ ≤ t)|  and  sup_{t ∈ R} |P*(d*_n ≤ t) − P(d_n ≤ t)|

are of order o_P( log(h_n^{-1}) / √(n·h_n) ).

Hence, up to logarithmic factors, choosing h_n ∼ n^{-1/5}·(log n)^{-(2+η)} with η > 0 yields an approximation error of order n^{-2/5} for the bootstrap estimate. 
Although its rate is slower than that of the gaussian approximation (1), the smoothed bootstrap method remains very appealing from a computational perspective, the construction of confidence bands from simulated brownian bridges being very difficult to implement in practice. As discussed below, the rate reached by the smoothed bootstrap distribution is nevertheless a great improvement over the naive bootstrap approach.

Remark 4 (BOOTSTRAPPING SUMMARY STATISTICS) From Theorem 1 above, asymptotic validity of the smooth bootstrap method for estimating the distribution of the fluctuations of a functional Φ(ROĈ) of the empirical ROC curve may be deduced, as soon as the function Φ defined on D([0, 1]) is sufficiently smooth (namely, continuously Hadamard differentiable). For instance, it could be applied to summary statistics involving a specific piece of the ROC curve only, in order to focus on the “best instances” [3], or more classically to the area under the ROC curve (AUC). However, in the latter case, due to the fact that this particular summary statistic is of the form of a U-statistic [2], the naive bootstrap rate, of order n^{-1}, is faster than the one we obtained here.

3.3 Simulation results

The striking advantage of the smoothed bootstrap is the improved rate of convergence of the resulting estimator. Furthermore, choosing d_B for measuring the magnitude order of curve fluctuations has an even larger impact on the accuracy of the empirical bands. 
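To fix ideas before the simulations, the four-step Algorithm of Section 3.1, combined with the gaussian-perturbation scheme of Remark 1, can be sketched as follows (an illustrative Python sketch of ours, not the paper's code; the grid of 99 false positive rates, the bandwidth rule, and the number of replications B are arbitrary choices):

```python
import numpy as np

def empirical_roc(pos, neg, alphas):
    """ROC-hat(a) = 1 - G-hat(H-hat^{-1}(1 - a)) on the grid alphas."""
    thresh = np.quantile(neg, 1.0 - alphas)          # H-hat^{-1}(1 - a)
    return np.array([np.mean(pos > c) for c in thresh])

def smoothed_bootstrap_band(scores, labels, eps=0.05, B=199, seed=0):
    """Sup-norm confidence band at level 1 - eps via the smoothed bootstrap:
    resample pairs with replacement and add centered gaussian noise (Remark 1)."""
    rng = np.random.default_rng(seed)
    pos, neg = scores[labels == 1], scores[labels == -1]
    n = len(scores)
    h = np.std(scores) * n ** (-1 / 5)               # bandwidth ~ n^{-1/5} (Remark 2)
    alphas = np.linspace(0.01, 0.99, 99)
    roc_hat = empirical_roc(pos, neg, alphas)        # step 1
    sups = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, n)                  # step 2: resample pairs...
        z = scores[idx] + rng.normal(0.0, h, n)      # ...and smooth by perturbation
        y = labels[idx]
        roc_star = empirical_roc(z[y == 1], z[y == -1], alphas)  # step 3
        sups[b] = np.sqrt(n) * np.max(np.abs(roc_star - roc_hat))
    delta = np.quantile(sups, 1.0 - eps)             # step 4: bootstrap radius
    band = delta / np.sqrt(n)
    return alphas, np.clip(roc_hat - band, 0, 1), np.clip(roc_hat + band, 0, 1)

# usage on synthetic scores: positive-class scores shifted upward by one unit
rng = np.random.default_rng(1)
n = 400
labels = rng.choice(np.array([-1, 1]), n)
scores = rng.normal(0.0, 1.0, n) + (labels == 1)
alphas, lower, upper = smoothed_bootstrap_band(scores, labels, B=50, seed=2)
```

A d_B-based band would be obtained the same way, replacing the sup-norm deviation in the loop by the pseudo-metric of Section 2.1.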
As an illustration of these theoretical results, we now display simulation results, emphasizing the gain acquired by smoothing and by considering the pseudo-metric d_B.

We present confidence bands for a single trajectory and the estimation of the coverage probability of the bands for a simple binormal model:

Y_i = +1 if β_0 + β_1 X_i + ε_i > 0, and Y_i = −1 otherwise,

where the ε_i's and X_i's are independent standard normal r.v.'s. In this example, the scoring function s(x) is the maximum likelihood estimator of the probit model on the training set. We choose here β_0 = β_1 = 1, n = 1000, B = 999 and a targeted coverage probability of 0.95. Coverage probabilities are obtained over 2000 replications of the procedure, using the ROCR package of the statistical software R. As mentioned before, choosing ||.||∞ yields very large bands with coverage probability close to 1! Though still large, bands based on the pseudo-metric d_B are clearly much more informative (see Fig. 1). 
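The data-generating mechanism of this experiment can be reproduced along the following lines (a sketch of ours in Python/numpy rather than R; for simplicity we rank by s(x) = x, which for β_1 > 0 is an increasing transform of the fitted probit score, instead of running the maximum likelihood fit):

```python
import numpy as np

def simulate_binormal(n, beta0=1.0, beta1=1.0, rng=None):
    """Draw (X_i, Y_i) from the probit model Y = +1 iff beta0 + beta1*X + eps > 0,
    with X and eps independent standard normal r.v.'s."""
    if rng is None:
        rng = np.random.default_rng()
    x = rng.normal(0.0, 1.0, n)
    eps = rng.normal(0.0, 1.0, n)
    y = np.where(beta0 + beta1 * x + eps > 0, 1, -1)
    return x, y

rng = np.random.default_rng(0)
x, y = simulate_binormal(1000, rng=rng)
# For beta1 > 0 any fitted probit score is increasing in x, so ranking by
# s(x) = x yields the same empirical ROC curve as the maximum likelihood score.
pos, neg = x[y == 1], x[y == -1]
alpha = 0.2
roc_at_02 = np.mean(pos > np.quantile(neg, 1.0 - alpha))  # ROC-hat(0.2)
```

With β_0 = β_1 = 1, about three quarters of the labels are positive and the empirical ROC curve lies well above the diagonal.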
It should be noticed that the coverage improvement obtained by smoothing is clearer in the pointwise estimation setup (here α = 0.2) but much more difficult to evidence for confidence bands.

Table 1: Empirical coverage probabilities for 95% empirical bands/intervals.

METHOD                            COVERAGE (%)
NAIVE BOOTSTRAP ||r_n||∞          100
SMOOTHED BOOTSTRAP ||r_n||∞       100
NAIVE BOOTSTRAP d_n               90.3
SMOOTHED BOOTSTRAP d_n            93.1
NAIVE BOOTSTRAP r_n(0.2)          89.7
SMOOTHED BOOTSTRAP r_n(0.2)       92.5

Figure 1: ROC confidence bands.

4 Discussion

Let us now give an insight into the reason why the smoothed bootstrap procedure outperforms the bootstrap without smoothing. In most statistical problems where the nonparametric bootstrap is useful, there is no particular reason for implementing it from a smoothed version of the empirical df rather than from the raw empirical distribution itself, see [22]. However, in the present case, smoothing affects the rate of convergence. Suppose indeed that the bootstrap process (2) is built by drawing from the raw cdfs Ĝ and Ĥ instead of their smoothed versions at step 2 of the Algorithm. Then, for any α ∈ (0, 1), sup_{t ∈ R} |P*(r*_n(α) ≤ t) − P(r_n(α) ≤ t)| = O_P(n^{-1/4}). Hence, the naive bootstrap induces an error of order O(n^{-1/4}) which cannot be improved, whereas it may be shown that the rate n^{-2/5} is attained by the smoothed bootstrap (in a similar fashion to the functional setup), provided that the amount of smoothing is properly chosen. 
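This quantile phenomenon can be observed directly in one dimension (an illustration of ours, not taken from the paper): without smoothing, every bootstrap replicate of a sample median is one of the original data points, so the naive bootstrap distribution lives on a coarse grid, while the smoothed bootstrap spreads it continuously.

```python
import numpy as np

def bootstrap_median(sample, B=500, h=0.0, rng=None):
    """Bootstrap distribution of the sample median; h > 0 adds the centered
    gaussian perturbation of the smoothed bootstrap, h = 0 is the naive one."""
    if rng is None:
        rng = np.random.default_rng()
    n = len(sample)
    meds = np.empty(B)
    for b in range(B):
        star = rng.choice(sample, n, replace=True)  # resample with replacement
        if h > 0:
            star = star + rng.normal(0.0, h, n)     # smoothing perturbation
        meds[b] = np.median(star)
    return meds

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 101)            # odd n: the median is a data point
naive = bootstrap_median(x, rng=rng)     # supported on the original points only
smooth = bootstrap_median(x, h=101 ** (-0.2), rng=rng)
```

Counting the distinct values taken by `naive` versus `smooth` makes the discreteness of the naive bootstrap distribution, and hence its oscillatory behavior around the empirical quantile, immediately visible.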
Heuristically, this is a consequence of the oscillation behavior of the deviation between the bootstrap quantile H*^{-1}(1 − α) and its expected value Ĥ^{-1}(1 − α) given the data D_n, due to the fact that the step cdf Ĥ is not regular around Ĥ^{-1}(1 − α): this point corresponds to a jump with probability one.

Higher-order accuracy. A classical way of improving the pointwise approximation rate consists of bootstrapping a standardized version of the r.v. r_n(α). It is natural to consider, as standardization factor, the square root of an estimate of the asymptotic variance:

σ²(α) = var(z^{(n)}(α)) = [α(1 − α)/(1 − p)] · g(H^{-1}(1 − α))²/h(H^{-1}(1 − α))² + ROC(α)(1 − ROC(α))/p.   (4)

An estimate σ̂²_n of plug-in type could be considered, obtained by plugging n_+/n, ROC̃ and the smoothed density estimators h̃ = H̃′ and g̃ = G̃′ into (4) instead of their (unknown) theoretical counterparts. More interestingly, from a computational viewpoint, a bootstrap estimator of the variance could also be used. Following the argument used in [17] for a smoothed original estimate of the ROC curve, one may show that a smoothed bootstrap of the studentized statistic r_n(α)/σ_n(α) yields a better pointwise rate of convergence than 1/√n, the rate of the gaussian approximation in the Central Limit Theorem.

[Figure 1 panels: the ||.||∞ confidence band, the d_B confidence band, and the pointwise smooth bootstrap confidence interval.]

Precisely, for a given α ∈ (0, 1), if the bandwidth used in the computation of σ²_n(α) is chosen of order n^{-1/3}, we have:

sup_{t ∈ R} | P*( r*_n(α)/σ*_n(α) ≤ t ) − P( r_n(α)/σ_n(α) ≤ t ) | = O_P(1/n^{2/3}),   (5)

denoting σ²_n(α)'s bootstrap counterpart by σ*²_n(α). Notice that the bandwidth used in the standardization step (i.e. for estimating the variance) is not the same as the one used at the resampling stage of the procedure. This is a key point for achieving second-order accuracy. This time, the smoothed (studentized) bootstrap method widely outperforms the gaussian approach, when the matter is to build confidence intervals for the ordinate ROĈ(α) of a point of abscissa α on the empirical ROC curve. However, it is not yet clear whether this result remains true for confidence bands, when considering the whole ROC curve (this would actually require establishing an Edgeworth expansion for the supremum ||r_n/σ̂_n||∞). This will be the scope of further research.

References

[1] P. Bertail, S. Clémençon, and N. Vayatis. On constructing accurate confidence bands for ROC curves through smooth resampling. Technical report, 2008. http://hal.archives-ouvertes.fr/hal-00335232/fr/

[2] S. Clémençon, G. 
Lugosi, and N. Vayatis. Ranking and scoring using empirical risk minimization. In Proceedings of COLT 2005, P. Auer and R. Meir (Eds.), LNAI 3559, Springer, 2005.

[3] S. Clémençon and N. Vayatis. Ranking the best instances. Journal of Machine Learning Research, 5:197–227, 2007.

[4] W. Cohen, R. Schapire, and Y. Singer. Learning to order things. Journal of Artificial Intelligence Research, 10:243–270, 1999.

[5] M. Csörgő and P. Révész. Strong approximations in probability and statistics. Academic Press, 1981.

[6] B. Efron. Bootstrap methods: another look at the jackknife. Annals of Statistics, 7:1–26, 1979.

[7] M. Falk and R. Reiss. Weak convergence of smoothed and nonsmoothed bootstrap quantile estimates. Annals of Probability, 17:362–371, 1989.

[8] T. Fawcett. ROC graphs: Notes and practical considerations for data mining researchers. Technical Report HPL-2003-4, HP Laboratories, 2003.

[9] P. Flach. The geometry of ROC space: understanding machine learning metrics through ROC isometrics. In T. Fawcett and N. Mishra, editors, Proc. 20th International Conference on Machine Learning (ICML'03), AAAI Press, 194–201, 2003.

[10] Y. Freund, R. Iyer, R. Schapire, and Y. Singer. An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research, 4:933–969, 2003.

[11] P. Ghosal and J. Gu. Bayesian ROC curve estimation under binormality using a partial likelihood based on ranks. Submitted for publication, 2007.

[12] P. Ghosal and J. Gu. Strong approximations for resample quantile process and application to ROC methodology. Submitted for publication, 2007.

[13] A. Girling. ROC confidence bands: An empirical evaluation. Journal of the Royal Statistical Society, Series B, 62:367–382, 2000.

[14] D. Green and J. Swets. Signal detection theory and psychophysics. Wiley, NY, 1966.

[15] P. Hall. 
On the number of bootstrap simulations required to construct a confidence interval. Annals of Statistics, 14:1453–1462, 1986.

[16] P. Hall and R. Hyndman. Improved methods for bandwidth selection when estimating ROC curves. Statistics and Probability Letters, 64:181–189, 2003.

[17] P. Hall, R. Hyndman, and Y. Fan. Nonparametric confidence intervals for receiver operating characteristic curves. Biometrika, 91:743–750, 2004.

[18] F. Hsieh and B. Turnbull. Nonparametric and semi-parametric statistical estimation of the ROC curve. The Annals of Statistics, 24:25–40, 1996.

[19] S. Macskassy and F. Provost. Confidence bands for ROC curves: methods and an empirical study. In Proceedings of the First Workshop on ROC Analysis in AI (ROCAI-2004) at ECAI-2004, 2004.

[20] S. Macskassy, F. Provost, and S. Rosset. Bootstrapping the ROC curve: an empirical evaluation. In Proceedings of the ICML-2005 Workshop on ROC Analysis in Machine Learning (ROCML-2005), 2005.

[21] S. Macskassy, F. Provost, and S. Rosset. ROC confidence bands: An empirical evaluation. In Proceedings of the 22nd International Conference on Machine Learning (ICML-2005), 2005.

[22] B. Silverman and G. Young. The bootstrap: to smooth or not to smooth? Biometrika, 74:469–479, 1987.
", "award": [], "sourceid": 589, "authors": [{"given_name": "Patrice", "family_name": "Bertail", "institution": null}, {"given_name": "St\u00e9phan", "family_name": "Cl\u00e9men\u00e7on", "institution": null}, {"given_name": "Nicolas", "family_name": "Vayatis", "institution": null}]}