Clayton Scott, Robert Nowak
Given a probability measure P and a reference measure µ, one is often interested in the minimum µ-measure set with P -measure at least α. Minimum volume sets of this type summarize the regions of greatest probability mass of P , and are useful for detecting anoma- lies and constructing conﬁdence regions. This paper addresses the problem of estimating minimum volume sets based on independent samples distributed according to P . Other than these samples, no other information is available regarding P , but the reference mea- sure µ is assumed to be known. We introduce rules for estimating minimum volume sets that parallel the empirical risk minimization and structural risk minimization principles in classiﬁcation. As in classiﬁcation, we show that the performances of our estimators are controlled by the rate of uniform convergence of empirical to true probabilities over the class from which the estimator is drawn. Thus we obtain ﬁnite sample size performance bounds in terms of VC dimension and related quantities. We also demonstrate strong universal consistency and an oracle inequality. Estimators based on histograms and dyadic partitions illustrate the proposed rules.