Large Margin Classifiers: Convex Loss, Low Noise, and Convergence Rates

Peter L. Bartlett, Michael I. Jordan, Jon D. McAuliffe

Advances in Neural Information Processing Systems 16 (NIPS 2003)

Abstract

Many classification algorithms, including the support vector machine, boosting and logistic regression, can be viewed as minimum contrast methods that minimize a convex surrogate of the 0-1 loss function. We characterize the statistical consequences of using such a surrogate by providing a general quantitative relationship between the risk as assessed using the 0-1 loss and the risk as assessed using any nonnegative surrogate loss function. We show that this relationship gives nontrivial bounds under the weakest possible condition on the loss function: that it satisfy a pointwise form of Fisher consistency for classification. The relationship is based on a variational transformation of the loss function that is easy to compute in many applications. We also present a refined version of this result in the case of low noise. Finally, we present applications of our results to the estimation of convergence rates in the general setting of function classes that are scaled hulls of a finite-dimensional base class.
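To make the "variational transformation" concrete, here is a minimal numerical sketch. It assumes the notation commonly used for this result: for eta in [0, 1] and margin alpha, the conditional surrogate risk is C_eta(alpha) = eta * phi(alpha) + (1 - eta) * phi(-alpha); H(eta) is its infimum over all alpha; H^-(eta) is its infimum over margins whose sign disagrees with 2*eta - 1; and the transform is psi_tilde(theta) = H^-((1+theta)/2) - H((1+theta)/2). The function names, the grid-search approximation, and the "perceptron" example are illustrative choices made here, not code from the paper. For a convex phi, the best wrong-sign margin is alpha = 0, so H^- equals phi(0); because H is an infimum of affine functions of eta it is concave, which makes psi_tilde already convex. The sketch therefore skips the convex-hull step and compares against closed forms of the shape phi(0) - H((1+theta)/2).

```python
import numpy as np

# Surrogate losses phi(alpha), where alpha = y * f(x) is the margin.
LOSSES = {
    "hinge": lambda a: np.maximum(0.0, 1.0 - a),
    "exponential": lambda a: np.exp(-a),
    "truncated_quadratic": lambda a: np.maximum(0.0, 1.0 - a) ** 2,
    "logistic": lambda a: np.log1p(np.exp(-a)),
    # Convex but not differentiable at 0: an illustrative loss that is NOT calibrated.
    "perceptron": lambda a: np.maximum(0.0, -a),
}

# Margin grid (step 0.001) containing 0 and +/-1 exactly.
# Assumption: every infimum needed below is attained inside [-10, 10].
ALPHAS = np.arange(-10_000, 10_001) / 1000.0


def psi_tilde(phi, theta):
    """Grid estimate of H^-(eta) - H(eta) at eta = (1 + theta) / 2.

    No convex hull is taken; for convex phi the difference is already convex.
    """
    eta = (1.0 + theta) / 2.0
    # Conditional phi-risk C_eta(alpha) = eta*phi(alpha) + (1-eta)*phi(-alpha).
    c = eta * phi(ALPHAS) + (1.0 - eta) * phi(-ALPHAS)
    h = c.min()  # H(eta): unconstrained infimum over the grid
    # H^-(eta): infimum over margins whose sign disagrees with 2*eta - 1.
    h_minus = c[ALPHAS * (2.0 * eta - 1.0) <= 0.0].min()
    return h_minus - h


def _binary_entropy_nats(p):
    """Binary entropy in nats; equals H(p) for the logistic loss above."""
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))


# Closed forms phi(0) - H((1 + theta) / 2) for the convex, calibrated losses.
CLOSED_FORM = {
    "hinge": lambda t: abs(t),
    "exponential": lambda t: 1.0 - np.sqrt(1.0 - t**2),
    "truncated_quadratic": lambda t: t**2,
    "logistic": lambda t: np.log(2.0) - _binary_entropy_nats((1.0 + t) / 2.0),
}

if __name__ == "__main__":
    for name, phi in LOSSES.items():
        row = [f"{psi_tilde(phi, t):.4f}" for t in (0.0, 0.3, 0.6, 0.9)]
        print(f"{name:>20}: {row}")
```

The relationship described in the abstract is usually stated as psi(R(f) - R*) <= R_phi(f) - R_phi*, where R is the 0-1 risk, R_phi the surrogate risk, and starred quantities are their minimal values. Under that reading, the hinge loss gives excess 0-1 risk at most the excess hinge risk. For the exponential loss, 1 - sqrt(1 - theta^2) >= theta^2 / 2, so the excess 0-1 risk is at most sqrt(2 * (excess exponential risk)). The "perceptron" loss yields psi_tilde identically zero, so it gives no such bound, which is consistent with it failing the pointwise Fisher-consistency condition.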