Part of Advances in Neural Information Processing Systems 14 (NIPS 2001)
Michael C. Mozer, Robert Dodier, Michael Colagrosso, Cesar Guerra-Salcedo, Richard Wolniewicz
When designing a two-alternative classifier, one ordinarily aims to maximize the classifier’s ability to discriminate between members of the two classes. We describe a situation in a real-world business application of machine-learning prediction in which an additional constraint is placed on the nature of the solution: that the classifier achieve a specified correct acceptance or correct rejection rate (i.e., that it achieve a fixed accuracy on members of one class or the other). Our domain is predicting churn in the telecommunications industry. Churn refers to customers who switch from one service provider to another. We propose four algorithms for training a classifier subject to this domain constraint, and present results showing that each algorithm yields a reliable improvement in performance. Although the improvement is modest in magnitude, it is nonetheless impressive given the difficulty of the problem and the financial return that it achieves to the service provider.
When designing a classifier, one must specify an objective measure by which the classifier’s performance is to be evaluated. One simple objective measure is to minimize the number of misclassifications. If the cost of a classification error depends on the target and/or response class, one might utilize a risk-minimization framework to reduce the expected loss. A more general approach is to maximize the classifier’s ability to discriminate one class from another class (e.g., Chang & Lippmann, 1994).
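To make the risk-minimization idea concrete, here is a minimal Python sketch for the two-class case. It assumes the classifier outputs a posterior P(member | x), that correct decisions cost nothing, and that the two error costs (named `cost_fp` and `cost_fn` here purely for illustration; these names are not from the paper) are known constants. Minimizing expected loss then reduces to thresholding the posterior at `cost_fp / (cost_fp + cost_fn)`; with equal costs this threshold is 0.5, which recovers plain error minimization.

```python
def bayes_threshold(cost_fp: float, cost_fn: float) -> float:
    """Posterior above which accepting has the lower expected loss.

    Assumes correct decisions cost 0; cost_fp is the (hypothetical) loss for
    accepting a nonmember, cost_fn the loss for rejecting a member.
    """
    return cost_fp / (cost_fp + cost_fn)


def min_risk_decision(p_member: float, cost_fp: float, cost_fn: float) -> bool:
    """Return True ("accept") when accepting has the lower expected loss."""
    risk_accept = (1.0 - p_member) * cost_fp  # loss incurred if x is in fact a nonmember
    risk_reject = p_member * cost_fn          # loss incurred if x is in fact a member
    return risk_accept < risk_reject
```

For example, with `cost_fp=1.0` and `cost_fn=4.0` the threshold is 0.2, so an input with posterior 0.3 is accepted even though it is more likely a nonmember, because a missed member is assumed to be four times as costly.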
An ROC curve (Green & Swets, 1966) can be used to visualize the discriminative performance of a two-alternative classifier that outputs class posteriors. To explain the ROC curve, a classifier can be thought of as making a positive/negative judgement as to whether an input is a member of some class. Two different accuracy measures can be obtained from the classifier: the accuracy of correctly identifying an input as a member of the class (a correct acceptance or CA), and the accuracy of correctly identifying an input as a nonmember of the class (a correct rejection or CR). To evaluate the CA and CR rates, it is necessary to pick a threshold above which the classifier’s probability estimate is interpreted as an “accept,” and below which it is interpreted as a “reject”; call this threshold the criterion. The ROC curve plots CA against CR rates for various criteria (Figure 1a). Note that as the criterion is lowered, the CA rate increases and the CR rate decreases. For a criterion of 1, the CA rate approaches 0 and the CR rate 1; for a criterion of 0, the CA rate approaches 1 and the CR rate 0.
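A minimal Python sketch of how CA and CR rates could be computed from posterior estimates, and how ROC points arise by sweeping the criterion. The function names, the tie rule (a score exactly equal to the criterion counts as an “accept”), and the helper `criterion_for_ca` are illustrative assumptions, not the authors’ code. `criterion_for_ca` picks the highest criterion that still meets a target CA rate, which is one simple way to satisfy the kind of fixed-accuracy constraint described in the abstract.

```python
import math
from typing import List, Sequence, Tuple


def ca_cr_rates(scores: Sequence[float], labels: Sequence[int],
                criterion: float) -> Tuple[float, float]:
    """Return (CA rate, CR rate) for one criterion.

    labels: 1 = member of the class, 0 = nonmember.
    Assumed tie rule: a score equal to the criterion counts as an "accept".
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one member and one nonmember")
    ca = sum(s >= criterion for s in pos) / len(pos)  # members accepted
    cr = sum(s < criterion for s in neg) / len(neg)   # nonmembers rejected
    return ca, cr


def roc_points(scores: Sequence[float],
               labels: Sequence[int]) -> List[Tuple[float, float, float]]:
    """(criterion, CA, CR) for every observed score, plus +inf.

    The lowest observed score accepts everything (CA = 1, CR = 0);
    +inf rejects everything (CA = 0, CR = 1).
    """
    criteria = sorted(set(scores)) + [math.inf]
    return [(c, *ca_cr_rates(scores, labels, c)) for c in criteria]


def criterion_for_ca(scores: Sequence[float], labels: Sequence[int],
                     target_ca: float) -> float:
    """Highest criterion whose CA rate is at least target_ca.

    Because CR never decreases as the criterion rises, this choice gives the
    best CR rate available under the fixed-CA constraint.
    """
    if not 0.0 < target_ca <= 1.0:
        raise ValueError("target_ca must be in (0, 1]")
    pos = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not pos:
        raise ValueError("need at least one member")
    k = math.ceil(target_ca * len(pos))  # number of members that must be accepted
    return pos[k - 1]                    # k-th highest member score
```

Usage: with member scores 0.9, 0.8, 0.6, 0.3 and nonmember scores 0.7, 0.4, 0.2, 0.1, `criterion_for_ca(scores, labels, 0.75)` returns 0.6, giving a CA rate of 0.75 and a CR rate of 0.75. Sweeping only the observed scores is sufficient because the CA and CR rates change only when the criterion crosses one of them.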