Part of Advances in Neural Information Processing Systems 16 (NIPS 2003)
Clayton Scott, Robert Nowak
This paper reports on a family of computationally practical classifiers that converge to the Bayes error at near-minimax optimal rates for a variety of distributions. The classifiers are based on dyadic classification trees (DCTs), which involve adaptively pruned partitions of the feature space. A key aspect of DCTs is their spatial adaptivity, which enables local (rather than global) fitting of the decision boundary. Our risk analysis involves a spatial decomposition of the usual concentration inequalities, leading to a spatially adaptive, data-dependent pruning criterion. For any distribution on (X, Y) whose Bayes decision boundary behaves locally like a Lipschitz smooth function, we show that the DCT error converges to the Bayes error at a rate within a logarithmic factor of the minimax optimal rate. We also study DCTs equipped with polynomial classification rules at each leaf, and show that as the smoothness of the boundary increases their errors converge to the Bayes error at a rate approaching n^{-1/2}, the parametric rate. We are not aware of any other practical classifiers that provide similar rate of convergence guarantees. Fast algorithms for tree pruning are discussed.
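To make the construction concrete, the sketch below illustrates the general shape of a dyadic classification tree: the feature space (assumed scaled to [0,1]^d) is recursively split at dyadic midpoints, cycling through coordinates, and the resulting tree is pruned bottom-up by a complexity-penalized empirical error comparison. This is a minimal illustration only; the constant `penalty` and the helper names (`grow`, `prune`, `predict_one`) are placeholders and do not reproduce the paper's spatially adaptive, data-dependent pruning criterion or its polynomial leaf rules.

```python
# Illustrative sketch of a dyadic classification tree (assumptions noted above).
import numpy as np

def grow(X, y, depth, max_depth, dim=0):
    """Recursively build a dyadic partition; each node stores a majority label."""
    label = int(np.mean(y) >= 0.5) if len(y) else 0
    node = {"label": label, "err": int(np.sum(y != label)), "children": None}
    if depth == max_depth or len(y) == 0:
        return node
    left = X[:, dim] < 0.5                     # dyadic midpoint split
    XL, XR = X[left].copy(), X[~left].copy()
    XL[:, dim] *= 2.0                          # rescale each child back to [0,1]
    XR[:, dim] = 2.0 * (XR[:, dim] - 0.5)
    nxt = (dim + 1) % X.shape[1]               # cycle through coordinates
    node["children"] = (grow(XL, y[left], depth + 1, max_depth, nxt),
                        grow(XR, y[~left], depth + 1, max_depth, nxt))
    return node

def prune(node, penalty):
    """Bottom-up pruning: keep a split only if it lowers penalized empirical error.
    A constant per-leaf penalty is used here purely as a placeholder."""
    if node["children"] is None:
        return node["err"] + penalty
    cost_split = sum(prune(c, penalty) for c in node["children"])
    cost_leaf = node["err"] + penalty
    if cost_leaf <= cost_split:
        node["children"] = None                # collapse subtree into a leaf
        return cost_leaf
    return cost_split

def predict_one(node, x, dim=0):
    """Descend the dyadic partition, rescaling the split coordinate at each level."""
    if node["children"] is None:
        return node["label"]
    d = x.shape[0]
    if x[dim] < 0.5:
        x2 = np.concatenate([x[:dim], [2.0 * x[dim]], x[dim + 1:]])
        return predict_one(node["children"][0], x2, (dim + 1) % d)
    x2 = np.concatenate([x[:dim], [2.0 * (x[dim] - 0.5)], x[dim + 1:]])
    return predict_one(node["children"][1], x2, (dim + 1) % d)

# Toy example: noisy labels from a smooth (locally Lipschitz) boundary in [0,1]^2.
rng = np.random.default_rng(0)
X = rng.random((2000, 2))
y = (X[:, 1] > 0.5 + 0.2 * np.sin(4 * X[:, 0])).astype(int)
y ^= (rng.random(2000) < 0.1)                  # 10% label noise
tree = grow(X, y, depth=0, max_depth=8)
prune(tree, penalty=2.0)                       # placeholder constant penalty
acc = np.mean([predict_one(tree, x) == t for x, t in zip(X, y)])
print(f"training accuracy: {acc:.3f}")
```

Because the partition cells are dyadic hypercubes, pruning decisions are purely local: a subtree far from the decision boundary can be collapsed aggressively while cells straddling the boundary stay refined, which is the spatial adaptivity the abstract refers to.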