Generalization in Decision Trees and DNF: Does Size Matter?

Part of Advances in Neural Information Processing Systems 10 (NIPS 1997)

Bibtex Metadata Paper


Mostefa Golea, Peter Bartlett, Wee Sun Lee, Llew Mason


Recent theoretical results for pattern classification with thresh(cid:173) olded real-valued functions (such as support vector machines, sig(cid:173) moid networks, and boosting) give bounds on misclassification probability that do not depend on the size of the classifier, and hence can be considerably smaller than the bounds that follow from the VC theory. In this paper, we show that these techniques can be more widely applied, by representing other boolean functions as two-layer neural networks (thresholded convex combinations of boolean functions). For example, we show that with high probabil(cid:173) ity any decision tree of depth no more than d that is consistent with m training examples has misclassification probability no more than o ( (~ (Neff VCdim(U) log2 m log d)) 1/2), where U is the class of node decision functions, and Neff ::; N can be thought of as the effective number of leaves (it becomes small as the distribution on the leaves induced by the training data gets far from uniform). This bound is qualitatively different from the VC bound and can be considerably smaller. We use the same technique to give similar results for DNF formulae.

• Author to whom correspondence should be addressed


M. Golea, P Bartlett, W. S. Lee and L Mason