Part of Advances in Neural Information Processing Systems 19 (NIPS 2006)
Benjamin Rubinstein, Peter Bartlett, J. Rubinstein
Under the prediction model of learning, a prediction strategy is presented with an i.i.d. sample of n−1 points in X and corresponding labels from a concept f ∈ F, and aims to minimize the worst-case probability of erring on an nth point. By exploiting the structure of F, Haussler et al. achieved a VC(F)/n bound for the natural one-inclusion prediction strategy, improving on bounds implied by PAC-type results by an O(log n) factor. The key data structure in their result is the natural subgraph of the hypercube--the one-inclusion graph; the key step is a d = VC(F) bound on one-inclusion graph density. The first main result of this paper is a density bound of $n \binom{n-1}{\le d-1} / \binom{n}{\le d} < d$, which positively resolves a conjecture of Kuzmin & Warmuth relating to their unlabeled Peeling compression scheme and also leads to an improved mistake bound for the randomized (deterministic) one-inclusion strategy for all d (for d = Θ(√n)). The proof uses a new form of VC-invariant shifting and a group-theoretic symmetrization. Our second main result is a k-class analogue of the d/n mistake bound, replacing the VC-dimension by the Pollard pseudo-dimension and the one-inclusion strategy by its natural hypergraph generalization. This bound on expected risk improves on known PAC-based results by a factor of O(log n) and is shown to be optimal up to an O(log k) factor. The combinatorial technique of shifting takes a central role in understanding the one-inclusion (hyper)graph and is a running theme throughout.
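To make the quantities in the abstract concrete, the short Python sketch below (not from the paper) builds the one-inclusion graph of a finite concept class restricted to n points, computes its edge-to-vertex density, and evaluates the density bound $n \binom{n-1}{\le d-1} / \binom{n}{\le d}$ for comparison with the old bound d. The Hamming-ball class used in the demo and the function names (one_inclusion_graph, density, density_bound) are illustrative choices, not identifiers or constructions taken from the paper.

from itertools import combinations, product
from math import comb

def one_inclusion_graph(vectors):
    """One-inclusion graph of a set of distinct {0,1}-vectors: vertices are the
    vectors; edges join pairs that differ in exactly one coordinate."""
    verts = [tuple(v) for v in vectors]
    edges = [(u, v) for u, v in combinations(verts, 2)
             if sum(a != b for a, b in zip(u, v)) == 1]
    return verts, edges

def density(vectors):
    """Edge-to-vertex ratio of the one-inclusion graph."""
    verts, edges = one_inclusion_graph(vectors)
    return len(edges) / len(verts)

def density_bound(n, d):
    """n * C(n-1, <= d-1) / C(n, <= d), the density bound discussed above."""
    numer = sum(comb(n - 1, i) for i in range(d))       # C(n-1, <= d-1)
    denom = sum(comb(n, i) for i in range(d + 1))       # C(n, <= d)
    return n * numer / denom

if __name__ == "__main__":
    n, d = 6, 2
    # Illustrative class: all labelings of n points with at most d ones
    # (a Hamming ball); its VC dimension is d.
    ball = [v for v in product((0, 1), repeat=n) if sum(v) <= d]
    print("empirical density:", density(ball))
    print("density bound    :", density_bound(n, d))
    print("old bound d      :", d)

In this run (n = 6, d = 2) the empirical density and the evaluated bound both come out to 36/22 ≈ 1.64, below the old bound d = 2, so the Hamming-ball example attains the stated expression exactly while staying strictly under d.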