Part of Advances in Neural Information Processing Systems 8 (NIPS 1995)
A theory of early stopping as applied to linear models is presented. The backpropagation learning algorithm is modeled as gradient descent in continuous time. Given a training set and a validation set, all weight vectors found by early stopping must lie on a cer(cid:173) tain quadric surface, usually an ellipsoid. Given a training set and a candidate early stopping weight vector, all validation sets have least-squares weights lying on a certain plane. This latter fact can be exploited to estimate the probability of stopping at any given point along the trajectory from the initial weight vector to the least(cid:173) squares weights derived from the training set, and to estimate the probability that training goes on indefinitely. The prospects for extending this theory to nonlinear models are discussed.