Part of Advances in Neural Information Processing Systems 10 (NIPS 1997)
The generalization ability of a neural network can sometimes be improved dramatically by regularization. To analyze the improvement one needs more refined results than the asymptotic distribution of the weight vector. Here we study the simple case of one-dimensional linear regression under quadratic regularization, i.e., ridge regression. We study the random design, misspecified case, where we derive expansions for the optimal regularization parameter and the ensuing improvement. It is possible to construct examples where it is best to use no regularization.
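The setting described above can be illustrated with a minimal sketch. It is not the paper's derivation. It assumes a model without an intercept, a penalty of the form `lam * w**2` (the abstract does not state the scaling), and a hypothetical data-generating process (`y = sin(x) + noise`) chosen only so that the linear model is misspecified. The names `ridge_1d`, `generalization_error`, and `sample` are illustrative.

```python
import numpy as np


def ridge_1d(x, y, lam):
    """Scalar w minimizing sum((y - w*x)**2) + lam * w**2 (assumed penalty form)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Closed form for the one-dimensional case; lam = 0 gives ordinary least squares.
    return float(x @ y / (x @ x + lam))


def generalization_error(w, x_test, y_test):
    """Mean squared prediction error on held-out data."""
    return float(np.mean((y_test - w * x_test) ** 2))


rng = np.random.default_rng(0)


def sample(n):
    # Random design: x is drawn at random rather than fixed.
    x = rng.normal(size=n)
    # Misspecified: the true regression function is not linear (assumption for illustration).
    y = np.sin(x) + 0.5 * rng.normal(size=n)
    return x, y


x_train, y_train = sample(20)
x_test, y_test = sample(100_000)

# Compare held-out error for a few penalty values, including no regularization.
for lam in (0.0, 1.0, 5.0):
    w = ridge_1d(x_train, y_train, lam)
    print(lam, w, generalization_error(w, x_test, y_test))
```

Which value of `lam` gives the lowest held-out error depends on the data distribution and sample size; as the abstract notes, there are cases where `lam = 0` is best.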