Part of Advances in Neural Information Processing Systems 8 (NIPS 1995)
Shun-ichi Amari, Noboru Murata, Klaus-Robert Müller, Michael Finke, Howard Yang
A statistical theory for overtraining is proposed. The analysis treats realizable stochastic neural networks, trained with Kullback-Leibler loss in the asymptotic case. It is shown that the asymptotic gain in the generalization error is small if we perform early stopping, even if we have access to the optimal stopping time. Considering cross-validation stopping, we answer the question: in what ratio should the examples be divided into training and testing sets in order to obtain the optimum performance? In the non-asymptotic region, cross-validated early stopping always decreases the generalization error. Our large-scale simulations done on a CM5 are in nice agreement with our analytical findings.