Sample Size Requirements for Feedforward Neural Networks

Part of Advances in Neural Information Processing Systems 7 (NIPS 1994)



Michael Turmon, Terrence L. Fine


We estimate the number of training samples required to ensure that the performance of a neural network on its training data matches that obtained when fresh data is applied to the network. Existing estimates are higher by orders of magnitude than practice indicates. This work seeks to narrow the gap between theory and practice by transforming the problem into determining the distribution of the supremum of a random field in the space of weight vectors, which in turn is attacked by application of a recent technique called the Poisson clumping heuristic.
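As a rough illustration of the quantity at stake, the sketch below estimates by Monte Carlo the largest gap, over a family of classifiers, between error on the training data and error on fresh data, and shows how it changes with the sample size n. It is not the method of the paper (no Poisson clumping is involved). Everything in it is an assumption chosen for simplicity: a one-parameter threshold classifier stands in for the network, inputs are uniform on [0, 1], labels are flipped with probability 0.1, and the supremum over weights is approximated on a finite grid, so it can only underestimate the true supremum.

```python
import random

NOISE = 0.1  # assumed label-flip probability (illustrative choice)


def net(x, w):
    """Toy one-weight classifier with output in {0, 1}: 1 if x >= w."""
    return 1 if x >= w else 0


def true_error(w):
    """Exact error of threshold w in [0, 1] under the assumed data law.

    The classifier disagrees with the noiseless label on a set of
    probability |w - 0.5|, and labels are flipped with probability NOISE.
    """
    return NOISE + (1 - 2 * NOISE) * abs(w - 0.5)


def draw_sample(n, rng):
    """Draw n i.i.d. pairs (x, y): x ~ U(0, 1), y = 1{x >= 0.5}, then noise."""
    sample = []
    for _ in range(n):
        x = rng.random()
        y = 1 if x >= 0.5 else 0
        if rng.random() < NOISE:
            y = 1 - y
        sample.append((x, y))
    return sample


def empirical_error(w, sample):
    """Mean-squared error on the sample (equals the misclassification rate)."""
    return sum((net(x, w) - y) ** 2 for x, y in sample) / len(sample)


def sup_gap(sample, grid):
    """Largest |empirical error - true error| over the weight grid."""
    return max(abs(empirical_error(w, sample) - true_error(w)) for w in grid)


def mean_sup_gap(n, trials=20, seed=0):
    """Average the grid supremum of the gap over independent training sets."""
    rng = random.Random(seed)
    grid = [i / 50 for i in range(51)]  # 51 thresholds in [0, 1]
    total = sum(sup_gap(draw_sample(n, rng), grid) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    for n in (50, 200, 800):
        print(n, round(mean_sup_gap(n), 3))
```

Running the script prints the average worst-case gap for each n. In this toy setting the gap shrinks as n grows, which is the behavior that sample size estimates aim to quantify.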



We investigate the tradeoffs among network complexity, training set size, and statistical performance of feedforward neural networks so as to allow a reasoned choice of network architecture in the face of limited training data. Nets are functions $\eta(x; w)$, parameterized by their weight vector $w \in \mathcal{W} \subseteq \mathbb{R}^d$, which take as input points $x \in \mathbb{R}^k$. For classifiers, network output is restricted to $\{0, 1\}$, while for forecasting it may be any real number. The architecture of all nets under consideration is $\mathcal{N}$, whose complexity may be gauged by its Vapnik-Chervonenkis (VC) dimension $v$, the size of the largest set of inputs the architecture can classify in any desired way ('shatter'). Nets $\eta \in \mathcal{N}$ are chosen on the basis of a training set $T = \{(x_i, y_i)\}_{i=1}^{n}$. These $n$ samples are i.i.d. according to an unknown probability law $P$. Performance of a network is measured by the mean-squared error