Mark Plutowski, Shinichi Sakata, Halbert White
Integrated Mean Squared Error (IMSE) is a version of the usual mean squared error criterion, averaged over all possible training sets of a given size. If it could be observed, it could be used to determine optimal network complexity or optimal data subsets for efficient training. We show that two common methods of cross-validating average squared error deliver unbiased estimates of IMSE, converging to IMSE with probability one. These estimates thus make possible approximate IMSE-based choice of network complexity. We also show that two variants of the cross-validation measure provide unbiased IMSE-based estimates potentially useful for selecting optimal data subsets.
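As a rough illustration of the idea of averaging squared error over training sets, and of a cross-validation score tracking that average, the following minimal sketch uses a deliberately trivial stand-in for a network: a constant predictor equal to the mean of the training targets. The data model (targets drawn as zero-mean Gaussian noise with standard deviation `sigma`) and all function names are assumptions made for this example only; this is not the construction analyzed in the paper.

```python
import random


def fit_mean(ys):
    """Toy 'network' (an assumption for illustration): predict the training mean."""
    return sum(ys) / len(ys)


def expected_sq_error(ys, sigma):
    """Expected squared error on a fresh example after training on ys.

    Under the assumed data model (true mean 0, noise variance sigma**2),
    this equals sigma**2 + prediction**2 in closed form.
    """
    prediction = fit_mean(ys)
    return sigma ** 2 + prediction ** 2


def delete_one_cv(ys):
    """Average squared error on each held-out example after training on the rest."""
    n = len(ys)
    total = 0.0
    for i in range(n):
        rest = ys[:i] + ys[i + 1:]          # training set with example i deleted
        total += (ys[i] - fit_mean(rest)) ** 2
    return total / n


def compare(n, sigma, trials, seed=0):
    """Average both quantities over many random training sets.

    Returns (average expected squared error for training sets of size n - 1,
             average delete-one cross-validation score for sets of size n).
    """
    rng = random.Random(seed)
    avg_q = 0.0
    avg_cv = 0.0
    for _ in range(trials):
        small = [rng.gauss(0.0, sigma) for _ in range(n - 1)]
        full = [rng.gauss(0.0, sigma) for _ in range(n)]
        avg_q += expected_sq_error(small, sigma) / trials
        avg_cv += delete_one_cv(full) / trials
    return avg_q, avg_cv


if __name__ == "__main__":
    q, cv = compare(n=10, sigma=1.0, trials=2000)
    print(f"average expected squared error (size n-1): {q:.3f}")
    print(f"average delete-one CV score (size n):      {cv:.3f}")
```

For this toy model both averages approach sigma^2 (1 + 1/(n-1)) as the number of trials grows, since each delete-one fit is trained on n-1 examples. Subtracting the noise variance sigma^2 leaves the part of the error that depends on the training set, which is the quantity the abstract describes as averaged over training sets.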
1 Summary

To begin, assume we are given a fixed network architecture. (We dispense with this assumption later.) Let $z^N$ denote a given set of $N$ training examples. Let $Q_N(z^N)$ denote the expected squared error (the expectation taken over all possible examples) of the network after being trained on $z^N$. This measures the quality of fit afforded by training on a given set of $N$ examples. Let $\mathrm{IMSE}_N$ denote the Integrated Mean Squared Error for training sets of size $N$. Given reasonable assumptions, it is straightforward to show that $\mathrm{IMSE}_N = E[Q_N(Z^N)] - \sigma^2$, where the expectation is now over all training sets of size $N$, $Z^N$ is a random training set of size $N$, and $\sigma^2$ is the noise variance. Let $C_N = C_N(z^N)$ denote the "delete-one cross-validation" squared error measure for a network trained on $z^N$. $C_N$ is obtained by training networks on each of the $N$ training sets of size $N-1$ obtained by deleting a single example; the measure follows