The Effective Number of Parameters: An Analysis of Generalization and Regularization in Nonlinear Learning Systems

Part of Advances in Neural Information Processing Systems 4 (NIPS 1991)


Authors

John Moody

Abstract

We present an analysis of how the generalization performance (expected test set error) relates to the expected training set error for nonlinear learning systems, such as multilayer perceptrons and radial basis functions. The principal result is the following relationship (computed to second order) between the expected test set and training set errors:

⟨E_test(λ)⟩ = ⟨E_train(λ)⟩ + 2 σ²_eff p_eff(λ)/n .    (1)

Here, n is the size of the training sample ξ, σ²_eff is the effective noise variance in the response variable(s), λ is a regularization or weight decay parameter, and p_eff(λ) is the effective number of parameters in the nonlinear model. The expectations ⟨ ⟩ of training set and test set errors are taken over possible training sets ξ and over training and test sets ξ and ξ', respectively. The effective number of parameters p_eff(λ) usually differs from the true number of model parameters p for nonlinear or regularized models; this theoretical conclusion is supported by Monte Carlo experiments. In addition to the surprising result that p_eff(λ) ≠ p, we propose an estimate of (1), called the generalized prediction error (GPE), which generalizes well-established estimates of prediction risk such as Akaike's FPE and AIC, Mallows' C_p, and Barron's PSE to the nonlinear setting.¹

¹ GPE and p_eff(λ) were previously introduced in Moody (1991).
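The general definition of p_eff(λ) is left to the body of the paper, but for the simplest regularized model the analysis covers, a linear model fit with a quadratic weight-decay penalty λ‖w‖², p_eff(λ) reduces to the trace of the smoother ("hat") matrix, and the GPE estimate in (1) is just the training error plus 2 σ²_eff p_eff(λ)/n. The sketch below illustrates only that special case; the function names and the residual-based plug-in estimate of σ²_eff are our own illustrative choices, not notation or procedures from the paper.

```python
import numpy as np

def p_eff(X, lam):
    """Effective number of parameters for a linear model with weight decay lam:
    trace of the hat matrix H = X (X'X + lam*I)^{-1} X',
    i.e. sum_i d_i^2 / (d_i^2 + lam) over the singular values d_i of X."""
    d = np.linalg.svd(X, compute_uv=False)
    return float(np.sum(d**2 / (d**2 + lam)))

def gpe_estimate(X, y, lam, sigma2_eff=None):
    """GPE = training MSE + 2 * sigma2_eff * p_eff(lam) / n, as in Eq. (1)."""
    n, p = X.shape
    w = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)  # ridge weights
    resid = y - X @ w
    train_mse = float(np.mean(resid**2))
    peff = p_eff(X, lam)
    if sigma2_eff is None:
        # crude residual-based noise-variance estimate (an illustrative choice)
        sigma2_eff = float(resid @ resid) / max(n - peff, 1.0)
    return train_mse + 2.0 * sigma2_eff * peff / n

# Tiny usage example on synthetic data: p_eff shrinks below p as lam grows.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X @ rng.normal(size=10) + 0.5 * rng.normal(size=100)
for lam in (0.0, 1.0, 10.0):
    print(lam, p_eff(X, lam), gpe_estimate(X, y, lam))
```

With lam = 0 the sketch recovers p_eff = p (the true parameter count); increasing the weight decay drives p_eff below p, which is the linear analogue of the paper's point that regularized models use fewer effective parameters than they nominally have.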