On the Use of Evidence in Neural Networks

Part of Advances in Neural Information Processing Systems 5 (NIPS 1992)



David Wolpert


The Bayesian "evidence" approximation has recently been employed to determine the noise and weight-penalty terms used in back-propagation. This paper shows that for neural nets it is far easier to use the exact result than it is to use the evidence approximation. Moreover, unlike the evi(cid:173) dence approximation, the exact result neither has to be re-calculated for every new data set, nor requires the running of computer code (the exact result is closed form). In addition, it turns out that the evidence proce(cid:173) dure's MAP estimate for neural nets is, in toto, approximation error. An(cid:173) other advantage of the exact analysis is that it does not lead one to incor(cid:173) rect intuition, like the claim that using evidence one can "evaluate differ(cid:173) ent priors in light of the data". This paper also discusses sufficiency conditions for the evidence approximation to hold, why it can sometimes give "reasonable" results, etc.


It has recently become popular to consider the problem of training neural nets from a Bayesian viewpoint (Buntine and Weigend 1991, MacKay 1992). The usual way of doing this starts by assuming that there is some underlying target function f from R^n to R, parameterized by an N-dimensional weight vector w. We are provided with a training set L of noise-corrupted samples of f. Our goal is to make a guess for w, basing that guess only on L. Now assume we have i.i.d. additive Gaussian noise, resulting in P(L | w, β) ∝ exp(−β χ²(w, L)), where χ²(w, L) is the usual sum-squared training-set error and β reflects the noise level. Assume further that P(w | α) ∝ exp(−α W(w)), where W(w) is the sum of the squares of the weights. If the values of α and β are known and fixed, to the values α_t and β_t respectively, then Bayes' rule gives P(w | L, α_t, β_t) ∝ exp(−β_t χ²(w, L) − α_t W(w)).
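For concreteness, the following is a minimal Python sketch (not from the paper) of the objective these assumptions imply: with the hyperparameters fixed at α_t and β_t, the MAP weights minimize β_t χ²(w, L) + α_t W(w), i.e. back-propagation's sum-squared error plus a weight-decay penalty. The forward-pass function `predict` and the toy linear example are hypothetical placeholders.

```python
import numpy as np

def neg_log_posterior(w, predict, inputs, targets, alpha_t, beta_t):
    """beta_t * chi^2(w, L) + alpha_t * W(w), up to an additive constant.

    `predict(w, inputs)` stands in for the network's forward pass.
    """
    residuals = predict(w, inputs) - targets   # noise-corrupted samples of f
    chi2 = np.sum(residuals ** 2)              # chi^2(w, L): sum-squared error
    W = np.sum(w ** 2)                         # W(w): sum of squared weights
    return beta_t * chi2 + alpha_t * W

# Toy usage with a linear "net" f(x) = w . x
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                   # inputs in R^n with n = 3
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=50)     # additive Gaussian noise
w0 = rng.normal(size=3)
print(neg_log_posterior(w0, lambda w, x: x @ w, X, y, alpha_t=0.01, beta_t=50.0))
```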