Part of Advances in Neural Information Processing Systems 6 (NIPS 1993)
David H. Wolpert
The conventional Bayesian justification of backprop is that it finds the MAP weight vector. As this paper shows, to find the MAP i-o function instead one must add a correction term to backprop. That term biases one towards i-o functions with small description lengths, and in particular favors (some kinds of) feature-selection, pruning, and weight-sharing.
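To see why the MAP weight vector and the MAP i-o function can differ, a toy one-dimensional sketch may help. It is not the paper's general correction term; it only applies the standard change-of-variables rule for densities. Everything in it is an assumption made for illustration: a single weight w, an i-o "function" reduced to the single output phi = tanh(w), a unit Gaussian prior on w, and a Gaussian likelihood with a hypothetical target y = 0.5 and noise sigma = 0.5. Under those assumptions, re-expressing the posterior as a density over phi adds the term -log|dphi/dw| to the log posterior, and that term moves the maximum.

```python
import math

def log_post_w(w, y=0.5, sigma=0.5):
    # Unnormalized log posterior density over the weight w:
    # unit Gaussian prior on w plus Gaussian likelihood on the output tanh(w).
    # (y and sigma are hypothetical values chosen for illustration.)
    return -0.5 * w * w - (y - math.tanh(w)) ** 2 / (2.0 * sigma ** 2)

def log_post_phi(w, y=0.5, sigma=0.5):
    # The same posterior expressed as a density over phi = tanh(w).
    # Change of variables: p_phi(phi(w)) = p_w(w) / |dphi/dw|,
    # with dphi/dw = 1 - tanh(w)^2, so the correction is -log|dphi/dw|.
    return log_post_w(w, y, sigma) - math.log(1.0 - math.tanh(w) ** 2)

# Brute-force search over a grid of weights (no gradient descent here).
grid = [i / 1000.0 for i in range(-5000, 5001)]
w_map = max(grid, key=log_post_w)        # weight at the MAP weight vector
w_phi_map = max(grid, key=log_post_phi)  # weight whose output is the MAP phi

print("MAP weight:", w_map, "output:", math.tanh(w_map))
print("weight of MAP output:", w_phi_map, "output:", math.tanh(w_phi_map))
```

In this toy setting the two maximizers land at clearly different weights, which is the point of the abstract: maximizing the posterior over weights is not the same as maximizing it over i-o functions unless the correction term is included.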