Part of Advances in Neural Information Processing Systems 10 (NIPS 1997)

*Tom Heskes*

Simple linear averaging of the outputs of several networks, as used e.g. in bagging [3], seems to follow naturally from a bias/variance decomposition of the sum-squared error. The sum-squared error of the averaged model is a quadratic function of the weighting factors assigned to the networks in the ensemble [7], suggesting a quadratic programming algorithm for finding the "optimal" weighting factors. If we interpret the output of a network as a probability statement, the sum-squared error corresponds to minus the loglikelihood or the Kullback-Leibler divergence, and linear averaging of the outputs corresponds to logarithmic averaging of the probability statements: the logarithmic opinion pool. The crux of this paper is that this whole story about model averaging, bias/variance decompositions, and quadratic programming to find the optimal weighting factors is not specific to the sum-squared error, but applies to the combination of probability statements of any kind in a logarithmic opinion pool, as long as the Kullback-Leibler divergence plays the role of the error measure. As examples we treat model averaging for classification models under a cross-entropy error measure and models for estimating variances.
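The two pooling rules contrasted in the abstract can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the ensemble outputs, the weighting factors `alpha`, and the three-class setup are all hypothetical, and the weights are fixed rather than found by the quadratic program the paper proposes.

```python
import numpy as np

# Hypothetical class-probability outputs of three networks
# (rows: networks, columns: classes); each row sums to 1.
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.5, 0.3, 0.2],
])

# Hypothetical weighting factors, nonnegative and summing to 1.
alpha = np.array([0.5, 0.3, 0.2])

# Linear opinion pool: weighted arithmetic mean of the outputs.
linear_pool = alpha @ probs

# Logarithmic opinion pool: averaging in the log domain, i.e. a
# weighted geometric mean of the probability statements, renormalized.
log_pool = np.exp(alpha @ np.log(probs))
log_pool /= log_pool.sum()
```

Both pools yield a valid probability vector; the logarithmic pool is the one for which the Kullback-Leibler divergence takes over the role the sum-squared error plays for linear averaging.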
