Jonathan Li, Andrew Barron
Gaussian mixtures (or so-called radial basis function networks) for density estimation provide a natural counterpart to sigmoidal neural networks for function fitting and approximation. In both cases, it is possible to give simple expressions for the iterative improvement of performance as components of the network are introduced one at a time. In particular, for mixture density estimation we show that a k-component mixture estimated by maximum likelihood (or by an iterative likelihood improvement that we introduce) achieves log-likelihood within order 1/k of the log-likelihood achievable by any convex combination. Consequences for approximation and estimation using Kullback-Leibler risk are also given. A Minimum Description Length principle selects the optimal number of components k that minimizes the risk bound.
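The idea of introducing components one at a time can be illustrated with a minimal sketch. It assumes a greedy update of the form f_k = (1 - a) f_{k-1} + a phi(.; mu, sigma), with one-dimensional Gaussian components of a fixed, known width sigma. At each step the weight a and the new mean mu are chosen by a coarse grid search, with candidate means restricted to the data points, to maximize the log-likelihood. The helper names `greedy_mixture` and `select_k` are hypothetical. The penalty in `select_k` is a BIC-style two-part-code stand-in for the MDL criterion, not the exact criterion or procedure of the paper.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def safe_log(v):
    # Floor guards against log(0) when a point is far from every component.
    return math.log(max(v, 1e-300))


def greedy_mixture(data, k, sigma=1.0, alphas=None):
    """Hypothetical helper: build a k-component mixture one component at a time.

    Step j forms f_j = (1 - a) * f_{j-1} + a * phi(.; mu, sigma). It picks a
    from `alphas` and mu from the data points by grid search so as to maximize
    the log-likelihood. Returns (weights, means, log-likelihood after each step).
    """
    if alphas is None:
        alphas = [i / 20 for i in range(1, 20)]  # grid of a values in (0, 1)
    centers = sorted(set(data))  # assumed candidate means: the data points
    weights, means, history = [], [], []
    cur = None  # current mixture density evaluated at each data point
    for _ in range(k):
        best_ll, best_a, best_mu, best_vals = -math.inf, None, None, None
        for mu in centers:
            phi = [normal_pdf(x, mu, sigma) for x in data]
            # The first component takes all the weight (a = 1).
            for a in ([1.0] if cur is None else alphas):
                if cur is None:
                    vals = phi
                else:
                    vals = [(1 - a) * c + a * p for c, p in zip(cur, phi)]
                ll = sum(safe_log(v) for v in vals)
                if ll > best_ll:
                    best_ll, best_a, best_mu, best_vals = ll, a, mu, vals
        # Shrink the old weights by (1 - a) and append the new weight a.
        weights = [(1 - best_a) * w for w in weights] + [best_a]
        means.append(best_mu)
        cur = best_vals
        history.append(best_ll)
    return weights, means, history


def select_k(data, k_max, sigma=1.0):
    """Hypothetical helper: choose k by a BIC-style two-part-code penalty
    (an assumed stand-in for the MDL criterion)."""
    n = len(data)
    # The greedy steps are nested, so one run to k_max gives every k <= k_max.
    _, _, history = greedy_mixture(data, k_max, sigma)
    # With sigma fixed, k components have k means and k - 1 free weights.
    scores = [-ll + 0.5 * (2 * k - 1) * math.log(n)
              for k, ll in enumerate(history, start=1)]
    return scores.index(min(scores)) + 1


data = [-3 + 0.1 * i for i in range(-5, 6)] + [3 + 0.1 * i for i in range(-5, 6)]
weights, means, history = greedy_mixture(data, 3)
print(weights, means, history)
print(select_k(data, 4))
```

Because each step only searches over one new component and one mixing weight, the cost per step stays small. The 1/k rate quoted above concerns the likelihood gap to the best convex combination, which this toy grid search only approximates.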