Part of Advances in Neural Information Processing Systems 13 (NIPS 2000)
Algebraic geometry is essential to learning theory. In hierarchical learning machines such as layered neural networks and Gaussian mixtures, asymptotic normality does not hold, since the Fisher information matrices are singular. In this paper, the rigorous asymptotic form of the stochastic complexity is clarified based on resolution of singularities, and two different problems are studied. (1) If the prior is positive, then the stochastic complexity is far smaller than BIC, resulting in a smaller generalization error than that of regular statistical models, even when the true distribution is not contained in the parametric model. (2) If Jeffreys' prior, which is coordinate free and equal to zero at singularities, is employed, then the stochastic complexity has the same form as BIC. It is useful for model selection, but not for generalization.