Part of Advances in Neural Information Processing Systems 15 (NIPS 2002)
Sumio Watanabe, Shun-ichi Amari
A lot of learning machines with hidden variables used in infor- mation science have singularities in their parameter spaces. At singularities, the Fisher information matrix becomes degenerate, resulting that the learning theory of regular statistical models does not hold. Recently, it was proven that, if the true parameter is contained in singularities, then the coe(cid:14)cient of the Bayes gen- eralization error is equal to the pole of the zeta function of the Kullback information. In this paper, under the condition that the true parameter is almost but not contained in singularities, we show two results. (1) If the dimension of the parameter from in- puts to hidden units is not larger than three, then there exits a region of true parameters where the generalization error is larger than those of regular models, however, if otherwise, then for any true parameter, the generalization error is smaller than those of regular models. (2) The symmetry of the generalization error and the training error does not hold in singular models in general.