Part of Advances in Neural Information Processing Systems 9 (NIPS 1996)
The parameter space of neural networks has a Riemannian metric structure. The natural Riemannian gradient should be used instead of the conventional gradient, since the former gives the true steepest descent direction of a loss function in the Riemannian space. The stochastic gradient learning algorithm behaves much more effectively when the natural gradient is used. The present paper studies the information-geometrical structure of perceptrons and other networks, and proves that the on-line learning method based on the natural gradient is asymptotically as efficient as the optimal batch algorithm. Adaptive modification of the learning constant is proposed, analyzed in terms of the Riemannian measure, and shown to be efficient. Finally, the natural gradient is applied to blind separation of mixed independent signal sources.
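As a rough illustration of the idea, the natural gradient update can be written as theta <- theta - eta * G^{-1} * grad L(theta), where G is the Riemannian metric on the parameter space. The sketch below is not taken from the paper: it assumes a linear regression model with unit-variance Gaussian noise, for which the Fisher information matrix (used here as the metric G) is the second-moment matrix of the inputs. It uses a single batch step rather than the on-line setting studied in the paper. The function name `natural_gradient_step` and all data are hypothetical.

```python
import numpy as np

def natural_gradient_step(theta, grad, fisher, lr):
    """One update theta <- theta - lr * G^{-1} grad.

    Solves the linear system G d = grad instead of forming G^{-1} explicitly.
    """
    return theta - lr * np.linalg.solve(fisher, grad)

# Synthetic, badly scaled regression data (illustrative assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([1.0, 10.0, 0.1])
true_theta = np.array([1.0, -2.0, 0.5])
y = X @ true_theta + 0.1 * rng.normal(size=200)

theta0 = np.zeros(3)
n = len(y)

# Conventional gradient of the loss (1 / 2n) * ||X theta - y||^2 at theta0.
grad = X.T @ (X @ theta0 - y) / n

# Empirical Fisher information for a unit-variance Gaussian noise model.
fisher = X.T @ X / n

# With lr = 1 the natural gradient step lands on the least-squares
# solution in this model, regardless of how the inputs are scaled.
theta_nat = natural_gradient_step(theta0, grad, fisher, lr=1.0)
theta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# A conventional gradient step of the same size, for comparison.
theta_plain = theta0 - 1.0 * grad
```

In this toy setting the conventional step is dominated by the large-scale input coordinate, while the natural gradient step is unaffected by the rescaling, which is the sense in which it follows the steepest descent direction under the metric G.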