Asymptotics of Gradient-based Neural Network Training Algorithms

Mukherjee, Sayandev; Fine, Terrence L.

Advances in Neural Information Processing Systems 7 (NIPS 1994)

Abstract

We study the asymptotic properties of the sequence of iterates of weight-vector estimates obtained by training a multilayer feedforward neural network with a basic gradient-descent method using a fixed learning constant and no batch-processing. In the one-dimensional case, an exact analysis establishes the existence of a limiting distribution that is not Gaussian in general. For the general case and small learning constant, a linearization approximation permits the application of results from the theory of random matrices to again establish the existence of a limiting distribution. We study the first few moments of this distribution to compare and contrast the results of our analysis with those of techniques of stochastic approximation.
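The setting described above can be illustrated numerically. The following is a minimal sketch, not the authors' analysis: it assumes a single linear neuron with one scalar weight, standard Gaussian inputs, and additive Gaussian noise on the targets, none of which are specified in the abstract. The function names and the parameters `w_true`, `noise_std`, and `w0` are hypothetical. Each run applies the per-sample gradient update with a fixed learning constant `eta`, and the final weights of many independent runs are pooled to estimate the first two moments of the limiting distribution empirically.

```python
import random
import statistics


def online_gd_final_weight(eta, n_steps, rng, w_true=1.0, noise_std=0.5, w0=0.0):
    """Run per-sample (non-batch) gradient descent on squared error for a
    scalar linear model y = w * x, and return the final weight.

    Assumed setup (hypothetical, for illustration only): x ~ N(0, 1) and
    y = w_true * x + N(0, noise_std^2).
    """
    w = w0
    for _ in range(n_steps):
        x = rng.gauss(0.0, 1.0)
        y = w_true * x + rng.gauss(0.0, noise_std)
        # Gradient of 0.5 * (y - w * x)^2 with respect to w is -(y - w * x) * x,
        # so the descent step with fixed learning constant eta is:
        w += eta * (y - w * x) * x
    return w


def sample_final_weights(eta, n_runs=1000, n_steps=200, seed=0):
    """Collect the final weight from n_runs independent training runs."""
    rng = random.Random(seed)
    return [online_gd_final_weight(eta, n_steps, rng) for _ in range(n_runs)]


def first_two_moments(samples):
    """Return the empirical mean and (population) variance of the samples."""
    mean = statistics.fmean(samples)
    var = statistics.pvariance(samples, mu=mean)
    return mean, var


if __name__ == "__main__":
    for eta in (0.02, 0.1):
        mean, var = first_two_moments(sample_final_weights(eta))
        print(f"eta={eta}: mean={mean:.4f}, variance={var:.5f}")
```

Because the learning constant is fixed rather than decaying, the iterates in this toy model do not settle on a single value; they keep fluctuating around `w_true`, with a spread that shrinks as `eta` is reduced. A histogram of the samples could be compared against a Gaussian fit to inspect the shape of the empirical distribution.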
