Asymptotics of Gradient-based Neural Network Training Algorithms

Part of Advances in Neural Information Processing Systems 7 (NIPS 1994)


Sayandev Mukherjee, Terrence L. Fine


We study the asymptotic properties of the sequence of iterates of weight-vector estimates obtained by training a multilayer feedforward neural network with a basic gradient-descent method using a fixed learning constant and no batch-processing. In the one-dimensional case, an exact analysis establishes the existence of a limiting distribution that is not Gaussian in general. For the general case and small learning constant, a linearization approximation permits the application of results from the theory of random matrices to again establish the existence of a limiting distribution. We study the first few moments of this distribution to compare and contrast the results of our analysis with those of techniques of stochastic approximation.
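
As a rough illustration of the setting described above, the following sketch runs per-sample gradient descent with a fixed learning constant on a one-dimensional problem and summarizes the spread of the later iterates. It is not the authors' model or analysis: the scalar linear model, the squared-error loss, the Gaussian inputs and noise, and every name here (`online_sgd_1d`, `tail_mean_and_variance`, `eta`, `w_true`, `noise_std`) are assumptions chosen only to keep the example small.

```python
import random


def online_sgd_1d(eta, n_steps, seed=0, w0=0.0, w_true=1.0, noise_std=0.5):
    """Per-sample gradient descent on 0.5 * (w*x - y)**2 with a fixed step eta.

    Assumed toy model (not from the paper): y = w_true * x + Gaussian noise,
    with x drawn from a standard normal distribution.
    """
    rng = random.Random(seed)
    w = w0
    iterates = []
    for _ in range(n_steps):
        x = rng.gauss(0.0, 1.0)
        y = w_true * x + rng.gauss(0.0, noise_std)
        grad = (w * x - y) * x  # derivative of the per-sample loss w.r.t. w
        w -= eta * grad         # fixed learning constant, one sample per update
        iterates.append(w)
    return iterates


def tail_mean_and_variance(iterates, burn_in):
    """Empirical mean and variance of the iterates after discarding burn_in steps."""
    tail = iterates[burn_in:]
    mean = sum(tail) / len(tail)
    var = sum((w - mean) ** 2 for w in tail) / len(tail)
    return mean, var
```

In this toy model each update has the form w_new = (1 - eta * x**2) * w + eta * x * y, a random affine map applied repeatedly, which is the scalar analogue of a product of random matrices. Calling `tail_mean_and_variance(online_sgd_1d(0.1, 20000), 2000)` and repeating with `eta=0.01` gives a quick way to compare how the fluctuation of the iterates depends on the learning constant.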