Part of Advances in Neural Information Processing Systems 4 (NIPS 1991)
Christian Darken, John Moody
Stochastic gradient descent is a general algorithm which includes LMS, on-line backpropagation, and adaptive k-means clustering as special cases. The standard choices of the learning rate η (both adaptive and fixed functions of time) often perform quite poorly. In contrast, our recently proposed class of "search then converge" learning rate schedules (Darken and Moody, 1990) display the theoretically optimal asymptotic convergence rate and a superior ability to escape from poor local minima. However, the user is responsible for setting a key parameter. We propose here a new methodology for creating the first completely automatic adaptive learning rates which achieve the optimal rate of convergence.
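As an illustration of what a "search then converge" schedule can look like, the sketch below assumes the simple form η(t) = η0 / (1 + t/τ), which stays near η0 while t is much smaller than τ and decays roughly as 1/t afterwards. This form, the names `eta0` and `tau`, and the toy mean-estimation problem are assumptions made for illustration; they are not stated in this abstract, and the sketch does not implement the automatic adaptive method proposed here.

```python
def search_then_converge(t, eta0=0.1, tau=100.0):
    """Assumed schedule: about eta0 for t << tau, about eta0 * tau / t for t >> tau."""
    return eta0 / (1.0 + t / tau)


def sgd_mean(samples, eta0=1.0, tau=1.0):
    """Toy stochastic gradient descent on the loss 0.5 * (w - x)^2 per sample x.

    The minimizer is the mean of the samples. With eta0 = tau = 1 the
    schedule reduces to 1 / (t + 1), and the update becomes a running average.
    """
    w = 0.0
    for t, x in enumerate(samples):
        eta = search_then_converge(t, eta0, tau)
        w += eta * (x - w)  # step against the gradient (w - x)
    return w


if __name__ == "__main__":
    print(search_then_converge(0))    # learning rate at the start of the search phase
    print(search_then_converge(100))  # learning rate at t = tau
    print(sgd_mean([1.0, 2.0, 3.0, 4.0]))
```

In this assumed form, `tau` is the parameter the user must pick by hand: too small and the rate decays before the search phase has done its work, too large and convergence is delayed.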