Part of Advances in Neural Information Processing Systems 2 (NIPS 1989)
Subutai Ahmad, Gerald Tesauro, Yu He
Yu He, Dept. of Physics, Ohio State Univ., Columbus, OH 43212
We have calculated, both analytically and in simulations, the rate of convergence at long times in the backpropagation learning algorithm for networks with and without hidden units. Our basic finding for units using the standard sigmoid transfer function is 1/t convergence of the error for large t, with at most logarithmic corrections for networks with hidden units. Other transfer functions may lead to a slower polynomial rate of convergence. Our analytic calculations were presented in (Tesauro, He & Ahmad, 1989). Here we focus in more detail on our empirical measurements of the convergence rate in numerical simulations, which confirm our analytic results.
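As an illustration of where 1/t comes from, the sketch below uses the simplest case we could construct ourselves: a single sigmoid unit with no hidden units, one input fixed at 1, and a target of exactly 1. This toy setup, the squared-error loss, and the function name `train_single_unit` are our own assumptions and are not taken from the paper's simulations. Because the target can only be reached as the weight goes to infinity, for large w we have 1 - y ≈ e^{-w}, so in continuous time dw/dt ≈ e^{-2w}. This gives w ≈ ½ ln(2t) and E = (1 - y)²/2 ≈ 1/(4t), where t = η·(number of steps). The code runs plain gradient descent and estimates the log-log slope of E against time.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_single_unit(eta: float, n_steps: int, record_at: list[int]) -> dict[int, float]:
    """Gradient descent for one sigmoid unit (hypothetical toy setup).

    Input is fixed at 1 and the target is 1, so y = sigmoid(w) and
    E = (1 - y)^2 / 2. Returns the error recorded at the requested steps.
    """
    w = 0.0
    wanted = set(record_at)
    errors: dict[int, float] = {}
    for step in range(1, n_steps + 1):
        y = sigmoid(w)
        one_minus_y = sigmoid(-w)  # 1 - y, computed without cancellation
        # dE/dw = -(1 - y) * y * (1 - y), so descent adds eta * y * (1 - y)^2
        w += eta * y * one_minus_y ** 2
        if step in wanted:
            errors[step] = 0.5 * sigmoid(-w) ** 2
    return errors


if __name__ == "__main__":
    eta = 0.5
    errs = train_single_unit(eta, 100_000, [10_000, 100_000])
    # Log-log slope of E versus time; 1/t convergence means a slope near -1
    slope = math.log(errs[100_000] / errs[10_000]) / math.log(10.0)
    print(f"log-log slope: {slope:.3f}")
    # t * E should approach 1/4 when t = eta * steps
    print(f"t * E at the last step: {eta * 100_000 * errs[100_000]:.3f}")
```

With η = 0.5, the printed slope should come out close to -1 and t·E close to 0.25. The small residual deviation comes from the O(√t) correction to the leading 1/(4t) behavior. In this toy case the weight grows only logarithmically, which is why convergence is polynomial rather than exponential.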