Q-Learning with Hidden-Unit Restarting

Part of Advances in Neural Information Processing Systems 5 (NIPS 1992)

Bibtex Metadata Paper


Charles Anderson


Platt's resource-allocation network (RAN) (Platt, 1991a, 1991b) is modified for a reinforcement-learning paradigm and to "restart" existing hidden units rather than adding new units. After restart(cid:173) ing, units continue to learn via back-propagation. The resulting restart algorithm is tested in a Q-Iearning network that learns to solve an inverted pendulum problem. Solutions are found faster on average with the restart algorithm than without it.