Part of Advances in Neural Information Processing Systems 9 (NIPS 1996)
A new reinforcement learning architecture for nonlinear control is proposed. A direct feedback controller, or the actor, is trained by a value-gradient based controller, or the tutor. This architecture enables both efficient use of the value function and simple computation for real-time implementation. Good performance was verified in multi-dimensional nonlinear control tasks using Gaussian softmax networks.
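
The abstract does not spell out the form of the Gaussian softmax networks. As a rough illustration only, the sketch below assumes a commonly used form: Gaussian activations normalized so they sum to one (a softmax over negative squared distances), followed by a linear readout. The function names, the single shared `width`, and the example centers and weights are all hypothetical and are not taken from the paper.

```python
import math

def gaussian_softmax_basis(x, centers, width):
    """Normalized Gaussian activations (assumed form, not from the paper).

    Equivalent to a softmax over -||x - c_k||^2 / (2 * width^2).
    """
    logits = [
        -sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2.0 * width ** 2)
        for c in centers
    ]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def network_output(x, centers, width, weights):
    """Linear readout: weighted sum of the normalized basis activations."""
    basis = gaussian_softmax_basis(x, centers, width)
    return sum(w * b for w, b in zip(weights, basis))

# Hypothetical one-dimensional example: three centers, antisymmetric weights.
centers = [(-1.0,), (0.0,), (1.0,)]
weights = [-2.0, 0.0, 2.0]
basis = gaussian_softmax_basis((0.0,), centers, 0.5)
y = network_output((0.0,), centers, 0.5, weights)
```

Because the activations are normalized, they always sum to one, so the output is a convex combination of the weights; with the symmetric setup above, the output at the origin is zero.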