Part of Advances in Neural Information Processing Systems 13 (NIPS 2000)
Christian Shelton
For many problems which would be natural for reinforcement learning, the reward signal is not a single scalar value but has multiple scalar components. Examples of such problems include agents with multiple goals and agents with multiple users. Creating a single reward value by combining the multiple components can throw away vital information and can lead to incorrect solutions. We describe the multiple reward source problem and discuss the problems with applying traditional reinforcement learning. We then present a new algorithm for finding a solution and results on simulated environments.