Balancing Multiple Sources of Reward in Reinforcement Learning

Christian R. Shelton

Advances in Neural Information Processing Systems 13 (NIPS 2000)

For many problems which would be natural for reinforcement learning, the reward signal is not a single scalar value but has multiple scalar components. Examples of such problems include agents with multiple goals and agents with multiple users. Creating a single reward value by combining the multiple components can throw away vital information and can lead to incorrect solutions. We describe the multiple reward source problem and discuss the problems with applying traditional reinforcement learning. We then present a new algorithm for finding a solution and results on simulated environments.