The Steering Approach for Multi-Criteria Reinforcement Learning

Part of Advances in Neural Information Processing Systems 14 (NIPS 2001)

Bibtex Metadata Paper


Shie Mannor, Nahum Shimkin


We consider the problem of learning to attain multiple goals in a dynamic envi- ronment, which is initially unknown. In addition, the environment may contain arbitrarily varying elements related to actions of other agents or to non-stationary moves of Nature. This problem is modelled as a stochastic (Markov) game between the learning agent and an arbitrary player, with a vector-valued reward function. The objective of the learning agent is to have its long-term average reward vector belong to a given target set. We devise an algorithm for achieving this task, which is based on the theory of approachability for stochastic games. This algorithm com- bines, in an appropriate way, a flnite set of standard, scalar-reward learning algo- rithms. Su–cient conditions are given for the convergence of the learning algorithm to a general target set. The specialization of these results to the single-controller Markov decision problem are discussed as well.