Policy Evaluation Using the Ω-Return

Part of Advances in Neural Information Processing Systems 28 (NIPS 2015)

Bibtex Metadata Paper Reviews Supplemental

Authors

Philip S. Thomas, Scott Niekum, Georgios Theocharous, George Konidaris

Abstract

We propose the Ω-return as an alternative to the λ-return currently used by the TD(λ) family of algorithms. The benefit of the Ω-return is that it accounts for the correlation of different length returns. Because it is difficult to compute exactly, we suggest one way of approximating the Ω-return. We provide empirical studies that suggest that it is superior to the λ-return and γ-return for a variety of problems.