TD(0) Leads to Better Policies than Approximate Value Iteration

Part of Advances in Neural Information Processing Systems 18 (NIPS 2005)

Authors

Benjamin Van Roy

Abstract

Abstract Unavailable