Action-Gap Phenomenon in Reinforcement Learning

Part of Advances in Neural Information Processing Systems 24 (NIPS 2011)

Bibtex »Metadata »Paper »SpotlightSlide »

Authors

Amir-massoud Farahmand

Abstract

<p>Many practitioners of reinforcement learning problems have observed that oftentimes the performance of the agent reaches very close to the optimal performance even though the estimated (action-)value function is still far from the optimal one. The goal of this paper is to explain and formalize this phenomenon by introducing the concept of the action-gap regularity. As a typical result, we prove that for an agent following the greedy policy (\hat{\pi}) with respect to an action-value function (\hat{Q}), the performance loss (E[V^<em>(X) - V^{\hat{X}} (X)]) is upper bounded by (O(|| \hat{Q} - Q^</em>||_\infty^{1+\zeta})), in which (\zeta &gt;= 0) is the parameter quantifying the action-gap regularity. For (\zeta &gt; 0), our results indicate smaller performance loss compared to what previous analyses had suggested. Finally, we show how this regularity affects the performance of the family of approximate value iteration algorithms.</p>