TD(0) Leads to Better Policies than Approximate Value Iteration

Benjamin V. Roy

Advances in Neural Information Processing Systems 18 (NIPS 2005)

Abstract

We consider approximate value iteration with a parameterized approximator in which the state space is partitioned and the optimal cost-to-go function over each partition is approximated by a constant. We establish performance loss bounds for policies derived from approximations associated with fixed points. These bounds identify benefits to having projection weights equal to the invariant distribution of the resulting policy. Such projection weighting leads to the same fixed points as TD(0). Our analysis also leads to the first performance loss bound for approximate value iteration with an average cost objective.
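
As a rough illustration of the setting described above, the following sketch runs approximate value iteration with a partitioned state space and a weighted projection onto piecewise-constant functions. It is a minimal sketch under stated assumptions, not the paper's algorithm or experiments: it assumes a discounted-cost objective with discount factor `alpha`, and the names `project`, `bellman`, `approximate_value_iteration`, and the four-state example problem are all hypothetical. The projection weights are supplied by the caller and held fixed; the coupling in which the weights equal the invariant distribution of the resulting policy is not implemented here.

```python
def project(J, partition, weights, num_parts):
    """Weighted projection onto piecewise-constant functions.

    Each partition's value is the weighted average of J over its states.
    Assumes every partition carries positive total weight.
    """
    num = [0.0] * num_parts
    den = [0.0] * num_parts
    for s, k in enumerate(partition):
        num[k] += weights[s] * J[s]
        den[k] += weights[s]
    return [num[k] / den[k] for k in range(num_parts)]


def bellman(J, P, g, alpha):
    """Discounted-cost Bellman operator (assumed objective):
    (TJ)(s) = min_a [ g[a][s] + alpha * sum_t P[a][s][t] * J[t] ].
    """
    n = len(J)
    return [
        min(
            g[a][s] + alpha * sum(P[a][s][t] * J[t] for t in range(n))
            for a in range(len(P))
        )
        for s in range(n)
    ]


def approximate_value_iteration(P, g, alpha, partition, weights, num_parts, iters=500):
    """Iterate r <- project(T(lift(r))) starting from r = 0."""
    r = [0.0] * num_parts
    for _ in range(iters):
        J = [r[k] for k in partition]  # lift partition values back to states
        r = project(bellman(J, P, g, alpha), partition, weights, num_parts)
    return r


# Hypothetical 4-state, 2-action problem; each row of P[a] sums to 1.
P = [
    [[0.9, 0.1, 0.0, 0.0], [0.1, 0.8, 0.1, 0.0],
     [0.0, 0.1, 0.8, 0.1], [0.0, 0.0, 0.1, 0.9]],
    [[0.5, 0.5, 0.0, 0.0], [0.0, 0.5, 0.5, 0.0],
     [0.0, 0.0, 0.5, 0.5], [0.5, 0.0, 0.0, 0.5]],
]
g = [[1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 2.0, 2.0]]  # g[a][s]: cost of action a in state s
partition = [0, 0, 1, 1]            # states 0,1 form partition 0; states 2,3 form partition 1
weights = [0.25, 0.25, 0.25, 0.25]  # fixed, uniform projection weights (an assumption)
alpha = 0.9

r = approximate_value_iteration(P, g, alpha, partition, weights, 2)
print(r)
```

With state aggregation the weighted projection is a weighted average, so it does not expand the maximum norm, and the composed iteration contracts at rate `alpha`. The returned `r` is therefore, up to numerical error, a fixed point of the projected Bellman operator for the chosen weights.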
