Part of Advances in Neural Information Processing Systems 10 (NIPS 1997)
This paper describes some of the interactions of model learning algorithms and planning algorithms we have found in exploring model-based reinforcement learning. The paper focuses on how local trajectory optimizers can be used effectively with learned nonparametric models. We find that trajectory planners that are fully consistent with the learned model often have difficulty finding reasonable plans in the early stages of learning. Trajectory planners that balance obeying the learned model with minimizing cost (or maximizing reward) often do better, even if the plan is not fully consistent with the learned model.
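
The trade-off described above can be illustrated with a small sketch. It is not the planner used in the paper. It assumes a hypothetical scalar linear "learned model" x[t+1] = a*x[t] + b*u[t] in place of a nonparametric one, a quadratic task cost, and plain gradient descent as the optimizer. The names `plan`, `lam`, `a`, `b`, `r`, and `goal` are illustrative assumptions. States and controls are both treated as free variables, and the weight `lam` penalizes disagreement between the planned states and the model's predictions, so a small `lam` lets the plan deviate from the model in order to reduce cost, while a large `lam` pushes the plan toward full consistency.

```python
def plan(lam, T=5, a=0.9, b=0.5, x0=0.0, goal=1.0, r=0.1, iters=20000):
    """Optimize a trajectory with a soft model-consistency penalty.

    Minimizes  sum_t (x[t]-goal)^2 + r*u[t]^2  +  lam * sum_t d[t]^2,
    where d[t] = x[t+1] - (a*x[t] + b*u[t]) is the disagreement with the
    (hypothetical, assumed) learned model. Returns states, controls,
    task cost, and total squared model violation.
    """
    xs = [x0] * (T + 1)          # planned states; xs[0] is fixed at x0
    us = [0.0] * T               # planned controls
    # Conservative step size from an upper bound on the objective's curvature.
    step = 1.0 / (2.0 * max(1.0, r) + 2.0 * lam * (1.0 + abs(a) + abs(b)) ** 2)

    def defects():
        return [xs[t + 1] - (a * xs[t] + b * us[t]) for t in range(T)]

    for _ in range(iters):
        d = defects()
        gx = [0.0] * (T + 1)
        gu = [0.0] * T
        for t in range(T):
            # Task-cost gradient plus penalty gradient w.r.t. x[t+1].
            gx[t + 1] += 2.0 * (xs[t + 1] - goal) + 2.0 * lam * d[t]
            # Penalty gradient w.r.t. x[t] (through the model prediction).
            gx[t] += -2.0 * lam * a * d[t]
            # Control-cost gradient plus penalty gradient w.r.t. u[t].
            gu[t] = 2.0 * r * us[t] - 2.0 * lam * b * d[t]
        for t in range(1, T + 1):    # the initial state is not updated
            xs[t] -= step * gx[t]
        for t in range(T):
            us[t] -= step * gu[t]

    task_cost = sum((x - goal) ** 2 for x in xs[1:]) + r * sum(u * u for u in us)
    violation = sum(dt * dt for dt in defects())
    return xs, us, task_cost, violation


if __name__ == "__main__":
    for lam in (0.1, 50.0):
        _, _, cost, viol = plan(lam)
        print(f"lam={lam}: task cost={cost:.4f}, model violation={viol:.6f}")
```

In this toy setting, raising `lam` reduces the model violation at the price of a higher task cost. Sweeping `lam` is one simple way to explore the balance between obeying the model and minimizing cost.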