NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
The paper introduces a new complexity measure for MDPs, the expected hitting costs. In contrast to earlier complexity measures, the hitting costs also depend on the rewards of the MDP and can provide a tighter bound for UCRL2. The theory also provides an interesting connection between reward shaping and the complexity of an MDP. All reviewers appreciated the strong theoretical contribution of the paper, which improves our theoretical understanding of the complexity of MDPs. The reviewers also liked that the paper is well written and establishes connections to reward shaping, a method that is also of high practical value. All reviewers recommend acceptance, and I agree with their assessment.