NeurIPS 2020

What Did You Think Would Happen? Explaining Agent Behaviour through Intended Outcomes

Meta Review

This paper tackles the interesting problem of explaining agent behavior in RL, and does so in the form of future events. The theoretical claims, experiments, and writing are done well. However, the paper suffers from the following drawbacks:

1. Some assumptions are not stated clearly. For example, the assumption that there is a mismatch between the agent's model and the true environment is a major underlying assumption, yet it is not clarified upfront. It would also be good to show with some experiments that this assumption indeed holds in the real world.
2. Whether the resulting explanation is local or global is not clarified until much later in the paper.