Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
The paper develops a variational inference algorithm for modelling discrete state-action MDPs. The model can be used to capture the correlations inherent between the states of an MDP. To do so, it uses Polya-Gamma auxiliary variables, which have been proposed before. The contribution of the paper is the variational inference algorithm, which replaces the block-based Gibbs sampling used in the original model. The paper attacks a very important problem, follows a nice idea and is well executed.

We had 3 positive initial reviews. The reviewers appreciated the contribution of the paper, the generality of the approach and the experiments. However, all three reviews were given with rather low confidence. We therefore acquired a 4th (emergency) review, which was unfortunately not available for the rebuttal phase. The 4th review was very exhaustive but less positive. The main concerns of the reviewer were:
- Missing comparisons to more convincing baselines. In particular, a comparison to the Gibbs sampling approach of the original model is crucial, as this is the direct predecessor algorithm.
- More complex experiments are needed to evaluate the scalability of the approach.

I do agree with these points, but think they have been partially addressed in the rebuttal (the authors were not aware of R4). The authors reported results from a more complex problem domain (grid world with walls) and also explained that the Gibbs sampling approach does not contain a hyperparameter optimization procedure. I would still appreciate a comparison to it. However, given that the authors did not have a chance to reply to R4, I still tend to accept the paper, as it contains very good ideas and the Polya-Gamma model has been used for the first time for MDPs. The authors should nevertheless extend the experimental section with the results given in the rebuttal.