NeurIPS 2020

Bayesian filtering unifies adaptive and non-adaptive neural network optimization methods

Meta Review

After a discussion with the reviewers, I have converged towards recommending acceptance of this submission. The reviewers raised the following points: 1) the perspective is novel and has interesting potential; 2) the approximations seem very strong; 3) the experiments are not conclusive.

Re 1: All reviewers agree that this is a strength of the paper and should be considered its main contribution. The authors agree (rebuttal, lines 23-25).

Re 2: R3 believes that questioning the approximations is a valid point. However, as the authors argue, they have provided sufficient empirical evidence for mini-batch Gaussianity in Appendix B, and Gaussianity is sometimes assumed without further justification in other Bayesian inference applications as well, simply to keep the computations tractable. Even if the assumptions are not fully realistic, they seem to be "less concerning than those in past work" (rebuttal, line 19). R3 appreciated that the authors were aware of these assumptions, were honest about the limitations of their approach, and have further ideas for how to improve on them in the future.

Re 3: This is the main weakness. R3 accepts that the main contribution of the paper is "understanding" from a different perspective. However, this does not remove the need to show that this understanding can be "transferred" into practical improvements. R3 shared their ideas on this with the authors in the updated review.

There is no consensus on whether leaving point 3 unaddressed outweighs the submission's "interestingness" to the community, i.e., the potential for others to build on these insights in the future. Further, all "reject" reviewers indicated low confidence in their assessments. Given the above, I have decided to recommend acceptance of this submission.