NeurIPS 2020

Understanding Approximate Fisher Information for Fast Convergence of Natural Gradient Descent in Wide Neural Networks

Meta Review

This is a compelling paper that covers a lot of ground while keeping the presentation accessible and engaging for the reader. It analyzes the behavior of various approximate natural gradient methods in the infinite-width limit, including unitwise, quasi-diagonal, and the various forms of K-FAC (including the notorious block-tridiagonal one). Interestingly, it finds that the K-FAC approximations match the exact NGD trajectory in function space but not in weight space. The paper answers quite a lot of questions that are natural to ask, and (having worked a lot in this area) I found the answers interesting and novel. The reviewers seem to have checked it over pretty carefully and didn't spot any problems. The paper is well written, and the authors have clearly paid a lot of attention to the presentation of the ideas. The reviewers feel their concerns have been addressed well in the rebuttal. I recommend acceptance as a spotlight or oral.