Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
The paper proves a new upper bound on the generalization error of algorithms trained by SGD, which is negatively correlated with the ratio of batch size to learning rate. The authors conducted experiments on a large number of models to verify the theoretical findings. The reviewers have mixed opinions on the paper. On one hand, the paper studies a problem important to the deep learning community, and the theoretical result is novel (e.g., the dependence on the ratio of batch size to learning rate), although discussion of its relation to previous PAC bounds is missing and some assumptions in the theory need more justification. On the other hand, the suggestions drawn from the experiments (e.g., always increase the learning rate) do not seem well supported and need further empirical verification. In particular, for more complex models whose local optima are hard to find and not very stable, the experimental observations could be quite different.
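The claimed relationship can be probed empirically. Below is a minimal, hypothetical sketch (not the paper's actual experimental setup) that trains plain SGD on synthetic logistic-regression data at two different batch-size/learning-rate ratios and reports the train-test accuracy gap for each; under the paper's claim, the larger ratio would tend to show a larger gap. All function names, data sizes, and hyperparameters here are illustrative assumptions.

```python
# Hypothetical toy experiment: compare the generalization gap of plain SGD
# at two batch-size/learning-rate ratios on synthetic logistic regression.
# The setup and hyperparameters are illustrative, not taken from the paper.
import numpy as np

def make_data(n, d, rng):
    # Linearly separable labels from a random ground-truth direction.
    X = rng.standard_normal((n, d))
    w_true = rng.standard_normal(d)
    y = (X @ w_true > 0).astype(float)
    return X, y

def sgd_train(X, y, lr, batch, epochs, rng):
    # Vanilla mini-batch SGD on the logistic loss.
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        idx = rng.permutation(n)
        for start in range(0, n, batch):
            b = idx[start:start + batch]
            p = 1.0 / (1.0 + np.exp(-X[b] @ w))      # predicted probabilities
            grad = X[b].T @ (p - y[b]) / len(b)       # mean logistic gradient
            w -= lr * grad
    return w

def accuracy(X, y, w):
    return float(((X @ w > 0) == (y > 0.5)).mean())

rng = np.random.default_rng(0)
Xtr, ytr = make_data(500, 20, rng)
Xte, yte = make_data(500, 20, rng)

# Two configurations whose batch-size/learning-rate ratios differ by 16x.
w_small = sgd_train(Xtr, ytr, lr=0.4, batch=8, epochs=20, rng=rng)    # ratio = 20
w_large = sgd_train(Xtr, ytr, lr=0.1, batch=32, epochs=20, rng=rng)   # ratio = 320

gap_small = accuracy(Xtr, ytr, w_small) - accuracy(Xte, yte, w_small)
gap_large = accuracy(Xtr, ytr, w_large) - accuracy(Xte, yte, w_large)
print(f"gap at ratio 20:  {gap_small:.3f}")
print(f"gap at ratio 320: {gap_large:.3f}")
```

On a single toy run like this the effect can be noisy, which mirrors the reviewers' concern: for more complex, less stable loss landscapes the observed trend need not match the bound, so many seeds and model families would be required to draw conclusions.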