Sun, Dec 8th through Sat, Dec 14th, 2019, at the Vancouver Convention Center
The paper derives generalization bounds for overparametrized deep residual networks trained by gradient descent from random initialization. All reviewers appreciate the importance of the paper's topic. However, R1 and R3 feel that the contribution is too close to prior art, including a cited prior work and another NeurIPS submission. On the other hand, R2 thinks that the contributions relative to prior art are meaningful and vouches for acceptance. The rebuttal successfully articulates the differences: the cited prior work focuses on optimization with squared loss, while this submission focuses on generalization with cross-entropy loss. The Wide and Deep paper focuses on optimization and generalization for fully connected networks trained by SGD, while this paper focuses on residual networks trained with gradient descent. This AC sides with R2's assessment that there are enough differences relative to prior art to justify acceptance.