This paper studies SGD dynamics for two-layer neural networks by approximating them with a mean-field diffusion in the large-width limit. In a nutshell, the authors bound the distance between the SGD iterates and the mean-field dynamics. Using this theory, the authors also study the effect of the step-size scaling. All reviewers found the paper interesting, in particular the regime-change result. Reviewer 2 had some concerns about the restrictiveness of the assumptions, which were mitigated by the authors’ response. I concur with the positive reviews and recommend that the paper be accepted. I do want to note that the reviewers raised very valid points in their feedback, including the request for an “extended discussion of how to interpret the learning rate in terms of alpha, beta, N, n” and the observation “that generally the bounds produced in this article will grow exponentially with depth” raised by Reviewer 1, as well as the comment “It is unclear what insight emerges from this new analysis that was not already in previous work” from Reviewer 3. The authors should address these concerns in their final manuscript.