This paper studies the early training phase of a two-layer neural network and shows that the dynamics remain close to those of a linear model, provided the input data is well behaved. Initially, the paper received mixed reviews (marginally below, top 50%, accept, accept). On the positive side, R2 finds the results novel and quite interesting and the experiments nice; R3 finds the paper interesting and well written, with good experiments; R4 finds that the paper presents new theoretical and experimental insights, that the analysis is intuitive and insightful, and that the width requirement is reasonable. On the negative side, R2 finds the results incomplete (limited to one hidden layer) and the experiments insufficient, while R3 and R4 find the data distribution assumption somewhat restrictive. The rebuttal argues that many interesting deep learning theory papers focus on two-layer networks and that this case is already challenging. I also find that the rebuttal successfully responds to all critiques from R1. Post rebuttal, the negative review was upgraded to accept, and all reviewers now recommend acceptance.