All four knowledgeable reviewers support acceptance. In particular, the reviewers found the analysis correct and the empirical evidence impressive. I therefore recommend acceptance. I also recommend that the authors cite the following earlier papers, which propose a related scaling:

Balduzzi, D., Frean, M., Leary, L., Lewis, J.P., Ma, K.W.D. and McWilliams, B., 2017. The shattered gradients problem: If resnets are the answer, then what is the question? arXiv preprint arXiv:1702.08591.

Gehring, J., Auli, M., Grangier, D., Yarats, D. and Dauphin, Y.N., 2017. Convolutional sequence to sequence learning. arXiv preprint arXiv:1705.03122.