Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
The authors present a new form of normalization for deep networks called RMSNorm. This normalization acts like layer normalization but without mean centering. Because the method only requires a single pass of statistics calculations, the authors demonstrate improved training times for both machine translation and image caption retrieval while maintaining predictive accuracy. As commented by the reviewers, the paper is clearly written; the results are clearly presented and the experiments are quite thorough (different ML systems; ML architectures). In sum, the results and convincing (1 reviewer upgrade their score accordingly) and the results are use-able by those that build language models and potentially other forms of deep networks that require normalization schemes. For these reasons, assuming the authors revise the paper to address all reviewer comments, this paper is accepted to this conference.