Sun, Dec 8th through Sat, Dec 14th, 2019 at Vancouver Convention Center
This paper presents an alternative training regime for the BERT contextual embedding model that incorporates additional conditioning contexts, such as left-to-right language modelling and sequence transduction. The reviewers agree that the work is well motivated and is a reasonable attempt to address some of the issues with the original BERT model. The results are suitably strong, and as such this paper is likely to be of interest to those working on contextual embedding models. It is puzzling, however, that a classic language modelling perplexity evaluation was not included, given that left-to-right language modelling is one of the objectives the model optimises. The authors' final version of the paper should incorporate their answers to the questions raised by the reviewers.