Sun Dec 8 through Sat Dec 14, 2019, at the Vancouver Convention Center
This paper reformulates the attention component of the classic Transformer language model in terms of tensor operations. The reviewers agree that the proposed model is well motivated and that the reduction in parameters it achieves is significant; as such, the paper is worthy of publication. However, the reviewers also note a number of issues with the clarity of the presentation, general grammatical errors, and errors in the accompanying code, all of which must be addressed before publication. A more complete evaluation of the tensorised model and the baseline across a range of parameter scales is also required, including a report of the total FLOPs used.