Sun Dec 8 through Sat Dec 14, 2019, at the Vancouver Convention Center
This paper reformulates the attention component of the classic Transformer language model in terms of tensor operations. The reviewers agree that the proposed model is well motivated and that the reduction in parameters it achieves is significant; as such, the paper is worthy of publication. However, the reviewers also note a number of issues with the clarity of the presentation, general grammatical errors, and errors in the accompanying code, all of which must be addressed before publication. A more complete evaluation of the tensorised model and the baseline across a range of parameter scales is also required, including a report of the total FLOPs used.