Title: Fast Structured Decoding for Sequence Models

The reviewers and I find the paper interesting, especially because such a simple approach performs favorably in comparison with non-autoregressive and expressive autoregressive models for machine translation. I recommend acceptance as a poster, given that the reviewers raise several concerns about the original manuscript. I ask the authors to change the title as agreed in the rebuttal, using terms such as low-latency, fast, etc. It seems that the paper uses an approximate partition function for training, which is not explained in detail. The theoretical properties of such an approximation may be interesting to study. The submission should cite and discuss relevant previous work on combining neural networks with CRFs for sequence labeling, such as Andor et al., 2016 and Collobert et al., 2011.