This paper presents a novel approach to integrating a pre-trained model into a sequence-to-sequence model by decoupling the parameters of a lightweight adapter module from those of the pre-trained BERT models. The approach is task-agnostic and can be applied to different problems, and the authors show strong results in speedup and BLEU score for both non-autoregressive and autoregressive machine translation. The reviewers all agreed that the paper is worth publishing. The author response was detailed and appropriate, and the subsequent reviewer discussion led to a consensus accept. The paper is intuitive and simple, and I think it will be an interesting addition to NeurIPS 2020.