Overall, the reviewers appreciated this paper and considered it an interesting contribution to the literature. I am therefore recommending that the paper be accepted. However, Reviewer 3 and I have one major request: that the authors appropriately frame their discussion of previous work, specifically . Quoting Reviewer 3's discussion directly: "The authors claim to be first to directly supervise attention; which has been done before with token-level annotation, and is exactly what  does with gaze. This false claim of novelty is problematic, but also unnecessary, since this is a great paper that already makes decent contributions, eg smart pretraining." I would strongly encourage the authors to take this comment seriously, give the relationship with  proper treatment, and explain the additional contributions that this paper makes on top of it.