Reviews: Interpreting and improving natural-language processing (in machines) with natural language-processing (in the brain)

This is a well-written and thorough paper, where the idea is to compare recent successful neural network models for language processing (both pure classification and transfer/representation-learning models) with human brain data. This is an important goal, because humans have neural networks that can make sense of language, and there may be a lot that the machine learning community can learn from understanding these exemplary language processors better. Indeed, a teleological explanation of how human or artificial neural networks process language would be an enormous breakthrough in language science. The reviewers are positive about this paper and therefore I support them in recommending acceptance. In addition to the positive aspects that they point out, I have a few concerns about the framing, which I list below in case they can improve the camera-ready submission:
1) The abstract states that: "it is still unclear what the representations learned by these networks correspond to". It seems to me that this paper does not really answer the question it poses in the first line of the abstract. I would recommend revising this rhetoric, because I don't think the paper shows what the representations learned by big neural networks correspond to (at least in terms of things in language or things in the world). Even if we could determine what patterns of activation in deep neural networks correspond to, it is not clear that this would provide a causal explanation of how the network computes predictions or behaviour given stimuli. This is a question that much of neuroscience has engaged with, and I feel that the authors could bring it out in this work. Certainly, identifying correlations, which is the approach of this paper, is very different from explaining something. According to the scientific method, explaining something should really involve a 'theory' of how the model is doing something, followed by a hypothesis test.
2) The paper claims to 'improve and interpret' NLP models (or BERT), but I'm still unsure whether the claim to improve BERT is fair. The improvement that is exhibited is on a very esoteric task that is more like a probe task than a practical application. Showing that BERT has been improved should probably involve evaluations on the standard tasks on which BERT was originally evaluated.

Paper ID:	8531
Title:	Interpreting and improving natural-language processing (in machines) with natural language-processing (in the brain)