Reviews: Inducing brain-relevant bias in natural language processing models

Notwithstanding the significant contributions of the paper discussed above, there is perhaps too much focus on overall performance metrics and too little scientific investigation of how BERT and brain data relate to each other. How do representations in the pre-trained BERT model and the brain-fine-tuned BERT model encode linguistic information, and what is gained by fine-tuning? It would be interesting to compare the two models to try to understand what brain-relevant linguistic information is absent from the original BERT instantiation. It is arguably a bit tautological to argue that “fine-tuning models to predict recordings of brain activity … will lead to representations that encode more brain-activity relevant language information” (Abstract lines 6-7; also lines 143-144). Isn’t the more interesting question what commonalities exist between brain representations of linguistic information and BERT representations of linguistic information? What kind of language information is “brain-activity relevant language information”? For example, emotional sentiment may be a particularly important aspect of the brain data but not a very salient component of the representations in BERT, though sentiment may be decodable from BERT’s embeddings (and thus can be magnified by the brain fine-tuning process).

Lines 120-121: 20 words are used as the relevant context for each fMRI image, but these 20 words do not respect sentence boundaries. Does this mean that the input to BERT during fine-tuning does not respect sentence boundaries? If so, this seems undesirable: it introduces a discrepancy between the format of the input in the initial BERT training and the format of the input in the current fine-tuning. Therefore, we cannot be sure whether changes to BERT’s linguistic representations as a result of fine-tuning are due to the neurocognitive signal relating to the input or to the new input scheme (e.g. when it comes to the GLUE evaluations).
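To make the sentence-boundary concern concrete, here is a minimal sketch contrasting a fixed-size context window with one truncated at the most recent sentence start. The function names, the punctuation-based boundary rule, and the toy sentence are all hypothetical illustrations; whether the authors' pipeline behaves like the first function is exactly the open question above.

```python
import re

def fixed_window(words, end, size=20):
    """Return the last `size` words before index `end`, ignoring sentence boundaries."""
    return words[max(0, end - size):end]

def sentence_bounded_window(words, end, size=20):
    """Return the same window, truncated to start after the last sentence end inside it.

    Assumption: a word ending in '.', '!' or '?' closes a sentence.
    """
    window = fixed_window(words, end, size)
    # Scan backwards, skipping the final word so a window that ends
    # on a sentence boundary is not emptied.
    for i in range(len(window) - 2, -1, -1):
        if re.search(r"[.!?]$", window[i]):
            return window[i + 1:]
    return window

# Toy input (not from the paper's stimuli).
words = "The cat sat. The dog ran away".split()
crossing = fixed_window(words, end=7, size=5)
bounded = sentence_bounded_window(words, end=7, size=5)
```

Here `crossing` still contains the tail of the previous sentence (`"sat."`), while `bounded` starts at the current sentence. Comparing fine-tuning runs under the two schemes would be one way to separate the effect of the brain signal from the effect of the input format.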

This paper addresses a young and exciting area of research linking cognitive neuroscience and artificial intelligence. While the methods are mostly reasonable, I find the paper very lacking in both framing and interpretation of results.

## Significance

I don’t think this study is well motivated anywhere in the paper. Why is it of interest whether or not a natural language processing model can accurately model brain activations across participants? What do we expect to learn (about either brains or models) by finding the answer to that question? Is it reasonable to think that the results could have been otherwise: that, for example, fine-tuning on predicting brain activations would have made the model worse at predicting brain activations (within-subject or across-subject)?

The introduction suggests that this paper might be of interest re: combining information from multiple neuroimaging modalities. But the simple co-training method is not particularly interesting in this respect; for contrast, compare with e.g. other computational models for fusing multimodal data [see e.g. 2].

## Quality

### Interpretation

The paper does not attempt to provide any explanation about why prediction performance changes the way it does. The most substantial analysis I could find was on p. 7: “the fine-tuning must be changing something about how the model encodes language to improve this prediction accuracy.” The results further show that this “something” is at least partially shared between imaging modalities. What is it about predicting brain activation data, then, that isn’t already present in the pre-trained BERT model? If the framing of this research has to do with learning about either artificial neural network models or human language processing, then it’d be good to have an answer to this question.

### Evaluation

It is very unclear how to interpret the NLP evaluation results in Table 1. Most of the quantitative changes are very small. What would the difference in performance between N random restarts of the BERT model look like in this table? It might be interesting/useful to run a statistical evaluation (pairwise t-test or sign test, depending on the data) to better understand the changes between the vanilla and fine-tuned models.

20 vs. 20 is a very coarse evaluation measure and not well justified a priori; this will seem strange to audiences less familiar with brain encoding/decoding. Please explain why it is a reasonable measure for the model selection you are doing in this paper (see e.g. [3] p238 left column).

## Clarity

Figure 2 is quite difficult to interpret at a glance; it would be useful to have a higher-level summary figure of some sort.

## Originality

I believe this work is original, though its methods largely overlap with those of [1]. It is a valid separate line of work from the more common brain encoding/decoding papers, which don’t investigate fine-tuning.

## References

[1] Schwartz, D., & Mitchell, T. (2019). Understanding language-elicited EEG data by predicting it from a fine-tuned language model. arXiv preprint arXiv:1904.01548.

[2] Calhoun, V., Adali, T., & Liu, J. (2006, August). A feature-based approach to combine functional MRI, structural MRI and EEG brain imaging data. In 2006 International Conference of the IEEE Engineering in Medicine and Biology Society (pp. 3672-3675). IEEE.

[3] Wehbe, L., Vaswani, A., Knight, K., & Mitchell, T. (2014, October). Aligning context-based statistical models of language with brain activity during reading. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 233-243).
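The statistical evaluation suggested under Evaluation could be sketched roughly as follows, using only the Python standard library. This assumes paired scores (e.g. one score per task or per random restart) for the vanilla and fine-tuned models; the function names are hypothetical and the numbers are made-up placeholders, not values from the paper's Table 1.

```python
import math
from statistics import mean, stdev

def sign_test_p(a, b):
    """Exact two-sided sign test p-value for paired scores; ties are dropped."""
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    if n == 0:
        return 1.0
    wins = sum(d > 0 for d in diffs)
    k = min(wins, n - wins)
    # P(X <= k) under Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def paired_t_statistic(a, b):
    """Paired t statistic (n - 1 degrees of freedom); needs at least two pairs
    with non-constant differences."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Placeholder scores for illustration only (NOT results from the paper).
fine_tuned = [0.81, 0.79, 0.84, 0.80, 0.83]
vanilla = [0.80, 0.78, 0.82, 0.81, 0.82]
p_value = sign_test_p(fine_tuned, vanilla)
t_stat = paired_t_statistic(fine_tuned, vanilla)
```

The sign test makes no distributional assumption about the score differences, which may suit a small number of tasks or restarts; the paired t statistic is more sensitive when the differences are roughly normal, and would be compared against a t distribution with n - 1 degrees of freedom.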

Paper ID:	7868
Title:	Inducing brain-relevant bias in natural language processing models
