CogLTX: Applying BERT to Long Texts

Part of Advances in Neural Information Processing Systems 33 (NeurIPS 2020)

AuthorFeedback »Bibtex »MetaReview »Paper »Review »Supplemental »

Authors

Ming Ding, Chang Zhou, Hongxia Yang, Jie Tang

Abstract

BERTs are incapable of processing long texts due to its quadratically increasing memory and time consumption. The straightforward thoughts to address this problem, such as slicing the text by a sliding window or simplifying transformers, suffer from insufficient long-range attentions or need customized CUDA kernels. The limited text length of BERT reminds us the limited capacity (5∼ 9 chunks) of the working memory of humans – then how do human beings Cognize Long TeXts? Founded on the cognitive theory stemming from Baddeley, our CogLTX framework identifies key sentences by training a judge model, concatenates them for reasoning and enables multi-step reasoning via rehearsal and decay. Since relevance annotations are usually unavailable, we propose to use treatment experiments to create supervision. As a general algorithm, CogLTX outperforms or gets comparable results to SOTA models on NewsQA, HotpotQA, multi-class and multi-label long-text classification tasks with memory overheads independent of the text length.