{"title": "Defending Against Neural Fake News", "book": "Advances in Neural Information Processing Systems", "page_first": 9054, "page_last": 9065, "abstract": "Recent progress in natural language generation has raised dual-use concerns. While applications like summarization and translation are positive, the underlying technology also might enable adversaries to generate neural fake news: targeted propaganda that closely mimics the style of real news.\n\nModern computer security relies on careful threat modeling: identifying potential threats and vulnerabilities from an adversary's point of view, and exploring potential mitigations to these threats. Likewise, developing robust defenses against neural fake news requires us first to carefully investigate and characterize the risks of these models. We thus present a model for controllable text generation called Grover. Given a headline like 'Link Found Between Vaccines and Autism,' Grover can generate the rest of the article; humans find these generations to be more trustworthy than human-written disinformation.\n\nDeveloping robust verification techniques against generators like Grover is critical. We find that the best current discriminators can distinguish neural fake news from real, human-written news with 73% accuracy, assuming access to a moderate level of training data. Counterintuitively, the best defense against Grover turns out to be Grover itself, with 92% accuracy, demonstrating the importance of public release of strong generators. We investigate these results further, showing that exposure bias -- and sampling strategies that alleviate its effects -- both leave artifacts that similar discriminators can pick up on. 
We conclude by discussing ethical issues regarding the technology, and plan to release Grover publicly, helping pave the way for better detection of neural fake news.", "full_text": "Defending Against Neural Fake News\n\nRowan Zellers*, Ari Holtzman*, Hannah Rashkin*, Yonatan Bisk*\n\nAli Farhadi*\u2020, Franziska Roesner*, Yejin Choi*\u2020\n\n*Paul G. Allen School of Computer Science & Engineering, University of Washington\n\n\u2020Allen Institute for Artificial Intelligence\n\nhttps://rowanzellers.com/grover\n\nAbstract\n\nRecent progress in natural language generation has raised dual-use concerns. While applications like summarization and translation are positive, the underlying technology also might enable adversaries to generate neural fake news: targeted propaganda that closely mimics the style of real news.\n\nModern computer security relies on careful threat modeling: identifying potential threats and vulnerabilities from an adversary\u2019s point of view, and exploring potential mitigations to these threats. Likewise, developing robust defenses against neural fake news requires us first to carefully investigate and characterize the risks of these models. We thus present a model for controllable text generation called Grover. Given a headline like \u2018Link Found Between Vaccines and Autism,\u2019 Grover can generate the rest of the article; humans find these generations to be more trustworthy than human-written disinformation.\n\nDeveloping robust verification techniques against generators like Grover is critical. We find that the best current discriminators can distinguish neural fake news from real, human-written news with 73% accuracy, assuming access to a moderate level of training data. Counterintuitively, the best defense against Grover turns out to be Grover itself, with 92% accuracy, demonstrating the importance of public release of strong generators. 
We investigate these results further, showing that exposure bias \u2013 and sampling strategies that alleviate its effects \u2013 both leave artifacts that similar discriminators can pick up on. We conclude by discussing ethical issues regarding the technology, and plan to release Grover publicly, helping pave the way for better detection of neural fake news.\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\n1 Introduction\n\nOnline fake news \u2013 news designed to intentionally deceive \u2013 has recently emerged as a major societal problem. Malicious actors spread fallacious viral stories in order to gain advertising revenue, influence opinions, and even tip elections (Faris et al., 2017; Wardle and Derakhshan, 2017). As such, countering the spread of disinformation online presents an urgent technical and political issue.\n\n[Figure 1 image: a mock \u201cThe Newest York Times\u201d Science page showing the generated article \u201cLink Found Between Vaccines and Autism\u201d (By Paul Waldman, May 29, 2019), stamped \u201cfake\u201d, with panels labeled Fake News Generation and News Verification.]\n\nFigure 1: In this paper, we explore Grover, a model which can detect and generate neural fake news. Humans find the articles difficult to distinguish from \u201creal news\u201d without high levels of scrutiny.\n\nTo the best of our knowledge, most disinformation online today is manually written (Vargo et al., 2018). However, as progress continues in natural language generation, malicious actors will increasingly be able to controllably generate realistic-looking propaganda at scale. 
Thus, while we are excited about recent progress in text generation (J\u00f3zefowicz et al., 2016; Radford et al., 2018; 2019), we are also concerned with the inevitability of AI-generated \u2018neural\u2019 fake news.[1]\n\nWith this paper, we seek to understand and respond to neural fake news before it manifests at scale. We draw on the field of computer security, which relies on threat modeling: analyzing the space of potential threats and vulnerabilities in a system to develop robust defenses. To scientifically study the risks of neural disinformation, we present a new generative model called Grover.[2] Our model allows for controllable yet efficient generation of an entire news article \u2013 not just the body, but also the title, news source, publication date, and author list. This lets us study an adversary with controllable generations (e.g. Figure 1, an example anti-vaccine article written in the style of the New York Times).\n\nHumans rate the disinformation generated by Grover as trustworthy, even more so than human-written disinformation. Thus, developing robust verification techniques against generators such as Grover is an important research area. We consider a setting in which a discriminator has access to 5000 Grover generations, but unlimited access to real news. In this setting, the best existing fake news discriminators are, themselves, deep pretrained language models (73% accuracy) (Peters et al., 2018; Radford et al., 2018; 2019; Devlin et al., 2018). However, we find that Grover, when used in a discriminative setting, performs even better at 92% accuracy. This finding represents an exciting opportunity for defense against neural fake news: the best models for generating neural disinformation are also the best models at detecting it.\n\nNext, we investigate how deep pretrained language models distinguish between real and machine-generated text. 
We find that key artifacts are introduced during generation as a result of exposure bias: the generator is not perfect, so randomly sampling from its distribution results in generations that fall increasingly out-of-distribution as length increases. However, sampling strategies that alleviate these effects also introduce artifacts that strong discriminators can pick up on.\n\nWe conclude with a sketch of the ethical territory that must be mapped out in order to understand our responsibilities as researchers when studying fake news, and the potential negative implications of releasing models (Hecht et al., 2018; Zellers, 2019; Solaiman et al., 2019). Accordingly, we suggest a provisional policy of how such models should be released and why we believe it to be safe \u2013 and perhaps even imperative \u2013 to do so. We believe our proposed framework and accompanying models provide a concrete initial proposal for an evolving conversation about ML-based disinformation threats and how they can be countered.\n\n2 Fake News in a Neural and Adversarial Setting\n\nWe present a framework \u2013 motivated by today\u2019s dynamics of manually created fake news \u2013 for understanding what adversaries will attempt with deep models, and how verifiers should respond.\n\nScope of fake news. There are many types of false news, ranging from satire to propaganda (Wardle, 2017). In this paper, we focus on text-only documents formatted as news articles: stories and their corresponding metadata that contain purposefully false information. Existing fake news is predominantly human-written, for two broad goals: monetization (ad revenue through clicks) and propaganda (communicating targeted information) (Bradshaw and Howard, 2017; Melford and Fagan, 2019). 
Achieving either goal requires the adversary to be selective about the news that they make, whether by producing only viral content, or content that advances a given agenda.\n\nFact checking and verification: related work. There is considerable interest in fighting online disinformation. Major platforms such as Facebook prioritize trustworthy sources and shut down accounts linked to disinformation (Mosseri, 2018; Dwoskin and Romm, 2018). Some users of these platforms avoid fake news with tools such as NewsGuard and Hoaxy (Shao et al., 2016) and websites like Snopes and PolitiFact. These services rely on manual fact-checking efforts: verifying the accuracy of claims, articles, and entire websites. Efforts to automate fake news detection generally point out stylistic biases that exist in the text (Rashkin et al., 2017; Wang, 2017; P\u00e9rez-Rosas et al., 2018). These efforts can help moderators on social media platforms shut down suspicious accounts. However, fact checking is not a panacea \u2013 cognitive biases such as the backfire effect and confirmation bias make humans liable to believe fake news that fits their worldview (Swire et al., 2017).\n\n[1] We thank past work, such as OpenAI\u2019s Staged Release Policy for GPT2, for drawing attention to neural disinformation, alongside other dual-use implications.\n\n[2] Short for Generating aRticles by Only Viewing mEtadata Records.\n\nFramework. We cast fake news generation and detection as an adversarial game, with two players:\n\u2022 Adversary. Their goal is to generate fake stories that match specified attributes: generally, being viral or persuasive. The stories must read realistically to both human users and the verifier.\n\u2022 Verifier. Their goal is to classify news stories as real or fake. The verifier has access to unlimited real news stories, but few fake news stories from a specific adversary. 
This setup matches the existing landscape: when a platform blocks an account or website, their disinformative stories provide training for the verifier; but it is difficult to collect fake news from newly-created accounts.\n\nThe dual objectives of these two players suggest an escalating \u201carms race\u201d between attackers and defenders. As verification systems get better, so too will adversaries. We must therefore be prepared to deal with ever-stronger adversarial attacks, which is the focus of the next section.\n\n3 Grover: Modeling Conditional Generation of Neural Fake News\n\nGiven existing online disinformation, we have reason to believe adversaries will try to generate targeted content (e.g. clickbait and propaganda). Recently introduced large-scale generative models produce realistic-looking text (Radford et al., 2019), but they do not lend themselves to producing controllable generations (Hu et al., 2017).[3] Therefore, to probe the feasibility of realistic-looking neural fake news, we introduce Grover, which produces both realistic and controlled generations.\n\nThe current state-of-the-art in unconditional text generation views it as a language modeling problem (Bengio et al., 2003), in which the probability of a document x is the product of the conditional probability of generating each token x_i given previous tokens:\n\n$$p(x) = \\prod_{i=1}^{N} p(x_i \\mid x_1 \\ldots x_{i-1}). \\qquad (1)$$\n\nThe document is typically treated as a single unstructured text field, beginning with a <start> token and ending with an <end> token. The latter, <end>, is particularly important because it indicates the end of the field, and when the model should stop generating. However, a news article has necessary structure beyond the running text, or body field. Metadata fields include the domain where the article is published (indirectly marking the style), the date of publication, the names of the authors, and the headline of the article itself. 
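Equation 1 can be made concrete with a minimal sketch. It assumes a hypothetical toy next-token model, toy_next_token_prob, in place of a trained network such as Grover: the document log-probability is the sum of per-token conditional log-probabilities from the first token after <start> through <end>, and exponentiating the average negative log-likelihood gives perplexity, the metric used to compare language models. This illustrates the chain rule only; it is not Grover's code.

```python
import math

VOCAB = ['the', 'vaccine', 'study', '<end>']

def toy_next_token_prob(prefix, token):
    # Hypothetical stand-in for a trained language model: returns
    # p(token | prefix). It only looks at the prefix length.
    if token not in VOCAB:
        return 0.0
    if len(prefix) >= 4:
        # Favor stopping once the document is long enough.
        return 0.7 if token == '<end>' else 0.1
    return 0.25

def log_prob(tokens, next_token_prob):
    # Equation (1) in log space: log p(x) = sum_i log p(x_i | x_1 ... x_{i-1}).
    # tokens[0] is <start>, which is given rather than predicted.
    return sum(math.log(next_token_prob(tokens[:i], tokens[i]))
               for i in range(1, len(tokens)))

def perplexity(tokens, next_token_prob):
    # exp of the average negative log-likelihood per predicted token.
    n_predicted = len(tokens) - 1
    return math.exp(-log_prob(tokens, next_token_prob) / n_predicted)

doc = ['<start>', 'the', 'vaccine', 'study', '<end>']
print(math.exp(log_prob(doc, toy_next_token_prob)))  # 0.0109375, up to rounding
print(perplexity(doc, toy_next_token_prob))  # about 3.09
```

Working in log space avoids numerical underflow: the product in Equation 1 shrinks geometrically with document length.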
Not only does generating a news article require producing all of these components, these fields also allow significant control over the generations (e.g. specifying a headline helps control the generated body). An article can be modeled by the joint distribution:\n\n$$p(\\text{domain}, \\text{date}, \\text{authors}, \\text{headline}, \\text{body}). \\qquad (2)$$\n\nHowever, it is not immediately obvious how to sample from Equation 2. One option is to define a canonical order among the article\u2019s fields $F$: $(f_1 < f_2 < \\ldots < f_{|F|})$, and model the article left-to-right in that order using Equation 1: $x^{f_1}_1, x^{f_1}_2, \\ldots, x^{f_{|F|}}_{|f_{|F|}|}$. However, this ordering would forbid sampling certain fields without prohibitively expensive marginalization. Alternatively, one could generate fields in any order, but this requires the model to learn to handle $|F|!$ potential orderings during inference time.\n\nOur solution is Grover, a new approach for efficient learning and generation of multi-field documents. We adopt the language modeling framework of Equation 1 in a way that allows for flexible decomposition of Equation 2. During inference time, we start with a set of fields F as context, with each field f containing field-specific start and end tokens. We sort the fields using a standard order[4] and combine the resulting tokens together. To generate a target field \u03c4, we append the field-specific start token <start-\u03c4> to the context tokens; then, we sample from the model until we hit <end-\u03c4>.\n\n[3] A common workaround is to have a human seed the text to provide context. 
However, this a) is a heavy-handed technique for biasing which may not capture the desired attributes, and b) leaves in place a human-written beginning (as tokens are only generated left-to-right), which may create distributional artifacts.\n\n[4] Our ordering is the following field types in order: domain, date, authors, headline, and then the body.\n\n[Figure 2 diagram: Context and Target fields for rows a) to c), using domain wired.com, date May 29, 2019, the headline \u201cNew Research Shows that Vaccines Cause Autism\u201d, the generated author Justin Furillo, and the regenerated headline \u201cVaccines Might Be a Bigger Threat to Your Child's Future Than You Realized\u201d.]\n\nFigure 2: A diagram of three Grover examples for article generation. In row a), the body is generated from partial context (the authors field is missing). In b), the model generates the authors. In c), the model uses the new generations to regenerate the provided headline to one that is more realistic.\n\nFigure 2 shows an example of using Grover to generate an anti-vaccine article. Here, the adversary specifies a domain, date, and headline. After Grover generates the body, it can be used to generate a fake author, before finally generating a new and more appropriate headline.\n\nDuring training, we simulate inference by randomly partitioning an article\u2019s fields into two disjoint sets F1 and F2. 
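The standard field order and the F1/F2 partition can be sketched in a few lines. This is a hypothetical illustration, not Grover's implementation: fields are plain strings rather than tokens, the <start-field> and <end-field> spellings are made up, the distribution used to draw the split is an assumption, and the 10% and 35% field dropout is left out.

```python
import math
import random

# Standard field order (footnote 4).
STANDARD_ORDER = ['domain', 'date', 'authors', 'headline', 'body']

def serialize(fields):
    # Emit the given fields in the standard order, each wrapped in
    # field-specific start and end tokens (hypothetical spellings).
    tokens = []
    for name in STANDARD_ORDER:
        if name in fields:
            tokens += [f'<start-{name}>', fields[name], f'<end-{name}>']
    return tokens

def inference_context(context_fields, target):
    # Sorted context, then the target field's start token; a model would
    # sample from here until it emits the matching end token.
    return serialize(context_fields) + [f'<start-{target}>']

def training_sequence(article, rng):
    # Randomly split the fields into disjoint sets F1 and F2 (the split
    # distribution here is an assumption) and serialize F1 followed by F2,
    # each in standard order.
    names = list(article)
    rng.shuffle(names)
    cut = rng.randint(0, len(names))
    first = {n: article[n] for n in names[:cut]}
    second = {n: article[n] for n in names[cut:]}
    return serialize(first) + serialize(second)

print(inference_context({'date': 'May 29, 2019', 'domain': 'wired.com'}, 'body'))
# ['<start-domain>', 'wired.com', '<end-domain>', '<start-date>', 'May 29, 2019', '<end-date>', '<start-body>']

# Each field goes to F1 or F2 and each set keeps the standard order, so training
# covers at most 2**|F| orderings, versus |F|! for arbitrary orders.
n = len(STANDARD_ORDER)
print(2 ** n, math.factorial(n))  # 32 120
```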
We also randomly drop out individual fields with probability 10%, and drop out all but the body with probability 35%. This allows the model to learn how to perform unconditional generation. We sort the metadata fields in each set using our standard order, and concatenate the underlying tokens. The model is then trained to minimize the cross-entropy of predicting the tokens in F1 followed by the tokens in F2.[5]\n\nArchitecture. We draw on recent progress in training large Transformers for language modeling (Vaswani et al., 2017), building Grover using the same architecture as for GPT2 (Radford et al., 2019). We consider three model sizes. Our smallest model, Grover-Base, has 12 layers and 124 million parameters, on par with GPT and BERT-Base (Radford et al., 2018; Devlin et al., 2018). Our next model, Grover-Large, has 24 layers and 355 million parameters, on par with BERT-Large. Our largest model, Grover-Mega, has 48 layers and 1.5 billion parameters, on par with GPT2.\n\nDataset. We present RealNews, a large corpus of news articles from Common Crawl. Training Grover requires a large corpus of news articles with metadata, but none currently exists. Thus, we construct one by scraping dumps from Common Crawl, limiting ourselves to the 5000 news domains indexed by Google News. We used the Newspaper Python library to extract the body and metadata from each article. News from Common Crawl dumps from December 2016 through March 2019 were used as training data; articles published in April 2019 from the April 2019 dump were used for evaluation. After deduplication, RealNews is 120 gigabytes without compression.\n\nLearning. We trained each Grover model on randomly-sampled sequences from RealNews with length 1024. Other optimization hyperparameters are in Appendix A. We trained Grover-Mega for 800k iterations, using a batch size of 512 and 256 TPU v3 cores. 
Training time was two weeks.\n\n3.1 Language Modeling results: measuring the importance of data, context, and size\n\nWe validate Grover, versus standard unconditional language models, on the April 2019 test set. We consider two evaluation modes: unconditional, where no context is provided and the model must generate the article body; and conditional, in which the full metadata is provided as context. In both cases, we calculate the perplexity only over the article body.\n\nOur results, shown in Figure 3, support several conclusions. First, Grover noticeably improves (by 0.6 to 0.9 perplexity points) when conditioned on metadata. Second, perplexity decreases with size, with Grover-Mega obtaining 8.7 perplexity in the conditional setting. Third, the data distribution is still important: though the GPT2 models with 124M parameters and 355M parameters respectively match our Grover-Base and Grover-Large architectures, our model is over 5 perplexity points lower in both cases, possibly because the OpenAI WebText corpus also contains non-news articles.\n\n[5] All tokens use the same vocabulary. By using a standard order, but partitioning the fields into two sets, the model can generate any field conditioned on others while only needing to learn $2^{|F|}$ orderings, versus $|F|!$.\n\nFigure 3: Language Modeling results on the body field of April 2019 articles. We evaluate in the Unconditional setting (without provided metadata) as well as in the Conditional setting (with all metadata). Grover sees over a 0.6 point drop in perplexity when given metadata.\n\nFigure 4: Human evaluation. For each article, three annotators evaluated style, content, and the overall trustworthiness; 100 articles of each category were used. 
The results show that propaganda generated by Grover is rated more plausible than the original human-written propaganda.\n\n3.2 Carefully restricting the variance of generations with Nucleus Sampling\n\nSampling from Grover is straightforward as it behaves like a left-to-right language model during decoding. However, the choice of decoding algorithm is important. While likelihood-maximization strategies such as beam search work well for closed-ended generation tasks where the output contains the same information as the context (like machine translation), these approaches have been shown to produce degenerate text during open-ended generation (Hashimoto et al., 2019; Holtzman et al., 2019). At the same time, as we will show in Section 6, restricting the variance of generations is also crucial. In this paper, we primarily use Nucleus Sampling (top-p): for a given threshold p, at each timestep we sample only from the smallest set of most probable words whose cumulative probability reaches p, discarding the low-probability tail of the vocabulary (Holtzman et al., 2019).[6]\n\n4 Humans are Easily Fooled by Grover-written Propaganda\n\nWe evaluate the quality of disinformation generated by our largest model, Grover-Mega, using p = .96. We consider four classes of articles: human-written articles from reputable news websites (Human News), Grover-written articles conditioned on the same metadata (Machine News), human-written articles from known propaganda websites (Human Propaganda), and Grover-written articles conditioned on the propaganda metadata (Machine Propaganda).[7] The domains used are in Appendix B; examples are in Appendix F. We asked a pool of qualified workers on Amazon Mechanical Turk to rate each article on three dimensions: stylistic consistency, content sensibility, and overall trustworthiness.[8]\n\nResults (Figure 4) show a striking trend: though the quality of Grover-written news is not as high as human-written news, it is adept at rewriting propaganda. 
The overall trustworthiness score of propaganda increases from 2.19 to 2.42 (out of 3) when rewritten by Grover.[9]\n\n[6] In early experiments, we found Nucleus Sampling produced better and less-detectable generations than alternatives like top-k sampling, wherein the most probable k tokens are used at each timestep (Fan et al., 2018).\n\n[7] We use the technique described in Figure 2 to rewrite the propaganda: given the metadata, generate the article first, and then rewrite the headline.\n\n[8] With these guidelines, we tried to separate style versus content. Overall trustworthiness asks \u2018Does the article read like it comes from a trustworthy source?\u2019 which emphasizes style, while content sensibility asks whether the content is believable on a semantic level.\n\n[9] This difference is statistically significant at p = 0.01. One possible hypothesis for this effect is that Grover ignores the provided context. To test this hypothesis, we did a human evaluation of the consistency of the article body with the headline, date, and author. We found that human-written propaganda articles are consistent with the headline with an average score of 2.85 of 3 on the same 1-3 scale, while machine-written propaganda scores 2.64 of 3.\n\n5 Neural Fake News Detection\n\nThe high quality of neural fake news written by Grover, as judged by humans, makes automatic neural fake news detection an important research area. Using models (below) for the role of the Verifier can mitigate the harm of neural fake news by classifying articles as Human or Machine written. These decisions can assist content moderators and end users in identifying likely (neural) disinformation.\n\na. Grover. We consider a version of our model adapted for discrimination. Similar to GPT (Radford et al., 2018), we place a special [CLS] token at the end of each article, and extract the final hidden state at that point. 
The hidden state is fed to a linear layer to predict the label Human or Machine. To simulate real conditions, and ensure minimal overlap between the generator and discriminator parameters, we initialize Grover for discrimination using the checkpoint at iteration 700k, whereas the generator uses the checkpoint at iteration 800k.\n\nb. GPT2, a 124M or 355M parameter pretrained Transformer language model. Similar to Grover, we follow the GPT approach and extract the hidden state from a newly-added [CLS] token.\n\nc. BERT, a 110M parameter (BERT-Base) or 340M parameter (BERT-Large) bidirectional Transformer encoder commonly used for discriminative tasks. We perform domain adaptation to adapt BERT to the news domain, as well as to account for long articles; details in Appendix C.\n\nd. FastText, an off-the-shelf library for bag-of-ngram text classification (Joulin et al., 2017). Though not pretrained, similar models do well at detecting human-written fake news.\n\nAll models are trained to minimize the cross-entropy loss of predicting the right label. Hyperparameters used during discrimination are in Appendix D.\n\n5.1 A semi-supervised setting for neural fake news detection\n\nWhile there are many human-written articles online, most are from the distant past, whereas articles to be detected will likely be set in the present. Likewise, there might be relatively few neural fake news articles from a given adversary.[10] We thus frame neural fake news detection as a semi-supervised problem. A neural verifier (or discriminator) has access to many human-written news articles from March 2019 and before \u2013 the entire RealNews training set. However, it has limited access to generations, and more recent news articles. Using 10k news articles from April 2019, we generate article body text; another 10k articles are used as a set of human-written news articles. 
We split the articles in a balanced way, with 10k for training (5k per label), 2k for validation, and 8k for testing.\n\nWe consider two evaluation modes. In the unpaired setting, a discriminator is provided single news articles, and must classify each independently as Human or Machine. In the paired setting, a model is given two news articles with the same metadata, one real and one machine-generated. The discriminator must assign the machine-written article a higher Machine probability than the human-written article. We evaluate both modes in terms of accuracy.\n\n5.2 Discrimination results: Grover performs best at detecting Grover\u2019s fake news\n\nWe present experimental results in Table 1 for all generator and discriminator combinations. For each pair, we show the test results using the most adversarial generation hyperparameters (top-p) as judged on the validation set.[11] The results show several trends. First, the paired setting appears much easier than the unpaired setting, suggesting that it is difficult for the model to calibrate its predictions. Second, model size is highly important in the arms race between generators and discriminators. Using Grover to discriminate Grover\u2019s generations results in roughly 90% accuracy across the range of sizes. If a larger generator is used, accuracy slips below 81%; conversely, if the discriminator is larger, accuracy is above 98%. 
Third, other discriminators perform worse than Grover overall, even when controlling for architecture size and (for both BERT models) the domain.\n\nThat Grover is the best discriminator is possibly surprising: being unidirectional, it is less expressive than deep bidirectional models such as BERT.[12] That the more expressive model here is not the best at discriminating between real and generated news articles suggests that neural fake news discrimination requires having a similar inductive bias as the generator.[13]\n\n[10] Moreover, since disinformation can be shared on a heterogeneous mix of platforms, it might be challenging to pin down a single generated model.\n\n[11] For each discriminator/generator pair, we search over p \u2208 {.9, .92, .94, .96, .98, 1.0}.\n\n[12] Indeed, bidirectional approaches perform best on leaderboards like GLUE (Wang et al., 2018).\n\nTable 1: Results of discriminators versus generators, in both the paired and unpaired settings and across architecture sizes. We also vary the generation hyperparameters for each generator-discriminator pair, reporting the discrimination test accuracy for the hyperparameters with the lowest validation accuracy. Compared with other models such as BERT, Grover is the best at detecting its own generations as neural fake news.\n\n                               Unpaired accuracy       Paired accuracy\n                               (generator size)        (generator size)\nDisc. size  Discriminator      1.5B   355M   124M      1.5B   355M   124M\n            Chance             50.0   50.0   50.0      50.0   50.0   50.0\n1.5B        Grover-Mega        92.0   98.5   99.8      97.4  100.0  100.0\n355M        Grover-Large       80.8   91.2   98.4      89.0   96.9  100.0\n355M        BERT-Large         73.1   75.9   97.5      84.1   91.5   99.9\n355M        GPT2               70.1   78.0   90.3      78.8   87.0   96.8\n124M        Grover-Base        70.1   80.0   89.2      77.5   88.2   95.7\n124M        BERT-Base          67.2   76.6   84.1      80.0   89.5   96.2\n124M        GPT2               66.2   71.9   83.5      72.5   79.6   89.6\n11M         FastText           63.8   65.6   69.7      65.9   69.0   74.4\n\nFigure 5: Exploring weak supervision for discriminating Grover-Mega generations. With no weak supervision, the discriminator sees x machine-written articles (from Grover-Mega). For +Grover-Base and +Grover-Large, the discriminator also sees 5000\u2212x machine-written articles given by the weaker generator in question. Seeing weaker generations improves performance when few in-domain samples are given.\n\n5.3 Weak supervision: what happens if we don\u2019t have access to Grover-Mega?\n\nThese results suggest that Grover is an effective discriminator when we have a medium number of fake news examples from the exact adversary that we will encounter at test time. What happens if we relax this assumption? Here, we consider the problem of detecting an adversary who is generating news with Grover-Mega and an unknown top-p threshold.[14] In this setup, during training, we have access to a weaker model (Grover-Base or Grover-Large). We consider the effect of having only x examples from Grover-Mega, and sampling the missing 5000\u2212x articles from one of the weaker models, where the top-p threshold is uniformly chosen for each article in the range [0.9, 1.0].\n\nWe show the results of this experiment in Figure 5. The results suggest that observing additional generations greatly helps discrimination performance when few examples of Grover-Mega are available: weak supervision with between 16 and 256 examples from Grover-Large yields around 78% accuracy, while accuracy remains around 50% without weak supervision. As the portion of examples that come from Grover-Mega increases, however, accuracy converges to 92%.[15]\n\n6 How does a model distinguish between human and machine text?\n\nIn this section, we explore why Grover performs best at detecting fake news generated by other Grover models. 
We find a double bind: exposure bias creates artifacts, while the variance-reduction\nalgorithms that alleviate it create artifacts of their own.\n\nExposure Bias. Models maximizing Equation 1 are trained only conditioned on human-written\ntext, never on their own generations, creating a problem known as exposure bias (Ranzato et al., 2016).\nWe investigate how much exposure bias contributes to these artifacts. In Figure 6 we plot the\nperplexities given by Grover-Mega at each position of the body text, for generations at top-p thresholds of 0.96\nand 1 as well as for human-written text. Generating the first token after <startbody> results in high\nperplexity. However, the rest of the positions show a curious pattern: the perplexity of human-written\ntext is lower than that of randomly sampled text, and this gap increases with sequence length, suggesting\nthat random sampling causes Grover to fall increasingly out of the distribution of human language.\nLimiting the variance (p=0.96), by contrast, lowers the resulting perplexity and limits its growth.\n\n13This matches findings on the HellaSwag dataset (Zellers et al., 2019b). Given human text and machine text\nwritten by a finetuned GPT model, a GPT discriminator outperforms BERT-Base at picking out human text.\n14The top-p threshold used was p=0.96, but we are not supposed to know this!\n15In additional experiments we show that accuracy increases even more \u2013 up to 98% \u2013 when the number of\nexamples is increased (Zellers et al., 2019c). We also find that Grover, when trained to discriminate between real\nand fake Grover-generated news, can detect GPT2-Mega-generated news as fake with 96% accuracy.\n\nFigure 6: Perplexities of Grover-Mega, averaged over\neach position in the body (after conditioning on metadata).\nWe compare human-written text with Grover-Mega\ngenerated text at p=1 (random sampling) and p=0.96.\nThe perplexity of randomly sampled text is higher than\nthat of human-written text, and the gap increases with position.\nThis suggests that sampling without variance reduction\nincreasingly falls out of distribution.\n\nFigure 7: Unpaired validation accuracy at\ntelling apart generated news articles (from\nGrover-Mega) from real articles, at different\nvariance reduction thresholds p (for\nNucleus Sampling). Results varying p\nshow a sweet spot (p = 0.92\u20130.96)\nwherein discrimination is hardest.\n\nLimiting the variance of a model also creates artifacts. On the other hand, clipping the model\u2019s\nvariance also leaves an artifact, as prior work has observed for top-k sampling (Strobelt and Gehrmann,\n2019). A similar phenomenon holds for Nucleus (top-p) sampling. If each human-written token independently\nfalls inside the top-p nucleus with probability p, then the probability of observing a\nhuman-written article where all tokens are drawn from the nucleus is p^n, where n\nis the document\u2019s length. This probability goes to zero as n increases. However, for Nucleus Sampled\ntext \u2013 in which the final 1\u2212p of the probability mass is cut off \u2013 all tokens come from the top-p.\nThe visibility of the artifacts depends on the choice of discriminator. The top-p at each timestep\nis calculated under the generator\u2019s worldview, meaning that if the discriminator models text in a\ndifferent way, it might have a harder time pinpointing the empty 1\u2212p tail. This could explain BERT\u2019s\nlower performance during discrimination.\n\nA sweet spot of careful variance reduction. Leaving the variance unreduced and reducing it\nsignificantly both cause problems. Might there be a sweet spot for how much to truncate\nthe variance, to make discrimination maximally hard? In Figure 7, we show results varying the\ntop-p threshold for the discrimination task applied to Grover-Mega\u2019s generations.
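The Python sketch below illustrates the nucleus-sampling artifact discussed above. It is not Grover's implementation. The function names (`top_p_filter`, `nucleus_sample`, `prob_all_in_nucleus`, `fraction_in_nucleus`) and the toy four-token distribution are hypothetical. `prob_all_in_nucleus` also relies on a simplifying assumption: each human-written token independently falls inside the nucleus with probability p. Under that assumption, a 1024-token article at p=0.96 lies entirely inside the nucleus with probability of roughly 7e-19. Every token of a nucleus-sampled article, by contrast, lies inside it by construction.

```python
import random


def top_p_filter(probs, p):
    # Indices of the smallest set of most-probable tokens whose cumulative
    # probability reaches p (the nucleus), ordered from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, total = [], 0.0
    for i in order:
        nucleus.append(i)
        total += probs[i]
        if total >= p:
            break
    return nucleus


def nucleus_sample(probs, p, rng):
    # Truncate to the nucleus; random.choices treats the kept weights as
    # relative, so the cut-off tail (about 1 - p of the mass) is never sampled.
    nucleus = top_p_filter(probs, p)
    return rng.choices(nucleus, weights=[probs[i] for i in nucleus], k=1)[0]


def prob_all_in_nucleus(p, n):
    # Simplifying assumption: each of n human-written tokens lands in the
    # nucleus independently with probability p, so all of them do with p**n.
    return p ** n


def fraction_in_nucleus(steps, p):
    # steps: (probs, observed_token) pairs, where probs is the next-token
    # distribution assigned by whichever model is doing the scoring.
    hits = sum(1 for probs, tok in steps if tok in top_p_filter(probs, p))
    return hits / len(steps)


if __name__ == '__main__':
    toy = [0.5, 0.3, 0.15, 0.05]  # hypothetical next-token distribution
    print(top_p_filter(toy, 0.9))  # [0, 1, 2]: token 3 lies in the cut-off tail
    rng = random.Random(0)
    draws = {nucleus_sample(toy, 0.9, rng) for _ in range(1000)}
    print(sorted(draws))  # never includes token 3
    print(prob_all_in_nucleus(0.96, 1024))  # about 7e-19
    print(fraction_in_nucleus([(toy, 0), (toy, 3)], 0.9))  # 0.5
```

`fraction_in_nucleus` mirrors the point about discriminators. The nucleus is defined by whichever model supplies `probs`. A scorer whose distribution differs from the generator's (as the text suggests for BERT) would therefore place the boundary of the truncated tail somewhere else.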
The results indeed\nshow a sweet spot, roughly between p=0.92 and p=0.98 depending on the discriminator, wherein\ndiscrimination is hardest. Interestingly, the most adversarial top-p threshold for BERT-Large\nis considerably lower than the corresponding top-p for Grover-Large, a model of the same size. This\nsupports our hypothesis that BERT\u2019s view of language differs markedly from Grover\u2019s; using a lower\ntop-p threshold does not seem to give it much more information about the missing tail.\nOverall, our analysis suggests that Grover might be the best at catching Grover because it is the\nbest at knowing where the tail is, and thus whether it was truncated.\n\n7 Conclusion: a Release Strategy for Grover\n\nThis paper investigates the threats posed by adversaries seeking to spread disinformation. Our sketch\nof what these threats might look like \u2013 a controllable language model named Grover \u2013 suggests that\nthese threats are real and dangerous. Grover can rewrite propaganda articles, with humans rating the\nrewritten versions as more trustworthy. At the same time, there are defenses against these models \u2013 notably,\nin the form of Grover itself. We conclude with a discussion of next steps and ethical considerations.\n\nThe Era of Neural Disinformation. Though training Grover was challenging, it is easily achievable\nby real-world adversaries today. Obtaining the required data through Common Crawl cost\n$10k in AWS credits and can be massively parallelized over many CPUs. Training Grover-Mega is\nrelatively inexpensive: at a cost of $0.30 per TPU v3 core-hour and two weeks of training, the total\ncost is $25k. Spending more money and engineering time could yield even more powerful generators.\n\nRelease of generators is critical. At first, it would seem like keeping models like Grover private\nwould make us safer.
However, Grover serves as an effective detector of neural fake news, even\nwhen the generator is much larger (Section 5). If generators are kept private, then there will be little\nrecourse against adversarial attacks. We thus released our models to researchers (Zellers, 2019).\n\nFuture of progress in generation. Models like BERT are strong discriminators for many NLP\ntasks, but they are not as good at detecting Grover\u2019s generations as left-to-right models like Grover,\neven after domain adaptation. One hypothesis is that the artifacts shown in Section 6 are most visible\nto a left-to-right discriminator. This also suggests that recent progress on generating text in any order\n(Gu et al., 2019; Stern et al., 2019; Ghazvininejad et al., 2019) may lead to models that evade a\nGrover discriminator. Likewise, models that are trained conditioned on their own predictions might\navoid exposure bias; however, these objectives often lead to low performance on language tasks\n(Caccia et al., 2018). One additional possibility is the use of Adversarial Filtering (Zellers et al., 2018;\n2019b) to oversample and then select a subset of generations. However, we found this didn\u2019t work\nwell for very long sequences (up to 1024 BPE tokens), possibly because these are far from the \u2018Goldilocks\nZone\u2019 wherein discrimination is hard for machines.\n\nAdditional threat models. In this paper, we studied the threat model whereby an adversary generates\nan entire news article from scratch, given minimal context. Other threat models are possible: for\ninstance, an adversary might generate comments or deploy entire dialogue agents, start with\na human-written news article and modify a few sentences, or fabricate images or video.\nResearchers should study these threat models as well, so that we can create better defenses.\n\nMachine-generated real news?
Our study focused on detecting machine-written fake news,\nthough the same Grover approach can be used for spotting human-written fake news as well (Zellers\net al., 2019c). However, machines can also generate truthful news using templated systems. Domains\nwith templated news articles exist in our dataset,16 and are easy for Grover to spoof convincingly.\n\nFuture of progress in discrimination. Our discriminators are effective, but they primarily leverage\ndistributional features rather than evidence. In contrast, humans assess whether an article is truthful\nby relying on a model of the world, assessing whether the evidence in the article matches that\nmodel. Future work should investigate integrating knowledge into the discriminator (e.g., for claim\nverification in FEVER; Thorne et al., 2018). An open question is how to scale progress on this task to\nentire news articles, without paired evidence (similar to open-domain QA; Chen et al., 2017).\n\nWhat should platforms do? Video-sharing platforms like YouTube use deep neural networks to\nscan videos while they are uploaded, to filter out content like pornography (Hosseini et al., 2017).\nWe suggest platforms do the same for news articles. An ensemble of deep generative models, such as\nGrover, can analyze the content of text, together with shallower models that predict human-written\ndisinformation. However, humans must still be in the loop because of the danger of flagging real\nnews as machine-generated, and because of possible unwanted social biases of these models.\n\nAcknowledgments\nWe thank the anonymous reviewers, as well as Dan Weld, for their helpful feedback. Thanks also to\nZak Stone and the Google Cloud TPU team for help with the computing infrastructure.
This work\nwas supported by the National Science Foundation through a Graduate Research Fellowship (DGE-1256082)\nand NSF grants (IIS-1524371, 1637479, 165205, 1703166), the DARPA CwC program\nthrough ARO (W911NF-15-1-0543), the Sloan Research Foundation through a Sloan Fellowship, the\nAllen Institute for Artificial Intelligence, the NVIDIA Artificial Intelligence Lab, Samsung through a\nSamsung AI research grant, and gifts by Google and Facebook. Computations on beaker.org were\nsupported in part by credits from Google Cloud.\n\n16An example is https://americanbankingnews.com.\n\nReferences\nYoshua Bengio, R\u00e9jean Ducharme, Pascal Vincent, and Christian Jauvin. A neural probabilistic\nlanguage model. Journal of Machine Learning Research, 3(Feb):1137\u20131155, 2003.\n\nSamantha Bradshaw and Philip Howard. Troops, trolls and troublemakers: A global inventory of\norganized social media manipulation. Technical report, Oxford Internet Institute, 2017.\n\nMassimo Caccia, Lucas Caccia, William Fedus, Hugo Larochelle, Joelle Pineau, and Laurent Charlin.\nLanguage GANs falling short. arXiv preprint arXiv:1811.02549, 2018.\n\nDanqi Chen, Adam Fisch, Jason Weston, and Antoine Bordes. Reading Wikipedia to answer open-domain\nquestions. In Proceedings of the 55th Annual Meeting of the Association for Computational\nLinguistics (Volume 1: Long Papers), pages 1870\u20131879, 2017.\n\nJacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep\nbidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.\n\nRachel Dicker. Avoid These Fake News Sites at All Costs.\nhttps://www.usnews.com/news/national-news/articles/2016-11-14/avoid-these-fake-news-sites-at-all-costs,\n2016. [Online; accessed 22-May-2019].\n\nElizabeth Dwoskin and Tony Romm. Facebook says it has uncovered a coordinated disinformation\noperation ahead of the 2018 midterm elections.
The Washington Post, 2018.\n\nAngela Fan, Mike Lewis, and Yann Dauphin. Hierarchical neural story generation. In Proceedings\nof the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long\nPapers), pages 889\u2013898, 2018.\n\nRobert Faris, Hal Roberts, Bruce Etling, Nikki Bourassa, Ethan Zuckerman, and Yochai Benkler.\nPartisanship, propaganda, and disinformation: Online media and the 2016 US presidential election.\nBerkman Klein Center Research Publication 2017-6, 2017.\n\nMarjan Ghazvininejad, Omer Levy, Yinhan Liu, and Luke Zettlemoyer. Constant-time machine\ntranslation with conditional masked language models. arXiv preprint arXiv:1904.09324, 2019.\n\nJiatao Gu, Qi Liu, and Kyunghyun Cho. Insertion-based decoding with automatically inferred\ngeneration order. arXiv preprint arXiv:1902.01370, 2019.\n\nXiaochuang Han and Jacob Eisenstein. Unsupervised domain adaptation of contextualized embeddings:\nA case study in Early Modern English. arXiv preprint arXiv:1904.02817, 2019.\n\nTatsunori B Hashimoto, Hugh Zhang, and Percy Liang. Unifying human and statistical evaluation for\nnatural language generation. arXiv preprint arXiv:1904.02792, 2019.\n\nBrent Hecht, Lauren Wilcox, Jeffrey P. Bigham, Johannes Sch\u00f6ning, Ehsan Hoque, Jason Ernst,\nYonatan Bisk, Luigi De Russis, Lana Yarosh, Bushra Anjum, Danish Contractor, and Cathy Wu.\nIt\u2019s time to do something: Mitigating the negative impacts of computing through a change to the\npeer review process. ACM Future of Computing Blog, 2018.\n\nAri Holtzman, Jan Buys, Maxwell Forbes, and Yejin Choi. The curious case of neural text degeneration.\narXiv preprint arXiv:1904.09751, 2019.\n\nHossein Hosseini, Baicen Xiao, Andrew Clark, and Radha Poovendran. Attacking automatic video\nanalysis algorithms: A case study of Google Cloud Video Intelligence API. In Proceedings of the\n2017 on Multimedia Privacy and Security, pages 21\u201332.
ACM, 2017.\n\nZhiting Hu, Zichao Yang, Xiaodan Liang, Ruslan Salakhutdinov, and Eric P Xing. Toward controlled\ngeneration of text. In Proceedings of the 34th International Conference on Machine Learning-Volume 70,\npages 1587\u20131596. JMLR.org, 2017.\n\nArmand Joulin, Edouard Grave, Piotr Bojanowski, and Tomas Mikolov. Bag of tricks for efficient text\nclassification. In Proceedings of the 15th Conference of the European Chapter of the Association\nfor Computational Linguistics: Volume 2, Short Papers, volume 2, pages 427\u2013431, 2017.\n\nRafal J\u00f3zefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, and Yonghui Wu. Exploring the\nlimits of language modeling. CoRR, abs/1602.02410, 2016.\n\nDiederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. CoRR,\nabs/1412.6980, 2014.\n\nClare Melford and Craig Fagan. Cutting the funding of disinformation: The ad-tech solution.\nTechnical report, The Global Disinformation Index, 2019.\n\nAdam Mosseri. News Feed FYI: Helping ensure news on Facebook is from trusted sources. Facebook\nNewsroom, 19, 2018.\n\nMyle Ott, Yejin Choi, Claire Cardie, and Jeffrey T Hancock. Finding deceptive opinion spam by\nany stretch of the imagination. In Proceedings of the 49th Annual Meeting of the Association for\nComputational Linguistics: Human Language Technologies-Volume 1, pages 309\u2013319. Association\nfor Computational Linguistics, 2011.\n\nVer\u00f3nica P\u00e9rez-Rosas, Bennett Kleinberg, Alexandra Lefevre, and Rada Mihalcea. Automatic\ndetection of fake news. In Proceedings of the 27th International Conference on Computational\nLinguistics, pages 3391\u20133401, 2018.\n\nMatthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and\nLuke Zettlemoyer. Deep contextualized word representations.
In Proceedings of the 2018 Conference\nof the North American Chapter of the Association for Computational Linguistics: Human\nLanguage Technologies, Volume 1 (Long Papers), volume 1, pages 2227\u20132237, 2018.\n\nAlec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. Improving language\nunderstanding by generative pre-training. Technical report, OpenAI, 2018. URL\nhttps://blog.openai.com/language-unsupervised/.\n\nAlec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language\nmodels are unsupervised multitask learners. Technical report, OpenAI, 2019.\n\nMarc\u2019Aurelio Ranzato, Sumit Chopra, Michael Auli, and Wojciech Zaremba. Sequence level training\nwith recurrent neural networks. In ICLR, 2016.\n\nHannah Rashkin, Eunsol Choi, Jin Yea Jang, Svitlana Volkova, and Yejin Choi. Truth of varying\nshades: Analyzing language in fake news and political fact-checking. In Proceedings of the 2017\nConference on Empirical Methods in Natural Language Processing, pages 2931\u20132937, 2017.\n\nChengcheng Shao, Giovanni Luca Ciampaglia, Alessandro Flammini, and Filippo Menczer. Hoaxy:\nA platform for tracking online misinformation. In Proceedings of the 25th International Conference\nCompanion on World Wide Web, pages 745\u2013750. International World Wide Web Conferences\nSteering Committee, 2016.\n\nNoam Shazeer and Mitchell Stern. Adafactor: Adaptive learning rates with sublinear memory cost.\nIn International Conference on Machine Learning, pages 4603\u20134611, 2018.\n\nIrene Solaiman, Miles Brundage, Jack Clark, Amanda Askell, Ariel Herbert-Voss, Jeff Wu, Alec\nRadford, and Jasmine Wang. Release strategies and the social impacts of language models. arXiv\npreprint arXiv:1908.09203, 2019.\n\nMitchell Stern, William Chan, Jamie Kiros, and Jakob Uszkoreit. Insertion transformer: Flexible\nsequence generation via insertion operations.
arXiv preprint arXiv:1902.03249, 2019.\n\nHendrik Strobelt and Sebastian Gehrmann. Catching a unicorn with GLTR: A tool to detect automatically\ngenerated text. Technical report, Harvard, 2019.\n\nBriony Swire, Ullrich KH Ecker, and Stephan Lewandowsky. The role of familiarity in correcting\ninaccurate information. Journal of Experimental Psychology: Learning, Memory, and Cognition, 43\n(12):1948, 2017.\n\nJames Thorne, Andreas Vlachos, Christos Christodoulopoulos, and Arpit Mittal. FEVER: a large-scale\ndataset for fact extraction and verification. In Proceedings of the 2018 Conference of the\nNorth American Chapter of the Association for Computational Linguistics: Human Language\nTechnologies, Volume 1 (Long Papers), pages 809\u2013819, 2018.\n\nChris J Vargo, Lei Guo, and Michelle A Amazeen. The agenda-setting power of fake news: A big\ndata analysis of the online media landscape from 2014 to 2016. New Media & Society, 20(5):\n2028\u20132049, 2018.\n\nAshish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141ukasz\nKaiser, and Illia Polosukhin. Attention is all you need. In Proceedings of the 31st International\nConference on Neural Information Processing Systems, pages 6000\u20136010. Curran Associates Inc.,\n2017.\n\nAlex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R Bowman. GLUE:\nA multi-task benchmark and analysis platform for natural language understanding. arXiv preprint\narXiv:1804.07461, 2018.\n\nWilliam Yang Wang. \u201cLiar, liar pants on fire\u201d: A new benchmark dataset for fake news detection. In\nProceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume\n2: Short Papers), pages 422\u2013426, 2017.\n\nClaire Wardle. Fake news. It\u2019s complicated. First Draft News, 16, 2017.\n\nClaire Wardle and Hossein Derakhshan. Information disorder: Toward an interdisciplinary framework\nfor research and policy making.
Council of Europe report, DGI (2017), 9, 2017.\n\nRowan Zellers. Why we released Grover. Technical report, 2019. URL\nhttps://thegradient.pub/why-we-released-grover/.\n\nRowan Zellers, Yonatan Bisk, Roy Schwartz, and Yejin Choi. SWAG: A large-scale adversarial\ndataset for grounded commonsense inference. In Proceedings of the 2018 Conference on Empirical\nMethods in Natural Language Processing (EMNLP), 2018.\n\nRowan Zellers, Yonatan Bisk, Ali Farhadi, and Yejin Choi. From recognition to cognition: Visual\ncommonsense reasoning. In The IEEE Conference on Computer Vision and Pattern Recognition\n(CVPR), 2019a.\n\nRowan Zellers, Ari Holtzman, Yonatan Bisk, Ali Farhadi, and Yejin Choi. HellaSwag: Can a machine\nreally finish your sentence? In Proceedings of the 57th Annual Meeting of the Association for\nComputational Linguistics, 2019b.\n\nRowan Zellers, Ari Holtzman, Hannah Rashkin, Yonatan Bisk, Ali Farhadi, Franziska Roesner,\nand Yejin Choi. Counteracting neural disinformation with Grover. Technical report, 2019c. URL\nhttps://medium.com/ai2-blog/counteracting-neural-disinformation-with-grover-6cf6690d463b.", "award": [], "sourceid": 4848, "authors": [{"given_name": "Rowan", "family_name": "Zellers", "institution": "University of Washington"}, {"given_name": "Ari", "family_name": "Holtzman", "institution": "University of Washington"}, {"given_name": "Hannah", "family_name": "Rashkin", "institution": "University of Washington"}, {"given_name": "Yonatan", "family_name": "Bisk", "institution": "Carnegie Mellon University"}, {"given_name": "Ali", "family_name": "Farhadi", "institution": "University of Washington, Allen Institute for Artificial Intelligence"}, {"given_name": "Franziska", "family_name": "Roesner", "institution": "University of Washington"}, {"given_name": "Yejin", "family_name": "Choi", "institution": "University of Washington"}]}