Thomas Scialom, Paul-Alexis Dray, Sylvain Lamprier, Benjamin Piwowarski, Jacopo Staiano
Training regimes based on Maximum Likelihood Estimation (MLE) suffer from known limitations, often leading to poorly generated text sequences that lack coherence and factual accuracy and are prone to repetition. At the root of these limitations is the mismatch between training and inference, i.e. the so-called exposure bias. Another problem lies in treating only the reference text as correct, while in practice several alternative formulations could be equally good.
Generative Adversarial Networks (GANs) could mitigate these limitations. Nonetheless, the discrete nature of text has hindered their application to language generation: the approaches proposed so far, based on Reinforcement Learning, have been shown to underperform MLE. In this context, exploration is known to be critical, yet it remains surprisingly under-studied. In this work, we show how the most popular sampling method results in unstable training for language GANs. We propose alternative exploration strategies, which we name Cold-GANs. Forcing the sampling to stay close to the distribution mode makes the learning dynamics smoother.
We report experimental results obtained on three tasks: unconditional text generation, question generation, and abstractive summarization. For the first time, to the best of our knowledge, the proposed language GANs compare favorably to MLE, and obtain improvements over the state-of-the-art on the considered tasks.
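The idea of keeping sampling close to the distribution mode can be illustrated with a small sketch. One common way to concentrate probability mass near the mode is to divide the logits by a temperature below 1 before the softmax; whether this matches the exact strategies proposed here is an assumption, since the abstract does not specify them. The function names and the logit values below are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature).

    A temperature below 1 sharpens the distribution around its mode;
    a temperature of 1 leaves the model distribution unchanged.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical next-token scores for a four-word vocabulary.
logits = [2.0, 1.0, 0.5, -1.0]

standard = softmax_with_temperature(logits, 1.0)  # ordinary sampling
cold = softmax_with_temperature(logits, 0.3)      # assumed "cold" setting

rng = random.Random(0)
token = sample_token(logits, 0.3, rng)
```

With the lower temperature, the most likely token receives a larger share of the probability mass than under ordinary sampling, so sampled sequences stray less from what the generator already considers likely.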