Grammar as a Foreign Language

Part of Advances in Neural Information Processing Systems 28 (NIPS 2015)

Authors

Oriol Vinyals, Łukasz Kaiser, Terry Koo, Slav Petrov, Ilya Sutskever, Geoffrey Hinton

Abstract

Syntactic constituency parsing is a fundamental problem in natural language processing which has been the subject of intensive research and engineering for decades. As a result, the most accurate parsers are domain specific, complex, and inefficient. In this paper we show that the domain agnostic attention-enhanced sequence-to-sequence model achieves state-of-the-art results on the most widely used syntactic constituency parsing dataset, when trained on a large synthetic corpus that was annotated using existing parsers. It also matches the performance of standard parsers when trained on a small human-annotated dataset, which shows that this model is highly data-efficient, in contrast to sequence-to-sequence models without the attention mechanism. Our parser is also fast, processing over a hundred sentences per second with an unoptimized CPU implementation.