In this paper, we present a simple and efficient GEC sequence tagger using a Transformer encoder. Our system is pre-trained on synthetic data and then fine-tuned in two stages: first on errorful corpora, and second on a combination of errorful and error-free parallel corpora. We design custom token-level transformations to map input tokens to target corrections. Our best single-model/ensemble GEC tagger achieves an $F_{0.5}$ of 65.3/66.5 on CoNLL-2014 (test) and $F_{0.5}$ of 72.4/73.6 on BEA-2019 (test). Its inference speed is up to 10 times as fast as a Transformer-based seq2seq GEC system. The code and trained models are publicly available.
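To illustrate what "token-level transformations" can mean in a tagging setup, here is a minimal sketch, not the authors' implementation. It assumes a hypothetical tag vocabulary (`KEEP`, `DELETE`, `REPLACE_x`, `APPEND_x`) with exactly one tag per source token; the tag names and the `apply_tags` helper are illustrative and are not taken from this page.

```python
from typing import List


def apply_tags(tokens: List[str], tags: List[str]) -> List[str]:
    """Apply one edit tag per source token and return the corrected tokens.

    The tag names are assumptions made for this sketch:
      KEEP       - copy the token unchanged
      DELETE     - drop the token
      REPLACE_x  - emit x instead of the token
      APPEND_x   - emit the token, then x
    """
    if len(tokens) != len(tags):
        raise ValueError("need exactly one tag per token")

    out: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "KEEP":
            out.append(token)
        elif tag == "DELETE":
            continue  # token is removed from the output
        elif tag.startswith("REPLACE_"):
            out.append(tag[len("REPLACE_"):])
        elif tag.startswith("APPEND_"):
            out.append(token)
            out.append(tag[len("APPEND_"):])
        else:
            raise ValueError(f"unknown tag: {tag}")
    return out


source = "She go to to school".split()
tags = ["KEEP", "REPLACE_goes", "KEEP", "DELETE", "KEEP"]
corrected = " ".join(apply_tags(source, tags))
```

Because a tagger of this kind predicts one label per input token in a single encoder pass instead of generating the output token by token, it is plausible that it runs faster than a seq2seq decoder, which is consistent with the speed claim in the abstract.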
GECToR -- Grammatical Error Correction: Tag, Not Rewrite
A simple and efficient GEC sequence tagger built on a Transformer encoder achieves high performance with fast inference. It is pre-trained on synthetic data and then fine-tuned on errorful and error-free corpora.
- Year: 2020
- Authors: 4
Attribution
- Abstract & full text: arxiv.org/abs/2005.12592v2
- TL;DR: Semantic Scholar