Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on. In this paper, we present NL-Augmenter, a new participatory Python-based natural language augmentation framework which supports the creation of both transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of natural language tasks. We demonstrate the efficacy of NL-Augmenter by using several of its transformations to analyze the robustness of popular natural language models. The infrastructure, datacards and robustness analysis results are available publicly on the NL-Augmenter repository (https://github.com/GEM-benchmark/NL-Augmenter).
NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation
NL-Augmenter is a participatory Python-based framework for natural language data augmentation, offering transformations and filters to assess model robustness and enhance data diversity.
- Year: 2021
- Venue: arXiv 2021
- Authors: 125
Attribution
- Abstract & full text: arxiv.org/abs/2112.02721v2
- TL;DR: Semantic Scholar
Authors
Niklas Muennighoff, Sebastian Gehrmann, Thomas Scialom, Tianbao Xie, Yue Zhang, Jing Zhang, Wanxiang Che, Zijie J. Wang, Sebastian Ruder, Tanya Goyal, Libo Qin, Chandan Singh, Damien Sileo, Denis Kleyko, Genta Indra Winata, Gerard de Melo, Gloria Wang, Jascha Sohl-Dickstein, Kaustubh D. Dhole, Marie Tolkiehn, Michael A. Yee, Mo Tiwari, Mukund Varma T, Priti Oli, Rabin Banjade, Roman Sitelew, Ryan Teehan, Sajant Anand, Vikas Raunak, Xinyi Wu, Xudong Shen, Zijian Wang, Vukosi Marivate, Kalpesh Krishna, Nafise Sadat Moosavi, Pierre Colombo, Kaizhao Liang, Kenton Murray, Samson Tan, Eduard Hovy, Tanay Dixit, Witold Wydmański, Jan Pfister, Juan Diego Rodriguez, Varun Gangal, Tongshuang Wu, Venelin Kovatchev, Ondřej Dušek, Aadesh Gupta, Zhenhao Li, Saad Mahamood, Abinaya Mahendiran, Simon Mille, Ashish Shrivastava, Jinho D. Choi, Nagender Aneja, Lisa Barthe, Hanna Behnke, Ian Berlot-Attwell, Connor Boyle, Caroline Brun, Marco Antonio Sobrevilla Cabezudo, Samuel Cahyawijaya, Emile Chapuis, Mukund Choudhary, Christian Clauss, Filip Cornell, Gautier Dagan, Mayukh Das, Thomas Dopierre, Paul-Alexis Dray, Suchitra Dubey, Tatiana Ekeinhor, Marco Di Giovanni, Rishabh Gupta, Louanes Hamla, Sang Han, Fabrice Harel-Canada, Antoine Honore, Ishan Jindal, Przemyslaw K. Joniak, Ashutosh Kumar, Stefan Langer, Seungjae Ryan Lee, Corey James Levinson, Hualou Liang, Zhexiong Liu, Andrey Lukyanenko, Simon Meoni, Maxime Meyer, Afnan Mir, Timothy Sum Hon Mun, Marcin Namysl, Maria Obedkova, Nivranshu Pasricha, Richard Plant, Vinay Prabhu, Vasile Pais, Shahab Raji, Pawan Kumar Rajpoot, Roy Rinberg, Nicolas Roberts, Claude Roux, Vasconcellos P. H. S., Ananya B. Sai, Robin M. Schmidt, Tshephisho Sefara, Saqib N. Shamsi, Haoyue Shi, Yiwen Shi, Anna Shvets, Nick Siegel, Jamie Simon, Priyank Soni, Taylor Sorensen, William Soto, Aman Srivastava, KV Aditya Srivatsa, Tony Sun, A Tabassum, Fiona Anting Tan, Athena Wang, Fuxuan Wei, Bryan Wilie, Usama Yaseen