NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

NL-Augmenter is a participatory Python-based framework for natural language data augmentation, offering transformations and filters to assess model robustness and enhance data diversity.

Year
2021
Venue
arXiv 2021
Authors
125
Abstract & full text
arxiv.org/abs/2112.02721v2

Abstract

Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on. In this paper, we present NL-Augmenter, a new participatory Python-based natural language augmentation framework which supports the creation of both transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of natural language tasks. We demonstrate the efficacy of NL-Augmenter by using several of its transformations to analyze the robustness of popular natural language models. The infrastructure, datacards and robustness analysis results are available publicly on the NL-Augmenter repository (https://github.com/GEM-benchmark/NL-Augmenter).
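
The distinction the abstract draws between transformations (which modify the data) and filters (which split the data by a feature) can be illustrated with a small standalone sketch. Every name below (`uppercase_transformation`, `min_length_filter`, `augment`) is hypothetical and chosen only for illustration; this is not the NL-Augmenter API, and the toy operations are far simpler than those in the repository.

```python
from typing import Callable, List, Tuple

# Hypothetical type aliases for illustration; not the NL-Augmenter interfaces.
Transformation = Callable[[str], List[str]]  # sentence -> modified sentence(s)
Filter = Callable[[str], bool]               # sentence -> keep or drop


def uppercase_transformation(sentence: str) -> List[str]:
    """Toy transformation: return a modified copy of the input sentence."""
    return [sentence.upper()]


def min_length_filter(min_tokens: int) -> Filter:
    """Toy filter: keep sentences with at least `min_tokens` whitespace tokens."""
    def keep(sentence: str) -> bool:
        return len(sentence.split()) >= min_tokens
    return keep


def augment(
    data: List[str], transformation: Transformation, data_filter: Filter
) -> Tuple[List[str], List[str]]:
    """Select a data split with the filter, then transform each selected item."""
    selected = [s for s in data if data_filter(s)]
    augmented = [t for s in selected for t in transformation(s)]
    return selected, augmented


data = [
    "The cat sat on the mat.",
    "Hello!",
    "Data augmentation helps robustness.",
]
selected, augmented = augment(data, uppercase_transformation, min_length_filter(4))
print(selected)
print(augmented)
```

In a robustness evaluation along the lines the abstract describes, a model's predictions on `selected` and on `augmented` would be compared; the comparison itself is omitted here because it depends on the model and task.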

Authors

Niklas Muennighoff, Sebastian Gehrmann, Thomas Scialom, Tianbao Xie, Yue Zhang, Jing Zhang, Wanxiang Che, Zijie J. Wang, Sebastian Ruder, Tanya Goyal, Libo Qin, Chandan Singh, Damien Sileo, Denis Kleyko, Genta Indra Winata, Gerard de Melo, Gloria Wang, Jascha Sohl-Dickstein, Kaustubh D. Dhole, Marie Tolkiehn, Michael A. Yee, Mo Tiwari, Mukund Varma T, Priti Oli, Rabin Banjade, Roman Sitelew, Ryan Teehan, Sajant Anand, Vikas Raunak, Xinyi Wu, Xudong Shen, Zijian Wang, Vukosi Marivate, Kalpesh Krishna, Nafise Sadat Moosavi, Pierre Colombo, Kaizhao Liang, Kenton Murray, Samson Tan, Eduard Hovy, Tanay Dixit, Witold Wydmański, Jan Pfister, Juan Diego Rodriguez, Varun Gangal, Tongshuang Wu, Venelin Kovatchev, Ondřej Dušek, Aadesh Gupta, Zhenhao Li, Saad Mahamood, Abinaya Mahendiran, Simon Mille, Ashish Shrivastava, Jinho D. Choi, Nagender Aneja, Lisa Barthe, Hanna Behnke, Ian Berlot-Attwell, Connor Boyle, Caroline Brun, Marco Antonio Sobrevilla Cabezudo, Samuel Cahyawijaya, Emile Chapuis, Mukund Choudhary, Christian Clauss, Filip Cornell, Gautier Dagan, Mayukh Das, Thomas Dopierre, Paul-Alexis Dray, Suchitra Dubey, Tatiana Ekeinhor, Marco Di Giovanni, Rishabh Gupta, Louanes Hamla, Sang Han, Fabrice Harel-Canada, Antoine Honore, Ishan Jindal, Przemyslaw K. Joniak, Ashutosh Kumar, Stefan Langer, Seungjae Ryan Lee, Corey James Levinson, Hualou Liang, Zhexiong Liu, Andrey Lukyanenko, Simon Meoni, Maxime Meyer, Afnan Mir, Timothy Sum Hon Mun, Marcin Namysl, Maria Obedkova, Nivranshu Pasricha, Richard Plant, Vinay Prabhu, Vasile Pais, Shahab Raji, Pawan Kumar Rajpoot, Roy Rinberg, Nicolas Roberts, Claude Roux, Vasconcellos P. H. S., Ananya B. Sai, Robin M. Schmidt, Tshephisho Sefara, Saqib N. Shamsi, Haoyue Shi, Yiwen Shi, Anna Shvets, Nick Siegel, Jamie Simon, Priyank Soni, Taylor Sorensen, William Soto, Aman Srivastava, KV Aditya Srivatsa, Tony Sun, A Tabassum, Fiona Anting Tan, Athena Wang, Fuxuan Wei, Bryan Wilie, Usama Yaseen