We take a step towards addressing the under-representation of the African continent in NLP research by creating the first large publicly available high-quality dataset for named entity recognition (NER) in ten African languages, bringing together a variety of stakeholders. We detail characteristics of the languages to help researchers understand the challenges that these languages pose for NER. We analyze our datasets and conduct an extensive empirical evaluation of state-of-the-art methods across both supervised and transfer learning settings. We release the data, code, and models in order to inspire future research on African NLP.
MasakhaNER: Named Entity Recognition for African Languages
A large dataset for named entity recognition in ten African languages is created and analyzed, with evaluations of state-of-the-art techniques in supervised and transfer learning.
- Year: 2021
- Venue: arXiv 2021
- Authors: 61
Attribution
- Abstract & full text: arxiv.org/abs/2103.11811v2
- TL;DR: Semantic Scholar
Authors
Daniel D'souza, Graham Neubig, Julia Kreutzer, Sebastian Ruder, Clemencia Siro, Jesujoba Alabi, Salomey Osei, Rubungo Andre Niyongabo, Perez Ogayo, Orevaoghene Ahia, Kelechi Ogueji, Jade Abbott, Iroro Orife, Ignatius Ezeani, Blessing Sibanda, Adewale Akinfaderin, Constantine Lignos, David Ifeoluwa Adelani, Seid Muhie Yimam, Chester Palen-Michel, Happy Buzaaba, Shruti Rijhwani, Stephen Mayhew, Israel Abebe Azime, Shamsuddeen Muhammad, Chris Chinenye Emezue, Joyce Nakatumba-Nabende, Anuoluwapo Aremu, Catherine Gitau, Derguene Mbaye, Tajuddeen Gwadabe, Jonathan Mukiibi, Verrah Otiende, Davis David, Samba Ngom, Tosin Adewumi, Paul Rayson, Mofetoluwa Adeyemi, Gerald Muriuki, Emmanuel Anebi, Chiamaka Chukwuneke, Nkiruka Odu, Eric Peter Wairagala, Samuel Oyerinde, Tobius Saul Bateesa, Temilola Oloyede, Yvonne Wambui, Victor Akinode, Deborah Nabagereka, Maurice Katusiime, Ayodele Awokoya, Mouhamadane MBOUP, Dibora Gebreyohannes, Henok Tilaye, Kelechi Nwaike, Degaga Wolde, Abdoulaye Faye, Bonaventure F. P. Dossou, Thierno Ibrahima DIOP, Abdoulaye Diallo, Tendai Marengereke