Large-Scale Contextualised Language Modelling for Norwegian
We present the ongoing NorLM initiative to support the creation and use of very large contextualised language models for Norwegian (and in principle other Nordic languages), including a ready-to-use software environment, as well as an experience report on data preparation and training. This paper introduces the first large-scale monolingual language models for Norwegian, based on both the ELMo and BERT frameworks. In addition to detailing the training process, we present contrastive benchmark results on a suite of NLP tasks for Norwegian. For additional background and access to the data, models, and software, please see http://norlm.nlpl.eu.
TL;DR: Large-scale monolingual language models for Norwegian are developed using the ELMo and BERT frameworks, with contrastive benchmark results provided on a range of NLP tasks.
- Year: 2021
- Venue: NoDaLiDa 2021
- Authors: 5
- Hosting: Abstract only
Attribution
- Abstract & full text: arxiv.org/abs/2104.06546
- TL;DR: Semantic Scholar