
Pretraining Finnish ModernBERTs

ModernBERT encoder models pretrained in a limited multilingual setting, focusing on languages relevant to Finland, are competitive with or superior to existing multilingual models, and outperform monolingual models on tasks requiring longer context.

Year: 2025
Venue: arXiv 2025
Authors: 4
Hosting: Abstract only

Attribution

Abstract & full text: arxiv.org/abs/2511.09213 (arXiv)
TL;DR: Semantic Scholar

Abstract

This paper reports on pretraining ModernBERT encoder models in six different sizes, ranging from 51M to 475M parameters, with a focus on limited multilingualism, emphasizing languages relevant to Finland. Our models are competitive with, or superior to, existing multilingual models. They outperform monolingual models on tasks that require a context longer than 512 tokens. We present empirical results on using different data in the final stage of training. The code and models are publicly released.
