We present Apertus, a fully open suite of large language models (LLMs) designed to address two systemic shortcomings in today's open model ecosystem: data compliance and multilingual representation. Unlike many prior models that release weights without reproducible data pipelines or regard for content-owner rights, Apertus models are pretrained exclusively on openly available data, retroactively respecting robots.txt exclusions and filtering for non-permissive, toxic, and personally identifiable content. To mitigate risks of memorization, we adopt the Goldfish objective during pretraining, strongly suppressing verbatim recall of data while retaining downstream task performance. The Apertus models also expand multilingual coverage, training on 15T tokens from over 1800 languages, with ~40% of pretraining data allocated to non-English content. Released at 8B and 70B scales, Apertus approaches state-of-the-art results among fully open models on multilingual benchmarks, rivalling or surpassing open-weight counterparts. Beyond model weights, we release all scientific artifacts from our development cycle with a permissive license, including data preparation scripts, checkpoints, evaluation suites, and training code, enabling transparent audit and extension.
Apertus: Democratizing Open and Compliant LLMs for Global Language Environments
- Year: 2025
- Venue: arXiv 2025
- Authors: 101
Attribution
- Abstract & full text: arxiv.org/abs/2509.14233
- TL;DR: Semantic Scholar
Authors
Nathan Ranchin, Florian Tramer, Kumar Shridhar, Auguste Poiroux, Antoine Bosselut, Torsten Hoefler, Alejandro Hernández Cano, Angelika Romanou, Kyle Matoba, Matteo Pagliardini, Simin Fan, Syrielle Montariol, Martin Jaggi, Andrei Panferov, YiXuan Xu, Alexander Hoyle, Raghav Singhal, Kaustubh Ponkshe, Ido Hakimi, Andreas Krause, Frederike Lübeck, Barna Pásztor, Dongyang Fan, Xiaozhe Yao, Ana Klimovic, Vinko Sabolčec, Bettina Messmer, Negar Foroutan, Alexander Hägele, Allen Hao Huang, Antoni-Joan Solergibert, Dhia Garbaya, Eduard Frank Ďurech, Juan García Giraldo, Mete Ismayilzada, Skander Moalla, Tiancheng Chen, Michael Aerni, Badr AlKhamissi, Ines Altemir Marinas, Mohammad Hossein Amani, Matin Ansaripour, Ilia Badanin, Harold Benoit, Emanuela Boros, Nicholas Browning, Fabian Bösch, Maximilian Böther, Niklas Canova, Camille Challier, Clement Charmillot, Jonathan Coles, Jan Deriu, Arnout Devos, Lukas Drescher, Daniil Dzenhaliou, Maud Ehrmann, Silin Gao, Miguel Gila, María Grandury, Diba Hashemi, Jiaming Jiang, Mark Klein, Andrei Kucharavy, Anastasiia Kucherenko, Roman Machacek, Theofilos Manitaras, Andreas Marfurt, Simon Matrenok, Henrique Mendonça, Fawzi Roberto Mohamed, Luca Mouchel, Sven Najem-Meyer, Jingwei Ni, Gennaro Oliva, Elia Palme, Léo Paoletti, Marco Passerini, Ivan Pavlov, Javi Rando, Mathieu Sauser, Jakhongir Saydaliev, Muhammad Ali Sayfiddinov, Marian Schneider, Stefano Schuppli, Marco Scialanga, Andrei Semenov, Anna Sotnikova, Alexander Sternfeld, Ayush Kumar Tarun, Paul Teiletche, Jannis Vamvas, Hao Zhao, Alexander Ilic, Caglar Gulcehre, David Rosenthal, Elliott Ash, Joost VandeVondele, Livio Veraldi, Martin Rajman, Thomas Schulthess, Imanol Schlag