Attribution
Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs
arXiv 2025
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language
A Dataset and Strong Baselines for Classification of Czech News Texts
arXiv 2023
from 3 papers
Amir Hossein Kargaran
Anh Dao
Bettina Messmer
Changyu Chen
Chao Du
Colin Raffel
Cunxiao Du
Fajri Koto
Fan Zhou
Guilherme Penedo