Stas Bekman
4 papers
Authored papers
Arctic Long Sequence Training: Scalable And Efficient Training For Multi-Million Token Sequences
arXiv 2025
OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents
What Language Model to Train if You Have One Million GPU Hours?
arXiv 2022
Datasets: A Community Library for Natural Language Processing
EMNLP (ACL) 2021 11
Affiliations
No known affiliations.
Frequent co-authors
10 from 4 papers
Victor Sanh
Alexander M. Rush
Lucile Saulnier
Teven Le Scao
Thomas Wang
Abhishek Thakur
Albert Villanova del Moral
Amanpreet Singh
Angelina McMillan-Major
Anton Lozhkov