Attribution
Beyond Cosine Decay: On the effectiveness of Infinite Learning Rate Schedule for Continual Pre-training (arXiv 2025)
Zyda: A 1.3T Dataset for Open Language Modeling (arXiv 2024)
from 2 papers
Benjamin Thérien
Beren Millidge
Eugene Belilovsky
Irina Rish
James Whittington
Jonathan Pilault
Paolo Glorioso
Paria Mehrbod
Paul Janson
Quentin Anthony