Attribution
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
arXiv 2024
MegaBlocks: Efficient Sparse Training with Mixture-of-Experts
arXiv 2022
from 2 papers
Adam Paszke
Alek Andreev
Aleksandar Botev
Andy Brock
Antonia Paterson
Anushan Fernando
Armand Joulin
Arnaud Doucet
Arthur Zucker
Cassidy Hardin