Attribution
MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts
arXiv 2024
Improving Transformers with Probabilistic Attention Keys
transformer-with-a-mixture-of-gaussian-keys
from 2 papers
Dung D. Le
Duy Khuong Nguyen
Nhat Ho
Rachel S. Y. Teo
Richard G. Baraniuk
Stanley J. Osher
Tam Nguyen
Viet-Anh Tran