Attribution
AXLearn: Modular Large Model Training on Heterogeneous Infrastructure
arXiv 2025
Mega: Moving Average Equipped Gated Attention
arXiv 2022
from 2 papers
BoWen Zhang
Chang Lan
Cheng Leong
Chung-Cheng Chiu
Chunting Zhou
Danyang Zhuo
David Qiu
Dongseong Hwang
Floris Weers
Graham Neubig
professor