Attribution
AXLearn: Modular Large Model Training on Heterogeneous Infrastructure
arXiv 2025
Theory, Analysis, and Best Practices for Sigmoid Self-Attention
arXiv 2024
from 2 papers
Amitis Shidani
BoWen Zhang
Chang Lan
Cheng Leong
Chung-Cheng Chiu
Dan Busbridge
Danyang Zhuo
David Qiu
Dongseong Hwang
Eeshan Dhekane