Attribution
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models (arXiv, 2024)
PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel (arXiv, 2023)
Authors from 2 papers:
Ajit Mathews
Alban Desmaison
Andrew Gu
Bernard Nguyen
Can Balioglu
Chien-chin Huang
Chunting Zhou
Gargi Ghosh
Geeta Chauhan
Hamid Shojanazeri