Attribution
Can Test-Time Scaling Improve World Foundation Model?
arXiv 2025
APOLLO: SGD-like Memory, AdamW-level Performance
arXiv 2024
Pre-RMSNorm and Pre-CRMSNorm Transformers: Equivalent and Efficient Pre-LN Transformers
NeurIPS 2023 11
from 3 papers
David Z. Pan
Wenyan Cong
Zhangyang Wang
Bangya Liu
Bo Long
Dejia Xu
Jiaqi Gu
Jinwon Lee
Kevin Wang
Peihao Wang