Ya Wang
4 papers
Authored papers (4)
Mixture-of-Depths Attention
arXiv 2026
HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
arXiv 2025
Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models
arXiv 2025
Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models
arXiv 2024
Affiliations
No known affiliations.
Frequent co-authors
10 from 4 papers