Zhenghao Xu
3 papers
Authored papers
Approximation of Log-Partition Function in Policy Mirror Descent Induces Implicit Regularization for LLM Post-Training
arXiv 2026
COSMOS: A Hybrid Adaptive Optimizer for Memory-Efficient Training of LLMs
arXiv 2025
Think-RM: Enabling Long-Horizon Reasoning in Generative Reward Models
arXiv 2025
Affiliations
No known affiliations.
Frequent co-authors
10 from 3 papers