Hanning Zhang
5 papers
Authored papers
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
arXiv 2025
Self-rewarding correction for mathematical reasoning
arXiv 2025
Entropy-Regularized Process Reward Model
arXiv 2024
R-Tuning: Instructing Large Language Models to Say 'I Don't Know'
arXiv 2023
Mitigating the Alignment Tax of RLHF
arXiv 2023
Affiliations
No known affiliations.
Frequent co-authors
10 from 5 papers