Attribution
Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training
arXiv 2025
Self-rewarding correction for mathematical reasoning
from 2 papers
Tong Zhang
Anurag Beniwal
Hanning Zhang
Hao Chen
Jing Huang
Lichang Chen
Nan Jiang
Narayanan Sadagopan
Wei Xiong
Zhou Yu