Attribution
Self-rewarding correction for mathematical reasoning
arXiv 2025
Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training
Authors (from 2 papers)
Tong Zhang
Anurag Beniwal
Hanning Zhang
Hao Chen
Jing Huang
Lichang Chen
Nan Jiang
Narayanan Sadagopan
Wei Xiong
Zhou Yu