Attribution
DRA-GRPO: Exploring Diversity-Aware Reward Adjustment for R1-Zero-Like Training of Large Language Models
arXiv 2025
from 1 paper
Abolfazl Razi
Aristeidis Sotiras
Hao Wang
Huayu Li
Peijie Qiu
Wenhui Zhu
Xiwen Chen
Xuanzhao Dong
Yalin Wang