Attribution
Alleviating Sparse Rewards by Modeling Step-Wise and Long-Term Sampling Effects in Flow-Based GRPO
arXiv 2026
Canyu Zhao
Hao Jiang
Hongwei Zhang
Jiamang Wang
Jinlong Liu
Ju Huang
Mushui Liu
Peng Zhang
Shiyi Zhang
Wanggui He