Attribution
Length-Unbiased Sequence Policy Optimization: Revealing and Controlling Response Length Variation in RLVR
arXiv 2026
Flexible Entropy Control in RLVR with Gradient-Preserving Perspective
from 2 papers
Fanfan Liu
Haibo Qiu
Peng Shi
Zhixiong Zeng
Kun Chen
Wenji Mao
Youyang Yin