Siqi Yang

Attribution

2 papers

Authored papers

Length-Unbiased Sequence Policy Optimization: Revealing and Controlling Response Length Variation in RLVR

arXiv 2026

Flexible Entropy Control in RLVR with Gradient-Preserving Perspective

arXiv 2026

No known affiliations.

from 2 papers

Fanfan Liu

Haibo Qiu

Peng Shi

Zhixiong Zeng

Kun Chen

Wenji Mao

Youyang Yin