Attribution
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
arXiv 2025
Bo Zhou
Guanhua Huang
Mingze Wang
Qi Yi
Ruibin Xiong
Siheng Li
Tingqiang Xu
Xue Gong
Yuhao Jiang