Attribution
Technical Report of TeleChat2, TeleChat2.5 and T1
arXiv 2025
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
from 2 papers
Bingkai Yang
Bo Zhou
Chao Wang
Fubei Yao
Guanhua Huang
Hanming Wu
Jiaxin Peng
Kaidong Yu
Kaipeng Jia
Kejiao Li