Attribution
SCOPE: Signal-Calibrated On-Policy Distillation Enhancement with Dual-Path Adaptive Weighting
arXiv 2026
Enhancing Efficiency and Exploration in Reinforcement Learning for LLMs
arXiv 2025
from 2 papers
Benchang Zhu
Binbin Zheng
Huaiyu Wan
Jia Leng
Jingqing Ruan
Kepeng Lin
Mengqi Liao
Ruinian Chen
Shuai Liu
Xiangyu Xi