Saiyong Yang
- Papers: 8
Authored papers
Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation
arXiv 2026
Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation
arXiv 2026
Debiased Model-based Representations for Sample-efficient Continuous Control
arXiv 2026
Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models
arXiv 2026
EntroPIC: Towards Stable Long-Term Training of LLMs via Entropy Stabilization with Proportional-Integral Control
arXiv 2025
LaSeR: Reinforcement Learning with Last-Token Self-Rewarding
arXiv 2025
DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation
arXiv 2025
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent
arXiv 2024
Affiliations
Frequent co-authors
10 from 8 papers