Attribution
Baichuan-M1: Pushing the Medical Capability of Large Language Models
arXiv 2025
DCPO: Dynamic Clipping Policy Optimization
Surrogate Signals from Format and Length: Reinforcement Learning for Solving Mathematical Problems without Ground Truth Answers
from 3 papers
Bingning Wang
Han Liu
Yupeng Zhang
Zecheng Wang
Chengfeng Dou
Da Pan
Dianbo Sui
Fei Deng
Fei Kou
Fei Li