Attribution
Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation
arXiv 2025
Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL
from 2 papers
Dongbin Zhao
Linjing Li
Qichao Zhang
Songjun Tu
Xiangyu Tian
Xiangyuan Lan
Dongmei Jiang
Nan Xu
Wei He
Yuqian Fu