Songjun Tu
- Papers: 3
3 papers
Authored papers
Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation
arXiv 2025
Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL
arXiv 2025
In-Dataset Trajectory Return Regularization for Offline Preference-based Reinforcement Learning
arXiv 2024
Affiliations
No known affiliations.
Frequent co-authors
10 from 3 papers