Changlong Yu
Authored papers
Approximation of Log-Partition Function in Policy Mirror Descent Induces Implicit Regularization for LLM Post-Training
arXiv 2026
Beyond Test-Time Memory: State-Space Optimal Control for LLM Reasoning
arXiv 2026
WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning
arXiv 2025
Learning to Optimize Multi-Objective Alignment Through Dynamic Reward Weighting
arXiv 2025
Think-RM: Enabling Long-Horizon Reasoning in Generative Reward Models
arXiv 2025
Discriminative Finetuning of Generative Large Language Models without Reward Models and Human Preference Data
arXiv 2025
Affiliations
No known affiliations.
Frequent co-authors
10 from 6 papers