Junkang Wu
8 papers
Authored papers
On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation
arXiv 2026
Aligning Multimodal LLM with Human Preference: A Survey
arXiv 2025
Quantile Advantage Estimation for Entropy-Safe Reasoning
arXiv 2025
Robust Preference Optimization via Dynamic Target Margins
arXiv 2025
RePO: ReLU-based Preference Optimization
arXiv 2025
Direct Multi-Turn Preference Optimization for Language Agents
arXiv 2024
$β$-DPO: Direct Preference Optimization with Dynamic $β$
arXiv 2024
Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization
arXiv 2024
Affiliations
No known affiliations.
Frequent co-authors
10 from 8 papers