Attribution
On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation
arXiv 2026
FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization
Aligning Multimodal LLM with Human Preference: A Survey
arXiv 2025
from 3 papers
Bolin Ding
Chiyu Ma
Guoyin Wang
Haoming Meng
Jingren Zhou
Junkang Wu
Kexin Huang
Chaoyou Fu
Dingjie Song
Guibin Zhang