Attribution
GRAM: A Generative Foundation Reward Model for Reward Generalization
arXiv 2025
RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data
arXiv 2024
Prior Constraints-based Reward Model Training for Aligning Large Language Models
from 3 papers
Chenglong Wang
Jingbo Zhu
Tong Xiao
Murun Yang
Qiaozhi He
Tongran Liu
Yang Gan
Yifu Huo
Yongyu Mu
Bei Li