Binghai Wang

Attribution

4 papers

Authored papers

Outcome Accuracy is Not Enough: Aligning the Reasoning Process of Reward Models

arXiv 2026

WorldPM: Scaling Human Preference Modeling

arXiv 2025

Secrets of RLHF in Large Language Models Part II: Reward Modeling

arXiv 2024

RMB: Comprehensively Benchmarking Reward Models in LLM Alignment

arXiv 2024

No known affiliations.

from 4 papers

Qi Zhang

Tao Gui

Xuanjing Huang

Bowen Yu

Chujie Zheng

Enyu Zhou

Fei Huang

Junyang Lin

Le Yu

Rui Zheng