Attribution
OfficeQA Pro: An Enterprise Benchmark for End-to-End Grounded Reasoning
arXiv 2026
REBEL: Reinforcement Learning via Regressing Relative Rewards
arXiv 2024
Dataset Reset Policy Optimization for RLHF
from 3 papers
Jason D. Lee
Jonathan D. Chang
Kianté Brantley
Wen Sun
Wenhao Zhan
Arnav Singhvi
Ashutosh Baheti
Cindy Wang
Dipendra Misra
Erich Elsen