Rishabh Agarwal
6 papers
Authored papers
Process Reward Models That Think
arXiv 2025
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
arXiv 2024
Training Language Models to Self-Correct via Reinforcement Learning
arXiv 2024
Bigger, Better, Faster: Human-level Atari with human-level efficiency
arXiv 2023
Revisiting Bellman Errors for Offline Model Selection
arXiv 2023
Deep Reinforcement Learning at the Edge of the Statistical Precipice
NeurIPS 2021
Affiliations
No known affiliations.
Frequent co-authors
10 from 6 papers