Scott Niekum
5 papers
Authored papers
Reinforcement Learning from Human Feedback with High-Confidence Safety Constraints
arXiv 2025
D2PO: Discriminator-Guided DPO with Response Evaluation Models
arXiv 2024
Dual RL: Unification and New Methods for Reinforcement and Imitation Learning
arXiv 2023
Contrastive Preference Learning: Learning from Human Feedback without RL
arXiv 2023
Learning Optimal Advantage from Preferences and Mistaking it for Reward
arXiv 2023
Affiliations
No known affiliations.
Frequent co-authors
10 from 5 papers