Scott Niekum

Attribution

5 papers

Authored papers

Reinforcement Learning from Human Feedback with High-Confidence Safety Constraints

arXiv 2025

D2PO: Discriminator-Guided DPO with Response Evaluation Models

arXiv 2024

Contrastive Preference Learning: Learning from Human Feedback without RL

arXiv 2023

Learning Optimal Advantage from Preferences and Mistaking it for Reward

arXiv 2023

Dual RL: Unification and New Methods for Reinforcement and Imitation Learning

arXiv 2023

No known affiliations.

Co-authors (from 5 papers)

Harshit Sikchi

W. Bradley Knox

Amy Zhang

Anca Dragan

Austin Hoag

Blossom Metevier

Chelsea Finn

Dorsa Sadigh

Greg Durrett

Joey Hejna