Attribution
Training a Generally Curious Agent
arXiv 2025
Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
arXiv 2024
Sample Efficient Preference Alignment in LLMs via Active Exploration
arXiv 2023
Reasoning with Latent Diffusion in Offline Reinforcement Learning
from 4 papers
Fahim Tajwar
Stefano Ermon
Abitha Thankaraj
Anikait Singh
Archit Sharma
Aviral Kumar
Barbara Engelhardt
Chelsea Finn
Glen Berseth
Ilija Bogunovic