Attribution
Base Models Look Human To AI Detectors
arXiv 2026
Terminal Wrench: A Dataset of 331 Reward-Hackable Environments and 3,632 Exploit Trajectories
Jailbreaking in the Haystack
arXiv 2025
from 3 papers
Aditi Raghunathan
Shashwat Saxena
Alexander Robey
Chen Henry Wu
Fei Fang
Ivan Bercovich
Ivgeni Segal
J. Zico Kolter
Kexun Zhang
Rishi Rajesh Shah