Carson Denison
4 papers
Authored papers
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
arXiv 2024
Alignment faking in large language models
arXiv 2024
Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models
arXiv 2024
Question Decomposition Improves the Faithfulness of Model-Generated Reasoning
arXiv 2023
Affiliations
No known affiliations.
Frequent co-authors
10 from 4 papers
Ethan Perez
Evan Hubinger
Jared Kaplan (co-founder / Chief Science Officer)
Samuel R. Bowman
Buck Shlegeris
David Duvenaud
Monte MacDiarmid
Nicholas Schiefer
Ansh Radhakrishnan
Fazl Barez