Carson Denison

Attribution

4 papers

Authored papers

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

arXiv 2024

Alignment faking in large language models

arXiv 2024

Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models

arXiv 2024

Question Decomposition Improves the Faithfulness of Model-Generated Reasoning

arXiv 2023

No known affiliations.

Co-authors from 4 papers

Ethan Perez

Evan Hubinger

Jared Kaplan

co-founder / Chief Science Officer

Samuel R. Bowman

Buck Shlegeris

David Duvenaud

Monte MacDiarmid

Nicholas Schiefer

Ansh Radhakrishnan

Fazl Barez