Evan Hubinger
Authored papers (7)
Will AI Tell Lies to Save Sick Children? Litmus-Testing AI Values Prioritization with AIRiskDilemmas
arXiv 2025
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
arXiv 2024
Alignment faking in large language models
arXiv 2024
Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models
arXiv 2024
Steering Llama 2 via Contrastive Activation Addition
arXiv 2023
Question Decomposition Improves the Faithfulness of Model-Generated Reasoning
arXiv 2023
Discovering Language Model Behaviors with Model-Written Evaluations
arXiv 2022
Frequent co-authors
10 from 7 papers
Ethan Perez
Jared Kaplan (co-founder / Chief Science Officer)
Samuel R. Bowman
Carson Denison
Nicholas Schiefer
Buck Shlegeris
David Duvenaud
Monte MacDiarmid
Shauna Kravec (researcher)
Tamera Lanham