Evan Hubinger

Papers: 7

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

Will AI Tell Lies to Save Sick Children? Litmus-Testing AI Values Prioritization with AIRiskDilemmas

arXiv 2025

Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models

arXiv 2024

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

arXiv 2024

Alignment faking in large language models

arXiv 2024

Steering Llama 2 via Contrastive Activation Addition

arXiv 2023

Question Decomposition Improves the Faithfulness of Model-Generated Reasoning

arXiv 2023

Discovering Language Model Behaviors with Model-Written Evaluations

arXiv 2022

Affiliations

No known affiliations.

Frequent co-authors

from 7 papers

Ethan Perez

5 shared papers

Jared Kaplan

co-founder / Chief Science Officer

Samuel R. Bowman

Carson Denison

Nicholas Schiefer

Buck Shlegeris

David Duvenaud

Monte MacDiarmid

Shauna Kravec

researcher

3 shared papers

Tamera Lanham

3 shared papers