Owain Evans

Founder of Truthful AI; AI alignment researcher known for the TruthfulQA benchmark and for work on situational awareness, emergent misalignment, and subliminal learning.

Role: founder
Currently at: Truthful AI
Twitter: twitter.com/OwainEvans_UK
GitHub: github.com/owainevans
Scholar: scholar.google.com/citations
Papers: 12

Attribution

Affiliations & profile: scholar.google.com/citations

12 papers · 1 eval contribution

Authored papers

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

arXiv 2025

2025

Persona Vectors: Monitoring and Controlling Character Traits in Language Models

arXiv 2025

2025

Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers

arXiv 2025

2025

Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs

arXiv 2025

2025

Tell me about yourself: LLMs are aware of their learned behaviors

arXiv 2025

2025

Looking Inward: Language Models Can Learn About Themselves by Introspection

arXiv 2024

2024

Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data

arXiv 2024

2024

The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"

arXiv 2023

2023

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

TMLR

2022

Forecasting Future World Events with Neural Networks

arXiv 2022

2022

Teaching Models to Express Their Uncertainty in Words

arXiv 2022

2022

TruthfulQA: Measuring How Models Mimic Human Falsehoods

ACL

2021

Eval contributions

TruthfulQA

Future of Humanity Institute (Oxford)

817 questions targeting common human misconceptions, measuring whether a model gives factually true answers or repeats popular falsehoods.

Saturated · Hallucination · Factual Recall

Affiliations

Currently at

Truthful AI

founder · research group

Previously

University of California, Berkeley · university lab

University of Oxford · university lab

Frequent co-authors

from 12 papers

James Chua

4 shared papers

Jan Betley

4 shared papers

Anna Sztyber-Betley

3 shared papers

Jacob Hilton

researcher

3 shared papers

Stephanie Lin

researcher

3 shared papers

Andy Arditi

2 shared papers

Andy Zou

founder

2 shared papers

Dan Hendrycks

director

2 shared papers

Henry Sleight

2 shared papers

Mantas Mazeika

researcher

2 shared papers