Amanda Askell

Philosopher and AI alignment researcher at Anthropic; leads work on Claude's character and constitutional AI design.

Role: researcher
Currently at: Anthropic
Twitter: twitter.com/AmandaAskell
Scholar: scholar.google.com/citations
Papers: 8


Attribution

Affiliations & profile: scholar.google.com/citations


Authored papers

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

arXiv 2024

2024

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

arXiv 2022

2022

Training language models to follow instructions with human feedback

NeurIPS

2022

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

TMLR

2022

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

arXiv 2022

2022

Discovering Language Model Behaviors with Model-Written Evaluations

arXiv 2022

2022

Constitutional AI: Harmlessness from AI Feedback

arXiv 2022

2022

Learning Transferable Visual Models From Natural Language Supervision

arXiv 2021

2021

Affiliations

Currently at

Anthropic

researcher · frontier lab

Previously

OpenAI · frontier lab

Frequent co-authors

from 8 papers

Deep Ganguli

researcher

6 shared papers

Jared Kaplan

co-founder / Chief Science Officer

6 shared papers

Yuntao Bai

researcher

6 shared papers

Danny Hernandez

researcher

5 shared papers

Jack Clark

co-founder

5 shared papers

Jackson Kernion

researcher

5 shared papers

Kamal Ndousse

researcher

5 shared papers

Nova DasSarma

researcher

5 shared papers

Shauna Kravec

researcher

5 shared papers

Andy Jones

researcher

4 shared papers