Deep Ganguli
Anthropic alignment-and-policy researcher; led red-teaming and societal-impacts work, co-authored Constitutional AI.
- Role: researcher
- Currently at: Anthropic
- Twitter: twitter.com/deepganguli
- Scholar: scholar.google.com/citations
- Papers: 7
Authored papers (7)
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
arXiv 2024
Opportunities and Risks of LLMs for Scalable Deliberation with Polis
arXiv 2023
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
preprint
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
TMLR
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
arXiv 2022
Discovering Language Model Behaviors with Model-Written Evaluations
arXiv 2022
Constitutional AI: Harmlessness from AI Feedback
arXiv 2022
Tool contributions (1)
Affiliations
Previously
Frequent co-authors (10, from 7 papers)
Amanda Askell
researcher
Jared Kaplan
co-founder / Chief Science Officer
Yuntao Bai
researcher
Danny Hernandez
researcher
Jackson Kernion
researcher
Kamal Ndousse
researcher
Nova DasSarma
researcher
Shauna Kravec
researcher
Andy Jones
researcher
Anna Chen
researcher