Saurav Kadavath

Researcher at Anthropic working on finetuning and alignment; co-author of "Constitutional AI: Harmlessness from AI Feedback" and "Language Models (Mostly) Know What They Know".

Role: researcher
Currently at: Anthropic
GitHub: github.com/ssss1029
Scholar: scholar.google.com/citations
Papers: 8

Attribution

Affiliations & profile: scholar.google.com/citations

Authored papers

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

preprint

2022

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

arXiv 2022

2022

Discovering Language Model Behaviors with Model-Written Evaluations

arXiv 2022

2022

Constitutional AI: Harmlessness from AI Feedback

arXiv 2022

2022

Measuring Mathematical Problem Solving With the MATH Dataset

NeurIPS

2021

Measuring Coding Challenge Competence With APPS

arXiv 2021

2021

The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization

ICCV 2021

2020

Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty

NeurIPS 2019

2019

Affiliations

Currently at

Anthropic

researcher · frontier lab

Frequent co-authors

from 8 papers

Amanda Askell

researcher

4 shared papers

Andy Jones

researcher

4 shared papers

Anna Chen

researcher

4 shared papers

Ben Mann

founder

4 shared papers

Catherine Olsson

researcher

4 shared papers

Dan Hendrycks

director

4 shared papers

Danny Hernandez

researcher

4 shared papers

Dario Amodei

CEO

4 shared papers

Dawn Drain

researcher

4 shared papers

Dawn Song

professor

4 shared papers