Saurav Kadavath
Researcher at Anthropic on finetuning and alignment; co-author of Constitutional AI and "Language Models (Mostly) Know What They Know".
- Role: researcher
- Currently at: Anthropic
- GitHub: github.com/ssss1029
- Scholar: scholar.google.com/citations
- Papers: 8
Authored papers (8)
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
preprint
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
arXiv 2022
Discovering Language Model Behaviors with Model-Written Evaluations
arXiv 2022
Constitutional AI: Harmlessness from AI Feedback
arXiv 2022
Measuring Mathematical Problem Solving With the MATH Dataset
NeurIPS
Measuring Coding Challenge Competence With APPS
arXiv 2021
The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization
ICCV 2021
Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty
Frequent co-authors
10 co-authors from 8 papers
- Amanda Askell (researcher)
- Andy Jones (researcher)
- Anna Chen (researcher)
- Ben Mann (founder)
- Catherine Olsson (researcher)
- Dan Hendrycks (director)
- Danny Hernandez (researcher)
- Dario Amodei (CEO)
- Dawn Drain (researcher)
- Dawn Song (professor)