Jared Kaplan
Co-founder and Chief Science Officer of Anthropic; first author of "Scaling Laws for Neural Language Models."
- Role: co-founder / Chief Science Officer
- Currently at: Anthropic
- Scholar: scholar.google.com/citations
- Papers: 11
Authored papers (11)
Alignment faking in large language models
arXiv 2024
Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models
arXiv 2024
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
arXiv 2024
Question Decomposition Improves the Faithfulness of Model-Generated Reasoning
arXiv 2023
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
preprint
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
TMLR
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
arXiv 2022
Discovering Language Model Behaviors with Model-Written Evaluations
arXiv 2022
Toy Models of Superposition
arXiv 2022
Constitutional AI: Harmlessness from AI Feedback
arXiv 2022
Evaluating Large Language Models Trained on Code
preprint
Frequent co-authors
10 from 11 papers
Ethan Perez
Nicholas Schiefer
Sam McCandlish (founder)
Samuel R. Bowman
Shauna Kravec (researcher)
Amanda Askell (researcher)
Danny Hernandez (researcher)
Dario Amodei (CEO)
Deep Ganguli (researcher)
Jackson Kernion (researcher)