Jared Kaplan

Co-founder and Chief Science Officer of Anthropic; first author of "Scaling Laws for Neural Language Models."

Role: co-founder / Chief Science Officer
Currently at: Anthropic
Scholar: scholar.google.com/citations
Papers: 11

Attribution

Affiliations & profile: scholar.google.com/citations

Authored papers

11

Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models

arXiv 2024

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

arXiv 2024

Alignment faking in large language models

arXiv 2024

Question Decomposition Improves the Faithfulness of Model-Generated Reasoning

arXiv 2023

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

preprint

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

TMLR

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

arXiv 2022

Discovering Language Model Behaviors with Model-Written Evaluations

arXiv 2022

Toy Models of Superposition

arXiv 2022

Constitutional AI: Harmlessness from AI Feedback

arXiv 2022

Evaluating Large Language Models Trained on Code

preprint

Affiliations

Currently at

Anthropic · co-founder / Chief Science Officer · frontier lab

Previously

Johns Hopkins University · university lab

OpenAI · frontier lab

Frequent co-authors

10

from 11 papers

Ethan Perez

7 shared papers

Nicholas Schiefer

7 shared papers

Sam McCandlish

founder

7 shared papers

Samuel R. Bowman

7 shared papers

Shauna Kravec

researcher

7 shared papers

Amanda Askell

researcher

6 shared papers

Danny Hernandez

researcher

6 shared papers

Dario Amodei

CEO

6 shared papers

Deep Ganguli

researcher

6 shared papers

Jackson Kernion

researcher

6 shared papers