Nicholas Schiefer
Authored papers (7)
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
arXiv 2024
Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models
arXiv 2024
Question Decomposition Improves the Faithfulness of Model-Generated Reasoning
arXiv 2023
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
arXiv 2022
Discovering Language Model Behaviors with Model-Written Evaluations
arXiv 2022
Toy Models of Superposition
arXiv 2022
Constitutional AI: Harmlessness from AI Feedback
arXiv 2022
Affiliations
Frequent co-authors
10 from 7 papers
Jared Kaplan
co-founder / Chief Science Officer
Ethan Perez
Shauna Kravec
researcher
Sam McCandlish
founder
Samuel R. Bowman
Zac Hatfield-Dodds
researcher
Amanda Askell
researcher
Anna Chen
researcher
Catherine Olsson
researcher
Danny Hernandez
researcher