Nicholas Schiefer

Papers: 7

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

arXiv 2024

Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models

arXiv 2024

Question Decomposition Improves the Faithfulness of Model-Generated Reasoning

arXiv 2023

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

arXiv 2022

Discovering Language Model Behaviors with Model-Written Evaluations

arXiv 2022

Toy Models of Superposition

arXiv 2022

Constitutional AI: Harmlessness from AI Feedback

arXiv 2022

Affiliations

No known affiliations.

Frequent co-authors

from 7 papers

Jared Kaplan

co-founder / Chief Science Officer

7 shared papers

Ethan Perez

6 shared papers

Shauna Kravec

researcher

6 shared papers

Sam McCandlish

founder

5 shared papers

Samuel R. Bowman

5 shared papers

Zac Hatfield-Dodds

researcher

5 shared papers

Amanda Askell

researcher

4 shared papers

Anna Chen

researcher

4 shared papers

Catherine Olsson

researcher

4 shared papers

Danny Hernandez

researcher

4 shared papers