Ethan Perez
- Papers: 22
Authored papers
Inverse Scaling in Test-Time Compute
arXiv 2025
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
arXiv 2024
Debating with More Persuasive LLMs Leads to More Truthful Answers
arXiv 2024
Alignment faking in large language models
arXiv 2024
Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs
arXiv 2024
Rapid Response: Mitigating LLM Jailbreaks with a Few Examples
arXiv 2024
Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models
arXiv 2024
Best-of-N Jailbreaking
arXiv 2024
Language Models Learn to Mislead Humans via RLHF
arXiv 2024
Looking Inward: Language Models Can Learn About Themselves by Introspection
arXiv 2024
Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought
arXiv 2024
Pretraining Language Models with Human Preferences
arXiv 2023
Improving Code Generation by Training with Natural Language Feedback
arXiv 2023
Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning
arXiv 2023
Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting
arXiv 2023
Question Decomposition Improves the Faithfulness of Model-Generated Reasoning
arXiv 2023
Training Language Models with Language Feedback at Scale
arXiv 2023
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
arXiv 2022
Discovering Language Model Behaviors with Model-Written Evaluations
arXiv 2022
Constitutional AI: Harmlessness from AI Feedback
arXiv 2022
Few-shot Adaptation Works with UnpredicTable Data
arXiv 2022
FiLM: Visual Reasoning with a General Conditioning Layer
arXiv 2017
Frequent co-authors
10 from 22 papers
Samuel R. Bowman
Jared Kaplan
co-founder / Chief Science Officer
Nicholas Schiefer
Evan Hubinger
Henry Sleight
Julian Michael
researcher
Shauna Kravec
researcher
Amanda Askell
researcher
Anna Chen
researcher
Carson Denison