Samuel R. Bowman
Authored papers (24)
Debating with More Persuasive LLMs Leads to More Truthful Answers
arXiv 2024
Alignment faking in large language models
arXiv 2024
Language Models Learn to Mislead Humans via RLHF
arXiv 2024
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
arXiv 2024
Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models
arXiv 2024
Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought
arXiv 2024
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
COLM
Pretraining Language Models with Human Preferences
arXiv 2023
Improving Code Generation by Training with Natural Language Feedback
arXiv 2023
ScoNe: Benchmarking Negation Reasoning in Language Models With Fine-Tuning and In-Context Learning
arXiv 2023
Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting
Question Decomposition Improves the Faithfulness of Model-Generated Reasoning
arXiv 2023
Debate Helps Supervise Unreliable Experts
arXiv 2023
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
TMLR
Discovering Language Model Behaviors with Model-Written Evaluations
arXiv 2022
Instruction Induction: From Few Examples to Natural Language Task Descriptions
arXiv 2022
Constitutional AI: Harmlessness from AI Feedback
arXiv 2022
SQuALITY: Building a Long-Document Summarization Dataset the Hard Way
arXiv 2022
QuALITY: Question Answering with Long Input Texts, Yes!
NAACL 2022
BBQ: A Hand-Built Bias Benchmark for Question Answering
Findings of ACL 2022
CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models
EMNLP 2020
BLiMP: The Benchmark of Linguistic Minimal Pairs for English
On Measuring Social Biases in Sentence Encoders
A large annotated corpus for learning natural language inference
Frequent co-authors (10, from 24 papers)
Ethan Perez
Jared Kaplan (co-founder / Chief Science Officer)
Angelica Chen
Evan Hubinger
Jason Phang
Julian Michael (researcher)
Nicholas Schiefer
Alicia Parrish
Amanda Askell (researcher)
Carson Denison