Samuel R. Bowman

Papers
24

Authored papers

24

Debating with More Persuasive LLMs Leads to More Truthful Answers

arXiv 2024

2024

Alignment faking in large language models

arXiv 2024

2024

Language Models Learn to Mislead Humans via RLHF

arXiv 2024

2024

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

arXiv 2024

2024

Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models

arXiv 2024

2024

Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought

arXiv 2024

2024

GPQA: A Graduate-Level Google-Proof Q&A Benchmark

COLM

2023

Pretraining Language Models with Human Preferences

arXiv 2023

2023

Improving Code Generation by Training with Natural Language Feedback

arXiv 2023

2023

ScoNe: Benchmarking Negation Reasoning in Language Models With Fine-Tuning and In-Context Learning

arXiv 2023

2023

Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting

2023

Question Decomposition Improves the Faithfulness of Model-Generated Reasoning

arXiv 2023

2023

Debate Helps Supervise Unreliable Experts

arXiv 2023

2023

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

TMLR

2022

Discovering Language Model Behaviors with Model-Written Evaluations

arXiv 2022

2022

Instruction Induction: From Few Examples to Natural Language Task Descriptions

arXiv 2022

2022

Constitutional AI: Harmlessness from AI Feedback

arXiv 2022

2022

SQuALITY: Building a Long-Document Summarization Dataset the Hard Way

arXiv 2022

2022

QuALITY: Question Answering with Long Input Texts, Yes!

NAACL 2022

2021

BBQ: A Hand-Built Bias Benchmark for Question Answering

Findings (ACL) 2022

2021

CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models

EMNLP 2020

2020

BLiMP: The Benchmark of Linguistic Minimal Pairs for English

2019

On Measuring Social Biases in Sentence Encoders

2019

A large annotated corpus for learning natural language inference

2015

Affiliations

No known affiliations.
