Akbir Khan

Attribution

5 papers

Authored papers

Language Models Learn to Mislead Humans via RLHF

arXiv 2024

Debating with More Persuasive LLMs Leads to More Truthful Answers

arXiv 2024

Alignment faking in large language models

arXiv 2024

JaxMARL: Multi-Agent RL Environments and Algorithms in JAX

arXiv 2023

The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs

No known affiliations.

from 5 papers

Ethan Perez

Samuel R. Bowman

Tim Rocktäschel

Edward Grefenstette

Laura Ruis

Alexander Rutherford

Alexandra Souly

Andrei Lupu

Ansh Radhakrishnan

Benjamin Ellis