Wes Gurnee

Papers: 6

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

Refusal in Language Models Is Mediated by a Single Direction

arXiv 2024

Universal Neurons in GPT2 Language Models

arXiv 2024

Confidence Regulation Neurons in Language Models

arXiv 2024

Not All Language Model Features Are Linear

arXiv 2024

The Remarkable Robustness of LLMs: Stages of Inference?

arXiv 2024

Language Models Represent Space and Time

arXiv 2023

Affiliations

No known affiliations.

Frequent co-authors

from 6 papers

Max Tegmark

professor

3 shared papers

Neel Nanda

researcher

Aaquib Syed

Alessandro Stolfo

Andy Arditi

Ben Wu

Daniel Paleka

Dimitris Bertsimas

Eric J. Michaud

Isaac Liao