Atticus Geiger

Papers: 15

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

From Directions to Regions: Decomposing Activations in Language Models via Local Geometry

arXiv 2026

2026

HyperSteer: Activation Steering at Scale with Hypernetworks

arXiv 2025

2025

Mixing Mechanisms: How Language Models Retrieve Bound Entities In-Context

arXiv 2025

2025

Decomposing MLP Activations into Interpretable Features via Semi-Nonnegative Matrix Factorization

arXiv 2025

2025

Enhancing Automated Interpretability with Output-Centric Feature Descriptions

arXiv 2025

2025

ReFT: Representation Finetuning for Language Models

arXiv 2024

2024

pyvene: A Library for Understanding and Improving PyTorch Models via Interventions

arXiv 2024

2024

Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations

arXiv 2024

2024

A Reply to Makelov et al. (2023)'s "Interpretability Illusion" Arguments

arXiv 2024

2024

RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations

arXiv 2024

2024

Linear Representations of Sentiment in Large Language Models

arXiv 2023

2023

ScoNe: Benchmarking Negation Reasoning in Language Models With Fine-Tuning and In-Context Learning

arXiv 2023

2023

Causal Proxy Models for Concept-Based Model Explanations

arXiv 2022

2022

Neural Natural Language Inference Models Partially Embed Theories of Lexical Entailment and Negation

EMNLP (BlackboxNLP) 2020 11

2020

DynaSent: A Dynamic Benchmark for Sentiment Analysis

ACL 2021 5

2020

Affiliations

No known affiliations.

Frequent co-authors

from 15 papers

Christopher Potts

Zhengxuan Wu

Mor Geva

Aryaman Arora

Christopher D. Manning

Jing Huang

Noah D. Goodman

Or Shafran

Yoav Gur-Arieh

Zheng Wang