Atticus Geiger
- Papers: 15
Authored papers (15)
From Directions to Regions: Decomposing Activations in Language Models via Local Geometry
arXiv 2026
Mixing Mechanisms: How Language Models Retrieve Bound Entities In-Context
arXiv 2025
HyperSteer: Activation Steering at Scale with Hypernetworks
arXiv 2025
Decomposing MLP Activations into Interpretable Features via Semi-Nonnegative Matrix Factorization
arXiv 2025
Enhancing Automated Interpretability with Output-Centric Feature Descriptions
arXiv 2025
ReFT: Representation Finetuning for Language Models
arXiv 2024
pyvene: A Library for Understanding and Improving PyTorch Models via Interventions
arXiv 2024
RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations
arXiv 2024
A Reply to Makelov et al. (2023)'s "Interpretability Illusion" Arguments
arXiv 2024
Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations
arXiv 2024
Linear Representations of Sentiment in Large Language Models
arXiv 2023
ScoNe: Benchmarking Negation Reasoning in Language Models With Fine-Tuning and In-Context Learning
arXiv 2023
Causal Proxy Models for Concept-Based Model Explanations
arXiv 2022
DynaSent: A Dynamic Benchmark for Sentiment Analysis
ACL 2021 5
Neural Natural Language Inference Models Partially Embed Theories of Lexical Entailment and Negation
EMNLP (BlackboxNLP) 2020 11
Frequent co-authors
10 from 15 papers