Aryaman Arora

4 papers

Authored papers

CausalGym: Benchmarking causal interpretability methods on linguistic tasks

arXiv 2024

pyvene: A Library for Understanding and Improving PyTorch Models via Interventions

arXiv 2024

A Reply to Makelov et al. (2023)'s "Interpretability Illusion" Arguments

arXiv 2024

ReFT: Representation Finetuning for Language Models

arXiv 2024

No known affiliations.

Co-authors (from 4 papers)

Christopher Potts

Atticus Geiger

Zhengxuan Wu

Christopher D. Manning

Dan Jurafsky

Jing Huang

Noah D. Goodman

Zheng Wang

Thomas Icard