Zhengxuan Wu
- Papers: 8
8 papers
Authored papers
HyperSteer: Activation Steering at Scale with Hypernetworks
arXiv 2025
RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations
arXiv 2024
ReFT: Representation Finetuning for Language Models
arXiv 2024
pyvene: A Library for Understanding and Improving PyTorch Models via Interventions
arXiv 2024
A Reply to Makelov et al. (2023)'s "Interpretability Illusion" Arguments
arXiv 2024
MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions
arXiv 2023
Causal Proxy Models for Concept-Based Model Explanations
arXiv 2022
DynaSent: A Dynamic Benchmark for Sentiment Analysis
ACL 2021
Affiliations
No known affiliations.
Frequent co-authors
10 from 8 papers