Mechanistic Interpretability (MI) has emerged as a vital approach to demystify the opaque decision-making of Large Language Models (LLMs). However, existing reviews primarily treat MI as an observational science, summarizing analytical insights while lacking a systematic framework for actionable intervention. To bridge this gap, we present a practical survey structured around the pipeline: "Locate, Steer, and Improve." We formally categorize Localizing (diagnosis) and Steering (intervention) methods based on specific Interpretable Objects to establish a rigorous intervention protocol. Furthermore, we demonstrate how this framework enables tangible improvements in Alignment, Capability, and Efficiency, effectively operationalizing MI as an actionable methodology for model optimization. The curated paper list of this work is available at https://github.com/rattlesnakey/Awesome-Actionable-MI-Survey.
Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models
Mechanistic interpretability is presented as an actionable framework for understanding and optimizing large language models through systematic localization, steering, and improvement methods.
- Year
- 2026
- Venue
- arXiv 2026
- Stars
- 135
- Authors
- 28
Attribution
- Abstract & full text
- arxiv.org/abs/2601.14004
- TL;DR
- Semantic Scholar
Authors
Zunhai Su, Jing Xiong, Ngai Wong, Qi Zhang, Zhiheng Xi, Tao Gui, Xuanjing Huang, Dongdong Zhang, Ruobing Xie, Hinrich Schütze, Hui Shen, Chaofan Tao, Hayden Kwok-Hay So, Xiao Liang, Hengyuan Zhang, Yiwei Wang, Sophia Ananiadou, Zeping Yu, Zhihao Zhang, Mingyang Wang, Qianli Wang, Shuzhou Yuan, Ercong Nie, Xufeng Duan, Qibo Xue, Chenming Shang, Zhengwu Liu, Senjie Jin