
Sparse Neuron Ablation Triggers Catastrophic Collapse of the Language Core in Large Vision-Language Models


Year: 2025
arXiv: arxiv.org/abs/2512.00918

Abstract

Large Vision-Language Models (LVLMs) have shown impressive multimodal understanding capabilities, yet the structures that sustain their functionality remain poorly understood from a mechanistic interpretability standpoint. We propose Consistently Activated Neurons (CAN), a progressive neuron ablation method to identify critical neurons whose removal triggers catastrophic collapse, and use it to investigate structural vulnerabilities in representative 7B LVLMs. Experiments reveal that catastrophic collapse can be triggered by ablating as few as four neurons in LLaVA-1.5-7b-hf and a few thousand in InstructBLIP-vicuna-7b, both representing a small fraction of model parameters. Notably, critical neurons are predominantly localized in the language model, particularly in its down-projection layer, rather than in the vision components. We also observe a consistent two-stage collapse pattern: initial expressive degradation followed by sudden, complete collapse. These findings reveal that LVLM functionality depends on a sparse subset of neurons concentrated in the language backbone, offering mechanistic insights into how their functionality is structured and where these models are most vulnerable.
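The abstract does not state how CAN scores or orders neurons, so the following is only a minimal sketch of the general idea of progressive ablation, under explicit assumptions. "Consistently activated" is approximated here as the fraction of inputs on which a neuron ranks among the `top_k` neurons by absolute activation. Neurons are then ablated in descending score order until a task score drops below a collapse threshold. All names (`consistency_scores`, `progressive_ablation`, `toy_evaluate`, `collapse_threshold`) are hypothetical and do not come from the paper. `toy_evaluate` stands in for running the real model with the given neurons zeroed.

```python
from typing import Callable, List, Sequence, Set, Tuple


def consistency_scores(activations: Sequence[Sequence[float]], top_k: int) -> List[float]:
    """Hypothetical score: fraction of inputs on which each neuron ranks among
    the top_k neurons by absolute activation."""
    n_inputs = len(activations)
    n_neurons = len(activations[0])
    counts = [0] * n_neurons
    for row in activations:
        # Rank this input's neurons by activation magnitude, largest first.
        ranked = sorted(range(n_neurons), key=lambda j: abs(row[j]), reverse=True)
        for j in ranked[:top_k]:
            counts[j] += 1
    return [c / n_inputs for c in counts]


def progressive_ablation(
    scores: Sequence[float],
    evaluate: Callable[[Set[int]], float],
    collapse_threshold: float,
    step: int = 1,
) -> Tuple[List[int], List[Tuple[int, float]]]:
    """Ablate neurons in descending score order, `step` at a time, until the
    task score returned by `evaluate` drops below `collapse_threshold`."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    ablated: List[int] = []
    history: List[Tuple[int, float]] = []  # (number ablated, task score)
    for start in range(0, len(order), step):
        ablated.extend(order[start:start + step])
        score = evaluate(set(ablated))
        history.append((len(ablated), score))
        if score < collapse_threshold:
            break  # collapse reached: `ablated` is the critical set found
    return ablated, history


# Toy data: 4 inputs x 6 neurons; neurons 2 and 5 are large on every input.
activations = [
    [0.1, 0.2, 3.0, 0.0, 0.3, 2.5],
    [0.2, 0.1, 2.8, 0.4, 0.1, 2.9],
    [0.9, 0.0, 3.1, 0.2, 0.2, 2.7],
    [0.1, 1.5, 2.9, 0.1, 0.0, 3.2],
]


def toy_evaluate(ablated: Set[int]) -> float:
    # Hypothetical stand-in for a benchmark: performance depends only on neurons 2 and 5.
    return 1.0 - 0.45 * len(ablated & {2, 5})


scores = consistency_scores(activations, top_k=2)
ablated, history = progressive_ablation(scores, toy_evaluate, collapse_threshold=0.2)
print(scores)   # [0.0, 0.0, 1.0, 0.0, 0.0, 1.0]
print(ablated)  # [2, 5]
```

In a real LVLM, one plausible (assumed, not confirmed by the abstract) mapping is to treat each intermediate MLP unit feeding the language model's down-projection as a "neuron" and to implement `evaluate` by zeroing those units with a forward hook before scoring the model. The `history` list records the score after each ablation step, which is where a gradual degradation followed by a sudden drop would appear.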