Fine-tuning often introduces spurious correlations alongside task knowledge, causing systematic failures on underrepresented groups. Existing mitigations require retraining, group labels, or curated counterfactual data. We show a simple post-hoc intervention reduces shortcut reliance without any of these: truncating the tail of the SVD of ΔW = W_ft - W_base reduces the spurious-group gap while preserving task accuracy. Across three instruction-tuned models (0.5B--7B) and four classification benchmarks, top-k truncation reduces the gap on every cell at <2 pp accuracy loss, by up to 5\times on CivilComments. We propose this works because the shortcut response sits in the tail of the singular ordering of ΔW, a claim about how truncation behaves rather than about the raw singular values, which are broadly distributed and look the same across all four datasets. A controlled boundary case in which fine-tuning has only a shortcut to learn shows the predicted FT-to-base collapse, and bottom-/random-k and matched-rank LoRA controls rule out generic low-rank approximation and rank-constrained training as the explanation. We read this as preliminary evidence that the singular basis of ΔW is a useful coordinate system for studying what fine-tuning has learned.
Shortcuts in the Tail: Debiasing via Post-Hoc Spectral Compression of Fine-Tuning Updates
Fine-tuning often introduces spurious correlations alongside task knowledge, causing systematic failures on underrepresented groups. Existing mitigations require retraining, group labels, or curated counterfactual data.
- Preview

- Year
- 2026
- Hosting
- Full text hostedCC-BY-4.0
Cite
Notes
Only stored in your browser.
Attribution
- Abstract & full text
- arxiv.org/abs/2606.07596CC-BY-4.0
- TL;DR
- Semantic Scholar