0

Measuring Dead Directions: Decomposing and Classifying Singular Structure off Canonical Alignment

We give a descent-free, alignment-free measurement of singular structure on trained networks. At a single frozen checkpoint the read recovers the order $k$ of each dead direction from the directional-Fisher rate, the master invariant from which the per-direction learning…

Preview
Year
2026
Hosting
Full text hostedCC-BY-4.0

Cite

Notes

Only stored in your browser.

Attribution

Abstract & full text
arxiv.org/abs/2607.00603CC-BY-4.0
TL;DR
Semantic Scholar
Attribution policy →

Abstract

We give a descent-free, alignment-free measurement of singular structure on trained networks. At a single frozen checkpoint the read recovers the order k of each dead direction from the directional-Fisher rate, the master invariant from which the per-direction learning coefficient 1/(2k) follows exactly, in whatever basis the optimizer left. The same read classifies each direction, separating a genuine singularity, whose order the architecture fixes, from a flat gauge symmetry; the directional-Fisher magnitude settles the cases the order cannot. A pluggable detector supplies the directions for transformer, convolutional, and normalisation layers. The read recovers the architecture-predicted order across constructed cells and trained networks, including a fine-tuned vision transformer whose dead structure is the LayerNorm-kernel gauge and a from-scratch one whose compressed MLP forms a node-death at its activation order. Where the singular structure enumerates, the per-direction orders assemble, through the typed intersection of the loci, into the global coefficient (λ, m) matching the closed form. The method removes the canonical-alignment and descent preconditions of the underlying rate result, turning order-recovery into a deterministic, architecture-general reading. We then map its reach into the Watanabe triple: the order determines the universal singular fluctuation ν(k), though a trained network's realized ν falls below it as the live structure absorbs the dead direction's data fluctuation, and the multiplicity recovers from the dominant structure under a single-locus assumption.