Pruning Deep Neural Networks via the Marchenko--Pastur Distribution

We study a Marchenko--Pastur (MP) random-matrix approach to pruning deep neural networks under very small post-pruning fine-tuning budgets. The main practical contribution is accuracy retention under short calibration and fine-tuning schedules, rather than a long post-pruning reoptimization pipeline. The theory gives deterministic data-path certificates: if the removed component $R$ has a small propagated logit effect $L_s \|R\,\psi_1(s)\|_\infty$, then pruning decreases an elastic-net objective and preserves the prediction on every sample whose dense margin exceeds twice the perturbation. The zero-budget case gives perfect pruning; a prune--restore extension models weight restoration inside a fixed sparse-execution pattern; and an additive $L_2$-regularized model shows that admissible random-like components vanish in the training limit, while persistent spikes stabilize as the MP bulk collapses. Under i.i.d. Gaussian sufficient conditions, the fitted MP edge $\sigma_+$ gives a high-probability layerwise budget signal. On ImageNet-1k, after only three distillation epochs, ViT-B/16 $2{:}4$+ToMe reaches 83.41% top-1 ($-1.70$ pp from dense) at 59.81% sparse-execution MAC reduction, with a best-observed $1.388\times$ speedup on an A40 native-$2{:}4$ backend for the same checkpoint and ToMe graph; a separate no-ToMe A100 endpoint gives $2.705\times$. At structured sparsity, ViT-B/16 $6{:}12$ reaches 83.74%, ViT-L/16 $8{:}16$ dense+permutation reaches 85.33% ($-0.51$ pp), and ConvNeXtV2-Base $12{:}16$ reaches 86.35% ($-0.37$ pp). For CNNs, ResNet50 $8{:}16$ dense+permutation reaches 75.87% ($-0.26$ pp), and ResNet152d CAST-conv+permutation reaches 81.33% ($-1.53$ pp) at ${\sim}50$% MAC accounting, with a $1.62\times$ speedup in an A40 im2col+$2{:}4$ sparse-GEMM audit.
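The margin certificate and the MP edge signal can be illustrated with a minimal NumPy sketch. This is not the paper's pipeline; it is a toy under stated assumptions. The weight matrix is a synthetic final linear layer (so the propagated factor $L_s$ is effectively 1), the noise scale `sigma` is known rather than fitted, and the helper names `mp_upper_edge`, `split_at_edge`, `margin_certified`, and the `slack` safety factor are hypothetical. The sketch uses two standard facts: an $n \times m$ matrix with i.i.d. $N(0,\sigma^2)$ entries has largest singular value close to $\sigma(\sqrt{n}+\sqrt{m})$, and a sup-norm logit perturbation $\varepsilon$ cannot change the top-1 class when the dense margin exceeds $2\varepsilon$.

```python
import numpy as np


def mp_upper_edge(n_rows, n_cols, sigma):
    """Asymptotic largest singular value of an n_rows x n_cols matrix with
    iid N(0, sigma^2) entries (square root of the MP upper edge, unnormalized)."""
    return sigma * (np.sqrt(n_rows) + np.sqrt(n_cols))


def split_at_edge(W, sigma, slack=1.1):
    """Split W into a 'spike' part (singular values above slack * edge) and the
    removed remainder R. `slack` is a hypothetical finite-size safety factor."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    keep = s > slack * mp_upper_edge(W.shape[0], W.shape[1], sigma)
    kept = (U[:, keep] * s[keep]) @ Vt[keep]
    return kept, W - kept, int(keep.sum())


def margin_certified(dense_logits, pruned_logits):
    """Per-sample flag: dense top-1 margin exceeds twice the sup-norm
    perturbation of the logits, so the top-1 class cannot change."""
    eps = np.max(np.abs(dense_logits - pruned_logits), axis=1)
    top2 = np.sort(dense_logits, axis=1)[:, -2:]  # two largest logits, ascending
    return (top2[:, 1] - top2[:, 0]) > 2.0 * eps


rng = np.random.default_rng(0)
n_classes, n_features, rank, sigma = 128, 256, 3, 0.05

# Synthetic weights (assumption): strong rank-3 signal plus iid Gaussian noise.
A = rng.normal(size=(n_classes, rank))
B = rng.normal(size=(rank, n_features))
W = A @ B + sigma * rng.normal(size=(n_classes, n_features))

kept, removed, n_spikes = split_at_edge(W, sigma)

# Synthetic inputs; logits of the dense layer and of the layer without R.
X = rng.normal(size=(512, n_features))
dense_logits = X @ W.T
pruned_logits = X @ kept.T

ok = margin_certified(dense_logits, pruned_logits)
same = dense_logits.argmax(axis=1) == pruned_logits.argmax(axis=1)
print(f"spikes kept: {n_spikes}, certified: {ok.mean():.2%}, agree: {same.mean():.2%}")
```

In this toy setting every certified sample keeps its dense top-1 class by construction of the bound, while uncertified samples may or may not agree. The certificate is sufficient, not necessary.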