A Spectral Phase Diagram for Binary Few-Shot Classification: Intrinsic Dimensionality, Geometric Saturation, and Representational Diagnosis

Year: 2026
Abstract & full text: arxiv.org/abs/2606.24903

Abstract

Deciding when to stop collecting labeled examples is a fundamental but undertheorized problem in applied machine learning. The saturation index $S(K) = \operatorname{erank}(\widehat{\Sigma}_W^{(K)}) / K$ measures the ratio of the effective rank of the pooled within-class sample covariance to the shot count; we prove it falls below a threshold precisely when the covariance estimator is well-concentrated around the population covariance and the linear discriminant has stabilized. The index is computable in $O(d^3)$ time from support features alone, requiring no test labels or trained classifier. Across $N = 246$ doubling-pair observations from seventeen binary tasks and six datasets, sixteen of seventeen tasks show a positive within-task Spearman correlation between $S(K)$ and marginal accuracy gain (median $\rho = 0.811$). The pooled Spearman correlation is $\rho = 0.548$ ($p = 1.1 \times 10^{-20}$, $N = 246$). A three-phase diagram (exploration, transition, saturation) with mean marginal gains of 3.48%, 2.40%, and 0.82% is supported by all pairwise significance tests ($p \leq 0.008$). As a binary stopping rule, the index achieves AUC = 0.752, providing meaningful probabilistic guidance for annotation decisions. Asymptotic effective rank and peak accuracy show no significant monotone relationship across tasks (Spearman $r_s = 0.380$, $p = 0.133$, $N = 17$). A small saturation index paired with low accuracy diagnoses representational inadequacy. All results are for binary classification with a fixed linear classifier; extensions to N-way settings and pretrained backbone representations are discussed as future work.
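
As an illustration of how the saturation index might be computed from support features alone, here is a minimal NumPy sketch. Several choices are assumptions not confirmed by the abstract: the effective rank is taken to be the entropy-based definition (the exponential of the Shannon entropy of the normalized eigenvalues), although a trace-over-operator-norm definition is also common in covariance concentration results; $K$ is taken to be the number of shots per class; and the pooled covariance is normalized by $n - 2$ for two classes. The function names are hypothetical and are not taken from the paper.

```python
import numpy as np


def pooled_within_class_cov(X, y):
    """Pooled within-class sample covariance of support features X (n, d).

    Assumption: each class is centered on its own mean and the summed
    scatter is divided by (n - number_of_classes).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    d = X.shape[1]
    scatter = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        Xc = Xc - Xc.mean(axis=0)      # center within the class
        scatter += Xc.T @ Xc           # accumulate class scatter
    dof = max(X.shape[0] - len(classes), 1)
    return scatter / dof


def effective_rank(cov, eps=1e-12):
    """Entropy-based effective rank (an assumed definition of erank).

    Returns exp(H(p)), where p are the eigenvalues normalized to sum to 1.
    The symmetric eigendecomposition is the O(d^3) step.
    """
    eig = np.linalg.eigvalsh(cov)
    eig = np.clip(eig, 0.0, None)      # drop tiny negative round-off
    total = eig.sum()
    if total <= eps:
        return 0.0                     # degenerate: no variance at all
    p = eig / total
    p = p[p > eps]                     # treat 0 * log(0) as 0
    return float(np.exp(-(p * np.log(p)).sum()))


def saturation_index(X, y, K):
    """S(K) = erank(pooled within-class covariance) / K."""
    return effective_rank(pooled_within_class_cov(X, y)) / K


# Toy usage: two classes, K = 4 shots each, in d = 2 dimensions.
base = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
X_demo = np.vstack([base, base + 5.0])
y_demo = np.array([0] * 4 + [1] * 4)
S_demo = saturation_index(X_demo, y_demo, K=4)
```

In this toy case the within-class covariance is isotropic in two dimensions, so the effective rank is 2 and the index is about 0.5. No threshold is hard-coded because the abstract does not state the cutoff values separating the exploration, transition, and saturation phases.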