BALF: Budgeted Activation-Aware Low-Rank Factorization for Fine-Tuning-Free Model Compression

Activation-aware low-rank factorization techniques yield strong compression results but are generally confined to linear layers, while existing whitening-based theory typically makes an implicit full-rank assumption on activations. We introduce a layer representation framework that extends activation-aware factorization beyond linear layers, including standard and grouped convolutions. Within this framework, our whitening-based formulation is more general than prior ones, naturally covering rank-deficient activations, and yields an optimal low-rank projection that attains the reconstruction error of the best low-rank approximation to layer activations. The resulting singular spectrum provides a closed-form per-layer distortion proxy, which we use to allocate per-layer ranks under explicit FLOP or parameter-count budgets via a Lagrangian relaxation with negligible overhead. Together, these components form BALF, an end-to-end pipeline for efficient vision model compression. Across CNNs and vision transformers on CIFAR-10 and ImageNet-1K, BALF generally achieves higher accuracy than SVD-based factorization baselines at matched FLOP or parameter count targets and remains competitive with other fine-tuning-free compression techniques.