Is Oracle Pruning the True Oracle?

Oracle pruning, which selects unimportant weights by minimizing the pruned train loss, has served as the foundation of most neural network pruning methods for over thirty-five years, yet few (if any) works have examined whether this foundation actually holds. This paper presents the first attempt to systematically examine its validity on deep neural networks through empirical correlation analyses, and it offers meta-framework reflections on the field of neural network pruning. Specifically, this paper focuses on pruning algorithms with three stages: training, pruning, and retraining. We analyze the correlation between model performance before and after the retraining stage. We conduct extensive experiments (37K trained models) across a wide spectrum of models (LeNet5, VGG, ResNets, ViT, MLLM) and datasets (MNIST, CIFAR-10/CIFAR-100, ImageNet-1K, MLLM data). For large-scale experiments, we adopt approximate oracle pruning because exact oracle pruning is prohibitively expensive. The results point to a counterintuitive conclusion: for deep learning models of nontrivial size (already at the scale of ResNet56 on CIFAR-10), pre-retraining performance is negligibly correlated with post-retraining performance. In other words, the weights selected by oracle pruning can scarcely guarantee strong performance after retraining. This further suggests that existing works that derive pruning criteria from oracle pruning may rest on a questionable premise. Further studies suggest that rising task complexity is a primary reason oracle pruning no longer holds for modern models. Finally, given this evidence, we argue that the retraining stage of a pruning algorithm should be accounted for when developing pruning criteria.
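To make the measured quantity concrete, here is a minimal sketch of the pre- versus post-retraining correlation analysis, under loudly stated assumptions: a toy linear regression model stands in for a deep network, mean squared error stands in for the train loss, exact oracle pruning is done by exhaustively enumerating every pruning mask (feasible only at toy scale), and Kendall's tau is assumed as the correlation metric. All function names (`oracle_prune_study`, `retrain`, `kendall_tau`, etc.), hyperparameters, and data are hypothetical and are not the paper's code.

```python
import itertools
import random


def mse(w, X, y):
    """Training loss: mean squared error of the linear model y_hat = sum_j w_j * x_j."""
    total = 0.0
    for xi, yi in zip(X, y):
        pred = sum(wj * xj for wj, xj in zip(w, xi))
        total += (pred - yi) ** 2
    return total / len(X)


def apply_mask(w, mask):
    """Prune: zero every weight whose mask entry is 0."""
    return [wj * mj for wj, mj in zip(w, mask)]


def retrain(w, mask, X, y, lr=0.05, steps=200):
    """Retrain only the surviving weights with full-batch gradient descent."""
    w = apply_mask(w, mask)
    n = len(X)
    for _ in range(steps):
        grads = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) - yi
            for j, xj in enumerate(xi):
                grads[j] += 2.0 * err * xj / n
        # Multiplying by the mask keeps pruned weights frozen at zero.
        w = [wj - lr * gj * mj for wj, gj, mj in zip(w, grads, mask)]
    return w


def kendall_tau(a, b):
    """Kendall rank correlation (tau-a; tied pairs count as neither concordant nor discordant)."""
    concordant = discordant = 0
    for i, j in itertools.combinations(range(len(a)), 2):
        s = (a[i] - a[j]) * (b[i] - b[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(a) * (len(a) - 1) / 2
    return (concordant - discordant) / n_pairs


def oracle_prune_study(w, X, y, keep):
    """Enumerate every mask that keeps `keep` weights (exact oracle search) and
    record the train loss right after pruning and after retraining."""
    records = []
    for kept in itertools.combinations(range(len(w)), keep):
        mask = [1 if j in kept else 0 for j in range(len(w))]
        pre = mse(apply_mask(w, mask), X, y)
        post = mse(retrain(w, mask, X, y), X, y)
        records.append((mask, pre, post))
    return records


# Hypothetical toy data and "trained" weights, for illustration only.
random.seed(0)
true_w = [1.5, -2.0, 0.5, 0.0, 3.0]
X = [[random.gauss(0, 1) for _ in range(5)] for _ in range(50)]
y = [sum(t * x for t, x in zip(true_w, xi)) for xi in X]
w = [t + random.gauss(0, 0.3) for t in true_w]

records = oracle_prune_study(w, X, y, keep=3)
# Oracle pruning picks the mask with the lowest pruned (pre-retraining) train loss.
oracle_mask, oracle_pre, _ = min(records, key=lambda r: r[1])
# Does the pre-retraining ranking of masks predict the post-retraining ranking?
tau = kendall_tau([r[1] for r in records], [r[2] for r in records])
print("oracle mask:", oracle_mask, "Kendall tau:", round(tau, 3))
```

A tau near 1 would mean the oracle ranking of masks survives retraining; the abstract reports that for networks at the scale of ResNet56 and beyond, this correlation becomes negligible. The toy setting is only meant to show what is measured, not to reproduce that finding, and real-scale studies replace the exhaustive enumeration with an approximation.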