Gradual Capacity Growth for Sparse Network Discovery

Sparse neural network methods typically assume that the target sparsity (or density) is fixed in advance, even though the relationship between network capacity and performance is generally unknown and task-dependent. Existing approaches, including iterative pruning, dynamic sparse training, and pruning at initialization, rely on dense pretraining, incur substantial retraining cost, or require a preset sparsity budget. We propose Gradual Capacity Growth (GCG), a constructive sparse-to-dense training framework that allocates network capacity progressively during training. Starting from a sparse seed network, GCG grows new connections in stages using PathGrow, a probabilistic path-based growth rule that biases additions toward high-signal input-output pathways while preserving structural diversity. Each growth stage is followed by a limited amount of training, and a lightweight performance-density extrapolation rule estimates the smallest density beyond which further capacity increases yield diminishing accuracy gains. Experiments on CIFAR, TinyImageNet, and ImageNet show that GCG efficiently identifies sparse networks that achieve near-dense performance at moderate densities, without requiring dense pretraining or exhaustive retraining at each sparsity level. Although growth-only methods do not reach the extreme sparsity achievable by pruning or dynamic reallocation techniques, GCG substantially reduces total training cost compared to iterative magnitude pruning and provides a practical mechanism for exploring the accuracy-density tradeoff under limited optimization budgets.
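
To make the idea of a performance-density extrapolation rule concrete, the following Python sketch shows one simple way such a rule could be built. It is not the rule used by GCG, whose exact form is not specified here. The log-linear accuracy model, the function name estimate_saturation_density, and the threshold parameter min_gain_per_density are all assumptions introduced for illustration.

```python
import numpy as np


def estimate_saturation_density(densities, accuracies, min_gain_per_density=0.1):
    """Estimate the density past which extra capacity stops paying off.

    ASSUMPTION (illustrative, not the GCG rule): accuracy is modeled as
    acc ~ a + b * ln(density), fitted by least squares to the
    (density, accuracy) pairs measured after each growth stage.
    Under this model the marginal gain is d(acc)/d(density) = b / density,
    which falls below `min_gain_per_density` once density > b / threshold.
    """
    d = np.asarray(densities, dtype=float)
    acc = np.asarray(accuracies, dtype=float)
    if d.shape != acc.shape or d.size < 2:
        raise ValueError("need at least two matching (density, accuracy) pairs")
    if np.any(d <= 0.0) or np.any(d > 1.0):
        raise ValueError("densities must lie in (0, 1]")
    if min_gain_per_density <= 0.0:
        raise ValueError("min_gain_per_density must be positive")

    # Least-squares line in log-density space: slope b, intercept a.
    b, _a = np.polyfit(np.log(d), acc, deg=1)

    # Density at which the modeled marginal gain equals the threshold.
    d_star = b / min_gain_per_density

    # Clamp to the meaningful range: never below the smallest density
    # actually observed, never above a fully dense network.
    return float(np.clip(d_star, d.min(), 1.0))
```

For synthetic measurements that follow acc = 0.9 + 0.05 ln(density) exactly, a threshold of 0.1 accuracy per unit density gives an estimate of 0.5. In a staged growth loop, a rule of this kind could be re-evaluated after every stage and growth stopped once the current density exceeds the estimate. The clamping means a flat or decreasing accuracy curve returns the smallest observed density rather than a meaningless value.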