GranQ: Granular Zero-Shot Quantization with Channel-Wise Activation Scaling in QAT

Zero-shot quantization (ZSQ) enables neural network compression without original training data, making it a promising solution for restricted data access scenarios. To compensate for the lack of data, recent ZSQ methods typically rely on synthetic inputs generated from the full-precision model. However, these synthetic inputs often lead to activation distortion, especially under low-bit settings. As a result, existing methods struggle to mitigate this issue due to coarse activation scaling. To address this issue, we propose GranQ, a novel activation quantization framework that efficiently applies per-channel scaling through vectorized computation. In contrast to conventional channel-wise methods, which apply vectorization only to the quantization step, GranQ improves efficiency by vectorizing the scaling operation. This design allows GranQ to maintain fine-grained quantization granularity with minimal computational overhead, even in low-bit environments. Extensive experiments under quantization-aware training (QAT) settings demonstrate that GranQ consistently outperforms state-of-the-art ZSQ methods across CIFAR and ImageNet. In particular, our method achieves up to 5.45% higher accuracy in the 3-bit setting on CIFAR-100 and even surpasses the full-precision baseline on CIFAR-10. Furthermore, GranQ achieves significant speedup in quantization latency over conventional per-channel methods, demonstrating improved efficiency. With these findings, we anticipate that GranQ will inspire future research beyond conventional ZSQ approaches centered on data generation and model fine-tuning.

GranQ: Granular Zero-Shot Quantization with Channel-Wise Activation Scaling in QAT

Abstract

Authors