0

ZipAR: Accelerating Auto-regressive Image Generation through Spatial Locality

ZipAR accelerates auto-regressive visual generation by decoding spatially adjacent tokens in parallel, reducing the number of forward passes by up to 91%.

Year
2024
Venue
arXiv 2024
Authors
7
Hosting
Abstract onlyARXIV-DEFAULT

Cite

Notes

Only stored in your browser.

Attribution

Abstract & full text
arxiv.org/abs/2412.04062v2ARXIV-DEFAULT
TL;DR
Semantic Scholar
Attribution policy →

Abstract

In this paper, we propose ZipAR, a training-free, plug-and-play parallel decoding framework for accelerating auto-regressive (AR) visual generation. The motivation stems from the observation that images exhibit local structures, and spatially distant regions tend to have minimal interdependence. Given a partially decoded set of visual tokens, in addition to the original next-token prediction scheme in the row dimension, the tokens corresponding to spatially adjacent regions in the column dimension can be decoded in parallel, enabling the ``next-set prediction'' paradigm. By decoding multiple tokens simultaneously in a single forward pass, the number of forward passes required to generate an image is significantly reduced, resulting in a substantial improvement in generation efficiency. Experiments demonstrate that ZipAR can reduce the number of model forward passes by up to 91% on the Emu3-Gen model without requiring any additional retraining. Code is available here: https://github.com/ThisisBillhe/ZipAR.

Authors

7