We present MAGI-1, a world model that generates videos by autoregressively predicting a sequence of video chunks, defined as fixed-length segments of consecutive frames. Trained to denoise per-chunk noise that increases monotonically over time, MAGI-1 enables causal temporal modeling and naturally supports streaming generation. It achieves strong performance on image-to-video (I2V) tasks conditioned on text instructions, providing high temporal consistency and scalability, which are made possible by several algorithmic innovations and a dedicated infrastructure stack. MAGI-1 facilitates controllable generation via chunk-wise prompting and supports real-time, memory-efficient deployment by maintaining constant peak inference cost, regardless of video length. The largest variant of MAGI-1 comprises 24 billion parameters and supports context lengths of up to 4 million tokens, demonstrating the scalability and robustness of our approach. The code and models are available at https://github.com/SandAI-org/MAGI-1 and https://github.com/SandAI-org/MagiAttention. The product can be accessed at https://sand.ai.
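The staggered-noise idea in the abstract can be illustrated with a toy schedule. This is a minimal sketch under stated assumptions, not MAGI-1's implementation: the function name `stream_schedule`, the `window` parameter, and the linear per-step noise decrement are all hypothetical, and the actual denoising network is replaced by simple bookkeeping. It shows only that later chunks carry more noise than earlier ones, and that the number of chunks in flight stays bounded however long the video is.

```python
from fractions import Fraction


def stream_schedule(num_chunks, window):
    """Yield (step, {chunk_index: noise_level}) for a toy staggered schedule.

    Hypothetical illustration: at most `window` chunks are denoised at once,
    and each step lowers every in-flight chunk's noise by 1/window.
    """
    active = {}          # chunk index -> remaining noise level in [0, 1]
    next_chunk = 0       # index of the next chunk to admit
    emitted = 0          # number of fully denoised chunks
    step = 0
    decrement = Fraction(1, window)

    while emitted < num_chunks:
        # Admit one new, fully noisy chunk if the window has room.
        if next_chunk < num_chunks and len(active) < window:
            active[next_chunk] = Fraction(1)
            next_chunk += 1

        yield step, dict(active)

        # One denoising step: every in-flight chunk gets cleaner.
        for k in list(active):
            active[k] -= decrement
            if active[k] == 0:
                # Chunk is clean and can be streamed out.
                del active[k]
                emitted += 1
        step += 1


if __name__ == "__main__":
    for step, levels in stream_schedule(num_chunks=5, window=3):
        print(step, {k: str(v) for k, v in levels.items()})
```

In every snapshot the noise level rises with the chunk index, and the number of active chunks never exceeds `window`, which is the sense in which peak cost in this toy model is independent of `num_chunks`.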
MAGI-1: Autoregressive Video Generation at Scale
We present MAGI-1, a world model that generates videos by autoregressively predicting a sequence of video chunks, defined as fixed-length segments of consecutive frames.
- Year: 2025
- Venue: arXiv 2025
- Authors: 39
- Hosting: Abstract only
Attribution
- Abstract & full text: arxiv.org/abs/2505.13211
- TL;DR: Semantic Scholar
Authors
Ming Yan, Yue Cao, Zheng Zhang, Sand.ai, Hansi Teng, Hongyu Jia, Lei Sun, Lingzhi Li, Maolin Li, Mingqiu Tang, Shuai Han, Tianning Zhang, W. Q. Zhang, Weifeng Luo, Xiaoyang Kang, Yuchen Sun, Yunpeng Huang, Yutong Lin, Yuxin Fang, Zewei Tao, Zhongshu Wang, Zixun Liu, Dai Shi, Guoli Su, Hanwen Sun, Hong Pan, Jie Wang, Jiexin Sheng, Min Cui, Min Hu, Shucheng Yin, Siran Zhang, Tingting Liu, Xianping Yin, Xiaoyu Yang, Xin Song, Xuan Hu, Yankai Zhang, Yuqiao Li