We present Step-Video-T2V, a state-of-the-art text-to-video pre-trained model with 30B parameters that can generate videos up to 204 frames long. A deep-compression Variational Autoencoder, Video-VAE, is designed for video generation tasks. It achieves 16x16 spatial and 8x temporal compression ratios while maintaining exceptional video reconstruction quality. User prompts are encoded with two bilingual text encoders to handle both English and Chinese. A diffusion transformer (DiT) with 3D full attention is trained using Flow Matching and is employed to denoise input noise into latent frames. A video-based Direct Preference Optimization (DPO) approach, Video-DPO, is applied to reduce artifacts and improve the visual quality of the generated videos. We also detail our training strategies and share key observations and insights. Step-Video-T2V's performance is evaluated on a novel video generation benchmark, Step-Video-T2V-Eval, which demonstrates its state-of-the-art text-to-video quality compared with both open-source and commercial engines. Additionally, we discuss the limitations of the current diffusion-based model paradigm and outline future directions for video foundation models. We make both Step-Video-T2V and Step-Video-T2V-Eval available at https://github.com/stepfun-ai/Step-Video-T2V. An online version is also available at https://yuewen.cn/videos. Our goal is to accelerate the innovation of video foundation models and empower video content creators.
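To make the stated compression ratios concrete, the following Python sketch computes the latent grid that a clip would map to. It assumes each axis is simply divided by its factor (16 per spatial axis, 8 along time) and rounded up. The function names and the 128-frame, 512×512 example clip are hypothetical. The real Video-VAE may handle frame counts differently; causal video VAEs, for example, often encode the first frame separately.

```python
import math

# Compression factors stated in the abstract for Video-VAE.
SPATIAL = 16   # per spatial axis (16x16)
TEMPORAL = 8   # along the time axis (8x)


def latent_grid(frames: int, height: int, width: int) -> tuple[int, int, int]:
    """Return the latent (T, H, W) grid under plain ceiling division.

    Assumption (hypothetical helper): every axis is divided by its factor and
    rounded up. The actual Video-VAE may treat frame counts differently.
    """
    return (
        math.ceil(frames / TEMPORAL),
        math.ceil(height / SPATIAL),
        math.ceil(width / SPATIAL),
    )


def positions_reduction() -> int:
    """Number of pixel positions folded into a single latent position."""
    return SPATIAL * SPATIAL * TEMPORAL


if __name__ == "__main__":
    # Hypothetical clip: 128 frames at 512x512.
    t, h, w = latent_grid(128, 512, 512)
    print((t, h, w), t * h * w, positions_reduction())
```

Under this assumption, the hypothetical clip shrinks from 33,554,432 pixel positions to 16,384 latent positions, a 2,048× reduction. Because 3D full attention lets every latent token attend to every other one, its cost grows quadratically with this token count (before any patchification the DiT may apply). This is one reason aggressive compression matters for long videos.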
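The abstract states that the DiT is trained with Flow Matching. Below is a minimal NumPy sketch of that kind of objective. It assumes the common rectified-flow convention: clean latents `x0` and Gaussian noise are linearly interpolated at a random time `t`, and the network regresses the constant velocity `noise - x0`. Sampling then integrates that velocity from noise back to data with Euler steps. The `model` callable, the uniform time sampling, and the step count are illustrative assumptions. The report's exact schedule and text conditioning are not reproduced here.

```python
import numpy as np


def flow_matching_loss(model, x0, rng):
    """One flow-matching loss evaluation (rectified-flow convention, assumed).

    x0:    clean latents, shape (batch, ...).
    model: hypothetical callable model(x_t, t) -> predicted velocity, same shape as x_t.
    rng:   numpy.random.Generator.
    """
    batch = x0.shape[0]
    noise = rng.standard_normal(x0.shape)
    # One timestep per sample, broadcast over the remaining axes.
    t = rng.uniform(0.0, 1.0, size=(batch,) + (1,) * (x0.ndim - 1))
    x_t = (1.0 - t) * x0 + t * noise   # straight path: data at t=0, noise at t=1
    target = noise - x0                # constant velocity dx_t/dt along that path
    pred = model(x_t, t)
    return float(np.mean((pred - target) ** 2))


def euler_sample(model, noise, steps=50):
    """Integrate dx/dt = model(x, t) from t=1 (noise) down to t=0 (data)."""
    x = noise
    ts = np.linspace(1.0, 0.0, steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        # t_next < t_cur, so this is a step backward in time toward the data.
        x = x + (t_next - t_cur) * model(x, t_cur)
    return x
```

For a single fixed target `x0`, the exact velocity at `(x_t, t)` is `(x_t - x0) / t`. Euler integration with that velocity recovers `x0` for any number of steps, which makes a convenient sanity check when wiring up such a training and sampling loop.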
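The abstract describes Video-DPO only at a high level. To illustrate the general idea, the sketch below implements a Diffusion-DPO-style preference loss. This is a known adaptation of DPO to denoising models, swapped in here as an assumption, and it is not the report's exact formulation. It takes per-pair denoising errors from the trainable model and from a frozen reference, each computed on a preferred and a rejected video. The loss rewards the model for lowering its error on the preferred sample more than on the rejected one. The function names and the default `beta` are hypothetical.

```python
import numpy as np


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    return -np.logaddexp(0.0, -z)


def diffusion_dpo_loss(err_w, err_w_ref, err_l, err_l_ref, beta=1.0):
    """Diffusion-DPO-style preference loss (assumed form, not the exact Video-DPO).

    err_w, err_l:         denoising errors (e.g. per-sample MSE) of the trainable
                          model on the preferred (w) and rejected (l) videos.
    err_w_ref, err_l_ref: the same errors under a frozen reference model.
    """
    # Change in error relative to the reference (negative means the model fits better).
    delta_w = np.asarray(err_w, dtype=float) - np.asarray(err_w_ref, dtype=float)
    delta_l = np.asarray(err_l, dtype=float) - np.asarray(err_l_ref, dtype=float)
    # Positive logits when the preferred sample improved more than the rejected one.
    logits = -beta * (delta_w - delta_l)
    return float(-np.mean(log_sigmoid(logits)))
```

When the trainable model equals the reference, every logit is zero and the loss is log 2. It falls below that value only as the model fits preferred samples better than rejected ones, measured relative to the reference.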
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model
- Year: 2025
- Venue: arXiv 2025
- Authors: 115
Attribution
- Abstract & full text: arxiv.org/abs/2502.10248
- TL;DR: Semantic Scholar
Authors (115)
Daxin Jiang, Liangyu Chen, Bin Wang, Jing Li, Jie Wu, Chenfei Wu, Kun Yan, Zecheng Tang, Zekai Zhang, Tianyu Wang, Na Wang, Liang Zhao, Yang Li, Rui Wang, Yingming Wang, Quan Sun, Zheng Ge, Ranchen Ming, Lei Xia, Xianfang Zeng, Yibo Zhu, Binxing Jiao, Xiangyu Zhang, Gang Yu, Jie Yang, Ming Li, Xu Zhao, Haoyang Huang, Nan Duan, Yu Chen, Bo wang, Shuchang Zhou, Heung-Yeung Shum, Heng Wang, Jingyang Zhang, Hongcheng Guo, Xuan Yang, Shengming Yin, Yu Luo, Yu Zhou, Lei Liu, Wei Ji, Ge Yang, Chen Xu, Wen Sun, Jiansheng Chen, Xinhao Zhang, Guoqing Ma, Xin Han, Brian Li, Changyi Wan, Dapeng Shi, Deshan Sun, Enle Liu, Guanzhe Huang, Gulin Yan, Hao Nie, Haonan Jia, Jian Zhou, Jiaoren Wu, Junzhe Lin, Kaijun Tan, Kaixiang Li, Kang An, Liguo Tan, Mingliang Li, Qinglin He, Shaoliang Pang, Shiliang Yang, SiQi Liu, Tiancheng Cao, Xiaojia Liu, Xing Chen, Yanbo Yu, Yuchu Luo, Yuxiang Yang, Zidong Yang, Deyu Zhou, Aojie Li, Hanpeng Hu, Mei Chen, Xuelin Zhang, Shuli Gao, Dingyuan Hu, Liying Shi, Wenqing He, Yilei Wang, Yuanwei Lu, Yuhe Yin, Jianchang Wu, Jiahao Gong, Junjing Guo, Jiashuai Liu, Qiling Wu, Ran Sun, Sitong Liu, Weipeng Ming, Yanan Wei, Yaqi Dai, Xiaoniu Song, Bizhu Huang, Changxing Miao, Chenguang Yu, Haiyang Feng, Hanqi Chen, Haolong Yan, Huilin Xiong, Huixin Xiong, Jiashuo Li, Liwen Huang, Muhua Cheng, Qiaohui Chen, Qiuyan Liang, Yineng Deng, Yuheng Feng