Step-Audio 2 Technical Report
This paper presents Step-Audio 2, an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation. By integrating a latent audio encoder and reasoning-centric reinforcement learning (RL), Step-Audio 2 achieves promising performance in automatic speech recognition (ASR) and audio understanding. To facilitate genuine end-to-end speech conversation, Step-Audio 2 incorporates the generation of discrete audio tokens into language modeling, significantly enhancing its responsiveness to paralinguistic information such as speaking styles and emotions. To effectively leverage the rich textual and acoustic knowledge in real-world data, Step-Audio 2 integrates retrieval-augmented generation (RAG) and is able to call external tools such as web search to mitigate hallucination and audio search to switch timbres. Trained on millions of hours of speech and audio data, Step-Audio 2 delivers intelligence and expressiveness across diverse conversational scenarios. Evaluation results demonstrate that Step-Audio 2 achieves state-of-the-art performance on various audio understanding and conversational benchmarks compared to other open-source and commercial solutions. Please visit https://github.com/stepfun-ai/Step-Audio2 for more information.
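The abstract says Step-Audio 2 incorporates the generation of discrete audio tokens into language modeling, but it does not describe how the vocabulary is laid out. One general way to do this, shown here only as a hedged sketch, is to append the audio codebook after the text vocabulary so a single output distribution covers both. The sizes and function names (`TEXT_VOCAB_SIZE`, `AUDIO_CODEBOOK_SIZE`, `audio_code_to_lm_id`, `is_audio_id`, `split_generated`) are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical sizes for illustration only; the abstract does not state
# Step-Audio 2's text vocabulary size or audio codebook size.
TEXT_VOCAB_SIZE = 1000
AUDIO_CODEBOOK_SIZE = 512


def audio_code_to_lm_id(code: int) -> int:
    """Map a discrete audio code into the shared vocabulary by offsetting it past the text IDs."""
    if not 0 <= code < AUDIO_CODEBOOK_SIZE:
        raise ValueError(f"audio code out of range: {code}")
    return TEXT_VOCAB_SIZE + code


def is_audio_id(token_id: int) -> bool:
    """Return True if a shared-vocabulary ID falls in the audio range."""
    return TEXT_VOCAB_SIZE <= token_id < TEXT_VOCAB_SIZE + AUDIO_CODEBOOK_SIZE


def split_generated(token_ids: List[int]) -> Tuple[List[int], List[int]]:
    """Split one generated sequence into text IDs and audio codes.

    In a full system the text IDs would go to a text detokenizer and the
    audio codes to a speech decoder; neither is modeled here.
    """
    text_ids: List[int] = []
    audio_codes: List[int] = []
    for tid in token_ids:
        if is_audio_id(tid):
            audio_codes.append(tid - TEXT_VOCAB_SIZE)  # undo the offset
        elif 0 <= tid < TEXT_VOCAB_SIZE:
            text_ids.append(tid)
        else:
            raise ValueError(f"token id outside shared vocabulary: {tid}")
    return text_ids, audio_codes


# Usage: a mixed sequence of text IDs and offset audio IDs.
sequence = [17, 42, audio_code_to_lm_id(3), audio_code_to_lm_id(511), 99]
text_ids, audio_codes = split_generated(sequence)
print(text_ids)     # [17, 42, 99]
print(audio_codes)  # [3, 511]
```

Keeping the text and audio ID ranges disjoint lets one decoding loop emit both kinds of token and route them afterwards; whether Step-Audio 2 uses this exact layout is not stated in the abstract.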
TL;DR: This paper presents Step-Audio 2, an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.
- Year: 2025
- Venue: arXiv 2025
- Authors: 109
- Hosting: Abstract only
Attribution
- Abstract & full text: arxiv.org/abs/2507.16632v3
- TL;DR: Semantic Scholar
Authors
Daxin Jiang, Li Zhou, Bin Wang, Jing Li, Yi Liu, Jie Wu, Peng Liu, Mingrui Chen, Lei Yang, Na Wang, Yibo Zhu, Binxing Jiao, Xiangyu Zhang, Gang Yu, Bo Li, Jie Yang, Jin Yang, Ming Li, Yifan Lu, Han Zhang, Xu Zhao, Yuxin Zhang, HongYu Zhou, Heung-Yeung Shum, Haiyang Sun, Chen Hu, Siyu Chen, Jianjian Sun, Qi Han, Yang Yang, Che Liu, Yuxiang Zhang, Yu Zhou, Wei Ji, Chen Xu, Guoqiang Hu, Zixin Zhang, Wen Sun, Jiansheng Chen, Brian Li, Buyun Ma, Changxin Miao, Changyi Wan, Dapeng Shi, Enle Liu, Guanzhe Huang, Gulin Yan, Hao Nie, Haonan Jia, Jiaoren Wu, Junzhe Lin, Kaixiang Li, Kang An, Mingliang Li, Shaoliang Pang, Shengjie Fan, SiQi Liu, Song Yuan, Tiancheng Cao, Wang You, Wuxun Xie, Yanbo Yu, Yuanhao Ding, Yuchu Luo, Yufan Lu, Yuxiang Yang, Zidong Yang, Yuanwei Liang, Xiangyu Tony Zhang, Fei Tian, Yayue Deng, Haoyang Zhang, Yuxin Li, Yechang Huang, Xuerui Yang, Nan Wu, Mingxiao Li, Xueqi Li, Hanpeng Hu, Xuelin Zhang, Yimin Jiang, Boyong Wu, Chao Yan, Cheng Yi, Chengli Feng, Feiyu Shen, Jingbei Li, Xingyuan Li, Zhao You, Jiangjie Zhen, Bingxin Li, Changhe Song, Dongqing Pang, Shuli Gao, Wen Li, Xuan Wen, Yong Ren, Yuankai Ma, Dingyuan Hu, Donghang Wu, Liying Shi, Longlong Gu, Qinyuan Tan, Wanying Lu, Wenqing He, Yilei Wang, Yuanwei Lu, Yuhe Yin, Yumeng Zhan