We propose Ming-Omni, a unified multimodal model capable of processing images, text, audio, and video, while demonstrating strong proficiency in both speech and image generation. Ming-Omni employs dedicated encoders to extract tokens from different modalities, which are then processed by Ling, an MoE architecture equipped with newly proposed modality-specific routers. This design enables a single model to efficiently process and fuse multimodal inputs within a unified framework, thereby facilitating diverse tasks without requiring separate models, task-specific fine-tuning, or structural redesign. Importantly, Ming-Omni extends beyond conventional multimodal models by supporting audio and image generation. This is achieved through the integration of an advanced audio decoder for natural-sounding speech and Ming-Lite-Uni for high-quality image generation, which also allow the model to engage in context-aware chatting, perform text-to-speech conversion, and conduct versatile image editing. Our experimental results show that Ming-Omni offers a powerful solution for unified perception and generation across all modalities. Notably, Ming-Omni is the first open-source model we are aware of to match GPT-4o in modality support, and we release all code and model weights to encourage further research and development in the community.
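To make the idea of modality-specific routers more concrete, the following is a minimal sketch of a mixture-of-experts layer in which all modalities share one pool of experts but each modality has its own router. The abstract does not describe the router design, so everything here is an assumption for illustration: the class name `ModalityRoutedMoE`, the top-k softmax gating, the linear experts, and the random initialization are hypothetical and are not taken from the Ming-Omni or Ling code.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class ModalityRoutedMoE:
    """Toy MoE layer: shared experts, one router per modality (hypothetical)."""

    def __init__(self, dim, num_experts, modalities, top_k=2, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        # Shared experts: each expert is a dim x dim linear map (assumption).
        self.experts = [
            [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(dim)]
            for _ in range(num_experts)
        ]
        # One router per modality: a num_experts x dim scoring matrix.
        self.routers = {
            m: [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(num_experts)]
            for m in modalities
        }

    def route(self, token, modality):
        """Return (expert_index, gate_weight) pairs chosen by this modality's router."""
        logits = [dot(w, token) for w in self.routers[modality]]
        order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        top = order[: self.top_k]
        # Gate weights are renormalized over the selected experts only.
        gates = softmax([logits[i] for i in top])
        return list(zip(top, gates))

    def forward(self, token, modality):
        """Gate-weighted sum of the selected experts' outputs for one token."""
        out = [0.0] * len(token)
        for idx, gate in self.route(token, modality):
            expert_out = [dot(row, token) for row in self.experts[idx]]
            out = [o + gate * v for o, v in zip(out, expert_out)]
        return out


moe = ModalityRoutedMoE(dim=4, num_experts=3, modalities=["text", "audio"])
token = [1.0, 0.5, -0.5, 2.0]
print(moe.route(token, "text"))   # experts picked by the text router
print(moe.route(token, "audio"))  # the audio router may pick different experts
```

In this sketch the same token vector can be sent to different experts depending on which modality it came from, while the experts themselves stay shared. That is one plausible reading of how a single model could fuse inputs from several encoders.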
Ming-Omni: A Unified Multimodal Model for Perception and Generation
- Year
- 2025
- Venue
- arXiv 2025
- Authors
- 58
Attribution
- Abstract & full text
- arxiv.org/abs/2506.09344
- TL;DR
- Semantic Scholar
Authors
Rui Liu, Yifei Wu, Jun Zhou, Qiang Xu, Jingdong Chen, Furong Xu, Inclusion AI, Biao Gong, Cheng Zou, Chuanyang Zheng, Chunluan Zhou, Canxiang Yan, Chunxiang Jin, Chunjie Shen, Dandan Zheng, Fudong Wang, Guangming Yao, Jianxin Sun, Jiajia Liu, Jianjiang Zhu, Jun Peng, Kaixiang Ji, Kaiyou Song, Kaimeng Ren, Libin Wang, Lixiang Ru, Lele Xie, Longhua Tan, Lyuxin Xue, Lan Wang, Mochen Bai, Ning Gao, Pei Chen, Qingpei Guo, Qinglong Zhang, Ruijie Xiong, Sirui Gao, Tinghao Liu, Taisong Li, Weilong Chai, Xinyu Xiao, Xiaomei Wang, Xiaoxue Chen, Xiao Lu, Xiaoyu Li, Xingning Dong, Xuzheng Yu, Yi Yuan, Yuting Gao, Yunxiao Sun, Yipeng Chen, Yongjie Lyu, Ziping Ma, Zipeng Feng, Zhijiang Fang, Zhihao Qiu, Ziyuan Huang, Zhengyu He