We introduce Baichuan-Omni-1.5, an omni-modal model that offers not only omni-modal understanding but also end-to-end audio generation. To achieve fluent, high-quality interaction across modalities without compromising the capabilities of any single modality, we prioritized optimizing three key aspects. First, we established a comprehensive data cleaning and synthesis pipeline for multimodal data, yielding about 500B of high-quality data (text, audio, and vision). Second, we designed an audio tokenizer (Baichuan-Audio-Tokenizer) that captures both semantic and acoustic information from audio, enabling seamless integration and enhanced compatibility with MLLMs. Finally, we designed a multi-stage training strategy that progressively integrates multimodal alignment and multitask fine-tuning, ensuring effective synergy across all modalities. Baichuan-Omni-1.5 leads contemporary models (including GPT-4o-mini and MiniCPM-o 2.6) in comprehensive omni-modal capabilities. Notably, it achieves results comparable to leading models such as Qwen2-VL-72B across various multimodal medical benchmarks.
Baichuan-Omni-1.5 Technical Report
Baichuan-Omni-1.5 is an omni-modal model with end-to-end audio generation, featuring a comprehensive data pipeline, audio-tokenizer, and multi-stage training strategy for superior performance across multimodal tasks.
- Year
- 2025
- Venue
- arXiv 2025
- Authors
- 92
- Hosting
- Abstract only
Attribution
- Abstract & full text
- arxiv.org/abs/2501.15368
- TL;DR
- Semantic Scholar
Authors
Jia Li, Wentao Zhang, Chenzheng Zhu, Lei Zhang, Xin Chen, Bowen Li, Hao Liang, Fan Yang, Xu Li, Tao Zhang, Xuezhen Dong, Wei Song, Xu Jia, Jun Liu, WeiPeng Chen, Yan Zhang, Bin Xiao, Xiaoxi Chen, Hui Liu, Mingyang Chen, Tianpeng Li, Haoze Sun, Yijie Zhou, Zenan Zhou, Yuran Wang, Shuai Zhao, Yuqi Huo, Mang Wang, Chenglin Zhu, Yanjun Shen, Wenjing Luo, MingAn Lin, Yujing Qiao, Da Pan, Fei Li, Fuzhong Chen, Guosheng Dong, Hongda Zhang, Jinjie Yang, Kegeng Wu, Lei Su, Linzhuang Sun, Shunya Dang, Xionghai Lin, Yifei Duan, Zhi Ma, Zhiying Wu, Jianhua Xu, Aiyuan Yang, Dian Wang, Guangwei Ai, Tianyu Zhang, Yadong Li, Dongdong Kuang, Xin Wu, zehuan li, Kun Li, Bowen Ding, Zhe Su, Chong Li, Hongyu Guo, Zheng Liang, Song Chen, Shusen Zhang, Keer Lu, Yaqi Zhao, Lijun Liu, Lingfeng Ming, Yuanbo Fang, Mingrui Wang, Youwei Zhang, Fengyu Zhang, Linchu Xiong, Yozhen Wu, Jiahui Ye, Wenhao Lu, Yaqi Zhou, Na Nie, Ting Li, Ping Zhang, Yijia Sun, Jincheng Wu, Jianqiang Zhang, Yicong Chen, Xiaoqin Huang, Lingling Zhu, Ran Xiao, Jiani Pu, Mengyu Ai, Miao Zhen, Youquan Li, Yanzhao Qin