We introduce InternVL3, a significant advancement in the InternVL series featuring a native multimodal pre-training paradigm. Rather than adapting a text-only large language model (LLM) into a multimodal large language model (MLLM) that supports visual inputs, InternVL3 jointly acquires multimodal and linguistic capabilities from both diverse multimodal data and pure-text corpora during a single pre-training stage. This unified training paradigm effectively addresses the complexities and alignment challenges commonly encountered in conventional post-hoc training pipelines for MLLMs. To further improve performance and scalability, InternVL3 incorporates variable visual position encoding (V2PE) to support extended multimodal contexts, employs advanced post-training techniques such as supervised fine-tuning (SFT) and mixed preference optimization (MPO), and adopts test-time scaling strategies alongside an optimized training infrastructure. Extensive empirical evaluations demonstrate that InternVL3 delivers superior performance across a wide range of multimodal tasks. In particular, InternVL3-78B achieves a score of 72.2 on the MMMU benchmark, setting a new state-of-the-art among open-source MLLMs. Its capabilities remain highly competitive with leading proprietary models, including ChatGPT-4o, Claude 3.5 Sonnet, and Gemini 2.5 Pro, while also maintaining strong pure-language proficiency. In pursuit of open-science principles, we will publicly release both the training data and model weights to foster further research and development in next-generation MLLMs.
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
- Year: 2025
- Venue: arXiv 2025
- Authors: 51
Attribution
- Abstract & full text: arxiv.org/abs/2504.10479v3
Authors
Lijun Wu, Kai Chen, Yu Qiao, Dahua Lin, Conghui He, Hao Li, Junjun He, Shenglong Ye, Wenhai Wang, Yi Wang, Jiahao Wang, Lewei Lu, Weiyun Wang, Zhangwei Gao, Zhe Chen, Jinguo Zhu, Yangzhou Liu, Yue Cao, Xizhou Zhu, Jifeng Dai, Hao Tian, Zhaoyang Liu, Lixin Gu, Yuchen Duan, Weijie Su, Jie Shao, Erfei Cui, Xuehui Wang, Xingguang Wei, Hongjie Zhang, Haomin Wang, Weiye Xu, Nianchen Deng, Songze Li, Yinan He, Tan Jiang, Jiapeng Luo, Botian Shi, Xingcheng Zhang, Wenqi Shao, Yingtong Xiong, Wenwen Qu, Peng Sun, Penglong Jiao, Han Lv, Kaipeng Zhang, Huipeng Deng, Jiaye Ge, LiMin Wang, Min Dou, Tong Lu