We open-source MiMo-Embodied, the first cross-embodied foundation model to successfully integrate and achieve state-of-the-art performance in both Autonomous Driving and Embodied AI. MiMo-Embodied sets new records across 17 embodied AI benchmarks in Task Planning, Affordance Prediction and Spatial Understanding, while also excelling in 12 autonomous driving benchmarks across Environmental Perception, Status Prediction, and Driving Planning. Across these tasks, MiMo-Embodied significantly outperforms existing open-source, closed-source, and specialized baselines. Our results indicate that through multi-stage learning, curated data construction, and CoT/RL fine-tuning, these two domains exhibit strong positive transfer and mutually reinforce one another. We provide a detailed analysis of our model design and training methodologies to facilitate further research. Code and models are available at https://github.com/XiaomiMiMo/MiMo-Embodied.
MiMo-Embodied: X-Embodied Foundation Model Technical Report
MiMo-Embodied, a cross-embodied foundation model, achieves state-of-the-art performance in both autonomous driving and embodied AI through multi-stage learning, curated data, and CoT/RL fine-tuning.
- Year: 2025
- Venue: arXiv 2025
- Authors: 44
Attribution
- Abstract & full text: arxiv.org/abs/2511.16518
- TL;DR: Semantic Scholar
Authors
Yuchen Zhang, Zeyu Zhu, Fuli Luo, Jinghui Lu, Guang Chen, Hao Tian, Heng Qu, Shuhao Gu, Shuhuai Ren, Zihao Yue, Wen Zhang, Long Chen, Haiyang Sun, Bing Wang, Hangjun Ye, Yinan Zheng, Rui Cai, Kun Ma, Diyun Xiang, Xiaoshuai Hao, Hanbing Li, Shaoqing Xu, Guang Li, Lei Zhou, Zhijian Huang, Zhiwen Hou, Yingbo Tang, Lingfeng Zhang, Zheng Lu, Xianhui Meng, Jing Wu, Chenxu Dang, Jiayi Guan, Jianhua Wu, Zhiyi Hou, Shumeng Xia, Mingliang Zhou, Yuannan Shen, Jianwei Cui, Yuncheng Jiang, Zibin Guo, Chuhong Gong, Chaofan Zhang, Wenbo Ding