Today's most advanced vision-language models (VLMs) remain proprietary. The strongest open-weight models rely heavily on synthetic data from proprietary VLMs to achieve good performance, effectively distilling these closed VLMs into open ones. As a result, the community has been missing foundational knowledge about how to build performant VLMs from scratch. We present Molmo, a new family of VLMs that are state-of-the-art in their class of openness. Our key contribution is a collection of new datasets called PixMo, including a dataset of highly detailed image captions for pre-training, a free-form image Q&A dataset for fine-tuning, and an innovative 2D pointing dataset, all collected without the use of external VLMs. The success of our approach relies on careful modeling choices, a well-tuned training pipeline, and, most critically, the quality of our newly collected datasets. Our best-in-class 72B model not only outperforms others in the class of open weight and data models, but also outperforms larger proprietary models including Claude 3.5 Sonnet, and Gemini 1.5 Pro and Flash, second only to GPT-4o based on both academic benchmarks and on a large human evaluation. Our model weights, new datasets, and source code are available at https://molmo.allenai.org/blog.
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
- Year
- 2024
- Venue
- CVPR 2025
- Authors
- 50
Attribution
- Abstract & full text
- arxiv.org/abs/2409.17146v2
- TL;DR
- Semantic Scholar
Authors
Ali Farhadi, Hannaneh Hajishirzi, Nathan Lambert, Niklas Muennighoff, Ranjay Krishna, Jae Sung Park, Eli VanderBilt, Rose Hendrix, Ross Girshick, Pete Walsh, Luca Soldaini, Dirk Groeneveld, Kyle Lo, Taira Anderson, Christopher Clark, Crystal Nam, Michael Schmitz, Sam Skjonsberg, Noah A. Smith, Jon Borchardt, Matt Deitke, Sangho Lee, Rohun Tripathi, Yue Yang, Mohammadreza Salehi, Jiasen Lu, Erin Bransom, Kiana Ehsani, Huong Ngo, YenSung Chen, Ajay Patel, Mark Yatskar, Chris Callison-Burch, Andrew Head, Favyen Bastani, Yvonne Chou, Arnavi Chheda, Jenna Sparks, Aaron Sarnat, Byron Bischoff, Chris Newell, Piper Wolters, Tanmay Gupta, Kuo-Hao Zeng, Sophie Lebrecht, Caitlin Wittlif, Carissa Schoenick, Oscar Michel, Luca Weihs, Aniruddha Kembhavi