Ludwig Schmidt

SWE-smith: Scaling Data for Software Engineering Agents

arXiv 2025

Project Alexandria: Towards Freeing Scientific Knowledge from Copyright Burdens via LLMs

arXiv 2025

Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation

CVPR 2025

Resolving Discrepancies in Compute-Optimal Scaling of Language Models

arXiv 2024

MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens

arXiv 2024

Why are Visually-Grounded Language Models Bad at Image Classification?

arXiv 2024

Getting it Right: Improving Spatial Consistency in Text-to-Image Models

arXiv 2024

Language models scale reliably with over-training and on downstream tasks

arXiv 2024

Large Scale Transfer Learning for Tabular Data via Language Modeling

arXiv 2024

Stable and low-precision training for large-scale vision-language models

NeurIPS 2023

DataComp: In search of the next generation of multimodal datasets

NeurIPS 2023

GenEval: An Object-Focused Framework for Evaluating Text-to-Image Alignment

NeurIPS 2023

OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models

arXiv 2023

Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text

VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use

arXiv 2023

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

TMLR

Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time

arXiv 2022

CoWs on Pasture: Baselines and Benchmarks for Language-Driven Zero-Shot Object Navigation

CVPR 2023

Editing Models with Task Arithmetic

arXiv 2022

Measuring and Narrowing the Compositionality Gap in Language Models

arXiv 2022

Reproducible scaling laws for contrastive language-image learning

CVPR 2023

Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP

arXiv 2022

SFT Dataset · Math · Code Generation · Scientific Reasoning

Robust fine-tuning of zero-shot models

2021

Retiring Adult: New Datasets for Fair Machine Learning

NeurIPS 2021

2021

Do ImageNet Classifiers Generalize to ImageNet?

NeurIPS Workshop ImageNet_PPF 2021

2019

Towards Deep Learning Models Resistant to Adversarial Attacks

2017

Tool contributions

OpenThoughts

Open Thoughts

A fully open distillation of long DeepSeek-R1 reasoning traces: the community's flagship "open R1" SFT corpus for reasoning models.

Affiliations

Currently at

Stanford University

professor · university lab

Previously

Anthropic

frontier lab

University of Washington

university lab

Allen Institute for AI (Ai2)

non-profit

Frequent co-authors

from 30 papers

Mitchell Wortsman

11 shared papers

Gabriel Ilharco

9 shared papers

Jenia Jitsev

8 shared papers

Ali Farhadi

CEO

Anas Awadalla

Hannaneh Hajishirzi

professor

Samir Yitzhak Gadre

Yejin Choi

professor