Niklas Muennighoff

Stanford PhD student known for BLOOMZ, the MTEB embedding benchmark, OctoPack code instruction tuning, the OLMoE mixture-of-experts models, and s1 test-time scaling for reasoning.

Role: grad-student
Currently at: Stanford University
Twitter: twitter.com/Muennighoff
GitHub: github.com/Muennighoff
Scholar: scholar.google.com/citations
Papers: 34

Attribution

Affiliations & profile: scholar.google.com/citations

34 papers · 1 tool contribution

Authored papers

34

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

arXiv 2026

s1: Simple Test-Time Scaling

preprint

OpenThoughts: Data Recipes for Reasoning Models

arXiv 2025

ReasonIR: Training Retrievers for Reasoning Tasks

arXiv 2025

Datasheets Aren't Enough: DataRubrics for Automated Quality Metrics and Accountability

arXiv 2025

FlexOlmo: Open Language Models for Flexible Data Use

arXiv 2025

A Survey on Test-Time Scaling in Large Language Models: What, How, Where, and How Well?

arXiv 2025

Crosslingual Reasoning through Test-Time Scaling

arXiv 2025

HUME: Measuring the Human-Model Performance Gap in Text Embedding Tasks

arXiv 2025

UQ: Assessing Language Models on Unsolved Questions

arXiv 2025

Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model

ACL

OpenHands: An Open Platform for AI Software Developers as Generalist Agents

arXiv 2024

OLMo: Accelerating the Science of Language Models

arXiv 2024

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models

CVPR 2025

Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

arXiv 2024

OLMoE: Open Mixture-of-Experts Language Models

arXiv 2024

KTO: Model Alignment as Prospect Theoretic Optimization

arXiv 2024

Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models

arXiv 2024

Generative Representational Instruction Tuning

arXiv 2024

A Survey on Data Selection for Language Models

arXiv 2024

Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies

arXiv 2024

SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages

arXiv 2024

RegMix: Data Mixture as Regression for Language Model Pre-training

arXiv 2024

Language models scale reliably with over-training and on downstream tasks

arXiv 2024

SantaCoder: don't reach for the stars!

arXiv 2023

Scaling Data-Constrained Language Models

NeurIPS 2023

OctoPack: Instruction Tuning Code Large Language Models

arXiv 2023

The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI

arXiv 2023

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

TMLR

What Language Model to Train if You Have One Million GPU Hours?

arXiv 2022

Crosslingual Generalization through Multitask Finetuning

arXiv 2022

SGPT: GPT Sentence Embeddings for Semantic Search

arXiv 2022

BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting

arXiv 2022

NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

arXiv 2021

Tool contributions

1

s1K

Stanford Center for Research on Foundation Models (CRFM)

A curated set of 1,000 reasoning problems from Stanford. Models fine-tuned on it and run with budget forcing at inference reached results competitive with o1 for roughly $50 of compute.

SFT Dataset · Math · Scientific Reasoning

Affiliations

Currently at

Stanford University

grad-student · university lab

Previously

Allen Institute for AI (Ai2) · non-profit

Hugging Face · infra

BigScience · community

Frequent co-authors

10

from 34 papers

Hannaneh Hajishirzi

professor

7 shared papers

Zheng Xin Yong

researcher

7 shared papers

Genta Indra Winata

6 shared papers

Luca Soldaini

6 shared papers

Alham Fikri Aji

5 shared papers

Colin Raffel

5 shared papers

Dirk Groeneveld

5 shared papers

Kyle Lo

5 shared papers

Luke Zettlemoyer

professor

5 shared papers

Nathan Lambert

researcher

5 shared papers