Niklas Muennighoff
Stanford PhD student known for BLOOMZ, the MTEB embedding benchmark, OctoPack code instruction tuning, the OLMoE mixture-of-experts models, and s1 test-time scaling.
- Role: grad-student
- Currently at: Stanford University
- Twitter: twitter.com/Muennighoff
- GitHub: github.com/Muennighoff
- Scholar: scholar.google.com/citations
- Papers: 34
Authored papers (34)
Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces
arXiv 2026
s1: Simple Test-Time Scaling
preprint
OpenThoughts: Data Recipes for Reasoning Models
arXiv 2025
ReasonIR: Training Retrievers for Reasoning Tasks
arXiv 2025
Datasheets Aren't Enough: DataRubrics for Automated Quality Metrics and Accountability
arXiv 2025
HUME: Measuring the Human-Model Performance Gap in Text Embedding Tasks
arXiv 2025
UQ: Assessing Language Models on Unsolved Questions
arXiv 2025
FlexOlmo: Open Language Models for Flexible Data Use
arXiv 2025
Crosslingual Reasoning through Test-Time Scaling
arXiv 2025
A Survey on Test-Time Scaling in Large Language Models: What, How, Where, and How Well?
arXiv 2025
Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model
ACL
OpenHands: An Open Platform for AI Software Developers as Generalist Agents
arXiv 2024
OLMo: Accelerating the Science of Language Models
arXiv 2024
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
CVPR 2025
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
arXiv 2024
OLMoE: Open Mixture-of-Experts Language Models
arXiv 2024
KTO: Model Alignment as Prospect Theoretic Optimization
arXiv 2024
Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models
arXiv 2024
Generative Representational Instruction Tuning
arXiv 2024
A Survey on Data Selection for Language Models
arXiv 2024
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies
arXiv 2024
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages
arXiv 2024
RegMix: Data Mixture as Regression for Language Model Pre-training
arXiv 2024
Language models scale reliably with over-training and on downstream tasks
arXiv 2024
SantaCoder: don't reach for the stars!
arXiv 2023
Scaling Data-Constrained Language Models
OctoPack: Instruction Tuning Code Large Language Models
arXiv 2023
The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI
arXiv 2023
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
TMLR
What Language Model to Train if You Have One Million GPU Hours?
arXiv 2022
Crosslingual Generalization through Multitask Finetuning
arXiv 2022
SGPT: GPT Sentence Embeddings for Semantic Search
arXiv 2022
BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting
arXiv 2022
NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation
arXiv 2021
Tool contributions (1)
Affiliations
Frequent co-authors
10 from 34 papers
Hannaneh Hajishirzi
professor
Zheng Xin Yong
researcher
Genta Indra Winata
Luca Soldaini
Alham Fikri Aji
Colin Raffel
Dirk Groeneveld
Kyle Lo
Luke Zettlemoyer
professor
Nathan Lambert
researcher