Niklas Muennighoff

Stanford PhD student known for BLOOMZ, the MTEB embedding benchmark, OctoPack code instruction tuning, the OLMoE mixture-of-experts language models, and the s1 test-time scaling work.

Role
grad-student
Papers
34

34 papers · 1 tool contribution

Authored papers

34

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

arXiv 2026

2026

s1: Simple Test-Time Scaling

arXiv 2025

2025

OpenThoughts: Data Recipes for Reasoning Models

arXiv 2025

2025

ReasonIR: Training Retrievers for Reasoning Tasks

arXiv 2025

2025

Datasheets Aren't Enough: DataRubrics for Automated Quality Metrics and Accountability

arXiv 2025

2025

HUME: Measuring the Human-Model Performance Gap in Text Embedding Tasks

arXiv 2025

2025

UQ: Assessing Language Models on Unsolved Questions

arXiv 2025

2025

FlexOlmo: Open Language Models for Flexible Data Use

arXiv 2025

2025

Crosslingual Reasoning through Test-Time Scaling

arXiv 2025

2025

A Survey on Test-Time Scaling in Large Language Models: What, How, Where, and How Well?

arXiv 2025

2025

Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model

ACL 2024

2024

OpenHands: An Open Platform for AI Software Developers as Generalist Agents

arXiv 2024

2024

OLMo: Accelerating the Science of Language Models

arXiv 2024

2024

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models

CVPR 2025

2024

Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

arXiv 2024

2024

OLMoE: Open Mixture-of-Experts Language Models

arXiv 2024

2024

KTO: Model Alignment as Prospect Theoretic Optimization

arXiv 2024

2024

Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models

arXiv 2024

2024

Generative Representational Instruction Tuning

arXiv 2024

2024

A Survey on Data Selection for Language Models

arXiv 2024

2024

Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies

arXiv 2024

2024

SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages

arXiv 2024

2024

RegMix: Data Mixture as Regression for Language Model Pre-training

arXiv 2024

2024

Language models scale reliably with over-training and on downstream tasks

arXiv 2024

2024

SantaCoder: don't reach for the stars!

arXiv 2023

2023

Scaling Data-Constrained Language Models

NeurIPS 2023

2023

OctoPack: Instruction Tuning Code Large Language Models

arXiv 2023

2023

The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI

arXiv 2023

2023

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

TMLR

2022

What Language Model to Train if You Have One Million GPU Hours?

arXiv 2022

2022

Crosslingual Generalization through Multitask Finetuning

arXiv 2022

2022

SGPT: GPT Sentence Embeddings for Semantic Search

arXiv 2022

2022

BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting

arXiv 2022

2022

NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

arXiv 2021

2021

Tool contributions

1

Affiliations

Frequent co-authors

10

from 34 papers