Sedrick Keh

Research scientist at Toyota Research Institute and contributor to the DataComp-LM (DCLM) open-source LM pretraining benchmark.

Role: researcher
Currently at: Toyota Research Institute
Twitter: twitter.com/sedrickkeh2
GitHub: github.com/sedrickkeh
Scholar: scholar.google.com/citations
Papers: 7

Attribution

Affiliations & profile: scholar.google.com/citations

Authored papers

Open Thoughts: Curating Reasoning Datasets for Open-Source R1 Replications

blog

2025

OpenThoughts: Data Recipes for Reasoning Models

arXiv 2025

SkillFactory: Self-Distillation For Learning Cognitive Behaviors

arXiv 2025

A Critical Evaluation of AI Feedback for Aligning Large Language Models

arXiv 2024

Language models scale reliably with over-training and on downstream tasks

arXiv 2024

Linearizing Large Language Models

arXiv 2024

SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages

arXiv 2024

Affiliations

Currently at

Toyota Research Institute

researcher · research group

Previously

Carnegie Mellon University

university lab

Frequent co-authors

from 7 papers

Jean Mercat

researcher

4 shared papers

Achal Dave

3 shared papers

Georgios Smyrnis

grad-student

3 shared papers

Kushal Arora

3 shared papers

Marianna Nezhurina

researcher

3 shared papers

Niklas Muennighoff

grad-student

3 shared papers

Thomas Kollar

3 shared papers

Zayne Sprague

grad-student

3 shared papers

Alexandros G. Dimakis

2 shared papers

Ashima Suvarna

grad-student

2 shared papers