Sedrick Keh
Research scientist at Toyota Research Institute and contributor to the DataComp-LM (DCLM) open-source LM pretraining benchmark.
- Role: researcher
- Currently at: Toyota Research Institute
- Twitter: twitter.com/sedrickkeh2
- GitHub: github.com/sedrickkeh
- Scholar: scholar.google.com/citations
- Papers: 7
Authored papers (7)
Open Thoughts: Curating Reasoning Datasets for Open-Source R1 Replications
blog
OpenThoughts: Data Recipes for Reasoning Models
arXiv 2025
SkillFactory: Self-Distillation For Learning Cognitive Behaviors
arXiv 2025
A Critical Evaluation of AI Feedback for Aligning Large Language Models
arXiv 2024
Language models scale reliably with over-training and on downstream tasks
arXiv 2024
Linearizing Large Language Models
arXiv 2024
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages
arXiv 2024
Frequent co-authors (10, from 7 papers)
Jean Mercat, researcher
Achal Dave
Georgios Smyrnis, grad-student
Kushal Arora
Marianna Nezhurina, researcher
Niklas Muennighoff, grad-student
Thomas Kollar
Zayne Sprague, grad-student
Alexandros G. Dimakis
Ashima Suvarna, grad-student