Jesse Dodge
- Papers: 10
Authored papers
OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens
arXiv 2025
SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks
arXiv 2025
OLMo: Accelerating the Science of Language Models
arXiv 2024
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
arXiv 2024
Scalable Data Ablation Approximations for Language Models through Modular Training and Merging
arXiv 2024
AboutMe: Using Self-Descriptions in Webpages to Document the Effects of English Pretraining Data Filters
arXiv 2024
Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text
What's In My Big Data?
arXiv 2023
Catwalk: A Unified Language Model Evaluation Framework for Many Datasets
arXiv 2023
Retrofitting Word Vectors to Semantic Lexicons
Affiliations
Frequent co-authors
10 from 10 papers
Dirk Groeneveld
Ian Magnusson
Luca Soldaini
Noah A. Smith
Akshita Bhagia
Emma Strubell
Hannaneh Hajishirzi (professor)
Pete Walsh
Yanai Elazar
Abhilasha Ravichander