Luca Soldaini
- Papers: 26
Authored papers (26)
How2Everything: Mining the Web for How-To Procedures to Evaluate and Improve LLMs
arXiv 2026
olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models
arXiv 2025
Olmo 3
arXiv 2025
OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens
arXiv 2025
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text
arXiv 2025
Bolmo: Byteifying the Next Generation of Language Models
arXiv 2025
DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research
arXiv 2025
FlexOlmo: Open Language Models for Flexible Data Use
arXiv 2025
Teaching Models to Understand (but not Generate) High-risk Data
arXiv 2025
2 OLMo 2 Furious
arXiv 2024
Tulu 3: Pushing Frontiers in Open Language Model Post-Training
preprint
OLMo: Accelerating the Science of Language Models
arXiv 2024
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
CVPR 2025
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
arXiv 2024
OLMoE: Open Mixture-of-Experts Language Models
arXiv 2024
OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs
arXiv 2024
SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature
arXiv 2024
RouterRetriever: Routing over a Mixture of Expert Embedding Models
arXiv 2024
Language models scale reliably with over-training and on downstream tasks
arXiv 2024
FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions
arXiv 2024
AboutMe: Using Self-Descriptions in Webpages to Document the Effects of English Pretraining Data Filters
arXiv 2024
What's In My Big Data?
arXiv 2023
The Semantic Scholar Open Data Platform
arXiv 2023
Ensemble Transformer for Efficient and Accurate Ranking Tasks: an Application to Question Answering Systems
arXiv 2022
Paragraph-based Transformer Pre-training for Multi-Sentence Inference
NAACL 2022
The Cascade Transformer: an Application for Efficient Answer Sentence Selection
Frequent co-authors
10 from 26 papers
Kyle Lo
Hannaneh Hajishirzi
professor
Noah A. Smith
Dirk Groeneveld
Jacob Morrison
research engineer
Luke Zettlemoyer
professor
Pang Wei Koh
Pete Walsh
Akshita Bhagia
Dustin Schwenk