Kyle Lo
Authored papers
How2Everything: Mining the Web for How-To Procedures to Evaluate and Improve LLMs
arXiv 2026
olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models
arXiv 2025
Olmo 3
arXiv 2025
FlexOlmo: Open Language Models for Flexible Data Use
arXiv 2025
2 OLMo 2 Furious
arXiv 2024
OLMo: Accelerating the Science of Language Models
arXiv 2024
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
CVPR 2025
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
arXiv 2024
OLMoE: Open Mixture-of-Experts Language Models
arXiv 2024
OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs
arXiv 2024
SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature
arXiv 2024
RouterRetriever: Routing over a Mixture of Expert Embedding Models
arXiv 2024
FABLES: Evaluating faithfulness and content selection in book-length summarization
arXiv 2024
One Thousand and One Pairs: A "novel" challenge for long-context language models
arXiv 2024
FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions
arXiv 2024
The Semantic Scholar Open Data Platform
arXiv 2023
BooookScore: A systematic exploration of book-length summarization in the era of LLMs
arXiv 2023
Multi-LexSum: Real-World Summaries of Civil Rights Lawsuits at Multiple Granularities
arXiv 2022
A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers
NAACL 2021
MultiVerS: Improving scientific claim verification with weak supervision and full-document context
Findings (NAACL) 2022
Don't Stop Pretraining: Adapt Language Models to Domains and Tasks
TLDR: Extreme Summarization of Scientific Documents
Findings of the Association for Computational Linguistics: EMNLP 2020
CORD-19: The COVID-19 Open Research Dataset
ACL 2020
S2ORC: The Semantic Scholar Open Research Corpus
SciBERT: A Pretrained Language Model for Scientific Text
Frequent co-authors
10 from 25 papers
Luca Soldaini
Hannaneh Hajishirzi (professor)
Arman Cohan
Noah A. Smith
Dirk Groeneveld
Iz Beltagy
Jacob Morrison (research engineer)
Pete Walsh
Akshita Bhagia
Dustin Schwenk