Kyle Lo

Papers: 25

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

How2Everything: Mining the Web for How-To Procedures to Evaluate and Improve LLMs

arXiv 2026

2026

olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models

arXiv 2025

2025

Olmo 3

arXiv 2025

2025

FlexOlmo: Open Language Models for Flexible Data Use

arXiv 2025

2025

2 OLMo 2 Furious

arXiv 2024

2024

OLMo: Accelerating the Science of Language Models

arXiv 2024

2024

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models

CVPR 2025 1

2024

Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

arXiv 2024

2024

OLMoE: Open Mixture-of-Experts Language Models

arXiv 2024

2024

OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs

arXiv 2024

2024

SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature

arXiv 2024

2024

RouterRetriever: Routing over a Mixture of Expert Embedding Models

arXiv 2024

2024

FABLES: Evaluating faithfulness and content selection in book-length summarization

arXiv 2024

2024

FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions

arXiv 2024

2024

One Thousand and One Pairs: A "novel" challenge for long-context language models

arXiv 2024

2024

The Semantic Scholar Open Data Platform

arXiv 2023

2023

BooookScore: A systematic exploration of book-length summarization in the era of LLMs

arXiv 2023

2023

Multi-LexSum: Real-World Summaries of Civil Rights Lawsuits at Multiple Granularities

arXiv 2022

2022

A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers

NAACL 2021 4

2021

MultiVerS: Improving scientific claim verification with weak supervision and full-document context

Findings (NAACL) 2022 7

2021

Don't Stop Pretraining: Adapt Language Models to Domains and Tasks

2020

TLDR: Extreme Summarization of Scientific Documents

Findings of the Association for Computational Linguistics: EMNLP 2020

2020

CORD-19: The COVID-19 Open Research Dataset

ACL 2020 7

2020

S2ORC: The Semantic Scholar Open Research Corpus

2019

SciBERT: A Pretrained Language Model for Scientific Text

2019

Affiliations

No known affiliations.

Frequent co-authors

from 25 papers

Luca Soldaini

14 shared papers

Hannaneh Hajishirzi

professor

Arman Cohan

Noah A. Smith

Dirk Groeneveld

Iz Beltagy

Jacob Morrison

research engineer

Pete Walsh

Akshita Bhagia

Dustin Schwenk