Pang Wei Koh
Authored papers (29)
Privasis: Synthesizing the Largest "Public" Private Dataset from Scratch
arXiv 2026
Anchored Decoding: Provably Reducing Copyright Risk for Any Language Model
arXiv 2026
Olmo 3
arXiv 2025
OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens
arXiv 2025
ReasonIR: Training Retrievers for Reasoning Tasks
arXiv 2025
DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research
arXiv 2025
RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments
arXiv 2025
FlexOlmo: Open Language Models for Flexible Data Use
arXiv 2025
Large-Scale Data Selection for Instruction Tuning
arXiv 2025
Group-robust Sample Reweighting for Subpopulation Shifts via Influence Functions
arXiv 2025
Spurious Rewards: Rethinking Training Signals in RLVR
arXiv 2025
EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees
arXiv 2025
2 OLMo 2 Furious
arXiv 2024
OLMoE: Open Mixture-of-Experts Language Models
arXiv 2024
OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs
arXiv 2024
Scaling Retrieval-Based Language Models with a Trillion-Token Datastore
arXiv 2024
Instructional Fingerprinting of Large Language Models
arXiv 2024
MediQ: Question-Asking LLMs and a Benchmark for Reliable Interactive Clinical Reasoning
arXiv 2024
Impossibility Theorems for Feature Attribution
arXiv 2022
PLeaS -- Merging Models with Permutations and Least Squares
arXiv 2024
Language models scale reliably with over-training and on downstream tasks
arXiv 2024
Negative Token Merging: Image-based Adversarial Feature Guidance
arXiv 2024
The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better
arXiv 2024
DataComp: In search of the next generation of multimodal datasets
NeurIPS 2023
OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models
arXiv 2023
Out-of-Domain Robustness via Targeted Augmentations
arXiv 2023
FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
arXiv 2023
WILDS: A Benchmark of in-the-Wild Distribution Shifts
arXiv 2020
Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization
arXiv 2019
Frequent co-authors
10 from 29 papers
Hannaneh Hajishirzi
professor
Luke Zettlemoyer
professor
Luca Soldaini
Sewon Min
Rulin Shao
Ali Farhadi
CEO
Wen-tau Yih
Dirk Groeneveld
Hamish Ivison
grad-student
Kyle Lo