Sewoong Oh
- Papers: 12
Authored papers (12)
Privasis: Synthesizing the Largest "Public" Private Dataset from Scratch
arXiv 2026
Anchored Decoding: Provably Reducing Copyright Risk for Any Language Model
arXiv 2026
OpenThoughts: Data Recipes for Reasoning Models
arXiv 2025
Scalable Fingerprinting of Large Language Models
arXiv 2025
Spurious Rewards: Rethinking Training Signals in RLVR
arXiv 2025
Open Deep Search: Democratizing Search with Open-source Reasoning Agents
arXiv 2025
PLeaS -- Merging Models with Permutations and Least Squares
arXiv 2024
Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data?
arXiv 2024
DataComp: In search of the next generation of multimodal datasets
NeurIPS 2023 11
One-shot Empirical Privacy Estimation for Federated Learning
arXiv 2023
Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP
arXiv 2022
CRISP: Curriculum based Sequential Neural Decoders for Polar Code Family
arXiv 2022
Frequent co-authors
10 from 12 papers
Jonathan Hayase
Pang Wei Koh
Ludwig Schmidt (professor)
Pramod Viswanath
Yejin Choi (professor)
Anshul Nasery
Creston Brooks
Gabriel Ilharco
Georgios Smyrnis (grad student)
Hannaneh Hajishirzi (professor)