Sewoong Oh

Papers: 12

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

Privasis: Synthesizing the Largest "Public" Private Dataset from Scratch

arXiv 2026

Anchored Decoding: Provably Reducing Copyright Risk for Any Language Model

arXiv 2026

OpenThoughts: Data Recipes for Reasoning Models

arXiv 2025

Spurious Rewards: Rethinking Training Signals in RLVR

arXiv 2025

Open Deep Search: Democratizing Search with Open-source Reasoning Agents

arXiv 2025

Scalable Fingerprinting of Large Language Models

arXiv 2025

PLeaS -- Merging Models with Permutations and Least Squares

arXiv 2024

Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data?

arXiv 2024

DataComp: In search of the next generation of multimodal datasets

NeurIPS 2023

One-shot Empirical Privacy Estimation for Federated Learning

arXiv 2023

CRISP: Curriculum based Sequential Neural Decoders for Polar Code Family

arXiv 2022

Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP

arXiv 2022

Affiliations

No known affiliations.

Frequent co-authors

from 12 papers

Jonathan Hayase

5 shared papers

Pang Wei Koh

5 shared papers

Ludwig Schmidt

professor

3 shared papers

Pramod Viswanath

3 shared papers

Yejin Choi

professor

Anshul Nasery

Creston Brooks

Gabriel Ilharco

Georgios Smyrnis

grad-student

2 shared papers

Hannaneh Hajishirzi

professor

2 shared papers