One-shot Entropy Minimization

Year: 2025
Venue: arXiv 2025
ArXiv: arxiv.org/abs/2505.20282
Authors: 4

Attribution

Abstract & full text: arxiv.org/abs/2505.20282v2

Abstract

We trained 13,440 large language models and found that entropy minimization requires only a single unlabeled example and 10 optimization steps to achieve performance improvements comparable to, or even greater than, those obtained using thousands of examples and carefully designed rewards in rule-based reinforcement learning. This striking result may prompt a rethinking of post-training paradigms for large language models. Our code is available at https://github.com/zitian-gao/one-shot-em.
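As a rough illustration of what minimizing entropy means here, the sketch below computes the Shannon entropy of a single next-token distribution and takes 10 gradient steps that lower it. It is a toy under stated assumptions, not the authors' implementation: it updates a fixed logit vector directly rather than the weights of a language model, and the names `token_entropy`, `entropy_grad`, and `minimize_entropy`, the learning rate, and the example logits are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_entropy(logits):
    """Shannon entropy H = -sum_i p_i * log(p_i) of softmax(logits), in nats."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_grad(logits):
    """Gradient of H with respect to the logits: dH/dz_i = -p_i * (log p_i + H)."""
    probs = softmax(logits)
    h = token_entropy(logits)
    return [-p * (math.log(p) + h) if p > 0.0 else 0.0 for p in probs]


def minimize_entropy(logits, steps=10, lr=0.5):
    """Plain gradient descent on the entropy (toy: updates the logits themselves)."""
    z = list(logits)
    for _ in range(steps):
        g = entropy_grad(z)
        z = [zi - lr * gi for zi, gi in zip(z, g)]
    return z


# Hypothetical logits for one token position over a 4-word vocabulary.
toy_logits = [2.0, 1.0, 0.5, 0.1]
before = token_entropy(toy_logits)
after = token_entropy(minimize_entropy(toy_logits))
print(f"entropy before: {before:.3f}, after 10 steps: {after:.3f}")
```

In this toy the updates sharpen the distribution toward the token that was already most likely, which is the sense in which the objective needs no labels. In an actual training run the entropy would presumably be averaged over the tokens the model generates for the prompt and backpropagated into the model parameters.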

Authors

Bryan Dai, Zitian Gao, Lynx Chen, Joey Zhou