Low Perplexity is Repetition: A One-Dimensional Self-Conditioning Attractor in Continuous Diffusion LMs

Year: 2026
Full text: arxiv.org/abs/2607.00588
Abstract

Continuous diffusion language models such as ELF report record-low generative perplexity (Gen-PPL). We find a catch: these models repeat far more than human text, and Gen-PPL rewards rather than penalizes that repetition, so its low scores overstate quality. Strip the repetition and ELF-B's Gen-PPL rises from 19.5 to 27.7; the smallest model even posts the best Gen-PPL because it repeats most. We trace the repetition to its source: a contractive attractor along a single direction in the self-conditioning feedback loop, the loop that feeds each step's clean estimate into the next. Because the failure is one-dimensional, a one-dimensional fix suffices, and we propose one. ACE (Attractor-Contrast-Escape) subtracts that single, label-free direction from the feedback at each step. Estimated once on the 105M model, the direction cuts repetition to near the human level while keeping quality competitive, and transfers near-unchanged to the 342M and 652M models and across samplers; the same recipe recovers useful directions on other architectures. Since Gen-PPL itself rewards repetition, we instead measure the compute each fix needs to produce human-clean text, where ACE is 1.5 to 5× cheaper.
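
To make the "subtract a single direction from the feedback" idea concrete, here is a minimal sketch. It is not the paper's implementation: the abstract does not say how the direction is estimated or exactly how it is subtracted, so this sketch assumes the subtraction is an orthogonal projection of the self-conditioning estimate away from a fixed direction. The names `remove_direction`, `feedback`, `direction`, and `strength` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def unit(direction: Sequence[float]) -> List[float]:
    """Return the direction scaled to unit Euclidean length."""
    norm = sum(c * c for c in direction) ** 0.5
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [c / norm for c in direction]


def remove_direction(
    feedback: Sequence[float],
    direction: Sequence[float],
    strength: float = 1.0,
) -> List[float]:
    """Subtract the component of `feedback` that lies along `direction`.

    Assumption (not from the source): the fix is modeled as a projection,
    feedback - strength * <feedback, d> * d, with d the unit direction.
    strength=1.0 removes the component fully; smaller values only damp it.
    """
    if len(feedback) != len(direction):
        raise ValueError("feedback and direction must have the same length")
    d = unit(direction)
    coeff = sum(f * c for f, c in zip(feedback, d))
    return [f - strength * coeff * c for f, c in zip(feedback, d)]


# Toy usage: a 2-D "clean estimate" with the assumed attractor along axis 0.
cleaned = remove_direction([3.0, 4.0], [1.0, 0.0])
damped = remove_direction([3.0, 4.0], [1.0, 0.0], strength=0.5)
```

In a sampler, a step like this would sit between producing the clean estimate and feeding it back as the self-conditioning input for the next step. Under this assumption, applying it to each token embedding changes only one coordinate of the feedback and leaves the remaining dimensions untouched, which is what a one-dimensional fix would look like.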