On Variance Reduction in Learning Mean Flows

Year: 2026

Abstract & full text: arxiv.org/abs/2605.09235

Abstract

One-step generative modeling has emerged as a leading approach for amortizing the inference cost of diffusion and flow-matching models. Among distillation-free methods, MeanFlow training is notoriously unstable, with non-decreasing loss and unbounded gradient variance. In this work, we establish a theory that attributes this pathology to a misuse of the conditional velocity field. We show that the conditional velocity plays two distinct statistical roles in the loss: as an unbiased regression target and as a Monte Carlo control variate in a Jacobian-vector product, with the original MeanFlow loss assigning the wrong coefficient to the latter. We derive the optimal coefficient in closed form and show that a family of fixes in concurrent works corresponds to different practical realizations of the same optimum. A controlled sweep of this coefficient on two-dimensional benchmarks and on a latent Diffusion Transformer (DiT) recovers the predicted bias-variance ordering. Our DiT experiment also reveals a quantitative FID-MSE landscape mismatch: although the gradient MSE is minimized at an interior coefficient value near β = 0.94, FID is minimized at the unbiased corner, where the conditional velocity is used directly. Our analysis therefore explains why MeanFlow is unstable, unifies its concurrent remedies, and shows that the variance-optimal coefficient need not coincide with the quality-optimal one.
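
For readers unfamiliar with control variates, the following is a minimal sketch of the textbook idea the abstract refers to, not the MeanFlow loss or the paper's derivation. It assumes a toy target E[exp(U)] with U uniform on [0, 1] and uses U - 0.5 as a zero-mean control variate; the function name `control_variate_demo` and the toy problem are illustrative choices that do not come from the paper. The standard result is that the variance-minimizing coefficient is Cov(f, g) / Var(g), which in general differs from a fixed coefficient such as 1.

```python
import math
import random
import statistics


def control_variate_demo(n_samples=20000, seed=0):
    """Sweep a control-variate coefficient on a toy Monte Carlo problem.

    Toy target (an assumption for illustration): E[exp(U)], U ~ Uniform(0, 1).
    Control variate: g = U - 0.5, whose mean is known to be exactly zero.
    """
    rng = random.Random(seed)
    u = [rng.random() for _ in range(n_samples)]
    f = [math.exp(x) for x in u]   # quantity whose mean we want
    g = [x - 0.5 for x in u]       # zero-mean control variate

    mean_f = statistics.fmean(f)
    mean_g = statistics.fmean(g)
    # Sample covariance and variance with the same (n - 1) normalization.
    cov_fg = sum((a - mean_f) * (b - mean_g) for a, b in zip(f, g)) / (n_samples - 1)
    var_g = statistics.variance(g)

    # Textbook variance-minimizing coefficient: Cov(f, g) / Var(g).
    beta_opt = cov_fg / var_g

    def per_sample_variance(beta):
        # Variance of the corrected sample f - beta * g.
        return statistics.variance([a - beta * b for a, b in zip(f, g)])

    return {
        "beta_opt": beta_opt,
        "var_beta_0": per_sample_variance(0.0),      # no control variate
        "var_beta_1": per_sample_variance(1.0),      # fixed coefficient of 1
        "var_beta_opt": per_sample_variance(beta_opt),
    }


if __name__ == "__main__":
    for name, value in control_variate_demo().items():
        print(f"{name}: {value:.4f}")
```

In this toy setting every coefficient gives an unbiased estimator, so only the variance changes with the coefficient. The bias-variance trade-off and the FID-MSE mismatch described in the abstract arise in the paper's setting and are not reproduced by this sketch.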