Phase transitions for the noisy transformer model in arbitrary dimension

Year: 2026
Abstract & full text: arxiv.org/abs/2606.05140

Abstract

We study the McKean--Vlasov free energy on the unit sphere associated with the unnormalized self-attention (USA) model for noisy transformer dynamics. We prove a sharp global-minimizer dichotomy in every dimension $d\ge2$. There is a unique $\beta_*^{(d)}>0$ such that
\begin{equation*}
\frac{I_{d/2+1}(\beta_*^{(d)})}{I_{d/2}(\beta_*^{(d)})}=\frac1d,
\end{equation*}
where $I_\nu$ is the modified Bessel function of the first kind. For $0<\beta\le\beta_*^{(d)}$, the uniform density remains the unique global minimizer up to the linear-stability threshold
\begin{equation*}
K_\#^{(d)}(\beta)=\frac{\beta^{d/2}}{2^{d/2}\Gamma(d/2)I_{d/2}(\beta)},
\end{equation*}
and the phase transition is continuous. For $\beta>\beta_*^{(d)}$, the uniform density is not globally minimizing at $K_\#^{(d)}(\beta)$, so the critical coupling satisfies $K_c<K_\#^{(d)}(\beta)$ and the transition is discontinuous. This result generalizes the authors' recent $d=2$ work arXiv:2604.16288 to arbitrary dimension. The proof uses the sharp Beckner--Onofri/logarithmic Hardy--Littlewood--Sobolev (HLS) inequality on the sphere, together with a Funk--Hecke/Bessel coefficient computation and a degree-two quartic obstruction.
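
The two quantities defined in the abstract can be evaluated numerically. The following is a minimal sketch, not taken from the paper, that uses only the Python standard library. It assumes the standard power series for the modified Bessel function and locates the threshold by bisection, which relies on the Bessel ratio increasing with its argument (consistent with the uniqueness stated above). The function names `scaled_series`, `bessel_i`, `bessel_ratio`, `beta_star`, and `k_sharp` are hypothetical, and the fixed series length is only adequate for moderate arguments.

```python
import math


def scaled_series(nu, x, terms=200):
    """S_nu(x) = sum_k (x^2/4)^k / (k! (nu+1)_k).

    With this normalization, I_nu(x) = (x/2)^nu / Gamma(nu+1) * S_nu(x).
    The fixed number of terms is an assumption suited to moderate x.
    """
    q = 0.25 * x * x
    term, total = 1.0, 1.0
    for k in range(terms):
        # Ratio of consecutive series terms.
        term *= q / ((k + 1) * (nu + k + 1))
        total += term
    return total


def bessel_i(nu, x):
    """Modified Bessel function of the first kind via its power series."""
    return (0.5 * x) ** nu / math.gamma(nu + 1.0) * scaled_series(nu, x)


def bessel_ratio(d, beta):
    """I_{d/2+1}(beta) / I_{d/2}(beta), with the common prefactors cancelled."""
    nu = d / 2.0
    return (0.5 * beta) / (nu + 1.0) * scaled_series(nu + 1.0, beta) / scaled_series(nu, beta)


def beta_star(d, tol=1e-12):
    """Solve bessel_ratio(d, beta) = 1/d for beta > 0 by bisection."""
    target = 1.0 / d
    lo, hi = 1e-8, 1.0
    # Grow the bracket until the ratio exceeds the target.
    while bessel_ratio(d, hi) <= target:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bessel_ratio(d, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def k_sharp(d, beta):
    """Linear-stability threshold beta^{d/2} / (2^{d/2} Gamma(d/2) I_{d/2}(beta)).

    Substituting the series form of I_{d/2} and using
    Gamma(d/2 + 1) = (d/2) Gamma(d/2) reduces this to (d/2) / S_{d/2}(beta).
    """
    nu = d / 2.0
    return nu / scaled_series(nu, beta)


if __name__ == "__main__":
    for d in (2, 3, 4, 8):
        b = beta_star(d)
        print(f"d={d}: beta_* ~ {b:.6f}, K_#(beta_*) ~ {k_sharp(d, b):.6f}")
```

Running the script prints the approximate threshold and the corresponding linear-stability value for a few dimensions. The rearranged form used in `k_sharp` avoids computing large powers and Gamma values separately, which helps in higher dimension.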