We study Transformer-type residual layers under cross-entropy training through a continuous-depth mean field control viewpoint. Depth is treated as time, layer parameters as controls, and the residual Transformer recursion as an explicit Euler scheme for a controlled hidden-state flow. For fixed controls, we prove an O(\varepsilon) pathwise approximation of finite-depth trajectories by the continuous flow and combine this with high-probability sampling bounds for the empirical cross-entropy risk. We formulate the limiting population problem as a first-order transport control problem for the law of hidden states and derive a Pontryagin condition whose terminal adjoint contains the softmax residual. We also give finite-class and metric-entropy uniform estimates, compare optimal values, and discuss existence, stability, continuous-to-discrete recovery, initialization, and range estimates for continuous minimizers.
A First-Order Mean Field Control Analysis of Transformer Layers under Cross-Entropy Training
We study Transformer-type residual layers under cross-entropy training through a continuous-depth mean field control viewpoint. Depth is treated as time, layer parameters as controls, and the residual Transformer recursion as an explicit Euler scheme for a controlled…
- Preview

- Year
- 2026
- Hosting
- Abstract onlyARXIV-DEFAULT
Cite
Notes
Only stored in your browser.
Attribution
- Abstract & full text
- arxiv.org/abs/2606.23235ARXIV-DEFAULT
- TL;DR
- Semantic Scholar