Gated MLPs as Symmetry-Broken Rank-1 Bilinear Attention

We show that the conventional gated MLP can be viewed as a rank-1 approximation to a bilinear attention mechanism with two distinct factors corresponding to the query and the key.

Year: 2026
Hosting: full text hosted (CC0)

Abstract & full text: arxiv.org/abs/2606.22172 (CC0)

Abstract

We show that the conventional gated MLP can be viewed as a rank-1 approximation to a bilinear attention mechanism with two distinct factors corresponding to the query and the key. We further show that moving the nonlinearity onto one factor breaks the exchange symmetry between the two factors and, for non-homogeneous activations, the inverse-scaling symmetry as well. This perspective may help explain why gated MLPs are effective in practice and inform the design of future architectures.
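The symmetry claims in the abstract can be checked numerically on a single hidden unit. The sketch below is one possible reading, not the paper's own construction: it assumes a gated unit of the form act(q·x) * (k·x), treats the no-activation case as the rank-1 bilinear form x^T (q k^T) x, and uses hypothetical names (gated_unit, rank1_bilinear, symmetry_report) and hand-picked vectors chosen so that q·x and k·x have opposite signs.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def identity(z):
    return z

def relu(z):
    return max(z, 0.0)

def silu(z):
    return z / (1.0 + math.exp(-z))

def gated_unit(q, k, x, act):
    """One hidden unit of a gated MLP (assumed form): act(q.x) * (k.x)."""
    return act(dot(q, x)) * dot(k, x)

def rank1_bilinear(q, k, x):
    """x^T (q k^T) x written as an explicit double sum over the rank-1 matrix."""
    n = len(x)
    return sum(x[i] * q[i] * k[j] * x[j] for i in range(n) for j in range(n))

# Hand-picked illustrative values: q.x = 2.0 and k.x = -0.5 (opposite signs).
x = [1.0, 2.0]
q = [1.0, 0.5]
k = [-1.0, 0.25]

# Inverse scaling: q -> c*q, k -> k/c with c > 0.
c = 2.0
q_scaled = [c * v for v in q]
k_scaled = [v / c for v in k]

def symmetry_report(act):
    """Return (exchange_holds, inverse_scaling_holds) for this x, q, k, c."""
    base = gated_unit(q, k, x, act)
    swapped = gated_unit(k, q, x, act)
    rescaled = gated_unit(q_scaled, k_scaled, x, act)
    close = lambda a, b: math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
    return close(base, swapped), close(base, rescaled)

for name, act in [("identity", identity), ("relu", relu), ("silu", silu)]:
    print(name, symmetry_report(act))
```

Under these assumptions, the identity case keeps both symmetries and coincides with the rank-1 bilinear form. ReLU, which is positively homogeneous, breaks only the exchange symmetry for c > 0. SiLU, which is not homogeneous, breaks both. A single test point can only demonstrate that a symmetry is broken; it does not prove that one holds in general.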