Sequential Group Composition: A Window into the Mechanics of Deep Learning

How do neural networks trained over sequences acquire the ability to perform structured operations, such as arithmetic, geometric, and algorithmic computation? To gain insight into this question, we introduce the sequential group composition task. In this task, networks receive a sequence of elements of a finite group, each encoded as a vector in a real vector space, and must predict their cumulative product. The task is order-sensitive whenever the group is non-abelian, and it cannot be solved by a linear model. Our analysis isolates the roles of the group structure, encoding statistics, and sequence length in shaping learning. We prove that two-layer networks trained from vanishing initialization learn this task one irreducible representation of the group at a time, in an order determined by the Fourier statistics of the encoding. To learn the task perfectly, these networks require a hidden width exponential in the sequence length $k$. In contrast, we construct deeper architectures that exploit associativity to dramatically improve this scaling: recurrent neural networks can compose elements sequentially in $k$ steps, while multilayer networks can compose adjacent pairs in parallel in $\log k$ layers. Overall, the sequential group composition task offers a tractable window into the mechanics of deep learning.
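To make the task and the role of associativity concrete, here is a minimal sketch under stated assumptions. It is not the authors' code and contains no learned networks. The group is assumed to be $S_3$ (the smallest non-abelian group, so order matters), elements are assumed to be one-hot encoded, and all function names (`compose`, `one_hot`, `sequential_product`, `parallel_product`, `make_example`) are hypothetical. The left fold mirrors the $k$-step sequential composition available to a recurrent network, and the pairwise tree mirrors composing adjacent pairs in parallel over $\lceil \log_2 k \rceil$ rounds. Both give the same answer only because the group product is associative.

```python
import random
from itertools import permutations

# Assumed example group: S3, permutations of {0, 1, 2}; p[i] is the image of i.
ELEMENTS = list(permutations(range(3)))
IDENTITY = (0, 1, 2)

def compose(p, q):
    """Group product p*q, read as 'apply q first, then p': (p*q)(i) = p[q[i]]."""
    return tuple(p[q[i]] for i in range(len(q)))

def one_hot(g):
    """Assumed encoding: a one-hot vector over the |G| = 6 group elements."""
    return [1.0 if h == g else 0.0 for h in ELEMENTS]

def sequential_product(seq):
    """RNN-style left fold: one composition per input element (k steps)."""
    acc = IDENTITY
    for g in seq:
        acc = compose(acc, g)
    return acc

def parallel_product(seq):
    """Tree-style reduction: compose adjacent pairs in each round, preserving order.

    Requires len(seq) >= 1. Returns (product, rounds); rounds == ceil(log2 k),
    which equals (k - 1).bit_length() for integer k >= 1.
    """
    level = list(seq)
    rounds = 0
    while len(level) > 1:
        nxt = [compose(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # an odd leftover element is carried forward unchanged
        level = nxt
        rounds += 1
    return level[0], rounds

def make_example(k, rng):
    """One hypothetical training pair: concatenated encodings -> encoding of the product."""
    seq = [rng.choice(ELEMENTS) for _ in range(k)]
    x = [v for g in seq for v in one_hot(g)]
    y = one_hot(sequential_product(seq))
    return seq, x, y

if __name__ == "__main__":
    a, b = (1, 0, 2), (0, 2, 1)
    assert compose(a, b) != compose(b, a)  # S3 is non-abelian: order matters
    seq, x, y = make_example(8, random.Random(0))
    prod, rounds = parallel_product(seq)
    assert prod == sequential_product(seq)  # same answer, by associativity
    print(len(x), len(y), rounds)  # 48 6 3
```

The one-hot encoding is only one choice; the abstract's analysis concerns how the Fourier statistics of a general encoding shape learning, which this sketch does not model.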