From Scaling to Structured Expressivity: Rethinking Transformers for CTR Prediction

Despite massive investments in scale, deep models for click-through rate (CTR) prediction often exhibit rapidly diminishing returns, in stark contrast to the predictable scaling laws seen in large language models (LLMs). We identify the root cause as a fundamental structural misalignment: standard Transformers assume sequential compositionality, whereas CTR data demand combinatorial reasoning over heterogeneous fields. To restore alignment, we introduce the Field-Aware Transformer (FAT). By reconstructing the standard Transformer block with field-centric parameters, FAT achieves structured expressivity, shifting the dependence of model complexity from the total vocabulary size n to the number of fields F (n \gg F). Crucially, to decouple model capacity from field cardinality, FAT employs a Basis-Composed Hypernetwork that synthesizes field-specific parameters from shared bases, further reducing parameter complexity. Theoretically, we ground this scaling behavior in a formal scaling law based on Rademacher complexity. Empirically, FAT outperforms existing state-of-the-art methods with up to +4.38% AUC improvement, and delivers gains of +2.33% CTR and +0.66% RPM in live production. Our work establishes that scalable recommendation arises not from size alone, but from structured expressivity: architectural coherence with data semantics.
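
To make the idea of synthesizing field-specific parameters from shared bases concrete, the following is a minimal sketch and not the paper's implementation. It assumes that each field's weight matrix is a linear combination of K shared basis matrices with per-field mixing coefficients; the abstract does not specify the composition rule, so this linear form is an assumption. The names `compose_field_weights` and `parameter_counts`, the number of bases `K`, and the dimension `d` are hypothetical and chosen only for illustration.

```python
import numpy as np


def compose_field_weights(bases, coeffs):
    """Build one weight matrix per field from shared bases.

    Assumed composition rule (not taken from the paper):
        W_f = sum_k coeffs[f, k] * bases[k]

    bases:  array of shape (K, d, d), shared across all fields
    coeffs: array of shape (F, K), one mixing vector per field
    returns array of shape (F, d, d)
    """
    return np.einsum("fk,kij->fij", coeffs, bases)


def parameter_counts(num_fields, num_bases, dim):
    """Compare parameter counts for one d x d projection per field.

    independent: every field owns a full d x d matrix
    composed:    K shared d x d bases plus F mixing vectors of length K
    """
    independent = num_fields * dim * dim
    composed = num_bases * dim * dim + num_fields * num_bases
    return independent, composed


# Toy sizes, chosen only for illustration.
F, K, d = 6, 2, 4
rng = np.random.default_rng(0)
bases = rng.normal(size=(K, d, d))
coeffs = rng.normal(size=(F, K))

field_weights = compose_field_weights(bases, coeffs)
independent, composed = parameter_counts(F, K, d)
```

Under this assumed rule, the cost of adding a field is K extra coefficients rather than a full d x d matrix, which is one way the parameter count can be decoupled from the number of fields.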