CeRA: Breaking the Linear Ceiling of Low-Rank Adaptation with Non-linearity Retained at Inference

Low-Rank Adaptation (LoRA) dominates parameter-efficient fine-tuning (PEFT), but it faces a ``linear ceiling'': because the update is a linear map, increasing the rank yields diminishing returns in expressive capacity. We introduce CeRA (Capacity-enhanced Rank Adaptation), a weight-level parallel adapter that injects SiLU gating and dropout and retains its non-linearity at inference. This places it in a different function class from adapters whose non-linearity exists only during training and collapses to an affine map at inference time. On both basic arithmetic (GSM8K) and the more complex MATH benchmark, CeRA is markedly more parameter-efficient than LoRA. Across a full rank $\times$ learning-rate sweep, CeRA at rank 64 achieves the highest MATH pass@1 of any configuration in the grid (23.6%), exceeding both a rank-512 LoRA (22.4%) and DoRA (19.8%) while using only 1/8 of the parameter budget. At matched rank and learning rate, CeRA equals or outperforms LoRA in 10 of 12 settings. Spectral analysis attributes the gain, at least in part, to the smooth SiLU gating, which broadens utilization of the singular-value spectrum and mitigates the rank collapse that linear adapters exhibit at high rank. Dropout, by contrast, appears to contribute regularization rather than rank expansion.
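
The function-class distinction drawn above can be illustrated with a small NumPy sketch. This is not the CeRA implementation: the gated form `W x + B SiLU(A x)`, the dimensions, and all names (`lora_forward`, `gated_forward`, `W`, `A`, `B`) are assumptions chosen for illustration, and dropout is omitted on the assumption that it is inactive at inference. The sketch only checks that a linear low-rank update merges into a single weight matrix, whereas an update with SiLU between the two low-rank factors does not behave as a linear map.

```python
import numpy as np

def silu(x):
    # SiLU (swish) activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
d_in, d_out, r = 16, 16, 4           # illustrative sizes, not from the paper

W = rng.normal(size=(d_out, d_in))   # frozen base weight
A = rng.normal(size=(r, d_in))       # low-rank down-projection
B = rng.normal(size=(d_out, r))      # low-rank up-projection

def lora_forward(x):
    # Linear low-rank update: W x + B A x
    return W @ x + B @ (A @ x)

def gated_forward(x):
    # Hypothetical gated update (assumed form): W x + B SiLU(A x)
    return W @ x + B @ silu(A @ x)

x = rng.normal(size=d_in)
y = rng.normal(size=d_in)

# The linear update folds into one merged matrix, so it adds no new function class.
W_merged = W + B @ A
lora_merges = np.allclose(lora_forward(x), W_merged @ x)

# Additivity f(x + y) = f(x) + f(y) holds for linear maps only.
lora_additive = np.allclose(lora_forward(x + y),
                            lora_forward(x) + lora_forward(y))
gated_additive = np.allclose(gated_forward(x + y),
                             gated_forward(x) + gated_forward(y))

print("LoRA merges into W + BA:", lora_merges)
print("LoRA is additive:", lora_additive)
print("Gated adapter is additive:", gated_additive)
```

Because the gated path fails the additivity check, no single merged weight matrix can reproduce it, which is the sense in which such an adapter keeps its non-linearity at inference.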