Normalization Equivariance for Arbitrary Backbones, with Application to Image Denoising

Normalization Equivariance (NE) is a structural prior that improves robustness to distribution shift in image-to-image tasks. A function f is normalization equivariant if and only if f(a y + b\mathbf{1}) = a f(y) + b\mathbf{1} for all a > 0 and b \in \mathbb{R}, where \mathbf{1} denotes the all-ones vector. Existing NE methods constrain every internal layer to NE-compatible operations. These constraints add runtime cost and exclude standard transformer components such as softmax attention and LayerNorm. We introduce Wrapped Normalization Equivariance (WNE), a parameter-free wrapper that normalizes the input, applies an arbitrary backbone, and denormalizes the output. We prove that every NE function admits this factorization, so the wrapper exactly parameterizes the class of NE functions. On blind denoising, wrapping CNN and transformer architectures improves robustness under noise-level mismatch with no measurable GPU overhead, whereas architectural NE baselines are up to 1.6\times slower.
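
The normalize, apply, denormalize pattern can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the use of the per-image mean and standard deviation as normalization statistics, the names wrap_ne and toy_backbone, and the identity fallback for constant inputs are all hypothetical choices made for this example.

```python
import numpy as np


def wrap_ne(backbone):
    """Wrap an arbitrary array-to-array function so the result is NE.

    Assumption: normalization uses the mean and standard deviation of the
    whole input; the abstract does not specify which statistics are used.
    """
    def wrapped(y):
        y = np.asarray(y, dtype=np.float64)
        mu = y.mean()      # shifts by b and scales by a under y -> a*y + b
        sigma = y.std()    # scales by a (a > 0) and ignores the shift b
        if sigma == 0.0:
            # Constant input: any NE function maps c*1 to c*1, so return it.
            return y.copy()
        z = (y - mu) / sigma              # normalized input, invariant to (a, b)
        return sigma * backbone(z) + mu   # denormalize the backbone output
    return wrapped


def toy_backbone(z):
    """Hypothetical stand-in for a CNN or transformer; not NE by itself."""
    return np.tanh(z) + 0.1 * z ** 2


rng = np.random.default_rng(0)
y = rng.normal(size=(8, 8))
f = wrap_ne(toy_backbone)

a, b = 2.5, -0.7
# Numerical check of f(a*y + b) == a*f(y) + b for one random input.
print(np.allclose(f(a * y + b), a * f(y) + b))
```

The wrapper adds no parameters: the only extra work is computing two statistics and two affine maps around the backbone call. The final line checks the NE identity numerically for a single input and a single pair (a, b), which is a sanity check rather than a proof.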