Universality of empirical risk minimization

We study a general class of optimization problems with decision variable \boldsymbol{\Theta} \in \mathbb{R}^{p \times k} and a cost function that is the sum of n terms, each depending on \boldsymbol{\Theta} only through the k-dimensional projection \boldsymbol{\Theta}^\top x_i, where x_i, i \leq n, are i.i.d. random vectors. This setting is general enough to include examples of current interest in statistical physics, high-dimensional statistics, and statistical learning theory. We consider the proportional asymptotics n, p \to \infty with n/p = \Theta(1), and prove that, whenever there exists a minimizer satisfying a suitable generalization of a "delocalization" condition, the minimum value is universal: for subgaussian x_i, it depends on the distribution of x_i only through its asymptotic mean and covariance. This delocalization condition is essentially necessary. Earlier universality results for such problems were limited to strongly convex loss functions. We derive applications of our theory to statistical learning and prove general universality results for both the train error and (under additional conditions) the test error. In particular, we establish universality for vectors x_i generated by random 1-layer neural networks (random features models) and by first-order Taylor approximations of 2-layer networks (neural tangent models). Finally, we establish that the delocalization property holds for a class of statistical learning problems under a condition that is easy to verify.
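
As an illustrative formalization of the setting described above (the symbols \ell, \widehat{R}_n, and g_i are notation assumed here for the sketch, and the 1/n normalization is a convention rather than something stated above), the cost function can be written as

\[
\widehat{R}_n(\boldsymbol{\Theta}; x_1, \dots, x_n) \;=\; \frac{1}{n} \sum_{i=1}^{n} \ell\big(\boldsymbol{\Theta}^\top x_i\big),
\qquad \boldsymbol{\Theta} \in \mathbb{R}^{p \times k},
\]

where \ell : \mathbb{R}^k \to \mathbb{R} is a per-sample cost. In this notation, one way to read the universality claim is the following: if g_1, \dots, g_n are i.i.d. Gaussian vectors with the same mean and covariance as the x_i, then

\[
\Big| \min_{\boldsymbol{\Theta}} \widehat{R}_n(\boldsymbol{\Theta}; x_1, \dots, x_n) \;-\; \min_{\boldsymbol{\Theta}} \widehat{R}_n(\boldsymbol{\Theta}; g_1, \dots, g_n) \Big| \;\longrightarrow\; 0
\]

as n, p \to \infty with n/p = \Theta(1). The Gaussian comparison model and the mode of convergence (for instance, in probability) are assumptions of this sketch, as is the omission of any constraint set for \boldsymbol{\Theta} or any dependence of \ell on labels.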