Robust Learning of a Group DRO Neuron

We study the problem of learning a single neuron under the standard squared loss in the presence of arbitrary label noise and group-level distributional shifts, for a broad family of covariate distributions. Our goal is to identify a "best-fit" neuron, parameterized by $\mathbf{w}_*$, that performs well under the most challenging reweighting of the groups. Specifically, we address a Group Distributionally Robust Optimization (Group DRO) problem: given sample access to $K$ distinct distributions $\mathcal{p}_{[1]}, \dots, \mathcal{p}_{[K]}$, we seek to approximate the $\mathbf{w}_*$ that minimizes the worst-case objective over convex combinations of group distributions $\boldsymbol{\lambda} \in \Delta_K$, where the objective is $\sum_{i \in [K]} \lambda_{[i]} \, \mathbb{E}_{(\mathbf{x}, y) \sim \mathcal{p}_{[i]}} \big[(\sigma(\mathbf{w} \cdot \mathbf{x}) - y)^2\big] - \nu \, d_f\big(\boldsymbol{\lambda}, \tfrac{1}{K}\mathbf{1}\big)$ and $d_f$ is an $f$-divergence that imposes an optional penalty, scaled by a parameter $\nu \geq 0$, on deviations from uniform group weights. We develop a computationally efficient primal-dual algorithm that outputs a vector $\widehat{\mathbf{w}}$ that is constant-factor competitive with $\mathbf{w}_*$ under the worst-case group weighting. Our analytical framework directly confronts the inherent nonconvexity of the loss function, providing robust learning guarantees in the face of arbitrary label corruptions and group-specific distributional shifts. An implementation of the dual extrapolation update motivated by our algorithmic framework shows promise on LLM pre-training benchmarks.
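The following is a minimal sketch of the min-max objective above. It is not the authors' algorithm. It rests on several illustrative assumptions: a logistic activation for $\sigma$, finite samples standing in for each $\mathcal{p}_{[i]}$, and the KL divergence ($f(t) = t \log t$) as $d_f$. With KL and $\nu > 0$, the inner maximization over $\Delta_K$ has the closed form $\lambda_{[i]} \propto \exp(L_i / \nu)$, where $L_i$ is the loss of group $i$. The outer loop is a plain alternating scheme (exact dual best response, then one primal gradient step), not the dual extrapolation update the abstract refers to. All function names are hypothetical.

```python
import math

def sigma(t):
    # Assumed activation: numerically stable logistic sigmoid.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def group_losses(w, groups):
    """Per-group empirical squared loss: mean of (sigma(w.x) - y)^2."""
    return [sum((sigma(dot(w, x)) - y) ** 2 for x, y in g) / len(g) for g in groups]

def kl_to_uniform(lam):
    """Assumed d_f: KL(lam || uniform) = sum_i lam_i * log(K * lam_i)."""
    K = len(lam)
    return sum(l * math.log(K * l) for l in lam if l > 0)

def objective(w, lam, groups, nu):
    """Penalized group-weighted loss for fixed weights lam on the simplex."""
    losses = group_losses(w, groups)
    return sum(l * L for l, L in zip(lam, losses)) - nu * kl_to_uniform(lam)

def best_response(losses, nu):
    """Exact maximizer over the simplex for the KL penalty (requires nu > 0)."""
    m = max(losses)  # shift by the max for numerical stability
    e = [math.exp((L - m) / nu) for L in losses]
    s = sum(e)
    return [v / s for v in e]

def primal_step(w, lam, groups, lr):
    """One gradient-descent step on w for the lam-weighted squared loss."""
    grad = [0.0] * len(w)
    for l, g in zip(lam, groups):
        for x, y in g:
            p = sigma(dot(w, x))
            # d/dw (sigma(w.x) - y)^2 = 2 (p - y) p (1 - p) x for the logistic sigma
            coef = l * 2.0 * (p - y) * p * (1.0 - p) / len(g)
            for j, xj in enumerate(x):
                grad[j] += coef * xj
    return [wj - lr * gj for wj, gj in zip(w, grad)]

def run(groups, dim, nu=0.5, lr=0.5, steps=200):
    """Alternate a dual best response with a primal gradient step."""
    w = [0.0] * dim
    for _ in range(steps):
        lam = best_response(group_losses(w, groups), nu)
        w = primal_step(w, lam, groups, lr)
    return w, best_response(group_losses(w, groups), nu)
```

For example, `run([[((1.0, 0.0), 1.0)], [((0.0, 1.0), 0.0)]], dim=2)` returns a weight vector together with the group weights that are worst-case for it. Because the best response is exact here, the primal step is ordinary gradient descent on the smoothed worst-case loss $\nu \log \frac{1}{K}\sum_i \exp(L_i/\nu)$. Other choices of $f$, or $\nu = 0$, have no such closed form and need an iterative dual update instead.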