Clipping the Price of Adaptivity at the Tail

Adaptive stochastic convex optimization (SCO) methods face a fundamental ``price of adaptivity'' barrier: under the standard set of assumptions, they cannot efficiently adapt to large uncertainty in both the initial distance to optimality and the Lipschitz constant. We circumvent this barrier by requiring a small amount of additional structure common to many learning problems. Specifically, we assume that the objective decomposes into a model and a loss function, enabling us to intervene by modifying the model's output before it passes to the loss function. Under this assumption, we design a method that clips the learned model output in tail events where it deviates too much from the output of a fixed reference model. Our method matches the optimal bounds for known-parameter SCO up to logarithmic factors in the uncertainty in the distance and Lipschitz parameters, thus efficiently adapting to large uncertainty in both.