Unregularized limit of stochastic gradient method for Wasserstein distributionally robust optimization

Wasserstein distributionally robust optimization offers a framework for model fitting in machine learning under potential shifts in the data distribution. We study a regularized variant of this problem in which entropic smoothing yields a sampled approximation of the original objective. We establish convergence of the approximate gradients to subgradients of the unregularized objective as the regularization parameter vanishes, which enables convergence guarantees for stochastic gradient methods. We first obtain qualitative convergence results under general assumptions, and then provide convergence rates under additional regularity. In particular, we prove rates for the convergence of the unregularized objective values, up to sampling errors, when the regularization level is decreased across iterations. Our analysis yields byproducts of independent interest, including approximation results for the subdifferentials of maximum functions under smoothing and empirical lower bounds for dual solutions of Wasserstein distributionally robust optimization.
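
For orientation, the entropic smoothing mentioned above can be sketched in a standard form from the literature. The notation here (loss f_θ, transport cost c, radius ρ, multiplier λ, reference kernel π_0, smoothing level ε, sample size m) is assumed for illustration and does not come from the abstract; the exact formulation studied in the paper may differ.

```latex
% Assumed notation; a sketch, not necessarily the paper's exact formulation.
% Dual form of the Wasserstein robust objective (standard under mild conditions):
\sup_{Q:\, W_c(P,Q)\le\rho} \mathbb{E}_{\zeta\sim Q}\big[f_\theta(\zeta)\big]
  = \inf_{\lambda\ge 0}\; \lambda\rho
  + \mathbb{E}_{\xi\sim P}\Big[\sup_{\zeta}\,\big\{f_\theta(\zeta)-\lambda\, c(\xi,\zeta)\big\}\Big].

% Entropic smoothing replaces the inner supremum by a log-partition term,
% which can be estimated from samples \zeta_1,\dots,\zeta_m \sim \pi_0(\cdot\mid\xi):
\varepsilon \log \mathbb{E}_{\zeta\sim\pi_0(\cdot\mid\xi)}
  \exp\!\Big(\frac{f_\theta(\zeta)-\lambda\, c(\xi,\zeta)}{\varepsilon}\Big)
  \;\approx\;
  \varepsilon \log \frac{1}{m}\sum_{j=1}^{m}
  \exp\!\Big(\frac{f_\theta(\zeta_j)-\lambda\, c(\xi,\zeta_j)}{\varepsilon}\Big).

% Elementary bound for a finite maximum, with a_j = f_\theta(\zeta_j)-\lambda\, c(\xi,\zeta_j):
\max_{j} a_j - \varepsilon\log m
  \;\le\; \varepsilon \log \frac{1}{m}\sum_{j=1}^{m} e^{a_j/\varepsilon}
  \;\le\; \max_{j} a_j .
```

The last bound shows that the sampled smoothed value approaches the sampled maximum as ε tends to 0. Its gradient in θ is a softmax-weighted average of the gradients of f_θ at the samples ζ_j, and these weights concentrate on the maximizing samples as ε decreases, which is the mechanism behind the convergence of approximate gradients to subgradients.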