Aggregation with Exponential Weights is Optimal in Expectation

The aggregation with exponential weights (AEW) estimator is not fully understood even in the basic setting of model selection aggregation with squared loss. In particular, whether it is minimax-rate optimal in expectation for sufficiently large fixed temperatures under random design has been an open problem since the estimator was introduced; the question was explicitly posed by Lecué and Mendelson (2013). In this paper, we settle this problem by showing that, without requiring a Bernstein-type assumption, AEW indeed achieves excess risk $T\log(M)/(n+1)$ in expectation whenever the temperature $T$ satisfies $(L^2/T)\exp(B/T)\leq \mu/2$. Here, $M$ is the number of dictionary elements, $n$ is the number of i.i.d. samples the estimator has observed from an arbitrary distribution, and the loss is assumed to be bounded by $B$, $L$-Lipschitz continuous, and $\mu$-strongly convex. For squared loss, we show that $T\geq 4b^2$ suffices when the predictions and labels take values in $[0,b]$. Because AEW is known to be suboptimal in expectation for temperatures below some constant, this shows that AEW exhibits a sharp phase transition, becoming optimal once the temperature exceeds a sufficiently large constant, as conjectured by Lecué and Mendelson.
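The abstract does not spell out the AEW weights, so the following is only a minimal sketch under an assumption: it uses the standard form in which dictionary element $f_j$ receives weight proportional to $\exp(-\sum_{i=1}^n \ell(f_j(X_i), Y_i)/T)$. It also checks the sufficient temperature condition $(L^2/T)\exp(B/T)\leq \mu/2$ numerically for user-supplied constants. All function names (`aew_weights`, `temperature_condition`, and so on) are hypothetical, and the values of $L$, $B$, $\mu$ for a particular loss depend on conventions (for example, how strong convexity is normalised) that the abstract does not fix.

```python
import math
from typing import Callable, List, Sequence

Predictor = Callable[[float], float]


def squared_loss(prediction: float, label: float) -> float:
    return (prediction - label) ** 2


def aew_weights(dictionary: Sequence[Predictor], xs: Sequence[float],
                ys: Sequence[float], loss: Callable[[float, float], float],
                temperature: float) -> List[float]:
    """Assumed AEW form: theta_j proportional to exp(-cumulative loss of f_j / T)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    cum_losses = [sum(loss(f(x), y) for x, y in zip(xs, ys)) for f in dictionary]
    # Shift by the minimum before exponentiating to avoid underflow; the common
    # factor exp(m / T) cancels when the weights are normalised.
    m = min(cum_losses)
    unnormalised = [math.exp(-(c - m) / temperature) for c in cum_losses]
    z = sum(unnormalised)
    return [u / z for u in unnormalised]


def aew_predict(dictionary: Sequence[Predictor], weights: Sequence[float],
                x: float) -> float:
    """Convex combination of the dictionary predictions at x."""
    return sum(w * f(x) for w, f in zip(weights, dictionary))


def temperature_condition(T: float, L: float, B: float, mu: float) -> bool:
    """Check (L^2 / T) * exp(B / T) <= mu / 2, compared in log space to avoid overflow."""
    return 2 * math.log(L) - math.log(T) + B / T <= math.log(mu / 2)


def smallest_admissible_temperature(L: float, B: float, mu: float,
                                    rel_tol: float = 1e-12) -> float:
    """Smallest T satisfying the condition, found by bisection.

    For L > 0 and B >= 0 the left-hand side is strictly decreasing in T and
    tends to infinity as T -> 0+, so the admissible temperatures form [T*, inf).
    """
    if L <= 0 or mu <= 0 or B < 0:
        raise ValueError("need L > 0, mu > 0, B >= 0")
    hi = 1.0
    while not temperature_condition(hi, L, B, mu):
        hi *= 2.0  # grow until the condition holds
    lo = 0.0  # the condition fails for all T near 0
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if temperature_condition(mid, L, B, mu):
            hi = mid
        else:
            lo = mid
    return hi


# Illustrative constants only (not taken from the paper).
print(smallest_admissible_temperature(L=1.0, B=1.0, mu=1.0))
```

Because the left-hand side of the condition decreases monotonically in $T$, any temperature above the computed threshold also satisfies it, which is why a simple bisection is enough here.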