TAMUNA: Doubly Accelerated Distributed Optimization under Partial Participation

In distributed optimization and federated learning, slow and costly communication between parallel devices and the central server constitutes the primary bottleneck. To alleviate this burden, two strategies have emerged: 1) local training (LT), which reduces communication frequency by performing multiple local computations between rounds, and 2) compression (CC), which consists of transmitting lower-dimensional, compact representations. Recent theoretical advances have successfully combined LT and CC to achieve doubly-accelerated communication rates, with respect to both condition number and model dimension. However, these methods have a major drawback: they require full client participation and break down when idle clients miss communication triggers. We introduce TAMUNA, the first algorithm to successfully intertwine LT, CC, and partial participation. By decoupling primal model updates from dual control variates, TAMUNA overcomes the architectural deadlock of prior methods. In the strongly convex setting, TAMUNA converges linearly to the exact solution, establishing a new state of the art by exhibiting doubly-accelerated convergence, while supporting arbitrary levels of client participation.