Near-Optimal Regret for Distributed Adversarial Bandits: A Black-Box Approach

We study distributed adversarial bandits, where N agents cooperate to minimize the global average loss while observing only their own local losses. We show that the minimax regret for this problem is \tildeΘ(\sqrt{(ρ^{-1/2}+K/N)T}), where T is the horizon, K is the number of actions, and ρ is the spectral gap of the communication matrix. Our algorithm, based on a novel black-box reduction to bandits with delayed feedback, requires agents to communicate only through gossip. It achieves an upper bound that significantly improves over the previous best bound \tilde{O}(ρ^{-1/3}(KT)^{2/3}) of Yi and Vojnovic (2023). We complement this result with a matching lower bound, showing that the problem's difficulty decomposes into a communication cost ρ^{-1/4}\sqrt{T} and a bandit cost \sqrt{KT/N}. We further demonstrate the versatility of our approach by deriving first-order and best-of-both-worlds bounds in the distributed adversarial setting. Finally, we extend our framework to distributed linear bandits in R^d, obtaining a regret bound of \tilde{O}(\sqrt{(ρ^{-1/2}+1/N)dT}), achieved with only O(d) communication cost per agent and per round via a volumetric spanner.