Hierarchical Adversarial Bandits for Online Configuration Optimization

Motivated by Online Configuration Optimization in large, dynamic parameter spaces, this work studies the nonstochastic multi-armed bandit (MAB) problem in metric action spaces with oblivious Lipschitz adversaries. We propose ABoB (Adversarial Bandit over Bandits), a hierarchical framework that decomposes the configuration space into clusters to accelerate learning and adapt to changing environments. We evaluate ABoB using standard algorithms such as EXP3 and Tsallis-INF on a real-world production storage system, demonstrating significant performance gains of up to 50% compared to state-of-the-art "flat" bandit algorithms. Extensive simulations further confirm that ABoB effectively exploits metric structures, achieving up to 91% improvement in adversarial metric scenarios while significantly reducing computational running time. Theoretical analysis grounds this empirical success: we prove that ABoB maintains a worst-case "safety net" bound of O(\sqrt{kT}), matching traditional methods, where T is the number of rounds and k is the number of arms, while capable of accelerating learning to O(k^{1/4}\sqrt{T}) under favorable Lipschitz conditions. This combination of operational efficiency and theoretical soundness makes ABoB a practical solution for automated system tuning.