A Complete Characterization of Learnability for Adversarial Noisy Bandits

We study adversarial noisy bandits given a known function class F. In each round, the adversary selects a function f \in F, the learner chooses an arm, and then observes a noisy reward determined by the chosen arm and the function f. The goal is to minimize the cumulative regret R(T), defined as the difference between the learner's performance and that of the best fixed arm in hindsight over T rounds. We say that a function class F is learnable if there exists an algorithm achieving sublinear regret. Our main result is a complete characterization of learnability for adversarial noisy bandits. The characterization is given in terms of a convexified variant of the generalized maximin volume introduced by Hanneke and Wang (2025): namely, the generalized maximin volume evaluated on the convex hull \operatorname{co}(\mathcal F). We prove that \mathcal F is learnable if and only if this convexified generalized maximin volume is positive at every scale. This condition characterizes learnability against both oblivious and adaptive adversaries, showing in particular that these two notions of learnability are equivalent in the noisy bandit setting. Our analysis reveals that the key complexity measure is closely connected to two new combinatorial notions, hitting set and distribution covering number, which may be of independent interest. These results establish the first complete characterization of learnability for adversarial noisy bandits.