We introduce the first variance-aware algorithms for contextual dueling bandits that leverage shallow exploration strategies with neural networks for nonlinear utility approximation. A key theoretical challenge is the absence of a closed-form estimator, which led prior work to require an extremely large network width m (i.e., m = \widetildeΩ(T^{14})). We address this constraint with a novel analytical approach that combines iterative self-improvement with spectral analysis. Our analysis significantly reduces the network width requirement to m = \widetildeΩ(T^{6}), and shows that our algorithms achieve a sublinear regret of \widetilde{O}(d\sqrt{\sum_{t=1}^{T} σ_t^2} + \sqrt{dT}) under both UCB and TS frameworks. Empirical results show that the proposed algorithms are not only computationally efficient and exhibit sublinear regret in practical settings, but also achieve state-of-the-art performance on both synthetic and real-world tasks.
Neural Variance-aware Dueling Bandits with Deep Representation and Shallow Exploration
We introduce the first variance-aware algorithms for contextual dueling bandits that leverage shallow exploration strategies with neural networks for nonlinear utility approximation.
- Year
- 2025
- Hosting
- Full text hostedCC-BY-4.0
Cite
Notes
Only stored in your browser.
Attribution
- Abstract & full text
- arxiv.org/abs/2506.01250CC-BY-4.0
- TL;DR
- Semantic Scholar