This paper addresses the problem of minimizing a convex, Lipschitz function f over a convex, compact set \xset under a stochastic bandit feedback model. In this model, the algorithm is allowed to observe noisy realizations of the function value f(x) at any query point x \in \xset. The quantity of interest is the regret of the algorithm, which is the sum of the function values at algorithm's query points minus the optimal function value. We demonstrate a generalization of the ellipsoid algorithm that incurs \otil(\poly(d)\sqrt{T}) regret. Since any algorithm has regret at least Ω(\sqrt{T}) on this problem, our algorithm is optimal in terms of the scaling with T.
Stochastic convex optimization with bandit feedback
This paper addresses the problem of minimizing a convex, Lipschitz function $f$ over a convex, compact set $\xset$ under a stochastic bandit feedback model. In this model, the algorithm is allowed to observe noisy realizations of the function value $f(x)$ at any query point $x…
- Year
- 2011
- Hosting
- Abstract onlyARXIV-DEFAULT
Cite
Notes
Only stored in your browser.
Attribution
- Abstract & full text
- arxiv.org/abs/1107.1744ARXIV-DEFAULT
- TL;DR
- Semantic Scholar