Decentralized Learning for Multi-player Multi-armed Bandits

We consider the problem of distributed online learning with multiple players in multi-armed bandit (MAB) models. Each player can pick among multiple arms, and receives a reward when it picks one. We consider both an i.i.d. reward model and a Markovian reward model.

Year: 2012
ArXiv: arxiv.org/abs/1206.3582
Hosting: External source, license unknown
