The Survival Bandit Problem

We introduce and study a new variant of the multi-armed bandit problem (MAB), called the survival bandit problem (S-MAB). In both problems the objective is to maximize the so-called cumulative reward, but in this new variant the procedure is interrupted if the cumulative…

Year: 2022
ArXiv: arxiv.org/abs/2206.03019
URL: arxiv.org/abs/2206.03019v4
Hosting: External source, license unknown

Attribution

Abstract & full text: arxiv.org/abs/2206.03019v4
TL;DR: Semantic Scholar
