Optimal cross-learning for contextual bandits with unknown context distributions

We consider the problem of designing contextual bandit algorithms in the "cross-learning" setting of Balseiro et al., where the learner observes the loss for the action they play in all possible contexts, not just the context of the current round.
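To make the difference in feedback concrete, here is a minimal sketch contrasting standard contextual bandit feedback with cross-learning feedback for a single round. It illustrates the feedback model only and is not the algorithm from the paper; the loss table, its dimensions, and the function names `bandit_feedback` and `cross_learning_feedback` are assumptions made for the example.

```python
import numpy as np


def bandit_feedback(losses, context, action):
    """Standard contextual bandit: the loss of the played action
    is revealed for the realized context only."""
    return {context: float(losses[context, action])}


def cross_learning_feedback(losses, action):
    """Cross-learning: the loss of the played action is revealed
    for every context, not just the realized one."""
    return {c: float(losses[c, action]) for c in range(losses.shape[0])}


rng = np.random.default_rng(0)
num_contexts, num_actions = 4, 3  # hypothetical sizes

# Hypothetical loss table for one round: rows are contexts, columns are actions.
losses = rng.uniform(size=(num_contexts, num_actions))

context = int(rng.integers(num_contexts))  # context realized this round
action = int(rng.integers(num_actions))    # action the learner plays

standard = bandit_feedback(losses, context, action)
cross = cross_learning_feedback(losses, action)

print("standard feedback:", standard)
print("cross-learning feedback:", cross)
```

Under these assumptions, the standard learner sees one number per round, while the cross-learning learner sees one number per context, all for the same played action.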

Year: 2024
ArXiv: arxiv.org/abs/2401.01857
URL: arxiv.org/abs/2401.01857v1
Hosting: External source; license unknown

Attribution

Abstract & full text: arxiv.org/abs/2401.01857v1
TL;DR: Semantic Scholar
