Optimal cross-learning for contextual bandits with unknown context distributions
We consider the problem of designing contextual bandit algorithms in the "cross-learning" setting of Balseiro et al., where the learner observes the loss for the action they play in all possible contexts, not just the context of the current round.
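As a rough illustration of the feedback model described above (not the paper's algorithm), the sketch below contrasts standard bandit feedback with cross-learning feedback. The loss table, its dimensions, and the function names `bandit_feedback` and `cross_learning_feedback` are assumptions made up for this example.

```python
import random

def bandit_feedback(losses, context, action):
    """Standard contextual bandit: one loss, for the played action in the current context."""
    return losses[context][action]

def cross_learning_feedback(losses, action):
    """Cross-learning: the loss of the played action in every context."""
    return [row[action] for row in losses]

# Hypothetical loss table for one round: losses[c][a] is the loss of action a in context c.
random.seed(0)
NUM_CONTEXTS, NUM_ACTIONS = 3, 4
losses = [[random.random() for _ in range(NUM_ACTIONS)] for _ in range(NUM_CONTEXTS)]

context = random.randrange(NUM_CONTEXTS)  # context drawn for the current round
action = 2                                # action chosen by the learner (arbitrary here)

single = bandit_feedback(losses, context, action)
column = cross_learning_feedback(losses, action)

print("bandit feedback:", single)
print("cross-learning feedback:", column)
```

The cross-learning observation contains the standard bandit observation as one of its entries, so the learner gets information about contexts it did not see in the current round.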
- Year: 2024