This paper studies efficient online algorithms for multinomial logistic bandits (MLogB), where the feedback distribution over K+1 outcomes follows a multinomial logistic model of d-dimensional action vectors. A representative UCB-type algorithm, OFUL-MLogB, achieves a regret bound of \tilde{O}(Kd\sqrt{T}), but still requires O(K^3d^3) time and O(K^2d^2) space per round due to parameter estimation and optimistic reward construction, which is prohibitive in high-dimensional settings. To address this limitation, we propose EOFD-MLogB, which integrates frequent directions matrix sketching into OFUL-MLogB. By maintaining a low-rank SVD sketch of the accumulated Hessian, constrained online Newton updates in parameter estimation and Kd \times K spectral-norm computations in the reward bonus are reduced to one-dimensional root-finding tasks and K \times K eigenvalue computations, respectively. This yields dominant per-round time complexity O(Kd(m+K)^2) and space complexity O(Kd(m+K)), where m \ll d is the sketch size. We further prove a regret bound of \tilde{O}(Δ_T(Kd\lnΔ_T+m)\sqrt{T}), where the sketching error factor Δ_T is controlled by the m-truncated spectral tail of the Hessian. Thus, when the Hessian is approximately low-rank, the regret is close to that of OFUL-MLogB. Experiments validate the computational efficiency and competitive performance.
Efficient Multinomial Logistic Bandit via Frequent Directions
This paper studies efficient online algorithms for multinomial logistic bandits (MLogB), where the feedback distribution over $K+1$ outcomes follows a multinomial logistic model of $d$-dimensional action vectors.
- Preview

- Year
- 2026
- Hosting
- Full text hostedCC-BY-4.0
Cite
Notes
Only stored in your browser.
Attribution
- Abstract & full text
- arxiv.org/abs/2606.11968CC-BY-4.0
- TL;DR
- Semantic Scholar