We study best-of-both-worlds algorithms for $K$-armed linear contextual bandits. Our algorithms deliver near-optimal regret bounds in both the adversarial and stochastic regimes, without prior knowledge about the environment. In the stochastic regime, we achieve the polylogarithmic rate $\frac{(dK)^2\mathrm{poly}\!\log(dKT)}{\Delta_{\min}}$, where $\Delta_{\min}$ is the minimum suboptimality gap over the $d$-dimensional context space. In the adversarial regime, we obtain either the first-order $\widetilde{\mathcal{O}}(dK\sqrt{L^*})$ bound, or the second-order $\widetilde{\mathcal{O}}(dK\sqrt{\Lambda^*})$ bound, where $L^*$ is the cumulative loss of the best action and $\Lambda^*$ is a notion of the cumulative second moment for the losses incurred by the algorithm. Moreover, we develop an algorithm based on FTRL with Shannon entropy regularizer that does not require the knowledge of the inverse of the covariance matrix, and achieves a polylogarithmic regret in the stochastic regime while obtaining $\widetilde{\mathcal{O}}\big(dK\sqrt{T}\big)$ regret bounds in the adversarial regime.
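In the full-information $K$-armed case (no contexts), FTRL with a Shannon entropy regularizer reduces to the classical exponential-weights update $p_t(i) \propto \exp(-\eta \hat{L}_{t-1}(i))$. The sketch below is illustrative only, not the paper's contextual-bandit algorithm; the learning rate `eta` and the loss model are assumptions, and a bandit variant would replace the observed loss vector with an importance-weighted estimate.

```python
import numpy as np

def ftrl_shannon(loss_fn, K, T, eta, seed=0):
    """Run FTRL with Shannon entropy regularizer (= exponential weights)
    for T rounds over K arms; returns the cumulative loss incurred.

    loss_fn(t) must return the length-K loss vector of round t.
    """
    rng = np.random.default_rng(seed)
    cum = np.zeros(K)      # cumulative observed losses per arm
    total = 0.0
    for t in range(T):
        # FTRL with entropy regularizer gives the softmax of -eta * cum;
        # subtracting the min is a standard numerical-stability trick.
        w = np.exp(-eta * (cum - cum.min()))
        p = w / w.sum()
        a = rng.choice(K, p=p)
        losses = np.asarray(loss_fn(t))
        total += losses[a]
        # Full-information update for simplicity; the bandit setting would
        # use an importance-weighted loss estimate for the played arm only.
        cum += losses
    return total
```

For instance, when one arm always has loss 0 and the others loss 1, the distribution concentrates on the good arm exponentially fast, so the cumulative loss stays bounded well below $T$.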

Best-of-Both-Worlds Algorithms for Linear Contextual Bandits / Y. Kuroki, A. Rumi, T. Tsuchiya, F. Vitale, N. Cesa-Bianchi (PROCEEDINGS OF MACHINE LEARNING RESEARCH). - In: Proceedings of The International Conference on Artificial Intelligence and Statistics / edited by S. Dasgupta, S. Mandt, Y. Li. - [s.l.] : ML Research Press, 2024. - pp. 1216-1224. Presented at the 27th International Conference on Artificial Intelligence and Statistics, Valencia, 2024.

Best-of-Both-Worlds Algorithms for Linear Contextual Bandits

A. Rumi (second author); N. Cesa-Bianchi (last author)
2024

Research area: INFO-01/A - Computer Science
Funding:
- Learning in Markets and Society — MINISTERO DELL'UNIVERSITA' E DELLA RICERCA, grant 2022EKNE5K_001
- European Lighthouse of AI for Sustainability (ELIAS) — EUROPEAN COMMISSION, grant 101120237
https://proceedings.mlr.press/v238/kuroki24a.html
Book Part (author)
Files in this item:
kuroki24a.pdf — open access — Publisher's version/PDF — 398.17 kB — Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1122718
Citations:
  • PMC: n/a
  • Scopus: 4
  • ISI: 0
  • OpenAlex: n/a