We study stochastic linear bandits where, in each round, the learner receives a set of actions (i.e., feature vectors), from which it chooses an element and obtains a stochastic reward. The expected reward is a fixed but unknown linear function of the chosen action. We study sparse regret bounds, that depend on the number S of non-zero coefficients in the linear reward function. Previous works focused on the case where S is known, or the action sets satisfy additional assumptions. In this work, we obtain the first sparse regret bounds that hold when S is unknown and the action sets are adversarially generated. Our techniques combine online to confidence set conversions with a novel randomized model selection approach over a hierarchy of nested confidence sets. When S is known, our analysis recovers state-of-the-art bounds for adversarial action sets. We also show that a variant of our approach, using Exp3 to dynamically select the confidence sets, can be used to improve the empirical performance of stochastic linear bandits while enjoying a regret bound with optimal dependence on the time horizon.

Sparsity-Agnostic Linear Bandits with Adaptive Adversaries / T. Jin, K. Jang, N. Cesa Bianchi (ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS). - In: Advances in Neural Information Processing Systems / [a cura di] A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang. - [s.l] : Curran Associates, Inc., 2024. - ISBN 9798331314385. - pp. 42015-42047 (( Intervento presentato al 38. convegno Annual Conference on Neural Information Processing Systems tenutosi a Vancouver nel 2024.

Sparsity-Agnostic Linear Bandits with Adaptive Adversaries

K. Jang;N. Cesa Bianchi
2024

Abstract

We study stochastic linear bandits where, in each round, the learner receives a set of actions (i.e., feature vectors), from which it chooses an element and obtains a stochastic reward. The expected reward is a fixed but unknown linear function of the chosen action. We study sparse regret bounds, that depend on the number S of non-zero coefficients in the linear reward function. Previous works focused on the case where S is known, or the action sets satisfy additional assumptions. In this work, we obtain the first sparse regret bounds that hold when S is unknown and the action sets are adversarially generated. Our techniques combine online to confidence set conversions with a novel randomized model selection approach over a hierarchy of nested confidence sets. When S is known, our analysis recovers state-of-the-art bounds for adversarial action sets. We also show that a variant of our approach, using Exp3 to dynamically select the confidence sets, can be used to improve the empirical performance of stochastic linear bandits while enjoying a regret bound with optimal dependence on the time horizon.
Settore INFO-01/A - Informatica
   Learning in Markets and Society
   MINISTERO DELL'UNIVERSITA' E DELLA RICERCA
   2022EKNE5K_001

   European Lighthouse of AI for Sustainability (ELIAS)
   ELIAS
   EUROPEAN COMMISSION
   101120237
2024
https://proceedings.neurips.cc/paper_files/paper/2024/file/4a36c3c51af11ed9f34615b81edb5bbc-Paper-Conference.pdf
Book Part (author)
File in questo prodotto:
File Dimensione Formato  
NeurIPS-2024-sparsity-agnostic-linear-bandits-with-adaptive-adversaries-Paper-Conference.pdf

accesso aperto

Tipologia: Publisher's version/PDF
Dimensione 723.2 kB
Formato Adobe PDF
723.2 kB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/1157683
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? ND
  • OpenAlex ND
social impact