
Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees / A. Tirinzoni, M. Papini, A. Touati, A. Lazaric, M. Pirotta (Advances in Neural Information Processing Systems). In: 36th Conference on Neural Information Processing Systems (NeurIPS 2022), edited by S. Koyejo, S. Mohamed. [s.l.]: Neural Information Processing Systems Foundation, 2022. ISBN 9781713871088, pp. 2307-2319. Presented at the 36th Conference on Neural Information Processing Systems (NeurIPS 2022), New Orleans, 2022.

Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees

M. Papini (second author)
2022

Abstract

We study the problem of representation learning in stochastic contextual linear bandits. While the primary concern in this domain is usually to find realizable representations (i.e., those that allow predicting the reward function at any context-action pair exactly), it has recently been shown that representations with certain spectral properties (called HLS) may be more effective for the exploration-exploitation task, enabling LinUCB to achieve constant (i.e., horizon-independent) regret. In this paper, we propose BanditSRL, a representation learning algorithm that combines a novel constrained optimization problem, which learns a realizable representation with good spectral properties, with a generalized likelihood ratio test that exploits the recovered representation and avoids excessive exploration. We prove that BanditSRL can be paired with any no-regret algorithm and achieves constant regret whenever an HLS representation is available. Furthermore, BanditSRL can easily be combined with deep neural networks, and we show how regularizing towards HLS representations is beneficial in standard benchmarks.
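The abstract's central object is LinUCB, the optimistic algorithm whose regret becomes horizon-independent under an HLS representation. As a point of reference, here is a minimal sketch of standard LinUCB on a fixed finite context/action set; the `features` array, reward function, and hyperparameters are illustrative assumptions, and this is generic LinUCB, not the paper's BanditSRL method.

```python
import numpy as np

def linucb(features, rewards_fn, T, reg=1.0, alpha=1.0):
    """Minimal LinUCB sketch on a finite context/action set.

    features: array of shape (n_contexts, n_actions, d), mapping each
        context-action pair to a d-dimensional representation phi(x, a).
    rewards_fn: callable (context_idx, action_idx) -> observed reward.
    """
    n_ctx, n_act, d = features.shape
    A = reg * np.eye(d)        # regularized design matrix sum phi phi^T
    b = np.zeros(d)            # running sum of reward-weighted features
    total_reward = 0.0
    rng = np.random.default_rng(0)
    for t in range(T):
        x = rng.integers(n_ctx)               # stochastic context
        theta_hat = np.linalg.solve(A, b)     # ridge estimate of theta*
        A_inv = np.linalg.inv(A)
        # Per-action optimistic index: estimate + exploration bonus
        bonus = np.sqrt(np.einsum("ad,de,ae->a",
                                  features[x], A_inv, features[x]))
        a = int(np.argmax(features[x] @ theta_hat + alpha * bonus))
        phi = features[x, a]
        r = rewards_fn(x, a)
        A += np.outer(phi, phi)               # rank-one design update
        b += r * phi
        total_reward += r
    return theta_hat, total_reward
```

Under an HLS representation, the design matrix grows in every direction even when only optimal actions are played, which is what drives the exploration bonus (and hence the per-step regret) to vanish.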
Sector IINF-05/A - Information Processing Systems
Sector INFO-01/A - Computer Science
2022
https://proceedings.neurips.cc/paper_files/paper/2022/hash/0fd489e5e393f61b355be86ed4c24a54-Abstract-Conference.html
Book Part (author)
Files in this record:
NeurIPS-2022-scalable-representation-learning-in-linear-contextual-bandits-with-constant-regret-guarantees-Paper-Conference.pdf
  Access: open access
  Type: Post-print, accepted manuscript, etc. (version accepted by the publisher)
  License: Creative Commons
  Size: 1.14 MB
  Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1226142
Citations
  • PMC: not available
  • Scopus: 2
  • Web of Science: 0
  • OpenAlex: not available