Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees / A. Tirinzoni, M. Papini, A. Touati, A. Lazaric, M. Pirotta. In: Advances in Neural Information Processing Systems, 36th Conference on Neural Information Processing Systems (NeurIPS 2022, New Orleans), edited by S. Koyejo, S. Mohamed. Neural Information Processing Systems Foundation, 2022. ISBN 9781713871088, pp. 2307-2319.
Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees
M. Papini (second author)
2022
Abstract
We study the problem of representation learning in stochastic contextual linear bandits. While the primary concern in this domain is usually to find realizable representations (i.e., those that allow predicting the reward function at any context-action pair exactly), it has been recently shown that representations with certain spectral properties (called HLS) may be more effective for the exploration-exploitation task, enabling LinUCB to achieve constant (i.e., horizon-independent) regret. In this paper, we propose BANDITSRL, a representation learning algorithm that combines a novel constrained optimization problem to learn a realizable representation with good spectral properties with a generalized likelihood ratio test to exploit the recovered representation and avoid excessive exploration. We prove that BANDITSRL can be paired with any no-regret algorithm and achieve constant regret whenever an HLS representation is available. Furthermore, BANDITSRL can be easily combined with deep neural networks and we show how regularizing towards HLS representations is beneficial in standard benchmarks.

| File | Size | Format |
|---|---|---|
| NeurIPS-2022-scalable-representation-learning-in-linear-contextual-bandits-with-constant-regret-guarantees-Paper-Conference.pdf | 1.14 MB | Adobe PDF |

Open access. Type: post-print / accepted manuscript (version accepted by the publisher). License: Creative Commons.
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.




