IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca

Optimistic Information-Directed Sampling / G. Neu, M. Papini, L. Schwartz. - (Proceedings of Machine Learning Research). - In: Proceedings of Thirty Seventh Conference on Learning Theory. - [s.l.] : PMLR, 2023. - pp. 3970-4006. (37th Conference on Learning Theory: July, 30th - 3rd August, Edmonton (Alberta, Canada), 2023.)

Optimistic Information-Directed Sampling

G. Neu (first author); M. Papini (second-to-last author); L. Schwartz (last author)

2023

Abstract

We study online learning in contextual bandit problems where the loss function is assumed to belong to a known parametric function class. We propose a new analytic framework for this setting that bridges the Bayesian theory of information-directed sampling due to Russo and Van Roy (2018) and the worst-case theory of Foster et al. (2021) based on the decision-estimation coefficient. Drawing from both lines of work, we propose an algorithmic template called Optimistic Information-Directed Sampling and show that it can achieve instance-dependent regret guarantees similar to those achievable by the classic Bayesian IDS method, with the major advantage of not requiring any Bayesian assumptions. The key technical innovation of our analysis is an optimistic surrogate model for the regret, which we use to define a frequentist version of the Information Ratio of Russo and Van Roy (2018) and a less conservative version of the Decision Estimation Coefficient of Foster et al. (2021).

Scientific-disciplinary sectors of the contribution (valid from 09/05/2024):
	Sector IINF-05/A - Information Processing Systems
	Sector INFO-01/A - Computer Science
Project title: Provably Efficient Algorithms for Large-Scale Reinforcement Learning
Acronym: SCALER
Funder: European Commission
Funding programme: Horizon 2020 Framework Programme
Contract no.: 950180
Publication date: 2023
URL: https://proceedings.mlr.press/v247/neu24a.html
Type: Book Part (author)
Appears in types: 03 - Contribution in volume

Files in this item:

File	Size	Format
neu24a.pdf (open access; Type: Publisher's version/PDF; License: Creative Commons)	402.46 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1226179
