IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca

We study the Bayesian regret of the renowned Thompson Sampling algorithm in contextual bandits with binary losses and adversarially-selected contexts. We adapt the information-theoretic perspective of Russo and Van Roy [2016] to the contextual setting by considering a lifted version of the information ratio defined in terms of the unknown model parameter instead of the optimal action or optimal policy as done in previous works on the same setting. This allows us to bound the regret in terms of the entropy of the prior distribution through a remarkably simple proof, and with no structural assumptions on the likelihood or the prior. The extension to priors with infinite entropy only requires a Lipschitz assumption on the log-likelihood. An interesting special case is that of logistic bandits with d -dimensional parameters, K actions, and Lipschitz logits, for which we provide a ~ O ( √ d K T ) regret upper-bound that does not depend on the smallest slope of the sigmoid link function.

Lifting the Information Ratio: An Information-Theoretic Analysis of Thompson Sampling for Contextual Bandits / G. Neu, M. Papini, J. Olkhovskaya, L. Schwartz (ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS). - In: NIPS '22: Proceedings / [a cura di] S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, A. Oh. - [s.l] : Neural information processing systems foundation, 2022. - ISBN 9781713871088. - pp. 9486-9498 (( 36. Conference on Neural Information Processing Systems (NeurIPS 2022) New Orleans 2022.

Lifting the Information Ratio: An Information-Theoretic Analysis of Thompson Sampling for Contextual Bandits

Neu G.;M. Papini^Secondo;Olkhovskaya J.;Schwartz L.

2022

Abstract

We study the Bayesian regret of the renowned Thompson Sampling algorithm in contextual bandits with binary losses and adversarially-selected contexts. We adapt the information-theoretic perspective of Russo and Van Roy [2016] to the contextual setting by considering a lifted version of the information ratio defined in terms of the unknown model parameter instead of the optimal action or optimal policy as done in previous works on the same setting. This allows us to bound the regret in terms of the entropy of the prior distribution through a remarkably simple proof, and with no structural assumptions on the likelihood or the prior. The extension to priors with infinite entropy only requires a Lipschitz assumption on the log-likelihood. An interesting special case is that of logistic bandits with d -dimensional parameters, K actions, and Lipschitz logits, for which we provide a ~ O ( √ d K T ) regret upper-bound that does not depend on the smallest slope of the sigmoid link function.

Scheda breve

Scheda completa

Scheda completa (DC)

	Settori scientifico-disciplinari del contributo (validi dal 09/05/2024)
	
				Settore INFO-01/A - Informatica
Settore IINF-05/A - Sistemi di elaborazione delle informazioni
			
	Data di pubblicazione
	
				2022
			
	URL
	
				https://proceedings.neurips.cc/paper_files/paper/2022/hash/3d84d9b523e6e82916d496e58761002e-Abstract-Conference.html
			
	Tipologia
	
				Book Part (author)
			
	Appare nelle tipologie:
	
				03 - Contributo in volume

File in questo prodotto:

File	Dimensione	Formato
NeurIPS-2022-lifting-the-information-ratio-an-information-theoretic-analysis-of-thompson-sampling-for-contextual-bandits-Paper-Conference.pdf accesso riservato Tipologia: Publisher's version/PDF Licenza: Nessuna licenza Dimensione 331.65 kB Formato Adobe PDF Visualizza/Apri Richiedi una copia	331.65 kB	Adobe PDF	Visualizza/Apri Richiedi una copia

Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/1226144

Citazioni

ND

7

ND

ND

social impact