IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca

Nonstationary phenomena, such as satiation effects in recommendations, have mostly been modeled using bandits with finitely many arms. However, the richer action space provided by linear bandits is often preferred in practice. In this work, we introduce a novel nonstationary linear bandit model, where current rewards are influenced by the learner’s past actions in a fixed-size window. Our model, which recovers stationary linear bandits as a special case, leverages two parameters: the window size m ≥ 0, and an exponent γ that captures the rotting (γ < 0) or rising (γ > 0) nature of the phenomenon. When both m and γ are known, we propose and analyze a variant of OFUL which minimizes regret against cyclic policies. By choosing the cycle length so as to trade-off approximation and estimation errors, we then prove a bound of order √d (m + 1) 1 2 +max{γ,0} T 3/4 (ignoring log factors) on the regret against the optimal sequence of actions, where T is the horizon and d is the dimension of the linear action space. Through a bandit model selection approach, our results are then extended to the case where both m and γ are unknown. Finally, we complement our theoretical results with experiments comparing our approach to natural baselines

Linear Bandits with Memory / G. Clerici, P. Laforgue, N. Cesa Bianchi. - In: TRANSACTIONS ON MACHINE LEARNING RESEARCH. - ISSN 2835-8856. - 5(2024), pp. 1-26.

Linear Bandits with Memory

G. Clerici;P. Laforgue;N. Cesa Bianchi

2024

Abstract

Nonstationary phenomena, such as satiation effects in recommendations, have mostly been modeled using bandits with finitely many arms. However, the richer action space provided by linear bandits is often preferred in practice. In this work, we introduce a novel nonstationary linear bandit model, where current rewards are influenced by the learner’s past actions in a fixed-size window. Our model, which recovers stationary linear bandits as a special case, leverages two parameters: the window size m ≥ 0, and an exponent γ that captures the rotting (γ < 0) or rising (γ > 0) nature of the phenomenon. When both m and γ are known, we propose and analyze a variant of OFUL which minimizes regret against cyclic policies. By choosing the cycle length so as to trade-off approximation and estimation errors, we then prove a bound of order √d (m + 1) 1 2 +max{γ,0} T 3/4 (ignoring log factors) on the regret against the optimal sequence of actions, where T is the horizon and d is the dimension of the linear action space. Through a bandit model selection approach, our results are then extended to the case where both m and γ are unknown. Finally, we complement our theoretical results with experiments comparing our approach to natural baselines

Scheda breve

Scheda completa

Scheda completa (DC)

	Settori scientifico-disciplinari dell'articolo (sola visualizzazione)
	
				Settore INF/01 - Informatica
			
	Settori scientifico-disciplinari dell'articolo (validi dal 09/05/2024)
	
				Settore INFO-01/A - Informatica
			
	Titolo del progetto
	
	Titolo Progetto
	
									Learning in Markets and Society
								
	Nome finanziatore
	
										MINISTERO DELL'UNIVERSITA' E DELLA RICERCA
									
	N. Contratto
	
									2022EKNE5K_001
								
	Titolo Progetto
	
									European Lighthouse of AI for Sustainability (ELIAS)
								
	Acronimo
	
									ELIAS
								
	Nome finanziatore
	
										EUROPEAN COMMISSION
									
	N. Contratto
	
									101120237
								
	Data di pubblicazione
	
				2024
			
	Rivista in ANCE
	
				TRANSACTIONS ON MACHINE LEARNING RESEARCH
			
	URL
	
				https://openreview.net/forum?id=CrpDwMFgxr
			
	Tipologia
	
				Article (author)
			
	Appare nelle tipologie:
	
				01 - Articolo su periodico

File in questo prodotto:

File	Dimensione	Formato
2053_Linear_Bandits_with_Memor.pdf accesso aperto Tipologia: Post-print, accepted manuscript ecc. (versione accettata dall'editore) Dimensione 4.17 MB Formato Adobe PDF Visualizza/Apri	4.17 MB	Adobe PDF	Visualizza/Apri

Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/1087071

Citazioni

ND

ND

ND

social impact