
An Algorithm for Stochastic and Adversarial Bandits with Switching Costs / C. Rouyer, Y. Seldin, N. Cesa-Bianchi (PROCEEDINGS OF MACHINE LEARNING RESEARCH). - In: International Conference on Machine Learning / [edited by] M. Meila, T. Zhang. - [s.l.] : PMLR, 2021. - pp. 9127-9135 (( Conference: International Conference on Machine Learning.

An Algorithm for Stochastic and Adversarial Bandits with Switching Costs

N. Cesa-Bianchi
2021

Abstract

We propose an algorithm for stochastic and adversarial multiarmed bandits with switching costs, where the algorithm pays a price $\lambda$ every time it switches the arm being played. Our algorithm is based on an adaptation of the Tsallis-INF algorithm of Zimmert and Seldin (2021) and requires no prior knowledge of the regime or time horizon. In the oblivious adversarial setting it achieves the minimax optimal regret bound of $O\big((\lambda K)^{1/3} T^{2/3} + \sqrt{KT}\big)$, where $T$ is the time horizon and $K$ is the number of arms. In the stochastically constrained adversarial regime, which includes the stochastic regime as a special case, it achieves a regret bound of $O\big(\big((\lambda K)^{2/3} T^{1/3} + \ln T\big) \sum_{i \neq i^*} \Delta_i^{-1}\big)$, where $\Delta_i$ are the suboptimality gaps and $i^*$ is the unique optimal arm. In the special case of $\lambda = 0$ (no switching costs), both bounds are minimax optimal within constants. We also explore variants of the problem where the switching cost is allowed to change over time. We provide an experimental evaluation showing the competitiveness of our algorithm with the relevant baselines in the stochastic, stochastically constrained adversarial, and adversarial regimes with a fixed switching cost.
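For orientation only, the Python sketch below implements the base 1/2-Tsallis-INF sampling rule of Zimmert and Seldin (2021), which the paper adapts, and runs it on a toy problem where a cost λ is charged whenever the played arm changes. It is not the paper's algorithm: the switching-control mechanism that yields the bounds above is not reproduced, and the learning-rate schedule $\eta_t = 2/\sqrt{t}$, the helper names, and the toy instance are assumptions made for illustration.

```python
# Illustrative sketch only (not the paper's algorithm): base Tsallis-INF sampling
# of Zimmert and Seldin (2021), played on a bandit instance where a cost lam is
# paid each time the chosen arm changes. The paper's adaptation, which controls
# how often the arm switches, is NOT reproduced here.
import numpy as np

rng = np.random.default_rng(0)

def tsallis_inf_probs(cum_loss_est, eta, newton_iters=50):
    """1/2-Tsallis-INF distribution: p_i = 4 / (eta * (Lhat_i - x))^2, with the
    normalization x found by Newton's method so that the p_i sum to one."""
    x = np.min(cum_loss_est) - 2.0 / eta  # start where the weights sum to >= 1
    for _ in range(newton_iters):
        w = 4.0 / (eta * (cum_loss_est - x)) ** 2
        x -= (w.sum() - 1.0) / (eta * np.sum(w ** 1.5))
    w = 4.0 / (eta * (cum_loss_est - x)) ** 2
    return w / w.sum()

def run_bandit(losses, lam=0.1):
    """Play a T x K matrix of losses in [0, 1]; total cost adds lam per switch."""
    T, K = losses.shape
    Lhat = np.zeros(K)          # importance-weighted cumulative loss estimates
    prev_arm, total_cost, switches = None, 0.0, 0
    for t in range(1, T + 1):
        p = tsallis_inf_probs(Lhat, eta=2.0 / np.sqrt(t))  # assumed schedule
        arm = rng.choice(K, p=p)
        loss = losses[t - 1, arm]
        if prev_arm is not None and arm != prev_arm:
            total_cost += lam   # switching cost
            switches += 1
        total_cost += loss
        Lhat[arm] += loss / p[arm]  # importance-weighted loss estimate
        prev_arm = arm
    return total_cost, switches

# Toy stochastic instance: arm 0 is best, the other arms have gap 0.2.
T, K = 5000, 4
means = np.array([0.3, 0.5, 0.5, 0.5])
losses = rng.binomial(1, means, size=(T, K)).astype(float)
print(run_bandit(losses, lam=0.1))
```

Running the sketch shows how the term λ times the number of switches enters the total cost that the regret bounds quoted in the abstract control; the unmodified baseline above typically switches often, which is exactly what the paper's adaptation is designed to limit.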
Academic discipline: INF/01 - Informatica (Computer Science)
   European Learning and Intelligent Systems Excellence (ELISE)
   ELISE
   EUROPEAN COMMISSION
   H2020
   951847

   Algorithms, Games, and Digital Markets (ALGADIMAR)
   ALGADIMAR
   MINISTERO DELL'ISTRUZIONE E DEL MERITO
   2017R9FHSR_006
2021
http://proceedings.mlr.press/v139/rouyer21a/rouyer21a.pdf
Book Part (author)
Files in this record:
File: rouyer21a.pdf
Access: open access
Type: Publisher's version/PDF
Format: Adobe PDF
Size: 429.45 kB

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/857465
Citations
  • PMC: ND
  • Scopus: 4
  • Web of Science (ISI): 0