IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca

We investigate a nonstochastic bandit setting in which the loss of an action is not immediately charged to the player, but rather spread over the subsequent rounds in an adversarial way. The instantaneous loss observed by the player at the end of each round is then a sum of many loss components of previously played actions. This setting encompasses as a special case the easier task of bandits with delayed feedback, a well-studied framework where the player observes the delayed losses individually. Our first contribution is a general reduction transforming a standard bandit algorithm into one that can operate in the harder setting: We bound the regret of the transformed algorithm in terms of the stability and regret of the original algorithm. Then, we show that tphe transformation of a suitably tuned FTRL with Tsallis entropy has a regret of order √(d+1)KT ($\sqrt{(d+1)KT}$), where d is the maximum delay, K is the number of arms, and T is the time horizon. Finally, we show that our results cannot be improved in general by exhibiting a matching (up to a log factor) lower bound on the regret of any algorithm operating in this setting.

Nonstochastic Bandits with Composite Anonymous Feedback / N. Cesa Bianchi, T.R. Cesari, R. Colomboni, C. Gentile, Y. Mansour. - In: JOURNAL OF MACHINE LEARNING RESEARCH. - ISSN 1533-7928. - 23:(2022 Aug), pp. 277.1-277.24. (Intervento presentato al 31. convegno Conference on Learning Theory (COLT) tenutosi a Stockholm nel 2018).

Nonstochastic Bandits with Composite Anonymous Feedback

N. Cesa Bianchi^Primo;T.R. Cesari^Secondo;R. Colomboni;C. Gentile^Penultimo;Yishay Mansour^Ultimo

2022

Abstract

We investigate a nonstochastic bandit setting in which the loss of an action is not immediately charged to the player, but rather spread over the subsequent rounds in an adversarial way. The instantaneous loss observed by the player at the end of each round is then a sum of many loss components of previously played actions. This setting encompasses as a special case the easier task of bandits with delayed feedback, a well-studied framework where the player observes the delayed losses individually. Our first contribution is a general reduction transforming a standard bandit algorithm into one that can operate in the harder setting: We bound the regret of the transformed algorithm in terms of the stability and regret of the original algorithm. Then, we show that tphe transformation of a suitably tuned FTRL with Tsallis entropy has a regret of order √(d+1)KT ($\sqrt{(d+1)KT}$), where d is the maximum delay, K is the number of arms, and T is the time horizon. Finally, we show that our results cannot be improved in general by exhibiting a matching (up to a log factor) lower bound on the regret of any algorithm operating in this setting.

Scheda breve

Scheda completa

Scheda completa (DC)

	Parole chiave
	
				Multi-armed bandits; non-stochastic losses; composite losses; delayed feedback; online learning
			
	Settori scientifico-disciplinari dell'articolo (sola visualizzazione)
	
				Settore INF/01 - Informatica
			
	Titolo del progetto
	
	Titolo Progetto
	
									Algorithms, Games, and Digital Markets (ALGADIMAR)
								
	Acronimo
	
									ALGADIMAR
								
	Nome finanziatore
	
										MINISTERO DELL'ISTRUZIONE E DEL MERITO
									
	N. Contratto
	
									2017R9FHSR_006
								
	Titolo Progetto
	
									European Learning and Intelligent Systems Excellence (ELISE)
								
	Acronimo
	
									ELISE
								
	Nome finanziatore
	
										EUROPEAN COMMISSION
									
	Finanziamento
	
									H2020
								
	N. Contratto
	
									951847
								
	Data di pubblicazione
	
				ago-2022
			
	Rivista in ANCE
	
				JOURNAL OF MACHINE LEARNING RESEARCH
			
	URL
	
				http://jmlr.org/papers/v23/21-1443.html
			
	Tipologia
	
				Article (author)
			
	Appare nelle tipologie:
	
				01 - Articolo su periodico

File in questo prodotto:

File	Dimensione	Formato
21-1443.pdf accesso aperto Tipologia: Publisher's version/PDF Dimensione 446.26 kB Formato Adobe PDF Visualizza/Apri	446.26 kB	Adobe PDF	Visualizza/Apri

Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/939315

Citazioni

ND

1

0

ND

social impact