IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca

A Unified Analysis of Nonstochastic Delayed Feedback for Combinatorial Semi-Bandits, Linear Bandits, and MDPs

D. van der Hoeven; L. Zierahn; T. Lancewicki; A. Rosenberg; N. Cesa Bianchi

2023

Abstract

We derive a new analysis of Follow The Regularized Leader (FTRL) for online learning with delayed bandit feedback. By separating the cost of delayed feedback from that of bandit feedback, our analysis allows us to obtain new results in three important settings. On the one hand, we derive the first optimal (up to logarithmic factors) regret bounds for combinatorial semi-bandits with delay and adversarial Markov decision processes with delay (and known transition functions). On the other hand, we use our analysis to derive an efficient algorithm for linear bandits with delay achieving near-optimal regret bounds. Our novel regret decomposition shows that FTRL remains stable across multiple rounds under mild assumptions on the Hessian of the regularizer.
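The abstract does not spell out an algorithm, so the following is only a minimal sketch of the setting it describes, not the paper's method or tuning. It assumes the simplest special case (a K-armed bandit), a negative-entropy regularizer (for which the FTRL step reduces to exponential weights), a fixed learning rate `eta`, and that the loss of round `t` is revealed at the end of round `t + delays[t]`. The function name `ftrl_delayed_bandit` and all parameter names are hypothetical.

```python
import math
import random
from collections import defaultdict


def ftrl_delayed_bandit(losses, delays, eta, seed=0):
    """Illustrative FTRL (negative entropy) for a K-armed bandit with delays.

    losses[t][i] is the loss of arm i in round t (assumed to lie in [0, 1]);
    the loss of the arm played in round t arrives at the end of round
    t + delays[t]. Returns the learner's total loss and the last distribution.
    """
    rng = random.Random(seed)
    num_rounds, num_arms = len(losses), len(losses[0])
    cum_est = [0.0] * num_arms        # sum of loss estimates received so far
    arrivals = defaultdict(list)      # arrival round -> [(arm, prob, loss)]
    total_loss = 0.0
    probs = [1.0 / num_arms] * num_arms

    for t in range(num_rounds):
        # FTRL step: argmin_p <p, cum_est> + (1/eta) * sum_i p_i log p_i,
        # whose closed form is a softmax of -eta * cum_est.
        shift = min(cum_est)          # subtract the minimum for numerical stability
        weights = [math.exp(-eta * (c - shift)) for c in cum_est]
        norm = sum(weights)
        probs = [w / norm for w in weights]

        arm = rng.choices(range(num_arms), weights=probs)[0]
        loss = losses[t][arm]
        total_loss += loss

        # Bandit feedback: only the played arm's loss is ever observed,
        # and only after the delay has elapsed.
        arrivals[t + delays[t]].append((arm, probs[arm], loss))

        # Process feedback arriving at the end of round t using
        # importance-weighted estimates (probability at the time of play).
        for played, prob, observed in arrivals.pop(t, []):
            cum_est[played] += observed / prob

    return total_loss, probs
```

In this sketch, feedback scheduled to arrive after the final round is never used, and the distribution played in round `t` depends only on estimates that have already arrived, which is where the cost of delay enters.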

International co-authors: Yes
Language of the contribution: English
Keywords: Online learning; bandit feedback; delayed feedback
Scientific-disciplinary sectors of the contribution (display only): Sector INF/01 - Informatica (Computer Science)
Type: Conference contribution
Review (peer review): Anonymous experts
Classification by type of research: Basic research
Publication classification: Scientific publication
Project title: Algorithms, Games, and Digital Markets (ALGADIMAR)
Acronym: ALGADIMAR
Funder name: MINISTERO DELL'ISTRUZIONE E DEL MERITO
Contract no.: 2017R9FHSR_006
Project title: European Learning and Intelligent Systems Excellence (ELISE)
Acronym: ELISE
Funder name: EUROPEAN COMMISSION
Funding programme: H2020
Contract no.: 951847
Volume title: Proceedings of Thirty Sixth Conference on Learning Theory
Volume editors: G. Neu, L. Rosasco
Publisher: PMLR
Publication date: 2023
First page: 1285
Last page: 1321
Number of pages: 37
Series: PROCEEDINGS OF MACHINE LEARNING RESEARCH
Volume number: 195
Volume type: Internationally distributed volume
Conference name: Annual Conference on Learning Theory
Conference location: Bangalore
Conference year: 2023
Conference number: 36
Conference type: International conference
URL: https://proceedings.mlr.press/v195/hoeven23a.html
Coordinated research centre: DSRC - Data science research center
Source database: bibtex
ISI identifier: WOS:001222719101010
Scopus identifier: 2-s2.0-85171562513
Adherence to the University Open Access policy: I adhere
All authors: D. van der Hoeven, L. Zierahn, T. Lancewicki, A. Rosenberg, N. Cesa Bianchi
Type: Book Part (author)
Fulltext: open
Faculty website type: 273
Citation: A Unified Analysis of Nonstochastic Delayed Feedback for Combinatorial Semi-Bandits, Linear Bandits, and MDPs / D. van der Hoeven, L. Zierahn, T. Lancewicki, A. Rosenberg, N. Cesa Bianchi (PROCEEDINGS OF MACHINE LEARNING RESEARCH). - In: Proceedings of Thirty Sixth Conference on Learning Theory / [edited by] G. Neu, L. Rosasco. - [s.l.] : PMLR, 2023. - pp. 1285-1321. (Paper presented at the 36th Annual Conference on Learning Theory, held in Bangalore in 2023.)
Type: info:eu-repo/semantics/bookPart
Number of authors: 5
Type: Research products::03 - Contribution in volume
Appears in types: 03 - Contribution in volume

Files in this item:

File	Access	Type	Size	Format
hoeven23a.pdf	Open access	Publisher's version/PDF	473.54 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1024140
