IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca

A Unified Analysis of Nonstochastic Delayed Feedback for Combinatorial Semi-Bandits, Linear Bandits, and MDPs / L. Zierahn, D. Van Der Hoeven, T. Lancewicki, A. Rosenberg, N.A. Cesa Bianchi. - In: JOURNAL OF MACHINE LEARNING RESEARCH. - ISSN 1533-7928. - 26:(2025), pp. 104.1-104.60.

A Unified Analysis of Nonstochastic Delayed Feedback for Combinatorial Semi-Bandits, Linear Bandits, and MDPs

L. Zierahn (first author); D. Van Der Hoeven (second author); Tal Lancewicki; Aviv Rosenberg; N.A. Cesa Bianchi (last author)

2025

Abstract

We derive a new analysis of Follow The Regularized Leader (FTRL) for online learning with delayed bandit feedback. By separating the cost of delayed feedback from that of bandit feedback, our analysis allows us to obtain new results in four important settings. We derive the first optimal (up to logarithmic factors) regret bounds for combinatorial semi-bandits with delay and adversarial Markov Decision Processes with delay (both known and unknown transition functions). Furthermore, we use our analysis to develop an efficient algorithm for linear bandits with delay achieving near-optimal regret bounds. In order to derive these results we show that FTRL remains stable across multiple rounds under mild assumptions on the regularizer.

International co-authors: Yes
Article language: English
Scientific-disciplinary sector of the article (valid from 09/05/2024): Settore INFO-01/A - Informatica
Type: Article
Peer review: Anonymous experts
Classification by type of research: Basic research
Publication classification: Scientific publication

Projects:
	Project title: European Lighthouse of AI for Sustainability (ELIAS)
	Acronym: ELIAS
	Funder: EUROPEAN COMMISSION
	Contract no.: 101120237

	Project title: Algorithms, Games, and Digital Markets (ALGADIMAR)
	Acronym: ALGADIMAR
	Funder: MINISTERO DELL'ISTRUZIONE E DEL MERITO
	Contract no.: 2017R9FHSR_006

Publication date: 2025
Ahead-of-print or print date: Mar 2025
Journal in ANCE: JOURNAL OF MACHINE LEARNING RESEARCH
Publisher: MIT Press
Volume or year: 26
Article number: 104
First page: 1
Last page: 60
Number of pages: 60
Publication status: Published
Journal relevance: Journal of international relevance
URL: http://jmlr.org/papers/v26/24-0496.html
Coordinated research center: DSRC - Data science research center
Source database: bibtex
ISI identifier: WOS:001534910300001
Adherence to the University Open Access policy: Yes
Typology: info:eu-repo/semantics/article
Citation: A Unified Analysis of Nonstochastic Delayed Feedback for Combinatorial Semi-Bandits, Linear Bandits, and MDPs / L. Zierahn, D. Van Der Hoeven, T. Lancewicki, A. Rosenberg, N.A. Cesa Bianchi. - In: JOURNAL OF MACHINE LEARNING RESEARCH. - ISSN 1533-7928. - 26:(2025), pp. 104.1-104.60.
Fulltext: open
Typology: Research products::01 - Journal article
Number of authors: 5
Faculty-site typology: 262
Typology: Article (author)
Impact factor: Journal without Impact Factor
All authors: L. Zierahn, D. Van Der Hoeven, T. Lancewicki, A. Rosenberg, N.A. Cesa Bianchi
Appears in the typologies: 01 - Journal article

Files in this item:

File	Size	Format
24-0496.pdf (open access; type: Publisher's version/PDF; license: Creative Commons)	564.04 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1175556
