We derive a new analysis of Follow The Regularized Leader (FTRL) for online learning with delayed bandit feedback. By separating the cost of delayed feedback from that of bandit feedback, our analysis allows us to obtain new results in four important settings. We derive the first optimal (up to logarithmic factors) regret bounds for combinatorial semi-bandits with delay and adversarial Markov Decision Processes with delay (both known and unknown transition functions). Furthermore, we use our analysis to develop an efficient algorithm for linear bandits with delay achieving near-optimal regret bounds. In order to derive these results we show that FTRL remains stable across multiple rounds under mild assumptions on the regularizer.
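To make the setting concrete, the following is a minimal sketch of FTRL on a K-armed bandit whose feedback arrives late. It is not the algorithm analyzed in the paper. It assumes a negative-entropy regularizer (so FTRL reduces to exponential weights), a fixed delay `d`, losses in [0, 1], and importance-weighted loss estimates. The names `ftrl_probs` and `run_delayed_bandit` and the parameters `eta` and `seed` are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import deque


def ftrl_probs(cum_loss_est, eta):
    """FTRL over the simplex with a negative-entropy regularizer.

    The minimizer of <p, L> + (1/eta) * sum_i p_i * log(p_i) has
    p_i proportional to exp(-eta * L_i).
    """
    m = min(cum_loss_est)  # shift by the minimum for numerical stability
    w = [math.exp(-eta * (l - m)) for l in cum_loss_est]
    z = sum(w)
    return [x / z for x in w]


def run_delayed_bandit(losses, d, eta, seed=0):
    """Play FTRL when the loss of round t only becomes usable in round t + d + 1.

    losses: list of per-round loss vectors with entries in [0, 1] (assumed).
    d: fixed feedback delay in rounds (assumed; delays may vary in general).
    Returns the final distribution over arms and the total incurred loss.
    """
    rng = random.Random(seed)
    k = len(losses[0])
    cum_loss_est = [0.0] * k
    pending = deque()  # entries: (round_usable, arm, prob_of_arm, observed_loss)
    total = 0.0
    for t, loss in enumerate(losses):
        # Incorporate every observation whose delay has elapsed.
        while pending and pending[0][0] <= t:
            _, arm, p_arm, obs = pending.popleft()
            # Importance weighting with the probability used at play time
            # keeps the loss estimate unbiased.
            cum_loss_est[arm] += obs / p_arm
        p = ftrl_probs(cum_loss_est, eta)
        arm = rng.choices(range(k), weights=p)[0]
        total += loss[arm]
        # Bandit feedback: only the played arm's loss is observed, d rounds late.
        pending.append((t + d + 1, arm, p[arm], loss[arm]))
    return ftrl_probs(cum_loss_est, eta), total


if __name__ == "__main__":
    # Toy instance: arm 0 is consistently better than the other two.
    toy_losses = [[0.1, 0.9, 0.9] for _ in range(2000)]
    final_p, total_loss = run_delayed_bandit(toy_losses, d=5, eta=0.05)
    print(final_p, total_loss)
```

In this sketch the learner keeps acting on stale cumulative estimates while observations are pending, which is the effect of delay that the abstract separates from the effect of bandit feedback.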

A Unified Analysis of Nonstochastic Delayed Feedback for Combinatorial Semi-Bandits, Linear Bandits, and MDPs / L. Zierahn, D. Van Der Hoeven, T. Lancewicki, A. Rosenberg, N.A. Cesa Bianchi. - In: JOURNAL OF MACHINE LEARNING RESEARCH. - ISSN 1533-7928. - 26:(2025), pp. 104.1-104.60.

A Unified Analysis of Nonstochastic Delayed Feedback for Combinatorial Semi-Bandits, Linear Bandits, and MDPs

L. Zierahn (first author); D. Van Der Hoeven (second author); N.A. Cesa Bianchi (last author)
2025

English
Sector INFO-01/A - Computer Science
Article
Anonymous expert reviewers
Basic research
Scientific publication
   European Lighthouse of AI for Sustainability (ELIAS)
   ELIAS
   EUROPEAN COMMISSION
   101120237

   Algorithms, Games, and Digital Markets (ALGADIMAR)
   ALGADIMAR
   MINISTERO DELL'ISTRUZIONE E DEL MERITO
   2017R9FHSR_006
Publication date: March 2025
Publisher: MIT Press
Volume: 26
Article number: 104
Pages: 1-60 (60 pages)
Status: Published
Journal of international relevance
http://jmlr.org/papers/v26/24-0496.html
DSRC - Data science research center
info:eu-repo/semantics/article
open
Research products::01 - Journal article
5
262
Article (author)
Journal without Impact Factor
L. Zierahn, D. Van Der Hoeven, T. Lancewicki, A. Rosenberg, N.A. Cesa Bianchi
Files in this item:
24-0496.pdf (open access)
Type: Publisher's version/PDF
License: Creative Commons
Size: 564.04 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1175556