We derive a new analysis of Follow The Regularized Leader (FTRL) for online learning with delayed bandit feedback. By separating the cost of delayed feedback from that of bandit feedback, our analysis allows us to obtain new results in three important settings. On the one hand, we derive the first optimal (up to logarithmic factors) regret bounds for combinatorial semi-bandits with delay and adversarial Markov decision processes with delay (and known transition functions). On the other hand, we use our analysis to derive an efficient algorithm for linear bandits with delay achieving near-optimal regret bounds. Our novel regret decomposition shows that FTRL remains stable across multiple rounds under mild assumptions on the Hessian of the regularizer.

A Unified Analysis of Nonstochastic Delayed Feedback for Combinatorial Semi-Bandits, Linear Bandits, and MDPs / D. van der Hoeven, L. Zierahn, T. Lancewicki, A. Rosenberg, N. Cesa Bianchi (PROCEEDINGS OF MACHINE LEARNING RESEARCH). - In: Proceedings of Thirty Sixth Conference on Learning Theory / [a cura di] G. Neu, L. Rosasco. - [s.l] : PMLR, 2023. - pp. 1285-1321 (( Intervento presentato al 36. convegno Annual Conference on Learning Theory tenutosi a Bangalore nel 2023.

A Unified Analysis of Nonstochastic Delayed Feedback for Combinatorial Semi-Bandits, Linear Bandits, and MDPs

L. Zierahn;N. Cesa Bianchi
2023

Abstract

We derive a new analysis of Follow The Regularized Leader (FTRL) for online learning with delayed bandit feedback. By separating the cost of delayed feedback from that of bandit feedback, our analysis allows us to obtain new results in three important settings. On the one hand, we derive the first optimal (up to logarithmic factors) regret bounds for combinatorial semi-bandits with delay and adversarial Markov decision processes with delay (and known transition functions). On the other hand, we use our analysis to derive an efficient algorithm for linear bandits with delay achieving near-optimal regret bounds. Our novel regret decomposition shows that FTRL remains stable across multiple rounds under mild assumptions on the Hessian of the regularizer.
English
Online learning; bandit feedback; delayed feedback
Settore INF/01 - Informatica
Intervento a convegno
Esperti anonimi
Ricerca di base
Pubblicazione scientifica
   Algorithms, Games, and Digital Markets (ALGADIMAR)
   ALGADIMAR
   MINISTERO DELL'ISTRUZIONE E DEL MERITO
   2017R9FHSR_006

   European Learning and Intelligent Systems Excellence (ELISE)
   ELISE
   EUROPEAN COMMISSION
   H2020
   951847
Proceedings of Thirty Sixth Conference on Learning Theory
G. Neu, L. Rosasco
PMLR
2023
1285
1321
37
195
Volume a diffusione internazionale
Annual Conference on Learning Theory
Bangalore
2023
36
Convegno internazionale
https://proceedings.mlr.press/v195/hoeven23a.html
DSRC - Data science research center
bibtex
Aderisco
D. van der Hoeven, L. Zierahn, T. Lancewicki, A. Rosenberg, N. Cesa Bianchi
Book Part (author)
open
273
A Unified Analysis of Nonstochastic Delayed Feedback for Combinatorial Semi-Bandits, Linear Bandits, and MDPs / D. van der Hoeven, L. Zierahn, T. Lancewicki, A. Rosenberg, N. Cesa Bianchi (PROCEEDINGS OF MACHINE LEARNING RESEARCH). - In: Proceedings of Thirty Sixth Conference on Learning Theory / [a cura di] G. Neu, L. Rosasco. - [s.l] : PMLR, 2023. - pp. 1285-1321 (( Intervento presentato al 36. convegno Annual Conference on Learning Theory tenutosi a Bangalore nel 2023.
info:eu-repo/semantics/bookPart
5
Prodotti della ricerca::03 - Contributo in volume
File in questo prodotto:
File Dimensione Formato  
hoeven23a.pdf

accesso aperto

Tipologia: Publisher's version/PDF
Dimensione 473.54 kB
Formato Adobe PDF
473.54 kB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/1024140
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? ND
social impact