On the Minimax Regret for Online Learning with Feedback Graphs / K. Eldowa, E. Esposito, T. Cesari, N. Cesa Bianchi (Advances in Neural Information Processing Systems). - In: Advances in Neural Information Processing Systems. 36 / edited by A. Oh, T. Neumann, A. Globerson, K. Saenko, M. Hardt, S. Levine. - [s.l] : Curran Associates, 2023. - pp. 46122-46133. (Paper presented at the 37th Conference on Neural Information Processing Systems, held in 2023.)

On the Minimax Regret for Online Learning with Feedback Graphs

K. Eldowa (first author)
E. Esposito (second author)
T. Cesari (penultimate author)
N. Cesa Bianchi (last author)
2023

Abstract

In this work, we improve on the upper and lower bounds for the regret of online learning with strongly observable undirected feedback graphs. The best known upper bound for this problem is $O(\sqrt{\alpha T \ln K})$, where $K$ is the number of actions, $\alpha$ is the independence number of the graph, and $T$ is the time horizon. The $\sqrt{\ln K}$ factor is known to be necessary when $\alpha = 1$ (the experts case). On the other hand, when $\alpha = K$ (the bandits case), the minimax rate is known to be $\Theta(\sqrt{KT})$, and a lower bound $\Omega(\sqrt{\alpha T})$ is known to hold for any $\alpha$. Our improved upper bound $O(\sqrt{\alpha T (1 + \ln(K/\alpha))})$ holds for any $\alpha$ and matches the lower bounds for bandits and experts, while interpolating intermediate cases. To prove this result, we use FTRL with $q$-Tsallis entropy for a carefully chosen value of $q \in [1/2, 1)$ that varies with $\alpha$. The analysis of this algorithm requires a new bound on the variance term in the regret. We also show how to extend our techniques to time-varying graphs, without requiring prior knowledge of their independence numbers. Our upper bound is complemented by an improved $\Omega(\sqrt{\alpha T (\ln K)/(\ln \alpha)})$ lower bound for all $\alpha > 1$, whose analysis relies on a novel reduction to multitask learning. This shows that a logarithmic factor is necessary as soon as $\alpha < K$.
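
As a rough numerical illustration of how the rates quoted in the abstract relate to each other, the sketch below evaluates them with all constants hidden by the O and Ω notation dropped. The function names are hypothetical and chosen only for this example; the values indicate orders of growth, not actual regret.

```python
import math

# Illustrative only: constants hidden by O(.) and Omega(.) are dropped,
# and the function names are assumptions made for this sketch.

def prior_upper_rate(alpha, K, T):
    """Previously best known upper rate: sqrt(alpha * T * ln K)."""
    return math.sqrt(alpha * T * math.log(K))

def improved_upper_rate(alpha, K, T):
    """Improved upper rate: sqrt(alpha * T * (1 + ln(K / alpha)))."""
    return math.sqrt(alpha * T * (1 + math.log(K / alpha)))

def improved_lower_rate(alpha, K, T):
    """Improved lower rate: sqrt(alpha * T * ln K / ln alpha), for alpha > 1."""
    if alpha <= 1:
        raise ValueError("the lower rate is stated for alpha > 1 only")
    return math.sqrt(alpha * T * math.log(K) / math.log(alpha))

if __name__ == "__main__":
    K, T = 16, 100
    for alpha in (2, 4, 8, 16):
        print(
            alpha,
            round(improved_lower_rate(alpha, K, T), 2),
            round(improved_upper_rate(alpha, K, T), 2),
            round(prior_upper_rate(alpha, K, T), 2),
        )
```

With α = K the logarithm in the improved upper rate vanishes and the expression reduces to √(KT), the bandit rate; with α = 1 it becomes √(T(1 + ln K)), which is of the same order as the experts rate √(T ln K).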
Sector INF/01 - Computer Science
   Learning in Markets and Society
   MINISTERO DELL'UNIVERSITA' E DELLA RICERCA
   2022EKNE5K_001

   European Lighthouse of AI for Sustainability (ELIAS)
   ELIAS
   EUROPEAN COMMISSION
   101120237
2023
https://proceedings.neurips.cc/paper_files/paper/2023/file/908f03779b5b063413fbf0247a46a403-Paper-Conference.pdf
Book Part (author)
Files in this record:
File: NeurIPS-2023-on-the-minimax-regret-for-online-learning-with-feedback-graphs-Paper-Conference.pdf
Access: open access
Type: Publisher's version/PDF
Size: 327.62 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1034112
Citations
  • Scopus 2