
A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback Graphs / C. Rouyer, D. van der Hoeven, N. Cesa Bianchi, Y. Seldin (ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS). - In: NeurIPS / [edited by] S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, A. Oh. - [s.l.] : Curran Associates, 2022. - ISBN 9781713871088. - pp. 35035-35048. Paper presented at the 36th Conference on Neural Information Processing Systems, held in New Orleans, Monday November 28th through Friday December 9th, 2022.

A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback Graphs

D. van der Hoeven; N. Cesa Bianchi
2022

Abstract

We consider online learning with feedback graphs, a sequential decision-making framework where the learner's feedback is determined by a directed graph over the action set. We present a computationally efficient algorithm for learning in this framework that simultaneously achieves near-optimal regret bounds in both stochastic and adversarial environments. The bound against oblivious adversaries is Õ(√(αT)), where T is the time horizon and α is the independence number of the feedback graph. The bound against stochastic environments is O((ln T)² max_{S∈I(G)} ∑_{i∈S} Δ_i⁻¹), where I(G) is the family of all independent sets in a suitably defined undirected version of the graph and Δ_i are the suboptimality gaps. The algorithm combines ideas from the EXP3++ algorithm for stochastic and adversarial bandits and the EXP3.G algorithm for feedback graphs with a novel exploration scheme. The scheme, which exploits the structure of the graph to reduce exploration, is key to obtaining best-of-both-worlds guarantees with feedback graphs. We also extend our algorithm and results to a setting where the feedback graphs are allowed to change over time. © 2022 Neural Information Processing Systems Foundation.
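To illustrate the general mechanism the abstract refers to, here is a minimal sketch of a generic exponential-weights learner that builds importance-weighted loss estimates from a feedback graph. This is not the authors' algorithm: the learning rate eta, the uniform exploration rate gamma, and the function name exp3_with_feedback_graph are illustrative placeholders, whereas the paper's method combines EXP3++-style gap estimates with a graph-dependent exploration scheme.

```python
import numpy as np

def exp3_with_feedback_graph(losses, out_neighbors, eta=0.1, gamma=0.01, rng=None):
    """Generic exponential-weights sketch for online learning with a feedback graph.

    losses: array of shape (T, K), losses in [0, 1], revealed only through the graph.
    out_neighbors: list of sets; out_neighbors[i] is the set of actions whose loss
        is observed when action i is played (assumed to contain i itself).
    eta, gamma: illustrative constants; the paper tunes exploration using the graph.
    """
    rng = np.random.default_rng() if rng is None else rng
    T, K = losses.shape
    weights = np.ones(K)
    played = np.empty(T, dtype=int)

    for t in range(T):
        # Mix the exponential-weights distribution with uniform exploration.
        q = weights / weights.sum()
        p = (1.0 - gamma) * q + gamma / K

        # Play an action and observe the losses of its out-neighborhood.
        i = rng.choice(K, p=p)
        played[t] = i
        observed = out_neighbors[i]

        # Importance-weighted estimates: divide each observed loss by the
        # probability that it was observed, i.e. the total probability of
        # playing any action whose out-neighborhood contains it.
        loss_hat = np.zeros(K)
        for j in observed:
            prob_observe_j = sum(p[k] for k in range(K) if j in out_neighbors[k])
            loss_hat[j] = losses[t, j] / prob_observe_j

        # Multiplicative-weights update on the estimated losses.
        weights *= np.exp(-eta * loss_hat)

    return played
```

With out_neighbors[i] = {i} this sketch reduces to standard bandit feedback, and with out_neighbors[i] containing all actions it reduces to full information, which is why the independence number α of the graph governs the adversarial regret stated in the abstract.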
Settore INF/01 - Informatica (Computer Science)
   European Learning and Intelligent Systems Excellence (ELISE)
   ELISE
   EUROPEAN COMMISSION
   H2020
   951847

   One Health Action Hub: University task force for the resilience of territorial ecosystems (1H_Hub) - Strategic Line 3, Theme: One health, one earth
   1H_Hub
   UNIVERSITA' DEGLI STUDI DI MILANO

   Algorithms, Games, and Digital Markets (ALGADIMAR)
   ALGADIMAR
   MINISTERO DELL'ISTRUZIONE E DEL MERITO
   2017R9FHSR_006
2022
https://proceedings.neurips.cc/paper_files/paper/2022/file/e36da7acd188c6655792270b38830124-Paper-Conference.pdf
Book Part (author)
Files in this record:

File: NeurIPS-2022-a-near-optimal-best-of-both-worlds-algorithm-for-online-learning-with-feedback-graphs-Paper-Conference.pdf
Access: restricted
Type: Publisher's version/PDF
Size: 306.81 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/961296
Citations
  • PMC: ND
  • Scopus: 3
  • Web of Science (ISI): ND