The framework of feedback graphs is a generalization of sequential decisionmaking with bandit or full information feedback. In this work, we study an extension where the directed feedback graph is stochastic, following a distribution similar to the classical Erdős-Rényi model. Specifically, in each round every edge in the graph is either realized or not with a distinct probability for each edge. We prove nearly optimal regret bounds of order min minε p (αε/ε)T, minε(δε/ε)1/3T2/3 (ignoring logarithmic factors), where αε and δε are graph-theoretic quantities measured on the support of the stochastic feedback graph G with edge probabilities thresholded at ε. Our result, which holds without any preliminary knowledge about G, requires the learner to observe only the realized out-neighborhood of the chosen action. When the learner is allowed to observe the realization of the entire graph (but only the losses in the out-neighborhood of the chosen action), we derive a more efficient algorithm featuring a dependence on weighted versions of the independence and weak domination numbers that exhibits improved bounds for some special cases.
Learning on the Edge: Online Learning with Stochastic Feedback Graphs / E. Esposito, F. Fusco, D. van der Hoeven, N. Cesa Bianchi (ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS). - In: NeurIPS / [a cura di] S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, A. Oh. - [s.l] : Curran Associates, 2022. - pp. 34776-34788 (( Intervento presentato al 36. convegno Conference on Neural Information Processing Systems : Monday November 28th through Friday December 9th tenutosi a New Orleans nel 2022.
Learning on the Edge: Online Learning with Stochastic Feedback Graphs
E. EspositoPrimo
;D. van der HoevenPenultimo
;N. Cesa BianchiUltimo
2022
Abstract
The framework of feedback graphs is a generalization of sequential decisionmaking with bandit or full information feedback. In this work, we study an extension where the directed feedback graph is stochastic, following a distribution similar to the classical Erdős-Rényi model. Specifically, in each round every edge in the graph is either realized or not with a distinct probability for each edge. We prove nearly optimal regret bounds of order min minε p (αε/ε)T, minε(δε/ε)1/3T2/3 (ignoring logarithmic factors), where αε and δε are graph-theoretic quantities measured on the support of the stochastic feedback graph G with edge probabilities thresholded at ε. Our result, which holds without any preliminary knowledge about G, requires the learner to observe only the realized out-neighborhood of the chosen action. When the learner is allowed to observe the realization of the entire graph (but only the losses in the out-neighborhood of the chosen action), we derive a more efficient algorithm featuring a dependence on weighted versions of the independence and weak domination numbers that exhibits improved bounds for some special cases.File | Dimensione | Formato | |
---|---|---|---|
NeurIPS-2022-learning-on-the-edge-online-learning-with-stochastic-feedback-graphs-Paper-Conference.pdf
accesso riservato
Tipologia:
Publisher's version/PDF
Dimensione
490.32 kB
Formato
Adobe PDF
|
490.32 kB | Adobe PDF | Visualizza/Apri Richiedi una copia |
Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.