In contrast to the classic formulation of partial monitoring, linear partial monitoring can model infinite outcome spaces, while imposing a linear structure on both the losses and the observations. This setting can be viewed as a generalization of linear bandits where loss and feedback are decoupled in a flexible manner. In this work, we address a nonstochastic (adversarial), finite-actions version of the problem through a simple instance of the exploration-by-optimization method that is amenable to efficient implementation. We derive regret bounds that depend on the game structure in a more transparent manner than previous theoretical guarantees for this paradigm. Our bounds feature instance-specific quantities that reflect the degree of alignment between observations and losses, and resemble known guarantees in the stochastic setting. Notably, they achieve the standard√T rate in easy (locally observable) games and T 2/3 in hard (globally observable) games, where T is the time horizon. We instantiate these bounds in a selection of old and new partial information settings subsumed by this model, and illustrate that the achieved dependence on the game structure can be tight in interesting cases.
Instance-Dependent Regret Bounds for Nonstochastic Linear Partial Monitoring / F. Di Gennaro, K. Eldowa, N. Cesa Bianchi (ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS). - In: Advances in Neural Information Processing Systems / [a cura di] D. Belgrave and C. Zhang and H. Lin and R. Pascanu and P. Koniusz and M. Ghassemi and N. Chen. - [s.l] : Curran Associates, Inc., 2025. - pp. 72439-72491 (( 38. Advances in Neural Information Processing Systems2025.
Instance-Dependent Regret Bounds for Nonstochastic Linear Partial Monitoring
K. EldowaSecondo
;N. Cesa BianchiUltimo
2025
Abstract
In contrast to the classic formulation of partial monitoring, linear partial monitoring can model infinite outcome spaces, while imposing a linear structure on both the losses and the observations. This setting can be viewed as a generalization of linear bandits where loss and feedback are decoupled in a flexible manner. In this work, we address a nonstochastic (adversarial), finite-actions version of the problem through a simple instance of the exploration-by-optimization method that is amenable to efficient implementation. We derive regret bounds that depend on the game structure in a more transparent manner than previous theoretical guarantees for this paradigm. Our bounds feature instance-specific quantities that reflect the degree of alignment between observations and losses, and resemble known guarantees in the stochastic setting. Notably, they achieve the standard√T rate in easy (locally observable) games and T 2/3 in hard (globally observable) games, where T is the time horizon. We instantiate these bounds in a selection of old and new partial information settings subsumed by this model, and illustrate that the achieved dependence on the game structure can be tight in interesting cases.| File | Dimensione | Formato | |
|---|---|---|---|
|
NeurIPS-2025-instance-dependent-regret-bounds-for-nonstochastic-linear-partial-monitoring-Paper-Conference.pdf
accesso aperto
Tipologia:
Post-print, accepted manuscript ecc. (versione accettata dall'editore)
Licenza:
Creative commons
Dimensione
782.17 kB
Formato
Adobe PDF
|
782.17 kB | Adobe PDF | Visualizza/Apri |
Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.




