In this research note, we revisit the bandits with expert advice problem. Under a restricted feedback model, we prove a lower bound of order (Formula presented) for the worst-case regret, where K is the number of actions, N > K the number of experts, and T the time horizon. This matches a previously known upper bound of the same order and improves upon the best available lower bound of (Formula presented). For the standard feedback model, we prove a new instance-based upper bound that depends on the agreement between the experts and provides a logarithmic improvement compared to prior results.
Improved Regret Bounds for Bandits with Expert Advice / N. Cesa Bianchi, K. Eldowa, E. Esposito, J. Olkhovskaya. - In: JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH. - ISSN 1943-5037. - 83:(2025 Jul), pp. 6.1-6.14. [10.1613/jair.1.16738]
Improved Regret Bounds for Bandits with Expert Advice
N. Cesa BianchiPrimo
;K. EldowaSecondo
;E. EspositoPenultimo
;
2025
Abstract
In this research note, we revisit the bandits with expert advice problem. Under a restricted feedback model, we prove a lower bound of order (Formula presented) for the worst-case regret, where K is the number of actions, N > K the number of experts, and T the time horizon. This matches a previously known upper bound of the same order and improves upon the best available lower bound of (Formula presented). For the standard feedback model, we prove a new instance-based upper bound that depends on the agreement between the experts and provides a logarithmic improvement compared to prior results.| File | Dimensione | Formato | |
|---|---|---|---|
|
16738_final.pdf
accesso aperto
Tipologia:
Publisher's version/PDF
Licenza:
Creative commons
Dimensione
607.81 kB
Formato
Adobe PDF
|
607.81 kB | Adobe PDF | Visualizza/Apri |
Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.




