We study sequential prediction problems in which, at each time instance, the forecaster chooses a binary vector from a certain fixed set S ⊆ {0, 1}d and suffers a loss that is the sum of the losses of those vector components that equal to one. The goal of the forecaster is to achieve that, in the long run, the accumulated loss is not much larger than that of the best possible vector in the class. We consider the “bandit” setting in which the forecaster has only access to the losses of the chosen vectors. We introduce a new general forecaster achieving a regret bound that, for a variety of concrete choices of S, is of order of square root of (nd ln|S|) where n is the time horizon. This is not improvable in general and is better than previously known bounds. We also point out that computationally efficient implementations for various interesting choices of S exist.
Combinatorial bandits / N. Cesa-Bianchi, G. Lugosi - In: COLT 2009 : proceedings of the 22nd Annual Conference on Learning Theory, Montréal, Canada, June 18-21, 2009 / / [a cura di] S. Dasgupta, A. Klivans. - Madison, USA : Omnipress, 2009. - ISBN 9780982252918. - pp. 237-246 (( Intervento presentato al 22. convegno Annual Conference on Learning Theory tenutosi a Montréal, Canada nel 2009.
Combinatorial bandits
N. Cesa-BianchiPrimo
;
2009
Abstract
We study sequential prediction problems in which, at each time instance, the forecaster chooses a binary vector from a certain fixed set S ⊆ {0, 1}d and suffers a loss that is the sum of the losses of those vector components that equal to one. The goal of the forecaster is to achieve that, in the long run, the accumulated loss is not much larger than that of the best possible vector in the class. We consider the “bandit” setting in which the forecaster has only access to the losses of the chosen vectors. We introduce a new general forecaster achieving a regret bound that, for a variety of concrete choices of S, is of order of square root of (nd ln|S|) where n is the time horizon. This is not improvable in general and is better than previously known bounds. We also point out that computationally efficient implementations for various interesting choices of S exist.File | Dimensione | Formato | |
---|---|---|---|
comband-colt.pdf
accesso aperto
Tipologia:
Publisher's version/PDF
Dimensione
228.07 kB
Formato
Adobe PDF
|
228.07 kB | Adobe PDF | Visualizza/Apri |
Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.