We study sequential prediction problems in which, at each time instance, the forecaster chooses a vector from a given finite set S⊆R^d. At the same time, the opponent chooses a “loss” vector in R^d and the forecaster suffers a loss that is the inner product of the two vectors. The goal of the forecaster is to achieve that, in the long run, the accumulated loss is not much larger than that of the best possible element in S. We consider the “bandit” setting in which the forecaster only has access to the losses of the chosen vectors (i.e., the entire loss vectors are not observed). We introduce a variant of a strategy by Dani, Hayes and Kakade achieving a regret bound that, for a variety of concrete choices of S, is of order sqrt(nd ln|S|) where n is the time horizon. This is not improvable in general and is better than previously known bounds. The examples we consider are all such that S⊆{0,1}^d, and we show how the combinatorial structure of these classes can be exploited to improve the regret bounds. We also point out computationally efficient implementations for various interesting choices of S.

Combinatorial bandits / N. Cesa-Bianchi, G. Lugosi. - In: JOURNAL OF COMPUTER AND SYSTEM SCIENCES. - ISSN 0022-0000. - 78:5(2012), pp. 1404-1422. [10.1016/j.jcss.2012.01.001]

Combinatorial bandits

N. Cesa-Bianchi
Primo
;
2012

Abstract

We study sequential prediction problems in which, at each time instance, the forecaster chooses a vector from a given finite set S⊆R^d. At the same time, the opponent chooses a “loss” vector in R^d and the forecaster suffers a loss that is the inner product of the two vectors. The goal of the forecaster is to achieve that, in the long run, the accumulated loss is not much larger than that of the best possible element in S. We consider the “bandit” setting in which the forecaster only has access to the losses of the chosen vectors (i.e., the entire loss vectors are not observed). We introduce a variant of a strategy by Dani, Hayes and Kakade achieving a regret bound that, for a variety of concrete choices of S, is of order sqrt(nd ln|S|) where n is the time horizon. This is not improvable in general and is better than previously known bounds. The examples we consider are all such that S⊆{0,1}^d, and we show how the combinatorial structure of these classes can be exploited to improve the regret bounds. We also point out computationally efficient implementations for various interesting choices of S.
Adversarial bandit problems; Online linear optimization; Online prediction
Settore INF/01 - Informatica
   Pattern Analysis, Statistical Modelling and Computational Learning 2
   PASCAL2
   EUROPEAN COMMISSION
   FP7
   216886
2012
Article (author)
File in questo prodotto:
Non ci sono file associati a questo prodotto.
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/223627
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 298
  • ???jsp.display-item.citation.isi??? 234
  • OpenAlex ND
social impact