IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca

Towards minimax policies for online linear optimization with bandit feedback / S. Bubeck, N. Cesa-Bianchi, S. Kakade. - In: Proceedings of the 25th Annual Conference on Learning Theory : June 25-27, 2012, Edinburgh, Scotland / edited by S. Mannor, N. Srebro, R.C. Williamson. - Brookline, USA : Microtome, 2012. - pp. 41.1-41.14. (Paper presented at the 25th Annual Conference on Learning Theory, held in Edinburgh in 2012.)

Towards minimax policies for online linear optimization with bandit feedback

S. Bubeck; N. Cesa-Bianchi (second author); S. Kakade

2012

Abstract

We address the online linear optimization problem with bandit feedback. Our contribution is twofold. First, we provide an algorithm (based on exponential weights) with a regret of order √(dn log N) for any finite action set with N actions, under the assumption that the instantaneous loss is bounded by 1. This shaves off an extraneous √d factor compared to previous works, and gives a regret bound of order d√(n log n) for any compact set of actions. Without further assumptions on the action set, this last bound is minimax optimal up to a logarithmic factor. Interestingly, our result also shows that the minimax regret for bandit linear optimization with expert advice in d dimensions is the same as for the basic d-armed bandit with expert advice. Our second contribution is to show how to use the Mirror Descent algorithm to obtain computationally efficient strategies with minimax optimal regret bounds in specific examples. More precisely, we study two canonical action sets: the hypercube and the Euclidean ball. In the former case, we obtain the first computationally efficient algorithm with a d√n regret, thus improving by a factor √(d log n) over the best known result for a computationally efficient algorithm. In the latter case, our approach gives the first algorithm with a √(dn log n) regret, again shaving off an extraneous √d compared to previous works.
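
The step from the finite-set bound to the compact-set bound in the abstract can be sketched as a short calculation. This is only an illustration under a stated assumption, not the authors' proof: it assumes the compact action set can be discretized by a finite cover with at most (cn)^d points, where the constant c is hypothetical and does not appear in the record.

```latex
% Illustrative sketch (requires amsmath).
% Assumption: the compact action set admits a finite cover
% with N <= (c n)^d points, for some constant c > 0.
\begin{align*}
\log N &\le d \log(cn), \\
\sqrt{dn \log N} &\le \sqrt{dn \cdot d \log(cn)} = d \sqrt{n \log(cn)}.
\end{align*}
```

Under that assumption, the finite-set rate of order √(dn log N) turns into a rate of order d√(n log n), which matches the compact-set bound quoted in the abstract.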

Scientific-disciplinary sectors of the contribution (display only): Sector INF/01 - Computer Science

Project title: Pattern Analysis, Statistical Modelling and Computational Learning 2
Acronym: PASCAL2
Funder: EUROPEAN COMMISSION
Funding programme: FP7
Contract no.: 216886

Publication date: 2012
Type: Book Part (author)
Appears in types: 03 - Book contribution

Files in this record:

File	Access	Type	Size	Format
bubeck12a-1.pdf	Open access	Post-print, accepted manuscript, etc. (version accepted by the publisher)	303.19 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/176974
