IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca

We develop a new tool for data-dependent analysis of the exploration-exploitation trade-off in learning under limited feedback. Our tool is based on two main ingredients. The first ingredient is a new concentration inequality that makes it possible to control the concentration of weighted averages of multiple (possibly uncountably many) simultaneously evolving and interdependent martingales. The second ingredient is an application of this inequality to the exploration-exploitation trade-off via importance weighted sampling. We apply the new tool to the stochastic multiarmed bandit problem, however, the main importance of this paper is the development and understanding of the new tool rather than improvement of existing algorithms for stochastic multiarmed bandits. In the follow-up work we demonstrate that the new tool can improve over state-of-the-art in structurally richer problems, such as stochastic multiarmed bandits with side information.

PAC-Bayes-Bernstein inequality for martingales and its application to multiarmed bandits / Y. Seldin, N. Cesa-Bianchi, P. Auer, F. Laviolette, J. Shawe Taylor - In: Proceedings of the Workshop on on-line trading of exploration and exploitation 2 : july 2, 2011, Bellevue, Washington, USA[s.l] : Microtome, 2012. (( convegno On-line Trading of Exploration and Exploitation tenutosi a Bellevue, Washington, USA nel 2011.

PAC-Bayes-Bernstein inequality for martingales and its application to multiarmed bandits

Y. Seldin;N. Cesa-Bianchi^Secondo;P. Auer;F. Laviolette;J. Shawe Taylor

2012

Abstract

We develop a new tool for data-dependent analysis of the exploration-exploitation trade-off in learning under limited feedback. Our tool is based on two main ingredients. The first ingredient is a new concentration inequality that makes it possible to control the concentration of weighted averages of multiple (possibly uncountably many) simultaneously evolving and interdependent martingales. The second ingredient is an application of this inequality to the exploration-exploitation trade-off via importance weighted sampling. We apply the new tool to the stochastic multiarmed bandit problem, however, the main importance of this paper is the development and understanding of the new tool rather than improvement of existing algorithms for stochastic multiarmed bandits. In the follow-up work we demonstrate that the new tool can improve over state-of-the-art in structurally richer problems, such as stochastic multiarmed bandits with side information.

Scheda breve

Scheda completa

Scheda completa (DC)

	Parole chiave
	
			PAC-Bayesian analysis ; Bernstein's inequality ; Martingales ; Multiarmed bandits ; Model order selection ; Exploration-exploitation trade-off
		
	Settori scientifico-disciplinari del contributo
	
			Settore INF/01 - Informatica
		
	Titolo del progetto
	
	Titolo Progetto
	
									Pattern Analysis, Statistical Modelling and Computational Learning 2
								
	Acronimo
	
									PASCAL2
								
	Nome finanziatore
	
										EUROPEAN COMMISSION
									
	Finanziamento
	
									FP7
								
	N. Contratto
	
									216886
								
	Data di pubblicazione
	
			2012
		
	Tipologia
	
			Book Part (author)
		
	Appare nelle tipologie:
	
			03 - Contributo in volume

File in questo prodotto:

File	Dimensione	Formato
seldin12a.pdf accesso aperto Tipologia: Post-print, accepted manuscript ecc. (versione accettata dall'editore) Dimensione 428.19 kB Formato Adobe PDF Visualizza/Apri	428.19 kB	Adobe PDF	Visualizza/Apri

Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/179282

Citazioni

ND

ND

ND

social impact