Efficient Linear Bandits through Matrix Sketching / I. Kuzborskij, L. Cella, N. Cesa-Bianchi. - In: Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS 2019) / edited by K. Chaudhuri, M. Sugiyama. - PMLR, 2019. - pp. 177-185. - (Proceedings of Machine Learning Research). Paper presented at the 22nd International Conference on Artificial Intelligence and Statistics, 2019.
Efficient Linear Bandits through Matrix Sketching
I. Kuzborskij; L. Cella; N. Cesa-Bianchi
2019
Abstract
We prove that two popular linear contextual bandit algorithms, OFUL and Thompson Sampling, can be made efficient using Frequent Directions, a deterministic online sketching technique. More precisely, we show that a sketch of size m allows an O(md) update time for both algorithms, as opposed to the Ω(d^2) required by their non-sketched versions in general (where d is the dimension of context vectors). This computational speedup is accompanied by regret bounds of order (1 + ε_m)^{3/2} d√T for OFUL and of order (1 + ε_m) d^{3/2} √T for Thompson Sampling, where ε_m is bounded by the sum of the tail eigenvalues not covered by the sketch. In particular, when the selected contexts span a subspace of dimension at most m, our algorithms have a regret bound matching that of their slower, non-sketched counterparts. Experiments on real-world datasets corroborate our theoretical results.
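For reference, below is a minimal Python/NumPy sketch of the Frequent Directions update that the abstract refers to; the class name, the doubled buffer, and the streaming usage are illustrative assumptions, not the authors' implementation. It maintains an m × d matrix S whose product SᵀS approximates the sum of outer products of the streamed context vectors, at O(md) amortized cost per insertion.

```python
import numpy as np

class FrequentDirections:
    """Illustrative Frequent Directions sketch (assumes 2*m <= d).

    Maintains an m x d matrix S with S.T @ S approximating sum_t x_t x_t.T,
    the quantity a sketched linear bandit tracks in place of the full
    d x d design matrix.
    """

    def __init__(self, m, d):
        self.m, self.d = m, d
        self.B = np.zeros((2 * m, d))  # buffer: m sketch rows + m free rows
        self.next_free = 0

    def update(self, x):
        """Insert one context vector x of shape (d,); O(md) amortized."""
        if self.next_free == 2 * self.m:
            self._shrink()
        self.B[self.next_free] = x
        self.next_free += 1

    def _shrink(self):
        # SVD of the buffer, then shrink all squared singular values by the
        # (m+1)-th largest one; this zeroes at least m rows, freeing space.
        _, sigma, Vt = np.linalg.svd(self.B, full_matrices=False)
        delta = sigma[self.m] ** 2
        shrunk = np.sqrt(np.maximum(sigma ** 2 - delta, 0.0))
        self.B = shrunk[:, None] * Vt
        self.next_free = self.m

    def sketch(self):
        """Return the current m x d sketch S."""
        if self.next_free > self.m:
            self._shrink()
        return self.B[: self.m].copy()


# Hypothetical usage on a stream of contexts:
fd = FrequentDirections(m=20, d=200)
rng = np.random.default_rng(0)
for _ in range(1000):
    fd.update(rng.standard_normal(200))
S = fd.sketch()
```

Under these assumptions, a sketched bandit would work with λI + SᵀS instead of the regularized d × d design matrix, handling the required inverses in the m-dimensional row space of S so that the per-round cost stays linear in d, consistent with the O(md) update time quoted in the abstract.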
File | Type | Access | Size | Format
---|---|---|---|---
kuzborskij19a.pdf | Publisher's version/PDF | Open access | 2.01 MB | Adobe PDF