
Meta-learning with stochastic linear bandits / L. Cella, A. Lazaric, M. Pontil (Proceedings of Machine Learning Research). - In: ICML. - [s.l.] : International Machine Learning Society (IMLS), 2020. - ISBN 9781713821120. - pp. 1337-1347. (Paper presented at the 37th International Conference on Machine Learning, 13-18 July 2020.)

Meta-learning with stochastic linear bandits

L. Cella (first author); A. Lazaric; M. Pontil
2020

Abstract

We investigate meta-learning procedures in the setting of stochastic linear bandit tasks. The goal is to select a learning algorithm that works well on average over a class of bandit tasks sampled from a task distribution. Inspired by recent work on learning-to-learn linear regression, we consider a class of bandit algorithms that implement a regularized version of the well-known OFUL algorithm, where the regularization is a squared Euclidean distance to a bias vector. We first study the benefit of the biased OFUL algorithm in terms of regret minimization. We then propose two strategies to estimate the bias within the learning-to-learn setting. We show, both theoretically and experimentally, that when the number of tasks grows and the variance of the task distribution is small, our strategies have a significant advantage over learning the tasks in isolation.
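The biased regularization described in the abstract amounts to ridge regression recentered at a bias vector h: the per-task estimate minimizes the squared prediction error plus lambda * ||theta - h||^2, so with little data the estimate stays close to the meta-learned bias. A minimal sketch of the resulting closed-form estimate, assuming a numpy implementation (the function name and variables are illustrative, not the authors' code):

```python
import numpy as np

def biased_ridge_estimate(X, r, h, lam=1.0):
    """Closed-form minimizer of sum_s (x_s @ theta - r_s)^2 + lam * ||theta - h||^2.

    X   : (t, d) matrix of the features of the actions chosen so far
    h   : (d,) bias vector (the meta-learned center of the task distribution)
    r   : (t,) observed rewards
    lam : regularization strength
    """
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)   # regularized design matrix
    b = X.T @ r + lam * h           # bias-shifted right-hand side
    return np.linalg.solve(A, b)

# Sanity check: with no observations, the estimate falls back to the bias h.
d = 3
h = np.array([0.5, -0.2, 1.0])
theta0 = biased_ridge_estimate(np.zeros((0, d)), np.zeros(0), h, lam=1.0)
print(np.allclose(theta0, h))  # True
```

Setting h = 0 recovers the standard ridge estimate used by OFUL; a good meta-learned h shrinks the estimate toward the task distribution's mean, which is where the regret advantage over learning tasks in isolation comes from when the task variance is small.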
Sector INFO-01/A - Computer Science
Book Part (author)
Files in this record:
File: 3524938.3525065.pdf
Access: restricted
Type: Publisher's version/PDF
Size: 785.86 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1145278
Citations
  • PMC: N/A
  • Scopus: 24
  • Web of Science: 18