We investigate meta-learning procedures in the setting of stochastic linear bandit tasks. The goal is to select a learning algorithm that works well on average over a class of bandit tasks sampled from a task distribution. Inspired by recent work on learning-to-learn linear regression, we consider a class of bandit algorithms that implement a regularized version of the well-known OFUL algorithm, where the regularization is a squared Euclidean distance to a bias vector. We first study the benefit of the biased OFUL algorithm in terms of regret minimization. We then propose two strategies to estimate the bias within the learning-to-learn setting. We show, both theoretically and experimentally, that when the number of tasks grows and the variance of the task distribution is small, our strategies have a significant advantage over learning the tasks in isolation.
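The biased regularization described in the abstract can be sketched as follows. This is a minimal illustration under our own assumptions, not the paper's implementation: `biased_ridge` is the closed-form ridge-regression estimate penalized toward a bias vector h, `ucb_scores` computes OFUL-style optimistic indices, and `estimate_bias` shows one natural bias estimator (averaging per-task estimates); the actual strategies in the paper may differ, and all function names are ours.

```python
import numpy as np

def biased_ridge(X, y, lam, h):
    """Solve min_theta ||X theta - y||^2 + lam * ||theta - h||^2.
    Closed form: theta = (X^T X + lam I)^{-1} (X^T y + lam h).
    With h = 0 this reduces to ordinary ridge regression."""
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)
    return np.linalg.solve(A, X.T @ y + lam * h)

def ucb_scores(arms, theta_hat, A_inv, beta):
    """OFUL-style optimistic index per arm x:
    x^T theta_hat + beta * ||x||_{A^{-1}} (mean plus exploration bonus)."""
    mean = arms @ theta_hat
    width = np.sqrt(np.einsum("ij,jk,ik->i", arms, A_inv, arms))
    return mean + beta * width

def estimate_bias(task_estimates):
    """Average the per-task parameter estimates collected so far;
    a natural bias estimator when tasks share a common mean."""
    return np.mean(np.asarray(task_estimates), axis=0)
```

Intuitively, when the task distribution has small variance, the bias vector h is close to every task's parameter, so a larger lam pulls the per-task estimate toward h and helps in the early, data-poor rounds of each task.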
Meta-learning with stochastic linear bandits / L. Cella, A. Lazaric, M. Pontil (Proceedings of Machine Learning Research). - In: ICML. [S.l.]: International Machine Learning Society (IMLS), 2020. - ISBN 9781713821120. - pp. 1337-1347. Presented at the 37th International Conference on Machine Learning, 13-18 July 2020.
Meta-learning with stochastic linear bandits
L. Cella (first author); A. Lazaric; M. Pontil
2020
File: 3524938.3525065.pdf (restricted access)
Type: Publisher's version/PDF
Size: 785.86 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.