EFFICIENCY AND REALISM IN STOCHASTIC BANDITS

L. Cella
2021

Abstract

This manuscript analyzes the application of stochastic bandits to the recommender systems domain. Here a learning agent sequentially recommends one item from a catalog of available alternatives. The environment then returns a reward that is a noisy observation of the rating associated with the suggested item. The peculiarity of the bandit setting is that no information is given about the products that were not recommended: the collected rewards are the only feedback available to the learning agent. Relying on them, the learner adapts its strategy towards its learning objective, that is, maximizing the cumulative reward collected over all the interactions. In this dissertation we investigate two main research directions: the development of efficient learning algorithms and the introduction of a more realistic learning setting. To address the former objective we propose two approaches to speed up the learning process. The first aims to reduce the computational costs associated with the learning procedure, while the second boosts the learning phase by relying on data from terminated recommendation sessions. Regarding the latter research line, we propose a novel setting that captures use cases that do not fit the standard bandit model.
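
To make the interaction protocol described above concrete, the following Python sketch implements one standard instance of a stochastic bandit loop, the classic UCB1 index policy. This is an illustrative assumption, not any algorithm developed in the thesis: the Bernoulli reward model, the arm means, and all names are hypothetical.

import math
import random

def ucb1(true_means, horizon):
    """Minimal UCB1 loop: each round, recommend the arm with the highest
    upper confidence bound, observe only that arm's noisy reward, and
    update that arm's statistics."""
    n_arms = len(true_means)
    counts = [0] * n_arms    # number of pulls per arm
    sums = [0.0] * n_arms    # cumulative reward per arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:      # play each arm once to initialize estimates
            arm = t - 1
        else:
            arm = max(range(n_arms),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        # noisy observation of the recommended item's rating
        # (Bernoulli rewards assumed here for simplicity)
        reward = 1.0 if random.random() < true_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total

print(ucb1([0.2, 0.5, 0.7], horizon=10000))

Each round the agent receives feedback only for the recommended item, which is exactly the partial-information constraint that distinguishes the bandit setting from full-information learning.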
15 Jan 2021
Sector INF/01 - Computer Science
machine learning; multi-armed bandits; stochastic bandits; online learning
CESA BIANCHI, NICOLO' ANTONIO
BOLDI, PAOLO
Doctoral Thesis
EFFICIENCY AND REALISM IN STOCHASTIC BANDITS / L. Cella ; tutor: N. Cesa-Bianchi ; PhD program coordinator: P. Boldi. Dipartimento di Informatica Giovanni Degli Antoni, 2021 Jan 15. 33rd cycle, Academic Year 2020. [10.13130/cella-leonardo_phd2021-01-15].
File: phd_unimi_R11945.pdf (complete doctoral thesis; open access; Adobe PDF, 3.33 MB)
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/807862