EFFICIENCY AND REALISM IN STOCHASTIC BANDITS

L. Cella
2021

Abstract

This manuscript analyzes the application of stochastic bandits to the recommender systems domain. Here a learning agent sequentially recommends one item from a catalog of available alternatives. The environment then returns a reward that is a noisy observation of the rating associated with the suggested item. The peculiarity of the bandit setting is that no information is given about the products that were not recommended: the collected rewards are the only feedback available to the learning agent. Relying on them, the learner adapts its strategy towards its learning objective, that is, maximizing the cumulative reward collected over all the interactions. In this dissertation we investigate two main research directions: the development of efficient learning algorithms and the introduction of a more realistic learning setting. To address the former objective we propose two approaches to speed up the learning process. The first aims to reduce the computational costs associated with the learning procedure, while the second boosts the learning phase by relying on data from terminated recommendation sessions. Regarding the latter research line, we propose a novel setting that captures use cases that do not fit the standard bandit model.
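
To make the interaction protocol described above concrete, the following Python sketch implements one standard instance of a stochastic bandit loop, the classic UCB1 index policy. This is an illustrative assumption, not any algorithm developed in the thesis: the Bernoulli reward model, the arm means, and all names are hypothetical.

import math
import random

def ucb1(true_means, horizon):
    """Minimal UCB1 loop: each round, recommend the arm with the highest
    upper confidence bound, observe only that arm's noisy reward, and
    update that arm's statistics."""
    n_arms = len(true_means)
    counts = [0] * n_arms    # number of pulls per arm
    sums = [0.0] * n_arms    # cumulative reward per arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:      # play each arm once to initialize estimates
            arm = t - 1
        else:
            arm = max(range(n_arms),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        # noisy observation of the recommended item's rating
        # (Bernoulli rewards assumed here for simplicity)
        reward = 1.0 if random.random() < true_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total

print(ucb1([0.2, 0.5, 0.7], horizon=10000))

Each round the agent receives feedback only for the recommended item, which is exactly the partial-information constraint that distinguishes the bandit setting from full-information learning.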
15 Jan 2021
Sector INF/01 - Computer Science
machine learning; multi-armed bandits; stochastic bandits; online learning
CESA BIANCHI, NICOLO' ANTONIO
BOLDI, PAOLO
Doctoral Thesis
EFFICIENCY AND REALISM IN STOCHASTIC BANDITS / L. Cella ; tutor: N. Cesa-Bianchi ; PhD program coordinator: P. Boldi. Dipartimento di Informatica Giovanni Degli Antoni, 2021 Jan 15. 33rd cycle, Academic Year 2020. [10.13130/cella-leonardo_phd2021-01-15].
File: phd_unimi_R11945.pdf (complete doctoral thesis; open access; Adobe PDF, 3.33 MB)
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/807862