
NON-STATIONARY MULTIARMED BANDITS FOR SATIATION AND SEASONALITY PHENOMENA IN MUSIC RECOMMENDER SYSTEMS / G. Clerici ; supervisor: N. Cesa-Bianchi ; co-supervisor: P. Laforgue ; PhD coordinator: R. Sassi. Dipartimento di Informatica Giovanni Degli Antoni, 2024. 36th cycle, Academic Year 2022/2023.

NON-STATIONARY MULTIARMED BANDITS FOR SATIATION AND SEASONALITY PHENOMENA IN MUSIC RECOMMENDER SYSTEMS

G. Clerici
2024

Abstract

This thesis delves into theoretical aspects of non-stationary multiarmed bandits, motivated by their application to music recommender systems. An intrinsic challenge of such systems lies in evolving user preferences: rather than finding a single optimal item, the objective is to craft an ordered sequence of items aligned with the user's behaviour. Indeed, these systems need to address dynamic user preferences, characterized by phenomena like satiation. Our goal is to study these phenomena in a sequential learning setting with partial feedback, adopting multiarmed bandits as the approach to tackle this problem. We first introduce a novel model with finitely many arms, which handles contrasting phenomena like satiation and seasonality in a unified framework. We formalize this model, called \textit{Last Switch Dependent Bandit}, as a non-stationary multiarmed bandit where the expected reward of an arm is determined by its state, namely the time elapsed since the arm last took part in a switch of actions. This model generalizes to different types of non-stationarity, as it relaxes many typical assumptions, and it recovers several state-dependent bandits from the literature. We discuss this bandit problem and the solution we propose, which is based on upper confidence bounds and on techniques derived from combinatorial semi-bandits, and we conclude this part by providing an upper bound on the regret to assess the performance of the proposed solution. Since finite action sets are not always representative of real-world applications, we then propose a new linear bandit model that handles non-stationary phenomena within an infinite set of actions, which poses complex challenges due to cross-arm dependencies. We formalize a model called \textit{Linear Bandit with Memory}, where the expected reward of an action is influenced by the learner's past actions within a fixed-size window, through a matrix we call the \textit{memory matrix}.
Specifically, this memory matrix is characterized by two parameters, the window size $m$ and the exponent $\gamma$, which can produce two types of behaviour: rotting and rising. In our discussion, we show how our model generalizes stationary linear bandits, and how it partially recovers rested rotting and rising bandits in the limit $m \rightarrow +\infty$. We study the complex problem of modelling the interactions among arms in a linear non-stationary setting and propose an algorithm to solve it. Furthermore, we study the setting where $m$ and $\gamma$ are unknown and propose a meta-bandit algorithm for model selection that jointly learns the parameters and solves the bandit problem. For both models, our contributions consist of defining novel multiarmed bandit problems, proving sound theoretical guarantees, and positioning our results with respect to the literature. Finally, within the context of music streaming services, we present a further line of research exploring the Spotify\texttrademark music credits network. We represent this network as a directed graph and study the relationship between music genres and graph-related metrics. After observing interesting patterns in several centrality measures conditioned on music genre, we introduce a novel node-wise index of reciprocity as a potential basis for informed recommendations.
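As a toy illustration only (not the thesis's actual construction), one plausible instantiation of such a memory matrix is $M_t = I + \sum a_s a_s^\top$ over the last $m$ played actions, with expected reward $\langle a_t, M_t^{\gamma} \theta \rangle$: a negative exponent $\gamma$ then dampens the reward of recently played directions (rotting, i.e. satiation), while a positive one amplifies it (rising). All names and formulas below are assumptions made for the sketch.

```python
import numpy as np

def memory_matrix(past_actions, gamma):
    """Illustrative memory matrix: identity plus outer products of the
    last m actions, raised to the power gamma via eigendecomposition
    (valid because the matrix is symmetric positive definite)."""
    d = past_actions.shape[1]
    M = np.eye(d) + past_actions.T @ past_actions
    w, V = np.linalg.eigh(M)
    return (V * w**gamma) @ V.T          # V diag(w^gamma) V^T

def expected_reward(theta, action, past_actions, gamma):
    """Reward of `action`, modulated by the memory of recent plays."""
    return action @ memory_matrix(past_actions, gamma) @ theta

theta = np.array([1.0, 0.0])             # unknown preference vector
a = np.array([1.0, 0.0])                 # arm aligned with theta
window = np.tile(a, (3, 1))              # same arm played m = 3 times

rotting = expected_reward(theta, a, window, gamma=-1.0)   # satiation
rising = expected_reward(theta, a, window, gamma=1.0)     # rising
fresh = expected_reward(theta, a, np.zeros((0, 2)), -1.0) # empty memory
```

With this choice, repeatedly playing the same arm yields `rotting < fresh < rising`, mimicking the two behaviours the abstract attributes to the sign-like role of $\gamma$.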
17 Apr 2024
Academic field: INF/01 - Informatica (Computer Science)
Supervisor: CESA BIANCHI, NICOLO' ANTONIO
Co-supervisor: LAFORGUE, PIERRE
PhD coordinator: SASSI, ROBERTO
Doctoral Thesis
Files in this product:
phd_unimi_R12909.pdf (Doctoral thesis; Type: Other; 6.44 MB; Adobe PDF; open access)
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1042808