LEARNING UNDER STRUCTURE AND UNCERTAINTY: ALGORITHMS FOR BANDIT AND ONLINE DECISION MAKING / A. Rumi ; tutor: N. Cesa-Bianchi ; co-supervisor: F. Vitale ; coordinator: R. Sassi. Dipartimento di Informatica Giovanni Degli Antoni, 2025. 38th cycle, academic year 2025/2026.
LEARNING UNDER STRUCTURE AND UNCERTAINTY: ALGORITHMS FOR BANDIT AND ONLINE DECISION MAKING
A. Rumi
2025
Abstract
This thesis investigates the design and analysis of algorithms for online decision-making under uncertainty, with a particular focus on the multi-armed bandit and online learning frameworks. At the core of these settings lies a simple yet powerful interaction protocol: at each round the learner receives a question (information available prior to a decision), provides an answer (an action or prediction), and observes a feedback signal that may be partial, noisy, or structured. This iterative loop captures a wide range of sequential decision problems and serves as the unifying perspective for the contributions of this thesis. We develop this perspective across four fundamental problems in online learning.

First, we address settings in which the learner receives no additional information prior to making a decision (i.e., the question is implicit) and must adapt to a nonstationary environment under bandit feedback. We introduce a parameter-free algorithm achieving optimal dynamic regret guarantees without prior knowledge of the environment's variability, resolving a long-standing open problem in the field.

Next, we enrich the answer phase by allowing the learner to abstain. We propose an efficient algorithm that leverages this option to obtain cumulative reward guarantees that can outperform those of standard methods while matching their worst-case performance. We then extend this framework to the adversarial contextual bandit problem with abstention.

Third, we investigate problems with combinatorial answers, where the learner selects a subset of actions and the reward is defined by a sum-max function, a rich subclass of monotone submodular functions. We design an algorithm for this bandit problem and prove that it achieves a significant improvement in regret over standard algorithms for monotone submodular functions.

Finally, we consider the Stochastic Shortest Path problem, where the additional structure is dictated by an underlying Markov Decision Process. We analyze the impact of sparse cost feedback, demonstrating that standard methods fail to exploit it, and design a novel family of regularizers that achieves an optimal, sparsity-adaptive regret bound. Complemented by a matching lower bound, this result provides a complete characterization of the benefits of sparsity in this setting.

The contributions in this thesis demonstrate that, by designing principled algorithms that recognize and exploit structure within the question-answer-feedback loop, we can achieve strong, adaptive guarantees for learning under uncertainty.
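To make the interaction protocol concrete, the following is a minimal, self-contained sketch of the question-answer-feedback loop on a toy stationary bandit instance. The epsilon-greedy rule and all names here are illustrative assumptions for exposition, not algorithms or code from the thesis.

```python
import random

# Toy instance of the question-answer-feedback loop from the abstract.
# The environment and the epsilon-greedy learner are illustrative only,
# not methods from the thesis.

T = 1000
means = [0.3, 0.5, 0.7]          # unknown Bernoulli reward means, one per arm
counts = [0] * len(means)        # number of pulls per arm
estimates = [0.0] * len(means)   # empirical mean reward per arm
total_reward = 0.0

for t in range(T):
    # Question: implicit here; no side information is revealed to the learner.
    # Answer: epsilon-greedy choice of an arm.
    if random.random() < 0.1:
        arm = random.randrange(len(means))
    else:
        arm = max(range(len(means)), key=lambda i: estimates[i])
    # Feedback: bandit feedback, i.e. only the chosen arm's reward is observed.
    reward = 1.0 if random.random() < means[arm] else 0.0
    total_reward += reward
    # Update: incremental empirical-mean update for the chosen arm.
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]

print(f"average reward over {T} rounds: {total_reward / T:.3f}")
```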
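For the first contribution, "dynamic regret" admits the following standard formalization, stated here as background; the exact comparator and variability measure used in the thesis may differ.

```latex
% Dynamic regret of arms a_1, ..., a_T against the per-round best arm,
% where \mu_t(a) is the (possibly changing) mean reward of arm a at round t.
% One standard definition, given as background; the thesis's exact
% comparator and variability measure may differ.
R_T \;=\; \sum_{t=1}^{T} \max_{a} \mu_t(a) \;-\; \sum_{t=1}^{T} \mu_t(a_t)
```

Optimal bounds in this setting typically scale with a measure of nonstationarity, such as the number of distribution changes or the total variation of the mean rewards over time.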
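For the combinatorial contribution, one common form of sum-max set function, of facility-location type, is sketched below; this is an assumed formulation given for illustration, and the thesis's precise definition may differ in its details.

```latex
% One common (assumed, illustrative) form of sum-max set function:
% the value of a subset S of actions, given nonnegative weights w_{ij}.
f(S) \;=\; \sum_{j=1}^{m} \max_{i \in S} w_{ij}, \qquad w_{ij} \ge 0.
% Such functions are monotone (adding an action can only raise each inner
% max) and submodular (marginal gains shrink as S grows).
```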
| File | Size | Format |
|---|---|---|
| phd_unimi_R13858.pdf (Doctoral thesis; open access; post-print, accepted manuscript, i.e. the version accepted by the publisher; Creative Commons license) | 3.56 MB | Adobe PDF |