PAPINI, MATTEO
Dipartimento di Informatica Giovanni Degli Antoni
Do It for HER: First-Order Temporal Logic Reward Specification in Reinforcement Learning
2026 P. Olivieri, F. Lasca, A. Gianola, M. Papini
Convergence Analysis of Policy Gradient Methods with Dynamic Stochasticity
2025 A. Montenegro, M. Mussi, M. Papini, A.M. Metelli
Exploration-Free Reinforcement Learning with Linear Function Approximation
2025 L. Civitavecchia, M. Papini
Search or split: policy gradient with adaptive policy space
2025 G. Tedeschi, M. Papini, A.M. Metelli, M. Restelli
Learning Optimal Deterministic Policies with Stochastic Policy Gradients
2024 A. Montenegro, M. Mussi, A.M. Metelli, M. Papini
Local Linearity: the Key for No-regret Reinforcement Learning in Continuous MDPs
2024 D. Maran, A.M. Metelli, M. Papini, M. Restelli
Importance-Weighted Offline Learning Done Right
2024 G. Gabbianelli, G. Neu, M. Papini
Policy Gradient with Active Importance Sampling
2024 M. Papini, G. Manganini, A.M. Metelli, M. Restelli
Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning
2024 A. Montenegro, M. Mussi, M. Papini, A.M. Metelli
Online Learning with Off-Policy Feedback in Adversarial MDPs
2024 F. Bacchiocchi, F. Stradi, M. Papini, A.M. Metelli, N. Gatti
Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs
2024 D. Maran, A.M. Metelli, M. Papini, M. Restelli
No-Regret Reinforcement Learning in Smooth MDPs
2024 D. Maran, A.M. Metelli, M. Papini, M. Restelli
Sample complexity of variance-reduced policy gradient: weaker assumptions and lower bounds
2024 G. Paczolay, M. Papini, A.M. Metelli, I. Harmati, M. Restelli
Offline Primal-Dual Reinforcement Learning for Linear MDPs
2024 G. Gabbianelli, G. Neu, N. Okolo, M. Papini
Online Learning with Off-Policy Feedback
2023 G. Gabbianelli, G. Neu, M. Papini
Optimistic Information-Directed Sampling
2023 G. Neu, M. Papini, L. Schwartz
Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees
2022 A. Tirinzoni, M. Papini, A. Touati, A. Lazaric, M. Pirotta
Lifting the Information Ratio: An Information-Theoretic Analysis of Thompson Sampling for Contextual Bandits
2022 G. Neu, M. Papini, J. Olkhovskaya, L. Schwartz
Smoothing policies and safe policy gradients
2022 M. Papini, M. Pirotta, M. Restelli
Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection
2021 M. Papini, A. Tirinzoni, A. Pacchiano, M. Restelli, A. Lazaric, M. Pirotta