Exploration-Free Reinforcement Learning with Linear Function Approximation / L. Civitavecchia, M. Papini. - In: REINFORCEMENT LEARNING JOURNAL. - ISSN 2996-8577. - 6:(2025), pp. 1856-1879. (Reinforcement Learning Conference, Edmonton, 5-9 August 2025).
Exploration-Free Reinforcement Learning with Linear Function Approximation
M. Papini (last author)
2025
Abstract
In the context of Markov Decision Processes (MDPs) with linear Bellman completeness, a generalization of linear MDPs, we reconsider the learning capabilities of a *greedy* algorithm. The motivation is that, when exploration is costly or dangerous, an exploration-free approach may be preferable to optimistic or randomized solutions. We show that, under a condition of sufficient diversity in the feature distribution, Least-Squares Value Iteration (LSVI) can achieve sublinear regret. Specifically, we bound the expected cumulative regret by a quantity that is sublinear in the number of episodes and depends on the task horizon, the dimension of the feature map, and a measure of feature diversity. We empirically validate our theoretical findings on synthetic linear MDPs. Our analysis is a first step towards exploration-free reinforcement learning in MDPs with large state spaces.
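
For concreteness, the sketch below shows what a greedy (exploration-free) run of Least-Squares Value Iteration looks like on a small synthetic problem. It is only an illustration of the algorithm named in the abstract: the tabular environment, the one-hot feature map (a special case of a linear MDP), and all hyper-parameters are assumptions made here, not the authors' experimental setup.

```python
# Illustrative sketch of greedy (exploration-free) LSVI, NOT the paper's code.
# The synthetic MDP, feature map, and hyper-parameters below are assumptions.
import numpy as np

rng = np.random.default_rng(0)

# --- synthetic tabular MDP (one-hot features => linear MDP with d = S*A) ---
S, A, H, K = 5, 3, 4, 500                     # states, actions, horizon, episodes
d = S * A                                     # feature dimension
P = rng.dirichlet(np.ones(S), size=(S, A))    # transition kernel P[s, a] over next states
R = rng.uniform(0.0, 1.0, size=(S, A))        # mean rewards in [0, 1]

def phi(s, a):
    """One-hot feature map phi(s, a) in R^d."""
    x = np.zeros(d)
    x[s * A + a] = 1.0
    return x

lam = 1.0                                     # ridge regularization
data = [[] for _ in range(H)]                 # per-step datasets of (s, a, r, s') tuples

def fit_weights():
    """Backward ridge regression: w_h fits the target r + max_a' phi(s', a')^T w_{h+1}."""
    W = np.zeros((H + 1, d))                  # W[H] = 0 is the terminal value
    for h in reversed(range(H)):
        Lam = lam * np.eye(d)
        b = np.zeros(d)
        for (s, a, r, s_next) in data[h]:
            x = phi(s, a)
            target = r + max(phi(s_next, ap) @ W[h + 1] for ap in range(A))
            Lam += np.outer(x, x)
            b += target * x
        W[h] = np.linalg.solve(Lam, b)
    return W

returns = []
for k in range(K):
    W = fit_weights()
    s = int(rng.integers(S))                  # uniform initial state
    ep_return = 0.0
    for h in range(H):
        # purely greedy action: no optimism bonus, no injected randomization
        a = int(np.argmax([phi(s, ap) @ W[h] for ap in range(A)]))
        r = R[s, a] + 0.1 * rng.standard_normal()
        s_next = int(rng.choice(S, p=P[s, a]))
        data[h].append((s, a, r, s_next))
        ep_return += r
        s = s_next
    returns.append(ep_return)

print(f"mean return, first 50 episodes : {np.mean(returns[:50]):.3f}")
print(f"mean return, last 50 episodes  : {np.mean(returns[-50:]):.3f}")
```

Refitting the ridge regression from scratch each episode keeps the sketch simple; the point made in the abstract is that greedy action selection alone, with no optimism bonus or injected noise, can already yield sublinear regret when the feature distribution is sufficiently diverse.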
| File | Size | Format | |
|---|---|---|---|
| RLJ_RLC_2025_194.pdf (open access; Type: Publisher's version/PDF; License: Creative Commons) | 611.07 kB | Adobe PDF | View/Open |




