
Exploration-Free Reinforcement Learning with Linear Function Approximation / L. Civitavecchia, M. Papini. - In: REINFORCEMENT LEARNING JOURNAL. - ISSN 2996-8577. - 6:(2025), pp. 1856-1879. (Reinforcement Learning Conference, 5-9 August 2025, Edmonton).

Exploration-Free Reinforcement Learning with Linear Function Approximation

M. Papini
Last author
2025

Abstract

In the context of Markov Decision Processes (MDPs) with linear Bellman completeness, a generalization of linear MDPs, we reconsider the learning capabilities of a *greedy* algorithm. The motivation is that, when exploration is costly or dangerous, an exploration-free approach may be preferable to optimistic or randomized solutions. We show that, under a condition of sufficient diversity in the feature distribution, Least-Squares Value Iteration (LSVI) can achieve sublinear regret. Specifically, we bound the expected cumulative regret in terms of the number of episodes, the task horizon, the dimension of the feature map, and a measure of feature diversity. We empirically validate our theoretical findings on synthetic linear MDPs. Our analysis is a first step towards exploration-free reinforcement learning in MDPs with large state spaces.
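The abstract's setting can be illustrated with a minimal sketch of greedy (bonus-free) Least-Squares Value Iteration on a synthetic MDP. This is not the paper's implementation: the environment below is a small random tabular MDP viewed through one-hot features (a special case of a linear MDP), and all names, sizes, and the ridge parameter are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic environment: a small random tabular MDP with
# one-hot features, which is a special case of a linear MDP.
S, A, H, K = 4, 2, 5, 200                     # states, actions, horizon, episodes
d = S * A                                     # feature dimension (one-hot)
P = rng.dirichlet(np.ones(S), size=(S, A))    # transition kernel P[s, a, s']
R = rng.uniform(size=(S, A))                  # mean rewards

def phi(s, a):
    """One-hot feature map for the state-action pair (s, a)."""
    x = np.zeros(d)
    x[s * A + a] = 1.0
    return x

data = [[] for _ in range(H)]                 # per-step transitions (s, a, r, s')

def lsvi(reg=1.0):
    """Greedy LSVI: ridge regression of Bellman targets, with no
    exploration bonus and no randomization."""
    w = np.zeros((H, d))
    for h in reversed(range(H)):              # backward induction over the horizon
        Lam = reg * np.eye(d)
        b = np.zeros(d)
        for (s, a, r, s2) in data[h]:
            x = phi(s, a)
            Lam += np.outer(x, x)
            # Bellman target: observed reward plus greedy next-step value
            v_next = 0.0 if h == H - 1 else max(
                phi(s2, a2) @ w[h + 1] for a2 in range(A))
            b += x * (r + v_next)
        w[h] = np.linalg.solve(Lam, b)        # ridge least-squares solution
    return w

for k in range(K):
    w = lsvi()
    s = int(rng.integers(S))
    for h in range(H):                        # act greedily w.r.t. the estimate
        a = int(np.argmax([phi(s, a_) @ w[h] for a_ in range(A)]))
        r = R[s, a] + 0.1 * rng.normal()      # noisy reward observation
        s2 = int(rng.choice(S, p=P[s, a]))
        data[h].append((s, a, r, s2))
        s = s2
```

The point of the sketch is the absence of any optimism or noise injection in `lsvi`: the agent always plays the greedy action, and sublinear regret would have to come from the diversity of the feature distribution induced by the environment itself, which is the condition the paper studies.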
Sector IINF-05/A - Information processing systems
Sector INFO-01/A - Computer science
2025
https://rlj.cs.umass.edu/2025/papers/Paper194.html
Article (author)
Files in this record:

File: RLJ_RLC_2025_194.pdf
Access: open access
Type: Publisher's version/PDF
License: Creative Commons
Size: 611.07 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1226156