IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca

We consider the problem of learning an ε-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators. Given access to a generative model, we achieve rate-optimal sample complexity by performing a simple, \emph{perturbed} version of least-squares value iteration with orthogonal trigonometric polynomials as features. Key to our solution is a novel projection technique based on ideas from harmonic analysis. Our O˜(ϵ−2−d/(ν+1)) sample complexity, where d is the dimension of the state-action space and ν the order of smoothness, recovers the state-of-the-art result of discretization approaches for the special case of Lipschitz MDPs (ν=0). At the same time, for ν→∞, it recovers and greatly generalizes the O(ϵ−2) rate of low-rank MDPs, which are more amenable to regression approaches. In this sense, our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.

Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs / D. Maran, A. Maria Metelli, M. Papini, M. Restelli - In: The Thirty Seventh Annual Conference on Learning Theory / [a cura di] S. Agrawal, A. Roth. - [s.l] : PMLR, 2024. - pp. 3743-3774 (( 37. Annual Conference on Learning Theory Edmonton 2023.

Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs

Davide Maran;Alberto Maria Metelli;M. Papini^Penultimo;Marcello Restelli

2024

Abstract

We consider the problem of learning an ε-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators. Given access to a generative model, we achieve rate-optimal sample complexity by performing a simple, \emph{perturbed} version of least-squares value iteration with orthogonal trigonometric polynomials as features. Key to our solution is a novel projection technique based on ideas from harmonic analysis. Our O˜(ϵ−2−d/(ν+1)) sample complexity, where d is the dimension of the state-action space and ν the order of smoothness, recovers the state-of-the-art result of discretization approaches for the special case of Lipschitz MDPs (ν=0). At the same time, for ν→∞, it recovers and greatly generalizes the O(ϵ−2) rate of low-rank MDPs, which are more amenable to regression approaches. In this sense, our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.

Scheda breve

Scheda completa

Scheda completa (DC)

	Settori scientifico-disciplinari del contributo (validi dal 09/05/2024)
	
				Settore IINF-05/A - Sistemi di elaborazione delle informazioni
Settore INFO-01/A - Informatica
			
	Data di pubblicazione
	
				2024
			
	URL
	
				https://proceedings.mlr.press/v247/maran24a.html
			
	Tipologia
	
				Book Part (author)
			
	Appare nelle tipologie:
	
				03 - Contributo in volume

File in questo prodotto:

File	Dimensione	Formato
Projection by Convolution Optimal Sample Complexity for.pdf accesso aperto Tipologia: Publisher's version/PDF Licenza: Creative commons Dimensione 448.46 kB Formato Adobe PDF Visualizza/Apri	448.46 kB	Adobe PDF	Visualizza/Apri

Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/1226140

Citazioni

ND

1

ND

ND

social impact