Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning

Montenegro, A.; Mussi, M.; Papini, M.; Alberto Maria Metelli,

doi:10.52202/079017-4014

Constrained Reinforcement Learning (CRL) tackles sequential decision-making problems where agents are required to achieve goals by maximizing the expected return while meeting domain-specific constraints, which are often formulated on expected costs. In this setting, policy-based methods are widely used since they come with several advantages when dealing with continuous-control problems. These methods search in the policy space with an action-based or parameter-based exploration strategy, depending on whether they learn directly the parameters of a stochastic policy or those of a stochastic hyperpolicy. In this paper, we propose a general framework for addressing CRL problems via gradient-based primal-dual algorithms, relying on an alternate ascent/descent scheme with dual-variable regularization. We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-iterate convergence guarantees under (weak) gradient domination assumptions, improving and generalizing existing results. Then, we design C-PGAE and C-PGPE, the action-based and the parameter-based versions of C-PG, respectively, and we illustrate how they naturally extend to constraints defined in terms of risk measures over the costs, as it is often requested in safety-critical scenarios. Finally, we numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines, demonstrating their effectiveness.

Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning / A. Montenegro, M. Mussi, M. Papini, A. Maria Metelli - In: 38th Conference on Neural Information Processing Systems (NeurIPS 2024)[s.l] : Neural information processing systems foundation, 2024. - pp. 126363-126416 (( 38. Conference on Neural Information Processing Systems Vancouver 2024 [10.52202/079017-4014].

Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning

Alessandro Montenegro;Marco Mussi;M. Papini^Penultimo;Alberto Maria Metelli

2024

Abstract

Constrained Reinforcement Learning (CRL) tackles sequential decision-making problems where agents are required to achieve goals by maximizing the expected return while meeting domain-specific constraints, which are often formulated on expected costs. In this setting, policy-based methods are widely used since they come with several advantages when dealing with continuous-control problems. These methods search in the policy space with an action-based or parameter-based exploration strategy, depending on whether they learn directly the parameters of a stochastic policy or those of a stochastic hyperpolicy. In this paper, we propose a general framework for addressing CRL problems via gradient-based primal-dual algorithms, relying on an alternate ascent/descent scheme with dual-variable regularization. We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-iterate convergence guarantees under (weak) gradient domination assumptions, improving and generalizing existing results. Then, we design C-PGAE and C-PGPE, the action-based and the parameter-based versions of C-PG, respectively, and we illustrate how they naturally extend to constraints defined in terms of risk measures over the costs, as it is often requested in safety-critical scenarios. Finally, we numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines, demonstrating their effectiveness.

Scheda breve

Scheda completa

Scheda completa (DC)

	Settori scientifico-disciplinari del contributo (validi dal 09/05/2024)
	
				Settore IINF-05/A - Sistemi di elaborazione delle informazioni
Settore INFO-01/A - Informatica
			
	Data di pubblicazione
	
				2024
			
	DOI
	
				https://dx.doi.org/10.52202/079017-4014
			
	Tipologia
	
				Book Part (author)
			
	Appare nelle tipologie:
	
				03 - Contributo in volume

File in questo prodotto:

File	Dimensione	Formato
paper.pdf accesso aperto Tipologia: Publisher's version/PDF Licenza: Creative commons Dimensione 1.17 MB Formato Adobe PDF Visualizza/Apri	1.17 MB	Adobe PDF	Visualizza/Apri

Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/1226147

Citazioni

ND

1

0

ND

IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca