IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca

Gradient-Aware Model-Based Policy Search / P. D'Oro, A.M. Metelli, A. Tirinzoni, M. Papini, M. Restelli. - In: AAAI-20 Technical Tracks 4. - [s.l.] : AAAI Press, 2020. - (Proceedings of the ... AAAI Conference on Artificial Intelligence). - ISBN 978-1-57735-835-0. - pp. 3801-3808. (34th AAAI Conference on Artificial Intelligence, New York, 2020) [10.1609/aaai.v34i04.5791].

Gradient-Aware Model-Based Policy Search

Pierluca D'Oro; Alberto Maria Metelli; Andrea Tirinzoni; M. Papini; Marcello Restelli

2020

Abstract

Traditional model-based reinforcement learning approaches learn a model of the environment dynamics without explicitly considering how it will be used by the agent. In the presence of misspecified model classes, this can lead to poor estimates, as some relevant available information is ignored. In this paper, we introduce a novel model-based policy search approach that exploits the knowledge of the current agent policy to learn an approximate transition model, focusing on the portions of the environment that are most relevant for policy improvement. We leverage a weighting scheme, derived from the minimization of the error on the model-based policy gradient estimator, to define a suitable objective function that is optimized for learning the approximate transition model. Then, we integrate this procedure into a batch policy improvement algorithm, named Gradient-Aware Model-based Policy Search (GAMPS), which iteratively learns a transition model and uses it, together with the collected trajectories, to compute the new policy parameters. Finally, we empirically validate GAMPS on benchmark domains, analyzing and discussing its properties.
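
The idea of fitting a transition model under a policy-dependent weighting can be illustrated with a minimal sketch. The abstract does not state the form of the weights, so everything specific here is an assumption made for illustration: the weight of a transition at step t is taken as the discount gamma^t times the norm of the cumulative policy score up to t (data assumed on-policy, so no importance ratio appears), the model class is linear, and the function names `gradient_aware_weights` and `fit_weighted_linear_model` are hypothetical. Weighted least squares stands in for weighted maximum likelihood under a fixed-covariance Gaussian model.

```python
import numpy as np


def gradient_aware_weights(scores, gamma):
    """Per-step weights for one trajectory (assumed form, not from the abstract).

    scores: array of shape (T, d); row t is grad_theta log pi(a_t | s_t).
    Returns w_t = gamma**t * || sum_{l <= t} scores[l] ||_2.
    """
    cumulative = np.cumsum(scores, axis=0)           # running sum of scores
    discounts = gamma ** np.arange(len(scores))      # gamma^t for t = 0..T-1
    return discounts * np.linalg.norm(cumulative, axis=1)


def fit_weighted_linear_model(states, actions, next_states, weights):
    """Weighted least squares for next_state ~= [state, action, 1] @ W."""
    features = np.hstack([states, actions, np.ones((len(states), 1))])
    root_w = np.sqrt(weights)[:, None]               # scale rows by sqrt(weight)
    W, *_ = np.linalg.lstsq(root_w * features, root_w * next_states, rcond=None)
    return W
```

Transitions with larger weights pull the fitted model toward accuracy in the regions that, under the assumed weighting, matter more for the policy gradient; with uniform weights the fit reduces to ordinary least squares.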

Scientific-disciplinary sectors of the contribution (valid from 9 May 2024)
	Sector IINF-05/A - Information Processing Systems
	Sector INFO-01/A - Computer Science
Publication date
	2020
DOI
	https://dx.doi.org/10.1609/aaai.v34i04.5791
URL
	https://aaai.org/ojs/index.php/AAAI/article/view/5791
Type
	Book Part (author)
Appears in types:
	03 - Contribution in volume

Files in this item:

File	Size	Format
5791-Article Text-9016-1-10-20200513.pdf (open access; type: Publisher's version/PDF; license: Creative Commons)	656.95 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1226150
