IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca

Importance Sampling Techniques for Policy Optimization / A.M. Metelli, M. Papini, N. Montali, M. Restelli. - In: JOURNAL OF MACHINE LEARNING RESEARCH. - ISSN 1532-4435. - 21:141(2020), pp. 1-75.

Importance Sampling Techniques for Policy Optimization

Metelli Alberto Maria; M. Papini; Montali Nico; Restelli Marcello

2020

Abstract

How can we effectively exploit the collected samples when solving a continuous control task with Reinforcement Learning? Recent results have empirically demonstrated that multiple policy optimization steps can be performed with the same batch by using off-distribution techniques based on importance sampling. However, off-distribution optimization must take into account the uncertainty introduced by the importance sampling process. In this paper, we propose and analyze a class of model-free policy search algorithms that extend the recent Policy Optimization via Importance Sampling (Metelli et al., 2018) by incorporating two advanced variance reduction techniques: per-decision and multiple importance sampling. For each of them, we derive a high-probability bound of independent interest and show how to use it to define a surrogate objective function suitable for both action-based and parameter-based settings. Finally, the resulting algorithms are evaluated on a set of continuous control tasks, using both linear and deep policies, and compared with modern policy optimization methods.

	Keywords
	
				Reinforcement Learning; Policy Optimization; Importance Sampling; Per-Decision Importance Sampling; Multiple Importance Sampling
			
	Scientific-disciplinary sectors of the article (valid from 9 May 2024)
	
				Sector IINF-05/A - Information Processing Systems
				Sector INFO-01/A - Computer Science
			
	Publication date
	
				2020
			
	Journal in ANCE
	
				JOURNAL OF MACHINE LEARNING RESEARCH
			
	URL
	
				https://jmlr.org/papers/v21/20-124.html
			
	Type
	
				Article (author)
			
	Appears in types:
	
				01 - Journal article

Files in this item:

File	Access	Type	License	Size	Format
20-124.pdf	Open access	Publisher's version/PDF	Creative Commons	1.57 MB	Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1226057
