IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca

Adaptive batch size for safe policy gradients / M. Papini, M. Pirotta, M. Restelli (ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS). - In: Advances in Neural Information Processing Systems / edited by U. von Luxburg, I. Guyon, S. Bengio, H. Wallach, R. Fergus. - [s.l.] : Curran Associates : Neural Information Processing Systems Foundation, 2017 Dec 04. - ISBN 9781510860964. - pp. 3594-3603 (31st NIPS Annual Conference on Neural Information Processing Systems, December 4-9, 2017, Long Beach, California, USA).

Adaptive batch size for safe policy gradients

M. Papini (first author); M. Pirotta; M. Restelli

2017

Abstract

Policy gradient methods are among the best Reinforcement Learning (RL) techniques for solving complex control problems. In real-world RL applications, it is common to have a good initial policy whose performance needs to be improved, and it may not be acceptable to try bad policies during the learning process. Although several methods for choosing the step size exist, research has paid less attention to determining the batch size, that is, the number of samples used to estimate the gradient direction for each update of the policy parameters. In this paper, we propose a set of methods that jointly optimize the step size and the batch size so as to guarantee (with high probability) that the policy performance improves after each update. Besides providing theoretical guarantees, we show numerical simulations to analyse the behaviour of our methods.

	Scientific-disciplinary sectors of the contribution (valid from 9 May 2024)
				Sector INFO-01/A - Computer Science
				Sector IINF-05/A - Information Processing Systems

	Publication date
				4 Dec 2017

	URL
				https://proceedings.neurips.cc/paper_files/paper/2017/hash/ea6b2efbdd4255a9f1b3bbc6399b58f4-Abstract.html

	Type
				Book Part (author)

	Appears in types:
				03 - Contribution in volume

Files in this item:

File	Access	Type	License	Size	Format
NIPS-2017-adaptive-batch-size-for-safe-policy-gradients-Paper.pdf	Restricted access	Publisher's version/PDF	No license	434.19 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1225938
