Several variance-reduced versions of REINFORCE based on importance sampling achieve an improved O(ϵ-3) sample complexity to find an ϵ-stationary point, under an unrealistic assumption on the variance of the importance weights. In this paper, we propose the Defensive Policy Gradient (DEF-PG) algorithm, based on defensive importance sampling, achieving the same result without any assumption on the variance of the importance weights. We also show that this is not improvable by establishing a matching Ω(ϵ-3) lower bound, and that REINFORCE with its O(ϵ-4) sample complexity is actually optimal under weaker assumptions on the policy class. Numerical simulations show promising results for the proposed technique compared to similar algorithms based on vanilla importance sampling.

Sample complexity of variance-reduced policy gradient: weaker assumptions and lower bounds / G. Paczolay, M. Papini, A.M. Metelli, I. Harmati, M. Restelli. - In: MACHINE LEARNING. - ISSN 0885-6125. - 113:9(2024), pp. 6475-6510. [10.1007/s10994-024-06573-4]

Sample complexity of variance-reduced policy gradient: weaker assumptions and lower bounds

M. Papini
Secondo
;
2024

Abstract

Several variance-reduced versions of REINFORCE based on importance sampling achieve an improved O(ϵ-3) sample complexity to find an ϵ-stationary point, under an unrealistic assumption on the variance of the importance weights. In this paper, we propose the Defensive Policy Gradient (DEF-PG) algorithm, based on defensive importance sampling, achieving the same result without any assumption on the variance of the importance weights. We also show that this is not improvable by establishing a matching Ω(ϵ-3) lower bound, and that REINFORCE with its O(ϵ-4) sample complexity is actually optimal under weaker assumptions on the policy class. Numerical simulations show promising results for the proposed technique compared to similar algorithms based on vanilla importance sampling.
Reinforcement learning; Policy gradient; Variance reduction
Settore IINF-05/A - Sistemi di elaborazione delle informazioni
Settore INFO-01/A - Informatica
2024
Article (author)
File in questo prodotto:
File Dimensione Formato  
s10994-024-06573-4 (1).pdf

accesso riservato

Tipologia: Publisher's version/PDF
Licenza: Nessuna licenza
Dimensione 3.86 MB
Formato Adobe PDF
3.86 MB Adobe PDF   Visualizza/Apri   Richiedi una copia
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/1226055
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? 0
  • OpenAlex ND
social impact