IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca

Goal-Directed Planning via Hindsight Experience Replay / L. Moro, A. Likmeta, M. Restelli, E. Prati - In: ICLR 2022 - 10th International Conference on Learning Representations [s.l.] : International Conference on Learning Representations, ICLR, 2022. - pp. 1-16. (Paper presented at the 10th International Conference on Learning Representations, held online in 2022.)

Goal-Directed Planning via Hindsight Experience Replay

Moro L.; Likmeta A.; Restelli M.; Prati E. (last author)

2022

Abstract

We consider the problem of goal-directed planning under a deterministic transition model. Monte Carlo Tree Search (MCTS) has shown remarkable performance in solving deterministic control problems. By using function approximators to bias the search of the tree, MCTS has been extended to complex continuous domains, resulting in the AlphaZero family of algorithms. Nonetheless, these algorithms still struggle with control problems with sparse rewards, such as goal-directed domains, where a positive reward is awarded only when reaching a goal state. In this work, we extend AlphaZero with Hindsight Experience Replay to tackle complex goal-directed planning tasks. We demonstrate the effectiveness of the proposed approach through an extensive empirical evaluation in several simulated domains, including a novel application to a quantum compiling domain.

	Scientific-disciplinary sectors of the contribution
	
			Sector FIS/02 - Theoretical Physics, Mathematical Models and Methods
		
	Publication date
	
			2022
		
	Organizations associated with the conference
	
			ByteDance
			Meta AI
			Microsoft
			Qualcomm
			Sea AI Lab
		
	Type
	
			Book Part (author)
		
	Appears in types:
	
			03 - Contribution in volume

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/991816
