L. Baresi, M. Garlini, G. Quattrocchi. Dynamic Resource Allocation for Deadline-Constrained Neural Network Training. In: 2025 IEEE/ACM 20th Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS), Ottawa, 2025. IEEE, pp. 39-49. ISBN 9798331501815. DOI: 10.1109/SEAMS66627.2025.00013.
Dynamic Resource Allocation for Deadline-Constrained Neural Network Training
L. Baresi, M. Garlini, G. Quattrocchi
2025
Abstract
Neural networks (NNs) serve as the backbone for various applications, including computer vision, speech recognition, and natural language processing. Due to their iterative nature, training NNs is a highly compute-intensive task that is typically executed using a statically allocated set of devices (e.g., CPUs or GPUs). This static allocation prevents priorities from being adjusted: resources cannot be reassigned to urgent tasks, and high-priority training jobs may therefore miss their expected completion times. This paper proposes DECOR-NN (DEadline COnstrained Resource allocation for Neural Networks), a control mechanism for NN training that dynamically allocates resources according to a user-defined deadline (i.e., a Service Level Agreement), ensuring that the training phase completes within the specified time. The solution leverages control theory and has been developed on top of PyTorch, a widely used framework for training NNs. DECOR-NN dynamically allocates either GPUs or fractions of CPUs to meet user deadlines and also allows users to modify the deadline at runtime to accommodate changes in job priorities. A comprehensive empirical evaluation using three benchmark applications demonstrates that DECOR-NN successfully completes training jobs with an average deviation from the deadline of only 1.75%.
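The paper's actual controller is not reproduced on this record page, so the following is only a minimal sketch of the idea the abstract describes: a proportional controller that, after each training epoch, compares the observed epoch duration against the per-epoch time budget implied by the deadline and rescales a fractional CPU allocation accordingly. The class name `DeadlineController`, the gain value, and the share bounds are all illustrative assumptions, not DECOR-NN's parameters.

```python
import time

class DeadlineController:
    """Illustrative sketch (not DECOR-NN's actual algorithm): a proportional
    controller that adjusts the CPU share of a training job so that the
    remaining epochs finish by a user-defined deadline."""

    def __init__(self, deadline_s: float, total_epochs: int,
                 min_share: float = 0.1, max_share: float = 1.0,
                 gain: float = 0.5):
        self.deadline_s = deadline_s      # user-defined deadline (the SLA)
        self.total_epochs = total_epochs
        self.min_share = min_share        # smallest CPU fraction we may assign
        self.max_share = max_share        # whole machine
        self.gain = gain                  # proportional gain (assumed value)
        self.share = max_share            # start with all available capacity
        self.start = time.monotonic()

    def set_deadline(self, new_deadline_s: float) -> None:
        """Runtime deadline changes (e.g., a priority shift) simply replace
        the target; the next update() reacts to the new budget."""
        self.deadline_s = new_deadline_s

    def update(self, epochs_done: int, last_epoch_s: float) -> float:
        """Call after every epoch; returns the CPU share for the next one."""
        remaining = self.total_epochs - epochs_done
        if remaining <= 0:
            return self.share
        elapsed = time.monotonic() - self.start
        budget_per_epoch = max((self.deadline_s - elapsed) / remaining, 1e-6)
        # error > 0: running late, grow the share; error < 0: shrink it.
        error = (last_epoch_s - budget_per_epoch) / budget_per_epoch
        self.share = min(self.max_share,
                         max(self.min_share,
                             self.share * (1.0 + self.gain * error)))
        return self.share
```

In a real deployment, the returned share would have to be enforced through an OS-level mechanism such as Linux cgroups CPU quotas (e.g., a container runtime's CPU limit), and the same loop could also decide whether the job runs on GPUs or CPUs; those mechanics are omitted here.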
| File | Type | Access | License | Size | Format |
|---|---|---|---|---|---|
| SEAMS_25___Dynamic_Resource_Allocation_for_Deadline_Constrained__Neural_Network_Training-11.pdf | Post-print / accepted manuscript | Restricted access | No license | 448.58 kB | Adobe PDF |
| Dynamic_Resource_Allocation_for_Deadline-Constrained_Neural_Network_Training.pdf | Publisher's version/PDF | Restricted access | No license | 644.29 kB | Adobe PDF |