Many big-data applications are batch applications that exploit dedicated frameworks to perform massively parallel computations across clusters of machines. The time needed to process the entirety of the inputs represents the application's response time, which can be subject to deadlines. Spark, probably the most famous incarnation of these frameworks today, allocates resources to applications statically at the beginning of the execution and deviations are not managed: to meet the applications' deadlines, resources must be allocated carefully. This paper proposes an extension to Spark, called dynaSpark, that is able to allocate and redistribute resources to applications dynamically to meet deadlines and cope with the execution of unanticipated applications. This work is based on two key enablers: containers, to isolate Spark's parallel executors and allow for the dynamic and fast allocation of resources, and control-theory to govern resource allocation at runtime and obtain required precision and speed. Our evaluation shows that dynaSpark can (i) allocate resources efficiently to execute single applications with respect to set deadlines and (ii) reduce deadline violations (w.r.t. Spark) when executing multiple concurrent applications.

Fine-Grained Dynamic Resource Allocation for Big-Data Applications / L. Baresi, A. Leva, G. Quattrocchi. - In: IEEE TRANSACTIONS ON SOFTWARE ENGINEERING. - ISSN 0098-5589. - 47:8(2021), pp. 8778680.1668-8778680.1682. [10.1109/TSE.2019.2931537]

Fine-Grained Dynamic Resource Allocation for Big-Data Applications

G. Quattrocchi
Ultimo
2021

Abstract

Many big-data applications are batch applications that exploit dedicated frameworks to perform massively parallel computations across clusters of machines. The time needed to process the entirety of the inputs represents the application's response time, which can be subject to deadlines. Spark, probably the most famous incarnation of these frameworks today, allocates resources to applications statically at the beginning of the execution and deviations are not managed: to meet the applications' deadlines, resources must be allocated carefully. This paper proposes an extension to Spark, called dynaSpark, that is able to allocate and redistribute resources to applications dynamically to meet deadlines and cope with the execution of unanticipated applications. This work is based on two key enablers: containers, to isolate Spark's parallel executors and allow for the dynamic and fast allocation of resources, and control-theory to govern resource allocation at runtime and obtain required precision and speed. Our evaluation shows that dynaSpark can (i) allocate resources efficiently to execute single applications with respect to set deadlines and (ii) reduce deadline violations (w.r.t. Spark) when executing multiple concurrent applications.
batch processing systems; control theory; Distributed architectures; quality assurance
Settore IINF-05/A - Sistemi di elaborazione delle informazioni
Settore INFO-01/A - Informatica
2021
Article (author)
File in questo prodotto:
File Dimensione Formato  
Fine-Grained_Dynamic_Resource_Allocation_for_Big-Data_Applications.pdf

accesso riservato

Tipologia: Publisher's version/PDF
Licenza: Nessuna licenza
Dimensione 795.58 kB
Formato Adobe PDF
795.58 kB Adobe PDF   Visualizza/Apri   Richiedi una copia
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/1227040
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 11
  • ???jsp.display-item.citation.isi??? 10
  • OpenAlex ND
social impact