F. Marconi, G. Quattrocchi, L. Baresi, M.M. Bersani, M. Rossi. On the timed analysis of big-data applications. In: NASA Formal Methods (NFM 2018, Newport News), edited by A. Dutle, C. Muñoz, A. Narkawicz. Lecture Notes in Computer Science, Springer Verlag, 2018, pp. 315-332, ISBN 978-3-319-77934-8, DOI 10.1007/978-3-319-77935-5_22.
On the timed analysis of big-data applications
G. Quattrocchi (second author)
2018
Abstract
Apache Spark is one of the best-known frameworks for executing big-data batch applications over a cluster of (virtual) machines. Defining the cluster (i.e., the number of machines and CPUs) to attain guarantees on the execution times (deadlines) of the application is a trade-off between the cost of the infrastructure and the time needed to execute the application. Sizing the computational resources, in order to prevent cost overruns, can benefit from the use of formal models as a means to capture the execution time of applications. Our model of Spark applications, based on the CLTLoc logic, is defined by considering the directed acyclic graph around which Spark programs are organized, the number of available CPUs, the number of tasks processed by the application, and the average execution times of tasks. If the outcome of the analysis is positive, then the execution is feasible, that is, it can be completed within a given time span. The analysis tool has been implemented on top of the Zot formal verification tool. A preliminary evaluation shows that our model is sufficiently accurate: the formal analysis identifies execution times that are close (the error is less than 10%) to those obtained by actually running the applications.
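The feasibility question the abstract poses (can a DAG of stages, each with a given number of tasks and an average task duration, complete within a deadline on a fixed number of CPU cores?) can be illustrated with a simple wave-based estimate. The sketch below is a hypothetical back-of-envelope check, not the paper's CLTLoc encoding or its Zot-based verification; all stage names, task counts, durations, the core count, and the deadline are invented for the example.

```python
import math
from functools import cache

# stage -> (number of tasks, average task duration in seconds); values invented
stages = {
    "load":    (64, 2.0),
    "map":     (64, 1.5),
    "shuffle": (32, 3.0),
    "reduce":  (8,  4.0),
}
# stage -> predecessor stages (the edges of the DAG)
deps = {
    "load":    [],
    "map":     ["load"],
    "shuffle": ["map"],
    "reduce":  ["shuffle"],
}
cores = 16       # CPU cores available to the executors
deadline = 30.0  # required completion time, in seconds

def stage_duration(tasks: int, avg: float) -> float:
    """Tasks run in 'waves' of at most `cores` parallel tasks."""
    return math.ceil(tasks / cores) * avg

@cache
def finish_time(stage: str) -> float:
    """Earliest finish: all predecessors done, then this stage's waves."""
    start = max((finish_time(p) for p in deps[stage]), default=0.0)
    return start + stage_duration(*stages[stage])

makespan = max(finish_time(s) for s in stages)
verdict = "feasible" if makespan <= deadline else "infeasible"
print(f"estimated makespan: {makespan:.1f}s -> {verdict} for deadline {deadline:.0f}s")
```

The sketch only approximates the best case in which tasks pack perfectly into waves; the paper's approach instead encodes these quantities as CLTLoc constraints and checks satisfiability with Zot, which accounts for the possible interleavings of task executions.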
| File | Type | License | Size | Format |
|---|---|---|---|---|
| 978-3-319-77935-5_22.pdf (restricted access) | Publisher's version/PDF | No license | 633.12 kB | Adobe PDF |




