The execution of Spark applications is based on the execution order and parallelism of the different jobs, given data and available resources. Spark reifies these dependencies in a graph that we refer to as the (parallel) execution plan of the application. All the approaches that have studied the estimation of the execution times and the dynamic provisioning of resources for this kind of applications have always assumed that the execution plan is unique, given the computing resources at hand. This assumption is at least simplistic for applications that include conditional branches or loops and limits the precision of the prediction techniques. This paper introduces SEEPEP, a novel technique based on symbolic execution and search-based test generation, that: i) automatically extracts the possible execution plans of a Spark application, along with dedicated launchers with properly synthesized data that can be used for profiling, and ii) tunes the allocation of resources at runtime based on the knowledge of the execution plans for which the path conditions hold. The assessment we carried out shows that SEEPEP can effectively complement dynaSpark, an extension of Spark with dynamic resource provisioning capabilities, to help predict the execution duration and the allocation of resources.
Symbolic execution-driven extraction of the parallel execution plans of Spark applications / L. Baresi, G. Denaro, G. Quattrocchi - In: ESEC/FSE 2019 : Proceedings / [a cura di] M. Dumas, D. Pfahl, S. Apel, A. Russo. - [s.l] : ACM, 2019. - ISBN 9781450355728. - pp. 246-256 (( 27. Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering Tallinn 2019 [10.1145/3338906.3338973].
Symbolic execution-driven extraction of the parallel execution plans of Spark applications
G. Quattrocchi
2019
Abstract
The execution of Spark applications is based on the execution order and parallelism of the different jobs, given data and available resources. Spark reifies these dependencies in a graph that we refer to as the (parallel) execution plan of the application. All the approaches that have studied the estimation of the execution times and the dynamic provisioning of resources for this kind of applications have always assumed that the execution plan is unique, given the computing resources at hand. This assumption is at least simplistic for applications that include conditional branches or loops and limits the precision of the prediction techniques. This paper introduces SEEPEP, a novel technique based on symbolic execution and search-based test generation, that: i) automatically extracts the possible execution plans of a Spark application, along with dedicated launchers with properly synthesized data that can be used for profiling, and ii) tunes the allocation of resources at runtime based on the knowledge of the execution plans for which the path conditions hold. The assessment we carried out shows that SEEPEP can effectively complement dynaSpark, an extension of Spark with dynamic resource provisioning capabilities, to help predict the execution duration and the allocation of resources.| File | Dimensione | Formato | |
|---|---|---|---|
|
3338906.3338973.pdf
accesso aperto
Tipologia:
Publisher's version/PDF
Licenza:
Creative commons
Dimensione
1.27 MB
Formato
Adobe PDF
|
1.27 MB | Adobe PDF | Visualizza/Apri |
Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.




