Predictive maintenance of infrastructure code using “fluid” datasets: An exploratory study on Ansible defect proneness

Quattrocchi, G.; Tamburri, D.A.

doi:10.1002/smr.2480

This work consolidates and extends previous investigations into recognizing defects in infrastructure-as-code (IaC) scripts using general software development quality metrics, with a focus on defect severity. It adds to previous work an exploratory look at how datasets are created, which may boost the predictive power of the resulting models; we call this notion a fluid dataset. More specifically, we experiment with 50 different metrics, harnessing a multiple-dataset creation process whereby different versions of the same datasets are rigged with auto-training facilities for model retraining and redeployment in a DataOps fashion. Focusing on the Ansible infrastructure code language, a de facto standard for industrial-strength infrastructure code, we build defect prediction models and improve on the state of the art, obtaining an F1 score of 0.52 and a recall of 0.57 with a Naive Bayes classifier. On the one hand, by improving state-of-the-art defect prediction models using metrics that generalize across IaC languages, we provide interesting leads for the future of infrastructure-as-code. On the other hand, we have barely scratched the surface of the novel approach of fluid-dataset creation and automated retraining of machine learning (ML) defect prediction models, which warrants more research in the same direction.
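The abstract reports an F1 score and a recall but no precision. Since F1 is the harmonic mean of precision and recall, the precision implied by the two reported figures can be recovered arithmetically. The sketch below shows that calculation; the function names are illustrative, and because the reported values are rounded to two decimals, the result is only an approximation, not a figure stated by the authors.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def implied_precision(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, given F1 and R."""
    denom = 2 * recall - f1
    if denom <= 0:
        raise ValueError("no valid precision: F1 must be below 2 * recall")
    return f1 * recall / denom


# Values taken from the abstract (rounded as reported).
reported_f1 = 0.52
reported_recall = 0.57

# Assumption: both figures refer to the same model and evaluation run.
p = implied_precision(reported_f1, reported_recall)
print(f"implied precision is roughly {p:.2f}")
```

Feeding the result back into `f1_score` reproduces the reported F1, which is a quick consistency check on the algebra.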

Predictive maintenance of infrastructure code using “fluid” datasets: An exploratory study on Ansible defect proneness / G. Quattrocchi, D.A. Tamburri. - In: JOURNAL OF SOFTWARE. - ISSN 2047-7481. - 34:11(2022 Nov), pp. e2480.1-e2480.26. [10.1002/smr.2480]



	Keywords
	
				defect prediction; DevOps; fluid datasets; infrastructure code
			
	Scientific-disciplinary sectors of the article (valid from 9 May 2024)
	
				Sector INFO-01/A - Computer Science
Sector IINF-05/A - Information Processing Systems
			
	Publication date
	
				Nov 2022
			
	Ahead-of-print or print date
	
				14 Jun 2022
			
	Journal in ANCE
	
				JOURNAL OF SOFTWARE
			
	DOI
	
				https://dx.doi.org/10.1002/smr.2480
			
	Type
	
				Article (author)
			
	Appears in types:
	
				01 - Journal article

Files in this item:

File	Size	Format
J Software Evolu Process - 2022 - Quattrocchi - Predictive maintenance of infrastructure code using fluid datasets An.pdf (open access; type: Publisher's version/PDF; license: Creative Commons)	10.17 MB	Adobe PDF


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1227056

