This work consolidates and compounds previous investigations in recognizing defects for infrastructure-as-code (IaC) scripts using general software development quality metrics with a focus on defect severity but adding to previous work an explorative look at creating datasets, which may boost the predictive power of provided models-we call this notion a fluid dataset. More specifically, we experiment with 50 different metrics harnessing a multiple dataset creation process whereby different versions of the same datasets are rigged with auto-training facilities for model retraining and redeployment in a DataOps fashion. At this point, with a focus on the Ansible infrastructure code language-as a de facto standard for industrial-strength infrastructure code-we build defect prediction models and manage to improve on the state of the art by finding an F1 score of 0.52 and a recall of 0.57 using a Naive-Bayes classifier. On the one hand, by improving state-of-the-art defect prediction models using metrics generalizable for different IaC languages, we provide interesting leads for the future of infrastructure-as-code. On the other hand, we have barely scratched the surface on the novel approach of fluid-datasets creation and automated retraining of Machine Learning (ML) defect prediction models, warranting for more research on the same direction in the future.
Predictive maintenance of infrastructure code using “fluid” datasets: An exploratory study on Ansible defect proneness / G. Quattrocchi, D.A. Tamburri. - In: JOURNAL OF SOFTWARE. - ISSN 2047-7481. - 34:11(2022 Nov), pp. e2480.1-e2480.26. [10.1002/smr.2480]
Predictive maintenance of infrastructure code using “fluid” datasets: An exploratory study on Ansible defect proneness
G. Quattrocchi
Primo
;
2022
Abstract
This work consolidates and compounds previous investigations in recognizing defects for infrastructure-as-code (IaC) scripts using general software development quality metrics with a focus on defect severity but adding to previous work an explorative look at creating datasets, which may boost the predictive power of provided models-we call this notion a fluid dataset. More specifically, we experiment with 50 different metrics harnessing a multiple dataset creation process whereby different versions of the same datasets are rigged with auto-training facilities for model retraining and redeployment in a DataOps fashion. At this point, with a focus on the Ansible infrastructure code language-as a de facto standard for industrial-strength infrastructure code-we build defect prediction models and manage to improve on the state of the art by finding an F1 score of 0.52 and a recall of 0.57 using a Naive-Bayes classifier. On the one hand, by improving state-of-the-art defect prediction models using metrics generalizable for different IaC languages, we provide interesting leads for the future of infrastructure-as-code. On the other hand, we have barely scratched the surface on the novel approach of fluid-datasets creation and automated retraining of Machine Learning (ML) defect prediction models, warranting for more research on the same direction in the future.| File | Dimensione | Formato | |
|---|---|---|---|
|
J Software Evolu Process - 2022 - Quattrocchi - Predictive maintenance of infrastructure code using fluid datasets An.pdf
accesso aperto
Tipologia:
Publisher's version/PDF
Licenza:
Creative commons
Dimensione
10.17 MB
Formato
Adobe PDF
|
10.17 MB | Adobe PDF | Visualizza/Apri |
Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.




