ParSMURF-NG: A Machine Learning High Performance Computing System for the Analysis of Imbalanced Big Omics Data / A. Petrini, M. Notaro, J. Gliozzo, T. Castrignanò, P.N. Robinson, E. Casiraghi, G. Valentini (IFIP ADVANCES IN INFORMATION AND COMMUNICATION TECHNOLOGY). - In: Artificial Intelligence Applications and Innovations / edited by I. Maglogiannis, L. Iliadis, J. Macintyre, P. Cortez. - [s.l] : IFIP, 2022. - ISBN 978-3-031-08340-2. - pp. 424-435 (conference: MHDW 2022, 5G-PINE 2022, AIBMG 2022, ML@HC 2022, and AIBEI 2022, held in Hersonissos in 2022) [10.1007/978-3-031-08341-9_34].
ParSMURF-NG: A Machine Learning High Performance Computing System for the Analysis of Imbalanced Big Omics Data
A. Petrini (first author); M. Notaro (second author); J. Gliozzo; E. Casiraghi (penultimate author); G. Valentini (last author)
2022
Abstract
In the context of Genomic and Precision Medicine, prediction problems are often characterized by a high imbalance between classes and Big Data. This requires specialized tools, as traditional Machine Learning approaches may struggle with big datasets and often fail to predict the minority class in unbalanced classification problems. In this work we present ParSMURF-NG, a High Performance Computing-oriented Machine Learning approach designed to scale well on big omics data. We measured its performance capabilities on three current-generation HPC systems and we showed its usefulness in the context of Genomic Medicine, providing a powerful model for the detection of pathogenic single nucleotide variants in the non-coding regions of the human genome.

File | Type | Size | Format | Access
---|---|---|---|---
18th_AIAI_2022.pdf | Post-print, accepted manuscript, etc. (version accepted by the publisher) | 286.26 kB | Adobe PDF | Restricted access
978-3-031-08341-9_34.pdf | Publisher's version/PDF | 435.85 kB | Adobe PDF | Restricted access
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.