ParSMURF-NG: A Machine Learning High Performance Computing System for the Analysis of Imbalanced Big Omics Data / A. Petrini, M. Notaro, J. Gliozzo, T. Castrignanò, P.N. Robinson, E. Casiraghi, G. Valentini (IFIP ADVANCES IN INFORMATION AND COMMUNICATION TECHNOLOGY). - In: Artificial Intelligence Applications and Innovations / edited by I. Maglogiannis, L. Iliadis, J. Macintyre, P. Cortez. - [s.l] : IFIP, 2022. - ISBN 978-3-031-08340-2. - pp. 424-435 (conference: MHDW 2022, 5G-PINE 2022, AIBMG 2022, ML@HC 2022, and AIBEI 2022, held in Hersonissos in 2022) [10.1007/978-3-031-08341-9_34].

ParSMURF-NG: A Machine Learning High Performance Computing System for the Analysis of Imbalanced Big Omics Data

A. Petrini (first author); M. Notaro (second author); J. Gliozzo; E. Casiraghi (penultimate author); G. Valentini (last author)
2022

Abstract

In the context of Genomic and Precision Medicine, prediction problems are often characterized by a high imbalance between classes and by Big Data. This requires specialized tools, as traditional Machine Learning approaches may struggle with big datasets and often fail to predict the minority class in imbalanced classification problems. In this work we present ParSMURF-NG, a High Performance Computing-oriented Machine Learning approach designed to scale well on big omics data. We measured its performance on three current-generation HPC systems and showed its usefulness in the context of Genomic Medicine, providing a powerful model for the detection of pathogenic single nucleotide variants in the non-coding regions of the human genome.
Keywords: Parallel machine learning tool for big data; Machine learning for genomic medicine; Prediction of deleterious variants; Machine learning tool for imbalanced data
Sector INF/01 - Computer Science
Book Part (author)
Use this identifier to cite or link to this document: https://hdl.handle.net/2434/948209