Machine learning-based reclassification of germline variants of unknown significance: The RENOVO algorithm / V. Favalli, G. Tini, E. Bonetti, G. Vozza, A. Guida, S. Gandini, P.G. Pelicci, L. Mazzarella. - In: AMERICAN JOURNAL OF HUMAN GENETICS. - ISSN 1537-6605. - 108:4(2021 Apr 01), pp. 682-695. [10.1016/j.ajhg.2021.03.010]

Machine learning-based reclassification of germline variants of unknown significance: The RENOVO algorithm

E. Bonetti; G. Vozza; P.G. Pelicci; L. Mazzarella (last author)
2021-04-01

Abstract

The increasing scope of genetic testing enabled by next-generation sequencing (NGS) has dramatically increased the number of genetic variants that must be interpreted as pathogenic or benign for adequate patient management. Still, the interpretation process often fails to deliver a clear classification, resulting in either variants of unknown significance (VUSs) or variants with conflicting interpretations of pathogenicity (CIP). These represent a major clinical problem because they provide no useful information for decision-making, causing a large fraction of genetically determined disease to remain undertreated. We developed RENOVO, a random forest-based machine learning tool that classifies variants as pathogenic or benign on the basis of publicly available information and provides a pathogenicity likelihood score (PLS). Using the same feature classes recommended by guidelines, we trained RENOVO on established pathogenic/benign variants in ClinVar (training set accuracy = 99%) and tested its performance on variants whose interpretation has changed over time (test set accuracy = 95%). We further validated the algorithm on additional datasets, including unreported variants validated either through expert consensus (ENIGMA) or laboratory-based functional techniques (on BRCA1/2 and SCN5A). On all datasets, RENOVO outperformed existing automated interpretation tools. On the basis of these validation metrics, we assigned a defined PLS to all existing ClinVar VUSs, proposing a reclassification for 67% with >90% estimated precision. RENOVO provides a validated tool to reduce the fraction of uninterpreted or misinterpreted variants, tackling an area of unmet need in modern clinical genetics.
ClinVar; VUS; machine learning; reclassification; variant interpretation; Computer User Training; Datasets as Topic; Genes, BRCA1; Germ-Line Mutation; Humans; Reproducibility of Results; Machine Learning
Sector MED/04 - General Pathology
23-Mar-2021
Article (author)
Use this identifier to cite or link to this document: http://hdl.handle.net/2434/930362