Motivation: Automated protein function prediction is a complex multi-class, multi-label, structured classification problem in which protein functions are organized in a controlled vocabulary, according to the Gene Ontology (GO). "Hierarchy-unaware" classifiers, also known as "flat" methods, predict GO terms without exploiting the inherent structure of the ontology, potentially violating the True-Path-Rule (TPR) that governs the GO, while "hierarchy-aware" approaches, even if they obey the TPR, do not always show clear improvements with respect to flat methods, or do not scale well when applied to the full GO. Results: To overcome these limitations, we propose Hierarchical Ensemble Methods for Directed Acyclic Graphs (HEMDAG), a family of highly modular hierarchical ensembles of classifiers, able to build upon any flat method and to provide "TPR-safe" predictions, by leveraging a combination of isotonic regression and TPR learning strategies. Extensive experiments on synthetic and real data across several organisms firstly show that HEMDAG can be used as a general tool to improve the predictions of flat classifiers, and secondly that HEMDAG is competitive versus state-of-the-art hierarchy-aware learning methods proposed in the last CAFA international challenges.

HEMDAG: a family of modular and scalable hierarchical ensemble methods to improve Gene Ontology term prediction / M. Notaro, M. Frasca, A. Petrini, J. Gliozzo, E. Casiraghi, P.N. Robinson, G. Valentini. - In: BIOINFORMATICS. - ISSN 1367-4803. - (2021). [Epub ahead of print] [10.1093/bioinformatics/btab485]

HEMDAG: a family of modular and scalable hierarchical ensemble methods to improve Gene Ontology term prediction

M. Notaro
Primo
;
M. Frasca;A. Petrini;J. Gliozzo;E. Casiraghi;G. Valentini
2021

Abstract

Motivation: Automated protein function prediction is a complex multi-class, multi-label, structured classification problem in which protein functions are organized in a controlled vocabulary, according to the Gene Ontology (GO). "Hierarchy-unaware" classifiers, also known as "flat" methods, predict GO terms without exploiting the inherent structure of the ontology, potentially violating the True-Path-Rule (TPR) that governs the GO, while "hierarchy-aware" approaches, even if they obey the TPR, do not always show clear improvements with respect to flat methods, or do not scale well when applied to the full GO. Results: To overcome these limitations, we propose Hierarchical Ensemble Methods for Directed Acyclic Graphs (HEMDAG), a family of highly modular hierarchical ensembles of classifiers, able to build upon any flat method and to provide "TPR-safe" predictions, by leveraging a combination of isotonic regression and TPR learning strategies. Extensive experiments on synthetic and real data across several organisms firstly show that HEMDAG can be used as a general tool to improve the predictions of flat classifiers, and secondly that HEMDAG is competitive versus state-of-the-art hierarchy-aware learning methods proposed in the last CAFA international challenges.
Settore INF/01 - Informatica
2021
Article (author)
File in questo prodotto:
File Dimensione Formato  
btab485.pdf

accesso riservato

Tipologia: Publisher's version/PDF
Dimensione 743.47 kB
Formato Adobe PDF
743.47 kB Adobe PDF   Visualizza/Apri   Richiedi una copia
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/858316
Citazioni
  • ???jsp.display-item.citation.pmc??? 0
  • Scopus 2
  • ???jsp.display-item.citation.isi??? 0
social impact