Synergy of multi-label hierarchical ensembles, data fusion, and cost-sensitive methods for gene functional inference / N. Cesa Bianchi, M. Re, G. Valentini. - In: MACHINE LEARNING. - ISSN 0885-6125. - 88:1/2(2012), pp. 209-241. [10.1007/s10994-011-5271-6]

Synergy of multi-label hierarchical ensembles, data fusion, and cost-sensitive methods for gene functional inference

N. Cesa Bianchi; M. Re; G. Valentini
2012

Abstract

Gene function prediction is a complex multilabel classification problem with several distinctive features: the hierarchical relationships between functional classes, the presence of multiple sources of biomolecular data, the imbalance between positive and negative examples for each class, and the complexity of the whole-ontology and genome-wide dimensions. Unlike previous work, which mostly looked at each of these issues in isolation, we explore the interaction and potential synergy of hierarchical multilabel methods, data fusion methods, and cost-sensitive approaches on whole-ontology and genome-wide gene function prediction. Besides classical top-down hierarchical multilabel ensemble methods, our experiments consider two recently proposed multilabel methods: one based on the approximation of the Bayesian optimal classifier with respect to the hierarchical loss, and one based on a heuristic approach inspired by the true path rule for biological functional ontologies. Our experiments show that the key factors for the success of hierarchical ensemble methods are the integration and synergy among multilabel hierarchical, data fusion, and cost-sensitive approaches, as well as the strategy for selecting negative examples.
Keywords: hierarchical multilabel classification; data integration; cost-sensitive classification; ensemble methods; gene function prediction
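The abstract mentions classical top-down hierarchical multilabel methods. As a rough illustration of the consistency constraint such methods enforce (a class can be predicted positive only if its parent is), the following minimal sketch may help. It is not the authors' implementation: the function name `top_down_predict`, the fixed threshold, the class labels, and the scores are all hypothetical, and a tree-structured taxonomy is assumed. The Bayesian and true-path-rule ensembles, data fusion, and cost-sensitive weighting discussed in the paper are not modelled here.

```python
from typing import Dict, List, Optional, Set


def top_down_predict(parent: Dict[str, Optional[str]],
                     scores: Dict[str, float],
                     threshold: float = 0.5) -> Set[str]:
    """Return the classes predicted positive under a simple top-down rule.

    parent maps each class to its parent (None for root classes).
    scores holds one per-class score, e.g. from independent base classifiers.
    """
    # Build a child list for each node so the tree can be walked from the roots.
    children: Dict[Optional[str], List[str]] = {}
    for node, par in parent.items():
        children.setdefault(par, []).append(node)

    positives: Set[str] = set()
    stack = list(children.get(None, []))  # start from the root classes
    while stack:
        node = stack.pop()
        if scores.get(node, 0.0) >= threshold:
            positives.add(node)
            # Only children of a positive node are examined further.
            stack.extend(children.get(node, []))
        # A negative node blocks its whole subtree, whatever the scores below it.
    return positives


# Toy hierarchy with hypothetical class names and scores.
parent = {"A": None, "A.1": "A", "A.1.1": "A.1", "B": None, "B.1": "B"}
scores = {"A": 0.9, "A.1": 0.3, "A.1.1": 0.95, "B": 0.8, "B.1": 0.7}
predicted = top_down_predict(parent, scores)
print(sorted(predicted))  # ['A', 'B', 'B.1']
```

In this toy run, class `A.1.1` is rejected despite its high score because its parent `A.1` falls below the threshold, so the returned label set always respects the hierarchy.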
Sector INF/01 - Computer Science
Sector ING-INF/05 - Information Processing Systems
Article (author)
Files in this item:
cesa-re-vale-mld.mlj.rev.pdf: main article; post-print, accepted manuscript (version accepted by the publisher); Adobe PDF, 346.3 kB; restricted access
cesa_bianchi_re_valentini.pdf: publisher's version/PDF; Adobe PDF, 1.3 MB; restricted access
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/175471
Citations
  • PMC: not available
  • Scopus: 70
  • ISI: 54