True path rule hierarchical ensembles for genome-wide gene function prediction / G. Valentini. - In: IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS. - ISSN 1545-5963. - 8:3(2011 May), pp. 832-847. [10.1109/TCBB.2010.38]

True path rule hierarchical ensembles for genome-wide gene function prediction

G. Valentini
First author
2011

Abstract

Gene function prediction is a complex computational problem characterized by several issues: the number of functional classes is large, and a gene may belong to multiple classes; functional classes are structured according to a hierarchy; classes are usually unbalanced, with more negative than positive examples; class labels can be uncertain and the annotations largely incomplete; and, to improve the predictions, multiple sources of data need to be properly integrated. In this contribution we focus on the first three issues, and in particular on the development of a new method for hierarchical genome-wide and ontology-wide gene function prediction. The proposed algorithm is inspired by the “true path rule” that governs both the Gene Ontology and FunCat taxonomies. Following this rule, the proposed True Path Rule (TPR) ensemble method is characterized by a two-way asymmetric flow of information that traverses the graph-structured ensemble: positive predictions for a node recursively influence its ancestors, while negative predictions influence its descendants. Cross-validated results with the model organism S. cerevisiae, using seven different sources of biomolecular data, and a theoretical analysis of the TPR algorithm show the effectiveness and the drawbacks of the proposed approach.
Gene function prediction; ensemble methods; hierarchical classification; Functional Catalogue (FunCat)
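
The two-way asymmetric flow of information described in the abstract can be illustrated with a small sketch. The abstract does not give the combination rule, so everything below is an assumption made for illustration: a tree-shaped hierarchy, a fixed decision threshold, simple averaging of a node's local score with the scores of its positively predicted children, and the hypothetical names `tpr_sketch`, `children`, and `local_scores`. It should not be read as the published TPR algorithm.

```python
from typing import Dict, List, Tuple


def tpr_sketch(
    children: Dict[str, List[str]],
    local_scores: Dict[str, float],
    root: str,
    threshold: float = 0.5,
) -> Tuple[Dict[str, float], Dict[str, bool]]:
    """Illustrative two-pass propagation over a tree of functional classes.

    Assumptions (not taken from the abstract): tree-shaped hierarchy,
    averaging as the combination rule, a single fixed threshold.
    """
    consensus: Dict[str, float] = {}

    def bottom_up(node: str) -> float:
        # Visit children first, so positive predictions can flow upward.
        positive_children = []
        for child in children.get(node, []):
            child_score = bottom_up(child)
            if child_score > threshold:
                positive_children.append(child_score)
        # Only positively predicted children influence their parent.
        total = local_scores[node] + sum(positive_children)
        consensus[node] = total / (1 + len(positive_children))
        return consensus[node]

    bottom_up(root)

    labels: Dict[str, bool] = {}

    def top_down(node: str, ancestors_positive: bool) -> None:
        # A negative prediction at a node forces all its descendants negative.
        labels[node] = ancestors_positive and consensus[node] > threshold
        for child in children.get(node, []):
            top_down(child, labels[node])

    top_down(root, True)
    return consensus, labels


# Hypothetical FunCat-like class identifiers and made-up scores.
children = {"01": ["01.01", "01.02"], "01.01": ["01.01.03"]}
local_scores = {"01": 0.4, "01.01": 0.7, "01.02": 0.2, "01.01.03": 0.9}

consensus, labels = tpr_sketch(children, local_scores, root="01")
for node in sorted(labels):
    print(node, round(consensus[node], 3), labels[node])
```

In this toy example the root's own score is below the threshold, but the confident prediction for one of its descendants raises its consensus score. That is the upward half of the flow. The downward half appears whenever a node remains negative after the first pass: its descendants are then labeled negative regardless of their own scores.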
Sector INF/01 - Computer Science
May 2011
Article (author)

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/155424
Citations
  • PMC 19
  • Scopus 123
  • ISI 99