Analysis of Informative Features for Negative Selection in Protein Function Prediction / M. Frasca, F. Lipreri, D. Malchiodi (Lecture Notes in Computer Science). - In: Bioinformatics and Biomedical Engineering / [edited by] I. Rojas Ignacio, F. Ortuño. - Switzerland : Springer, 2017. - ISBN 9783319561530. - pp. 267-276 (Paper presented at the 5th IWBBIO conference, held in Granada in 2017) [10.1007/978-3-319-56154-7_25].
Analysis of Informative Features for Negative Selection in Protein Function Prediction
M. Frasca (first author); D. Malchiodi
2017
Abstract
Negative examples in automated protein function prediction (AFP), that is, proteins known not to possess a given protein function, are usually not directly stored in public proteome and genome databases, such as the Gene Ontology database. Nevertheless, most computational methods need negative examples to infer new predictions. A variety of algorithms have been proposed in AFP for negative selection, ranging from network- and feature-based heuristics to hierarchy-based and hierarchy-less strategies. Moreover, several bio-molecular information sources about proteins, such as gene co-expression, genetic and protein-protein interaction data, are naturally encoded in protein networks, where nodes are proteins and edges connect proteins sharing common characteristics. Although selecting negatives in biological networks is therefore a central and challenging problem in computational biology, detecting the characteristics proteins should have to be considered as negative is still a difficult task. In this work, we show that a few protein features extracted from the network help in detecting reliable negatives. We tested such features in two real-world experiments: predicting unreliable negatives with an SVM classifier through temporal holdout on model organisms for AFP, and selecting reliable negatives with a clustering-based state-of-the-art negative selection procedure.

File | Type | Size | Format
---|---|---|---
NegSel_iwbbio_v2.pdf (restricted access) | Post-print, accepted manuscript, etc. (version accepted by the publisher) | 150.57 kB | Adobe PDF