Multitask Protein Function Prediction Through Task Dissimilarity / M. Frasca, N.A. Cesa Bianchi. - In: IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS. - ISSN 1545-5963. - 16:5(2019), pp. 1550-1560. [10.1109/TCBB.2017.2684127]

Multitask Protein Function Prediction Through Task Dissimilarity

M. Frasca; N.A. Cesa Bianchi
2019

Abstract

Automated protein function prediction is a challenging problem with distinctive features, such as the hierarchical organization of protein functions and the scarcity of annotated proteins for most biological functions. We propose a multitask learning algorithm addressing both issues. Unlike standard multitask algorithms, which use task (protein function) similarity information as a bias to speed up learning, we show that dissimilarity information enforces separation of rare class labels from frequent class labels, and for this reason is better suited for solving unbalanced protein function prediction problems. We support our claim by showing that a multitask extension of the label propagation algorithm empirically works best when the task relatedness information is represented using a dissimilarity matrix as opposed to a similarity matrix. Moreover, the experimental comparison carried out on three model organisms shows that our method has a more stable performance in both "protein-centric" and "function-centric" evaluation settings.
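The method above extends label propagation. As a point of reference, here is a minimal sketch of a common single-task formulation (the normalized iterative scheme F ← αSF + (1 − α)Y with S = D^(-1/2) W D^(-1/2)), applied to each protein function independently. The affinity matrix W, the label matrix Y, the parameter alpha, the iteration count and the toy graph are illustrative assumptions. This is not the authors' multitask algorithm, and the task-dissimilarity coupling described in the abstract is deliberately not reproduced.

```python
from typing import List

Matrix = List[List[float]]


def label_propagation(W: Matrix, Y: Matrix, alpha: float = 0.8,
                      iters: int = 200) -> Matrix:
    """Single-task label propagation run independently on every column of Y.

    W: symmetric, non-negative n x n protein-protein affinity matrix (assumed).
    Y: n x T label matrix, Y[i][t] = 1 if protein i is annotated with
       function t, else 0.
    Returns the n x T score matrix F after `iters` updates of
    F <- alpha * S F + (1 - alpha) * Y, with S = D^{-1/2} W D^{-1/2}.
    """
    n, T = len(W), len(Y[0])
    # Symmetric normalization by node degree; isolated nodes get zero rows.
    deg = [sum(row) for row in W]
    inv_sqrt = [d ** -0.5 if d > 0 else 0.0 for d in deg]
    S = [[inv_sqrt[i] * W[i][j] * inv_sqrt[j] for j in range(n)]
         for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(iters):
        # Column t is updated using only column t: tasks never interact here.
        F = [[alpha * sum(S[i][k] * F[k][t] for k in range(n))
              + (1 - alpha) * Y[i][t]
              for t in range(T)]
             for i in range(n)]
    return F


# Toy example (hypothetical data): a chain of four proteins 0-1-2-3.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
# Function 0 is annotated on protein 0, function 1 on protein 3.
Y = [[1, 0],
     [0, 0],
     [0, 0],
     [0, 1]]
F = label_propagation(W, Y)
for i, row in enumerate(F):
    print(i, [round(x, 3) for x in row])
```

On this chain, the score for each function decays with graph distance from the annotated protein. Because every column is propagated in isolation, no information is shared between functions; the method summarized in the abstract instead couples the tasks through a task relatedness matrix, which works best when it encodes dissimilarity.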
Keywords: Proteins; Protein engineering; Prediction algorithms; Symmetric matrices; Labeling; Standards; Ontologies; Multitask learning; protein function prediction; label propagation algorithm; gene ontology; task dissimilarity
Scientific-disciplinary sector INF/01 - Computer Science
2019
17-mar-2017
Article (author)
Files in this item:

MTLPFrascaCesaBianchi.pdf (open access)
Description: main article
Type: Pre-print (manuscript submitted to the publisher)
Size: 328.71 kB, Adobe PDF

07880576.pdf (restricted access)
Type: Publisher's version/PDF
Size: 539.6 kB, Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/501051
Citations
  • PubMed Central 8
  • Scopus 15
  • Web of Science (ISI) 9