Evaluating the impact of topological protein features on the negative examples selection / P. Boldi, M. Frasca, D. Malchiodi. - In: BMC BIOINFORMATICS. - ISSN 1471-2105. - 19:suppl. 14(2018 Nov 20), pp. 417.115-417.126. [10.1186/s12859-018-2385-x]

Evaluating the impact of topological protein features on the negative examples selection

P. Boldi; M. Frasca; D. Malchiodi
2018

Abstract

Supervised machine learning methods, when applied to automated protein-function prediction (AFP), require both positive examples (i.e., proteins known to possess a given function) and negative examples (proteins not associated with that function). Unfortunately, publicly available proteome and genome data sources such as the Gene Ontology rarely record the functions that a protein does not possess. Negative selection, that is, the identification of informative negative examples, is therefore a central and challenging problem in AFP. Several heuristics have been proposed over the years to solve this problem; nevertheless, despite their effectiveness, to the best of our knowledge no previous work has studied which protein features are most relevant to this task, that is, which protein features help most in discriminating reliable from unreliable negatives.
biological networks; negative example selection; protein features; protein function prediction
Sector INF/01 - Computer Science
20 Nov 2018
Article (author)
Files in this record:
final.pdf: open access; type: Publisher's version/PDF; size: 2.42 MB; format: Adobe PDF
correction.pdf: open access; type: Other; size: 259.8 kB; format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/602835