Evaluating the impact of topological protein features on the negative examples selection / P. Boldi, M. Frasca, D. Malchiodi. - In: BMC BIOINFORMATICS. - ISSN 1471-2105. - 19:suppl. 14(2018 Nov 20), pp. 417.115-417.126. [10.1186/s12859-018-2385-x]
Evaluating the impact of topological protein features on the negative examples selection
P. Boldi; M. Frasca; D. Malchiodi
2018
Abstract
Supervised machine learning methods, when applied to automated protein-function prediction (AFP), require both positive examples (i.e., proteins known to possess a given function) and negative examples (proteins not associated with that function). Unfortunately, publicly available proteome and genome data sources such as the Gene Ontology rarely record the functions a protein does not possess. Negative selection, which consists in identifying informative negative examples, is therefore a central and challenging problem in AFP. Several heuristics have been proposed over the years to solve this problem; nevertheless, despite their effectiveness, to the best of our knowledge no previous work has studied which protein features are most relevant to this task, that is, which features help most in discriminating between reliable and unreliable negatives.

File | Access | Type | Size | Format |
---|---|---|---|---|
final.pdf | Open access | Publisher's version/PDF | 2.42 MB | Adobe PDF |
correction.pdf | Open access | Other | 259.8 kB | Adobe PDF |