IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca (Institutional Research Archive)

Evaluating the impact of topological protein features on the negative examples selection / P. Boldi, M. Frasca, D. Malchiodi. - In: BMC BIOINFORMATICS. - ISSN 1471-2105. - 19:suppl. 14 (2018 Nov 20), pp. 417.115-417.126. [10.1186/s12859-018-2385-x]

Evaluating the impact of topological protein features on the negative examples selection

P. Boldi; M. Frasca; D. Malchiodi

2018

Abstract

When applied to automated protein-function prediction (AFP), supervised machine learning methods require both positive examples (i.e., proteins known to possess a given function) and negative examples (proteins not associated with that function). Unfortunately, publicly available proteome and genome data sources such as the Gene Ontology rarely record the functions a protein does not possess. Negative selection, which consists in identifying informative negative examples, is therefore a central and challenging problem in AFP. Several heuristics have been proposed over the years to address it; however, despite their effectiveness, to the best of our knowledge no previous work has studied which protein features are most relevant to this task, that is, which features best discriminate between reliable and unreliable negatives.
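The abstract frames negative selection in terms of protein features. As a purely illustrative sketch, not the method evaluated in the paper, the Python below assumes a toy protein network given as an undirected edge list and a single function whose annotated proteins form the positive set. It computes two simple topological features per protein (degree and number of positive neighbours) and ranks the unannotated proteins by a hypothetical score, the fraction of a protein's neighbours that are positive, treating lower scores as more reliable negatives. The protein names, edges, and scoring rule are all assumptions made for illustration.

```python
from collections import defaultdict


def adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def topological_features(adj, proteins, positives):
    """Per protein: degree and number of neighbours annotated with the function."""
    feats = {}
    for p in proteins:
        neigh = adj.get(p, set())
        feats[p] = {
            "degree": len(neigh),
            "pos_neighbours": sum(1 for q in neigh if q in positives),
        }
    return feats


def rank_candidate_negatives(proteins, positives, feats):
    """Unannotated proteins, ordered from most to least reliable negative.

    Hypothetical heuristic (an assumption, not the paper's method): a lower
    fraction of positive neighbours is read as stronger evidence that the
    protein truly lacks the function. Isolated proteins get score 0.0.
    Ties are broken by protein name so the ordering is deterministic.
    """
    candidates = [p for p in proteins if p not in positives]

    def score(p):
        f = feats[p]
        return f["pos_neighbours"] / f["degree"] if f["degree"] else 0.0

    return sorted(candidates, key=lambda p: (score(p), p))


# Toy data (hypothetical): six proteins, two annotated with the function.
proteins = ["A", "B", "C", "D", "E", "F"]
positives = {"A", "B"}
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F")]

adj = adjacency(edges)
feats = topological_features(adj, proteins, positives)
ranking = rank_candidate_negatives(proteins, positives, feats)
print(ranking)  # ['D', 'E', 'F', 'C']
```

Here protein C, which interacts with both positives, ends up as the least reliable negative, in line with the common guilt-by-association intuition in network-based function prediction; which features actually matter for this task is precisely the question the paper investigates.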

Keywords: biological networks; negative example selection; protein features; protein function prediction
Scientific-disciplinary sector: INF/01 - Computer Science
Publication date: 20-Nov-2018
Journal (ANCE): BMC BIOINFORMATICS
DOI: https://dx.doi.org/10.1186/s12859-018-2385-x
Type: Article (author)
Appears in types: 01 - Journal article

Files in this record:

File	Access	Type	Size	Format
final.pdf	Open access	Publisher's version/PDF	2.42 MB	Adobe PDF
correction.pdf	Open access	Other	259.8 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/602835
