Analysis of Novel Annotations in the Gene Ontology for Boosting the Selection of Negative Examples / M. Sepehri, M. Frasca (ACM International Conference Proceedings Series). - In: ICBET '19 : Proceedings. - [s.l.] : ACM, 2019. - ISBN 9781450361309. - pp. 294-301. Paper presented at the 9th International Conference on Biomedical Engineering and Technology, held in Tokyo in 2019 [10.1145/3326172.3326228].

Analysis of Novel Annotations in the Gene Ontology for Boosting the Selection of Negative Examples

M. Sepehri (first author); M. Frasca (last author)
2019

Abstract

Public repositories for genome and proteome annotations, such as the Gene Ontology (GO), rarely store negative annotations, i.e., proteins not possessing a given function. This leaves the set of negative examples undefined or ill-defined, although it is crucial for training the majority of machine learning methods that infer protein functions. Automated techniques for choosing reliable negative proteins are therefore required to train accurate function prediction models. This study proposes the first extensive analysis of the temporal evolution of protein annotations in the GO repository. Novel annotations registered over the years have been analyzed to verify the presence of annotation patterns in the GO hierarchy. Our research supplied fundamental clues about proteins likely to be unreliable as negative examples, which have been built into a novel algorithm of our own construction, validated on two organisms in a genome-wide fashion against approaches proposed to choose negative examples in the context of functional prediction.
Gene Ontology; protein functions; negative sample selection; protein classification
Sector INF/01 - Computer Science
Book Part (author)

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/654886