Analysis of Novel Annotations in the Gene Ontology for Boosting the Selection of Negative Examples / M. Sepehri, M. Frasca (ACM INTERNATIONAL CONFERENCE PROCEEDINGS SERIES). - In: ICBET '19: Proceedings. [s.l.]: ACM, 2019. - ISBN 9781450361309. - pp. 294-301. (Paper presented at the 9th International Conference on Biomedical Engineering and Technology, held in Tokyo in 2019) [10.1145/3326172.3326228].
Analysis of Novel Annotations in the Gene Ontology for Boosting the Selection of Negative Examples
M. Sepehri (first author); M. Frasca (last author)
2019
Abstract
Public repositories for genome and proteome annotations, such as the Gene Ontology (GO), rarely store negative annotations, i.e. proteins that do not possess a given function. This leaves the set of negative examples undefined or ill-defined, even though it is crucial for training most machine learning methods that infer protein functions. Automated techniques for choosing reliable negative proteins are therefore required to train accurate function prediction models. This study presents the first extensive analysis of the temporal evolution of protein annotations in the GO repository. Novel annotations registered over the years have been analyzed to verify the presence of annotation patterns in the GO hierarchy. Our research supplied fundamental clues about which proteins are likely to be unreliable as negative examples. We verified these clues by embedding them in a novel algorithm of our own design, validated genome-wide on two organisms against existing approaches for choosing negative examples in the context of function prediction.
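The abstract does not describe the selection algorithm itself. As a purely illustrative sketch, assuming a toy ontology and two annotation snapshots, the Python code below shows the kind of temporal analysis the abstract refers to. It propagates annotations up the hierarchy (the true-path rule), finds proteins that became positive for a target term between two releases, and flags current negatives that are already annotated to a non-root ancestor of that term. The ancestor-based heuristic, the function names (`ancestors`, `propagate`, `new_positives`, `risky_negatives`) and the toy data are hypothetical and are not the authors' method.

```python
from typing import Dict, Set

# Hypothetical types: GO term -> parent terms, and protein -> directly annotated terms.
Parents = Dict[str, Set[str]]
Annotations = Dict[str, Set[str]]


def ancestors(term: str, parents: Parents) -> Set[str]:
    """Return all proper ancestors of `term` by walking the parent map."""
    seen: Set[str] = set()
    stack = list(parents.get(term, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def propagate(ann: Annotations, parents: Parents) -> Annotations:
    """True-path rule: annotating a protein to a term also annotates it to every ancestor."""
    return {
        p: set(terms).union(*(ancestors(t, parents) for t in terms))
        for p, terms in ann.items()
    }


def new_positives(old: Annotations, new: Annotations, term: str, parents: Parents) -> Set[str]:
    """Proteins annotated to `term` in the new release but not in the old one."""
    old_p, new_p = propagate(old, parents), propagate(new, parents)
    return {p for p, ts in new_p.items() if term in ts and term not in old_p.get(p, set())}


def risky_negatives(old: Annotations, term: str, parents: Parents, root: str) -> Set[str]:
    """Hypothetical heuristic: negatives for `term` already annotated to a non-root ancestor."""
    anc = ancestors(term, parents) - {root}
    return {p for p, ts in propagate(old, parents).items() if term not in ts and ts & anc}


# Hypothetical toy ontology (root -> B -> {C, D}; root -> E) and two annotation snapshots.
PARENTS: Parents = {"B": {"root"}, "C": {"B"}, "D": {"B"}, "E": {"root"}}
OLD: Annotations = {"p1": {"C"}, "p2": {"B"}, "p3": {"E"}, "p4": {"D"}}
NEW: Annotations = {"p1": {"C"}, "p2": {"C"}, "p3": {"E"}, "p4": {"D"}}

if __name__ == "__main__":
    risky = risky_negatives(OLD, "C", PARENTS, root="root")
    gained = new_positives(OLD, NEW, "C", PARENTS)
    print(sorted(risky))   # ['p2', 'p4']
    print(sorted(gained))  # ['p2']
    # Fraction of newly gained positives that the heuristic had flagged as unreliable negatives.
    print(len(gained & risky) / len(gained))  # 1.0
```

In this toy case, p2 (annotated only to the parent term B) later gains C and had been flagged, while p3, annotated in an unrelated branch, stays a candidate negative. Whether such an ancestor-based pattern holds on real GO releases would have to be checked empirically.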
File | Description | Type | Size | Format | Access
---|---|---|---|---|---
main_ICBET19.pdf | Main article | Pre-print (manuscript submitted to the publisher) | 4.58 MB | Adobe PDF | Restricted
p294-Sepehri.pdf | | Publisher's version/PDF | 1.96 MB | Adobe PDF | Restricted
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.