Public repositories for genome and proteome annotations, such as the Gene Ontology (GO), rarely store negative annotations, i.e., proteins not possessing a given function. This leaves the set of negative examples undefined or ill-defined, although that set is crucial for training the majority of machine learning methods that infer protein functions. Automated techniques to choose reliable negative proteins are therefore required to train accurate function prediction models. This study proposes the first extensive analysis of the temporal evolution of protein annotations in the GO repository. Novel annotations registered through the years have been analyzed to verify the presence of annotation patterns in the GO hierarchy. Our research supplied fundamental clues about proteins likely to be unreliable as negative examples, which were verified in a novel algorithm of our own construction, validated on two organisms in a genome-wide fashion against approaches proposed to choose negative examples in the context of functional prediction.
Analysis of Novel Annotations in the Gene Ontology for Boosting the Selection of Negative Examples / M. Sepehri, M. Frasca (ACM INTERNATIONAL CONFERENCE PROCEEDINGS SERIES). - In: ICBET '19 : Proceedings [s.l] : ACM, 2019. - ISBN 9781450361309. - pp. 294-301 (Paper presented at the 9th International Conference on Biomedical Engineering and Technology, held in Tokyo in 2019).
|Title:||Analysis of Novel Annotations in the Gene Ontology for Boosting the Selection of Negative Examples|
SEPEHRI, MARYAM (First)
FRASCA, MARCO (Last) (Corresponding)
|Keywords:||Gene Ontology; protein functions; negative sample selection; protein classification|
|Scientific Disciplinary Sector:||Sector INF/01 - Computer Science|
|Publication date:||2019|
|Digital Object Identifier (DOI):||http://dx.doi.org/10.1145/3326172.3326228|
|Type:||Book Part (author)|
|Appears in types:||03 - Contribution in volume|