
Scalable Network-based Learning Methods for Automated Function Prediction based on the Neo4j Graph-database / M. Mesiti, M. Re, G. Valentini. Presented at the conference Automated Function Prediction - ISMB 2013, held in Berlin in 2013.

Scalable Network-based Learning Methods for Automated Function Prediction based on the Neo4j Graph-database

M. Mesiti (first author); M. Re (second author); G. Valentini (last author)
2013

Abstract

A recent class of gene/protein function predictors, based on Graph Semi-Supervised Learning (GSSL), exploits the functional relationships between genes to propagate existing annotations to unannotated genes that are topologically related in the network. Since network-based prediction of gene functions is frequently performed at the whole-genome level, scalable methods are of critical importance to make the analysis of very large graphs feasible. Unfortunately, GSSL methods scale poorly with the size of the graph [1]: their time complexity usually becomes prohibitive quickly on large graphs, which prevents their adoption in whole-genome applications. The problem is particularly evident when predicting gene functions in higher eukaryotes such as mammals or plants. We propose a novel framework for scalable semi-supervised network-based learning of gene functions that: provides a “local implementation”, based on a “vertex-centric” computational model, of both classical algorithms (e.g. random walks and random walks with restart) and recently proposed methods (e.g. kernelized score functions); computes a random walk graph kernel without approximation; makes no assumptions about the nature of the considered network; and exploits graph database technologies to store the graph and to efficiently handle its nodes and edges in secondary memory.
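To illustrate what a vertex-centric ("local") implementation of random walk with restart can look like, here is a minimal Python sketch. It is not the authors' Neo4j-based implementation. It assumes an in-memory adjacency dictionary standing in for the graph database, an undirected network with positive edge weights, and a restart distribution uniform over the genes already annotated with the target function. The function name `rwr_vertex_centric` and its parameters (`restart`, `max_steps`, `tol`) are hypothetical. In each superstep every vertex sends its current probability mass to its neighbors in proportion to edge weight, then combines what it received with the restart term using only local information.

```python
from collections import defaultdict


def rwr_vertex_centric(edges, positives, restart=0.15, max_steps=200, tol=1e-12):
    """Random walk with restart computed as a vertex-centric message-passing loop.

    Hypothetical sketch. Assumptions:
      edges     -- dict mapping each vertex to {neighbor: weight}; the graph is
                   undirected (both directions listed) with positive weights.
      positives -- set of vertices annotated with the function of interest.
    Returns a dict mapping each vertex to its RWR score.
    """
    vertices = list(edges)
    # Restart distribution: uniform over the annotated (positive) vertices.
    r = {v: (1.0 / len(positives) if v in positives else 0.0) for v in vertices}
    # Weighted degree of each vertex, used to normalize its outgoing transitions.
    degree = {v: sum(edges[v].values()) for v in vertices}
    p = dict(r)
    for _ in range(max_steps):
        inbox = defaultdict(float)
        # "Send" phase: each vertex spreads its mass along its incident edges.
        # Vertices with zero degree send nothing (their mass leaks away).
        for u in vertices:
            if degree[u] == 0:
                continue
            for v, w in edges[u].items():
                inbox[v] += p[u] * w / degree[u]
        # "Compute" phase: each vertex uses only its own inbox and restart value.
        new_p = {v: (1 - restart) * inbox[v] + restart * r[v] for v in vertices}
        delta = sum(abs(new_p[v] - p[v]) for v in vertices)
        p = new_p
        if delta < tol:
            break
    return p


if __name__ == "__main__":
    # Toy undirected network; g1 and g2 are annotated with the target function.
    toy = {
        "g1": {"g2": 1.0, "g3": 1.0},
        "g2": {"g1": 1.0, "g3": 1.0},
        "g3": {"g1": 1.0, "g2": 1.0, "g4": 1.0},
        "g4": {"g3": 1.0, "g5": 1.0},
        "g5": {"g4": 1.0},
    }
    annotated = {"g1", "g2"}
    scores = rwr_vertex_centric(toy, annotated)
    # Rank the unannotated genes by score, highest first.
    ranking = sorted((g for g in toy if g not in annotated), key=scores.get, reverse=True)
    print(ranking)  # ['g3', 'g4', 'g5']
```

Unannotated genes are ranked by their score. On the toy graph, g3, which is adjacent to both annotated genes, ranks above g4, which in turn ranks above g5. Because each vertex reads only its own neighbors and state in every superstep, the same loop body could in principle be driven by per-node reads from secondary storage rather than a whole-graph matrix. That locality is the property a vertex-centric model relies on.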
Jul-2013
Sector INF/01 - Computer Science
ISCB
http://homes.di.unimi.it/~valenti/papers/AFP-ISMB13ValentiniReMesiti-final.pdf
Conference Object


Use this identifier to cite or link to this document: https://hdl.handle.net/2434/224160