"Motivation" - In network biology and medicine several problems can be modeled as node label inference in partially labeled networks. Nodes are biomedical entities (e.g. genes, patients) and connections represent a notion of functional similarity between entities. Usually, the class being predicted is represented through a labeling vector highly unbalanced towards negatives: that is only few positive instances (those associated with the class) are available. This fosters the adoption of imbalance­aware methodologies to accurately predict node labels. In addition, input data can be large­sized, since we may have millions of instances (e.g. in multi­species protein networks), thus requiring the design of efficient and scalable methodologies. To address these problems, a parametric neural algorithm based on the Hopfield model, COSNet [1,2,3], has been proposed, leveraging the minimization of a Hopfield network energy through the usual sequential dynamics to achieve an asymptotically stable attractor representing a valuable prediction. In this study, we propose a sparse and partially parallel implementation of COSNet, for sparse networks, which decomposes the input net in independent sets of neurons, each processed concurrently by hardware accelerators, like modern GPUs, while still keeping the overall dynamics sequential. "Methods" - The Hopfield dynamics is decomposed in independent tasks by solving the graph coloring problem, that is assigning colors to the graph vertices so that adjacent vertices receive different colors. Thus, the units of the neural network are split into clusters of independent neurons, which are sequentially updated, whereas the single units within each cluster are updated simultaneously. We simulate the algorithm on GPUs achieving a significant speed up with respect to the original sequential implementation and, at the same time, lowering memory requirements thanks to compressed memorization strategies, thus opening the possibility to face with prediction issues on big size instances. Also, a cooperative CPU multithreading – GPU model have been implemented, where the computations over different functional classes are carried independently by assigning each class to a different CPU thread. "Results" - We tested both COSNet and COSNet­GPU on partially labeled networks containing genes belonging to D. melanogaster and Homo sapiens organisms for predicting respectively the Gene Ontology (GO) and the Human Phenotype Ontology (HPO) terms with 10­50 annotated genes. The algorithm behavior has been measured in terms of execution time and memory consumption. Table 1 summarizes the results in term of speed­up and memory usage, when performing a 3­fold cross validation procedure. The results show significant reductions in both execution times and memory consumption, and interestingly the improvement factors increases more than linearly with the number of nodes/genes. This also corroborates the fact that the proposed implementation nicely scales on big data.

Speeding up node label learning in unbalanced biomolecular networks through a parallel and sparse GPU­based Hopfield model / A. Petrini, M. Notaro, J. Gliozzo, G. Valentini, G. Grossi, M. Frasca. ((Intervento presentato al 14. convegno Annual Meeting of the Bioinformatics Italian Society tenutosi a Cagliari nel 2017.

Speeding up node label learning in unbalanced biomolecular networks through a parallel and sparse GPU-based Hopfield model

A. Petrini (first author); M. Notaro; J. Gliozzo; G. Valentini; G. Grossi; M. Frasca
2017

Abstract

"Motivation" - In network biology and medicine several problems can be modeled as node label inference in partially labeled networks. Nodes are biomedical entities (e.g. genes, patients) and connections represent a notion of functional similarity between entities. Usually, the class being predicted is represented through a labeling vector highly unbalanced towards negatives: that is only few positive instances (those associated with the class) are available. This fosters the adoption of imbalance­aware methodologies to accurately predict node labels. In addition, input data can be large­sized, since we may have millions of instances (e.g. in multi­species protein networks), thus requiring the design of efficient and scalable methodologies. To address these problems, a parametric neural algorithm based on the Hopfield model, COSNet [1,2,3], has been proposed, leveraging the minimization of a Hopfield network energy through the usual sequential dynamics to achieve an asymptotically stable attractor representing a valuable prediction. In this study, we propose a sparse and partially parallel implementation of COSNet, for sparse networks, which decomposes the input net in independent sets of neurons, each processed concurrently by hardware accelerators, like modern GPUs, while still keeping the overall dynamics sequential. "Methods" - The Hopfield dynamics is decomposed in independent tasks by solving the graph coloring problem, that is assigning colors to the graph vertices so that adjacent vertices receive different colors. Thus, the units of the neural network are split into clusters of independent neurons, which are sequentially updated, whereas the single units within each cluster are updated simultaneously. We simulate the algorithm on GPUs achieving a significant speed up with respect to the original sequential implementation and, at the same time, lowering memory requirements thanks to compressed memorization strategies, thus opening the possibility to face with prediction issues on big size instances. Also, a cooperative CPU multithreading – GPU model have been implemented, where the computations over different functional classes are carried independently by assigning each class to a different CPU thread. "Results" - We tested both COSNet and COSNet­GPU on partially labeled networks containing genes belonging to D. melanogaster and Homo sapiens organisms for predicting respectively the Gene Ontology (GO) and the Human Phenotype Ontology (HPO) terms with 10­50 annotated genes. The algorithm behavior has been measured in terms of execution time and memory consumption. Table 1 summarizes the results in term of speed­up and memory usage, when performing a 3­fold cross validation procedure. The results show significant reductions in both execution times and memory consumption, and interestingly the improvement factors increases more than linearly with the number of nodes/genes. This also corroborates the fact that the proposed implementation nicely scales on big data.
2017
Settore INF/01 - Informatica
Speeding up node label learning in unbalanced biomolecular networks through a parallel and sparse GPU-based Hopfield model / A. Petrini, M. Notaro, J. Gliozzo, G. Valentini, G. Grossi, M. Frasca. (Contribution presented at the 14th Annual Meeting of the Bioinformatics Italian Society, held in Cagliari in 2017.)
Conference Object
Files in this record:
BITS17-gcosnet.pdf (open access) - Type: post-print, accepted manuscript (version accepted by the publisher) - Adobe PDF, 284.87 kB

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1022608