"Motivation" - In network biology and medicine several problems can be modeled as node label inference in partially labeled networks. Nodes are biomedical entities (e.g. genes, patients) and connections represent a notion of functional similarity between entities. Usually, the class being predicted is represented through a labeling vector highly unbalanced towards negatives: that is only few positive instances (those associated with the class) are available. This fosters the adoption of imbalanceaware methodologies to accurately predict node labels. In addition, input data can be largesized, since we may have millions of instances (e.g. in multispecies protein networks), thus requiring the design of efficient and scalable methodologies. To address these problems, a parametric neural algorithm based on the Hopfield model, COSNet [1,2,3], has been proposed, leveraging the minimization of a Hopfield network energy through the usual sequential dynamics to achieve an asymptotically stable attractor representing a valuable prediction. In this study, we propose a sparse and partially parallel implementation of COSNet, for sparse networks, which decomposes the input net in independent sets of neurons, each processed concurrently by hardware accelerators, like modern GPUs, while still keeping the overall dynamics sequential. "Methods" - The Hopfield dynamics is decomposed in independent tasks by solving the graph coloring problem, that is assigning colors to the graph vertices so that adjacent vertices receive different colors. Thus, the units of the neural network are split into clusters of independent neurons, which are sequentially updated, whereas the single units within each cluster are updated simultaneously. We simulate the algorithm on GPUs achieving a significant speed up with respect to the original sequential implementation and, at the same time, lowering memory requirements thanks to compressed memorization strategies, thus opening the possibility to face with prediction issues on big size instances. Also, a cooperative CPU multithreading – GPU model have been implemented, where the computations over different functional classes are carried independently by assigning each class to a different CPU thread. "Results" - We tested both COSNet and COSNetGPU on partially labeled networks containing genes belonging to D. melanogaster and Homo sapiens organisms for predicting respectively the Gene Ontology (GO) and the Human Phenotype Ontology (HPO) terms with 1050 annotated genes. The algorithm behavior has been measured in terms of execution time and memory consumption. Table 1 summarizes the results in term of speedup and memory usage, when performing a 3fold cross validation procedure. The results show significant reductions in both execution times and memory consumption, and interestingly the improvement factors increases more than linearly with the number of nodes/genes. This also corroborates the fact that the proposed implementation nicely scales on big data.
Speeding up node label learning in unbalanced biomolecular networks through a parallel and sparse GPU-based Hopfield model / A. Petrini, M. Notaro, J. Gliozzo, G. Valentini, G. Grossi, M. Frasca. (Presented at the 14th Annual Meeting of the Bioinformatics Italian Society, held in Cagliari in 2017.)
Speeding up node label learning in unbalanced biomolecular networks through a parallel and sparse GPU-based Hopfield model
A. Petrini; M. Notaro; J. Gliozzo; G. Valentini; G. Grossi; M. Frasca
2017
Abstract
"Motivation" - In network biology and medicine several problems can be modeled as node label inference in partially labeled networks. Nodes are biomedical entities (e.g. genes, patients) and connections represent a notion of functional similarity between entities. Usually, the class being predicted is represented through a labeling vector highly unbalanced towards negatives: that is only few positive instances (those associated with the class) are available. This fosters the adoption of imbalanceaware methodologies to accurately predict node labels. In addition, input data can be largesized, since we may have millions of instances (e.g. in multispecies protein networks), thus requiring the design of efficient and scalable methodologies. To address these problems, a parametric neural algorithm based on the Hopfield model, COSNet [1,2,3], has been proposed, leveraging the minimization of a Hopfield network energy through the usual sequential dynamics to achieve an asymptotically stable attractor representing a valuable prediction. In this study, we propose a sparse and partially parallel implementation of COSNet, for sparse networks, which decomposes the input net in independent sets of neurons, each processed concurrently by hardware accelerators, like modern GPUs, while still keeping the overall dynamics sequential. "Methods" - The Hopfield dynamics is decomposed in independent tasks by solving the graph coloring problem, that is assigning colors to the graph vertices so that adjacent vertices receive different colors. Thus, the units of the neural network are split into clusters of independent neurons, which are sequentially updated, whereas the single units within each cluster are updated simultaneously. We simulate the algorithm on GPUs achieving a significant speed up with respect to the original sequential implementation and, at the same time, lowering memory requirements thanks to compressed memorization strategies, thus opening the possibility to face with prediction issues on big size instances. Also, a cooperative CPU multithreading – GPU model have been implemented, where the computations over different functional classes are carried independently by assigning each class to a different CPU thread. "Results" - We tested both COSNet and COSNetGPU on partially labeled networks containing genes belonging to D. melanogaster and Homo sapiens organisms for predicting respectively the Gene Ontology (GO) and the Human Phenotype Ontology (HPO) terms with 1050 annotated genes. The algorithm behavior has been measured in terms of execution time and memory consumption. Table 1 summarizes the results in term of speedup and memory usage, when performing a 3fold cross validation procedure. The results show significant reductions in both execution times and memory consumption, and interestingly the improvement factors increases more than linearly with the number of nodes/genes. This also corroborates the fact that the proposed implementation nicely scales on big data.File | Dimensione | Formato | |
File | Size | Format
---|---|---
BITS17-gcosnet.pdf (open access; post-print / accepted manuscript, version accepted by the publisher) | 284.87 kB | Adobe PDF