Among the many proposed solutions in graph embedding, traditional random walk-based embedding methods have shown their promise in several fields. However, when the graph contains high-degree nodes, random walks often neglect low- or middle-degree nodes and tend to prefer stepping through high-degree ones instead. This results in random-walk samples providing a very accurate topological representation of neighbourhoods surrounding high-degree nodes, which contrasts with a coarse-grained representation of neighbourhoods surrounding middle and low-degree nodes. This in turn affects the performance of the subsequent predictive models, which tend to overfit high-degree nodes and/or edges having high-degree nodes as one of the vertices. We propose a solution to this problem, which relies on a degree normalization approach. Experiments with popular RW-based embedding methods applied to edge prediction problems involving eight protein-protein interaction (PPI) graphs from the STRING database show the effectiveness of the proposed approach: degree normalization not only improves predictions but also provides more stable results, suggesting that our proposal has a regularization effect leading to a more robust convergence.
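The degree bias described above can be illustrated with a small simulation. This is a minimal sketch, not the authors' method: the abstract does not specify the normalization, so the inverse-degree neighbour weighting used here is an assumption. The toy hub-and-ring graph and the names `toy_graph`, `random_walk`, and `hub_visit_share` are also hypothetical.

```python
import random
from collections import Counter


def toy_graph(n_ring=8):
    """Hub node 0 linked to every ring node 1..n_ring; ring nodes also link to their two ring neighbours."""
    adj = {0: list(range(1, n_ring + 1))}
    for i in range(1, n_ring + 1):
        left = n_ring if i == 1 else i - 1
        right = 1 if i == n_ring else i + 1
        adj[i] = [0, left, right]
    return adj


def random_walk(adj, start, length, rng, degree_normalized=False):
    """Sample one walk of `length` steps starting at `start`."""
    walk = [start]
    for _ in range(length):
        neighbours = adj[walk[-1]]
        if degree_normalized:
            # ASSUMPTION: weight each neighbour by 1 / degree, so that
            # high-degree neighbours are chosen less often.
            weights = [1.0 / len(adj[n]) for n in neighbours]
        else:
            # Plain first-order walk: every neighbour is equally likely.
            weights = [1.0] * len(neighbours)
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


def hub_visit_share(adj, degree_normalized, walks_per_node=20, length=50, seed=0):
    """Fraction of all sampled walk positions that land on the hub (node 0)."""
    rng = random.Random(seed)
    visits = Counter()
    for node in adj:
        for _ in range(walks_per_node):
            visits.update(random_walk(adj, node, length, rng, degree_normalized))
    return visits[0] / sum(visits.values())


if __name__ == "__main__":
    graph = toy_graph()
    print("plain walk, hub share:     ", hub_visit_share(graph, False))
    print("normalized walk, hub share:", hub_visit_share(graph, True))
```

On a connected undirected graph, a plain random walk visits each node in proportion to its degree in the long run, so the single hub absorbs a disproportionate share of the samples. Under the assumed inverse-degree weighting, that share moves closer to the uniform value of one node in nine.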
Degree-Normalization Improves Random-Walk-Based Embedding Accuracy in PPI Graphs / L. Cappelletti, S. Taverni, T. Fontana, M.P. Joachimiak, J. Reese, P. Robinson, E. Casiraghi, G. Valentini (LECTURE NOTES IN COMPUTER SCIENCE). - In: Bioinformatics and Biomedical Engineering / [edited by] I. Rojas, O. Valenzuela, F. Rojas Ruiz, L. J. Herrera, F. Ortuño. - Cham : Springer, 2023. - ISBN 978-3-031-34959-1. - pp. 372-383 (Paper presented at the 10th IWBBIO conference: International Work-Conference on Bioinformatics and Biomedical Engineering, held in Meloneras, July 12–14, 2023) [10.1007/978-3-031-34960-7_26].
Degree-Normalization Improves Random-Walk-Based Embedding Accuracy in PPI Graphs
E. Casiraghi (penultimate author); G. Valentini (last author)
2023