Better Negatives, Better Predictions: Negative Sample Selection Strategies for Enhancing Biomedical KG Edge Classification / E. Cavalleri, M. Mesiti, D. Malchiodi. - In: WWW Companion '26: Companion. - [s.l.] : ACM, May 2026. - ISBN 9798400723087. - pp. 597-606. (35th ACM Web Conference, Dubai, 2026) [10.1145/3774905.3794655].

Better Negatives, Better Predictions: Negative Sample Selection Strategies for Enhancing Biomedical KG Edge Classification

E. Cavalleri (first author); M. Mesiti; D. Malchiodi (last author)
2026

Abstract

The selection of negative samples plays a crucial role in a wide range of machine learning algorithms and is particularly critical in edge classification tasks, where this choice has a direct impact on predictive performance. In this paper, we propose a set of strategies for generating negative edges in large, heterogeneous biomedical knowledge graphs, tailored to different link prediction scenarios. In these graphs, the absence of an observed edge does not necessarily indicate the absence of a relationship; instead, it may simply reflect missing or undiscovered knowledge. Leveraging latent-space graph embeddings, we analyze the impact of different negative sample selection strategies that account for both node types and edge semantics. Our initial experiments on two biomedical knowledge graphs demonstrate substantial improvements in classification performance, independent of the underlying predictive model, highlighting the robustness and effectiveness of the proposed approach. Results show that our strategies for generating negative edges in a knowledge graph outperform random negative sampling, yielding statistically significant improvements in balanced accuracy. Code and data for reproducing experiments are available at https://github.com/SLIMlaboratory/glow26 and https://zenodo.org/records/18074722.
Keywords: negative edge selection; graph representation learning; knowledge graphs; link prediction; edge classification
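
To illustrate the general idea of negative edge selection that accounts for node types, here is a minimal Python sketch. It is not the authors' method, which is available at the repository linked in the abstract. The function name `sample_negative_edges`, the toy node identifiers, and the gene/disease types are all hypothetical. The sketch enumerates every type-compatible node pair that is not an observed edge, which is only practical on small graphs.

```python
import random


def sample_negative_edges(nodes_by_type, positive_edges, head_type, tail_type, n, seed=0):
    """Return up to n unobserved (head, tail) pairs with the requested node types.

    Illustrative assumption: candidates are drawn uniformly among the
    type-compatible pairs that do not appear in positive_edges.
    """
    rng = random.Random(seed)
    observed = set(positive_edges)
    heads = sorted(nodes_by_type[head_type])
    tails = sorted(nodes_by_type[tail_type])
    # Keep only pairs that respect the node types and are not observed edges.
    candidates = [
        (h, t) for h in heads for t in tails
        if h != t and (h, t) not in observed
    ]
    rng.shuffle(candidates)
    return candidates[:n]


# Toy graph (hypothetical identifiers): three genes, two diseases.
nodes_by_type = {"gene": ["g1", "g2", "g3"], "disease": ["d1", "d2"]}
positives = [("g1", "d1"), ("g2", "d2")]

negatives = sample_negative_edges(nodes_by_type, positives, "gene", "disease", n=3)
print(negatives)
```

As the abstract notes, an unobserved pair is not guaranteed to be a true negative, since it may reflect missing knowledge. A sampler like this one only restricts candidates to plausible node types and does not address that uncertainty.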
Academic discipline: INFO-01/A - Computer Science
May 2026
ACM
https://dl.acm.org/doi/10.1145/3774905.3794655
Book Part (author)
Files in this record:
File: 3774905.3794655.pdf
Access: open access
Type: Publisher's version/PDF
License: Creative Commons
Size: 3.59 MB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1250341