RNA knowledge-graph analysis through homogeneous embedding methods / F. Torgano, M. Soto Gomez, M. Zignani, J. Gliozzo, E. Cavalleri, M. Mesiti, E. Casiraghi, G. Valentini. - In: BIOINFORMATICS ADVANCES. - ISSN 2635-0041. - 5:1(2025 May 13), pp. vbaf109.1-vbaf109.9. [10.1093/bioadv/vbaf109]

RNA knowledge-graph analysis through homogeneous embedding methods

M. Soto Gomez (second author); M. Zignani; J. Gliozzo; E. Cavalleri; M. Mesiti; E. Casiraghi (penultimate author); G. Valentini (last author)
2025

Abstract

Motivation: We recently introduced RNA-knowledge graph (KG), an ontology-based KG that integrates biological data on RNAs from over 60 public databases. RNA-KG captures functional relationships and interactions between RNA molecules and other biomolecules, chemicals, and biomedical concepts such as diseases and phenotypes, all represented within graph-structured bio-ontologies. We present the first comprehensive computational analysis of RNA-KG, evaluating the potential of graph representation learning and machine learning models to predict node types and edges within the graph.

Results: We performed node classification experiments to predict up to 81 distinct node types, and performed both generic- and specific-edge prediction tasks. Generic-edge prediction focused on identifying the presence of an edge irrespective of its type, while specific-edge prediction targeted specific interactions between ncRNAs, e.g. between microRNAs (miRNA-miRNA) or between small interfering RNAs and messenger RNAs (siRNA-mRNA), or relationships between ncRNAs and biomedical concepts, e.g. miRNA-disease or lncRNA-Gene Ontology term relationships. Using embedding methods for homogeneous graphs, such as Large-scale Information Network Embedding (LINE) and node2vec, in combination with machine learning models like decision trees and random forests, we achieved balanced accuracy exceeding 90% for the 20 most common node types and over 80% for most specific-edge prediction tasks. These results show that simple embedding methods for homogeneous graphs can successfully predict nodes and edges of RNA-KG, paving the way to discover novel ncRNA interactions and laying the foundation for further exploration and use of this rich information source to enhance prediction accuracy and support further research into the “RNA world.”

Availability and implementation: Python code to reproduce the experiments is available at https://github.com/AnacletoLAB/RNA-KG_homogeneous_emb_analysis
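The abstract reports results as balanced accuracy and frames edge prediction as classification over pairs of embedded nodes. The sketch below illustrates both ideas in plain Python. It is not the authors' code: the function names are hypothetical, balanced accuracy follows the usual definition (the mean of per-class recall), and the element-wise (Hadamard) product is only one common way to turn two node embeddings into an edge feature vector, which the abstract does not specify.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence


def balanced_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean of per-class recall, so rare node types count as much as common ones."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")
    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def hadamard_edge_features(emb_u: Sequence[float], emb_v: Sequence[float]) -> List[float]:
    """Element-wise product of two node embeddings (an assumed edge operator)."""
    if len(emb_u) != len(emb_v):
        raise ValueError("embeddings must have the same dimension")
    return [a * b for a, b in zip(emb_u, emb_v)]


# Toy, made-up labels: a classifier that always answers "miRNA" is 80% accurate
# on this imbalanced set, but its balanced accuracy is only 0.5.
labels = ["miRNA"] * 8 + ["disease"] * 2
always_mirna = ["miRNA"] * 10
score = balanced_accuracy(labels, always_mirna)

# Toy two-dimensional embeddings for a candidate edge (u, v).
edge_vector = hadamard_edge_features([1.0, 2.0], [3.0, 0.5])
```

In a pipeline of the kind the abstract describes, vectors such as `edge_vector` would be computed for existing edges and for sampled non-edges, then passed to a classifier such as a random forest. Balanced accuracy is a natural metric when node or edge types are highly imbalanced.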
Language: English
Keywords: knowledge graph; graph representation learning; RNA
Sector: INFO-01/A - Informatica (Computer Science)
Type: Article (scientific publication)
Peer review: anonymous experts
Funding: National Center for Gene Therapy and Drugs based on RNA Technology (CN3 RNA), Ministero dell'Università e della Ricerca (Italian Ministry of University and Research), code CN00000041
Publication date: 13 May 2025
Publisher: Oxford University Press
Volume 5, issue 1, article vbaf109, pages 1-9
Status: published
Journal of international relevance
https://academic.oup.com/bioinformaticsadvances/article/5/1/vbaf109/8129559
Files in this record:
vbaf109.pdf (open access)
Type: Publisher's version/PDF
License: Creative Commons
Size: 1.51 MB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1168515