SPIREX: Improving LLM-based relation extraction from RNA-focused scientific literature using graph machine learning / E. Cavalleri, M. Soto Gomez, A. Pashaeibarough, D. Malchiodi, J.H. Caufield, J.T. Reese, C. Mungall, P.N. Robinson, E. Casiraghi, G. Valentini, M. Mesiti. - In: Proceedings of Workshops at the 50th International Conference on Very Large Data Bases. - [s.l] : VLDB.org, 2024. - pp. 1-11. (Paper presented at the 50th International Conference on Very Large Data Bases, held in Guangzhou in 2024.)

SPIREX: Improving LLM-based relation extraction from RNA-focused scientific literature using graph machine learning

E. Cavalleri (first author);
M. Soto Gomez (second author);
A. Pashaeibarough; D. Malchiodi; E. Casiraghi; G. Valentini (penultimate author);
M. Mesiti (last author)
2024

Abstract

Relation extraction from scientific literature to align with a domain ontology is a well-known challenge in natural language processing, particularly critical in precision medicine. The advent of large language models (LLMs) has enabled the development of new and effective approaches to this problem. However, the extracted relations can be prone to problems (e.g., hallucination) that must be minimized. In this paper, we present the initial development of SPIREX, an extension of the SPIRES-based system designed to extract triples from scientific literature involving RNA molecules. Our system leverages schema constraints in the formulation of LLM prompts and utilizes graph machine learning on our RNA-based knowledge graph, RNA-KG, to assess the plausibility of the extracted triples. RNA-KG comprises more than 12.5M edges representing various types of relationships involving RNA molecules.
Keywords: LLMs; Machine Learning; RNA; NLP; Biomedical Knowledge Graphs
Academic discipline sector: INFO-01/A - Computer Science
   National Center for Gene Therapy and Drugs based on RNA Technology
   MINISTERO DELL'UNIVERSITA' E DELLA RICERCA
   CN00000041

   MUSA - Multilayered Urban Sustainability Action
   MUSA
   MINISTERO DELL'UNIVERSITA' E DELLA RICERCA
2024
https://vldb.org/workshops/2024/proceedings/LLM+KG/LLM+KG-12.pdf
Book Part (author)
Files in this item:
File: LLM+KG-12.pdf
Access: open access
Type: Publisher's version/PDF
Size: 1.69 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1146216