
Initial achievements in relation extraction from RNA-focused scientific papers / E. Cavalleri, M. Soto-Gomez, A. Pashaeibarough, D. Malchiodi, H. Caufield, J. Reese, C.J. Mungall, P.N. Robinson, E. Casiraghi, G. Valentini, M. Mesiti (CEUR Workshop Proceedings). - In: SEBD 2024: Symposium on Advanced Database Systems 2024 / edited by M. Atzori, P. Ciaccia, M. Ceci, F. Mandreoli, D. Malerba, M. Sanguinetti, A. Pellicani, F. Motta. - [s.l.]: CEUR-WS, 2024. - pp. 61-69. (Paper presented at the 32nd Italian Symposium on Advanced Database Systems, held in Villasimius in 2024.)

Initial achievements in relation extraction from RNA-focused scientific papers

E. Cavalleri (first author); M. Soto-Gomez (second author); A. Pashaeibarough; D. Malchiodi; E. Casiraghi; G. Valentini; M. Mesiti (last author)
2024

Abstract

Extracting relations from the scientific literature in compliance with a domain ontology is a well-known problem in natural language processing and is particularly critical in precision medicine. The advent of large language models (LLMs) has paved the way for new, effective approaches to this problem, but the extracted relations can be affected by issues such as hallucination, which must be minimized. In this paper, we present the initial design and preliminary experimental validation of SPIREX, an extension of the SPIRES-based system for the extraction of RDF triples from scientific literature involving RNA molecules. Our system exploits schema constraints in the formulation of LLM prompts, along with our RNA-based knowledge graph (KG), RNA-KG, to evaluate the plausibility of the extracted triples. RNA-KG contains more than 9 million edges representing the different kinds of relationships in which RNA molecules can be involved. Initial experimental results on a controlled data set are quite encouraging.
Keywords: Link Prediction; LLM; Prompt Engineering; Relation Discovery; RNA-based Knowledge Graphs
Scientific-disciplinary sector: INFO-01/A - Computer Science
2024
https://ceur-ws.org/Vol-3741/paper53.pdf
Book Part (author)
Files in this item:
paper53.pdf (open access)
Type: Publisher's version/PDF
License: Creative Commons
Size: 1.19 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1172229
Citations: PubMed Central n/a; Scopus 0; Web of Science n/a; OpenAlex n/a