Initial achievements in relation extraction from RNA-focused scientific papers / E. Cavalleri, M. Soto-Gomez, A. Pashaeibarough, D. Malchiodi, H. Caufield, J. Reese, C.J. Mungall, P.N. Robinson, E. Casiraghi, G. Valentini, M. Mesiti (CEUR WORKSHOP PROCEEDINGS). - In: SEBD 2024 : Symposium on Advanced Database Systems 2024 / edited by M. Atzori, P. Ciaccia, M. Ceci, F. Mandreoli, D. Malerba, M. Sanguinetti, A. Pellicani, F. Motta. - [s.l.] : CEUR-WS, 2024. - pp. 61-69 (Paper presented at the 32nd Italian Symposium on Advanced Database Systems, held in Villasimius in 2024).

Initial achievements in relation extraction from RNA-focused scientific papers

E. Cavalleri (first author); M. Soto-Gomez (second author); A. Pashaeibarough; D. Malchiodi; E. Casiraghi; G. Valentini; M. Mesiti (last author)
2024

Abstract

Extracting relations from the scientific literature in compliance with a domain ontology is a well-known problem in natural language processing and is particularly critical in precision medicine. The advent of large language models (LLMs) has paved the way for new, effective approaches to this problem, but the extracted relations can be affected by issues such as hallucination, which must be minimized. In this paper, we present the initial design and preliminary experimental validation of SPIREX, an extension of the SPIRES-based system for extracting RDF triples from scientific literature involving RNA molecules. Our system exploits schema constraints in the formulation of LLM prompts, along with our RNA-based knowledge graph (KG), RNA-KG, to evaluate the plausibility of the extracted triples. RNA-KG contains more than 9M edges representing the different kinds of relationships in which RNA molecules can be involved. Initial experimental results on a controlled dataset are encouraging.
English
Link Prediction; LLM; Prompt Engineering; relation discovery; RNA-based Knowledge Graphs
Sector INFO-01/A - Computer Science
Conference paper
Anonymous expert reviewers
Scientific publication
SEBD 2024 : Symposium on Advanced Database Systems 2024
M. Atzori, P. Ciaccia, M. Ceci, F. Mandreoli, D. Malerba, M. Sanguinetti, A. Pellicani, F. Motta
CEUR-WS
2024
pp. 61-69 (9 pages)
Volume 3741
Nationally distributed volume
Gold
32nd Italian Symposium on Advanced Database Systems, Villasimius, 2024
https://ceur-ws.org/Vol-3741/paper53.pdf
Book Part (author)
open
273
info:eu-repo/semantics/bookPart
11
Research products::03 - Contribution in volume
Files in this item:
File: paper53.pdf
Access: open access
Type: Publisher's version/PDF
License: Creative Commons
Size: 1.19 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1172229
Citations
  • PubMed Central: not available
  • Scopus: 0
  • Web of Science: not available
  • OpenAlex: not available