Initial achievements in relation extraction from RNA-focused scientific papers / E. Cavalleri, M. Soto-Gomez, A. Pashaeibarough, D. Malchiodi, H. Caufield, J. Reese, C.J. Mungall, P.N. Robinson, E. Casiraghi, G. Valentini, M. Mesiti (CEUR Workshop Proceedings). - In: SEBD 2024 : Symposium on Advanced Database Systems 2024 / edited by M. Atzori, P. Ciaccia, M. Ceci, F. Mandreoli, D. Malerba, M. Sanguinetti, A. Pellicani, F. Motta. - [s.l] : CEUR-WS, 2024. - pp. 61-69 (Paper presented at the 32nd Italian Symposium on Advanced Database Systems, held in Villasimius in 2024).
Initial achievements in relation extraction from RNA-focused scientific papers
E. Cavalleri (first author); M. Soto-Gomez (second author); A. Pashaeibarough; D. Malchiodi; E. Casiraghi; G. Valentini; M. Mesiti (last author)
2024
Abstract
Extracting relations from the scientific literature in compliance with a domain ontology is a well-known problem in natural language processing and is particularly critical in precision medicine. The advent of large language models (LLMs) has paved the way for new, effective approaches to this problem, but the extracted relations can be affected by issues such as hallucination, which must be minimized. In this paper, we present the initial design and preliminary experimental validation of SPIREX, an extension of the SPIRES-based system for the extraction of RDF triples from scientific literature involving RNA molecules. Our system exploits schema constraints in the formulation of LLM prompts, along with our RNA-based knowledge graph (KG), RNA-KG, to evaluate the plausibility of the extracted triples. RNA-KG contains more than 9M edges representing the different kinds of relationships in which RNA molecules can be involved. Initial experimental results on a controlled data set are encouraging.

File | Size | Format |
---|---|---|
paper53.pdf (open access; type: Publisher's version/PDF; license: Creative Commons) | 1.19 MB | Adobe PDF |