SPIREX: Improving LLM-based relation extraction from RNA-focused scientific literature using graph machine learning / E. Cavalleri, M. Soto Gomez, A. Pashaeibarough, D. Malchiodi, J.H. Caufield, J.T. Reese, C. Mungall, P.N. Robinson, E. Casiraghi, G. Valentini, M. Mesiti. - In: Proceedings of Workshops at the 50th International Conference on Very Large Data Bases. - [s.l.] : VLDB.org, 2024. - pp. 1-11. (Paper presented at the 50th International Conference on Very Large Data Bases, held in Guangzhou in 2024.)
SPIREX: Improving LLM-based relation extraction from RNA-focused scientific literature using graph machine learning
E. Cavalleri (first author); M. Soto Gomez (second author); A. Pashaeibarough; D. Malchiodi; E. Casiraghi; G. Valentini (penultimate author); M. Mesiti (last author)
2024
Abstract
Relation extraction from scientific literature to align with a domain ontology is a well-known challenge in natural language processing, particularly critical in precision medicine. The advent of large language models (LLMs) has enabled the development of new and effective approaches to this problem. However, the extracted relations can be prone to problems (e.g., hallucination) that must be minimized. In this paper, we present the initial development of SPIREX, an extension of the SPIRES-based system designed to extract triples from scientific literature involving RNA molecules. Our system leverages schema constraints in the formulation of LLM prompts and utilizes graph machine learning on our RNA-based knowledge graph, RNA-KG, to assess the plausibility of the extracted triples. RNA-KG comprises more than 12.5M edges representing various types of relationships involving RNA molecules.

File | Access | Type | Size | Format |
---|---|---|---|---|
LLM+KG-12.pdf | Open access | Publisher's version/PDF | 1.69 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.