Large Language Models (LLMs) offer an appealing alternative to training dedicated models for many Natural Language Processing (NLP) tasks. However, outdated knowledge and hallucinations can be major obstacles to their application in knowledge-intensive biomedical scenarios. In this study, we consider the task of biomedical concept recognition (CR) from unstructured scientific literature and explore the use of Retrieval Augmented Generation (RAG) to improve the accuracy and reliability of LLM-based biomedical CR. Our approach, named REAL (Retrieval Augmented Entity Linking), combines the generative capabilities of LLMs with curated knowledge bases to automatically annotate natural language texts with concepts from bio-ontologies. By applying REAL to benchmark corpora on phenotype concept recognition, we show its effectiveness in improving LLM-based CR performance. This research highlights the potential of combining LLMs with external knowledge sources to advance biomedical text processing.

REAL: A Retrieval-Augmented Entity Linking Approach for Biomedical Concept Recognition / D. Shlyk, T. Groza, S. Montanelli, E. Cavalleri, M. Mesiti. - In: Proceedings of the 23rd Workshop on Biomedical Natural Language Processing / edited by D. Demner-Fushman, S. Ananiadou, M. Miwa, K. Roberts, J. Tsujii. - [s.l] : Association for Computational Linguistics, 2024. - ISBN 9798891761308. - pp. 380-389. (Paper presented at the 23rd Meeting of the ACL Special Interest Group on Biomedical Natural Language Processing, held in Bangkok in 2024.)

REAL: A Retrieval-Augmented Entity Linking Approach for Biomedical Concept Recognition

D. Shlyk (first author); S. Montanelli; E. Cavalleri (penultimate author); M. Mesiti (last author)
2024

Sector INF/01 - Computer Science
Sector INFO-01/A - Computer Science
2024
ACL Special Interest Group on Biomedical Natural Language Processing (SIGBIOMED)
https://aclanthology.org/2024.bionlp-1.29/
Book Part (author)
Files in this item:
File: 2024.bionlp-1.29.pdf
Access: open access
Type: Publisher's version/PDF
Size: 1.31 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1122499