Leveraging generative AI to assist biocuration of medical actions for rare disease / E. Niyonkuru, J.H. Caufield, L.C. Carmody, M.A. Gargano, S. Toro, P.L. Whetzel, H. Blau, M. Soto Gomez, E. Casiraghi, L. Chimirri, J.T. Reese, G. Valentini, M.A. Haendel, C.J. Mungall, P.N. Robinson. - In: BIOINFORMATICS ADVANCES. - ISSN 2635-0041. - 5:1(2025 Jun 12), pp. vbaf141.1-vbaf141.10. [10.1093/bioadv/vbaf141]
Leveraging generative AI to assist biocuration of medical actions for rare disease
M. Soto Gomez; E. Casiraghi; G. Valentini
2025
Abstract
Motivation: Structured representations of clinical data can support computational analysis of individuals and cohorts, and ontologies representing disease entities and phenotypic abnormalities are now commonly used for translational research. The Medical Action Ontology (MAxO) provides a computational representation of treatments and other actions taken for clinical management. Currently, manual biocuration is used to annotate MAxO terms to rare diseases. However, it is challenging to scale manual curation to comprehensively capture information about medical actions for the more than 10 000 rare diseases.

Results: We present AutoMAxO, a semi-automated workflow that leverages Large Language Models (LLMs) to streamline MAxO biocuration. AutoMAxO first uses LLMs to retrieve candidate curations from abstracts of relevant publications. Next, the candidate curations are matched to ontology terms from MAxO, Human Phenotype Ontology (HPO), and MONDO disease ontology via a combination of LLMs and post-processing techniques. Finally, the matched terms are presented in a structured form to a human curator for approval. We used this approach to process abstracts related to 37 rare genetic diseases and identified 958 novel treatment annotations that were transferred to the MAxO annotation dataset.

| File | Type | License | Size | Format |
|---|---|---|---|---|
| vbaf141.pdf (open access) | Publisher's version/PDF | Creative Commons | 1.65 MB | Adobe PDF |




