Performance of generative AI across ENT tasks: A systematic review and meta-analysis / S. Hack, R. Attal, A. Farzad, E.E. Alon, E. Glikson, E. Remer, A. Maria Saibene, H.G. Zalzal. - In: AURIS, NASUS, LARYNX. - ISSN 0385-8146. - 52:5(2025 Oct), pp. 585-596. [10.1016/j.anl.2025.08.010]

Performance of generative AI across ENT tasks: A systematic review and meta-analysis

A. Maria Saibene (penultimate author)
2025

Abstract

Objective: To systematically evaluate the diagnostic accuracy, educational utility, and communication potential of generative AI, particularly Large Language Models (LLMs) such as ChatGPT, in otolaryngology. Data Sources: A comprehensive search of PubMed, Embase, Scopus, Web of Science, and IEEE Xplore identified English-language peer-reviewed studies published from January 2022 to March 2025. Review Methods: Eligible studies evaluated text-based generative AI models used in otolaryngology. Two reviewers screened and appraised studies using the JBI and QUADAS-2 tools. A random-effects meta-analysis was conducted on diagnostic accuracy outcomes, with subgroup analyses by task type and model version. Results: Ninety-one studies were included; 61 reported quantitative outcomes. Of these, 43 provided diagnostic accuracy data across 59 model-task pairs. Pooled diagnostic accuracy was 72.7% (95% CI: 67.4–77.6%; I² = 93.8%). Accuracy was highest in education (83.0%) and diagnostic imaging tasks (84.9%) and lowest in clinical decision support (CDS) tasks (67.1%). GPT-4 consistently outperformed GPT-3.5 in both the education and CDS domains. Hallucinations and performance variability were noted in complex clinical reasoning tasks. Conclusion: Generative AI performs well in structured otolaryngology tasks, particularly education and communication, but its inconsistent performance in clinical reasoning tasks limits standalone use. Future research should focus on hallucination mitigation, standardized evaluation, and prospective validation to guide safe clinical integration.
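
For readers unfamiliar with how a pooled proportion and I² of the kind reported above are computed, the following is a minimal Python sketch of DerSimonian-Laird random-effects pooling of proportions on the logit scale. The review does not publish its analysis code, so the choice of estimator, the logit transform, and the example counts below are assumptions for illustration only, not the authors' actual pipeline.

    import numpy as np

    def pool_proportions_dl(events, totals):
        """DerSimonian-Laird random-effects pooling of proportions on the logit scale.

        Returns the pooled proportion, its 95% CI, and the I² heterogeneity statistic.
        """
        events = np.asarray(events, dtype=float)
        totals = np.asarray(totals, dtype=float)

        # Logit-transform each study's proportion; within-study variance of the logit.
        p = events / totals
        y = np.log(p / (1 - p))
        v = 1 / events + 1 / (totals - events)
        w = 1 / v

        # Fixed-effect estimate and Cochran's Q.
        y_fe = np.sum(w * y) / np.sum(w)
        q = np.sum(w * (y - y_fe) ** 2)
        df = len(y) - 1

        # DerSimonian-Laird between-study variance (tau²) and I².
        c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
        tau2 = max(0.0, (q - df) / c)
        i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

        # Random-effects weights, pooled logit, and back-transformed 95% CI.
        w_re = 1 / (v + tau2)
        y_re = np.sum(w_re * y) / np.sum(w_re)
        se_re = np.sqrt(1 / np.sum(w_re))
        lo, hi = y_re - 1.96 * se_re, y_re + 1.96 * se_re
        inv_logit = lambda x: 1 / (1 + np.exp(-x))
        return inv_logit(y_re), (inv_logit(lo), inv_logit(hi)), i2

    # Hypothetical counts of correct responses per model-task pair (illustration only).
    events = [62, 41, 78, 55]
    totals = [80, 70, 90, 85]
    pooled, ci, i2 = pool_proportions_dl(events, totals)
    print(f"Pooled accuracy = {pooled:.1%} (95% CI {ci[0]:.1%}-{ci[1]:.1%}); I² = {i2:.1f}%")

In this scheme, an I² near 94%, as reported in the abstract, would indicate that most of the observed variability across model-task pairs reflects between-study differences rather than sampling error, which is why the authors complement the pooled estimate with subgroup analyses by task type and model version.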
Artificial intelligence; ChatGPT; Generative AI; Large language models; Otolaryngology; Systematic review
Academic discipline MEDS-18/A - Otorhinolaryngology
Oct 2025
4 Sep 2025
Article (author)
Files in this record:
auris nasus larynx.pdf — Publisher's version/PDF, restricted access, no licence, 4.65 MB (Adobe PDF)
Documents in IRIS are protected by copyright and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this record: https://hdl.handle.net/2434/1183841
Citations
  • PubMed Central: n/a
  • Scopus: 0
  • Web of Science: 0
  • OpenAlex: 0