Performance and Consistency of ChatGPT‐4 Versus Otolaryngologists: A Clinical Case Series

Lechien, J.R.; Naunheim, M.R.; Maniaci, A.; Radulesco, T.; Saibene, A.M.; Chiesa‐Estomba, C.M.; Vaira, L.A.

doi:10.1002/ohn.759

Objective: To study the performance of Chatbot Generative Pretrained Transformer-4 (ChatGPT-4) in the management of cases in otolaryngology-head and neck surgery.

Study design: Prospective case series.

Setting: Multicenter university hospitals.

Methods: The history and the clinical, physical, and additional examinations of adult outpatients consulting in the otolaryngology departments of CHU Saint-Pierre and Dour Medical Center were presented to ChatGPT-4. ChatGPT-4 was then asked for differential diagnoses, management, and treatment(s). For each subspecialty, 2 distinct, blinded, board-certified otolaryngologists assessed the ChatGPT-4 responses using the Artificial Intelligence Performance Instrument.

Results: One hundred cases were presented to ChatGPT-4. ChatGPT-4 indicated a mean of 3.34 (95% confidence interval [CI]: 3.09, 3.59) additional examinations per patient versus 2.10 (95% CI: 1.76, 2.34; P = .001) for the practitioners. There was strong consistency (κ > 0.600) between otolaryngologists and ChatGPT-4 for the indication of upper aerodigestive tract endoscopy, positron emission tomography and computed tomography, audiometry, tympanometry, and psychophysical evaluations. ChatGPT-4 made the correct primary diagnosis in 38% to 86% of cases, depending on subspecialty. The additional examinations indicated by ChatGPT-4 were pertinent and necessary in 8% to 31% of cases, and the treatment regimen was pertinent in 12% to 44% of cases. The human-reported difficulty level of the clinical cases did not influence the performance of ChatGPT-4.

Conclusion: ChatGPT-4 may be a promising adjunctive tool in otolaryngology, providing extensive documentation about additional examinations, primary and differential diagnoses, and treatments. ChatGPT-4 is more effective at providing a primary diagnosis than at selecting additional examinations and treatments.
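The abstract summarises agreement between otolaryngologists and ChatGPT-4 with a coefficient above 0.600 but does not name the statistic. The sketch below assumes it is Cohen's kappa, computed per examination on a binary indicated/not-indicated decision. The abstract does not confirm this. The Python code shows a minimal version of that calculation. The function name `cohens_kappa` and the ratings are hypothetical and are not the authors' code or data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same cases.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e the agreement expected by chance from each rater's marginals.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of cases where both raters gave the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's label frequencies, summed over labels.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum((freq_a[k] / n) * (freq_b[k] / n) for k in set(freq_a) | set(freq_b))
    if p_e == 1.0:
        # Both raters used one identical label for every case; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical data, NOT from the study: 1 = examination indicated, 0 = not.
    otolaryngologist = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    chatbot = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    print(f"kappa = {cohens_kappa(otolaryngologist, chatbot):.3f}")
```

With the invented ratings, the two raters agree on 8 of 10 cases, yet kappa is 7/12 ≈ 0.583. Kappa discounts the agreement expected by chance (0.52 here), so a kappa threshold such as 0.600 is stricter than raw percentage agreement.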

Lechien J.R., Naunheim M.R., Maniaci A., Radulesco T., Saibene A.M., Chiesa‐Estomba C.M., Vaira L.A. Performance and Consistency of ChatGPT‐4 Versus Otolaryngologists: A Clinical Case Series. Otolaryngology-Head and Neck Surgery (ISSN 0194-5998). 2024; pp. 1-8. Epub ahead of print. doi:10.1002/ohn.759

Keywords: ChatGPT‐4; artificial intelligence; head neck surgery; otolaryngology; performance
Scientific-disciplinary sector: MED/31 - Otorhinolaryngology
Publication date: 2024
Ahead-of-print or print date: 9 April 2024
Journal (ANCE): Otolaryngology-Head and Neck Surgery
DOI: https://dx.doi.org/10.1002/ohn.759
Type: Article (author)
Appears in types: 01 - Journal article

File in this record: Performance and Consistency of ChatGPT-4 Versus Otolaryngologists (2024).pdf (publisher's version, Adobe PDF, 565.31 kB; restricted access)

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1044972