Accuracy of ChatGPT-Generated Information on Head and Neck and Oromaxillofacial Surgery: A Multicenter Collaborative Analysis

Vaira, L.A.; Lechien, J.R.; Abbate, V.; Allevi, F.; Audino, G.; Beltramini, G.A.; Bergonzani, M.; Bolzoni, A.; Committeri, U.; Crimi, S.; Gabriele, G.; Lonardi, F.; Maglitto, F.; Petrocelli, M.; Pucci, R.; Saponaro, G.; Tel, A.; Vellone, V.; Chiesa-Estomba, C.M.; Boscolo-Rizzo, P.; Salzano, G.; De Riu, G.

doi:10.1002/ohn.489

Objective: To investigate the accuracy of Chat-Based Generative Pre-trained Transformer (ChatGPT) in answering questions and solving clinical scenarios of head and neck surgery.

Study design: Observational and valuative study.

Setting: Eighteen surgeons from 14 Italian head and neck surgery units.

Methods: A total of 144 clinical questions encompassing different subspecialities of head and neck surgery and 15 comprehensive clinical scenarios were developed. Questions and scenarios were inputted into ChatGPT4, and the resulting answers were evaluated by the researchers using accuracy (range 1-6), completeness (range 1-3), and references' quality Likert scales.

Results: The overall median score of open-ended questions was 6 (interquartile range [IQR]: 5-6) for accuracy and 3 (IQR: 2-3) for completeness. Overall, the reviewers rated the answer as entirely or nearly entirely correct in 87.2% of cases and as comprehensive and covering all aspects of the question in 73% of cases. The artificial intelligence (AI) model achieved a correct response in 84.7% of the closed-ended questions (11 wrong answers). As for the clinical scenarios, ChatGPT provided a fully or nearly fully correct diagnosis in 81.7% of cases. The proposed diagnostic or therapeutic procedure was judged to be complete in 56.7% of cases. The overall quality of the bibliographic references was poor, and sources were nonexistent in 46.4% of the cases.

Conclusion: The results generally demonstrate a good level of accuracy in the AI's answers. The AI's ability to resolve complex clinical scenarios is promising, but it still falls short of being considered a reliable support for the decision-making process of specialists in head-neck surgery.

Accuracy of ChatGPT-Generated Information on Head and Neck and Oromaxillofacial Surgery: A Multicenter Collaborative Analysis / L.A. Vaira, J.R. Lechien, V. Abbate, F. Allevi, G. Audino, G.A. Beltramini, M. Bergonzani, A. Bolzoni, U. Committeri, S. Crimi, G. Gabriele, F. Lonardi, F. Maglitto, M. Petrocelli, R. Pucci, G. Saponaro, A. Tel, V. Vellone, C.M. Chiesa-Estomba, P. Boscolo-Rizzo, G. Salzano, G. De Riu. - In: OTOLARYNGOLOGY--HEAD AND NECK SURGERY. - ISSN 1097-6817. - (2023). [Epub ahead of print] [10.1002/ohn.489]


Vaira, Luigi Angelo; Lechien, Jerome R; Abbate, Vincenzo; Allevi, F.; Audino, Giovanni; Beltramini, G.A.; Bergonzani, Michela; Bolzoni, A.; Committeri, Umberto; Crimi, Salvatore; Gabriele, Guido; Lonardi, Fabio; Maglitto, Fabio; Petrocelli, Marzia; Pucci, Resi; Saponaro, Gianmarco; Tel, Alessandro; Vellone, Valentino; Chiesa-Estomba, Carlos Miguel; Boscolo-Rizzo, Paolo; Salzano, Giovanni; De Riu, Giacomo

2023



Keywords: ChatGPT; artificial intelligence; maxillofacial surgery; otorhinolaryngology

Scientific-disciplinary sectors of the article (display only): Sector MED/29 - Maxillofacial Surgery

Publication date: 2023

Ahead-of-print or print date: 18 Aug 2023

Journal in ANCE: OTOLARYNGOLOGY--HEAD AND NECK SURGERY

DOI: https://dx.doi.org/10.1002/ohn.489

Type: Article (author)

Appears in types: 01 - Journal article

Files in this record:

File	Type	Access	Size	Format
Otolaryngol --head neck surg - 2023 - Vaira - Accuracy of ChatGPT‐Generated Information on Head and Neck and.pdf	Publisher's version/PDF	open access	813.96 kB	Adobe PDF


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1024627


IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca