
Surgeon, Trainee, or GPT? A Blinded Multicentric Study of AI‐Augmented Operative Notes / S. Hack, R. Attal, G. Locatelli, G. Scotta, A. Maniaci, F.M. Parisi, N. Van Der Poel, M. Van Daele, A. Garcia‐Lliberos, C. Rodriguez‐Prado, C.M. Chiesa‐Estomba, M. Andueza‐Guembe, P. Cobb, H.G. Zalzal, A.M. Saibene. - In: LARYNGOSCOPE. - ISSN 0023-852X. - (2025), pp. 1-11. [Epub ahead of print] [10.1002/lary.70063]

Surgeon, Trainee, or GPT? A Blinded Multicentric Study of AI‐Augmented Operative Notes

G. Locatelli; A.M. Saibene
2025

Abstract

Objectives: Clear, complete operative documentation is essential for surgical safety, continuity of care, and medico-legal standards. Large language models such as ChatGPT offer promise for automating clinical documentation; however, their performance in operative note generation, particularly in surgical subspecialties, remains underexplored. This study aimed to compare the quality, accuracy, and efficiency of operative notes authored by a surgical resident, an attending surgeon, GPT alone, and an attending surgeon using GPT as a writing aid. Methods: Five publicly available otolaryngologic procedures were selected. For each procedure, four operative notes were generated: one by a resident, one by an attending, one by GPT alone, and one by an attending working with GPT (hybrid). Ten blinded otolaryngologists (five residents, five attendings) independently reviewed all 20 notes. Reviewers scored each note across eight domains on a five-point scale, assigned a final approval rating, and provided qualitative feedback. Writing time was recorded to assess documentation efficiency. Results: Hybrid notes written by an attending surgeon with GPT assistance received the highest average domain scores and the highest "as is" approval rate (79%), outperforming all other groups. GPT-only notes were the fastest to generate but had the lowest approval rate (23%) and the highest incidence of both omissions and overdocumentation. Writing time was significantly reduced in both AI-assisted groups compared with human-only authorship. Inter-rater reliability among reviewers was moderate to high across most domains. Conclusion: In this limited dataset, hybrid human-AI collaboration outperformed both human-only and AI-only authorship in operative documentation. These findings support the use of GPT-assisted documentation to improve operative note efficiency and consistency. Level of evidence: N/A.
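The score-aggregation step described in the Methods (each reviewer rates a note on eight domains using a five-point scale and gives a final "as is" approval verdict) can be sketched as follows. The `summarize` helper and the sample ratings are illustrative assumptions for this sketch, not study data.

```python
# Minimal sketch of per-note score aggregation, assuming each reviewer
# supplies eight 1-5 domain scores and a boolean "as is" approval.
# All values below are illustrative, not data from the study.

from statistics import mean

def summarize(ratings):
    """ratings: list of (domain_scores, approved) tuples, one per reviewer.
    Returns (mean domain score across all reviewers, approval rate)."""
    domain_mean = mean(s for scores, _ in ratings for s in scores)
    approval_rate = mean(1 if approved else 0 for _, approved in ratings)
    return round(domain_mean, 2), approval_rate

# Two hypothetical reviewers rating the same note:
example = [
    ([5, 4, 5, 4, 5, 4, 5, 4], True),
    ([4, 4, 3, 4, 4, 3, 4, 4], False),
]
print(summarize(example))  # (mean domain score, fraction approving "as is")
```

In the study, these per-note summaries would then be averaged within each authorship group (resident, attending, GPT-only, hybrid) to produce the group-level comparisons reported in the Results.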
GPT‐4.0; artificial intelligence; clinical note quality; large language models; operative documentation; otolaryngology; surgical workflow
Disciplinary sector MEDS-18/A - Otorhinolaryngology
2025
20-Aug-2025
Article (author)
Files in this product:
File: GPT and surgical notes (2025).pdf
Access: restricted
Type: Publisher's version/PDF
License: No license
Size: 1.27 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1180455
Citations
  • PMC: n/a
  • Scopus: 0
  • Web of Science: 0
  • OpenAlex: 1