
Surgeon, Trainee, or GPT? A Blinded Multicentric Study of AI‐Augmented Operative Notes / S. Hack, R. Attal, G. Locatelli, G. Scotta, A. Maniaci, F.M. Parisi, N. Van Der Poel, M. Van Daele, A. Garcia‐Lliberos, C. Rodriguez‐Prado, C.M. Chiesa‐Estomba, M. Andueza‐Guembe, P. Cobb, H.G. Zalzal, A.M. Saibene. - In: LARYNGOSCOPE. - ISSN 0023-852X. - (2025), pp. 1-11. [Epub ahead of print] [10.1002/lary.70063]

Surgeon, Trainee, or GPT? A Blinded Multicentric Study of AI‐Augmented Operative Notes

G. Locatelli; A.M. Saibene
2025

Abstract

Objectives: Clear, complete operative documentation is essential for surgical safety, continuity of care, and medico-legal standards. Large language models such as ChatGPT offer promise for automating clinical documentation; however, their performance in operative note generation, particularly in surgical subspecialties, remains underexplored. This study aimed to compare the quality, accuracy, and efficiency of operative notes authored by a surgical resident, an attending surgeon, GPT alone, and an attending surgeon using GPT as a writing aid. Methods: Five publicly available otolaryngologic procedures were selected. For each procedure, four operative notes were generated: one by a resident, one by an attending, one by GPT alone, and one by an attending working with GPT (hybrid). Ten blinded otolaryngologists (five residents, five attendings) independently reviewed all 20 notes. Reviewers scored each note across eight domains on a five-point scale, assigned a final approval rating, and provided qualitative feedback. Writing time was recorded to assess documentation efficiency. Results: Hybrid notes written by an attending surgeon with GPT assistance received the highest average domain scores and the highest "as is" approval rate (79%), outperforming all other groups. GPT-only notes were the fastest to generate but had the lowest approval rate (23%) and the highest incidence of both omissions and overdocumentation. Writing time was significantly reduced in both AI-assisted groups compared with human-only authorship. Inter-rater reliability among reviewers was moderate to high across most domains. Conclusion: In this limited dataset, hybrid human-AI collaboration outperformed both human-only and AI-only authorship in operative documentation. These findings support the use of GPT-assisted documentation to improve operative note efficiency and consistency. Level of evidence: N/A.
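The score-aggregation step described in the Methods (each reviewer rates a note on eight domains using a five-point scale and gives a final "as is" approval verdict) can be sketched as follows. The `summarize` helper and the sample ratings are illustrative assumptions for this sketch, not study data.

```python
# Minimal sketch of per-note score aggregation, assuming each reviewer
# supplies eight 1-5 domain scores and a boolean "as is" approval.
# All values below are illustrative, not data from the study.

from statistics import mean

def summarize(ratings):
    """ratings: list of (domain_scores, approved) tuples, one per reviewer.
    Returns (mean domain score across all reviewers, approval rate)."""
    domain_mean = mean(s for scores, _ in ratings for s in scores)
    approval_rate = mean(1 if approved else 0 for _, approved in ratings)
    return round(domain_mean, 2), approval_rate

# Two hypothetical reviewers rating the same note:
example = [
    ([5, 4, 5, 4, 5, 4, 5, 4], True),
    ([4, 4, 3, 4, 4, 3, 4, 4], False),
]
print(summarize(example))  # (mean domain score, fraction approving "as is")
```

In the study, these per-note summaries would then be averaged within each authorship group (resident, attending, GPT-only, hybrid) to produce the group-level comparisons reported in the Results.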
GPT‐4.0; artificial intelligence; clinical note quality; large language models; operative documentation; otolaryngology; surgical workflow
Disciplinary sector MEDS-18/A - Otorhinolaryngology
2025
20-Aug-2025
Article (author)
Files in this product:
File: GPT and surgical notes (2025).pdf
Access: restricted
Type: Publisher's version/PDF
License: No license
Size: 1.27 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1180455
Citations
  • PMC: n/a
  • Scopus: 0
  • Web of Science: 0
  • OpenAlex: 1