Evaluating ChatGPT-4.0’s accuracy and potential in idiopathic scoliosis conservative treatment: a preliminary study on clarity, validity, and expert perceptions / F. Negrini, C. Malfitano, G. Ferriero, G. Morone, A. Negrini, F. Zaina, I. Ferrario, C. Kiekens, S. Negrini, J. Vitale. In: European Spine Journal, ISSN 0940-6719, 2025 Jul 21 [Epub ahead of print]. doi:10.1007/s00586-025-09166-4
Evaluating ChatGPT-4.0’s accuracy and potential in idiopathic scoliosis conservative treatment: a preliminary study on clarity, validity, and expert perceptions
C. Malfitano (second author): Writing – Review & Editing
S. Negrini (penultimate author): Writing – Review & Editing
2025
Abstract
Purpose: This study aimed to evaluate the scientific accuracy, content validity, and clarity of ChatGPT-4.0’s responses on the conservative management of idiopathic scoliosis. The research explored whether the model could effectively support patient education in an area where non-surgical treatment information is crucial. Methods: Fourteen frequently asked questions (FAQs) regarding conservative scoliosis treatment were identified using a systematic, multi-step approach that combined web-based inquiry and expert input. Each question was submitted individually to ChatGPT-4.0 on December 6, 2024, using a standardized patient prompt (“I’m a scoliosis patient. Limit your answer to 150 words”). The generated responses were evaluated by a panel of 37 experts from a specialized spinal deformity center via an online survey using a 6-point Likert scale. Content validity was assessed using the Content Validity Ratio (CVR) and Content Validity Index (CVI), and inter-rater reliability was calculated with Fleiss’ kappa. Experts also provided categorical feedback on reasons for any rating discrepancies. Results: Eleven of the 14 responses met the CVR threshold (≥ 0.38), yielding an overall CVI of 0.68. Three responses (“What is scoliosis?”, “Can exercises or physical therapy cure scoliosis?”, and “What is the best sport for scoliosis?”) showed lower validity (CVR scores of 0.37, 0.37, and −0.58, respectively), primarily due to factual inaccuracies and insufficient detail. Clarity received the highest ratings (median = 6), while comprehensiveness, professionalism, and response length each had a median score of 5. Inter-rater reliability was slight (Fleiss’ kappa = 0.10). Conclusion: ChatGPT-4.0 generally provides clear and accessible information on conservative idiopathic scoliosis management, supporting its potential as a patient education tool.
Nonetheless, variability in response accuracy and expert evaluation underscores the need for further refinement and expert supervision before wider clinical application.

File: unpaywall-bitstream-785548056.pdf (open access). Type: Publisher's version/PDF. License: Creative Commons. Size: 1.43 MB. Format: Adobe PDF.