Major depressive disorder (MDD) affects approximately 4.4% of the global population. Its prevalence is increasing among adolescents and has led to the psychosocial condition known as hikikomori. MDD is typically assessed by self-report questionnaires, which, although informative, are subject to evaluator bias and subjectivity. To address these limitations, recent studies have explored machine learning (ML) for automated MDD detection. Among the input data used, speech signals stand out for their low cost and minimal intrusiveness. However, many speech-based approaches are not integrated with cognitive behavioral therapy (CBT) and do not adhere to evidence-based, patient-centered care, often aiming to replace rather than support clinical monitoring. In this context, we propose ML models to assess MDD in hikikomori patients using speech data from a real-world clinical trial. The trial is conducted in Italy, supervised by physicians, and comprises an eight-session CBT plan that is evidence-based and follows patient-centered practices. Patients' speech is recorded during therapy, and Mel-Frequency Cepstral Coefficients (MFCCs) and wav2vec 2.0 embeddings are extracted to train the models. The results show that the Multi-Layer Perceptron (MLP) predicted depression outcomes with a Root Mean Squared Error (RMSE) of 0.064 using only MFCCs from the first session, suggesting that early-session speech may be valuable for outcome prediction. When considering the entire CBT treatment (i.e., all sessions), the MLP achieved an RMSE of 0.063 using MFCCs and a lower RMSE of 0.057 with wav2vec 2.0, a reduction in error of approximately 9.5%. To aid the interpretability of the treatment outcomes, a binary task was conducted, in which Logistic Regression (LR) achieved 70% recall in predicting depression improvement among young adults using wav2vec 2.0. These findings position speech as a valuable predictive tool in clinical informatics, potentially supporting clinicians in anticipating treatment response.
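The 9.5% figure quoted above follows from the two all-session RMSE values. The sketch below is illustrative only and is not the authors' code: the function names `rmse` and `relative_improvement` are hypothetical, and the only inputs taken from the abstract are the values 0.063 (MFCCs) and 0.057 (wav2vec 2.0).

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_improvement(baseline_rmse, new_rmse):
    """Fractional reduction in RMSE relative to the baseline."""
    return (baseline_rmse - new_rmse) / baseline_rmse


# All-session RMSE values reported in the abstract.
mfcc_rmse = 0.063
wav2vec_rmse = 0.057

gain = relative_improvement(mfcc_rmse, wav2vec_rmse)
print(f"Relative RMSE reduction: {gain:.1%}")  # prints "Relative RMSE reduction: 9.5%"
```

Because lower RMSE is better, the improvement is measured as the drop in error divided by the MFCC baseline, (0.063 - 0.057) / 0.063.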
Speech-Based Depression Recognition in Hikikomori Patients Undergoing Cognitive Behavioral Therapy / S.S. Leal, S. Ntalampiras, M.G. Rossetti, A. Trabacca, M. Bellani, R. Sassi. - In: APPLIED SCIENCES. - ISSN 2076-3417. - 15:21(2025 Nov), pp. 11750.1-11750.18. [10.3390/app152111750]
Speech-Based Depression Recognition in Hikikomori Patients Undergoing Cognitive Behavioral Therapy
S.S. Leal (first author); S. Ntalampiras; R. Sassi (last author)
2025
| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| applsci-15-11750.pdf | Open access | Publisher's version/PDF | Creative Commons | 1.14 MB | Adobe PDF |