Major depressive disorder (MDD) affects approximately 4.4% of the global population. Its prevalence is increasing among adolescents and has led to the psychosocial condition known as hikikomori. MDD is typically assessed by self-report questionnaires, which, although informative, are subject to evaluator bias and subjectivity. To address these limitations, recent studies have explored machine learning (ML) for automated MDD detection. Among the input data used, speech signals stand out due to their low cost and minimal intrusiveness. However, many speech-based approaches lack integration with cognitive behavioral therapy (CBT) and adherence to evidence-based, patient-centered care, often aiming to replace rather than support clinical monitoring. In this context, we propose ML models to assess MDD in hikikomori patients using speech data from a real-world clinical trial. The trial is conducted in Italy, supervised by physicians, and comprises an eight-session CBT plan that is grounded in clinical evidence and follows patient-centered practices. Patients' speech is recorded during therapy, and Mel-Frequency Cepstral Coefficients (MFCCs) and wav2vec 2.0 embeddings are extracted to train the models. The results show that a Multi-Layer Perceptron (MLP) predicted depression outcomes with a Root Mean Squared Error (RMSE) of 0.064 using only MFCCs from the first session, suggesting that early-session speech may be valuable for outcome prediction. When considering the entire CBT treatment (i.e., all sessions), the MLP achieved an RMSE of 0.063 using MFCCs and a lower RMSE of 0.057 with wav2vec 2.0 embeddings, a reduction in error of approximately 9.5%. To aid the interpretability of the treatment outcomes, a binary classification task was also conducted, in which Logistic Regression (LR) achieved 70% recall in predicting depression improvement among young adults using wav2vec 2.0 embeddings. These findings position speech as a valuable predictive tool in clinical informatics, potentially supporting clinicians in anticipating treatment response.
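The 9.5% figure follows directly from the two RMSE values reported for the full-treatment setting. The snippet below is a minimal sketch of that arithmetic, together with the standard definitions of RMSE and recall. The helper names `rmse`, `relative_reduction`, and `recall` are illustrative assumptions and are not taken from the paper's code.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction(baseline, improved):
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


def recall(y_true, y_pred, positive=1):
    """True positives divided by all actual positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# RMSE values as reported in the abstract (all sessions, MLP).
mfcc_rmse = 0.063
wav2vec_rmse = 0.057
print(f"{relative_reduction(mfcc_rmse, wav2vec_rmse):.1%}")  # 9.5%
```

Here a lower RMSE is better, so the "improvement" is the relative drop in error from the MFCC-based model to the wav2vec 2.0-based model.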

Speech-Based Depression Recognition in Hikikomori Patients Undergoing Cognitive Behavioral Therapy / S.S. Leal, S. Ntalampiras, M.G. Rossetti, A. Trabacca, M. Bellani, R. Sassi. - In: APPLIED SCIENCES. - ISSN 2076-3417. - 15:21(2025 Nov), pp. 11750.1-11750.18. [10.3390/app152111750]

Speech-Based Depression Recognition in Hikikomori Patients Undergoing Cognitive Behavioral Therapy

S.S. Leal (first author); S. Ntalampiras; R. Sassi (last author)
2025

Language: English
Keywords: machine learning; speech depression recognition; wav2vec2
Scientific sector: INFO-01/A - Informatics
Type: Article
Peer review: Anonymous experts
Scientific publication
Project: SOLITAIRE - Digital interventions for Social isOLation In youThs And theIR familiEs
Funder: MINISTERO DELLA SALUTE (Italian Ministry of Health)
Grant: PNRR-MAD-2022-12376834
Publication date: November 2025
Publisher: MDPI
Status: Published
Journal with international relevance
Files in this record:
applsci-15-11750.pdf (open access)
Type: Publisher's version/PDF
License: Creative Commons
Size: 1.14 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1197256