Gender-Aware Speech Emotion Recognition in Multiple Languages / M. Nicolini, S. Ntalampiras (LECTURE NOTES IN COMPUTER SCIENCE). - In: Pattern Recognition Applications and Methods / [edited by] M. De Marsico, G. Sanniti Di Baja, A. Fred. - [s.l] : Springer, 2024 Feb 22. - ISBN 9783031547256. - pp. 111-123 [10.1007/978-3-031-54726-3_7]

Gender-Aware Speech Emotion Recognition in Multiple Languages

M. Nicolini; S. Ntalampiras
2024

Abstract

This article presents a solution for Speech Emotion Recognition (SER) in a multilingual setting using a hierarchical approach. The approach involves two levels: the first identifies the gender of the speaker, while the second predicts their emotional state. We evaluate the performance of three classifiers of increasing complexity: k-NN, transfer learning based on YAMNet, and Bidirectional Long Short-Term Memory neural networks. The models were trained, validated, and tested on a dataset that includes the big-six emotions and was collected from well-known SER datasets representing six different languages. Our results indicate that there are differences in classification accuracy when considering all data versus only female or male data, across all classifiers. Interestingly, prior knowledge of the speaker’s gender can improve the overall classification performance.
Audio pattern recognition; Machine learning; Transfer learning; Convolutional neural network; YAMNet; Multilingual speech emotion recognition
Sector INF/01 - Computer Science
22 Feb 2024
Book Part (author)
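
The two-level hierarchy described in the abstract (gender first, then emotion) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the 2-D feature vectors are invented toy values rather than audio features, a plain k-NN is used at both levels (the abstract does not say which classifier handles the gender level), and the names `knn_predict` and `HierarchicalSER` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Majority label among the k training points closest to query (Euclidean)."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


class HierarchicalSER:
    """Hypothetical two-level classifier: level 1 = gender, level 2 = emotion."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, features, genders, emotions):
        self.features = list(features)
        self.genders = list(genders)
        # Split the training data by gender so that level 2 only consults
        # samples of the gender predicted at level 1.
        self.by_gender = {}
        for x, g, e in zip(features, genders, emotions):
            xs, es = self.by_gender.setdefault(g, ([], []))
            xs.append(x)
            es.append(e)
        return self

    def predict(self, query):
        # Level 1: gender, predicted from all training samples.
        gender = knn_predict(self.features, self.genders, query, self.k)
        # Level 2: emotion, predicted from the gender-specific subset only.
        xs, es = self.by_gender[gender]
        emotion = knn_predict(xs, es, query, self.k)
        return gender, emotion


# Toy data (invented): first coordinate loosely separates gender,
# second coordinate loosely separates emotion.
X = [(0.90, 0.90), (0.80, 0.80), (0.85, 0.85),
     (0.90, 0.10), (0.80, 0.20), (0.85, 0.15),
     (0.10, 0.90), (0.20, 0.80), (0.15, 0.85),
     (0.10, 0.10), (0.20, 0.20), (0.15, 0.15)]
G = ["female"] * 6 + ["male"] * 6
E = (["happy"] * 3 + ["sad"] * 3) * 2

model = HierarchicalSER(k=3).fit(X, G, E)
print(model.predict((0.82, 0.88)))
print(model.predict((0.12, 0.18)))
```

In a setting like the one the abstract describes, the level-2 step could equally be a separately trained model per gender; the per-gender subset lookup above is only the simplest way to show how a gender prediction can narrow the emotion classifier's training data.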
Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1033090