A Hierarchical Approach for Multilingual Speech Emotion Recognition / M. Nicolini, S. Ntalampiras. - In: Proceedings of the 12th International Conference on Pattern Recognition Applications and Methods / [edited by] M. De Marsico, G. Sanniti di Baja, A. Fred. - [s.l.] : SciTePress, 2023. - ISBN 978-989-758-626-2. - pp. 679-685 (Paper presented at the 12th International Conference on Pattern Recognition Applications and Methods, held in Lisbon in 2023) [10.5220/0011714800003411].

A Hierarchical Approach for Multilingual Speech Emotion Recognition

S. Ntalampiras
Last
2023

Abstract

This article addresses the Speech Emotion Recognition (SER) problem with a focus on multilingual settings. The proposed solution is a hierarchical scheme whose first level identifies the speaker's gender and whose second level predicts the speaker's emotional state. We experiment with three classifiers of increasing complexity, i.e., k-NN, transfer learning based on YAMNet, and Bidirectional Long Short-Term Memory neural networks. Importantly, model learning, validation, and testing consider the full range of the big-six emotions, while the dataset has been assembled from well-known SER datasets representing six different languages. The results show that, for all classifiers, performance differs between classifying all data together and classifying only female or only male data. Interestingly, a priori gender recognition can boost the overall classification performance.
Audio Pattern Recognition; Machine Learning; Transfer Learning; Convolutional Neural Network; YAMNet; Multilingual Speech Emotion Recognition
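
The two-level scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it uses only a plain k-NN at both levels, and the class name `HierarchicalSER`, the helper `knn_predict`, the 2-D feature vectors, and the two-emotion toy data are all hypothetical, chosen only to show how a gender prediction can route a sample to a gender-specific emotion classifier. The actual features and hyperparameters used in the paper are not given in this record.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority vote among the k training points closest to query (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


class HierarchicalSER:
    """Hypothetical two-level classifier: gender first, then emotion."""

    def __init__(self, k=3):
        self.k = k
        self.gender_train = []   # (features, gender) pairs for level 1
        self.emotion_train = {}  # gender -> (features, emotion) pairs for level 2

    def fit(self, samples):
        # samples: iterable of (features, gender, emotion) triples
        for features, gender, emotion in samples:
            self.gender_train.append((features, gender))
            self.emotion_train.setdefault(gender, []).append((features, emotion))
        return self

    def predict(self, features):
        # Level 1: predict the speaker's gender from all training data.
        gender = knn_predict(self.gender_train, features, self.k)
        # Level 2: predict the emotion using only data of the predicted gender.
        emotion = knn_predict(self.emotion_train[gender], features, self.k)
        return gender, emotion


# Toy data (assumption): the first coordinate separates genders,
# the second separates two emotions.
samples = [
    ((0.0, 0.0), "male", "sad"), ((0.2, 0.1), "male", "sad"), ((0.1, 0.3), "male", "sad"),
    ((0.0, 5.0), "male", "happy"), ((0.3, 5.1), "male", "happy"), ((0.1, 4.8), "male", "happy"),
    ((10.0, 0.0), "female", "sad"), ((10.2, 0.2), "female", "sad"), ((9.9, 0.1), "female", "sad"),
    ((10.0, 5.0), "female", "happy"), ((10.1, 5.2), "female", "happy"), ((9.8, 4.7), "female", "happy"),
]

model = HierarchicalSER(k=3).fit(samples)
print(model.predict((9.8, 4.9)))
print(model.predict((0.1, 0.2)))
```

In this sketch an error at the first level sends the sample to the wrong emotion model, so the benefit of the hierarchy depends on how reliable the gender classifier is.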
Sector INF/01 - Computer Science
2023
Book Part (author)
Files in this item:
File: ICPRAM_2023_100_CR.pdf
Access: restricted
Type: Post-print, accepted manuscript, etc. (version accepted by the publisher)
Size: 249.33 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/957083