A Hierarchical Approach for Multilingual Speech Emotion Recognition / M. Nicolini, S. Ntalampiras. - In: Proceedings of the 12th International Conference on Pattern Recognition Applications and Methods / edited by M. De Marsico, G. Sanniti di Baja, A. Fred. - [s.l.] : SciTePress, 2023. - ISBN 978-989-758-626-2. - pp. 679-685. Paper presented at the 12th International Conference on Pattern Recognition Applications and Methods, held in Lisbon in 2023 [10.5220/0011714800003411].

A Hierarchical Approach for Multilingual Speech Emotion Recognition

M. Nicolini; S. Ntalampiras (last author)
2023

Abstract

This article addresses the Speech Emotion Recognition (SER) problem with a focus on multilingual settings. The proposed solution is a hierarchical scheme whose first level identifies the speaker’s gender and whose second level predicts the speaker’s emotional state. We experiment with three classifiers of increasing complexity, namely k-NN, transfer learning based on YAMNet, and Bidirectional Long Short-Term Memory neural networks. Importantly, model learning, validation and testing consider the full range of the big-six emotions, while the dataset has been assembled from well-known SER datasets representing six different languages. For all three classifiers, the obtained results differ between classifying all data together and classifying only female or only male data. Interestingly, a priori gender recognition can boost the overall classification performance.
Keywords: Audio Pattern Recognition; Machine Learning; Transfer Learning; Convolutional Neural Network; YAMNet; Multilingual Speech Emotion Recognition
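
The two-level scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it covers only the k-NN variant, uses the Python standard library, and runs on made-up two-dimensional feature vectors with two emotions instead of real audio features and the big-six emotions. The names `knn_predict`, `HierarchicalSER` and the toy data are hypothetical and introduced here for illustration only.

```python
from collections import Counter
from math import dist


def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


class HierarchicalSER:
    """Hypothetical two-level classifier: gender first, then emotion."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, features, genders, emotions):
        # Level 1 uses every sample, labelled by gender.
        self.features = list(features)
        self.genders = list(genders)
        # Level 2 keeps one emotion training set per gender.
        self.by_gender = {}
        for x, g, e in zip(features, genders, emotions):
            xs, es = self.by_gender.setdefault(g, ([], []))
            xs.append(x)
            es.append(e)
        return self

    def predict(self, query):
        # Level 1: identify the speaker's gender.
        gender = knn_predict(self.features, self.genders, query, self.k)
        # Level 2: predict the emotion using only samples of that gender.
        xs, es = self.by_gender[gender]
        emotion = knn_predict(xs, es, query, self.k)
        return gender, emotion


# Toy data (assumption): first coordinate separates gender, second separates emotion.
X = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1),   # female, happy
     (1.0, 5.0), (1.1, 5.1), (0.9, 4.9),   # female, sad
     (5.0, 1.0), (5.1, 1.1), (4.9, 0.9),   # male, happy
     (5.0, 5.0), (5.1, 4.9), (4.9, 5.1)]   # male, sad
genders = ["female"] * 6 + ["male"] * 6
emotions = (["happy"] * 3 + ["sad"] * 3) * 2

model = HierarchicalSER(k=3).fit(X, genders, emotions)
print(model.predict((1.05, 4.8)))
print(model.predict((4.95, 1.2)))
```

In this sketch the second level sees only training samples of the gender chosen by the first level, which is one plausible reading of how a priori gender recognition could help the emotion classifier.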
Academic discipline INF/01 - Informatica (Computer Science)
2023
Book Part (author)
Files in this item:
ICPRAM_2023_100_CR.pdf
Access: restricted
Type: Post-print, accepted manuscript, etc. (version accepted by the publisher)
Size: 249.33 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/957083
Citations
  • Scopus 4