This article addresses the Speech Emotion Recognition (SER) problem with a focus on multilingual settings. The proposed solution is a hierarchical scheme whose first level identifies the speaker’s gender and whose second level predicts the speaker’s emotional state. We experiment with three classifiers of increasing complexity: k-NN, transfer learning based on YAMNet, and Bidirectional Long Short-Term Memory neural networks. Importantly, model training, validation, and testing cover the full range of the big-six emotions, and the dataset was assembled from well-known SER datasets representing six different languages. The results show differences, for all classifiers, between classifying all data and classifying only female or only male data. Interestingly, a priori gender recognition can boost the overall classification performance.
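The two-level scheme described in the abstract can be illustrated with a minimal sketch, assuming a plain k-NN at both levels. The class name `HierarchicalSER`, the helper `knn_predict`, the two normalized features (pitch and energy), and the toy samples are all hypothetical and chosen only for illustration; the paper's actual features, hyperparameters, and the YAMNet and BiLSTM variants are not reproduced here.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority vote among the k training items closest to `query`.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


class HierarchicalSER:
    """Hypothetical two-level classifier: gender first, then emotion."""

    def __init__(self, k=3):
        self.k = k
        self.gender_train = []   # level 1: all samples, labelled by gender
        self.emotion_train = {}  # level 2: one training set per gender

    def fit(self, samples):
        # Each sample is (features, gender, emotion).
        for features, gender, emotion in samples:
            self.gender_train.append((features, gender))
            self.emotion_train.setdefault(gender, []).append((features, emotion))
        return self

    def predict(self, features):
        # Level 1 picks the gender; level 2 uses only that gender's data.
        gender = knn_predict(self.gender_train, features, self.k)
        emotion = knn_predict(self.emotion_train[gender], features, self.k)
        return gender, emotion


# Toy data (assumed): [normalized pitch, normalized energy].
TOY_SAMPLES = [
    ([0.80, 0.90], "female", "angry"), ([0.85, 0.85], "female", "angry"),
    ([0.75, 0.95], "female", "angry"), ([0.80, 0.10], "female", "sad"),
    ([0.85, 0.15], "female", "sad"), ([0.75, 0.05], "female", "sad"),
    ([0.20, 0.90], "male", "angry"), ([0.25, 0.85], "male", "angry"),
    ([0.15, 0.95], "male", "angry"), ([0.20, 0.10], "male", "sad"),
    ([0.25, 0.15], "male", "sad"), ([0.15, 0.05], "male", "sad"),
]

model = HierarchicalSER(k=3).fit(TOY_SAMPLES)
print(model.predict([0.82, 0.88]))
print(model.predict([0.18, 0.12]))
```

Routing each utterance to a gender-specific emotion model means the second-level classifier only has to separate emotions within one gender, which is one plausible reading of why the abstract reports a gain from a priori gender recognition.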
A Hierarchical Approach for Multilingual Speech Emotion Recognition / M. Nicolini, S. Ntalampiras - In: Proceedings of the 12th International Conference on Pattern Recognition Applications and Methods / edited by M. De Marsico, G. Sanniti di Baja, A. Fred. - [s.l] : SciTePress, 2023. - ISBN 978-989-758-626-2. - pp. 679-685 (Paper presented at the 12th International Conference on Pattern Recognition Applications and Methods, held in Lisbon in 2023) [10.5220/0011714800003411].
A Hierarchical Approach for Multilingual Speech Emotion Recognition
M. Nicolini;S. Ntalampiras
2023
| File | Access | Type | Size | Format |
|---|---|---|---|---|
| ICPRAM_2023_100_CR.pdf | Restricted access | Post-print, accepted manuscript, etc. (version accepted by the publisher) | 249.33 kB | Adobe PDF |




