IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca

This article approaches the Speech Emotion Recognition (SER) problem with a focus on multilingual settings. The proposed solution is a hierarchical scheme whose first level identifies the speaker's gender and whose second level predicts the speaker's emotional state. We experiment with three classifiers of increasing complexity: k-NN, transfer learning based on YAMNet, and Bidirectional Long Short-Term Memory neural networks. Importantly, model learning, validation and testing cover the full range of the big six emotions, and the dataset has been assembled from well-known SER datasets representing six different languages. For all three classifiers, the results show differences between classifying all data and classifying only female or only male data. Interestingly, a priori gender recognition can boost the overall classification performance.
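The two-level scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a plain k-NN at both levels, and the class name `HierarchicalSER`, the helper `knn_predict`, the two-dimensional feature vectors and the toy labels are all hypothetical. The paper's actual features, datasets and YAMNet or BiLSTM models are not reproduced here.

```python
from collections import Counter
import math


def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors closest to `query`.

    `train` is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


class HierarchicalSER:
    """Hypothetical two-level classifier: gender first, then emotion."""

    def __init__(self, k=3):
        self.k = k
        self.gender_train = []   # level 1: (features, gender)
        self.emotion_train = {}  # level 2: gender -> [(features, emotion)]

    def fit(self, samples):
        # Each sample is (features, gender, emotion).
        for feats, gender, emotion in samples:
            self.gender_train.append((feats, gender))
            self.emotion_train.setdefault(gender, []).append((feats, emotion))
        return self

    def predict(self, feats):
        # Level 1: predict the speaker's gender from all training data.
        gender = knn_predict(self.gender_train, feats, self.k)
        # Level 2: predict the emotion using only that gender's data.
        emotion = knn_predict(self.emotion_train[gender], feats, self.k)
        return gender, emotion


# Made-up 2-D features: the first axis separates the genders,
# the second axis separates the two toy emotions.
TOY_SAMPLES = [
    ((1.0, 1.0), "male", "sadness"), ((1.1, 0.9), "male", "sadness"),
    ((0.9, 1.1), "male", "sadness"), ((1.0, 3.0), "male", "anger"),
    ((1.1, 3.1), "male", "anger"), ((0.9, 2.9), "male", "anger"),
    ((3.0, 1.0), "female", "sadness"), ((3.1, 0.9), "female", "sadness"),
    ((2.9, 1.1), "female", "sadness"), ((3.0, 3.0), "female", "anger"),
    ((3.1, 3.1), "female", "anger"), ((2.9, 2.9), "female", "anger"),
]

model = HierarchicalSER(k=3).fit(TOY_SAMPLES)
print(model.predict((2.8, 2.7)))  # prints ('female', 'anger')
```

Routing each utterance to a gender-specific emotion model is what lets the second level specialise. The trade-off is that a gender error at the first level sends the sample to the wrong emotion model.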

A Hierarchical Approach for Multilingual Speech Emotion Recognition / M. Nicolini, S. Ntalampiras - In: Proceedings of the 12th International Conference on Pattern Recognition Applications and Methods / edited by M. De Marsico, G. Sanniti di Baja, A. Fred. - [s.l.] : SciTePress, 2023. - ISBN 978-989-758-626-2. - pp. 679-685 (Paper presented at the 12th International Conference on Pattern Recognition Applications and Methods, held in Lisbon in 2023) [10.5220/0011714800003411].

A Hierarchical Approach for Multilingual Speech Emotion Recognition

Nicolini, Marco; S. Ntalampiras (last author)

2023



Keywords: Audio Pattern Recognition; Machine Learning; Transfer Learning; Convolutional Neural Network; YAMNet; Multilingual Speech Emotion Recognition

Scientific-disciplinary sectors of the contribution: Sector INF/01 - Computer Science

Publication date: 2023

DOI: https://dx.doi.org/10.5220/0011714800003411

Type: Book Part (author)

Appears in types: 03 - Contribution in volume

Files in this item:

File	Size	Format
ICPRAM_2023_100_CR.pdf (restricted access; type: post-print, accepted manuscript, etc. (version accepted by the publisher))	249.33 kB	Adobe PDF


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/957083
