IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca

Italian Speech Emotion Recognition / I. Mantegazza, S. Ntalampiras. - In: 2023 24th International Conference on Digital Signal Processing (DSP). - [s.l.] : IEEE, 2023. - ISBN 979-8-3503-3959-8. - pp. 1-5 (DSP conference held in Rhodes in 2023) [10.1109/DSP58604.2023.10167766].

Italian Speech Emotion Recognition

Mantegazza, Irene; S. Ntalampiras

2023

Abstract

Affective computing has attracted growing interest from the scientific community in recent decades, with the acoustic modality playing a central role. This paper presents an extensive computational analysis of emotional speech in the Italian language. More precisely, we propose a novel classification algorithm based on a suitable data augmentation scheme. The aim is to classify the seven emotions (anger, disgust, fear, joy, neutral, sadness, and surprise) included in EMOVO, the only publicly available database of Italian emotional speech. To this end, we employ two feature sets, Mel-frequency cepstral coefficients (MFCCs) and the log-Mel spectrogram, each combined with a suitable classifier: a multilayer perceptron and a convolutional neural network, respectively. The implementation and evaluation of the proposed speech emotion recognition (SER) pipeline are available at: https://github.com/irenemante/ser_emovo
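The two feature sets named in the abstract are closely related: MFCCs are conventionally obtained by applying a discrete cosine transform (DCT) to the log-Mel energies of each frame. The following is a minimal sketch of that relationship using only the Python standard library. It is not the authors' implementation (which is in the linked repository); the function names, the HTK-style mel formula, the unnormalised DCT-II, and any band counts or frequency ranges are illustrative assumptions, since this record does not state the paper's parameters.

```python
import math


def hz_to_mel(f_hz):
    """Map a frequency in Hz to the mel scale (HTK-style formula, assumed here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_mels, f_min, f_max):
    """Centre frequencies (Hz) of n_mels bands equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_mels)]


def mfcc_from_log_mel(log_mel_frame, n_mfcc):
    """Unnormalised DCT-II of one frame of log-mel energies.

    Keeps the first n_mfcc coefficients; assumes n_mfcc <= len(log_mel_frame).
    """
    n = len(log_mel_frame)
    return [
        sum(
            value * math.cos(math.pi * k * (i + 0.5) / n)
            for i, value in enumerate(log_mel_frame)
        )
        for k in range(n_mfcc)
    ]
```

In this sketch a log-Mel spectrogram would be a list of such frames, kept as a two-dimensional input for a convolutional network, while `mfcc_from_log_mel` compresses each frame into a short vector of the kind typically fed to a multilayer perceptron.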

	Keywords
	
			Affective computing; Convolutional neural network; Multilayer perceptron; data augmentation; MFCCs; log-Mel spectrogram
		
	Scientific-disciplinary sectors of the contribution
	
			Sector INF/01 - Computer Science
		
	Publication date
	
			2023
		
	DOI
	
			https://dx.doi.org/10.1109/DSP58604.2023.10167766
		
	Type
	
			Book Part (author)
		
	Appears in types:
	
			03 - Book contribution

Files in this item:

File	Access	Type	Size	Format
audio_pattern_recognition.pdf	Restricted	Post-print, accepted manuscript, etc. (version accepted by the publisher)	454.53 kB	Adobe PDF
Italian_Speech_Emotion_Recognition.pdf	Restricted	Publisher's version/PDF	1.15 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/983658
