Italian Speech Emotion Recognition / I. Mantegazza, S. Ntalampiras. - In: 2023 24th International Conference on Digital Signal Processing (DSP). - [s.l.] : IEEE, 2023. - ISBN 979-8-3503-3959-8. - pp. 1-5 (DSP conference held in Rhodes in 2023) [10.1109/DSP58604.2023.10167766].
Italian Speech Emotion Recognition
S. Ntalampiras
2023
Abstract
Affective computing has attracted growing interest from the scientific community in recent decades, with the acoustic modality playing a central role. This paper presents an extensive computational analysis of emotional speech focusing on the Italian language. More precisely, we propose a novel classification algorithm based on a suitable data augmentation scheme. The aim is to classify the seven emotions (anger, disgust, fear, joy, neutral, sadness, and surprise) included in the only publicly available database of Italian emotional speech, i.e., EMOVO. To this end, we employed two feature sets, Mel Frequency Cepstral Coefficients and the log-Mel spectrogram, each combined with a suitable classifier, i.e., a multilayer perceptron and a convolutional neural network, respectively. The implementation and evaluation of the proposed speech emotion recognition (SER) pipeline can be accessed through the following link: https://github.com/irenemante/ser_emovo

File | Type | Access | Size | Format
---|---|---|---|---
audio_pattern_recognition.pdf | Post-print, accepted manuscript, etc. (version accepted by the publisher) | Restricted access | 454.53 kB | Adobe PDF
Italian_Speech_Emotion_Recognition.pdf | Publisher's version/PDF | Restricted access | 1.15 MB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.