Language-agnostic speech anger identification / A. Saitta, S. Ntalampiras. - In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP). - [s.l.] : IEEE, 2021. - ISBN 978-1-6654-2933-7. - pp. 249-253. Paper presented at the 44th International Conference on Telecommunications and Signal Processing (TSP), held in Brno in 2021 [10.1109/TSP52935.2021.9522606].

Language-agnostic speech anger identification

S. Ntalampiras
2021

Abstract

Motivated by the growing adoption of affective-computing solutions, this paper investigates the feasibility of multilingual anger identification. To this end, we assembled a multilingual corpus by combining seven datasets covering five languages: English, German, Italian, Urdu, and Persian. After analyzing the diverse characteristics of the datasets, we designed four classifiers: a Support Vector Machine, a Decision Tree-based Bagging scheme, a Convolutional Neural Network, and a Convolutional Recurrent Neural Network (CRNN). The classifiers are trained on features extracted from the time and/or frequency domains, and the speech data were balanced with respect to every characteristic present in the datasets (language, sex, acted speech, etc.). Our findings indicate that multilingual anger identification is feasible: the proposed audio pattern recognition methodology, based on Mel-spectrograms and a CRNN, achieved satisfactory identification rates.
speech emotion recognition; multilingual emotion recognition; audio pattern recognition; deep learning
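The abstract states that the speech data were balanced across dataset characteristics (language, sex, acted speech, etc.) but does not say how. One common way to do this is stratified downsampling, sketched below. This is an illustrative assumption, not the authors' procedure: the function name `balance_by_strata`, the record fields `language`, `label`, and `clip`, and the toy data are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def balance_by_strata(samples, keys, seed=0):
    """Downsample so that every observed combination of `keys` is equally sized.

    Assumption: balancing is done by random downsampling to the smallest
    stratum; the paper does not specify its balancing procedure.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for sample in samples:
        groups[tuple(sample[k] for k in keys)].append(sample)
    if not groups:
        return []
    smallest = min(len(members) for members in groups.values())
    balanced = []
    for stratum in sorted(groups):
        # Draw `smallest` items without replacement from each stratum.
        balanced.extend(rng.sample(groups[stratum], smallest))
    return balanced


# Hypothetical toy corpus: English is over-represented relative to German.
toy = []
for language, count in [("English", 6), ("German", 3)]:
    for label in ("anger", "other"):
        for i in range(count):
            toy.append({"language": language, "label": label,
                        "clip": f"{language}_{label}_{i}"})

balanced = balance_by_strata(toy, keys=("language", "label"))
counts = Counter((s["language"], s["label"]) for s in balanced)
print(len(balanced), dict(counts))
```

Adding further keys such as sex or acted/spontaneous to `keys` balances over their joint combinations as well, at the cost of shrinking every stratum to the size of the rarest combination.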
Sector INF/01 - Computer Science
Book Part (author)
Files in this record:

61 Language-agnostic speech anger identification.pdf (open access)
Type: Post-print, accepted manuscript, etc. (version accepted by the publisher)
Size: 320.9 kB
Format: Adobe PDF

Language-agnostic_speech_anger_identification.pdf (restricted access)
Type: Publisher's version/PDF
Size: 2.7 MB
Format: Adobe PDF
Use this identifier to cite or link to this document: http://hdl.handle.net/2434/865518