Following the constantly increasing adoption of affective computing based solutions, this paper investigates the feasibility of multilingual anger identification. To this end, we formed such a corpus by suitably combining seven different datasets representing five different languages, i.e. English, German, Italian, Urdu, and Persian. After analyzing the diverse characteristics of the datasets, we designed four classification algorithms, namely Support Vector Machine, Decision Tree-based Bagging scheme, Convolutional Neural Network, and Convolutional Recurrent Neural Network. Such classification mechanisms are trained on appropriate features extracted from time and/or frequency domains, while speech data have been balanced considering every diverse characteristic incorporated in the datasets (language, sex, acted, etc.). Our findings render multilingual anger identification feasible since the proposed audio pattern recognition methodology based on Mel-spectrograms and CRNN achieved quite satisfactory identification rates.
Language-agnostic speech anger identification / A. Saitta, S. Ntalampiras - In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP)[s.l] : IEEE, 2021. - ISBN 978-1-6654-2933-7. - pp. 249-253 (( Intervento presentato al 44. convegno International Conference on Telecommunications and Signal Processing (TSP) tenutosi a Brno nel 2021 [10.1109/TSP52935.2021.9522606].
Language-agnostic speech anger identification
S. Ntalampiras
2021
Abstract
Following the constantly increasing adoption of affective computing based solutions, this paper investigates the feasibility of multilingual anger identification. To this end, we formed such a corpus by suitably combining seven different datasets representing five different languages, i.e. English, German, Italian, Urdu, and Persian. After analyzing the diverse characteristics of the datasets, we designed four classification algorithms, namely Support Vector Machine, Decision Tree-based Bagging scheme, Convolutional Neural Network, and Convolutional Recurrent Neural Network. Such classification mechanisms are trained on appropriate features extracted from time and/or frequency domains, while speech data have been balanced considering every diverse characteristic incorporated in the datasets (language, sex, acted, etc.). Our findings render multilingual anger identification feasible since the proposed audio pattern recognition methodology based on Mel-spectrograms and CRNN achieved quite satisfactory identification rates.File | Dimensione | Formato | |
---|---|---|---|
61 Language-agnostic speech anger identification.pdf
accesso aperto
Tipologia:
Post-print, accepted manuscript ecc. (versione accettata dall'editore)
Dimensione
320.9 kB
Formato
Adobe PDF
|
320.9 kB | Adobe PDF | Visualizza/Apri |
Language-agnostic_speech_anger_identification.pdf
accesso riservato
Tipologia:
Publisher's version/PDF
Dimensione
2.7 MB
Formato
Adobe PDF
|
2.7 MB | Adobe PDF | Visualizza/Apri Richiedi una copia |
Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.