This work introduces the few-shot learning paradigm to the speech emotion recognition domain. Speech segments are characterized emotionally through analogies, i.e. by assessing similarities and dissimilarities between novel and known recordings. More specifically, we design a Siamese Neural Network that models such relationships in the combined log-Mel and temporal modulation spectrogram space. Thorough experiments assess the performance of the proposed solution holistically: it reaches state-of-the-art rates under the standard leave-one-speaker-out protocol, while also being able to operate in non-stationary conditions, i.e. with limited knowledge of speakers and/or emotional classes. Finally, we investigate the activation maps layer by layer in order to interpret the predictions made by the model.
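As a rough illustration of classification "through analogies", the sketch below labels a novel recording with the class of its most similar known recording, after passing both through one shared encoder (the defining trait of a Siamese setup). Everything in it is an assumption made for illustration: the 2-D vectors stand in for the log-Mel and temporal modulation spectrogram features, the fixed `tanh` encoder stands in for a trained network, and the emotion labels are hypothetical. It is not the architecture or the distance measure used in the paper.

```python
import numpy as np

def embed(x, w):
    """Toy shared encoder: every input is mapped with the same weights w."""
    return np.tanh(x @ w)

def classify_by_analogy(query, support, labels, w):
    """Return the label of the known recording closest to the query in embedding space."""
    q = embed(query, w)                      # embedding of the novel recording
    s = embed(support, w)                    # embeddings of the known recordings
    dists = np.linalg.norm(s - q, axis=1)    # Euclidean distance to each known recording
    return labels[int(np.argmin(dists))], dists

# Hypothetical stand-ins: one known example per emotional class (few-shot setting).
w = 0.5 * np.eye(2)                          # assumed, untrained encoder weights
support = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
labels = ["anger", "joy", "sadness"]
query = np.array([0.1, 0.9])                 # novel recording, closest to the second example

pred, dists = classify_by_analogy(query, support, labels, w)
print(pred, dists)
```

Because the decision depends only on distances to known examples, a new speaker or emotional class can in principle be handled by adding examples to `support` rather than retraining a fixed-output classifier.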
Speech emotion recognition via learning analogies / S. Ntalampiras. - In: PATTERN RECOGNITION LETTERS. - ISSN 0167-8655. - 144:(2021 Apr), pp. 21-26. [10.1016/j.patrec.2021.01.018]
Speech emotion recognition via learning analogies
S. Ntalampiras
2021
File | Type | Size | Format | Access
---|---|---|---|---
1-s2.0-S0167865521000313-main.pdf | Post-print, accepted manuscript etc. (version accepted by the publisher) | 1.12 MB | Adobe PDF | Open access since 21/01/2023
1-s2.0-S0167865521000313-main.pdf | Publisher's version/PDF | 1.47 MB | Adobe PDF | Restricted access