IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca (Institutional Research Archive)

Speech emotion recognition via learning analogies / S. Ntalampiras. - In: PATTERN RECOGNITION LETTERS. - ISSN 0167-8655. - 144:(2021 Apr), pp. 21-26. [10.1016/j.patrec.2021.01.018]

Speech emotion recognition via learning analogies

S. Ntalampiras

2021

Abstract

This work introduces the few-shot learning paradigm to the speech emotion recognition domain. Speech segments are characterized emotionally through analogies, i.e. by assessing similarities and dissimilarities between novel and known recordings. More specifically, we designed a Siamese Neural Network that models such relationships in the combined log-Mel and temporal modulation spectrogram space. We present thorough experiments that holistically assess the performance of the proposed solution, demonstrating that it reaches state-of-the-art rates under the standard leave-one-speaker-out protocol while also being able to operate in non-stationary conditions, i.e. with limited knowledge of speakers and/or emotional classes. Finally, we investigated the activation maps layer by layer in order to interpret the predictions made by the model.
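The analogy-based decision summarized in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's implementation: the short feature vectors stand in for the combined log-Mel and temporal modulation representation, the fixed linear `embed` replaces the trained twin network (both branches share its weights), and the contrastive loss, the mean-distance decision rule and every name used here (`embed`, `pair_distance`, `contrastive_loss`, `classify_by_analogy`, `leave_one_speaker_out`, `WEIGHTS`) are hypothetical choices made for illustration.

```python
import math
from typing import Dict, Iterator, List, Sequence, Tuple

Vector = Sequence[float]

# Hypothetical shared projection: both Siamese branches use the SAME weights.
# A real system would use a trained network over log-Mel and temporal
# modulation spectrograms; this fixed toy linear map is only a stand-in.
WEIGHTS = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, -0.5],
]


def embed(x: Vector) -> List[float]:
    """Map a feature vector into the embedding space (shared twin branch)."""
    return [sum(w * v for w, v in zip(row, x)) for row in WEIGHTS]


def pair_distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between the embeddings of two recordings."""
    ea, eb = embed(a), embed(b)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(ea, eb)))


def contrastive_loss(distance: float, same_class: bool, margin: float = 1.0) -> float:
    """Standard contrastive loss (an assumption, not necessarily the paper's):
    pull same-emotion pairs together, push different-emotion pairs at least
    `margin` apart."""
    if same_class:
        return 0.5 * distance ** 2
    return 0.5 * max(0.0, margin - distance) ** 2


def classify_by_analogy(query: Vector, support: Dict[str, List[Vector]]) -> str:
    """Return the emotion whose few support examples are closest on average."""
    def mean_dist(examples: List[Vector]) -> float:
        return sum(pair_distance(query, s) for s in examples) / len(examples)

    return min(support, key=lambda label: mean_dist(support[label]))


def leave_one_speaker_out(
    samples: List[Tuple[str, str, Vector]],
) -> Iterator[Tuple[str, list, list]]:
    """Yield (held-out speaker, train, test) folds.
    Each sample is a (speaker, emotion, features) tuple."""
    speakers = sorted({spk for spk, _, _ in samples})
    for held_out in speakers:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test


if __name__ == "__main__":
    # Toy support set: two labeled examples per emotion (few-shot).
    support = {
        "angry": [[2.0, 0.0, 0.0], [1.8, 0.2, 0.0]],
        "sad": [[0.0, 2.0, 0.0], [0.1, 1.9, 0.0]],
    }
    print(classify_by_analogy([1.9, 0.1, 0.0], support))  # angry
```

Because the decision only compares a query with labeled support examples, a new emotional class can be introduced at inference time by adding a few examples to `support`, without retraining the embedding; this is the kind of flexibility the few-shot, non-stationary setting targets.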

Keywords: Affective computing; Deep learning; Few-shot learning; Online learning; Speech emotion recognition
Scientific-disciplinary sector: INF/01 - Computer Science
Publication date: Apr 2021
Ahead-of-print or print date: 20 Jan 2021
Journal (ANCE): PATTERN RECOGNITION LETTERS
DOI: https://dx.doi.org/10.1016/j.patrec.2021.01.018
Type: Article (author)
Appears in types: 01 - Journal article

Files in this record:

File	Size	Format
1-s2.0-S0167865521000313-main.pdf (Open Access from 21/01/2023; type: post-print, accepted manuscript, i.e. the version accepted by the publisher)	1.12 MB	Adobe PDF
1-s2.0-S0167865521000313-main.pdf (restricted access; type: publisher's version/PDF)	1.47 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/810556
