IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca

Exploring Fusion Strategies in Deep Multimodal Affect Prediction / S. Patania, A. D'Amelio, R. Lanzarotti (LECTURE NOTES IN COMPUTER SCIENCE). - In: Image Analysis and Processing – ICIAP 2021 / edited by S. Sclaroff, C. Distante, M. Leo, G.M. Farinella, F. Tombari. - [s.l.] : Springer Verlag, 2022. - ISBN 978-3-031-06429-6. - pp. 730-741 (conference: International Conference on Image Analysis and Processing, ICIAP 2021, held in Lecce in 2022) [10.1007/978-3-031-06430-2_61].

Exploring Fusion Strategies in Deep Multimodal Affect Prediction

S. Patania;A. D'Amelio;R. Lanzarotti

2022

Abstract

In this work, we explore the effectiveness of multimodal models for estimating the emotional state expressed continuously in the Valence/Arousal space. We consider four modalities typically adopted for emotion recognition, namely audio (voice), video (facial expression), electrocardiogram (ECG), and electrodermal activity (EDA), and investigate different combinations of them. To this end, a CNN-based feature extraction module is adopted for each of the considered modalities, and an RNN-based module models the dynamics of the affective behaviour. Fusion is performed in three different ways: at feature level (after the CNN feature extraction), at model level (combining the RNN layer’s outputs), and at prediction level (late fusion). Results obtained on the publicly available RECOLA dataset demonstrate that the use of multiple modalities improves prediction performance. The best results are achieved by exploiting all the considered modalities with late fusion, but even combinations of two modalities (especially audio and video) bring significant benefits.
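
The three fusion points named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `extract_features`, `temporal_model`, and `predict` are hypothetical toy stand-ins for the CNN feature extractor, the RNN module, and the regression head, and the sample signals are invented and assumed to be time-aligned. Only the position at which the modalities are merged reflects the abstract.

```python
from statistics import fmean

# Hypothetical stand-ins: the paper uses CNN feature extractors and an
# RNN-based temporal module; these toy functions only mimic the data flow.

def extract_features(signal):
    """Stand-in for a per-modality CNN: one 2-d feature vector per time step."""
    return [[x, x * x] for x in signal]

def temporal_model(sequence, decay=0.5):
    """Stand-in for an RNN: exponential smoothing over a sequence of vectors."""
    state = [0.0] * len(sequence[0])
    outputs = []
    for vec in sequence:
        state = [decay * s + (1 - decay) * v for s, v in zip(state, vec)]
        outputs.append(state)
    return outputs

def predict(sequence):
    """Stand-in regression head: one scalar (e.g. arousal) per time step."""
    return [fmean(vec) for vec in sequence]

def concat(sequences):
    """Concatenate the per-modality vectors at each time step."""
    return [sum(step, []) for step in zip(*sequences)]

def feature_level_fusion(signals):
    # Merge right after feature extraction, then run one shared temporal model.
    feats = [extract_features(s) for s in signals.values()]
    return predict(temporal_model(concat(feats)))

def model_level_fusion(signals):
    # Run one temporal model per modality, then merge their outputs.
    hidden = [temporal_model(extract_features(s)) for s in signals.values()]
    return predict(concat(hidden))

def prediction_level_fusion(signals):
    # Late fusion: predict per modality, then average the predictions.
    preds = [predict(temporal_model(extract_features(s)))
             for s in signals.values()]
    return [fmean(step) for step in zip(*preds)]

# Invented, time-aligned toy signals (three time steps per modality).
signals = {
    "audio": [0.1, 0.4, 0.3],
    "video": [0.2, 0.2, 0.5],
    "ecg": [0.9, 0.8, 0.7],
    "eda": [0.0, 0.1, 0.2],
}

feature_out = feature_level_fusion(signals)
model_out = model_level_fusion(signals)
late_out = prediction_level_fusion(signals)
```

Because these stand-ins are linear and act element-wise, the three strategies return the same values here up to floating-point rounding. With learned, non-linear modules the choice of fusion point changes what each module can model, which is the comparison the abstract reports.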

	Keywords
	
				Multimodal emotion recognition; Deep learning; Multimodal fusion
			
	Scientific-disciplinary sectors of the contribution (display only)
	
				Sector INF/01 - Computer Science
			
	Publication date
	
				2022
			
	DOI
	
				https://dx.doi.org/10.1007/978-3-031-06430-2_61
			
	Type
	
				Book Part (author)
			
	Appears in types:
	
				03 - Book chapter

Files in this item:

File	Access	Type	Size	Format
ICIAP_2021___Exploring_Fusion_Strategies_in_Deep_Multimodal_Affect_Prediction.pdf	open access	Post-print, accepted manuscript, etc. (version accepted by the publisher)	844.66 kB	Adobe PDF
Patania2022_Chapter_ExploringFusionStrategiesInDee.pdf	authorized users only	Publisher's version/PDF	1.15 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/930966
