Improving the accuracy of automatic facial expression recognition in speaking subjects with deep learning / S. Bursic, G. Boccignone, A. Ferrara, A. D'Amelio, R. Lanzarotti. - In: APPLIED SCIENCES. - ISSN 2076-3417. - 10:11(2020 Jun), pp. 4002.1-4002.15. [10.3390/app10114002]

Improving the accuracy of automatic facial expression recognition in speaking subjects with deep learning

S. Bursic; G. Boccignone; A. Ferrara; A. D'Amelio; R. Lanzarotti
2020

Abstract

When automatic facial expression recognition is applied to video sequences of speaking subjects, the recognition accuracy has been noted to be lower than with video sequences of still subjects. This effect, known as the speaking effect, arises during spontaneous conversations, when the speech articulation process influences facial configurations along with the affective expressions. In this work we ask whether, aside from facial features, other cues related to the articulation process would increase emotion recognition accuracy when added as input to a deep neural network model. We develop two neural networks that classify facial expressions in speaking subjects from the RAVDESS dataset: a spatio-temporal CNN and a GRU-cell RNN. They are first trained on facial features only, and then on both facial features and articulation-related cues extracted from a model trained for lip reading, while also varying the number of consecutive frames provided as input. We show that, with DNNs, the addition of articulation-related features increases classification accuracy by up to 12%, the increase being greater the more consecutive frames are provided as input to the model.
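As a rough illustration of the second training setup described in the abstract, the sketch below (not the authors' code) shows a GRU-cell RNN classifier over per-frame facial features concatenated with articulation embeddings taken from a lip-reading model. The feature dimensions, hidden size, and 30-frame window are illustrative assumptions, not values from the paper.

    # Minimal sketch: GRU over concatenated facial + articulation features,
    # classifying into the eight RAVDESS emotion categories.
    # All dimensions below are assumptions for illustration only.
    import torch
    import torch.nn as nn

    class EmotionGRU(nn.Module):
        def __init__(self, facial_dim=136, articulation_dim=256,
                     hidden_dim=128, num_classes=8):
            super().__init__()
            # Single-layer GRU over the concatenated per-frame feature vectors.
            self.gru = nn.GRU(facial_dim + articulation_dim, hidden_dim,
                              batch_first=True)
            self.classifier = nn.Linear(hidden_dim, num_classes)

        def forward(self, facial_feats, articulation_feats):
            # facial_feats:       (batch, frames, facial_dim)
            # articulation_feats: (batch, frames, articulation_dim)
            x = torch.cat([facial_feats, articulation_feats], dim=-1)
            _, h_n = self.gru(x)             # h_n: (1, batch, hidden_dim)
            return self.classifier(h_n[-1])  # logits: (batch, num_classes)

    # Example with a window of 30 consecutive frames (the paper varies this length).
    model = EmotionGRU()
    logits = model(torch.randn(4, 30, 136), torch.randn(4, 30, 256))
    print(logits.shape)  # torch.Size([4, 8])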
Affective computing; Deep learning; Emotion recognition; Facial expression recognition; Speaking effect
Settore INF/01 - Informatica (Computer Science)
Settore ING-INF/05 - Sistemi di Elaborazione delle Informazioni (Information Processing Systems)
   Stairway to elders: bridging space, time and emotions in their social environment for wellbeing
   FONDAZIONE CARIPLO
   2018-0858
Jun-2020
Article (author)
Files in this product:
File: applsci-10-04002.pdf (open access)
Type: Publisher's version/PDF
Size: 10.42 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/747320
Citations
  • PMC: not available
  • Scopus: 15
  • Web of Science: 9