Recognizing Visual Signatures of Spontaneous Head Gestures / M. Sharma, D. Ahmetovic, L. Jeni, K. Kitani. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 2018. ISBN 9781538648865, pp. 400-408. Paper presented at the 18th IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, 2018. DOI: 10.1109/WACV.2018.00050.
Recognizing Visual Signatures of Spontaneous Head Gestures
D. Ahmetovic
2018
Abstract
Head movements are an integral part of human nonverbal communication. As such, the ability to detect various types of head gestures from video is important for robotic systems that need to interact with people, and for assistive technologies that may need to detect conversational gestures to aid communication. To this end, we propose a novel Multi-Scale Deep Convolution-LSTM architecture, capable of recognizing short- and long-term motion patterns found in head gestures, from video data of natural, unconstrained conversations. In particular, our models use Convolutional Neural Networks (CNNs) to learn meaningful representations from short time windows over head motion data. To capture longer-term dependencies, we use Recurrent Neural Networks (RNNs) that extract temporal patterns across the outputs of the CNNs. We compare against classical approaches using discriminative and generative graphical models, and show that our model significantly outperforms the baseline models.
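The architecture described in the abstract — CNNs over short time windows of head-motion features, followed by an LSTM that models longer-term dependencies across the CNN outputs — can be sketched roughly as below. This is a minimal illustration, not the paper's exact model: the feature count, layer widths, kernel sizes, and class count are all assumed values for demonstration.

```python
import torch
import torch.nn as nn

class ConvLSTMGestureNet(nn.Module):
    """Sketch of a Conv-LSTM head-gesture recognizer.

    Input: (batch, time, features) head-motion sequences, e.g. per-frame
    head pose angles and their velocities. All dimensions here are
    illustrative assumptions, not the paper's configuration.
    """

    def __init__(self, n_features=6, n_classes=5, hidden=64):
        super().__init__()
        # CNN stage: 1D convolutions over short time windows learn
        # local (short-term) motion patterns.
        self.cnn = nn.Sequential(
            nn.Conv1d(n_features, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        # LSTM stage: captures longer-term dependencies across the
        # per-timestep CNN outputs.
        self.lstm = nn.LSTM(64, hidden, batch_first=True)
        # Per-frame gesture-class scores.
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                # x: (batch, time, features)
        z = self.cnn(x.transpose(1, 2))  # -> (batch, 64, time)
        z, _ = self.lstm(z.transpose(1, 2))
        return self.head(z)              # -> (batch, time, n_classes)

# Example: 8 sequences of 100 frames with 6 pose features each.
model = ConvLSTMGestureNet()
scores = model(torch.randn(8, 100, 6))
print(scores.shape)  # torch.Size([8, 100, 5])
```

The `padding=2` on each convolution preserves sequence length, so the network emits a class score for every frame; a sliding-window or per-sequence pooling variant would be an equally plausible reading of the abstract.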
File | Type | Size | Format
---|---|---|---
sharma2018recognizing.pdf (restricted access) | Publisher's version/PDF | 651.34 kB | Adobe PDF