
Recognizing Visual Signatures of Spontaneous Head Gestures / M. Sharma, D. Ahmetovic, L. Jeni, K. Kitani. - In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). [S.l.]: IEEE, 2018. - ISBN 9781538648865. - pp. 400-408. (Paper presented at the 18th IEEE Winter Conference on Applications of Computer Vision (WACV), held in Lake Tahoe, 2018) [10.1109/WACV.2018.00050].

Recognizing Visual Signatures of Spontaneous Head Gestures

D. Ahmetovic;
2018

Abstract

Head movements are an integral part of human nonverbal communication. As such, the ability to detect various types of head gestures from video is important for robotic systems that need to interact with people, and for assistive technologies that may need to detect conversational gestures to aid communication. To this end, we propose a novel Multi-Scale Deep Convolution-LSTM architecture, capable of recognizing short- and long-term motion patterns found in head gestures, from video data of natural and unconstrained conversations. In particular, our model uses Convolutional Neural Networks (CNNs) to learn meaningful representations from short time windows over head motion data. To capture longer-term dependencies, we use Recurrent Neural Networks (RNNs) that extract temporal patterns across the output of the CNNs. We compare against classical approaches using discriminative and generative graphical models, and show that our model significantly outperforms the baselines.
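The two-stage pipeline the abstract describes — convolutional filters over short windows of head-motion data, followed by a recurrent network over the resulting feature sequence — can be sketched minimally in NumPy. This is an illustrative sketch only: the window length, filter count, and hidden size below are assumptions for demonstration, not the paper's actual hyperparameters, and the simple Elman-style recurrence stands in for the LSTM.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_features(x, kernels, window=5, stride=1):
    """Stage 1: slide short windows over the motion signal and apply
    convolutional filters, yielding short-term motion representations.
    x: (T, D) head-motion time series (e.g. yaw/pitch/roll per frame).
    kernels: (K, window, D) filters. Returns a (T', K) feature sequence."""
    T, D = x.shape
    steps = (T - window) // stride + 1
    out = np.empty((steps, kernels.shape[0]))
    for t in range(steps):
        patch = x[t * stride : t * stride + window]          # (window, D)
        out[t] = np.tanh(np.tensordot(kernels, patch, axes=([1, 2], [0, 1])))
    return out

def rnn_aggregate(feats, Wx, Wh):
    """Stage 2: a simple Elman-style recurrence over the CNN feature
    sequence, capturing longer-term dependencies (the paper uses an LSTM).
    Returns the final hidden state as a clip-level summary."""
    h = np.zeros(Wh.shape[0])
    for f in feats:
        h = np.tanh(Wx @ f + Wh @ h)
    return h

# Hypothetical sizes: 60 frames of 3-D head rotation, 8 filters, 16 hidden units.
x = rng.standard_normal((60, 3))
kernels = rng.standard_normal((8, 5, 3)) * 0.1
Wx = rng.standard_normal((16, 8)) * 0.1
Wh = rng.standard_normal((16, 16)) * 0.1

feats = conv1d_features(x, kernels)   # (56, 8) short-window representations
h = rnn_aggregate(feats, Wx, Wh)      # (16,) summary fed to a gesture classifier
```

In the full model, the final hidden state (or per-step hidden states) would feed a classifier over gesture labels; multi-scale behaviour comes from running such CNN branches at several window lengths.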
Field INF/01 - Computer Science
Book Part (author)
Files in this record:
sharma2018recognizing.pdf — Publisher's version/PDF, 651.34 kB, Adobe PDF (restricted access)

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/697973
Citations
  • PMC: ND
  • Scopus: 13
  • Web of Science (ISI): 9