Robust model-based 3D head pose estimation / G.P. Meyer, S. Gupta, I. Frosio, D. Reddy, J. Kautz. - In: Computer Vision (ICCV), 2015 IEEE International Conference on. [S.l.]: IEEE, 2016. - ISBN 9781467383912. - pp. 3649-3657. (International Conference on Computer Vision, held in Santiago in 2015) [10.1109/ICCV.2015.416].

Robust model-based 3D head pose estimation

I. Frosio;
2016

Abstract

We introduce a method for accurate three-dimensional head pose estimation using a commodity depth camera. We perform pose estimation by registering a morphable face model to the measured depth data, using a combination of particle swarm optimization (PSO) and the iterative closest point (ICP) algorithm, which minimizes a cost function that includes a 3D registration term and a 2D overlap term. The pose is estimated on the fly without requiring an explicit initialization or training phase. Our method handles large pose angles and partial occlusions by dynamically adapting to the reliable visible parts of the face. It is robust and generalizes to different depth sensors without modification. On the Biwi Kinect dataset, we achieve best-in-class performance, with average angular errors of 2.1, 2.1, and 2.4 degrees for yaw, pitch, and roll, respectively, and an average translational error of 5.9 mm, while running at 6 fps on a graphics processing unit.
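The abstract outlines the core of the method: a 6-DoF head pose (yaw, pitch, roll, and translation) is found by minimizing a registration cost between a face model and the measured depth point cloud, with PSO providing the global search. The Python/NumPy/SciPy sketch below only illustrates that PSO-over-pose idea and is not the authors' implementation: the function names (registration_cost, pso_pose_search), the truncated nearest-neighbour cost standing in for the paper's combined 3D registration and 2D overlap terms, the swarm parameters, and the metric units are all assumptions, and the per-particle ICP refinement and morphable-model fitting described in the paper are omitted for brevity.

# Minimal sketch of a PSO-style search over a 6-DoF head pose against a depth point cloud.
# NOT the authors' implementation; cost terms, weights, and parameters are illustrative only.
import numpy as np
from scipy.spatial import cKDTree

def euler_to_matrix(yaw, pitch, roll):
    """Rotation matrix from yaw/pitch/roll in radians (Z-Y-X order, assumed convention)."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    return Rz @ Ry @ Rx

def registration_cost(pose, model_pts, depth_tree, inlier_dist=0.01):
    """Robust 3D registration term: truncated mean distance from the posed model points
    to their nearest neighbours in the depth cloud, plus a crude coverage penalty.
    This is a stand-in for the paper's combined 3D registration and 2D overlap cost."""
    R = euler_to_matrix(*pose[:3])
    t = pose[3:]
    posed = model_pts @ R.T + t
    d, _ = depth_tree.query(posed)
    d = np.minimum(d, inlier_dist)          # truncate distances to limit outlier influence
    overlap = np.mean(d < inlier_dist)      # fraction of model points explained by the data
    return d.mean() + 0.05 * (1.0 - overlap)

def pso_pose_search(model_pts, depth_pts, n_particles=32, n_iters=40, seed=0):
    """Plain global-best PSO over (yaw, pitch, roll, tx, ty, tz); units assumed metres/radians."""
    rng = np.random.default_rng(seed)
    tree = cKDTree(depth_pts)
    centroid = depth_pts.mean(axis=0)
    # Initialise particles with small random rotations and translations near the depth centroid.
    pos = np.hstack([rng.uniform(-0.5, 0.5, (n_particles, 3)),
                     centroid + rng.normal(0.0, 0.02, (n_particles, 3))])
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_cost = np.array([registration_cost(p, model_pts, tree) for p in pos])
    gbest = pbest[pbest_cost.argmin()].copy()
    for _ in range(n_iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos += vel
        cost = np.array([registration_cost(p, model_pts, tree) for p in pos])
        improved = cost < pbest_cost
        pbest[improved], pbest_cost[improved] = pos[improved], cost[improved]
        gbest = pbest[pbest_cost.argmin()].copy()
    return gbest  # estimated (yaw, pitch, roll, tx, ty, tz)

Given model_pts and depth_pts as (N, 3) arrays in the same metric frame, pso_pose_search(model_pts, depth_pts) returns an estimated pose vector; in the paper, such a global search is combined with ICP refinement and a morphable face model rather than a fixed point set.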
Settore INF/01 - Computer Science
Settore ING-INF/05 - Information Processing Systems
2016
Book Part (author)
Files in this record:
There are no files associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/485061
Citations
  • PubMed Central: not available
  • Scopus: 98
  • Web of Science (ISI): 63