IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca

ON BINAURAL SPATIALIZATION AND THE USE OF GPGPU FOR AUDIO PROCESSING

D.A. Mauro

2012

Abstract

3D recordings and audio, namely techniques that aim to create the perception of sound sources placed anywhere in three-dimensional space, are becoming an interesting resource for composers, live performances and augmented reality. This thesis focuses on binaural spatialization techniques and tackles the problem from three different perspectives. The first is the implementation of an engine for audio convolution: a practical implementation problem in which the engine is compared against a number of existing systems, with the goal of achieving better performance. General-purpose computing on graphics processing units (GPGPU) is a promising approach to problems where a high degree of task parallelism is desirable. In this thesis the GPGPU approach is applied to both offline and real-time convolution, with a view to the spatialization of multiple sound sources, which is one of the critical problems in the field. Comparisons are presented between this approach and typical CPU implementations, as well as between FFT and time-domain approaches. The second aspect is the implementation of an augmented reality system designed as an "off-the-shelf" solution that runs on most home computers without specialized hardware. A system is presented that detects the position of the listener through head tracking and renders a 3D audio environment by binaural spatialization. Head tracking is performed by face-tracking algorithms that use a standard webcam, and the result is presented over headphones, as in other typical binaural applications. With this system users can choose audio files to play, assign virtual positions to the sources in a Euclidean space, and then listen to them as if they were coming from those positions. If users move their head, the signals provided by the system change accordingly in real time, thus producing the realistic effect of a coherent scene.
The last aspect covered by this work lies within the field of psychoacoustics: long-term research aimed at understanding how binaural audio and recordings are perceived, and therefore how auralization systems can be designed efficiently. Considerations regarding the quality and realism of such sounds in the context of Auditory Scene Analysis (ASA) are proposed.
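
The comparison between time-domain and FFT convolution mentioned in the abstract can be illustrated with a small CPU-only sketch. This is not the engine described in the thesis, which is not reproduced in this record. It is a minimal pure-Python illustration, assuming that binaural rendering is done by convolving a mono source with a left and a right head-related impulse response (HRIR). The function names (`direct_convolve`, `fft_convolve`, `binauralize`) and the toy impulse responses are hypothetical and chosen only for this example.

```python
import cmath
import math


def direct_convolve(x, h):
    """Time-domain (direct) linear convolution: O(len(x) * len(h))."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y


def fft(a, inverse=False):
    """Recursive radix-2 FFT. len(a) must be a power of two.

    The inverse transform is left unscaled; the caller divides by len(a).
    """
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft_convolve(x, h):
    """Linear convolution via zero-padded FFTs: O(N log N)."""
    m = len(x) + len(h) - 1
    size = 1
    while size < m:  # pad to a power of two to avoid circular wrap-around
        size *= 2
    X = fft([complex(v) for v in x] + [0j] * (size - len(x)))
    H = fft([complex(v) for v in h] + [0j] * (size - len(h)))
    y = fft([a * b for a, b in zip(X, H)], inverse=True)
    return [v.real / size for v in y[:m]]


def binauralize(mono, hrir_left, hrir_right, convolve=fft_convolve):
    """Filter one mono source with an HRIR pair and return (left, right)."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy data (hypothetical, not measured HRIRs): the source sits slightly to the
# right, so the left ear receives a delayed and attenuated copy.
mono = [1.0, 0.5, -0.25, 0.0]
hrir_right = [1.0, 0.0, 0.0]
hrir_left = [0.0, 0.0, 0.6]

left, right = binauralize(mono, hrir_left, hrir_right)
left_ref, right_ref = binauralize(
    mono, hrir_left, hrir_right, convolve=direct_convolve
)
```

Both paths should produce the same signals up to floating-point rounding. The direct form costs one multiply-add per pair of input and filter samples, while the FFT form grows as N log N, which is why the two are worth comparing for long impulse responses. With several sources, each source would be filtered with its own HRIR pair and the per-ear outputs summed.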

Defense date: 6 Mar 2012
Keywords: 3D audio; binaural spatialization; psychoacoustics; DSP; computer science
Scientific-disciplinary sectors of the thesis (display only): Sector INF/01 - Informatica (Computer Science)
Tutors affiliated with the University: HAUS, GOFFREDO
Supervisors and coordinators affiliated with the University: DAMIANI, ERNESTO
Type: Doctoral Thesis
Citation: ON BINAURAL SPATIALIZATION AND THE USE OF GPGPU FOR AUDIO PROCESSING / D.A. Mauro ; tutor: G. Haus ; coordinator: E. Damiani. Università degli Studi di Milano, 2012 Mar 06. 24th cycle, academic year 2011. [10.13130/mauro-davide-andrea_phd2012-03-06].
Appears in types: Doctoral theses

Files in this item:

File	Size	Format
phd_unimi_r08168.pdf (open access; type: complete doctoral thesis)	4.55 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/172440
