MoTT: A Speech Dataset for Modular Composition of Turn-Taking Conversations

Salada, G.; Fantini, D.; Avanzini, F.; Presti, G.

doi:10.1109/i3da65421.2025.11202114

Among the many speech datasets in the literature, only a minority contain conversational data, and even fewer isolate the elements that occur in turn-taking conversations. To address this gap, this paper presents MoTT, an English speech dataset composed of questions, answers, reciprocal questions, and backchannel responses recorded by eight participants. The questions and answers pertain to ten topics and were recorded in two takes. The voice directivity pattern was simultaneously captured at frontal and lateral positions by two microphones. The MoTT dataset was designed to provide interchangeable conversational elements and enable their modular composition into fictional but plausible and convincing conversations. As a result, multiple virtual speakers can engage in a turn-taking conversation that emulates real-world interactions, with spatial audio techniques employed to enhance realism by arranging the speakers in the auditory scene. The dataset offers a valuable resource for studies in immersive spatial audio, human-computer interaction, and auditory scene analysis. It is therefore well suited to experiments that require the simulation of ecologically valid conversations, such as the one described in the use case reported in this paper.
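To make the idea of modular composition concrete, the following is a minimal sketch under stated assumptions, not the authors' procedure. The `Clip` record, its field names, the toy catalogue, and the five-step turn order are all hypothetical. Only the four element types, the eight speakers, the ten topics, and the two takes come from the abstract. Spatial rendering of the speakers is outside the scope of this sketch.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Clip:
    """Hypothetical record describing one recorded conversational element."""
    speaker: int            # assumed index 0..7 (eight participants)
    kind: str               # "question", "answer", "reciprocal_question", "backchannel"
    topic: Optional[int]    # assumed index 0..9; None when no topic is attached
    take: int               # 1 or 2


def build_toy_catalogue() -> List[Clip]:
    """Build a stand-in catalogue; the real dataset layout is not specified here."""
    clips = []
    for speaker in range(8):
        for topic in range(10):
            for take in (1, 2):
                clips.append(Clip(speaker, "question", topic, take))
                clips.append(Clip(speaker, "answer", topic, take))
        # Assumption: these two element types are treated as topic-independent.
        clips.append(Clip(speaker, "reciprocal_question", None, 1))
        clips.append(Clip(speaker, "backchannel", None, 1))
    return clips


def pick(catalogue, rng, speaker, kind, topic=None):
    """Choose one interchangeable clip matching speaker, kind, and (optionally) topic."""
    candidates = [
        c for c in catalogue
        if c.speaker == speaker and c.kind == kind
        and (topic is None or c.topic == topic)
    ]
    if not candidates:
        raise LookupError(f"no {kind} clip for speaker {speaker}, topic {topic}")
    return rng.choice(candidates)


def compose(catalogue, topic, asker, responder, seed=0):
    """Assemble one fictional exchange using an assumed turn order."""
    rng = random.Random(seed)  # seeded so the composition is reproducible
    return [
        pick(catalogue, rng, asker, "question", topic),
        pick(catalogue, rng, responder, "answer", topic),
        pick(catalogue, rng, asker, "backchannel"),
        pick(catalogue, rng, responder, "reciprocal_question"),
        pick(catalogue, rng, asker, "answer", topic),
    ]


if __name__ == "__main__":
    for clip in compose(build_toy_catalogue(), topic=3, asker=0, responder=5):
        print(clip)
```

Because every element is selected only by speaker, type, and topic, any clip that satisfies those constraints can be swapped in, which is the sense in which the elements are interchangeable.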

MoTT: A Speech Dataset for Modular Composition of Turn-Taking Conversations / G. Salada, D. Fantini, F. Avanzini, G. Presti. - In: 2025 Immersive and 3D Audio: from Architecture to Automotive (I3DA). - [s.l.] : IEEE, 2025. - ISBN 979-8-3315-5828-4. - pp. 1-8 (International Conference on Immersive and 3D Audio, held in Bologna in 2025) [10.1109/i3da65421.2025.11202114].

Salada, Giulio; Fantini, D.; Avanzini, F.; Presti, G.

Keywords: Dataset; speech; audio recording; turn-taking

Scientific-disciplinary sectors of the contribution (valid from 09/05/2024): Sector INFO-01/A - Informatica (Computer Science)

Project title: Transforming auditory-based social interaction and communication in AR/VR (SONICOM)

Acronym: SONICOM

Funder: EUROPEAN COMMISSION

Funding programme: H2020

Contract no.: 101017743

Publication date: 2025

DOI: https://dx.doi.org/10.1109/i3da65421.2025.11202114

Type: Book Part (author)

Appears in types: 03 - Contribution in volume

Files in this record:

File	Size	Format
MoTT_A_Speech_Dataset_for_Modular_Composition_of_Turn-Taking_Conversations.pdf (restricted access; type: Publisher's version/PDF; license: none)	8.76 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1190266

