IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca


Audio pattern recognition of baby crying sound events / S. Ntalampiras. - In: AES. - ISSN 1549-4950. - 63:5(2015), pp. 358-369. [10.17743/jaes.2015.0025]

Audio pattern recognition of baby crying sound events

S. Ntalampiras

2015

Abstract

This article addresses a problem arising within the paralinguistic audio signal processing domain: classifying the state of an infant based on the patterns exhibited by its crying sound events. More specifically, we propose a methodology able to distinguish among the following five states: (a) hungry, (b) uncomfortable (needs changing), (c) needs to burp, (d) in pain, and (e) needs to sleep. A wide variety of audio parameters related to the task (Perceptual Linear Prediction, Mel Frequency Cepstral Coefficients, Perceptual Wavelet Packets, Teager Energy Operator, Temporal Modulation), along with a series of classification techniques (Multilayer Perceptron, Support Vector Machine, Random Forest, Reservoir Network, Gaussian Mixture Model, Hidden Markov Model), were customized to address the problem reliably. The final implementation exploits a representation of the audio structure that includes a set of descriptors capturing heterogeneous aspects of the signal. We also introduce the use of Reservoir Networks for this specific problem, where they demonstrated encouraging performance. The final goal of the method is to provide an automatic and non-invasive framework for monitoring infants and helping inexperienced or trainee pediatricians, parents, and babysitters to diagnose the infant's pathological status.
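As an illustration of one of the descriptors named in the abstract, the following is a minimal sketch of the standard discrete Teager Energy Operator, psi[n] = x[n]^2 - x[n-1] * x[n+1], averaged over short frames. It is not the article's implementation: the function names, the frame length, and the hop size are hypothetical choices made for this example, and the abstract does not state how the operator was configured.

```python
import math


def teager_energy(signal):
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns one value per interior sample, so the output is two samples
    shorter than the input (empty if the input has fewer than 3 samples).
    """
    if len(signal) < 3:
        return []
    return [
        signal[n] ** 2 - signal[n - 1] * signal[n + 1]
        for n in range(1, len(signal) - 1)
    ]


def frame_mean_teager_energy(signal, frame_len, hop):
    """Mean Teager energy of each full frame (hypothetical framing scheme)."""
    if frame_len < 3 or hop < 1:
        raise ValueError("frame_len must be >= 3 and hop must be >= 1")
    means = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        psi = teager_energy(signal[start:start + frame_len])
        means.append(sum(psi) / len(psi))
    return means


# Usage: for a pure tone A*cos(omega*n), the operator equals A^2 * sin(omega)^2
# at every sample, so it tracks both amplitude and frequency.
amplitude, omega = 2.0, 0.3  # illustrative values only
tone = [amplitude * math.cos(omega * n) for n in range(50)]
per_frame = frame_mean_teager_energy(tone, frame_len=10, hop=5)
```

Because the operator responds to both amplitude and instantaneous frequency, a per-frame average of this kind could serve as one entry in a larger feature vector alongside cepstral or wavelet descriptors.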


Keywords: Engineering (all); Music
Scientific-disciplinary sectors of the article (display only): Settore INF/01 - Informatica (Computer Science)
Publication date: 2015
Journal in ANCE: AES
DOI: https://dx.doi.org/10.17743/jaes.2015.0025
Type: Article (author)
Appears in types: 01 - Journal article

Files in this item:

There are no files associated with this item.


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/615178
