Audio pattern recognition of baby crying sound events / S. Ntalampiras. - In: AES. - ISSN 1549-4950. - 63:5(2015), pp. 358-369. [10.17743/jaes.2015.0025]

Audio pattern recognition of baby crying sound events

S. Ntalampiras
2015

Abstract

This article addresses a problem in paralinguistic audio signal processing: classifying the state of an infant from the patterns exhibited by its crying sound events. More specifically, we propose a methodology able to distinguish among the following five states: (a) hungry, (b) uncomfortable (needs changing), (c) needs to burp, (d) in pain, and (e) needs to sleep. A wide variety of audio parameters related to the task (Perceptual Linear Prediction, Mel Frequency Cepstral Coefficients, Perceptual Wavelet Packets, Teager Energy Operator, Temporal Modulation), along with a series of classification techniques (Multilayer Perceptron, Support Vector Machine, Random Forest, Reservoir Network, Gaussian Mixture Model, Hidden Markov Model), were customized to address the problem reliably. The final implementation exploits a representation of the audio structure comprising a set of descriptors that capture heterogeneous aspects of the signal. Subsequently, we introduce Reservoir Networks to this specific problem, where they demonstrated encouraging performance. The final goal of the method is to provide an automatic and non-invasive framework for monitoring infants and for helping inexperienced or trainee pediatricians, parents, and babysitters to diagnose the infants' pathological status.
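As an illustration of one of the descriptors listed above, the following is a minimal sketch of the standard discrete Teager Energy Operator, psi[n] = x[n]^2 - x[n-1] * x[n+1]. It is not the article's implementation: the function name `teager_energy`, the synthetic test tone, and all parameter values are assumptions chosen only for demonstration, and the abstract does not state how the operator was configured or combined with the other features.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns an array two samples shorter than the input, because the
    operator is undefined at the first and last sample.
    """
    x = np.asarray(x, dtype=float)
    if x.size < 3:
        raise ValueError("need at least 3 samples")
    return x[1:-1] ** 2 - x[:-2] * x[2:]


# Hypothetical test signal (an assumption, not data from the article):
# a pure tone with amplitude `amp` and frequency `f0` sampled at `fs`.
fs = 8000.0
f0 = 440.0
amp = 0.5
n = np.arange(1024)
omega = 2.0 * np.pi * f0 / fs
tone = amp * np.cos(omega * n)

psi = teager_energy(tone)

# For a pure tone the operator is constant and equals amp^2 * sin^2(omega),
# so it tracks both the amplitude and the frequency of the oscillation.
expected = amp ** 2 * np.sin(omega) ** 2
```

For a stationary sinusoid the output is flat, so deviations from a constant value in real recordings would reflect amplitude or frequency modulation in the signal.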
Engineering (all); Music
Sector INF/01 - Computer Science
AES
Article (author)
Use this identifier to cite or link to this document: http://hdl.handle.net/2434/615178