A multidomain approach for automatic home environmental sound classification / S. Ntalampiras, I. Potamitis, N. Fakotakis. - In: INTERSPEECH 2010. - [s.l.] : ISCA, 2010. - ISBN 9781617821233. - pp. 2210-2213. (Paper presented at the 11th Annual Conference of the International Speech Communication Association, held in Makuhari in 2010.)

A multidomain approach for automatic home environmental sound classification

S. Ntalampiras;
2010

Abstract

This article presents a multidomain approach to the automatic recognition of home environmental sounds. The proposed system will be part of a human activity monitoring system based on heterogeneous sensors. This work concerns the audio classification component, whose primary role is to detect anomalous sound events. We compare the discriminative capabilities of three feature sets (MFCC, MPEG-7 low-level descriptors, and a novel set based on wavelet packets) for the classification of ten sound classes. The feature sets are combined with state-of-the-art generative techniques (GMM and HMM) that estimate the density function of each class. The highest average recognition rate, 95.7%, is achieved by the vector formed by juxtaposing all the feature sets.
computer audition; content-based audio recognition; MPEG-7 audio standard; wavelet packets
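
The pipeline outlined in the abstract (juxtaposing feature sets, estimating one density per class, and choosing the most likely class) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single diagonal Gaussian per class, which is a one-component simplification of a GMM, and it uses synthetic numbers in place of real MFCC, MPEG-7, and wavelet packet features. The class labels and helper names (`juxtapose`, `fit_gaussian`, `classify`, `make_frames`) are hypothetical.

```python
import math
import random


def juxtapose(*feature_sets):
    """Concatenate the per-frame vectors of several feature sets into one vector."""
    return [value for feature_set in feature_sets for value in feature_set]


def fit_gaussian(frames):
    """Fit a diagonal Gaussian (a one-component GMM, an assumption made here)."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Floor the variance so the log-likelihood stays finite.
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-6)
           for d in range(dim)]
    return mean, var


def log_likelihood(model, frame):
    """Log density of one frame under a diagonal Gaussian model."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * s) + (x - m) ** 2 / s)
               for x, m, s in zip(frame, mean, var))


def classify(models, frames):
    """Return the class whose model gives the highest total log-likelihood."""
    scores = {label: sum(log_likelihood(model, f) for f in frames)
              for label, model in models.items()}
    return max(scores, key=scores.get)


def make_frames(rng, center, count):
    """Synthetic stand-ins for the three feature sets (not real audio features)."""
    return [juxtapose([rng.gauss(center, 1.0), rng.gauss(center, 1.0)],  # MFCC-like
                      [rng.gauss(center, 1.0)],                          # MPEG-7-like
                      [rng.gauss(center, 1.0)])                          # wavelet-like
            for _ in range(count)]


rng = random.Random(0)
models = {
    "class_a": fit_gaussian(make_frames(rng, 0.0, 200)),
    "class_b": fit_gaussian(make_frames(rng, 3.0, 200)),
}
predicted = classify(models, make_frames(rng, 3.0, 20))
print(predicted)
```

A fuller version would replace the single Gaussian with a multi-component mixture or an HMM per class, as the abstract names, and would extract the features from recorded audio.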
Sector INF/01 - Computer Science
2010
Renesas Electronics Corporation
Google
Microsoft Corporation
Nuance Communications, Inc.
Appen Pty Ltd
Book Part (author)
Files in this item:
File: i10_2210.pdf
Access: restricted
Type: Publisher's version/PDF
Size: 102.52 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/615103
Citations
  • PMC ND
  • Scopus 6
  • ISI 6