IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca (Institutional Research Archive)


A multidomain approach for automatic home environmental sound classification / S. Ntalampiras, I. Potamitis, N. Fakotakis - In: INTERSPEECH 2010 [s.l.]: ISCA, 2010. - ISBN 9781617821233. - pp. 2210-2213 (Paper presented at the 11th Annual Conference of the International Speech Communication Association, held in Makuhari in 2010.)

A multidomain approach for automatic home environmental sound classification

Ntalampiras, S.; Potamitis, Ilyas; Fakotakis, Nikos

2010

Abstract

This article presents a multidomain approach to the automatic recognition of environmental sounds in the home. The proposed system is intended to be part of a human activity monitoring system based on heterogeneous sensors; this work concerns its audio classification component, whose primary role is to detect anomalous sound events. We compare the discriminative capabilities of three feature sets (MFCC, MPEG-7 low-level descriptors, and a novel set based on wavelet packets) for classifying ten sound classes. The feature sets are combined with state-of-the-art generative techniques (GMM and HMM) that estimate the density function of each class. The highest average recognition rate, 95.7%, is achieved by the vector formed by juxtaposing (concatenating) all the feature sets.
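As a rough illustration of the generative scheme the abstract describes (one density model per sound class, with the per-frame MFCC, MPEG-7 and wavelet-packet vectors juxtaposed into a single feature vector), the following is a minimal NumPy sketch. It is not the authors' implementation. The feature extractors are replaced by hypothetical random frames with arbitrary dimensions (13, 4 and 6), the labels `class_a`/`class_b` are placeholders, and the diagonal-covariance EM fit and the rule of scoring a clip by its summed frame log-likelihood are assumptions. The HMM variant is not shown.

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)


def component_log_pdf(X, means, variances):
    """Log-density of every frame under every diagonal Gaussian: shape (n_frames, k)."""
    diff = X[:, None, :] - means[None, :, :]
    log_det = np.sum(np.log(2.0 * np.pi * variances), axis=1)
    return -0.5 * (log_det[None, :] + np.sum(diff ** 2 / variances[None, :, :], axis=2))


def fit_diag_gmm(X, k, n_iter=50, seed=0, var_floor=1e-6):
    """Fit a k-component diagonal-covariance GMM to X (n_frames x dim) with plain EM."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    means = X[rng.choice(n, k, replace=False)]          # init means from random frames
    variances = np.tile(X.var(axis=0) + var_floor, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each frame.
        log_p = component_log_pdf(X, means, variances) + np.log(weights)
        resp = np.exp(log_p - logsumexp(log_p, axis=1)[:, None])
        # M-step: re-estimate mixture weights, means and floored diagonal variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        variances = np.maximum((resp.T @ X ** 2) / nk[:, None] - means ** 2, var_floor)
    return weights, means, variances


def gmm_frame_loglik(X, params):
    """Per-frame log-likelihood of X under one fitted GMM."""
    weights, means, variances = params
    return logsumexp(component_log_pdf(X, means, variances) + np.log(weights), axis=1)


def concat_features(*feature_sets):
    """Juxtapose per-frame feature matrices (n_frames x d_i) into one n_frames x sum(d_i) matrix."""
    return np.hstack(feature_sets)


def train(class_frames, k=2):
    """Fit one GMM per class from that class's training frames."""
    return {label: fit_diag_gmm(X, k) for label, X in class_frames.items()}


def classify(models, clip):
    """Pick the class whose GMM gives the clip the highest summed frame log-likelihood."""
    scores = {label: gmm_frame_loglik(clip, p).sum() for label, p in models.items()}
    return max(scores, key=scores.get)


# Hypothetical demo data: random frames standing in for real
# MFCC (13-dim), MPEG-7 LLD (4-dim) and wavelet-packet (6-dim) extractors.
rng = np.random.default_rng(1)


def fake_clip(offset, n=200):
    return concat_features(
        rng.normal(offset, 1.0, (n, 13)),
        rng.normal(-offset, 1.0, (n, 4)),
        rng.normal(offset, 0.5, (n, 6)),
    )


models = train({"class_a": fake_clip(0.0), "class_b": fake_clip(1.5)})
print(classify(models, fake_clip(1.5, n=50)))
```

Concatenating the feature sets before modelling lets a single density per class capture all three domains at once. The alternative of scoring each feature set with its own model and fusing the scores afterwards is not sketched here.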


Keywords
	computer audition; content-based audio recognition; MPEG-7 audio standard; wavelet packets

Scientific-disciplinary sectors of the contribution
	Sector INF/01 - Computer Science

Publication date
	2010

Organizations associated with the conference
	Renesas Electronics Corporation
	Google
	Microsoft Corporation
	Nuance Communications, Inc.
	Appen Pty Ltd

Type
	Book Part (author)

Appears in record types
	03 - Contribution in volume

Files in this record:

i10_2210.pdf (restricted access) - Type: Publisher's version/PDF - Size: 102.52 kB - Format: Adobe PDF


Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/615103
