Directed Acyclic Graphs for Content Based Sound, Musical Genre, and Speech Emotion Classification / S. Ntalampiras. - In: JOURNAL OF NEW MUSIC RESEARCH. - ISSN 0929-8215. - 43:2(2014), pp. 173-182. [10.1080/09298215.2013.859709]

Directed Acyclic Graphs for Content Based Sound, Musical Genre, and Speech Emotion Classification

S. Ntalampiras
2014

Abstract

This work introduces the methodology of Decision Directed Acyclic Graphs (DDAG)¹ to the scientific domain of content-based audio signal processing. We apply this methodology to three multiclass classification problems involving generalized sound events, musical genres, and speech expressing emotional states. A decision graph is constructed that breaks the overall problem into a series of two-class problems. The order of the graph nodes is determined using a clustering criterion based on the Kullback-Leibler divergence. Every graph node is composed of two hidden Markov models (HMMs), each representing one of the two classes involved in that node's problem. We extract three heterogeneous feature sets (Mel-Filterbank, MPEG-7 Audio Spectrum Projection, and Perceptual Wavelet Packets) from each recording and fuse them for training the HMMs. Extensive comparative experiments are conducted using the following three datasets: (a) a combination of professional sound effects collections, (b) the GTZAN musical genre database, and (c) the BERLIN emotional speech corpus. The results demonstrate the superiority of the DDAG classification approach over the standard HMM approach regardless of the application task.

¹ This work uses the following abbreviations for directed graphs: Directed Acyclic Graph (DAG), Decision Directed Acyclic Graph (DDAG), and Directed Acyclic Graph Hidden Markov Model (DAGHMM).
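
The way a decision graph reduces a multiclass problem to a chain of two-class decisions can be illustrated with a minimal sketch. It is not the implementation used in the article: it assumes the common list-elimination form of a DDAG, takes the class order as given rather than deriving it from a Kullback-Leibler criterion, and replaces each node's pair of HMMs with one-dimensional Gaussian scorers. The names `gaussian_loglik`, `ddag_classify`, `models`, and `order` are hypothetical.

```python
import math


def gaussian_loglik(xs, mean, var):
    """Total log-likelihood of a sequence under a 1-D Gaussian.

    Assumption: this is only a stand-in for the HMM score used at each node.
    """
    return sum(
        -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var) for x in xs
    )


def ddag_classify(xs, ordered_classes, models):
    """Traverse a decision DAG by list elimination.

    Each node compares the first and last remaining classes and discards
    the one whose model explains the observation worse. With N classes,
    exactly N - 1 two-class decisions are made before one class remains.
    """
    remaining = list(ordered_classes)
    while len(remaining) > 1:
        first, last = remaining[0], remaining[-1]
        score_first = gaussian_loglik(xs, *models[first])
        score_last = gaussian_loglik(xs, *models[last])
        if score_first >= score_last:
            remaining.pop()    # reject the last class
        else:
            remaining.pop(0)   # reject the first class
    return remaining[0]


# Hypothetical toy models: class name -> (mean, variance).
models = {"speech": (0.0, 1.0), "music": (3.0, 1.0), "noise": (6.0, 1.0)}
# Assumed node order; the article derives it from a clustering criterion.
order = ["speech", "music", "noise"]

result = ddag_classify([2.8, 3.1, 3.3], order, models)
print(result)
```

In this toy setup the first node rejects "speech", the second rejects "noise", and "music" is returned. Only two of the three possible class pairs are evaluated, which is the practical appeal of a decision graph over scoring every pair.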
audio signal processing; content-based generalized sound recognition; decision directed acyclic graph; hidden Markov model; 1213; Music
Sector INF/01 - Computer Science
Article (author)

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/615161