IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca

Improving automatic speech recognition systems is one of the hottest topics in speech-signal processing, especially if such systems are to operate in noisy environments. This paper proposes a multimodal evolutionary neuro-fuzzy approach to developing an automatic speech-recognition system. To make inferences at the decision stage about audiovisual information for speech-to-text conversion, the EFuNN paradigm was applied. Two independent feature extractors were developed, one for the speech phonetics (speech listening) and the other for the speech visemics (lip reading). The EFuNN network has been trained to fuse decisions on audio and decisions on video. This soft computing approach proved robust in harsh conditions and, at the same time, less complex than hard computing, pattern-matching methods. Preliminary experiments confirm the reliability of the proposed method for developing a robust, automatic, speech-recognition system.

Evolving fuzzy-neural method for multimodal speech recognition / M. Malcangi, P. Grew (COMMUNICATIONS IN COMPUTER AND INFORMATION SCIENCE). - In: Engineering Applications of Neural Networks : proceedings / edited by L. Iliadis, C. Jayne. - First edition. - [s.l.] : Springer, 2015. - ISBN 9783319239811. - pp. 216-227 (Paper presented at the 16th International Conference, EANN, held in Rhodes in 2015) [10.1007/978-3-319-23983-5].

Evolving fuzzy-neural method for multimodal speech recognition

M. Malcangi (first author); P. Grew

2015


	Keywords
				Audiovisual Speech Recognition (AVSR); Decision fusion; Evolutionary Fuzzy Neural Network (EFuNN); Multimodal speech recognition; Speech-To-Text (STT)

	Scientific-disciplinary sectors of the contribution (display only)
				Sector INF/01 - Computer Science

	Publication date
				2015

	DOI
				https://dx.doi.org/10.1007/978-3-319-23983-5

	Type
				Book Part (author)

	Appears in types:
				03 - Contribution in volume


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/320764
