IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca

Evolving connectionist method for adaptive audiovisual speech recognition / M.N. Malcangi, P. Grew. - In: EVOLVING SYSTEMS. - ISSN 1868-6478. - (2016 Jul). [Epub ahead of print] [10.1007/s12530-016-9156-6]

Evolving connectionist method for adaptive audiovisual speech recognition

M.N. Malcangi (first author); P. Grew

2016

Abstract

Reliability is the primary requirement for automatic speech recognition (ASR) in noisy conditions and for highly variable utterances. For many applications that must run ASR in harsh conditions, integrating the recognition of visual signals with the recognition of audio signals is indispensable. Several important experiments have shown that integrating, and adapting to, multiple sources of behavioral and contextual information during the speech-recognition task significantly improves its success rate. Combining the audio and visual data carried by speech improves the performance of an ASR system because it resolves the most critical cases of phonetic-unit mismatch that occur when audio or visual input is processed alone. The evolving fuzzy neural network (EFuNN) inference method is applied at the decision layer to accomplish this task, through a paradigm that adapts to the environment by changing its own structure. The EFuNN's capacity to learn quickly from incoming data and to adapt online lowers the ASR system's complexity and enhances its performance in harsh conditions. Two independent feature extractors were developed, one for speech phonetics (listening to the speech) and the other for speech visemics (lip-reading the spoken input). The EFuNN was trained to fuse the decisions made independently by the audio unit and the visual unit. Our experiments have confirmed that the proposed method is reliable for developing a robust automatic speech recognition system.
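The decision-level fusion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it is a simplified, EFuNN-inspired one-pass learner that omits the fuzzification layers and works directly on per-class confidence scores. The class name `EvolvingFusion`, its parameters (`sensitivity`, `error_threshold`, `lr_in`, `lr_out`), the three-class setup, and all numeric scores are assumptions made for illustration only.

```python
import numpy as np


class EvolvingFusion:
    """Simplified, EFuNN-inspired decision fusion (hypothetical sketch)."""

    def __init__(self, sensitivity=0.7, error_threshold=0.3, lr_in=0.1, lr_out=0.1):
        self.sensitivity = sensitivity          # minimum activation to reuse a rule node
        self.error_threshold = error_threshold  # maximum tolerated output error
        self.lr_in = lr_in                      # learning rate for input-side weights
        self.lr_out = lr_out                    # learning rate for output-side weights
        self.centers = []                       # one input prototype per rule node
        self.outputs = []                       # one output vector per rule node

    def _activations(self, x):
        # Activation = 1 - normalized distance between x and each node prototype.
        c = np.array(self.centers)
        dist = np.abs(c - x).sum(axis=1) / ((c + x).sum(axis=1) + 1e-12)
        return 1.0 - dist

    def _add_node(self, x, y):
        self.centers.append(x.copy())
        self.outputs.append(y.copy())

    def partial_fit(self, x, y):
        """Learn from one example; the structure grows when no node fits."""
        x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
        if not self.centers:
            self._add_node(x, y)
            return
        act = self._activations(x)
        k = int(np.argmax(act))
        error = np.abs(act[k] * self.outputs[k] - y).max()
        if act[k] < self.sensitivity or error > self.error_threshold:
            self._add_node(x, y)  # evolve: allocate a new rule node
        else:
            # Adapt the winning node towards the new example.
            self.centers[k] += self.lr_in * (x - self.centers[k])
            self.outputs[k] += self.lr_out * (y - self.outputs[k])

    def predict(self, x):
        x = np.asarray(x, dtype=float)
        act = self._activations(x)
        k = int(np.argmax(act))
        return act[k] * self.outputs[k]


# Hypothetical per-class confidences from the audio unit and the visual unit.
train = [
    ([0.90, 0.05, 0.05], [0.8, 0.1, 0.1], [1, 0, 0]),
    ([0.10, 0.80, 0.10], [0.1, 0.7, 0.2], [0, 1, 0]),
    ([0.45, 0.45, 0.10], [0.1, 0.8, 0.1], [0, 1, 0]),  # audio ambiguous in noise
]
model = EvolvingFusion()
for audio, visual, target in train:
    model.partial_fit(np.concatenate([audio, visual]), target)

# Noisy test case: the audio unit alone prefers class 0, the visual unit class 1.
audio_scores = np.array([0.5, 0.4, 0.1])
visual_scores = np.array([0.1, 0.8, 0.1])
fused = model.predict(np.concatenate([audio_scores, visual_scores]))
audio_only_class = int(np.argmax(audio_scores))
fused_class = int(np.argmax(fused))
print(audio_only_class, fused_class)
```

In this toy setup the fusion stage sees only the two units' decisions, not their raw features, which mirrors the decision-layer placement named in the abstract. New rule nodes are allocated only when no existing node matches an example well enough, which is one simple way to adapt by changing structure.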

	Keywords
		Audiovisual speech recognition (AVSR); Evolving fuzzy neural network (EFuNN); Speech-to-text (STT); Decision fusion; Multimodal speech recognition

	Scientific-disciplinary sectors of the article (display only)
		Sector INF/01 - Computer Science

	Publication date
		Jul 2016

	Journal in ANCE
		EVOLVING SYSTEMS

	DOI
		https://dx.doi.org/10.1007/s12530-016-9156-6

	Type
		Article (author)

	Appears in types:
		01 - Journal article

Files in this record:

File	Size	Format	Access	Type
art%3A10.1007%2Fs12530-016-9156-6.pdf	2.05 MB	Adobe PDF	Restricted access	Publisher's version/PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/425182
