
Audio based real-time speech animation of embodied conversational agents / M. Malcangi, R. de Tintis - In: Gesture-based communication in human-computer interaction : 5th International Gesture Workshop, GW 2003, Genova, Italy, April 15-17, 2003 : Selected Revised Papers / edited by A. Camurri, G. Volpe. - Berlin : Springer, 2004. - ISBN 9783540210726. - pp. 429-430 (Paper presented at the 5th International Gesture Workshop, held in Genova, Italy, in 2003.)

Audio based real-time speech animation of embodied conversational agents

M. Malcangi
First author
2004

Abstract

A framework for facial animation of embodied agents based on speech analysis in the presence of background noise is described. Target application areas are entertainment and mobile visual communication. This novel approach derives from the speech signal all the information needed to drive 3-D facial models. Using both digital signal processing and soft computing (fuzzy logic and neural networks) methodologies, a very flexible and low-cost solution for the extraction of lip- and facial-related information has been implemented. The main advantage of the speech-based approach is that it is not invasive: speech is captured by means of a microphone, and there is no physical contact with the subject (no magnetic sensors or optical markers). This gives the application additional flexibility and broader applicability compared to other methodologies. First, a speech-based lip driver system was developed to synchronize speech to lip movements; the methodology was then extended to several important facial movements, so that a face-synching system could be modeled. The developed system is speaker- and language-independent, so no neural network training operations are required.
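The abstract describes a pipeline that maps speech-signal features to lip and facial parameters via DSP and fuzzy logic. As a purely illustrative sketch (not the authors' implementation), one stage of such a pipeline could compute short-time frame energy and map it to a normalized mouth-opening value through a ramp resembling a trapezoidal fuzzy membership; the function names and the `low`/`high` thresholds below are hypothetical placeholders:

```python
import math

def frame_energy(samples, frame_len=160):
    """Short-time RMS energy for each non-overlapping frame."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def lip_opening(energy, low=0.05, high=0.5):
    """Map frame energy to a normalized mouth-opening value in [0, 1].
    The ramp between low and high acts like a trapezoidal fuzzy
    membership; both thresholds are arbitrary illustrative values."""
    if energy <= low:
        return 0.0   # silence / background noise -> mouth closed
    if energy >= high:
        return 1.0   # loud voiced speech -> mouth fully open
    return (energy - low) / (high - low)

# Toy usage: a 100 Hz tone whose amplitude ramps up over one second at 8 kHz,
# producing mouth-opening values that grow with the signal's energy.
sr = 8000
samples = [(n / sr) * math.sin(2 * math.pi * 100 * n / sr) for n in range(sr)]
openings = [lip_opening(e) for e in frame_energy(samples)]
```

A real system like the one described would of course use richer features (e.g. spectral cues for viseme selection) and fuzzy rule sets rather than a single energy ramp.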
Speech-animated Avatars ; Speech processing ; Fuzzy Logic ; Artificial Neural Networks
Settore INF/01 - Informatica
Book Part (author)
Files for this product:
No files are associated with this product.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/142620
Citations
  • Scopus: 8