Multimedia information and embedded systems are two major technological advances that have significantly changed the way people interact with systems and information in recent years. In this context, audio proves to be the most advantageous media for interacting with embedded systems and their content. Advantages include: hands-free operation; unattended interaction; and simple, cheap devices for capture and playback. The use of embedded systems to seek information stored locally or on the web points up several difficulties inherent in the nature of multimedia-information signals. These difficulties are especially evident when palmtop or deeply embedded devices are used for such purposes. Developing a set of digital-signal-processing- based algorithms for extracting audio information is a primary step toward providing user-friendly access to multimedia information and developing powerful communication interfaces. The algorithms aim to extract semantic and syntactic information from audio signals, including voice. Extracted audio features are employed to access information in multimedia databases, as well as to index it. More extensive, higher-level information, such as audio-source identification (speaker identification) and genre (in the case of music), must be extracted from the audio signal. One basic task involves transforming audio into symbols (e.g. music transformed into a score, speech transformed into text) and transcribing symbols into audio (e.g. score transformed into musical audio, text transformed into speech). The purpose is to search for and access any kind of multimedia information by means of audio. To attain these results, digital audio-processing, digital speech-processing, and soft-computing methods need to be integrated. Neural networks are used as classifiers and fuzzy logic is used for making smart decisions.

Multi-method Audio-based Retrieval of Multimedia Information / M. Malcangi. - In: WSEAS TRANSACTIONS ON INFORMATION SCIENCE AND APPLICATIONS. - ISSN 1790-0832. - Volume 7:Issue 2(2010 Feb), pp. 310-319.

Multi-method Audio-based Retrieval of Multimedia Information

M. Malcangi
Primo
2010

Abstract

Multimedia information and embedded systems are two major technological advances that have significantly changed the way people interact with systems and information in recent years. In this context, audio proves to be the most advantageous media for interacting with embedded systems and their content. Advantages include: hands-free operation; unattended interaction; and simple, cheap devices for capture and playback. The use of embedded systems to seek information stored locally or on the web points up several difficulties inherent in the nature of multimedia-information signals. These difficulties are especially evident when palmtop or deeply embedded devices are used for such purposes. Developing a set of digital-signal-processing- based algorithms for extracting audio information is a primary step toward providing user-friendly access to multimedia information and developing powerful communication interfaces. The algorithms aim to extract semantic and syntactic information from audio signals, including voice. Extracted audio features are employed to access information in multimedia databases, as well as to index it. More extensive, higher-level information, such as audio-source identification (speaker identification) and genre (in the case of music), must be extracted from the audio signal. One basic task involves transforming audio into symbols (e.g. music transformed into a score, speech transformed into text) and transcribing symbols into audio (e.g. score transformed into musical audio, text transformed into speech). The purpose is to search for and access any kind of multimedia information by means of audio. To attain these results, digital audio-processing, digital speech-processing, and soft-computing methods need to be integrated. Neural networks are used as classifiers and fuzzy logic is used for making smart decisions.
Audio features; Audio-to-score; Digital audio processing; Multimedia information; Pattern matching; Score-toaudio; Soft computing; Speech-to-text; Text-to-speech
Settore INF/01 - Informatica
feb-2010
http://www.wseas.us/e-library/transactions/information/2010/89-302.pdf
Article (author)
File in questo prodotto:
Non ci sono file associati a questo prodotto.
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/139786
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? ND
social impact