IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca

A Short Survey on Deep Learning for Multimodal Integration: Applications, Future Perspectives and Challenges / G.M. Dimitri. - In: COMPUTERS. - ISSN 2073-431X. - 11:11(2022), pp. 163.1-163.14. [10.3390/computers11110163]

A Short Survey on Deep Learning for Multimodal Integration: Applications, Future Perspectives and Challenges

G.M. Dimitri

2022

Abstract

Deep learning has achieved state-of-the-art performance in several research applications, from computer vision to bioinformatics and from object detection to image generation. In the context of these newly developed deep-learning approaches, we can define the concept of multimodality. The objective of this research field is to implement methodologies that use several modalities as input features to perform predictions. There is a strong analogy here with human cognition, since we rely on several different senses to make decisions. In this article, we present a short survey on multimodal integration using deep-learning methods. We first review the concept of multimodality, describing it from a two-dimensional perspective. The first dimension is a taxonomical description of the multimodality concept. The second dimension describes the fusion approaches used in multimodal deep learning. Finally, we describe four applications of multimodal deep learning in the following fields of research: speech recognition, sentiment analysis, forensic applications and image processing.

Keywords
	deep learning; multi-modal; integration; fusion
Scientific-disciplinary sectors of the article (valid from 9 May 2024)
	Sector IINF-05/A - Information Processing Systems
	Sector INFO-01/A - Computer Science
Publication date
	2022
Journal in ANCE
	COMPUTERS
DOI
	https://dx.doi.org/10.3390/computers11110163
URL
	https://www.mdpi.com/2073-431X/11/11/163
Type
	Article (author)
Appears in types:
	01 - Journal article

Files in this item:

File	Access	Type	License	Size	Format
computers-11-00163.pdf	Open access	Publisher's version/PDF	Creative Commons	575.09 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1187144
